[Linux-PowerEdge] internal disks on raid controller and linux

Spike_White at Dell.com Spike_White at Dell.com
Tue Nov 27 12:16:40 CST 2012


We have a very similar setup to yours in our lab.  We're running OEL 6.3 and 5.x on R710s (and other server models).

Like you, on R710s we're running RAID 1 (OS vol), RAID 1 (data vol), and 2 global hot spares.  Like you, we mount by UUID on OEL, or by logical volume name.
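
For reference, this is the sort of thing we mean by mounting by UUID; the device names and the UUID below are made-up examples, not from any real box:

    # ask the kernel what UUIDs the filesystems carry, regardless of sd letter
    blkid /dev/sda1 /dev/sda2

    # /etc/fstab entries keyed on UUID keep working even if sda becomes sdi
    UUID=3e6be9de-8139-11d1-9106-a43f08d823a6  /boot  ext4  defaults  1 2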

I remember that long ago there was a bug in the LSI driver in OEL 5.x for low-end PERCs.  The LSI driver didn't pass up the unique volume identifier associated with a PERC volume.  This was most noticeable if you destroyed your second PERC volume and re-created it: suddenly the OS vol /dev/sda became /dev/sdb or /dev/sdc.

This was particularly vexing when we ran SLES 10, as we didn't mount by UUID and we used raw disk slices (instead of LVM).  Ordinarily, destroying an unused second PERC vol (via omconfig) and re-creating it is an online operation, but due to this bug it would hang the kernel, which could no longer access its underlying filesystem on disk.
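
For context, the online destroy/re-create we're talking about goes through OMSA's omconfig, roughly like the following; the controller/vdisk numbers and pdisk IDs are examples, and the exact attributes vary a bit by OMSA version:

    # delete the second (data) virtual disk on controller 0
    omconfig storage vdisk action=deletevdisk controller=0 vdisk=1

    # re-create it as a RAID 1 across two physical disks
    omconfig storage controller action=createvdisk controller=0 raid=r1 \
        size=max pdisk=0:0:2,0:0:3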

I believe this LSI driver bug was corrected long ago.  (I'm not sure, as we started ordering our R710s with PERC H700s and haven't had any problems since.)

Additionally, even then it usually wouldn't enumerate randomly on a reboot; that happened only very rarely.

Another responder mentioned an SSD, USB key, or virtual media being attached.  That's theoretically possible, particularly in OEL 6.x: on the boot media, it enumerates the virtual media and/or physical USB keys as /dev/sda.
However, I'm unsure of that explanation for two reasons.  

1. I'm looking at an OEL 6.3 server with an attached USB drive right now.  It's enumerating the (sole) PERC vol as /dev/sda and the USB key as /dev/sdb, so the installed kernel behaves differently from the OEL 6.x boot media kernel.
2. You mentioned your first volume is found as /dev/sdi or /dev/sdj.  If it were only a USB key or virtual media being found first, it'd be /dev/sdb or /dev/sdc.

I speculate that it may be enumerating your FC LUNs first and then your PERC vols; I've seen this before.  There are two ways to verify.

1. What does inq tell you?  (It's a low-level utility available from EMC that lists all discovered LUNs.)  If you don't have inq handy, you can use sg_map -x -i to get enumeration order plus vendor; see the sketch after this list.
2. Pull the fibre cables from your HBA and reboot repeatedly, and see if the OS vol is always discovered as /dev/sda (or /dev/sdb if virtual media or a USB key is attached).
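
Roughly what I'd run for item 1, assuming sg3_utils is installed; the comments describe what to look for, not actual output from your box:

    # map every sg device to its sd name, with vendor/model from INQUIRY;
    # if the DGC (EMC) LUNs show up before the DELL PERC vols, the LUNs
    # are being enumerated first
    sg_map -x -i

    # also check which driver owns each SCSI host number you see in
    # /var/log/messages (e.g. host0 vs host2): megaraid_sas vs lpfc
    grep . /sys/class/scsi_host/host*/proc_name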

In older Linux distros, we used to have to force the enumeration of the PERC vols before the FC LUNs by putting "brokenmodules=lpfc" on the kernel command line.  We haven't had to do that in OEL 5 or 6.
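
If it ever came to that again, the parameter just goes on the kernel line in grub; a sketch below, assuming an OEL 5-style /boot/grub/grub.conf, with the kernel version and root= value as placeholders:

    title Oracle Linux Server
            root (hd0,0)
            kernel /vmlinuz-2.6.18-xxx.el5 ro root=UUID=<root-uuid> brokenmodules=lpfc
            initrd /initrd-2.6.18-xxx.el5.img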

From Dell I/T,
Spike White

-----Original Message-----
 
   1.  internal disks on raid controller and linux
      (JHurley at SummitRacing.com)
----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Nov 2012 07:35:36 -0500
From: <JHurley at SummitRacing.com>
Subject: [Linux-PowerEdge] internal disks on raid controller and linux
To: <linux-poweredge at lists.us.dell.com>
Message-ID:
	<05E4D346569A524D8567AE359EA3286D24E9974E at EXCHANGE-OHIO.summit.network>
	
Content-Type: text/plain; charset="us-ascii"

We have a set of new Dell R710 servers, each with 6 internal disks, using two RAID 1 mirrors and two hot spares.  EMC storage from a VNX, with PowerPath, is also defined.  Oracle Linux OL 6.2 (a clone of Red Hat) and all third-party software are installed on the first RAID 1 mirror.  We're also going to use Acronis to get OS backups and have bare-metal restore capability.

 

Usually my OS comes up on /dev/sda (so /boot is on /dev/sda1, my root directory on /dev/sda2, etc.).

 

Sometimes, though, it comes up on a different device, and for various reasons the Acronis backup/recovery will only work completely when it sees disk 1 on /dev/sda.  The /etc/fstab is using UUIDs, so the system will boot up and run either way.

 

What is going on here?  Is it something about the internal RAID controller and when Linux sees the device being timing-dependent?  I really need to get it consistent because of the Acronis requirement.

 

I'm considering trying to force it with a udev rule, but I'm not sure how to proceed here exactly (I'm an Oracle DBA, not a Linux sysadmin).
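
Something like the following is what I had in mind, based on examples I've found; the rule file name and the serial value are placeholders, and I realize a symlink only adds an alias rather than renaming the real /dev/sdX node, so it may not be enough to satisfy Acronis:

    # /etc/udev/rules.d/61-persistent-perc.rules (hypothetical file name)
    # Find the OS volume's serial first, e.g.:
    #   /sbin/scsi_id --whitelisted --device=/dev/sda
    # then key a stable symlink on it:
    KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_SERIAL}=="<serial-from-scsi_id>", SYMLINK+="perc_os"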

 

Some of the most recent boots show this kind of pattern (excerpted lines from /var/log/messages).  The most recent one, from yesterday, now has /boot on /dev/sdi1.

 

(6869): Nov 26 17:07:58 westo-02 kernel: sd 0:2:1:0: [sdb] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(6870): Nov 26 17:07:58 westo-02 kernel: sd 0:2:0:0: [sda] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(8093): Nov 26 17:31:01 westo-02 kernel: sd 2:2:0:0: [sdi] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(8094): Nov 26 17:31:01 westo-02 kernel: sd 2:2:1:0: [sdj] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(9312): Nov 26 17:44:33 westo-02 kernel: sd 0:2:0:0: [sda] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(9313): Nov 26 17:44:33 westo-02 kernel: sd 0:2:1:0: [sdb] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(10528): Nov 26 17:55:14 westo-02 kernel: sd 0:2:1:0: [sdb] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(10529): Nov 26 17:55:14 westo-02 kernel: sd 0:2:0:0: [sda] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(11732): Nov 26 18:00:56 westo-02 kernel: sd 0:2:0:0: [sda] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(11733): Nov 26 18:00:56 westo-02 kernel: sd 0:2:1:0: [sdb] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(12946): Nov 26 18:06:18 westo-02 kernel: sd 2:2:0:0: [sdi] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
(12950): Nov 26 18:06:18 westo-02 kernel: sd 2:2:1:0: [sdj] 1170997248 512-byte logical blocks: (599 GB/558 GiB)

 



