problems retrieving data from failed PowerEdge 1750 w/RAID

Aaron Krowne akrowne at gmail.com
Wed Apr 22 11:47:56 CDT 2009


Hello gentlemen,

I recently had an old (but important) PowerEdge 1750 server fail.
Unfortunately, not all the data we need was backed up remotely, so we were
relying on the RAID-1 setup to complete the redundancy picture (a mistake I
shan't repeat...)

The failure left the machine unable to POST, so I had the drives removed
and sent to me.  I acquired another PERC/4 controller, put it in another
machine (NOT a PowerEdge, by the way), and hooked up the drives.

I found that one of the drives in the RAID array had indeed failed, but the
other was fine.

However, I was still not able to get the array online to extract the data
off the working drive.

According to the RAID BIOS, the array should have been working, albeit in
degraded mode.  However, under Linux (Ubuntu with 2.6.22-10-generic), the
megaraid driver did not seem to bring the array online, so there was nothing
to mount (no /dev/sd* devices).  The following entries in the syslog seem to
be key:

Apr 22 11:40:46 exile kernel: [  103.456664] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
Apr 22 11:40:46 exile kernel: [  103.457845] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
...
Apr 22 11:40:46 exile kernel: [  105.312092] megaraid: probe new device 0x1000:0x1960:0x1028:0x0520: bus 0:slot 8:func 0
...
Apr 22 11:40:46 exile kernel: [  105.339159]  megaraid: fw version:[352A] bios version:[1.10]
...
Apr 22 11:40:46 exile kernel: [  105.380823] megaraid: DMA mask for 64-bit failed
Apr 22 11:40:46 exile kernel: [  105.381321] scsi0 : LSI Logic MegaRAID driver
Apr 22 11:40:46 exile kernel: [  105.381499] scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
...
Apr 22 11:40:46 exile kernel: [  111.122030] megaraid: aborting-3 cmd=12 <c=0 t=2 l=0>
Apr 22 11:40:46 exile kernel: [  111.122039] megaraid abort: 3:0[0:2], fw owner
Apr 22 11:40:46 exile kernel: [  111.122056] megaraid: 1 outstanding commands. Max wait 300 sec
Apr 22 11:40:46 exile kernel: [  111.122061] megaraid mbox: Wait for 1 commands to complete:300

The "Wait for 1 commands to complete" message repeats indefinitely.  Neither
the Dell PERC manager nor LSI MegaCli can talk to the controller.  The
"DMA mask for 64-bit failed" message is presumably because the card sits on
an older PCI bus and the extended pins are not being used.
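
For anyone who wants to check their own syslog for the same pattern, here is
a minimal sketch in Python.  The function name summarize_megaraid and the
summary keys are made up for illustration, and the "Attached SCSI disk"
string is only an assumption about what the sd driver prints when a disk
does come up.  It expects each log entry on a single (unwrapped) line.

```python
import re

# Matches kernel log lines such as:
#   "Apr 22 11:40:46 exile kernel: [  111.122030] megaraid: aborting-3 cmd=12 <c=0 t=2 l=0>"
LINE_RE = re.compile(r"kernel: \[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Assumption: the sd driver logs something like "Attached SCSI disk"
# when a logical drive is actually exposed to the OS.
DISK_RE = re.compile(r"attached scsi disk", re.IGNORECASE)


def summarize_megaraid(lines):
    """Summarize megaraid-related kernel messages (hypothetical helper)."""
    summary = {
        "aborts": 0,              # number of "megaraid: aborting" messages
        "waiting": False,         # driver is waiting on outstanding commands
        "disks_attached": False,  # any disk attach message seen
        "fw_version": None,       # firmware version reported by the driver
    }
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        msg = match.group("msg")
        if msg.startswith("megaraid: aborting"):
            summary["aborts"] += 1
        elif "outstanding commands" in msg or "commands to complete" in msg:
            summary["waiting"] = True
        elif "fw version:[" in msg:
            summary["fw_version"] = msg.split("fw version:[", 1)[1].split("]", 1)[0]
        elif DISK_RE.search(msg):
            summary["disks_attached"] = True
    return summary
```

Feeding it the lines of the syslog (for example, the contents of
/var/log/syslog on Ubuntu) gives a quick yes/no on whether the driver is
stuck waiting and whether any disk was ever attached.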

I also tried Ubuntu 8.10 with a presumably newer megaraid driver, but got
the same result.

I am not terribly experienced with hardware RAID, so I am perplexed.  I
thought a RAID-1 array was supposed to keep working in degraded mode.

As far as I can tell my options at this point for retrieving the data are:

(a) buy another 73GB SCSI drive to replace the failed one and let the array
rebuild
(b) try to "rebuild" the array, as a 1-drive RAID-0

As for (b), it looks like the RAID BIOS can do that, but I'm unsure whether
there will be data loss (will the drive be wiped?)

Plus, I'm not sure whether there is some driver problem or other system
problem that has nothing to do with the array being degraded.  So I
definitely don't want to spend $150 and wait another week for a replacement
drive, only to find the array still unreadable from the OS.

Any thoughts? Suggestions?

Thanks in advance.

Aaron Krowne