problems retrieving data from failed PowerEdge 1750 w/RAID

Aaron Krowne akrowne at gmail.com
Thu Apr 30 12:06:57 CDT 2009


Can anyone advise on this?  I just need to know if I need to replace the bad
drive on the array to get it to mount again, or if there is likely some
other problem.  I don't want to buy another drive when I'm just trying to
recover the data off the good drive.

Thanks in advance.

-Aaron

On Wed, Apr 22, 2009 at 12:47 PM, Aaron Krowne <akrowne at gmail.com> wrote:

> Hello gentlemen,
>
> I recently had an old (but important) PowerEdge 1750 server fail.
> Unfortunately not all the data we need was backed up remotely, so we were
> relying on the RAID-1 setup to complete the redundancy picture (a mistake I
> shan't repeat...)
>
> The failure left the machine unable to POST, so I had the
> drives removed and sent to me.  I acquired another PERC/4 controller, put
> it in another machine (NOT a PowerEdge, by the way), and hooked up the
> drives.
>
> I found that one of the drives in the RAID array had indeed failed, but the
> other was fine.
>
> However, I was still not able to get the array online to extract the data
> off the working drive.
>
> According to the RAID BIOS, this should have been working, albeit in
> degraded mode.  However, in Linux (Ubuntu with 2.6.22-10-generic), the
> megaraid driver did not seem to get the array online and mounted
> (no /dev/sd* devices).  The following syslog entries seem to be key:
>
> Apr 22 11:40:46 exile kernel: [  103.456664] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
> Apr 22 11:40:46 exile kernel: [  103.457845] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
> ...
> Apr 22 11:40:46 exile kernel: [  105.312092] megaraid: probe new device 0x1000:0x1960:0x1028:0x0520: bus 0:slot 8:func 0
> ...
> Apr 22 11:40:46 exile kernel: [  105.339159]  megaraid: fw version:[352A] bios version:[1.10]
> ...
> Apr 22 11:40:46 exile kernel: [  105.380823] megaraid: DMA mask for 64-bit failed
> Apr 22 11:40:46 exile kernel: [  105.381321] scsi0 : LSI Logic MegaRAID driver
> Apr 22 11:40:46 exile kernel: [  105.381499] scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
> ...
> Apr 22 11:40:46 exile kernel: [  111.122030] megaraid: aborting-3 cmd=12 <c=0 t=2 l=0>
> Apr 22 11:40:46 exile kernel: [  111.122039] megaraid abort: 3:0[0:2], fw owner
> Apr 22 11:40:46 exile kernel: [  111.122056] megaraid: 1 outstanding commands. Max wait 300 sec
> Apr 22 11:40:46 exile kernel: [  111.122061] megaraid mbox: Wait for 1 commands to complete:300
>
> The "wait for 1 commands to complete" continues indefinitely.  The Dell
> PERC manager and LSI MegaCli do not successfully talk to the controller.
> The 64-bit error is presumably because it is an older PCI bus and the
> extended pins are not being used.
>
> I also tried Ubuntu 8.10 with a presumably newer megaraid driver, but got
> the same result.
>
> I am not terribly experienced with hardware RAID, so I am perplexed.  I
> thought RAID was supposed to work in degraded mode.
>
> As far as I can tell my options at this point for retrieving the data are:
>
> (a) buy another 73GB SCSI drive to replace the failed one and let the array
> rebuild
> (b) try to "rebuild" the array, as a 1-drive RAID-0
>
> As for (b), it looks like the RAID BIOS can do that, but I'm unsure
> whether there will be data loss (will the drive be wiped?).
>
> Plus, I'm not sure if there is some driver problem or other system problem
> that has nothing to do with the one drive being degraded.  So I definitely
> don't want to spend $150 and wait another week for a replacement drive, only
> to still have the array not read in the OS.
>
> Any thoughts? Suggestions?
>
> Thanks in advance.
>
> Aaron Krowne
>
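
For anyone triaging a similar log, here is a minimal Python sketch that
summarizes the megaraid entries quoted above.  It is not part of the driver
or of any Dell/LSI tool.  The function name summarize_megaraid and the
"stuck" heuristic (at least one abort followed by a "Wait for N commands"
message) are my own assumptions, and it expects each syslog entry on a
single unwrapped line.

```python
import re

# Patterns are taken from the message texts quoted in this thread only;
# other driver versions may word these messages differently.
VERSION_RE = re.compile(r"megaraid: (\d+(?:\.\d+)+) \(Release")
ABORT_RE = re.compile(r"megaraid: aborting-\d+ cmd=\w+")
WAIT_RE = re.compile(r"megaraid mbox: Wait for (\d+) commands to complete:(\d+)")


def summarize_megaraid(log_text):
    """Summarize megaraid kernel messages found in syslog text."""
    summary = {
        "driver_version": None,   # version string of the megaraid driver
        "dma64_failed": False,    # 64-bit DMA mask could not be set
        "aborts": 0,              # number of aborted commands
        "pending_waits": [],      # (outstanding commands, seconds left)
    }
    for line in log_text.splitlines():
        m = VERSION_RE.search(line)
        if m:
            summary["driver_version"] = m.group(1)
        if "DMA mask for 64-bit failed" in line:
            summary["dma64_failed"] = True
        if ABORT_RE.search(line):
            summary["aborts"] += 1
        m = WAIT_RE.search(line)
        if m:
            summary["pending_waits"].append((int(m.group(1)), int(m.group(2))))
    # Hypothetical heuristic: an abort plus a pending wait suggests the
    # controller firmware never completed a command.
    summary["stuck"] = summary["aborts"] > 0 and bool(summary["pending_waits"])
    return summary
```

Feeding it the contents of the syslog file (for example, the text read from
/var/log/syslog) returns a dict whose "stuck" entry flags the
abort-then-wait pattern seen here.  It only describes the symptom; it does
not tell you whether the cause is the failed drive, the controller, or the
host machine.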