PV220S in a bad state. Recovery advice needed

Jason_Mick@Dell.com Jason_Mick at Dell.com
Tue May 18 12:15:01 CDT 2004


I would say that you are having communication issues on this vault, possibly caused by the controller, the cables, or the ZEMM firmware.  From what I can tell, this is the sequence of events on this system:

1. Drive 4 failed
2. Drive 14 was assigned to rebuild
3. Drive 14 failed to respond to the rebuild task so it was failed
4. Drive 15 jumped in for drive 14.
5. Drive 15 failed to respond to the rebuild task so it was failed.
6. Sometime after this process drive 0 also failed to respond to a command so it was failed.
At this point the entire volume was offline.
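
The sequence above can be replayed against a simplified model to show where the volume is lost. This is only a sketch, not controller behavior: it assumes a RAID 5 set survives one missing member and that a spanned volume is up only while every underlying set is up. The names replay and EVENTS are made up for illustration.

```python
# Replay of the failure sequence above against a simplified RAID 5 model.
# Assumption: a RAID 5 set stays up with at most one missing member, and the
# spanned volume is up only while every underlying set is up.

RAID5_TOLERATED_FAILURES = 1

# (description, member of set A00 left without a working disk afterwards)
EVENTS = [
    ("drive 4 failed", 4),
    ("drive 14 assigned to rebuild", 4),
    ("drive 14 failed during rebuild", 4),
    ("drive 15 took over for drive 14", 4),
    ("drive 15 failed during rebuild", 4),
    ("drive 0 failed", 0),
]


def replay(events):
    """Return (step, description, missing_members, set_online) per event."""
    missing = set()  # member positions of set A00 that have no usable disk
    history = []
    for step, (description, member) in enumerate(events, start=1):
        # No rebuild ever completed, so an affected member stays missing.
        missing.add(member)
        online = len(missing) <= RAID5_TOLERATED_FAILURES
        history.append((step, description, sorted(missing), online))
    return history


if __name__ == "__main__":
    for step, description, missing, online in replay(EVENTS):
        state = "degraded" if online else "OFFLINE"
        print(f"{step}. {description:32s} missing={missing} -> set A00 {state}")
```

In this model the set stays degraded through steps 1 to 5, because only member 4 is missing, and goes offline at step 6, when member 0 is lost as well.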
 
If we assume that none of the drives were accessible at this point, then it is possible to force only drive 0 online and then boot the system.  That does not take into account whatever data was being written to that container at the time of failure.  Since you have a logical volume spanned over two RAID 5 sets, there may not be any recovery, because a spanned volume is not redundant.  At the time of this failure, one of the logical drives in your volume was still accessible (the RAID 5 set A01-00 thru A01-05); the other was not.  I am not sure there will be any recovery in this situation without restoring from backup.
I guess the best approach will be to force drive 0 back online and see if the logical volume will mount.  Either that, or clear the config and retag it the way it was prior to the failure.  Also make sure you fail drive 4 in the new config.  If the volume mounts, I would check your backup and make sure that it is up to date before going any further.  If the backup is up to date, then I would rebuild drive 4 and see if it completes.  If so, then reassign your hot spare disks.
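
The order of those steps matters, so here is a rough sketch of them as a decision helper. It is a simplification: the function name next_step and its parameters are hypothetical, and the "restore from backup" branch is an interpretation of the doubt expressed above rather than an explicit instruction.

```python
def next_step(volume_mounts, backup_current=False, rebuild_completed=None):
    """Return the next action suggested by the advice above (simplified).

    Call this after forcing drive 0 online (or re-creating the config with
    drive 4 marked failed) and attempting to mount the logical volume.
    """
    if not volume_mounts:
        # Assumption: the message doubts any recovery without a restore.
        return "restore from backup"
    if not backup_current:
        return "bring the backup up to date before going any further"
    if rebuild_completed is None:
        return "rebuild drive 4 and watch whether it completes"
    if rebuild_completed:
        return "reassign the hot spare disks"
    return "stop: the advice above does not cover a failed rebuild"


if __name__ == "__main__":
    print(next_step(volume_mounts=True, backup_current=False))
    print(next_step(volume_mounts=True, backup_current=True))
    print(next_step(volume_mounts=True, backup_current=True, rebuild_completed=True))
```

The point of the ordering is that nothing is rebuilt or reassigned until the backup has been confirmed.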

None of the steps above do anything to determine the nature of this failure.  It is my opinion that this failure is not drive related.  The only way a drive could have caused this is if it was failing in a way that caused a communication issue on the entire bus.

Things that I would check in an attempt to find the root cause:
What cables are you using?
Do any of the cables have excessive bends in them?
What version of ZEMM firmware are you using?
Do the drives that are installed have any available firmware updates? (check support.dell.com) 
Did the controller log any messages prior to this failure?

Jason
 

-----Original Message-----
From: linux-poweredge-admin at dell.com [mailto:linux-poweredge-admin at dell.com] On Behalf Of Philippe Gramoullé
Sent: Tuesday, May 18, 2004 10:56 AM
To: Poweredge Mailing List
Subject: PV220S in a bad state. Recovery advice needed


 Hi,

We have a 2650+PV220s that broke minutes ago. Here's the problem:

PV220S in a RAID 5 setup: 2 RAID sets of 6 disks each, spanned into a logical volume of about 670 GB
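
As a quick sanity check on that figure, the usable capacity of such a span can be worked out from the disk counts. This sketch assumes the usual RAID 5 accounting, where each set gives up one disk's worth of space to parity; the function name is made up for illustration.

```python
def raid5_span_data_disks(sets, disks_per_set):
    """Number of disks' worth of usable space in a span of RAID 5 sets."""
    # Each RAID 5 set spends the equivalent of one disk on parity.
    return sets * (disks_per_set - 1)


data_disks = raid5_span_data_disks(sets=2, disks_per_set=6)
per_disk_gb = 670 / data_disks  # 670 is the approximate volume size quoted above

if __name__ == "__main__":
    print(f"{data_disks} data disks, about {per_disk_gb:.0f} GB usable per disk")
```

With 2 sets of 6 disks this gives 10 data disks, so about 67 GB of usable space per disk.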

Before the SCSI errors happened, the setup was as follows:

 0°  ONLINE A00-00
 1°  ONLINE A00-01
 2°  ONLINE A00-02
 3°  ONLINE A00-03
 4°  ONLINE A00-04
 5°  ONLINE A00-05
 6°  PROC
 7° 
 8°  ONLINE A01-00
 9°  ONLINE A01-01
 10° ONLINE A01-02
 11° ONLINE A01-03
 12° ONLINE A01-04
 13° ONLINE A01-05
 14° SPARE
 15° SPARE

Now the layout looks like this:

 0°  FAIL   A00-00
 1°  ONLINE A00-01
 2°  ONLINE A00-02
 3°  ONLINE A00-03
 4°  READY
 5°  ONLINE A00-05
 6°  PROC
 7° 
 8°  ONLINE A01-00
 9°  ONLINE A01-01
 10° ONLINE A01-02
 11° ONLINE A01-03
 12° ONLINE A01-04
 13° ONLINE A01-05
 14° READY
 15° FAIL   A00-04
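
To make the difference between the two layouts easier to see, the slot states can be compared programmatically. This is only a sketch: just the slot number and state columns of the two listings are transcribed (slot 7 is empty and left out), and the names BEFORE, AFTER and changed_slots are made up for illustration.

```python
# Slot -> state, transcribed from the two listings above (state column only).
BEFORE = {
    0: "ONLINE", 1: "ONLINE", 2: "ONLINE", 3: "ONLINE", 4: "ONLINE",
    5: "ONLINE", 6: "PROC", 8: "ONLINE", 9: "ONLINE", 10: "ONLINE",
    11: "ONLINE", 12: "ONLINE", 13: "ONLINE", 14: "SPARE", 15: "SPARE",
}
AFTER = {
    0: "FAIL", 1: "ONLINE", 2: "ONLINE", 3: "ONLINE", 4: "READY",
    5: "ONLINE", 6: "PROC", 8: "ONLINE", 9: "ONLINE", 10: "ONLINE",
    11: "ONLINE", 12: "ONLINE", 13: "ONLINE", 14: "READY", 15: "FAIL",
}


def changed_slots(before, after):
    """Return {slot: (old_state, new_state)} for slots whose state differs."""
    slots = sorted(set(before) | set(after))
    return {
        slot: (before.get(slot), after.get(slot))
        for slot in slots
        if before.get(slot) != after.get(slot)
    }


if __name__ == "__main__":
    for slot, (old, new) in changed_slots(BEFORE, AFTER).items():
        print(f"slot {slot:2d}: {old} -> {new}")
```

Only slots 0, 4, 14 and 15 change state; every disk of the second set (slots 8 to 13) is still ONLINE.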

What I think is that the spare drives were somehow faulty, so that when the rebuild started after A00-01 and/or A00-04 broke,
the volume went offline.

I'm used to unplugging/replugging the shelf and redoing the config manually, so it isn't a problem if this is the way to go.

I'd rather avoid using the "Force Disk Online" option, as it has always corrupted the filesystem before and required an fsck.

Any suggestions welcome.

Thanks,

Philippe

_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
http://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq or search the list archives at http://lists.us.dell.com/htdig/




