How many power-cycles does it take to get to the center of 2850-cicle? [Or how does one recover RAID from a toasted box]

Kurt_Olsson at Dell.com
Sat Feb 9 11:11:15 CST 2008


Interesting scenario.  Read this entire message before attempting any
step listed.

Fine print: Use at your own risk.

When recovering RAID sets from torment it is important to have a similar
controller.  Here is a brief list of controllers that "ought to" be
compatible:
PERC 3/SC or 3/DC, PERC 4/SC or 4/DC
PERC 4e/Di or PERC 4e/DC

In general, it should be possible to recover an array from any one of
those to any other of those.  In case it is not obvious, it will be
rather difficult to recover a PV220 config that was set up on a PERC 4/DC
to a 2850 with an internal PERC 4e/Di. That is a cabling issue, though,
not a technical limitation.

With the advent of DDF (the SNIA Disk Data Format) on the PERC 5 and
later controllers, the steps are different and you can't go back. That
is, PERC 5 to PERC 4 and vice versa is not gonna happen. (Not to mention
that the drive technology is entirely different: SCSI vs. SAS/SATA.) I
have also intentionally ignored all CERCs.

To begin, prepare the "recovery system" (RS below).
Remove any (all) disks that contain data from the RS.
Add the controller that you want to use. (This assumes the RS is not
identical hardware.)
On the new controller (or the integrated one, if using an identical
system), enter the PERC BIOS with Ctrl+M.
Go to Configure, then Clear Config.
Choose YES.
Escape to the Management Menu and select Objects | Adapter.
If there is more than one adapter, choose the one you are using for this
recovery and hit Enter.
At the bottom of the list, select Boot Time BIOS options.
Select BIOS Configuration Autoselection=DISK and then YES (this will
automatically import the disk config).
Escape and exit.
Power down the RS.

Now that the RS is ready to receive the disks from the other system, we
will move on to the recovery.
On the failed system, remove the drives you are migrating one at a
time.
Install each drive at the identical location (SCSI ID) in the RS as it
occupied in the failed system.
Repeat until all drives are moved.
When you are certain all the drives have been moved and are securely
seated in the RS, power up.
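The step that matters most above is that every drive lands at the same
SCSI ID it occupied in the failed system. As a minimal sketch (Python,
purely illustrative; the drive labels are hypothetical, e.g. serial
numbers or whatever you wrote on a sticky note, and nothing here talks
to the controller), you could write down the ID-to-drive mapping before
pulling anything and compare it against what you installed:

```python
# Illustrative bookkeeping aid only: it compares two maps of
# SCSI ID -> drive label that you write down by hand. It does not
# talk to the PERC or read anything from the drives.


def placement_mismatches(failed_system, recovery_system):
    """Return (scsi_id, expected_label, actual_label) for every SCSI ID
    where the recovery system differs from the failed system.
    A label of None means no drive was recorded at that ID."""
    mismatches = []
    for scsi_id in sorted(set(failed_system) | set(recovery_system)):
        expected = failed_system.get(scsi_id)
        actual = recovery_system.get(scsi_id)
        if expected != actual:
            mismatches.append((scsi_id, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Made-up example data: drives B and C were swapped during the move.
    before = {0: "DRIVE-A", 1: "DRIVE-B", 2: "DRIVE-C"}
    after = {0: "DRIVE-A", 1: "DRIVE-C", 2: "DRIVE-B"}
    for scsi_id, expected, actual in placement_mismatches(before, after):
        print(f"SCSI ID {scsi_id}: expected {expected}, found {actual}")
```

An empty result means every drive is back at its original ID; anything
else should be fixed with the RS powered down, before you power up.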

Depending on my mood, I will sometimes just let it boot up at that point
and see what happens. Alternatively, you can again enter the PERC BIOS
and look at what is now there. If no more than one drive is listed as
failed in the PERC BIOS after the above procedure, you should have data
available immediately.  In OpenManage you will see a status of
scrubbing or rebuilding because the "new" controller is doing a BGI
(background initialization) on the "new" array. That is normal. Don't
try to stop it.
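The "one failed drive or fewer" rule of thumb can be written down as a
tiny check. This is only a sketch of that rule, not of how the PERC
firmware decides anything: the state strings are assumed to be copied by
hand from the PERC BIOS, "FAILED" is an assumed label, and the rule only
makes sense for a redundant array (e.g. RAID 1 or RAID 5) that can
survive the loss of one drive.

```python
# Illustrative only: the state strings are whatever you copied from the
# PERC BIOS drive list; "FAILED" is an assumed label for a failed drive.


def expect_data_available(drive_states, failed_label="FAILED"):
    """Rule of thumb: with no more than one drive listed as failed, the
    data should be available right away. Only meaningful for redundant
    arrays (e.g. RAID 1 or RAID 5), which can lose one drive."""
    failed = sum(
        1 for state in drive_states
        if state.strip().upper() == failed_label.upper()
    )
    return failed <= 1


if __name__ == "__main__":
    print(expect_data_available(["ONLINE", "ONLINE", "FAILED"]))  # True
    print(expect_data_available(["FAILED", "ONLINE", "FAILED"]))  # False
```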

In an ideal world, you will know if any drive was in a failed state
before attempting this. You will also have documented what the actual
RAID config was.

Since few of us actually live there, here are some steps to get a bit
more info.
If the failing server will let you into the PERC BIOS, go to
Objects | Physical Drive or Configure | View/Add and, after the channels
are scanned, press F3.  This shows the number of logical drives
configured, their state, number of stripes, etc.  This is useful
information and should be written down; having it will help support
with the recovery.
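If you would rather keep those notes somewhere tidier than the back of an
envelope, here is a minimal sketch in Python. The field names are
hypothetical (they are not PERC terminology) and every value is typed in
by hand from the F3 screen; the point is just to end up with a short
plain-text summary you can read to support.

```python
from dataclasses import dataclass


@dataclass
class LogicalDriveNote:
    """Details for one logical drive, copied by hand from the F3 screen.
    Field names are hypothetical, not PERC terminology."""
    number: int
    state: str
    stripes: int
    notes: str = ""


def format_for_support(drives):
    """Render the notes as plain text, one line per logical drive."""
    lines = [f"Logical drives configured: {len(drives)}"]
    for ld in sorted(drives, key=lambda d: d.number):
        line = f"LD {ld.number}: state={ld.state}, stripes={ld.stripes}"
        if ld.notes:
            line += f" ({ld.notes})"
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example data.
    print(format_for_support([
        LogicalDriveNote(number=0, state="DEGRADED", stripes=3,
                         notes="drive at ID 2 looked odd"),
    ]))
```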

From the Objects | Physical Drive menu you will see all the drives;
selecting a drive and pressing F2 will show information about that disk.
Usually, I am interested in Media Errors or Other Errors and whether or
not anything was returned. Since capacity is also listed, check that no
drive is listed with 0 MB. If you get "Invalid Operation", one of three
things is likely: the drive has failed; it is a "ghost" entry populated
from NVRAM because you did not clear the existing config on the
controller before migrating the drives; or a PV220 is attached to the
wrong channel of the PERC. In the last case, you will usually see all
the drives in a READY state on the other channel. Shut down, move the
cable, reboot and look again.
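The red flags above can be captured in a short checklist. A minimal
sketch, assuming you copy the F2 values by hand (the class and field
names are hypothetical, and capacity_mb=None is just this sketch's way
of recording an "Invalid Operation" response); since the values come
from you, write them down before touching the drives.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PhysicalDriveInfo:
    """Values copied by hand from the F2 screen for one drive.
    capacity_mb=None stands for an "Invalid Operation" response."""
    scsi_id: int
    capacity_mb: Optional[int]
    media_errors: int = 0
    other_errors: int = 0


def warnings_for(drive: PhysicalDriveInfo) -> List[str]:
    """Return the red flags described in the text for one drive."""
    prefix = f"ID {drive.scsi_id}"
    if drive.capacity_mb is None:
        return [f"{prefix}: Invalid Operation "
                "(failed drive, ghost NVRAM entry, or PV220 on wrong channel)"]
    warnings = []
    if drive.capacity_mb == 0:
        warnings.append(f"{prefix}: capacity listed as 0 MB")
    if drive.media_errors:
        warnings.append(f"{prefix}: {drive.media_errors} media errors")
    if drive.other_errors:
        warnings.append(f"{prefix}: {drive.other_errors} other errors")
    return warnings


if __name__ == "__main__":
    # Made-up example data.
    drives = [
        PhysicalDriveInfo(0, 70007),
        PhysicalDriveInfo(1, 0, media_errors=2),
        PhysicalDriveInfo(2, None),
    ]
    for d in drives:
        for w in warnings_for(d):
            print(w)
```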

Media Errors and Other Errors are NOT persistent values.  Reseating a
drive while the system is running or migrating it to a new PERC will
clear those counters. Deal with it. :-)

I think that is as far as I want to go with this.  Recovery is more of a
pain if drives are mishandled or you get "fancy" with state changes of
individual drives.  There is nothing difficult about RAID recovery, but
you can make a recoverable array toast if you are not aware of all the
implications of your actions.  If you run into a problem after what I
have described, you probably ought to call support and work through it
with them.  If
you are ONLY moving data drives and have an OS on a separate mirror /
array, DO NOT use these steps.  You will not have a bootable machine
when you are done.

Under NO CIRCUMSTANCES should you EVER initialize an array during the
recovery process.

Fine print: Use at your own risk.

-Kurt

-----Original Message-----
From: linux-poweredge-bounces at dell.com
[mailto:linux-poweredge-bounces at dell.com] On Behalf Of Stephen John
Smoogen
Sent: Friday, February 08, 2008 6:08 PM
To: linux-poweredge-Lists
Subject: How many power-cycles does it take to get to the center
of 2850-cicle? [Or how does one recover RAID from a toasted box]

Well we may just have found out. Our nagios found one of our remote
servers non-responsive and when we got over there, the box was in an
interesting state. The UPS had gone bad and was power cycling the box
every second. We figure it had about 1200-1600 power cycles before we
got to the system and turned off the UPS.

Surprisingly, the system had not caught fire like I had seen older
machines do with that kind of abuse. It was also able to boot up to the
BIOS and try to do a PXE boot, which was a lot more than I was
expecting. However, it no longer sees all of its memory and its RAID
controller does not see any of the disk drives in it anymore. My plan
is to try and recover the data by moving the disks over to a working
system... but beyond that, I am guessing as to whether there is any way
to get the other RAID controller to detect the old RAID setup. I have
done a simple Google search and not come up with much (my google-fu is
lacking, it would seem).

Are there any recovery how-to's out there for recovering RAID on a
PERC 4 controller?

-- 
Stephen J Smoogen. -- CSIRT/Linux System Administrator
How far that little candle throws his beams! So shines a good deed
in a naughty world. = Shakespeare. "The Merchant of Venice"

_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
http://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq


