Changing a drive in predictable failure state

Patrick_Boyd at Dell.com
Thu Jun 19 09:14:34 CDT 2008


If you already have a hot spare in the system, the rebuild will start
onto it before you have a chance to insert the new drive. Other than
that, the process should proceed exactly as you outlined it below.

As for the Prepare to Remove (remove) vs. Offline debate: there is no
Prepare to Remove on the PERC 5 or 6 controllers; that command was
deprecated. It is still present on PERC 4 and below.

-----Original Message-----
From: linux-poweredge-bounces at dell.com
[mailto:linux-poweredge-bounces at dell.com] On Behalf Of Bertrand LUPART
Sent: Wednesday, June 18, 2008 10:45 AM
To: linux-poweredge-Lists
Subject: Re: Changing a drive in predictable failure state

Hello,

> Best case steps for this (least possible chance of losing data):

Thank you for your answer.

Just to be sure: is there any chance that the rebuild (point 5) targets
the wrong disk if the replacement carries a foreign configuration? Or is
it safer to clear the spare disk before the operation?


For later reference, below is what I did on a spare PE 2950 with the
same hard drives and RAID setup, for testing.
The commands are for pdisk #3, in vdisk #1 (300GB RAID-1 SAS).


> 1. perform a consistency check on the vd

----8<----8<----8<----8<----
$ sudo omconfig storage vdisk action=checkconsistency controller=0 vdisk=1
---->8---->8---->8---->8----

Then I got this in /var/log/syslog:
----8<----8<----8<----8<----
Jun 18 10:53:15 myserver Server Administrator: Storage Service EventID:
2058  Virtual disk Check Consistency started:  Virtual Disk 1 (Virtual
Disk 1) Controller 0 (PERC 5/i Integrated)
Jun 18 12:05:08 myserver Server Administrator: Storage Service EventID:
2085  Virtual disk Check Consistency completed:  Virtual Disk 1 (Virtual
Disk 1) Controller 0 (PERC 5/i Integrated)
---->8---->8---->8---->8----
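
If this step is scripted, one way to tell that the check has finished is
to look for event 2085 in the syslog, as in the excerpt above. This is
only a sketch: the function name cc_completed is made up for
illustration, and it assumes each event sits on a single line in the
real log (the lines above are wrapped by the mail client).

```shell
# cc_completed VDISK_NAME
# Reads syslog text on stdin and succeeds if a "Check Consistency
# completed" event (EventID 2085) is present for that virtual disk.
# Assumptions: one event per line, message format as in the excerpt
# above; VDISK_NAME is used inside a regex, so keep it to plain text.
cc_completed() {
    grep -Eq "EventID:[[:space:]]*2085[[:space:]].*completed:[[:space:]]+$1 \("
}
```

For example: cc_completed "Virtual Disk 1" < /var/log/syslog && echo done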

I guess everything should be fine.


> 2. when the CC has finished, issue the offline command to the SMART
> disk

I checked that the drive I was about to remove was the right one:
----8<----8<----8<----8<----
$ sudo omconfig storage pdisk action=blink controller=0 pdisk=0:0:3
$ sudo omconfig storage pdisk action=unblink controller=0 pdisk=0:0:3
---->8---->8---->8---->8----

Since I didn't understand the difference between remove and offline, I
went for offline :)
----8<----8<----8<----8<----
$ sudo omconfig storage pdisk action=offline controller=0 pdisk=0:0:3
---->8---->8---->8---->8----
The drive LED is now alternating amber/green.

Got this in the logs:
----8<----8<----8<----8<----
Jun 18 15:10:42 myserver Server Administrator: Storage Service EventID:
2123  Redundancy lost:  Virtual Disk 1 (Virtual Disk 1) Controller 0
(PERC 5/i Integrated)
Jun 18 15:10:43 myserver Server Administrator: Storage Service EventID:
2057  Virtual disk degraded:  Virtual Disk 1 (Virtual Disk 1) Controller
0 (PERC 5/i Integrated)
Jun 18 15:10:43 myserver Server Administrator: Storage Service EventID:
2050  Physical disk offline:  Physical Disk 0:0:3 Controller 0,
Connector 0
---->8---->8---->8---->8----


> 3. remove SMART disk
> 4. replace with replacement drive
> 5. If rebuild has not started after 5 minutes, assign the replacement
> drive as a dedicated hotspare to the vd

Seconds after the new drive was inserted, the LEDs for vdisk #1 (pdisk
#2 & #3) started to blink:
----8<----8<----8<----8<----
Jun 18 15:21:16 myserver Server Administrator: Storage Service EventID:
2052  Physical disk inserted:  Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:16 myserver Server Administrator: Storage Service EventID:
2121  Device returned to normal:  Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2050  Physical disk offline:  Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2065  Physical disk Rebuild started:  Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2121  Device returned to normal:  Physical Disk 0:0:3 Controller 0,
Connector 0
---->8---->8---->8---->8----

----8<----8<----8<----8<----
$ sudo omreport storage vdisk
...
ID                  : 1
Status              : Non-Critical
Name                : Virtual Disk 1
State               : Degraded
Progress            : Not Applicable
Layout              : RAID-1
Size                : 278.88 GB (299439751168 bytes)
Device Name         : /dev/sdb
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Through
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled
...
---->8---->8---->8---->8----

----8<----8<----8<----8<----
$ sudo omreport storage pdisk controller=0
...

ID                        : 0:0:2
Status                    : Ok
Name                      : Physical Disk 0:0:2
State                     : Online
Failure Predicted         : No
Progress                  : Not Applicable
Type                      : SAS
Capacity                  : 278.88 GB (299439751168 bytes)
Used RAID Disk Space      : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : DELL    
Product ID                : ST3300555SS     
Revision                  : T106
Serial No.                : 3LM0DSGY            
Negotiated Speed          : Not Available
Capable Speed             : Not Available
Manufacture Day           : 07
Manufacture Week          : 02
Manufacture Year          : 2005
SAS Address               : 5000C50001E8DBF9

ID                        : 0:0:3
Status                    : Ok
Name                      : Physical Disk 0:0:3
State                     : Rebuilding
Failure Predicted         : No
Progress                  : 5% complete
Type                      : SAS
Capacity                  : 278.88 GB (299439751168 bytes)
Used RAID Disk Space      : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : DELL    
Product ID                : HUS153030VLS300 
Revision                  : A280
Serial No.                : J8W3LMYC            
Negotiated Speed          : Not Available
Capable Speed             : Not Available
Manufacture Day           : 03
Manufacture Week          : 08
Manufacture Year          : 2008
SAS Address               : 5000CCA0053EEA69
...
---->8---->8---->8---->8----
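
To follow the rebuild without reading the whole report each time, the
State and Progress of a single disk can be pulled out of the omreport
output. A minimal sketch, assuming the "Key : Value" layout shown above;
pdisk_field is a hypothetical helper name, not an OMSA command.

```shell
# pdisk_field ID FIELD
# Reads "omreport storage pdisk" output on stdin and prints the value of
# FIELD for the disk whose "ID" line equals ID (e.g. 0:0:3).
pdisk_field() {
    awk -v id="$1" -v field="$2" '
        {
            # Split each line at its first colon: key on the left,
            # value (which may itself contain colons) on the right.
            pos = index($0, ":")
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == "ID") current = val
            else if (current == id && key == field) { print val; exit }
        }'
}
```

For example:
sudo omreport storage pdisk controller=0 | pdisk_field 0:0:3 Progress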



Then, about one hour later:

----8<----8<----8<----8<----
Jun 18 16:28:29 myserver Server Administrator: Storage Service EventID:
2092  Physical disk Rebuild completed:  Physical Disk 0:0:3 Controller
0, Connector 0
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2121  Device returned to normal:  Virtual Disk 1 (Virtual Disk 1)
Controller 0 (PERC 5/i Integrated)
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2124  Redundancy normal:  Virtual Disk 1 (Virtual Disk 1) Controller 0
(PERC 5/i Integrated)
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2158  Physical disk online:  Physical Disk 0:0:3 Controller 0, Connector
0
---->8---->8---->8---->8----
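
For planning the production run, the rebuild time can be read off the
two timestamps in the logs: Rebuild started (event 2065) at 15:21:17 and
Rebuild completed (event 2092) at 16:28:29. A small sketch of that
arithmetic; elapsed is a made-up helper and assumes both times fall on
the same day.

```shell
# elapsed START END
# Both arguments are HH:MM:SS on the same day; prints END - START.
elapsed() {
    awk -v s="$1" -v e="$2" 'BEGIN {
        split(s, a, ":"); split(e, b, ":")
        d = (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
        printf "%dh%02dm%02ds\n", int(d / 3600), int(d % 3600 / 60), d % 60
    }'
}

elapsed 15:21:17 16:28:29   # prints 1h07m12s
```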

----8<----8<----8<----8<----
$ sudo omreport storage vdisk
...

ID                  : 1
Status              : Ok
Name                : Virtual Disk 1
State               : Ready
Progress            : Not Applicable
Layout              : RAID-1
Size                : 278.88 GB (299439751168 bytes)
Device Name         : /dev/sdb
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Through
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled

...
---->8---->8---->8---->8----

----8<----8<----8<----8<----
$ sudo omreport storage pdisk controller=0
...

ID                        : 0:0:3
Status                    : Ok
Name                      : Physical Disk 0:0:3
State                     : Online
Failure Predicted         : No
Progress                  : Not Applicable
Type                      : SAS
Capacity                  : 278.88 GB (299439751168 bytes)
Used RAID Disk Space      : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : DELL    
Product ID                : HUS153030VLS300 
Revision                  : A280
Serial No.                : J8W3LMYC            
Negotiated Speed          : Not Available
Capable Speed             : Not Available
Manufacture Day           : 03
Manufacture Week          : 08
Manufacture Year          : 2008
SAS Address               : 5000CCA0053EEA69

...
---->8---->8---->8---->8----
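
As a last check before and after the swap, the same report can be
scanned for any disk still flagged with a predicted failure. A sketch
only: predicted_failures is a hypothetical helper, and it assumes a
flagged disk shows "Failure Predicted : Yes" (the output above only
shows the "No" case).

```shell
# predicted_failures
# Reads "omreport storage pdisk" output on stdin and prints the ID of
# every disk whose "Failure Predicted" field is "Yes" (assumed value).
predicted_failures() {
    awk '
        {
            # Split each line at its first colon into key and value.
            pos = index($0, ":")
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == "ID") id = val
            else if (key == "Failure Predicted" && val == "Yes") print id
        }'
}
```

For example:
sudo omreport storage pdisk controller=0 | predicted_failures
No output means no disk is currently flagged.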


Worked perfectly. I now have to reproduce that on the real production
machine.

Thank you,

-- 
Bertrand LUPART

http://bertrand.gotpike.org/



