Changing a drive in predictable failure state
Bertrand LUPART
bertrand.lupart at linkeo.com
Wed Jun 18 10:44:47 CDT 2008
Hello,
> Best case steps for this (least possible chance of losing data):
Thank you for your answer.
Just to be sure: is there any chance that the rebuild (step 5) rebuilds
the wrong disk in case of a foreign configuration? Or is it safer to
clear the spare disk before the operation?
For later reference, below is what I did for testing on a spare PE 2950
with the same hard drives and RAID setup.
The commands are for pdisk #3, in vdisk #1 (300GB RAID-1 SAS).
> 1. perform a consistency check on the vd
----8<----8<----8<----8<----
$ sudo omconfig storage vdisk action=checkconsistency controller=0 vdisk=1
---->8---->8---->8---->8----
Then I got this in /var/log/syslog:
----8<----8<----8<----8<----
Jun 18 10:53:15 myserver Server Administrator: Storage Service EventID:
2058 Virtual disk Check Consistency started: Virtual Disk 1 (Virtual
Disk 1) Controller 0 (PERC 5/i Integrated)
Jun 18 12:05:08 myserver Server Administrator: Storage Service EventID:
2085 Virtual disk Check Consistency completed: Virtual Disk 1 (Virtual
Disk 1) Controller 0 (PERC 5/i Integrated)
---->8---->8---->8---->8----
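To script the wait for the check to finish, a minimal sketch could look
for the "completed" event (EventID 2085) in the log. The function name
cc_completed is made up for this example, and it assumes each event sits
on a single line in the log file (the excerpt above is only wrapped by
the mail client).

```shell
# cc_completed VDISK_NAME < logfile
# Exit status 0 if stdin contains a "Check Consistency completed" event
# for the given virtual disk name, non-zero otherwise.
# Assumption: one event per line, worded as in the excerpt above.
cc_completed() {
    # The trailing " (" keeps "Virtual Disk 1" from matching "Virtual Disk 10".
    grep -qF -- "Check Consistency completed: $1 ("
}

# Example (commented out, needs read access to the log):
#   cc_completed "Virtual Disk 1" < /var/log/syslog && echo "CC finished"
```

An old "completed" line from a previous run would also match, so in
practice the input should be limited to lines logged after the check
was started.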
I guess everything should be fine.
> 2. when the CC has finished, issue the offline command to the SMART disk
I checked that the drive I was about to remove was the right one:
----8<----8<----8<----8<----
$ sudo omconfig storage pdisk action=blink controller=0 pdisk=0:0:3
$ sudo omconfig storage pdisk action=unblink controller=0 pdisk=0:0:3
---->8---->8---->8---->8----
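Besides blinking the LED, the disk can be cross-checked from the
omreport output, for instance by comparing its serial number with the
label on the drive. The sketch below pulls one field for one disk out of
that output. pdisk_field is a hypothetical helper name, and the parsing
assumes the "Name : value" layout with each disk's block starting at its
"ID" line, as in the listings further down.

```shell
# pdisk_field ID FIELD < omreport-pdisk-output
# Prints the value of FIELD for the physical disk with the given ID.
# Prints nothing if the disk or the field is not found.
pdisk_field() {
    awk -v id="$1" -v field="$2" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        {
            p = index($0, ":")          # split at the first colon only,
            if (p == 0) next            # since IDs like 0:0:3 contain colons
            key = trim(substr($0, 1, p - 1))
            val = trim(substr($0, p + 1))
            if (key == "ID") current = val
            else if (current == id && key == field) { print val; exit }
        }
    '
}

# Example (commented out, needs OMSA and root):
#   sudo omreport storage pdisk controller=0 | pdisk_field 0:0:3 "Serial No."
#   sudo omreport storage pdisk controller=0 | pdisk_field 0:0:3 "Failure Predicted"
```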
Since I didn't understand the difference between remove and offline, I
went for offline :)
----8<----8<----8<----8<----
$ sudo omconfig storage pdisk action=offline controller=0 pdisk=0:0:3
---->8---->8---->8---->8----
The drive LED is now alternating amber/green.
Got this in the logs:
----8<----8<----8<----8<----
Jun 18 15:10:42 myserver Server Administrator: Storage Service EventID:
2123 Redundancy lost: Virtual Disk 1 (Virtual Disk 1) Controller 0
(PERC 5/i Integrated)
Jun 18 15:10:43 myserver Server Administrator: Storage Service EventID:
2057 Virtual disk degraded: Virtual Disk 1 (Virtual Disk 1) Controller
0 (PERC 5/i Integrated)
Jun 18 15:10:43 myserver Server Administrator: Storage Service EventID:
2050 Physical disk offline: Physical Disk 0:0:3 Controller 0,
Connector 0
---->8---->8---->8---->8----
> 3. remove SMART disk
> 4. replace with replacement drive
> 5. If rebuild has not started after 5 minutes, assign the replacement
> drive as a dedicated hotspare to the vd
Seconds after the new drive was inserted, the LEDs for vdisk #1 (pdisk
#2 & #3) started to blink:
----8<----8<----8<----8<----
Jun 18 15:21:16 myserver Server Administrator: Storage Service EventID:
2052 Physical disk inserted: Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:16 myserver Server Administrator: Storage Service EventID:
2121 Device returned to normal: Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2050 Physical disk offline: Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2065 Physical disk Rebuild started: Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2121 Device returned to normal: Physical Disk 0:0:3 Controller 0,
Connector 0
---->8---->8---->8---->8----
----8<----8<----8<----8<----
$ sudo omreport storage vdisk
...
ID : 1
Status : Non-Critical
Name : Virtual Disk 1
State : Degraded
Progress : Not Applicable
Layout : RAID-1
Size : 278.88 GB (299439751168 bytes)
Device Name : /dev/sdb
Type : SAS
Read Policy : Adaptive Read Ahead
Write Policy : Write Through
Cache Policy : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy : Disabled
...
---->8---->8---->8---->8----
----8<----8<----8<----8<----
$ sudo omreport storage pdisk controller=0
...
ID : 0:0:2
Status : Ok
Name : Physical Disk 0:0:2
State : Online
Failure Predicted : No
Progress : Not Applicable
Type : SAS
Capacity : 278.88 GB (299439751168 bytes)
Used RAID Disk Space : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare : No
Vendor ID : DELL
Product ID : ST3300555SS
Revision : T106
Serial No. : 3LM0DSGY
Negotiated Speed : Not Available
Capable Speed : Not Available
Manufacture Day : 07
Manufacture Week : 02
Manufacture Year : 2005
SAS Address : 5000C50001E8DBF9
ID : 0:0:3
Status : Ok
Name : Physical Disk 0:0:3
State : Rebuilding
Failure Predicted : No
Progress : 5% complete
Type : SAS
Capacity : 278.88 GB (299439751168 bytes)
Used RAID Disk Space : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare : No
Vendor ID : DELL
Product ID : HUS153030VLS300
Revision : A280
Serial No. : J8W3LMYC
Negotiated Speed : Not Available
Capable Speed : Not Available
Manufacture Day : 03
Manufacture Week : 08
Manufacture Year : 2008
SAS Address : 5000CCA0053EEA69
...
---->8---->8---->8---->8----
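The rebuild can be followed without rereading the whole listing each
time. A rough sketch, assuming the "Progress : N% complete" wording
shown above; rebuild_percent is a made-up name, not an OMSA command.

```shell
# rebuild_percent ID < omreport-pdisk-output
# Prints the integer percentage from the "Progress" line of the given
# physical disk, or nothing when the progress is not a percentage
# (for example "Not Applicable").
rebuild_percent() {
    awk -v id="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        {
            p = index($0, ":")
            if (p == 0) next
            key = trim(substr($0, 1, p - 1))
            val = trim(substr($0, p + 1))
            if (key == "ID") current = val
            else if (current == id && key == "Progress" && val ~ /^[0-9]+%/) {
                sub(/%.*/, "", val)     # "5% complete" -> "5"
                print val
                exit
            }
        }
    '
}

# Example polling loop (commented out, needs OMSA and root):
#   while p=$(sudo omreport storage pdisk controller=0 | rebuild_percent 0:0:3)
#         [ -n "$p" ]; do
#       echo "rebuild at $p%"; sleep 60
#   done
```

The loop ends as soon as the Progress field stops being a percentage,
which is what the listing shows once the disk is back Online.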
Then, about one hour later:
----8<----8<----8<----8<----
Jun 18 16:28:29 myserver Server Administrator: Storage Service EventID:
2092 Physical disk Rebuild completed: Physical Disk 0:0:3 Controller
0, Connector 0
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2121 Device returned to normal: Virtual Disk 1 (Virtual Disk 1)
Controller 0 (PERC 5/i Integrated)
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2124 Redundancy normal: Virtual Disk 1 (Virtual Disk 1) Controller 0
(PERC 5/i Integrated)
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2158 Physical disk online: Physical Disk 0:0:3 Controller 0, Connector
0
---->8---->8---->8---->8----
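To plan the window on another machine, the exact rebuild time can be
computed from the two event timestamps (Rebuild started at 15:21:17,
Rebuild completed at 16:28:29). A small sketch, assuming both times fall
on the same day; elapsed_seconds is a name chosen for this example.

```shell
# elapsed_seconds START END
# Prints END - START in seconds for two HH:MM:SS times on the same day.
elapsed_seconds() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, ":")
        split($2, b, ":")
        print (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    }'
}

elapsed_seconds 15:21:17 16:28:29    # prints 4032
```

4032 seconds is 67 minutes and 12 seconds for this 300GB RAID-1 on an
otherwise idle test machine.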
----8<----8<----8<----8<----
$ sudo omreport storage vdisk
...
ID : 1
Status : Ok
Name : Virtual Disk 1
State : Ready
Progress : Not Applicable
Layout : RAID-1
Size : 278.88 GB (299439751168 bytes)
Device Name : /dev/sdb
Type : SAS
Read Policy : Adaptive Read Ahead
Write Policy : Write Through
Cache Policy : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy : Disabled
...
---->8---->8---->8---->8----
----8<----8<----8<----8<----
$ sudo omreport storage pdisk controller=0
...
ID : 0:0:3
Status : Ok
Name : Physical Disk 0:0:3
State : Online
Failure Predicted : No
Progress : Not Applicable
Type : SAS
Capacity : 278.88 GB (299439751168 bytes)
Used RAID Disk Space : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare : No
Vendor ID : DELL
Product ID : HUS153030VLS300
Revision : A280
Serial No. : J8W3LMYC
Negotiated Speed : Not Available
Capable Speed : Not Available
Manufacture Day : 03
Manufacture Week : 08
Manufacture Year : 2008
SAS Address : 5000CCA0053EEA69
...
---->8---->8---->8---->8----
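Before and after such an operation, a quick check that no disk is in an
unexpected state can be scripted from the same listing. This is only a
sketch: unhealthy_pdisks is a hypothetical helper, and it assumes every
disk on the controller is a member of a virtual disk (a hot spare, whose
State is not "Online", would be reported too).

```shell
# unhealthy_pdisks < omreport-pdisk-output
# Prints the ID of every physical disk whose State is not "Online" or
# whose "Failure Predicted" is not "No". Prints nothing otherwise.
unhealthy_pdisks() {
    awk '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        function report() { if (id != "" && bad) print id }
        {
            p = index($0, ":")
            if (p == 0) next
            key = trim(substr($0, 1, p - 1))
            val = trim(substr($0, p + 1))
            if (key == "ID") { report(); id = val; bad = 0 }
            else if (key == "State" && val != "Online") bad = 1
            else if (key == "Failure Predicted" && val != "No") bad = 1
        }
        END { report() }
    '
}

# Example (commented out, needs OMSA and root):
#   sudo omreport storage pdisk controller=0 | unhealthy_pdisks
```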
Worked perfectly. I now have to reproduce that on the real production
machine.
Thank you,
--
Bertrand LUPART
http://bertrand.gotpike.org/
More information about the Linux-PowerEdge
mailing list