Changing a drive in predictable failure state
Patrick_Boyd at Dell.com
Thu Jun 19 09:14:34 CDT 2008
If you have a hotspare already in the system, it will rebuild before you
have a chance to insert the new drive. Other than that, the process
should proceed exactly how you outlined it below.
Also, on the Prepare to Remove (remove) vs. Offline debate: there is no
Prepare to Remove on the PERC 5 or 6 controllers; that command was
deprecated. It is still present on PERC 4 and below.
-----Original Message-----
From: linux-poweredge-bounces at dell.com
[mailto:linux-poweredge-bounces at dell.com] On Behalf Of Bertrand LUPART
Sent: Wednesday, June 18, 2008 10:45 AM
To: linux-poweredge-Lists
Subject: Re: Changing a drive in predictable failure state
Hello,
> Best case steps for this (least possible chance of losing data):
Thank you for your answer.
Just to be sure, is there any chance that the rebuild (point 5) could
rebuild the wrong disk in case of a foreign configuration? Or is it
safer to clear the spare disk before the operation?
For later reference, below is what I did on a spare PE 2950 with the same
hard drive and RAID setup for testing.
The commands are for pdisk #3, in vdisk #1 (300GB RAID-1 SAS).
> 1. perform a consistency check on the vd
----8<----8<----8<----8<----
$ sudo omconfig storage vdisk action=checkconsistency controller=0 vdisk=1
---->8---->8---->8---->8----
Then I got this in /var/log/syslog:
----8<----8<----8<----8<----
Jun 18 10:53:15 myserver Server Administrator: Storage Service EventID:
2058 Virtual disk Check Consistency started: Virtual Disk 1 (Virtual
Disk 1) Controller 0 (PERC 5/i Integrated)
Jun 18 12:05:08 myserver Server Administrator: Storage Service EventID:
2085 Virtual disk Check Consistency completed: Virtual Disk 1 (Virtual
Disk 1) Controller 0 (PERC 5/i Integrated)
---->8---->8---->8---->8----
I guess everything should be fine.
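The check for completion can be scripted rather than read by eye. Below is a minimal sketch, assuming the events reach syslog as single unwrapped lines (they are wrapped above only by the mail client) and that EventID 2085 marks a completed Check Consistency, as in the log above. The function name cc_completed is made up for illustration and is not part of OMSA.

```shell
# Hypothetical helper, not part of OMSA.
# Reads syslog text on stdin; succeeds if it contains a
# "Check Consistency completed" event (EventID 2085) for virtual disk $1.
cc_completed() {
    # The trailing space keeps "Virtual Disk 1" from matching "Virtual Disk 10".
    grep -Eq "EventID: 2085 .*Virtual Disk $1 "
}
```

Something like `cc_completed 1 < /var/log/syslog && echo "CC finished on vdisk 1"` would then report completion. Note that an older 2085 entry from a previous check would match as well, so the timestamp still needs a look.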
> 2. when the CC has finished, issue the offline command to the SMART
> disk
Checked that the drive I was about to remove was the right one:
----8<----8<----8<----8<----
$ sudo omconfig storage pdisk action=blink controller=0 pdisk=0:0:3
$ sudo omconfig storage pdisk action=unblink controller=0 pdisk=0:0:3
---->8---->8---->8---->8----
Since I didn't understand the difference between remove and offline, I
went for offline :)
----8<----8<----8<----8<----
$ sudo omconfig storage pdisk action=offline controller=0 pdisk=0:0:3
---->8---->8---->8---->8----
The drive LED is now alternating amber/green.
Got this in the logs:
----8<----8<----8<----8<----
Jun 18 15:10:42 myserver Server Administrator: Storage Service EventID:
2123 Redundancy lost: Virtual Disk 1 (Virtual Disk 1) Controller 0
(PERC 5/i Integrated)
Jun 18 15:10:43 myserver Server Administrator: Storage Service EventID:
2057 Virtual disk degraded: Virtual Disk 1 (Virtual Disk 1) Controller
0 (PERC 5/i Integrated)
Jun 18 15:10:43 myserver Server Administrator: Storage Service EventID:
2050 Physical disk offline: Physical Disk 0:0:3 Controller 0,
Connector 0
---->8---->8---->8---->8----
> 3. remove SMART disk
> 4. replace with replacement drive
> 5. If rebuild has not started after 5 minutes, assign the replacement
> drive as a dedicated hotspare to the vd
Seconds after the new drive was inserted, the LEDs for vdisk #1 (pdisk
#2 & #3) started to blink:
----8<----8<----8<----8<----
Jun 18 15:21:16 myserver Server Administrator: Storage Service EventID:
2052 Physical disk inserted: Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:16 myserver Server Administrator: Storage Service EventID:
2121 Device returned to normal: Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2050 Physical disk offline: Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2065 Physical disk Rebuild started: Physical Disk 0:0:3 Controller 0,
Connector 0
Jun 18 15:21:17 myserver Server Administrator: Storage Service EventID:
2121 Device returned to normal: Physical Disk 0:0:3 Controller 0,
Connector 0
---->8---->8---->8---->8----
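Here the rebuild started on its own within seconds, but step 5 of the quoted procedure allows five minutes before falling back to a dedicated hotspare. That wait could be scripted roughly as below. This is only a sketch: wait_for_rebuild is a made-up name, and it expects you to pass in a command of your own that prints the current state of the replacement disk (for example "Rebuilding" or "Offline").

```shell
# Hypothetical helper, not part of OMSA.
# Usage: wait_for_rebuild TIMEOUT_SECONDS INTERVAL_SECONDS STATE_COMMAND...
# Runs STATE_COMMAND every INTERVAL_SECONDS (must be at least 1) and
# succeeds as soon as it prints exactly "Rebuilding".
# Elapsed time is approximated by summing the sleep intervals; the
# function fails once that sum exceeds TIMEOUT_SECONDS.
wait_for_rebuild() {
    wfr_timeout="$1"
    wfr_interval="$2"
    shift 2
    wfr_elapsed=0
    while [ "$wfr_elapsed" -le "$wfr_timeout" ]; do
        if [ "$("$@")" = "Rebuilding" ]; then
            return 0
        fi
        sleep "$wfr_interval"
        wfr_elapsed=$((wfr_elapsed + wfr_interval))
    done
    return 1
}
```

With a state command of your own (say get_state, also hypothetical), `wait_for_rebuild 300 10 get_state || echo "no rebuild after 5 minutes"` would tell you when to fall back to the hotspare assignment described in step 5.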
----8<----8<----8<----8<----
$ sudo omreport storage vdisk
...
ID : 1
Status : Non-Critical
Name : Virtual Disk 1
State : Degraded
Progress : Not Applicable
Layout : RAID-1
Size : 278.88 GB (299439751168 bytes)
Device Name : /dev/sdb
Type : SAS
Read Policy : Adaptive Read Ahead
Write Policy : Write Through
Cache Policy : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy : Disabled
...
---->8---->8---->8---->8----
----8<----8<----8<----8<----
$ sudo omreport storage pdisk controller=0
...
ID : 0:0:2
Status : Ok
Name : Physical Disk 0:0:2
State : Online
Failure Predicted : No
Progress : Not Applicable
Type : SAS
Capacity : 278.88 GB (299439751168 bytes)
Used RAID Disk Space : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare : No
Vendor ID : DELL
Product ID : ST3300555SS
Revision : T106
Serial No. : 3LM0DSGY
Negotiated Speed : Not Available
Capable Speed : Not Available
Manufacture Day : 07
Manufacture Week : 02
Manufacture Year : 2005
SAS Address : 5000C50001E8DBF9
ID : 0:0:3
Status : Ok
Name : Physical Disk 0:0:3
State : Rebuilding
Failure Predicted : No
Progress : 5% complete
Type : SAS
Capacity : 278.88 GB (299439751168 bytes)
Used RAID Disk Space : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare : No
Vendor ID : DELL
Product ID : HUS153030VLS300
Revision : A280
Serial No. : J8W3LMYC
Negotiated Speed : Not Available
Capable Speed : Not Available
Manufacture Day : 03
Manufacture Week : 08
Manufacture Year : 2008
SAS Address : 5000CCA0053EEA69
...
---->8---->8---->8---->8----
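To follow the rebuild without reading the whole listing each time, a single field can be pulled out of the omreport output. The sketch below assumes the layout shown above: one "Name : value" pair per line, with each disk's record starting at its "ID" line. The function name pdisk_field is made up for illustration.

```shell
# Hypothetical helper, not part of OMSA.
# Usage: pdisk_field DISK_ID FIELD_NAME < omreport-output
# Prints the value of FIELD_NAME for the physical disk whose ID is DISK_ID.
pdisk_field() {
    # Split on the first " : " only in effect: IDs such as 0:0:3 contain
    # colons but no following space, so they are not split.
    awk -v id="$1" -v key="$2" -F ' *: +' '
        $1 == "ID"                 { current = $2 }   # a new disk record starts
        current == id && $1 == key { print $2; exit } # first matching field wins
    '
}
```

For example, `sudo omreport storage pdisk controller=0 | pdisk_field 0:0:3 Progress` should print the progress value for the disk being rebuilt, provided your OMSA version formats its output the same way.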
Then, about one hour later:
----8<----8<----8<----8<----
Jun 18 16:28:29 myserver Server Administrator: Storage Service EventID:
2092 Physical disk Rebuild completed: Physical Disk 0:0:3 Controller
0, Connector 0
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2121 Device returned to normal: Virtual Disk 1 (Virtual Disk 1)
Controller 0 (PERC 5/i Integrated)
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2124 Redundancy normal: Virtual Disk 1 (Virtual Disk 1) Controller 0
(PERC 5/i Integrated)
Jun 18 16:28:30 myserver Server Administrator: Storage Service EventID:
2158 Physical disk online: Physical Disk 0:0:3 Controller 0, Connector
0
---->8---->8---->8---->8----
----8<----8<----8<----8<----
$ sudo omreport storage vdisk
...
ID : 1
Status : Ok
Name : Virtual Disk 1
State : Ready
Progress : Not Applicable
Layout : RAID-1
Size : 278.88 GB (299439751168 bytes)
Device Name : /dev/sdb
Type : SAS
Read Policy : Adaptive Read Ahead
Write Policy : Write Through
Cache Policy : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy : Disabled
...
---->8---->8---->8---->8----
----8<----8<----8<----8<----
$ sudo omreport storage pdisk controller=0
...
ID : 0:0:3
Status : Ok
Name : Physical Disk 0:0:3
State : Online
Failure Predicted : No
Progress : Not Applicable
Type : SAS
Capacity : 278.88 GB (299439751168 bytes)
Used RAID Disk Space : 278.88 GB (299439751168 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare : No
Vendor ID : DELL
Product ID : HUS153030VLS300
Revision : A280
Serial No. : J8W3LMYC
Negotiated Speed : Not Available
Capable Speed : Not Available
Manufacture Day : 03
Manufacture Week : 08
Manufacture Year : 2008
SAS Address : 5000CCA0053EEA69
...
---->8---->8---->8---->8----
Worked perfectly. I now have to reproduce that on the real production
machine.
Thank you,
--
Bertrand LUPART
http://bertrand.gotpike.org/
_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
http://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq