strange problem and pediags question.
Paul A
razor at meganet.net
Wed Nov 8 13:35:23 CST 2006
Recently, on a PE 2650 running RAID 5 with a hot spare, I have had a lot of
drives fail.
Today, two months after the last failure, I noticed that another drive had
failed.
I first noticed this while playing with the pediag tools (4.7), using
pediags raidctrl --show all
which showed that drive 0:0 was disabled. When I checked the status lights on
the server, the LCD listed a drive failure and the display was orange.
Here is the output of the command above.
Device Index : 4
Device Name : Array Disk 0:0
Description : SEAGATE SX336704LC
Device Class : RAID Array Disk
Device Status : Device is disabled.
Device Location : Channel 0, Target ID 0, Lun 0
What happened next I still can't figure out.
As I took the drive out and replaced it with another one, which changed the
LCD light to blue,
I started thinking I needed to write a script to alert me when I lose a
drive. I read the docs on pediag and its exit status,
and put the following bash script together.
----------------------------
# Test the hard drives and email/page the NOC if there's an issue.
#
./pediags raidctrl --run test-index=2
# Store the exit code of the test run right away, before anything overwrites $?.
status=$?
#./pediags raidctrl --show all 1> /dev/null
if [ "$status" -eq 0 ]; then
    exit 0
else
    # Send email alerts etc. here.
    exit "$status"
fi
---------------------------------------------
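Another way to do the check, sketched here only as an idea, would be to scan the text of the --show all output instead of relying on the exit code. This assumes the output always uses the "Device Name : ..." / "Device Status : ..." layout shown above and that a healthy disk always reports "working properly"; the function name failed_disks, the mail command, and the address in the usage comment are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of pediags): reads "--show all" style text on
# stdin and prints the name of every device whose status line does not say
# "working properly". Assumes the field layout shown in the output above.
failed_disks() {
    awk '
        # Remember the most recent device name (text after the first colon).
        /^[ \t]*Device Name[ \t]*:/ {
            sub(/^[^:]*:[ \t]*/, "")
            name = $0
            next
        }
        # Report the remembered name if its status is anything but healthy.
        /^[ \t]*Device Status[ \t]*:/ {
            sub(/^[^:]*:[ \t]*/, "")
            if ($0 !~ /working properly/) print name
        }
    '
}

# Usage sketch (address and mail command are placeholders):
#   bad=$(./pediags raidctrl --show all | failed_disks)
#   [ -n "$bad" ] && echo "$bad" | mail -s "RAID disk problem" noc@example.com
```

With the first output above this would print "Array Disk 0:0"; with the second it would print nothing.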
I wanted to get an exit code other than 0 as a test, so I figured I would put
the old, non-working drive back in the server.
As I did this I noticed the lights change to orange again.
So I ran my script, but to my surprise I got an exit code of 0.
When I manually ran ./pediags raidctrl --run test-index=2
I saw the lights on the server blink, and after the test, drive 0:0 was good
again.
I ran ./pediags raidctrl --show all to double-check and got:
Device Index : 4
Device Name : Array Disk 0:0
Description : SEAGATE SX336704LC
Device Class : RAID Array Disk
Device Status : Device is working properly.
Device Location : Channel 0, Target ID 0, Lun 0
I'm not sure how it went from a failed drive to working properly.
So my first question is: how did this happen?
Also, is ./pediags raidctrl --run test-index=2 the correct command
to test the drives and get back a non-zero exit code when there is
a problem with one of the drives?
Thanks, P