strange problem and pediags question.

Paul A razor at meganet.net
Wed Nov 8 13:35:23 CST 2006


Recently I have had a lot of drives fail on a PE 2650 running RAID 5 with a
hot spare.
Today, 2 months after the last failure, I noticed that another drive had
failed.

I first noticed this while playing with the pediag tools (4.7), using
pediags raidctrl --show all
which showed that drive 0:0 was disabled. When I checked the status lights on
the server, the LCD listed the drive failure and the display was orange.

Here is the output of the command above.

Device Index        : 4
Device Name         : Array Disk 0:0
Description         : SEAGATE SX336704LC
Device Class        : RAID Array Disk
Device Status       : Device is disabled.
Device Location     : Channel 0, Target ID 0, Lun 0

What happened next I still can't figure out.

While I was taking the drive out and replacing it with another one (which
changed the LCD light to blue),
I started thinking I needed to write a script to alert me when I lose a
drive. I read the docs on pediag and exit status.

So I put the following bash script together.

---------------------------- 

#!/bin/bash
# Test the hard drives and email/page the NOC if there's an issue.
#
./pediags raidctrl --run test-index=2

#./pediags raidctrl --show all 1> /dev/null

# Store the exit code of the pediags run above.
status=$?

if [ "$status" -eq 0 ]; then
        exit 0
else
        # send email alerts etc.
        exit "$status"
fi

---------------------------------------------

I wanted to test for an exit code other than 0, so I figured I would put
the old, non-working drive back in the server.
As I did this I noticed the lights change to orange again.

So I ran my script, but to my surprise I got an exit code of 0.
When I manually ran ./pediags raidctrl --run test-index=2,
I saw the lights on the server blink, and after the test drive 0:0 was good
again.

I ran ./pediags raidctrl --show all to double check and I got:


Device Index        : 4
Device Name         : Array Disk 0:0
Description         : SEAGATE SX336704LC
Device Class        : RAID Array Disk
Device Status       : Device is working properly.
Device Location     : Channel 0, Target ID 0, Lun 0


I'm not sure how it went from a failed drive to working properly.


So my question is: how did this happen?

Also, am I using the correct command, ./pediags raidctrl --run test-index=2,
to test the drives and get back a non-zero exit code when there is a problem
with one of the drives?
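
One possible way to alert without depending on the exit code is to check the
"Device Status" lines printed by --show all. The following is only a rough
sketch: it assumes the output always has the layout shown in the two listings
above, and that "Device is working properly." is the only healthy status
string. The function name unhealthy_devices is made up for illustration, and
other pediags versions may print different text.

```shell
#!/bin/bash
# Sketch: list devices whose "Device Status" is not the healthy string.
# Assumption: each record has a "Device Name" line followed later by a
# "Device Status" line, as in the listings in this post.

HEALTHY="Device is working properly."

# Reads "--show all" style text on stdin and prints "name: status"
# for every device that is not reported as healthy.
unhealthy_devices() {
    awk -v healthy="$HEALTHY" '
        /^Device Name[ \t]*:/ {
            name = $0
            sub(/^[^:]*:[ \t]*/, "", name)     # drop the "Device Name :" label
            sub(/[ \t\r]+$/, "", name)         # trim trailing whitespace
        }
        /^Device Status[ \t]*:/ {
            status = $0
            sub(/^[^:]*:[ \t]*/, "", status)   # drop the "Device Status :" label
            sub(/[ \t\r]+$/, "", status)
            if (status != healthy) print name ": " status
        }
    '
}

# Example use (commented out so the sketch can be sourced safely):
# bad=$(./pediags raidctrl --show all | unhealthy_devices)
# if [ -n "$bad" ]; then
#     printf '%s\n' "$bad"    # hand this text to the alerting step
# fi
```

With the first listing above as input, this would report Array Disk 0:0 as
disabled; with the second listing it prints nothing. Note that it can only
report what --show all says at that moment, so it would not have caught the
old drive once it showed up as working again.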


Thanks, P


More information about the Linux-PowerEdge mailing list