SCSI timeouts on PERC 3/DC

Jason Dixon jason at dixongroup.net
Sun Oct 17 07:33:00 CDT 2004


On Oct 11, 2004, at 8:55 AM, Michael Weber wrote:

> Actually, what it might mean is you have a hard drive that is dying and
> the PERC card is not telling you about it.
>
> I had these symptoms and it cost my company 3 days of down-time because
> the RAID card, ever so helpfully, hid the fact that one hard drive was
> going away and coming right back on-line every few hours.  Since the OS
> can only see the logical drives, it has no way of knowing which physical
> drive is having problems.  It was only found after running the extensive
> diags on each drive in the pair for over 45 minutes.  The quick diags
> found nothing.  Of course, without the magic "error code" from the
> diags, Dell won't send you a new drive.  This is one of the reasons I
> now have 6 bright shiny new IBM servers in my lab.
>
> I also have swatch set to alert me on every SCSI timeout that is not
> tape-drive related.
>
> If it were me, I would take this server down and run diags on the
> drives until you find which physical drive is failing.  Expect an hour
> or more per drive of down time.
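
For anyone without swatch handy, the alerting idea above (flag every SCSI timeout that is not tape related) can be roughly sketched in Python. This is only an illustration, not Michael's actual swatch config: the two regular expressions, the function name, and the sample log lines are all assumptions made up for the example, and real kernel/driver messages vary, so the patterns would need adjusting against your own syslog.

```python
import re

# Assumed patterns (hypothetical): tune these to the messages your
# kernel and drivers actually write to syslog.
TIMEOUT_RE = re.compile(r"scsi.*tim(?:e|ed)[ -]?out", re.IGNORECASE)
TAPE_RE = re.compile(r"\b(?:st\d+|tape)\b", re.IGNORECASE)


def non_tape_scsi_timeouts(lines):
    """Return log lines that mention a SCSI timeout but no tape device."""
    return [
        line.rstrip("\n")
        for line in lines
        if TIMEOUT_RE.search(line) and not TAPE_RE.search(line)
    ]


# Invented sample lines for illustration only; not real kernel output.
sample = [
    "Oct 11 08:00:01 host kernel: scsi0: command timed out on channel 0, id 1",
    "Oct 11 08:05:12 host kernel: st0: scsi command timeout on tape drive",
    "Oct 11 08:06:00 host sshd[123]: session opened",
]

for hit in non_tape_scsi_timeouts(sample):
    print("ALERT:", hit)
```

In practice the function would be fed lines from the system log (for example an open file handle on it) and the print replaced with whatever notification you prefer.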

What is the recommended diag test for the physical drives?  I don't see  
anything in omdiag beyond the controller test.  And that isn't  
narrowing down the source of my failure:

[root@colo root]# omdiag storage raidctrl device=1 time=30 quicktest=false
........................................................................ 
........................................................................ 
........................................................................ 
........................................................................ 
........................................................................ 
........................................................................ 
........................................................................ 
........................................................................ 
...............
Device Name     : Dell PERC 3/DC RAID Controller
Description     : Dell PERC 3/DC RAID Controller Device
Location        : PCI Bus 2, Device 0, Function 0
Additional Info : No additional information available
Test Name       : LSI RAID Controller Hardware Test
Result          : Failed
RunTime         : 30


Thanks,

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net





More information about the Linux-PowerEdge mailing list