aacraid error messages -- please help

Patrick J. LoPresti patl at curl.com
Sat Dec 7 11:44:00 CST 2002


I have a PowerEdge 2450 server with a PERC 3/Si RAID controller and
four 36G SCSI disks.  I recently installed Red Hat 8.0 on this system;
it used to be a Win2k machine gathering dust.

I began by updating the system BIOS and PERC firmware to the latest
versions from Dell's site.  I used the PERC BIOS utility to create a
single RAID 5 container housing drives 0, 1, and 2.  I configured
device 3 as a hot spare.

I installed RH8 and all updates without any trouble, and the system
has been running fine for a few days.  Then last night it logged these
messages:

    18:20:57 kernel: aacraid:ID(0:00:0) Timeout detected on cmd[0x28]
    18:20:57 kernel: aacraid:ID(0:01:0) Timeout detected on cmd[0x28]
    18:20:57 kernel: aacraid:SCSI Channel[0]: Timeout Detected On 2 Command(s)
    18:21:07 kernel: aacraid:ID(0:00:0); Abort Timeout. Resetting Bus 0
    18:21:08 kernel: aacraid:ID(0:01:0); Aborted Command [command:0x28]
    18:21:08 kernel: aacraid:ID(0:00:0); Aborted Command [command:0x28]
    18:21:10 kernel: aacraid:ID(0:00:0); Error Event [command:0x28]
    18:21:10 kernel: aacraid:ID(0:00:0); Unit Attention [k:0x6,c:0x29,q:0x2]
    18:21:10 kernel: aacraid:ID(0:01:0); Error Event [command:0x28]
    18:21:10 kernel: aacraid:ID(0:01:0); Unit Attention [k:0x6,c:0x29,q:0x2]
    18:21:10 kernel: aacraid:ID(0:02:0); Error Event [command:0x28]
    18:21:10 kernel: aacraid:ID(0:02:0); Unit Attention [k:0x6,c:0x29,q:0x2]
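For reference, the bracketed values in these messages look like standard SCSI codes rather than anything aacraid-specific. Opcode 0x28 is READ(10), sense key 0x6 is UNIT ATTENTION, and ASC/ASCQ 0x29/0x02 is "SCSI bus reset occurred". The Python sketch below decodes one log line. It assumes that ID(c:t:l) means channel:target:LUN, which fits the "SCSI Channel[0]" line above and the hot spare showing up as 0:3:0. The lookup tables cover only the codes seen in this log. `decode` and the table names are hypothetical helpers, not part of any aacraid or Dell tool.

```python
import re

# Lookup tables covering only the values that appear in this log.
# The meanings come from the SCSI standards (SBC for opcodes, SPC for sense data).
OPCODES = {0x28: "READ(10)"}
SENSE_KEYS = {0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x29, 0x02): "SCSI BUS RESET OCCURRED"}

# Assumed format: "ID(channel:target:lun)", e.g. "ID(0:01:0)"
ID_RE = re.compile(r"ID\((\d+):(\d+):(\d+)\)")
# Matches "cmd[0x28]" or "[command:0x28]"
CMD_RE = re.compile(r"(?:cmd\[|command:)0x([0-9a-fA-F]+)")
# Matches "[k:0x6,c:0x29,q:0x2]" (sense key, ASC, ASCQ)
SENSE_RE = re.compile(r"\[k:0x([0-9a-fA-F]+),c:0x([0-9a-fA-F]+),q:0x([0-9a-fA-F]+)\]")


def decode(line):
    """Describe the device, SCSI opcode and sense data found in one aacraid log line."""
    info = {}
    m = ID_RE.search(line)
    if m:
        info["channel"], info["target"], info["lun"] = (int(g) for g in m.groups())
    m = CMD_RE.search(line)
    if m:
        op = int(m.group(1), 16)
        info["command"] = OPCODES.get(op, f"opcode 0x{op:02x}")
    m = SENSE_RE.search(line)
    if m:
        key, asc, ascq = (int(g, 16) for g in m.groups())
        info["sense_key"] = SENSE_KEYS.get(key, f"0x{key:x}")
        info["asc_ascq"] = ASC_ASCQ.get((asc, ascq), f"0x{asc:02x}/0x{ascq:02x}")
    return info


if __name__ == "__main__":
    print(decode("aacraid:ID(0:02:0); Unit Attention [k:0x6,c:0x29,q:0x2]"))
```

Unknown codes fall through to their raw hex values, so more table entries can be added as new messages turn up.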


These seem to indicate trouble with all three drives or, more likely,
with the entire SCSI bus (?).
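Grouping the same log by target makes the pattern easier to see. The following is a minimal sketch that assumes the ID(c:t:l) format shown above. `events_by_target` is a hypothetical helper. On this log it shows that only targets 0 and 1 reported timeouts. Target 2 logged only the Error Event and Unit Attention that came after the bus reset at 18:21:07.

```python
import re
from collections import defaultdict

# The messages quoted above, with the leading indentation removed.
LOG = """\
18:20:57 kernel: aacraid:ID(0:00:0) Timeout detected on cmd[0x28]
18:20:57 kernel: aacraid:ID(0:01:0) Timeout detected on cmd[0x28]
18:20:57 kernel: aacraid:SCSI Channel[0]: Timeout Detected On 2 Command(s)
18:21:07 kernel: aacraid:ID(0:00:0); Abort Timeout. Resetting Bus 0
18:21:08 kernel: aacraid:ID(0:01:0); Aborted Command [command:0x28]
18:21:08 kernel: aacraid:ID(0:00:0); Aborted Command [command:0x28]
18:21:10 kernel: aacraid:ID(0:00:0); Error Event [command:0x28]
18:21:10 kernel: aacraid:ID(0:00:0); Unit Attention [k:0x6,c:0x29,q:0x2]
18:21:10 kernel: aacraid:ID(0:01:0); Error Event [command:0x28]
18:21:10 kernel: aacraid:ID(0:01:0); Unit Attention [k:0x6,c:0x29,q:0x2]
18:21:10 kernel: aacraid:ID(0:02:0); Error Event [command:0x28]
18:21:10 kernel: aacraid:ID(0:02:0); Unit Attention [k:0x6,c:0x29,q:0x2]
"""

# Assumed format: "ID(channel:target:lun)" optionally followed by ";", then the event text.
ID_RE = re.compile(r"ID\((\d+):(\d+):(\d+)\);?\s*(.*)$")


def events_by_target(log_text):
    """Group the event text of each aacraid ID(c:t:l) line by (channel, target)."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        m = ID_RE.search(line)
        if not m:
            continue  # e.g. the per-channel summary line has no ID(...) triple
        channel, target = int(m.group(1)), int(m.group(2))
        # Keep the readable part and drop the bracketed opcode/sense data.
        event = m.group(4).split("[")[0].strip()
        grouped[(channel, target)].append(event)
    return dict(grouped)


if __name__ == "__main__":
    for (channel, target), events in sorted(events_by_target(LOG).items()):
        print(f"channel {channel} target {target}: {events}")
```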

Just under an hour later, it logged this message:

    19:15:59 kernel: aacraid:Drive 0:3:0 returning error


The yellow light next to the "X" on drive 3 is now flashing.

I have many questions, but they boil down to: What do these messages
mean?  What does the flashing light mean?  Why is the controller even
trying to access the hot spare (drive 3)?  What should I do next?

I have lots of experience with the DAC960 driver and its family of
controllers, but almost none with aacraid/PERC.  I have decided to
move my organization toward all-Dell systems, using nothing but the
hardware I can buy preinstalled.  As a result, I am now using a driver
whose error messages I do not understand.  I do not know how to query
the controller's status, and I do not know how to perform simple
administrative tasks (e.g., replacing a drive without shutting down
the machine).

I would appreciate any suggestions.  Thanks!

 - Pat




More information about the Linux-PowerEdge mailing list