aacraid error messages -- please help

jason andrade jason at dstc.edu.au
Sat Dec 7 17:57:01 CST 2002


On 7 Dec 2002, Patrick J. LoPresti wrote:

> I have a PowerEdge 2450 server with a PERC 3/Si RAID controller and
> four 36G SCSI disks.  I recently installed Red Hat 8.0 on this system;
> it used to be a Win2k machine gathering dust.

nifty and a good use for it :-)

> I began by updating the system BIOS and PERC firmware to the latest
> versions from Dell's site.  I used the PERC BIOS utility to create a
> single RAID 5 container housing drives 0, 1, and 2.  I configured
> device 3 as a hot spare.

by latest, i am assuming you are now on A07 and 2.7-1 for the bios
and perc respectively ?

> I installed RH8 and all updates without any trouble, and the system
> has been running fine for a few days.  Then last night it logged these
> messages:

[...]

> These seem to indicate trouble with all three drives, or more likely,
> the entire SCSI bus (?).

sometimes a single device on a scsi bus can cause issues with all of the
devices.  it might also be a cabling problem.  lastly, it could well be that
all the drives were "cooked" at some time in the past and/or are all from
the same batch which might have had problems.

> I have many questions, but they boil down to: What do these messages
> mean?  What does the flashing light mean?  Why is the controller even
> trying to access the hot spare (drive 3)?  What should I do next?

if it detected a failure on one of the other drives and automatically
marked it bad, it would be trying to rebuild onto the drive you designated
as a spare.

> hardware I can buy preinstalled.  But the result is that I am now
> using a driver for which I do not understand the error messages, I do
> not know how to query the controller's status, and I do not know how
> to perform simple administrative tasks (e.g., replace a drive without
> shutting down the machine).

you can do this on two levels.

you can boot into the PERC bios and use commands over there.

or you can download the afaapps rpm from dell (it is available if
you search for downloads), or alternatively from

http://planetmirror.com/pub/dell/perc-utils/perc3si/


this provides a command line interface (with some builtin help) for
linux.  chris pascoe and steve boley both pointed me at this
excellent online reference for using afacli

http://docs.us.dell.com/docs/storage/57kgr/cli/en/index.htm


i am assuming that this is still a development box and you can afford
to blow it away if need be ?  there are a number of things you can
do to try and resolve this:

o make sure the machine has good airflow
o are all the fan units working/spinning?
o is the machine under warranty ? can you get dell to replace one
  or more of the drives after running the appropriate tests ?
o if you powercycle the machine and bring it back, are the drives
  still complaining ?  if not, can you measure the interval until
  they do?
o can you bring up the system with any two drives configured in
  raid1, rather than a raid5 with 3+1 ?  you might be able to mix
  and match to find the offending drive.
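
for the "measure the interval" item, here is a rough python sketch that
reads syslog-style lines and reports how long after each boot the first
error line showed up.  everything specific in it is an assumption: the
timestamp layout ("Dec  7 17:57:01 host ..."), the "syslogd 1.4.1: restart."
boot marker, the "aacraid" match string and the sample lines are all
placeholders, so adjust them to whatever your own log actually contains.

```python
import re
from datetime import datetime

# syslog timestamps carry no year, so the caller supplies one (assumption:
# lines look like "Dec  7 17:57:01 host ...")
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) ")


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = STAMP.match(line)
    if m is None:
        return None
    month, day, clock = m.groups()
    text = f"{year} {month} {int(day):02d} {clock}"
    return datetime.strptime(text, "%Y %b %d %H:%M:%S")


def time_to_first_error(lines, boot_marker, error_marker, year):
    """For each boot, return the time from the boot line to the first error."""
    intervals = []
    boot_time = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue
        if boot_marker in line:
            boot_time = stamp
        elif error_marker in line and boot_time is not None:
            intervals.append(stamp - boot_time)
            boot_time = None  # only the first error after each boot counts
    return intervals


# hypothetical sample lines, not real driver output
sample = [
    "Dec  6 09:00:00 box syslogd 1.4.1: restart.",
    "Dec  6 09:00:05 box kernel: placeholder boot message",
    "Dec  6 21:30:00 box kernel: aacraid: placeholder error text",
    "Dec  6 21:30:02 box kernel: aacraid: placeholder error text",
    "Dec  7 08:00:00 box syslogd 1.4.1: restart.",
    "Dec  7 18:15:00 box kernel: aacraid: placeholder error text",
]

intervals = time_to_first_error(sample, "syslogd 1.4.1: restart.", "aacraid", 2002)
for delta in intervals:
    print(delta)
```

to run it against a real log you would pass the lines of the file instead
of the sample list.  if the intervals come out roughly the same after every
powercycle, that would point more towards heat than towards a single bad
drive, but treat that as a hint and not a diagnosis.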



regards,

-jason




More information about the Linux-PowerEdge mailing list