aacraid error messages -- please help
Patrick J. LoPresti
patl at curl.com
Sat Dec 7 18:46:01 CST 2002
jason andrade <jason at dstc.edu.au> writes:
> by latest, i am assuming you are now on A07 and 2.7-1 for the bios
> and perc respectively ?
Yes. (I only recently rebooted to check.)
> sometimes a single device on a scsi bus can cause issues with all
> of the devices. it might also be a cabling problem. lastly, it
> could well be that all the drives were "cooked" at sometime in the
> past and/or are all from the same batch which might have had
The system seems to be working fine; those log messages are the only
indication of any problem. If we did not monitor our syslogs closely,
we probably would not even have noticed.
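For anyone who wants to watch for the same messages, here is a minimal sketch of the kind of syslog filter I mean. It assumes the driver's messages contain the string "aacraid" and that syslog writes to /var/log/messages; both are assumptions, and the function names are made up for illustration.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: every driver message of interest mentions "aacraid".
PATTERN = re.compile(r"aacraid", re.IGNORECASE)


def find_aacraid_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each log line that mentions the aacraid driver."""
    for line in lines:
        if PATTERN.search(line):
            yield line.rstrip("\n")


def scan_file(path: str = "/var/log/messages") -> List[str]:
    """Return all matching lines from one log file.

    The default path is an assumption; adjust it for the local syslog setup.
    """
    with open(path, errors="replace") as handle:
        return list(find_aacraid_lines(handle))
```

Running scan_file() from cron and mailing any non-empty result would be one way to make sure these messages do not go unnoticed.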
> if it detected a failure on one of the other drives and
> automatically marked it bad it would be trying to rebuild to the
> drive you designated as a spare.
Wouldn't this appear in the logs too?
> or you can download the afaapps rpm from dell (it is available if
> you search for downloads, or alternatively from
Ah, this is exactly what I was looking for. Thank you!
Again, thank you.
> i am assuming that this is still a development box and you can
> afford to blow it away if need be ?
I was starting to bring it up as a secondary DNS server, secondary NTP
server, and so on, with the idea of making it a primary for various
services which are currently running on even older hardware. It is
still early enough that I can blow it away without too much pain.
> o make sure the machine has good airflow
> o are all the fan units working/spinning?
I will double-check these.
Do these systems have an internal temperature monitor which I can
check from Linux?
> o is the machine under warranty ? can you get dell to replace one
> or more of the drives after running the appropriate tests ?
It is under warranty, and I am running the 32-bit diagnostics now.
With any luck that will turn up something.
> o if you powercycle the machine and bring it back, are the drives
> still complaining ? if not, can you measure the interval until
> they do?
The yellow blinking "X" light on drive 3 turned off when I rebooted,
but the PERC BIOS still says that drive 3 is "returning errors" during
POST. That might just be a stale message, though.
> o can you bring up the system with any two drives configured in
> raid1, rather than a raid5 with 3+1. you might be able to mix
> and match to find the offending drive.
I suppose I can just pull the drives one at a time. 1) Pull drive, 2)
wait for RAID to rebuild, 3) replace pulled drive, 4) mark it as hot
spare, 5) return to step (1). The trouble is that this looks like an
intermittent problem; it took several days to show up the first time.
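Since the problem is intermittent, measuring the interval you asked about is probably best done from the logs. A rough sketch, assuming traditional syslog timestamps (month, day, time, no year) and that the errors mention "aacraid"; the function names and the needle parameter are hypothetical:

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def parse_syslog_time(line: str, year: int) -> datetime:
    """Parse the leading "Mon DD HH:MM:SS" stamp of a syslog line.

    Traditional syslog records no year, so the caller supplies one.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def time_to_first_error(
    lines: Iterable[str],
    booted_at: datetime,
    needle: str = "aacraid",  # assumption: errors contain this string
) -> Optional[timedelta]:
    """Return the time from booted_at to the first matching line after it."""
    for line in lines:
        if needle.lower() in line.lower():
            when = parse_syslog_time(line, booted_at.year)
            if when >= booted_at:
                return when - booted_at
    return None  # no matching message logged since booted_at
```

This assumes the year does not roll over between the reboot and the first error, and that month names are parsed in an English locale.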
Thank you again for the quick and useful reply.