aacraid error messages -- please help

jason andrade jason at dstc.edu.au
Sat Dec 7 19:14:01 CST 2002


On 7 Dec 2002, Patrick J. LoPresti wrote:

> The system seems to be working fine; those log messages are the only
> indication of any problem.  If we did not monitor our syslogs closely,
> we probably would not even have noticed.

if it is working fine, perhaps some transient error triggered
those messages.

> > if it detected a failure on one of the other drives and
> > automatically marked it bad it would be trying to rebuild to the
> > drive you designated as a spare.
>
> Wouldn't this appear in the logs too?

nope - well, there is no syslog message that says something like
"drive failed, rebuilding".  it's one of the issues with monitoring
the perc arrays - people run scripts from cron to check on them as a
result.
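
as a rough illustration of the cron approach, here is a minimal sketch
that scans a syslog file for lines mentioning the driver.  the log path
and the "aacraid" match string are assumptions, so adjust them for your
syslog setup.  it only catches what the driver actually logs, so it
will not notice a rebuild that produces no syslog message.

```python
"""Scan a syslog file for aacraid messages; intended to be called from cron."""

LOG_PATH = "/var/log/messages"  # assumption: where kernel messages are logged
PATTERN = "aacraid"             # assumption: the driver name appears in each message


def find_matches(lines, pattern=PATTERN):
    """Return the lines that mention the pattern, ignoring case."""
    needle = pattern.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


def check_log(path=LOG_PATH, pattern=PATTERN):
    """Read the log file and return every line that mentions the pattern."""
    # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan
    with open(path, errors="replace") as log:
        return find_matches(log, pattern)
```

a small wrapper run from cron could call check_log() and print whatever
it returns; cron normally mails any output to the job's owner.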

> I was starting to bring it up as a secondary DNS server, secondary NTP
> server, and so on, with the idea of making it a primary for various
> services which are currently running on even older hardware.  It is
> still early enough that I can blow it away without too much pain.

good.  they are a very nice class of machine.  a number of people
still think that the 24XX/25XX machines were the best of breed and
<comments about 26XX censored> :-)

> > o make sure the machine has good airflow
> > o are all the fan units working/spinning?
>
> I will double-check these.
>
> Do these systems have an internal temperature monitor which I can
> check from Linux?

they do, but unfortunately you can only access it using the dell
OMSA (OpenManage Server Administrator).  this is only qualified for
redhat 7.3.  they are expecting a new release for RH8, but it has
been promised for a while with no ETA announced at this point.

OMSA will let you (via the command line, or a web interface) get
the internal temps/fan speeds/voltages and other bits from the
ESM - the dell module that sits behind the sensors etc in the PE.

> > o is the machine under warranty ? can you get dell to replace one
> >   or more of the drives after running the appropriate tests ?
>
> It is under warranty, and I am running the 32-bit diagnostics now.
> With any luck that will turn up something.

if you have seen more than one error then i would lodge a call and get
a dell tech to walk you through the diagnostics until he is satisfied
that the drive(s) need replacing.  you might get lucky and simply get
someone who replaces all the drives without a question, but that would
be pretty rare for replacing 4 drives.

> I suppose I can just pull the drives one at a time.  1) Pull drive, 2)
> wait for RAID to rebuild, 3) replace pulled drive, 4) mark it as hot
> spare, 5) return to step (1).  The trouble is that this looks like an
> intermittent problem; it took several days to show up the first time.

hmm.  best of luck.  i think you want to be sure it is in tip-top
condition before trusting it in production.


regards,

-jason




More information about the Linux-PowerEdge mailing list