aacraid error messages -- please help
jason at dstc.edu.au
Sat Dec 7 19:14:01 CST 2002
On 7 Dec 2002, Patrick J. LoPresti wrote:
> The system seems to be working fine; those log messages are the only
> indication of any problem. If we did not monitor our syslogs closely,
> we probably would not even have noticed.
if it is working fine, perhaps it was some transitory error that
caused a problem.
> > if it detected a failure on one of the other drives and
> > automatically marked it bad it would be trying to rebuild to the
> > drive you designated as a spare.
> Wouldn't this appear in the logs too?
nope - well, there is no syslog message that says something like
"drive failed, rebuilding". that is one of the difficulties with
monitoring the perc arrays - people end up running scripts from cron
to check the array state as a result.
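
a rough sketch of the cron idea, for what it is worth. it only
covers the half that can be seen from syslog: it pulls out any lines
the aacraid driver did log, so nobody has to read the logs by hand.
it will not notice a silent rebuild, since that never reaches syslog;
for that the controller itself has to be queried. the log path and
the "match anything containing aacraid" rule are my assumptions, not
something taken from a real setup.

```python
import re

# Assumption: messages from the RAID driver contain the word "aacraid".
PATTERN = re.compile(r"aacraid", re.IGNORECASE)


def find_raid_messages(lines):
    """Return the log lines that mention the aacraid driver."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def check_log(path="/var/log/messages"):  # hypothetical default path
    """Scan one syslog file and return the matching lines."""
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with open(path, errors="replace") as log:
        return find_raid_messages(log)
```

a small wrapper run out of cron could call check_log() and send mail
whenever it returns anything.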
> I was starting to bring it up as a secondary DNS server, secondary NTP
> server, and so on, with the idea of making it a primary for various
> services which are currently running on even older hardware. It is
> still early enough that I can blow it away without too much pain.
good. they are a very nice class of machine. a number of people
still think that the 24XX/25XX machines were the best of breed and
<comments about 26XX censored> :-)
> > o make sure the machine has good airflow
> > o are all the fan units working/spinning?
> I will double-check these.
> Do these systems have an internal temperature monitor which I can
> check from Linux?
they do, but unfortunately you can only access it through dell
OMSA, which is only qualified for redhat 7.3. a new release for RH8
is expected, but it has been promised for a while with no ETA
announced at this point.
OMSA will let you (via the command line, or a web interface) get
the internal temps/fan speeds/voltages and other bits from the
ESM - the dell module that sits behind the sensors etc in the PE.
> > o is the machine under warranty ? can you get dell to replace one
> > or more of the drives after running the appropriate tests ?
> It is under warranty, and I am running the 32-bit diagnostics now.
> With any luck that will turn up something.
if you have seen more than one error then i would lodge a call and get
a dell tech to walk you through the diagnostics until he is satisfied
that the drive(s) should be replaced. you might get lucky and get
someone who simply replaces all the drives without question, but that
would be pretty rare when it means replacing 4 drives.
> I suppose I can just pull the drives one at a time. 1) Pull drive, 2)
> wait for RAID to rebuild, 3) replace pulled drive, 4) mark it as hot
> spare, 5) return to step (1). The trouble is that this looks like an
> intermittent problem; it took several days to show up the first time.
hmm. best of luck. i think you want to be sure it is in tip-top
condition before trusting it in production.