AW: AW: PE2650 / Perc 3Di crash
mp at webfactory.de
Tue Aug 5 06:30:05 CDT 2003
> > Does not look like a broken disk, does it?
> sometimes on a power cycle you might find that the controller
> rebuilds onto the 'failed' drive and it all seems ok. for a while.
Is there any way to run some exhaustive checks? Just before I got your
mail, I called Dell Tech Support. The guy said that the disks sometimes
time out, so the controller starts a rebuild even though the disk itself
is fine. He said the container would be consistent and I should not need
to worry. Hm.
> but it is likely the drive that failed will fail again, fairly soon.
> from your previous set of error messages i am guessing you
> want to replace the drive on scsi id 2. it would be better
> to do this pro-actively rather than waiting for it to fail as
> this might cause another reboot/kernel panic (it _shouldn't_
> but it does happen with scsi aborts/interrupts/timeouts :-/)
What I don't understand: The last lines of output I could grab from the
console looked like:
scsi: aborting command due to timeout: pid ..., scsi0, channel 0, id 0,
lun 0 Read (10) ...somehexnumbers...
which shows SCSI ID 0.
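To check whether the timeouts really cluster on one ID, the abort messages could be tallied per device. This is only a minimal sketch: the regular expression is an assumption based solely on the single console line quoted above, and `count_timeouts` is a hypothetical helper name, not part of any driver or tool.

```python
import re
from collections import Counter

# Assumed format, derived only from the console line quoted above:
#   "aborting command due to timeout: pid ..., scsi0, channel 0, id 0, lun 0 ..."
# \s* also matches a newline, so a wrapped "id 0,\nlun 0" is still picked up.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout.*?"
    r"scsi(\d+),\s*channel\s+(\d+),\s*id\s+(\d+),\s*lun\s+(\d+)",
    re.DOTALL,
)


def count_timeouts(log_text):
    """Count timeout aborts per (host, channel, id, lun) tuple."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        key = tuple(int(group) for group in match.groups())
        counts[key] += 1
    return counts
```

Passing the saved console or kernel log text to `count_timeouts` returns a `Counter`, and `counts.most_common()` lists the devices with the most aborts first. It only counts what the kernel reported; it does not say whether the reported ID is a physical disk or something the controller presents.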
On startup, the controller complained about "Container#0-Stripe
Container#62-Mirror"; the kernel log shows an "aacraid:Container 62
completed REBUILD task:" message. Container 62 contains SCSI IDs 2 and 3.
I am unsure whether the ESM log ("Drive 2 drive slot sensor drive
error/removed") refers to a SCSI ID or just a slot number.
So, are you sure it's ID 2 and the timeout on ID 0 was just a side effect?