AW: AW: PE2650 / Perc 3Di crash

Matthias Pigulla mp at webfactory.de
Tue Aug 5 06:30:05 CDT 2003


Hi,

> > Does not look like a broken disk, does it?
> 
> sometimes on a power cycle you might find that the controller 
> rebuilds onto the 'failed' drive and it all seems ok.  for a while.

Is there any way to run some exhaustive checks on the disk? Just before I
got your mail, I called Dell Tech Support. The guy said that disks
sometimes time out, so the controller starts a rebuild even though the
disk itself is fine. According to him, the container would be consistent
and I should not need to worry. Hm.

> but it is likely the drive that failed will fail again, fairly soon.
> 
> from your previous set of error messages i am guessing you 
> want to replace the drive on scsi id 2.  it would be better 
> to do this pro-actively rather than waiting for it to fail as 
> this might cause another reboot/kernel panic (it _shouldn't_ 
> but it does happen with scsi aborts/interrupts/timeouts :-/)

What I don't understand: The last lines of output I could grab from the
console looked like:
scsi: aborting command due to timeout: pid ..., scsi0, channel 0, id 0,
lun 0 Read (10) ...somehexnumbers...
which shows SCSI ID 0.
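
To see which IDs actually show up in these timeouts, the saved console or
kernel log could be tallied with something like the sketch below. It is
only a rough idea: it assumes the messages follow the format quoted above,
and the function name count_timeouts is made up for illustration. Wrapped
lines are rejoined before matching, since the console output may be broken
across lines.

```python
import re
from collections import Counter

# Matches the timeout message format quoted above (an assumption about
# the exact wording; adjust the pattern if your kernel prints it differently).
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout\s*:\s*pid\s+\S+,\s*"
    r"scsi(\d+),\s*channel\s+(\d+),\s*id\s+(\d+),\s*lun\s+(\d+)"
)

def count_timeouts(log_text):
    """Count timeout messages per (host, channel, id, lun) tuple."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter(
        tuple(int(g) for g in m.groups())
        for m in TIMEOUT_RE.finditer(flat)
    )

if __name__ == "__main__":
    import sys
    # Usage: python count_timeouts.py < saved_console_log.txt
    for key, n in sorted(count_timeouts(sys.stdin.read()).items()):
        print("scsi%d channel %d id %d lun %d: %d timeout(s)" % (key + (n,)))
```

If only ID 0 ever appears, that would speak against the timeout being
related to the drive in container 62.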

On startup, the controller complained about "Container#0-Stripe
Container#62-Mirror"; the kernel log shows an "aacraid:Container 62
completed REBUILD task:". Container 62 contains SCSI IDs 2 and 3.

I am unsure whether the ESM log ("Drive 2 drive slot sensor drive
error/removed") refers to a SCSI ID or just a slot number.

So, are you sure it's ID2 and the timeout on ID0 was just a side effect?

Best regards,
Matthias



