PE2650 / Perc 3Di crash

Russell Stuart rstuart at lubemobile.com.au
Sun Aug 10 17:30:08 CDT 2003


Another data point.  I lowered AAC_NUM_IO_FIB to 20 and it still crashed.
Lowering it this way kills I/O speed, even more so than turning off
caching.

On Sun, 2003-08-10 at 03:43, James Bourne wrote:
> Yesterday at 0700 and 52 seconds I received a timeout on the raid,  then
> shortly after that the adapter hung and I started to get I/O errors.
> Here's the kernel log for the event.

My gut feeling is that there is a hardware/software bug in the RAID
controller somewhere, triggered by a change in the SCSI protocol timing -
possibly caused by disk retries.  It is definitely sensitive to the way
you access the disk.  I cannot trigger it by doing badblock tests, for
instance.  In my case it only happens while doing a drive-to-drive
backup.

I always structure the badblock test so the cache is not useful - the
amount tested always exceeds the size of any cache in use, for obvious
reasons.  This is interesting because turning off the controller's cache
also fixes the problem in my case.
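
For anyone wanting to size a test the same way, here is a rough sketch of
the arithmetic.  The function name, the 128 MiB cache size, the 1024-byte
block size and the 2x safety factor are all assumptions for illustration,
not figures from this thread; substitute whatever your controller and test
tool actually use.

```python
def blocks_to_defeat_cache(cache_bytes, block_bytes=1024, factor=2):
    """Return a block count whose total size is at least factor * cache_bytes.

    All arguments are integers. 'factor' is a hypothetical safety margin:
    testing several times the cache size makes it unlikely that any block
    is still cached when it is read back.
    """
    if cache_bytes <= 0 or block_bytes <= 0 or factor < 1:
        raise ValueError("sizes must be positive and factor must be >= 1")
    target = cache_bytes * factor
    # Ceiling division, so the tested span never falls short of the target.
    return -(-target // block_bytes)


if __name__ == "__main__":
    assumed_cache = 128 * 1024 * 1024  # assumed cache size, not a measured value
    print(blocks_to_defeat_cache(assumed_cache))
```

The resulting count is the minimum number of blocks the test has to cover;
anything smaller could be served partly from cache and would not exercise
the disks in the same way.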




More information about the Linux-PowerEdge mailing list