PE2650 / Perc 3Di crash

Salyzyn, Mark mark_salyzyn at adaptec.com
Tue Aug 5 09:42:05 CDT 2003


Deanna no longer works for Adaptec :-(. I have been instructed to take over
her duties regarding the aacraid and dpt_i2o drivers for Linux.

2.4.22-pre6-ac1 has 100 ...

James Bourne has already indicated that with 100, the problem still occurs.
Can someone experiment with, let's say, 64 on an offending system? Adaptec
*is* exploring how to get back to 512 reliably on all variants of adapters,
but I must remind you that not all adapters have this `low' limit. This is
but a case of least common denominator ...
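
For anyone trying the experiment, a minimal sketch of what the edit could look
like follows. It assumes AAC_NUM_IO_FIB is a plain #define in
drivers/scsi/aacraid/aacraid.h; the #ifndef guard, the
AAC_THEORETICAL_MAX_FIB name and the aac_fib_below_known_crash() helper are
hypothetical additions for illustration and are not part of the driver. Staying
below 128 is not a guarantee of stability, since crashes have been reported at
100 as well.

```c
#include <assert.h>

/* Theoretical adapter maximum quoted in this thread (hypothetical macro name). */
#define AAC_THEORETICAL_MAX_FIB 512

/*
 * Assumption: the real driver uses a plain #define for this value.
 * The #ifndef guard is added here only so the value can be overridden
 * on the compiler command line while experimenting.
 */
#ifndef AAC_NUM_IO_FIB
#define AAC_NUM_IO_FIB 64	/* experimental value suggested above */
#endif

/* Refuse to build with a value outside 1..512. */
static_assert(AAC_NUM_IO_FIB > 0 && AAC_NUM_IO_FIB <= AAC_THEORETICAL_MAX_FIB,
	      "AAC_NUM_IO_FIB out of range");

/*
 * Hypothetical helper: is a candidate value below the 128 that was
 * reported in this thread to crash the adapter in testing?
 */
static inline int aac_fib_below_known_crash(int n)
{
	return n > 0 && n < 128;
}
```

The kernel has to be rebuilt and rebooted for a change to the real header to
take effect.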

Good news on the verify not finding an issue. This means that you are more
likely to have *this* bug rather than a troublesome drive, but it does not
necessarily mean that you do not have a troublesome drive. There is a
possibility that the combination of outstanding commands and error recovery
on troublesome drives could be giving us your headache.

Sincerely -- Mark Salyzyn

-----Original Message-----
From: Matthias Pigulla [mailto:mp at webfactory.de]
Sent: Tuesday, August 05, 2003 10:27 AM
To: Salyzyn, Mark; James Bourne
Cc: linux-poweredge at dell.com; linux-aacraid-devel at dell.com;
matt_domsch at dell.com; deanna_bonds at adaptec.com
Subject: AW: PE2650 / Perc 3Di crash


Hi,

I added deanna_bonds at adaptec.com to the list of recipients, as the
drivers/scsi/aacraid/README file says that this driver is supported by
Adaptec, that she may be contacted, and lists her as:

Deanna Bonds <deanna_bonds at adaptec.com> (non-DASD support, PAE fibs and 64 bit, added new adaptec controllers
                     added new ioctls, changed scsi interface to use new error handler,
                     increased the number of fibs and outstanding commands to a container)

... she seems to have increased the number of fibs (whatever they are :).

I'd like to hear some more ("official") opinions on either decreasing
AAC_NUM_IO_FIB to 100 and rebuilding one of the newer kernel versions, or
immediately switching to 2.4.22-pre6-ac1 with a value of 100. Possible
consequences, side effects?
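
For reference while comparing kernels, the values mentioned in this thread can
be collected in one place. This is only a sketch: the struct, the table and the
aac_fib_lookup() helper are hypothetical names, the labels are informal, and
the numbers are the ones quoted in these messages, not read from the kernel
sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One AAC_NUM_IO_FIB value as reported in this thread. */
struct aac_fib_setting {
	const char *source;	/* kernel or driver release, informal label */
	int num_io_fib;		/* value reported for that release */
};

static const struct aac_fib_setting aac_fib_settings[] = {
	{ "2.4.19 generic",  116 },
	{ "2.4.20",          512 },
	{ "Adaptec release", 100 },
	{ "2.4.22-pre6-ac1", 100 },
};

#define AAC_FIB_SETTINGS_COUNT \
	(sizeof(aac_fib_settings) / sizeof(aac_fib_settings[0]))

static_assert(AAC_FIB_SETTINGS_COUNT > 0, "table must not be empty");

/* Return the reported value for a label, or -1 if it is not in the table. */
static inline int aac_fib_lookup(const char *source)
{
	size_t i;

	for (i = 0; i < AAC_FIB_SETTINGS_COUNT; i++) {
		if (strcmp(aac_fib_settings[i].source, source) == 0)
			return aac_fib_settings[i].num_io_fib;
	}
	return -1;
}
```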

Best regards,
Matthias

PS. @Mark I did the dd as well as the afacli/disk verify and got no errors.

> -----Original Message-----
> From: Salyzyn, Mark [mailto:mark_salyzyn at adaptec.com] 
> Sent: Tuesday, 5 August 2003 16:13
> To: 'James Bourne'; Matthias Pigulla
> Cc: linux-poweredge at dell.com; linux-aacraid-devel at dell.com; 
> matt_domsch at dell.com
> Subject: RE: PE2650 / Perc 3Di crash
> 
> 
> If this is the case ... AAC_NUM_IO_FIB, defined in 
> drivers/scsi/aacraid/aacraid.h, which was originally set to 
> 512 and is reduced to 116 in the 2.4.19 generic variant of 
> the driver, might have to be reduced. The 2.4.20 driver has 
> this value *increased* to 512 (!!!!)
> 
> In Adaptec's release of the driver it is reduced to a value 
> of 100, only because we determined experimentally that 128 
> would crash the adapter, and 100 did not under all test 
> circumstances for a sample of card variants. I have *no* idea 
> where the 116 came from. The theoretical maximum in 
> the adapter is 512 with *one* array including the RAID 
> splitting and other Firmware tasks which have to absorb some 
> of the spares above this limit.
> 
> My suggestion is to drop the AAC_NUM_IO_FIB to 100, *maybe* 
> 116, but *not* leave it at 512.
> 
> Sincerely -- Mark Salyzyn
> 
> Value of AAC_NUM_IO_FIB for various kernels:
...

> 
> -----Original Message-----
> From: James Bourne [mailto:jbourne at mtroyal.ab.ca]
> Sent: Tuesday, August 05, 2003 9:43 AM
> To: Matthias Pigulla
> Cc: linux-poweredge at dell.com; linux-aacraid-devel at dell.com; 
> matt_domsch at dell.com
> Subject: Re: PE2650 / Perc 3Di crash
> 
> 
> On Tue, 5 Aug 2003, Matthias Pigulla wrote:
> 
> > Hello everyone,
> > 
> > tonight, I lost one of my PowerEdge boxes with a kernel panic. I'm 
> > running a PERC 3/Di, RAID10, on Debian woody with a custom 2.4.19 
> > kernel. I'll try to provide all information I can collect, I hope 
> > someone can help me to track this issue down. Please bear with me, 
> > even if it's long :)
> 
> FYI, this is what we have seen on our aacraid systems under 
> heavy I/O and CPU load.  It's unclear at this time if this is 
> a firmware issue or a driver issue, but I do know that now 
> Dell and Adaptec are working on a resolution...
> 
> Turning off write caching will provide a workaround: 
> although you will still get timeouts, it looks as though the 
> crashes will be prevented.
> 
> Regards
> James Bourne



