megasas: MFI FW status 0x3

RB aoz.syn at gmail.com
Wed Apr 25 12:32:51 CDT 2007


> the 2.6.16 kernel next.  All indications are that it will panic and
> clear any of the other kernel updates/patches of causing the issue.
> If so, one of the [many] changes between 2006/02/28 and 2006/07/03 is
> suspect for introducing this bug.

I was wrong; the following is my updated test table:

kernel              megasas-commit    megasas-ver  dm-crypt    works
2.6.15-gentoo-r5    2005-11-10        02.00-rc4    1.1.0       yes
2.6.16-hardened-r11 2006-02-28        02.04        1.1.0       yes
2.6.16-hardened-r11 2006-07-02        03.01        1.1.0       yes, with RESETs
2.6.18.8            2006-07-02        03.01        1.1.0       no
2.6.18.8            2006-07-03        03.01        1.1.0       no
2.6.18-hardened-r6  2006-07-03        03.01        1.1.0       no
2.6.20-hardened-r2  current/hand      03.05/03.09  1.3.0       no
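
A quick way to cross-check the table above is to tally, for each kernel
series and for each driver version, whether every test with that value
gave the same result.  The sketch below transcribes the rows by hand.
Counting "yes, with RESETs" as working is an assumption, and
kernel_series and outcomes_by are helper names made up for this
illustration.

```python
# Rows transcribed from the table: (kernel, megasas commit, driver version, works)
rows = [
    ("2.6.15-gentoo-r5",    "2005-11-10",   "02.00-rc4",   "yes"),
    ("2.6.16-hardened-r11", "2006-02-28",   "02.04",       "yes"),
    ("2.6.16-hardened-r11", "2006-07-02",   "03.01",       "yes, with RESETs"),
    ("2.6.18.8",            "2006-07-02",   "03.01",       "no"),
    ("2.6.18.8",            "2006-07-03",   "03.01",       "no"),
    ("2.6.18-hardened-r6",  "2006-07-03",   "03.01",       "no"),
    ("2.6.20-hardened-r2",  "current/hand", "03.05/03.09", "no"),
]

def kernel_series(kernel):
    """Reduce '2.6.16-hardened-r11' or '2.6.18.8' to '2.6.16' / '2.6.18'."""
    base = kernel.split("-")[0]
    return ".".join(base.split(".")[:3])

def outcomes_by(key_fn):
    """Map each key to the set of pass/fail outcomes observed for it."""
    result = {}
    for kernel, commit, driver, works in rows:
        ok = works.startswith("yes")  # assumption: RESETs still count as working
        result.setdefault(key_fn(kernel, commit, driver), set()).add(ok)
    return result

by_kernel = outcomes_by(lambda k, c, d: kernel_series(k))
by_driver = outcomes_by(lambda k, c, d: d)

# A value is "mixed" if tests sharing it both passed and failed.
mixed_kernels = sorted(k for k, v in by_kernel.items() if len(v) > 1)
mixed_drivers = sorted(d for d, v in by_driver.items() if len(v) > 1)
print("kernel series with mixed results:", mixed_kernels)
print("driver versions with mixed results:", mixed_drivers)
```

With these rows, no kernel series has mixed results, while driver 03.01
both passes (on 2.6.16) and fails (on 2.6.18).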

The table suggests that the bug depends on the kernel version rather
than the driver version: driver 03.01 works (with RESETs) on 2.6.16 but
fails on 2.6.18.  All of the panic traces I have show a consistent
failure in megasas_isr, and are either "Unable to handle kernel paging
request" or "Unable to handle kernel NULL pointer dereference".  To my
untrained eye, this looks like a race condition: megasas_isr (the
Interrupt Service Routine) is trying to service an interrupt that has
already been handled by another thread.  I can do single, isolated
writes to the disk (small edits to files), but once the load increases
and there are multiple parallel interrupts and several items on the
queue, the issue appears immediately.  It's likely exacerbated by my
use of dm-crypt and XFS as well.
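
To make the suspected race concrete, here is a toy model of a double
completion.  It is only a sketch of the hypothesis and is not taken from
the megasas source: cmd_list, reply_queue, read_context and complete are
all hypothetical names, and the interleaving is written out by hand
instead of using real interrupts.

```python
class DoubleCompletion(Exception):
    """Stands in for the NULL-pointer / paging-request oops in this model."""

# Hypothetical table of in-flight commands: context index -> command.
cmd_list = {7: {"name": "write-A"}}

def read_context(reply_queue, consumer):
    """Handler step 1: read which command the firmware reports as done."""
    return reply_queue[consumer]

def complete(context):
    """Handler step 2: look the command up, finish it, free its slot."""
    cmd = cmd_list.get(context)
    if cmd is None:
        raise DoubleCompletion(f"context {context} already completed")
    cmd_list[context] = None  # slot freed; a second lookup finds nothing
    return cmd["name"]

reply_queue = [7]
consumer = 0

# Interleaving: both handlers read the same queue entry before either one
# advances the consumer index, so both believe context 7 is theirs.
ctx_a = read_context(reply_queue, consumer)
ctx_b = read_context(reply_queue, consumer)

first = complete(ctx_a)       # first handler finishes the command
try:
    complete(ctx_b)           # second handler hits the already-freed slot
    second = "ok"
except DoubleCompletion as exc:
    second = str(exc)

print(first)
print(second)
```

With a single outstanding write the two reads never overlap, which would
fit the observation that isolated writes succeed and only parallel load
triggers the failure.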

I know this may be beyond the level of this list, but it's worth a try...


