kernel: aacraid: Host adapter reset request. SCSI hang ?
emoenke at gwdg.de
Wed Aug 27 15:05:07 CDT 2003
On Wed, 27 Aug 2003, Salyzyn, Mark wrote:
> I may have a root cause on this issue, even though I have not been able to
> duplicate it yet.
> There is code that does the following in the driver:
> scsicmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 |
> return -1;
> This is *wrong*, because the nonzero return causes the system to hold the
> command in the queue due to the use of the new error handler, yet we have
> also completed the command as `BUSY' *and* as a result of the constraints of
> the aac_io_done call, which relocks (on io_request_lock), the caller had to
> unlock, leaving a hole that SMP machines fill. By dropping the result and
> done calls in these situations, and holding the locks in the caller of such
> routines, I believe we will close this hole.
> Thanks, in part, to Josef Möllers for pointing out this locking problem
> under SMP, serendipitously a day after I had noticed the other problem with
> the inaccurate busy return sequences in the code and started making the
> changes to investigate. Kill two birds with one stone.
> I will report back on my tests of these changes, but will need a volunteer
> with kernel compile experience to report on the success in resolving this
> issue in the field *please*.
Gimme that patch, please!
I have four PE-2650 (out of 14) which die almost every night on this bug.
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)
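
The double handling described in the quoted message can be illustrated with a small user-space sketch. This is not the aacraid code and not the proposed patch. Every name here (sim_cmd, sim_done, queue_buggy, queue_fixed, sim_dispatch, SIM_OK, SIM_BUSY) is hypothetical, and the "midlayer" is reduced to a loop that retries a command whenever the queue routine returns nonzero. The io_request_lock/SMP side of the problem is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
 * kernel's or the aacraid driver's definitions. */
enum { SIM_OK = 0, SIM_BUSY = 1 };

struct sim_cmd {
    int result;      /* status word the driver fills in */
    int done_calls;  /* how many times the completion callback ran */
};

/* Stand-in for the completion callback: just counts invocations. */
static void sim_done(struct sim_cmd *cmd)
{
    cmd->done_calls++;
}

/* Pattern described as wrong: the command is completed (result set,
 * done called) AND a nonzero value is returned, so the caller also
 * keeps the command queued and retries it. */
static int queue_buggy(struct sim_cmd *cmd, int adapter_busy)
{
    if (adapter_busy) {
        cmd->result = SIM_BUSY;
        sim_done(cmd);
        return -1;
    }
    cmd->result = SIM_OK;
    sim_done(cmd);
    return 0;
}

/* Assumed shape of the fix: on the busy path, drop the result and
 * done calls and only return nonzero, leaving the retry to the caller. */
static int queue_fixed(struct sim_cmd *cmd, int adapter_busy)
{
    if (adapter_busy)
        return -1;
    cmd->result = SIM_OK;
    sim_done(cmd);
    return 0;
}

/* Grossly simplified "midlayer": a nonzero return means the command
 * stays queued and is tried again. The adapter reports busy for the
 * first busy_attempts tries. Returns the number of completions seen. */
static int sim_dispatch(int (*queue)(struct sim_cmd *, int),
                        struct sim_cmd *cmd, int busy_attempts)
{
    int attempts = 0;

    assert(queue != NULL && cmd != NULL);
    while (queue(cmd, attempts < busy_attempts) != 0)
        attempts++;
    return cmd->done_calls;
}
```

Calling sim_dispatch(queue_buggy, &cmd, 1) on a zeroed sim_cmd completes the same command twice, once as busy and once on the retry, while sim_dispatch(queue_fixed, &cmd, 1) completes it exactly once.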