kernel: aacraid: Host adapter reset request. SCSI hang ?

Eberhard Moenkeberg emoenke at gwdg.de
Wed Aug 27 15:05:07 CDT 2003


Hi Mark,

On Wed, 27 Aug 2003, Salyzyn, Mark wrote:

> I may have a root cause on this issue, even though I have not been able to
> duplicate it yet.
>
> There is code that does the following in the driver:
>
> 	scsicmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 | SAM_STAT_TASK_SET_FULL;
> 	aac_io_done(scsicmd);
> 	return -1;
>
> This is *wrong*, because the nonzero return causes the system to hold the
> command in the queue due to the use of the new error handler, yet we have
> also completed the command as `BUSY'. In addition, aac_io_done relocks (on
> io_request_lock), so the caller had to unlock before calling it, leaving a
> hole that SMP machines fill. By dropping the result and done calls in these
> situations, and holding the locks in the caller of such routines, I believe
> we will close this hole.
>
> Thanks, in part, to Josef Möllers for pointing out this locking problem
> under SMP, serendipitously a day after I had noticed the other problem with
> the inaccurate busy return sequences in the code and started making the
> changes to investigate. Kill two birds with one stone.
>
> I will report back on my tests of these changes, but will need a volunteer
> with kernel compile experience to report on the success in resolving this
> issue in the field *please*.
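
For anyone following along, here is a rough user-space sketch of the
return-path conflict described in the quoted text. It is not the driver
code and not the patch: model_cmd, model_done, queue_buggy, queue_fixed,
model_midlayer_submit and model_conflict are all made-up names, the
"mid-layer" is reduced to one flag, and the io_request_lock window on SMP
is not modelled at all. It only shows why completing a command and also
returning nonzero leaves the command in two states at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI command; not the kernel structure. */
struct model_cmd {
    int  result;       /* status word the driver reports */
    int  completions;  /* number of times the done callback ran */
    bool queued;       /* true while the mid-layer holds it for retry */
};

enum { MODEL_TASK_SET_FULL = 0x28 };  /* SAM status code for TASK SET FULL */

/* Hypothetical done callback: just counts completions. */
static void model_done(struct model_cmd *cmd)
{
    cmd->completions++;
}

/* Quoted pattern: set a result, complete the command, AND return nonzero. */
static int queue_buggy(struct model_cmd *cmd, bool adapter_full)
{
    assert(cmd != NULL);
    if (adapter_full) {
        cmd->result = MODEL_TASK_SET_FULL;
        model_done(cmd);
        return -1;
    }
    cmd->result = 0;
    model_done(cmd);   /* simplification: success completes immediately */
    return 0;
}

/* Proposed pattern: when full, return nonzero only; no result, no done. */
static int queue_fixed(struct model_cmd *cmd, bool adapter_full)
{
    assert(cmd != NULL);
    if (adapter_full)
        return -1;
    cmd->result = 0;
    model_done(cmd);   /* simplification: success completes immediately */
    return 0;
}

/* Simplified mid-layer: a nonzero return means "keep it queued, retry". */
static void model_midlayer_submit(struct model_cmd *cmd,
                                  int (*queue)(struct model_cmd *, bool),
                                  bool adapter_full)
{
    cmd->queued = (queue(cmd, adapter_full) != 0);
}

/* Conflict: the command is held for retry yet has already been completed. */
static bool model_conflict(const struct model_cmd *cmd)
{
    return cmd->queued && cmd->completions > 0;
}
```

With adapter_full set, queue_buggy leaves the command both queued and
completed, so model_conflict reports true; queue_fixed leaves it queued
with no completion.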

Gimme that patch, please!
I have four PE-2650 (out of 14) which die almost every night on this bug.

Cheers -e
-- 
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)

More information about the Linux-PowerEdge mailing list