RedHat 9 aacraid - system fails under extreme disk IO - Reproducible test case

Andrew Mann amann at mythicentertainment.com
Tue Oct 7 17:23:00 CDT 2003


	Unfortunately we've got a good number of 2550s and 2650s in use, and 
replacing the RAID cards isn't ideal.  Mostly we don't have enough load 
to cause this problem, but every now and then we do get an unexplained 
lockup that pulls someone out of bed at 2 AM.
	I searched back through the reports of this and found some posts from 
Mark Salyzyn referencing AAC_NUM_FIB and AAC_NUM_IO_FIB settings.  The 
last comment I see is on 9/9/2003:
"I am suggesting that this value be (AAC_NUM_IO_FIB+64), and limited to 
below 512 (the maximum number of hardware FIBS the Firmware can absorb). 
I will begin testing the stability and side effects of this input."
	However, I don't see any followup, nor does the latest patchset to the 
2.4 series seem to contain any modifications in this area (or 2.5 or 2.6 
since June 2003).
	Additionally, I've just rebuilt the aacraid module here from the RedHat 
SRPM of 2.4.20-20.9 with AAC_NUM_FIB=512 and AAC_NUM_IO_FIB=448, rebuilt 
the initrd image and such, and got another crash within 5 minutes of 
starting the test.
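	For reference, the change amounts to bumping two defines in the driver 
source and rebuilding the module.  I'm assuming here that they live in 
drivers/scsi/aacraid/aacraid.h as in the stock 2.4 tree; check your own 
source, since the location and the original defaults may differ between 
kernel revisions:

	#define AAC_NUM_FIB	512	/* = AAC_NUM_IO_FIB + 64, per Mark's 9/9 note */
	#define AAC_NUM_IO_FIB	448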

	I also see a note from Mark on 8/27/2003:
-----
There is code that does the following in the driver:

	scsicmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 |
SAM_STAT_TASK_SET_FULL;
	aac_io_done(scsicmd);
	return -1;

This is *wrong*: the non-zero return causes the system to hold the 
command in the queue (due to the use of the new error handler), yet we 
have also completed the command as `BUSY'; *and*, because the 
aac_io_done call relocks io_request_lock, the caller had to unlock it 
first, leaving a hole that SMP machines fill.  By dropping the result 
and done calls in these situations, and holding the locks in the caller 
of such routines, I believe we will close this hole.

....

I will report back on my tests of these changes, but will need a 
volunteer with kernel compile experience to report on the success in 
resolving this issue in the field *please*.
-----

	I'm not familiar enough with the aacraid driver or SCSI in general to 
put together the code changes necessary.  There also don't appear to be 
any followups.
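	For what it's worth, my reading of Mark's note is that the queue-full 
path should stop completing the command itself and simply let the 
non-zero return hand it back to the midlayer, with the caller keeping 
io_request_lock held instead of dropping it around aac_io_done().  A 
rough, untested sketch of what that would look like (I may well have 
the locking details wrong):

	/*
	 * Queue full: don't set scsicmd->result and don't call
	 * aac_io_done() here.  The non-zero return tells the SCSI
	 * midlayer to hold and retry the command, and the caller
	 * keeps io_request_lock held rather than unlocking around
	 * a completion call.
	 */
	return -1;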

	Mark, do you have any updates on this?  I can make code changes, 
recompile, and run a test case that reliably reveals the problem here if 
that's helpful.


I can't see the full panic message, but the parts I can see are 
basically (copied by hand):

CPU 1
EFLAGS: 00010086

EIP is at rmqueue [kernel] 0x127  (2.4.20-20.9smp)
eax: c0343400    ebx: c03445dc    ecx: 00000000
edx: b6d7ca63    esi: 00000000    edi: c03445d0
ebp: 00038000    esp: ee643e80     ds: 0068
es: 0068  ss: 0068

Process dd (pid: 956, stack page = ee643000)

Call trace:   wakeup_kswapd   0xfb (0xee643e90)
               __alloc_pages_limit  0x57
               __alloc_pages        0x101
               generic_file_write   0x394
               ext3_file_write      0x39
               sys_write            0x97
               system_call          0x33

	Although aacraid isn't directly implicated here, I can reproduce this 
on the 2550s and 2650s (aacraid) but not on the 1750s (megaraid).

Andrew

Paul Anderson wrote:

> We had this same issue with our 2650s running AS 2.1.  Don't know that 
> this is the best answer, but it is the one that worked for us... Replace 
> the onboard adapter with a PERC 3/DC (LSI) adapter.  Make sure that you 
> put it on its own bus; we used slot three.  In 2 of our 2650s we are 
> even running this with the HBAs for SAN connectivity.  That said, our 
> solution is about 2 weeks old, though I did run similar tests on the 
> systems after the new install for 8 days and was unable to make them crash.
> 
> Paul
> 
> -----Original Message-----
> From: Andrew Mann [mailto:amann at mythicentertainment.com]
> Sent: Tuesday, October 07, 2003 12:47 PM
> To: linux-poweredge at dell.com
> Cc: Matt Domsch; deanna_bonds at adaptec.com; alan at redhat.com
> Subject: RedHat 9 aacraid - system fails under extreme disk IO -
> Reproducible test case
> 
> 
> 	This has been brought up on the Dell Linux PowerEdge list previously, 
> but it doesn't appear that a definitive solution or a reproducible 
> situation has been presented.  It also seems like the previous reports 
> involved both heavy disk IO and heavy network traffic, and so the 
> NIC driver was suspect.
> 	Since we have a number of 2550s and 2650s using the onboard PERC3/Di 
> RAID controller (aacraid driver), this issue concerns us.
> 
> 	The following script was run with 6 instances at once on two 2550s and 
> one 2650.
> 
> 2550 configuration
> 2 x P3 1.2 GHz   kernel: 2.4.20-20.9smp #1 SMP
> 1 GB of RAM, 2 GB of swap, 2 x 18 GB drives in a RAID 1 configuration
> 
> 2650 configuration
> 2 x Xeon 2.2 GHz   kernel: 2.4.20-20.9smp #1 SMP
> 2 GB of RAM, 2 GB of swap, 2 x 18 GB drives in a RAID 1 configuration
> Hyperthreading enabled
> 
> 
> 	The 2550s fail within 30 minutes of starting the tests each time (the 
> test was run 6 times in a row).  The 2650 failed in under 2.5 days (only 
> one run, given how long it takes to fail).  In some cases the 2550 
> displayed a null pointer dereference in the kernel; I'll copy down the 
> details next time I can catch it on screen.  It does not get logged to 
> disk, which doesn't surprise me in this situation.  In most cases the 
> screen was blank (due to APM, I'd guess?).
> 	The systems still respond to pings, but do not respond to keyboard 
> input and do not complete any TCP connections.  These systems do not 
> have a graphical desktop installed, and in fact have a fairly minimal 
> set of packages installed.
> 	I don't know why the 2550 would consistently fail in such a brief 
> period while the 2650 would take so much longer to fail. 
> I've been running the same tests on a 1750 (PERC4/Di - Megaraid based) 
> for some days now without a failure.
> 	I plan on testing a non-SMP kernel on the 2550 next - not because we 
> can run things that way, but because it might give some more clues.
> 
> 	The following script creates a 300 MB file, then rm's it, then does it 
> all over again.  For my tests I ran 6 of these concurrently.  Don't 
> expect the system to respond to much while these are running, though I 
> was able to get decent updates from top.
> 	Alter the script as you see fit; I'm no guru with bash scripting!
> 
> cat diskgrind.sh
> #!/bin/sh
> 
> # Megabytes written per pass.
> MEGS=300
> TOTAL=0
> 
> # Loop forever: write a $MEGS MB file of zeroes, delete it, and
> # report the running total.  dd's transfer stats are discarded.
> while true; do
>          dd ibs=1048576 count=$MEGS if=/dev/zero \
>              of=/test/diskgrind.$$ >/dev/null 2>&1
>          rm -f /test/diskgrind.$$
>          TOTAL=`expr $TOTAL + $MEGS`
>          echo "[$$] Completed $TOTAL megs."
> done
> 
> 
> ./diskgrind.sh &
> ./diskgrind.sh &
> ./diskgrind.sh &
> ./diskgrind.sh &
> ./diskgrind.sh &
> ./diskgrind.sh &
> 
> 
> 
> Andrew
> 

-- 
Andrew Mann
Systems Administrator
Mythic Entertainment
703-934-0446 x 224



