RedHat 9 aacraid - system fails under extreme disk IO - Reproducible test case

Andrew Mann amann at mythicentertainment.com
Wed Oct 8 12:48:00 CDT 2003


	Spoke too soon.  Crash at 45 minutes, though in a different
location this time: lru_cache_del + 0x44.

Andrew

Andrew Mann wrote:

> Hi Mark,
>     I've got some potentially interesting results on this.
> I downloaded kernel-source-2.4.20.SuSE-62.src.rpm from the SuSE ftp
> site.  It's laid out very nicely for this purpose: it's separated into
> the stock 2.4.20 kernel with patches for each arch (and a common patch
> set).  Inside of the common patch set are patches to the aacraid driver.
> I applied these patches to the stock 2.4.20 kernel and copied the
> resulting /drivers/scsi/aacraid/ directory into the RedHat source tree
> for 2.4.20-20.9.  After a 'make dep' and 'make clean' the build
> complained of a missing compat.h.  I didn't look to see if it was really
> used or just a Makefile dependency - instead I just copied compat.h from
> the aacraid build 2166 directory.  It built fine.
>     I'm now up and running for 35 minutes - longer than any test yet.
>     I've looked at the patches vs. the mainline kernel, and while the
> FIBS change from 578 to 512 is the only hardware-related change, the
> linked list handling has been completely replaced.  The mainline driver
> and the RedHat driver both use the kernel implementation of a doubly
> linked list.  The new version uses a simple singly linked list.  If
> you're not protecting access to this list correctly, a doubly linked
> list gives you at least a 2x greater chance (usually more) of a really
> bad situation.  I believe it's possible on a singly linked list to get
> away with a number of operations without locking access to the list,
> especially if they end up being atomic ops.  I don't think that's
> possible at all on a doubly linked list (see the sketch below).
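>
> To make that concrete: a singly linked push touches one pointer, so a
> single compare-and-swap covers it, while a doubly linked insert has to
> update two pointers together, which no single atomic op can do.  A
> minimal userspace sketch of the idea (my illustration, using C11
> atomics rather than the kernel primitives the driver would use):
>
>     #include <stdatomic.h>
>
>     struct node { struct node *next; };
>
>     static _Atomic(struct node *) head;   /* list head, starts NULL */
>
>     /* Lock-free push: if another CPU moves head between our read and
>      * our compare-and-swap, the CAS fails, reloads head into old, and
>      * we retry. */
>     void push(struct node *n)
>     {
>             struct node *old = atomic_load(&head);
>             do {
>                     n->next = old;
>             } while (!atomic_compare_exchange_weak(&head, &old, n));
>     }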
>     So, I'd search in this direction.  I'll send an update this evening 
> if things are still running fine, and I'll send one immediately if 
> things blow up again.
> 
> Andrew
> 
> Salyzyn, Mark wrote:
> 
>> I have not been able to duplicate this issue, so I am somewhat of a JAFO,
>> and am *not* a definitive resource.
>>
>> This issue is not just one problem.  The noapic kernel option and
>> turning off HyperThreading have resolved some of the reported issues.
>> Driver changes thus far cannot eliminate the problem, but can delay the
>> inevitable.  Build 3157 of the Firmware appears to work fine; Build
>> 3170 fails, but only with certain Seagate 15K rpm U320 drives.
>> I may be wrong ... any corrections to my assumptions above would be
>> greatly appreciated.
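>>
>> For reference, the noapic workaround is just a boot parameter.  On a
>> stock Red Hat 9 GRUB setup the kernel line would look roughly like the
>> following (paths and root device here are illustrative, adjust to the
>> install):
>>
>>     # /boot/grub/grub.conf
>>     title Red Hat Linux (2.4.20-20.9smp)
>>             root (hd0,0)
>>             kernel /vmlinuz-2.4.20-20.9smp ro root=/dev/sda2 noapic
>>             initrd /initrd-2.4.20-20.9smp.img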
>>
>> Sincerely -- Mark Salyzyn
>>
>> -----Original Message-----
>> From: Thomas Petersen [mailto:tomp at securityminded.net]
>> Sent: Tuesday, October 07, 2003 8:52 PM
>> To: 'Andrew Mann'
>> Cc: linux-poweredge at dell.com; Salyzyn, Mark
>> Subject: RE: RedHat 9 aacraid - system fails under extreme disk IO -
>> Reproducible test case
>>
>>
>> I am pretty disappointed in Dell for failing to follow up on this and
>> resolve the issue once and for all.  This is not a new problem, but it
>> is Dell's responsibility to rectify it, as they -certify- RedHat on the
>> 2650 -- regardless of whether it's a hardware or software issue, Dell
>> is responsible to their customers.
>> If this were an issue on the Microsoft platform, you can bet Dell would
>> have worked with Microsoft and issued a patch/update long before it
>> became a widespread problem.  I have always been a huge fan of Dell
>> equipment, but their failure in this instance to support what they sell
>> is very troubling.
>> Don't get me wrong, I will probably purchase Dell servers again in the
>> future (though not the 2650), but can anyone name one problem of this
>> magnitude, affecting the Microsoft platform on Dell hardware, that went
>> unresolved for as long as this one has?  System lockups are -totally-
>> unacceptable.
>> I guess when people start choosing with their checkbooks Dell might
>> wake up.
>>
>> Thomas Petersen
>> SecurityMinded Technologies
>>
>>>> -----Original Message-----
>>>> From: Andrew Mann [mailto:amann at mythicentertainment.com]
>>>> Sent: Tuesday, October 07, 2003 6:20 PM
>>>> To: linux-poweredge at dell.com
>>>> Cc: mark_salyzyn at adaptec.com
>>>> Subject: Re: RedHat 9 aacraid - system fails under extreme disk IO -
>>>> Reproducible test case
>>>>
>>>>
>>>>     Unfortunately we've got a good number of 2550s and 2650s in use, 
>>>> and replacing the RAID cards isn't ideal.  Mostly we don't have 
>>>> enough load to cause this problem, but every now and then we do get 
>>>> an unexplained lockup that pulls someone out of bed at 2 AM.
>>>>     I searched back through the reports of this and found some posts 
>>>> from Mark Salyzyn referencing AAC_NUM_FIB and AAC_NUM_IO_FIB 
>>>> settings.  The last comment I see is on 9/9/2003:
>>>> "I am suggesting that this value be (AAC_NUM_IO_FIB+64), and limited 
>>>> to below 512 (the maximum number of hardware FIBS the Firmware can 
>>>> absorb). I will begin testing the stability and side effects of this 
>>>> input."
>>>>     However, I don't see any followup, nor does the latest patchset 
>>>> to the 2.4 series seem to contain any modifications in this area (or 
>>>> 2.5 or 2.6 since June 2003).
>>>>     Additionally, I've just rebuilt the aacraid module here from the
>>>> RedHat SRPM of 2.4.20-20.9 with AAC_NUM_FIB=512 and
>>>> AAC_NUM_IO_FIB=448, rebuilt the rdimage and such, and got another
>>>> crash within 5 minutes of starting the test.
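>>>> For reference, both values are compile-time constants, so each
>>>> attempt takes a driver rebuild.  My change amounted to roughly the
>>>> following (assuming the defines sit in
>>>> drivers/scsi/aacraid/aacraid.h as they do in the build 2166 tree;
>>>> the exact location may differ):
>>>>
>>>>     #define AAC_NUM_IO_FIB  448
>>>>     /* Mark's suggested relationship: IO FIBs + 64, and no more than
>>>>      * the 512 hardware FIBS the Firmware can absorb. */
>>>>     #define AAC_NUM_FIB     (AAC_NUM_IO_FIB + 64)   /* = 512 */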
>>>>
>>>>     I also see a note from Mark on 8/27/2003:
>>>> -----
>>>> There is code that does the following in the driver:
>>>>
>>>>     scsicmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 | 
>>>> SAM_STAT_TASK_SET_FULL;
>>>>     aac_io_done(scsicmd);
>>>>     return -1;
>>>>
>>>> This is *wrong*: the non-zero return causes the system to hold the
>>>> command in the queue (due to the use of the new error handler), yet
>>>> we have also completed the command as `BUSY'.  And because the
>>>> aac_io_done call relocks io_request_lock, the caller had to unlock it
>>>> first, leaving a hole that SMP machines fill.  By dropping the result
>>>> and done calls in these situations, and holding the locks in the
>>>> caller of such routines, I believe we will close this hole.
>>>>
>>>> ....
>>>>
>>>> I will report back on my tests of these changes, but will need a 
>>>> volunteer with kernel compile experience to report on the success in 
>>>> resolving this issue in the field *please*.
>>>> -----
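>>>>
>>>>     As I read Mark's description, the shape of the bug is a double
>>>> completion: the command is finished as BUSY *and*, via the non-zero
>>>> return, handed back to the midlayer for requeue, with an unlocked
>>>> window in between.  A sketch of the broken path as I understand it
>>>> (fib_pool_exhausted is my placeholder, not the real driver code):
>>>>
>>>>     static int aac_queuecommand(Scsi_Cmnd *scsicmd,
>>>>                                 void (*done)(Scsi_Cmnd *))
>>>>     {
>>>>             if (fib_pool_exhausted()) {     /* placeholder check */
>>>>                     /* Completes the command as BUSY... */
>>>>                     scsicmd->result = DID_OK << 16 |
>>>>                                       COMMAND_COMPLETE << 8 |
>>>>                                       SAM_STAT_TASK_SET_FULL;
>>>>                     /* aac_io_done() retakes io_request_lock, so the
>>>>                      * caller has to drop it first: that unlocked
>>>>                      * window is what other CPUs race into. */
>>>>                     aac_io_done(scsicmd);
>>>>                     /* ...yet the non-zero return also tells the
>>>>                      * midlayer to hold and requeue the same
>>>>                      * command: two owners for one Scsi_Cmnd. */
>>>>                     return -1;
>>>>             }
>>>>             /* ...normal submission path... */
>>>>             return 0;
>>>>     }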
>>>>
>>>>     I'm not familiar enough with the aacraid driver or SCSI in
>>>> general to work out the code changes necessary.  There also don't
>>>> appear to be any follow-ups.
>>>>
>>>>     Mark, do you have any updates on this?  I can make code changes, 
>>>> recompile, and run a test case that reliably reveals the problem 
>>>> here if that's helpful.
>>>>
>>>>
>>>> I can't see the full panic message, but the parts I can see are 
>>>> basically (copied by hand):
>>>>
>>>> CPU 1
>>>> EFLAGS: 00010086
>>>>
>>>> EIP is at rmqueue [kernel] 0x127  (2.4.20-20.9smp)
>>>> eax: c0343400    ebx: c03445dc    ecx: 00000000
>>>> edx: b6d7ca63    esi: 00000000    edi: c03445d0
>>>> ebp: 00038000    esp: ee643e80     ds: 0068
>>>> es: 0068  ss: 0068
>>>>
>>>> Process dd (pid: 956, stack page = ee643000)
>>>>
>>>> Call trace:   wakeup_kswapd   0xfb (0xee643e90)
>>>>              __alloc_pages_limit  0x57
>>>>              __alloc_pages        0x101
>>>>              generic_file_write   0x394
>>>>              ext3_file_write      0x39
>>>>              sys_write            0x97
>>>>              system_call          0x33
>>>>
>>>>     Although aacraid isn't directly implicated here, I can reproduce 
>>>> this on the 2550s and 2650s (aacraid) but not 1750s (megaraid).
>>>>
>>>> Andrew
>>>>
>>>> Paul Anderson wrote:
>>>>
>>>>
>>>>> We had this same issue with our 2650's running AS 2.1.  Don't know
>>>>> that this is the best answer, but it is the one that worked for
>>>>> us... Replace the onboard adapter with a PERC 3/DC (LSI) adapter.
>>>>> Make sure that you put it on its own bus; we used slot three.  In 2
>>>>> of our 2650's we are even running this with the HBAs for SAN
>>>>> connectivity.  That said, our solution is about 2 weeks old, though
>>>>> I did run similar tests on the systems after the new install for 8
>>>>> days and was unable to make them crash.
>>>>>
>>>>> Paul
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andrew Mann [mailto:amann at mythicentertainment.com]
>>>>> Sent: Tuesday, October 07, 2003 12:47 PM
>>>>> To: linux-poweredge at dell.com
>>>>> Cc: Matt Domsch; deanna_bonds at adaptec.com; alan at redhat.com
>>>>> Subject: RedHat 9 aacraid - system fails under extreme disk IO -
>>>>> Reproducible test case
>>>>>
>>>>>
>>>>>     This has been brought up on the Dell Linux PowerEdge list
>>>>> previously, but it doesn't appear that a definitive solution or
>>>>> reproducible situation has been presented.  It also seems like the
>>>>> previous reports involved both heavy disk IO as well as heavy
>>>>> network traffic, and so the NIC driver was suspect.
>>>>>     Since we have a number of 2550s and 2650s using the onboard
>>>>> PERC3/Di RAID controller (aacraid driver), this issue concerns us.
>>>>>
>>>>>     The following script was run with 6 instances at once on two
>>>>> 2550s and one 2650.
>>>>>
>>>>> 2550 configuration:
>>>>> 2 x P3 1.2 GHz, kernel 2.4.20-20.9smp #1 SMP,
>>>>> 1 GB of RAM, 2 GB of swap, 2 x 18 GB drives in a RAID 1 configuration
>>>>>
>>>>> 2650 configuration:
>>>>> 2 x Xeon 2.2 GHz, kernel 2.4.20-20.9smp #1 SMP,
>>>>> 2 GB of RAM, 2 GB of swap, 2 x 18 GB drives in a RAID 1 configuration,
>>>>> HyperThreading enabled
>>>>>
>>>>>
>>>>>     The 2550s fail within 30 minutes of starting the tests each
>>>>> time (tests were run 6 times in a row).  The 2650 failed prior to
>>>>> 2.5 days (only 1 test run due to duration before failure).  In some
>>>>> cases the 2550 displayed a null pointer dereference in the kernel.
>>>>> I'll copy down details next time I can catch it on screen.  It does
>>>>> not get logged to disk, which doesn't surprise me in this
>>>>> situation.  In most cases the screen was blank (due to APM, I'd
>>>>> guess?).
>>>>>     The systems still respond to pings, but do not respond to
>>>>> keyboard actions and do not complete any TCP connections.  These
>>>>> systems do not have a graphical desktop installed, and in fact have
>>>>> a fairly minimal set of packages installed at all.
>>>>>     I don't know why the 2550 would consistently fail in such a
>>>>> brief period while the 2650 would take a much longer time before
>>>>> failure.  I've been running the same tests on a 1750 (PERC4/Di -
>>>>> MegaRAID based) for some days now without a failure.
>>>>>     I plan on testing a non-SMP kernel on the 2550 next - not
>>>>> because we can run things that way, but to maybe give some more
>>>>> clues.
>>>>>
>>>>>     The following script creates a 300 MB file, then rm's it, then
>>>>> does it all over again.  For my tests I ran 6 of these
>>>>> concurrently.  Don't expect the system to respond to much while
>>>>> these are running, though I was able to get decent updates from
>>>>> top.
>>>>>     Alter the script as you see fit; I'm no guru with bash
>>>>> scripting!
>>>>>
>>>>> cat diskgrind.sh
>>>>> #!/bin/sh
>>>>>
>>>>> # Write a $MEGS-megabyte file, delete it, and repeat forever,
>>>>> # reporting a running total each pass.
>>>>> MEGS=300
>>>>> TOTAL=0
>>>>>
>>>>> while true; do
>>>>>         # dd's transfer stats go to stderr; discard them.
>>>>>         dd ibs=1048576 count=$MEGS if=/dev/zero \
>>>>>                 of=/test/diskgrind.$$ 2>&1 | cat >/dev/null
>>>>>         rm -f /test/diskgrind.$$
>>>>>         TOTAL=`expr $TOTAL + $MEGS`
>>>>>         echo "[$$] Completed $TOTAL megs."
>>>>> done
>>>>>
>>>>>
>>>>> ./diskgrind.sh &
>>>>> ./diskgrind.sh &
>>>>> ./diskgrind.sh &
>>>>> ./diskgrind.sh &
>>>>> ./diskgrind.sh &
>>>>> ./diskgrind.sh &
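>>>>>
>>>>> Or equivalently, to launch the six instances in a loop:
>>>>>
>>>>>     for i in `seq 1 6`; do ./diskgrind.sh & done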
>>>>>
>>>>>
>>>>>
>>>>> Andrew
>>>>>
>>>>
>>>> -- 
>>>> Andrew Mann
>>>> Systems Administrator
>>>> Mythic Entertainment
>>>> 703-934-0446 x 224
>>>>
>>>> _______________________________________________
>>>> Linux-PowerEdge mailing list
>>>> Linux-PowerEdge at dell.com
>>>>
>>>> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
>>>>
>>>> Please read the FAQ at http://lists.us.dell.com/faq or search the
>>>> list archives at http://lists.us.dell.com/htdig/
>>
>>
> 

-- 
Andrew Mann
Systems Administrator
Mythic Entertainment
703-934-0446 x 224



