PE2500 with RedHat v8.0 experiencing high load and hanging/lockups

Peter Smith peter.smith at utsouthwestern.edu
Thu Mar 27 10:33:00 CST 2003


At 9:10am CST this machine again locked up from what appears to be 
massive uncontrollable load.  This time I've upgraded the PERC 3/Di 
firmware from #3153 to #3170.  It is currently up.  I'll only know if 
this is the fix after some time, probably a few weeks.  I should mention 
some of the things I've seen on the screen when I get to the frozen 
system.  This time I saw the following.

"aacraid: Host adapter reset request. SCSI hang ?
<msg repeated many times>
 I/O error: dev 08:02, sector 30382784
<same msg> 30382904
<same msg> 30383032
<etc etc>
<same msg> 30386648
<same msg> 30386656"

Previously, the screen showed the following messages.
"unable to handle kernel paging request at virtual address 4F072740
oops: 0
unable to handle kernel paging request at virtual address 138EE508
I/O error: dev 08:31, sector 4456760
<same msg> 4456792
<same msg> 4460600
<etc etc>
<same msg> 5243104
<same msg> 5243128"

I am thinking this info might help someone...
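
For anyone reading the "dev MM:mm" pairs in those I/O errors, here is a minimal sketch of decoding them. It assumes that 2.4-era kernels print both numbers in hexadecimal, that major 8 is the SCSI disk (sd) major, and that each disk owns 16 minors. The function name decode_sd is made up for illustration.

```python
# Decode "dev MM:mm" pairs from 2.4-style "I/O error" kernel messages.
# Assumptions (not confirmed in this thread): both fields are hexadecimal,
# major 8 is the SCSI disk (sd) block major, and each disk owns 16 minors.
import string

SD_MAJOR = 8
MINORS_PER_DISK = 16


def decode_sd(dev):
    """Map a string such as "08:31" to a device name such as "sdd1"."""
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != SD_MAJOR:
        raise ValueError("not an sd device: major %d" % major)
    # The quotient selects the disk letter, the remainder the partition.
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + string.ascii_lowercase[disk]
    return name if part == 0 else "%s%d" % (name, part)


for dev in ("08:02", "08:31", "08:21"):
    print(dev, "->", decode_sd(dev))
```

Under those assumptions, the three devices reported in this thread come out as sda2, sdd1 and sdc1, which, if right, suggests the errors are not confined to a single logical drive.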

Peter Smith


Peter Smith wrote:

> I upgraded the kernel on this box to the then-newest 2.4.18-26.8.0smp 
> kernel on 3/7/2003.  Since then it has locked up on 3/15, 3/17, and 
> 3/22.  It was down from 3/22 until this morning with the message 
> "I/O error: dev 08:21, sector 24641776" displayed many times with 
> different sectors, finally ending in "0".  This morning I upgraded 
> the kernel to the now-newest 2.4.18-27.8.0smp.  I'm fairly certain 
> there are no disk issues.  I'll continue with my testing over the 
> next week.  I also noticed a massive load spike (>2000) on 3/22 
> immediately before it hung.  If I don't see any load spikes and/or 
> lockups over the next week, I will move on to updating the firmware 
> on both the system and the PERC (per Jason Andrade's suggestions).
>
> Peter Smith
>
>
> Peter Smith wrote:
>
>> This is an odd issue which is why I'm notifying/contacting the list.
>>
>> I have a PE2500 which, up until about 1 1/2 weeks ago, was running 
>> RedHat v7.1 without a hitch or hiccup.  Since things were going so 
>> well, I decided it was high time to upgrade to RedHat v8.0.  At the 
>> same time, I upgraded Squid, its main application.  Keep in mind this 
>> PE2500 is an older unit, shipped on 9/5/2001, and it is using a PERC 
>> 2/Di.  The reason I upgraded it is that I have another, newer PE2500, 
>> shipped 3/26/2002, which has been running RedHat v8.0 and my newer 
>> Squid (all the same software revs) with the same PERC 2/Di.
>>
>> The problem I am having is that the failing machine is experiencing 
>> massive load (>1000) at certain somewhat cyclic times.  I reboot this 
>> particular machine every morning at 3:00am.  I don't believe the 
>> massive load has to do with anything other than drive access.  It 
>> seems the raid driver is sometimes taking up too much time and can 
>> lock up the machine. Only one other time did I have a problem which 
>> seemed unrelated to the raid driver--recently after it rebooted at 
>> 3:00am it got stuck attempting to initialize the AIC7XXX driver at 
>> startup.  I understand this is somewhat of a known issue (but for 
>> RedHat v8?) and I'm working on getting the newest newest happiest 
>> AIC7XXX driver installed, so this probably isn't too much of a 
>> problem.  However, I am running the RedHat '2.4.18-24.8.0smp' kernel 
>> and am still experiencing massive load problems (which I never saw 
>> when running RedHat v7.1 on this box).  I'll be setting up the 
>> newest kernel, '2.4.18-26.8.0smp', probably tonight and will 
>> give that a whirl.  I have a feeling that unless the aacraid driver 
>> has been changed I'll experience the same problems.  I see no 
>> massive load or hangs on my other machine at this time.
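
To pin down when the cyclic load spikes happen, one option is to sample /proc/loadavg on a timer and log anything over a threshold. This is a rough sketch only. The function names, the 100.0 threshold and the 10-second interval are illustrative assumptions, not part of the setup described above.

```python
# Sample the Linux load average and record spikes with timestamps.
# Assumption: /proc/loadavg starts with the 1, 5 and 15 minute load averages.
import time


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from a /proc/loadavg line."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def is_spike(load1, threshold=100.0):
    """True when the 1-minute load average reaches the (hypothetical) threshold."""
    return load1 >= threshold


def watch(samples, interval=10.0, threshold=100.0, path="/proc/loadavg"):
    """Take `samples` readings and return a (timestamp, load1) pair per spike."""
    spikes = []
    for i in range(samples):
        with open(path) as fh:
            load1, _, _ = parse_loadavg(fh.read())
        if is_spike(load1, threshold):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            spikes.append((stamp, load1))
            print(stamp, "load1 =", load1)
        if i + 1 < samples:
            time.sleep(interval)
    return spikes
```

Calling watch(360) would cover roughly an hour at the default interval. The last entries logged before a hang would show how quickly the load ramps up and whether the spikes line up with a particular time of day.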
>>
>> The only other thing is this machine is using the on-board Eepro card 
>> and two add-on 3c905's.  I've left the configuration on these fairly 
>> generic.  Plus, nothing changed as far as the network goes in the 
>> upgrade to RedHat v8.0.
>>
>> Any ideas?  Pointers?  More data?  I'm fairly stumped...  I suppose 
>> at worst I could learn how to hook up a remote kernel 
>> profiler/debugger to get some real numbers on it.  When running 
>> "iostat" it looks like this box spends a lot more time on raid-driver 
>> service than all the other boxes, which leads me to believe it is a 
>> raid-driver (aacraid) issue again.
>>
>> Thank you in advance...
>>
>> Peter Smith
>>
>> _______________________________________________
>> Linux-PowerEdge mailing list
>> Linux-PowerEdge at dell.com
>> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
>> Please read the FAQ at http://lists.us.dell.com/faq or search the 
>> list archives at http://lists.us.dell.com/htdig/