PE2500 with RedHat v8.0 experiencing high load and hanging/lockups

Peter Smith peter.smith at UTSouthwestern.edu
Wed Mar 5 13:11:00 CST 2003


This is an odd issue, which is why I'm contacting the list.

I have a PE2500 which, up until about 1 1/2 weeks ago, was running
RedHat v7.1 without a hitch or hiccup.  Since things were going so well,
I decided it was high time to upgrade to RedHat v8.0.  At the same
time, I upgraded Squid, its main application.  Keep in mind this PE2500
is an older unit, shipped on 9/5/2001, and it is using a PERC 2/Di.  The
reason I upgraded it is that I have another, newer PE2500, shipped
3/26/2002, which has been running RedHat v8.0 and my newer Squid (all
the same software revs) with the same PERC 2/Di.

The problem I am having is that the failing machine experiences
massive load (>1000) at certain, somewhat cyclic, times.  I reboot this
particular machine every morning at 3:00am.  I don't believe the massive
load has to do with anything other than drive access.  It seems the RAID
driver sometimes takes up too much time and can lock up the machine.

Only one other time did I have a problem which seemed unrelated to the
RAID driver: recently, after it rebooted at 3:00am, it got stuck
attempting to initialize the AIC7XXX driver at startup.  I understand
this is somewhat of a known issue (but for RedHat v8?) and I'm working
on getting the newest AIC7XXX driver installed, so this probably isn't
too much of a problem.

However, I am running the RedHat '2.4.18-24.8.0smp' kernel and am still
experiencing massive load problems, which I did not see when running
RedHat v7.1 on this box.  I'll be setting up the newest kernel,
'2.4.18-26.8.0smp', probably tonight and will give that a whirl.  I have
a feeling that unless the aacraid driver has been changed I'll
experience the same problems.  I see no massive load or hangs on my
other machine at this time.
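
To pin down how cyclic the load spikes really are, one option is to
log the load average with timestamps and line the spikes up against
cron jobs or Squid activity.  Below is a minimal Python sketch, not
something taken from this setup: the function names, the 60-second
interval and the threshold of 10 are all assumptions chosen for
illustration.  It relies only on /proc/loadavg, whose first three
fields are the 1, 5 and 15 minute load averages.

```python
import time


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1min, 5min, 15min) floats."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def log_load(path="/proc/loadavg", interval=60, samples=5, threshold=10.0):
    """Print one timestamped load sample every `interval` seconds.

    `threshold` is an arbitrary illustrative value; samples whose
    1-minute load exceeds it are flagged so spikes stand out in the log.
    """
    for _ in range(samples):
        with open(path) as f:
            one, five, fifteen = parse_loadavg(f.read())
        flag = " <-- high" if one > threshold else ""
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print("%s load=%.2f %.2f %.2f%s" % (stamp, one, five, fifteen, flag))
        time.sleep(interval)


# Example (runs for roughly `samples * interval` seconds):
# log_load(interval=60, samples=1440)
```

Redirecting the output to a file on a disk that is not behind the
PERC would keep the log itself from stalling during a spike.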

The only other thing worth mentioning is that this machine uses the
on-board Eepro card and two add-on 3c905s.  I've left the configuration
of these fairly generic.  Also, nothing changed on the network side in
the upgrade to RedHat v8.0.

Any ideas?  Pointers?  More data?  I'm fairly stumped...  I suppose, at
worst, I could learn how to hook up a remote kernel profiler/debugger
to get some real numbers on it.  When running "iostat", this box shows
a lot more RAID-driver service time than all the other boxes, which
leads me to believe it is a RAID-driver (aacraid) issue again.
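
To make the iostat comparison between boxes less of an eyeball
exercise, the per-device service time can be pulled out of saved
"iostat -x" output and compared side by side.  The following Python
sketch is only an illustration: it assumes the extended report has a
header line starting with "Device" and a column named "svctm", which
depends on the sysstat version installed, and the helper name
service_times is made up here.

```python
def service_times(iostat_text, column="svctm"):
    """Return {device: value} for `column` from the last device table
    found in saved `iostat -x` output.

    The column is located by name in the header line, so the exact
    column order does not matter.  If the column is absent, the
    result is empty.
    """
    result = {}
    col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            col = None          # a blank line ends the current table
            continue
        if fields[0].startswith("Device"):
            # New report: remember where the column is, drop older values.
            col = fields.index(column) if column in fields else None
            result = {}
            continue
        if col is not None and len(fields) > col:
            try:
                result[fields[0]] = float(fields[col])
            except ValueError:
                pass            # skip rows that are not numeric
    return result


# Example, with output previously saved on each box via
# `iostat -x > box.txt`:
# print(service_times(open("box.txt").read()))
```

Running this against captures from both PE2500s taken under similar
Squid load would give two small dictionaries that can be compared
directly.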

Thank you in advance...

Peter Smith



