PE2500 with RedHat v8.0 experiencing high load and hanging/lockups
peter.smith at UTSouthwestern.edu
Wed Mar 5 13:11:00 CST 2003
This is an odd issue, which is why I'm contacting the list.
I have a PE2500 which, up until about 1 1/2 weeks ago, was running
RedHat v7.1 without a hitch or hiccup. Since things were going so well,
I decided it was high time to upgrade to RedHat v8.0. At the same
time, I upgraded Squid, its main application. Keep in mind this PE2500
is an older unit, shipped on 9/5/2001, and it uses a PERC 2/Di. I
upgraded it because I have another, newer PE2500 (shipped 3/26/2002)
which has been running RedHat v8.0 and the newer Squid, at the same
software revisions and with the same PERC 2/Di.
The problem I am having is that the failing machine experiences
massive load (>1000) at somewhat cyclic times. I reboot this
particular machine every morning at 3:00am. I don't believe the massive
load is caused by anything other than drive access. It seems the raid
driver sometimes takes up too much time and can lock up the machine.
Only one other time did I have a problem which seemed unrelated to the
raid driver--recently after it rebooted at 3:00am it got stuck
attempting to initialize the AIC7XXX driver at startup. I understand
this is somewhat of a known issue (but for RedHat v8?) and I'm working
on getting the newest newest happiest AIC7XXX driver installed, so this
probably isn't too much of a problem. However, I am running the RedHat
'2.4.18-24.8.0smp' kernel and am still experiencing massive load
problems (which I used to not see when running RedHat v7.1 on this box.)
I'll be setting up the newest kernel, '2.4.18-26.8.0smp', probably
tonight and will give that a whirl. I have a feeling that unless the
aacraid driver has been changed I'll experience the same problems. I
see no massive load or hangs on my other machine at this time.
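To pin down how cyclic the load spikes really are, the load average could be sampled with timestamps and the samples above a threshold pulled out afterwards. The following is only a minimal sketch, assuming the standard Linux /proc/loadavg layout (three load averages, a running/total task pair, and the last PID). The function names and the threshold are hypothetical and not part of any existing tool.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format into a dict.

    Assumed layout: "<1min> <5min> <15min> <running>/<total> <last_pid>".
    """
    fields = text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    running, total = (int(x) for x in fields[3].split("/"))
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "running": running,
        "total": total,
    }


def find_spikes(samples, threshold=1000.0):
    """Return the timestamps whose 1-minute load exceeded the threshold.

    `samples` is a list of (timestamp, parsed_loadavg_dict) pairs,
    collected however is convenient (for example from a cron job).
    """
    return [stamp for stamp, load in samples if load["load1"] > threshold]
```

Feeding it one sample per minute (the contents of /proc/loadavg plus the current time) and then looking at the gaps between the returned timestamps should show whether the spikes follow a fixed period, such as an hourly cron job or a Squid maintenance cycle.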
The only other thing is this machine is using the on-board Eepro card
and two add-on 3c905's. I've left the configuration on these fairly
generic. Also, nothing network-related changed in the upgrade
to RedHat v8.0.
Any ideas? Pointers? More data? I'm fairly stumped. At
worst, I could learn how to hook up a remote kernel
profiler/debugger to get some real numbers on it. When running
"iostat", this box shows a lot more raid-driver service time
than all the other boxes, which leads me to believe it is a raid-driver
(aacraid) issue again.
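To turn the iostat impression into numbers that can be compared between the two boxes, the extended output could be captured to a file on each machine and one column averaged per device. This is a rough sketch only. It assumes "iostat -x" style output in which each report starts with a header line beginning with "Device" and includes a column named "svctm"; column names and layout differ between sysstat versions, and newer releases may not report svctm at all. The helper names are hypothetical.

```python
def parse_iostat_column(text, column="svctm"):
    """Collect one column of `iostat -x` style output per device.

    Returns {device_name: [value, value, ...]}, one value per report.
    """
    result = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            header = fields  # remember the column layout of this report
            continue
        if header and column in header and len(fields) == len(header):
            try:
                value = float(fields[header.index(column)])
            except ValueError:
                continue  # skip rows that are not numeric
            result.setdefault(fields[0], []).append(value)
    return result


def mean_by_device(parsed):
    """Average the collected values for each device."""
    return {dev: sum(vals) / len(vals) for dev, vals in parsed.items()}
```

Running the same capture for the same length of time on the good and the bad PE2500 and comparing the per-device means would show whether the difference in service time is consistent or only appears during the load spikes.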
Thank you in advance...