Debug info on system hangs, PE1300's-PE1600s (SMP) and RH Linux

John Murtari jmurtari at
Wed Apr 21 08:11:01 CDT 2004

        We have run a mix of PowerEdge servers, 1300 through 1600,
all SMP systems running Red Hat Linux 7.3 through 9.  They use both
Adaptec and LSI SCSI chipsets with Seagate and Fujitsu disks, non-RAID.

        The systems were headless, and we would occasionally notice
that one had stopped operating: no logins, and it looked dead from
the outside, but it would still respond to pings.  It would not
respond to ctrl-alt-del, but we could recover with the Magic SysRq
key (sync, unmount, boot), although while syncing filesystems it
would also hang on one of them.

        Recently we had a busy mail server which would do this
almost once a week.  Because we had some xterms open from the
system, we got a 'peek' at what was going on.  Before it became
completely unresponsive we noticed that 'w' commands showed very
high task counts, jumping from 4 to 80 to 150 in just seconds.

        One day I ran 'vmstat 5' in a window and just let it go.
When the system finally hung again, I could see the jump in runnable
tasks, and I also noticed that disk I/O had stopped at the same time.
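
        For anyone who wants to catch the same signature without
watching a window, here is a minimal sketch that scans 'vmstat 5'
output for a sample where the run queue is high and block I/O is
zero.  It is only an illustration: the function names and the
run_queue_limit of 50 are made up for this example (chosen to sit
between the 4 and the 80-150 seen above), and it assumes vmstat
prints a header row containing the "r", "bi" and "bo" columns.

```python
import sys


def parse_vmstat(lines):
    """Yield one dict per vmstat sample, keyed by vmstat's own column names."""
    columns = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if "bi" in fields and "bo" in fields:
            # Header row with column names; vmstat reprints it periodically.
            columns = fields
            continue
        if columns is None or len(fields) != len(columns):
            # Banner line ("procs ---memory--- ...") or a malformed row.
            continue
        try:
            yield dict(zip(columns, (int(f) for f in fields)))
        except ValueError:
            # Non-numeric row; skip it rather than guess.
            continue


def looks_stalled(sample, run_queue_limit=50):
    """True when many tasks are runnable but no blocks moved in or out.

    run_queue_limit is a hypothetical threshold, not a measured value.
    """
    return (
        sample["r"] >= run_queue_limit
        and sample["bi"] == 0
        and sample["bo"] == 0
    )


def watch(stream, out=sys.stdout):
    """Print a line for every sample in stream that matches the stall pattern."""
    for sample in parse_vmstat(stream):
        if looks_stalled(sample):
            print("possible stall:", sample, file=out, flush=True)
```

A small driver that calls watch(sys.stdin) could be fed from
'vmstat 5' through a pipe.  Since disk I/O stops during the hang,
sending the output to a remote terminal or another host is more
likely to survive than writing it to a local log file.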

        I'm pretty convinced there is some type of SCSI issue
that causes a lockup on SMP systems.  We talked to some good guys
at Dell Linux tech support, and they made sure we had the most
current drivers and an updated BIOS, but it still happens.  We
recently upgraded to 9.0 and are still seeing it.  Of course, the
log files show absolutely nothing unusual.

        We are considering an upgrade to Red Hat Enterprise Linux
soon, but are not sure whether that will make a difference.  We would
appreciate any feedback from those of you who have seen these symptoms.

John Murtari                              Software Workshop Inc.
jmurtari at 315.635.1968(x-211)  "TheBook.Com" (TM)

