Server dies under heavy I/O

Johnathan Conley jdc at
Tue Aug 26 07:56:01 CDT 2003

Other people on this list told me to try two things:
disabling the write cache and disabling hyperthreading.

I tried each one individually and have not had a crash since. In the end I
chose to disable hyperthreading, since the write cache is very useful for us.

It has only been a few days - will keep you posted if we find out more.

-----Original Message-----
From: Basil Hussain [mailto:basil.hussain at] 
Sent: Tuesday, August 26, 2003 6:00 AM
To: Dell Poweredge Linux List
Subject: Server dies under heavy I/O

Hi all,

We have a PowerEdge 1550 running Red Hat 8.0 (with the latest 2.4.20 errata
kernel) that is having problems when subjected to heavy I/O. Attached to the
server is an external RAID array (via channel B of the built-in SCSI
adapter). The only other storage in the server is one internal SCSI disk
for the O/S (on channel A of the internal SCSI adapter).

It seems to be I/O to the external RAID that is the problem. About 80% of
the I/O on this server goes to the RAID. I have seen it die both under
load via Samba and during I/O activity initiated from the console. To give
an example of the most recent problem, I wanted to grab a list of all the
directory names stored on all the partitions of the external RAID. I fired
off a find command, like so:

find /mount/point/of/raid/parts/ -mindepth 3 -type d > ~/somefile.txt

However, this hung about halfway through. I tried to kill the process,
to no avail - not even a kill -9 would do it! A bit of Googling revealed
that because the process was stuck in the 'D' state ("uninterruptible sleep
(usually IO)", according to the ps man page), it could not be killed. A
reboot seemed to be the common answer.
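For anyone hitting the same symptom, the stuck process can at least be
identified before rebooting. The following is a minimal sketch, not
something from the original report: it assumes a procps-style `ps` that
accepts `-e` and `-o` with these column names, and `list_dstate` is a
hypothetical helper name. The WCHAN column shows the kernel function each
process is sleeping in, which may hint at whether it is blocked in the
SCSI driver or in the filesystem layer.

```shell
#!/bin/sh
# Sketch: list processes in uninterruptible sleep ('D' state).
# Assumes a procps-style ps supporting -e and -o pid,stat,wchan,args.
list_dstate() {
    # Keep the header line (NR == 1), then only rows whose STAT
    # column (field 2) starts with D.
    ps -eo pid,stat,wchan,args | awk 'NR == 1 || $2 ~ /^D/'
}

list_dstate
```

This only identifies the stuck processes; as described above, a process in
the 'D' state still cannot be killed.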

However, the server would not complete the usual shutdown process and
stopped dead, complaining that various partitions could not be unmounted
because they were busy. The ones mentioned were actually the ones being
read from (the directories I was searching) and written to (the one holding
~/somefile.txt). I had to power cycle the server. Needless to say, I'm glad
all the partitions are on journaled ext3!

I am at a loss as to what to suspect as the cause of the problem. Does
anyone have any suggestions?

Should I suspect the RAID array's controller? Bugs in the aic7xxx SCSI
module? Bugs in the kernel? A corrupt filesystem? (Even though the
filesystems recovered cleanly from the journal after every such shutdown.)

Any suggestions or help appreciated.


Basil Hussain
Internet Developer, Kodak Weddings
E-Mail: basil.hussain at

Linux-PowerEdge mailing list
Linux-PowerEdge at