Server dies under heavy I/O

Steve_Boley at Dell.com
Tue Aug 26 18:16:00 CDT 2003


Update the aic7xxx driver and see if that fixes it.  Justin Gibbs's page
has some statically compiled drivers for various recent kernels, and if one
is not available for yours, you could ask him for the latest 6.2.3x source.
Steve

-----Original Message-----
From: Basil Hussain [mailto:basil.hussain at kodakweddings.com]
Sent: Tuesday, August 26, 2003 6:00 AM
To: Dell Poweredge Linux List
Subject: Server dies under heavy I/O


Hi all,

We have a PowerEdge 1550 running Red Hat 8.0 (with the latest 2.4.20 errata
kernel) that is having problems when subjected to heavy I/O. Attached to the
server is an external RAID array (via channel B of the built-in SCSI
adapter). The only other storage in the server is one internal SCSI drive
for the O/S (on channel A of the internal SCSI adapter).

It seems to be I/O to the external RAID that is the problem. About 80% of
the I/O on this server is to the RAID. I have seen it die under both heavy
load via Samba and via I/O activity initiated from the console. To give an
example of a most recent problem, I wanted to grab a list of all the
directory names stored on all the partitions of the external RAID. I kicked
off a find command, like so:

find /mount/point/of/raid/parts/ -mindepth 3 -type d > ~/somefile.txt

However, this hung about halfway through. I tried to kill the process, but
to no avail - not even a kill -9 would do it! A bit of Googling revealed
that because the process was stuck in the 'D' state ("uninterruptible sleep
(usually IO)", according to the ps man page), it could not be killed. A
reboot seemed to be the common answer.
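
For anyone wanting to confirm which processes are stuck this way, here is a
minimal sketch in Python. It assumes a Linux /proc filesystem in which the
field right after the parenthesised command name in /proc/<pid>/stat is the
process state letter. The function names are made up for illustration and
are not part of any tool mentioned in this thread.

```python
import os


def parse_stat_line(line):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the last ')'.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in the 'D' state."""
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck  # no /proc here (not Linux, or not mounted)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or its stat file was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Running it while the find is hung should list the find process itself, plus
anything else blocked on the same I/O. It only reports the state; it cannot
free a process from it.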

However, the server would not complete the usual shutdown process and
stopped dead, complaining that various partitions could not be unmounted
because they were busy. The ones mentioned were the ones being read from
(the directories I was searching) and written to (the one containing
somefile.txt). I had to power cycle the server. Needless to say, I'm glad
all filesystems are on journaled ext3!

I am at a loss as to what to suspect as the cause of the problem. Does
anyone have any suggestions?

Should I suspect the RAID array's controller? Bugs in the aic7xxx SCSI
module? Bugs in the kernel? A corrupt filesystem (even though every unclean
shutdown recovered cleanly from the journal)?

Any suggestions or help appreciated.

Regards,

Basil Hussain
---------------------------------------
Internet Developer, Kodak Weddings
E-Mail: basil.hussain at kodakweddings.com
