Server dies under heavy I/O

Basil Hussain basil.hussain at kodakweddings.com
Wed Aug 27 04:07:01 CDT 2003


Hi,

Thanks for that tip. I will try updating the aic7xxx module to the latest
6.2.36 (it's currently running 6.2.8). Oh, and to save anyone Googling
around for half an hour, I imagine you would've wanted to give me a link to
the following... ;-)

http://people.freebsd.org/~gibbs/linux/

Regards,

Basil Hussain
---------------------------------------
Internet Developer, Kodak Weddings
E-Mail: basil.hussain at kodakweddings.com
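
For anyone making the same driver update, here is a minimal shell sketch for
confirming which aic7xxx version is actually loaded before and after the
change. It assumes the driver reports itself in /proc/scsi/aic7xxx/0 with a
line of the form "Adaptec AIC7xxx driver version: 6.2.8"; both the path and
the line format are assumptions and are not stated anywhere in this thread.
The helper names driver_version and version_lt are hypothetical.

```shell
# Extract the first "driver version: X.Y.Z" value from text on stdin.
# The line format is an assumption about the driver's /proc output.
driver_version() {
    sed -n 's/.*driver version: *\([0-9.][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) if dotted version $1 is older than dotted version $2.
# Fields are compared numerically, so 6.2.8 sorts before 6.2.36.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1
    }'
}

proc_file=/proc/scsi/aic7xxx/0   # assumed location, adjust for your host
if [ -r "$proc_file" ]; then
    current=$(driver_version < "$proc_file")
    if [ -n "$current" ] && version_lt "$current" 6.2.36; then
        echo "aic7xxx $current is older than 6.2.36"
    fi
fi
```

The numeric field comparison matters here: a plain string comparison would
rank 6.2.8 above 6.2.36.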


> -----Original Message-----
> From: linux-poweredge-admin at dell.com
> [mailto:linux-poweredge-admin at dell.com]On Behalf Of Steve_Boley at dell.com
> Sent: 26 August 2003 22:23
> To: basil.hussain at kodakweddings.com
> Cc: Linux-Poweredge at dell.com
> Subject: RE: Server dies under heavy I/O
>
>
> Update the aic7xxx driver and see if that fixes it. Justin Gibbs' page has
> some statically compiled modules for various recent kernels and, if one
> isn't available, you could hit him up for the latest 6.2.3x source.
> Steve
>
> -----Original Message-----
> From: Basil Hussain [mailto:basil.hussain at kodakweddings.com]
> Sent: Tuesday, August 26, 2003 6:00 AM
> To: Dell Poweredge Linux List
> Subject: Server dies under heavy I/O
>
>
> Hi all,
>
> We have a PowerEdge 1550 running Red Hat 8.0 (with the latest 2.4.20 errata
> kernel) that is having problems when subjected to heavy I/O. Attached to the
> server is an external RAID array (via channel B of the in-built SCSI
> adapter). Other storage in the server consists of one internal SCSI drive
> for the O/S (on channel A of the internal SCSI).
>
> It seems to be I/O to the external RAID that is the problem. About 80% of
> the I/O on this server is to the RAID. I have seen it die under both heavy
> load via Samba and via I/O activity initiated from the console. To give an
> example of the most recent problem, I wanted to grab a list of all the
> directory names stored on all the partitions of the external RAID. I
> kicked off a find command, like so:
>
> find /mount/point/of/raid/parts/ -mindepth 3 -type d > ~/somefile.txt
>
> However, this hung about halfway through. I tried to kill the process, but
> to no avail - not even a kill -9 would do it! A bit of Googling revealed
> that as the process was stuck in the 'D' state ("uninterruptible sleep
> (usually IO)", according to ps man page), it could not be killed. A reboot
> seemed to be the common answer.
>
> However, the server would not complete the usual shutdown process and
> stopped dead, complaining that various partitions could not be unmounted
> because they were busy (the ones mentioned were actually the ones being
> read from (the directories I was searching) and written to (containing
> somefile.txt)). I had to power cycle the server. Needless to say, I'm glad
> all filesystems are on journaled ext3!
>
> I am at a loss as to what to suspect as the cause of the problem. Does
> anyone have any suggestions?
>
> Should I suspect the RAID array's controller? Bugs in the aic7xxx SCSI
> module? Bugs in the kernel? A corrupt filesystem (even though every unclean
> shutdown recovered from the journal cleanly)?
>
> Any suggestions or help appreciated.
>
> Regards,
>
> Basil Hussain
> ---------------------------------------
> Internet Developer, Kodak Weddings
> E-Mail: basil.hussain at kodakweddings.com
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq or search the list
> archives at http://lists.us.dell.com/htdig/
>
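
The 'D' state described in the original report can be spotted from the shell
while the hang is in progress. The following is a small sketch, assuming a
procps-style ps that accepts -eo with the pid, stat, wchan and comm columns;
filter_d_state is a hypothetical helper name, not an existing tool.

```shell
# Keep the header line plus any process whose STAT column starts with "D"
# (uninterruptible sleep). Expects ps-style output on stdin with the
# state in the second column.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example usage (column list is an assumption about the local ps):
#   ps -eo pid,stat,wchan,comm | filter_d_state
```

The wchan column names the kernel function the process is sleeping in, which
can hint at whether the wait is in the SCSI layer or the filesystem. A
process listed here ignores signals, including kill -9, until the pending
I/O completes.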



