Server dies under heavy I/O
sturolla at eso.org
Wed Aug 27 04:58:00 CDT 2003
Is the RAID controller a PERC 3/Di from Dell?
We are having problems with the 1650 and 2650.
The machine hangs with lots of I/O errors, and then the only thing
you can do is power-cycle it.
If the kernel module you are using is aacraid (check with lsmod),
then you may be hitting the same problem as us.
You can have a look on Bugzilla; there is an open issue.
There is still no solution at the moment, but it seems to be related to aacraid.
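
If you would rather script the lsmod check across several machines, here is a minimal sketch. It assumes the first column of /proc/modules is the module name (which is what lsmod prints); the function names loaded_modules and uses_aacraid are made up for this example.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated column is the module name.
            names.add(fields[0])
    return names


def uses_aacraid(proc_modules_text):
    """True if the aacraid driver appears in the module list."""
    return "aacraid" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        if uses_aacraid(modules_file.read_text()):
            print("aacraid is loaded")
        else:
            print("aacraid is not loaded")
    else:
        print("/proc/modules not found (not a Linux system?)")
```

This only tells you whether the driver is loaded, not whether it is the cause of the hangs.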
On Tue, 2003-08-26 at 13:00, Basil Hussain wrote:
> Hi all,
> We have a PowerEdge 1550 running Red Hat 8.0 (with the latest 2.4.20 errata
> kernel) that is having problems when subjected to heavy I/O. Attached to the
> server is an external RAID array (via channel B of the in-built SCSI
> adapter). Other storage in the server consists of one internal SCSI drive
> for the O/S (on channel A of internal SCSI).
> It seems to be I/O to the external RAID that is the problem. About 80% of
> the I/O on this server is to the RAID. I have seen it die under both heavy
> load via Samba and via I/O activity initiated from the console. To give an
> example of a most recent problem, I wanted to grab a list of all the
> directory names stored on all the partitions of the external RAID. I kicked
> off a find command, like so:
> find /mount/point/of/raid/parts/ -mindepth 3 -type d > ~/somefile.txt
> However, this hung about halfway through. I tried to kill the process, but
> to no avail - not even a kill -9 would do it! A bit of Googling revealed
> that as the process was stuck in the 'D' state ("uninterruptible sleep
> (usually IO)", according to ps man page), it could not be killed. A reboot
> seemed to be the common answer.
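
To see which processes are stuck in the 'D' state before rebooting, something like the sketch below may help. It assumes the usual Linux /proc/<pid>/stat layout ("pid (comm) state ..."); parse_stat and uninterruptible are names invented for this example, not part of any tool.

```python
import os


def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, command name, state)."""
    # The command name sits in parentheses and may itself contain
    # spaces or parentheses, so search for the outermost pair.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible(proc_root="/proc"):
    """Return (pid, command) pairs for processes currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible():
        print(pid, comm)
```

Processes pass through state D briefly during normal I/O, so only entries that stay in the list across repeated runs point to a real hang.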
> However, the server would not complete the usual shutdown process and
> stopped dead, complaining that various partitions could not be unmounted
> because they were busy (the ones mentioned were the ones being read
> from (the directories I was searching) and written to (containing somefile.txt)).
> I had to power cycle the server. Needless to say, I'm glad all filesystems
> are on journaled ext3!
> I am at a loss as to what to suspect as the cause of the problem. Does
> anyone have any suggestions?
> Should I suspect the RAID array's controller? Bugs in the aic7xxx SCSI
> module? Bugs in the kernel? A corrupt filesystem? (Even though every unclean
> shutdown recovered from the journal cleanly).
> Any suggestions or help appreciated.
> Basil Hussain
> Internet Developer, Kodak Weddings
> E-Mail: basil.hussain at kodakweddings.com
| Stefano Turolla Phone : +49 89 32006537 |
| UNIX System Manager Fax : +49 89 32006380 |
| European Southern Observatory (ESO): E-Mail: sturolla at eso.org |
| Karl-Schwarzschild-strasse 2 D-85748 Garching bei Muenchen |
Computers are like air conditioners:
they stop working properly if you open WINDOWS