Server dies under heavy I/O

Basil Hussain basil.hussain at kodakweddings.com
Wed Aug 27 05:46:01 CDT 2003


Hi,

Just so everyone knows, this is *not* a problem to do with on-board RAID of
any kind: not PERC (megaraid), PERC/Di (aacraid) or Linux software RAID. Nor
is it a problem with a PE 1650, 1750 or 2650.

My server, a PE 1550, has an external RAID array that has its own
integrated, self-contained controller. It is connected externally via the
standard built-in AIC7899 SCSI controller on board the server. As far as the
server is concerned, it's just another SCSI device.
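
One way to confirm which of the drivers mentioned in this thread a given
machine is actually using is to filter the lsmod output. The following is
only a sketch: the function name storage_drivers is made up for illustration,
and the module names (aic7xxx, aacraid, megaraid) are simply the ones named
in this thread, so the list would need adjusting for other hardware.

```shell
#!/bin/sh
# Sketch only: storage_drivers is a hypothetical helper, not an existing tool.
# It reads `lsmod` output on stdin and prints the names of any loaded modules
# that match the storage drivers named in this thread.
storage_drivers() {
    # NR > 1 skips the "Module  Size  Used by" header line.
    awk 'NR > 1 && $1 ~ /^(aic7xxx|aacraid|megaraid)$/ { print $1 }'
}
```

Typical use is "lsmod | storage_drivers". On a setup like the one described
here, it would be expected to print aic7xxx and neither aacraid nor megaraid.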

But, thanks to others for taking the time to make suggestions about aacraid,
etc.

Regards,

Basil Hussain
---------------------------------------
Internet Developer, Kodak Weddings
E-Mail: basil.hussain at kodakweddings.com

> -----Original Message-----
> From: linux-poweredge-admin at dell.com
> [mailto:linux-poweredge-admin at dell.com]On Behalf Of Stefano Turolla
> Sent: 27 August 2003 10:56
> To: Basil Hussain
> Cc: linux-poweredge at dell.com
> Subject: Re: Server dies under heavy I/O
>
>
> Hi Basil,
> is the raid controller a Perc/3Di from Dell?
> we are having problems with 1650 and 2650.
> the machine hangs with lots of I/O errors and then the only thing
> you can do is a power-cycle.
> If the kernel module you are using is aacraid (check with lsmod)
> then you may have the same problem as us.
> You can have a look on bugzilla, there is an ongoing issue
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=92129
>
> Still no solution at the moment, but it seems to be related to the aacraid
> driver
> ciao
> stefano
> On Tue, 2003-08-26 at 13:00, Basil Hussain wrote:
> > Hi all,
> >
> > We have a PowerEdge 1550 running Red Hat 8.0 (with latest 2.4.20 errata
> > kernel) that is having problems when subjected to heavy I/O. Attached to
> > the server is an external RAID array (via channel B of the in-built SCSI
> > adapter). Other storage in the server consists of one internal SCSI drive
> > for the O/S (on channel A of internal SCSI).
> >
> > It seems to be I/O to the external RAID that is the problem. About 80% of
> > the I/O on this server is to the RAID. I have seen it die both under heavy
> > load via Samba and under I/O activity initiated from the console. To give
> > an example of the most recent problem, I wanted to grab a list of all the
> > directory names stored on all the partitions of the external RAID. I
> > kicked off a find command, like so:
> >
> > find /mount/point/of/raid/parts/ -mindepth 3 -type d > ~/somefile.txt
> >
> > However, this hung about halfway through. I tried to kill the process,
> > but to no avail - not even a kill -9 would do it! A bit of Googling
> > revealed that as the process was stuck in the 'D' state ("uninterruptible
> > sleep (usually IO)", according to ps man page), it could not be killed. A
> > reboot seemed to be the common answer.
> >
> > However, the server would not complete the usual shutdown process and
> > stopped dead, complaining that various partitions could not be unmounted
> > because they were busy (the ones mentioned were actually the ones being
> > read from (the directories I was searching) and written to (containing
> > somefile.txt)). I had to power cycle the server. Needless to say, I'm glad
> > all filesystems are on journaled ext3!
> >
> > I am at a loss as to what to suspect as the cause of the problem. Does
> > anyone have any suggestions?
> >
> > Should I suspect the RAID array's controller? Bugs in the aic7xxx SCSI
> > module? Bugs in the kernel? A corrupt filesystem? (Even though every
> > unclean shutdown recovered from the journal cleanly.)
> >
> > Any suggestions or help appreciated.
> >
> > Regards,
> >
> > Basil Hussain
> > ---------------------------------------
> > Internet Developer, Kodak Weddings
> > E-Mail: basil.hussain at kodakweddings.com
> >
> > _______________________________________________
> > Linux-PowerEdge mailing list
> > Linux-PowerEdge at dell.com
> > http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> > Please read the FAQ at http://lists.us.dell.com/faq or search
> > the list archives at http://lists.us.dell.com/htdig/
> --
> +------+---------+--------+--------+--------+---------+--------+-------+
> | Stefano Turolla                             Phone : +49 89 32006537  |
> | UNIX System Manager                         Fax   : +49 89 32006380  |
> | European Southern Observatory (ESO):        E-Mail: sturolla at eso.org |
> | Karl-Schwarzschild-strasse 2 D-85748 Garching bei Muenchen           |
> +------+---------+--------+--------+--------+---------+--------+-------+
> Computers are like airconditioners ,
> they stop working properly if you open WINDOWS
>
>
>
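
For anyone chasing a hang like the one in the original report, where a
process sits in the 'D' state and ignores kill -9, it can help to list every
process in that state before resorting to a power cycle. The following is
only a sketch: d_state_procs is a made-up name, and it assumes ps output in
which the second column is the process state.

```shell
#!/bin/sh
# Sketch only: d_state_procs is a hypothetical helper, not an existing tool.
# It reads ps-style output on stdin (columns: PID STAT WCHAN COMMAND...) and
# keeps the header line plus every process whose STAT field starts with "D"
# (uninterruptible sleep).
d_state_procs() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

Typical use is "ps -eo pid,stat,wchan,args | d_state_procs". Where the kernel
exposes it, the WCHAN column names the kernel function the process is
sleeping in, which may hint at whether the wait is in the filesystem or in
the SCSI layer.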



