PERC 4e/Di/Red Hat ES 3 hang on high I/O load

Marcus Franke mfranke at evendi.de
Mon May 22 10:55:17 CDT 2006


On Mon, May 22, 2006 at 11:34:02AM -0400, Kishore Jalleda wrote:
> On 5/21/06, Mike M <saetaes at gmail.com> wrote:
> > **Apologies for the possible duplicate message; it appears the
> > attachments to the original caused it to be trapped by the listserv
> > software...**

We had these errors with an old BIOS revision (A01) after upgrading
the memory to 8 GB of RAM.

The server ran fine for a year and then started corrupting the
filesystem holding the database (lots of I/O activity).

After upgrading the controller to the latest firmware, I had
two crashes with these error messages.

Last week I updated the server's BIOS to A05 and stressed the server
twice, for 3-4 hours each time, with bonnie. No crashes.



Marcus


> >
> >
> > Hi list, I'm hoping someone can help me out here:
> >
> > We have about 450 PowerEdge 2850s purchased late last summer, in the
> > July/August timeframe.  All are running hardware RAID on the internal
> > PERC 4e/Di controller.  OS is Red Hat Enterprise Linux 3, Update 5,
> > kernel 2.4.21-32.0.1.  MegaRAID driver version v2.10.8.2-RH1, PERC
> > Firmware 521X:H430.
> >
> > Over the past week we have had at least 10 machines hang with the
> > errors shown in the attached screenshots.  (If you can't see the jpegs
> > or they get stripped, there are loads of EXT3-fs errors, the infamous
> > "megaraid: aborting", and "megaraid: hardware error, cannot reset"
> > messages).  The only way to fix this is to physically power down the
> > machines.  When rebooted, it's as if the OS "lost" its disks -
> > nothing in /var/log/messages, nothing else out of the ordinary other
> > than the fact that the machine wasn't shut down properly.
> >
> > I/O load has been higher than normal, but nothing the controller
> > shouldn't be able to handle.  In fact, we ran similar I/O loads on
> > these boxes in the past, and they didn't do this.  I'm stumped, as is
> > Dell's technical support.
> >
> > Has anyone else seen this, and if so, were you able to find a resolution?
> >
> > A little more information below from a machine that crashed last night:
> >
> > [root@host root]# cat /proc/megaraid/hba0/raiddrives-0-9
> > Logical drive: 0:, state: optimal
> > Span depth:  1, RAID level:  5, Stripe size:128, Row size:  6
> > Read Policy: Adaptive, Write Policy: Write back, Cache Policy: Direct IO
> >
> >
> > [root@host root]# cat /proc/megaraid/hba0/diskdrives-ch0
> > Channel: 0 Id: 0 State: Online.
> >  Vendor: MAXTOR    Model: ATLAS15K2_146SCA  Rev: JT00
> >  Type:   Direct-Access                      ANSI SCSI revision: 03
> > Channel: 0 Id: 1 State: Online.
> >  Vendor: MAXTOR    Model: ATLAS15K2_146SCA  Rev: JT00
> >  Type:   Direct-Access                      ANSI SCSI revision: 03
> > Channel: 0 Id: 2 State: Online.
> >  Vendor: MAXTOR    Model: ATLAS15K2_146SCA  Rev: JT00
> >  Type:   Direct-Access                      ANSI SCSI revision: 03
> > Channel: 0 Id: 3 State: Online.
> >  Vendor: MAXTOR    Model: ATLAS15K2_146SCA  Rev: JT00
> >  Type:   Direct-Access                      ANSI SCSI revision: 03
> > Channel: 0 Id: 4 State: Online.
> >  Vendor: MAXTOR    Model: ATLAS15K2_146SCA  Rev: JT00
> >  Type:   Direct-Access                      ANSI SCSI revision: 03
> > Channel: 0 Id: 5 State: Online.
> >  Vendor: MAXTOR    Model: ATLAS15K2_146SCA  Rev: JT00
> >  Type:   Direct-Access                      ANSI SCSI revision: 03
> >
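With this many machines, checking the drive states by eye gets tedious. Below is a minimal sketch in Python that pulls the channel, ID and state out of output shaped like the diskdrives-ch0 listing above and reports any drive that is not Online. It assumes the format is exactly as quoted; the function names and the SAMPLE text are made up for illustration and are not part of the megaraid driver or any Dell tool.

```python
import re

# Matches lines such as "Channel: 0 Id: 3 State: Online."
# (format assumed from the listing quoted in this thread)
DRIVE_LINE = re.compile(r"Channel:\s*(\d+)\s+Id:\s*(\d+)\s+State:\s*(.+?)\.?\s*$")

# Two entries copied from the quoted output, used only as sample input.
SAMPLE = """\
Channel: 0 Id: 0 State: Online.
 Vendor: MAXTOR    Model: ATLAS15K2_146SCA  Rev: JT00
 Type:   Direct-Access                      ANSI SCSI revision: 03
Channel: 0 Id: 1 State: Online.
 Vendor: MAXTOR    Model: ATLAS15K2_146SCA  Rev: JT00
 Type:   Direct-Access                      ANSI SCSI revision: 03
"""


def parse_drive_states(text):
    """Return a list of (channel, id, state) tuples found in text."""
    drives = []
    for line in text.splitlines():
        # Drop mail quoting ("> > ") so output pasted from the list parses too.
        line = line.lstrip("> ").strip()
        match = DRIVE_LINE.match(line)
        if match:
            drives.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return drives


def not_online(drives):
    """Return only the drives whose state is something other than Online."""
    return [d for d in drives if d[2].lower() != "online"]


for channel, drive_id, state in parse_drive_states(SAMPLE):
    print("channel %d id %d: %s" % (channel, drive_id, state))
```

On a live box the text would come from reading /proc/megaraid/hba0/diskdrives-ch0 instead of SAMPLE, and an empty result from not_online() would mean every drive the controller lists is reporting Online. That says nothing about the hang itself; it only rules out a drive that has already dropped out.
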
> >
> > Thanks in advance for any help,
> >
> > Mike
> >
> > _______________________________________________
> > Linux-PowerEdge mailing list
> > Linux-PowerEdge at dell.com
> > http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> > Please read the FAQ at http://lists.us.dell.com/faq
> >
> 
> We were having this issue on our 2650s with PERC 3/Di. The first
> thing I would do is make sure you have the latest firmware, and
> then, for an easy fix, try booting the kernel with the "noapic"
> option. I have seen uptimes in the range of 80-100 days with this
> option (a lot better than the 1-5 days we saw earlier).
> 
> Kishore Jalleda
> http://kjalleda.googlepages.com/projects
> 

-- 

What kind of love is that?  Not to be loved; never to have shown love.
		-- Commissioner Nancy Hedford, "Metamorphosis",
		   stardate 3219.8


