PERC 4e/Di/Red Hat ES 3 hang on high I/O load
Marcus Franke
mfranke at evendi.de
Mon May 22 10:55:17 CDT 2006
On Mon, May 22, 2006 at 11:34:02AM -0400, Kishore Jalleda wrote:
> On 5/21/06, Mike M <saetaes at gmail.com> wrote:
> > **Apologies for the possible duplicate message; it appears the
> > attachments to the original caused it to be trapped by the listserv
> > software...**
We had these errors with an old BIOS revision (A01) after upgrading
the memory to 8 GB of RAM.
The server ran fine for a year and then started corrupting the
filesystem of the database (lots of I/O activity).
After upgrading the controller to the latest firmware I had
two crashes with these error messages.
Last week I updated the server's BIOS to A05 and stressed the server
twice, for 3-4 hours each time, with bonnie. No crashes.
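
With 450 machines it may help to script a check of the /proc/megaraid
output quoted below. What follows is only a rough sketch in Python, not
something from Dell or the driver documentation. It assumes the two /proc
files always look like the samples in this thread, and the function name
find_problems is made up for illustration. It reports any logical drive
whose state is not "optimal" and any disk whose state is not "Online".
Other legitimate states (a hot spare, for instance) would also be
reported, so the comparison may need adjusting.

```python
import re

# Patterns modelled on the sample output quoted in this thread, e.g.
#   "Logical drive: 0:, state: optimal"
#   "Channel: 0 Id: 0 State: Online."
# Assumption: other hosts and driver versions use the same layout.
LOGICAL_RE = re.compile(
    r"Logical drive:\s*(\d+):?,\s*state:\s*(\w+)", re.IGNORECASE
)
DISK_RE = re.compile(
    r"Channel:\s*(\d+)\s+Id:\s*(\d+)\s+State:\s*([^.\n]+)", re.IGNORECASE
)


def find_problems(raid_text, disk_text):
    """Return a list of descriptions for drives that look unhealthy.

    raid_text: contents of a raiddrives-* file
    disk_text: contents of a diskdrives-ch* file
    """
    problems = []
    # Logical drives: anything other than "optimal" is reported.
    for drive, state in LOGICAL_RE.findall(raid_text):
        if state.lower() != "optimal":
            problems.append(f"logical drive {drive}: {state}")
    # Physical disks: anything other than "Online" is reported.
    for channel, disk_id, state in DISK_RE.findall(disk_text):
        state = state.strip()
        if state.lower() != "online":
            problems.append(f"disk ch{channel} id{disk_id}: {state}")
    return problems
```

On a host you would read /proc/megaraid/hba0/raiddrives-0-9 and
/proc/megaraid/hba0/diskdrives-ch0 and pass their contents in; an empty
list means nothing unusual was found. Note that this only catches drives
the controller itself reports as unhealthy. The machine quoted below
showed everything optimal and online, so it would not have predicted
that hang.
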
Marcus
> >
> >
> > Hi list, I'm hoping someone can help me out here:
> >
> > We have about 450 PowerEdge 2850s purchased late last summer, in the
> > July/August timeframe. All are running hardware RAID on the internal
> > PERC 4e/Di controller. OS is Red Hat Enterprise Linux 3, Update 5,
> > kernel 2.4.21-32.0.1. MegaRAID driver version v2.10.8.2-RH1, PERC
> > Firmware 521X:H430.
> >
> > Over the past week we have had at least 10 machines hang with the
> > errors shown in the attached screenshots. (If you can't see the jpegs
> > or they get stripped, there are loads of EXT3-fs errors, the infamous
> > "megaraid: aborting", and "megaraid: hardware error, cannot reset"
> > messages). The only way to fix this is to physically power down the
> > machines. When rebooted, it's as if the OS "lost" its disks -
> > nothing in /var/log/messages, nothing else out of the ordinary other
> > than the fact that the machine wasn't shut down properly.
> >
> > I/O load has been higher than normal, but nothing the controller
> > shouldn't be able to handle. In fact, we ran similar I/O loads on
> > these boxes in the past, and they didn't do this. I'm stumped, as is
> > Dell's technical support.
> >
> > Has anyone else seen this, and if so, were you able to find a resolution?
> >
> > A little more information below from a machine that crashed last night:
> >
> > [root at host root]# cat /proc/megaraid/hba0/raiddrives-0-9
> > Logical drive: 0:, state: optimal
> > Span depth: 1, RAID level: 5, Stripe size:128, Row size: 6
> > Read Policy: Adaptive, Write Policy: Write back, Cache Policy: Direct IO
> >
> >
> > [root at host root]# cat /proc/megaraid/hba0/diskdrives-ch0
> > Channel: 0 Id: 0 State: Online.
> > Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> > Type: Direct-Access ANSI SCSI revision: 03
> > Channel: 0 Id: 1 State: Online.
> > Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> > Type: Direct-Access ANSI SCSI revision: 03
> > Channel: 0 Id: 2 State: Online.
> > Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> > Type: Direct-Access ANSI SCSI revision: 03
> > Channel: 0 Id: 3 State: Online.
> > Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> > Type: Direct-Access ANSI SCSI revision: 03
> > Channel: 0 Id: 4 State: Online.
> > Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> > Type: Direct-Access ANSI SCSI revision: 03
> > Channel: 0 Id: 5 State: Online.
> > Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> > Type: Direct-Access ANSI SCSI revision: 03
> >
> >
> > Thanks in advance for any help,
> >
> > Mike
> >
> > _______________________________________________
> > Linux-PowerEdge mailing list
> > Linux-PowerEdge at dell.com
> > http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> > Please read the FAQ at http://lists.us.dell.com/faq
> >
>
> We were having this issue on our 2650s with PERC 3/Di. The first
> thing I would do is make sure you have the latest firmware, and
> then, for an easy fix, try booting the kernel with the "noapic"
> option. I have seen uptimes in the range of 80-100 days with this
> option (a lot better than the 1-5 days we had earlier).
>
> Kishore Jalleda
> http://kjalleda.googlepages.com/projects
>
--
What kind of love is that? Not to be loved; never to have shown love.
-- Commissioner Nancy Hedford, "Metamorphosis",
stardate 3219.8