PERC 4e/Di/Red Hat ES 3 hang on high I/O load
Harald_Jensas at Dell.com
Mon May 22 01:29:46 CDT 2006
> -----Original Message-----
> From: linux-poweredge-bounces at dell.com
> [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Mike M
> Sent: 21 May 2006 21:44
> To: linux-poweredge-Lists
> Subject: PERC 4e/Di/Red Hat ES 3 hang on high I/O load
>
> **Apologies for the possible duplicate message; it appears
> the attachments to the original caused it to be trapped by
> the listserv software...**
>
>
> Hi list, I'm hoping someone can help me out here:
>
> We have about 450 PowerEdge 2850s purchased late last summer,
> in the July/August timeframe. All are running hardware RAID
> on the internal PERC 4e/Di controller. OS is Red Hat
> Enterprise Linux 3, Update 5, kernel 2.4.21-32.0.1. MegaRAID
> driver version v2.10.8.2-RH1, PERC Firmware 521X:H430.
>
> Over the past week we have had at least 10 machines hang with
> the errors shown in the attached screenshots. (If you can't
> see the jpegs or they get stripped, there are loads of
> EXT3-fs errors, the infamous
> "megaraid: aborting", and "megaraid: hardware error, cannot reset"
> messages). The only way to fix this is to physically power
> down the machines. When rebooted, it's as if the OS "lost"
> its disks - nothing in /var/log/messages, nothing else out
> of the ordinary other than the fact that the machine wasn't
> shut down properly.
>
> I/O load has been higher than normal, but nothing the
> controller shouldn't be able to handle. In fact, we ran
> similar I/O loads on these boxes in the past, and they didn't
> do this. I'm stumped, as is Dell's technical support.
>
> Has anyone else seen this, and if so, were you able to find a
> resolution?
>
> A little more information below from a machine that crashed
> last night:
>
> [root@host root]# cat /proc/megaraid/hba0/raiddrives-0-9
> Logical drive: 0:, state: optimal
> Span depth: 1, RAID level: 5, Stripe size:128, Row size: 6
> Read Policy: Adaptive, Write Policy: Write back, Cache Policy: Direct IO
>
>
> [root@host root]# cat /proc/megaraid/hba0/diskdrives-ch0
> Channel: 0 Id: 0 State: Online.
> Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> Type: Direct-Access ANSI SCSI revision: 03
> Channel: 0 Id: 1 State: Online.
> Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> Type: Direct-Access ANSI SCSI revision: 03
> Channel: 0 Id: 2 State: Online.
> Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> Type: Direct-Access ANSI SCSI revision: 03
> Channel: 0 Id: 3 State: Online.
> Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> Type: Direct-Access ANSI SCSI revision: 03
> Channel: 0 Id: 4 State: Online.
> Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> Type: Direct-Access ANSI SCSI revision: 03
> Channel: 0 Id: 5 State: Online.
> Vendor: MAXTOR Model: ATLAS15K2_146SCA Rev: JT00
> Type: Direct-Access ANSI SCSI revision: 03
>
>
> Thanks in advance for any help,
>
> Mike
>
I don't know if it will help, but there appears to be a later megaraid driver available, v2.10.10.1:
perc-EM64T-2.10.10.1-A03.tar.gz
Maybe setting up your servers to log to a syslog daemon on another system will give you more information. Once access to the disks is lost, the system is unable to write to /var/log/messages.
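As a rough sketch of the remote logging idea, assuming the stock sysklogd that ships with Red Hat Enterprise Linux 3 and a hypothetical log host named loghost.example.com (substitute your own), the setup could look something like this:

```
# On each PowerEdge 2850, in /etc/syslog.conf:
# forward every facility and priority to the remote host over UDP port 514.
# "loghost.example.com" is a placeholder, not a real host.
*.*					@loghost.example.com

# On the log host, in /etc/sysconfig/syslog:
# -r lets syslogd accept messages from the network (off by default).
SYSLOGD_OPTIONS="-m 0 -r"
```

Restart syslog on both sides afterwards (service syslog restart). Kernel messages reach syslogd through klogd, so the megaraid and EXT3-fs errors should be forwarded too, although a hard hang may still cut the logging short.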
//
Harald Jensås