High IO Wait - PowerEdge 2650 - Perc 3/Di

Ray Van Dolson rvandolson at esri.com
Fri Jan 19 10:59:04 CST 2007

On Fri, Jan 19, 2007 at 09:18:35AM -0500, Tillotson, Jeff wrote:
> Hello all,
> I have a Dell 2650 with a RAID1 drive connected to a Dell PowerEdge
> Expandable RAID controller 3/Di on a SuSE Enterprise Linux 9.3 server
> running 2.6.5-7.283-bigsmp #1 SMP Wed Nov 29 16:55:53 UTC 2006 i686 i686
> i386 GNU/Linux.
> The system is hanging during or after file writes.  If I do something
> like:
>   dd if=/dev/zero of=/home/jefft/file1 bs=512 count=864000
> the system will eventually hang and come back.  Using sar, iostat, top,
> I can watch iowait percentage climb and climb until it hangs.  Then,
> when I can gain access again, it drops slowly.  The iowait percentage
> goes up even after the dd has completed.
> I have updated the PERC's firmware to
> There are no error messages in dmesg, log messages or anyplace else.
> Is there some mechanism for tuning this?  Any help would be greatly
> appreciated.

Jeff, we're also running some 2650s, on RHEL3.  The machines seem to have
similar issues when placed under high IO load (DB servers).  We'll see errors
in dmesg or via netdump such as the following:

  aacraid: Host adapter reset request. SCSI hang ?
  aacraid: Host adapter reset request. SCSI hang ?
  aacraid: Host adapter reset request. SCSI hang ?

Followed by ext3 journal errors.  Sometimes things will clear up on their own,
other times we just have to reset the machine.  The machine will then run fine
again for a while.
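
If you want to check several machines for these resets, a rough sketch like
the one below could count them in saved dmesg or netdump output.  It only
matches the exact aacraid message quoted above; the function name and the
idea of reading from a saved log file are my own assumptions, not anything
provided by Dell or the driver.

```python
import re

# Matches the reset message quoted above; other aacraid messages are ignored.
RESET_RE = re.compile(r"aacraid: Host adapter reset request")


def count_adapter_resets(log_text):
    """Count lines in log_text that contain the aacraid reset message."""
    return sum(1 for line in log_text.splitlines() if RESET_RE.search(line))
```

Feed it the contents of whatever file you saved the dmesg output to, e.g.
count_adapter_resets(open(path).read()), where path is wherever you keep it.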

Several machines do this, so we're pretty sure it's an issue with the PERC or
the driver.

I just updated the PERC's BIOS to 6098 as you did, and also updated the system
BIOS to A21.  The RAID driver for the 2.4.21-47 kernel (RH provided) shows up
as:

  Adaptec aacraid driver (1.1-5[2412])

There's a driver on Dell's site for the PERC, but I am guessing it is
older.  In any case, I'm hoping that the system will no longer crash now
that the BIOS is updated.  I'll have to check the kscand/kswapd stuff
that the other poster mentioned as well.

I've seen threads in the past on IO issues with the 2650s.  I've never really
seen a clear-cut solution, though.
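
To log the iowait climb Jeff describes without keeping sar or top open, a
minimal sketch along these lines might help.  It assumes the 2.6-style
/proc/stat layout, where the fifth counter on the aggregate "cpu" line is
iowait; stock 2.4 kernels may not expose that field, so check before trusting
it on RHEL3.  The function names are just illustrative.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    if len(before) < 5 or len(after) < 5:
        # Assumption: 2.6-style layout (user nice system idle iowait ...).
        raise ValueError("no iowait field in this /proc/stat format")
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first eight counters are summed; guest time, where present,
    # is already included in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return iowait percent."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() in a loop while the dd runs, and for a while after it
finishes, should show whether iowait keeps rising once the write has
completed.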


