MegaRAID woes - kernel: scsi : aborting command due to timeout on 2.4.18-19.7.xbigmem

Rechenberg, Andrew ARechenberg at shermanfinancialgroup.com
Sat Jan 18 15:53:00 CST 2003


Good afternoon,

We have a PowerEdge 6600 that has a number of PERC3 RAID controllers in
it and we are getting the following error in /var/log/messages, dmesg,
and the console:

kernel: scsi : aborting command due to timeout : pid 802364, scsi3

There are quite a few of these errors in sequence, not just one.  Here
is a ps -l of the mke2fs that just locked up on us:

[root at box ~]# ps -l -p 2578
  F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
000 D     0  2578  1929  0  75   0    -  1623 get_re pts/2    00:00:05 mke2fs
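
For anyone who wants to check whether timeouts like the one quoted above
all come from a single controller, the messages can be tallied per SCSI
host.  What follows is only a rough Python sketch: it assumes every
timeout line has exactly the format shown above, and the function name
count_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Assumed message format, taken from the single line quoted above:
#   kernel: scsi : aborting command due to timeout : pid 802364, scsi3
TIMEOUT_RE = re.compile(
    r"scsi : aborting command due to timeout : pid (\d+), scsi(\d+)"
)


def count_timeouts(lines):
    """Return a Counter mapping SCSI host number -> timeout message count."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            # Group 2 is the host number that follows "scsi" at the end.
            counts[int(match.group(2))] += 1
    return counts
```

It could be run over the log with something like
count_timeouts(open("/var/log/messages", errors="replace")); a count
concentrated on one host number would point at that controller.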

This box has one PERC3/DC and two PERC3/QCs and is used as a database
server.  We are trying to increase its disk I/O throughput because we
believe that our application is disk-bound right now.

We had seen these timeouts before, when there was only one DC and one QC
in the box, but upgrading the firmware of the QC seemed to fix that
issue.  Now, after adding a second PERC3/QC, we get the SCSI timeouts
again whenever disk activity is very high.

According to a comment in the megaraid.c source in the Red Hat
2.4.18-19.7.x kernel:

 *     SD_TIMEOUT in
 *     /drivers/scsi/sd.c, is too short for this controller. SD_TIMEOUT
 *     value must be increased to (30 * HZ) otherwise false timeouts
 *     will occur in the upper layer.

The Red Hat kernel has SD_TIMEOUT set to (60 * HZ), but we are still
receiving timeouts.

Some information about the box is included below my signature.  The
PERC3s have the most up-to-date firmware that we know of.

If anyone can provide assistance, or needs any more information, please
let me know ASAP as this is a production system and we want to get the
best performance possible and don't want the box going down in the
middle of the business day.

Thanks for your help,
Andy.

Andrew Rechenberg
Infrastructure Team, Sherman Financial Group

******************************************************

PE6600 4x1.4GHz Xeon HT Enabled
8GB RAM
OS/backup partition on PERC3/DC
3 data partitions spread across both PERC3/QCs

[root at box ~]# dmesg -s 32767 | grep -i megaraid
megaraid: v1.18d (Release Date: Wed Aug  7 18:51:51 EDT 2002)
megaraid: found 0x101e:0x1960:idx 0:bus 5:slot 0:func 0
scsi1 : Found a MegaRAID controller at 0xf885a000, IRQ: 21
megaraid: [1.74:3.27] detected 2 logical drives
megaraid: supports extended CDBs.
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
megaraid: found 0x101e:0x1960:idx 1:bus 23:slot 0:func 0
scsi2 : Found a MegaRAID controller at 0xf885c000, IRQ: 30
megaraid: [1.74:3.27] detected 13 logical drives
megaraid: supports extended CDBs.
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
megaraid: channel[3] is raid.
megaraid: channel[4] is raid.
megaraid: found 0x101e:0x1960:idx 2:bus 30:slot 0:func 0
scsi3 : Found a MegaRAID controller at 0xf885e000, IRQ: 28
megaraid: [1.74:3.27] detected 2 logical drives
megaraid: supports extended CDBs.
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
megaraid: channel[3] is raid.
megaraid: channel[4] is raid.
scsi1 : LSI Logic MegaRAID 1.74 254 commands 15 targs 5 chans 7 luns
scsi2 : LSI Logic MegaRAID 1.74 254 commands 15 targs 7 chans 7 luns
scsi3 : LSI Logic MegaRAID 1.74 254 commands 15 targs 7 chans 7 luns
  Vendor: MegaRAID  Model: LD 0 RAID1   17G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 1 RAID0  559G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 0 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 1 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 2 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 3 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 4 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 5 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 6 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 7 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 8 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 9 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD10 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD11 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD12 RAID1   34G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 0 RAID1  120G  Rev: 1.74
  Vendor: MegaRAID  Model: LD 1 RAID1  103G  Rev: 1.74

[root at box ~]# cat /etc/redhat-release
Red Hat Linux release 7.3 (Valhalla)

[root at box ~]# uname -a
Linux box.shermfin.com 2.4.18-19.7.xbigmem #1 SMP Thu Dec 12 07:32:12 EST 2002 i686 unknown

[root at box ~]# lsmod
Module                  Size  Used by    Not tainted
lp                      8672   0  (autoclean)
parport                35616   0  (autoclean) [lp]
autofs                 11620   0  (autoclean) (unused)
tg3                    47200   1
usb-ohci               21856   0  (unused)
usbcore                74400   1  [usb-ohci]
ext3                   67360   7
jbd                    51464   7  [ext3]
raid0                   4064   1
megaraid               28608  19
aic7xxx               129856   0  (unused)
sd_mod                 12832  38
scsi_mod              110800   3  [megaraid aic7xxx sd_mod]



