Legacy MegaRAID-based cards suck?
Eric Belhomme
eric.belhomme at icsb.fr
Tue Nov 6 04:39:25 CST 2007
Hi,
I own two refurbished legacy LSI MegaRAID-based cards:
- an LSI MegaRAID i4,
- an HP NetRAID-1M (LSI MegaRAID series 475, if I remember correctly).
They are listed as follows:
02:01.0 SCSI storage controller: Adaptec AHA-2940U2/U2W / 7890/7891
02:04.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 02)
02:07.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 21)
02:04.0 0104: 101e:1960 (rev 02)
02:07.0 0104: 101e:1960 (rev 21)
The Adaptec SCSI controller is onboard (the CD-ROM drive and the DAT drive
will be connected to it).
The devices are detected as follows:
holy:/usr/src/linux# dmesg|grep scsi
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
scsi1 : LSI Logic MegaRAID driver
scsi[1]: scanning scsi channel 0 [Phy 0] for non-raid devices
scsi[1]: scanning scsi channel 1 [Phy 1] for non-raid devices
scsi2 : LSI Logic MegaRAID driver
scsi[2]: scanning scsi channel 0 [Phy 0] for non-raid devices
scsi[1]: scanning scsi channel 2 [Phy 2] for non-raid devices
scsi[1]: scanning scsi channel 3 [Phy 3] for non-raid devices
scsi[1]: scanning scsi channel 4 [virtual] for logical drives
scsi 1:4:0:0: Direct-Access MegaRAID LD 0 RAID5 715G N661 PQ: 0 ANSI: 2
scsi 1:4:1:0: Direct-Access MegaRAID LD 1 RAID5 343G N661 PQ: 0 ANSI: 2
scsi 2:0:5:0: Processor HP SAFTE; U160/M BP 1023 PQ: 0 ANSI: 2
scsi[2]: scanning scsi channel 1 [virtual] for logical drives
scsi 2:1:0:0: Direct-Access MegaRAID LD 0 RAID5 69G H PQ: 0 ANSI: 2
sd 1:4:0:0: Attached scsi generic sg0 type 0
sd 1:4:1:0: Attached scsi generic sg1 type 0
scsi 2:0:5:0: Attached scsi generic sg2 type 3
sd 2:1:0:0: Attached scsi generic sg3 type 0
Of course, the system (Debian GNU/Linux Etch) is installed on the SCSI
RAID5 volume (/dev/sdc), and both RAID cards are handled by the
megaraid_mbox driver:
holy:/usr/src/linux# lsmod|grep raid
megaraid_mbox 30448 3
megaraid_mm 10464 1 megaraid_mbox
scsi_mod 136620 6 sg,sd_mod,megaraid_mbox,aic7xxx,scsi_transport_spi,libata
The problem is that the volumes sda and sdb (both attached to the i4
RAID card) go offline when they are accessed, and I then get log
messages like these:
Nov 5 21:36:05 holy kernel: megaraid: aborting-1030 cmd=2a <c=4 t=1 l=0>
Nov 5 21:36:05 holy kernel: megaraid abort: 1030:1[255:129], fw owner
Nov 5 21:36:05 holy kernel: megaraid: aborting-1031 cmd=2a <c=4 t=1 l=0>
Nov 5 21:36:05 holy kernel: megaraid abort: 1031:0[255:129], fw owner
Nov 5 21:36:05 holy kernel: megaraid: aborting-1032 cmd=2a <c=4 t=1 l=0>
Nov 5 21:36:05 holy kernel: megaraid abort: 1032:3[255:129], fw owner
Nov 5 21:36:05 holy kernel: megaraid: aborting-1033 cmd=2a <c=4 t=1 l=0>
Nov 5 21:36:05 holy kernel: megaraid abort: 1033:2[255:129], fw owner
Nov 5 21:36:05 holy kernel: megaraid: aborting-1034 cmd=2a <c=4 t=1 l=0>
Nov 5 21:36:05 holy kernel: megaraid abort: 1034:4[255:129], fw owner
Nov 5 21:36:05 holy kernel: megaraid: 5 outstanding commands. Max wait 300 sec
Nov 5 21:36:05 holy kernel: megaraid mbox: Wait for 5 commands to complete:300
Nov 5 21:36:10 holy kernel: megaraid mbox: Wait for 5 commands to complete:295
Nov 5 21:36:15 holy kernel: megaraid mbox: Wait for 5 commands to complete:290
...
Nov 5 21:40:56 holy kernel: megaraid mbox: Wait for 5 commands to complete:10
Nov 5 21:41:01 holy kernel: megaraid mbox: Wait for 5 commands to complete:5
Nov 5 21:41:06 holy kernel: megaraid mbox: critical hardware error!
Nov 5 21:41:06 holy kernel: megaraid: hw error, cannot reset
Nov 5 21:41:06 holy kernel: megaraid: hw error, cannot reset
Nov 5 21:41:06 holy kernel: sd 0:4:1:0: scsi: Device offlined - not ready after error recovery
Nov 5 21:41:06 holy last message repeated 4 times
Nov 5 21:41:06 holy kernel: sd 0:4:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
Nov 5 21:41:06 holy kernel: end_request: I/O error, dev sdb, sector 351721439
Nov 5 21:41:06 holy kernel: sd 0:4:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
Nov 5 21:41:06 holy kernel: end_request: I/O error, dev sdb, sector 351721566
Nov 5 21:41:06 holy kernel: sd 0:4:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
Nov 5 21:41:06 holy kernel: end_request: I/O error, dev sdb, sector 351721694
Nov 5 21:41:06 holy kernel: sd 0:4:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
Nov 5 21:41:06 holy kernel: end_request: I/O error, dev sdb, sector 351721822
Nov 5 21:41:06 holy kernel: sd 0:4:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
Nov 5 21:41:06 holy kernel: end_request: I/O error, dev sdb, sector 351721950
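To see at a glance which devices and sectors are affected, the "end_request: I/O error" lines can be pulled out of such a log. The following is only a minimal sketch: the function name failed_sectors and the sample text are illustrative, and the pattern assumes the message format shown in the excerpt above.

```python
import re

# Matches messages such as:
#   "end_request: I/O error, dev sdb, sector 351721439"
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector\s+(\d+)")


def failed_sectors(log_text):
    """Return {device: [sector, ...]} for every I/O error found in log_text."""
    # Archived syslog lines may be wrapped, so collapse all whitespace
    # first; this lets the pattern match across a line break.
    flat = " ".join(log_text.split())
    result = {}
    for dev, sector in IO_ERROR.findall(flat):
        result.setdefault(dev, []).append(int(sector))
    return result


# Two entries copied from the log excerpt, in wrapped form.
sample = """\
Nov 5 21:41:06 holy kernel: end_request: I/O error, dev sdb, sector
351721439
Nov 5 21:41:06 holy kernel: end_request: I/O error, dev sdb, sector
351721566
"""

print(failed_sectors(sample))  # {'sdb': [351721439, 351721566]}
```

Run against the whole excerpt, this lists only sdb, with consecutive failed sectors about 128 apart, which looks like one sequential write stream timing out rather than scattered bad blocks.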
I should point out that this kind of error also occurred on the NetRAID
controller: I was unable to install Debian on the host because the
installer systematically crashed, with the same kernel messages, while
copying packages from the official Etch CD-ROM to the RAID volume. The
workaround I found was to update the NetRAID firmware to the latest
version, which allowed me to complete the installation, but the
problem reappeared after an "apt-get upgrade" that installed the
"linux-image-2.6.18-4-686" kernel.
I solved it again by installing "linux-image-2.6.22-1-686" from Debian
Testing, but note that a vanilla 2.6.23-1 kernel freshly downloaded from
http://www.kernel.org causes the problem again...
Concerning the i4 card, here is how it is configured:
IDE channel 0 master is a Fujitsu 250 GB drive,
IDE channel 0 slave is a Seagate 120 GB drive,
IDE channel 1 master is a Fujitsu 250 GB drive,
IDE channel 1 slave is a Seagate 120 GB drive,
IDE channel 2 master is a Fujitsu 250 GB drive,
IDE channel 2 slave is a Seagate 120 GB drive,
IDE channel 3 master is a Fujitsu 250 GB drive,
IDE channel 3 slave is a Seagate 120 GB drive.
Logical volume 0 is a RAID5 volume with all 4 Fujitsu drives,
Logical volume 1 is a RAID5 volume with all 4 Seagate drives,
and the firmware revision is the latest one available on the LSI web site.
It seems sda can be accessed safely (at least I was able to format it as
xfs, though I have not stress-tested it yet), but I cannot even manage to
format sdb (the kernel messages listed above occurred while formatting sdb).
So maybe it is not such a good idea to use both the master and slave
positions on each channel of this card?
Moreover, this card was previously used in another computer with only the
4 Seagate drives (each as single master on its own IDE channel) and worked
fine with the megaraid_mbox driver (and no other MegaRAID device...).
Of course, the dellmgr utility reports that all logical volumes are online
and all physical drives are alive.
So my questions are:
- Why is the NetRAID's behaviour so erratic between kernel revisions? Is
it because there are two controllers in the machine?
- Is there a known bug when multiple MegaRAID-based controllers are in
use that could explain my problems?
- I read Documentation/scsi/megaraid.txt from the kernel source tree to
find out how to get the health status of the controllers and volumes, and
I also browsed the /sys entries, but I did not find anything usable. So how
do I monitor RAID status with the newgen megaraid driver?
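The only coarse indicator I am aware of under /sys is the generic SCSI "state" attribute of each disk (/sys/block/sdX/device/state), which normally reads "running" and changes to "offline" once the kernel gives up on a device, as happened to sdb above. A minimal sketch that polls it follows; the function name scsi_disk_states is illustrative. Note the limits of this approach: it reflects the SCSI midlayer's view of each logical drive only, and as far as I can tell it says nothing about a degraded array or a failed physical drive behind the controller.

```python
from pathlib import Path


def scsi_disk_states(sys_block="/sys/block"):
    """Map each sd* disk to the contents of its device/state attribute."""
    states = {}
    for disk in sorted(Path(sys_block).glob("sd*")):
        state_file = disk / "device" / "state"
        try:
            states[disk.name] = state_file.read_text().strip()
        except OSError:
            # Attribute missing or unreadable: report it rather than guess.
            states[disk.name] = "unknown"
    return states


if __name__ == "__main__":
    for name, state in scsi_disk_states().items():
        flag = "" if state == "running" else "  <-- check this one"
        print(f"{name}: {state}{flag}")
```

Run from cron, something like this would at least catch a logical drive being offlined; real array health still has to come from the controller's own management tool.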
Sorry for this long post, and many thanks for support :)
--
Rico