PE2650/LSI MegaRaid/PV220 issue...

Brad Viviano viviano at geomagic.com
Thu Apr 15 09:02:00 CDT 2004


Hello,
	I have a PE2650 with an LSI MegaRaid U320-4X controller card in the
133MHz PCI-X slot.  Attached to it are 2 Dell PV220Ss, each with 14 146GB
10K RPM drives.  The 2650 is running RHEL ES 3 Update 1, kernel 2.4.21-9.0.1.
The 2650 has BIOS A17 with the latest server firmware updates from Dell's
web site.  The LSI card is running BIOS H409 and Firmware 413B (latest
available from LSI's web site).  The 220s have a split connection so I have
7 drives connected to each of the 4 channels (no clustering).  Channels 0 & 1
have the first 220, Channels 2 & 3 have the second 220.  I have each 220
configured as 1 large 2TB RAID 5 array.  Approximately once a day or
so I am getting the following in my syslog:

megaraid: aborting-2284388 cmd=1c <c=6 t=6 l=0>
megaraid: Waiting for 2 commands to flush: iter:0
megaraid: Waiting for 2 commands to flush: iter:1000
megaraid: Waiting for 2 commands to flush: iter:2000
megaraid: Waiting for 2 commands to flush: iter:3000
megaraid: Waiting for 2 commands to flush: iter:4000
megaraid: Waiting for 2 commands to flush: iter:5000
megaraid: Waiting for 2 commands to flush: iter:6000
megaraid: Waiting for 2 commands to flush: iter:7000
megaraid: Waiting for 2 commands to flush: iter:8000
megaraid: Waiting for 2 commands to flush: iter:9000
.
.
.
megaraid: Waiting for 2 commands to flush: iter:57000
megaraid: Waiting for 2 commands to flush: iter:58000
megaraid: Waiting for 2 commands to flush: iter:59000
megaraid: Waiting for 2 commands to flush: iter:60000
megaraid: critical hardware error!
megaraid: reset-3021981 cmd=1c <c=7 t=6 l=0>
megaraid: aborted cmd 2e1c9d[7b] complete.
megaraid: reservation reset failed.
megaraid: reset sequence successfully completed.

I have tried the stock 2.10.1 driver that comes with Red Hat, and I have
also updated to 2.10.3 from LSI's ftp site, with the same results.  The result
of the above error is that my system hangs for about 1-2 seconds and then
continues on without issue.  Otherwise the system operates fine.  I have run
stress tests on the drives for several hours at a stretch without issue (no
hangs/crashes, excellent throughput, etc.).  But getting a "critical hardware
error" worries me.  I am trying to figure out if this is a problem with the
LSI card or the PowerVault.  All the "aborting" and "reset" lines are for
cmd=1c, and all are for t=6 and l=0.  The only thing that changes is c=, and
that has been either 5, 6, or 7.  I am pretty sure t=6 is the SCSI ID for
target 6, which if I remember correctly is the ID given to the SCSI
controller inside the PowerVault.  I assume l=0 refers to the Logical Drive
number (but I could be wrong on that), and I am not sure what c= refers to.
So I am wondering if one of the backplane devices in my 220 is bad or
having problems.  Does anyone have any pointers or ideas as to what could be
causing these messages?
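
In case it helps anyone compare notes, here is a rough Python sketch of how
the "aborting" and "reset" lines could be tallied by their cmd/c/t/l values
across a longer syslog.  The regular expression is inferred only from the two
line shapes quoted above (it is an assumption, not taken from the driver
source), and the function name tally_megaraid_events is made up for
illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this message (an assumption):
#   megaraid: <event>-<number> cmd=<hex> <c=N t=N l=N>
LINE_RE = re.compile(
    r"megaraid: (?P<event>aborting|reset)-(?P<serial>\w+) "
    r"cmd=(?P<cmd>[0-9a-fA-F]+) <c=(?P<c>\d+) t=(?P<t>\d+) l=(?P<l>\d+)>"
)


def tally_megaraid_events(lines):
    """Count abort/reset lines grouped by (event, cmd, c, t, l).

    Hypothetical helper; lines that do not match the pattern are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            key = (m["event"], m["cmd"], int(m["c"]), int(m["t"]), int(m["l"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the syslog excerpt above.
    sample = [
        "megaraid: aborting-2284388 cmd=1c <c=6 t=6 l=0>",
        "megaraid: Waiting for 2 commands to flush: iter:0",
        "megaraid: reset-3021981 cmd=1c <c=7 t=6 l=0>",
    ]
    for key, n in sorted(tally_megaraid_events(sample).items()):
        print(key, n)
```

Feeding it the lines of a full syslog file (for example, iterating over an
open file object) would show whether the events cluster on one c= value or
are spread evenly across 5, 6, and 7.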

	Thanks,
		-Brad Viviano
-- 
+--------------------------------------------------------------------------+
| viviano at geomagic.com          Systems Support          Raindrop Geomagic |
+--------------------------------------------------------------------------+




More information about the Linux-PowerEdge mailing list