megaraid (PERC 4e/DC) resets under I/O load

Joshua Schmidlkofer joshland at gmail.com
Thu Jun 14 14:36:18 CDT 2007


I am having this same problem. I have an Ubuntu 6.10 (Edgy) server, and I
have been having this problem since I got the PV220S. I have been
struggling to resolve the errors for over a year, but they continue,
and I have no idea why.

This is from dmesg, with kernel 2.6.20.14:
[    2.542299] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
[    2.551732] megaraid: 2.20.4.9 (Release Date: Sun Jul 16 12:27:22 EST 2006)
[    2.561304] megaraid: probe new device 0x1000:0x1960:0x1028:0x0518: bus 2:slot 5:func 0
[    2.591936] megaraid: fw version:[352B] bios version:[1.10]
[    2.631981] scsi0 : LSI Logic MegaRAID driver
[    7.830988] scsi 0:2:0:0: Direct-Access     MegaRAID LD 0 RAID1   69G 352B PQ: 0 ANSI: 2
[    7.841199] scsi 0:2:1:0: Direct-Access     MegaRAID LD 1 RAID5 1430G 352B PQ: 0 ANSI: 2

Note: I am running firmware 3.52b and BIOS 1.10.

On the first channel I have two 73GB drives. On the secondary channel, I
have a PV220S with six 300GB drives in RAID5. We just upgraded the firmware
on the disks, and the errors continue. The box has been replaced, Dell has
replaced the PERC, we have upgraded the PERC, and we are using 100% Dell
hardware with Ubuntu Linux. I have not seen this issue resolved anywhere.

About 8 months ago an embedded device manufacturer contacted me, asking if I
had resolved my problems. I have posted to the LKML, and I have posted to the
Linux SCSI list. AKPM asked me to notify him if 2.6.20 didn't fix it. I will
be doing so, though he has likely forgotten by now.

We usually only hit this once a day, and I have no idea why.  It actually
started _prior_ to the PV220S, and I cannot reproduce it on demand.

If anyone else has any wisdom or experience to lend, I would appreciate it.


Snippets:
....several hundred truncated....
[53863.814663] megaraid abort: 18128625:78[255:129], fw owner
[53863.814670] megaraid: aborting-18128626 cmd=2a <c=2 t=1 l=0>
[53863.814677] megaraid abort: 18128626:21[255:129], fw owner
[53863.814684] megaraid: aborting-18128631 cmd=2a <c=2 t=1 l=0>
[53863.814691] megaraid abort: 18128631:94[255:129], fw owner
[53863.814703] megaraid: 20 outstanding commands. Max wait 300 sec
[53863.814711] megaraid mbox: Wait for 20 commands to complete:300
[53868.821542] megaraid mbox: Wait for 14 commands to complete:295
[53873.829035] megaraid mbox: Wait for 5 commands to complete:290
[53875.832030] megaraid mbox: reset sequence completed sucessfully
....several hundred truncated....
[54344.428691] megaraid: aborting-18135315 cmd=2a <c=2 t=1 l=0>
[54344.428698] megaraid abort: scsi cmd:18135315, do now own
[54344.428705] megaraid: aborting-18135325 cmd=2a <c=2 t=1 l=0>
[54344.428711] megaraid abort: scsi cmd:18135325, do now own
[54344.428718] megaraid: aborting-18135328 cmd=2a <c=2 t=1 l=0>
[54344.428725] megaraid abort: scsi cmd:18135328, do now own
[54344.428732] megaraid: aborting-18135336 cmd=2a <c=2 t=1 l=0>
[54344.428765] megaraid abort: scsi cmd:18135336, do now own
[54344.428772] megaraid: aborting-18135340 cmd=2a <c=2 t=1 l=0>
[54344.428778] megaraid abort: scsi cmd:18135340, do now own
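When excerpts like these run to several hundred lines, a quick tally of the abort outcomes and the times at which resets completed can make it easier to compare one incident with the next. The following is a minimal sketch, assuming the timestamped dmesg text format shown above; it matches the message strings exactly as they appear in these excerpts (including the driver's own "do now own" spelling), skips lines without a `[seconds]` prefix, and the function name and category labels are hypothetical, not part of any driver tool.

```python
import re
from collections import Counter

# Matches a dmesg line with a leading "[seconds.micro]" timestamp,
# as in the excerpts above; lines without one are ignored.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_megaraid(lines):
    """Tally megaraid abort outcomes and collect reset-completion times.

    The category names below are illustrative labels chosen for this sketch.
    """
    counts = Counter()
    resets = []
    for line in lines:
        m = LINE_RE.match(line.rstrip())
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if msg.startswith("megaraid: aborting-"):
            # The driver is about to try aborting a queued command.
            counts["abort_requested"] += 1
        elif msg.startswith("megaraid abort:") and msg.endswith("fw owner"):
            # The firmware still owned the command when the abort was tried.
            counts["fw_owner"] += 1
        elif msg.startswith("megaraid abort:") and "do now own" in msg:
            # Message text as logged in the excerpts above.
            counts["not_owned"] += 1
        elif "reset sequence completed" in msg:
            resets.append(ts)
    return counts, resets


if __name__ == "__main__":
    import sys

    counts, resets = summarize_megaraid(sys.stdin)
    for name, n in sorted(counts.items()):
        print(f"{name}: {n}")
    print("resets completed at:", resets)
```

Usage would be something like `dmesg | python3 summarize_megaraid.py` (the file name is arbitrary). Comparing the gaps between the reset timestamps against the "once a day" pattern may help correlate the resets with scheduled jobs.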

This is genuinely bizarre.
Sincerely,
  Joshua


On 5/17/07, Kilian CAVALOTTI <kilian.cavalotti at lip6.fr> wrote:
>
> Hi all,
>
> I have two PE 1950s sharing a PV220S set up in cluster mode, using PERC
> 4e/DC cards. The PERC 4e/DC firmware is 522A, and the kernel modules are from
> megaraid-v2.20.4.4-2dkms, which I believe are the latest. The virtual disk
> is a 3-disk RAID5 volume.
>
> Under I/O load, I got the following errors:
> megaraid: aborting-59364 cmd=2a <c=2 t=0 l=0>
> megaraid abort: 59364:6[255:128], fw owner
> megaraid: reseting the host...
> megaraid: 2 outstanding commands. Max wait 180 sec
> megaraid mbox: Wait for 2 commands to complete:180
> megaraid mbox: reset sequence completed sucessfully
> megaraid: fast sync command timed out
> megaraid: reservation reset failed
> megaraid: reseting the host...
> megaraid mbox: reset sequence completed sucessfully
> megaraid: fast sync command timed out
> megaraid: reservation reset failed
> megaraid: reseting the host...
> megaraid mbox: reset sequence completed sucessfully
> megaraid: fast sync command timed out
> megaraid: reservation reset failed
> SCSI error : <1 2 0 0> return code = 0x6000000
> end_request: I/O error, dev sdb, sector 293683517
> scsi1 (0:0): rejecting I/O to offline device
> scsi1 (0:0): rejecting I/O to offline device
> Buffer I/O error on device sdb2, logical block 88269
> lost page write due to I/O error on sdb2
> Aborting journal on device sdb2.
> journal commit I/O error
> journal commit I/O error
>
> I don't know if that's related, but the controller logs show the
> following:
> 05/16/07 19:02:24: EVT#03560-05/16/07 19:02:24: 113=Unexpected sense: PD 08 (e1/s255), CDB: 12 01 80 00 3a 00, Sense: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
>
> I thought this issue had been solved by previous firmware releases, but that
> doesn't seem to be the case.
>
> I'd appreciate any hint regarding this issue.
> Thanks a lot,
> --
> Kilian
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq
>

