PERC 5/E Firmware Crash (?)

Fischer, Carl fischerc at ll.mit.edu
Wed Sep 5 09:17:53 CDT 2007


I'm pretty sure we saw a PERC 5/E firmware crash over the weekend.  Patrol Read appears to have hit a medium error on Device 18 (MD-1000 #0, Disk #9).  The disk was marked failed, the hot spare started rebuilding, and then the controller crashed.  Is my read of the events correct?
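
For anyone who wants to check that sequence against a full log dump, here is a rough Python sketch that pulls the EVT# entries out of the log text and lists the state changes in order. The regular expressions are only inferred from the lines quoted in this message, not from any documented log format, and the function names (parse_events, state_changes) are made up for illustration.

```python
import re

# Pattern inferred from lines such as:
#   09/01/07  4:15:41: EVT#06480-09/01/07  4:15:41:  81=State change on ...
EVT_RE = re.compile(r"EVT#(\d+)-(\d\d/\d\d/\d\d)\s+(\d+:\d\d:\d\d):\s*(\d+)=(.*)")
STATE_RE = re.compile(r"State change on (.+?) from (.+?)\(\w+\) to (.+?)\(\w+\)")

# A few lines copied from the log in this message.
SAMPLE = """\
09/01/07  4:15:41: EVT#06480-09/01/07  4:15:41:  81=State change on VD 00/0 from OPTIMAL(3) to DEGRADED(2)
09/01/07  4:15:41: EVT#06482-09/01/07  4:15:41: 114=State change on PD 18(e1/s9) from ONLINE(18) to FAILED(11)
09/01/07  4:15:42: EVT#06485-09/01/07  4:15:42: 114=State change on PD 18(e1/s9) from FAILED(11) to UNCONFIGURED_BAD(1)
09/01/07  4:15:42: EVT#06486-09/01/07  4:15:42: 106=Rebuild automatically started on PD 22(e2/s14)
09/01/07  4:15:42: EVT#06487-09/01/07  4:15:42: 114=State change on PD 22(e2/s14) from HOT SPARE(2) to REBUILD(14)
09/01/07  4:15:42: prDiskCheckOkToRun: PR cannot run on this pd=13 this array=0 is rebuilding
"""


def parse_events(text):
    """Return (sequence, date, time, code, message) tuples sorted by sequence."""
    events = []
    for line in text.splitlines():
        match = EVT_RE.search(line)
        if match:
            seq, date, time, code, msg = match.groups()
            events.append((int(seq), date, time, int(code), msg.strip()))
    return sorted(events)


def state_changes(events):
    """Return (device, old_state, new_state) for every state-change event."""
    changes = []
    for _seq, _date, _time, _code, msg in events:
        match = STATE_RE.search(msg)
        if match:
            changes.append(match.groups())
    return changes


if __name__ == "__main__":
    for device, old, new in state_changes(parse_events(SAMPLE)):
        print(f"{device}: {old} -> {new}")
```

Lines without an EVT# tag (the prDiskCheckOkToRun chatter, the stack trace) are simply skipped, so the output is only the controller's own event records.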

I've got a service request in with the storage group.

 - Carl

****************** LOG SNIPPETS ******************
T0: LSI Logic MegaRAID firmware loaded
T0: Firmware version 1.03.10-0216 built on Feb 26 2007 at 14:45:28
T0: Board is type 1028/0015/1028/1f01

T0: Initializing 1MB memory pool
T0: EVT#06043-T0:   0=Firmware initialization started (PCI ID 0015/1028/1f01/1028)
T0: EVT#06044-T0:   1=Firmware version 1.03.10-0216
T0: Authenticating RAID key: Done!
T0: EepromInit: Family=33, SN=a12671010000
********* [LOTS OF BORING STUFF] *********
09/01/07  4:10:22: prDiskStart: starting Patrol Read on PD=13
********* [LOTS MORE PATROL READ STARTS] *********
09/01/07  4:10:22: prDiskStart: starting Patrol Read on PD=30
09/01/07  4:10:22: EVT#06474-09/01/07  4:10:22:  39=Patrol Read started
09/01/07  4:15:39: DEV_REC:Medium Error DevId[18] Tgt 7 retires=0
09/01/07  4:15:39: ErrLBAOffset (6040) LBA(2448000) BadLba=244e040
09/01/07  4:15:39: prCallback: Medium Error on pd=18, StartLba=2448000, ErrLba=244e040
09/01/07  4:15:39: prRecQueue: starting pd=18 recovery - blocking host commands
09/01/07  4:15:39: EVT#06475-09/01/07  4:15:39: 113=Unexpected sense: PD 18(e1/s9), CDB: 2f 00 02 44 80 00 00 80 00 00, Sense: f0 00 03 02 44 e0 40 0a 00 00 00 00 11 00 00 00 00 0
09/01/07  4:15:39: prRecGo: Ready to attempt recovery errLBA=244e040 on pd=18
09/01/07  4:15:39: prGetLDInfo: MediaErr in ld=0, span=0, arm=9
09/01/07  4:15:39: prRecGo: dataErr found on ld 0 span 0 arm 9
09/01/07  4:15:39: prRecGo: data NOT in cache; cacheLn=ffffffff, row=489c0, stripe=7f1116, refBlk=40, type=0
09/01/07  4:15:39: prRecGo: R5-get cacheLn=609, c_ptr=a14666c0 mem=a13d26b4 & setup cInx=41c c=a0db5760
09/01/07  4:15:39: RtnFrmPrcRcyRd for arm=f mem=a13dbd2c stripe=489c0 type=0 i=1 status=0
********* [LOTS MORE RtnFrmPrcRcyRd's] *********
09/01/07  4:15:40: RtnFrmPrcRcyRd for arm=e mem=a13dbd18 stripe=7f111b type=0 i=0 status=1c
09/01/07  4:15:40: Issuing write verify pd=18, arm=9, span=0, blk=244e040
09/01/07  4:15:40: EVT#06476-09/01/07  4:15:40: 110=Corrected medium error during recovery on PD 18(e1/s9) at 244e040
09/01/07  4:15:41: DEV_REC:Medium Error DevId[18] Tgt 7 retires=0
09/01/07  4:15:41: ErrLBAOffset (0) LBA(244e040) BadLba=244e040
09/01/07  4:15:41: Write MED ERR!!! ErrLBA(0) LBA(244e040)
09/01/07  4:15:41: EVT#06477-09/01/07  4:15:41: 113=Unexpected sense: PD 18(e1/s9), CDB: 2e 00 02 44 e0 40 00 00 01 00, Sense: f0 00 03 02 44 e0 40 0a 00 00 00 00 11 00 00 00 00 0
09/01/07  4:15:41: EVT#06478-09/01/07  4:15:41: 108=Reassign write operaiton failed on PD 18(e1/s9) at 44020000
09/01/07  4:15:41: EVT#06479-09/01/07  4:15:41:  87=Error on PD 18(e1/s9) (Error 02)
09/01/07  4:15:41: EVT#06480-09/01/07  4:15:41:  81=State change on VD 00/0 from OPTIMAL(3) to DEGRADED(2)
09/01/07  4:15:41: EVT#06481-09/01/07  4:15:41: 251=VD 00/0 is now DEGRADED
09/01/07  4:15:41: EVT#06482-09/01/07  4:15:41: 114=State change on PD 18(e1/s9) from ONLINE(18) to FAILED(11)
09/01/07  4:15:41: EVT#06483-09/01/07  4:15:41: 108=Reassign write operaiton failed on PD 18(e1/s9) at 244e040
09/01/07  4:15:41: EVT#06484-09/01/07  4:15:41:  93=Patrol Read corrected medium error on PD 18(e1/s9) at 244e040
09/01/07  4:15:42: EVT#06485-09/01/07  4:15:42: 114=State change on PD 18(e1/s9) from FAILED(11) to UNCONFIGURED_BAD(1)
09/01/07  4:15:42: EVT#06486-09/01/07  4:15:42: 106=Rebuild automatically started on PD 22(e2/s14)
09/01/07  4:15:42: EVT#06487-09/01/07  4:15:42: 114=State change on PD 22(e2/s14) from HOT SPARE(2) to REBUILD(14)
09/01/07  4:15:42: prDiskCheckOkToRun: PR cannot run on this pd=13 this array=0 is rebuilding
********* [LOTS MORE prDiskCheckOkToRun's] *********
09/01/07  4:15:42: prDiskCheckOkToRun: PR cannot run on this pd=22 not a spare and online
********* [LOTS MORE prDiskCheckOkToRun's] *********
09/01/07  4:15:42: prDiskCheckOkToRun: PR cannot run on this pd=30 this array=0 is rebuilding
09/01/07  4:15:42: PR cycle complete
09/01/07  4:15:42: EVT#06488-09/01/07  4:15:42:  35=Patrol Read complete
09/01/07  4:15:42: Next PR scheduled to start at 09/08/07  4:10:22
09/01/07  4:15:42: DM_ProcessMsg: DevState UnKnown DevId 18 Flags f0400005 Rdm a0545000 
09/01/07  4:15:42:  MPT_ProcessIo: SMP/STP Completed without ReplyFrame Rdm a0545000 Cmd 9 
[0]: fp=a00ffe78, lr=a0841730  -  MPT_ProcessIo+17c
[1]: fp=a00ffea8, lr=a0841a04  -  MPT_ISR+58
[2]: fp=a00ffed4, lr=a0897a2c  -  FIQ_isr+48
[3]: fp=a00ffefc, lr=a000b164  -  dbits+1788750
[4]: fp=a00fff14, lr=a000a94c  -  dbits+1787f38
[5]: fp=a00fff5c, lr=a087d928  -  set_state+40
[6]: fp=a00fff90, lr=a087d7a8  -  raid_task+304
[7]: fp=a00fffb8, lr=a08976b0  -  main+3b0
[8]: fp=a00fffe4, lr=a0895dc0  -  c_start+30
[9]: fp=a00ffffc, lr=9e8804cc  -  _start+6c
[10]: fp=a0018350, lr=a0006204  -  dbits+17837f0
[11]: fp=a00183fc, lr=4a8  -  000004a8
MonTask: line 293 in file ../../dm/mpt/mptcmpl.c
INTCTL=16c00000:1003dcf, IINTSRC=0:0, FINTSRC=0:1002080, CPSR=600000d3, sp=a00ffbbc
MegaMon>
********* END OF RIDICULOUSLY VERBOSE LOG MESSAGES *********
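
As a sanity check on the "Unexpected sense" lines above, here is a small Python sketch that decodes the CDB and sense bytes from EVT#06475. It assumes the sense data is standard SCSI fixed-format (response code 70h), which the leading f0 byte suggests, and it only names the opcodes and codes that actually show up in this log. The helper names are made up for illustration, and the last sense byte in the log is cut off, so it is ignored.

```python
# Names are the standard SCSI ones; only values seen in this log are listed.
OPCODES = {0x2E: "WRITE AND VERIFY(10)", 0x2F: "VERIFY(10)"}
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}

# Copied from EVT#06475 (the final sense token "0" is truncated in the log).
VERIFY_CDB = "2f 00 02 44 80 00 00 80 00 00"
SENSE = "f0 00 03 02 44 e0 40 0a 00 00 00 00 11 00 00 00 00 0"


def parse_hex(text):
    """Turn 'f0 00 03' into bytes, skipping any incomplete trailing token."""
    return bytes(int(tok, 16) for tok in text.split() if len(tok) == 2)


def decode_cdb10(cdb):
    """Decode a 10-byte CDB: opcode, 4-byte LBA, 2-byte block count."""
    return {
        "opcode": OPCODES.get(cdb[0], hex(cdb[0])),
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


def decode_fixed_sense(sense):
    """Decode fixed-format sense data (response code 0x70 or 0x71)."""
    if sense[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    return {
        "valid": bool(sense[0] & 0x80),  # INFORMATION field is meaningful
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "information": int.from_bytes(sense[3:7], "big"),  # failing LBA
        "additional": ASC_ASCQ.get((asc, ascq), f"{asc:02x}/{ascq:02x}"),
    }


if __name__ == "__main__":
    cdb = decode_cdb10(parse_hex(VERIFY_CDB))
    sense = decode_fixed_sense(parse_hex(SENSE))
    print(cdb["opcode"], hex(cdb["lba"]), hex(cdb["blocks"]))
    print(sense["sense_key"], sense["additional"], hex(sense["information"]))
    print("offset into request:", hex(sense["information"] - cdb["lba"]))
```

Under those assumptions the decoded values line up with the firmware's own messages: the VERIFY started at LBA 2448000, the drive reported the bad block at 244e040, and the difference is the ErrLBAOffset of 6040 that the log prints.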

________________________________
Carl H. Fischer IV, PhD	 Phone:	 +1 (781) 981-6702 	
MIT Lincoln Laboratory S3-227	 Cell: 	+1 (339) 440-1849 	
244 Wood Street, Lexington, MA 02420	 Fax: 	+1 (781) 981-7271 	


