Major PERC 4e/DC problem

Tuomas Toropainen tuomas.toropainen at lanwan.fi
Mon Jan 22 02:56:27 CST 2007


Hello all,

This morning brought some very bad news. The external RAID-5 on our new 
test email server had dropped offline during the weekend. After rebooting, 
everything seems to work fine again (to my surprise, the reiserfs 
filesystem was not irrecoverably corrupted). As everybody understands, 
this would be a fatal problem in a production environment. I would really 
appreciate any help. Is this a hardware problem? Is the controller or the 
external disk cabinet (Dell PV22XS) faulty?

It seems that everything started on Friday evening. The server is a 
PowerEdge 1950, and the operating system itself (Debian etch) is installed 
on the internal RAID-1 (PERC 5/i), which was working fine. Only the 
external RAID holding the mail spool had dropped offline.

At the end of this message is an excerpt from the kernel log showing the 
errors at the start of the problem. In case somebody wants to take a look, 
the complete logs can be found at the following URLs:

http://www.lanwan.fi/~ttor/messages
http://www.lanwan.fi/~ttor/messages.0

Thank you very much in advance :)


Here is the controller info as displayed by omreport:

---8<---

delta:~# /opt/dell/srvadmin/oma/bin/omreport storage controller
List of Controllers in the system

Controllers
ID                                : 0
Status                            : Ok
Name                              : PERC 4e/DC
Slot ID                           : PCI Slot 1
State                             : Ready
Firmware Version                  : 522A
Minimum Required Firmware Version : Not Applicable
Driver Version                    : Not Applicable
Minimum Required Driver Version   : Not Applicable
Number of Connectors              : 2
Rebuild Rate                      : 30%
BGI Rate                          : Not Applicable
Check Consistency Rate            : Not Applicable
Reconstruct Rate                  : Not Applicable
Alarm State                       : Enabled
Cluster Mode                      : Not Applicable
SCSI Initiator ID                 : 7
Cache Memory Size                 : 128 MB
Patrol Read Mode                  : Not Applicable
Patrol Read State                 : Not Applicable
Patrol Read Rate                  : Not Applicable
Patrol Read Iterations            : Not Applicable

ID                                : 1
Status                            : Ok
Name                              : PERC 5/i Integrated
Slot ID                           : Embedded
State                             : Ready
Firmware Version                  : 5.0.2-0003
Minimum Required Firmware Version : Not Applicable
Driver Version                    : 00.00.03.01
Minimum Required Driver Version   : Not Applicable
Number of Connectors              : 2
Rebuild Rate                      : 30%
BGI Rate                          : 30%
Check Consistency Rate            : 30%
Reconstruct Rate                  : 30%
Alarm State                       : Disabled
Cluster Mode                      : Not Applicable
SCSI Initiator ID                 : Not Applicable
Cache Memory Size                 : 256 MB
Patrol Read Mode                  : Auto
Patrol Read State                 : Stopped
Patrol Read Rate                  : 30%
Patrol Read Iterations            : 3

---8<---


---8<---

Jan 19 19:45:38 delta kernel: megaraid: aborting-85359 cmd=2a <c=2 t=0 l=0>
Jan 19 19:45:38 delta kernel: megaraid abort: 85359:57[255:128], fw owner
Jan 19 19:45:38 delta kernel: megaraid: aborting-85360 cmd=2a <c=2 t=0 l=0>
Jan 19 19:45:38 delta kernel: megaraid abort: 85360[255:128], driver owner
Jan 19 19:45:38 delta kernel: megaraid: aborting-85361 cmd=2a <c=2 t=0 l=0>
Jan 19 19:45:38 delta kernel: megaraid abort: 85361[255:128], driver owner
Jan 19 19:45:38 delta kernel: megaraid: aborting-85362 cmd=2a <c=2 t=0 l=0>
Jan 19 19:45:38 delta kernel: megaraid abort: 85362[255:128], driver owner
Jan 19 19:45:38 delta kernel: megaraid: aborting-85363 cmd=2a <c=2 t=0 l=0>
Jan 19 19:45:38 delta kernel: megaraid abort: 85363[255:128], driver owner
Jan 19 19:45:38 delta kernel: megaraid: IOCTL packet with 128[65535:65535] being reset
Jan 19 19:45:38 delta kernel: megaraid: IOCTL packet with 129[65535:65535] being reset
Jan 19 19:45:38 delta kernel: megaraid: 1 outstanding commands. Max wait 300 sec
Jan 19 19:45:38 delta kernel: megaraid mbox: Wait for 1 commands to complete:300
Jan 19 19:45:43 delta kernel: megaraid mbox: Wait for 1 commands to complete:295
Jan 19 19:45:48 delta kernel: megaraid mbox: Wait for 1 commands to complete:290
Jan 19 19:45:53 delta kernel: megaraid mbox: Wait for 1 commands to complete:285
Jan 19 19:45:58 delta kernel: megaraid mbox: Wait for 1 commands to complete:280
Jan 19 19:46:03 delta kernel: megaraid mbox: Wait for 1 commands to complete:275
Jan 19 19:46:08 delta kernel: megaraid mbox: Wait for 1 commands to complete:270
Jan 19 19:46:13 delta kernel: megaraid mbox: Wait for 1 commands to complete:265
Jan 19 19:46:18 delta kernel: megaraid mbox: Wait for 1 commands to complete:260
Jan 19 19:46:23 delta kernel: megaraid mbox: Wait for 1 commands to complete:255
Jan 19 19:46:28 delta kernel: megaraid mbox: Wait for 1 commands to complete:250
Jan 19 19:46:33 delta kernel: megaraid mbox: Wait for 1 commands to complete:245
Jan 19 19:46:38 delta kernel: megaraid mbox: Wait for 1 commands to complete:240
Jan 19 19:46:43 delta kernel: megaraid mbox: Wait for 1 commands to complete:235
Jan 19 19:46:48 delta kernel: megaraid mbox: Wait for 1 commands to complete:230
Jan 19 19:46:53 delta kernel: megaraid mbox: Wait for 1 commands to complete:225
Jan 19 19:46:58 delta kernel: megaraid mbox: Wait for 1 commands to complete:220
Jan 19 19:47:03 delta kernel: megaraid mbox: Wait for 1 commands to complete:215
Jan 19 19:47:08 delta kernel: megaraid mbox: Wait for 1 commands to complete:210
Jan 19 19:47:13 delta kernel: megaraid mbox: Wait for 1 commands to complete:205
Jan 19 19:47:18 delta kernel: megaraid mbox: Wait for 1 commands to complete:200
Jan 19 19:47:23 delta kernel: megaraid mbox: Wait for 1 commands to complete:195
Jan 19 19:47:28 delta kernel: megaraid mbox: Wait for 1 commands to complete:190
Jan 19 19:47:33 delta kernel: megaraid mbox: Wait for 1 commands to complete:185
Jan 19 19:47:39 delta kernel: megaraid mbox: Wait for 1 commands to complete:180
Jan 19 19:47:44 delta kernel: megaraid mbox: Wait for 1 commands to complete:175
Jan 19 19:47:49 delta kernel: megaraid mbox: Wait for 1 commands to complete:170
Jan 19 19:47:54 delta kernel: megaraid mbox: Wait for 1 commands to complete:165
Jan 19 19:47:59 delta kernel: megaraid mbox: Wait for 1 commands to complete:160
Jan 19 19:48:04 delta kernel: megaraid mbox: Wait for 1 commands to complete:155
Jan 19 19:48:09 delta kernel: megaraid mbox: Wait for 1 commands to complete:150
Jan 19 19:48:14 delta kernel: megaraid mbox: Wait for 1 commands to complete:145
Jan 19 19:48:19 delta kernel: megaraid mbox: Wait for 1 commands to complete:140
Jan 19 19:48:24 delta kernel: megaraid mbox: Wait for 1 commands to complete:135
Jan 19 19:48:29 delta kernel: megaraid mbox: Wait for 1 commands to complete:130
Jan 19 19:48:34 delta kernel: megaraid mbox: Wait for 1 commands to complete:125
Jan 19 19:48:39 delta kernel: megaraid mbox: Wait for 1 commands to complete:120
Jan 19 19:48:44 delta kernel: megaraid mbox: Wait for 1 commands to complete:115
Jan 19 19:48:49 delta kernel: megaraid mbox: Wait for 1 commands to complete:110
Jan 19 19:48:54 delta kernel: megaraid mbox: Wait for 1 commands to complete:105
Jan 19 19:48:59 delta kernel: megaraid mbox: Wait for 1 commands to complete:100
Jan 19 19:49:04 delta kernel: megaraid mbox: Wait for 1 commands to complete:95
Jan 19 19:49:09 delta kernel: megaraid mbox: Wait for 1 commands to complete:90
Jan 19 19:49:14 delta kernel: megaraid mbox: Wait for 1 commands to complete:85
Jan 19 19:49:19 delta kernel: megaraid mbox: Wait for 1 commands to complete:80
Jan 19 19:49:24 delta kernel: megaraid mbox: Wait for 1 commands to complete:75
Jan 19 19:49:29 delta kernel: megaraid mbox: Wait for 1 commands to complete:70
Jan 19 19:49:34 delta kernel: megaraid mbox: Wait for 1 commands to complete:65
Jan 19 19:49:39 delta kernel: megaraid mbox: Wait for 1 commands to complete:60
Jan 19 19:49:44 delta kernel: megaraid mbox: Wait for 1 commands to complete:55
Jan 19 19:49:49 delta kernel: megaraid mbox: Wait for 1 commands to complete:50
Jan 19 19:49:54 delta kernel: megaraid mbox: Wait for 1 commands to complete:45
Jan 19 19:49:59 delta kernel: megaraid mbox: Wait for 1 commands to complete:40
Jan 19 19:50:04 delta kernel: megaraid mbox: Wait for 1 commands to complete:35
Jan 19 19:50:09 delta kernel: megaraid mbox: Wait for 1 commands to complete:30
Jan 19 19:50:14 delta kernel: megaraid mbox: Wait for 1 commands to complete:25
Jan 19 19:50:19 delta kernel: megaraid mbox: Wait for 1 commands to complete:20
Jan 19 19:50:24 delta kernel: megaraid mbox: Wait for 1 commands to complete:15
Jan 19 19:50:29 delta kernel: megaraid mbox: Wait for 1 commands to complete:10
Jan 19 19:50:34 delta kernel: megaraid mbox: Wait for 1 commands to complete:5
Jan 19 19:50:38 delta kernel: megaraid cmm: ioctl timed out
Jan 19 19:50:38 delta kernel: megaraid cmm: controller cannot accept cmds due to earlier errors
Jan 19 19:50:38 delta last message repeated 5 times
Jan 19 19:50:39 delta kernel: megaraid mbox: critical hardware error!
Jan 19 19:50:39 delta kernel: megaraid: hw error, cannot reset
Jan 19 19:50:39 delta kernel: megaraid: hw error, cannot reset
Jan 19 19:50:39 delta kernel: sd 1:2:0:0: scsi: Device offlined - not ready after error recovery
Jan 19 19:50:39 delta kernel: sd 1:2:0:0: SCSI error: return code = 0x00050000
Jan 19 19:50:39 delta kernel: end_request: I/O error, dev sdb, sector 279055
Jan 19 19:50:39 delta kernel: lost page write due to I/O error on dm-0
Jan 19 19:50:39 delta kernel: sd 1:2:0:0: SCSI error: return code = 0x00050000
Jan 19 19:50:39 delta kernel: end_request: I/O error, dev sdb, sector 279895
Jan 19 19:50:39 delta kernel: lost page write due to I/O error on dm-0
Jan 19 19:50:39 delta kernel: sd 1:2:0:0: SCSI error: return code = 0x00050000
Jan 19 19:50:39 delta kernel: end_request: I/O error, dev sdb, sector 280575
Jan 19 19:50:39 delta kernel: lost page write due to I/O error on dm-0
Jan 19 19:50:39 delta kernel: sd 1:2:0:0: SCSI error: return code = 0x00050000
Jan 19 19:50:39 delta kernel: end_request: I/O error, dev sdb, sector 279671
Jan 19 19:50:39 delta kernel: lost page write due to I/O error on dm-0
Jan 19 19:50:39 delta kernel: sd 1:2:0:0: SCSI error: return code = 0x06000000
Jan 19 19:50:39 delta kernel: end_request: I/O error, dev sdb, sector 278999
Jan 19 19:50:39 delta kernel: lost page write due to I/O error on dm-0
Jan 19 19:50:39 delta kernel: sd 1:2:0:0: SCSI error: return code = 0x00010000
Jan 19 19:50:39 delta kernel: end_request: I/O error, dev sdb, sector 41671
Jan 19 19:50:39 delta kernel: lost page write due to I/O error on dm-0
Jan 19 19:50:39 delta last message repeated 4 times
Jan 19 19:50:40 delta kernel: megaraid cmm: controller cannot accept cmds due to earlier errors
Jan 19 19:51:15 delta last message repeated 15 times
Jan 19 19:52:14 delta last message repeated 41 times
Jan 19 19:52:14 delta kernel: printk: 3 messages suppressed.
Jan 19 19:52:14 delta kernel: lost page write due to I/O error on dm-0
Jan 19 19:52:14 delta last message repeated 9 times
Jan 19 19:52:24 delta kernel: megaraid cmm: controller cannot accept cmds due to earlier errors
Jan 19 19:52:56 delta last message repeated 20 times
Jan 19 19:53:56 delta last message repeated 36 times
Jan 19 19:55:03 delta last message repeated 37 times

---8<---
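
In case it helps anyone reading the excerpt, the "return code" values in the 
SCSI error lines can be split into their component bytes. The sketch below 
is only an aid for reading the log, not a diagnosis. It assumes the 2.6-era 
Linux layout of the SCSI result word (status byte in bits 0-7, message byte 
in bits 8-15, host byte in bits 16-23, driver byte in bits 24-31) and the 
DID_* / DRIVER_* names from the kernel's include/scsi/scsi.h of that time. 
The function name decode_scsi_result is made up for this example.

```python
# Assumption: 2.6-era Linux SCSI result word layout and constant names.
HOST_BYTE = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
    0x09: "DID_BAD_INTR", 0x0A: "DID_PASSTHROUGH", 0x0B: "DID_SOFT_ERROR",
}
DRIVER_BYTE = {
    0x00: "DRIVER_OK", 0x01: "DRIVER_BUSY", 0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA", 0x04: "DRIVER_ERROR", 0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT", 0x07: "DRIVER_HARD", 0x08: "DRIVER_SENSE",
}


def decode_scsi_result(code):
    """Split a 32-bit SCSI result word into its four bytes (hypothetical helper)."""
    return {
        "status": code & 0xFF,                 # SCSI status byte from the target
        "msg": (code >> 8) & 0xFF,             # message byte
        "host": HOST_BYTE.get((code >> 16) & 0xFF, "unknown"),
        "driver": DRIVER_BYTE.get((code >> 24) & 0xFF, "unknown"),
    }


# The three distinct return codes that appear in the excerpt above.
for code in (0x00050000, 0x06000000, 0x00010000):
    print("0x%08x" % code, decode_scsi_result(code))
```

Under those assumptions, 0x00050000 carries host byte DID_ABORT, 0x06000000 
carries driver byte DRIVER_TIMEOUT, and 0x00010000 carries host byte 
DID_NO_CONNECT. That fits the sequence of aborted commands, a timeout, and 
then the device being offlined.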


