Dell 2550 Perc 3/di hardware failure (?)
Brant Faircloth
faircloth at gmail.com
Wed Apr 1 12:51:55 CDT 2009
Hi everyone,
I have an issue with one of our 2550s and a container that keeps dying.
The short version is that problems arose ~2 weeks ago when the machine
dropped into read-only mode following a container failure. Because
the drives in the machine were very old, we purchased new drives,
set them up again in RAID 1, and I updated to Debian 5.0. After about 5
days of uptime, the container died again. After some
searching on the web, I came across information indicating that more
recent kernels may have problems with the Perc 3/di and aacraid (i.e.,
issues with aacraid.dacmode; see
http://bugzilla.kernel.org/show_bug.cgi?id=9133 ).
Hoping that this was the issue, I low-level formatted the drives
and reinstalled Debian 4.0 (reverting to an older kernel, 2.6.18; we
have an identical machine with RAID 5 that is running well under
Debian 4). The installs have always gone without a hitch, but after
~15 hours of uptime, the problems returned.
The machine is currently up, but in read-only mode. I am weighing my
options, which include replacing the Perc 3/di with a different
controller or switching over to SCSI-only (i.e., no RAID). I'm not
entirely sure that the latter option will help, depending on
the source of the problems.
Any suggestions are appreciated, and thanks in advance.
cheers,
-brant
The issue kicks off in syslog with:
Apr 1 03:14:03 charybdis kernel: AAC:ID(0:00:0) Timeout detected on
cmd[0x2a]
Apr 1 03:14:03 charybdis kernel: AAC:ID(0:01:0) Timeout detected on
cmd[0x2a]
Apr 1 03:14:03 charybdis kernel: AAC:SCSI Channel[0]: Timeout
Detected On 10 Command(s)
Apr 1 03:14:03 charybdis kernel: AAC:HIM_EVENT_HA_FAILED:SCSI bus
reset issued on channel 0
Apr 1 03:14:13 charybdis kernel: AAC: <...repeats 1 more times>
Apr 1 03:14:13 charybdis kernel: AAC:SCSI Channel[0]: Timeout
Detected On 12 Command(s)
Apr 1 03:14:13 charybdis kernel: AAC:HIM_EVENT_HA_FAILED:SCSI bus
reset issued on channel 0
Apr 1 03:14:23 charybdis kernel: AAC: <...repeats 1 more times>
Apr 1 03:14:13 charybdis kernel: AAC:HIM_EVENT_HA_FAILED:SCSI bus
reset issued on channel 0
Apr 1 03:14:23 charybdis kernel: AAC: <...repeats 1 more times>
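For anyone who wants to tally these events across a longer syslog, here is a minimal Python sketch. It assumes the log text is wrapped the way it is in this email (continuation lines carry no timestamp), and the function names `join_wrapped` and `summarize` are made up for illustration. It only counts the two message shapes shown above; cmd 0x2a is the SCSI WRITE(10) opcode.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpt above (wrapped as in the email).
SAMPLE = """\
Apr 1 03:14:03 charybdis kernel: AAC:ID(0:00:0) Timeout detected on
cmd[0x2a]
Apr 1 03:14:03 charybdis kernel: AAC:ID(0:01:0) Timeout detected on
cmd[0x2a]
Apr 1 03:14:03 charybdis kernel: AAC:SCSI Channel[0]: Timeout
Detected On 10 Command(s)
Apr 1 03:14:03 charybdis kernel: AAC:HIM_EVENT_HA_FAILED:SCSI bus
reset issued on channel 0
"""

# A new record starts with "Mon D HH:MM:SS"; anything else is a wrapped tail.
RECORD_START = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s")
DEVICE_TIMEOUT = re.compile(
    r"AAC:ID\((\d+:\d+:\d+)\) Timeout detected on cmd\[(0x[0-9a-f]+)\]"
)
BUS_RESET = re.compile(r"SCSI bus reset issued on channel (\d+)")


def join_wrapped(text):
    """Rejoin records that were wrapped onto a second line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def summarize(text):
    """Count per-device command timeouts and per-channel bus resets."""
    timeouts = Counter()
    resets = Counter()
    for record in join_wrapped(text):
        m = DEVICE_TIMEOUT.search(record)
        if m:
            timeouts[(m.group(1), m.group(2))] += 1
        m = BUS_RESET.search(record)
        if m:
            resets[int(m.group(1))] += 1
    return timeouts, resets


if __name__ == "__main__":
    timeouts, resets = summarize(SAMPLE)
    for (device, cmd), n in sorted(timeouts.items()):
        print(f"{device} cmd {cmd}: {n} timeout(s)")
    for channel, n in sorted(resets.items()):
        print(f"channel {channel}: {n} bus reset(s)")
```

To run it against a real log, replace SAMPLE with the contents of the syslog file; the "<...repeats N more times>" lines are ignored by this sketch.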
Various additional info:
charybdis:/home/bcf# uname -a
Linux charybdis 2.6.18-6-686 #1 SMP Sat Dec 27 09:31:05 UTC 2008 i686
GNU/Linux
charybdis:/home/bcf# modinfo aacraid
filename: /lib/modules/2.6.18-6-686/kernel/drivers/scsi/aacraid/aacraid.ko
author: Red Hat Inc and Adaptec
description: Dell PERC2, 2/Si, 3/Si, 3/Di, Adaptec Advanced Raid
Products, HP NetRAID-4M, IBM ServeRAID & ICP SCSI driver
license: GPL
version: 1.1-5[2409]-mh2
vermagic: 2.6.18-6-686 SMP mod_unload 686 REGPARM gcc-4.1
AFA0> controller details
Executing: controller details
Controller Information
----------------------
Device Name: AFA0
Controller Type: PERC 3/Di
Access Mode: READ-WRITE
Controller Serial Number: Last Six Digits = 9C21D2
Number of Buses: 2
Devices per Bus: 15
Controller CPU: i960 R series
Controller CPU Speed: 100 Mhz
Controller Memory: 128 Mbytes
Battery State: Ok
Component Revisions
-------------------
CLI: 2.8-0 (Build #6076)
API: 2.8-0 (Build #6076)
Miniport Driver: 1.1-5 (Build #2409)
Controller Software: 2.8-1 (Build #6098)
Controller BIOS: 2.8-1 (Build #6098)
Controller Firmware: (Build #6098)
AFA0> container list
Executing: container list
Num          Total  Oth Chunk          Scsi   Partition
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
----- ------ ------ --- ------ ------- ------ -------------
0     Mirror 68.4GB            Valid   0:00:0 64.0KB!68.4GB
/dev/sda            R1Mirror0          0:01:0 64.0KB!68.4GB
AFA0> disk list
Executing: disk list
B:ID:L Device Type    Blocks    Bytes/Block Usage            Shared
------ -------------- --------- ----------- ---------------- ------
0:01:0 Disk           0         0           Offline          NO
AFA0> disk show space
Executing: disk show space
Scsi B:ID:L Usage      Size
----------- ---------- -------------
0:00:0      Dead       64.0KB:68.4GB
0:01:0      Dead       64.0KB:68.4GB
AFA0> enclosure show status
Executing: enclosure show status
Command Error: <The command or requested operation to the disk
enclosure failed.>
AFA0> disk show smart
Executing: disk show smart
       Smart   Method of        Enable
       Capable Informational    Exception Performance Error
B:ID:L Device  Exceptions(MRIE) Control   Enabled     Count
------ ------- ---------------- --------- ----------- ------
0:01:0 N
AFA0> diagnostic show history
Executing: diagnostic show history
No switches specified, defaulting to "/current".
*** HISTORY BUFFER FROM CURRENT CONTROLLER RUN ***
[00]: MpdEvent event bus 0 = 1 (HIM_EVENT_IO_CHANNEL_RESET).
[01]: SCSI bus reset detected on channel 0
[02]: neither side of mirror exists, can't write data
[03]: <...repeats 50 more times>
[04]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[05]:
[06]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[07]: annel 0
[08]: neither side of mirror exists, can't write data
[09]: <...repeats 1031 more times>
[10]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[11]:
[12]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[13]: annel 0
[14]: neither side of mirror exists, can't write data
[15]: <...repeats 235999 more times>
[16]: MpdEvent event bus 0 = 1 (HIM_EVENT_IO_CHANNEL_RESET).
[17]: SCSI bus reset detected on channel 0
[18]: neither side of mirror exists, can't write data
[19]: <...repeats 125 more times>
[20]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[21]:
[22]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[23]: annel 0
[24]: neither side of mirror exists, can't write data
[25]: <...repeats 571453 more times>
[26]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[27]:
[28]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[29]: annel 0
[30]: neither side of mirror exists, can't write data
[31]: <...repeats 1303 more times>
[32]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[33]:
[34]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[35]: annel 0
[36]: neither side of mirror exists, can't write data
[37]: <...repeats 31686 more times>
[38]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[39]:
[40]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[41]: annel 0
[42]: neither side of mirror exists, can't write data
[43]: <...repeats 128 more times>
[44]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[45]:
[46]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[47]: annel 0
[48]: neither side of mirror exists, can't write data
[49]: <...repeats 5099 more times>
[50]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[51]:
[52]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[53]: annel 0
[54]: MpdEvent event bus 0 = 1 (HIM_EVENT_IO_CHANNEL_RESET).
[55]: SCSI bus reset detected on channel 0
[56]: neither side of mirror exists, can't write data
[57]: <...repeats 142 more times>
[58]: MpdEvent event bus 0 = 1 (HIM_EVENT_IO_CHANNEL_RESET).
[59]: SCSI bus reset detected on channel 0
[60]: neither side of mirror exists, can't write data
[61]: <...repeats 56 more times>
[62]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[63]:
[64]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[65]: annel 0
[66]: neither side of mirror exists, can't write data
[67]: <...repeats 3299 more times>
[68]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[69]:
[70]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[71]: annel 0
[72]: neither side of mirror exists, can't write data
[73]: <...repeats 4512299 more times>
[74]: ID(0:01:0); Simulating selection timeout due to NEXUS_ERROR
[75]: [command:0x28]
[76]: ID(0:01:0) Cmd[0x28] Fail: Block Range 594332 : 594333 at 4
[77]: 8104 sec
[78]: failing read io
[79]: neither side of mirror exists, can't write data
[80]: <...repeats 1586099 more times>
[81]: ID(0:01:0); Simulating selection timeout due to NEXUS_ERROR
[82]: [command:0x28]
[83]: ID(0:01:0) Cmd[0x28] Fail: Block Range 601574 : 601575 at 4
[84]: 9720 sec
[85]: failing read io
[86]: neither side of mirror exists, can't write data
[87]: <...repeats 9357402 more times>
[88]: 2 can't read mbr dev_t:0
[89]: 2 can't read mbr dev_t:1
[90]: 2 can't read mbr dev_t:0
[91]: 2 can't read mbr dev_t:1
[92]: neither side of mirror exists, can't write data
[93]: <...repeats 24399 more times>
[94]: 2 can't read mbr dev_t:1
[95]: neither side of mirror exists, can't write data
[96]: <...repeats 596299 more times>
[97]: 2 can't read mbr dev_t:1
[98]: neither side of mirror exists, can't write data
[99]:
========================
History Output Complete.
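Since the history buffer collapses repeated messages into "<...repeats N more times>" lines, a rough way to see how often each message actually occurred is to add those repeat counts back in. The following is a minimal Python sketch under that assumption (each repeat line refers to the message just before it); `tally` is a made-up name, and the sample is just the first ten entries from the output above. Entries that the CLI split across two slots (e.g. "ch" / "annel 0") are counted as separate fragments here.

```python
import re

# First ten entries of the history buffer above, copied verbatim.
SAMPLE = """\
[00]: MpdEvent event bus 0 = 1 (HIM_EVENT_IO_CHANNEL_RESET).
[01]: SCSI bus reset detected on channel 0
[02]: neither side of mirror exists, can't write data
[03]: <...repeats 50 more times>
[04]: MpdEvent event bus 0 = 8 (HIM_EVENT_TRANSPORT_MODE_CHANGE).
[05]:
[06]: HIM_EVENT_TRANSPORT_MODE_CHANGE-SCSI bus reset issued on ch
[07]: annel 0
[08]: neither side of mirror exists, can't write data
[09]: <...repeats 1031 more times>
"""

ENTRY = re.compile(r"^\[(\d+)\]:\s?(.*)$")
REPEAT = re.compile(r"^<\.\.\.repeats (\d+) more times>$")


def tally(text):
    """Return {message: occurrences}, expanding the 'repeats N' lines."""
    counts = {}
    previous = None  # last real message seen, target of a repeat line
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # not a "[NN]: ..." history entry
        body = m.group(2).strip()
        if not body:
            continue  # empty slot such as "[05]:"
        r = REPEAT.match(body)
        if r:
            # Assumption: the repeat count applies to the preceding message.
            if previous is not None:
                counts[previous] += int(r.group(1))
            continue
        counts[body] = counts.get(body, 0) + 1
        previous = body
    return counts


if __name__ == "__main__":
    for message, n in sorted(tally(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{n:>10}  {message}")
```

Feeding it the full "diagnostic show history" output instead of SAMPLE gives totals for the whole buffer.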
-------------------
Brant Faircloth
< * )
(_ \\
_ ||
More information about the Linux-PowerEdge
mailing list