Disk failures when rebuilding RAID1 - drive or controller?

Adam Nielsen adam.nielsen at uq.edu.au
Fri Jul 23 02:25:55 CDT 2010


Hi all,

As there is still no ETA on when the firmware fix for Seagate's 1TB SATA 
disks will make it into the Dell-branded drives, Dell have offered to 
replace my Seagate disks with Western Digital ones to fix the speed issue.

I have two disks in a software RAID1 array.  I swapped out one of them 
and started a rebuild, but it failed halfway through, citing media 
errors on the *new* disk.  I restarted the rebuild and it stopped again, 
this time at a different spot.

At this point I assumed I had a bad disk, but swapping in the second new 
disk gives the same result.  What are the chances of getting two bad 
disks?  The controller is also spouting some weird messages, which makes 
me wonder whether it has a problem of its own that is causing the media 
errors.

So before I ask Dell for another two WD drives, can anyone shed some 
light on what might be happening?

dmesg reports hundreds of these, followed by the error:

[17821879.999442] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999474] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999516] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999565] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999603] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999654] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999689] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999735] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999772] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999806] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999851] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821879.999883] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, Code={Reset}, SubCode(0x0900)
[17821922.000823] mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
[17821922.000943] mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
[17821922.001035] mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
[17821922.001130] mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
[17821922.001234] mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
[17821922.001334] mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
[17821922.001440] mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)
[17821922.001591] sd 0:0:4:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[17821922.001596] sd 0:0:4:0: [sdb] Sense Key : Medium Error [current]
[17821922.001604] sd 0:0:4:0: [sdb] Add. Sense: Record not found
[17821922.001611] end_request: I/O error, dev sdb, sector 331686963
[17821922.001616] raid1: Disk failure on sdb4, disabling device.
[17821922.001617] raid1: Operation continuing on 1 devices.
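With hundreds of lines like these, it can help to tally the controller-side LogInfo events separately from the drive-side SCSI sense errors. Below is a minimal Python sketch. It assumes the 32-bit LogInfo word is laid out the way the Linux mptbase driver decodes it: a 4-bit bus type, a 4-bit originator, an 8-bit code and a 16-bit subcode. The function names decode_loginfo and summarise are hypothetical, and only the two PL codes that appear in the log above are mapped to text.

```python
import re
from collections import Counter

# ASSUMPTION: bit layout as decoded by the Linux mptbase driver:
# bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 subcode.
ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}

# Only the two PL codes seen in the dmesg output above; others are shown raw.
PL_CODES = {
    0x08: "SATA NCQ Fail All Commands After Error",
    0x11: "Reset",
}

LOGINFO_RE = re.compile(r"LogInfo\((0x[0-9a-fA-F]{8})\)")
SENSE_RE = re.compile(r"\[(sd[a-z]+)\] Sense Key : (.+?) \[")


def decode_loginfo(value: int) -> dict:
    """Split a 32-bit LogInfo word into its (assumed) bit fields."""
    originator = (value >> 24) & 0xF
    code = (value >> 16) & 0xFF
    if originator == 0x1:
        code_text = PL_CODES.get(code, hex(code))
    else:
        code_text = hex(code)
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get(originator, hex(originator)),
        "code": code_text,
        "subcode": value & 0xFFFF,
    }


def summarise(dmesg_text: str) -> tuple:
    """Count controller LogInfo words and per-disk SCSI sense keys."""
    loginfo = Counter(int(m, 16) for m in LOGINFO_RE.findall(dmesg_text))
    # findall with two groups yields (device, sense key) tuples
    sense = Counter(SENSE_RE.findall(dmesg_text))
    return loginfo, sense


if __name__ == "__main__":
    sample = (
        "[17821879.999442] mptbase: ioc0: LogInfo(0x31110900): Originator={PL}, "
        "Code={Reset}, SubCode(0x0900)\n"
        "[17821922.000823] mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, "
        "Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000)\n"
        "[17821922.001596] sd 0:0:4:0: [sdb] Sense Key : Medium Error [current]\n"
    )
    loginfo, sense = summarise(sample)
    for value, count in loginfo.items():
        print(f"{value:#010x} x{count}: {decode_loginfo(value)}")
    for (dev, key), count in sense.items():
        print(f"{dev}: {key} x{count}")
```

Feeding it the full dmesg output shows how many PL resets precede each Medium Error on sdb. This only summarises the log; it does not by itself say whether the drive or the controller is at fault.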

Many thanks,
Adam.



More information about the Linux-PowerEdge mailing list