RAID 5 Errors, Disk Problems : What to do?

Rory Campbell-Lange rory at
Tue Aug 26 10:54:00 CDT 2003

I am running a Dell PowerEdge 2650 (Xeon 2.0GHz/512k x2) with 4 disks,
on Debian with Linux 2.4.19-ac4. Three of the disks are 146GB Ultra 3
SCSI disks in a software RAID 5 array (md0) formatted as ext3; the
fourth is the system disk.

Twice in the last two weeks md0 has kicked sdd out of the array. Both
times the array carried on running on the remaining two disks:

    SCSI disk error : host 0 channel 0 id 3 lun 0 return code = 8000002
    Info fld=0xc8e0040, Current sd08:30: sense key Hardware Error
    Additional sense indicates Mechanical positioning error
     I/O error: dev 08:30, sector 210632768
    raid5: Disk failure on sdd, disabling device. Operation continuing on 2 devices
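
For anyone reading the message above, the identifiers in it can be
decoded with a few lines of Python. This is only a sketch, and it assumes
the static device numbering of 2.4 kernels (SCSI disks on major 8, 16
minors per disk) and that "Info fld" holds the failing block address;
the function name decode_sd_dev is made up for illustration.

```python
# Decode "dev 08:30" and "Info fld=0xc8e0040" from the kernel message.
# Assumption: Linux 2.4 static numbering, SCSI disk major 8, 16 minors per disk.

def decode_sd_dev(dev: str) -> str:
    """Map a hex 'major:minor' pair such as '08:30' to an sd device name."""
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != 8:
        raise ValueError("only the first SCSI disk major (8) is handled here")
    # Each disk owns 16 minors: minor 0 of the group is the whole disk,
    # the remaining 15 are its partitions.
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if partition == 0 else f"{name}{partition}"

device = decode_sd_dev("08:30")    # whole-disk device for id 3
info_lba = int("0xc8e0040", 16)    # block address from the sense data
reported_sector = 210632768        # from the "I/O error" line

print(device, info_lba, info_lba == reported_sector)
```

Under those assumptions, 08:30 is the whole disk sdd, and the sense
information field is the same number as the sector in the I/O error
line, so both messages appear to point at one block on that disk.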

Both times I was unable to hot-add the disk back into the running array.

I rebooted the server and ran a low-level verification of the disk from
the Dell SCSI utility at startup. It found no errors. After the reboot I
was able to re-add the failed disk:

    md: trying to hot-add sdd to md0 ... 
    md: bind<sdd,3>
    RAID5 conf printout:
     --- rd:3 wd:2 fd:1
     disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdb
     disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc
     disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
     --- rd:3 wd:3 fd:0
     disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdb
     disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc
     disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd
    md: updating md0 RAID superblock on device
    md: sdd [events: 00000037]<6>(write) sdd's sb offset: 143374656
    md: sdc [events: 00000037]<6>(write) sdc's sb offset: 143374656
    md: sdb [events: 00000037]<6>(write) sdb's sb offset: 143374656
    md: recovery thread finished ...
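
The two summary lines in the printout above can be compared
mechanically. The sketch below assumes that rd, wd and fd stand for
RAID disks, working disks and failed disks, which is how I read the
2.4 raid5 driver's output; the name array_states is hypothetical.

```python
import re

# Assumption: "rd", "wd", "fd" = raid disks, working disks, failed disks.
SUMMARY = re.compile(r"--- rd:(\d+) wd:(\d+) fd:(\d+)")

def array_states(log: str):
    """Return one (raid_disks, working, failed) tuple per summary line."""
    return [tuple(int(n) for n in m.groups()) for m in SUMMARY.finditer(log)]

# The two summary lines copied from the printout.
log = """
 --- rd:3 wd:2 fd:1
 --- rd:3 wd:3 fd:0
"""

states = array_states(log)
# An array is degraded when fewer disks are working than it is built from.
degraded = [working < raid_disks for raid_disks, working, _failed in states]

print(states, degraded)
```

Read this way, the first summary is the degraded array (2 of 3 disks
working) and the second is the array after sdd was bound back in.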

What should I do? Dell technical support won't replace the disk unless
it fails the SCSI verification. Could this be a software RAID problem
rather than a fault in the disk? Should I add another disk to the array
and keep the present sdd as an emergency spare?

Thoughts and advice much appreciated.

Rory Campbell-Lange 
<rory at>

More information about the Linux-PowerEdge mailing list