Perc 4e/Di, RAID member with Failure Predicted = Yes

Kuba Ober kuba at mareimbrium.org
Wed Mar 28 14:09:48 CST 2007


On Wednesday 28 March 2007, Gilles Hamel wrote:
> Tom Brown wrote:
> >> I have 4 disks in a RAID5. One disk has the status Failure Predicted
> >> = Yes.
> >> I have monitored this disk for a few days, and media errors don't increase.
> >> Is there any way to reset the status and the media error counter?
> >> I have checked with dellmgr and OMSA, but they don't allow that ...
> >
> > replace the disk??
>
> Ok, but since the disk has been fully functional for days (perhaps weeks), I
> wonder whether it is really faulty.

You know, a failure is *predicted*. That means that things are bad enough that 
the drive's SMART subsystem thinks the drive won't last long. Of course it 
may last for a year. But it may fail tomorrow. A new drive is several 
orders of magnitude less likely to fail tomorrow.

I think you miss the point of what a *prediction* means. It'd be useless to 
have a failure prediction that happens so close to actual failure as to give 
you no margin for error. Besides, the hard drive can include only so much 
hardware & software to diagnose itself. You could have a drive that can much 
more accurately predict failure, but you'd pay through the nose for it. So, 
there you have it: like all engineering, the drive makers had to weigh their 
options and make compromises.

Failure prediction is based on several things, which may include the defect 
growth rate and the number of spare sectors/cylinders used up. A drive may 
fail because the defects grow at an exponential rate (e.g. a contaminant is 
abrading something and creating more contaminants), so eventually it will 
*suddenly* run out of spares and most of the data will be wrecked. 
Alternatively, things may go bad slowly but the spare sector pool is almost 
used up, so that pretty soon, during a RAID-driven patrol read, the drive 
will remap the last flaky sector it can to a spare, and after that it won't 
be able to safeguard your data.
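
To make the two scenarios concrete, here is a toy sketch in Python. It is 
not how any drive firmware or SMART implementation actually decides; the 
function name, the pool sizes, and the growth factor are all made up for 
illustration. It only counts how many days it takes for a spare pool to 
run dry when the daily number of new defects stays flat versus when it 
grows by a constant factor each day.

```python
def days_until_spares_exhausted(spare_pool, used, new_defects_per_day,
                                growth_factor=1.0, max_days=10000):
    """Toy model: return the day on which the spare pool runs out.

    All parameters are hypothetical illustration values, not SMART
    attributes. growth_factor=1.0 means a constant defect rate; a value
    above 1.0 means the daily defect count grows geometrically.
    Returns None if the pool survives max_days.
    """
    remaining = spare_pool - used
    rate = float(new_defects_per_day)
    for day in range(1, max_days + 1):
        remaining -= rate          # spares consumed by today's new defects
        if remaining <= 0:
            return day
        rate *= growth_factor      # tomorrow's defect count
    return None


# Mostly empty pool, constant one new defect per day: lasts a long time.
steady = days_until_spares_exhausted(1000, 10, 1, growth_factor=1.0)

# Same pool, but the defect rate grows 50% per day: gone within weeks.
runaway = days_until_spares_exhausted(1000, 10, 1, growth_factor=1.5)

# Slow, constant defect rate, but the pool is already almost used up.
nearly_full = days_until_spares_exhausted(1000, 990, 1, growth_factor=1.0)

print(steady, runaway, nearly_full)
```

With these invented numbers the steady case lasts 990 days, the runaway 
case 16 days, and the nearly-exhausted pool 10 days. The drive looks 
equally "functional" today in all three cases, which is exactly why a 
stable media error count over a few days doesn't tell you much.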

If I were you, I'd just replace the drive. 

Cheers, Kuba



More information about the Linux-PowerEdge mailing list