[Linux-PowerEdge] 2 predicted failure disks and RAID5

Stephen Dowdy sdowdy at ucar.edu
Tue Nov 14 13:17:09 CST 2017


On 11/14/2017 11:52 AM, Grzegorz Bakalarski wrote:
> Thanks for valuable input.
> Regarding punctured block:  from fwtermlog I got several (not many) lines of this type:
> 
> 11/13/17  3:24:45: EVT#08603-11/13/17  3:24:45:  97=Puncturing bad block on PD 02(e0x20/s2) at 9ecd
That's bad.  You have a punctured stripe.
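
If you want the full history, you can pull the controller logs yourself.  A sketch, assuming an LSI/PERC MegaCli-style tool (the binary on your box may be MegaCli, MegaCli64, or the newer perccli with slightly different spelling):

    # dump the firmware terminal log (where your Puncturing lines came from)
    MegaCli64 -FwTermLog -Dsply -aALL > fwtermlog.txt
    grep -i punctur fwtermlog.txt

    # the adapter event log carries the same events with sequence numbers
    MegaCli64 -AdpEventLog -GetEvents -f events.txt -aALL
    grep -i punctur events.txt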

> T35:     maintainPdFailHistory=0 disablePuncturing=0 zeroBasedEnclEnumeration=1 disableBootCLI=1
This is an informational line indicating that the controller doesn't have the disablePuncturing config option set.

> All the same PD, the same bad block (different time)
> 
> Is my raid useless?

No, it's good enough to recover what data you can before you rebuild it.  However, you can't trust the data that uses the bad block.   You'll get a read error from any object that maps to it.

Here's a good doc Dell put out:

https://www.dell.com/support/article/us/en/4/438291#2
   "...If the data within a punctured stripe is accessed errors will continue to be reported against the affected badLBAs with no possible correction available. Eventually (this could be minutes, days, weeks, months, etc.), the Bad Block Management (BBM) Table will fill up causing one or more drives to become flagged as predictive failure.,,,:

> BTW: why do you think raid level migration to raid-6 with 2 additional disks would be better than with one disk? I would keep VD size the same.

I'm not talking about a migration, I'm talking about a complete WIPE of what you have and a re-creation from scratch.  At this point, recover what you can to a staging location, rebuild, then restore.
Keep track of data with I/O errors, because it's going to have a corrupted block at the punctured block address.  This could (if you're lucky) be in unallocated space.  It could also be in filesystem structures and lead to widescale corruption of the filesystem.
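
For the rebuild itself, roughly this (MegaCli-style sketch; exact syntax varies by tool version, and the enclosure:slot pairs below are placeholders -- take yours from -PDList):

    # destroy the punctured VD -- this erases everything on it
    MegaCli64 -CfgLdDel -L0 -a0

    # recreate as RAID-6 across the members (enclosure:slot pairs)
    MegaCli64 -CfgLdAdd -r6 [32:0,32:1,32:2,32:3,32:4] -a0

    # sanity-check the new VD before restoring
    MegaCli64 -LDInfo -Lall -aALL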

I would mount it all READONLY and do a file-level dump (not a 'dd' or anything like that, which would migrate corrupted filesystem structures).  (I typically 'rsync' data to another machine.)  You don't want any backup tool that does infinite retries, as that'll likely result in another disk failure (per the Dell doc above).
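
A minimal sketch of that copy-out ("/data" and "stagehost" are placeholders for your mount point and staging machine):

    # remount read-only so nothing writes through the punctured stripe
    mount -o remount,ro /data

    # file-level copy; rsync reports files with read errors and moves on
    rsync -aHAX --log-file=/root/rsync-data.log /data/ stagehost:/staging/data/

    # anything that hit an I/O error shows up in the log -- that is your
    # list of objects touching the punctured block
    grep -i error /root/rsync-data.log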

> Anyway, will migration to raid-6 fail with this "awful puncturing"?

RAID-6 is going to lessen the likelihood of a puncture, with 2 parity drives.  While you're rebuilding a RAID-5, any unrecoverable bad-block event on any of the "good" drives results in a puncture; with RAID-6, you still have parity left to cope with an uncorrectable error.

The above is especially true of some of the less reliable Seagate drives from past years.  You can't count on them not throwing UCEs (uncorrectable errors) during a rebuild (or before you get the replacement drive installed), thereby puncturing the RAID.  :-(
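
Rough numbers on why, assuming (hypothetically) 4TB members and the common consumer-drive spec of 1 unrecoverable read error per 1e14 bits:

    awk 'BEGIN {
      bits  = 4e12 * 8    # bits read from each surviving member in a rebuild
      p_bit = 1e-14       # rated unrecoverable-read-error rate per bit
      p = 1 - exp(bits * log(1 - p_bit))
      printf "P(>=1 URE per member per full read) ~ %.0f%%\n", p * 100
    }'

That comes out around 27% per member for a full-drive read.  On RAID-5 any one such error during the rebuild punctures you; on RAID-6 the second parity absorbs it.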

--stephen
-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdowdy at ucar.edu        -  http://www.ral.ucar.edu/~sdowdy/


