how to get rid of bad blocks in a file on PERC 5/I?

Patrick_Fischer at Dell.com
Tue May 4 01:26:26 CDT 2010


When I read your error, I suspected all along that it sounded like a
punctured stripe. And now it is confirmed :-)
So it is not a HW error, only a damaged stripe, but only the PERC 6/i
and later have a feature to repair it.
For earlier controllers it is much more difficult:

- the punctured stripe can be in a written stripe or in an empty stripe
- to repair it in a written stripe, you need a tool to locate the file
and overwrite it with a known good copy (sometimes the backup software
will tell you which file(s) are damaged)
- in an empty stripe, you need to write to those blocks (in the past you
could download the MHDD utility for this, but it has been a long time
since I last used it)
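A minimal sketch of the "locate the file" step for an ext2/ext3 filesystem on Linux. It assumes you already have the bad LBA relative to the virtual disk that Linux sees (the PERC log reports LBAs per physical disk, and translating those needs the array's stripe layout, which is not covered here), 512-byte sectors, and a known partition start sector. The function names are hypothetical; `debugfs -R "icheck <block>"` is the e2fsprogs request that maps a filesystem block to the inode that owns it.

```python
# Assumptions (not from the thread): 512-byte sectors, LBA relative to the
# virtual disk as Linux sees it (NOT the per-PD LBA from the PERC log),
# ext2/ext3 filesystem. Function names are hypothetical.
SECTOR_SIZE = 512


def lba_to_fs_block(vd_lba, partition_start_lba, fs_block_size=4096):
    """Filesystem block that contains `vd_lba`, or None if the LBA lies
    before the start of the partition."""
    if vd_lba < partition_start_lba:
        return None
    return (vd_lba - partition_start_lba) * SECTOR_SIZE // fs_block_size


def icheck_command(device, fs_block):
    """Build (but do not run) the debugfs query asking which inode owns
    `fs_block` on `device`."""
    return ["debugfs", "-R", f"icheck {fs_block}", device]
```

If `icheck` reports an owning inode, `debugfs -R "ncheck <inode>"` gives its path name (a written stripe); if no inode owns the block, the error is most likely in free space (an empty stripe). The real filesystem block size can be read from `tune2fs -l`, so don't rely on the 4096 default.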

The consistency check can't fix it.
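Why the consistency check can't help can be seen in a simplified RAID 5 model (an assumption; the thread doesn't say which RAID level is in use): parity is the XOR of the data blocks in a stripe, so any single unreadable block can be rebuilt from the others, but once the same stripe is unreadable on two members, as with a punctured stripe, there is nothing left to rebuild from. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe):
    """`stripe` is a list of data + parity blocks, with None marking an
    unreadable block. Returns the stripe with at most one missing block
    rebuilt, or None when two or more blocks are unreadable (simplified
    model of a punctured stripe)."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) > 1:
        return None  # not enough redundancy left to rebuild anything
    rebuilt = list(stripe)
    if missing:
        # XOR of all surviving blocks (data + parity) yields the lost one.
        rebuilt[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return rebuilt
```

With one bad block the check can quietly fix things; with two it can only report them, which matches what Bond saw after his CC runs.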

The other way is to back up the data, delete the array, recreate it
with initialization, and restore. (That is what support will tell you,
as all other options are too difficult.)

Observations on how it CAN occur (only my experience, and under rare
circumstances; at most in one of 1000 HDD issues):
- if you try to rebuild a disk while another disk in the array has a
media error
- if a disk with a predictive failure was removed without setting it
offline first


Some lines from a Dell document addressing this error on earlier
PERCs:

If media errors reside in user space (allocated space)
The first step to fix a punctured stripe is to do a full backup of the
logical disk. This will show whether there are any media errors in user
space, i.e. one or several files will be reported as corrupt. Any file
reported as corrupt must be overwritten with a known good copy. DO NOT
DELETE THE FILE, since this will basically just "move" the media errors
to free space. Clearing media errors in free space is possible but will
require some downtime.
 
If a copy of the file doesn't exist, that data will be lost. To still
be able to clear/overwrite the media errors, you will have to create a
dummy file with the same name and the same size, and use it to
overwrite the corrupt file.
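A minimal sketch of the "overwrite, don't delete" advice, assuming a Linux filesystem that rewrites file data in place (e.g. ext3; not a copy-on-write filesystem). The file is opened for update without truncation, so the same blocks get rewritten; by contrast `cp good bad` truncates the target first, which frees the old blocks and can leave the media errors behind in free space. Without a known good copy it writes zeros of the same size, playing the role of the dummy file. The helper name is hypothetical.

```python
import os


def overwrite_in_place(path, source=None, chunk_size=1 << 20):
    """Overwrite `path` in place with `source` (a known good copy of the
    same size) or, if no copy exists, with zeros. Returns bytes written.
    Hypothetical helper; assumes a filesystem that rewrites data in place."""
    size = os.path.getsize(path)
    if source is not None and os.path.getsize(source) != size:
        raise ValueError("known good copy has a different size")
    written = 0
    src = open(source, "rb") if source is not None else None
    try:
        # "r+b" = read/write WITHOUT truncation: the existing blocks are
        # rewritten instead of being freed and reallocated elsewhere.
        with open(path, "r+b") as dst:
            while written < size:
                n = min(chunk_size, size - written)
                data = src.read(n) if src is not None else bytes(n)
                if not data:
                    raise OSError("known good copy ended early")
                dst.write(data)
                written += len(data)
            dst.flush()
            os.fsync(dst.fileno())  # push the writes down to the controller
    finally:
        if src is not None:
            src.close()
    return written
```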

The next step is to wait until Patrol Read has completed at least one
cycle/iteration, then check the Windows event log/PERC controller log.
The punctured stripe has been fixed if sense key 3 11 00 doesn't show
up anymore.
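A sketch of that final check against an exported controller log. Sense key 3 is the SCSI MEDIUM ERROR key and ASC/ASCQ 11/00 means "unrecovered read error". How the controller prints the sense data is an assumption here (either "3/11/00" or "3 11 00"), so adjust the regular expression to what your log actually shows. Function names are hypothetical.

```python
import re

# Sense key 3 = MEDIUM ERROR, ASC/ASCQ 11/00 = unrecovered read error.
# Assumption: the log prints it as "3/11/00" or "3 11 00"; adjust if needed.
SENSE_3_11_00 = re.compile(r"\b3[/ ]11[/ ]00\b")


def medium_error_lines(log_text):
    """Return the log lines that report sense 3/11/00."""
    return [line for line in log_text.splitlines()
            if SENSE_3_11_00.search(line)]


def puncture_looks_fixed(log_after_patrol_read):
    """True if no 3/11/00 line shows up any more (the criterion above)."""
    return not medium_error_lines(log_after_patrol_read)
```

Comparing `len(medium_error_lines(...))` on logs exported before and after the Patrol Read gives a rough idea of progress, though it counts log lines, not distinct LBAs.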
 
It's not unusual for media errors to still be reported after replacing
corrupt files, but the number of affected LBAs should at least have
been reduced. This tells us that any remaining media errors reside in
free space.

If media errors reside in free space (unallocated space)
Media errors in free space can be cleared by using the MHDD program.
It's a freeware DOS program that can be used to write to a specific LBA
on a specific disk on the PERC controller. It will require some
downtime, since the system needs to be booted from a DOS diskette.
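For Linux users, a rough sketch of the same idea as the MHDD step: write zeros over specific sectors, which is what the Dell text says clears media errors in free space. This is NOT MHDD and it is DESTRUCTIVE. Assumptions: 512-byte sectors, and the LBA is relative to whatever device or file you open (the physical-disk LBAs in the PERC log are not the same as LBAs on the virtual disk Linux sees). Only use it on LBAs you have confirmed lie in free space. The function name is hypothetical.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def zero_sectors(device_path, lba, count=1, sector_size=SECTOR_SIZE):
    """Overwrite `count` sectors starting at `lba` with zeros and return
    the number of bytes written. DESTRUCTIVE: data at those LBAs is lost.
    Hypothetical helper; the LBA is relative to `device_path`."""
    fd = os.open(device_path, os.O_WRONLY)  # no O_TRUNC: only the range changes
    try:
        written = os.pwrite(fd, bytes(sector_size * count), lba * sector_size)
        os.fsync(fd)  # make sure the write actually reaches the device
    finally:
        os.close(fd)
    return written
```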


-----Original Message-----
From: Bond Masuda [mailto:bond.masuda at jlbond.com] 
Sent: Monday, May 03, 2010 9:05 PM
To: Fischer, Patrick
Cc: linux-poweredge-Lists
Subject: RE: how to get rid of bad blocks in a file on PERC 5/I?

Thanks Patrick for your reply.

I know my original message was long, so perhaps it was missed, but I
did run a consistency check, at least twice. However, after each CC
run, we tried dd_rescue on the file in question and still had
unreadable blocks. I was expecting one of two things: 1) the
consistency check reporting back all sorts of problems, or 2) the
unreadable blocks going away. Neither was the case, and hence I decided
to reach out.
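A rough Python equivalent of what dd_rescue reports, handy for re-testing the file after each CC run: read the file block by block and record the ranges that fail with EIO, instead of stopping at the first error. The 4096-byte block size and the function name are assumptions.

```python
import errno
import os


def unreadable_ranges(path, block_size=4096):
    """Return (offset, length) pairs of `path` that fail to read with EIO.
    Hypothetical helper; block_size is an assumption."""
    bad = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in range(0, size, block_size):
            length = min(block_size, size - offset)
            try:
                os.pread(fd, length, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # only I/O errors are treated as bad blocks
                bad.append((offset, length))
    finally:
        os.close(fd)
    return bad
```

An empty list means every block of the file could be read.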

I had forgotten about "action=exportlog", thanks for reminding me about
that. This is what I found:

04/29/10 14:45:10: EVT#17279-04/29/10 14:45:10:  97=Puncturing bad block on PD 03(e0/s3) at 33334430
04/29/10 14:45:10: EVT#17280-04/29/10 14:45:10:  97=Puncturing bad block on PD 06(e0/s6) at 33334430
04/29/10 14:45:11: EVT#17282-04/29/10 14:45:11:  97=Puncturing bad block on PD 04(e0/s4) at 33334430
04/29/10 14:45:11: EVT#17283-04/29/10 14:45:11:  97=Puncturing bad block on PD 06(e0/s6) at 33334430
04/29/10 14:45:11: EVT#17284-04/29/10 14:45:11:  97=Puncturing bad block on PD 03(e0/s3) at 33334430
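A sketch that parses "Puncturing bad block" events like the ones above and groups them by LBA, so the punctured-stripe pattern (the same LBA showing up on several disks) is easy to spot. The regular expression is written against these five lines only, and it also tolerates line wrapping from a mail client; whether the LBA is hex or decimal isn't stated, so it is kept as a string. Function names are hypothetical.

```python
import re
from collections import defaultdict

# Written against the log lines above; \s+ also tolerates wrapped lines
# such as "block\non\nPD ...". Not a full grammar of the controller log.
PUNCTURE = re.compile(
    r"Puncturing\s+bad\s+block\s+on\s+PD\s+\w+\(e(\d+)/s(\d+)\)\s+at\s+(\w+)")


def punctured_lbas(log_text):
    """Map each LBA (raw string from the log) to the set of
    (enclosure, slot) pairs it was punctured on."""
    hits = defaultdict(set)
    for enclosure, slot, lba in PUNCTURE.findall(log_text):
        hits[lba].add((int(enclosure), int(slot)))
    return dict(hits)


def multi_disk_lbas(hits):
    """LBAs reported on more than one disk: the punctured-stripe pattern."""
    return sorted(lba for lba, disks in hits.items() if len(disks) > 1)
```

On the excerpt above this groups all five events under LBA 33334430 on enclosure 0, slots 3, 4 and 6.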

-Bond

> -----Original Message-----
> From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Patrick_Fischer at dell.com
> Sent: Monday, May 03, 2010 2:46 AM
> To: tim at seoss.co.uk; adam.nielsen at uq.edu.au
> Cc: linux-poweredge at lists.us.dell.com
> Subject: RE: how to get rid of bad blocks in a file on PERC 5/I?
> 
> Consistency Check:
> Check consistency. A check consistency determines the integrity of a
> virtual disk's redundant data. When necessary, this feature rebuilds
> the redundant information.
> Source:
> http://support.dell.com/support/edocs/software/svradmin/6.2/en/OMSS/cntrls.htm#wp681476
> 
> The remapping of bad sectors should run automatically if a sector
> can't be written and the controller tries to write to or read from it.
> 
> Please always check the controller log if you get filesystem errors
> like the ones you described.
> Check the log for bad LBAs on the disk, e.g. by searching the log file
> for "bad".
> Check the count of the LBAs and check whether they occur on multiple
> disks, as with a punctured stripe....
> 
> You can get the log via MegaCli or OpenManage:
> 
> Server Administrator cli:
> omconfig storage controller action=exportlog controller=0
> where controller=0 is the ID of the affected controller
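A small sketch that wraps the omconfig command quoted above and does the "search for bad" step on the exported text. It assumes OMSA is installed and `omconfig` is on the PATH; where the exported file ends up is not covered in this thread, so the search function takes the log text rather than a path. Function names are hypothetical.

```python
import subprocess


def export_controller_log(controller_id=0):
    """Run the OMSA export command quoted above and return its stdout.
    Assumes omconfig is installed and on PATH."""
    cmd = ["omconfig", "storage", "controller", "action=exportlog",
           f"controller={controller_id}"]
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout


def bad_lines(log_text):
    """Case-insensitive search for "bad" in the exported log text."""
    return [line for line in log_text.splitlines() if "bad" in line.lower()]
```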



