Hard Drive Rebuild under Linux

Jean Lofts jean.lofts at eng.ox.ac.uk
Tue Jan 14 06:46:00 CST 2003


Steve

Many thanks for your response.

I received the replacement drive today and attempted the rebuild
again, with the same behaviour as before: the rebuild started OK
but stopped at around 2% done in state BAD. I have now discovered
the following in the controller log:


[56]: parallel rebuild container 0
[57]: ID(0:08:0); Error Event [command:0x28]
[58]: ID(0:08:0); Medium Error, Block Range 1672000 : 1672063
[59]: ID(0:08:0); Unrecovered Read Error
.
.
.
[77]: Container 0 failed REBUILD task: I/O error - drive 0:8:0 fa
[78]: iled
.
.
.
Presumably this is telling me that the rebuild of SCSI ID 0 failed
because of read errors on SCSI ID 8?
Is there anything I can do to recover from this situation other than
reinstalling the O/S and restoring the data? I have never seen errors in
the system log to indicate that there was a problem with ID 8.
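
As a rough cross-check of that reading, the log entries can be picked
apart with a few lines of Python. This is only a sketch: it assumes the
"Block Range" values are block addresses on the disk at 0:8:0 (the log
does not say so explicitly) and that blocks are 512 bytes, as disk list
reports. The function name medium_errors and the constants are made up
for illustration and are not part of afacli.

```python
import re

# Controller log excerpt copied from the message above.
LOG = """\
[56]: parallel rebuild container 0
[57]: ID(0:08:0); Error Event [command:0x28]
[58]: ID(0:08:0); Medium Error, Block Range 1672000 : 1672063
[59]: ID(0:08:0); Unrecovered Read Error
"""

BYTES_PER_BLOCK = 512     # "Bytes/Block" column of disk list
DISK_BLOCKS = 71132959    # "Blocks" column of disk list

# Matches entries such as "ID(0:08:0); Medium Error, Block Range A : B".
MEDIUM_ERROR = re.compile(
    r"ID\((\d+):(\d+):(\d+)\); Medium Error, Block Range (\d+) : (\d+)"
)


def medium_errors(log_text):
    """Return ((bus, id, lun), first_block, last_block) per Medium Error."""
    found = []
    for line in log_text.splitlines():
        match = MEDIUM_ERROR.search(line)
        if match:
            bus, scsi_id, lun, first, last = (int(g) for g in match.groups())
            found.append(((bus, scsi_id, lun), first, last))
    return found


for device, first, last in medium_errors(LOG):
    # Size of the unreadable range, assuming both ends are inclusive.
    span_kb = (last - first + 1) * BYTES_PER_BLOCK / 1024
    # How far into the disk the range starts, as a percentage.
    position = 100 * first / DISK_BLOCKS
    print(device, f"{span_kb:.0f}KB unreadable, {position:.1f}% into the disk")
```

Under those assumptions the unreadable range is 64 blocks, i.e. one 32KB
chunk, and it lies a little over 2% of the way into the disk, which would
be consistent with the rebuild stopping at around 2% done.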

Jean

Steve_Boley at Dell.com wrote:

> Your error came after the rebuild: once it had obviously failed, you
> needed to do a controller rescan, and you would then have seen the drive
> in failed status again.  You need to replace ID 0.  Pull it while the
> system is rebooting, before the PERC posts, and have it come up as a
> missing member, then replace it as soon as possible.
> Steve
>
> -----Original Message-----
> From: Jean Lofts [mailto:jean.lofts at eng.ox.ac.uk]
> Sent: Friday, January 10, 2003 7:51 AM
> To: linux-poweredge at exchange.dell.com
> Subject: Hard Drive Rebuild under Linux
>
> Hello All
>
> I have a Dell PE 4400 with PERC 3/Di running RedHat Linux 6.2.
> It has eight drives configured as a single RAID-5 container.
>
> On reboot the system reported
>
> following containers have missing members and are degraded
> container #0 RAID 5 237.29GB critical
>
> afacli reported
>
> AFA0> container list
> Executing: container list
> Num          Total  Oth Chunk          Scsi   Partition
> Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
> ----- ------ ------ --- ------ ------- ------ -------------
>  0    RAID-5  237GB       32KB Open    0:00:0 64.0KB!33.8GB
>  /dev/sda                              0:01:0 64.0KB:33.8GB
>                                        0:02:0 64.0KB:33.8GB
>                                        0:03:0 64.0KB:33.8GB
>                                        0:04:0 64.0KB:33.8GB
>                                        0:05:0 64.0KB:33.8GB
>                                        0:08:0 64.0KB:33.8GB
>                                        0:09:0 64.0KB:33.8GB
>
> AFA0> disk list
> Executing: disk list
>
> B:ID:L  Device Type     Blocks    Bytes/Block Usage            Shared
> ------  --------------  --------- ----------- ---------------- ------
> 0:00:0   Disk            71132959  512         Initialized     NO
> 0:01:0   Disk            71132959  512         Initialized     NO
> 0:02:0   Disk            71132959  512         Initialized     NO
> 0:03:0   Disk            71132959  512         Initialized     NO
> 0:04:0   Disk            71132959  512         Initialized     NO
> 0:05:0   Disk            71132959  512         Initialized     NO
> 0:08:0   Disk            71132959  512         Initialized     NO
> 0:09:0   Disk            71132959  512         Initialized     NO
>
> AFA0> disk show space
> Executing: disk show space
>
> Scsi B:ID:L Usage      Size
> ----------- ---------- -------------
>   0:00:0     Dead      64.0KB:33.8GB
>   0:00:0     Free      33.8GB:59.0KB
>   0:01:0     Container 64.0KB:33.8GB
>   0:01:0     Free      33.8GB:59.0KB
>   0:02:0     Container 64.0KB:33.8GB
>   0:02:0     Free      33.8GB:59.0KB
>   0:03:0     Container 64.0KB:33.8GB
>   0:03:0     Free      33.8GB:59.0KB
>   0:04:0     Container 64.0KB:33.8GB
>   0:04:0     Free      33.8GB:59.0KB
>   0:05:0     Container 64.0KB:33.8GB
>   0:05:0     Free      33.8GB:59.0KB
>   0:08:0     Container 64.0KB:33.8GB
>   0:08:0     Free      33.8GB:59.0KB
>   0:09:0     Container 64.0KB:33.8GB
>   0:09:0     Free      33.8GB:59.0KB
>
> After researching the newsgroups, I attempted to rebuild the failed
> drive as follows:
>
> disk remove dead_partitions (0,0,0)
> container set failover 0 (0,0,0)
>
> task list
>
> Controller Tasks
>
> TaskId Function Done%  Container State Specific1 Specific2
> ------ -------- ------- --------- ----- --------- ---------
>   100   Rebuild   0.1%     00     RUN   00000000  00000000
>
> So far so good. But the rebuild task only continued for about 5 min
> before task list reported that there were no tasks current.
> The previous task list that I had done only a minute earlier reported
> only about 1% done. After the rebuild finished, I could see no
> activity on the failed disk when there was clearly activity on the
> other seven drives.
>
> afacli now reports
>
> AFA0> container list
> Executing: container list
> Num          Total  Oth Chunk          Scsi   Partition
> Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
> ----- ------ ------ --- ------ ------- ------ -------------
>  0    RAID-5  237GB       32KB Open    0:00:0 64.0KB:33.8GB
>  /dev/sda                              0:01:0 64.0KB:33.8GB
>                                        0:02:0 64.0KB:33.8GB
>                                        0:03:0 64.0KB:33.8GB
>                                        0:04:0 64.0KB:33.8GB
>                                        0:05:0 64.0KB:33.8GB
>                                        0:08:0 64.0KB:33.8GB
>                                        0:09:0 64.0KB:33.8GB
>
> which looks good, but
>
> AFA0> enclosure show slot
> Executing: enclosure show slot
>
> Enclosure
> ID (B:ID:L) Slot scsiId Insert  Status
> ----------- ---- ------ ------- ------------------------------------------
>  0  0:06:0   0   0:00:0     0   OK FAILED CRITICAL ACTIVATE
>  0  0:06:0   1   0:01:0     0   OK FAILED CRITICAL ACTIVATE
>  0  0:06:0   2   0:02:0     0   OK FAILED CRITICAL ACTIVATE
>  0  0:06:0   3   0:03:0     0   OK FAILED CRITICAL ACTIVATE
>  0  0:06:0   4   0:04:0     0   OK FAILED CRITICAL ACTIVATE
>  0  0:06:0   5   0:05:0     0   OK FAILED CRITICAL ACTIVATE
>  0  0:06:0   6   0:08:0     0   OK FAILED CRITICAL ACTIVATE
>  0  0:06:0   7   0:09:0     0   OK FAILED CRITICAL ACTIVATE
>
> would seem to indicate that there is still a problem.
>
> I have now rebooted the system and I receive the same message from
> the controller
>
> following containers have missing members and are degraded
> container #0 RAID 5 237.29GB critical
>
> On attempting to view the container information in the Configuration
> Utility, I am presented with the message
>
> configuration changes have been detected in the system. If you reject
> the change you will not be able to modify the current configuration.
> If you accept it will be updated to the current configuration.
>
> Is there any risk in choosing accept here? I currently have a working
> system and don't want to risk doing further damage.
>
> Any suggestions or advice would be most welcome. I would prefer
> to rebuild from the afacli utility if possible, but will take
> the system down and rebuild in the Configuration Utility if necessary.
>
> Thanks
>
> Jean
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq or search the list
> archives at http://lists.us.dell.com/htdig/

--
Jean Lofts                     E-mail: jean.lofts at eng.ox.ac.uk

Computing Officer (Medical Vision Laboratory)
Dept of Engineering Science
University of Oxford
Parks Rd                            Tel: (0)1865-280921
Oxford OX1 3PJ UK                   Fax: (0)1865-280922





