Hard Drive Rebuild under Linux

Steve_Boley at Dell.com
Fri Jan 10 09:15:01 CST 2003


After the rebuild failed, as it evidently did, you needed to do a
controller rescan; you would then have seen the drive in failed status
again.  You need to replace the drive at ID 0.  Pull it while the system
is rebooting, before the PERC posts, so that the container comes up with
a missing member, and then install the replacement as soon as possible.
Steve

-----Original Message-----
From: Jean Lofts [mailto:jean.lofts at eng.ox.ac.uk]
Sent: Friday, January 10, 2003 7:51 AM
To: linux-poweredge at exchange.dell.com
Subject: Hard Drive Rebuild under Linux


Hello All

I have a Dell PE 4400 with PERC 3/Di running RedHat Linux 6.2.
It has eight drives configured as a single RAID-5 container.

On reboot the system reported

following containers have missing members and are degraded
container #0 RAID 5 237.29GB critical

afacli reported

AFA0> container list
Executing: container list
Num          Total  Oth Chunk          Scsi   Partition
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
----- ------ ------ --- ------ ------- ------ -------------
 0    RAID-5  237GB       32KB Open    0:00:0 64.0KB!33.8GB
 /dev/sda                              0:01:0 64.0KB:33.8GB
                                       0:02:0 64.0KB:33.8GB
                                       0:03:0 64.0KB:33.8GB
                                       0:04:0 64.0KB:33.8GB
                                       0:05:0 64.0KB:33.8GB
                                       0:08:0 64.0KB:33.8GB
                                       0:09:0 64.0KB:33.8GB

AFA0> disk list
Executing: disk list

B:ID:L  Device Type     Blocks    Bytes/Block Usage            Shared
------  --------------  --------- ----------- ---------------- ------
0:00:0   Disk            71132959  512         Initialized     NO
0:01:0   Disk            71132959  512         Initialized     NO
0:02:0   Disk            71132959  512         Initialized     NO
0:03:0   Disk            71132959  512         Initialized     NO
0:04:0   Disk            71132959  512         Initialized     NO
0:05:0   Disk            71132959  512         Initialized     NO
0:08:0   Disk            71132959  512         Initialized     NO
0:09:0   Disk            71132959  512         Initialized     NO


AFA0> disk show space
Executing: disk show space

Scsi B:ID:L Usage      Size
----------- ---------- -------------
  0:00:0     Dead      64.0KB:33.8GB
  0:00:0     Free      33.8GB:59.0KB
  0:01:0     Container 64.0KB:33.8GB
  0:01:0     Free      33.8GB:59.0KB
  0:02:0     Container 64.0KB:33.8GB
  0:02:0     Free      33.8GB:59.0KB
  0:03:0     Container 64.0KB:33.8GB
  0:03:0     Free      33.8GB:59.0KB
  0:04:0     Container 64.0KB:33.8GB
  0:04:0     Free      33.8GB:59.0KB
  0:05:0     Container 64.0KB:33.8GB
  0:05:0     Free      33.8GB:59.0KB
  0:08:0     Container 64.0KB:33.8GB
  0:08:0     Free      33.8GB:59.0KB
  0:09:0     Container 64.0KB:33.8GB
  0:09:0     Free      33.8GB:59.0KB
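
The dead partition on 0:00:0 can also be picked out of this output by a
script rather than by eye. The following is a minimal Python sketch,
assuming the three-column row layout shown above; the names
DISK_SPACE_SAMPLE and dead_partitions are illustrative only and are not
part of afacli.

```python
# Rows copied from the "disk show space" output above (header plus four rows).
DISK_SPACE_SAMPLE = """\
Scsi B:ID:L Usage      Size
----------- ---------- -------------
  0:00:0     Dead      64.0KB:33.8GB
  0:00:0     Free      33.8GB:59.0KB
  0:01:0     Container 64.0KB:33.8GB
  0:01:0     Free      33.8GB:59.0KB
"""


def dead_partitions(text):
    """Return the B:ID:L address of every row whose Usage column is 'Dead'."""
    dead = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns: address, usage, offset:size.
        # The header has four fields and the dashed rule never equals "Dead".
        if len(fields) == 3 and fields[1] == "Dead":
            dead.append(fields[0])
    return dead


print(dead_partitions(DISK_SPACE_SAMPLE))  # prints ['0:00:0']
```

Running the same check again after a rebuild attempt would show whether
the partition has gone back to Dead, assuming the controller's view has
been refreshed first.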


After researching the newsgroups, I attempted to rebuild the failed drive
as follows

disk remove dead_partitions (0,0,0)
container set failover 0 (0,0,0)

task list

Controller Tasks

TaskId Function Done%  Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------
  100   Rebuild   0.1%     00     RUN   00000000  00000000


So far so good. But the rebuild task only continued for about 5 min
before task list reported that there were no tasks current.
The previous task list that I had done only a minute earlier reported
only about 1% done. After the rebuild finished, I could see no
activity on the failed disk when there was clearly activity on the
other seven drives.
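
An empty task list on its own does not say whether the rebuild completed
or was abandoned, so it can help to record the last percentage seen
before the task disappeared. The following is a minimal Python sketch
that parses the task list layout shown above. It assumes the column
order in that output; the names TASK_LIST_SAMPLE, parse_tasks and
rebuild_running are hypothetical and not part of afacli.

```python
import re

# Text copied from the "task list" output above.
TASK_LIST_SAMPLE = """\
TaskId Function Done%  Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------
  100   Rebuild   0.1%     00     RUN   00000000  00000000
"""

# A task row starts with a numeric task id, then function, percent done,
# container number and state. Header and rule lines do not match.
TASK_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+([\d.]+)%\s+(\d+)\s+(\S+)")


def parse_tasks(text):
    """Return one dict per task row found in the given task list text."""
    tasks = []
    for line in text.splitlines():
        match = TASK_ROW.match(line)
        if match:
            tasks.append({
                "task_id": int(match.group(1)),
                "function": match.group(2),
                "done_pct": float(match.group(3)),
                "container": int(match.group(4)),
                "state": match.group(5),
            })
    return tasks


def rebuild_running(text, container):
    """True if a Rebuild task in state RUN is listed for the container."""
    return any(
        task["function"] == "Rebuild"
        and task["container"] == container
        and task["state"] == "RUN"
        for task in parse_tasks(text)
    )


print(parse_tasks(TASK_LIST_SAMPLE)[0]["done_pct"])  # prints 0.1
print(rebuild_running(TASK_LIST_SAMPLE, 0))          # prints True
```

If the task vanishes while the last recorded figure is still near 1%,
as happened here, that suggests the rebuild stopped early rather than
finished.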

afacli now reports

AFA0> container list
Executing: container list
Num          Total  Oth Chunk          Scsi   Partition
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
----- ------ ------ --- ------ ------- ------ -------------
 0    RAID-5  237GB       32KB Open    0:00:0 64.0KB:33.8GB
 /dev/sda                              0:01:0 64.0KB:33.8GB
                                       0:02:0 64.0KB:33.8GB
                                       0:03:0 64.0KB:33.8GB
                                       0:04:0 64.0KB:33.8GB
                                       0:05:0 64.0KB:33.8GB
                                       0:08:0 64.0KB:33.8GB
                                       0:09:0 64.0KB:33.8GB

which looks good, but

AFA0> enclosure show slot
Executing: enclosure show slot

Enclosure
ID (B:ID:L) Slot scsiId Insert  Status
----------- ---- ------ ------- ------------------------------------------
 0  0:06:0   0   0:00:0     0   OK FAILED CRITICAL ACTIVATE
 0  0:06:0   1   0:01:0     0   OK FAILED CRITICAL ACTIVATE
 0  0:06:0   2   0:02:0     0   OK FAILED CRITICAL ACTIVATE
 0  0:06:0   3   0:03:0     0   OK FAILED CRITICAL ACTIVATE
 0  0:06:0   4   0:04:0     0   OK FAILED CRITICAL ACTIVATE
 0  0:06:0   5   0:05:0     0   OK FAILED CRITICAL ACTIVATE
 0  0:06:0   6   0:08:0     0   OK FAILED CRITICAL ACTIVATE
 0  0:06:0   7   0:09:0     0   OK FAILED CRITICAL ACTIVATE


would seem to indicate that there is still a problem.


I have now rebooted the system and I receive the same message from
the controller

following containers have missing members and are degraded
container #0 RAID 5 237.29GB critical

On attempting to view the container information in the Configuration
Utility, I am presented with the message

configuration changes have been detected in the system. If you reject
the change you will not be able to modify the current configuration.
If you accept it will be updated to the current configuration.

Is there any risk in choosing accept here? I currently have a working
system and don't want to risk doing further damage.

Any suggestions or advice would be most welcome. I would prefer
to rebuild from the afacli utility if possible, but will take
the system down and rebuild in the Configuration Utility if necessary.

Thanks

Jean


_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
http://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq or search the list
archives at http://lists.us.dell.com/htdig/



