My RAID failover test

BHohl at grotecompany.com
Sat Oct 16 10:46:00 CDT 2004


A few weeks ago I was looking for a basic write-up on how to install the
PERC CLI tool and test RAID5 failover.  I didn't find one, so I made the
following write-up, which may be useful to others.  If someone sees
something incorrect in it, please respond.  I did note that there was
recently a thread regarding hot versus cold swapping of disks.  In that
thread one person said that Dell tech support recommended cold swap, and
another referenced a link to a Dell support doc recommending hot swap
(http://support.dell.com/support/topics/global.aspx/support/kb/en/document?DN=1070984).
For my test I used cold swap, as I considered that to be less risky to
the hardware.


C - Install RAID PERC CLI software (afacli) and test failover.
1.Link to downloadable PERC CLI software for Linux:
http://support.dell.com/support/topics/global.aspx/support/kb/en/document?dn=1089105&c=us&l=en&s=gen&cs=

2.Install of CLI tool:
RPMs for CLI tool and snmp are inside afa-linux-app-A01.tar.gz
Unzip with KDE Ark or CLI tar tool.
Install with KDE Yast or CLI rpm tool.

From shell:
# tar -xzvf afa-linux-app-A01.tar.gz
# rpm -ivh afaapps-2.7-1.i386.rpm

3.Some basic lookups:
To open the PERC CLI FAST interface:
# afacli
FASTCMD>

To see the controller list (there is one controller [afa0] on this box):
FASTCMD> controller list

To open the afa0 controller on a read only basis:
FASTCMD> open /readonly=true afa0
Result is following command prompt:
AFA0>

Some basic lookups:
AFA0> container list
AFA0> disk list
AFA0> task list

AFA0> enclosure show slot
AFA0> disk show space
AFA0> disk show partition

AFA0> container show failover
AFA0> controller show automatic_failover


Disk light blinking to match physical disks to SCSI device IDs:
Use disk list to find SCSI device IDs.
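The first command below blinks the disk light for 5 seconds; the second
sets the blink time to 0 seconds, which should turn the blinking off.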
AFA0> disk blink <SCSI device ID> 5
AFA0> disk blink <SCSI device ID> 0


4.Adding a failover disk (hot spare)
Shut down computer.
Insert new disk.
Boot computer.

a)Use the disk list command to find the disk SCSI device ID:
filesrv1:~# afacli
FASTCMD> open afa0
AFA0> disk list
Get disk SCSI device ID from list

b)Initialize disk and verify:
AFA0> disk initialize <SCSI device ID>
AFA0> disk list

c)Make disk a global failover disk and verify:
If SCSI device ID of disk = (0,4,0) then command is:
AFA0> container set global_failover (0,4,0)

Verify success with following lookup:
AFA0> container show failover

Remove a global failover disk
AFA0>  (0,4,0)

d)Make sure automatic failover is enabled.
AFA0> controller show automatic_failover

If not enabled, enable as follows:
AFA0> controller set automatic_failover /failover_enabled=true




5.Test disk failover
a)Shut down computer; remove a disk that is part of the RAID5 array; boot
  computer.
If a hot spare is available (as in the above setup) it will automatically
  be added to the array.

b)Monitoring the rebuild of the array:
Open the FAST CLI.
The rebuild status should be displayed at the bottom of the console.
The "task list" command also shows the rebuild information.
The "enclosure show slot" command shows disk status information.

c)Before, during and after "enclosure show slot" information:

BEFORE REMOVING DISK:
AFA0> enclosure show slot
Executing: enclosure show slot
Enclosure
ID (B:ID:L) Slot scsiId Insert  Status
----------- ---- ------ ------- --------------------------------------
0  0:06:0   0   0:00:0     1   OK ACTIVATE
0  0:06:0   1   0:01:0     1   OK ACTIVATE
0  0:06:0   2   0:02:0     1   OK ACTIVATE
0  0:06:0   3   0:03:0     1   OK ACTIVATE
0  0:06:0   4   0:04:0     1   OK UNCONFIG HOTSPARE ACTIVATE

DURING ARRAY REBUILD:
AFA0> enclosure show slot
Executing: enclosure show slot
Enclosure
ID (B:ID:L) Slot scsiId Insert  Status
----------- ---- ------ ------- --------------------------------------
0  0:06:0   0   0:00:0     1   OK UNCONFIG EMPTY I/R READY NOTACTIVATE
0  0:06:0   1   0:01:0     1   OK REBUILD FAILED CRITICAL ACTIVATE
0  0:06:0   2   0:02:0     1   OK REBUILD FAILED CRITICAL ACTIVATE
0  0:06:0   3   0:03:0     1   OK REBUILD FAILED CRITICAL ACTIVATE
0  0:06:0   4   0:04:0     1   OK REBUILD FAILED CRITICAL HOTSPARE ACTIVATE

AFTER ARRAY REBUILD:
AFA0> enclosure show slot
Executing: enclosure show slot
Enclosure
ID (B:ID:L) Slot scsiId Insert  Status
----------- ---- ------ ------- --------------------------------------
0  0:06:0   0   0:255:0    0   OK UNCONFIG EMPTY I/R READY NOTACTIVATE
0  0:06:0   1   0:01:0     1   OK ACTIVATE
0  0:06:0   2   0:02:0     1   OK ACTIVATE
0  0:06:0   3   0:03:0     1   OK ACTIVATE
0  0:06:0   4   0:04:0     1   OK HOTSPARE ACTIVATEID
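
The three listings above differ only in the Status column, so a small
filter can pick out the rows that need attention.  The following is a
rough sketch, not part of the Dell tools: the function name
slots_needing_attention is made up for illustration, and the column
positions are assumed from the listings above.  It expects the output of
"enclosure show slot" to be pasted or saved as text and fed in on stdin.

```shell
#!/bin/sh
# Read "enclosure show slot" output on stdin and print the slot number,
# scsiId and status of every row whose status mentions REBUILD, FAILED
# or CRITICAL. Column positions are assumed from the listings above.
slots_needing_attention() {
    awk '
        # Data rows: numeric enclosure ID, then a B:ID:L field
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+:[0-9]+:[0-9]+$/ {
            status = ""
            for (i = 6; i <= NF; i++)
                status = (status == "" ? $i : status " " $i)
            if (status ~ /REBUILD|FAILED|CRITICAL/)
                print $3, $4, status
        }'
}

# Two rows copied from the "DURING ARRAY REBUILD" listing
slots_needing_attention <<'EOF'
0  0:06:0   0   0:00:0     1   OK UNCONFIG EMPTY I/R READY NOTACTIVATE
0  0:06:0   4   0:04:0     1   OK REBUILD FAILED CRITICAL HOTSPARE ACTIVATE
EOF
```

With the two sample rows, only the slot 4 row is printed.  Header and
separator lines are skipped because they do not start with a numeric
enclosure ID.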



6.Replacing a failed disk after automatic failover has rebuilt the array
with a hot spare:

d)Shut down computer; replace the failed disk; boot computer.
Use the "enclosure show slot" command to determine the SCSI device ID for
  the disk.
Use the "disk blink" command to determine the physical location of the
  disk.
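
Note that "enclosure show slot" prints the SCSI device ID as 0:04:0,
while the commands in this write-up take it as (0,4,0).  A small helper
along these lines can do the conversion.  This is only a sketch; the
function name scsi_id_to_tuple is made up for illustration and is not
part of afacli.

```shell
#!/bin/sh
# Convert a B:ID:L string as printed by "enclosure show slot"
# (e.g. 0:04:0) into the (B,ID,L) form used in the commands of this
# write-up (e.g. (0,4,0)).
scsi_id_to_tuple() {
    # Adding 0 in awk turns zero-padded fields such as 04 into plain numbers
    printf '%s\n' "$1" |
        awk -F: '{ printf "(%d,%d,%d)\n", $1 + 0, $2 + 0, $3 + 0 }'
}

scsi_id_to_tuple "0:04:0"    # prints (0,4,0)
```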

e)Check if replacement disk is initialized
AFA0> disk list

If not, initialize new disk as follows:
AFA0> disk initialize <SCSI device ID>

f)Remove the global failover designation for the original failover disk:
AFA0>  (0,4,0)

Verify success with following lookup:
AFA0> container show failover

AFA0> enclosure show slot
Note: disk (0,4,0) continued to show as a HOTSPARE from the "enclosure show
  slot" command   after the " (0,4,0)".  A "controller rescan" did not
  correct this problem but it was corrected after a reboot.

g)Make replacement disk a global failover disk and verify:
If SCSI device ID of disk = (0,0,0) then command is:
AFA0> container set global_failover (0,0,0)

Verify success with following lookup:
AFA0> container show failover
AFA0> enclosure show slot

h)Make sure automatic failover is enabled.
AFA0> controller show automatic_failover

If not enabled, enable as follows:
AFA0> controller set automatic_failover /failover_enabled=TRUE



7.Adding RAID event notification:
Follow instructions included in raid.cron.script obtained from
http://linux.dell.com/files/aacraid/aacraid_monitoring_script.txt.
Be sure to convert txt file to unix format (dos2unix <filename>).
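
The conversion matters because a script saved with DOS (CR LF) line
endings usually fails to start on Linux: the carriage return becomes part
of the interpreter name on the first line.  A quick check along these
lines can tell whether the conversion is still needed.  This is a sketch
only; the function name has_crlf is made up for illustration, and the
file name is the one mentioned above.

```shell
#!/bin/sh
# Return success (0) if the file contains at least one line ending in CR LF.
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

script=raid.cron.script    # file name taken from the text above
if [ -f "$script" ] && has_crlf "$script"; then
    dos2unix "$script"
fi
```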




