PERC 4e/DC in 2850 - lost 1 disk, RAID5 array failed
Fran Fabrizio
fran at cis.uab.edu
Thu Jul 6 22:06:08 CDT 2006
I had a disturbing experience with a RAID5 array on a PERC 4e/DC in a
2850 today. I was sitting in my office when I heard the PERC's alarm go
off, so I went into the server room to discover that one of the 5 drives
in the array was blinking amber. This server is a VMware ESX host, and
I have an identical one as well, so I calmly went about shutting down
the virtual machines one by one and copying their disk files over to
the other host, figuring the degraded array would at least keep serving
data in the meantime.
The first couple of virtual machine disk files went fine, but when I got
to the third and fourth ones, it would not let me copy them, reporting
"Device or resource busy". The virtual machines corresponding to those
disk files were completely dead - could not access any services on them,
could not log in, and pulling up the console showed a blank screen.
Then the filesystem on the VMware host itself started acting up, hanging
midway through commands, etc.
In short, the RAID5 did not work as advertised. My understanding is
that it should survive one disk failing and continue to serve data from
this degraded state; in fact, this is one of the major reasons I chose
RAID5. Am I misunderstanding something here, or did my PERC 4e/DC
completely fail to do its job?
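As a rough illustration of why a 5-disk RAID5 is expected to survive a
single disk failure, here is a minimal Python sketch of XOR parity on
one stripe. The block contents, the helper name xor_blocks, and the
choice of which disk fails are all made up for the example; this shows
only the general parity idea, not how the PERC firmware actually lays
out or rebuilds data.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 5-disk array: 4 data blocks + 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing one disk (index 2 here, chosen arbitrarily).
failed = 2
survivors = [blk for i, blk in enumerate(stripe) if i != failed]

# XOR of the four surviving blocks reproduces the missing block, which is
# how a degraded array can keep answering reads for the failed disk.
rebuilt = xor_blocks(survivors)
assert rebuilt == stripe[failed]
```

The same calculation is what a rebuild onto a replacement disk does
for every stripe, which is why, in principle, it can run while the
array stays online.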
I eventually had to hard reboot the server, and upon reboot, the PERC
complained that one disk had failed and that the array was in a degraded
state. Since it did not want to serve up the data, I'm now trying to
rebuild that disk from the BIOS, but I thought all of this could be done
online, while still serving data, without 12 hours of downtime while
the disk rebuilds!
Am I not understanding this, or did this PERC completely fail?
Thanks,
Fran