Preventing I/O starvation on MD1000s triggered by a failed disk.
hescominsoon at emmanuelcomputerconsulting.com
Tue Aug 24 06:54:40 CDT 2010
On 8/23/2010 10:23 PM, Jeff Ewing wrote:
> I had a 1TB SATA disk fail on an NFS server running RHEL5.2. A rebuild onto a global hot spare was triggered. One hour later, when the rebuild was 42% complete, serviced NFS requests dropped from 20 per second to zero. CPU2 went to 100% utilization, in an I/O wait state. Soon after, the internal drives became read only and the server needed to be power reset through the DRAC (server was not configured to take crash dumps).
> This hardware configuration had been in production and stable for many months.
> How could this be prevented in future?
> Server Configuration
> Dell PowerEdge 2950
> Two Quad core E5440 CPUs
> 16 GB RAM
> Red Hat Enterprise Linux Version 5.2
> Kernel 2.6.18-92.1.6.el5 (x86_64)
> PERC Driver : 00.00.03.21
> PERC Firmware : 6.2.0-0013
> Dell Support (Server/MD1000) Pro Support for IT
> Storage configuration:
> 2 * PERC6E with two MD1000s attached to each
> Controller 1:
> MD1000 with SAS 400GB 10K RPM
> MD1000 with SATA 1 TB 7.2K RPM
> Controller 2:
> MD1000 with SATA 750GB 7.2K RPM
> MD1000 with SATA 2 TB 7.2K RPM
> PERC6E Controller Configurations:
> Controller Rebuild Rate : 30%
> Three RAID 5 Virtual Disks on each MD1000
> (5 disks / 5 disks / 4 disks + 1 Hot Spare)
> Read Policy : No Read Ahead
> Write Policy : Write Back
> Jeff Ewing
Looks like a classic URE (unrecoverable read error) to me. Modern SATA
drives are specified at roughly one unreadable sector per 10^14 bits
read, which works out to about one per 12 TB.
With these huge-capacity disks it is more and more likely that a RAID 5
rebuild will hit a URE, at which point the rebuild fails and the whole
array dies. RAID 6 is the best option right now to get around it.
Consistency checks aren't going to get around this, since a rebuild
still has to read back every surviving disk in the array.
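
To put rough numbers on the rebuild risk described above, here is a
minimal Python sketch. It assumes the spec-sheet rate of one URE per
10^14 bits, treats every bit read as an independent trial, and counts
disk sizes in decimal TB. The function name and the choice of the 4-disk
and 5-disk RAID 5 sets of 1 TB drives are illustrative assumptions;
real drives often do better than the spec figure, so read the result as
a pessimistic estimate and not a measurement.

```python
import math

def p_ure_during_rebuild(surviving_disks, disk_tb, ure_per_bit=1e-14):
    """Chance of at least one URE while reading every surviving disk once.

    Assumption: each bit fails independently at the spec-sheet rate.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8
    # log1p/expm1 keep precision with such a tiny per-bit probability
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

if __name__ == "__main__":
    # RAID 5 sets of 4 and 5 disks, 1 TB each, with one disk failed
    for total_disks in (4, 5):
        p = p_ure_during_rebuild(total_disks - 1, 1.0)
        print(f"{total_disks}-disk RAID 5, 1 TB drives: {p:.1%}")
```

Under those assumptions a 5-disk set of 1 TB drives has roughly a 27%
chance of hitting a URE during rebuild, and a 4-disk set about 21%.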