RAID-5 and database servers

Jefferson Ogata poweredge at
Fri Mar 12 01:06:53 CST 2010

On 2010-03-12 04:26, Craig White wrote:
> On Fri, 2010-03-12 at 02:23 +0000, Jefferson Ogata wrote:
>> On 2010-03-11 22:23, Matthew Geier wrote:
>>> I've had a disk fail in such a way on a SCSI array that all disks on
>>> that SCSI bus became unavailable simultaneously. When half the disks
>>> dropped off the array at the same time, it gave up and corrupted the RAID
>>> 5 metadata so that even after removing the offending drive, the array
>>> didn't recover.
>> I also should point out (in case it isn't obvious), that that sort of
>> failure would take out the typical RAID 10 as well.
> ----
> ignoring that a 2nd failed disk on RAID 5 is always fatal and only 50%
> fatal on RAID 10, I suppose that would be true.

The poster wrote that all of the disks on a bus failed, not just a
second one. Depending on the RAID structure, this could take out a RAID
10 100% of the time.
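To make "depending on the RAID structure" concrete, here is a small Python sketch that checks whether a layout survives every disk on one bus dropping out at once. The two-bus, 12-disk enclosure and the disk numbering are hypothetical. It assumes each mirror pair (or RAID 5 set) can absorb exactly one missing member.

```python
def survives_bus_failure(groups, bus_of, failed_bus, tolerance=1):
    """Return True if the array survives every disk on failed_bus vanishing at once.

    groups:    list of disk-id lists (mirror pairs for RAID 10, RAID 5 sets for RAID 50)
    bus_of:    dict mapping disk id -> bus id
    tolerance: missing members each group can absorb (1 for 2-way mirrors / RAID 5 sets)
    """
    return all(
        sum(bus_of[d] == failed_bus for d in g) <= tolerance for g in groups
    )

# Hypothetical enclosure: disks 0-5 on bus 0, disks 6-11 on bus 1.
bus_of = {d: 0 if d < 6 else 1 for d in range(12)}

# RAID 10 with each mirror split across the buses: (0,6), (1,7), ...
split_pairs = [[d, d + 6] for d in range(6)]
# RAID 10 with each mirror on a single bus: (0,1), (2,3), ...
same_bus_pairs = [[d, d + 1] for d in range(0, 12, 2)]

print(survives_bus_failure(split_pairs, bus_of, failed_bus=0))
print(survives_bus_failure(same_bus_pairs, bus_of, failed_bus=0))
```

Losing bus 0 leaves one member of every split pair, so that RAID 10 survives; with mirrors on the same bus, the same failure takes out three whole pairs.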

In your "second disk" scenario, comparing RAID 5 with RAID 10 in terms
of failure likelihood isn't fair: RAID 10 stripes across several mirrors,
so the fair comparison is with RAID 50, which stripes across several
RAID 5 sets. And the odds depend on the number of disks and the RAID
structure.

Suppose you have 12 disks arranged as a 6x2 RAID 10 (six 2-disk mirrors),
and the same number of disks as a 2x6 RAID 50 (two 6-disk RAID 5 sets).
When the second disk fails, the odds of loss are:

- RAID 50: 5/11.
- RAID 10: 1/11.

If instead we arrange the 12 disks as a 3x4 RAID 50 (three 4-disk RAID 5
sets), then the odds of loss when the second disk fails are:

- RAID 50: 3/11.
- RAID 10: 1/11.
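The counting behind these figures fits in a few lines. A minimal Python sketch, assuming every surviving disk is equally likely to be the next to fail and that each mirror pair or RAID 5 set survives exactly one failure; the function name is illustrative:

```python
from fractions import Fraction

def second_failure_loss_odds(n_disks: int, group_size: int) -> Fraction:
    """Odds that the 2nd failure destroys an array of equal groups that each
    survive one failure (2-way mirrors in RAID 10, RAID 5 sets in RAID 50).

    After the first failure, n_disks - 1 disks remain, and only the
    group_size - 1 surviving members of the degraded group are fatal.
    """
    return Fraction(group_size - 1, n_disks - 1)

for name, size in [("6x2 RAID 10", 2), ("2x6 RAID 50", 6), ("3x4 RAID 50", 4)]:
    print(name, second_failure_loss_odds(12, size))
```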

Our 3x4 RAID 50 can now tolerate a third disk failure as well. Given that
both arrays survived the second failure, the odds of loss when the third
disk fails are:

- RAID 50: 6/10.
- RAID 10: 2/10.
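These conditional odds can also be checked by brute force. The Python sketch below enumerates every ordered sequence of failures over 12 disks, discards sequences that already lost the array, and counts how often the next failure is fatal. It assumes all failure orders are equally likely and that groups are consecutive disk numbers; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def groups_of(n_disks, group_size):
    """Split disks 0..n_disks-1 into consecutive groups (mirrors or RAID 5 sets)."""
    return [range(i, i + group_size) for i in range(0, n_disks, group_size)]

def array_lost(failed, groups, tolerance=1):
    """True if any group has lost more members than it can tolerate."""
    return any(sum(d in failed for d in g) > tolerance for g in groups)

def odds_of_loss_on_failure(n_disks, group_size, k):
    """P(the k-th failure loses the array | the first k-1 failures did not)."""
    groups = groups_of(n_disks, group_size)
    survived = lost = 0
    for seq in permutations(range(n_disks), k):
        if array_lost(set(seq[:-1]), groups):
            continue  # condition on surviving the first k-1 failures
        survived += 1
        if array_lost(set(seq), groups):
            lost += 1
    return Fraction(lost, survived)

print("3x4 RAID 50, third failure:", odds_of_loss_on_failure(12, 4, 3))
print("6x2 RAID 10, third failure:", odds_of_loss_on_failure(12, 2, 3))
```

Fractions print in lowest terms, so 6/10 appears as 3/5 and 2/10 as 1/5.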

How often does this happen? It hasn't happened to me, and it hasn't
happened to anyone I know.

In the alternative fair comparison, RAID 5 vs. RAID 1, the second
failure kills both RAIDs 100% of the time.

And there's always RAID 6.

> So if Dell is selling a high quality hard drive with more than average
> durability and the anticipation that it is going to last longer under
> 24/7 usage, it's entirely reasonable to have to pay more than the
> cheapest dirt SATA drive you can find online. Of course you will have to
> live with the consequences if you go with the dirt cheap drive.
> Personally, I put a lot of value on my time and my customers' data.

I have hundreds of Dell disks online. They fail regularly. Often they
fail during system burn-in. For the kind of markup Dell is charging on
these drives, I don't think I should be finding dead ones after only 24
hours of operation. And a one-year warranty is just ridiculous.

> I read this article last year...
> and I had already forsaken RAID 5 but it pretty much confirmed what my
> experiences had been... that when I considered the life cycle of the
> installation, the time lost in waiting for file transfer, etc. on RAID
> 5, etc. that it was foolish for me to recommend RAID 5 to anyone. 

It's pretty clear you don't speak from any recent experience as far as
RAID 5 performance goes, and you yourself say as much when you say you
"had already forsaken RAID 5". Like Oracle, you're living in the past.
You should do some of your own benchmarks.

In any case, the argument in that article applies to RAID 10 as well; it
gives you better probabilities but eventually it will take too long to
rebuild mirrors and failure will be just as inevitable as with RAID 5.
Error rates will have to drop to prevent this, and no doubt they will,
far enough to make the article's argument moot. Eventually they will
drop to the point where we will be using RAID 0.
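The article's argument isn't reproduced here; assuming it is the usual one about unrecoverable read errors (UREs) hit while reading surviving disks during a rebuild, a rough Python sketch shows why RAID 10 improves the odds without escaping the problem. A RAID 10 rebuild reads only the mirror partner rather than every surviving member, but its success probability still falls as disks grow. The 10^-14 per-bit error rate, the 6-disk RAID 5, the disk sizes, and the independence of bit errors are all assumptions for illustration.

```python
import math

def rebuild_success_probability(bits_to_read: float, ure_per_bit: float) -> float:
    """P(no unrecoverable read error while reading bits_to_read bits).

    Treats every bit as an independent trial that fails with probability ure_per_bit.
    """
    # (1 - p) ** b, computed through log1p so tiny p and huge b stay accurate.
    return math.exp(bits_to_read * math.log1p(-ure_per_bit))

URE_PER_BIT = 1e-14  # assumed: 1 unrecoverable error per 10^14 bits read
RAID5_MEMBERS = 6    # assumed: a single 6-disk RAID 5 set

for disk_tb in (1, 2, 10, 100):  # hypothetical member-disk sizes in TB
    disk_bits = disk_tb * 1e12 * 8
    # A RAID 5 rebuild reads every surviving member in full.
    raid5 = rebuild_success_probability((RAID5_MEMBERS - 1) * disk_bits, URE_PER_BIT)
    # A RAID 10 rebuild reads only the failed disk's mirror partner.
    raid10 = rebuild_success_probability(disk_bits, URE_PER_BIT)
    print(f"{disk_tb:>4} TB disks: RAID 5 {raid5:.3f}, RAID 10 {raid10:.3f}")
```

Lowering `URE_PER_BIT` by a few orders of magnitude pushes both columns back toward 1, which is the sense in which better error rates make the argument moot.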

>  On top of that,
> it seems to me that RAID 10 smokes RAID 5 on every performance
> characteristic my clients are likely to use (and yes, that means
> databases). RAID 5 primarily satisfies the needs for maximum storage for
> the least amount of money and that was rarely what I need in a storage
> system for a server.

For a lot of access patterns, RAID 5 yields much better write bandwidth
than RAID 10. I don't know why you think RAID 10 "smokes" RAID 5. You
should grab a PERC 6 and a couple of MD1000s and try some different
configurations. I don't think you'll see any smoke in the margins, even
over the oddly limited gamut of access patterns your clients use.
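Which layout comes out ahead on writes depends on the access pattern. A back-of-the-envelope Python model, assuming one RAID 5 set versus 2-way RAID 10 over the same disks, no controller write-back cache, and hypothetical per-disk figures of 100 MB/s and 150 IOPS: full-stripe sequential writes favor RAID 5, while small random writes pay RAID 5's read-modify-write penalty.

```python
def write_ceilings(n_disks: int, disk_mb_s: float, disk_iops: float) -> dict:
    """Idealized write ceilings for n_disks as one RAID 5 set or as 2-way RAID 10.

    Ignores controller cache, stripe alignment, and seek behavior.
    """
    return {
        # Full-stripe sequential writes: RAID 5 gives one disk per stripe to
        # parity, while RAID 10 writes every block twice.
        "raid5_seq_mb_s": (n_disks - 1) * disk_mb_s,
        "raid10_seq_mb_s": n_disks / 2 * disk_mb_s,
        # Small random writes: a RAID 5 read-modify-write costs 4 disk I/Os
        # (read old data, read old parity, write data, write parity); RAID 10 costs 2.
        "raid5_rand_iops": n_disks * disk_iops / 4,
        "raid10_rand_iops": n_disks * disk_iops / 2,
    }

# Hypothetical 12-disk array of 100 MB/s, 150 IOPS drives.
result = write_ceilings(12, disk_mb_s=100, disk_iops=150)
print(result)
```

A controller write-back cache can absorb much of the random-write penalty in practice, which this model deliberately ignores; that is one more reason to benchmark real configurations.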

More information about the Linux-PowerEdge mailing list