high latency for tailored random I/O scenario with PERC6/i - not seen with older RAID controllers (guy)
jal at mdacorporation.com
Mon May 17 13:19:52 CDT 2010
> We have some issues with random I/O running on recent Nehalem-based
> PowerEdge blades (I think these are M610 or M710, but I'm not sure if
> it's relevant) that we do not see on slightly older (>5 months) blades.
> Here are the details:
> The setup includes 2 hard disks (approx. 150GB in size) set in a
> RAID-0 configuration under a Dell PERC6/i RAID controller.
> Operating system: CentOS 5.4.
> The scenario: we run random I/O against one of several Linux logical
> volumes (LVM) created on top of the RAID-0 volume (the partition is
> about 240GB), using Iometer. We have 12 parallel 512-byte write
> requests plus 5 or 6 parallel 512KB read requests, all of them random
> within a specified range of that partition. The I/O is generated
> using direct I/O, so we know Linux's buffer cache is not involved.
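For anyone trying to reproduce this without Iometer, the mix described above can be approximated with a short script. This is a minimal sketch, assuming Python 3 on Linux (`os.O_DIRECT` and `os.preadv` are Linux-specific), a 512-byte logical block size, and a scratch device path you supply yourself. Unlike Iometer, it issues the requests one at a time rather than 12 + 6 in parallel, so it only illustrates the alignment rules and the offset/size mix, not the concurrency. The writes overwrite data at the target.

```python
import mmap
import os
import random
import time

SECTOR = 512             # assumed logical block size; O_DIRECT needs aligned offsets/lengths
SMALL_WRITE = 512        # 512-byte writes, as in the report
LARGE_READ = 512 * 1024  # 512 KiB reads, as in the report


def random_aligned_offset(rng, range_bytes, io_size, align=SECTOR):
    """Random offset in [0, range_bytes - io_size], aligned to `align`."""
    if io_size % align or io_size > range_bytes:
        raise ValueError("io_size must be a multiple of align and fit in the range")
    slots = (range_bytes - io_size) // align + 1
    return rng.randrange(slots) * align


def build_batch(rng, range_bytes, writes=12, reads=6):
    """One batch of (kind, offset, size) ops: small writes mixed with large reads."""
    ops = [("write", random_aligned_offset(rng, range_bytes, SMALL_WRITE), SMALL_WRITE)
           for _ in range(writes)]
    ops += [("read", random_aligned_offset(rng, range_bytes, LARGE_READ), LARGE_READ)
            for _ in range(reads)]
    rng.shuffle(ops)
    return ops


def run_batch(path, ops):
    """Issue ops sequentially with O_DIRECT (Linux only) and return per-op latencies.

    WARNING: the writes overwrite whatever is at `path`; use a scratch volume only.
    """
    buf = mmap.mmap(-1, LARGE_READ)  # anonymous mapping is page-aligned, as O_DIRECT needs
    fd = os.open(path, os.O_RDWR | os.O_DIRECT)
    latencies = []
    try:
        for kind, offset, size in ops:
            view = memoryview(buf)[:size]
            start = time.monotonic()
            if kind == "write":
                os.pwrite(fd, view, offset)
            else:
                os.preadv(fd, [view], offset)
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
    return latencies
```

The `range_bytes` argument plays the role of the "specified range" in the test: a few GB for the well-behaved case, a few tens of GB for the slow one.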
> When the range we access is small (a few GB), the latency of the
> I/O operations looks OK (the worst case is a fraction of a second).
> When we let the test generate the same random I/O over a much larger
> part of the disk (a few tens of GB), we start seeing I/O requests with
> latencies of up to 2.6 seconds. When we ran the same test on top of a
> file system (ext3), still with direct I/O (O_DIRECT), the worst-case
> latency increased by a factor of 5 or more.
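Since the comparison between old and new blades hinges on the single worst case (about half a second versus 2.6 seconds), it may help to also report percentiles, so that a handful of outliers can be told apart from a generally slower controller. A minimal sketch using the nearest-rank percentile definition; the sample values are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies):
    """Median, 99th percentile and maximum of a list of latencies (seconds)."""
    return {
        "p50": percentile(latencies, 50),
        "p99": percentile(latencies, 99),
        "max": max(latencies),
    }


# Made-up example: 99 fast requests and one 2.6 s outlier.
example = [0.01] * 99 + [2.6]
print(summarize(example))  # {'p50': 0.01, 'p99': 0.01, 'max': 2.6}
```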
> Note: this problem is seen only on blades purchased in the last 5
> months. We have some slightly older blades (supposedly of similar
> models) that do not exhibit this problem (i.e. with the same test,
> the latency on them doesn't go above about half a second). Those
> older blades seem to have a somewhat older version of the PERC6 RAID
> controller. Both the old and new blades run exactly the same
> operating system software (they are all installed via kickstart and
> further configured using the same software, so their operating system
> configuration should be identical).
It may well be a firmware problem in your PERC. If you can verify that the firmware revisions differ, you can open a call with Dell and have them look into it.
The fix would be to reflash the problematic PERC with the "working" firmware version. Note that Dell may ask you to run some tests with either version, so you may want to keep it on the bad version for a while.
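Checking whether the firmware revisions actually differ is easy to script once each blade's controller report is saved as text. A minimal sketch, assuming the report consists of `Name : Value` lines with a `Firmware Version` field (the layout is assumed to resemble OpenManage's `omreport storage controller` output; adjust the key to whatever your tool prints). With several controllers in one report, only the first value is used. The version strings in the example are made up.

```python
def parse_report(text):
    """Parse 'Name : Value' lines into a dict, keeping the first value of each name."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields.setdefault(name.strip(), value.strip())
    return fields


def compare_firmware(old_report, new_report, key="Firmware Version"):
    """Return (differs, old_version, new_version) for two saved controller reports."""
    old_fw = parse_report(old_report).get(key)
    new_fw = parse_report(new_report).get(key)
    return old_fw != new_fw, old_fw, new_fw


# Made-up reports; the field layout is an assumption, not verified tool output.
OLD = "ID : 0\nName : PERC 6/i Integrated\nFirmware Version : 1.0.0-0001\n"
NEW = "ID : 0\nName : PERC 6/i Integrated\nFirmware Version : 2.0.0-0002\n"
print(compare_firmware(OLD, NEW))  # (True, '1.0.0-0001', '2.0.0-0002')
```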
> Note 2: we tried disabling the write cache via the BIOS (changed it
> to write-through), and we verified that read-ahead is not enabled.
> This did not change the test results.
> Has anyone seen similar behavior? Any ideas what we should check in
> the configuration that could cause this, or whether this is a PERC6/i bug?
Is this test (lots of small I/Os plus lots of large I/O transfers) just a benchmark, or is it really similar to your actual application workload?
Just because you can make it look bad doesn't mean it is...
Another fix might be to do the RAID-0 in software (using LVM or MD). The throughput I've seen from MD with large stripe sizes is much higher than anything a PERC 6 can deliver. However, that was for large streaming I/Os, not a mix of big and small ones, and with XFS rather than ext2/3/4.
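To illustrate why the stripe (chunk) size matters for a mix like this, here is a minimal sketch of plain RAID-0 address mapping, where logical chunks are dealt round-robin across equal-sized member disks. It is a simplification (MD's handling of unequal member sizes is not modeled) and the function name is made up. With a 64 KiB chunk, a 512 KiB read is split across both disks; with a 1 MiB chunk, an aligned 512 KiB read fits within one chunk on a single disk. Sector-aligned 512-byte writes never straddle a chunk boundary as long as the chunk size is a multiple of 512. For MD, the chunk size is set with `mdadm --chunk`.

```python
def raid0_map(offset, length, chunk, ndisks):
    """Split a logical byte range into per-disk pieces for a simple RAID-0.

    Assumes equal-sized members and round-robin chunk placement.
    Returns a list of (disk_index, offset_on_disk, piece_length).
    """
    if chunk <= 0 or ndisks <= 0:
        raise ValueError("chunk and ndisks must be positive")
    pieces = []
    end = offset + length
    while offset < end:
        chunk_no, within = divmod(offset, chunk)
        disk = chunk_no % ndisks  # chunks rotate across the members
        row = chunk_no // ndisks  # which stripe row the chunk sits in
        piece = min(chunk - within, end - offset)
        pieces.append((disk, row * chunk + within, piece))
        offset += piece
    return pieces


KiB = 1024
# A 512 KiB read at offset 0 on 2 disks, with a 64 KiB vs a 1 MiB chunk:
print(len(raid0_map(0, 512 * KiB, 64 * KiB, 2)))  # 8 pieces, alternating disks
print(raid0_map(0, 512 * KiB, 1024 * KiB, 2))     # [(0, 0, 524288)]
```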