high latency for tailored random I/O scenario with PERC6/i - not seen with older RAID controllers
guy.choo.keren at gmail.com
Sun May 16 16:25:58 CDT 2010
(sorry if this is a double-post - previously sent from the wrong address)
We have some issues with random I/O on recent Nehalem-based Dell
PowerEdge blades (I think these are M610 or M710, but I'm not sure that
is relevant) that we do not see on slightly older (more than 5 months
old) hardware.
Here are the details:
The setup includes 2 hard disks (approx. 150GB each) set in a
RAID-0 configuration under a Dell PERC6/i RAID controller.
Operating system: CentOS 5.4.
The scenario: we run random I/O against one of several Linux logical
volumes (LVM) created on top of the RAID-0 volume (the partition's size
is about 240GB), using Iometer. We have 12 parallel 512-byte write
requests, plus 5 or 6 parallel 512KB read requests - all of them are
random inside a specified range of space on that partition. The I/O is
generated using direct I/O, so we know Linux's buffer cache is not involved.
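
For anyone who wants to try something similar without Iometer, here is a
minimal sketch of the read side of such a workload. It is not the tool
used in the test above: it issues requests serially from one thread
(no parallel writers), it assumes Python 3.7+ on Linux, and the 512-byte
alignment and the function names are assumptions made for illustration.

```python
import mmap
import os
import random
import time

SECTOR = 512  # assumed logical block size; O_DIRECT needs aligned offsets and lengths


def random_aligned_offset(rng, range_bytes, io_size, align=SECTOR):
    """Pick a random offset in [0, range_bytes - io_size], rounded down to `align`."""
    if io_size > range_bytes:
        raise ValueError("io_size larger than test range")
    slots = (range_bytes - io_size) // align
    return rng.randint(0, slots) * align


def timed_direct_reads(path, range_bytes, io_size, count, seed=0):
    """Issue `count` random O_DIRECT reads; return per-request latencies in seconds."""
    rng = random.Random(seed)
    buf = mmap.mmap(-1, io_size)  # anonymous mmap is page-aligned, as O_DIRECT requires
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    latencies = []
    try:
        for _ in range(count):
            offset = random_aligned_offset(rng, range_bytes, io_size)
            start = time.perf_counter()
            os.preadv(fd, [buf], offset)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        buf.close()
    return latencies
```

Calling timed_direct_reads() on the logical volume's device node with a
range of a few GB and then a few tens of GB, and comparing max(latencies)
between the two runs, would roughly mirror the comparison described here.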
When the range we access is small (a few GB) - the latency for the
I/O operations looks ok (worst case is a fraction of a second). When we
let the test generate the same random I/O over a much larger part of the
disk (a few tens of GB) - we start seeing I/O requests with a latency of
up to 2.6 seconds. When we tried the same test on top of a file system
(ext3), still with direct I/O (O_DIRECT) - the worst-case latency
increased by a factor of 5 or more.
Note: this problem is seen only on blades purchased in the last 5
months. We have some slightly older blades (they are supposed to be of
similar models) that do not exhibit this problem (i.e. with the same
test, the latency on them doesn't go above about half a second). Those
older blades seem to have a somewhat older version of the PERC6 RAID
controller on them. Both the old and new blades run exactly the same
operating system software (they are all installed via kickstart and the
operating system is further configured using the same software - so
their operating system configuration should be identical).
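
One way to double-check that the block-layer settings really are
identical on an old and a new blade is to dump the queue settings from
sysfs on each and compare them. This is only a sketch: it assumes
Python 3 is available on the blades, the device name "sda" in the usage
note is a placeholder, and the helper names are made up for illustration.
It covers only what Linux exposes under /sys/block, not the controller's
own firmware settings.

```python
import os


def read_queue_settings(device, sysfs_root="/sys/block"):
    """Return {setting_name: value} for the files in <sysfs_root>/<device>/queue."""
    queue_dir = os.path.join(sysfs_root, device, "queue")
    settings = {}
    for name in sorted(os.listdir(queue_dir)):
        path = os.path.join(queue_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as iosched/
        try:
            with open(path) as handle:
                settings[name] = handle.read().strip()
        except OSError:
            settings[name] = "<unreadable>"
    return settings


def diff_settings(old, new):
    """Return {name: (old_value, new_value)} for settings that differ."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Running read_queue_settings("sda") on each blade, saving the results, and
passing both dictionaries to diff_settings() lists any entries (for
example scheduler, nr_requests or read_ahead_kb) that differ.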
Note 2: we tried disabling the write cache via the BIOS (changed it
to write-through), and we verified we do not have read-ahead enabled.
This did not change the test results.
Has anyone seen similar behavior? Any ideas what we should check in our
config that could cause this, or whether this is a PERC6/i bug?