SAS 6/iR write performance depends on kernel version, blade server m600

Peter Grandi pg_dlxpe at dlxpe.for.sabi.co.UK
Thu Jun 5 02:33:03 CDT 2008


[ ... ]

>> Well, it appears that in fact, there is a change in 2.6.22
>> about MM and there is less memory for write caching. So if
>> you stay under 80MB, it works well.

In my experience a small write cache (worth 1-2s of disk
transfer rate) gives overall rather better performance,
particularly in the sense of lower latency.
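Here is a rough illustration of the "1-2s of disk transfer rate" rule of thumb as a small Python sketch. The 60MB/s disk and the 8GB machine are made-up figures, not measurements, and both helpers are hypothetical. In kernels of this vintage the flusher limits ('vm.dirty_ratio', 'vm.dirty_background_ratio') are percentages of memory, so the sketch also converts the budget into a percentage:

```python
def write_cache_budget_mb(transfer_mb_s: float, seconds: float) -> float:
    """Dirty-data budget (MB) worth `seconds` of sequential disk transfer."""
    return transfer_mb_s * seconds


def dirty_percent(budget_mb: float, ram_mb: float) -> float:
    """Express a dirty-data budget as a percentage of RAM."""
    return 100.0 * budget_mb / ram_mb


if __name__ == "__main__":
    # Hypothetical figures: a disk doing ~60MB/s, a machine with 8GB of RAM.
    for secs in (1, 2):
        budget = write_cache_budget_mb(60, secs)
        print(f"{secs}s -> {budget:.0f}MB = {dirty_percent(budget, 8192):.2f}% of RAM")
```

With these assumed figures, even 2s worth of transfer comes to under 2% of RAM, while those knobs accept only whole percentages.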

A large write cache means that a lot of ''dirty'' pages will
accumulate in memory, and when the page cache gets ''full'', the
IO subsystem ends up writing a very large amount of data in one
go (I have seen 1-2GB of sudden writes happen), and those large
writes, depending on workload and elevator parameters, tend to
hog the disk subsystem and cause delays in reads etc. It can
also have severe consequences on NFS usage:
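To put the 1-2GB figure in perspective, consider a minimal sketch that assumes a plain first-come-first-served queue and a hypothetical 60MB/s disk. A read queued just behind a flush then waits for the whole flush to drain. Real elevators such as 'deadline' or 'cfq' reorder requests, so treat this only as a worst-case bound:

```python
def read_wait_behind_flush_s(dirty_mb: float, transfer_mb_s: float) -> float:
    """Seconds a read waits when queued behind a flush of `dirty_mb`,
    assuming strict FIFO service at a steady `transfer_mb_s`."""
    if transfer_mb_s <= 0:
        raise ValueError("transfer rate must be positive")
    return dirty_mb / transfer_mb_s


if __name__ == "__main__":
    # A ~1.5s-sized cache versus a ~1.5GB burst, on an assumed 60MB/s disk.
    for dirty in (90, 1500):
        wait = read_wait_behind_flush_s(dirty, 60)
        print(f"{dirty}MB flush -> read waits up to {wait:.1f}s")
```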

  http://WWW.sabi.co.UK/blog/0707jul.html#070701b

Unfortunately the default kernel IO parameters for this are not
only too large, they are fundamentally misdesigned:

  http://WWW.sabi.co.UK/blog/0707jul.html#070701

That misdesign is bad enough that RedHat have added a
'vm/max_queue_depth' parameter to the RHEL4 series kernels (it
has an effect equivalent to the patch in the link above).

However, largish delayed writes can sometimes be advantageous,
for example with XFS, if one tweaks its parameters and chooses
the elevator so as to mitigate the negative impact.
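In 2.6 kernels the elevator can be chosen per device through sysfs, in '/sys/block/<dev>/queue/scheduler'. That file lists the available elevators and puts brackets around the active one. The Python sketch below parses that format. 'sda' is only an example device name. Writing an elevator name such as 'deadline' back to the same file, as root, switches the device to that elevator:

```python
from __future__ import annotations

from pathlib import Path


def parse_scheduler(text: str) -> tuple[str | None, list[str]]:
    """Split e.g. 'noop anticipatory deadline [cfq]' into
    (active elevator, list of available elevators)."""
    active = None
    available = []
    for tok in text.split():
        if tok.startswith("[") and tok.endswith("]"):
            tok = tok[1:-1]  # the bracketed entry is the one in use
            active = tok
        available.append(tok)
    return active, available


def current_scheduler(dev: str = "sda") -> tuple[str | None, list[str]]:
    """Read the elevator settings of block device `dev` (example name) from sysfs."""
    return parse_scheduler(Path(f"/sys/block/{dev}/queue/scheduler").read_text())
```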

>> After you get disk performance.

If you don't want that for whatever reason, consider 'tmpfs', or
just tweaking the kernel flusher parameters to allow for more
delayed writes.
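To loosen the flusher limits and allow more delayed writes, you write new values into the files under '/proc/sys/vm'. The equivalent command is 'sysctl -w vm.NAME=VALUE'. The Python sketch below does this. The numbers in it are illustrative assumptions, not recommendations. It needs root, and the settings do not survive a reboot unless they also go into '/etc/sysctl.conf':

```python
import os

# Illustrative values only (assumptions); the knob names are standard 2.6 vm sysctls.
MORE_DELAYED_WRITES = {
    "dirty_background_ratio": 10,    # % of memory before background writeback starts
    "dirty_ratio": 40,               # % of memory before writers must flush themselves
    "dirty_expire_centisecs": 6000,  # dirty data older than 60s becomes due for writeout
}


def set_vm_knobs(knobs: dict, root: str = "/proc/sys/vm") -> None:
    """Write each knob value into its file under `root`."""
    for name, value in knobs.items():
        with open(os.path.join(root, name), "w") as f:
            f.write(f"{value}\n")


def get_vm_knob(name: str, root: str = "/proc/sys/vm") -> int:
    """Read back a single integer knob."""
    with open(os.path.join(root, name)) as f:
        return int(f.read().split()[0])
```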

> BTW: You should always use sizes a lot larger than your main
> memory to get past caching issues. Or better yet, use bonnie++
> or similar performance measurement tools to get real-world
> values.
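To follow that advice you need to know how much RAM the machine has. The Python sketch below uses 'os.sysconf' with the Linux/glibc names. The factor of 2 is an assumption in the spirit of "a lot larger", not a threshold stated in the thread:

```python
import os


def physical_ram_bytes() -> int:
    """Total physical memory as reported by sysconf (Linux/glibc)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def suggested_file_size_bytes(factor: float = 2.0) -> int:
    """Test file size as a multiple of RAM, so the page cache cannot hold it all."""
    return int(physical_ram_bytes() * factor)


if __name__ == "__main__":
    print(f"use test files of at least {suggested_file_size_bytes() // (1024 * 1024)}MB")
```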

Bonnie++ is widely used, and widely misused. It lacks 'O_DIRECT'
support, which means that meaningful IO subsystem tests can only
be done by using the '-M' option, but very few people seem to
realize that either.

My preference is for using Bonnie 1.4 with the '-u -y -o_direct'
options, and/or some suitably large files (with '-o_direct' they
can be as small as 1-2x100MB-200MB on many smaller machines).
Bonnie 1.4 does lack something like the '-M' option, which can
be useful as a halfway house between using '-o_direct' and not
using it, but that matters less than support for 'O_DIRECT'.
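The Python sketch below (Linux only) shows what 'O_DIRECT' changes. It is a sequential write test that bypasses the page cache, so even a file much smaller than RAM measures the disk rather than memory. It is not Bonnie, and the helper name, the 1MB block size and the file sizes are all assumptions. 'O_DIRECT' requires the buffer, the transfer size and the file offset to be aligned to the device block size. The page-aligned anonymous 'mmap' and the sequential 1MB writes satisfy that. Some filesystems reject 'O_DIRECT' outright, typically with EINVAL:

```python
import mmap
import os
import time

MB = 1024 * 1024


def direct_write_mb_s(path: str, total_mb: int = 100, block_kb: int = 1024) -> float:
    """Sequentially write about `total_mb` MB to `path` with O_DIRECT
    (bypassing the page cache) and return the achieved MB/s.

    Raises OSError (usually EINVAL) if the filesystem refuses O_DIRECT.
    """
    block = block_kb * 1024
    blocks = total_mb * MB // block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o600)
    buf = mmap.mmap(-1, block)  # anonymous mapping: page-aligned memory
    try:
        buf.write(b"\xa5" * block)  # fill with a non-zero pattern
        start = time.monotonic()
        for _ in range(blocks):
            if os.write(fd, buf) != block:
                raise OSError("short write")
        os.fsync(fd)  # also push out file metadata before stopping the clock
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        buf.close()
        os.close(fd)
    return blocks * block / MB / elapsed


if __name__ == "__main__":
    # 'bonnie.tmp' is a hypothetical scratch file in the current directory.
    print(f"{direct_write_mb_s('bonnie.tmp', total_mb=200):.1f} MB/s")
```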



More information about the Linux-PowerEdge mailing list