Time-honoured problem: RHEL3 iowait performance

Aaron dell at microchp.org
Sun Feb 18 00:29:27 CST 2007


No worries.  I initially kept it off the list as I often bump into folk 
that have religious beliefs in particular configurations. :-)

I was not aware of your LVM configuration.  That brings to mind a few 
different issues that have come across the list, though my memory has 
been failing me lately and I am having trouble recalling the specific 
details.  A good starting point would be to search the list archive 
using LVM and multipathing/multipath as search terms.  I recall a few 
instances where people were having problems with both IO wait and 
overall performance because of their LVM/multipathing configuration.  
A few bugs have been highlighted in that area, though I have no idea 
as to their current status.  Most of my servers use NetApp for storage 
and only use the local disks for the OS.  My prior employer used HP 
SureStore, which also handled the striping, so I have been spoiled in 
this area and it has admittedly become my weak spot.

This is probably not a logistical option, but you might try creating a 
non-LVM volume and testing its performance using the same controller, 
the same model of disks, and the same RAID configuration.  I am just 
throwing that out there in case you have enough channels/drive bays to 
do it.  Should it work out, you could always rsync the data over to the 
new volume.

The dag repo is an RPM package repository that is becoming very popular 
due to the large number of packages being incorporated and the excellent 
packaging standards being followed.  There are a few repos growing in 
popularity, though dag seems to have most of the packages I am looking 
for.  What I especially like about it is that I can easily contact the 
maintainer on IRC if I run into a conflict, etc. (#centos on freenode)
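
Enabling a third-party repo like dag is just a matter of dropping a 
file into /etc/yum.repos.d/.  A sketch of the shape of that file 
follows; the URL below is a deliberate placeholder, so check the 
repo's own install instructions for the real baseurl and GPG key:

```
# /etc/yum.repos.d/dag.repo -- illustrative only; the baseurl here is
# a placeholder, not the repo's real address.
[dag]
name=Dag RPM Repository
baseurl=http://example.invalid/dag/el3/$basearch/
gpgcheck=1
enabled=1
```

After that, something like "yum install bonnie++" should pull the 
package and its dependencies from the repo.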

To sum up, I won't be much help with this particular IO problem.  I hope 
you find what is causing the wait time. 


Brendan Heading wrote:
> Aaron wrote:
>> Have you tried disabling jumbo frames?  If so, have you run tcpdump 
>> and pulled up the capture file in Ethereal to see if you are doing 
>> a lot of retransmissions (also possibly visible in netstat -s)?
> Aaron,
> Thanks for the reply. I hope you don't mind if I CC the list.
> As of right now, the server has rx'd 224507725 segments, tx'd 
> 262637305, and there have been 25457 retransmits. Not that high in the 
> scheme of things, I reckon.
>> If you do local file transfers, many at the same time, do you see the 
>> same problem?  i.e. /one/group/of/disks to /another/group/of/disks 
>> and/or /var/tmp local transfers.
> I will have to check again but I believe yes, I see the problem if I 
> simulate the operation locally.
>> I assume you have also set noatime on the ext mounts.  (Lots of 
>> simultaneous reads add up to a lot of atime writes and thrashing)
> Yes, noatime is set, and the ext3 commit value is set to 30 to 
> try to reduce any thrashing caused by commits to the journal.
>> Have you also tried running bonnie++ ?  It is in the dag repo and can 
>> show you individual disk performance.
> I am not sure what you mean by "dag repo", is this a set of test tools 
> somewhere, and if so where can I get it ? .. I have a version of 
> bonnie++ (1.03) from a while back, running it on locally gives me the 
> following results :
> =====================================
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> xxxxx            8G 17948  42 13785  16  9561  14 22268  60 26889  17 122.8   3
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16   613   3 21720  38  1610   5   690   4 28433  41  1183   3
> ,8G,17948,42,13785,16,9561,14,22268,60,26889,17,122.8,3,16,613,3,21720,38,1610,5,690,4,28433,41,1183,3 
> =====================================
> It's a bit hard to test reliably: since the volume is LVM, you can't 
> tell which of the two RAID arrays it's writing to. LVM seems to 
> distribute activity across the volumes in a group rather than wait 
> for one PV to fill up before moving to the next. This makes things 
> "interesting" because the larger RAID array has a larger stripe size 
> (128K) than the other one (64K). (In both cases, the read policy is 
> adaptive, the write policy is writeback, and the cache policy is 
> direct I/O.)
>> As far as performance, there are also many things you can do in 
>> /etc/sysctl.conf to tune for being a file server, but those won't 
>> help until you fix the IO wait issues.
>> Just a few thoughts.
>> --Aarön
>> P.S. - Oh.. and anything interesting in dmesg?
> There is nothing in dmesg that looks like an error...
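
As a quick sanity check on the retransmit counters quoted above, the 
rate really is tiny; a one-liner with the numbers from the mail:

```shell
# Retransmit rate from the counters quoted above: 25457 retransmits
# against 262637305 segments sent -- under 0.01%, which supports the
# "not that high in the scheme of things" reading.
awk 'BEGIN { tx = 262637305; retrans = 25457;
             printf "%.4f%%\n", 100 * retrans / tx }'
```

So the network side looks unlikely to be the source of the IO wait.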

More information about the Linux-PowerEdge mailing list