Time-honoured problem : RHEL3 iowait performance

Richard Ford rford at candis.com.cn
Sat Feb 17 22:05:42 CST 2007


Do a search on Google for Dag Wieers... I can't believe that you are
in the dark on this one!  :-)
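
For what it's worth, once the repo is set up, bonnie++ is a one-line
install. A rough sketch, assuming the box has yum (the stanza and
baseurl below are from memory and only illustrative, so check Dag's own
instructions for the exact RHEL3 path):

    # yum repo stanza; baseurl is illustrative, verify against Dag's site
    [dag]
    name=Dag Wieers RPM repository
    baseurl=http://apt.sw.be/redhat/el3/en/i386/dag/
    gpgcheck=1
    enabled=1

    # then install the benchmark:
    yum install bonnie++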



On 18 Feb 2007, at 8:37 AM, Brendan Heading wrote:

> Aaron wrote:
>> Have you tried disabling jumbo frames?  If so, have you run tcpdump
>> and pulled up the capture file in ethereal to see if you are doing a
>> lot of retransmissions (also possibly visible in netstat -s)?
>
> Aaron,
>
> Thanks for the reply. I hope you don't mind if I CC the list.
>
> As of right now, the server has rx'd 224507725 segments, tx'd
> 262637305, and there have been 25457 retransmits. Not that high in
> the scheme of things, I reckon.
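
25457 retransmits against roughly 262 million transmitted segments is
on the order of 0.01%, so I'd agree it is probably not the culprit. If
you want to double-check, something along these lines works (eth0 and
the output path are just examples):

    # quick look at the kernel's retransmission counters
    netstat -s | grep -i retrans

    # full-packet capture to pull up in ethereal afterwards
    tcpdump -i eth0 -s 0 -w /tmp/fileserver.pcap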
>
>> If you do local file transfers, many at the same time, do you see the
>> same problem?  i.e. /one/group/of/disks to /another/group/of/disks
>> and/or /var/tmp local transfers.
>
> I will have to check again but I believe yes, I see the problem if I
> simulate the operation locally.
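
A crude way to reproduce it locally while watching the iowait is to
push a large file between the two arrays with vmstat running in a
second terminal (the mount points below are only examples):

    # terminal 1: watch iowait and blocked processes
    vmstat 5

    # terminal 2: write a big file on one array, then copy it to the other
    dd if=/dev/zero of=/array1/testfile bs=1M count=4096
    cp /array1/testfile /array2/testfile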
>
>> I assume you have also set noatime on the ext mounts.  (Lots of
>> simultaneous reads add up to a lot of atime writes and thrashing)
>
> Yes, noatime is set (atime updates are turned off), and the ext3
> commit interval is set to 30 to try to reduce any thrash caused by
> commits to the journal.
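
For anyone following the thread, both of those are plain mount
options; the relevant fstab entry looks roughly like this (device and
mount point are placeholders):

    # /etc/fstab
    /dev/vg0/data   /data   ext3   defaults,noatime,commit=30   1 2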
>
>> Have you also tried running bonnie++?  It is in the dag repo and can
>> show you individual disk performance.
>
> I am not sure what you mean by "dag repo". Is this a set of test
> tools somewhere, and if so where can I get it? I have a version of
> bonnie++ (1.03) from a while back; running it locally gives me the
> following results:
>
> =====================================
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> xxxxx            8G 17948  42 13785  16  9561  14 22268  60 26889  17 122.8   3
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16   613   3 21720  38  1610   5   690   4 28433  41  1183   3
>
> ,8G,17948,42,13785,16,9561,14,22268,60,26889,17,122.8,3,16,613,3,21720,38,1610,5,690,4,28433,41,1183,3
>
>
> =====================================
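
For comparison it helps to note how bonnie++ was invoked; a run that
produces an 8G result set looks roughly like this (the directory and
user are just examples):

    # -s: test size in MB (about 2x RAM), -d: target directory,
    # -u: user to run as (bonnie++ refuses to run as root without it)
    bonnie++ -s 8192 -d /array1/bonnie -u nobody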
>
> It's a bit hard to test reliably, since the volume is LVM and you
> can't tell which of the two RAID arrays it's writing to. LVM seems to
> distribute activity across the volumes in a group, rather than wait
> for one PV to fill up before moving on to the next one. This makes
> things "interesting" because the larger RAID array has a larger
> stripe size (128K) than the other one (64K). (In both cases, read
> policy is adaptive, write policy is write-back, and cache policy is
> direct I/O.)
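
One way around the "which array is it hitting" question is to watch
the underlying devices rather than the logical volume. iostat (from
the sysstat package) shows per-device transfer rates, and lvdisplay
can show how the LV is spread over the physical volumes (the device
name below is an example):

    # per-device transfer rates every 5 seconds; look for the busy device
    iostat 5

    # verbose display of the logical volume, including which PVs it sits on
    lvdisplay -v /dev/vg0/data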
>
>> As far as performance, there are also many things you can do in
>> /etc/sysctl.conf to tune for being a file server, but those won't
>> help until you fix the I/O wait issues.
>>
>> Just a few thoughts.
>>
>> --Aaron
>>
>> P.S. - Oh.. and anything interesting in dmesg?
>
> There is nothing in dmesg that looks like an error...
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq
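
On Aaron's sysctl point above: once the I/O wait itself is under
control, the usual file-server suspects are the socket buffer sizes. A
minimal sketch, assuming a gigabit NIC; the values are only starting
points, not tuned numbers:

    # /etc/sysctl.conf (apply with: sysctl -p)
    net.core.rmem_max = 262144
    net.core.wmem_max = 262144
    net.ipv4.tcp_rmem = 4096 87380 262144
    net.ipv4.tcp_wmem = 4096 65536 262144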






