Time-honoured problem : RHEL3 iowait performance
brendanheading at clara.co.uk
Sat Feb 17 16:27:15 CST 2007
I have a PowerEdge 2850, containing a PERC4 RAID controller with six
10000rpm SCSI disks. There are three 146GB disks and three 300GB disks;
both sets are arranged as a pair of RAID5 arrays. It has 4GB RAM and a
pair of Xeon 3.6GHz CPUs with hyperthreading enabled. The box and the
clients are all running Red Hat Enterprise Linux 3, all patched up to
the latest update (Update 8).
The OS boot, swap, and other system partitions all sit on the smaller
RAID array. The greater proportion of that array, and the entirety of the
larger array, are strapped together as an LVM volume group.
The machine's main purpose in life is to serve as a file server for a
bunch of NFS and (less so) Samba clients. All of the NFS clients (there
are three) are connected to the server via a gigabit ethernet link.
I seem to be having the same problems with it that a lot of other people
have reported.
When two or three users on remote clients start doing heavy I/O access
over NFS, the server grinds almost to a complete halt with iowait shown
in the high 90%s. iostat (/sbin/iostat) reports relatively low
throughput, with one of the two RAID arrays 100% utilized.
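For reference, the utilization figure comes from extended iostat output; roughly the following (assuming the sysstat package's iostat) is what I'm watching:

```shell
# Extended per-device statistics, refreshed every 5 seconds.
# The %util column is what shows one array pinned at 100%; the await
# and svctm columns give a rough idea of per-request latency there.
iostat -x 5
```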
I've found that it seems to be simultaneous activity that causes it to
grind. If I go to my users and tell them to stop, then have one user at
a time run whatever operation they need, the iowait is lower (but still
in the 50%s) and each operation completes relatively quickly compared
with several users working in parallel. That suggests to me that the
issue is something to do either with the randomness of the access
pattern, or caching.
To fix this I tried:
- messing with elvtune. This seemed to help a little but not much.
Presently it's at 2048 read, 8192 write with max_bomb_segments at 0. I
believe that's something similar to the Red Hat defaults.
- setting the sysctl parameters (in /proc/sys/vm) in accordance with
previous messages posted on this list. Again, these seem to have helped
a bit, but not a lot.
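Concretely, the tuning above amounted to something like this. The device name and the sysctl values below are illustrative only: the elvtune flags are the 2.4-kernel ones, and the bdflush/readahead numbers are examples of the sort suggested on this list, not my exact figures.

```shell
# elvtune on the RHEL3 (2.4) kernel: -r max read latency, -w max write
# latency, -b max_bomb_segments. /dev/sda is an example device; this is
# repeated for each array's block device.
elvtune -r 2048 -w 8192 -b 0 /dev/sda
elvtune /dev/sda          # print current settings to confirm

# 2.4 VM tunables under /proc/sys/vm (values here are illustrative):
echo "1 500 0 0 500 3000 60 20 0" > /proc/sys/vm/bdflush
echo 4096 > /proc/sys/vm/max-readahead
echo 1024 > /proc/sys/vm/min-readahead
```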
I am wondering if I am asking too much of the box and OS, but I don't
think so. I've got six-year-old Sun Enterprise 220R boxes running
Solaris 8 which can handle NFS workloads better than this. Does anyone
have any suggestions about what other options I can try? The options
available to me are:
- upgrading to RHEL4.
- upgrading the RAID controller's firmware.
- purchasing more disks, putting them in a Powervault 220S, and using
RAID10 or RAID50 instead of RAID5.
- putting more memory in the box in case this is a caching matter.
However, I do not want to go to the trouble and expense of doing the
above unless I can be sure that they will improve things.
Any pointers or tips would be greatly appreciated.
More information about the Linux-PowerEdge mailing list