Linux NFS and NetApp filers
steven.kehlet at conexant.com
Mon Mar 24 18:22:00 CST 2003
Thanks a ton, that document is a goldmine of information. I hope I can
make its suggestions pay off. I'll report back here when I have some
results.
On Thu, 2003-03-20 at 10:26, Ed Martin wrote:
> Hi Steve,
> A quick reply for the moment. We use what sounds like a similar environment to you. We have a number of Dell servers (PE1550, 1650, 2450, 2550, and 2650s) running Linux. And we also own a NetApp F820 filer, and are planning the purchase of an F825 to keep it company. We mount a lot of stuff over NFS from the Linux server to the filer. This includes home-grown code, using things like mod_perl. And it also includes some pretty heavy database-type stuff - some PostgreSQL databases and a big Notes Domino db, and Verity K2 collections. Plus we plan to mount some Oracle databases this way within a month or so.
> Generally this works very well for us. We have a mix of kernels talking to the filer, and no problems like you are describing. We only run stock RH kernels (including AS kernels).
> My biggest tip is to check out the following, if you haven't already. It's pretty thorough and has a lot of good info in it. Written by a NetApp engineer: Using the Linux® NFS Client with Network Appliance Filers
> Some of what we've found:
> - Our default standard mount options are: proto=udp,hard,bg,intr,wsize=8192,rsize=8192
> - For Oracle we'll try the recommended tcp protocol, and probably test 16k windows
> - Early kernels seem to work best with network stack socket buffer tuning in /etc/sysctl.conf
> - We've tried to ensure heavy traffic servers run newer kernels - 2.4.18-17 or later
> - We've got the filer (twice) and major servers on Gb ethernet, which improved things a lot over 100Mb
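To keep the UDP default and a trial TCP variant side by side, one option is to generate the /etc/fstab entries from a single option table. This is an illustrative sketch only: the server name, export paths and mount points are hypothetical placeholders, and reading "16k windows" as rsize/wsize=16384 is an assumption.

```python
from typing import Optional

# Option name -> value; None marks a bare flag such as "hard".
Options = dict[str, Optional[str]]

# Ed's default options, in the order he lists them.
DEFAULT_OPTS: Options = {
    "proto": "udp",
    "hard": None,
    "bg": None,
    "intr": None,
    "wsize": "8192",
    "rsize": "8192",
}

# Assumed Oracle trial variant: TCP transport with 16 KB read/write sizes.
# The dict merge keeps the key order of DEFAULT_OPTS.
ORACLE_OPTS: Options = {**DEFAULT_OPTS, "proto": "tcp", "wsize": "16384", "rsize": "16384"}


def render_options(opts: Options) -> str:
    """Join options into a mount -o style string."""
    return ",".join(k if v is None else f"{k}={v}" for k, v in opts.items())


def fstab_line(server: str, export: str, mountpoint: str, opts: Options) -> str:
    """Build one fstab entry: device, mount point, type, options, dump, pass."""
    return f"{server}:{export}\t{mountpoint}\tnfs\t{render_options(opts)}\t0 0"


if __name__ == "__main__":
    # Hypothetical filer and paths, for illustration only.
    print(fstab_line("filer1", "/vol/vol0/home", "/home", DEFAULT_OPTS))
    print(fstab_line("filer1", "/vol/oradata", "/u02", ORACLE_OPTS))
```

Running it prints two tab-separated fstab lines that differ only in proto, rsize and wsize, which keeps a UDP versus TCP comparison easy to audit.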
> I think that the IP fragmentation issue can be partly resolved via newer kernels and/or bigger socket buffers via sysctl.conf. At one point, as well as the NetApp suggestions I played with:
> net.ipv4.tcp_rmem = 8192 262143 8388608
> net.ipv4.tcp_wmem = 4096 262143 8388608
> Though later testing didn't seem to require this. Use at your own risk. May be worth checking nfsstat -r to see if you are getting a lot of rpc retransmits.
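Both checks mentioned above can be scripted. The sketch below (1) reads the "min default max" byte triples from sysctl.conf-style lines like the two quoted, and (2) turns the calls/retrans counters from nfsstat -r into a retransmit ratio. The helper names are hypothetical, and the nfsstat layout it expects (a header row naming "calls" and "retrans", followed by a row of numbers) is an assumption worth checking against your own nfs-utils output.

```python
def parse_sysctl_triples(text: str) -> dict[str, tuple[int, int, int]]:
    """Map keys such as net.ipv4.tcp_rmem to (min, default, max) in bytes."""
    result: dict[str, tuple[int, int, int]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank space
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        fields = value.split()
        if len(fields) != 3:
            continue  # not a min/default/max triple
        low, default, high = (int(f) for f in fields)
        if not low <= default <= high:
            raise ValueError(f"{key}: expected min <= default <= max")
        result[key] = (low, default, high)
    return result


def retransmit_ratio(nfsstat_output: str) -> float:
    """Return retrans / calls from the client RPC section of nfsstat output."""
    lines = nfsstat_output.splitlines()
    for i, line in enumerate(lines[:-1]):
        header = line.split()
        # Only the client RPC header has a "retrans" column.
        if "calls" in header and "retrans" in header:
            values = [int(v) for v in lines[i + 1].split()]
            row = dict(zip(header, values))
            return row["retrans"] / row["calls"] if row["calls"] else 0.0
    raise ValueError("no calls/retrans header found")
```

On a client, the input to retransmit_ratio() would be the text printed by nfsstat -r; comparing the ratio before and after a tuning change gives a rough sense of whether the change helped.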
> Hope this helps!
> >I've actually been dealing with a couple different NFS-related issues,
> >on desktops and servers, on RedHat 7.2, 7.3, and 8.0. We have a fleet
> >(~ 22) of PE2650s in the back room (for batch job processing), and
> >several Precision 530ns for desktops. I would imagine other people may
> >not see the same severity of problems I'm seeing because our environment
> >is heavily NFS-driven, i.e. I run Oracle on Linux over NFS, /usr/local,
> >our project directories, and just about everything is NFS mounted. We
> >use numerous Network Appliance NFS filers to serve data.
> >On heavily loaded (NFS traffic) RH 7.3 systems (2.4.18-4), the NFS
> >performance is spotty and erratic. I see tons of "kernel: nfs: server
> >xxx not responding, still trying" errors in /var/log/messages, followed
> >by variable amounts of time (usually 1-60 secs), then "server xxx OK". At
> >its worst my Oracle server stalled for 1.5 hours in the middle of a
> >long-running report. While this is slowing things down, at least
> >nothing is dying because of it :-), as NFS does pick up eventually and
> >things continue.
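Measuring how long each "not responding ... OK" gap lasts helps separate short hiccups from long stalls like the one described above. This is a sketch under stated assumptions: classic syslog lines of the form "Mar 20 10:26:01 host kernel: nfs: server NAME ...", a year supplied by the caller because those timestamps omit it, and stall_durations() as a hypothetical helper. lockd messages are deliberately not matched.

```python
import re
from datetime import datetime

# Assumed line shape: "<Mon> <day> <hh:mm:ss> <host> kernel: nfs: server <name> ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d\d:\d\d:\d\d)\s.*?nfs: server (?P<server>\S+) "
    r"(?P<state>not responding|OK)"
)


def stall_durations(lines, year):
    """Return (server, seconds) for each 'not responding' ... 'OK' pair."""
    started = {}  # server -> time of the first warning in the current stall
    stalls = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        # Syslog omits the year, so prepend the caller's year before parsing.
        stamp = f"{year} {' '.join(m['ts'].split())}"
        ts = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        server = m["server"]
        if m["state"] == "not responding":
            started.setdefault(server, ts)  # repeated warnings keep the first time
        elif server in started:
            stalls.append((server, (ts - started.pop(server)).total_seconds()))
    return stalls
```

Feeding it the lines of /var/log/messages with year=2003 would yield one tuple per recovered stall; a stall still in progress at the end of the log is not reported, and stalls spanning New Year are not handled.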
> >NetApp has a bug on file similar to this issue, claiming the bug is
> >actually in the Linux IP fragmentation code, and that switching to tcp
> >mounts will help. And so at first using tcp mounts seemed to help,
> >because the barrage of "server not responding" messages went away, but
> >sadly they were replaced by random hangings, accompanied by "kernel:
> >lockd: server xx.xx.xx.xx not responding, still trying" messages.
> >Great--now it's lockd. Argh :-).
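A back-of-the-envelope sketch of why UDP mounts are sensitive to IP fragmentation: one 8 KB NFS reply is a single UDP datagram split into several IP fragments, and losing any one fragment loses the whole reply, so the client must resend the entire RPC after a timeout. Assumptions: a 20-byte IPv4 header with no options, an 8-byte UDP header, independent per-fragment loss, and the RPC/NFS header size passed in as a hypothetical rpc_overhead parameter (0 by default, a simplification).

```python
import math

IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def fragment_count(payload: int, mtu: int = 1500, rpc_overhead: int = 0) -> int:
    """Number of IP fragments needed for one UDP datagram carrying `payload` bytes."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    ip_data = UDP_HEADER + rpc_overhead + payload
    return math.ceil(ip_data / per_fragment)


def datagram_survival(fragments: int, loss: float) -> float:
    """Probability that every fragment arrives, given a per-fragment loss rate."""
    return (1.0 - loss) ** fragments
```

With rsize=8192 on standard 1500-byte Ethernet, fragment_count(8192) is 6, and at an assumed 1% per-fragment loss datagram_survival(6, 0.01) is about 0.94, so roughly 6% of 8 KB replies would need a full RPC retransmit. TCP sidesteps this because it segments the stream itself and resends only the lost segments.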
> >I've wanted to try later kernels, but given the widespread reports of
> >problems with the tg3 driver, I felt I'd be trading one set of problems
> >for another :-). Also, I'm speculating that since RedHat is just
> >patching the same old 2.4.18 kernel, there probably aren't really any
> >bug fixes for the NFS code between, say, 2.4.18-4 and 2.4.18-26 (maybe
> >I'm mistaken here, please let me know if so). What I need is fixes to
> >the NFS client-side code which I'm thinking will only come with an
> >upgrade to a later kernel version (e.g. 2.4.2x).
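Since the thread juggles releases such as 2.4.18-4, 2.4.18-17, 2.4.18-26 and 2.4.20, a small helper that orders them numerically could guard an upgrade script; plain string comparison gets this wrong. This is a hypothetical sketch: it keeps only the digit runs in the release string and ignores suffixes such as smp, which is an assumption that may not hold for every vendor naming scheme.

```python
import re


def kernel_key(release: str) -> tuple[int, ...]:
    """Turn '2.4.18-26smp' into (2, 4, 18, 26) for numeric ordering."""
    # Suffixes like "smp" or "bigmem" are dropped; only digit runs are kept.
    return tuple(int(part) for part in re.findall(r"\d+", release))


def at_least(release: str, minimum: str) -> bool:
    """True if `release` is the same as or newer than `minimum`."""
    return kernel_key(release) >= kernel_key(minimum)


if __name__ == "__main__":
    # As text, "2.4.18-4" sorts after "2.4.18-26"; numerically it is older.
    print("2.4.18-4" > "2.4.18-26", at_least("2.4.18-4", "2.4.18-26"))
```

For instance, at_least(platform.release(), "2.4.18-17") would check the running kernel against Ed's suggested minimum; platform.release() is the standard-library call that returns the uname release string.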
> >On the desktop side, our Precisions came with RedHat 8.0 (2.4.18-14),
> >and we would experience random hangings periodically throughout the day
> >while accessing files over NFS. I tried upgrading to a stock 2.4.20
> >kernel, but then reading files was preceded by a 1-2 second pause (I've
> >seen other reports of this with 2.4.20). I then installed RH's
> >2.4.18-24 and my pauses went away, but other users are still
> >complaining. I tried converting those users to tcp, but then the NFS
> >performance dropped through the floor, so I had to switch them back to
> >udp :-).
> >I've also tried a bunch of other things too (e.g. changing NFS block
> >sizes), but it's hard to remember everything. If it weren't my job, at
> >this point I'd just say "oh well" and wait 6-12 months for Linux's NFS
> >to get better :-). But I've gotten enough positive responses to my
> >query about tg3 in 2.4.18-26, so I'll try upgrading one of my 2650s to
> >it and report back here. Thanks again everyone.
> Ed Martin
> Head of Systems and Network Performance
> IOP Publishing Ltd
> Dirac House, Temple Back
> Bristol BS1 6BE
> ddi: +44 (0)117 930 1102
> www: http://www.iop.org
> Institute of Physics
> Registered charity No. 293851
> 76 Portland Place, London, W1B 1NT, England
> IOP Publishing Limited
> Registered in England under Registration No 467514.
> Registered Office: Dirac House, Temple Back, Bristol BS1 6BE England