Linux NFS and NetApp filers

Steven Kehlet steven.kehlet at conexant.com
Mon Mar 24 18:22:00 CST 2003


Thanks a ton, that document is a goldmine of information.  I hope I can
make its suggestions pay off.  I'll report back here when I have some
results.

Steve



On Thu, 2003-03-20 at 10:26, Ed Martin wrote:
> Hi Steve,
> 
> A quick reply for the moment.  We use what sounds like a similar environment to yours.  We have a number of Dell servers (PE1550, 1650, 2450, 2550, and 2650s) running Linux.  We also own a NetApp F820 filer, and are planning the purchase of an F825 to keep it company.  We mount a lot of stuff over NFS from the Linux servers to the filer.  This includes home-grown code, using things like mod_perl.  It also includes some pretty heavy database-type stuff: some PostgreSQL databases, a big Notes Domino db, and Verity K2 collections.  Plus we plan to mount some Oracle databases this way within a month or so.
> 
> Generally this works very well for us.  We have a mix of kernels talking to the filer, and no problems like you are describing.  We only run stock RH kernels (including AS kernels).
> 
> My biggest tip is to check out the following, if you haven't already.  It's pretty thorough and has a lot of good info in it.  Written by a NetApp engineer: Using the Linux® NFS Client with Network Appliance Filers
> http://www.netapp.com/tech_library/3183.html
> 
> Some of what we've found:
> - Our default standard mount options are:  proto=udp,hard,bg,intr,wsize=8192,rsize=8192
> - For Oracle we'll try the recommended tcp protocol, and probably test 16k windows
> - Early kernels seem to work best with network stack socket buffer tuning in /etc/sysctl.conf
> - We've tried to ensure heavy traffic servers run newer kernels - 2.4.18-17 or later
> - We've got the filer (twice) and major servers on Gb ethernet, which improved things a lot over 100Mb
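For anyone wanting to keep those options consistent across a fleet, here is a minimal Python sketch that renders the option string quoted above into an /etc/fstab line.  The filer name, export path and mount point are hypothetical, and reading "16k windows" as rsize/wsize=16384 is an assumption, not something stated in the thread.

```python
# Options exactly as quoted in the list above; None marks a bare flag.
DEFAULT_OPTS = {
    "proto": "udp", "hard": None, "bg": None, "intr": None,
    "wsize": 8192, "rsize": 8192,
}

# Assumption: the "tcp + 16k" Oracle variant means rsize/wsize of 16384.
ORACLE_OPTS = dict(DEFAULT_OPTS, proto="tcp", wsize=16384, rsize=16384)


def mount_opts(opts):
    """Join options into the comma-separated form mount(8) expects."""
    return ",".join(k if v is None else f"{k}={v}" for k, v in opts.items())


def fstab_line(server, export, mountpoint, opts):
    """Render one NFS entry for /etc/fstab (server and paths are caller-supplied)."""
    return f"{server}:{export} {mountpoint} nfs {mount_opts(opts)} 0 0"


# Hypothetical filer and paths, for illustration only.
print(fstab_line("filer1", "/vol/vol0/home", "/home", DEFAULT_OPTS))
```

Generating the lines from one table makes it harder for a single host to drift to different block sizes or a different transport without anyone noticing.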
> 
> I think that the IP fragmentation issue can be partly resolved via newer kernels and/or bigger socket buffers via sysctl.conf.  At one point, as well as the NetApp suggestions I played with:
>   net.ipv4.tcp_rmem = 8192 262143 8388608
>   net.ipv4.tcp_wmem = 4096 262143 8388608
> Though later testing didn't seem to require this.  Use at your own risk.  May be worth checking nfsstat -r to see if you are getting a lot of rpc retransmits.
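The retransmit check can be scripted so that it is easy to compare before and after a tuning change.  This is a rough Python sketch, assuming the client counters are readable from /proc/net/rpc/nfs with an "rpc <calls> <retrans> <authrefresh>" line (the same counters nfsstat -r reports); the function names are made up for illustration, and what counts as "a lot" of retransmits is left to the reader.

```python
def retrans_ratio(stats_text):
    """Return retransmits / calls from text in /proc/net/rpc/nfs format."""
    for line in stats_text.splitlines():
        fields = line.split()
        # The client RPC summary line starts with the literal word "rpc".
        if len(fields) >= 3 and fields[0] == "rpc":
            calls, retrans = int(fields[1]), int(fields[2])
            return retrans / calls if calls else 0.0
    raise ValueError("no 'rpc' line found")


def client_retrans_ratio(path="/proc/net/rpc/nfs"):
    """Read the live counters (path is an assumption about the client)."""
    with open(path) as f:
        return retrans_ratio(f.read())
```

Calling client_retrans_ratio() before and after changing the socket buffers gives one number to compare, rather than eyeballing two nfsstat dumps.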
> 
> Hope this helps!
> 
> Yours,
> 
> Ed
> 
> 
> >I've actually been dealing with a couple different NFS-related issues,
> >on desktops and servers, on RedHat 7.2, 7.3, and 8.0.  We have a fleet
> >(~ 22) of PE2650s in the back room (for batch job processing), and
> >several Precision 530ns for desktops.  I would imagine other people may
> >not see the same severity of problems I'm seeing because our environment
> >is heavily NFS-driven, i.e.  I run Oracle on Linux over NFS, /usr/local,
> >our project directories, and just about everything is NFS mounted.  We
> >use numerous Network Appliance NFS filers to serve data.
> >
> >On heavily loaded (NFS traffic) RH 7.3 systems (2.4.18-4), the NFS
> >performance is spotty and erratic.  I see tons of "kernel: nfs: server
> >xxx not responding, still trying" errors in /var/log/messages, followed
> >by variable amounts of time (usu. 1-60secs), then "server xxx OK".  At
> >its worst my Oracle server stalled for 1.5 hours in the middle of a
> >long-running report.  While this is slowing things down, at least
> >nothing is dying because of it :-), as NFS does pick up eventually and
> >things continue.
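To put numbers on stalls like these, the "not responding" / "OK" pairs can be matched up from the log.  A minimal Python sketch follows; it assumes the usual syslog layout ("Mon DD HH:MM:SS host kernel: nfs: server NAME ...") and, because syslog lines carry no year, takes the year as a parameter (2003 here is only a default).  The function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed syslog layout; only the two message texts come from the report above.
LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: nfs: server (?P<srv>\S+) "
    r"(?P<state>not responding, still trying|OK)"
)


def stall_durations(lines, year=2003):
    """Return (server, seconds) for each 'not responding' ... 'OK' pair."""
    down = {}     # server -> time of the first unanswered "not responding"
    stalls = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        srv = m.group("srv")
        if m.group("state") == "OK":
            start = down.pop(srv, None)
            if start is not None:
                stalls.append((srv, (ts - start).total_seconds()))
        else:
            # Keep the earliest timestamp if the message repeats.
            down.setdefault(srv, ts)
    return stalls
```

Feeding it the lines of /var/log/messages gives a per-server list of stall lengths, which makes it easier to tell whether a kernel or mount-option change actually helped.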
> >
> >NetApp has a bug on file similar to this issue, claiming the bug is
> >actually in the Linux IP fragmentation code, and that switching to tcp
> >mounts will help.  And so at first using tcp mounts seemed to help,
> >because the barrage of "server not responding" messages went away, but
> >sadly they were replaced by random hangings, accompanied by "kernel:
> >lockd: server xx.xx.xx.xx not responding, still trying" messages. 
> >Great--now it's lockd.  Argh :-).  
> >
> >I've wanted to try later kernels, but given the widespread reports of
> >problems with the tg3 driver, I felt I'd be trading one set of problems
> >for another :-).  Also, I'm speculating that since RedHat is just
> >patching the same old 2.4.18 kernel, there probably aren't really any
> >bug fixes for the NFS code between, say 2.4.18-4 and 2.4.18-26 (maybe
> >I'm mistaken here, please let me know if so).  What I need is fixes to
> >the NFS client-side code which I'm thinking will only come with an
> >upgrade to a later kernel version (e.g. 2.4.2x).
> >
> >On the desktop side, our Precisions came with RedHat 8.0 (2.4.18-14),
> >and we would experience random hangings periodically throughout the day
> >while accessing files over NFS.  I tried upgrading to a stock 2.4.20
> >kernel, but then reading files was preceded by a 1-2 second pause (I've
> >seen other reports of this with 2.4.20).  I then installed RH's
> >2.4.18-24 and my pauses went away, but other users are still
> >complaining.  I tried converting those users to tcp, but then the NFS
> >performance dropped through the floor, so I had to switch them back to
> >udp :-).
> >
> >I've also tried a bunch of other things (e.g. changing NFS block
> >sizes), but it's hard to remember everything.  If it weren't my job, at
> >this point I'd just say "oh well" and wait 6-12 months for Linux's NFS
> >to get better :-).  But I've gotten enough positive responses to my
> >query about tg3 in 2.4.18-26, so I'll try upgrading one of my 2650s to
> >it and report back here.  Thanks again everyone.
> 
> 
> 
> ---
> Ed Martin
> Head of Systems and Network Performance
> IOP Publishing Ltd
> Dirac House, Temple Back
> Bristol  BS1 6BE
> ddi: +44 (0)117 930 1102
> www:  http://www.iop.org
> 
> 
> 




More information about the Linux-PowerEdge mailing list