NFS problems with 2650 and 2.4.18-26.7

Steven Kehlet steven.kehlet at conexant.com
Mon Mar 24 18:43:01 CST 2003


Sorry man, I mixed this thread up with another one I've been posting
to.  Linux's NFS-over-TCP server support is pretty new at this point, so I
wouldn't expect it to perform very well under heavy load.

Steve




On Mon, 2003-03-24 at 11:36, Steven Kehlet wrote:
> Sorry, wrong url:
> 
> http://now.netapp.com/AskNOW/highlight_html.jsp?url=http%3A%2F%2Fnow.netapp.com%2FNOW%2Fcgi-bin%2Fbol%3FType%3DDetail%26amp%3BDisplay%3D72769&sentenceId=57402008&titleIds=57401998
> 
> 
> 
> On Mon, 2003-03-24 at 11:00, Steven Kehlet wrote:
> > Have you tried tcp?
> > 
> > fileserver:/home      /home  nfs    tcp,nosuid,rsize=8192,wsize=8192,timeo=600,hard,intr 0 0
> > 
> > I'm still trying various kernel revs/mount options to get rid of my NFS
> > woes...  TCP looks promising and would probably help with your
> > fragmentation problem, since TCP splits the data into MTU-sized segments
> > instead of sending large UDP datagrams.  See the following NetApp bug:
> > 
> > http://now.netapp.com/Knowledgebase/solutionarea.asp?id=4.0.852643.2515326&resource=
> > 
> > 
> > I'm currently trying 2.4.18-27 with the following mount options:
> > 
> > tcp,rsize=32768,wsize=32768,hard,intr,timeo=600
> > 
> > so far no errors on a very lightly loaded system.  I'll try my Oracle
> > server later this week...
> > 
> > Steve
> > 
> > 
> > 
> > 
> > 
> > On Thu, 2003-03-20 at 19:59, David C. Kovar wrote:
> > > Good evening,
> > > 
> > > I'm rather stumped by an NFS error we're seeing and I am not sure where
> > > to investigate next.
> > > 
> > > Earlier in the week I posted a question about some NFS errors we were
> > > seeing. One particular file, when written to an NFS partition, would
> > > cause the NFS mount to hang. I've narrowed the problem down
> > > considerably.
> > > 
> > > Both machines are 2650s running 2.4.18-26.7, connected to a fairly
> > > lightly loaded Dell 1Gb switch in a very generic configuration.
> > > 
> > > In my initial tcpdump traces and netstat -r output we were seeing a lot
> > > of IP fragmentation. Once the file server replied with an "ICMP
> > > reassembly" message, it would not respond to any new packets from the
> > > client.
> > > 
> > > We turned the block sizes down to 1K on the client to prevent
> > > fragmentation and ran the test again. 
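A rough way to see why 1K blocks avoid fragmentation over UDP: the trace
further down shows a 1024-byte write going out as a 1184-byte IP packet, so
IP, UDP, and RPC/NFS headers add up to 160 bytes. The sketch below assumes a
standard 1500-byte Ethernet MTU (not stated in the thread) and that the
header overhead stays the same for larger writes. The function name is only
illustrative.

```python
import math

MTU = 1500          # assumed: standard Ethernet MTU, not stated in the thread
IP_HEADER = 20      # IPv4 header without options
# From the trace: a 1024-byte write is sent with IP total length 1184,
# so everything that is not file data adds up to 160 bytes.
OVERHEAD = 1184 - 1024


def ip_fragments(wsize, mtu=MTU):
    """Estimate how many IP fragments one NFS-over-UDP write of wsize bytes needs."""
    ip_payload = wsize + OVERHEAD - IP_HEADER   # UDP header + RPC/NFS + data
    per_fragment = (mtu - IP_HEADER) // 8 * 8   # fragment payloads are multiples of 8
    return math.ceil(ip_payload / per_fragment)


for wsize in (1024, 8192, 32768):
    print(wsize, ip_fragments(wsize))
```

Under these assumptions a 1K write fits in a single packet, while an 8K write
needs 6 fragments, and losing any one fragment means the whole write has to
be resent.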
> > > 
> > > The write will succeed most of the time, but it will take anywhere from
> > > 9 seconds to 1 minute. When it fails, the file server stops replying to
> > > packets, the client ARPs for it, the client tries again, and the cycle
> > > repeats.
> > > 
> > > In both cases - fragmentation and no fragmentation - once the file
> > > server stops replying, the exchange fails.
> > > 
> > > It's only this one 196M file. Larger files and smaller files work fine,
> > > and an identically sized file does not have any problems.
> > > 
> > > What bug am I exercising and what is it about this file that is
> > > exercising the bug?
> > > 
> > > Mount options:
> > > 
> > > fileserver:/home      /home  nfs    rw,nosuid,soft,rsize=1024,wsize=1024,timeo=14,intr 0 0
> > > 
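> > > NFS mount, read-write, setuid bits ignored (nosuid), soft mount, read and
> > > write block size of 1024 bytes, interruptible, and it'll time out (timeo=14
> > > is 1.4 seconds, since timeo is in tenths of a second).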
> > > 
> > > Sometimes it works:
> > > 
> > > 57 >time cp cmc-backup.mpp foo1.mpp
> > > 
> > > real    0m30.834s
> > > user    0m0.002s
> > > sys     0m0.000s
> > > Thu Mar 20 18:54:10 kovar at bee02.kealia.com:~/test
> > > 
> > > (The time will vary between 9 seconds and a minute for the cp command to
> > > complete.)
> > > 
> > > Sometimes it fails:
> > > 
> > > 58 >^1^2
> > > time cp cmc-backup.mpp foo2.mpp
> > > cp: writing `foo2.mpp': Input/output error
> > > 
> > > real    1m2.932s
> > > user    0m0.002s
> > > sys     0m0.004s
> > > Thu Mar 20 18:55:36 kovar at bee02.kealia.com:~/test
> > > 
> > > 
> > > Network traffic at failure:
> > > 
> > > [A normal write/ACK exchange.]
> > > 18:54:33.216095 bee02.kealia.com.2952007036 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006400
> > > 1024 bytes @ 0x000006400 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:54:33.216309 fileserver.nfs > bee02.kealia.com.2952007036: reply ok
> > > 136 write PRE: POST: REG 100644 ids 2031/2031 sz 0x000006800 nlink 1
> > > rdev 0/0 fsid 0x000000000 nodeid 0x000000000 a/m/ctime 1048215273.000000
> > > 1048215273.000000 1048215273.000000 1024 bytes <filesync> (DF) (ttl 64,
> > > id 0, len 164)
> > > [Four unACK'd writes.]
> > > 18:54:33.216334 bee02.kealia.com.2968784252 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:54:34.613551 bee02.kealia.com.2968784252 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:54:37.410087 bee02.kealia.com.2968784252 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:54:43.003177 bee02.kealia.com.2968784252 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > [Bee02 checks fileserver's address with ARP, gets a reply.]
> > > 18:54:48.002570 arp who-has fileserver tell bee02.kealia.com
> > > 18:54:48.002642 arp reply fileserver is-at 0:6:5b:f2:c0:5f
> > > [Seven unACK'd writes.]
> > > 18:54:54.189479 bee02.kealia.com.2985561468 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:54:55.587612 bee02.kealia.com.2985561468 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:54:58.384155 bee02.kealia.com.2985561468 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:55:03.977238 bee02.kealia.com.2985561468 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:55:15.163486 bee02.kealia.com.3002338684 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:55:16.561675 bee02.kealia.com.3002338684 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > 18:55:19.358218 bee02.kealia.com.3002338684 > fileserver.nfs: 1156 write
> > > fh
> > > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > > [Cycle repeats until timeout.]
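The gaps between the retransmitted writes above look like plain timeout
doubling starting from timeo=14. timeo is given in tenths of a second, so the
first retry should come after 1.4 s. Here is a small sketch to check that
against the trace timestamps. The helper name is made up, and the doubling
rule is my reading of the client's UDP behaviour rather than anything
confirmed in this thread.

```python
def retry_intervals(timeo, count):
    """Return `count` successive timeouts in seconds, doubling each time.

    timeo is the mount option value, in tenths of a second.
    """
    first = timeo / 10.0
    return [first * 2 ** i for i in range(count)]


# Timestamps of the repeated write at offset 0x6800, copied from the trace
# (seconds after 18:54:00).
observed = [33.216334, 34.613551, 37.410087, 43.003177, 54.189479]
gaps = [b - a for a, b in zip(observed, observed[1:])]

for expected, gap in zip(retry_intervals(14, len(gaps)), gaps):
    print(f"expected {expected:5.1f}s  observed {gap:6.3f}s")
```

The observed gaps stay within a few hundredths of a second of 1.4, 2.8, 5.6,
and 11.2 s. By the same unit rule, timeo=600 in the TCP mount line is a
60-second timeout.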
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> 
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq or search the list archives at http://lists.us.dell.com/htdig/



