NFS problems with 2650 and 2.4.18-26.7

Steven Kehlet steven.kehlet at conexant.com
Mon Mar 24 13:38:01 CST 2003


Sorry, wrong url:

http://now.netapp.com/AskNOW/highlight_html.jsp?url=http%3A%2F%2Fnow.netapp.com%2FNOW%2Fcgi-bin%2Fbol%3FType%3DDetail%26amp%3BDisplay%3D72769&sentenceId=57402008&titleIds=57401998



On Mon, 2003-03-24 at 11:00, Steven Kehlet wrote:
> Have you tried tcp?
> 
> fileserver:/home      /home  nfs  tcp,nosuid,rsize=8192,wsize=8192,timeo=600,hard,intr 0 0
> 
> I'm still trying various kernel revs and mount options to get rid of my NFS
> woes. TCP looks promising and would probably help with your fragmentation
> problem; see the following NetApp bug:
> 
> http://now.netapp.com/Knowledgebase/solutionarea.asp?id=4.0.852643.2515326&resource=
> 
> 
> I'm currently trying 2.4.18-27 with the following mount options:
> 
> tcp,rsize=32768,wsize=32768,hard,intr,timeo=600
> 
> So far there are no errors on a very lightly loaded system. I'll try my
> Oracle server later this week...
> 
> Steve
> 
> On Thu, 2003-03-20 at 19:59, David C. Kovar wrote:
> > Good evening,
> > 
> > I'm rather stumped by an NFS error we're seeing and I am not sure where
> > to investigate next.
> > 
> > Earlier in the week I posted a question about some NFS errors we were
> > seeing. One particular file, when written to an NFS partition, would
> > cause the NFS mount to hang. I've narrowed the problem down
> > considerably.
> > 
> > Both machines are PowerEdge 2650s running kernel 2.4.18-26.7, connected
> > to a fairly lightly loaded Dell gigabit switch in a very generic
> > configuration.
> > 
> > In my initial tcpdump traces and netstat -r output we were seeing a lot
> > of IP fragmentation. Once the file server replied with an ICMP "reassembly
> > time exceeded" message, it would not respond to any new packets from the
> > client.
> > 
> > We turned the block sizes down to 1K on the client to prevent
> > fragmentation and ran the test again. 
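To see why dropping the block size to 1K avoids fragmentation, here is a rough count of the IP fragments one NFS-over-UDP WRITE produces. It is a sketch under stated assumptions: a 1500-byte Ethernet MTU, a 20-byte IPv4 header with no options, and a fixed 132 bytes of RPC/NFS header per WRITE. The 132 is inferred from the trace below, where a 1024-byte write shows up as a 1156-byte RPC inside a 1184-byte IP packet; real header sizes vary with the file handle and credentials.

```python
import math

MTU = 1500          # assumed Ethernet MTU, bytes
IP_HEADER = 20      # IPv4 header without options
UDP_HEADER = 8
RPC_OVERHEAD = 132  # assumption: 1156-byte RPC minus 1024 data bytes, from the trace


def ip_fragments(wsize: int) -> int:
    """Number of IPv4 fragments for one NFS WRITE of `wsize` bytes over UDP."""
    ip_payload = UDP_HEADER + RPC_OVERHEAD + wsize
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (MTU - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


for wsize in (1024, 8192):
    print(f"wsize={wsize}: {ip_fragments(wsize)} fragment(s)")
```

Under these assumptions a 1K write fits in a single packet, while an 8K write is split into six fragments, and losing any one of them loses the whole request. That is the failure mode the TCP suggestion earlier in the thread is meant to avoid, since TCP segments its stream to the MSS instead of relying on IP fragmentation.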
> > 
> > The write will succeed most of the time, but it will take anywhere from
> > 9 seconds to 1 minute. When it fails, the file server stops replying to
> > packets, the client ARPs for it, the client tries again, and the cycle
> > repeats.
> > 
> > In both cases - fragmentation and no fragmentation - once the file
> > server stops replying, the exchange fails.
> > 
> > It's only this one 196M file. Larger and smaller files work fine, and so
> > does a different file of exactly the same size.
> > 
> > What bug am I exercising and what is it about this file that is
> > exercising the bug?
> > 
> > Mount options:
> > 
> > fileserver:/home      /home  nfs    rw,nosuid,soft,rsize=1024,wsize=1024,timeo=14,intr 0 0
> > 
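> > That is: an NFS mount, read-write, with setuid/setgid bits ignored
> > (nosuid), soft (the client returns an I/O error once its retries time
> > out), 1024-byte read and write block sizes, a 1.4-second initial timeout
> > (timeo is in tenths of a second), and interruptible by signals (intr).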
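For comparison with the TCP options suggested earlier in the thread, the two option strings can be split and diffed. This is only an illustration; `parse_opts` is a hypothetical helper written for this purpose, not part of any NFS tool, and it relies on the nfs(5) convention that `timeo` is given in tenths of a second.

```python
def parse_opts(opts: str) -> dict:
    """Split an fstab options field into {name: value}; bare flags map to True."""
    parsed = {}
    for item in opts.split(","):
        name, sep, value = item.partition("=")
        parsed[name] = value if sep else True
    return parsed


david = parse_opts("rw,nosuid,soft,rsize=1024,wsize=1024,timeo=14,intr")
steve = parse_opts("tcp,nosuid,rsize=8192,wsize=8192,timeo=600,hard,intr")

# Print only the options that differ; "-" means the option is absent.
for name in sorted(david.keys() | steve.keys()):
    a, b = david.get(name, "-"), steve.get(name, "-")
    if a != b:
        print(f"{name:6} {str(a):6} {str(b):6}")

# timeo is in tenths of a second: initial timeout in seconds for each mount.
print(int(david["timeo"]) / 10, int(steve["timeo"]) / 10)
```

Since rw is the default, its absence from the second line is not a real difference. The meaningful ones are the transport (UDP vs. TCP), soft vs. hard, the block sizes, and a 1.4-second vs. 60-second initial timeout.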
> > 
> > Sometimes it works:
> > 
> > 57 >time cp cmc-backup.mpp foo1.mpp
> > 
> > real    0m30.834s
> > user    0m0.002s
> > sys     0m0.000s
> > Thu Mar 20 18:54:10 kovar at bee02.kealia.com:~/test
> > 
> > (The time will vary between 9 seconds and a minute for the cp command to
> > complete.)
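The successful copies also give a rough idea of how many requests are involved at this block size. Assuming "196M" means 196 MiB and every WRITE carries a full 1024-byte payload, the copy takes about 200,000 WRITE calls. Judging by the trace below, each one is a synchronous (filesync) write that waits for its own reply.

```python
FILE_BYTES = 196 * 2**20   # assumption: "196M" read as MiB
WSIZE = 1024               # wsize from the mount options above

writes = FILE_BYTES // WSIZE
# Fastest run, the run shown above, and the slowest run mentioned.
for seconds in (9, 30.834, 60):
    print(f"{seconds:>6} s: {writes / seconds:8.0f} writes/s, "
          f"{FILE_BYTES / seconds / 2**20:5.1f} MiB/s")
```

A single unanswered request is a tiny fraction of that work, yet it is enough to stall the whole copy.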
> > 
> > Sometimes it fails:
> > 
> > 58 >^1^2
> > time cp cmc-backup.mpp foo2.mpp
> > cp: writing `foo2.mpp': Input/output error
> > 
> > real    1m2.932s
> > user    0m0.002s
> > sys     0m0.004s
> > Thu Mar 20 18:55:36 kovar at bee02.kealia.com:~/test
> > 
> > 
> > Network traffic at failure:
> > 
> > [A normal write/ACK exchange.]
> > 18:54:33.216095 bee02.kealia.com.2952007036 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006400
> > 1024 bytes @ 0x000006400 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:54:33.216309 fileserver.nfs > bee02.kealia.com.2952007036: reply ok
> > 136 write PRE: POST: REG 100644 ids 2031/2031 sz 0x000006800 nlink 1
> > rdev 0/0 fsid 0x000000000 nodeid 0x000000000 a/m/ctime 1048215273.000000
> > 1048215273.000000 1048215273.000000 1024 bytes <filesync> (DF) (ttl 64,
> > id 0, len 164)
> > [The next write, xid 2968784252 at offset 0x6800, is sent four times with no reply.]
> > 18:54:33.216334 bee02.kealia.com.2968784252 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:54:34.613551 bee02.kealia.com.2968784252 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:54:37.410087 bee02.kealia.com.2968784252 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:54:43.003177 bee02.kealia.com.2968784252 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > [bee02 checks the fileserver's address with ARP and gets a reply.]
> > 18:54:48.002570 arp who-has fileserver tell bee02.kealia.com
> > 18:54:48.002642 arp reply fileserver is-at 0:6:5b:f2:c0:5f
> > [The same write is sent seven more times, under two new xids, with no reply.]
> > 18:54:54.189479 bee02.kealia.com.2985561468 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:54:55.587612 bee02.kealia.com.2985561468 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:54:58.384155 bee02.kealia.com.2985561468 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:55:03.977238 bee02.kealia.com.2985561468 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:55:15.163486 bee02.kealia.com.3002338684 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:55:16.561675 bee02.kealia.com.3002338684 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > 18:55:19.358218 bee02.kealia.com.3002338684 > fileserver.nfs: 1156 write
> > fh
> > Unknown/010000020008000501800000A3027900A47D59ABE80279000000000000006800
> > 1024 bytes @ 0x000006800 <filesync> (DF) (ttl 64, id 0, len 1184)
> > [Cycle repeats until timeout.]
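The retransmission timestamps in this trace line up with the timeo=14 setting. Assuming the client starts with a timeout of timeo/10 seconds and doubles it on each retransmission (the UDP behaviour nfs(5) describes, capped at 60 seconds; the cap is an assumption here), the expected gaps are 1.4, 2.8 and 5.6 seconds. The sketch below compares that schedule with the observed gaps for xid 2968784252.

```python
TIMEO = 14          # from the mount options, in tenths of a second
MAX_TIMEOUT = 60.0  # assumed cap on the backed-off timeout, seconds


def backoff_schedule(timeo: int, retries: int) -> list[float]:
    """Expected gaps between successive transmissions of one request."""
    gaps, timeout = [], timeo / 10
    for _ in range(retries):
        gaps.append(timeout)
        timeout = min(timeout * 2, MAX_TIMEOUT)
    return gaps


def seconds(hms: str) -> float:
    """Convert a tcpdump HH:MM:SS.ffffff timestamp to seconds since midnight."""
    h, m, s = hms.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


# Transmissions of xid 2968784252 (offset 0x6800), copied from the trace.
sent = ["18:54:33.216334", "18:54:34.613551", "18:54:37.410087", "18:54:43.003177"]
observed = [round(seconds(b) - seconds(a), 2) for a, b in zip(sent, sent[1:])]
expected = backoff_schedule(TIMEO, len(observed))
print("observed:", observed)
print("expected:", expected)
```

The group under xid 2985561468 shows the same 1.4/2.8/5.6-second spacing, so the client appears to be retrying exactly as configured; the open question is why the server stops answering.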




More information about the Linux-PowerEdge mailing list