Subject    : 2.6.32 nfs regression?
Submitter  : Nikola Ciprich <extmaillist@linuxbox.cz>
Date       : 2009-12-28 12:10
References : http://marc.info/?l=linux-kernel&m=126200276223524&w=4

This entry is being used for tracking a regression from 2.6.31. Please don't close it until the problem is fixed in the mainline.
Interesting case. I'm just worried about the following messages:

[60144.870436] nfsd: peername failed (err 107)!

Socket error -107 is ENOTCONN ("Transport endpoint is not connected"). The warning is printed by svc_tcp_accept() [net/sunrpc/svcsock.c] when kernel_getpeername() returns -107, which means the socket is closed. svc_tcp_accept() is called from svc_recv() [net/sunrpc/svc_xprt.c]:

    if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
        <snip>
        newxpt = xprt->xpt_ops->xpo_accept(xprt);
        <snip>

So this must happen when xprt->xpt_flags has both XPT_LISTENER and XPT_CLOSE set.

Let's take a look at commit b0401d72. It moved the close processing to after the recvfrom step, but it also introduced this warning: if xpt_flags has both XPT_LISTENER and XPT_CLOSE set, we should close the transport, not accept and then close.

I'm not sure whether the extra accept can cause the hang, but I'm sure the attached patch will fix the "peername failed (err 107)" warning. Could you please try this patch and tell me whether it fixes your NFS hang?
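To make the ordering argument concrete, here is a minimal user-space sketch. It is not the kernel code and not the attached patch: the flag values, the enum, and both function names are hypothetical, and the model only reports the first action taken for a given flag combination.

```c
/* Hypothetical stand-ins for the kernel's XPT_LISTENER / XPT_CLOSE bits.
 * The values are illustrative, not the kernel's. */
#define MODEL_LISTENER (1u << 0)
#define MODEL_CLOSE    (1u << 1)

enum model_action { MODEL_DO_ACCEPT, MODEL_DO_CLOSE, MODEL_DO_RECV };

/* Ordering described in the comment above: the listener test comes first,
 * so a listener that is also marked for close still goes through accept. */
enum model_action first_action_listener_first(unsigned int flags)
{
	if (flags & MODEL_LISTENER)
		return MODEL_DO_ACCEPT;
	if (flags & MODEL_CLOSE)
		return MODEL_DO_CLOSE;
	return MODEL_DO_RECV;
}

/* Suggested ordering: a pending close wins over everything else. */
enum model_action first_action_close_first(unsigned int flags)
{
	if (flags & MODEL_CLOSE)
		return MODEL_DO_CLOSE;
	if (flags & MODEL_LISTENER)
		return MODEL_DO_ACCEPT;
	return MODEL_DO_RECV;
}
```

With both bits set, the first function picks accept (the path that can end in the peername warning), while the second picks close. For any other flag combination the two orderings agree.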
Created attachment 24388 [details] patch to fix nfs peername warning
Updating to 2.6.32.3 + Xiaotian's patch seems to have completely fixed the problems on one of my boxes. The second one is still having strange NFS issues, so I'd keep this open a little longer until I sort this out. Thanks for your efforts, guys! I'll report later.
Hi, so with 2.6.32.4 I'm still getting NFS hangs. No traces appear, only peername failed messages:

[86251.030894] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[86251.031325] nfsd: peername failed (err 107)!
[86251.031390] nfsd: peername failed (err 107)!
[86251.031406] nfsd: peername failed (err 107)!
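For reading these logs: with the generic Linux errno numbering (an assumption, since a few architectures number these differently), 104 is ECONNRESET and 107 is ENOTCONN. The sketch below is a user-space helper for decoding such values; the function names errno_symbol and errno_text are hypothetical and not part of any kernel or libc API.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map the errno values seen in the logs to symbols.
 * Accepts either sign, since the two log lines print the value differently
 * ("error -104" vs. "err 107"). */
const char *errno_symbol(int err)
{
	if (err < 0)
		err = -err;
	switch (err) {
	case ECONNRESET:
		return "ECONNRESET";
	case ENOTCONN:
		return "ENOTCONN";
	default:
		return "unknown";
	}
}

/* Human-readable description from the C library for the same value. */
const char *errno_text(int err)
{
	return strerror(err < 0 ? -err : err);
}
```

Read this way, the sequence in the log is a connection reset while sending a reply, followed by accept attempts on sockets that are no longer connected.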
2.6.33-rc6-git3, still getting those:

[57067.191327] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[57067.191649] nfsd: peername failed (err 107)!
[57067.191770] nfsd: peername failed (err 107)!
[57067.191815] nfsd: peername failed (err 107)!
[57067.191831] nfsd: peername failed (err 107)!
[57067.191844] nfsd: peername failed (err 107)!
[62042.913991] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[62042.914276] nfsd: peername failed (err 107)!
Fixed by commit b292cf9ce70d221c3f04ff62db5ab13d9a249ca8.
b292cf9ce70d221c3f04ff62db5ab13d9a249ca8 was committed between 2.6.33-rc3 and 2.6.33-rc4, and I'm still seeing these messages in 2.6.33-rc6, so this must be a different issue...
But that's exactly the patch from comment #2, isn't it?
well, it is.... so either I'm experiencing some other issue, or the patch doesn't fix all cases of the problem...
Reopening, then.
Nikola, is this still an issue on current (v2.6.37) kernels?
References: http://thread.gmane.org/gmane.linux.nfs/30981

b0401d72 and b292cf9ce70d22 got reverted and this is probably fixed in v2.6.35-rc1 with:

commit 301e99ce4a2f42a317129230fd42e6cd874c64b0
Author: Neil Brown <neilb@suse.de>
Date:   Sun Feb 28 22:01:05 2010 -0500

    nfsd: ensure sockets are closed on error