Bug 14943

Summary: nfs regression?
Product: File System
Reporter: Rafael J. Wysocki (rjw)
Component: Other
Assignee: fs_other
Status: CLOSED CODE_FIX
Severity: normal
CC: dfeng, florian, nik
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 14230    
Attachments: patch to fix nfs peername warning

Description Rafael J. Wysocki 2009-12-29 13:32:59 UTC
Subject    : 2.6.32 nfs regression?
Submitter  : Nikola Ciprich <extmaillist@linuxbox.cz>
Date       : 2009-12-28 12:10
References : http://marc.info/?l=linux-kernel&m=126200276223524&w=4

This entry is being used for tracking a regression from 2.6.31.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Xiaotian Feng 2009-12-31 02:26:41 UTC
Interesting case. I'm only concerned about the following message:

[60144.870436] nfsd: peername failed (err 107)!

Socket error -107 is ENOTCONN ("Transport endpoint is not connected").
This warning is printed by svc_tcp_accept() [net/sunrpc/svcsock.c] when kernel_getpeername() returns -107, which means the socket is already closed.
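
For reference, the same errno can be observed from userspace. The following is a minimal sketch, not kernel code: it assumes a Linux system, and the helper name peername_errno_unconnected() is made up for illustration. It calls getpeername() on a TCP socket that has no peer and returns the resulting errno, which is ENOTCONN (107 in the generic Linux numbering).

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel tree): create a TCP socket that
 * is never connected and ask for its peer address.
 * Returns the errno reported by getpeername(), 0 if the call unexpectedly
 * succeeds, or -1 if the socket could not be created.
 */
int peername_errno_unconnected(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int err = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* There is no peer, so the call fails and errno is set to ENOTCONN. */
    if (getpeername(fd, (struct sockaddr *)&addr, &len) < 0)
        err = errno; /* save before close() can overwrite it */

    close(fd);
    return err;
}
```

Inside the kernel the error is returned as a negative value (-ENOTCONN), which is why the text above refers to -107 while the log line prints 107.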

svc_tcp_accept() is called from svc_recv() [net/sunrpc/svc_xprt.c] through xpo_accept:

        if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
        <snip>
                newxpt = xprt->xpt_ops->xpo_accept(xprt);
        <snip>

So this must happen when xprt->xpt_flags has both XPT_LISTENER and XPT_CLOSE.

Let's take a look at commit b0401d72. It moved the close processing to after the recvfrom method, but it also introduced this warning. If xpt_flags has both XPT_LISTENER and XPT_CLOSE set, we should close the transport, not accept and then close. I'm not sure whether the extra accept can cause the hang.

But I'm sure the attached patch will fix the "peername failed (err 107)" warning. Could you please try it and tell me whether it also fixes your NFS hang?
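
The ordering argued for above can be sketched as a small userspace model. This is an illustration only and is not the attached patch: the MODEL_* flag values, the model_action enum, and model_next_action() are hypothetical names that stand in for the xpt_flags checks in svc_recv().

```c
#include <assert.h>

/* Hypothetical stand-ins for the XPT_CLOSE and XPT_LISTENER flag bits. */
enum {
    MODEL_XPT_CLOSE    = 1u << 0,
    MODEL_XPT_LISTENER = 1u << 1
};

/* What the receive path would do next for a given transport. */
enum model_action {
    MODEL_DO_CLOSE,  /* tear the transport down */
    MODEL_DO_ACCEPT, /* accept a new connection on a listener */
    MODEL_DO_RECV    /* read a request from a connected socket */
};

/*
 * Check the close flag first: a listener that is already marked for
 * closing is closed directly and never reaches the accept step, so the
 * peername lookup on a dead socket is not attempted.
 */
enum model_action model_next_action(unsigned int flags)
{
    if (flags & MODEL_XPT_CLOSE)
        return MODEL_DO_CLOSE;
    if (flags & MODEL_XPT_LISTENER)
        return MODEL_DO_ACCEPT;
    return MODEL_DO_RECV;
}
```

With both flags set, model_next_action() returns MODEL_DO_CLOSE. Checking the listener flag first would instead return MODEL_DO_ACCEPT for that case, which is the accept-then-close sequence described above.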
Comment 2 Xiaotian Feng 2009-12-31 02:29:46 UTC
Created attachment 24388 [details]
patch to fix nfs peername warning
Comment 3 nik@linuxbox.cz 2010-01-14 09:26:55 UTC
Updating to 2.6.32.3 + Xiaotian's patch seems to have completely fixed the problems on one of my boxes. The second one is still having strange NFS issues, so I'd keep this open a little longer until I sort this out. Thanks for your efforts, guys! I'll report later.
Comment 4 nik@linuxbox.cz 2010-01-20 11:12:20 UTC
Hi, so with 2.6.32.4 I'm still getting NFS hangs. No traces appear, only peername failed messages:

[86251.030894] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[86251.031325] nfsd: peername failed (err 107)!
[86251.031390] nfsd: peername failed (err 107)!
[86251.031406] nfsd: peername failed (err 107)!
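
A note for reading these lines: the two messages print the error with different signs (-104 and 107). In the generic Linux errno numbering, 104 is ECONNRESET and 107 is ENOTCONN. The sketch below is a hypothetical helper, not part of the kernel, that normalizes the sign and checks for those two codes. It assumes the numbering used on common Linux architectures such as x86.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return 1 if ret, given either as a negative
 * kernel-style return value or as a positive errno, is one of the two
 * codes seen in the log (ECONNRESET or ENOTCONN), and 0 otherwise.
 */
int model_is_disconnect_error(int ret)
{
    int err = abs(ret); /* accept both -104 and 104 */

    return err == ECONNRESET || err == ENOTCONN;
}
```

For example, model_is_disconnect_error(-104) and model_is_disconnect_error(107) both return 1 on such a system.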
Comment 5 nik@linuxbox.cz 2010-02-04 14:05:05 UTC
2.6.33-rc6-git3, still getting those:
[57067.191327] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[57067.191649] nfsd: peername failed (err 107)!
[57067.191770] nfsd: peername failed (err 107)!
[57067.191815] nfsd: peername failed (err 107)!
[57067.191831] nfsd: peername failed (err 107)!
[57067.191844] nfsd: peername failed (err 107)!
[62042.913991] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
[62042.914276] nfsd: peername failed (err 107)!
Comment 6 Rafael J. Wysocki 2010-02-07 22:40:37 UTC
Fixed by commit b292cf9ce70d221c3f04ff62db5ab13d9a249ca8.
Comment 7 nik@linuxbox.cz 2010-02-08 17:56:57 UTC
b292cf9ce70d221c3f04ff62db5ab13d9a249ca8 was committed between 2.6.33-rc3 and 2.6.33-rc4, and I'm still seeing these messages in 2.6.33-rc6, so this must be a different issue...
Comment 8 Rafael J. Wysocki 2010-02-08 18:57:15 UTC
But that's exactly the patch from comment #2, isn't it?
Comment 9 nik@linuxbox.cz 2010-02-08 19:14:07 UTC
Well, it is... so either I'm experiencing some other issue, or the patch doesn't fix all cases of the problem.
Comment 10 Rafael J. Wysocki 2010-02-08 19:27:40 UTC
Reopening, then.
Comment 11 Florian Mickler 2011-02-02 09:39:21 UTC
Nikola, is this still an issue on current (v2.6.37) kernels?
Comment 12 Florian Mickler 2011-02-02 10:40:43 UTC
References: http://thread.gmane.org/gmane.linux.nfs/30981
b0401d72 and b292cf9ce70d22 got reverted and this is probably fixed in v2.6.35-rc1 with:
commit 301e99ce4a2f42a317129230fd42e6cd874c64b0
Author: Neil Brown <neilb@suse.de>
Date:   Sun Feb 28 22:01:05 2010 -0500

    nfsd: ensure sockets are closed on error