Bug 16494
Summary: | NFS client over TCP hangs due to packet loss | ||
---|---|---|---|
Product: | Networking | Reporter: | andyc.bluearc |
Component: | IPV4 | Assignee: | Stephen Hemminger (stephen) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | akpm, alan |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.34.1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Abort SUNRPC connection if it's still in shutdown state when it's reused. |
Description
andyc.bluearc
2010-08-02 16:14:43 UTC
Please don't send patches via bugzilla - it causes lots of problems with our usual patch management and review processes. Please send this patch via email as per Documentation/SubmittingPatches. Suitable recipients may be found via scripts/get_maintainer.pl. Please also cc myself on the email. For this one I'd suggest cc'ing netdev@vger.kernel.org and linux-nfs@vger.kernel.org at least. Thanks.

patch submitted: http://lkml.org/lkml/2010/8/3/91

This problem also affects 2.6.32 and 2.6.30 series kernels. We have not seen such a hang with 2.6.26.

FWIW I've found it easier to reproduce this problem if Ethernet flow control is off, but it still happens with it on as well.

This is how I reproduce the problem. If I do this in 4 different xterm windows, each having cd'ed to the same NFS mounted directory:

xterm1: rm -rf *
xterm2: while true; do let iter+=1; echo $iter; dd if=/dev/zero of=$$ bs=1M count=1000; done
xterm3: while true; do let iter+=1; echo $iter; dd if=/dev/zero of=$$ bs=1M count=1000; done
xterm4: while true; do let iter+=1; echo $iter; dd if=/dev/zero of=$$ bs=1M count=1000; done

then it normally hangs before the 3rd iteration starts. The directory contains loads of information (e.g. 5 linux source trees). This happens with different types of Ethernet hardware too. The rm -rf isn't necessary but makes the problem easier to reproduce (for me anyway).

Changing to regression as this cannot be reproduced on 2.6.26 series kernels.

I've reproduced the problem on 2.6.34.2.

Created attachment 27384 [details]
Abort SUNRPC connection if it's still in shutdown state when it's reused.
The sk_shutdown flag was left set on a socket, so tcp_sendmsg() returned an error, which in turn caused the RPC layer to attempt recovery again and again. The patch detects this situation and causes the connection to be aborted if sk_shutdown is set when a connection is being reused.