Bug 194691
Summary: | Unreachable CIFS mount hangs system for an inconsistent amount of time (between 60 minutes and 5 hours) | ||
---|---|---|---|
Product: | File System | Reporter: | Rajesh (rajesh.chivas) |
Component: | CIFS | Assignee: | fs_cifs (fs_cifs) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | blocking | CC: | 2contras, piastryyy, rajesh.chivas, rdiezmail-kernelbugzilla |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.16.38 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Kernel Configuration |
Description
Rajesh
2017-02-24 07:40:52 UTC
After noticing CIFS-related fixes in 3.16.40, I upgraded to it, but that did not help either.
Created attachment 254905
Kernel Configuration
Attaching the kernel configuration. Please let me know if any other information is required for debugging. Also, the issue is 100% reproducible.
I have the same problem on 4.4 and 4.8 kernels (Ubuntu 14.04 LTS and 16.04 LTS) with shares on Windows Server 2003 and/or Windows Server 2008 R2. For testing I used the simplest form of mount:
sudo mount -t cifs //srv/public /home/user/public -o username=user,rw,uid=user,gid=users
If /home/user/public is not used for some time (stays idle for 10-60 minutes), the connection stalls and the next operation on it (e.g. "ls /home/user/public") hangs for dozens of minutes or even hours.

It all started with kernel 4.4.16. Running git bisect between 4.4.15 and 4.4.16 pointed to the following patch as the cause of the problem: "Fix reconnect to not defer smb3 session reconnect long after socket reconnect"
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.4.16&id=4ce7aa4e44d88ce64ea8ae2337b8910f3670b0ba
This patch also found its way into both the 4.8 tree ( https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.8.17&id=4fcd1813e6404dd4420c7d12fb483f9320f0bf93 ) and the 3.16 tree ( https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v3.16.38&id=2e4378ee60049b752c9dce16f62ce6fbd11b379a ).

It seems that the bug was corrected in the 4.10 series between releases 4.10.12 and 4.10.13. Reverse bisecting points to commit 5cd77ebf2254e6f27753ec2041fa5e084bf3eb5e, which is a backport of commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit?id=62a6cfddcc0a5313e7da3e8311ba16226fe0ac10 . So the question is whether this fix will find its way back to the older (supported) kernel lines.
I found some discussion of this question here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1686099

The patch for this issue, "Fix reconnect to not defer smb3 session reconnect long after socket reconnect" ( https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.4.16&id=4ce7aa4e44d88ce64ea8ae2337b8910f3670b0ba ), is already in kernel version 3.16.38, and I still see the system become unresponsive when the remote share goes offline.

The following patch fixes the issue with reconnects: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53e0e11efe9289535b060a51d4cf37c25e0d0f2b
v3.16.42: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/cifs?h=linux-3.16.y&id=0ba4c6eaaacbcc4b18f51bb3b1567c65a8fecca9
v4.4.40: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/cifs?h=linux-4.4.y&id=f0b715409cb9cf7e21e690f9b163047739761962
v4.8.16: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/cifs?h=linux-4.8.y&id=ff04da387c10b6bf7b510392742c8cd46c130fd6
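
For anyone who wants to check a machine against the three stable releases listed above, here is a minimal Python sketch. The helper names (`FIRST_FIXED`, `parse_release`, `has_reconnect_fix`) are hypothetical and not part of any kernel tooling, and the table contains only the versions quoted in this report. It compares upstream version numbers only, so it cannot tell whether a distribution kernel (for example an Ubuntu 4.4.0-NN build) carries the fix as a backport.

```python
import re

# First upstream stable release in each series that carries the reconnect
# fix, taken from the list in this report. Other series are not covered.
FIRST_FIXED = {
    (3, 16): 42,
    (4, 4): 40,
    (4, 8): 16,
}


def parse_release(release):
    """Split a release string such as '4.4.16' or '4.4.0-21-generic'
    into (major, minor, patch). A missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_reconnect_fix(release):
    """Return True or False for the series listed in FIRST_FIXED,
    or None when the series is not mentioned in this report."""
    major, minor, patch = parse_release(release)
    first_fixed = FIRST_FIXED.get((major, minor))
    if first_fixed is None:
        return None
    return patch >= first_fixed
```

On a running system the input could come from `platform.release()` in the standard library, e.g. `has_reconnect_fix(platform.release())`. A `None` result only means the series is not in the table above; it says nothing about whether the fix is present.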