Bug 215782

Summary: CIFS umount fails since 14302ee33 with some servers (exit code 32)
Product: File System
Reporter: Moritz Duge (MoritzDuge)
Component: CIFS
Assignee: fs_cifs (fs_cifs)
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.16.9
Subsystem:
Regression: No
Bisected commit-id:

Description Moritz Duge 2022-03-31 16:47:35 UTC
With upstream kernel 5.16.9 CIFS umount fails when using certain SMB servers.

"umount" returns exit code 32 and the "mount" command still lists the mount as being present.
See below for the bad commit I've bisected.
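
The symptom described above (exit code 32 plus a mount that is still listed) can be checked mechanically. The following is only a minimal sketch, not something taken from the affected systems: the helper names are made up for illustration, it assumes the mount shows up with filesystem type "cifs" in /proc/mounts, and it assumes umount uses the same "mount failure" exit status 32 that mount(8) documents.

```python
import re

# Exit status 32 is documented in mount(8) as "mount failure". Assumption:
# umount reports the same value, as seen in this report.
MNT_EX_FAIL = 32


def _unescape(field):
    # /proc/mounts encodes spaces, tabs etc. as octal escapes such as \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def cifs_mountpoints(proc_mounts_text):
    """Return the mount points of all cifs entries in /proc/mounts-style text."""
    points = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Field order: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3 and fields[2] == "cifs":
            points.append(_unescape(fields[1]))
    return points


def umount_failed_but_listed(exit_code, mountpoint, proc_mounts_text):
    """True if umount returned 32 and the cifs mount is still listed."""
    return (exit_code == MNT_EX_FAIL
            and mountpoint in cifs_mountpoints(proc_mounts_text))
```

In practice the exit code would come from running umount (for example via subprocess.run) and the text from reading /proc/mounts afterwards; both are left out here so the check itself stays side-effect free.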

The bug has been reproduced multiple times with upstream kernel 5.16.9!
In addition, I've done a lot of testing with openSUSE kernels.
Here's the openSUSE bug report:
https://bugzilla.opensuse.org/show_bug.cgi?id=1194945
With the same servers there's also a problem showing the free space with the "df" command. But I haven't been able to find out whether this is really related to the umount problem.




= SMB Server =

I haven't been able to identify the exact server-side settings. But this problem occurred with at least this SMB server (with upstream kernel 5.16.9):
NetApp (Release 9.7P12) with dfs and CIFS mount options "vers=3.1.1,seal"
(quota state unknown)

Additionally I've verified the bug with the openSUSE kernel 5.3.18-lp152.72-default and this SMB server:
Windows Server 2019 with dfs and quota enabled
(no explicit "vers" or "seal" mount options)

Additionally the bug appeared with another NetApp SMB server (tested upstream 5.16.9) and two unknown servers (tested only openSUSE-15.2 kernels).

Also, it looks like the bug may need a setup where the user can only read //server/share/username/ but has no permission to read //server/share/.
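
For anyone attempting a reproduction, the mount invocation for the first NetApp case might be assembled roughly as in the sketch below. This is an illustration under assumptions: the server, share, user and mount point names are placeholders and the helper name is hypothetical. Only the options "vers=3.1.1" and "seal" and the idea of mounting the per-user subdirectory come from this report.

```python
def cifs_mount_argv(server, share, subdir, mountpoint, username,
                    options=("vers=3.1.1", "seal")):
    """Build an argv list for mounting //server/share/subdir via mount -t cifs.

    All names passed in are placeholders chosen by the caller; nothing is
    executed here, the list is only returned.
    """
    unc = f"//{server}/{share}/{subdir}"
    opts = ",".join((f"username={username}",) + tuple(options))
    return ["mount", "-t", "cifs", unc, mountpoint, "-o", opts]


# Hypothetical example values, not the affected production servers.
argv = cifs_mount_argv("fileserver.example", "share", "alice",
                       "/mnt/alice", "alice")
```

Passing an empty options tuple would correspond to the Windows Server 2019 case, where no explicit "vers" or "seal" options were used.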




= Bad Commit =

With the openSUSE kernel I bisected the problem down to this commit (6ae27f2b2) between openSUSE-15.2 kernels 5.3.18-lp152.69.1 and 5.3.18-lp152.72.1.
https://github.com/SUSE/kernel/commit/6ae27f2b260e91f16583bbc1ded3147e0f7c5d94

This commit is also present in the upstream kernel (14302ee33).
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=14302ee3301b3a77b331cc14efb95bf7184c73cc
And it has been merged between 5.11 and 5.12.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=d0df9aabefda4d0a64730087f939f53f91e29ee6

As I said, I can't reproduce this with arbitrary SMB servers. And it's always a time-consuming procedure for me to do a test with the affected production SMB servers. But if you're really unhappy with the bisect search on the openSUSE kernel, I can repeat the test with the upstream commit 14302ee33 and its predecessor.

Comment 1 Moritz Duge 2022-05-04 16:50:26 UTC
Update:

I talked to Paulo (the author of the mentioned commits).

I'll try to get network traces of that behavior for Paulo.
But I may not get the permission for that because of organizational regulations ... :-/
(still trying)


Sadly, the bug only happens inside networks with some annoying security regulations. And as I said, I haven't found a way to reproduce it elsewhere.

If anyone with more experience with NetApp or Windows Server has any idea how to reproduce this with a clean setup, please give me a hint.

Comment 2 Moritz Duge 2022-05-19 15:01:28 UTC
The problem disappeared with one of the latest openSUSE kernel updates.
It must have been one of these:
2022-04-06: 5.3.18-150300.59.63
2022-05-05: 5.3.18-150300.59.68

I didn't do a bisect. But the following commit, contained in SUSE kernel 5.3.18-150300.59.68, seems to me the most likely fix.

kernel.org:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=5d7e282541fc91b831a5c4477c5d72881c623df9

SUSE kernel:
https://github.com/SUSE/kernel/commit/f6c7673fbee1985e8fcfe4936ca6d91852f86b13

SUSE kernel source:
https://github.com/SUSE/kernel-source/commit/e7007189db138241fce6440c3bcfa084a0cf7c72