Bug 215782

Summary: CIFS umount fails since 14302ee33 with some servers (exit code 32)
Product: File System
Reporter: Moritz Duge (duge)
Component: CIFS
Assignee: fs_cifs (fs_cifs)
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.16.9
Subsystem:
Regression: No
Bisected commit-id:

Description Moritz Duge 2022-03-31 16:47:35 UTC
With upstream kernel 5.16.9 CIFS umount fails when using certain SMB servers.

"umount" returns exit code 32 and the "mount" command still lists the mount as being present.
See below for the bad commit I've bisected.
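
A minimal sketch of how the symptom above could be checked from a script. It assumes Linux with /proc/mounts available; the helper names and the mount point /mnt/cifs are hypothetical and not taken from the report.

```python
import re
import subprocess


def unescape_mount_field(field):
    # /proc/mounts writes space, tab, newline and backslash as
    # three-digit octal escapes (for example \040 for a space).
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def is_still_mounted(mountpoint, mounts_text):
    # Return True if mountpoint appears as a mount target in text
    # that uses the /proc/mounts layout (device, target, fstype, ...).
    target = mountpoint.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and unescape_mount_field(fields[1]) == target:
            return True
    return False


def umount_and_check(mountpoint):
    # Run umount, then report its exit code and whether the kernel
    # still lists the mount point afterwards.
    result = subprocess.run(["umount", mountpoint],
                            capture_output=True, text=True)
    with open("/proc/mounts") as f:
        still_listed = is_still_mounted(mountpoint, f.read())
    return result.returncode, still_listed
```

On an affected setup, umount_and_check("/mnt/cifs") would be expected to return exit code 32 together with True, matching the behavior described here. It needs the same privileges as a manual umount.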

The bug has been reproduced multiple times with upstream kernel 5.16.9!
I've also done a lot of testing with openSUSE kernels.
Here's the openSUSE bug report:
Additionally, with the same servers there's a problem showing the free space with the "df" command, but I haven't been able to find out whether it is really related to the umount problem.

= SMB Server =

I haven't been able to identify the exact server-side settings. But this problem occurred with at least this SMB server (with upstream kernel 5.16.9):
NetApp (Release 9.7P12) with dfs and CIFS mount options "vers=3.1.1,seal"
(quota state unknown)

Additionally I've verified the bug with the openSUSE kernel 5.3.18-lp152.72-default and this SMB server:
Windows Server 2019 with dfs and quota enabled
(no explicit "vers" or "seal" mount options)

Additionally the bug appeared with another NetApp SMB server (tested upstream 5.16.9) and two unknown servers (tested only openSUSE-15.2 kernels).

Also it looks like the bug may need a setup where the user can only read //server/share/username/ but has no permissions to read //server/share/.

= Bad Commit =

With the openSUSE kernel I bisected the problem down to this commit (6ae27f2b2) between openSUSE-15.2 kernels 5.3.18-lp152.69.1 and 5.3.18-lp152.72.1.

This commit is also present in the upstream kernel (14302ee33).
And it has been merged between 5.11 and 5.12.
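
As a rough sketch, the statement that the commit landed between 5.11 and 5.12 can be turned into a check on an upstream release string. The function name is hypothetical. The check relies only on the version range stated in this report and does not apply to distribution kernels such as the openSUSE 5.3.18 ones, which carry the commit as a backport.

```python
import re


def has_upstream_commit(release, first_release=(5, 12)):
    # Compare the major.minor part of a kernel release string
    # (for example "5.16.9") with the first upstream release that,
    # according to the report, contains commit 14302ee33.
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return (int(m.group(1)), int(m.group(2))) >= first_release
```

For the running kernel, the release string could be taken from platform.release() in the Python standard library.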

As said, I can't reproduce this with arbitrary SMB servers, and testing against the affected production SMB servers is always a time-consuming procedure for me. But if you're really unhappy with the bisect search on the openSUSE kernel, I can repeat the test with the upstream commit 14302ee33 and its predecessor.

Comment 1 Moritz Duge 2022-05-04 16:50:26 UTC

I talked to Paulo (the author of the mentioned commits).

I'll try to get network traces of that behavior for Paulo.
But I may not get the permission for that because of organizational regulations ... :-/
(still trying)

Sadly the bug only happens inside networks with some annoying security regulations. And as said I haven't found a way to reproduce it.

If anyone with more experience with NetApp or Windows Server has any idea how to reproduce this with a clean setup, please give me a hint.

Comment 2 Moritz Duge 2022-05-19 15:01:28 UTC
The problem disappeared with one of the latest openSUSE kernel updates.
It must have been one of these:
2022-04-06: 5.3.18-150300.59.63
2022-05-05: 5.3.18-150300.59.68

I didn't do a bisect. But the following commit, contained in SUSE kernel 5.3.18-150300.59.68, seems most likely to me.


SUSE kernel:

SUSE kernel source: