Kernel export threads (nfsd/lockd/ksmbd) cannot process blocking locks synchronously. They have a limited number of worker threads. If all of them wait on blocking locks, the server cannot handle other requests and can deadlock. To avoid this problem, these servers force vfs_lock_file() to return asynchronously. From the comment above vfs_lock_file():

 * Callers expecting ->lock() to return asynchronously
 * will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if)
 * the request is for a blocking lock.

In other words, to handle blocking lock requests correctly:
a) servers should use F_SETLK with FL_SLEEP;
b) file system ->lock() functions should handle such requests correctly.

At present (v5.15) this does not happen in some cases:
- on the server side: ksmbd still uses F_SETLKW (patch already sent);
- file systems that define their own ->lock() function usually do not expect such requests and handle them incorrectly. They can block and cause export threads to deadlock.

As a temporary workaround, kernel threads will drop the FL_SLEEP flag if the exported file system defines its own ->lock() function (the corresponding patches have already been sent). File systems without their own ->lock() function use the safe posix_lock_file().

This bug is meant to track fixes in all affected file systems. Once this task is done, the workaround that drops FL_SLEEP in kernel threads can be reverted.

Below is a list of affected in-tree file systems, with the issues noticed:
- fuse:
  fuse_setlk() ignores the F_SETLK cmd, handles the FL_SLEEP flag and calls the blocking FUSE_SETLKW.
  (One more problem is that the kernel part of FUSE cannot guarantee that user space handles such requests correctly. On the other hand, any other remote file system depends on the reply from the server side, which can also potentially handle non-blocking requests incorrectly. So I think we can hope for the best and ignore this problem.)
- afs:
  afs_do_setlk() ignores the F_SETLK cmd but handles FL_SLEEP, and uses the blocking locks_lock_file_wait().
- cifs:
  cifs_lock() ignores the F_SETLK cmd, handles the FL_SLEEP flag and sets wait_flag, which forces cifs_setlk() to submit blocking requests. (It then calls the potentially blocking locks_lock_file_wait(), but this looks safe; see the comment about dlm_posix_lock() below.)
- v9fs:
  v9fs_file_do_lock() uses the blocking locks_lock_file_wait().
- nfs:
  do_setlk() uses the blocking locks_lock_file_wait();
  _nfs4_proc_setlk() uses the blocking locks_lock_file_wait().
- gfs2 and ocfs2:
  use dlm_posix_lock(), which calls locks_lock_file_wait() after the remote request has finished successfully. However, in this situation this looks safe.
- orangefs:
- ceph:
  looks safe
fuse: [PATCH] fuse: async processing of F_SETLK with FL_SLEEP flag
(In reply to Vasily Averin from comment #1)
> fuse:
> [PATCH] fuse: async processing of F_SETLK with FL_SLEEP flag

https://patchwork.kernel.org/project/linux-fsdevel/patch/f0b7b363-2820-d184-52ea-c67bb3e4054d@virtuozzo.com/
afs: [PATCH] afs: handle async processing of F_SETLK with FL_SLEEP flag https://lkml.org/lkml/2021/12/23/321
cifs: [PATCH] cifs: handle async processing of F_SETLK with FL_SLEEP flag https://patchwork.kernel.org/project/cifs-client/patch/a71b8e8d-5393-25f2-1d63-991e903c262c@virtuozzo.com/
v9fs: [PATCH] v9fs: handle async processing of F_SETLK with FL_SLEEP flag https://lkml.org/lkml/2021/12/23/530
nfs local_locks: [PATCH] nfs: local_lock: handle async processing of F_SETLK with FL_SLEEP https://patchwork.kernel.org/project/linux-nfs/patch/6613b17b-43bd-07d0-2ca7-1581a39cdf7b@virtuozzo.com/
nfs4: [PATCH] nfs4: handle async processing of F_SETLK with FL_SLEEP https://patchwork.kernel.org/project/linux-nfs/patch/3a2c6cb9-abe7-ab32-b11c-d78621361555@virtuozzo.com/
(In reply to Vasily Averin from comment #6)
> nfs local_locks:
> nfs4:

[PATCH v3 1/3] nfs: local_lock: handle async processing of F_SETLK with FL_SLEEP
[PATCH v3 2/3] nfs4: handle async processing of F_SETLK with FL_SLEEP
[PATCH v3 3/3] nfs v2/3: nlmclnt_lock: handle async processing of F_SETLK with FL_SLEEP
https://patchwork.kernel.org/project/linux-nfs/list/?series=601825