Bug 209739
Summary: | NFSd oops on shutdown, regression in 5.9 vs 5.8 | ||
---|---|---|---|
Product: | File System | Reporter: | Kris Karas (bugs-a21) |
Component: | NFS | Assignee: | bfields |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | trondmy |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.9 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
dmesg output of oops during "modprobe nfs"
Oops after applying test-patch from comment 6 |
Description
Kris Karas
2020-10-19 05:23:07 UTC
That's weird, shrinker registration and unregistration looks pretty simple and I haven't spotted any relevant nfsd changes in the git logs between v5.8 and v5.9.... Is there anything else interesting in the logs before that "Oops: ..." line? Ah, I should have looked for [T1760] in the dmesg, as there are a few additional lines: BUG: unable to handle page fault for address: ffffffffa08648e8 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 240b067 P4D 240b067 PUD 240c063 PMD 7c20ee067 PTE 0 But nothing that interesting (except perhaps line #2) [Bugzilla's "preview" button shows this comment without any <CRLF>, making me wonder if I should reply via email instead of using this interface?] I rebuilt a clean 5.9.1 (distro uses GCC 9.3.0, binutils 2.33.1), booted into console mode, /etc/init.d/nfs stop. Seemed to stop OK. Then /etc/init.d/nfs start. That triggered an immediate oops in "modprobe nfs" which then hung in D/io-wait. I'll add that output as an attachment here, without any trimming, as it is considerably more verbose than the earlier oops-on-shutdown. Created attachment 293069 [details]
dmesg output of oops during "modprobe nfs"
And the winner is... 95ad37f90c338e3fd4abf61cecfe02b6f3e080f0 is the first bad commit commit 95ad37f90c338e3fd4abf61cecfe02b6f3e080f0 Author: Frank van der Linden <fllinden@amazon.com> Date: Tue Jun 23 22:39:04 2020 +0000 NFSv4.2: add client side xattr caching. Implement client side caching for NFSv4.2 extended attributes. The cache is a per-inode hashtable, with name/value entries. There is one special entry for the listxattr cache. NFS inodes have a pointer to a cache structure. The cache structure is allocated on demand, freed when the cache is invalidated. Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE) are collected by a separate shrinker, and freed more aggressively than others. Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> fs/nfs/Makefile | 2 +- fs/nfs/inode.c | 9 +- fs/nfs/nfs42proc.c | 12 + fs/nfs/nfs42xattr.c | 1083 +++++++++++++++++++++++++++++++++++++++++++ fs/nfs/nfs4_fs.h | 22 + fs/nfs/nfs4proc.c | 42 +- fs/nfs/nfs4super.c | 10 + include/linux/nfs_fs.h | 6 + include/uapi/linux/nfs_fs.h | 1 + 9 files changed, 1179 insertions(+), 8 deletions(-) create mode 100644 fs/nfs/nfs42xattr.c Just to verify, I backed out that single commit on a clean linux-5.9.tar.xz (imperfectly due to later commits), and am running it now as I type this. No NFS oops. As I look at the patch itself, it's reasonably substantial. Happy to test a sub-patch if, perhaps, the author would care to suggest a test case. Oh, interesting, that patch registers three new shrinkers, then only unregisters two, which probably leaves the global list of shrinkers corrupted after the nfs module unloads, and then nfsd stumbles across that corruption. So, untested, but maybe all you need is: diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c index 86777996cfec..efe938fac96e 100644 --- a/fs/nfs/nfs42xattr.c +++ b/fs/nfs/nfs42xattr.c @@ -1048,6 +1048,7 @@ int __init nfs4_xattr_cache_init(void) void nfs4_xattr_cache_exit(void) { + unregister_shrinker(&nfs4_xattr_large_entry_shrinker); unregister_shrinker(&nfs4_xattr_entry_shrinker); unregister_shrinker(&nfs4_xattr_cache_shrinker); list_lru_destroy(&nfs4_xattr_entry_lru); Created attachment 293093 [details] Oops after applying test-patch from comment 6 That one-liner patch looked promising, but alas, triggers a different oops, this one when NFS is being restarted. (The bug may be due to corruption introduced in the shutdown, but it only shows itself on the restart.) This oops contains do_mount(), which suggests it's occurring roughly here: mount -t nfsd none /proc/fs/nfsd Whoops, how about this: commit 25c328172348 (HEAD -> for-trond) Author: J. Bruce Fields <bfields@redhat.com> Date: Tue Oct 20 11:45:10 2020 -0400 NFSv4.2: fix failure to unregister shrinker We forgot to unregister the nfs4_xattr_large_entry_shrinker. That leaves the global list of shrinkers corrupted after unload of the nfs module, after which possibly unrelated code that calls register_shrinker() or unregister_shrinker() gets a BUG() with "supervisor write access in kernel mode". Reported-by: Kris Karas <bugs-a17@moonlit-rail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c index 86777996cfec..55b44a42d625 100644 --- a/fs/nfs/nfs42xattr.c +++ b/fs/nfs/nfs42xattr.c @@ -1048,8 +1048,10 @@ int __init nfs4_xattr_cache_init(void) void nfs4_xattr_cache_exit(void) { + unregister_shrinker(&nfs4_xattr_large_entry_shrinker); unregister_shrinker(&nfs4_xattr_entry_shrinker); unregister_shrinker(&nfs4_xattr_cache_shrinker); + list_lru_destroy(&nfs4_xattr_large_entry_lru); list_lru_destroy(&nfs4_xattr_entry_lru); list_lru_destroy(&nfs4_xattr_cache_lru); kmem_cache_destroy(nfs4_xattr_cache_cachep); I have beaten on the patch in comment 8 quite a bit, and cannot make it fail. Has my blessing. Feel free to close with resolved/code-fix if no other feedback needed. Tested-By: Kris Karas <bugs-a17@moonlit-rail.com> As an aside, I am idly curious why others running 5.9 have not encountered this, too. I suppose most people are running systemd-based distros, which might step around the bug somehow without triggering it. Thanks for the confirmation. You only hit the bug if you unload the nfs module. I bet most standard scripts don't bother to--the module probably gets loaded automatically on mount and never unmounted. Has the fix for this been sent upstream? I've had to hand-patch all the 5.9.x shipped out by Greg KH, still no sign of the fix there. Thanks for checking, looks like it may have gotten lost somewhere, I've sent a reminder. |