Bug 219008 - Reproducible NFSv4 server crash during client access, page_fault_oops at nfsd_file_lease_notifier_call
Status: RESOLVED CODE_FIX
Product: File System
Classification: Unclassified
Component: NFS
Hardware: Intel Linux
Importance: P3 normal
Assignee: Chuck Lever
Reported: 2024-07-06 07:43 UTC by Florian Evers
Modified: 2024-07-12 16:53 UTC
Regression: No

Attachments
dmesg crash log (9.86 KB, text/plain)
2024-07-06 07:43 UTC, Florian Evers

Description Florian Evers 2024-07-06 07:43:56 UTC
Created attachment 306536
dmesg crash log

I have a reproducible kernel crash here at my NFSv4 server.

The server is NFSv4 only on a Gentoo Linux system with this kernel:

Linux xxxx 6.9.8-gentoo #2 SMP Sat Jul  6 09:23:06 CEST 2024 x86_64 Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz GenuineIntel GNU/Linux

The client is my local workstation, which is connected to this server over Ethernet. NFSv4 is TCP-only. NFSv3 is not enabled.

Normal NFS file access seems to work fine, such as "ls -al" and manually playing around with single files. However, I have lots of JPEGs on my NFS export (pictures) and to manage them I use the "digikam" photo suite of KDE/Plasma. As soon as I start "digikam" _and_ click on one of those photos I see a kernel crash appear on the NFS file server.

Then I have to reboot the server. This crash is 100% reproducible here; however, having to use "digikam" to trigger this behavior seems a little bit clunky.

I attach the dmesg crash log.
Comment 1 Florian Evers 2024-07-06 10:56:25 UTC
Okay, more info. Forget that Digikam stuff. It is much easier to crash the server:

1. On the client, I went into one of the folders imported as NFSv4.2.
2. cat-ing a specific file worked fine.
3. However, opening the same file with an editor (I used kwrite) causes the crash reliably.

This seems to be a very basic usage pattern. No idea what I set up here to cause such behavior... that nobody else seems to observe.

Any ideas what I could do to debug this further to provide you with more information?

Regards,
Florian
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-09 08:42:47 UTC
Is this a regression, e.g. did this use to work fine in earlier kernels like 6.8.y?
Comment 3 Florian Evers 2024-07-09 16:17:04 UTC
Hi,

yes, it seems to be a regression. I use this NFS mount only once in a while, when I have new photos to add or want to view some photos. The last time I used it I had no such problems, but that was maybe 1-3 months ago. I never got this hard crash that propagates to the NFS client and freezes the application. So I think that this behavior was introduced recently, probably with Linux kernel 6.9.x.

After the crash I started debugging and removed any remains of NFSv3 (stale daemons) but the system was NFSv4-only before due to a firewall: only TCP port 2049 was open. No change. The crash still happens immediately if I open a single file with an editor. 

I could perform a bisect, but that would be tedious on that machine. Maybe there are other things to try first?

Regards,
Florian
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-09 16:30:36 UTC
(In reply to Florian Evers from comment #3)
> I could perform a bisect, but that would be tedious on that machine. Maybe
> there are other things to try first?

Not my area of expertise, but maybe the NFS developers have an idea.

But there are two things that might be good to know:

* if 6.10-rc7 is affected as well (reminder, 6.9 will likely be eol in 3 or 4 weeks anyway).

* which kernel version introduced this (e.g. did 6.8 or 6.7 really work?)
Comment 5 Florian Evers 2024-07-09 17:31:32 UTC
Hi Thorsten,

thanks for the hint, in git-sources-6.10_rc7 this bug is no longer reproducible. Great... I'm going to stay with it.

However, I can't be 100% sure... maybe there is more to try?

Thank you, kind regards!
Florian
Comment 6 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-09 17:37:41 UTC
(In reply to Florian Evers from comment #5)
> bug is no longer reproducible

Great.

> However, I can't be 100% sure... maybe there is more to try?

That's up to the NFS developers again. But a bisection would nevertheless be good, as then this could still be fixed in 6.9 and save others from running into the problem.
Comment 7 Chuck Lever 2024-07-10 17:39:12 UTC
A bisection would be very helpful.

If you want to narrow the range of commits, you might try bisecting over the two NFSD branches that were merged into v6.9-rc and v6.10-rc. They are the "nfsd-fixes" and "nfsd-next" branches in this repo:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
Comment 8 Florian Evers 2024-07-10 17:47:50 UTC
Hi,

okay I'm trying this... might take some time... hold my beer ^^

Florian
Comment 9 Jeff Layton 2024-07-10 19:21:04 UTC
What might also be helpful is to use faddr2line to determine where it crashed. From the kernel source tree:

    $ ./scripts/faddr2line --list ./vmlinux nfsd_file_lease_notifier_call+0x51/0x70

You may have to point it at the kernel module if this is a modular kernel. My first thought is that the lease code passed the notifier a NULL pointer for the lease, but I don't see how that can happen right offhand.
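
The argument passed to faddr2line follows the symbol+offset/size form that the oops prints. A minimal C sketch of splitting such a string into its parts, purely for illustration (parse_sym_offset is a hypothetical helper, not part of the kernel scripts):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: split "symbol+0xOFF/0xSIZE" into its three parts.
 * Returns 1 on success, 0 if the string does not match that form.
 * 'name' must point to a buffer of at least 128 bytes.
 */
int parse_sym_offset(const char *s, char *name,
                     unsigned long *offset, unsigned long *size)
{
    assert(s && name && offset && size);

    /* %127[^+] copies everything before the '+'; %lx accepts a 0x prefix */
    return sscanf(s, "%127[^+]+%lx/%lx", name, offset, size) == 3;
}
```

For the address in this report, the helper would yield the symbol nfsd_file_lease_notifier_call, offset 0x51 and function size 0x70.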
Comment 10 Florian Evers 2024-07-11 19:13:53 UTC
Hi,

bisecting is ongoing, but it seems to depend on some user-land stuff too... still testing.

Meanwhile, I had to recompile with some debug-related config switches that were missing. Now I can provide you at least the output of the faddr2line script:

$ uname -a
Linux xxxx 6.9.8-gentoo #6 SMP Thu Jul 11 20:59:15 CEST 2024 x86_64 Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz GenuineIntel GNU/Linux


$ ./scripts/faddr2line --list ./vmlinux nfsd_file_lease_notifier_call+0x51/0x70                                      
nfsd_file_lease_notifier_call+0x51/0x70:

file_inode at include/linux/fs.h:1078
 1073 
 1074   extern void send_sigio(struct fown_struct *fown, int fd, int band);
 1075 
 1076   static inline struct inode *file_inode(const struct file *f)
 1077   {
>1078<          return f->f_inode;
 1079   }
 1080 
 1081   /*
 1082    * file_dentry() is a relic from the days that overlayfs was using files with a
 1083    * "fake" path, meaning, f_path on overlayfs and f_inode on underlying fs.

(inlined by) nfsd_file_lease_notifier_call at fs/nfsd/filecache.c:671
 666    {
 667            struct file_lock *fl = data;
 668 
 669            /* Only close files for F_SETLEASE leases */
 670            if (fl->c.flc_flags & FL_LEASE)
>671<                   nfsd_file_close_inode(file_inode(fl->c.flc_file));
 672            return 0;
 673    }
 674 
 675    static struct notifier_block nfsd_file_lease_notifier = {
 676            .notifier_call = nfsd_file_lease_notifier_call,

(inlined by) nfsd_file_lease_notifier_call at fs/nfsd/filecache.c:664
 659                    nfsd_file_free(nf);
 660            }
 661    }
 662 
 663    static int
>664<   nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
 665                                void *data)
 666    {
 667            struct file_lock *fl = data;
 668 
 669            /* Only close files for F_SETLEASE leases */

Regards,
Florian
Comment 11 Jeff Layton 2024-07-11 19:56:17 UTC
Interesting. flc_file was NULL then. In this codepath, that would have gotten set in nfs4_alloc_init_lease:

    fl->c.flc_file = dp->dl_stid.sc_file->fi_deleg_file->nf_file;

The nf_file pointer is initialized to NULL on allocation, but should always be set to point to a valid struct file before the nfsd_file can be used.

I've no idea how we ended up here. I'll be interested to see what the bisect turns up!
Comment 12 Jeff Layton 2024-07-11 20:16:47 UTC
Even weirder, we shouldn't even be trying to notify at all in this codepath:

static int
nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
                            void *data)
{ 
        struct file_lock *fl = data;

        /* Only close files for F_SETLEASE leases */
        if (fl->c.flc_flags & FL_LEASE)
                nfsd_file_close_inode(file_inode(fl->c.flc_file));
        return 0;
} 

nfs4_alloc_init_lease sets:

    fl->c.flc_flags = FL_DELEG;

...so why did we end up calling nfsd_file_close_inode at all? FL_LEASE shouldn't be set.

There is a bug in nfsd_file_lease_notifier_call. We should be casting fl to a file_lease instead, but the structs both have the file_lock_core as the first field, so that seems unlikely to be the problem here.
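
A minimal userspace sketch of why the wrong cast looks harmless under the default layout. The sketch_* types are hypothetical, simplified stand-ins and not the kernel definitions; the only thing taken from the comment above is that both structs begin with the shared core:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared core; fields are simplified. */
struct sketch_core {
    unsigned int flags;
    void *file;
};

/* Stand-in for the lock type: shared core placed first. */
struct sketch_lock {
    struct sketch_core c;
    long start, end;
};

/* Stand-in for the lease type: shared core placed first as well. */
struct sketch_lease {
    struct sketch_core c;
    unsigned long break_time;
};

/* Returns 1 when the core sits at the same offset in both structs. */
int core_offsets_match(void)
{
    return offsetof(struct sketch_lock, c) ==
           offsetof(struct sketch_lease, c);
}

/* Reads the flags through the type the object was really allocated as. */
unsigned int lease_flags(const void *data)
{
    const struct sketch_lease *fl = data;

    assert(data != NULL);
    return fl->c.flags;
}
```

As long as the compiler keeps the declared member order, core_offsets_match() holds and both views find the flags in the same place, which is why the cast alone does not obviously explain the crash.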
Comment 13 Florian Evers 2024-07-11 20:18:20 UTC
Ok, new info, but still not 100% sure what's going on here.

In order to get the bisection running I cloned from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git and started with tag v6.8.9 as a "good" start for git bisect. Then I checked out v6.9.8 in order to mark the first "bad" commit... but then this happens:

My self-compiled v6.9.8 from linux.git does not produce the crash!

The crashing kernel was built from "gentoo-sources", which adds its own patches and, more importantly, enables several kernel settings of its own.

The list of additional patches can be found here:

https://dev.gentoo.org/~mpagano/genpatches/patches-6.9-9.html

Only 12 patches, and most look harmless or irrelevant to me, but 4567_distro-Gentoo-Kconfig.patch enables a lot of config switches. Who knows...

The only differences are this patch set and this config file diff:


/boot # diff config-6.9.8-gentoo config-6.9.8                       
3c3
< # Linux/x86 6.9.8-gentoo Kernel Configuration
---
> # Linux/x86 6.9.8 Kernel Configuration
4907,4924d4905
< 
< #
< # Gentoo Linux
< #
< CONFIG_GENTOO_LINUX=y
< CONFIG_GENTOO_LINUX_UDEV=y
< CONFIG_GENTOO_LINUX_PORTAGE=y
< 
< #
< # Support for init systems, system and service managers
< #
< CONFIG_GENTOO_LINUX_INIT_SCRIPT=y
< # CONFIG_GENTOO_LINUX_INIT_SYSTEMD is not set
< # end of Support for init systems, system and service managers
< 
< CONFIG_GENTOO_KERNEL_SELF_PROTECTION=y
< CONFIG_GENTOO_PRINT_FIRMWARE_INFO=y
< # end of Gentoo Linux

This indicates that bisection may not lead anywhere. Maybe you see anything of interest... the patchset is very small.

Maybe CONFIG_GENTOO_KERNEL_SELF_PROTECTION? I'm continuing to test... any ideas?
Comment 14 Jeff Layton 2024-07-11 20:25:03 UTC
Given the analysis in comment #12, this looks like this could be some sort of reproducible memory scribble. Maybe a UAF of a struct file_lease? If you have a (semi-)reliable reproducer, it might be interesting to turn on KASAN and see if that shows anything.
Comment 15 Florian Evers 2024-07-11 20:29:21 UTC
Ok, I'm compiling with KASAN right now... let's see.

CONFIG_GENTOO_KERNEL_SELF_PROTECTION was not the culprit. It does nothing (on its own). All switches I manually enabled are also enabled in the kernel built from git-sources.

What should I do with KASAN? Compile, boot, crash... then what? :-D

Regards,
Florian
Comment 16 Jeff Layton 2024-07-11 20:44:06 UTC
(In reply to Florian Evers from comment #15)

> What should I do with KASAN? Compile, boot, crash... then what? :-D
> 

Mainly just keep an eye on the kernel log. It may log some things before it crashes if it sees a problem (use-after-free, etc.). If anything shows up, it might tell us what's happening.
Comment 17 Florian Evers 2024-07-11 20:54:53 UTC
At least it's reproducible:


[Do, 11. Jul 2024, 22:53:13] ==================================================================
[Do, 11. Jul 2024, 22:53:13] BUG: KASAN: slab-out-of-bounds in nfsd_file_lease_notifier_call+0x14a/0x160
[Do, 11. Jul 2024, 22:53:13] Read of size 8 at addr ffff8881de8446d0 by task nfsd/8075

[Do, 11. Jul 2024, 22:53:13] CPU: 1 PID: 8075 Comm: nfsd Tainted: G                T  6.9.8-gentoo #8
[Do, 11. Jul 2024, 22:53:13] Hardware name: To be filled by O.E.M. To be filled by O.E.M./P8B-C Series, BIOS 6702 07/23/2013
[Do, 11. Jul 2024, 22:53:13] Call Trace:
[Do, 11. Jul 2024, 22:53:13]  <TASK>
[Do, 11. Jul 2024, 22:53:13]  dump_stack_lvl+0x4f/0x70
[Do, 11. Jul 2024, 22:53:13]  print_report+0xc4/0x670
[Do, 11. Jul 2024, 22:53:13]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? nfsd_file_lease_notifier_call+0x14a/0x160
[Do, 11. Jul 2024, 22:53:13]  kasan_report+0xc2/0x100
[Do, 11. Jul 2024, 22:53:13]  ? nfsd_file_lease_notifier_call+0x14a/0x160
[Do, 11. Jul 2024, 22:53:13]  nfsd_file_lease_notifier_call+0x14a/0x160
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd_file_lease_notifier_call+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? kasan_save_track+0x10/0x40
[Do, 11. Jul 2024, 22:53:13]  ? __kasan_slab_alloc+0x95/0xa0
[Do, 11. Jul 2024, 22:53:13]  srcu_notifier_call_chain+0xb6/0x120
[Do, 11. Jul 2024, 22:53:13]  kernel_setlease+0xad/0x100
[Do, 11. Jul 2024, 22:53:13]  nfs4_open_delegation+0xeb2/0x2b60
[Do, 11. Jul 2024, 22:53:13]  ? nfsd4_truncate.isra.0+0x77/0x140
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfs4_open_delegation+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? mutex_unlock+0x7a/0xd0
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_mutex_unlock+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __asan_memcpy+0x38/0x70
[Do, 11. Jul 2024, 22:53:13]  nfsd4_process_open2+0xf6c/0x2280
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd4_process_open2+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? fh_verify+0x4b4/0x1740
[Do, 11. Jul 2024, 22:53:13]  nfsd4_open+0x14d0/0x2f80
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd_setuser_and_check_port+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd4_open+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd4_encode_noop+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? nfsd4_encode_operation+0x233/0xee0
[Do, 11. Jul 2024, 22:53:13]  nfsd4_proc_compound+0xacf/0x2090
[Do, 11. Jul 2024, 22:53:13]  nfsd_dispatch+0x2c5/0x500
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd_dispatch+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __asan_memset+0x1f/0x50
[Do, 11. Jul 2024, 22:53:13]  svc_process+0x126c/0x2260
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_svc_process+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd_dispatch+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  svc_recv+0x1455/0x1c90
[Do, 11. Jul 2024, 22:53:13]  nfsd+0x269/0x3b0
[Do, 11. Jul 2024, 22:53:13]  ? __kthread_parkme+0x90/0x120
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  kthread+0x27d/0x340
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_kthread+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ret_from_fork+0x2b/0x70
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_kthread+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ret_from_fork_asm+0x1a/0x30
[Do, 11. Jul 2024, 22:53:13]  </TASK>

[Do, 11. Jul 2024, 22:53:13] Allocated by task 8075:
[Do, 11. Jul 2024, 22:53:13]  kasan_save_stack+0x34/0x60
[Do, 11. Jul 2024, 22:53:13]  kasan_save_track+0x10/0x40
[Do, 11. Jul 2024, 22:53:13]  __kasan_slab_alloc+0x95/0xa0
[Do, 11. Jul 2024, 22:53:13]  kmem_cache_alloc+0x16e/0x340
[Do, 11. Jul 2024, 22:53:13]  locks_alloc_lease+0x13/0x1c0
[Do, 11. Jul 2024, 22:53:13]  nfs4_open_delegation+0xca2/0x2b60
[Do, 11. Jul 2024, 22:53:13]  nfsd4_process_open2+0xf6c/0x2280
[Do, 11. Jul 2024, 22:53:13]  nfsd4_open+0x14d0/0x2f80
[Do, 11. Jul 2024, 22:53:13]  nfsd4_proc_compound+0xacf/0x2090
[Do, 11. Jul 2024, 22:53:13]  nfsd_dispatch+0x2c5/0x500
[Do, 11. Jul 2024, 22:53:13]  svc_process+0x126c/0x2260
[Do, 11. Jul 2024, 22:53:13]  svc_recv+0x1455/0x1c90
[Do, 11. Jul 2024, 22:53:13]  nfsd+0x269/0x3b0
[Do, 11. Jul 2024, 22:53:13]  kthread+0x27d/0x340
[Do, 11. Jul 2024, 22:53:13]  ret_from_fork+0x2b/0x70
[Do, 11. Jul 2024, 22:53:13]  ret_from_fork_asm+0x1a/0x30

[Do, 11. Jul 2024, 22:53:13] The buggy address belongs to the object at ffff8881de844620
                              which belongs to the cache file_lock_cache of size 160
[Do, 11. Jul 2024, 22:53:13] The buggy address is located 16 bytes to the right of
                              allocated 160-byte region [ffff8881de844620, ffff8881de8446c0)

[Do, 11. Jul 2024, 22:53:13] The buggy address belongs to the physical page:
[Do, 11. Jul 2024, 22:53:13] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1de844
[Do, 11. Jul 2024, 22:53:13] head: order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[Do, 11. Jul 2024, 22:53:13] flags: 0x200000000000840(slab|head|node=0|zone=2)
[Do, 11. Jul 2024, 22:53:13] page_type: 0xffffffff()
[Do, 11. Jul 2024, 22:53:13] raw: 0200000000000840 ffff8881001e0c80 dead000000000122 0000000000000000
[Do, 11. Jul 2024, 22:53:13] raw: 0000000000000000 0000000080240024 00000001ffffffff 0000000000000000
[Do, 11. Jul 2024, 22:53:13] head: 0200000000000840 ffff8881001e0c80 dead000000000122 0000000000000000
[Do, 11. Jul 2024, 22:53:13] head: 0000000000000000 0000000080240024 00000001ffffffff 0000000000000000
[Do, 11. Jul 2024, 22:53:13] head: 0200000000000001 ffffea00077a1101 dead000000000122 00000000ffffffff
[Do, 11. Jul 2024, 22:53:13] head: 0000000200000000 0000000000000000 00000000ffffffff 0000000000000000
[Do, 11. Jul 2024, 22:53:13] page dumped because: kasan: bad access detected

[Do, 11. Jul 2024, 22:53:13] Memory state around the buggy address:
[Do, 11. Jul 2024, 22:53:13]  ffff8881de844580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[Do, 11. Jul 2024, 22:53:13]  ffff8881de844600: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00
[Do, 11. Jul 2024, 22:53:13] >ffff8881de844680: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[Do, 11. Jul 2024, 22:53:13]                                                  ^
[Do, 11. Jul 2024, 22:53:13]  ffff8881de844700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[Do, 11. Jul 2024, 22:53:13]  ffff8881de844780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[Do, 11. Jul 2024, 22:53:13] ==================================================================
[Do, 11. Jul 2024, 22:53:13] Disabling lock debugging due to kernel taint
[Do, 11. Jul 2024, 22:53:13] general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] SMP KASAN PTI
[Do, 11. Jul 2024, 22:53:13] KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
[Do, 11. Jul 2024, 22:53:13] CPU: 1 PID: 8075 Comm: nfsd Tainted: G    B           T  6.9.8-gentoo #8
[Do, 11. Jul 2024, 22:53:13] Hardware name: To be filled by O.E.M. To be filled by O.E.M./P8B-C Series, BIOS 6702 07/23/2013
[Do, 11. Jul 2024, 22:53:13] RIP: 0010:nfsd_file_lease_notifier_call+0xf9/0x160
[Do, 11. Jul 2024, 22:53:13] Code: f9 48 c1 e9 03 80 3c 01 00 75 67 48 8b aa b0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d bd 88 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 51 48 8b bd 88 00 00 00 48 8d 6c 24 28 48 89 ee 48
[Do, 11. Jul 2024, 22:53:13] RSP: 0018:ffff88814329f628 EFLAGS: 00010206
[Do, 11. Jul 2024, 22:53:13] RAX: dffffc0000000000 RBX: 1ffff11028653ec6 RCX: 0000000000000000
[Do, 11. Jul 2024, 22:53:13] RDX: 0000000000000011 RSI: 0000000000000000 RDI: 0000000000000088
[Do, 11. Jul 2024, 22:53:13] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[Do, 11. Jul 2024, 22:53:13] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[Do, 11. Jul 2024, 22:53:13] R13: ffff8881de844620 R14: 00000000ffffffff R15: 0000000000000000
[Do, 11. Jul 2024, 22:53:13] FS:  0000000000000000(0000) GS:ffff88839f880000(0000) knlGS:0000000000000000
[Do, 11. Jul 2024, 22:53:13] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Do, 11. Jul 2024, 22:53:13] CR2: 0000330c2425b000 CR3: 0000000259662002 CR4: 00000000001706f0
[Do, 11. Jul 2024, 22:53:13] Call Trace:
[Do, 11. Jul 2024, 22:53:13]  <TASK>
[Do, 11. Jul 2024, 22:53:13]  ? die_addr+0x37/0x90
[Do, 11. Jul 2024, 22:53:13]  ? exc_general_protection+0x150/0x230
[Do, 11. Jul 2024, 22:53:13]  ? asm_exc_general_protection+0x22/0x30
[Do, 11. Jul 2024, 22:53:13]  ? nfsd_file_lease_notifier_call+0xf9/0x160
[Do, 11. Jul 2024, 22:53:13]  ? nfsd_file_lease_notifier_call+0x14a/0x160
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd_file_lease_notifier_call+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? kasan_save_track+0x10/0x40
[Do, 11. Jul 2024, 22:53:13]  ? __kasan_slab_alloc+0x95/0xa0
[Do, 11. Jul 2024, 22:53:13]  srcu_notifier_call_chain+0xb6/0x120
[Do, 11. Jul 2024, 22:53:13]  kernel_setlease+0xad/0x100
[Do, 11. Jul 2024, 22:53:13]  nfs4_open_delegation+0xeb2/0x2b60
[Do, 11. Jul 2024, 22:53:13]  ? nfsd4_truncate.isra.0+0x77/0x140
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfs4_open_delegation+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? mutex_unlock+0x7a/0xd0
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_mutex_unlock+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __asan_memcpy+0x38/0x70
[Do, 11. Jul 2024, 22:53:13]  nfsd4_process_open2+0xf6c/0x2280
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd4_process_open2+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? fh_verify+0x4b4/0x1740
[Do, 11. Jul 2024, 22:53:13]  nfsd4_open+0x14d0/0x2f80
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd_setuser_and_check_port+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd4_open+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd4_encode_noop+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? nfsd4_encode_operation+0x233/0xee0
[Do, 11. Jul 2024, 22:53:13]  nfsd4_proc_compound+0xacf/0x2090
[Do, 11. Jul 2024, 22:53:13]  nfsd_dispatch+0x2c5/0x500
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd_dispatch+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __asan_memset+0x1f/0x50
[Do, 11. Jul 2024, 22:53:13]  svc_process+0x126c/0x2260
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_svc_process+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd_dispatch+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  svc_recv+0x1455/0x1c90
[Do, 11. Jul 2024, 22:53:13]  nfsd+0x269/0x3b0
[Do, 11. Jul 2024, 22:53:13]  ? __kthread_parkme+0x90/0x120
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_nfsd+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  kthread+0x27d/0x340
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_kthread+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ret_from_fork+0x2b/0x70
[Do, 11. Jul 2024, 22:53:13]  ? __pfx_kthread+0x10/0x10
[Do, 11. Jul 2024, 22:53:13]  ret_from_fork_asm+0x1a/0x30
[Do, 11. Jul 2024, 22:53:13]  </TASK>
[Do, 11. Jul 2024, 22:53:13] Modules linked in:
[Do, 11. Jul 2024, 22:53:13] ---[ end trace 0000000000000000 ]---
[Do, 11. Jul 2024, 22:53:13] RIP: 0010:nfsd_file_lease_notifier_call+0xf9/0x160
[Do, 11. Jul 2024, 22:53:13] Code: f9 48 c1 e9 03 80 3c 01 00 75 67 48 8b aa b0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d bd 88 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 51 48 8b bd 88 00 00 00 48 8d 6c 24 28 48 89 ee 48
[Do, 11. Jul 2024, 22:53:14] RSP: 0018:ffff88814329f628 EFLAGS: 00010206
[Do, 11. Jul 2024, 22:53:14] RAX: dffffc0000000000 RBX: 1ffff11028653ec6 RCX: 0000000000000000
[Do, 11. Jul 2024, 22:53:14] RDX: 0000000000000011 RSI: 0000000000000000 RDI: 0000000000000088
[Do, 11. Jul 2024, 22:53:14] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[Do, 11. Jul 2024, 22:53:14] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[Do, 11. Jul 2024, 22:53:14] R13: ffff8881de844620 R14: 00000000ffffffff R15: 0000000000000000
[Do, 11. Jul 2024, 22:53:14] FS:  0000000000000000(0000) GS:ffff88839f880000(0000) knlGS:0000000000000000
[Do, 11. Jul 2024, 22:53:14] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Do, 11. Jul 2024, 22:53:14] CR2: 0000330c2425b000 CR3: 0000000259662002 CR4: 00000000001706f0
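
The numbers in the KASAN report can be cross-checked by hand. A small C sketch of that arithmetic, using only addresses from the log above. The helper names are made up for illustration, and the shadow formula (address shifted right by 3, plus the value seen in RAX) is an assumption about how KASAN maps memory on x86_64:

```c
#include <assert.h>
#include <stdint.h>

/* Offset of 'addr' relative to the start of the object. */
uint64_t offset_in_object(uint64_t obj_start, uint64_t addr)
{
    assert(addr >= obj_start);
    return addr - obj_start;
}

/*
 * How many bytes past the end of the object 'addr' lies
 * (0 if it is at or before the end).
 */
uint64_t bytes_right_of(uint64_t obj_start, uint64_t obj_size, uint64_t addr)
{
    uint64_t obj_end = obj_start + obj_size;

    return addr > obj_end ? addr - obj_end : 0;
}

/* Assumed shadow mapping: one shadow byte per 8 bytes of memory. */
uint64_t shadow_addr(uint64_t addr, uint64_t shadow_offset)
{
    return (addr >> 3) + shadow_offset;
}
```

With the object at ffff8881de844620 and a size of 160, the read at ffff8881de8446d0 lands at offset 0xb0 (176), which is 16 bytes past the end, as the report says. Under the assumed mapping, the shadow address of 0x88 with offset dffffc0000000000 is dffffc0000000011, matching the address in the general protection fault. That would be consistent with the code reading a field of a larger structure through a pointer to the 160-byte object.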
Comment 18 Jeff Layton 2024-07-12 12:35:02 UTC
I sent this patch this morning:

    https://lore.kernel.org/linux-nfs/20240712-nfsd-next-v1-1-58c5f2557436@kernel.org/T/#u

Do you have struct randomization enabled? If so then that might explain what we're seeing here. Can you try that patch and see if it helps?
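
A minimal userspace sketch of how differing layouts could produce this kind of symptom. It only simulates a layout-randomizing build by declaring the members in a different order by hand; the shuffled_* types and flags_at() are hypothetical and do not reflect the real kernel layouts:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the shared core. */
struct sketch_core {
    unsigned int flags;
    void *file;
};

/* In this imagined build the lock type kept its core at the front. */
struct shuffled_lock {
    struct sketch_core c;
    long start, end;
};

/* ...while the lease type had another member moved to the front. */
struct shuffled_lease {
    unsigned long break_time;
    struct sketch_core c;
};

/*
 * Read 'flags' from the raw bytes of an object, assuming the core
 * starts at 'core_offset'. memcpy avoids reading through a pointer
 * of the wrong struct type.
 */
unsigned int flags_at(const void *obj, size_t core_offset)
{
    unsigned int flags;

    assert(obj != NULL);
    memcpy(&flags,
           (const unsigned char *)obj + core_offset +
               offsetof(struct sketch_core, flags),
           sizeof(flags));
    return flags;
}
```

Reading a shuffled_lease with the offset that belongs to shuffled_lock returns bytes of break_time instead of the real flags, so a flag bit can appear set even though it was never stored. Fetching other members the same way can likewise hit an unrelated field or run past the end of the smaller object.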
Comment 19 Florian Evers 2024-07-12 16:10:45 UTC
Hi Jeff,

yes, I have struct randomization enabled and yes, your patch works: I was unable to trigger that crash again :-D

Indeed, that explains a lot... in particular why nobody else seems to be affected by this behavior. Thank you very much for your findings... and I learned a lot.

Kind regards,
Florian
Comment 20 Jeff Layton 2024-07-12 16:14:44 UTC
Thanks for testing it!

Chuck, mind adding Reported-by and Tested-by credit to that patch for Florian?
Comment 21 Chuck Lever 2024-07-12 16:52:45 UTC
(In reply to Jeff Layton from comment #20)
> Chuck, mind adding Reported-by and Tested-by credit to that patch for
> Florian?

Absolutely, and I will apply the patch for 6.11, since it's simple.
