Bug 216691 - nfsd general protection fault in nfsd4_encode_operation
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: i386 Linux
Importance: P1 blocking
Assignee: Chuck Lever
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-14 14:06 UTC by Alberto Boldrini
Modified: 2024-01-16 14:28 UTC

See Also:
Kernel Version: 6.0.6
Subsystem:
Regression: No
Bisected commit-id:


Description Alberto Boldrini 2022-11-14 14:06:38 UTC
nfsd sometimes raises an exception while serving a ZFS-mounted dataset.
After that, the NFS client appears to be frozen.

Mount options for nfs4:
rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,local_lock=none

Export options:
rw,secure,no_root_squash,no_subtree_check

Best regards,
Alberto

[Sat Nov 12 03:31:00 2022] general protection fault, probably for non-canonical address 0xccbeabbcd2f04ceb: 0000 [#2] PREEMPT SMP NOPTI
[Sat Nov 12 03:31:00 2022] CPU: 35 PID: 5195 Comm: nfsd Tainted: P      D W  OE      6.0.6-arch1-1 #1 a46cc4b882cfc11c3bbb09d6a0fab3dcad53b5c2
[Sat Nov 12 03:31:00 2022] Hardware name: Supermicro SYS-220U-TNR/X12DPU-6, BIOS 1.4 07/12/2022
[Sat Nov 12 03:31:00 2022] RIP: 0010:nfsd4_encode_read+0xd9/0x390 [nfsd]
[Sat Nov 12 03:31:00 2022] Code: 0a 80 3c 24 00 0f 85 b6 01 00 00 49 83 7e 28 00 0f 85 3e 02 00 00 49 8b 46 08 8b 50 3c 2b 50 40 8b 43 18 48 39 c2 48 0f 47 d0 <49> 8b 47 28 48 83 b8 c8 00 00 00 00 74 0a 80 3c 24 00 0f 85 89 00
[Sat Nov 12 03:31:00 2022] RSP: 0018:ff59f4101f1c7d70 EFLAGS: 00010202
[Sat Nov 12 03:31:00 2022] RAX: 0000000000001000 RBX: ff4decfc5127c4a0 RCX: ff4decfc5127c4a0
[Sat Nov 12 03:31:00 2022] RDX: 0000000000001000 RSI: 0000000000000008 RDI: 0000000000000000
[Sat Nov 12 03:31:00 2022] RBP: ff4decfc1819a000 R08: 0000000000000019 R09: ff59f4101f1c7d08
[Sat Nov 12 03:31:00 2022] R10: ff4decfc8165c230 R11: ff4decfc8165c280 R12: 0000000000000000
[Sat Nov 12 03:31:00 2022] R13: ff4ded3029f99060 R14: ff4decfc8165c230 R15: ccbeabbcd2f04cc3
[Sat Nov 12 03:31:00 2022] FS:  0000000000000000(0000) GS:ff4ded79feec0000(0000) knlGS:0000000000000000
[Sat Nov 12 03:31:00 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Nov 12 03:31:00 2022] CR2: 00007fb118bc1000 CR3: 000000111c210006 CR4: 0000000000771ee0
[Sat Nov 12 03:31:00 2022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Sat Nov 12 03:31:00 2022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Sat Nov 12 03:31:00 2022] PKRU: 55555554
[Sat Nov 12 03:31:00 2022] Call Trace:
[Sat Nov 12 03:31:00 2022]  <TASK>
[Sat Nov 12 03:31:00 2022]  nfsd4_encode_operation+0xaf/0x280 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Sat Nov 12 03:31:00 2022]  nfsd4_proc_compound+0x1d0/0x6f0 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Sat Nov 12 03:31:00 2022]  nfsd_dispatch+0x16b/0x280 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Sat Nov 12 03:31:00 2022]  svc_process_common+0x284/0x5e0 [sunrpc dda24f7ebb18486303e1518da1aed245d694a806]
[Sat Nov 12 03:31:00 2022]  ? nfsd_svc+0x370/0x370 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Sat Nov 12 03:31:00 2022]  ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Sat Nov 12 03:31:00 2022]  svc_process+0xb9/0xf0 [sunrpc dda24f7ebb18486303e1518da1aed245d694a806]
[Sat Nov 12 03:31:00 2022]  nfsd+0xd9/0x190 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Sat Nov 12 03:31:00 2022]  kthread+0xdb/0x110
[Sat Nov 12 03:31:00 2022]  ? kthread_complete_and_exit+0x20/0x20
[Sat Nov 12 03:31:00 2022]  ret_from_fork+0x1f/0x30
[Sat Nov 12 03:31:00 2022]  </TASK>
[Sat Nov 12 03:31:00 2022] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd auth_rpcgss grace sunrpc 8021q garp mrp stp llc vfat fat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc32c_generic xt_tcpudp wireguard iptable_filter curve25519_x86_64 libchacha20poly1305 chacha_x86_64 bonding poly1305_x86_64 libcurve25519_generic tls libchacha ip6_udp_tunnel udp_tunnel cfg80211 rfkill nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common nvidia(POE) i10nm_edac ipmi_ssif nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_hdmi crc32c_intel snd_hda_intel polyval_clmulni snd_intel_dspcfg polyval_generic snd_intel_sdw_acpi gf128mul ghash_clmulni_intel snd_hda_codec aesni_intel crypto_simd cryptd snd_hda_core rapl snd_hwdep intel_cstate ast ixgbe snd_pcm drm_vram_helper rndis_host snd_timer mdio_devres spi_nor
[Sat Nov 12 03:31:00 2022]  cdc_ether drm_ttm_helper libphy mei_me snd isst_if_mbox_pci isst_if_mmio joydev usbnet i2c_i801 ioatdma mdio pcspkr soundcore mousedev isst_if_common mii ttm mtd mei i2c_smbus intel_vsec intel_pch_thermal intel_uncore dca acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid fuse bpf_preload ip_tables x_tables usbhid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) nvme nvme_core spi_intel_pci xhci_pci nvme_common spi_intel xhci_pci_renesas vmd
[Sat Nov 12 03:31:00 2022] ---[ end trace 0000000000000000 ]---
Comment 1 Chuck Lever 2022-11-14 15:04:05 UTC
If your clients explicitly mount with NFSv4.1 rather than NFSv4.2, do you still see this issue?
Comment 2 Chuck Lever 2022-11-14 15:21:29 UTC
(In reply to Chuck Lever from comment #1)
> If your clients explicitly mount with NFSv4.1 rather than NFSv4.2, do you
> still see this issue?

Strike that. I initially thought this might be related to READ_PLUS, but nfsd4_encode_read() is called only for plain READ operations.

If you've built your kernel yourself, can you report the results of:

  linux$ scripts/faddr2line .... nfsd4_encode_read+0xd9  ??
Comment 3 Alberto Boldrini 2022-11-14 17:36:49 UTC
Hi Chuck,
I didn't build the kernel; it comes from the Arch Linux package linux-6.0.6.arch1-1-x86_64.pkg.tar.zst.

It should be built by this script https://github.com/archlinux/svntogit-packages/blob/packages/linux/trunk/PKGBUILD
using this source https://github.com/archlinux/linux/commits/v6.0.6-arch1

If I try to compile it, do you think that faddr2line would give the same result?
Comment 4 Chuck Lever 2022-11-14 17:47:53 UTC
(In reply to Alberto Boldrini from comment #3)
> Hi Chuck,
> I didn't build the kernel; it comes from the Arch Linux package
> linux-6.0.6.arch1-1-x86_64.pkg.tar.zst.
> 
> It should be built by this script
> https://github.com/archlinux/svntogit-packages/blob/packages/linux/trunk/PKGBUILD
> using this source https://github.com/archlinux/linux/commits/v6.0.6-arch1
> 
> If I try to compile it, do you think that faddr2line would give the same result?

It might. You'd need the same .config and the same compiler, so I'm thinking this isn't going to work with precision. Considering how much of that function is likely to be inlined, maybe we should try something else.

My next thought would be to bisect between v5.19 and v6.0.6, but it looks like Arch uses SVN.

Since this is a distributor's kernel and not mainline, can you open a bug with them and have them nail this down to a line of code or a commit in their source code base? Given that this issue comes up with ZFS, an out-of-tree filesystem, there's not much I can do here to reproduce or provide support. Starting with Arch might be the best bet, and then bring the results back here if it's truly an issue in mainline NFSD.

(I can see a handful of NFSD commits that were merged into v6.0 that alter nfsd4_encode_read(), one of which could be the culprit, but it's hard to say which at this point).
Comment 5 Jeff Layton 2022-11-14 18:02:01 UTC
If you have the debuginfo for the kernel (or if the binaries weren't stripped in the first place), you can open the nfsd.ko.debug file in the debuginfo with gdb, and then do:

    gdb> list *(nfsd4_encode_read+0xd9)

...that should give you a file and line number in the Arch sources. Unfortunately, I'm not familiar enough with Arch and its packaging to give you detailed instructions on how to get that file. In Fedora and RHEL there is a kernel-debuginfo package.
Comment 6 Alberto Boldrini 2022-11-14 20:19:26 UTC
Thank you for the suggestions.

I was able to rebuild the Arch package with the same compiler version but with debugging info, following their official wiki.

I cannot be sure that the two builds correspond exactly, but these are the faddr2line results:

nfsd4_encode_read+0xd9 corresponds to https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4xdr.c#L3998


nfsd4_encode_operation+0xaf corresponds to https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4xdr.c#L5316

nfsd4_proc_compound+0x1d0 corresponds to
https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4proc.c#L2733

I think that Arch uses SVN only for its build scripts. The fork of Linux is on git, but I don't know how it differs from mainline.
The links above point to the source used to build the package.

Currently I'm exporting a ZFS filesystem, but I don't know whether that is related.
Comment 7 Chuck Lever 2022-11-14 20:30:55 UTC
The nfsd4_encode_operation and nfsd4_proc_compound line numbers look plausible.

nfsd4_encode_read+0xd9 seems to correspond to this:

>>>>    if (file->f_op->splice_read && splice_ok)
		nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
	else
		nfserr = nfsd4_encode_readv(resp, read, file, maxcount);

which confuses me. The pointer dereference there has not changed recently.

Can you confirm what was the last working kernel on your server?
Comment 8 Jeff Layton 2022-11-14 21:16:27 UTC
FWIW, here's the decoded asm around the crash.

[jlayton@tleilax linux]$ ./scripts/decodecode < /tmp/oops.txt 
[Sat Nov 12 03:31:00 2022] Code: 0a 80 3c 24 00 0f 85 b6 01 00 00 49 83 7e 28 00 0f 85 3e 02 00 00 49 8b 46 08 8b 50 3c 2b 50 40 8b 43 18 48 39 c2 48 0f 47 d0 <49> 8b 47 28 48 83 b8 c8 00 00 00 00 74 0a 80 3c 24 00 0f 85 89 00
All code
========
   0:	0a 80 3c 24 00 0f    	or     0xf00243c(%rax),%al
   6:	85 b6 01 00 00 49    	test   %esi,0x49000001(%rsi)
   c:	83 7e 28 00          	cmpl   $0x0,0x28(%rsi)
  10:	0f 85 3e 02 00 00    	jne    0x254
  16:	49 8b 46 08          	mov    0x8(%r14),%rax
  1a:	8b 50 3c             	mov    0x3c(%rax),%edx
  1d:	2b 50 40             	sub    0x40(%rax),%edx
  20:	8b 43 18             	mov    0x18(%rbx),%eax
  23:	48 39 c2             	cmp    %rax,%rdx
  26:	48 0f 47 d0          	cmova  %rax,%rdx
  2a:*	49 8b 47 28          	mov    0x28(%r15),%rax		<-- trapping instruction
  2e:	48 83 b8 c8 00 00 00 	cmpq   $0x0,0xc8(%rax)
  35:	00 
  36:	74 0a                	je     0x42
  38:	80 3c 24 00          	cmpb   $0x0,(%rsp)
  3c:	0f                   	.byte 0xf
  3d:	85                   	.byte 0x85
  3e:	89 00                	mov    %eax,(%rax)

Code starting with the faulting instruction
===========================================
   0:	49 8b 47 28          	mov    0x28(%r15),%rax
   4:	48 83 b8 c8 00 00 00 	cmpq   $0x0,0xc8(%rax)
   b:	00 
   c:	74 0a                	je     0x18
   e:	80 3c 24 00          	cmpb   $0x0,(%rsp)
  12:	0f                   	.byte 0xf
  13:	85                   	.byte 0x85
  14:	89 00                	mov    %eax,(%rax)

...on my machine splice_read is 0xc8 bytes into file_operations. The next instruction after the crash is:

  2e:	48 83 b8 c8 00 00 00 	cmpq   $0x0,0xc8(%rax)

...so the crashing instruction was trying to load the address of the file_operations struct into %rax, which suggests that the file pointer was bogus here.

Could this be fallout from the filecache handling bugs?
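
As a rough cross-check of the analysis in comment 8, the sketch below redoes the address arithmetic in Python. The register values are copied from the oops in the description. The helper name `is_canonical` and the decision to test both 48-bit and 57-bit virtual address widths are assumptions made for illustration; the thread does not state which width the server uses.

```python
# Values copied from the register dump in the description.
R15 = 0xCCBEABBCD2F04CC3         # register the trapping load dereferences
RBX = 0xFF4DECFC5127C4A0         # another kernel pointer, for comparison
FAULT_ADDR = 0xCCBEABBCD2F04CEB  # address named in the GPF message
DISPLACEMENT = 0x28              # from `mov 0x28(%r15),%rax`


def is_canonical(addr: int, va_bits: int) -> bool:
    """True if bits 63 down to (va_bits - 1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Effective address of the trapping load, truncated to 64 bits.
effective = (R15 + DISPLACEMENT) & 0xFFFFFFFFFFFFFFFF

print(hex(effective))            # 0xccbeabbcd2f04ceb
print(effective == FAULT_ADDR)   # True

# The faulting address is non-canonical under either width (assumed widths).
for bits in (48, 57):
    print(bits, is_canonical(FAULT_ADDR, bits), is_canonical(RBX, bits))
```

Because R15 plus the 0x28 displacement equals the address in the general protection fault message, the trapping load was dereferencing whatever R15 held, which is consistent with a corrupted or stale file pointer rather than a bad offset.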
Comment 9 Alberto Boldrini 2022-11-14 21:41:44 UTC
I disassembled the rebuilt package with gdb and can confirm the line, because it matches the decoded asm.

The result:
3995		maxcount = min_t(unsigned long, read->rd_length,
   0x00000000000286d5 <+197>:	mov    0x8(%r14),%rax
   0x00000000000286d9 <+201>:	mov    0x3c(%rax),%edx
   0x00000000000286dc <+204>:	sub    0x40(%rax),%edx
   0x00000000000286df <+207>:	mov    0x18(%rbx),%eax
   0x00000000000286e2 <+210>:	cmp    %rax,%rdx
   0x00000000000286e5 <+213>:	cmova  %rax,%rdx

3996				 (xdr->buf->buflen - xdr->buf->len));
3997	
3998		if (file->f_op->splice_read && splice_ok)
   0x00000000000286e9 <+217>:	mov    0x28(%r15),%rax
   0x00000000000286ed <+221>:	cmpq   $0x0,0xc8(%rax)
   0x00000000000286f5 <+229>:	je     0x28701 <nfsd4_encode_read+241>
   0x00000000000286f7 <+231>:	cmpb   $0x0,(%rsp)
   0x00000000000286fb <+235>:	jne    0x2878a <nfsd4_encode_read+378>

3999			nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
   0x000000000002878a <+378>:	mov    %rdx,0x20(%rsp)
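
The offsets in the gdb listing above can be cross-checked against the oops mechanically. The following is a minimal sketch, assuming the RIP field and one gdb line are copied verbatim from this report; the function names `parse_rip` and `parse_gdb_line` are hypothetical helpers, not part of any kernel tooling.

```python
import re

# Strings copied from the oops and from the gdb disassembly in this report.
RIP_FIELD = "nfsd4_encode_read+0xd9/0x390"
GDB_LINE = "   0x00000000000286e9 <+217>:\tmov    0x28(%r15),%rax"


def parse_rip(field: str):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", field)
    if m is None:
        raise ValueError(f"unrecognized RIP field: {field!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def parse_gdb_line(line: str):
    """Extract (address, decimal offset) from a gdb disassembly line."""
    m = re.match(r"\s*0x([0-9a-f]+) <\+(\d+)>:", line)
    if m is None:
        raise ValueError(f"unrecognized gdb line: {line!r}")
    return int(m.group(1), 16), int(m.group(2))


symbol, oops_offset, size = parse_rip(RIP_FIELD)
address, gdb_offset = parse_gdb_line(GDB_LINE)

# gdb prints offsets in decimal, the oops prints them in hex.
print(symbol, hex(oops_offset), gdb_offset)   # nfsd4_encode_read 0xd9 217
print(oops_offset == gdb_offset)              # True
print(hex(address - gdb_offset))              # 0x28610
```

The decimal offset <+217> printed by gdb is the same as the hexadecimal +0xd9 in the oops, so the instruction at that offset in the rebuilt module is the one the RIP line points to, provided the rebuilt binary matches the packaged one.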
Comment 10 Chuck Lever 2022-11-14 21:43:31 UTC
(In reply to Jeff Layton from comment #8)
> Could this be fallout from the filecache handling bugs?

If the OP sees crashes after other calls to nfs4_preprocess_stateid_op(), then maybe.

That would be the result of 5e138c4a750d ("NFSD: NFSv4 CLOSE should release an nfsd_file immediately") ?
Comment 11 Alberto Boldrini 2022-11-14 21:53:04 UTC
I noticed another crash in dmesg that might be related.

[Mon Nov 14 15:37:46 2022] ------------[ cut here ]------------
[Mon Nov 14 15:37:46 2022] refcount_t: underflow; use-after-free.
[Mon Nov 14 15:37:46 2022] WARNING: CPU: 86 PID: 6050 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
[Mon Nov 14 15:37:46 2022] Modules linked in: 8021q garp mrp stp llc bonding tls cfg80211 rfkill rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd auth_rpcgss grace sunrpc wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel vfat fat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc32c_generic xt_tcpudp iptable_filter nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common nvidia(POE) i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel snd_hda_codec_hdmi polyval_clmulni polyval_generic snd_hda_intel gf128mul ghash_clmulni_intel snd_intel_dspcfg aesni_intel snd_intel_sdw_acpi crypto_simd cryptd snd_hda_codec rapl snd_hda_core intel_cstate snd_hwdep ixgbe snd_pcm ast rndis_host snd_timer drm_vram_helper mdio_devres libphy
[Mon Nov 14 15:37:46 2022]  drm_ttm_helper cdc_ether snd joydev mei_me isst_if_mmio i2c_i801 isst_if_mbox_pci spi_nor usbnet ioatdma mdio intel_uncore pcspkr soundcore acpi_ipmi ttm isst_if_common mousedev mtd intel_pch_thermal mei i2c_smbus intel_vsec mii dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid fuse bpf_preload ip_tables x_tables usbhid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) nvme nvme_core xhci_pci spi_intel_pci xhci_pci_renesas spi_intel nvme_common vmd
[Mon Nov 14 15:37:46 2022] CPU: 86 PID: 6050 Comm: nfsd Tainted: P           OE      6.0.6-arch1-1 #1 a46cc4b882cfc11c3bbb09d6a0fab3dcad53b5c2
[Mon Nov 14 15:37:46 2022] Hardware name: Supermicro SYS-220U-TNR/X12DPU-6, BIOS 1.4 07/12/2022
[Mon Nov 14 15:37:46 2022] RIP: 0010:refcount_warn_saturate+0xbe/0x110
[Mon Nov 14 15:37:46 2022] Code: 01 01 e8 f5 a7 62 00 0f 0b c3 cc cc cc cc 80 3d 85 64 85 01 00 75 85 48 c7 c7 50 a7 34 b8 c6 05 75 64 85 01 01 e8 d2 a7 62 00 <0f> 0b c3 cc cc cc cc 80 3d 60 64 85 01 00 0f 85 5e ff ff ff 48 c7
[Mon Nov 14 15:37:46 2022] RSP: 0018:ff72c0a85f2d3da8 EFLAGS: 00010286
[Mon Nov 14 15:37:46 2022] RAX: 0000000000000000 RBX: ff339cac91ebd300 RCX: 0000000000000027
[Mon Nov 14 15:37:46 2022] RDX: ff339d297ffa1668 RSI: 0000000000000001 RDI: ff339d297ffa1660
[Mon Nov 14 15:37:46 2022] RBP: ff339d2b95383300 R08: 0000000000000000 R09: ff72c0a85f2d3c30
[Mon Nov 14 15:37:46 2022] R10: 0000000000000003 R11: ff339da97ecfffe8 R12: ff339cac91ebd314
[Mon Nov 14 15:37:46 2022] R13: ff339cad046431a0 R14: ffffffffb9388e40 R15: ff339cac94de46a0
[Mon Nov 14 15:37:46 2022] FS:  0000000000000000(0000) GS:ff339d297ff80000(0000) knlGS:0000000000000000
[Mon Nov 14 15:37:46 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Nov 14 15:37:46 2022] CR2: 00007f0f94ccbf78 CR3: 000000298f210005 CR4: 0000000000771ee0
[Mon Nov 14 15:37:46 2022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Mon Nov 14 15:37:46 2022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Mon Nov 14 15:37:46 2022] PKRU: 55555554
[Mon Nov 14 15:37:46 2022] Call Trace:
[Mon Nov 14 15:37:46 2022]  <TASK>
[Mon Nov 14 15:37:46 2022]  destroy_unhashed_deleg+0xc2/0xd0 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Mon Nov 14 15:37:46 2022]  nfsd4_delegreturn+0x128/0x130 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Mon Nov 14 15:37:46 2022]  nfsd4_proc_compound+0x3a4/0x6f0 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Mon Nov 14 15:37:46 2022]  nfsd_dispatch+0x16b/0x280 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Mon Nov 14 15:37:46 2022]  svc_process_common+0x284/0x5e0 [sunrpc dda24f7ebb18486303e1518da1aed245d694a806]
[Mon Nov 14 15:37:46 2022]  ? nfsd_svc+0x370/0x370 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Mon Nov 14 15:37:46 2022]  ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Mon Nov 14 15:37:46 2022]  svc_process+0xb9/0xf0 [sunrpc dda24f7ebb18486303e1518da1aed245d694a806]
[Mon Nov 14 15:37:46 2022]  nfsd+0xd9/0x190 [nfsd 0d4f7161ec4af5d335a43572ddfe34915b30f27a]
[Mon Nov 14 15:37:46 2022]  kthread+0xdb/0x110
[Mon Nov 14 15:37:46 2022]  ? kthread_complete_and_exit+0x20/0x20
[Mon Nov 14 15:37:46 2022]  ret_from_fork+0x1f/0x30
[Mon Nov 14 15:37:46 2022]  </TASK>
[Mon Nov 14 15:37:46 2022] ---[ end trace 0000000000000000 ]---


Source code info:
destroy_unhashed_deleg+0xc2/0xd0:
put_deleg_file at https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4state.c#L1216
(inlined by) nfs4_unlock_deleg_lease at https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4state.c#L1227
(inlined by) destroy_unhashed_deleg at https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4state.c#L1233

nfsd4_delegreturn+0x128/0x130:
destroy_delegation at https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4state.c#L1333
(inlined by) nfsd4_delegreturn at https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4state.c#L6783

nfsd4_proc_compound+0x3a4/0x6f0:
nfsd4_proc_compound at https://github.com/archlinux/linux/blob/2c2ec5ffca63bbef24e10d4eb4756e1db2fac5f6/fs/nfsd/nfs4proc.c#L2708
Comment 12 Chuck Lever 2022-11-14 21:58:21 UTC
That's a familiar crash, but it's been known to happen before v6.0.

Can you confirm what was the last working kernel on your server?
Comment 13 Alberto Boldrini 2022-11-14 22:11:46 UTC
(In reply to Chuck Lever from comment #12)
> Can you confirm what was the last working kernel on your server?
I installed this system for the first time only recently, so I don't have an older working kernel.

I had a similar setup working for years on another server with kernel 5.12.15 and ext4 instead of ZFS.

In the next few days I will try version 5.15.78 (because it is supported by Arch as "linux-lts") to see if the problem disappears.
Comment 14 Chuck Lever 2022-11-14 22:21:19 UTC
(In reply to Alberto Boldrini from comment #13)
> (In reply to Chuck Lever from comment #12)
> > Can you confirm what was the last working kernel on your server?
> I installed this system for the first time only recently, so I don't have an
> older working kernel.

Though both of the reported traces are likely related to the same issue, I suspect this is not specific to v6.0.y.

> I had a similar setup working for years on another server with kernel
> 5.12.15 and ext4 instead of ZFS.

OK. Just to note that ZFS is considered unsupported by the upstream Linux community, if that makes a difference to you. I will try to answer error reports that target ZFS with NFSD, but as I said, I don't have it running here and don't have access to source code. Btrfs has similar functionality and is supported, fwiw.

That said, I don't see anything that suggests that ZFS is causing a problem here.

> In the next few days I will try version 5.15.78 (because it is supported by
> Arch as "linux-lts") to see if the problem disappears.

I believe we have seen reports of the "destroy_unhashed_deleg" crash on that kernel. But YMMV.
Comment 15 Chuck Lever 2024-01-16 14:28:44 UTC
The v6.2 and later kernels likely have fixes that address this issue. Please re-open if you encounter this problem again.
