Bug 219580 - kernel BUG at fs/nfsd/nfs4recover.c:534 Oops: invalid opcode: 0000
Summary: kernel BUG at fs/nfsd/nfs4recover.c:534 Oops: invalid opcode: 0000
Status: RESOLVED ANSWERED
Alias: None
Product: File System
Classification: Unclassified
Component: NFSD
Hardware: Intel Linux
Importance: P3 high
Assignee: Filesystem/NFSD virtual assignee
Reported: 2024-12-09 11:53 UTC by Jur van der Burg
Modified: 2024-12-09 16:47 UTC

Regression: No

Description Jur van der Burg 2024-12-09 11:53:10 UTC
I've got an instant crash when starting the nfs server on kernel 6.12.2. Previous kernels (at least up to 6.6.56) were fine:

Dec  9 12:51:15 hostx kernel: [   66.833082] RPC: Registered named UNIX socket transport module.
Dec  9 12:51:15 hostx kernel: [   66.833088] RPC: Registered udp transport module.
Dec  9 12:51:15 hostx kernel: [   66.833089] RPC: Registered tcp transport module.
Dec  9 12:51:15 hostx kernel: [   66.833090] RPC: Registered tcp-with-tls transport module.
Dec  9 12:51:15 hostx kernel: [   66.833091] RPC: Registered tcp NFSv4.1 backchannel transport module.
Dec  9 12:51:17 hostx kernel: [   68.010707] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Dec  9 12:51:17 hostx kernel: [   68.010740] NFSD: Using legacy client tracking operations.
Dec  9 12:51:17 hostx kernel: [   68.010744] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Dec  9 12:51:17 hostx kernel: [   68.010790] ------------[ cut here ]------------
Dec  9 12:51:17 hostx kernel: [   68.010792] kernel BUG at fs/nfsd/nfs4recover.c:534!
Dec  9 12:51:17 hostx kernel: [   68.010808] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Dec  9 12:51:17 hostx kernel: [   68.011526] CPU: 3 UID: 0 PID: 5020 Comm: rpc.nfsd Not tainted 6.12.2-0.1-vtserver #1
Dec  9 12:51:17 hostx kernel: [   68.011921] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Dec  9 12:51:17 hostx kernel: [   68.012320] RIP: 0010:nfsd4_legacy_tracking_init+0x20b/0x260 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.012843] Code: 48 8b 1c d8 e8 c6 e1 68 e0 48 8b bb 30 01 00 00 48 85 ff 74 10 e8 c5 81 8f e0 48 c7 83 30 01 00 00 00 00 00 00 44 89 e3 eb 8b <0f> 0b 48 c7 c6 80 97 ae a0 48 c7 c7 b8 de b3 a0 e8 70 62 66 e0 8b
Dec  9 12:51:17 hostx kernel: [   68.013647] RSP: 0018:ffffc900009bbce8 EFLAGS: 00010282
Dec  9 12:51:17 hostx kernel: [   68.014043] RAX: 0000000000000049 RBX: 000000000000000b RCX: 0000000000000000
Dec  9 12:51:17 hostx kernel: [   68.014437] RDX: 0000000000000000 RSI: ffffffff824792fb RDI: 00000000ffffffff
Dec  9 12:51:17 hostx kernel: [   68.014825] RBP: ffff888115dda000 R08: 00000000ffff7fff R09: 0000000000000058
Dec  9 12:51:17 hostx kernel: [   68.015214] R10: 00000000ffff7fff R11: ffffffff82852f00 R12: ffff888115dda000
Dec  9 12:51:17 hostx kernel: [   68.015602] R13: ffff888115dda000 R14: ffff888115dda000 R15: ffff8881172b9e40
Dec  9 12:51:17 hostx kernel: [   68.015989] FS:  00007f99b2cb1740(0000) GS:ffff888237cc0000(0000) knlGS:0000000000000000
Dec  9 12:51:17 hostx kernel: [   68.016383] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  9 12:51:17 hostx kernel: [   68.016791] CR2: 00007f0f31eef190 CR3: 000000013f422003 CR4: 00000000001706f0
Dec  9 12:51:17 hostx kernel: [   68.017237] Call Trace:
Dec  9 12:51:17 hostx kernel: [   68.017640]  <TASK>
Dec  9 12:51:17 hostx kernel: [   68.018024]  ? die+0x43/0xb0
Dec  9 12:51:17 hostx kernel: [   68.018415]  ? do_trap+0x119/0x150
Dec  9 12:51:17 hostx kernel: [   68.018798]  ? nfsd4_legacy_tracking_init+0x20b/0x260 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.019294]  ? do_error_trap+0x87/0xc0
Dec  9 12:51:17 hostx kernel: [   68.019692]  ? nfsd4_legacy_tracking_init+0x20b/0x260 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.020179]  ? exc_invalid_op+0x53/0x70
Dec  9 12:51:17 hostx kernel: [   68.020560]  ? nfsd4_legacy_tracking_init+0x20b/0x260 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.021031]  ? asm_exc_invalid_op+0x16/0x20
Dec  9 12:51:17 hostx kernel: [   68.021410]  ? nfsd4_legacy_tracking_init+0x20b/0x260 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.021894]  ? nfsd4_legacy_tracking_init+0xc4/0x260 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.022367]  nfsd4_client_tracking_init+0x1a5/0x1e0 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.022862]  nfs4_state_start_net+0x2d1/0x440 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.023347]  nfsd_svc+0x1c5/0x330 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.023827]  ? simple_strntoull+0xa8/0xc0
Dec  9 12:51:17 hostx kernel: [   68.024203]  write_threads+0xc8/0x1a0 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.024684]  ? preempt_count_add+0x69/0xa0
Dec  9 12:51:17 hostx kernel: [   68.025065]  ? _copy_from_user+0x25/0x60
Dec  9 12:51:17 hostx kernel: [   68.025449]  ? _raw_spin_unlock+0x15/0x30
Dec  9 12:51:17 hostx kernel: [   68.025850]  ? simple_transaction_get+0xcb/0xe0
Dec  9 12:51:17 hostx kernel: [   68.026233]  ? __pfx_write_threads+0x10/0x10 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.026724]  nfsctl_transaction_write+0x51/0xa0 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.027219]  vfs_write+0x136/0x480
Dec  9 12:51:17 hostx kernel: [   68.027610]  ksys_write+0x71/0x100
Dec  9 12:51:17 hostx kernel: [   68.027990]  do_syscall_64+0x4b/0x110
Dec  9 12:51:17 hostx kernel: [   68.028371]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Dec  9 12:51:17 hostx kernel: [   68.028779] RIP: 0033:0x7f99b2679be4
Dec  9 12:51:17 hostx kernel: [   68.029162] Code: 84 00 00 00 00 00 48 8b 05 09 84 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 0a c8 20 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
Dec  9 12:51:17 hostx kernel: [   68.029994] RSP: 002b:00007ffdd307a0d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Dec  9 12:51:17 hostx kernel: [   68.030427] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f99b2679be4
Dec  9 12:51:17 hostx kernel: [   68.030857] RDX: 0000000000000002 RSI: 0000000000608600 RDI: 0000000000000003
Dec  9 12:51:17 hostx kernel: [   68.031291] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
Dec  9 12:51:17 hostx kernel: [   68.031748] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Dec  9 12:51:17 hostx kernel: [   68.032181] R13: 0000000000000001 R14: 0000000000000000 R15: 00007ffdd307bde6
Dec  9 12:51:17 hostx kernel: [   68.032610]  </TASK>
Dec  9 12:51:17 hostx kernel: [   68.033015] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi target_core_mod usbip_host vhci_hcd usbip_core vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock veth bridge stp llc crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl vmwgfx drm_ttm_helper ttm drm_kms_helper drm pata_acpi cpuspeed vmw_balloon uhci_hcd sr_mod psmouse e1000e serio_raw pcspkr vmxnet3 ehci_pci e1000 vmw_vmci cdrom ehci_hcd vmw_pvscsi i2c_piix4 i2c_smbus ac button usbhid xhci_pci xhci_hcd usbcore usb_common dm_multipath dm_mod dax nvme nvme_core virtio_net net_failover failover virtio_scsi virtio_blk virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring virtio ata_generic pata_atiixp fan thermal
Dec  9 12:51:17 hostx kernel: [   68.035844] ---[ end trace 0000000000000000 ]---
Dec  9 12:51:17 hostx kernel: [   68.036353] RIP: 0010:nfsd4_legacy_tracking_init+0x20b/0x260 [nfsd]
Dec  9 12:51:17 hostx kernel: [   68.036946] Code: 48 8b 1c d8 e8 c6 e1 68 e0 48 8b bb 30 01 00 00 48 85 ff 74 10 e8 c5 81 8f e0 48 c7 83 30 01 00 00 00 00 00 00 44 89 e3 eb 8b <0f> 0b 48 c7 c6 80 97 ae a0 48 c7 c7 b8 de b3 a0 e8 70 62 66 e0 8b
Dec  9 12:51:17 hostx kernel: [   68.037974] RSP: 0018:ffffc900009bbce8 EFLAGS: 00010282
Dec  9 12:51:17 hostx kernel: [   68.038509] RAX: 0000000000000049 RBX: 000000000000000b RCX: 0000000000000000
Dec  9 12:51:17 hostx kernel: [   68.039021] RDX: 0000000000000000 RSI: ffffffff824792fb RDI: 00000000ffffffff
Dec  9 12:51:17 hostx kernel: [   68.039562] RBP: ffff888115dda000 R08: 00000000ffff7fff R09: 0000000000000058
Dec  9 12:51:17 hostx kernel: [   68.040074] R10: 00000000ffff7fff R11: ffffffff82852f00 R12: ffff888115dda000
Dec  9 12:51:17 hostx kernel: [   68.040618] R13: ffff888115dda000 R14: ffff888115dda000 R15: ffff8881172b9e40
Dec  9 12:51:17 hostx kernel: [   68.041135] FS:  00007f99b2cb1740(0000) GS:ffff888237cc0000(0000) knlGS:0000000000000000
Dec  9 12:51:17 hostx kernel: [   68.041692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  9 12:51:17 hostx kernel: [   68.042230] CR2: 00007f0f31eef190 CR3: 000000013f422003 CR4: 00000000001706f0

I found one reference to the same issue but no solution:

https://www.mail-archive.com/debian-kernel@lists.debian.org/msg139065.html

This is 100% reproducible at will.
Comment 1 Chuck Lever 2024-12-09 14:44:48 UTC
The Debian folks have been seeing this issue since at least 6.11.9, so it was introduced well before 6.12.2. Start by bisecting the Linus branch (not stable) to see which commit introduced this issue.
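For readers unfamiliar with bisecting: the search that `git bisect` automates is essentially a binary search over an ordered history with one known-good and one known-bad endpoint. The sketch below only illustrates that idea. The tag list, the `crashes` predicate, and the choice of v6.11 as the first bad revision are hypothetical placeholders, not findings from this bug; in a real bisection each test step means building and booting that kernel and starting the NFS server.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, the first entry is
    known good, the last entry is known bad, and every revision after
    the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return revisions[hi]


# Hypothetical search space: mainline release tags between a good and a bad kernel.
MAINLINE_TAGS = ["v6.6", "v6.7", "v6.8", "v6.9", "v6.10", "v6.11", "v6.12"]


def crashes(tag: str) -> bool:
    # Placeholder for "build, boot, start nfsd, check for the BUG".
    # The v6.11 threshold is an assumption made up for this demonstration.
    return MAINLINE_TAGS.index(tag) >= MAINLINE_TAGS.index("v6.11")


print(first_bad(MAINLINE_TAGS, crashes))
```

With seven candidates the loop needs three test builds instead of up to five; `git bisect` applies the same halving to individual commits rather than release tags.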
Comment 2 Jur van der Burg 2024-12-09 16:47:06 UTC
I tried kernel 6.10.1 and that one is OK. In the meantime I upgraded nfs-utils from 2.5.1 to 2.8.1, which seems to fix the issue. Sorry for the noise, case closed.
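For anyone hitting the same BUG, a quick way to compare an installed nfs-utils version against the two data points in this report might look like the sketch below. Only 2.5.1 (crashes with kernel 6.12.2) and 2.8.1 (works) come from this thread. Treating older versions as affected and newer ones as fine is an assumption, the versions in between are untested here, and the function names are made up for illustration.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '2.8.1' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


KNOWN_BAD = parse_version("2.5.1")   # reported to crash with kernel 6.12.2
KNOWN_GOOD = parse_version("2.8.1")  # reported to work with kernel 6.12.2


def classify_nfs_utils(version: str) -> str:
    """Classify a version relative to the two versions named in this report."""
    v = parse_version(version)
    if v >= KNOWN_GOOD:
        return "ok"        # assumption: releases after 2.8.1 keep working
    if v <= KNOWN_BAD:
        return "affected"  # assumption: releases before 2.5.1 behave like 2.5.1
    return "unknown"       # between 2.5.1 and 2.8.1: not tested in this report


for candidate in ("2.5.1", "2.6.4", "2.8.1"):
    print(candidate, classify_nfs_utils(candidate))
```

Comparing tuples of integers rather than raw strings avoids mis-ordering versions such as 2.10.0 and 2.8.1.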
