Bug 216582 - BUG: kernel NULL pointer dereference - nlmclnt_setlockargs
Summary: BUG: kernel NULL pointer dereference - nlmclnt_setlockargs
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All Linux
Importance: P1 normal
Assignee: Chuck Lever
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-13 22:04 UTC by Daire Byrne
Modified: 2023-03-09 15:06 UTC
CC List: 2 users

See Also:
Kernel Version: 5.17+
Subsystem:
Regression: No
Bisected commit-id:



Description Daire Byrne 2022-10-13 22:04:19 UTC
Hi,

I've started seeing this crash at least once or twice a week with our NFS re-export workloads (re-exporting a Linux NFSv3 server as NFSv3).

We have been stepping through kernel versions on the server recently, so it feels like something new was introduced somewhere around v5.17, but I also can't rule out that our clients are doing something "different" with their workloads that stresses this code in some new way. It still occurs in v6.0 too.


[106412.314663] BUG: kernel NULL pointer dereference, address: 0000000000000020
[106412.321879] #PF: supervisor read access in kernel mode
[106412.327237] #PF: error_code(0x0000) - not-present page
[106412.332599] PGD 0 P4D 0 
[106412.335353] Oops: 0000 [#1] PREEMPT SMP NOPTI
[106412.339935] CPU: 34 PID: 2382 Comm: lockd Tainted: G            E     5.18.10-1.dneg.x86_64 #1
[106412.348773] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
[106412.358223] RIP: 0010:nlmclnt_setlockargs+0x4a/0x100 [lockd]
[106412.364116] Code: 00 00 49 81 c0 88 00 00 00 f0 0f c1 05 bf 06 01 00 83 c0 01 c7 47 30 04 00 00 00 48 8d 4f 44 48 8d 7f 4c 89 47 c4 48 8b 46 78 <48> 8b 40 20 48 8b 90 60 fe ff ff 48 8d b0 60 fe ff ff 48 89 57 f8
[106412.383117] RSP: 0018:ffffb3db50cdfa80 EFLAGS: 00010202
[106412.388569] RAX: 0000000000000000 RBX: ffff8a36749c9400 RCX: ffff8a36749c9444
[106412.395924] RDX: ffff8a37f8696300 RSI: ffffb3db50cdfbd8 RDI: ffff8a36749c944c
[106412.403277] RBP: ffffb3db50cdfa90 R08: ffff8a750b49bc88 R09: ffff8a37f8696300
[106412.410634] R10: 0000000000000230 R11: ffffffffffffffff R12: ffffb3db50cdfbd8
[106412.417984] R13: ffff8a7508beac00 R14: ffffb3db50cdfca0 R15: ffffb3db50cdfbd8
[106412.425338] FS:  0000000000000000(0000) GS:ffff8a73ffa80000(0000) knlGS:0000000000000000
[106412.433649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[106412.439611] CR2: 0000000000000020 CR3: 00000001118e6006 CR4: 00000000003706e0
[106412.446984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[106412.454346] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[106412.461696] Call Trace:
[106412.464361]  <TASK>
[106412.466689]  nlmclnt_proc+0x1c6/0x5b0 [lockd]
[106412.471272]  nfs3_proc_lock+0x33/0xb0 [nfsv3]
[106412.475848]  ? nfs_put_lock_context+0x86/0x90 [nfs]
[106412.481008]  do_unlk+0x8f/0xd0 [nfs]
[106412.484837]  nfs_lock+0xcd/0x180 [nfs]
[106412.488815]  ? nlmsvc_mark_host+0x30/0x30 [lockd]
[106412.493752]  vfs_lock_file+0x1e/0x40
[106412.497547]  nlm_unlock_files.isra.0+0x6d/0xc0 [lockd]
[106412.502905]  nlm_traverse_files+0x163/0x2a0 [lockd]
[106412.508020]  nlmsvc_free_host_resources+0x2b/0x40 [lockd]
[106412.513648]  nlm_host_rebooted+0x2c/0x90 [lockd]
[106412.518483]  nlmsvc_proc_sm_notify+0xc0/0x130 [lockd]
[106412.523759]  ? nlmsvc_decode_reboot+0x7d/0xa0 [lockd]
[106412.529027]  nlmsvc_dispatch+0x8e/0x1a0 [lockd]
[106412.534312]  svc_process_common+0x484/0x620 [sunrpc]
[106412.539521]  ? lockd+0x1d0/0x1d0 [lockd]
[106412.543661]  ? set_grace_period+0xa0/0xa0 [lockd]
[106412.548582]  svc_process+0xbc/0xf0 [sunrpc]
[106412.553008]  lockd+0xd2/0x1d0 [lockd]
[106412.556906]  ? set_grace_period+0xa0/0xa0 [lockd]
[106412.561849]  kthread+0xee/0x120
[106412.565228]  ? kthread_complete_and_exit+0x20/0x20
[106412.570239]  ret_from_fork+0x1f/0x30
[106412.574033]  </TASK>
[106412.576436] Modules linked in: tcp_diag(E) inet_diag(E) nfsv3(E) nfs(E) cachefiles(E) fscache(E) netfs(E) ext4(E) mbcache(E) jbd2(E) intel_uncore_frequency_common(E) isst_if_common(E) sg(E) nfit(E) virtio_rng(E) rapl(E) i2c_piix4(E) input_leds(E) nfsd(E) sch_fq(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) tcp_bbr(E) binfmt_misc(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) crc64(E) crct10dif_pclmul(E) crc32_pclmul(E) virtio_scsi(E) crc32c_intel(E) ghash_clmulni_intel(E) 8021q(E) garp(E) mrp(E) virtio_pci(E) scsi_transport_iscsi(E) virtio_pci_legacy_dev(E) aesni_intel(E) virtio_pci_modern_dev(E) crypto_simd(E) virtio_ring(E) cryptd(E) gve(E) serio_raw(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) fuse(E)
[106412.646242] CR2: 0000000000000020
[106412.649780] ---[ end trace 0000000000000000 ]---
[106412.654617] RIP: 0010:nlmclnt_setlockargs+0x4a/0x100 [lockd]
[106412.660495] Code: 00 00 49 81 c0 88 00 00 00 f0 0f c1 05 bf 06 01 00 83 c0 01 c7 47 30 04 00 00 00 48 8d 4f 44 48 8d 7f 4c 89 47 c4 48 8b 46 78 <48> 8b 40 20 48 8b 90 60 fe ff ff 48 8d b0 60 fe ff ff 48 89 57 f8
[106412.679481] RSP: 0018:ffffb3db50cdfa80 EFLAGS: 00010202
[106412.684922] RAX: 0000000000000000 RBX: ffff8a36749c9400 RCX: ffff8a36749c9444
[106412.692269] RDX: ffff8a37f8696300 RSI: ffffb3db50cdfbd8 RDI: ffff8a36749c944c
[106412.699617] RBP: ffffb3db50cdfa90 R08: ffff8a750b49bc88 R09: ffff8a37f8696300
[106412.706969] R10: 0000000000000230 R11: ffffffffffffffff R12: ffffb3db50cdfbd8
[106412.714329] R13: ffff8a7508beac00 R14: ffffb3db50cdfca0 R15: ffffb3db50cdfbd8
[106412.721676] FS:  0000000000000000(0000) GS:ffff8a73ffa80000(0000) knlGS:0000000000000000
[106412.729981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[106412.736472] CR2: 0000000000000020 CR3: 00000001118e6006 CR4: 00000000003706e0
[106412.743821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[106412.751171] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[106412.758520] Kernel panic - not syncing: Fatal exception
[106412.764850] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[106412.775850] ---[ end Kernel panic - not syncing: Fatal exception ]---


All I know is that I didn't notice this crash from v5.12 to v5.16, but I haven't been able to test this rigorously yet. The crash is rare enough that it makes A/B testing quite tricky.

It's somewhat similar to https://bugzilla.kernel.org/show_bug.cgi?id=213273, but that was for an NFSv4.2 re-export of NFSv3, whereas this is an NFSv3 re-export of NFSv3 (for WAN caching).

We are using nfs-utils-2.5.4.

Daire
Comment 1 Daire Byrne 2022-10-20 14:41:43 UTC
So, I have slowly been working through previous kernels and I have been able to reproduce this crash all the way back to v5.16.

I have been running v5.15.3 for a few days now and haven't reproduced it yet. It's not conclusive yet, but after a few more days I'll go back to retesting v5.16 to make sure I can still reproduce it there.

I couldn't really see anything suspicious in the changes between v5.15 and v5.16 that could be related to this...

Daire
Comment 2 a1bert 2022-11-06 15:26:48 UTC
Is this the same bug? It started to occur after upgrading from 5.19.11 to 6.0.6 and 6.0.7:

[112807.471573] BUG: kernel NULL pointer dereference, address: 00000000000000d0
[112807.471590] #PF: supervisor read access in kernel mode
[112807.471595] #PF: error_code(0x0000) - not-present page
[112807.471599] PGD 0 P4D 0 
[112807.471605] Oops: 0000 [#1] PREEMPT SMP NOPTI
[112807.471611] CPU: 1 PID: 2156 Comm: nfsd Tainted: G     U  W          6.0.7-060007-generic #202211031652
[112807.471618] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4105M, BIOS L1.99 12/31/2019
[112807.471622] RIP: 0010:vfs_setlease+0x2b/0x90
[112807.471633] Code: 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 53 48 89 d3 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 30 49 8b 45 28 48 89 da 4c 89 ef <48> 8b 80 d0 00 00 00 48 85 c0 74 3b ff d0 0f 1f 00 48 83 c4 10 5b
[112807.471641] RSP: 0018:ffffb275c2313c90 EFLAGS: 00010246
[112807.471646] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb275c2313cc0
[112807.471650] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffa0b20539faa8
[112807.471654] RBP: ffffb275c2313cb0 R08: 0000000000000000 R09: 0000000000000000
[112807.471659] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b20aa80ab0
[112807.471663] R13: ffffa0b20539faa8 R14: ffffa0b216def5e0 R15: ffffa0b2140d0c00
[112807.471668] FS:  0000000000000000(0000) GS:ffffa0b56fc80000(0000) knlGS:0000000000000000
[112807.471673] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112807.471677] CR2: 00000000000000d0 CR3: 00000002f2e10000 CR4: 0000000000352ee0
[112807.471682] Call Trace:
[112807.471686]  <TASK>
[112807.471694]  ? nfsd4_cld_remove+0x18a/0x200 [nfsd]
[112807.471757]  destroy_unhashed_deleg+0x6d/0x100 [nfsd]
[112807.471803]  __destroy_client+0xdd/0x250 [nfsd]
[112807.471846]  expire_client+0x63/0x80 [nfsd]
[112807.471887]  nfsd4_create_session+0x8d6/0xaf0 [nfsd]
[112807.471929]  nfsd4_proc_compound+0x3b5/0x770 [nfsd]
[112807.472001]  nfsd_dispatch+0x174/0x2a0 [nfsd]
[112807.472076]  svc_process_common+0x2a8/0x640 [sunrpc]
[112807.472204]  ? svc_handle_xprt+0x18d/0x370 [sunrpc]
[112807.472269]  ? nfsd_svc+0x1b0/0x1b0 [nfsd]
[112807.472311]  ? nfsd_shutdown_threads+0xb0/0xb0 [nfsd]
[112807.472352]  svc_process+0xba/0x110 [sunrpc]
[112807.472413]  nfsd+0xdc/0x1b0 [nfsd]
[112807.472453]  kthread+0xe6/0x110
[112807.472463]  ? kthread_complete_and_exit+0x20/0x20
[112807.472470]  ret_from_fork+0x1f/0x30
[112807.472478]  </TASK>
[112807.472480] Modules linked in: rpcsec_gss_krb5 tls xt_conntrack xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype nft_compat veth vhost_net vhost vhost_iotlb tap wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel lz4 lz4_compress z3fold bridge snd_sof_pci_intel_apl snd_sof_intel_hda_common nls_iso8859_1 soundwire_intel soundwire_generic_allocation soundwire_cadence xfs snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_bus ir_nec_decoder rc_astrometa_t2hybrid snd_soc_avs snd_soc_hda_codec snd_soc_skl snd_soc_hdac_hda rtl2832_sdr snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_soc_acpi_intel_match videobuf2_common snd_soc_acpi videodev snd_soc_core snd_compress r820t snd_hda_codec_hdmi intel_pmc_bxt mn88473 ac97_bus intel_telemetry_pltdrv rtl2832 intel_punit_ipc ir_rc5_decoder snd_hda_codec_realtek intel_telemetry_core
[112807.472596]  snd_pcm_dmaengine snd_hda_codec_generic si2157 x86_pkg_temp_thermal intel_powerclamp ledtrig_audio rc_dvbsky kvm_intel snd_hda_intel si2168 snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec mei_pxp mei_hdcp ppdev snd_hda_core dvb_usb_rtl28xxu ts2020 kvm snd_hwdep intel_rapl_msr dvb_usb_dvbsky dvb_usb_v2 snd_pcm smipcie snd_timer m88ds3103 i2c_mux ch341 snd usbserial dvb_core processor_thermal_device_pci_legacy processor_thermal_device processor_thermal_rfim mc rapl input_leds intel_cstate soundcore processor_thermal_mbox mei_me processor_thermal_rapl serio_raw ee1004 intel_rapl_common mei intel_soc_dts_iosf parport_pc int3400_thermal int3403_thermal int340x_thermal_zone int3406_thermal mac_hid dptf_power parport acpi_thermal_rel nft_nat nft_masq nft_chain_nat nf_nat nf_log_syslog nft_log sch_fq_codel nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc nf_tables nct6775 nct6775_core nfsd dm_multipath hwmon_vid nfnetlink wmi scsi_dh_rdac scsi_dh_emc
[112807.472810]  scsi_dh_alua coretemp auth_rpcgss nfs_acl lockd grace sunrpc ramoops reed_solomon efi_pstore pstore_blk pstore_zone ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 multipath linear raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 i915 i2c_algo_bit drm_buddy ttm drm_display_helper crct10dif_pclmul crc32_pclmul cec rc_core drm_kms_helper syscopyarea sysfillrect polyval_generic sysimgblt fb_sys_fops i2c_i801 ghash_clmulni_intel aesni_intel crypto_simd cryptd i2c_smbus drm r8169 xhci_pci ahci xhci_pci_renesas realtek libahci video pinctrl_geminilake
[112807.472925] CR2: 00000000000000d0
[112807.472930] ---[ end trace 0000000000000000 ]---
[112807.605087] RIP: 0010:vfs_setlease+0x2b/0x90
[112807.605107] Code: 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 53 48 89 d3 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 30 49 8b 45 28 48 89 da 4c 89 ef <48> 8b 80 d0 00 00 00 48 85 c0 74 3b ff d0 0f 1f 00 48 83 c4 10 5b
[112807.605116] RSP: 0018:ffffb275c2313c90 EFLAGS: 00010246
[112807.605121] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb275c2313cc0
[112807.605126] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffa0b20539faa8
[112807.605130] RBP: ffffb275c2313cb0 R08: 0000000000000000 R09: 0000000000000000
[112807.605135] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b20aa80ab0
[112807.605139] R13: ffffa0b20539faa8 R14: ffffa0b216def5e0 R15: ffffa0b2140d0c00
[112807.605144] FS:  0000000000000000(0000) GS:ffffa0b56fc80000(0000) knlGS:0000000000000000
[112807.605149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112807.605153] CR2: 00000000000000d0 CR3: 0000000106344000 CR4: 0000000000352ee0
Comment 3 a1bert 2022-11-08 06:59:03 UTC
Another day, another mount attempt, the same outcome:

[263384.355406] BUG: kernel NULL pointer dereference, address: 0000000000000028
[263384.355422] #PF: supervisor read access in kernel mode
[263384.355427] #PF: error_code(0x0000) - not-present page
[263384.355431] PGD 0 P4D 0 
[263384.355437] Oops: 0000 [#1] PREEMPT SMP NOPTI
[263384.355443] CPU: 3 PID: 2226 Comm: nfsd Tainted: G     U             6.0.7-060007-generic #202211031652
[263384.355449] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4105M, BIOS L1.99 12/31/2019
[263384.355454] RIP: 0010:vfs_setlease+0x21/0x90
[263384.355464] Code: 19 fe ff ff e8 c0 6e a5 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 53 48 89 d3 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 30 <49> 8b 45 28 48 89 da 4c 89 ef 48 8b 80 d0 00 00 00 48 85 c0 74 3b
[263384.355472] RSP: 0018:ffff9668424e7c90 EFLAGS: 00010246
[263384.355477] RAX: ffff8a17ca7da8f0 RBX: 0000000000000000 RCX: ffff9668424e7cc0
[263384.355481] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[263384.355485] RBP: ffff9668424e7cb0 R08: 0000000000000000 R09: 0000000000000000
[263384.355490] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a18f1704ab0
[263384.355494] R13: 0000000000000000 R14: ffff8a17d8bb6160 R15: ffff8a17dd2de000
[263384.355499] FS:  0000000000000000(0000) GS:ffff8a1b2fd80000(0000) knlGS:0000000000000000
[263384.355504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[263384.355507] CR2: 0000000000000028 CR3: 00000002e5210000 CR4: 0000000000352ee0
[263384.355513] Call Trace:
[263384.355517]  <TASK>
[263384.355526]  ? nfsd4_cld_remove+0x18a/0x200 [nfsd]
[263384.355582]  destroy_unhashed_deleg+0x6d/0x100 [nfsd]
[263384.355630]  __destroy_client+0xdd/0x250 [nfsd]
[263384.355672]  expire_client+0x63/0x80 [nfsd]
[263384.355712]  nfsd4_create_session+0x8d6/0xaf0 [nfsd]
[263384.355754]  nfsd4_proc_compound+0x3b5/0x770 [nfsd]
[263384.355798]  nfsd_dispatch+0x174/0x2a0 [nfsd]
[263384.355841]  svc_process_common+0x2a8/0x640 [sunrpc]
[263384.355915]  ? svc_handle_xprt+0x18d/0x370 [sunrpc]
[263384.355980]  ? nfsd_svc+0x1b0/0x1b0 [nfsd]
[263384.356022]  ? nfsd_shutdown_threads+0xb0/0xb0 [nfsd]
[263384.356062]  svc_process+0xba/0x110 [sunrpc]
[263384.356124]  nfsd+0xdc/0x1b0 [nfsd]
[263384.356165]  kthread+0xe6/0x110
[263384.356174]  ? kthread_complete_and_exit+0x20/0x20
[263384.356181]  ret_from_fork+0x1f/0x30
[263384.356189]  </TASK>
[263384.356192] Modules linked in: bluetooth ecdh_generic ecc msr rpcsec_gss_krb5 tls xt_conntrack xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype nft_compat veth vhost_net vhost vhost_iotlb tap wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel lz4 lz4_compress z3fold bridge xfs snd_sof_pci_intel_apl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp ir_nec_decoder nls_iso8859_1 rc_astrometa_t2hybrid snd_sof rtl2832_sdr snd_sof_utils videobuf2_vmalloc soundwire_bus videobuf2_memops videobuf2_v4l2 videobuf2_common snd_soc_avs snd_soc_hda_codec videodev snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc r820t snd_soc_sst_dsp snd_hda_codec_hdmi snd_soc_acpi_intel_match mn88473 ir_rc5_decoder rtl2832 snd_hda_codec_realtek rc_dvbsky intel_pmc_bxt snd_soc_acpi intel_telemetry_pltdrv intel_punit_ipc intel_telemetry_core
[263384.356279]  si2157 snd_soc_core x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp snd_compress si2168 kvm_intel ac97_bus ppdev ts2020 snd_pcm_dmaengine ledtrig_audio kvm snd_hda_intel intel_rapl_msr snd_intel_dspcfg mei_hdcp mei_pxp snd_intel_sdw_acpi dvb_usb_rtl28xxu dvb_usb_dvbsky smipcie snd_hda_codec dvb_usb_v2 snd_hda_core m88ds3103 snd_hwdep ch341 dvb_core i2c_mux snd_pcm usbserial processor_thermal_device_pci_legacy mc processor_thermal_device rapl processor_thermal_rfim intel_cstate parport_pc input_leds processor_thermal_mbox processor_thermal_rapl snd_timer mei_me int3406_thermal intel_rapl_common snd mei dptf_power intel_soc_dts_iosf serio_raw soundcore ee1004 int3403_thermal mac_hid int3400_thermal parport int340x_thermal_zone acpi_thermal_rel nft_nat nft_masq nft_chain_nat nf_nat nf_log_syslog nft_log nft_ct nf_conntrack nf_defrag_ipv6 sch_fq_codel nf_defrag_ipv4 8021q nf_tables garp dm_multipath mrp stp llc nfnetlink nct6775 nfsd scsi_dh_rdac nct6775_core
[263384.356399]  scsi_dh_emc scsi_dh_alua auth_rpcgss hwmon_vid wmi nfs_acl lockd coretemp grace ramoops pstore_blk reed_solomon efi_pstore sunrpc pstore_zone ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 multipath linear raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit drm_buddy ttm drm_display_helper polyval_generic ghash_clmulni_intel cec r8169 aesni_intel rc_core drm_kms_helper syscopyarea sysfillrect crypto_simd sysimgblt fb_sys_fops ahci xhci_pci drm i2c_i801 cryptd xhci_pci_renesas realtek i2c_smbus libahci video pinctrl_geminilake
[263384.356511] CR2: 0000000000000028
[263384.356516] ---[ end trace 0000000000000000 ]---
[263384.548316] RIP: 0010:vfs_setlease+0x21/0x90
[263384.548353] Code: 19 fe ff ff e8 c0 6e a5 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 53 48 89 d3 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 30 <49> 8b 45 28 48 89 da 4c 89 ef 48 8b 80 d0 00 00 00 48 85 c0 74 3b
[263384.548375] RSP: 0018:ffff9668424e7c90 EFLAGS: 00010246
[263384.548390] RAX: ffff8a17ca7da8f0 RBX: 0000000000000000 RCX: ffff9668424e7cc0
[263384.548402] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[263384.548413] RBP: ffff9668424e7cb0 R08: 0000000000000000 R09: 0000000000000000
[263384.548425] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a18f1704ab0
[263384.548436] R13: 0000000000000000 R14: ffff8a17d8bb6160 R15: ffff8a17dd2de000
[263384.548448] FS:  0000000000000000(0000) GS:ffff8a1b2fd80000(0000) knlGS:0000000000000000
[263384.548462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[263384.548472] CR2: 0000000000000028 CR3: 000000012017c000 CR4: 0000000000352ee0
Comment 4 Chuck Lever 2022-11-08 19:07:16 UTC
(In reply to a1bert from comment #2)
> is this the same bug (started to occure after upgrade from 5.19.11 to
> 6.0.6,6.0.7

This is not the same bug at all. Please open a new bug, thanks!
Comment 5 Chuck Lever 2022-11-08 20:10:10 UTC
(In reply to Daire Byrne from comment #1)
> So, I have slowly been working through previous kernels and I have been able
> to reproduce this crash all the way back to v5.16.
> 
> I have been running v5.15.3 for a few days now and haven't reproduced it
> yet. It's not yet certain, but a few more days and I'll go back to retesting
> v5.16 to make sure I can reproduce again.
> 
> I couldn't really see anything suspicious in the changes between v5.15 and
> v5.16 that could be related to this...

We have a putative fix for this one.

https://lore.kernel.org/linux-nfs/07EF2FD9-C49E-4414-B05B-83CE9D3AD6C3@hammerspace.com/T/#m64ac4fa33b122ec9607c7f949a1b8caccb97533f

Daire, please try this.

The fix is currently queued for v6.2. This bug is marked P1. Does that mean you want us to consider the fix for v6.1-rc?
Comment 6 Daire Byrne 2022-11-08 20:42:19 UTC
Thanks both. I saw the post on linux-nfs and applied Trond's patch to v6.1-rc4 today and put it into production.

It'll probably take a week or so to verify that it fixes our issue. There is probably a much quicker reproducer we could devise...

As for the P1, I didn't change that, but it might have been updated because of the possibility that it was a regression?

I'm personally fine to wait until v6.2.

Cheers.
Comment 7 Daire Byrne 2022-11-08 20:43:06 UTC
Comment 8 Daire Byrne 2022-12-07 16:44:22 UTC
Just to follow up: our re-export servers have been rock solid with this patch. We went from a crash every 2 days to none in a month so far.

Cheers,

Daire
