Hi,

I've started seeing this crash at least once or twice a week with our NFS re-export workloads (re-exporting a Linux NFSv3 server as NFSv3). We have been stepping through kernel versions a bit on the server recently, so it feels like something new was introduced somewhere around v5.17, but I also can't rule out that our clients are doing something "different" with their workloads to stress this code in some new way. It still occurs in v6.0 too.

[106412.314663] BUG: kernel NULL pointer dereference, address: 0000000000000020
[106412.321879] #PF: supervisor read access in kernel mode
[106412.327237] #PF: error_code(0x0000) - not-present page
[106412.332599] PGD 0 P4D 0
[106412.335353] Oops: 0000 [#1] PREEMPT SMP NOPTI
[106412.339935] CPU: 34 PID: 2382 Comm: lockd Tainted: G E 5.18.10-1.dneg.x86_64 #1
[106412.348773] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
[106412.358223] RIP: 0010:nlmclnt_setlockargs+0x4a/0x100 [lockd]
[106412.364116] Code: 00 00 49 81 c0 88 00 00 00 f0 0f c1 05 bf 06 01 00 83 c0 01 c7 47 30 04 00 00 00 48 8d 4f 44 48 8d 7f 4c 89 47 c4 48 8b 46 78 <48> 8b 40 20 48 8b 90 60 fe ff ff 48 8d b0 60 fe ff ff 48 89 57 f8
[106412.383117] RSP: 0018:ffffb3db50cdfa80 EFLAGS: 00010202
[106412.388569] RAX: 0000000000000000 RBX: ffff8a36749c9400 RCX: ffff8a36749c9444
[106412.395924] RDX: ffff8a37f8696300 RSI: ffffb3db50cdfbd8 RDI: ffff8a36749c944c
[106412.403277] RBP: ffffb3db50cdfa90 R08: ffff8a750b49bc88 R09: ffff8a37f8696300
[106412.410634] R10: 0000000000000230 R11: ffffffffffffffff R12: ffffb3db50cdfbd8
[106412.417984] R13: ffff8a7508beac00 R14: ffffb3db50cdfca0 R15: ffffb3db50cdfbd8
[106412.425338] FS: 0000000000000000(0000) GS:ffff8a73ffa80000(0000) knlGS:0000000000000000
[106412.433649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[106412.439611] CR2: 0000000000000020 CR3: 00000001118e6006 CR4: 00000000003706e0
[106412.446984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[106412.454346] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[106412.461696] Call Trace:
[106412.464361] <TASK>
[106412.466689] nlmclnt_proc+0x1c6/0x5b0 [lockd]
[106412.471272] nfs3_proc_lock+0x33/0xb0 [nfsv3]
[106412.475848] ? nfs_put_lock_context+0x86/0x90 [nfs]
[106412.481008] do_unlk+0x8f/0xd0 [nfs]
[106412.484837] nfs_lock+0xcd/0x180 [nfs]
[106412.488815] ? nlmsvc_mark_host+0x30/0x30 [lockd]
[106412.493752] vfs_lock_file+0x1e/0x40
[106412.497547] nlm_unlock_files.isra.0+0x6d/0xc0 [lockd]
[106412.502905] nlm_traverse_files+0x163/0x2a0 [lockd]
[106412.508020] nlmsvc_free_host_resources+0x2b/0x40 [lockd]
[106412.513648] nlm_host_rebooted+0x2c/0x90 [lockd]
[106412.518483] nlmsvc_proc_sm_notify+0xc0/0x130 [lockd]
[106412.523759] ? nlmsvc_decode_reboot+0x7d/0xa0 [lockd]
[106412.529027] nlmsvc_dispatch+0x8e/0x1a0 [lockd]
[106412.534312] svc_process_common+0x484/0x620 [sunrpc]
[106412.539521] ? lockd+0x1d0/0x1d0 [lockd]
[106412.543661] ? set_grace_period+0xa0/0xa0 [lockd]
[106412.548582] svc_process+0xbc/0xf0 [sunrpc]
[106412.553008] lockd+0xd2/0x1d0 [lockd]
[106412.556906] ? set_grace_period+0xa0/0xa0 [lockd]
[106412.561849] kthread+0xee/0x120
[106412.565228] ? kthread_complete_and_exit+0x20/0x20
[106412.570239] ret_from_fork+0x1f/0x30
[106412.574033] </TASK>
[106412.576436] Modules linked in: tcp_diag(E) inet_diag(E) nfsv3(E) nfs(E) cachefiles(E) fscache(E) netfs(E) ext4(E) mbcache(E) jbd2(E) intel_uncore_frequency_common(E) isst_if_common(E) sg(E) nfit(E) virtio_rng(E) rapl(E) i2c_piix4(E) input_leds(E) nfsd(E) sch_fq(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) tcp_bbr(E) binfmt_misc(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) crc64(E) crct10dif_pclmul(E) crc32_pclmul(E) virtio_scsi(E) crc32c_intel(E) ghash_clmulni_intel(E) 8021q(E) garp(E) mrp(E) virtio_pci(E) scsi_transport_iscsi(E) virtio_pci_legacy_dev(E) aesni_intel(E) virtio_pci_modern_dev(E) crypto_simd(E) virtio_ring(E) cryptd(E) gve(E) serio_raw(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) fuse(E)
[106412.646242] CR2: 0000000000000020
[106412.649780] ---[ end trace 0000000000000000 ]---
[106412.654617] RIP: 0010:nlmclnt_setlockargs+0x4a/0x100 [lockd]
[106412.660495] Code: 00 00 49 81 c0 88 00 00 00 f0 0f c1 05 bf 06 01 00 83 c0 01 c7 47 30 04 00 00 00 48 8d 4f 44 48 8d 7f 4c 89 47 c4 48 8b 46 78 <48> 8b 40 20 48 8b 90 60 fe ff ff 48 8d b0 60 fe ff ff 48 89 57 f8
[106412.679481] RSP: 0018:ffffb3db50cdfa80 EFLAGS: 00010202
[106412.684922] RAX: 0000000000000000 RBX: ffff8a36749c9400 RCX: ffff8a36749c9444
[106412.692269] RDX: ffff8a37f8696300 RSI: ffffb3db50cdfbd8 RDI: ffff8a36749c944c
[106412.699617] RBP: ffffb3db50cdfa90 R08: ffff8a750b49bc88 R09: ffff8a37f8696300
[106412.706969] R10: 0000000000000230 R11: ffffffffffffffff R12: ffffb3db50cdfbd8
[106412.714329] R13: ffff8a7508beac00 R14: ffffb3db50cdfca0 R15: ffffb3db50cdfbd8
[106412.721676] FS: 0000000000000000(0000) GS:ffff8a73ffa80000(0000) knlGS:0000000000000000
[106412.729981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[106412.736472] CR2: 0000000000000020 CR3: 00000001118e6006 CR4: 00000000003706e0
[106412.743821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[106412.751171] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[106412.758520] Kernel panic - not syncing: Fatal exception
[106412.764850] Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[106412.775850] ---[ end Kernel panic - not syncing: Fatal exception ]---

All I know is that I didn't notice this crash from v5.12 to v5.16, but I have not been able to test this qualitatively yet. The crash is rare enough that it makes A/B testing quite tricky.

It's somewhat similar to https://bugzilla.kernel.org/show_bug.cgi?id=213273 but that was for an NFSv4.2 re-export of NFSv3 and this is for an NFSv3 re-export of NFSv3 (for WAN caching).

We are using nfs-utils-2.5.4.

Daire
So, I have slowly been working through previous kernels and I have been able to reproduce this crash all the way back to v5.16.

I have been running v5.15.3 for a few days now and haven't reproduced it yet. It's not yet certain, but a few more days and I'll go back to retesting v5.16 to make sure I can reproduce again.

I couldn't really see anything suspicious in the changes between v5.15 and v5.16 that could be related to this...

Daire
Is this the same bug? (It started to occur after upgrading from 5.19.11 to 6.0.6 and 6.0.7.)

[112807.471573] BUG: kernel NULL pointer dereference, address: 00000000000000d0
[112807.471590] #PF: supervisor read access in kernel mode
[112807.471595] #PF: error_code(0x0000) - not-present page
[112807.471599] PGD 0 P4D 0
[112807.471605] Oops: 0000 [#1] PREEMPT SMP NOPTI
[112807.471611] CPU: 1 PID: 2156 Comm: nfsd Tainted: G U W 6.0.7-060007-generic #202211031652
[112807.471618] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4105M, BIOS L1.99 12/31/2019
[112807.471622] RIP: 0010:vfs_setlease+0x2b/0x90
[112807.471633] Code: 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 53 48 89 d3 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 30 49 8b 45 28 48 89 da 4c 89 ef <48> 8b 80 d0 00 00 00 48 85 c0 74 3b ff d0 0f 1f 00 48 83 c4 10 5b
[112807.471641] RSP: 0018:ffffb275c2313c90 EFLAGS: 00010246
[112807.471646] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb275c2313cc0
[112807.471650] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffa0b20539faa8
[112807.471654] RBP: ffffb275c2313cb0 R08: 0000000000000000 R09: 0000000000000000
[112807.471659] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b20aa80ab0
[112807.471663] R13: ffffa0b20539faa8 R14: ffffa0b216def5e0 R15: ffffa0b2140d0c00
[112807.471668] FS: 0000000000000000(0000) GS:ffffa0b56fc80000(0000) knlGS:0000000000000000
[112807.471673] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112807.471677] CR2: 00000000000000d0 CR3: 00000002f2e10000 CR4: 0000000000352ee0
[112807.471682] Call Trace:
[112807.471686] <TASK>
[112807.471694] ? nfsd4_cld_remove+0x18a/0x200 [nfsd]
[112807.471757] destroy_unhashed_deleg+0x6d/0x100 [nfsd]
[112807.471803] __destroy_client+0xdd/0x250 [nfsd]
[112807.471846] expire_client+0x63/0x80 [nfsd]
[112807.471887] nfsd4_create_session+0x8d6/0xaf0 [nfsd]
[112807.471929] nfsd4_proc_compound+0x3b5/0x770 [nfsd]
[112807.472001] nfsd_dispatch+0x174/0x2a0 [nfsd]
[112807.472076] svc_process_common+0x2a8/0x640 [sunrpc]
[112807.472204] ? svc_handle_xprt+0x18d/0x370 [sunrpc]
[112807.472269] ? nfsd_svc+0x1b0/0x1b0 [nfsd]
[112807.472311] ? nfsd_shutdown_threads+0xb0/0xb0 [nfsd]
[112807.472352] svc_process+0xba/0x110 [sunrpc]
[112807.472413] nfsd+0xdc/0x1b0 [nfsd]
[112807.472453] kthread+0xe6/0x110
[112807.472463] ? kthread_complete_and_exit+0x20/0x20
[112807.472470] ret_from_fork+0x1f/0x30
[112807.472478] </TASK>
[112807.472480] Modules linked in: rpcsec_gss_krb5 tls xt_conntrack xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype nft_compat veth vhost_net vhost vhost_iotlb tap wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel lz4 lz4_compress z3fold bridge snd_sof_pci_intel_apl snd_sof_intel_hda_common nls_iso8859_1 soundwire_intel soundwire_generic_allocation soundwire_cadence xfs snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_bus ir_nec_decoder rc_astrometa_t2hybrid snd_soc_avs snd_soc_hda_codec snd_soc_skl snd_soc_hdac_hda rtl2832_sdr snd_hda_ext_core snd_soc_sst_ipc snd_soc_sst_dsp videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_soc_acpi_intel_match videobuf2_common snd_soc_acpi videodev snd_soc_core snd_compress r820t snd_hda_codec_hdmi intel_pmc_bxt mn88473 ac97_bus intel_telemetry_pltdrv rtl2832 intel_punit_ipc ir_rc5_decoder snd_hda_codec_realtek intel_telemetry_core
[112807.472596] snd_pcm_dmaengine snd_hda_codec_generic si2157 x86_pkg_temp_thermal intel_powerclamp ledtrig_audio rc_dvbsky kvm_intel snd_hda_intel si2168 snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec mei_pxp mei_hdcp ppdev snd_hda_core dvb_usb_rtl28xxu ts2020 kvm snd_hwdep intel_rapl_msr dvb_usb_dvbsky dvb_usb_v2 snd_pcm smipcie snd_timer m88ds3103 i2c_mux ch341 snd usbserial dvb_core processor_thermal_device_pci_legacy processor_thermal_device processor_thermal_rfim mc rapl input_leds intel_cstate soundcore processor_thermal_mbox mei_me processor_thermal_rapl serio_raw ee1004 intel_rapl_common mei intel_soc_dts_iosf parport_pc int3400_thermal int3403_thermal int340x_thermal_zone int3406_thermal mac_hid dptf_power parport acpi_thermal_rel nft_nat nft_masq nft_chain_nat nf_nat nf_log_syslog nft_log sch_fq_codel nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc nf_tables nct6775 nct6775_core nfsd dm_multipath hwmon_vid nfnetlink wmi scsi_dh_rdac scsi_dh_emc
[112807.472810] scsi_dh_alua coretemp auth_rpcgss nfs_acl lockd grace sunrpc ramoops reed_solomon efi_pstore pstore_blk pstore_zone ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 multipath linear raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 i915 i2c_algo_bit drm_buddy ttm drm_display_helper crct10dif_pclmul crc32_pclmul cec rc_core drm_kms_helper syscopyarea sysfillrect polyval_generic sysimgblt fb_sys_fops i2c_i801 ghash_clmulni_intel aesni_intel crypto_simd cryptd i2c_smbus drm r8169 xhci_pci ahci xhci_pci_renesas realtek libahci video pinctrl_geminilake
[112807.472925] CR2: 00000000000000d0
[112807.472930] ---[ end trace 0000000000000000 ]---
[112807.605087] RIP: 0010:vfs_setlease+0x2b/0x90
[112807.605107] Code: 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 53 48 89 d3 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 30 49 8b 45 28 48 89 da 4c 89 ef <48> 8b 80 d0 00 00 00 48 85 c0 74 3b ff d0 0f 1f 00 48 83 c4 10 5b
[112807.605116] RSP: 0018:ffffb275c2313c90 EFLAGS: 00010246
[112807.605121] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb275c2313cc0
[112807.605126] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffa0b20539faa8
[112807.605130] RBP: ffffb275c2313cb0 R08: 0000000000000000 R09: 0000000000000000
[112807.605135] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b20aa80ab0
[112807.605139] R13: ffffa0b20539faa8 R14: ffffa0b216def5e0 R15: ffffa0b2140d0c00
[112807.605144] FS: 0000000000000000(0000) GS:ffffa0b56fc80000(0000) knlGS:0000000000000000
[112807.605149] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112807.605153] CR2: 00000000000000d0 CR3: 0000000106344000 CR4: 0000000000352ee0
Another day, another mount attempt, the same outcome:

[263384.355406] BUG: kernel NULL pointer dereference, address: 0000000000000028
[263384.355422] #PF: supervisor read access in kernel mode
[263384.355427] #PF: error_code(0x0000) - not-present page
[263384.355431] PGD 0 P4D 0
[263384.355437] Oops: 0000 [#1] PREEMPT SMP NOPTI
[263384.355443] CPU: 3 PID: 2226 Comm: nfsd Tainted: G U 6.0.7-060007-generic #202211031652
[263384.355449] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4105M, BIOS L1.99 12/31/2019
[263384.355454] RIP: 0010:vfs_setlease+0x21/0x90
[263384.355464] Code: 19 fe ff ff e8 c0 6e a5 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 53 48 89 d3 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 30 <49> 8b 45 28 48 89 da 4c 89 ef 48 8b 80 d0 00 00 00 48 85 c0 74 3b
[263384.355472] RSP: 0018:ffff9668424e7c90 EFLAGS: 00010246
[263384.355477] RAX: ffff8a17ca7da8f0 RBX: 0000000000000000 RCX: ffff9668424e7cc0
[263384.355481] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[263384.355485] RBP: ffff9668424e7cb0 R08: 0000000000000000 R09: 0000000000000000
[263384.355490] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a18f1704ab0
[263384.355494] R13: 0000000000000000 R14: ffff8a17d8bb6160 R15: ffff8a17dd2de000
[263384.355499] FS: 0000000000000000(0000) GS:ffff8a1b2fd80000(0000) knlGS:0000000000000000
[263384.355504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[263384.355507] CR2: 0000000000000028 CR3: 00000002e5210000 CR4: 0000000000352ee0
[263384.355513] Call Trace:
[263384.355517] <TASK>
[263384.355526] ? nfsd4_cld_remove+0x18a/0x200 [nfsd]
[263384.355582] destroy_unhashed_deleg+0x6d/0x100 [nfsd]
[263384.355630] __destroy_client+0xdd/0x250 [nfsd]
[263384.355672] expire_client+0x63/0x80 [nfsd]
[263384.355712] nfsd4_create_session+0x8d6/0xaf0 [nfsd]
[263384.355754] nfsd4_proc_compound+0x3b5/0x770 [nfsd]
[263384.355798] nfsd_dispatch+0x174/0x2a0 [nfsd]
[263384.355841] svc_process_common+0x2a8/0x640 [sunrpc]
[263384.355915] ? svc_handle_xprt+0x18d/0x370 [sunrpc]
[263384.355980] ? nfsd_svc+0x1b0/0x1b0 [nfsd]
[263384.356022] ? nfsd_shutdown_threads+0xb0/0xb0 [nfsd]
[263384.356062] svc_process+0xba/0x110 [sunrpc]
[263384.356124] nfsd+0xdc/0x1b0 [nfsd]
[263384.356165] kthread+0xe6/0x110
[263384.356174] ? kthread_complete_and_exit+0x20/0x20
[263384.356181] ret_from_fork+0x1f/0x30
[263384.356189] </TASK>
[263384.356192] Modules linked in: bluetooth ecdh_generic ecc msr rpcsec_gss_krb5 tls xt_conntrack xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype nft_compat veth vhost_net vhost vhost_iotlb tap wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel lz4 lz4_compress z3fold bridge xfs snd_sof_pci_intel_apl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp ir_nec_decoder nls_iso8859_1 rc_astrometa_t2hybrid snd_sof rtl2832_sdr snd_sof_utils videobuf2_vmalloc soundwire_bus videobuf2_memops videobuf2_v4l2 videobuf2_common snd_soc_avs snd_soc_hda_codec videodev snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc r820t snd_soc_sst_dsp snd_hda_codec_hdmi snd_soc_acpi_intel_match mn88473 ir_rc5_decoder rtl2832 snd_hda_codec_realtek rc_dvbsky intel_pmc_bxt snd_soc_acpi intel_telemetry_pltdrv intel_punit_ipc intel_telemetry_core
[263384.356279] si2157 snd_soc_core x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp snd_compress si2168 kvm_intel ac97_bus ppdev ts2020 snd_pcm_dmaengine ledtrig_audio kvm snd_hda_intel intel_rapl_msr snd_intel_dspcfg mei_hdcp mei_pxp snd_intel_sdw_acpi dvb_usb_rtl28xxu dvb_usb_dvbsky smipcie snd_hda_codec dvb_usb_v2 snd_hda_core m88ds3103 snd_hwdep ch341 dvb_core i2c_mux snd_pcm usbserial processor_thermal_device_pci_legacy mc processor_thermal_device rapl processor_thermal_rfim intel_cstate parport_pc input_leds processor_thermal_mbox processor_thermal_rapl snd_timer mei_me int3406_thermal intel_rapl_common snd mei dptf_power intel_soc_dts_iosf serio_raw soundcore ee1004 int3403_thermal mac_hid int3400_thermal parport int340x_thermal_zone acpi_thermal_rel nft_nat nft_masq nft_chain_nat nf_nat nf_log_syslog nft_log nft_ct nf_conntrack nf_defrag_ipv6 sch_fq_codel nf_defrag_ipv4 8021q nf_tables garp dm_multipath mrp stp llc nfnetlink nct6775 nfsd scsi_dh_rdac nct6775_core
[263384.356399] scsi_dh_emc scsi_dh_alua auth_rpcgss hwmon_vid wmi nfs_acl lockd coretemp grace ramoops pstore_blk reed_solomon efi_pstore sunrpc pstore_zone ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 multipath linear raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit drm_buddy ttm drm_display_helper polyval_generic ghash_clmulni_intel cec r8169 aesni_intel rc_core drm_kms_helper syscopyarea sysfillrect crypto_simd sysimgblt fb_sys_fops ahci xhci_pci drm i2c_i801 cryptd xhci_pci_renesas realtek i2c_smbus libahci video pinctrl_geminilake
[263384.356511] CR2: 0000000000000028
[263384.356516] ---[ end trace 0000000000000000 ]---
[263384.548316] RIP: 0010:vfs_setlease+0x21/0x90
[263384.548353] Code: 19 fe ff ff e8 c0 6e a5 00 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 fd 53 48 89 d3 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 30 <49> 8b 45 28 48 89 da 4c 89 ef 48 8b 80 d0 00 00 00 48 85 c0 74 3b
[263384.548375] RSP: 0018:ffff9668424e7c90 EFLAGS: 00010246
[263384.548390] RAX: ffff8a17ca7da8f0 RBX: 0000000000000000 RCX: ffff9668424e7cc0
[263384.548402] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[263384.548413] RBP: ffff9668424e7cb0 R08: 0000000000000000 R09: 0000000000000000
[263384.548425] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a18f1704ab0
[263384.548436] R13: 0000000000000000 R14: ffff8a17d8bb6160 R15: ffff8a17dd2de000
[263384.548448] FS: 0000000000000000(0000) GS:ffff8a1b2fd80000(0000) knlGS:0000000000000000
[263384.548462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[263384.548472] CR2: 0000000000000028 CR3: 000000012017c000 CR4: 0000000000352ee0
(In reply to a1bert from comment #2)
> is this the same bug (started to occure after upgrade from 5.19.11 to
> 6.0.6,6.0.7

This is not the same bug at all. Please open a new bug, thanks!
(In reply to Daire Byrne from comment #1)
> So, I have slowly been working through previous kernels and I have been able
> to reproduce this crash all the way back to v5.16.
>
> I have been running v5.15.3 for a few days now and haven't reproduced it
> yet. It's not yet certain, but a few more days and I'll go back to retesting
> v5.16 to make sure I can reproduce again.
>
> I couldn't really see anything suspicious in the changes between v5.15 and
> v5.16 that could be related to this...

We have a putative fix for this one.

https://lore.kernel.org/linux-nfs/07EF2FD9-C49E-4414-B05B-83CE9D3AD6C3@hammerspace.com/T/#m64ac4fa33b122ec9607c7f949a1b8caccb97533f

Daire, please try this. The fix is currently queued for v6.2.

This bug is marked P1. Does that mean you want us to consider the fix for v6.1-rc?
Thanks both. I saw the post on linux-nfs and applied Trond's patch to v6.1-rc4 today and put it into production. It'll probably take a week or so to verify that it fixes our issue. There is probably a much quicker reproducer we could devise... As for the P1, I didn't change that but it might have been updated because of the possibility it was a regression? I'm personally fine to wait until v6.2. Cheers.
Just to follow up - our re-export servers have been rock solid with this patch. We went from a crash every 2 days, to none in a month so far. Cheers, Daire