Bug 218288 - [REGRESSION] WARNING: CPU: 0 ... at kernel/workqueue.c:1638 __queue_work+0x329/0x440 followed by a BUG
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P3 normal
Assignee: drivers_network@kernel-bugs.osdl.org
Reported: 2023-12-18 21:50 UTC by michallinuxstuff
Modified: 2024-01-12 16:05 UTC (History)
Regression: No

Description michallinuxstuff 2023-12-18 21:50:37 UTC
OS: Fedora 38, kernel 6.5.12

This issue happens while hot-plugging mlx5 network controllers on the PCI bus via sysfs.
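
For readers who want to reproduce the sequence, here is a minimal sketch of a sysfs remove/rescan cycle. It is not the actual test code. It relies only on the standard `remove` and `rescan` PCI sysfs attributes; the function names, the `sysfs_root` parameter (there so the logic can be exercised against a scratch directory) and the settle delay are all assumptions made for illustration.

```python
import time
from pathlib import Path


def pci_remove(bdf: str, sysfs_root: Path = Path("/sys")) -> None:
    """Detach one PCI function by writing "1" to its `remove` attribute."""
    (sysfs_root / "bus" / "pci" / "devices" / bdf / "remove").write_text("1")


def pci_rescan(sysfs_root: Path = Path("/sys")) -> None:
    """Ask the kernel to re-enumerate all PCI buses."""
    (sysfs_root / "bus" / "pci" / "rescan").write_text("1")


def hotplug_cycle(bdf: str,
                  sysfs_root: Path = Path("/sys"),
                  settle_seconds: float = 5.0) -> None:
    """Remove a device, wait, then rescan the bus.

    The sleep is only a stand-in for whatever checks the real test
    performs between the removal and the rescan (hypothetical value).
    """
    pci_remove(bdf, sysfs_root)
    time.sleep(settle_seconds)
    pci_rescan(sysfs_root)
```

On a real system this needs root and would be called as `hotplug_cycle("0000:da:00.1")`. The address is taken from the log below; whether the test targets exactly this function is an assumption.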

This is part of a test infrastructure in which a local (single-host) InfiniBand/RDMA connection is established in initiator-target fashion. After the initiator connects to the target, the test removes the target's net device from the bus, performs some checks, and then attempts a full PCI rescan. This is, more or less, when the BUG is triggered:


2023-12-18T22:12:14+01:00	Dec 18 21:12:14 10.211.11.73 [ 5036.770693] mlx5_core 0000:da:00.0: Port module event: module 0, Cable unplugged
2023-12-18T22:12:14+01:00	Dec 18 21:12:14 10.211.11.73 [ 5037.141364] mlx5_core 0000:da:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
2023-12-18T22:12:14+01:00	Dec 18 21:12:14 10.211.11.73 [ 5037.160337] mlx5_core 0000:da:00.0 mlx_0_0: renamed from eth2
2023-12-18T22:12:15+01:00	Dec 18 21:12:15 10.211.11.73 [ 5037.869411] mlx5_core 0000:da:00.0 mlx_0_0: Link down
2023-12-18T22:12:16+01:00	Dec 18 21:12:16 10.211.11.73 [ 5038.642122] mlx5_core 0000:da:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
2023-12-18T22:12:16+01:00	Dec 18 21:12:16 10.211.11.73 [ 5038.666406] mlx5_core 0000:da:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
2023-12-18T22:12:20+01:00	Dec 18 21:12:20 10.211.11.73 [ 5043.064983] mlx5_core 0000:da:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
2023-12-18T22:12:21+01:00	Dec 18 21:12:21 10.211.11.73 [ 5044.035537] mlx5_core 0000:da:00.1: E-Switch: cleanup
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.208080] pci 0000:da:00.1: [15b3:1015] type 00 class 0x020000
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.214982] pci 0000:da:00.1: reg 0x10: [mem 0x39fffa000000-0x39fffbffffff 64bit pref]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.223670] pci 0000:da:00.1: reg 0x30: [mem 0xfff00000-0xffffffff pref]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.231477] pci 0000:da:00.1: PME# supported from D3cold
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.237847] pci 0000:da:00.1: reg 0x1a4: [mem 0x39fffe000000-0x39fffe0fffff 64bit pref]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.246633] pci 0000:da:00.1: VF(n) BAR0 space: [mem 0x39fffe000000-0x39fffe7fffff 64bit pref] (contains BAR0 for 8 VFs)
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.260014] pci 0000:da:00.1: Adding to iommu group 20
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.266214] pcieport 0000:d7:00.0: bridge window [io  0x1000-0x0fff] to [bus d8] add_size 1000
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.275682] pcieport 0000:d7:00.0: BAR 13: no space for [io  size 0x1000]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.283295] pcieport 0000:d7:00.0: BAR 13: failed to assign [io  size 0x1000]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.291236] pcieport 0000:d7:00.0: BAR 13: no space for [io  size 0x1000]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.298840] pcieport 0000:d7:00.0: BAR 13: failed to assign [io  size 0x1000]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.306805] pci 0000:da:00.1: BAR 0: assigned [mem 0x39fffa000000-0x39fffbffffff 64bit pref]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.316155] pci 0000:da:00.1: BAR 6: assigned [mem 0xeee00000-0xeeefffff pref]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.324210] pci 0000:da:00.1: BAR 7: assigned [mem 0x39fffe000000-0x39fffe7fffff 64bit pref]
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.334837] mlx5_core 0000:da:00.1: firmware version: 14.32.1010
2023-12-18T22:12:22+01:00	Dec 18 21:12:22 10.211.11.73 [ 5045.341758] mlx5_core 0000:da:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
2023-12-18T22:12:23+01:00	Dec 18 21:12:23 10.211.11.73 [ 5045.638668] mlx5_core 0000:da:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
2023-12-18T22:12:23+01:00	Dec 18 21:12:23 10.211.11.73 [ 5045.662778] mlx5_core 0000:da:00.1: Port module event: module 1, Cable unplugged
2023-12-18T22:12:23+01:00	Dec 18 21:12:23 10.211.11.73 [ 5046.049705] mlx5_core 0000:da:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
2023-12-18T22:12:23+01:00	Dec 18 21:12:23 10.211.11.73 [ 5046.068488] mlx5_core 0000:da:00.1 mlx_0_1: renamed from eth2
2023-12-18T22:12:24+01:00	Dec 18 21:12:24 10.211.11.73 [ 5046.772351] mlx5_core 0000:da:00.1 mlx_0_1: Link down
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.838297] ------------[ cut here ]------------
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.843817] WARNING: CPU: 73 PID: 0 at kernel/workqueue.c:1638 __queue_work+0x329/0x440
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.852676] Modules linked in: nvme_rdma nvme_fabrics vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd rdma_ucm rdma_cm iw_cm ib_umad ib_cm rfkill usdm_drv(OE) intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp sunrpc coretemp kvm_intel binfmt_misc kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic qat_c62x(OE) ghash_clmulni_intel sha512_ssse3 intel_qat(OE) mlx5_ib rapl ib_uverbs intel_cstate iTCO_wdt intel_pmc_bxt ast acpi_ipmi mei_me iTCO_vendor_support ipmi_si pcspkr joydev i2c_algo_bit ib_core i2c_i801 mei intel_uncore uio ioatdma ipmi_devintf i2c_smbus lpc_ich wmi intel_pch_thermal dca ipmi_msghandler acpi_pad acpi_power_meter ip6_tables ip_tables fuse zram bpf_preload loop overlay squashfs netconsole nd_pmem nd_btt nd_e820 libnvdimm virtio_blk virtio_net net_failover failover uas usb_storage mlx5_core mlxfw psample tls pci_hy
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 perv_intf ice(OE) gnss nvme
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.852838]  nvme_core nvme_common i40e
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.948383] CPU: 73 PID: 0 Comm: swapper/73 Tainted: G           OE      6.5.12-200.fc38.x86_64 #1
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.957590] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.968275] RIP: 0010:__queue_work+0x329/0x440
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.972976] Code: e9 a0 fd ff ff 48 89 c5 e9 68 fe ff ff 65 8b 05 45 d1 ef 4d a9 00 01 ff 00 75 0f 65 48 8b 3c 25 c0 35 03 00 f6 47 2c 20 75 6d <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 0f
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.992241] RSP: 0018:ffffa9d00db88e78 EFLAGS: 00010006
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5060.997724] RAX: 0000000080000101 RBX: ffff94bc097cec48 RCX: ffff94ead68621c0
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.005118] RDX: ffff94bc097cec48 RSI: ffff94d3afbef000 RDI: 0000000000002000
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.012510] RBP: ffff94bc097cec68 R08: 0000000000020092 R09: ffff94ead68621e8
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.019903] R10: 0000000000000002 R11: ffffa9d00db88ff8 R12: ffff94d3afbef000
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.027296] R13: 0000000000002000 R14: ffffa9d00db88f00 R15: ffff94ead68621c0
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.034685] FS:  0000000000000000(0000) GS:ffff94ead6840000(0000) knlGS:0000000000000000
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.043035] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.049046] CR2: 000055cc7f306048 CR3: 00000002ce222003 CR4: 00000000007706e0
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.056442] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.063841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.071257] PKRU: 55555554
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.074230] Call Trace:
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.076939]  <IRQ>
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.079211]  ? __queue_work+0x329/0x440
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.083300]  ? __warn+0x81/0x130
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.086780]  ? __queue_work+0x329/0x440
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.090884]  ? report_bug+0x171/0x1a0
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.094804]  ? handle_bug+0x3c/0x80
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.098546]  ? exc_invalid_op+0x17/0x70
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.102634]  ? asm_exc_invalid_op+0x1a/0x20
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.107071]  ? __queue_work+0x329/0x440
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.111148]  ? __pfx_delayed_work_timer_fn+0x10/0x10
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.116358]  call_timer_fn+0x24/0x130
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.120261]  ? __pfx_delayed_work_timer_fn+0x10/0x10
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.125462]  __run_timers+0x1b7/0x2c0
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.129369]  run_timer_softirq+0x1d/0x40
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.133526]  __do_softirq+0xd1/0x2c8
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.137336]  __irq_exit_rcu+0xa6/0xc0
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.141236]  sysvec_apic_timer_interrupt+0x72/0x90
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.146263]  </IRQ>
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.148602]  <TASK>
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.150938]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.156305] RIP: 0010:cpuidle_enter_state+0xcc/0x440
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.161488] Code: aa 2c 1c ff e8 c5 f3 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 93 32 1b ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.180672] RSP: 0018:ffffa9d0007f7e90 EFLAGS: 00000246
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.186116] RAX: ffff94ead6873e00 RBX: ffffc9cfff841278 RCX: 000000000000001f
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.193465] RDX: 0000000000000049 RSI: 000000003d187d7d RDI: 0000000000000000
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.200809] RBP: 0000000000000003 R08: 0000000000000002 R09: 0000000000000018
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.208143] R10: ffff94ead68727c4 R11: 00000000000086d5 R12: ffffffffb443b0a0
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.215468] R13: 0000049a5177d49d R14: 0000000000000003 R15: 0000000000000000
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.222805]  cpuidle_enter+0x2d/0x40
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.226586]  do_idle+0x20d/0x270
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.230013]  cpu_startup_entry+0x2a/0x30
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.234126]  start_secondary+0x11e/0x140
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.238233]  secondary_startup_64_no_verify+0x17e/0x18b
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.243637]  </TASK>
2023-12-18T22:12:38+01:00	Dec 18 21:12:38 10.211.11.73 [ 5061.245997] ---[ end trace 0000000000000000 ]---
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.029690] BUG: kernel NULL pointer dereference, address: 0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.037365] #PF: supervisor write access in kernel mode
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.043239] #PF: error_code(0x0002) - not-present page
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.048777] PGD 0 P4D 0 
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.051707] Oops: 0002 [#1] PREEMPT SMP NOPTI
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.056331] CPU: 25 PID: 0 Comm: swapper/25 Tainted: G        W  OE      6.5.12-200.fc38.x86_64 #1
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.065466] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.076066] RIP: 0010:_raw_spin_lock+0x17/0x30
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.080691] Code: 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 c8 73 05 4d 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e8 97 01 00 00 90 c3 cc cc
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.099844] RSP: 0018:ffffa9d00d1c0e70 EFLAGS: 00010046
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.105268] RAX: 0000000000000000 RBX: ffff94d36cae4c48 RCX: ffff94ead62621c0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.112592] RDX: 0000000000000001 RSI: 000000007fffffff RDI: 0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.119927] RBP: ffff94ead6240000 R08: 0000000000020092 R09: ffff94ead62621e8
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.127259] R10: 0000000000000002 R11: ffffa9d00d1c0ff8 R12: ffff94d3b291c800
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.134592] R13: 0000000000002000 R14: 0000000000000019 R15: ffff94ead62621c0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.141927] FS:  0000000000000000(0000) GS:ffff94ead6240000(0000) knlGS:0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.150222] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.156174] CR2: 0000000000000000 CR3: 00000002ce222005 CR4: 00000000007706e0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.163520] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.170857] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.178197] PKRU: 55555554
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.181118] Call Trace:
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.183776]  <IRQ>
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.186002]  ? __die+0x23/0x70
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.189269]  ? page_fault_oops+0x171/0x4e0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.193577]  ? exc_page_fault+0x7f/0x180
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.197712]  ? asm_exc_page_fault+0x26/0x30
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.202107]  ? _raw_spin_lock+0x17/0x30
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.206151]  __queue_work+0x174/0x440
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.210024]  ? __pfx_delayed_work_timer_fn+0x10/0x10
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.215199]  call_timer_fn+0x24/0x130
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.219072]  ? __pfx_delayed_work_timer_fn+0x10/0x10
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.224239]  __run_timers+0x1b7/0x2c0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.228099]  run_timer_softirq+0x1d/0x40
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.232220]  __do_softirq+0xd1/0x2c8
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.235984]  __irq_exit_rcu+0xa6/0xc0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.239842]  sysvec_apic_timer_interrupt+0x72/0x90
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.244823]  </IRQ>
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.247107]  <TASK>
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.249393]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.254716] RIP: 0010:cpuidle_enter_state+0xcc/0x440
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.259861] Code: aa 2c 1c ff e8 c5 f3 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 93 32 1b ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.278995] RSP: 0018:ffffa9d000387e90 EFLAGS: 00000246
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.284421] RAX: ffff94ead6273e00 RBX: ffffc9cfff241278 RCX: 000000000000001f
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.291755] RDX: 0000000000000019 RSI: 000000003d187d7d RDI: 0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.299087] RBP: 0000000000000003 R08: 0000000000000002 R09: 0000000000000018
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.306416] R10: ffff94ead62727c4 R11: 00000000000178f5 R12: ffffffffb443b0a0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.313748] R13: 0000049c39b69491 R14: 0000000000000003 R15: 0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.321081]  cpuidle_enter+0x2d/0x40
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.324864]  do_idle+0x20d/0x270
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.328296]  cpu_startup_entry+0x2a/0x30
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.332416]  start_secondary+0x11e/0x140
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.336537]  secondary_startup_64_no_verify+0x17e/0x18b
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.341954]  </TASK>
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.344326] Modules linked in: nvme_rdma nvme_fabrics vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd rdma_ucm rdma_cm iw_cm ib_umad ib_cm rfkill usdm_drv(OE) intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp sunrpc coretemp kvm_intel binfmt_misc kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic qat_c62x(OE) ghash_clmulni_intel sha512_ssse3 intel_qat(OE) mlx5_ib rapl ib_uverbs intel_cstate iTCO_wdt intel_pmc_bxt ast acpi_ipmi mei_me iTCO_vendor_support ipmi_si pcspkr joydev i2c_algo_bit ib_core i2c_i801 mei intel_uncore uio ioatdma ipmi_devintf i2c_smbus lpc_ich wmi intel_pch_thermal dca ipmi_msghandler acpi_pad acpi_power_meter ip6_tables ip_tables fuse zram bpf_preload loop overlay squashfs netconsole nd_pmem nd_btt nd_e820 libnvdimm virtio_blk virtio_net net_failover failover uas usb_storage mlx5_core mlxfw psample tls pci_hy
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 perv_intf ice(OE) gnss nvme
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.344385]  nvme_core nvme_common i40e
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.439428] CR2: 0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.442953] ---[ end trace 0000000000000000 ]---
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.452129] pstore: backend (erst) writing error (-28)
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.457496] RIP: 0010:_raw_spin_lock+0x17/0x30
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.462149] Code: 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 ff 05 c8 73 05 4d 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 c3 cc cc cc cc 89 c6 e8 97 01 00 00 90 c3 cc cc
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.481358] RSP: 0018:ffffa9d00d1c0e70 EFLAGS: 00010046
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.486816] RAX: 0000000000000000 RBX: ffff94d36cae4c48 RCX: ffff94ead62621c0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.494178] RDX: 0000000000000001 RSI: 000000007fffffff RDI: 0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.501543] RBP: ffff94ead6240000 R08: 0000000000020092 R09: ffff94ead62621e8
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.508908] R10: 0000000000000002 R11: ffffa9d00d1c0ff8 R12: ffff94d3b291c800
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.516277] R13: 0000000000002000 R14: 0000000000000019 R15: ffff94ead62621c0
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.523646] FS:  0000000000000000(0000) GS:ffff94ead6240000(0000) knlGS:0000000000000000
2023-12-18T22:12:46+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.531967] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-12-18T22:12:47+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.537954] CR2: 0000000000000000 CR3: 00000002ce222005 CR4: 00000000007706e0
2023-12-18T22:12:47+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.545323] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2023-12-18T22:12:47+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.552700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2023-12-18T22:12:47+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.560086] PKRU: 55555554
2023-12-18T22:12:47+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.563037] Kernel panic - not syncing: Fatal exception in interrupt
2023-12-18T22:12:47+01:00	Dec 18 21:12:46 10.211.11.73 [ 5069.569727] Kernel Offset: 0x31000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
2023-12-18T22:12:47+01:00	Dec 18 21:12:47 10.211.11.73 [ 5069.585713] Rebooting in 5 seconds..



The "renamed from" messages come from a custom udev rule whose only role is to rename mlx net devices (it is triggered on add|remove uevents only). I mention this because "mlx5_core 0000:da:00.1 mlx_0_1: Link down" is the last message before the BUG is triggered (and it most likely comes from the custom rule, which first brings the target net device DOWN).
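
To double-check which driver message immediately precedes the splat, and how long before it, a small helper along these lines can be run over the captured log. This is only a sketch: the function name, the marker list and the assumption that every kernel message carries a `[ seconds.micros ]` timestamp are mine, not part of any existing tool.

```python
import re

# Kernel timestamp such as "[ 5046.772351]" followed by the message text.
KMSG = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)$")

# Prefixes that open a kernel splat (assumed list, extend as needed).
SPLAT_PREFIXES = ("------------[ cut here ]", "WARNING:", "BUG:", "Oops:")


def last_message_before_splat(lines, needle):
    """Return (last_ts, last_msg, splat_ts) for the last message containing
    `needle` that was logged before the first splat, or None if no splat
    (or no matching message before it) is found."""
    last = None
    for line in lines:
        match = KMSG.search(line)
        if match is None:
            continue  # continuation fragment without a timestamp
        ts, msg = float(match.group(1)), match.group(2)
        if msg.startswith(SPLAT_PREFIXES):
            return None if last is None else (last[0], last[1], ts)
        if needle in msg:
            last = (ts, msg)
    return None
```

Fed with the log above and `needle="mlx5_core"`, the two timestamps it returns give the gap between the last driver message and the start of the WARNING.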




Interestingly, a similar issue occurred under kernel 6.1.14: exactly the same test triggered a similar panic, but the traces pointed more at the InfiniBand and mlx drivers:

[ 1974.451135] pcieport 0000:00:1c.0: Enabling MPC IRBNCE
[ 1974.457524] pcieport 0000:00:1c.0: Intel PCH root port ACS workaround enabled
[ 1974.467108] pci 0000:00:1e.0: PCI bridge to [bus 11]
[ 1982.293358] mlx5_core 0000:81:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), active vports(0)
[ 1982.320312] mlx5_core 0000:81:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), active vports(0)
[ 1982.353880] BUG: unable to handle page fault for address: ffffffff9a8b78a0
[ 1982.361910] #PF: supervisor write access in kernel mode
[ 1982.368067] #PF: error_code(0x0002) - not-present page
[ 1982.374129] PGD f20015067 P4D f20015067 PUD f20016063 PMD 852497063 PTE 800ffff0df748062
[ 1982.383502] Oops: 0002 [#1] PREEMPT SMP PTI
[ 1982.388480] CPU: 16 PID: 57459 Comm: kworker/u341:0 Tainted: G           OE      6.1.14-200.fc37.x86_64 #1
[ 1982.399584] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[ 1982.411384] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
[ 1982.418361] RIP: 0010:native_queued_spin_lock_slowpath+0x256/0x2b0
[ 1982.425597] Code: 90 48 8b 55 00 48 85 d2 74 f5 eb e7 c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 05 48 63 d2 48 05 40 28 03 00 48 03 04 d5 a0 6a 81 99 <48> 89 28 8b 45 08 85 c0 75 09 f3 90 8b 45 08 85 c0 74 f7 48 8b 55
[ 1982.447245] RSP: 0018:ffffb59bcaa5fd70 EFLAGS: 00010086
[ 1982.453427] RAX: ffffffff9a8b78a0 RBX: ffff8a13cb5ec600 RCX: 0000000000440000
[ 1982.461754] RDX: 0000000000000b4a RSI: ffffffff99749b33 RDI: ffffffff997046b9
[ 1982.470070] RBP: ffff8a1b95932840 R08: 2c6f6c6e622c6168 R09: ffff8a13ceb9db74
[ 1982.478389] R10: 000000000000000f R11: fefefefefefefeff R12: 0000000000000000
[ 1982.486676] R13: 0000000000000010 R14: ffff8a13ceb9db00 R15: ffff8a13cb5ec600
[ 1982.494995] FS:  0000000000000000(0000) GS:ffff8a1b95900000(0000) knlGS:0000000000000000
[ 1982.504396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1982.511196] CR2: ffffffff9a8b78a0 CR3: 0000000f20010001 CR4: 00000000001706e0
[ 1982.519536] Call Trace:
[ 1982.522637]  <TASK>
[ 1982.525355]  _raw_spin_lock_irqsave+0x44/0x50
[ 1982.530598]  mlx5_ib_poll_cq+0x3e/0xcd0 [mlx5_ib]
[ 1982.536237]  ? _raw_spin_unlock+0x15/0x30
[ 1982.541093]  ? finish_task_switch.isra.0+0x9b/0x300
[ 1982.546922]  __ib_process_cq+0x4f/0x180 [ib_core]
[ 1982.552573]  ib_cq_poll_work+0x26/0x80 [ib_core]
[ 1982.558141]  process_one_work+0x1c7/0x380
[ 1982.563007]  worker_thread+0x4d/0x380
[ 1982.567466]  ? _raw_spin_lock_irqsave+0x23/0x50
[ 1982.572936]  ? rescuer_thread+0x380/0x380
[ 1982.577804]  kthread+0xe9/0x110
[ 1982.581693]  ? kthread_complete_and_exit+0x20/0x20
[ 1982.587428]  ret_from_fork+0x22/0x30
[ 1982.591832]  </TASK>
[ 1982.594659] Modules linked in: nvme_rdma nvme_fabrics vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio rdma_ucm rdma_cm iw_cm ib_umad ib_cm rfkill usdm_drv(OE) sunrpc binfmt_misc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_
intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mlx5_ib polyval_clmulni polyval_generic qat_c62x(OE) ghash_clmulni_intel sha512_ssse3 intel_qat(OE) rapl ib_uverbs ipmi_si mei_me iTCO_wdt intel_cstate intel_pmc_bxt ipmi_devintf iTCO_vendor_support ib_core intel_uncore
 joydev ipmi_msghandler pcspkr mgag200 lpc_ich mei uio i2c_i801 ioatdma i2c_smbus wmi ip6_tables ip_tables fuse zram bpf_preload loop overlay squashfs netconsole nd_pmem nd_btt nd_e820 libnvdimm virtio_blk virtio_net net_failover failover uas usb_storage nvme nvme_core nvme_common
mlx5_core mlxfw psample tls pci_hyperv_intf ixgbe mdio igb dca [last unloaded: nvme_fabrics]
[ 1982.689682] CR2: ffffffff9a8b78a0
[ 1982.693897] ---[ end trace 0000000000000000 ]---
[ 1982.705758] RIP: 0010:native_queued_spin_lock_slowpath+0x256/0x2b0
[ 1982.713168] Code: 90 48 8b 55 00 48 85 d2 74 f5 eb e7 c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 05 48 63 d2 48 05 40 28 03 00 48 03 04 d5 a0 6a 81 99 <48> 89 28 8b 45 08 85 c0 75 09 f3 90 8b 45 08 85 c0 74 f7 48 8b 55
[ 1982.735133] RSP: 0018:ffffb59bcaa5fd70 EFLAGS: 00010086
[ 1982.741490] RAX: ffffffff9a8b78a0 RBX: ffff8a13cb5ec600 RCX: 0000000000440000
[ 1982.749993] RDX: 0000000000000b4a RSI: ffffffff99749b33 RDI: ffffffff997046b9
[ 1982.758478] RBP: ffff8a1b95932840 R08: 2c6f6c6e622c6168 R09: ffff8a13ceb9db74
[ 1982.766974] R10: 000000000000000f R11: fefefefefefefeff R12: 0000000000000000
[ 1982.775446] R13: 0000000000000010 R14: ffff8a13ceb9db00 R15: ffff8a13cb5ec600
[ 1982.783925] FS:  0000000000000000(0000) GS:ffff8a1b95900000(0000) knlGS:0000000000000000
[ 1982.793451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1982.800369] CR2: ffffffff9a8b78a0 CR3: 0000000f20010001 CR4: 00000000001706e0
[ 1982.808817] note: kworker/u341:0[57459] exited with preempt_count 1
[ 1986.517149] mlx5_core 0000:81:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), active vports(0)
[ 1987.217568] mlx5_core 0000:81:00.0: E-Switch: cleanup
[ 1988.326673] pci 0000:81:00.0: Removing from iommu group 13
[ 1988.453525] pcieport 0000:00:1c.0: Enabling MPC IRBNCE
[ 1988.460268] pcieport 0000:00:1c.0: Intel PCH root port ACS workaround enabled
[ 1988.470906] pci 0000:00:1e.0: PCI bridge to [bus 11]
[ 1988.477756] pci 0000:81:00.0: [15b3:1015] type 00 class 0x020000
[ 1988.485633] pci 0000:81:00.0: reg 0x10: [mem 0x3800fc000000-0x3800fdffffff 64bit pref]
[ 1988.495665] pci 0000:81:00.0: reg 0x30: [mem 0xec100000-0xec1fffff pref]
[ 1988.504604] pci 0000:81:00.0: PME# supported from D3cold
[ 1988.511473] pci 0000:81:00.0: reg 0x1a4: [mem 0x3800fe800000-0x3800fe8fffff 64bit pref]
[ 1988.521025] pci 0000:81:00.0: VF(n) BAR0 space: [mem 0x3800fe800000-0x3800feffffff 64bit pref] (contains BAR0 for 8 VFs)
[ 1988.535153] pci 0000:81:00.0: Adding to iommu group 13
[ 1988.590326] pci 0000:81:00.0: BAR 0: assigned [mem 0x3800fc000000-0x3800fdffffff 64bit pref]
[ 1988.600641] pci 0000:81:00.0: BAR 6: assigned [mem 0xec100000-0xec1fffff pref]
[ 1988.609423] pci 0000:81:00.0: BAR 7: assigned [mem 0x3800fe800000-0x3800feffffff 64bit pref]
[ 1988.620767] mlx5_core 0000:81:00.0: firmware version: 14.32.1010
[ 1988.628288] mlx5_core 0000:81:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 1988.943317] mlx5_core 0000:81:00.0: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 1988.984750] mlx5_core 0000:81:00.0: Port module event: module 0, Cable unplugged
[ 1989.025316] mlx5_core 0000:81:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[ 1989.377692] mlx5_core 0000:81:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 1991.212802] ------------[ cut here ]------------
[ 1991.219151] WARNING: CPU: 39 PID: 1896 at mm/slab_common.c:1035 __ksize+0x10c/0x130
[ 1991.228809] Modules linked in: nvme_rdma nvme_fabrics vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio rdma_ucm rdma_cm iw_cm ib_umad ib_cm rfkill usdm_drv(OE) sunrpc binfmt_misc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_
intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mlx5_ib polyval_clmulni polyval_generic qat_c62x(OE) ghash_clmulni_intel sha512_ssse3 intel_qat(OE) rapl ib_uverbs ipmi_si mei_me iTCO_wdt intel_cstate intel_pmc_bxt ipmi_devintf iTCO_vendor_support ib_core intel_uncore
 joydev ipmi_msghandler pcspkr mgag200 lpc_ich mei uio i2c_i801 ioatdma i2c_smbus wmi ip6_tables ip_tables fuse zram bpf_preload loop overlay squashfs netconsole nd_pmem nd_btt nd_e820 libnvdimm virtio_blk virtio_net net_failover failover uas usb_storage nvme nvme_core nvme_common
mlx5_core mlxfw psample tls pci_hyperv_intf ixgbe mdio igb dca [last unloaded: nvme_fabrics]
[ 1991.324169] CPU: 39 PID: 1896 Comm: telegraf Tainted: G      D    OE      6.1.14-200.fc37.x86_64 #1
[ 1991.334783] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[ 1991.346732] RIP: 0010:__ksize+0x10c/0x130
[ 1991.351694] Code: c1 e1 0c 48 03 0d 74 c8 50 01 48 39 ca 75 1b 48 8b 16 f7 c2 00 00 01 00 49 0f 44 c0 c3 cc cc cc cc 0f 0b 31 c0 c3 cc cc cc cc <0f> 0b 31 c0 c3 cc cc cc cc 48 8b 0d 54 64 d1 01 e9 0d ff ff ff 66
[ 1991.373642] RSP: 0018:ffffb59bca77fc78 EFLAGS: 00010206
[ 1991.379978] RAX: 0000000000008000 RBX: 0000000000000280 RCX: ffff8a13e6358000
[ 1991.388447] RDX: ffff8a13e635a800 RSI: fffffb0de198d600 RDI: fffffb0dc0000000
[ 1991.396906] RBP: 0000000000000cc0 R08: 0000000000001000 R09: ffff8a1b9576ad00
[ 1991.405378] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
[ 1991.413843] R13: 0000000000000000 R14: ffff8a13c0048700 R15: ffff8a13e920ba00
[ 1991.422308] FS:  00007fe7bf7fe6c0(0000) GS:ffff8a1b95bc0000(0000) knlGS:0000000000000000
[ 1991.431829] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1991.438728] CR2: 00007fe801482000 CR3: 00000001d59d6002 CR4: 00000000001706e0
[ 1991.447208] Call Trace:
[ 1991.450418]  <TASK>
[ 1991.453249]  __alloc_skb+0xa2/0x1c0
[ 1991.457629]  tcp_stream_alloc_skb+0x28/0x130
[ 1991.462878]  tcp_sendmsg_locked+0x64e/0xbf0
[ 1991.468030]  ? sock_sendmsg+0x58/0x70
[ 1991.472584]  ? sock_write_iter+0x89/0xe0
[ 1991.477431]  tcp_sendmsg+0x27/0x40
[ 1991.481702]  sock_sendmsg+0x58/0x70
[ 1991.486062]  sock_write_iter+0x89/0xe0
[ 1991.490702]  vfs_write+0x34e/0x3e0
[ 1991.494959]  ksys_write+0x97/0xd0
[ 1991.499119]  do_syscall_64+0x5b/0x80
[ 1991.503551]  ? do_syscall_64+0x67/0x80
[ 1991.508173]  ? __irq_exit_rcu+0x3d/0x140
[ 1991.512987]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1991.519076] RIP: 0033:0x48d99b
[ 1991.522928] Code: fe ff eb bd e8 26 5a fe ff e9 61 ff ff ff cc e8 9b 1e fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[ 1991.544793] RSP: 002b:000000c000712870 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 1991.553698] RAX: ffffffffffffffda RBX: 000000c000061800 RCX: 000000000048d99b
[ 1991.562130] RDX: 00000000000000f6 RSI: 000000c000300000 RDI: 0000000000000007
[ 1991.570537] RBP: 000000c0007128c0 R08: 00000000000000f6 R09: 0000000000000004
[ 1991.578959] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000000
[ 1991.587358] R13: 0000000000000043 R14: 0000000000000001 R15: 0000000000000001
[ 1991.595755]  </TASK>
[ 1991.598607] ---[ end trace 0000000000000000 ]---
[ 1991.604180] ------------[ cut here ]------------
[ 1991.609739] WARNING: CPU: 39 PID: 1896 at mm/slab_common.c:1035 __ksize+0x10c/0x130
[ 1991.618698] Modules linked in: nvme_rdma nvme_fabrics vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio rdma_ucm rdma_cm iw_cm ib_umad ib_cm rfkill usdm_drv(OE) sunrpc binfmt_misc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mlx5_ib polyval_clmulni polyval_generic qat_c62x(OE) ghash_clmulni_intel sha512_ssse3 intel_qat(OE) rapl ib_uverbs ipmi_si mei_me iTCO_wdt intel_cstate intel_pmc_bxt ipmi_devintf iTCO_vendor_support ib_core intel_uncore joydev ipmi_msghandler pcspkr mgag200 lpc_ich mei uio i2c_i801 ioatdma i2c_smbus wmi ip6_tables ip_tables fuse zram bpf_preload loop overlay squashfs netconsole nd_pmem nd_btt nd_e820 libnvdimm virtio_blk virtio_net net_failover failover uas usb_storage nvme nvme_core nvme_common mlx5_core mlxfw psample tls pci_hyperv_intf ixgbe mdio igb dca [last unloaded: nvme_fabrics]
[ 1991.713093] CPU: 39 PID: 1896 Comm: telegraf Tainted: G      D W  OE      6.1.14-200.fc37.x86_64 #1
[ 1991.723602] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[ 1991.735454] RIP: 0010:__ksize+0x10c/0x130
[ 1991.740492] Code: c1 e1 0c 48 03 0d 74 c8 50 01 48 39 ca 75 1b 48 8b 16 f7 c2 00 00 01 00 49 0f 44 c0 c3 cc cc cc cc 0f 0b 31 c0 c3 cc cc cc cc <0f> 0b 31 c0 c3 cc cc cc cc 48 8b 0d 54 64 d1 01 e9 0d ff ff ff 66
[ 1991.762311] RSP: 0018:ffffb59bca77fc60 EFLAGS: 00010206
[ 1991.768579] RAX: 0000000000008000 RBX: ffff8a13e635a800 RCX: ffff8a13e6358000
[ 1991.776984] RDX: ffff8a13e635a800 RSI: fffffb0de198d600 RDI: fffffb0dc0000000
[ 1991.785375] RBP: ffff8a13e920ba00 R08: 0000000000001000 R09: ffff8a1b9576ad00
[ 1991.793766] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
[ 1991.802158] R13: 0000000000000000 R14: ffff8a13c0048700 R15: ffff8a13e920ba00
[ 1991.810544] FS:  00007fe7bf7fe6c0(0000) GS:ffff8a1b95bc0000(0000) knlGS:0000000000000000
[ 1991.820009] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1991.826854] CR2: 00007fe801482000 CR3: 00000001d59d6002 CR4: 00000000001706e0
[ 1991.835266] Call Trace:
[ 1991.838428]  <TASK>
[ 1991.841210]  __build_skb_around+0xae/0xc0
[ 1991.846128]  __alloc_skb+0xec/0x1c0
[ 1991.850460]  tcp_stream_alloc_skb+0x28/0x130
[ 1991.855667]  tcp_sendmsg_locked+0x64e/0xbf0
[ 1991.860779]  ? sock_sendmsg+0x58/0x70
[ 1991.865313]  ? sock_write_iter+0x89/0xe0
[ 1991.870144]  tcp_sendmsg+0x27/0x40
[ 1991.874381]  sock_sendmsg+0x58/0x70
[ 1991.878715]  sock_write_iter+0x89/0xe0
[ 1991.883342]  vfs_write+0x34e/0x3e0
[ 1991.887587]  ksys_write+0x97/0xd0
[ 1991.891721]  do_syscall_64+0x5b/0x80
[ 1991.896140]  ? do_syscall_64+0x67/0x80
[ 1991.900743]  ? __irq_exit_rcu+0x3d/0x140
[ 1991.905531]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1991.911574] RIP: 0033:0x48d99b
[ 1991.915379] Code: fe ff eb bd e8 26 5a fe ff e9 61 ff ff ff cc e8 9b 1e fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[ 1991.937200] RSP: 002b:000000c000712870 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 1991.946093] RAX: ffffffffffffffda RBX: 000000c000061800 RCX: 000000000048d99b
[ 1991.954488] RDX: 00000000000000f6 RSI: 000000c000300000 RDI: 0000000000000007
[ 1991.962888] RBP: 000000c0007128c0 R08: 00000000000000f6 R09: 0000000000000004
[ 1991.971303] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000000
[ 1991.979699] R13: 0000000000000043 R14: 0000000000000001 R15: 0000000000000001
[ 1991.988101]  </TASK>
[ 1991.990960] ---[ end trace 0000000000000000 ]---
[ 1992.667061] mlx5_core 0000:81:00.0 eth6: Link down
[ 1992.838968] mlx5_core 0000:81:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), active vports(0)
[ 1992.861929] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), active vports(0)
[ 1996.925731] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), active vports(0)
[ 1997.614150] mlx5_core 0000:81:00.1: E-Switch: cleanup
[ 1998.723725] pci 0000:81:00.1: Removing from iommu group 14
[ 1998.853756] pcieport 0000:00:1c.0: Enabling MPC IRBNCE
[ 1998.860145] pcieport 0000:00:1c.0: Intel PCH root port ACS workaround enabled
[ 1998.870608] pci 0000:00:1e.0: PCI bridge to [bus 11]
[ 1998.878343] pci 0000:81:00.1: [15b3:1015] type 00 class 0x020000
[ 1998.886906] pci 0000:81:00.1: reg 0x10: [mem 0x3800fa000000-0x3800fbffffff 64bit pref]
[ 1998.897516] pci 0000:81:00.1: reg 0x30: [mem 0xec000000-0xec0fffff pref]
[ 1998.906948] pci 0000:81:00.1: PME# supported from D3cold
[ 1998.914320] pci 0000:81:00.1: reg 0x1a4: [mem 0x3800fe000000-0x3800fe0fffff 64bit pref]
[ 1998.924338] pci 0000:81:00.1: VF(n) BAR0 space: [mem 0x3800fe000000-0x3800fe7fffff 64bit pref] (contains BAR0 for 8 VFs)
[ 1998.938901] pci 0000:81:00.1: Adding to iommu group 14
[ 1998.994679] pci 0000:81:00.1: BAR 0: assigned [mem 0x3800fa000000-0x3800fbffffff 64bit pref]
[ 1999.004591] pci 0000:81:00.1: BAR 6: assigned [mem 0xec000000-0xec0fffff pref]
[ 1999.013071] pci 0000:81:00.1: BAR 7: assigned [mem 0x3800fe000000-0x3800fe7fffff 64bit pref]
[ 1999.024514] mlx5_core 0000:81:00.1: firmware version: 14.32.1010
[ 1999.031718] mlx5_core 0000:81:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 1999.342851] mlx5_core 0000:81:00.1: E-Switch: Total vports 10, per vport: max uc(128) max mc(2048)
[ 1999.381325] mlx5_core 0000:81:00.1: Port module event: module 1, Cable unplugged
[ 1999.425105] mlx5_core 0000:81:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0)
[ 1999.801251] mlx5_core 0000:81:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 1999.862861] general protection fault, probably for non-canonical address 0xc334bd6ee67153f1: 0000 [#2] PREEMPT SMP PTI
[ 1999.875311] CPU: 39 PID: 64956 Comm: (udev-worker) Tainted: G      D W  OE      6.1.14-200.fc37.x86_64 #1
[ 1999.886387] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[ 1999.898260] RIP: 0010:__kmem_cache_alloc_node+0x171/0x2b0
[ 1999.904676] Code: 44 24 08 e8 21 a9 ff ff 48 8b 44 24 08 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 41 8b 4f 28 49 8b 3f 48 01 c1 <48> 8b 19 48 89 ce 49 33 9f b8 00 00 00 48 0f ce 48 31 f3 40 f6 c7



I have been looking through recent kernel changes, but the only similar thing I could find was this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d1c1379d71777ddeda3e54f8fc26e9ecbfd1009 

but from the look of it, that change should already be present in kernels >= 6.2.

Any hints would be appreciated.
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-12-20 10:24:18 UTC
(In reply to michallinuxstuff from comment #0)

> Any hints would be appreciated.

If this is a somewhat recent regression (see https://docs.kernel.org/admin-guide/reporting-regressions.html ) it should be fixed, but at least to me it's unclear if this is really a regression. Hence allow me to ask: Did this work earlier on the particular machine? 

Side note: the developers you are trying to contact might not see this report. Search for bugzilla in https://docs.kernel.org/admin-guide/reporting-issues.html and https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/ to understand why; yes, I wish it would be different, but that's outside of my control.
Comment 2 michallinuxstuff 2023-12-20 16:02:30 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1)
> (In reply to michallinuxstuff from comment #0)
> 
> > Any hints would be appreciated.
> 
> If this is a somewhat recent regression (see
> https://docs.kernel.org/admin-guide/reporting-regressions.html ) it should
> be fixed, but at least to me it's unclear if this is really a regression.
> Hence allow me to ask: Did this work earlier on the particular machine? 
> 
> Side note: the developers you are trying to contact might not see this
> report. Search for bugzilla in
> https://docs.kernel.org/admin-guide/reporting-issues.html and
> https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/
> to understand why; yes, I wish it would be
> different, but that's outside of my control.

The issue is intermittent and not 100% reproducible; it takes a couple of test runs to trigger. But I do believe that soon after this particular test was introduced, we started seeing the issue pop up (under both 6.1.14 and 6.5.12).
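For anyone trying to reproduce this outside our test infra, one remove/rescan cycle can be sketched roughly as below. This is only a minimal sketch, not the actual test: the function name, the settle delay and the sysfs_root parameter are assumptions made for illustration, and the device address has to be supplied by the caller (0000:81:00.1 is one of the functions seen in the log above). It must run as root to touch the real /sys.

```python
import os
import time


def hotplug_cycle(bdf, sysfs_root="/sys", settle_s=1.0):
    """Remove one PCI function via sysfs, then trigger a full PCI rescan.

    bdf        -- PCI address such as "0000:81:00.1" (caller-supplied)
    sysfs_root -- overridable so the function can be tried against a fake tree
    settle_s   -- assumed pause after each write; the real test may differ

    Returns the sysfs files that were written, in order.
    """
    remove_path = os.path.join(sysfs_root, "bus/pci/devices", bdf, "remove")
    rescan_path = os.path.join(sysfs_root, "bus/pci/rescan")
    written = []
    for path in (remove_path, rescan_path):
        # Writing "1" is what "echo 1 > <path>" does from a shell.
        with open(path, "w") as f:
            f.write("1\n")
        written.append(path)
        time.sleep(settle_s)
    return written
```

Since the failure is intermittent, the cycle would have to be called in a loop (with the initiator reconnecting in between) until the WARN or BUG shows up in dmesg.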
Comment 3 michallinuxstuff 2024-01-10 21:55:35 UTC
Is there anything I could do to move this forward? :) I can collect a kernel core dump and fetch some data from it if needed. My C-fu is not that great, so I would need some hints on what info would be useful for this case. I can provide basically anything that can be extracted from the vmcore via the crash utility, for instance.
Comment 4 michallinuxstuff 2024-01-11 13:36:34 UTC
Marking as [REGRESSION] since my gut tells me it's too closely tied to the code path touched by this commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d1c1379d71777ddeda3e54f8fc26e9ecbfd1009
Comment 5 michallinuxstuff 2024-01-11 19:11:49 UTC
Under 6.5.12, instead of an Oops, this WARN is sometimes hit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/workqueue.c?h=v6.5.12#n1638 and the kernel continues (panic_on_warn == 0). But sooner or later, after running the test multiple times, we end up with the actual BUG again.
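To tell the "WARN and continue" runs apart from the fatal ones in a captured console log, something along these lines could be used. It is a rough sketch only: the function name and the two patterns are my own, derived from the message formats visible in this report, and they are not an exhaustive list of what the kernel can print.

```python
import re

# Both patterns are modelled on lines quoted in this report (assumption:
# other kernels or architectures may word these messages differently).
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+)")
FATAL_RE = re.compile(r"BUG: |general protection fault")


def summarize(lines):
    """Split console log lines into WARN sites and fatal headlines."""
    warn_sites, fatal = [], []
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            # Keep only the file:line part, e.g. "mm/slab_common.c:1035".
            warn_sites.append(m.group(1))
        elif FATAL_RE.search(line):
            fatal.append(line.strip())
    return {"warn_sites": warn_sites, "fatal": fatal}
```

A run that only produced entries under "warn_sites" survived (with panic_on_warn == 0); any entry under "fatal" means the run ended in the BUG.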
Comment 6 michallinuxstuff 2024-01-11 19:35:21 UTC
Here's a userspace process that actively spins during the test and is sitting in the mlx5 code path:

crash> foreach bt | grep reactor
WARNING: possibly bogus exception frame
 PID: 60215    TASK: ffff9d211e8f0000  CPU: 0    COMMAND: "reactor_0"
PID: 60219    TASK: ffff9d2107388000  CPU: 1    COMMAND: "reactor_1"
PID: 60257    TASK: ffff9d2107564000  CPU: 2    COMMAND: "reactor_2"
crash> set 60215
    PID: 60215
COMMAND: "reactor_0"
   TASK: ffff9d211e8f0000  [THREAD_INFO: ffff9d211e8f0000]
    CPU: 0
  STATE: TASK_WAKING
crash> bt
PID: 60215    TASK: ffff9d211e8f0000  CPU: 0    COMMAND: "reactor_0"
 #0 [ffffab7b471b3650] __schedule at ffffffffb6fd322e
 #1 [ffffab7b471b3708] schedule at ffffffffb6fd436e
 #2 [ffffab7b471b3720] schedule_timeout at ffffffffb6fdab98
 #3 [ffffab7b471b3770] wait_for_completion_timeout at ffffffffb6fd5293
 #4 [ffffab7b471b37d0] cmd_exec at ffffffffc084c58d [mlx5_core]
 #5 [ffffab7b471b3850] mlx5_cmd_do at ffffffffc084cab2 [mlx5_core]
 #6 [ffffab7b471b3880] mlx5_cmd_exec at ffffffffc084cb0b [mlx5_core]
 #7 [ffffab7b471b38a0] mlx5_query_nic_vport_min_inline at ffffffffc085e6d2 [mlx5_core]
 #8 [ffffab7b471b39d8] mlx5_query_min_inline at ffffffffc085e766 [mlx5_core]
 #9 [ffffab7b471b39e8] set_ucontext_resp at ffffffffc0e9f723 [mlx5_ib]
#10 [ffffab7b471b3a08] mlx5_ib_alloc_ucontext at ffffffffc0ea277f [mlx5_ib]
#11 [ffffab7b471b3ac0] ib_init_ucontext at ffffffffc0d97513 [ib_uverbs]
#12 [ffffab7b471b3b00] ib_uverbs_handler_UVERBS_METHOD_GET_CONTEXT at ffffffffc0d9ef4d [ib_uverbs]
#13 [ffffab7b471b3b30] ib_uverbs_cmd_verbs at ffffffffc0d9bef0 [ib_uverbs]
#14 [ffffab7b471b3dc8] ib_uverbs_ioctl at ffffffffc0d9c068 [ib_uverbs]
#15 [ffffab7b471b3e08] __x64_sys_ioctl at ffffffffb64787a4
#16 [ffffab7b471b3e38] do_syscall_64 at ffffffffb6fbe46d
#17 [ffffab7b471b3f50] entry_SYSCALL_64_after_hwframe at ffffffffb70000ea
    RIP: 00007f0e910c1ecd  RSP: 00007ffcf14908a0  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffcf14909e0  RCX: 00007f0e910c1ecd
    RDX: 00007ffcf1490a00  RSI: 00000000c0181b01  RDI: 00000000000000f3
    RBP: 00007ffcf14908f0   R8: 0000000001000010   R9: 0000000000000002
    R10: 0000000000000050  R11: 0000000000000246  R12: 00007ffcf1490a18
    R13: 00007ffcf1490a18  R14: 0000000000decc00  R15: 0000000000000001
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
crash>
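If more backtraces like the one above need to be sifted, the frames that belong to a given module can be pulled out of saved crash output with a small helper. This is just a sketch: the function names are made up, and the regular expression assumes the "#N [stack] symbol at address [module]" frame layout shown in this session.

```python
import re

# Matches frames such as:
#  #4 [ffffab7b471b37d0] cmd_exec at ffffffffc084c58d [mlx5_core]
# The trailing "[module]" is absent for core-kernel symbols.
FRAME_RE = re.compile(
    r"^\s*#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)(?:\s+\[(\w+)\])?\s*$"
)


def parse_bt(text):
    """Return (frame_index, symbol, module_or_None) for each frame line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            idx, _stack, symbol, _addr, module = m.groups()
            frames.append((int(idx), symbol, module))
    return frames


def frames_in_module(frames, module):
    """List the symbols of all frames that belong to the given module."""
    return [symbol for _, symbol, mod in frames if mod == module]
```

For the trace above, the innermost mlx5_core frame is cmd_exec, directly below wait_for_completion_timeout.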
Comment 7 michallinuxstuff 2024-01-11 20:21:33 UTC
Here's the latest "enhanced" stack dump (passed through decode_stacktrace.sh) that I got:

[  632.895084] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  632.902404] #PF: supervisor read access in kernel mode
[  632.907784] #PF: error_code(0x0000) - not-present page
[  632.913138] PGD 800000011f3a9067 P4D 800000011f3a9067 PUD 2d75bd067 PMD 0
[  632.920226] Oops: 0000 [#1] PREEMPT SMP PTI
[  632.934829] Hardware name: Intel Corporation S2600WFQ/S2600WFQ, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018
[  632.945476] RIP: 0010:__queue_work (kernel/workqueue.c:1659 (discriminator 1)) 
[ 632.950052] Code: 8b 33 40 f6 c6 04 75 cf 48 c1 ee 05 81 fe ff ff ff 7f 0f 84 ae 00 00 00 48 63 f6 48 c7 c7 a0 cb 25 b8 e8 44 16 e6 00 49 89 c7 <48> 8b 7d 00 4d 85 ff 0f 84 93 00 00 00 49 39 ff 0f 84 8a 00 00 00
All code
========
   0:	8b 33                	mov    (%rbx),%esi
   2:	40 f6 c6 04          	test   $0x4,%sil
   6:	75 cf                	jne    0xffffffffffffffd7
   8:	48 c1 ee 05          	shr    $0x5,%rsi
   c:	81 fe ff ff ff 7f    	cmp    $0x7fffffff,%esi
  12:	0f 84 ae 00 00 00    	je     0xc6
  18:	48 63 f6             	movslq %esi,%rsi
  1b:	48 c7 c7 a0 cb 25 b8 	mov    $0xffffffffb825cba0,%rdi
  22:	e8 44 16 e6 00       	call   0xe6166b
  27:	49 89 c7             	mov    %rax,%r15
  2a:*	48 8b 7d 00          	mov    0x0(%rbp),%rdi		<-- trapping instruction
  2e:	4d 85 ff             	test   %r15,%r15
  31:	0f 84 93 00 00 00    	je     0xca
  37:	49 39 ff             	cmp    %rdi,%r15
  3a:	0f 84 8a 00 00 00    	je     0xca

Code starting with the faulting instruction
===========================================
   0:	48 8b 7d 00          	mov    0x0(%rbp),%rdi
   4:	4d 85 ff             	test   %r15,%r15
   7:	0f 84 93 00 00 00    	je     0xa0
   d:	49 39 ff             	cmp    %rdi,%r15
  10:	0f 84 8a 00 00 00    	je     0xa0
[  632.969245] RSP: 0018:ffffab7b40003e78 EFLAGS: 00010046
[  632.974697] RAX: ffff9d2100c59c00 RBX: ffff9d2885a58c48 RCX: 0000000000000000
[  632.982052] RDX: ffff9d2100c59c00 RSI: ffff9d2101001b60 RDI: 00000000000000e0
[  632.989407] RBP: 0000000000000000 R08: ffff9d2101001c88 R09: ffffffffb825cba0
[  632.996758] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9d28c5680800
[  633.004107] R13: 0000000000002000 R14: 0000000000000000 R15: ffff9d2100c59c00
[  633.011452] FS:  0000000000000000(0000) GS:ffff9d285f400000(0000) knlGS:0000000000000000
[  633.019755] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  633.025712] CR2: 0000000000000000 CR3: 0000000115b56002 CR4: 00000000007706f0
[  633.033055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  633.040400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  633.047738] PKRU: 55555554
[  633.050653] Call Trace:
[  633.053302]  <IRQ>
[  633.055518] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) 
[  633.058779] ? page_fault_oops (arch/x86/mm/fault.c:707 (discriminator 1)) 
[  633.063076] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:695 arch/x86/mm/fault.c:1494 arch/x86/mm/fault.c:1542) 
[  633.067198] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570) 
[  633.071585] ? __queue_work (kernel/workqueue.c:1659 (discriminator 1)) 
[  633.075534] ? __queue_work (kernel/workqueue.c:787 kernel/workqueue.c:1658) 
[  633.079479] ? __pfx_delayed_work_timer_fn (kernel/workqueue.c:1841) 
[  633.084644] call_timer_fn (kernel/time/timer.c:1700) 
[  633.088510] ? __pfx_delayed_work_timer_fn (kernel/workqueue.c:1841) 
[  633.093679] __run_timers (kernel/time/timer.c:1747 kernel/time/timer.c:2022) 
[  633.097547] run_timer_softirq (kernel/time/timer.c:2037 (discriminator 1)) 
[  633.101681] __do_softirq (kernel/softirq.c:553) 
[  633.105470] __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632) 
[  633.109345] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1111 (discriminator 47)) 
[  633.114347]  </IRQ>
[  633.116658]  <TASK>
[  633.118974] asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645) 
[  633.124335] RIP: 0010:cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291) 
[ 633.129520] Code: aa 2c 1c ff e8 c5 f3 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 93 32 1b ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
All code
========
   0:	aa                   	stos   %al,%es:(%rdi)
   1:	2c 1c                	sub    $0x1c,%al
   3:	ff                   	(bad)
   4:	e8 c5 f3 ff ff       	call   0xfffffffffffff3ce
   9:	8b 53 04             	mov    0x4(%rbx),%edx
   c:	49 89 c5             	mov    %rax,%r13
   f:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  14:	31 ff                	xor    %edi,%edi
  16:	e8 93 32 1b ff       	call   0xffffffffff1b32ae
  1b:	45 84 ff             	test   %r15b,%r15b
  1e:	0f 85 56 02 00 00    	jne    0x27a
  24:	fb                   	sti
  25:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  2a:*	45 85 f6             	test   %r14d,%r14d		<-- trapping instruction
  2d:	0f 88 85 01 00 00    	js     0x1b8
  33:	49 63 d6             	movslq %r14d,%rdx
  36:	48 8d 04 52          	lea    (%rdx,%rdx,2),%rax
  3a:	48 8d 04 82          	lea    (%rdx,%rax,4),%rax
  3e:	49                   	rex.WB
  3f:	8d                   	.byte 0x8d

Code starting with the faulting instruction
===========================================
   0:	45 85 f6             	test   %r14d,%r14d
   3:	0f 88 85 01 00 00    	js     0x18e
   9:	49 63 d6             	movslq %r14d,%rdx
   c:	48 8d 04 52          	lea    (%rdx,%rdx,2),%rax
  10:	48 8d 04 82          	lea    (%rdx,%rax,4),%rax
  14:	49                   	rex.WB
  15:	8d                   	.byte 0x8d
[  633.148726] RSP: 0018:ffffffffb8203e28 EFLAGS: 00000246
[  633.154181] RAX: ffff9d285f433e00 RBX: ffffcb7322c01388 RCX: 000000000000001f
[  633.161554] RDX: 0000000000000000 RSI: 000000003351fed6 RDI: 0000000000000000
[  633.168926] RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000018
[  633.176288] R10: ffff9d285f4327c4 R11: 0000000000000016 R12: ffffffffb843b0a0
[  633.183654] R13: 000000935b7c7cf4 R14: 0000000000000002 R15: 0000000000000000
[  633.191025] ? cpuidle_enter_state (drivers/cpuidle/cpuidle.c:285) 
[  633.195611] cpuidle_enter (drivers/cpuidle/cpuidle.c:390 (discriminator 2)) 
[  633.199424] do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282) 
[  633.202883] cpu_startup_entry (kernel/sched/idle.c:379) 
[  633.207029] rest_init (usercopy_64.c:?) 
[  633.210474] arch_call_rest_init+0xe/0x30 
[  633.214696] start_kernel (init/main.c:833 (discriminator 1) init/main.c:909 (discriminator 1)) 
[  633.218561] x86_64_start_reservations (arch/x86/kernel/head64.c:544) 
[  633.223377] x86_64_start_kernel (arch/x86/kernel/head64.c:486 (discriminator 5)) 
[  633.227674] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:441) 
[  633.233100]  </TASK>
[  633.235478] Modules linked in: nvme_rdma nvme_fabrics vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd rdma_ucm rdma_cm iw_cm ib_umad ib_cm usdm_drv(OE) rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common sunrpc skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel binfmt_misc kvm ipmi_ssif irqbypass crct10dif_pclmul qat_c62x(OE) crc32_pclmul crc32c_intel polyval_clmulni polyval_generic intel_qat(OE) mlx5_ib ghash_clmulni_intel sha512_ssse3 rapl ib_uverbs iTCO_wdt intel_pmc_bxt iTCO_vendor_support intel_cstate acpi_ipmi mei_me ast ipmi_si i2c_i801 ioatdma ib_core dax_pmem intel_uncore pcspkr mei i2c_smbus i2c_algo_bit ipmi_devintf uio lpc_ich intel_pch_thermal wmi dca ipmi_msghandler acpi_power_meter acpi_pad joydev ip6_tables ip_tables fuse zram bpf_preload loop overlay squashfs netconsole nd_pmem nd_btt nd_e820 libnvdimm virtio_blk virtio_net net_failover failover mlx5_core mlxfw psample tls pci_hyperv_intf nvme nvme_core nvme_common
[  633.235550]  ice(OE) gnss i40e [last unloaded: nvme_fabrics]
[  633.332894] CR2: 0000000000000000
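As a cross-check of the decoded output above: the trapping instruction can also be located straight from the raw "Code:" line, because the byte wrapped in <..> is the first byte of the faulting instruction. A minimal sketch (the helper name is made up for illustration):

```python
def trapping_offset(code_line):
    """Return (offset, byte) of the <..>-marked byte in an oops "Code:" line.

    The offset is counted in bytes from the start of the dumped code window,
    which is how decode_stacktrace.sh labels its "All code" listing.
    """
    # Keep only what follows "Code:" so a dmesg timestamp prefix is ignored.
    _, sep, tail = code_line.partition("Code:")
    if not sep:
        raise ValueError("no 'Code:' field in line")
    for offset, token in enumerate(tail.split()):
        if token.startswith("<") and token.endswith(">"):
            return offset, int(token[1:-1], 16)
    raise ValueError("no <..> marker found")
```

For the __queue_work Code: line in this dump, the marker is at offset 42 (0x2a), which matches the "2a:*" line in the listing: mov 0x0(%rbp),%rdi. With RBP being 0 in the register dump, that load is the NULL dereference.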
Comment 8 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-01-12 14:13:38 UTC
(In reply to michallinuxstuff from comment #3)
> Is there anything I could do to move this forward? :)

As mentioned earlier: you are unlikely to reach the developers you are apparently trying to contact here. You should follow https://docs.kernel.org/admin-guide/reporting-issues.html instead. 6.5 is EOL now anyway, so developers will most likely want to know whether current kernels (e.g. 6.7, and 6.8-rc1 once it's out) are still affected.

Sorry, I wish things were different, but that's how it is.

/me unCCs himself
Comment 9 michallinuxstuff 2024-01-12 16:05:13 UTC
Fair enough - I will try to test it under 6.7 then and report my findings on the proper mailing list instead. Thanks!
