Created attachment 306446 [details]
full logfile (zipped)

I decided to try out 6.10-rc2 on my Proxmox machine, which runs on a Zen 2 Threadripper, because of all the amd-pstate improvements.
During bootup I noticed that it prints a lot of kernel warnings in the logs. They mostly look like this:

Jun 09 23:11:23 pve kernel: ------------[ cut here ]------------
Jun 09 23:11:23 pve kernel: WARNING: CPU: 9 PID: 1870 at include/linux/rwsem.h:85 remap_pfn_range_notrack+0x4a5/0x590
Jun 09 23:11:23 pve kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter scsi_transport_iscsi nf_tables bonding tls softdog sunrpc nfnetl>
Jun 09 23:11:23 pve kernel: xhci_hcd i2c_piix4 wmi
Jun 09 23:11:23 pve kernel: CPU: 9 PID: 1870 Comm: CPU 0/KVM Tainted: G W OE 6.10.0-rc2 #3
Jun 09 23:11:23 pve kernel: Hardware name: ASUS System Product Name/ROG ZENITH II EXTREME, BIOS 2102 02/16/2024
Jun 09 23:11:23 pve kernel: RIP: 0010:remap_pfn_range_notrack+0x4a5/0x590
Jun 09 23:11:23 pve kernel: Code: 45 31 d2 45 31 db e9 2a f2 d2 00 48 8b 7d b8 48 89 c6 e8 ce 95 ff ff 85 c0 0f 84 66 fe ff ff eb a6 0f 0b b9 ea ff ff ff eb a2 <0f> 0b e9 e9 fb ff ff 0f 0b 48 8b 7d b8 4c 89 fa 4c 89 ce 4c 89 4d
Jun 09 23:11:23 pve kernel: RSP: 0018:ffffb640c103f900 EFLAGS: 00010246
Jun 09 23:11:23 pve kernel: RAX: 000000802d0644fb RBX: ffff9485c89ea730 RCX: 0000000000100000
Jun 09 23:11:23 pve kernel: RDX: 0000000000000000 RSI: ffff9485e489bc80 RDI: ffff9485c89ea730
Jun 09 23:11:23 pve kernel: RBP: ffffb640c103f9b8 R08: 8000000000000037 R09: 0000000000000000
Jun 09 23:11:23 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000c2100
Jun 09 23:11:23 pve kernel: R13: 00007f8a50200000 R14: 8000000000000037 R15: 00007f8a50100000
Jun 09 23:11:23 pve kernel: FS: 00007f8a4aa006c0(0000) GS:ffff94a47dc80000(0000) knlGS:0000000000000000
Jun 09 23:11:23 pve kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 09 23:11:23 pve kernel: CR2: 00007f8a352ae000 CR3: 0000000117588000 CR4: 0000000000350ef0
Jun 09 23:11:23 pve kernel: Call Trace:
Jun 09 23:11:23 pve kernel: <TASK>
Jun 09 23:11:23 pve kernel: ? show_regs+0x6c/0x80
Jun 09 23:11:23 pve kernel: ? __warn+0x88/0x140
Jun 09 23:11:23 pve kernel: ? remap_pfn_range_notrack+0x4a5/0x590
Jun 09 23:11:23 pve kernel: ? report_bug+0x182/0x1b0
Jun 09 23:11:23 pve kernel: ? handle_bug+0x46/0x90
Jun 09 23:11:23 pve kernel: ? exc_invalid_op+0x18/0x80
Jun 09 23:11:23 pve kernel: ? asm_exc_invalid_op+0x1b/0x20
Jun 09 23:11:23 pve kernel: ? remap_pfn_range_notrack+0x4a5/0x590
Jun 09 23:11:23 pve kernel: ? track_pfn_remap+0x139/0x140
Jun 09 23:11:23 pve kernel: ? down_write+0x12/0x80
Jun 09 23:11:23 pve kernel: remap_pfn_range+0x5c/0xc0
Jun 09 23:11:23 pve kernel: ? srso_return_thunk+0x5/0x5f
Jun 09 23:11:23 pve kernel: vfio_pci_mmap_fault+0xb1/0x180 [vfio_pci_core]
Jun 09 23:11:23 pve kernel: __do_fault+0x3b/0x130
Jun 09 23:11:23 pve kernel: do_fault+0xc5/0x490
Jun 09 23:11:23 pve kernel: ? srso_return_thunk+0x5/0x5f
Jun 09 23:11:23 pve kernel: __handle_mm_fault+0x842/0x1100
Jun 09 23:11:23 pve kernel: handle_mm_fault+0x197/0x340
Jun 09 23:11:23 pve kernel: fixup_user_fault+0x91/0x1e0
Jun 09 23:11:23 pve kernel: vaddr_get_pfns+0x10e/0x280 [vfio_iommu_type1]
Jun 09 23:11:23 pve kernel: vfio_pin_pages_remote+0x39f/0x520 [vfio_iommu_type1]
Jun 09 23:11:23 pve kernel: ? srso_return_thunk+0x5/0x5f
Jun 09 23:11:23 pve kernel: ? alloc_pages_mpol_noprof+0xd9/0x1f0
Jun 09 23:11:23 pve kernel: vfio_iommu_type1_ioctl+0x10ad/0x1ad0 [vfio_iommu_type1]
Jun 09 23:11:23 pve kernel: vfio_fops_unl_ioctl+0x6b/0x380 [vfio]
Jun 09 23:11:23 pve kernel: __x64_sys_ioctl+0xa3/0xf0
Jun 09 23:11:23 pve kernel: x64_sys_call+0xa68/0x24d0
Jun 09 23:11:23 pve kernel: do_syscall_64+0x70/0x160
Jun 09 23:11:23 pve kernel: ? srso_return_thunk+0x5/0x5f
Jun 09 23:11:23 pve kernel: ? irqentry_exit+0x43/0x50
Jun 09 23:11:23 pve kernel: ? srso_return_thunk+0x5/0x5f
Jun 09 23:11:23 pve kernel: ? exc_page_fault+0x93/0x1b0
Jun 09 23:11:23 pve kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 09 23:11:23 pve kernel: RIP: 0033:0x7f8a5cb8cc5b
Jun 09 23:11:23 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Jun 09 23:11:23 pve kernel: RSP: 002b:00007f8a4a9faa40 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 09 23:11:23 pve kernel: RAX: ffffffffffffffda RBX: 0000560ed91739b0 RCX: 00007f8a5cb8cc5b
Jun 09 23:11:23 pve kernel: RDX: 00007f8a4a9faaa0 RSI: 0000000000003b71 RDI: 000000000000003e
Jun 09 23:11:23 pve kernel: RBP: 0000000081c00000 R08: 0000000000000000 R09: 0000000000000000
Jun 09 23:11:23 pve kernel: R10: 00000000000fe000 R11: 0000000000000246 R12: 00000000000fe000
Jun 09 23:11:23 pve kernel: R13: 00000000000fe000 R14: 00007f8a4a9faaa0 R15: 00007f8a4a9fabf0
Jun 09 23:11:23 pve kernel: </TASK>
Jun 09 23:11:23 pve kernel: ---[ end trace 0000000000000000 ]---

I've attached a full log containing all the warnings.
The system seems to run stably otherwise.
Created attachment 306447 [details] Settings of the VM
Alright, it's not a regression in the kernel but caused by a BIOS update (I guess).
I get the same on my previous kernel, 6.9.0-rc1.
Both my 6.9.0-rc1 and 6.10.0-rc2 kernels are vanilla builds from kernel.org (unpatched).

After updating the BIOS/firmware of my mainboard, an Asus ROG Zenith II Extreme, from 1802 to 2102, it always seems to produce this error:

[ 1150.380137] ------------[ cut here ]------------
[ 1150.380141] Unpatched return thunk in use. This should not happen!
[ 1150.380144] WARNING: CPU: 3 PID: 4849 at arch/x86/kernel/cpu/bugs.c:2935 __warn_thunk+0x40/0x50
[ 1150.380152] Modules linked in: veth rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables scsi_transport_iscsi bonding tls softdog sunrpc nfnetlink_log nfnetlink binfmt_misc amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel eeepc_wmi crypto_simd asus_wmi cryptd platform_profile sparse_keymap asus_ec_sensors video pcspkr rapl ccp mxm_wmi wmi_bmof k10temp mac_hid vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd vhost_net vhost vhost_iotlb tap nct6775 nct6775_core hwmon_vid lm75 drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c igb xhci_pci atlantic nvme ahci crc32_pclmul xhci_pci_renesas i2c_algo_bit libahci dca macsec nvme_core xhci_hcd i2c_piix4 wmi
[ 1150.380266] CPU: 3 PID: 4849 Comm: CPU 0/KVM Not tainted 6.9.0-rc1 #1
[ 1150.380269] Hardware name: ASUS System Product Name/ROG ZENITH II EXTREME, BIOS 2102 02/16/2024
[ 1150.380271] RIP: 0010:__warn_thunk+0x40/0x50
[ 1150.380275] Code: 96 f1 fe 00 83 e3 01 74 0e 48 8b 5d f8 c9 31 f6 31 ff e9 43 1c 08 01 48 c7 c7 b8 f2 f4 9e c6 05 56 61 4c 02 01 e8 00 b1 07 00 <0f> 0b 48 8b 5d f8 c9 31 f6 31 ff e9 20 1c 08 01 90 90 90 90 90 90
[ 1150.380278] RSP: 0018:ffffb478c2ce3ca8 EFLAGS: 00010046
[ 1150.380281] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1150.380283] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1150.380285] RBP: ffffb478c2ce3cb0 R08: 0000000000000000 R09: 0000000000000000
[ 1150.380287] R10: 0000000000000000 R11: 0000000000000000 R12: ffff91f80e948000
[ 1150.380289] R13: 0000000000000000 R14: ffffb478c4ab5000 R15: ffff91f80e948038
[ 1150.380291] FS: 00007f74baa006c0(0000) GS:ffff9216bd980000(0000) knlGS:0000000000000000
[ 1150.380293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1150.380295] CR2: 0000000000000000 CR3: 0000000106750000 CR4: 0000000000350ef0
[ 1150.380298] Call Trace:
[ 1150.380300] <TASK>
[ 1150.380304] ? show_regs+0x6c/0x80
[ 1150.380309] ? __warn+0x88/0x140
[ 1150.380312] ? __warn_thunk+0x40/0x50
[ 1150.380316] ? report_bug+0x182/0x1b0
[ 1150.380322] ? handle_bug+0x46/0x90
[ 1150.380325] ? exc_invalid_op+0x18/0x80
[ 1150.380329] ? asm_exc_invalid_op+0x1b/0x20
[ 1150.380336] ? __warn_thunk+0x40/0x50
[ 1150.380341] ? __warn_thunk+0x40/0x50
[ 1150.380344] warn_thunk_thunk+0x16/0x30
[ 1150.380351] svm_vcpu_enter_exit+0x71/0xc0 [kvm_amd]
[ 1150.380364] svm_vcpu_run+0x1e7/0x850 [kvm_amd]
[ 1150.380377] kvm_arch_vcpu_ioctl_run+0xca3/0x16d0 [kvm]
[ 1150.380458] kvm_vcpu_ioctl+0x295/0x800 [kvm]
[ 1150.380522] ? srso_return_thunk+0x5/0x5f
[ 1150.380526] ? __x64_sys_ioctl+0xbb/0xf0
[ 1150.380530] ? srso_return_thunk+0x5/0x5f
[ 1150.380533] ? syscall_exit_to_user_mode+0x75/0x1b0
[ 1150.380537] ? srso_return_thunk+0x5/0x5f
[ 1150.380541] ? do_syscall_64+0x84/0x140
[ 1150.380544] ? srso_return_thunk+0x5/0x5f
[ 1150.380547] ? do_syscall_64+0x84/0x140
[ 1150.380550] ? switch_fpu_return+0x50/0xe0
[ 1150.380555] __x64_sys_ioctl+0xa3/0xf0
[ 1150.380559] do_syscall_64+0x78/0x140
[ 1150.380563] ? srso_return_thunk+0x5/0x5f
[ 1150.380566] ? do_syscall_64+0x84/0x140
[ 1150.380569] entry_SYSCALL_64_after_hwframe+0x6c/0x74
[ 1150.380573] RIP: 0033:0x7f74cbb8cc5b
[ 1150.380592] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 1150.380595] RSP: 002b:00007f74ba9fb060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1150.380598] RAX: ffffffffffffffda RBX: 000056062f0cf7e0 RCX: 00007f74cbb8cc5b
[ 1150.380600] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001f
[ 1150.380602] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000
[ 1150.380604] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 1150.380606] R13: 0000000000000007 R14: 00007ffc10a8f820 R15: 00007f74ba200000
[ 1150.380612] </TASK>
[ 1150.380613] ---[ end trace 0000000000000000 ]---

This happens when just starting the VM.

Command line:
BOOT_IMAGE=/boot/vmlinuz-6.9.0-rc1 root=/dev/mapper/pve-root ro quiet iommu=pt amd_iommu=on kvm_amd.npt=1 video=vesafb:off video=efifb:off video=simplefb:off nomodeset initcall_blacklist=sysfb_init modprobe.blacklist=nouveau modprobe.blacklist=amdgpu modprobe.blacklist=radeon modprobe.blacklist=nvidia amd_pstate=guided

I've attached a screenshot of the settings for the VM.

I believe the new BIOS updates the AGESA firmware from version:
V9CastlePeakPI-SP3r3-1.0.0.9
To:
CastlePeakPI-SP3r3 1.0.0.A (2023-11-21)
On Mon, Jun 10, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218949
>
> --- Comment #2 from Gino Badouri (badouri.g@gmail.com) ---
> Alright, it's not a regression in the kernel but caused by a bios update (I
> guess).
> I get the same on my previous kernel 6.9.0-rc1.

The WARNs are not remotely the same. The below issue in svm_vcpu_enter_exit() was resolved in v6.9 final[1]. The lockdep warnings in track_pfn_remap() and remap_pfn_range_notrack() are a known issue in vfio_pci_mmap_fault(), with an in-progress fix[2] that is destined for 6.10.

[1] https://lore.kernel.org/all/1d10cd73-2ae7-42d5-a318-2f9facc42bbe@alu.unizg.hr
[2] https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com

> Both my 6.9.0-rc1 6.10.0-rc2 kernels are vanilla builds from kernel.org
> (unpatched).
>
> After updating the bios/firmware of my mainboard Asus ROG Zenith II Extreme
> from 1802 to 2102, it always seems to spawn the error:
>
> [ 1150.380137] ------------[ cut here ]------------
> [ 1150.380141] Unpatched return thunk in use. This should not happen!
> [ 1150.380144] WARNING: CPU: 3 PID: 4849 at arch/x86/kernel/cpu/bugs.c:2935
> __warn_thunk+0x40/0x50

...

> [ 1150.380266] CPU: 3 PID: 4849 Comm: CPU 0/KVM Not tainted 6.9.0-rc1 #1
> [ 1150.380269] Hardware name: ASUS System Product Name/ROG ZENITH II EXTREME,
> BIOS 2102 02/16/2024
> [ 1150.380271] RIP: 0010:__warn_thunk+0x40/0x50

...

> [ 1150.380298] Call Trace:
> [ 1150.380300] <TASK>
> [ 1150.380344] warn_thunk_thunk+0x16/0x30
> [ 1150.380351] svm_vcpu_enter_exit+0x71/0xc0 [kvm_amd]
> [ 1150.380364] svm_vcpu_run+0x1e7/0x850 [kvm_amd]
> [ 1150.380377] kvm_arch_vcpu_ioctl_run+0xca3/0x16d0 [kvm]
> [ 1150.380458] kvm_vcpu_ioctl+0x295/0x800 [kvm]
Hi Sean!

It always amazes me how fast you guys can find the patches/reports for certain bug reports :)
The __warn_thunk WARNING happened on 6.9.0-rc1 (so before the final release).

For the pfn warnings, I've applied the patchset from https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com on top of 6.10-rc3 and they're completely gone now.

I've tested Call of Duty MW3 in Windows 11 with the NVIDIA GPU passed through to the VM and I didn't notice any performance or stability difference with or without the patch.

Thanks a lot!
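
For anyone triaging similar logs: the two traces in this report come from different WARN sites (include/linux/rwsem.h:85 and arch/x86/kernel/cpu/bugs.c:2935), which is easy to miss in a long journal. Below is a minimal sketch that groups WARNING lines by source location and function. It assumes each log entry sits on its own line, as in the attached logfile; the function name count_warn_sites and the regular expression are illustrative and not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches the kernel's "WARNING: CPU: N PID: N at file:line func+0x..." line.
# Hypothetical helper pattern; it only covers the format seen in this report.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) ([\w.]+)\+0x")


def count_warn_sites(log_text):
    """Count WARNING entries per (source file, line number, function)."""
    sites = Counter()
    for match in WARN_RE.finditer(log_text):
        path, line, func = match.groups()
        sites[(path, int(line), func)] += 1
    return sites


if __name__ == "__main__":
    # Two WARNING lines copied from the traces in this report.
    sample = (
        "Jun 09 23:11:23 pve kernel: WARNING: CPU: 9 PID: 1870 at "
        "include/linux/rwsem.h:85 remap_pfn_range_notrack+0x4a5/0x590\n"
        "[ 1150.380144] WARNING: CPU: 3 PID: 4849 at "
        "arch/x86/kernel/cpu/bugs.c:2935 __warn_thunk+0x40/0x50\n"
    )
    for (path, line, func), n in count_warn_sites(sample).items():
        print(f"{n}x {path}:{line} in {func}")
```

Feeding it the output of journalctl -k or dmesg should show at a glance whether a log contains one repeated warning or several unrelated ones.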