Hello, when I run `qemu-system-x86_64 -accel kvm` from bash in an L1 Linux guest (kernel 6.10.0, x86_64) that runs KVM, I get these messages in the L1 guest:

[ 104.446685] kvm_amd: Nested Virtualization enabled
[ 104.446688] kvm_amd: Nested Paging disabled
[ 104.446690] kvm_amd: PMU virtualization is disabled
[ 112.940705] clocksource: timekeeping watchdog on CPU0: hpet wd-wd read-back delay of 50500ns
[ 112.940746] clocksource: wd-tsc-wd read-back delay of 1385000ns, clock-skew test skipped!
[ 355.714362] unchecked MSR access error: WRMSR to 0xc0000080 (tried to write 0x0000000000001d01) at rIP: 0xffffffff9228a274 (native_write_msr+0x4/0x20)
[ 355.714373] Call Trace:
[ 355.714376] <TASK>
[ 355.714379] ? ex_handler_msr+0xd3/0x150
[ 355.714381] ? fixup_exception+0x276/0x2e0
[ 355.714383] ? exc_general_protection+0x14f/0x440
[ 355.714388] ? asm_exc_general_protection+0x22/0x30
[ 355.714391] ? native_write_msr+0x4/0x20
[ 355.714397] svm_hardware_enable+0xd5/0x2f0 [kvm_amd]
[ 355.714405] kvm_arch_hardware_enable+0xc7/0x280 [kvm]
[ 355.714469] hardware_enable_nolock+0x1d/0x50 [kvm]
[ 355.714489] smp_call_function_many_cond+0xcf/0x4d0
[ 355.714494] ? kmalloc_trace_noprof+0x2c8/0x2f0
[ 355.714497] ? __pfx_hardware_enable_nolock+0x10/0x10 [kvm]
[ 355.714516] on_each_cpu_cond_mask+0x20/0x40
[ 355.714517] kvm_dev_ioctl+0x815/0xb40 [kvm]
[ 355.714538] __x64_sys_ioctl+0x93/0xd0
[ 355.714542] do_syscall_64+0x7e/0x190
[ 355.714545] ? kvm_dev_ioctl+0x2fb/0xb40 [kvm]
[ 355.714564] ? __schedule+0x3f3/0xb40
[ 355.714566] ? syscall_exit_to_user_mode+0x73/0x200
[ 355.714567] ? do_syscall_64+0x8a/0x190
[ 355.714568] ? do_syscall_64+0x8a/0x190
[ 355.714569] ? tomoyo_init_request_info+0x95/0xc0
[ 355.714573] ? tomoyo_path_number_perm+0x88/0x200
[ 355.714576] ? kvm_dev_ioctl+0x2fb/0xb40 [kvm]
[ 355.714595] ? syscall_exit_to_user_mode+0x73/0x200
[ 355.714597] ? syscall_exit_to_user_mode+0x73/0x200
[ 355.714598] ? do_syscall_64+0x8a/0x190
[ 355.714599] ? __count_memcg_events+0x54/0xf0
[ 355.714601] ? __rseq_handle_notify_resume+0xa4/0x4f0
[ 355.714604] ? handle_mm_fault+0xaa/0x320
[ 355.714608] ? restore_fpregs_from_fpstate+0x38/0x90
[ 355.714611] ? switch_fpu_return+0x4b/0xc0
[ 355.714612] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 355.714614] RIP: 0033:0x7fb24aab7c5b
[ 355.714616] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 355.714617] RSP: 002b:00007ffee1205880 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 355.714619] RAX: ffffffffffffffda RBX: 000000000000ae01 RCX: 00007fb24aab7c5b
[ 355.714620] RDX: 0000000000000000 RSI: 000000000000ae01 RDI: 000000000000000a
[ 355.714620] RBP: 000055b5ba0d2160 R08: 00007fb24ab8cc68 R09: 0000000000000006
[ 355.714621] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 355.714621] R13: 00007ffee1205b80 R14: 0000000000000000 R15: 00007ffee1205ac0
[ 355.714622] </TASK>
[ 355.880539] ------------[ cut here ]------------
[ 355.880542] kernel BUG at arch/x86/kvm/x86.c:510!
[ 355.880548] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 355.880551] CPU: 0 PID: 1550 Comm: qemu-system-x86 Not tainted 6.10.0 #8
[ 355.880553] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-stable202402-prebuilt.qemu.org 02/14/2024
[ 355.880554] RIP: 0010:kvm_spurious_fault+0xe/0x10 [kvm]
[ 355.880584] Code: 00 00 85 c0 0f 95 c0 e9 90 79 e7 d1 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 80 3d f9 1c 02 00 00 74 05 e9 72 79 e7 d1 <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 e9 59
[ 355.880586] RSP: 0018:ffffb618806fbc38 EFLAGS: 00010246
[ 355.880587] RAX: 00000001025d0000 RBX: ffff94884d6c99b0 RCX: 0000000000000027
[ 355.880588] RDX: 0000000000000003 RSI: 000000000188d000 RDI: ffff94884d6c99b0
[ 355.880589] RBP: 0000000000038060 R08: 0000000000000001 R09: 0000000000000027
[ 355.880590] R10: 0000000000000001 R11: 0000000000400dc0 R12: ffff9488bbc38060
[ 355.880590] R13: 0000000000000000 R14: ffff9488411da000 R15: 0000000000000000
[ 355.880591] FS: 00007fb2390006c0(0000) GS:ffff9488bbc00000(0000) knlGS:0000000000000000
[ 355.880592] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 355.880593] CR2: 00007fbe78a5e030 CR3: 000000010d630000 CR4: 0000000000350ef0
[ 355.880595] Call Trace:
[ 355.880598] <TASK>
[ 355.880599] ? die+0x32/0x80
[ 355.880603] ? do_trap+0xd9/0x100
[ 355.880605] ? kvm_spurious_fault+0xe/0x10 [kvm]
[ 355.880627] ? do_error_trap+0x6a/0x90
[ 355.880628] ? kvm_spurious_fault+0xe/0x10 [kvm]
[ 355.880648] ? exc_invalid_op+0x4c/0x60
[ 355.880652] ? kvm_spurious_fault+0xe/0x10 [kvm]
[ 355.880672] ? asm_exc_invalid_op+0x16/0x20
[ 355.880675] ? kvm_spurious_fault+0xe/0x10 [kvm]
[ 355.880695] svm_prepare_switch_to_guest+0xe4/0x160 [kvm_amd]
[ 355.880701] kvm_arch_vcpu_ioctl_run+0x441/0x15b0 [kvm]
[ 355.880729] kvm_vcpu_ioctl+0x23d/0x6f0 [kvm]
[ 355.880749] ? check_preempt_wakeup_fair+0x136/0x1d0
[ 355.880753] __x64_sys_ioctl+0x93/0xd0
[ 355.880757] do_syscall_64+0x7e/0x190
[ 355.880760] ? wake_up_q+0x4a/0x90
[ 355.880762] ? futex_wake+0x155/0x190
[ 355.880765] ? do_futex+0xeb/0x1c0
[ 355.880766] ? __x64_sys_futex+0x8e/0x1d0
[ 355.880767] ? syscall_exit_to_user_mode+0x73/0x200
[ 355.880769] ? syscall_exit_to_user_mode+0x73/0x200
[ 355.880770] ? do_syscall_64+0x8a/0x190
[ 355.880771] ? do_syscall_64+0x8a/0x190
[ 355.880772] ? exc_page_fault+0x72/0x170
[ 355.880773] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 355.880775] RIP: 0033:0x7fb24aab7c5b
[ 355.880776] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 355.880777] RSP: 002b:00007fb238fff530 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 355.880778] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fb24aab7c5b
[ 355.880779] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c
[ 355.880780] RBP: 000055b5ba0d7e60 R08: 000055b5b32412d0 R09: 0000000000000000
[ 355.880780] R10: 00007fb24ab2bf70 R11: 0000000000000246 R12: 0000000000000000
[ 355.880781] R13: 0000000000000007 R14: 00007ffee1205360 R15: 00007fb238800000
[ 355.880782] </TASK>
[ 355.880783] Modules linked in: kvm_amd ccp kvm qrtr rfkill binfmt_misc nls_ascii nls_cp437 vfat fat crc32_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd iTCO_wdt cryptd intel_pmc_bxt joydev iTCO_vendor_support pcspkr watchdog button sg evdev serio_raw parport_pc ppdev lp parport fuse loop efi_pstore dm_mod configfs qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c crc32c_generic xor raid6_pq raid1 raid0 md_mod hid_generic usbhid bochs drm_vram_helper hid sd_mod t10_pi drm_kms_helper crc64_rocksoft crc64 crc_t10dif crct10dif_generic drm_ttm_helper ttm ahci libahci ehci_pci uhci_hcd virtio_scsi libata ehci_hcd scsi_mod e1000e psmouse usbcore virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev crct10dif_pclmul crct10dif_common crc32c_intel drm virtio_ring i2c_i801 lpc_ich usb_common scsi_common i2c_smbus [last unloaded: ccp]
[ 355.880835] ---[ end trace 0000000000000000 ]---
[ 355.884034] RIP: 0010:kvm_spurious_fault+0xe/0x10 [kvm]
[ 355.884060] Code: 00 00 85 c0 0f 95 c0 e9 90 79 e7 d1 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 80 3d f9 1c 02 00 00 74 05 e9 72 79 e7 d1 <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 e9 59
[ 355.884062] RSP: 0018:ffffb618806fbc38 EFLAGS: 00010246
[ 355.884063] RAX: 00000001025d0000 RBX: ffff94884d6c99b0 RCX: 0000000000000027
[ 355.884064] RDX: 0000000000000003 RSI: 000000000188d000 RDI: ffff94884d6c99b0
[ 355.884064] RBP: 0000000000038060 R08: 0000000000000001 R09: 0000000000000027
[ 355.884065] R10: 0000000000000001 R11: 0000000000400dc0 R12: ffff9488bbc38060
[ 355.884066] R13: 0000000000000000 R14: ffff9488411da000 R15: 0000000000000000
[ 355.884066] FS: 00007fb2390006c0(0000) GS:ffff9488bbc00000(0000) knlGS:0000000000000000
[ 355.884067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 355.884068] CR2: 00007fbe78a5e030 CR3: 000000010d630000 CR4: 0000000000350ef0
[ 355.884069] note: qemu-system-x86[1550] exited with preempt_count 1

If I run `qemu-system-x86_64 -accel tcg` from the same L1 shell, it correctly boots into the QEMU BIOS. Any ideas about what could be causing this?
Command I used on L0 AMD Ryzen:
qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu Opteron_G5,check,+svm -hda c:\debian.qcow2

It's reproducible in 100% of cases.
On Mon, Jul 22, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219085
>
> --- Comment #1 from ununpta@mailto.plus ---
> Command I used on L0 AMD Ryzen:
> qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> Opteron_G5,check,+svm -hda c:\debian.qcow2
>
> It's reproducible in 100% of cases

This is likely an issue in the L0 hypervisor, which in this case is Hyper-V. KVM (L1) hits a #GP when trying to enable EFER.SVME, which leads to the #UD on VMSAVE (SVM isn't enabled).

[ 355.714362] unchecked MSR access error: WRMSR to 0xc0000080 (tried to write 0x0000000000001d01) at rIP: 0xffffffff9228a274 (native_write_msr+0x4/0x20)

Do you see the same behavior on other kernel (L1) versions? Have you changed any other components (especially in L0)?
> Do you see the same behavior on other kernel (L1) versions? Have you
> changed any other components (especially in L0)?

Thank you for your help. What I tried:
* Opened the Hyper-V Manager built into Windows and created the "Ubuntu 22.04 LTS" VM that is available by default.
* Opened a PowerShell console and ran `Set-VMProcessor -VMName "Ubuntu 22.04 LTS" -ExposeVirtualizationExtensions $true` to allow nested virtualization in Hyper-V. I should note, though, that even without `-ExposeVirtualizationExtensions $true`, KVM inside the Hyper-V Manager VM didn't crash as it did under QEMU. Bash just printed a warning that nested virtualization is restricted.
* Booted into "Ubuntu 22.04 LTS", installed qemu, and `qemu-system-x86_64 -accel kvm` was successful: the BIOS screen came up.

The default kernel was vmlinuz-5.15.0-27-generic. After the qemu launch, the only kvm-related messages were:
[2.485820] kvm: Nested Virtualization enabled
[2.485822] SVM: kvm: Nested Paging enabled
[2.485823] SVM: kvm: Hyper-V enlightened NPT TLB flush enabled
[2.485824] SVM: kvm: Hyper-V Direct TLB flush enabled
[2.485828] SVM: Virtual VMLOAD VMSAVE supported

Then I compiled the latest kernel and installed it, with the same successful KVM-accelerated qemu BIOS boot. With vmlinuz-6.10.0, the only kvm-related messages after the qemu launch are:
[1.701988] kvm_amd: TSC scaling supported
[1.701992] kvm_amd: Nested Virtualization enabled
[1.701993] kvm_amd: Nested Paging enabled
[1.701996] kvm_amd: kvm_amd: Hyper-V enlightened NPT TLB flush enabled
[1.701997] kvm_amd: kvm_amd: Hyper-V Direct TLB flush enabled
[1.701999] kvm_amd: Virtual VMLOAD VMSAVE supported
[1.702000] kvm_amd: PMU virtualization is disabled

I still have to figure out how to get the equivalent of `Set-VMProcessor -VMName "Ubuntu 22.04 LTS" -ExposeVirtualizationExtensions $true` for third-party software, not only for machines created by Hyper-V Manager. Maybe QEMU has to be run with admin privileges as well.
I also saw a claim from Peter Maydell, a QEMU developer, who said this about the qemu command line parameter `-cpu _processor_type_`:
> using a specific cpu type will only work with KVM if the host CPU really is
> that exact CPU type, otherwise, use "-cpu host" or "-cpu max".
> This is a restriction in the kernel's KVM handling, and not something that
> can be worked around in the QEMU side.
Per https://gitlab.com/qemu-project/qemu/-/issues/239

I was somewhat confused by this claim because
> --- Comment #1 from ununpta@mailto.plus ---
> Command I used on L0 AMD Ryzen:
> qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> Opteron_G5

Let me ask you a few questions.
Q1: Can one use an older cpu (but still supporting SVM), not the actual bare one, in the qemu command line for nested virtualization, or will KVM crash due to a restriction in the kernel's KVM handling?
Q2: Is there a command in a bare kernel/KVM console to figure out if the EFER.SVME register/bit is writeable? If not,
Q3: Can you recommend any package to figure it out?
On Tue, Jul 23, 2024, bugzilla-daemon@kernel.org wrote:
> I also saw a claim from Peter Maydell, a QEMU developer, who said this about
> the qemu command line parameter `-cpu _processor_type_`:
> > using a specific cpu type will only work with KVM if the host CPU really is
> > that exact CPU type, otherwise, use "-cpu host" or "-cpu max".

This generally isn't true. KVM is very capable of running older vCPU models on newer hardware. What won't work (at least, not well) is cross-vendor virtualization, i.e. advertising AMD on Intel and vice versa, but that's not what you're doing.

> > This is a restriction in the kernel's KVM handling, and not something that
> > can be worked around in the QEMU side.
> Per https://gitlab.com/qemu-project/qemu/-/issues/239
>
> I was somewhat confused by this claim because
> > --- Comment #1 from ununpta@mailto.plus ---
> > Command I used on L0 AMD Ryzen:
> > qemu-system-x86_64.exe -m 4096 -machine q35 -accel whpx -smp 1 -cpu
> > Opteron_G5
>
> Let me ask you a few questions.
> Q1: Can one use an older cpu (but still supporting SVM), not the actual bare
> one, in the qemu command line for nested virtualization, or will KVM crash due
> to a restriction in the kernel's KVM handling?

Yes. There might be caveats, but AFAIK, QEMU's predefined vCPU models should always work. If it doesn't work, and you have decent evidence that it's a KVM problem, definitely feel free to file a KVM bug.

> Q2: Is there a command in a bare kernel/KVM console to figure out if the
> EFER.SVME register/bit is writeable? If not,

grep -q svm /proc/cpuinfo

SVM can be disabled by firmware via MSR_VM_CR (0xc0010114) even if SVM is reported in raw CPUID, but the kernel accounts for that and clears the "svm" flag from the CPU data that's reported in /proc/cpuinfo.

> Q3: Can you recommend any package to figure it out?

Sorry, I don't follow this question.
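For readers who want the same check from a script, here is a minimal Python sketch of the `/proc/cpuinfo` test described above. The helper names (`cpu_flags`, `has_svm`, `read_cpuinfo`) are made up for illustration and are not part of any kernel or QEMU tooling. Unlike the plain `grep`, it matches `svm` as a whole word on the `flags` line only. Note that the flag only tells you what the kernel believes is available; it presumably would not by itself reveal an L0 hypervisor that advertises SVM but then rejects the EFER.SVME write.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_svm(cpuinfo_text):
    """True if the kernel reports the 'svm' flag (whole-word match)."""
    return "svm" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; only meaningful on Linux."""
    with open(path) as f:
        return f.read()
```

On a Linux guest, `has_svm(read_cpuinfo())` gives the same yes/no answer as the exit status of the `grep` command.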
Sean, after looking into the AMD documentation quoted at https://unix.stackexchange.com/questions/74376, I think it's clear why KVM in L1 crashes. AMD says:
> Secure Virtual Machine Enable (SVME) Bit. Bit 12, read/write. Enables the SVM
> extensions. When this bit is zero, the SVM instructions cause #UD exceptions.
> EFER.SVME defaults to a reset value of zero.
> The effect of turning off EFER.SVME while a guest is running is undefined;
> therefore, the VMM should always prevent guests from writing EFER.
> SVM extensions can be disabled by setting VM_CR.SVME_DISABLE.

The command to read EFER (which contains the SVME bit) is `sudo rdmsr 0xC0000080 #EFER`. On both the non-working and the working machines this command returns d01, which is 1101 0000 0001 in binary.
The crashing command from the report did `WRMSR to 0xc0000080 (tried to write 0x0000000000001d01)`. 1d01 is 0001 1101 0000 0001 in binary; the 1 in the leftmost group is Bit 12.
So the crashing code in L1 tries to set Bit 12 to avoid the #UD. Nested virtualization is impossible without Bit 12.

Writing this bit requires ring 0 privileges; guests cannot do this, but the VM manager can. The VM manager hooks into the write operation, checks whether VM_CR.SVME_DISABLE == 0 and, if so, sets Bit 12 itself with L0 privileges, then returns success to the guest.
This is what happens on Windows if KVM in L1 runs on top of the native Windows Hyper-V Manager as L0.
QEMU on Windows does not hook into the write, and the guest tries to write the bit with user privileges, which of course fails.

Questions are:
* How does the processor determine who is trying to write, L0 or L1?
* Does KVM determine in its source code whether it is running on top of Hyper-V or on top of another KVM?
* Should QEMU hook into the `WRMSR to 0xc0000080 (tried to write 0x0000000000001d01)` coming from KVM if QEMU is accelerated by Hyper-V in L0 and KVM is in L1?

> Sorry, I don't follow this question.
I figured out that the commands I had been trying to describe turned out to be `sudo rdmsr 0xC0000080 #EFER` and `sudo rdmsr 0xC0010114 #VM_CR`.
The package is called msr-tools :)
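The bit arithmetic above can be double-checked with a short Python sketch that decodes the values printed by `rdmsr`. The function names are hypothetical, and the bit positions are my reading of the AMD manual (EFER: SCE=0, LME=8, LMA=10, NXE=11, SVME=12; VM_CR: SVMDIS=4), so treat it as an illustration rather than a reference.

```python
# EFER bit positions as I understand them from the AMD64 manual (assumption).
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE", 12: "SVME",
             13: "LMSLE", 14: "FFXSR", 15: "TCE"}

VM_CR_SVMDIS_BIT = 4  # VM_CR.SVMDIS, the "SVME_DISABLE" bit mentioned above


def decode_efer(value):
    """Return the names of the known EFER bits that are set in value."""
    return [name for bit, name in sorted(EFER_BITS.items()) if (value >> bit) & 1]


def changed_bits(old, new):
    """Return the bit positions that differ between two 64-bit MSR values."""
    diff = old ^ new
    return [bit for bit in range(64) if (diff >> bit) & 1]


def svm_disabled_by_firmware(vm_cr):
    """True if VM_CR.SVMDIS is set in the given VM_CR value."""
    return bool((vm_cr >> VM_CR_SVMDIS_BIT) & 1)
```

With the values from this thread, `decode_efer(0xd01)` lists SCE, LME, LMA and NXE, and `changed_bits(0xd01, 0x1d01)` returns `[12]`, i.e. the only difference between what `rdmsr` reads and what KVM tried to write is the SVME bit.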
Closed as invalid since it is a qemu bug.