Bug 218792 - Guest call trace with mwait enabled
Summary: Guest call trace with mwait enabled
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: Intel
OS: Linux
Priority: P3
Severity: normal
Assignee: virtualization_kvm
Reported: 2024-04-30 07:32 UTC by Chen, Fan
Modified: 2024-07-12 08:40 UTC
Kernel Version: 6.9-rc6
Regression: No

Description Chen, Fan 2024-04-30 07:32:26 UTC
Environment:
host/guest kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master e67572cd220(v6.9-rc6)
QEMU: https://gitlab.com/qemu-project/qemu.git master 5c6528dce86d
Host/Guest OS: CentOS Stream 9 / Ubuntu 24.04

Bug detail description: 
Booting a guest with mwait enabled (-overcommit cpu-pm=on) produces an "unchecked MSR access error" call trace in the guest.

Steps to reproduce:
img=centos9.qcow2
qemu-system-x86_64 \
    -name legacy,debug-threads=on \
    -overcommit cpu-pm=on \
    -accel kvm -smp 8 -m 8G -cpu host \
    -drive file=${img},if=none,id=virtio-disk0 \
    -device virtio-blk-pci,drive=virtio-disk0 \
    -device virtio-net-pci,netdev=nic0 -netdev user,id=nic0,hostfwd=tcp::10023-:22 \
    -vnc :1 -serial stdio

The guest boots with the following call trace:
[ 0.475344] unchecked MSR access error: RDMSR from 0xe2 at rIP: 0xffffffffb5a966b8 (native_read_msr+0x8/0x40)
[ 0.476465] Call Trace:
[ 0.476763] <TASK>
[ 0.477027] ? ex_handler_msr+0x128/0x140
[ 0.477460] ? fixup_exception+0x166/0x3c0
[ 0.477934] ? exc_general_protection+0xdc/0x3c0
[ 0.478481] ? asm_exc_general_protection+0x26/0x30
[ 0.479052] ? __pfx_intel_idle_init+0x10/0x10
[ 0.479587] ? native_read_msr+0x8/0x40
[ 0.480057] intel_idle_init_cstates_icpu.constprop.0+0x5e/0x560
[ 0.480747] ? __pfx_intel_idle_init+0x10/0x10
[ 0.481275] intel_idle_init+0x161/0x360
[ 0.481742] do_one_initcall+0x45/0x220
[ 0.482209] do_initcalls+0xac/0x130
[ 0.482643] kernel_init_freeable+0x134/0x1e0
[ 0.483159] ? __pfx_kernel_init+0x10/0x10
[ 0.483648] kernel_init+0x1a/0x1c0
[ 0.484087] ret_from_fork+0x31/0x50
[ 0.484541] ? __pfx_kernel_init+0x10/0x10
[ 0.485030] ret_from_fork_asm+0x1a/0x30
[ 0.485462] </TASK>
Comment 1 Artem S. Tashkinov 2024-04-30 11:32:20 UTC
Is this a regression? Could you bisect?
Comment 2 Sean Christopherson 2024-04-30 16:42:01 UTC
On Tue, Apr 30, 2024, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218792
> Guest boot with call trace:
> [ 0.475344] unchecked MSR access error: RDMSR from 0xe2 at rIP:

MSR 0xE2 is MSR_PKG_CST_CONFIG_CONTROL, which hpet_is_pc10_damaged() assumes
exists if PC10 substates are supported. KVM doesn't emulate/support
MSR_PKG_CST_CONFIG_CONTROL, i.e. injects a #GP on the guest RDMSR, hence the
splat.  This isn't a KVM bug as KVM explicitly advertises all zeros for the
MWAIT CPUID leaf, i.e. QEMU is effectively telling the guest that PC10 substates
are supported without KVM's explicit blessing.
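
To make the CPUID side of this concrete, here is a minimal user-space sketch of how the MWAIT leaf (CPUID leaf 5) could be decoded. It is not the kernel's code. The helper names are hypothetical, and treating the top nibble of EDX (bits 31:28) as the field the guest checks for "PC10 substates" is an assumption made for illustration. The ECX bit meanings (bit 0: MWAIT extensions enumerated, bit 1: interrupt break-event) and the 4-bit-per-C-state layout of EDX follow the usual description of this leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.05H:ECX bit 0: MWAIT extensions are enumerated. */
#define MWAIT_ECX_EXTENSIONS      (1u << 0)
/* CPUID.05H:ECX bit 1: interrupts can break MWAIT even when masked. */
#define MWAIT_ECX_INTERRUPT_BREAK (1u << 1)

/* Number of MWAIT sub-states in 4-bit field idx (0..7) of CPUID.05H:EDX. */
static unsigned int mwait_substates(uint32_t edx, unsigned int idx)
{
	assert(idx < 8);	/* EDX holds eight 4-bit fields */
	return (edx >> (idx * 4)) & 0xFu;
}

/*
 * Hypothetical helper. Assumption: the guest looks at the top EDX nibble
 * (bits 31:28) to decide whether the deepest sub-states are advertised.
 */
static bool deepest_substates_advertised(uint32_t ecx, uint32_t edx)
{
	return (ecx & MWAIT_ECX_EXTENSIONS) &&
	       (ecx & MWAIT_ECX_INTERRUPT_BREAK) &&
	       mwait_substates(edx, 7) != 0;
}
```

With an all-zero leaf, which is what KVM is described as advertising, deepest_substates_advertised(0, 0) is false and the guest would never reach the MSR read. A leaf with both ECX bits set and a non-zero top EDX nibble makes it true. On a real x86 machine, the ECX and EDX inputs could be obtained with __get_cpuid(5, ...) from the GCC/Clang <cpuid.h> header.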

That said, this is arguably a kernel bug (guest side), as I don't see anything
in the SDM that _requires_ MSR_PKG_CST_CONFIG_CONTROL to exist if PC10 substates
are supported.

The issue is likely benign, other than the obvious WARN.  The kernel gracefully
handles the #GP and zeros the result, i.e. will always think PC10 is _disabled_,
which may or may not be correct, but is functionally ok if the HPET is being
emulated by the host, which it probably is.

	rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
	if ((pcfg & 0xF) < 8)
		return false;

The most straightforward fix, and probably the most correct all around, would be
to use rdmsrl_safe() to suppress the WARN, i.e. have the kernel not yell if
MSR_PKG_CST_CONFIG_CONTROL doesn't exist.  Unless HPET is also being passed
through, that'll do the right thing when Linux is a guest.  And if a setup also
passes through HPET, then the VMM can also trap-and-emulate MSR_PKG_CST_CONFIG_CONTROL
as appropriate (doing so in QEMU without KVM support might be impossible, though
again it's unnecessary if QEMU is emulating the HPET).

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index c96ae8fee95e..2afafff18f92 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -980,7 +980,9 @@ static bool __init hpet_is_pc10_damaged(void)
                return false;
 
        /* Check whether PC10 is enabled in PKG C-state limit */
-       rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
+       if (rdmsrl_safe(MSR_PKG_CST_CONFIG_CONTROL, &pcfg))
+               return false;
+
        if ((pcfg & 0xF) < 8)
                return false;
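
The difference between the unchecked read and the safe read in the patch above can be modelled in user space. The following is only a sketch under stated assumptions: struct fake_cpu, fake_rdmsr() and fake_rdmsr_safe() are hypothetical stand-ins for a virtual CPU and the kernel's MSR accessors, and the warnings counter stands in for the "unchecked MSR access error" splat. The MSR index 0xE2 and the "(pcfg & 0xF) < 8" threshold are taken from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_PKG_CST_CONFIG_CONTROL 0xE2u

/* Hypothetical model of one implemented MSR. */
struct fake_msr {
	uint32_t index;
	uint64_t value;
};

/* Hypothetical model of a (virtual) CPU and the MSRs it implements. */
struct fake_cpu {
	const struct fake_msr *msrs;
	size_t nr_msrs;
	unsigned int warnings;	/* stands in for the WARN splat */
};

/* Safe-style read: returns 0 on success, -1 if the MSR does not exist. */
static int fake_rdmsr_safe(const struct fake_cpu *cpu, uint32_t index,
			   uint64_t *val)
{
	assert(cpu && val);
	for (size_t i = 0; i < cpu->nr_msrs; i++) {
		if (cpu->msrs[i].index == index) {
			*val = cpu->msrs[i].value;
			return 0;
		}
	}
	*val = 0;	/* a faulting read yields zero */
	return -1;
}

/* Unchecked-style read: a missing MSR reads as zero and logs a warning. */
static uint64_t fake_rdmsr(struct fake_cpu *cpu, uint32_t index)
{
	uint64_t val;

	if (fake_rdmsr_safe(cpu, index, &val))
		cpu->warnings++;
	return val;
}

/* PKG C-state limit check using the unchecked read. */
static bool pc10_enabled_unchecked(struct fake_cpu *cpu)
{
	return (fake_rdmsr(cpu, MSR_PKG_CST_CONFIG_CONTROL) & 0xF) >= 8;
}

/* Same check with the safe read: bail out quietly if the MSR is absent. */
static bool pc10_enabled_safe(const struct fake_cpu *cpu)
{
	uint64_t pcfg;

	if (fake_rdmsr_safe(cpu, MSR_PKG_CST_CONFIG_CONTROL, &pcfg))
		return false;
	return (pcfg & 0xF) >= 8;
}
```

For a model CPU with no MSRs, both variants report PC10 as disabled, but only the unchecked one bumps the warnings counter. That mirrors the point above: the result is functionally the same, and the safe variant simply avoids the splat.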

> 0xffffffffb5a966b8 (native_read_msr+0x8/0x40)
Comment 3 Ma Xiangfei 2024-07-12 08:11:29 UTC
I have tried this patch, but it can still be reproduced.
Host/Guest OS: CentOS 9
Host kernel: 6.10.0-rc2
Guest kernel: 6.10.0-rc7+
Host commit: 02b0d3b9 (https://git.kernel.org/pub/scm/virt/kvm/kvm.git)
Guest commit: 43db1e03c086ed20cc75808d3f45e780ec4ca26e
QEMU commit: b9ee1387
Comment 4 Ma Xiangfei 2024-07-12 08:40:53 UTC
(In reply to Ma Xiangfei from comment #3)
> I have tried this patch, but it can still be reproduced.
> Host/Guest OS: CentOS 9
> Host kernel: 6.10.0-rc2
> Guest kernel: 6.10.0-rc7+ (Using Sean patch)
> Host commit: 02b0d3b9 (https://git.kernel.org/pub/scm/virt/kvm/kvm.git)
> Guest commit: 43db1e03c086ed20cc75808d3f45e780ec4ca26e
> QEMU commit: b9ee1387
