Bug 216102
| Summary: | kernel BUG at arch/x86/kernel/traps.c:252 caused by CFI (Intel CET) and KVM | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Laurent Bonnaud (L.Bonnaud) |
| Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
| Status: | NEW | | |
| Severity: | normal | CC: | basjetimmer, bp, darose, jason.nader, johannes.penssel, kernel, mail, rawatdeepakg, saxophonebritish, simon, tiwarigeeta027, viktor.a.voronin |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 5.18.3 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | Full dmesg output, dmesg output | | |
Description
Laurent Bonnaud
2022-06-09 12:18:58 UTC
Created attachment 301135 [details]
Full dmesg output
Created attachment 301336 [details]
dmesg output
Same issue for me. Can't start the virtual machine since updating to 5.18.x. Linux Mint 20.3, 12th Gen Intel(R) Core(TM) i9-12900K. I have also attached a full dmesg output.
This affects me as well on Gentoo with 5.19-rc8 / Core i5-1135G7. GNOME Boxes (both native and Flatpak version) freezes the entire system as soon as I boot up Windows 10/11 in a VM. The Gentoo installation medium seems to work fine though. Adding 'ibt=off' to the kernel command line circumvents this issue.
I'm still seeing this on an Intel 13600K running 6.1.3. Seeing as it affects more and more CPUs and 'ibt' is about to be enabled by default for 6.2, I expect more and more people to run into this. A related Arch bug links to the following commit as the problem:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6649fa876da4c505548b8e8945a6fc48e62e427c
Can folks pls try to reproduce this with latest Linus master, 6.4-rc6 currently? Also pls upload kernel .config. Thx.
My system is currently running kernel 6.3.7, and I am only seeing warning messages in the kernel logs:
[494097.166700] x86/split lock detection: #AC: qemu-system-x86/537034 took a split_lock trap at address: 0xfffff8006ca1e643
[498654.325731] x86/split lock detection: #AC: qemu-system-x86/554439 took a split_lock trap at address: 0xfffff80556a1e643
[499673.397842] x86/split lock detection: #AC: qemu-system-x86/556825 took a split_lock trap at address: 0x3ff2624d
[499957.351816] x86/split lock detection: #AC: qemu-system-x86/557489 took a split_lock trap at address: 0x3ff2624d
That's basically saying that you have locks in your kernel or a module which are split between two cachelines. This should not happen with a kernel build done with the usual toolchains used by distros. Sounds like you're using some out-of-tree module which got built by some weird compiler.
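For readers unfamiliar with the term: a split lock is a locked (atomic) access whose operand straddles two cache lines. The condition itself can be sketched in a few lines of Python. This is only an illustration, assuming a 64-byte cache line (the usual size on current x86-64 CPUs); the function name `crosses_cache_line` is made up for this example and is not part of any kernel interface.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86-64 cache line size


def crosses_cache_line(address: int, size: int,
                       line_bytes: int = CACHE_LINE_BYTES) -> bool:
    """Return True if an access of `size` bytes at `address` spans two cache lines."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Index of the cache line holding the first and the last byte of the access.
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    return first_line != last_line


if __name__ == "__main__":
    # An 8-byte access starting 4 bytes before a line boundary is split.
    print(crosses_cache_line(0x3C, 8))
    # The same access aligned to the line boundary is not.
    print(crosses_cache_line(0x40, 8))
```

Only an access that is both misaligned in this way and locked raises the #AC trap reported in the log; an ordinary unaligned load or store does not.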
> That's basically saying that you have locks in your kernel or a module
I understand in the error message that the problem is in a userspace process (qemu-system-x86) and not in kernel code.
On Thu, Jun 15 2023 at 17:25, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216102
>
> --- Comment #9 from Laurent Bonnaud (L.Bonnaud@laposte.net) ---
>
>> That's basically saying that you have locks in your kernel or a module
>
> I understand in the error message that the problem is in a userspace process
> (qemu-system-x86) and not in kernel code.

Well it's not necessarily the qemu process itself. Guest split lock detection is ending up in the same error path. And that's likely the guest because:

[494097.166700] x86/split lock detection: #AC: qemu-system-x86/537034 took a split_lock trap at address: 0xfffff8006ca1e643

which is clearly a kernel address, but this can't be on the host because the host would end up with a different error message and die.

Laurent, which kernel is running in your guest? Something ancient?

Thanks,
tglx
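The "clearly a kernel address" reasoning can be reproduced with a small script. The sketch below parses the warning lines quoted in this report and flags addresses in the upper canonical half of the 48-bit x86-64 address space, which both Linux and Windows conventionally reserve for kernel mappings. The helper name `parse_split_lock` and the returned dictionary layout are invented for this example, and the upper-half test is a heuristic, not something the kernel reports.

```python
import re

# Matches the warning format seen in the dmesg excerpts of this report.
SPLIT_LOCK_RE = re.compile(
    r"#AC: (?P<comm>.+)/(?P<pid>\d+) took a split_lock trap "
    r"at address: 0x(?P<ip>[0-9a-fA-F]+)"
)

# Assumption: 48-bit virtual addresses; the upper canonical half starts here.
UPPER_HALF_START = 0xFFFF_8000_0000_0000


def parse_split_lock(line: str):
    """Return a dict describing a split_lock warning, or None if the line does not match."""
    match = SPLIT_LOCK_RE.search(line)
    if match is None:
        return None
    ip = int(match.group("ip"), 16)
    return {
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
        "ip": ip,
        # Heuristic only: an upper-half address trapped in a qemu process
        # suggests guest kernel code rather than host user space.
        "upper_half": ip >= UPPER_HALF_START,
    }


if __name__ == "__main__":
    sample = ("[494097.166700] x86/split lock detection: #AC: "
              "qemu-system-x86/537034 took a split_lock trap at address: "
              "0xfffff8006ca1e643")
    print(parse_split_lock(sample))
```

The two traps at 0x3ff2624d from the same log fall in the lower half, so this heuristic alone cannot say whether they came from guest or host user space.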
> Laurent, which kernel is running in your guest?
The guest OS is Windows 10.
On Fri, Jun 16 2023 at 06:34, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216102
>
> --- Comment #11 from Laurent Bonnaud (L.Bonnaud@laposte.net) ---
>
>> Laurent, which kernel is running in your guest?
>
> The guest OS is Windows 10.

Ok. Unfortunately the dmesg output does not differentiate between host user space and guest originated split lock access. The below patch for the host kernel makes it more obvious where this originates from. I'm going to polish that up and post it on LKML too.

The problem itself is mostly harmless. Though split lock access which is unprivileged can be used for a DoS attack on a machine because such an access has to fully lock the bus which causes a tremendous slowdown for the whole system if e.g. done in a loop.

Thanks,
tglx
---
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 1c4639588ff9..f3b88a87efd9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1343,14 +1343,14 @@ static int splitlock_cpu_offline(unsigned int cpu)
 	return 0;
 }
 
-static void split_lock_warn(unsigned long ip)
+static void split_lock_warn(unsigned long ip, bool guest)
 {
 	struct delayed_work *work;
 	int cpu;
 
 	if (!current->reported_split_lock)
-		pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
-				    current->comm, current->pid, ip);
+		pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx guest: %d\n",
+				    current->comm, current->pid, ip, guest);
 	current->reported_split_lock = 1;
 
 	if (sysctl_sld_mitigate) {
@@ -1382,7 +1382,7 @@ static void split_lock_warn(unsigned long ip)
 bool handle_guest_split_lock(unsigned long ip)
 {
 	if (sld_state == sld_warn) {
-		split_lock_warn(ip);
+		split_lock_warn(ip, true);
 		return true;
 	}
 
@@ -1425,7 +1425,7 @@ bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
 	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
 		return false;
-	split_lock_warn(regs->ip);
+	split_lock_warn(regs->ip, false);
 	return true;
 }

> The below patch for the host kernel makes it more obvious where this originates from.
Thanks for the patch!
> I'm going to polish that up and post it on LKML too.
I am looking forward to testing it in a released kernel...
BTW, I did not state it explicitly, but the original issue (kernel BUG caused by CFI) is fixed in recent kernels. Thanks to whoever fixed it!