Bug 208767
Summary: kernel stack overflow due to Lazy update IOAPIC on an x86_64 *host*, when gpu is passthrough to macos guest vm
Product: Virtualization
Component: kvm
Status: REOPENED
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.6 up to and including 5.7.11
Regression: Yes
Reporter: Yani Stoyanov (yaweb)
Assignee: virtualization_kvm
CC: alex.williamson, bonzini, carnil, fivescarynightsgames, i, yaweb
Subsystem:
Bisected commit-id:
Description
Yani Stoyanov
2020-08-02 09:01:40 UTC
This should have been fixed by commit 8be8f932e3db5fe4ed178b8892eeffeab530273a in Linux 5.7.

I was thinking the same thing when I saw https://bugzilla.kernel.org/show_bug.cgi?id=207489. I wrote a comment there, but then started to realize that the cause of my issue may be something different, since it happens only with my macOS VMs. I am currently using kernel-5.7.10-201.fc32.x86_64, which should include the patch. Also, in the mentioned bug people are complaining about Windows guests. I have two Windows 10 machines and they work fine with no issues; the problem appears only on my macOS VM.

(In reply to Paolo Bonzini from comment #1)
> This should have been fixed by commit
> 8be8f932e3db5fe4ed178b8892eeffeab530273a in Linux 5.7.

This commit is already merged into kernel-5.7.10-201.fc32.x86_64, right?

On Sun, Aug 2, 2020 at 2:01 AM <bugzilla-daemon@bugzilla.kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=208767
>
> Bug ID: 208767
> Summary: kernel stack overflow due to Lazy update IOAPIC on an
>          x86_64 *host*, when gpu is passthrough to macos guest
>          vm
> Product: Virtualization
> Version: unspecified
> Kernel Version: 5.6 up to and including 5.7
> Hardware: All
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: kvm
> Assignee: virtualization_kvm@kernel-bugs.osdl.org
> Reporter: yaweb@mail.bg
> Regression: No
>
> I have fedora 32 host with latest kernel on a double xeon v5 2630 workstation
> asus board and few vm with assigned gpus to them (linux windows and macos).

I didn't think the Mac OS X license agreement permitted running it on non-Apple hardware. Has this changed?

(In reply to Jim Mattson from comment #4)
> On Sun, Aug 2, 2020 at 2:01 AM <bugzilla-daemon@bugzilla.kernel.org> wrote:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=208767
> >
> > Bug ID: 208767
> > Summary: kernel stack overflow due to Lazy update IOAPIC on an
> >          x86_64 *host*, when gpu is passthrough to macos guest
> >          vm
> > Product: Virtualization
> > Version: unspecified
> > Kernel Version: 5.6 up to and including 5.7
> > Hardware: All
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: kvm
> > Assignee: virtualization_kvm@kernel-bugs.osdl.org
> > Reporter: yaweb@mail.bg
> > Regression: No
> >
> > I have fedora 32 host with latest kernel on a double xeon v5 2630 workstation
> > asus board and few vm with assigned gpus to them (linux windows and macos).
>
> I didn't think the Mac OS X license agreement permitted running it on
> non-Apple hardware. Has this changed?

Jim Mattson, I guess officially it is not supported, but as I wrote in the description of the issue, the problem is in the mentioned function. I tested it, and if I comment out the lines

    if (edge && kvm_apicv_activated(ioapic->kvm))
            ioapic_lazy_update_eoi(ioapic, irq);

the VM boots fine. If the function invocation is not commented out, I get a kernel stack overflow. So the bug is in that function, and it should not matter which guest triggers it, right?

I am not sure if this is relevant, but there was an old bug report that explains how OS X configures the IOAPIC with the wrong polarity bit values. It may be interesting to take a look (I know it is from 6 years ago): https://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/index_old.html

The relevant part:

ACPI-compliant operating systems are expected to query the firmware for an indication of which polarity type (ActiveLow or ActiveHigh) to use for any devices with level-triggered interrupts, and to configure the IOAPIC registers accordingly.
Both QEMU and KVM have accumulated a significant number of optimizations based on the assumption that guest operating systems use ActiveHigh polarity, and are coded to assume that "physical" and "logical" IRQ line states are in sync. Even when a misbehaving guest OS (you guessed it, OS X does this) ignores the ACPI polarity hint (which in QEMU/KVM is ActiveHigh, i.e. "physical"=="logical") and configures the virtual IOAPIC with the wrong polarity bit values, both QEMU and KVM will mostly use "logical" IRQ line levels.

(In reply to Paolo Bonzini from comment #1)
> This should have been fixed by commit
> 8be8f932e3db5fe4ed178b8892eeffeab530273a in Linux 5.7.

This is not fixed, and it is not unique to a macOS VM; a Linux guest can also reproduce it. I've seen this both during PXE boot and during shutdown with certain NIC combinations (see rhbz1867373). The only workaround is to disable APICv (kvm_intel.enable_apicv=0).

Any suggestions, Paolo?

This bug is reproducible on Apple hardware too. I tried this on a 2013 Mac Pro running QEMU/KVM with GPU passthrough, and everything worked well until the commit that introduced ioapic_lazy_update_eoi came in.
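
For anyone who wants to try the APICv workaround mentioned above, here is a minimal sketch of one way to make it persistent on an Intel host. The file name kvm-apicv.conf is an arbitrary choice and does not come from this report; the enable_apicv parameter of kvm_intel is the one named in the comment above.

```conf
# /etc/modprobe.d/kvm-apicv.conf  (hypothetical file name; any *.conf name works)
# Load kvm_intel with APICv disabled, the module-option form of
# kvm_intel.enable_apicv=0 on the kernel command line.
options kvm_intel enable_apicv=0
```

After the module is reloaded with no VMs running (or after a reboot), /sys/module/kvm_intel/parameters/enable_apicv should read N. This only works around the problem; it is not a fix.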