Bug 197951
| Summary: | QEMU/KVM & VFIO & PCI passthru with Windows 10 x64 guest: memory access intermittently causes CRITICAL_STRUCTURE_CORRUPTION BSOD unless swap is disabled on host, since 4.12.13 | | |
|---|---|---|---|
| Product: | Virtualization | Reporter: | Jimi (JimiJames.Bove) |
| Component: | kvm | Assignee: | virtualization_kvm |
| Status: | NEW --- | | |
| Severity: | high | CC: | alex.williamson, f.gruenbichler, JimiJames.Bove, lprosek, tyler, xjtuwjp |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 4.12.13 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Jimi
2017-11-21 21:19:09 UTC
Also, I use an AMD GPU and another user reported using NVidia, so this seems to not be vendor-specific, if it even is GPU-specific.

Seems like a simple matter of bisecting between 4.12.12 and 4.12.13 then, it's a very short list:

$ git log --oneline v4.12.12..v4.12.13
5d7d2e03e0f0 Linux 4.12.13
9f7df0bca168 xfs: XFS_IS_REALTIME_INODE() should be false if no rt device presen
da0f4931ec52 NFSv4: Fix up mirror allocation
3307d5f5099c NFS: Sync the correct byte range during synchronous writes
6f50e3a1b8c3 NFS: Fix 2 use after free issues in the I/O code
7714f302294d ARM: 8692/1: mm: abort uaccess retries upon fatal signal
b9a489e1d4a3 ARM64: dts: marvell: armada-37xx: Fix GIC maintenance interrupt
8329b5e8c6cf Bluetooth: Properly check L2CAP config option output buffer length
99dc1296b47c rt2800: fix TX_PIN_CFG setting for non MT7620 chips
2bce0fe7d0cd KVM: SVM: Limit PFERR_NESTED_GUEST_PAGE error_code check to L1 gues
9d6412aa06ce ALSA: msnd: Optimize / harden DSP and MIDI loops
846073130799 mm/memory.c: fix mem_cgroup_oom_disable() call missing
46791eb9f13e mm/swapfile.c: fix swapon frontswap_map memory leak on error
637f25e5ba94 mm: kvfree the swap cluster info if the swap file is unsatisfactory
58989dc3af0d selftests/x86/fsgsbase: Test selectors 1, 2, and 3
9ed3dc1c0431 radix-tree: must check __radix_tree_preload() return value
0af760ab3882 rtlwifi: btcoexist: Fix breakage of ant_sel for rtl8723be
8004198bb025 btrfs: resume qgroup rescan on rw remount
9a5537a76b62 nvme-fabrics: generate spec-compliant UUID NQNs
02c54b35cad8 mtd: nand: qcom: fix config error for BCH
f2339a072e47 mtd: nand: qcom: fix read failure without complete bootchain
71515c37777d mtd: nand: mxc: Fix mxc_v1 ooblayout
c54a31845019 mtd: nand: hynix: add support for 20nm NAND chips
2b8b46b24217 mtd: nand: make Samsung SLC NAND usable again

Let us know the results.

Sure, I'll start bisecting next time I get the chance (maybe tomorrow).
It'll take a long time, though, since the BSOD might only happen once a day. I'll have to run the same commit for a few days before I'm confident that it isn't BSODing. Thank god for binary search.

Did you have a chance to bisect yet? We are experiencing a similar issue with 4.13 and 4.14 based kernels, and our test case and bisect points to a series not contained in 4.12.13:

https://forum.proxmox.com/threads/blue-screen-with-5-1.37664/
https://lkml.kernel.org/r/<20171130093320.66cxaoj45g2ttzoh@nora.maurer-it.com>

I've been doing it. Currently on "[3307d5f5099c186d1ae43205eb23c29fabc6f5b8] NFS: Sync the correct byte range during synchronous writes" with 2 commits left to test after it. They've all been good commits so far.

I have seen this crash on a Windows 10 x64 guest *without* any kind of device assignment. Didn't keep track of exact kernel versions but it was Fedora 26, very likely 4.12.*.

If you've been able to build a kernel where this happens for you, try cherry-picking:

commit a2b7861bb33b2538420bb5d8554153484d3f961f
Author: Boqun Feng <boqun.feng@gmail.com>
Date: Tue Oct 3 21:36:51 2017 +0800

    kvm/x86: Avoid async PF preempting the kernel incorrectly

    Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
    schedule() to reschedule in some cases. This could result in
    accidentally ending the current RCU read-side critical section early,
    causing random memory corruption in the guest, or otherwise preempting
    the currently running task inside between preempt_disable and
    preempt_enable.

Keywords: "PF" (since the report mentions swap), "random memory corruption in the guest"

Correction: It looks like a2b7861bb33b2538420bb5d8554153484d3f961f is more of a guest-side fix with no effect on non-Linux guests. Please ignore it.

We've seen Windows 10 BSOD with CRITICAL_STRUCTURE_CORRUPTION shortly after migration. The host kernel version is 4.4.50, with no device passthrough and no swap in our case.
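
The earlier point about having to run each commit for a few days can be put into rough numbers. The sketch below is only an illustration: it assumes crashes arrive as a Poisson process at about one per day (the thread only says the BSOD "might only happen once a day"), and the function names are made up for this example.

```python
import math


def bisect_steps(n_commits: int) -> int:
    """Rough number of kernels to build and test when bisecting n_commits."""
    return math.ceil(math.log2(n_commits))


def miss_probability(crashes_per_day: float, days: float) -> float:
    """Chance that a truly bad kernel shows zero crashes over `days` days.

    Assumes crashes arrive as a Poisson process (an assumption, not a
    measured property of this bug).
    """
    return math.exp(-crashes_per_day * days)


def days_needed(crashes_per_day: float, max_miss: float) -> float:
    """Days to run one kernel so a false "good" is at most max_miss likely."""
    return math.log(1.0 / max_miss) / crashes_per_day


if __name__ == "__main__":
    commits = 24  # entries in the v4.12.12..v4.12.13 list above
    rate = 1.0    # assumed crashes per day, a rough figure from the thread
    print("bisect steps:", bisect_steps(commits))
    for days in (1, 2, 3, 5):
        print(days, "day(s) without a crash, chance of a false 'good':",
              round(miss_probability(rate, days), 3))
    print("days for <=5% miss chance:", round(days_needed(rate, 0.05), 1))
```

If the real crash rate is lower than one per day, or the bug is timing sensitive and varies between builds, the required run time per commit grows accordingly, and a string of "good" verdicts becomes weaker evidence.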
(In reply to Jack Wang from comment #8)
> We've seen Windows 10 BSOD with CRITICAL_STRUCTURE_CORRUPTION shortly after
> migration. host kernel version is 4.4.50, no device passthrough, no swap in
> our case.

What kernel version was running on the migration source? Thanks!

The source server was running kernel 3.12.45.

I'm about to spend a few days with it installed to make sure, but it looks like this commit is probably our culprit:

$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[9f7df0bca168528aba20794f400be134495551b8] xfs: XFS_IS_REALTIME_INODE() should be false if no rt device present

It looks like there's some evidence that this issue doesn't *only* come from 4.12.13. I want to reiterate, I was on 4.12.13 when this problem started happening to me, and I haven't had a single BSOD since downgrading to 4.12.12, including during this entire bisect. It was happening frequently enough that if 4.12.13 wasn't at least one of the culprits, I definitely would've had a few BSODs by now.

(In reply to Jimi from comment #11)
> I'm about to spend a few days with it installed to make sure, but it looks
> like this commit is probably our culprit:
>
> $ git bisect good
> Bisecting: 0 revisions left to test after this (roughly 1 step)
> [9f7df0bca168528aba20794f400be134495551b8] xfs: XFS_IS_REALTIME_INODE()
> should be false if no rt device present

A few things hint at this being a red herring.

* It's the first commit before the 4.12.13 tag, which means that you marked 4.12.13 as bad and everything else as good.
* There's nothing in it that would explain why it affects only virt and only Windows guests.

> It looks like there's some evidence that this issue doesn't *only* come from
> 4.12.13. I want to reiterate, I was on 4.12.13 when this problem started
> happening to me, and I haven't had a single BSOD since downgrading to
> 4.12.12, including during this entire bisect.
> It was happening frequently enough that if 4.12.13 wasn't at least one of
> the culprits, I definitely would've had a few BSODs by now.

The bug is likely timing sensitive, and just rebuilding the kernel out of the same sources may make it more (or less) prone to the bug simply because of how the binary is laid out, the exact compiler used, etc.

Also, we should not rule out the possibility that the problem has existed for a long time and Windows 10 got the ability to detect certain corruptions recently via a Windows Update patch.

I hit it again yesterday and the BSOD analyzes to:

CRITICAL_STRUCTURE_CORRUPTION (109)
This bugcheck is generated when the kernel detects that critical kernel code or data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel debugger that was not attached when the system was booted. Normal breakpoints, "bp", can only be set if the debugger is attached at boot time. Hardware breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
Arguments:
Arg1: a3a0206143b9d5b3, Reserved
Arg2: b3b72ce7963bad06, Reserved
Arg3: 0000032000000000, Failure type dependent information
Arg4: 0000000000000017, Type of corrupted region, can be
[...]
16 : Critical floating point control register modification
17 : Local APIC modification
18 : Kernel notification callout modification
[...]

I'm pretty sure that last time I got it the type of corrupted region was 17 as well.

(In reply to Fabian Grünbichler from comment #4)
> Did you have a chance to bisect yet?
> We are experiencing a similar issue with 4.13 and 4.14 based kernels, and
> our test case and bisect points to a series not contained in 4.12.13:
>
> https://lkml.kernel.org/r/<20171130093320.66cxaoj45g2ttzoh@nora.maurer-it.com>

FWIW, the 4.13 and 4.14 issue was caused by the linked series, and a subsequent patch[1] solved it completely for us.

1: https://lkml.kernel.org/r/<20171130180546.4331-1-rkrcmar@redhat.com>

In my case, it's region 7 : Critical MSR modification. I guess some underlying MSR state just changed during migration from the 3.12 to the 4.4 kernel.

(In reply to Fabian Grünbichler from comment #13)
> FWIW, the 4.13 and 4.14 issue was caused by the linked series, and a
> subsequent patch[1] solved it completely for us.
>
> 1: https://lkml.kernel.org/r/<20171130180546.4331-1-rkrcmar@redhat.com>

Thanks! I have added this patch to my kernel (4.13.16 based, built locally, reproduces the BSOD). Will report back in a few days.

(In reply to Ladi Prosek from comment #15)
> (In reply to Fabian Grünbichler from comment #13)
> > FWIW, the 4.13 and 4.14 issue was caused by the linked series, and a
> > subsequent patch[1] solved it completely for us.
> >
> > 1: https://lkml.kernel.org/r/<20171130180546.4331-1-rkrcmar@redhat.com>
>
> Thanks! I have added this patch to my kernel (4.13.16 based, built locally,
> reproduces the BSOD). Will report back in a few days.

No crashes so far.

The fix is in Linus's tree:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b1394e745b9453dcb5b0671c205b770e87dedb87
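
For anyone comparing their own dumps against the ones in this thread, here is a small sketch that maps the Arg4 value of bugcheck 109 to the region types quoted in the comments above. It only covers the four codes that actually appear in this thread (the rest of the list is elided as "[...]" in the dump), and `describe_region` is a made-up helper name, not part of any debugger tooling.

```python
# Region types quoted in this thread only; the full list is elided ("[...]")
# in the dump pasted above, so everything else is reported as not listed.
REGION_TYPES = {
    "7": "Critical MSR modification",
    "16": "Critical floating point control register modification",
    "17": "Local APIC modification",
    "18": "Kernel notification callout modification",
}


def describe_region(arg4: str) -> str:
    """Map an Arg4 string as printed in the dump to a quoted region type.

    The key is the printed value with leading zeros stripped, so
    "0000000000000017" is matched against the "17" entry of the listing.
    """
    key = arg4.strip().lower().lstrip("0") or "0"
    return REGION_TYPES.get(key, "not listed in this thread")


if __name__ == "__main__":
    print(describe_region("0000000000000017"))
    print(describe_region("0000000000000007"))
```

The lookup deliberately matches on the digits as printed rather than converting the value, so it makes no assumption about how the listing is numbered.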