Bug 205441 - Enabling KVM causes any Linux VM reboot on kernel boot
Summary: Enabling KVM causes any Linux VM reboot on kernel boot
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: Intel Linux
Importance: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-05 12:21 UTC by Marcin Krol
Modified: 2019-11-18 14:49 UTC
CC List: 5 users

See Also:
Kernel Version: 4.19.81
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel .config file (204.70 KB, text/plain)
2019-11-05 12:21 UTC, Marcin Krol

Description Marcin Krol 2019-11-05 12:21:49 UTC
Created attachment 285801
Kernel .config file

I'm unable to run any Linux VM on a host running kernel 4.19.81. The VM starts fine, the boot loader works fine, both the kernel and initramfs images are loaded, and then, when the kernel should start booting, the VM simply reboots itself. This happens only when -enable-kvm is added to the command line. Disabling KVM makes it work again (but slowly).

The problem appears on only one of my two physical hosts, the one with the Intel Core 2 Quad Q9500 CPU. KVM works fine on the one with the Intel Xeon E3-1246 CPU (same packages, same versions, same configs). I tried different guest kernel versions and systems, and the VM always reboots when it tries to boot the kernel.

If I downgrade to 4.19.73, KVM works fine again. I may test kernels 4.19.74 to 4.19.79 to see which one introduces the problem, but it will take some time, as I must build these packages manually.

I'm starting my test VM with the command below, but VMs started via libvirt have the same problem.

qemu-system-x86_64 -m 2048M -smp 1 -k en-us -drive file=/tmp/cri-tld-x64-20190916.iso,media=cdrom,format=raw,if=ide -cpu kvm64 -vnc 127.0.0.1:0 -enable-kvm

I tried changing the -cpu parameter, including the host passthrough option, but the problem persists.

Kernel version (distribution package):

Linux version 4.19.81-4.19-vanilla-1 (builder@x64) (gcc version 9.2.0 20190830 (release) (TLD-Linux)) #1 SMP Wed Oct 30 20:56:24 CET 2019

Kernel config is attached.

CPU on the host where the problem exists (output limited to a single core):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz
stepping        : 10
microcode       : 0xa0b
cpu MHz         : 2015.344
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm pti tpr_shadow vnmi flexpriority dtherm
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 5652.52
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
Comment 1 Thomas Lamprecht 2019-11-07 18:35:22 UTC
We had similar issues reported by our users (Proxmox VE). Like you, they all had older Intel CPUs, and they had trouble booting most types of Linux VMs.

I bisected this here, with the following result:
git bisect log 
# bad: [3b931173c97b0d73f80ea55b72bb2966a246167f] UBUNTU: Ubuntu-5.0.0-33.35
# good: [5d5a6b36e94909962297fae609bff487de3cc43a] UBUNTU: Ubuntu-5.0.0-30.32
git bisect start '3b931173c97b0d73f80ea55b72bb2966a246167f' '5d5a6b36e94909962297fae609bff487de3cc43a'
# good: [7b4f844b33969ab166800f8936beef153fab736e] net/ibmvnic: free reset work of removed device from queue
git bisect good 7b4f844b33969ab166800f8936beef153fab736e
# bad: [6c1fc88702a4f33886b44ce5b6f374893b95e369] arm64: tlb: Ensure we execute an ISB following walk cache invalidation
git bisect bad 6c1fc88702a4f33886b44ce5b6f374893b95e369
# good: [e627a027b54eccc95f9e374d69aead7f1498877b] loop: Add LOOP_SET_DIRECT_IO to compat ioctl
git bisect good e627a027b54eccc95f9e374d69aead7f1498877b
# good: [29919eff6333bc67ec580b454afdd8b49883df2f] libata/ahci: Drop PCS quirk for Denverton and beyond
git bisect good 29919eff6333bc67ec580b454afdd8b49883df2f
# good: [cb44193f94af73928f8df049ffbb6b4a0be136ae] PM / devfreq: passive: fix compiler warning
git bisect good cb44193f94af73928f8df049ffbb6b4a0be136ae
# good: [b1d479b27b26966aea931094b31864979d7f8102] scsi: implement .cleanup_rq callback
git bisect good b1d479b27b26966aea931094b31864979d7f8102
# bad: [ec15813844b05d8cbd4352c65a20e57d16f9f936] media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
git bisect bad ec15813844b05d8cbd4352c65a20e57d16f9f936
# good: [e83601f51a90d9739ced9ff42b6f202f8f802c72] parisc: Disable HP HSC-PCI Cards to prevent kernel crash
git bisect good e83601f51a90d9739ced9ff42b6f202f8f802c72
# good: [6d393bdf3b3f4b629070329488d3c6a3e142602b] KVM: x86: set ctxt->have_exception in x86_decode_insn()
git bisect good 6d393bdf3b3f4b629070329488d3c6a3e142602b
# bad: [208007519a7385a57b0c0a3c180142a521594876] KVM: x86: Manually calculate reserved bits when loading PDPTRS
git bisect bad 208007519a7385a57b0c0a3c180142a521594876
# first bad commit: [208007519a7385a57b0c0a3c180142a521594876] KVM: x86: Manually calculate reserved bits when loading PDPTRS

Which is:

   KVM: x86: Manually calculate reserved bits when loading PDPTRS
    
    BugLink: https://bugs.launchpad.net/bugs/1848367
    
    commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream.
    
    Manually generate the PDPTR reserved bit mask when explicitly loading
    PDPTRs.  The reserved bits that are being tracked by the MMU reflect the
    current paging mode, which is unlikely to be PAE paging in the vast
    majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation,
    __set_sregs(), etc...  This can cause KVM to incorrectly signal a bad
    PDPTR, or more likely, miss a reserved bit check and subsequently fail
    a VM-Enter due to a bad VMCS.GUEST_PDPTR.
    
    Add a one off helper to generate the reserved bits instead of sharing
    code across the MMU's calculations and the PDPTR emulation.  The PDPTR
    reserved bits are basically set in stone, and pushing a helper into
    the MMU's calculation adds unnecessary complexity without improving
    readability.
    
    Oppurtunistically fix/update the comment for load_pdptrs().
    
    Note, the buggy commit also introduced a deliberate functional change,
    "Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was
    effectively (and correctly) reverted by commit cd9ae5fe47df ("KVM: x86:
    Fix page-tables reserved bits").  A bit of SDM archaeology shows that
    the SDM from late 2008 had a bug (likely a copy+paste error) where it
    listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved
    for 2mb entries.  I.e. the SDM contradicted itself, and bits 6:5 are and
    always have been reserved.
    
    Fixes: 20c466b56168d ("KVM: Use rsvd_bits_mask in load_pdptrs()")
    Cc: stable@vger.kernel.org
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Reported-by: Doug Reiland <doug.reiland@intel.com>
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
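The reserved-bit rule the commit message describes (PDPTR reserved bits that are "basically set in stone", with bits 6:5 always reserved) can be illustrated with a small standalone C sketch. It follows the PAE PDPTE layout in the Intel SDM, where bits 2:1, bits 8:5 and bits 63:MAXPHYADDR are reserved. This is not KVM's code: `rsvd_bits` only mimics the KVM helper of the same name, `pdpte_rsvd_mask` and `pdpte_is_valid` are hypothetical names, and MAXPHYADDR is passed in as a plain parameter instead of being read from guest CPUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set. Shifting all-ones right so that bit e
 * is the highest set bit, then clearing bits below s, avoids a shift by 64
 * even for rsvd_bits(0, 63). */
static inline uint64_t rsvd_bits(unsigned s, unsigned e)
{
    assert(s <= e && e <= 63);
    return (~0ULL >> (63 - e)) & ~((1ULL << s) - 1);
}

#define PDPTE_PRESENT (1ULL << 0) /* bit 0: P (present) */

/* Reserved bits of a PAE PDPTE: 2:1, 8:5 and 63:MAXPHYADDR (assumed from the SDM). */
static uint64_t pdpte_rsvd_mask(unsigned maxphyaddr)
{
    return rsvd_bits(maxphyaddr, 63) | rsvd_bits(5, 8) | rsvd_bits(1, 2);
}

/* A PDPTE is rejected only if it is present AND sets a reserved bit;
 * the other bits of a not-present entry are not checked. */
static bool pdpte_is_valid(uint64_t pdpte, unsigned maxphyaddr)
{
    if (!(pdpte & PDPTE_PRESENT))
        return true;
    return (pdpte & pdpte_rsvd_mask(maxphyaddr)) == 0;
}
```

With the 36-bit physical address width reported for the Q9500 host above, `pdpte_rsvd_mask(36)` evaluates to `0xfffffff0000001e6`, so, for example, a present entry that also has bit 1 set is rejected.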


This commit is also included in 4.19.81 (more precisely, it has been there since
v4.19.77) as commit 496cf984a60edb5534118a596613cc9971e406e8 [0], the backport of
upstream commit 16cfacc8085782dab8e365979356ce1ca87fd6cc [1].

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=v4.19.82&id=496cf984a60edb5534118a596613cc9971e406e8
[1]: https://git.kernel.org/torvalds/c/16cfacc8085782dab8e365979356ce1ca87fd6cc

The funny thing is that I cannot reproduce this with a 5.3.7 kernel, which _also_
includes the above commit. So possibly another patch is missing from the backport,
though I did not find anything obvious...

So, in summary, to reproduce:
* Dust off a host with an old Intel CPU, e.g. an Intel Core2Duo CPU E8500 @ 3.16GHz
  (anything else Westmere, Conroe or the like should work too, or anything released
  over 10 years ago).
* Install a Linux distro in a VM, or just boot its installer there. I used Debian 9,
  as our users had issues with that but not with an Ubuntu 19.10 VM.
* Watch it boot-loop once a stable kernel with the above backport [0]
  is used on the host.
Comment 2 Christian Theune 2019-11-18 13:05:35 UTC
I think this is happening to me, too.

Affected hosts:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
Stepping:              2
CPU MHz:               1596.000
CPU max MHz:           2661.0000
CPU min MHz:           1596.0000
BogoMIPS:              5319.79
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23

and

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 23
Model name:            Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz
Stepping:              6
CPU MHz:               2327.504
CPU max MHz:           2333.0000
CPU min MHz:           2000.0000
BogoMIPS:              4655.07
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              6144K
NUMA node0 CPU(s):     0-7

The OOPSes we see in the guest:

[6196425.778274] general protection fault: 0000 [#1] SMP
[6196425.780178] Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables x_tables mousedev psmouse input_leds led_class serio_raw ppdev evdev mac_hid ata_generic pata_acpi floppy ide_pci_generic piix parport_pc 8250_fintek i2c_piix4 intel_agp parport intel_gtt i8042 ide_core tpm_tis i2c_core agpgart tpm processor button nf_conntrack_ftp nf_conntrack atkbd libps2 serio loop cpufreq_ondemand sunrpc i6300esb ipv6 autofs4 xfs libcrc32c crc32c_generic virtio_net virtio_blk ata_piix libata rtc_cmos scsi_mod virtio_pci dm_mod virtio_rng rng_core virtio_console virtio_balloon virtio_ring virtio
[6196425.809443] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.91 #1-NixOS
[6196425.812032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[6196425.815522] task: ffffffff81a11500 ti: ffffffff81a00000 task.ti: ffffffff81a00000
[6196425.817656] RIP: 0010:[<ffffffff814ddcfa>]  [<ffffffff814ddcfa>] __schedule+0x28a/0xa10
[6196425.821087] RSP: 0018:ffffffff81a03e90  EFLAGS: 00010007
[6196425.822750] RAX: ffff880079b79b00 RBX: ffff88007fc15680 RCX: 0000000000000000
[6196425.824790] RDX: 000077ff80000000 RSI: ffff880079b79b00 RDI: 0000000078dda000
[6196425.826828] RBP: ffffffff81a03ec8 R08: 0000000000000001 R09: 0016039dcd5c1010
[6196425.828852] R10: 0000000000000003 R11: 0000000000000001 R12: ffff880079bbce00
[6196425.830899] R13: ffff880079b30e00 R14: 0000000000000000 R15: ffffffff81a11a58
[6196425.832924] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[6196425.835210] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[6196425.836863] CR2: 00000000ffffffff CR3: 0000000079b48000 CR4: 00000000000006f0
[6196425.838894] Stack:
[6196425.839519]  ffff880079b79b00 ffffffff81a11500 ffffffff81a04000 ffffffff81a04000
[6196425.841768]  0000000000000000 0000000000000000 ffffffff81a00000 ffffffff81a03ee0
[6196425.843997]  ffffffff814de4b5 ffffffff81ae3c18 ffffffff81a03ef0 ffffffff814de74e
[6196425.846234] Call Trace:
[6196425.847003]  [<ffffffff814de4b5>] schedule+0x35/0x80
[6196425.848433]  [<ffffffff814de74e>] schedule_preempt_disabled+0xe/0x10
[6196425.850269]  [<ffffffff810aaa62>] cpu_startup_entry+0x182/0x340
[6196425.851977]  [<ffffffff814db6ec>] rest_init+0x7c/0x80
[6196425.853454]  [<ffffffff81b06f92>] start_kernel+0x46c/0x479
[6196425.855045]  [<ffffffff81b06120>] ? early_idt_handler_array+0x120/0x120
[6196425.856967]  [<ffffffff81b064d7>] x86_64_start_reservations+0x2a/0x2c
[6196425.858806]  [<ffffffff81b06614>] x86_64_start_kernel+0x13b/0x14a
[6196425.860543] Code: 89 f6 3e 4d 0f ab b4 24 d0 02 00 00 bf 00 00 00 80 49 03 7c 24 40 48 ba 00 00 00 80 ff 77 00 00 48 0f 42 15 19 33 53 00 48 01 d7 <0f> 22 df 0f 1f 40 00 0f 1f 44 00 00 3e 4d 0f b3 b5 d0 02 00 00
[6196425.867994] RIP  [<ffffffff814ddcfa>] __schedule+0x28a/0xa10
[6196425.869677]  RSP <ffffffff81a03e90>
[6196425.871737] ---[ end trace 4aa829a660f98e5f ]---
[6196425.873089] Kernel panic - not syncing: Attempted to kill the idle task!
[6196425.875093] Kernel Offset: disabled
[6196425.876142] Rebooting in 1 seconds..

[6208895.826260] general protection fault: 0000 [#1] SMP
[6208895.828125] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables x_tables mousedev input_leds led_class psmouse serio_raw evdev ppdev mac_hid ata_generic pata_acpi ide_pci_generic piix i2c_piix4 8250_fintek intel_agp tpm_tis parport_pc intel_gtt floppy tpm ide_core parport i8042 i2c_core agpgart processor button nf_conntrack_ftp nf_conntrack tun atkbd libps2 serio loop cpufreq_ondemand sunrpc i6300esb ipv6 autofs4 xfs libcrc32c crc32c_generic virtio_net virtio_blk ata_piix libata rtc_cmos scsi_mod virtio_pci dm_mod virtio_rng rng_core virtio_console virtio_balloon virtio_ring virtio
[6208895.856708] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.91 #1-NixOS
[6208895.858538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[6208895.862364] task: ffff88013abb9b00 ti: ffff88013abc0000 task.ti: ffff88013abc0000
[6208895.864478] RIP: 0010:[<ffffffff814ddcfa>] [<ffffffff814ddcfa>] __schedule+0x28a/0xa10
[6208895.866842] RSP: 0018:ffff88013abc3e78 EFLAGS: 00010007
[6208895.868359] RAX: ffff8800ba8a0000 RBX: ffff88013fd15680 RCX: 0000000000000000
[6208895.870387] RDX: 000077ff80000000 RSI: ffff8800ba8a0000 RDI: 0000000139541000
[6208895.872909] RBP: ffff88013abc3eb0 R08: 0000000000000000 R09: 00160ef5361dc5cd
[6208895.875064] R10: 0000000000000003 R11: 0000000000000001 R12: ffff8800bb72ce00
[6208895.877105] R13: ffff8800baa0b800 R14: 0000000000000001 R15: ffff88013abba058
[6208895.879125] FS: 0000000000000000(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
[6208895.881409] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[6208895.883057] CR2: 00000000ffffffff CR3: 000000013a359000 CR4: 00000000000006e0
[6208895.885082] Stack:
[6208895.885713] ffff8800ba8a0000 ffff88013abb9b00 ffff88013abc4000 ffff88013abc4000
[6208895.887943] 0000000000000000 0000000000000000 ffff88013abc0000 ffff88013abc3ec8
[6208895.890439] ffffffff814de4b5 ffffffff81ae3c18 ffff88013abc3ed8 ffffffff814de74e
[6208895.892742] Call Trace:
[6208895.893495] [<ffffffff814de4b5>] schedule+0x35/0x80
[6208895.894949] [<ffffffff814de74e>] schedule_preempt_disabled+0xe/0x10
[6208895.896789] [<ffffffff810aaa62>] cpu_startup_entry+0x182/0x340
[6208895.898481] [<ffffffff8104bca2>] start_secondary+0x132/0x140
[6208895.900151] Code: 89 f6 f0 4d 0f ab b4 24 d0 02 00 00 bf 00 00 00 80 49 03 7c 24 40 48 ba 00 00 00 80 ff 77 00 00 48 0f 42 15 19 33 53 00 48 01 d7 <0f> 22 df 0f 1f 40 00 0f 1f 44 00 00 f0 4d 0f b3 b5 d0 02 00 00
[6208895.907522] RIP [<ffffffff814ddcfa>] __schedule+0x28a/0xa10
[6208895.909189] RSP <ffff88013abc3e78>
[6208895.911235] ---[ end trace fc03b7585c232c61 ]---
[6208895.911247] general protection fault: 0000 [#2] SMP
[6208895.911287] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_tables x_tables mousedev input_leds led_class psmouse serio_raw evdev ppdev mac_hid ata_generic pata_acpi ide_pci_generic piix i2c_piix4 8250_fintek intel_agp tpm_tis parport_pc intel_gtt floppy tpm ide_core parport i8042 i2c_core agpgart processor button nf_conntrack_ftp nf_conntrack tun atkbd libps2 serio loop cpufreq_ondemand sunrpc i6300esb ipv6 autofs4 xfs libcrc32c crc32c_generic virtio_net virtio_blk ata_piix libata rtc_cmos scsi_mod virtio_pci dm_mod virtio_rng rng_core virtio_console virtio_balloon virtio_ring virtio
[6208895.911302] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D 4.4.91 #1-NixOS
[6208895.911303] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[6208895.911305] task: ffffffff81a11500 ti: ffffffff81a00000 task.ti: ffffffff81a00000
[6208895.911312] RIP: 0010:[<ffffffff814ddcfa>] [<ffffffff814ddcfa>] __schedule+0x28a/0xa10
[6208895.911313] RSP: 0018:ffffffff81a03e90 EFLAGS: 00010007
[6208895.911314] RAX: ffff8800b9933600 RBX: ffff88013fc15680 RCX: 0000000000000000
[6208895.911315] RDX: 000077ff80000000 RSI: ffff8800b9933600 RDI: 000000013a359000
[6208895.911316] RBP: ffffffff81a03ec8 R08: 0000000000000001 R09: 00160ef53628cb62
[6208895.911317] R10: 0000000000000003 R11: 0000000000000001 R12: ffff8800baa0b800
[6208895.911317] R13: ffff8800baa0ad80 R14: 0000000000000000 R15: ffffffff81a11a58
[6208895.911322] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[6208895.911323] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[6208895.911324] CR2: 00000000ffffffff CR3: 00000000b81bb000 CR4: 00000000000006f0
[6208895.911328] Stack:
[6208895.911330] ffff8800b9933600 ffffffff81a11500 ffffffff81a04000 ffffffff81a04000
[6208895.911332] 0000000000000000 0000000000000000 ffffffff81a00000 ffffffff81a03ee0
[6208895.911333] ffffffff814de4b5 ffffffff81ae3c18 ffffffff81a03ef0 ffffffff814de74e
[6208895.911334] Call Trace:
[6208895.911338] [<ffffffff814de4b5>] schedule+0x35/0x80
[6208895.911340] [<ffffffff814de74e>] schedule_preempt_disabled+0xe/0x10
[6208895.911344] [<ffffffff810aaa62>] cpu_startup_entry+0x182/0x340
[6208895.911346] [<ffffffff814db6ec>] rest_init+0x7c/0x80
[6208895.911389] [<ffffffff81b06f92>] start_kernel+0x46c/0x479
[6208895.911397] [<ffffffff81b06120>] ? early_idt_handler_array+0x120/0x120
[6208895.911399] [<ffffffff81b064d7>] x86_64_start_reservations+0x2a/0x2c
[6208895.911401] [<ffffffff81b06614>] x86_64_start_kernel+0x13b/0x14a
[6208895.911418] Code: 89 f6 f0 4d 0f ab b4 24 d0 02 00 00 bf 00 00 00 80 49 03 7c 24 40 48 ba 00 00 00 80 ff 77 00 00 48 0f 42 15 19 33 53 00 48 01 d7 <0f> 22 df 0f 1f 40 00 0f 1f 44 00 00 f0 4d 0f b3 b5 d0 02 00 00
[6208895.911419] RIP [<ffffffff814ddcfa>] __schedule+0x28a/0xa10
[6208895.911420] RSP <ffffffff81a03e90>
[6208895.911425] ---[ end trace fc03b7585c232c62 ]---
[6208895.911428] Kernel panic - not syncing: Attempted to kill the idle task!
[6208897.121744] Shutting down cpus with NMI
[6208897.123596] Kernel Offset: disabled
[6208897.124691] Rebooting in 1 seconds..
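All three traces fault at the same spot: in the `Code:` lines, the byte marked `<0f>` starts the instruction at RIP, `0f 22 df`. A minimal C sketch of decoding it, assuming the register form of MOV to a control register (opcode `0F 22 /r`) with no REX prefix; `decode_mov_to_cr` and `gpr64` are hypothetical names for this example, not part of any kernel or disassembler API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit GPR names indexed by the ModRM r/m field (no REX.B prefix assumed). */
static const char *const gpr64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
};

/*
 * Decode "0F 22 /r" (MOV to control register) from code[0..2] into AT&T
 * syntax in buf. Only the reg and r/m fields of the ModRM byte are used;
 * REX prefixes are not handled. Returns 0 on success, -1 for other opcodes.
 */
static int decode_mov_to_cr(const uint8_t *code, char *buf, size_t len)
{
    assert(code != NULL && buf != NULL && len > 0);
    if (code[0] != 0x0f || code[1] != 0x22)
        return -1;
    unsigned reg = (code[2] >> 3) & 7; /* ModRM bits 5:3 select CRn */
    unsigned rm = code[2] & 7;         /* ModRM bits 2:0 select the source GPR */
    snprintf(buf, len, "mov %%%s,%%cr%u", gpr64[rm], reg);
    return 0;
}
```

Fed the bytes `0f 22 df`, it produces `mov %rdi,%cr3`: the guest took the general protection fault while writing RDI to CR3 (RDI is 0000000078dda000 in the first trace).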
Comment 3 Christian Theune 2019-11-18 13:10:08 UTC
I'm going to try 4.19.84 as that has additional KVM changes even though none of those directly smell related. After that I'll try with 5.3 (currently at 5.3.11) as well.
Comment 4 Thomas Lamprecht 2019-11-18 13:20:37 UTC
This was identified [0], but the proposed backport of the fix [1] is not yet included in the 4.19.84 kernel.

5.3 already includes the fix from [1], so if it's this specific issue, it should not show up there.

[0]: https://lore.kernel.org/stable/20191111173757.GB11805@linux.intel.com/
[1]: https://lore.kernel.org/stable/20191111225423.29309-1-sean.j.christopherson@intel.com/
Comment 5 Christian Theune 2019-11-18 13:56:13 UTC
Great. So, here's what I'll do: I'll run this against 4.19.84 and then apply the patch manually on top of it to verify that this is the issue. I'll report back after that.
Comment 6 Christian Theune 2019-11-18 14:49:53 UTC
Alright. I got Sean's backport working on top of 4.19.84 right away, and it fixes the issue for me. I'm going to roll this out in our lab now.
