Bug 202189 - QEMU KVM causes BUGs and panics, disabling KVM is required to use virtual machines
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All Linux
Importance: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-09 05:16 UTC by leozinho29_eu
Modified: 2019-03-30 09:17 UTC
CC List: 1 user

See Also:
Kernel Version: 5.0-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Dmesg and config from two different 5.0-rc1 builds (444.50 KB, application/x-tar)
2019-01-09 05:16 UTC, leozinho29_eu

Description leozinho29_eu 2019-01-09 05:16:01 UTC
Created attachment 280353 [details]
Dmesg and config from two different 5.0-rc1 builds

When using Linux kernel 5.0-rc1, guests using KVM acceleration cause multiple WARNINGs on the host. Among these WARNINGs, some BUGs appear and a panic can happen. So far I have noticed three different behaviors when starting QEMU guests:

1) QEMU starts, pauses emulation, and prints an error on stdout; the guest never starts, and kernel WARNINGs and BUGs start to appear on the host. Some system degradation is noticed. The QEMU message:

KVM internal error. Suberror: 1
emulation failure
RAX=0000000000000000 RBX=000000005ff19ef0 RCX=0000000000000000 RDX=0000000000000001
RSI=000000005ff30634 RDI=000000005ff32bc0 RBP=000000005bf56000 RSP=000000005ff19ea8
R8 =000000005ff332c0 R9 =0000000000000000 R10=0000000000000100 R11=000000005fa0a3d8
R12=000000005ff1c40d R13=000000005fc01000 R14=000000005bf55d01 R15=000000005ff19ff0
RIP=00000000000a0000 RFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0008 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0018 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0008 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0008 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0008 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0008 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     00000000ffffff80 0000001f
IDT=     000000005bf55d90 0000021f
CR0=80010033 CR2=0000000000000000 CR3=000000005fc01000 CR4=00000660
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

2) QEMU starts and the guest boots; after a few seconds, host and guest dmesg fill with WARNINGs and BUGs, the guest stops working, and the host becomes greatly degraded.

3) QEMU starts and the guest boots and works for some minutes; once a demanding task starts, the guest hits bugs, segmentation faults begin on both guest and host, the guest panics and crashes, and the host panics.

In case 3, the system degradation seems to have started with swap usage, and the host panicked when trying to save a screenshot showing the guest messages. This also seems to have seriously corrupted at least one guest: it now drops directly to initramfs and, after fsck, has many files missing.
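For triage, the register dump QEMU prints in case 1 can be summarized mechanically. The following is a minimal sketch, not part of the original report; the function name `summarize_dump` is hypothetical, and it assumes that the byte shown in angle brackets on the `Code=` line is the one at RIP and that `??` marks bytes QEMU could not read.

```python
import re

def summarize_dump(dump: str) -> dict:
    """Extract RIP and the code bytes from a QEMU 'KVM internal error' dump."""
    rip = re.search(r"RIP=([0-9a-fA-F]+)", dump)
    code = re.search(r"Code=(.*)", dump)
    tokens = code.group(1).split() if code else []
    # Assumption: the token wrapped in <> is the byte at RIP.
    current = next((t.strip("<>") for t in tokens if t.startswith("<")), None)
    return {
        "rip": int(rip.group(1), 16) if rip else None,
        "byte_at_rip": current,
        # Assumption: "??" means the guest memory could not be read.
        "readable": current is not None and current != "??",
        "all_zero": bool(tokens) and all(t.strip("<>") == "00" for t in tokens),
    }

if __name__ == "__main__":
    sample = "RIP=00000000000a0000 RFL=00010046\nCode=00 00 <00> 00 00\n"
    print(summarize_dump(sample))
```

Applied to the dump above, such a summary would show RIP at 0xa0000 with every dumped code byte equal to zero, which may help when comparing dumps from different guests.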

Steps to reproduce (CAUTION: file system corruption is expected!):

1) Use a kernel 5.0-rc1 build with KVM support;
2) Have QEMU built with KVM support;
3) (NOT SURE OF THIS STEP) Have host using swap memory;
4) Try to start a guest using KVM acceleration (qemu-system-x86_64 -accel kvm $OTHER_OPTIONS). Example:
qemu-system-x86_64 -m 1536 -accel kvm -device virtio-vga,virgl=true -device virtio-tablet-pci -serial vc -monitor vc -bios /usr/share/OVMF/OVMF_CODE.fd -hda xubuntu.qcow2 -cdrom ./bionic-desktop-amd64.iso
5) If it works, try to use the guest. If not, skip to 6;
6) Notice how applications start to crash and many messages start to appear in dmesg.
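
Since step 3 is uncertain, it may help to record whether the host is actually using swap before starting the guest. This is a minimal sketch, not part of the original report: the function name `swap_used_kib` is hypothetical, and it assumes the usual `SwapTotal:` and `SwapFree:` lines of /proc/meminfo, reported in kB.

```python
import os
import re

def swap_used_kib(meminfo_text: str) -> int:
    """Return swap in use (SwapTotal - SwapFree) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    # Missing fields are treated as 0, which reports "no swap in use".
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)

if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print("swap used (KiB):", swap_used_kib(f.read()))
    # /dev/kvm must exist for -accel kvm to be usable.
    print("/dev/kvm present:", os.path.exists("/dev/kvm"))
```

A non-zero value before step 4 would match the conditions under which all of the failures here were seen.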

None of these issues were observed with 4.20 or older. I was able to obtain only two dmesg logs because the system became unusable very quickly once the problems started. Swap was in use in all cases. Once the first WARNING appeared, kernel BUGs were triggered even by programs such as less and cat. I thought I would get the entire alphabet of taints after some time.

The two dmesg attached have the kernel command line. None of the kernels were tainted until the WARNINGs and BUGs started to appear.
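
The taint state can also be read as a bitmask from /proc/sys/kernel/tainted. The following is a minimal sketch, not part of the original report, that decodes the mask into flag letters; the function name `decode_taint` is hypothetical, and the bit-to-letter table is an assumption based on the kernel's tainted-kernels documentation for kernels of this era (newer kernels may define more bits).

```python
import os

# Assumed mapping: bit 0 = P, bit 1 = F, ..., bit 9 = W, ..., bit 17 = T.
TAINT_FLAGS = "PFSRMBUDAWCIOELKXT"

def decode_taint(mask: int) -> str:
    """Return the taint letters set in mask, in bit order."""
    return "".join(ch for bit, ch in enumerate(TAINT_FLAGS) if mask & (1 << bit))

if __name__ == "__main__":
    path = "/proc/sys/kernel/tainted"
    if os.path.exists(path):
        with open(path) as f:
            mask = int(f.read().strip())
        print(mask, decode_taint(mask) or "(not tainted)")
```

Under this mapping, a mask of 0 means the kernel is untainted, W is set after a WARNING, and D after an oops or BUG.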

Host OS: Xubuntu 18.04.1;
Kernel version 1: 5.0.0-rc1-drm-tip-4d637a8d160356f01d22695ec1a76858bfb55758+;
Kernel version 2: 5.0.0-050000rc1-lowlatency;
QEMU version: 3.1.50 (v3.1.0-456-g9b2e891ec5-dirty);
Processor: Intel Core i3-6100U;
GPU: Intel HD Graphics 520;
RAM: 8 GB.
Comment 1 Jan 2019-03-30 07:20:25 UTC
Rename this bug to:
"KVM broken in recent kernels"
I'm having the same issue: a router VM that worked fine in 4.4.15X is broken in 4.4.172. Linux VMs work fine, though.


KVM internal error. Suberror: 1
emulation failure
RAX=0000000000000000 RBX=0000000000000016 RCX=0000000000000837 RDX=0000000000000021
RSI=0000000000136118 RDI=0000000000000021 RBP=0000000000136140 RSP=0000000000136138
R8 =0000000000000000 R9 =0000000000000000 R10=000000000a5c15e8 R11=000000000a18bd00
R12=0000000000000001 R13=0000000000000001 R14=0000000000148f1c R15=0000000000000429
RIP=0000000000149147 RFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0020 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 000000000fb29000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 ffffffff 00000000
TR =0008 0000000000000580 00000067 00008b00 DPL=0 TSS64-busy
GDT=     0000000000001000 000000af
IDT=     0000000013864000 00000fff
CR0=c0010013 CR2=0000000000000000 CR3=0000000000134000 CR4=00040260
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
^Cqemu-system-x86_64: terminating on signal 2
Comment 2 Jan 2019-03-30 09:17:19 UTC
Seems resolved in 4.4.177
