Bug 102301

Summary: Shutting down a Windows 10 virtual machine (with VGA passthrough) causes a hard crash, every time
Product: Virtualization
Reporter: Will Marler (will.marler)
Component: kvm
Assignee: virtualization_kvm
Status: RESOLVED UNREPRODUCIBLE
Severity: normal
CC: bayi, harn-solo
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.1.4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Backtrace taken with kernel 4.3.0
Backtrace taken with kernel 4.4-rc

Description Will Marler 2015-08-05 05:11:48 UTC
I'm using libvirt and virt-manager to manage the VM, which runs under qemu. I am using VGA passthrough, and it works very smoothly until it's time to shut down: when I shut down the VM (either with virt-manager's force power off, or from within the guest using Start -> Shut down), the host crashes.

Here's the last bit from journalctl:

Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Meta+F5" for "kwin" : "MoveMouseToFocus"
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Alt+F3" for "kwin" : "Window Operations Menu"
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Ctrl+F1" for "kwin" : "Switch to Desktop 1"
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Alt+Shift+Backtab" for "kwin" : "Walk Through Windows (Reverse)"
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Ctrl+F3" for "kwin" : "Switch to Desktop 3"
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Alt+Tab" for "kwin" : "Walk Through Windows"
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Ctrl+F8" for "kwin" : "ShowDesktopGrid"
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Meta+Alt+Left" for "kwin" : "Switch Window Left"
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Alt+~" for "kwin" : "Walk Through Windows of Current Application
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: adding shift to the grab
Aug 04 23:00:17 haze kglobalaccel5[791]: kglobalaccel-runtime: Registering key "Meta+F6" for "kwin" : "MoveMouseToCenter"
Aug 04 23:00:18 haze libvirtd[523]: internal error: End of file from monitor
Aug 04 23:00:18 haze systemd-machined[1712]: Machine qemu-win10 terminated.
Aug 04 23:00:18 haze kernel: vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=io+mem:owns=none
Aug 04 23:00:18 haze kernel: BUG: unable to handle kernel paging request at 00000000ffd61000
Aug 04 23:00:18 haze kernel: IP: [<00000000ffd61000>] 0xffd61000
Aug 04 23:00:18 haze kernel: PGD 0 
Aug 04 23:00:18 haze kernel: Oops: 0010 [#1] PREEMPT SMP 
Aug 04 23:00:18 haze kernel: Modules linked in: vhost_net vhost macvtap macvlan vfio_pci vfio_iommu_type1 vfio_virqfd vfio xt_CHECKSUM iptable_m
Aug 04 23:00:18 haze kernel:  intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 l
Aug 04 23:00:18 haze kernel: CPU: 2 PID: 523 Comm: libvirtd Not tainted 4.1.4-1-ARCH #1
Aug 04 23:00:18 haze kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Pro3, BIOS P2.90 07/11/2014
Aug 04 23:00:18 haze kernel: task: ffff880419af0000 ti: ffff880366ad8000 task.ti: ffff880366ad8000
Aug 04 23:00:18 haze kernel: RIP: 0010:[<00000000ffd61000>]  [<00000000ffd61000>] 0xffd61000
Aug 04 23:00:18 haze kernel: RSP: 0018:ffff880366adbc70  EFLAGS: 00010286
Aug 04 23:00:18 haze kernel: RAX: 00000000ffd61000 RBX: ffff88041c846098 RCX: 0000000000000000
Aug 04 23:00:18 haze kernel: RDX: 0000000000000000 RSI: ffff88041c846098 RDI: ffff88041c846098
Aug 04 23:00:18 haze kernel: RBP: ffff880366adbc98 R08: 0000000000000002 R09: ffff880366adbc3c
Aug 04 23:00:18 haze kernel: R10: 0000000000000001 R11: 000000000000062e R12: ffff88041c846146
Aug 04 23:00:18 haze kernel: R13: 00000000ffd61000 R14: 0000000000000000 R15: 000000000000000c
Aug 04 23:00:18 haze kernel: FS:  00007f7f40cf17c0(0000) GS:ffff88042f300000(0000) knlGS:0000000000000000
Aug 04 23:00:18 haze kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 04 23:00:18 haze kernel: CR2: 00000000ffd61000 CR3: 0000000365262000 CR4: 00000000001407e0
Aug 04 23:00:18 haze kernel: Stack:
Aug 04 23:00:18 haze kernel:  ffffffff813fe7b6 0008880366adbca8 0000000000000008 ffff88041c846098
Aug 04 23:00:18 haze kernel:  0000000000000004 ffff880366adbcc8 ffffffff813ff6c1 ffff88041c846098
Aug 04 23:00:18 haze kernel:  0000000000000004 ffff88041c846146 0000000000000246 ffff880366adbcf8
Aug 04 23:00:18 haze kernel: Call Trace:
Aug 04 23:00:18 haze kernel:  [<ffffffff813fe7b6>] ? __rpm_callback+0x36/0x90
Aug 04 23:00:18 haze kernel:  [<ffffffff813ff6c1>] rpm_idle+0x231/0x2a0
Aug 04 23:00:18 haze kernel:  [<ffffffff813ff783>] __pm_runtime_idle+0x53/0x70
Aug 04 23:00:18 haze kernel:  [<ffffffff813125b8>] pci_device_remove+0x78/0xc0
Aug 04 23:00:18 haze kernel:  [<ffffffff813f4547>] __device_release_driver+0x87/0x120
Aug 04 23:00:18 haze kernel:  [<ffffffff813f4603>] device_release_driver+0x23/0x30
Aug 04 23:00:18 haze kernel:  [<ffffffff813f3405>] unbind_store+0x115/0x160
Aug 04 23:00:18 haze kernel:  [<ffffffff813f24e5>] drv_attr_store+0x25/0x40
Aug 04 23:00:18 haze kernel:  [<ffffffff8125b7ea>] sysfs_kf_write+0x3a/0x50
Aug 04 23:00:18 haze kernel:  [<ffffffff8125ace7>] kernfs_fop_write+0x127/0x180
Aug 04 23:00:18 haze kernel:  [<ffffffff811e05f7>] __vfs_write+0x37/0x110
Aug 04 23:00:18 haze kernel:  [<ffffffff811e3658>] ? __sb_start_write+0x58/0x110
Aug 04 23:00:18 haze kernel:  [<ffffffff812837b3>] ? security_file_permission+0x23/0xa0
Aug 04 23:00:18 haze kernel:  [<ffffffff811e0fc4>] vfs_write+0xa4/0x1c0
Aug 04 23:00:18 haze kernel:  [<ffffffff811e1d49>] SyS_write+0x59/0xd0
Aug 04 23:00:18 haze kernel:  [<ffffffff8158beae>] system_call_fastpath+0x12/0x71
Aug 04 23:00:18 haze kernel: Code:  Bad RIP value.
Aug 04 23:00:18 haze kernel: RIP  [<00000000ffd61000>] 0xffd61000
-- Reboot --


I'm happy to provide any useful information, although this is the first time I've filed a bug against the kernel. This crash is reliably reproducible, so just ask and ye shall receive. Distribution is Arch.
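
For readers comparing this oops with their own, the trace above can be reduced to a list of symbols. In the excerpt, libvirtd (PID 523) writes to a driver's sysfs unbind attribute (unbind_store), the PCI device is released (pci_device_remove), and the runtime-PM idle path (rpm_idle) appears to end up executing at 0xffd61000, which matches CR2 and is reported as a bad RIP. The following is a minimal Python sketch for pulling the frames out of journal lines in this format; the function name parse_call_trace and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches one frame of a "Call Trace:" section as printed in the excerpt,
# e.g. " [<ffffffff813ff6c1>] rpm_idle+0x231/0x2a0".
# A "?" before the symbol marks a frame the kernel flags as unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_call_trace(lines):
    """Return (symbol, reliable) tuples for the first call trace in `lines`.

    Hypothetical helper: it assumes the 4.x-era oops layout shown in this
    report and stops at the first line after "Call Trace:" that is not a frame.
    """
    frames = []
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # e.g. the "Code:  Bad RIP value." line ends the trace
        frames.append((match.group("symbol"), match.group("unreliable") is None))
    return frames
```

Feeding it the output of journalctl -k from the previous boot, split into lines, yields the frames innermost first, which makes it easier to check whether another crash goes through the same unbind and runtime-PM path.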
Comment 1 Will Marler 2015-08-11 01:44:38 UTC
No longer reproduces. Not sure what changed.
Comment 2 Balázs László Batári 2015-10-25 21:15:35 UTC
I can - occasionally - reproduce it.

Exact same thing: QEMU-KVM with a Windows 10 guest and with GPU passed through.

My kernel version is 4.2.4
Qemu: 2.4.0.1
Libvirt: 1.2.20

To reproduce it:
 - First shutdown is normally fine
 - Second shutdown triggers it
 - Reboot is always fine
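
The pattern above could be exercised with a loop along these lines. This is only a sketch under assumptions: the domain name win10 is a guess based on the qemu-win10 machine name in the reporter's journal, the timeouts and settle time are arbitrary, and virsh is assumed to be on the PATH. Since a triggered crash takes the host down, it should only be run on a machine where that is acceptable.

```python
import subprocess
import time


def virsh(*args):
    """Run a virsh subcommand and return its stripped standard output."""
    result = subprocess.run(
        ["virsh", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def wait_for_state(domain, wanted, run=virsh, timeout=300, poll=2, sleep=time.sleep):
    """Poll `virsh domstate` until it reports `wanted` or `timeout` seconds pass."""
    waited = 0
    while waited <= timeout:
        if run("domstate", domain) == wanted:
            return True
        sleep(poll)
        waited += poll
    return False


def cycle(domain, count, run=virsh, settle=60, sleep=time.sleep):
    """Start and shut down `domain` up to `count` times.

    Returns the number of completed start/shutdown cycles. `settle` is an
    assumed delay that lets the guest boot far enough to honor the ACPI
    shutdown request sent by `virsh shutdown`.
    """
    completed = 0
    for _ in range(count):
        run("start", domain)
        if not wait_for_state(domain, "running", run=run, sleep=sleep):
            break
        sleep(settle)
        run("shutdown", domain)
        if not wait_for_state(domain, "shut off", run=run, sleep=sleep):
            break
        completed += 1
    return completed
```

Calling cycle("win10", 2) would attempt the two shutdowns described in the list; if the host survives, the return value shows how many cycles completed.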

Oh, and it is not a "hard freeze": if I'm on TeamSpeak3, for example, the others can still hear me and I can hear them, but the keyboard, mouse, and screen stop working.

I couldn't copy out the dmesg, but I made screenshots:
 https://goo.gl/photos/fqK7Z5gom9FxwVfe9
 https://goo.gl/photos/CwtSX4MBktmEP7J57

The strange thing is that the reporter and I have very similar motherboards (Z87 Pro3 vs Z87 Pro4); maybe it's something with Asus.
Comment 3 Michael Long 2015-12-13 10:10:37 UTC
Created attachment 197251
Backtrace taken with kernel 4.3.0
Comment 4 Michael Long 2015-12-13 10:11:15 UTC
Created attachment 197261
Backtrace taken with kernel 4.4-rc
Comment 5 Michael Long 2015-12-13 10:12:01 UTC
I'm also affected by this problem but haven't yet found a pattern that reproduces it.

My observation is that with earlier kernel versions the problem occurred on roughly every 10th shutdown, but with kernel 4.3 it occurs on roughly every 3rd, although sometimes I can shut down and start the Windows 10 VM (still build 10240) several times in a row without issue.

Kernel versions: 4.1.x - 4.4-rcX
Qemu: 2.4.0, 2.4.1

Pass through devices:

VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)
USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03)

I've attached two stack traces, captured over a serial connection. I can confirm Balázs' observation that the system is not hard locked, because the watchdog kicks in over and over until a reset.