Bug 215351

Summary: Kernel OOPS in vmx.c starting kvm problem began in 5.15.7 works in 5.15.6
Product: Other Reporter: George Shearer (doc)
Component: OtherAssignee: Virtual assignee user for helpdesk (kernelorg-helpdesk)
Status: RESOLVED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: regressions
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:

Description George Shearer 2021-12-17 20:13:35 UTC
KVM stopped working on two of my x86_64 systems as of version 5.15.7. Downgrading to 5.15.6 resolves all issues with no other changes.

Attemping to boot kvm produces:
kvm_set_irq: Input/output error

kernel log:
[ 1270.750295] ------------[ cut here ]------------
[ 1270.750301] WARNING: CPU: 10 PID: 2523 at arch/x86/kvm/vmx/vmx.c:6247 vmx_sync_pir_to_irr+0x3a/0x120 [kvm_intel]
[ 1270.750326] Modules linked in: tun bridge stp nct6775 llc hwmon_vid intel_rapl_msr intel_rapl_common iTCO_wdt intel_pmc_bxt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp intel_wmi_thunderbolt mxm_wmi coretemp kvm_intel kvm rapl intel_cstate intel_uncore raid0 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi md_mod lpc_ich snd_hda_intel i2c_i801 i2c_smbus snd_intel_dspcfg snd_intel_sdw_acpi wacom snd_usb_audio snd_hda_codec uvcvideo snd_usbmidi_lib snd_rawmidi snd_hda_core videobuf2_vmalloc snd_hwdep snd_seq_device videobuf2_memops snd_pcm mousedev nvidia_drm(POE) videobuf2_v4l2 nvidia_modeset(POE) snd_timer joydev videobuf2_common e1000e snd mei_me soundcore mei wmi v4l2loopback(OE) videodev mc nvidia_uvm(POE) nvidia(POE) mac_hid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
[ 1270.750432]  aesni_intel crypto_simd cryptd sr_mod xhci_pci cdrom xhci_pci_renesas vfio_pci vfio_pci_core irqbypass vfio_virqfd vfio_iommu_type1 vfio vfat fat nls_iso8859_1
[ 1270.750454] CPU: 10 PID: 2523 Comm: qemu-system-x86 Tainted: P           OE     5.15.10-arch1-1 #1 bbe3990b16b2d76240a936ebf8dc5ba666258542
[ 1270.750460] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
[ 1270.750463] RIP: 0010:vmx_sync_pir_to_irr+0x3a/0x120 [kvm_intel]
[ 1270.750478] Code: 04 25 28 00 00 00 48 89 44 24 08 31 c0 80 3d fc fb 09 ff 00 c7 44 24 04 00 00 00 00 75 47 48 8b 07 80 b8 04 c2 00 00 00 75 16 <0f> 0b 48 8b 3f be 04 03 00 00 c6 87 04 c2 00 00 01 e8 d0 71 fe fe
[ 1270.750481] RSP: 0018:ffffb57a43b6b9e0 EFLAGS: 00010246
[ 1270.750485] RAX: ffffb57a43f45000 RBX: ffffa03e3283c000 RCX: 0000000000000000
[ 1270.750488] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa03e3283c000
[ 1270.750491] RBP: ffffa03dca000000 R08: 0000000000000000 R09: 0000000000000000
[ 1270.750493] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb57a43b6bb40
[ 1270.750495] R13: ffffb57a43b6bb88 R14: 00007f9199ddb120 R15: ffffa03e3283c000
[ 1270.750499] FS:  00007f9199ddc640(0000) GS:ffffa05cdfc80000(0000) knlGS:0000000000000000
[ 1270.750502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1270.750505] CR2: 000056523dccf000 CR3: 0000000105e84006 CR4: 00000000003726e0
[ 1270.750508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1270.750511] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1270.750513] Call Trace:
[ 1270.750517]  <TASK>
[ 1270.750521]  kvm_arch_vcpu_ioctl+0xcd6/0x1560 [kvm 97b295b32edb47f6953359c3801fec0571c9fadd]
[ 1270.750631]  kvm_vcpu_ioctl+0x42e/0x6e0 [kvm 97b295b32edb47f6953359c3801fec0571c9fadd]
[ 1270.750694]  ? syscall_exit_to_user_mode+0x23/0x50
[ 1270.750702]  ? __fget_files+0x9c/0xd0
[ 1270.750712]  __x64_sys_ioctl+0x8b/0xd0
[ 1270.750717]  do_syscall_64+0x59/0x90
[ 1270.750721]  ? syscall_exit_to_user_mode+0x23/0x50
[ 1270.750727]  ? do_syscall_64+0x69/0x90
[ 1270.750730]  ? do_syscall_64+0x69/0x90
[ 1270.750733]  ? do_syscall_64+0x69/0x90
[ 1270.750736]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1270.750744] RIP: 0033:0x7f919d2b859b
[ 1270.750748] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48
[ 1270.750751] RSP: 002b:00007f9199ddb078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1270.750756] RAX: ffffffffffffffda RBX: 000000008400ae8e RCX: 00007f919d2b859b
[ 1270.750759] RDX: 00007f9199ddb120 RSI: ffffffff8400ae8e RDI: 0000000000000019
[ 1270.750782] RBP: 000056523db2db60 R08: 000056523cb2eac4 R09: 000056523cb2ea24
[ 1270.750785] R10: 000056523cb2e8c0 R11: 0000000000000246 R12: 00007f9199ddb120
[ 1270.750787] R13: 000000aa00000000 R14: 0000005500000000 R15: 000fff0000000000
[ 1270.750793]  </TASK>
[ 1270.750795] ---[ end trace e0f3b79e15ad0008 ]---

$ cat /proc/cmdline 
quiet splash vga=current nomodeset loglevel=0 rd.systemd.show_status=false rd.udev.log_priority=0 intremap=no_x2apic_optout mitigations=off root=/dev/vg00/root rw rootfstype=ext4 add_efi_memmap initrd=EFI\arch\intel-ucode.img initrd=EFI\arch\initramfs.img init=/usr/lib/systemd/systemd intel_iommu=on iommu=pt rd.luks.uuid=f863c926-fd06-4135-841b-a825f6aacb25 rd.luks.options=discard

$ cat /etc/modprobe.d/kvm.conf 
options kvm ignore_msrs=1
options kvm-intel nested=y ept=y enable_apicv=n

NOTE-- if I change that enable_apicv to 'y' -- my kvm instance boots but hangs when attempting to access SRV-IO PCI-E GPU.

$ grep 'model name' /proc/cpuinfo |head -1; uname -a
model name	: Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz
Linux docslinuxbox 5.15.10-arch1-1 #1 SMP PREEMPT Fri, 17 Dec 2021 11:17:37 +0000 x86_64 GNU/Linux

$ free
               total        used        free      shared  buff/cache   available
Mem:       132031320    19011304   110854572      110980     2165444   110007200
Swap:       16777212           0    16777212
Comment 1 George Shearer 2021-12-17 20:15:38 UTC
I believe this problem has something to do with this work:

https://www.spinics.net/lists/stable/msg519358.html
Comment 2 George Shearer 2021-12-17 20:17:12 UTC
QEMU command line (works fine in 5.15.6 and prior)

qemu-system-x86_64 \
-name sis \
-uuid 74631f72-af98-4530-a1aa-b7c2686dd4f0 \
-smbios type=1,serial=74631f72-af98-4530-a1aa-b7c2686dd4f0,uuid=74631f72-af98-4530-a1aa-b7c2686dd4f0,manufacturer=GeoIsAwesome\
-M q35,accel=kvm,vmport=off \
-nodefaults -no-user-config \
-msg timestamp=on \
-rtc base=utc,driftfix=slew \
-no-fd-bootchk \
-overcommit mem-lock=on,cpu-pm=off \
-cpu qemu64 \
-m 16G -mem-prealloc -mem-path /dev/hugepages \
-smp sockets=1,cores=8,threads=2 \
-device qxl-vga,id=video0 \
-device intel-hda,id=sound \
-device hda-micro,id=codec0,bus=sound.0,cad=0 \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2-ovmf/x64/OVMF_CODE.fd \
-drive if=pflash,format=raw,file=/mnt/fast/qemu/sis-nvram.raw\
-drive id=vdisk0,if=none,media=disk,cache=none,format=qcow2,aio=native,discard=unmap,file=/mnt/fast/qemu/sis-vdisk0-change.qcow2 \
-object iothread,id=iot0 \
-device virtio-blk-pci,id=scont0,drive=vdisk0,scsi=off,config-wce=off,iothread=iot0 \
-netdev tap,id=vnic0,script=no,downscript=no,br=vmbridge \
-device virtio-net-pci,netdev=vnic0,mac=02:62:03:DB:A7:F4 \
-device virtio-balloon-pci,id=balloon0 \
-object rng-random,filename=/dev/random,id=rng0 \
-device virtio-rng-pci,rng=rng0 \
-device ich9-usb-ehci1,id=ehci,multifunction=on \
-device ich9-usb-uhci1,masterbus=ehci.0,firstport=0,multifunction=on \
-device ich9-usb-uhci1,masterbus=ehci.0,firstport=2,multifunction=on \
-device ich9-usb-uhci1,masterbus=ehci.0,firstport=4,multifunction=on \
-device virtio-tablet-pci,id=tablet0 \
-monitor telnet:127.0.0.1:7002,server,nowait,nodelay \
-chardev spicevmc,id=charredir0,name=usbredir \
-chardev spicevmc,id=charredir1,name=usbredir \
-chardev spicevmc,id=charredir2,name=usbredir \
-chardev spicevmc,id=charredir3,name=usbredir \
-device usb-redir,chardev=charredir0,id=redir0 \
-device usb-redir,chardev=charredir1,id=redir1 \
-device usb-redir,chardev=charredir2,id=redir2 \
-device usb-redir,chardev=charredir3,id=redir3 \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtio-serial-pci,id=virtio-serial0 \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
-spice port=7102,addr=0.0.0.0,disable-ticketing=on,agent-mouse=on,seamless-migration=off,plaintext-channel=main,plaintext-channel=display,plaintext-channel=inputs,plaintext-channel=cursor,plaintext-channel=playback,plaintext-channel=record,plaintext-channel=usbredir,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=filter,playback-compression=off\
-display vnc=none \
--pidfile /run/dhqemu-sis.pid 
-daemonize
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2021-12-18 06:16:25 UTC
Hi, this is your Linux kernel regression tracker speaking.

5.15.10 is out and contains a fix for a KVM issue. Could you give it a try? If not, we should bring the report to the mailing lists, as outlined in 
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

And btw: have you tried to revert the change you pointed to above to see if it fixes the problem?
Comment 4 George Shearer 2021-12-18 16:07:00 UTC
(In reply to Thorsten Leemhuis from comment #3)
> Hi, this is your Linux kernel regression tracker speaking.
> 
> 5.15.10 is out and contains a fix for a KVM issue. Could you give it a try?
> If not, we should bring the report to the mailing lists, as outlined in 
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
> 
> And btw: have you tried to revert the change you pointed to above to see if
> it fixes the problem?

Yes, the problem persist in 5.15.10, here's a fresh cut/paste:

[  137.123571] ------------[ cut here ]------------
[  137.123575] WARNING: CPU: 1 PID: 3606 at arch/x86/kvm/vmx/vmx.c:6247 vmx_sync_pir_to_irr+0x3a/0x120 [kvm_intel]
[  137.123587] Modules linked in: tun rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache netfs intel_rapl_msr intel_rapl_common bridge stp nct6775 llc hwmon_vid iTCO_wdt intel_pmc_bxt iTCO_vendor_support x86_pkg_temp_thermal intel_wmi_thunderbolt mxm_wmi intel_powerclamp coretemp kvm_intel kvm rapl intel_cstate snd_hda_codec_realtek intel_uncore raid0 snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio md_mod snd_hda_intel i2c_i801 snd_intel_dspcfg lpc_ich i2c_smbus e1000e snd_intel_sdw_acpi snd_usb_audio snd_hda_codec wacom snd_hda_core uvcvideo snd_usbmidi_lib videobuf2_vmalloc videobuf2_memops snd_rawmidi snd_hwdep nvidia_drm(POE) videobuf2_v4l2 snd_seq_device mousedev nvidia_modeset(POE) joydev videobuf2_common snd_pcm snd_timer snd mei_me soundcore mei wmi v4l2loopback(OE) videodev mc nvidia_uvm(POE) nvidia(POE) mac_hid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm
[  137.123628]  rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd sr_mod xhci_pci cdrom xhci_pci_renesas vfio_pci vfio_pci_core irqbypass vfio_virqfd vfio_iommu_type1 vfio vfat fat nls_iso8859_1
[  137.123639] CPU: 1 PID: 3606 Comm: qemu-system-x86 Tainted: P           OE     5.15.10-arch1-1 #1 bbe3990b16b2d76240a936ebf8dc5ba666258542
[  137.123642] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
[  137.123643] RIP: 0010:vmx_sync_pir_to_irr+0x3a/0x120 [kvm_intel]
[  137.123648] Code: 04 25 28 00 00 00 48 89 44 24 08 31 c0 80 3d fc 4b 15 00 00 c7 44 24 04 00 00 00 00 75 47 48 8b 07 80 b8 04 c2 00 00 00 75 16 <0f> 0b 48 8b 3f be 04 03 00 00 c6 87 04 c2 00 00 01 e8 d0 c1 09 00
[  137.123649] RSP: 0018:ffffbb63c3cbf978 EFLAGS: 00010246
[  137.123650] RAX: ffffbb63c3d21000 RBX: ffff9597a7258000 RCX: 0000000000000000
[  137.123651] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9597a7258000
[  137.123652] RBP: ffff95970f04ec00 R08: 0000000000000000 R09: 0000000000000000
[  137.123653] R10: 0000000000000000 R11: 0000000000000000 R12: ffffbb63c3cbfad8
[  137.123654] R13: ffffbb63c3cbfb20 R14: 00007f4742c99120 R15: ffff9597a7258000
[  137.123655] FS:  00007f4742c9a640(0000) GS:ffff95b61fa40000(0000) knlGS:0000000000000000
[  137.123656] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.123657] CR2: 00007fd7827d0000 CR3: 0000000122638002 CR4: 00000000003726e0
[  137.123658] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  137.123659] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  137.123659] Call Trace:
[  137.123661]  <TASK>
[  137.123663]  kvm_arch_vcpu_ioctl+0xcd6/0x1560 [kvm 97b295b32edb47f6953359c3801fec0571c9fadd]
[  137.123712]  kvm_vcpu_ioctl+0x42e/0x6e0 [kvm 97b295b32edb47f6953359c3801fec0571c9fadd]
[  137.123732]  ? syscall_exit_to_user_mode+0x23/0x50
[  137.123736]  ? __fget_files+0x9c/0xd0
[  137.123741]  __x64_sys_ioctl+0x8b/0xd0
[  137.123743]  do_syscall_64+0x59/0x90
[  137.123744]  ? syscall_exit_to_user_mode+0x23/0x50
[  137.123746]  ? do_syscall_64+0x69/0x90
[  137.123747]  ? syscall_exit_to_user_mode+0x23/0x50
[  137.123749]  ? do_syscall_64+0x69/0x90
[  137.123750]  ? syscall_exit_to_user_mode+0x23/0x50
[  137.123751]  ? do_syscall_64+0x69/0x90
[  137.123752]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  137.123755] RIP: 0033:0x7f474a17959b
[  137.123757] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48
[  137.123758] RSP: 002b:00007f4742c99078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  137.123759] RAX: ffffffffffffffda RBX: 000000008400ae8e RCX: 00007f474a17959b
[  137.123760] RDX: 00007f4742c99120 RSI: ffffffff8400ae8e RDI: 0000000000000019
[  137.123761] RBP: 0000561c10142b60 R08: 0000561c0dbe5ac4 R09: 0000561c0dbe5a24
[  137.123762] R10: 0000561c0dbe58c0 R11: 0000000000000246 R12: 00007f4742c99120
[  137.123762] R13: 000000aa00000000 R14: 0000005500000000 R15: 000fff0000000000
[  137.123765]  </TASK>
[  137.123765] ---[ end trace 356a4adaddf3d772 ]---


I haven't had time to try to revert the change as these are both production systems and will require downtime. Maybe Sunday.
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2021-12-19 05:18:24 UTC
TWIMC: Forwarded it to the list to increase awareness:
https://lore.kernel.org/regressions/f1ea22d3-cff8-406a-ad6a-cb8e0124a9b4@leemhuis.info/

BTW: for almost all subsystems mail is the preferred way of reporting issues, as stated in https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html, which is prominently linked on the front page of bugzilla.kernel.org. Hence it's likely the best way to continue by mail (George, I CCed you).
Comment 6 George Shearer 2021-12-24 02:23:21 UTC
Hello everyone, the problem seems to be resolved in 5.15.11. I can no longer reproduce the issue.

Thank you and happy Holidays!