Bug 215351
Summary: | Kernel OOPS in vmx.c starting kvm problem began in 5.15.7 works in 5.15.6 | ||
---|---|---|---|
Product: | Other | Reporter: | George Shearer (doc) |
Component: | Other | Assignee: | Virtual assignee user for helpdesk (kernelorg-helpdesk) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | regressions |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | No | Bisected commit-id: |
Description
George Shearer
2021-12-17 20:13:35 UTC
I believe this problem has something to do with this work: https://www.spinics.net/lists/stable/msg519358.html QEMU command line (works fine in 5.15.6 and prior) qemu-system-x86_64 \ -name sis \ -uuid 74631f72-af98-4530-a1aa-b7c2686dd4f0 \ -smbios type=1,serial=74631f72-af98-4530-a1aa-b7c2686dd4f0,uuid=74631f72-af98-4530-a1aa-b7c2686dd4f0,manufacturer=GeoIsAwesome\ -M q35,accel=kvm,vmport=off \ -nodefaults -no-user-config \ -msg timestamp=on \ -rtc base=utc,driftfix=slew \ -no-fd-bootchk \ -overcommit mem-lock=on,cpu-pm=off \ -cpu qemu64 \ -m 16G -mem-prealloc -mem-path /dev/hugepages \ -smp sockets=1,cores=8,threads=2 \ -device qxl-vga,id=video0 \ -device intel-hda,id=sound \ -device hda-micro,id=codec0,bus=sound.0,cad=0 \ -drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2-ovmf/x64/OVMF_CODE.fd \ -drive if=pflash,format=raw,file=/mnt/fast/qemu/sis-nvram.raw\ -drive id=vdisk0,if=none,media=disk,cache=none,format=qcow2,aio=native,discard=unmap,file=/mnt/fast/qemu/sis-vdisk0-change.qcow2 \ -object iothread,id=iot0 \ -device virtio-blk-pci,id=scont0,drive=vdisk0,scsi=off,config-wce=off,iothread=iot0 \ -netdev tap,id=vnic0,script=no,downscript=no,br=vmbridge \ -device virtio-net-pci,netdev=vnic0,mac=02:62:03:DB:A7:F4 \ -device virtio-balloon-pci,id=balloon0 \ -object rng-random,filename=/dev/random,id=rng0 \ -device virtio-rng-pci,rng=rng0 \ -device ich9-usb-ehci1,id=ehci,multifunction=on \ -device ich9-usb-uhci1,masterbus=ehci.0,firstport=0,multifunction=on \ -device ich9-usb-uhci1,masterbus=ehci.0,firstport=2,multifunction=on \ -device ich9-usb-uhci1,masterbus=ehci.0,firstport=4,multifunction=on \ -device virtio-tablet-pci,id=tablet0 \ -monitor telnet:127.0.0.1:7002,server,nowait,nodelay \ -chardev spicevmc,id=charredir0,name=usbredir \ -chardev spicevmc,id=charredir1,name=usbredir \ -chardev spicevmc,id=charredir2,name=usbredir \ -chardev spicevmc,id=charredir3,name=usbredir \ -device usb-redir,chardev=charredir0,id=redir0 \ -device usb-redir,chardev=charredir1,id=redir1 \ -device usb-redir,chardev=charredir2,id=redir2 \ -device usb-redir,chardev=charredir3,id=redir3 \ -chardev spicevmc,id=charchannel0,name=vdagent \ -device virtio-serial-pci,id=virtio-serial0 \ -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \ -spice port=7102,addr=0.0.0.0,disable-ticketing=on,agent-mouse=on,seamless-migration=off,plaintext-channel=main,plaintext-channel=display,plaintext-channel=inputs,plaintext-channel=cursor,plaintext-channel=playback,plaintext-channel=record,plaintext-channel=usbredir,image-compression=auto_glz,jpeg-wan-compression=auto,zlib-glz-wan-compression=auto,streaming-video=filter,playback-compression=off\ -display vnc=none \ --pidfile /run/dhqemu-sis.pid -daemonize Hi, this is your Linux kernel regression tracker speaking. 5.15.10 is out and contains a fix for a KVM issue. Could you give it a try? If not, we should bring the report to the mailing lists, as outlined in https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html And btw: have you tried to revert the change you pointed to above to see if it fixes the problem? (In reply to Thorsten Leemhuis from comment #3) > Hi, this is your Linux kernel regression tracker speaking. > > 5.15.10 is out and contains a fix for a KVM issue. Could you give it a try? > If not, we should bring the report to the mailing lists, as outlined in > https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html > > And btw: have you tried to revert the change you pointed to above to see if > it fixes the problem? Yes, the problem persist in 5.15.10, here's a fresh cut/paste: [ 137.123571] ------------[ cut here ]------------ [ 137.123575] WARNING: CPU: 1 PID: 3606 at arch/x86/kvm/vmx/vmx.c:6247 vmx_sync_pir_to_irr+0x3a/0x120 [kvm_intel] [ 137.123587] Modules linked in: tun rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache netfs intel_rapl_msr intel_rapl_common bridge stp nct6775 llc hwmon_vid iTCO_wdt intel_pmc_bxt iTCO_vendor_support x86_pkg_temp_thermal intel_wmi_thunderbolt mxm_wmi intel_powerclamp coretemp kvm_intel kvm rapl intel_cstate snd_hda_codec_realtek intel_uncore raid0 snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio md_mod snd_hda_intel i2c_i801 snd_intel_dspcfg lpc_ich i2c_smbus e1000e snd_intel_sdw_acpi snd_usb_audio snd_hda_codec wacom snd_hda_core uvcvideo snd_usbmidi_lib videobuf2_vmalloc videobuf2_memops snd_rawmidi snd_hwdep nvidia_drm(POE) videobuf2_v4l2 snd_seq_device mousedev nvidia_modeset(POE) joydev videobuf2_common snd_pcm snd_timer snd mei_me soundcore mei wmi v4l2loopback(OE) videodev mc nvidia_uvm(POE) nvidia(POE) mac_hid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm [ 137.123628] rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd sr_mod xhci_pci cdrom xhci_pci_renesas vfio_pci vfio_pci_core irqbypass vfio_virqfd vfio_iommu_type1 vfio vfat fat nls_iso8859_1 [ 137.123639] CPU: 1 PID: 3606 Comm: qemu-system-x86 Tainted: P OE 5.15.10-arch1-1 #1 bbe3990b16b2d76240a936ebf8dc5ba666258542 [ 137.123642] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018 [ 137.123643] RIP: 0010:vmx_sync_pir_to_irr+0x3a/0x120 [kvm_intel] [ 137.123648] Code: 04 25 28 00 00 00 48 89 44 24 08 31 c0 80 3d fc 4b 15 00 00 c7 44 24 04 00 00 00 00 75 47 48 8b 07 80 b8 04 c2 00 00 00 75 16 <0f> 0b 48 8b 3f be 04 03 00 00 c6 87 04 c2 00 00 01 e8 d0 c1 09 00 [ 137.123649] RSP: 0018:ffffbb63c3cbf978 EFLAGS: 00010246 [ 137.123650] RAX: ffffbb63c3d21000 RBX: ffff9597a7258000 RCX: 0000000000000000 [ 137.123651] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9597a7258000 [ 137.123652] RBP: ffff95970f04ec00 R08: 0000000000000000 R09: 0000000000000000 [ 137.123653] R10: 0000000000000000 R11: 0000000000000000 R12: ffffbb63c3cbfad8 [ 137.123654] R13: ffffbb63c3cbfb20 R14: 00007f4742c99120 R15: ffff9597a7258000 [ 137.123655] FS: 00007f4742c9a640(0000) GS:ffff95b61fa40000(0000) knlGS:0000000000000000 [ 137.123656] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 137.123657] CR2: 00007fd7827d0000 CR3: 0000000122638002 CR4: 00000000003726e0 [ 137.123658] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 137.123659] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 137.123659] Call Trace: [ 137.123661] <TASK> [ 137.123663] kvm_arch_vcpu_ioctl+0xcd6/0x1560 [kvm 97b295b32edb47f6953359c3801fec0571c9fadd] [ 137.123712] kvm_vcpu_ioctl+0x42e/0x6e0 [kvm 97b295b32edb47f6953359c3801fec0571c9fadd] [ 137.123732] ? syscall_exit_to_user_mode+0x23/0x50 [ 137.123736] ? __fget_files+0x9c/0xd0 [ 137.123741] __x64_sys_ioctl+0x8b/0xd0 [ 137.123743] do_syscall_64+0x59/0x90 [ 137.123744] ? syscall_exit_to_user_mode+0x23/0x50 [ 137.123746] ? do_syscall_64+0x69/0x90 [ 137.123747] ? syscall_exit_to_user_mode+0x23/0x50 [ 137.123749] ? do_syscall_64+0x69/0x90 [ 137.123750] ? syscall_exit_to_user_mode+0x23/0x50 [ 137.123751] ? do_syscall_64+0x69/0x90 [ 137.123752] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 137.123755] RIP: 0033:0x7f474a17959b [ 137.123757] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48 [ 137.123758] RSP: 002b:00007f4742c99078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 137.123759] RAX: ffffffffffffffda RBX: 000000008400ae8e RCX: 00007f474a17959b [ 137.123760] RDX: 00007f4742c99120 RSI: ffffffff8400ae8e RDI: 0000000000000019 [ 137.123761] RBP: 0000561c10142b60 R08: 0000561c0dbe5ac4 R09: 0000561c0dbe5a24 [ 137.123762] R10: 0000561c0dbe58c0 R11: 0000000000000246 R12: 00007f4742c99120 [ 137.123762] R13: 000000aa00000000 R14: 0000005500000000 R15: 000fff0000000000 [ 137.123765] </TASK> [ 137.123765] ---[ end trace 356a4adaddf3d772 ]--- I haven't had time to try to revert the change as these are both production systems and will require downtime. Maybe Sunday. TWIMC: Forwarded it to the list to increase awareness: https://lore.kernel.org/regressions/f1ea22d3-cff8-406a-ad6a-cb8e0124a9b4@leemhuis.info/ BTW: for almost all subsystems mail is the preferred way of reporting issues, as stated in https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html, which is prominently linked on the front page of bugzilla.kernel.org. Hence it's likely the best way to continue by mail (George, I CCed you). Hello everyone, the problem seems to be resolved in 5.15.11. I can no longer reproduce the issue. Thank you and happy Holidays! |