Bug 216017

Summary: KVM: problem virtualization from kernel 5.17.9/5.18
Product: Virtualization Reporter: Alexey Boldyrev (ne-vlezay80)
Component: kvm Assignee: virtualization_kvm
Status: NEW ---    
Severity: high CC: borislav_ba, dongli.zhang, nutodafozo
Priority: P1 Keywords: opw
Hardware: AMD   
OS: Linux   
Kernel Version: 5.17.9-arch1-1, 5.18.0-arch1-1 Subsystem:
Regression: No Bisected commit-id:

Description Alexey Boldyrev 2022-05-23 08:48:46 UTC
QEMU periodically crashes with:

[root@router ne-vlezay80]# qemu-system-x86_64 -enable-kvm
qemu-system-x86_64: error: failed to set MSR 0xc0000104 to 0x100000000
qemu-system-x86_64: ../qemu-7.0.0/target/i386/kvm/kvm.c:2996: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)

Also, when running a virtual machine with -cpu host, the system freezes with a kernel panic.

Kernel version: 5.17.9
Distribution: Arch Linux
QEMU: 7.0
CPU: AMD Phenom X4
Arch: x86_64
Comment 1 nutodafozo 2022-05-23 09:05:46 UTC
KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled

This might have something to do with that commit.
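
For reference, MSR 0xc0000104 in the QEMU error message is MSR_AMD64_TSC_RATIO, the register named in the commit title above. The following is a minimal sketch, assuming the 8.32 fixed-point layout of the AMD TSC ratio register (bits 39:32 integer part, bits 31:0 fractional part); `decode_tsc_ratio` is a hypothetical helper, not a kernel or QEMU function. It suggests that 0x100000000 is simply a ratio of 1.0 (no scaling), so the write being rejected is not an unusual scaling factor.

```python
MSR_AMD64_TSC_RATIO = 0xC0000104  # MSR index taken from the QEMU error message


def decode_tsc_ratio(raw: int) -> float:
    """Decode a TSC ratio value.

    Assumed layout (8.32 fixed point): bits 39:32 hold the integer part,
    bits 31:0 hold the fractional part.
    """
    integer = (raw >> 32) & 0xFF
    fraction = raw & 0xFFFFFFFF
    return integer + fraction / 2**32


value = 0x100000000  # value QEMU failed to set
print(hex(MSR_AMD64_TSC_RATIO), decode_tsc_ratio(value))
```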
Comment 2 mlevitsk 2022-05-23 09:09:47 UTC
On Mon, 2022-05-23 at 08:48 +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216017
> 
>             Bug ID: 216017
>            Summary: KVM: problem virtualization from kernel 5.17.9
>            Product: Virtualization
>            Version: unspecified
>     Kernel Version: 5.17.9-arch1-1
>           Hardware: AMD
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Keywords: opw
>           Severity: high
>           Priority: P1
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: ne-vlezay80@yandex.ru
>         Regression: No
> 
> QEMU periodically crashes with:
> 
> [root@router ne-vlezay80]# qemu-system-x86_64 -enable-kvm
> qemu-system-x86_64: error: failed to set MSR 0xc0000104 to 0x100000000
> qemu-system-x86_64: ../qemu-7.0.0/target/i386/kvm/kvm.c:2996:
> kvm_buf_set_msrs:
> Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> Aborted (core dumped)

This is my fault. You can either revert the commit you found in qemu,
or update the kernel to 5.18.

> 
> Also, when running a virtual machine with -cpu host, the system freezes with a
> kernel panic.

Can you check if this happens with 5.18 as well? If so, try to capture the panic message.


Best regards,
	Maxim Levitsky

> 
> Kernel version: 5.17.9
> Distribution: Arch Linux
> QEMU: 7.0
> CPU: AMD Phenom X4
> Arch: x86_64
>
Comment 3 Alexey Boldyrev 2022-05-23 10:18:34 UTC
>Can you check if this happens with 5.18 as well? If so, try to capture the
>panic message.
The system is a router. There is no way to capture the panic message on it.
Comment 4 Alexey Boldyrev 2022-05-29 13:48:31 UTC
The situation with the bug has not changed on 5.18. As before, the system goes into a kernel panic.
Comment 5 Alexey Boldyrev 2022-05-29 16:10:46 UTC
However, on an AMD FX-6300 host, virtual machines seem to work under the 5.18 kernel. It looks like this bug only appears on the AMD Phenom X4.
Comment 6 Alexey Boldyrev 2022-05-29 20:22:16 UTC
(In reply to mlevitsk from comment #2)
> Can you check if this happens with 5.18 as well? If so, try to capture the
> panic message.

Oops message from KVM on kernel 5.18:
[  598.682995] BUG: kernel NULL pointer dereference, address: 000000000000000b
[  598.683020] #PF: supervisor write access in kernel mode
[  598.683031] #PF: error_code(0x0002) - not-present page
[  598.683041] PGD 0 P4D 0 
[  598.683053] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  598.683066] CPU: 2 PID: 13004 Comm: qemu-system-x86 Not tainted 5.18.0-arch1-1 #1 b71a70fe104889aac2f32556bc52f649da2881d2
[  598.683086] Hardware name: MSI MS-7715/870-C45(FX) V2 (MS-7715)  , BIOS V3.1 04/16/2012
[  598.683097] RIP: 0010:kvm_replace_memslot+0xc0/0x380 [kvm]
[  598.683315] Code: 04 00 00 48 85 c0 0f 84 3b 02 00 00 48 89 d9 48 c1 e1 04 48 01 c1 48 8b 71 08 48 85 f6 74 1e 48 8b 39 48 89 3e 48 85 ff 74 04 <48> 89 77 08 48 c7 01 00 00 00 00 48 c7 41 08 00 00 00 00 48 8d 0c
[  598.683334] RSP: 0018:ffffbe0bc851bd50 EFLAGS: 00010206
[  598.683346] RAX: ffff96da40977a00 RBX: 0000000000000000 RCX: ffff96da40977a00
[  598.683358] RDX: 0000000000000000 RSI: ffffbe0bc8509110 RDI: 0000000000000003
[  598.683368] RBP: ffff96da40977000 R08: 0000000000000200 R09: ffff96da40977000
[  598.683378] R10: 0000000000000000 R11: fffffffffffffff0 R12: 0000000000000000
[  598.683388] R13: 0000000000000000 R14: 0000000000000000 R15: ffffbe0bc8509000
[  598.683398] FS:  00007f52ef16a640(0000) GS:ffff96da6b880000(0000) knlGS:0000000000000000
[  598.683413] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  598.683424] CR2: 000000000000000b CR3: 00000001003e4000 CR4: 00000000000006e0
[  598.683437] Call Trace:
[  598.683448]  <TASK>
[  598.683457]  ? kmem_cache_alloc_trace+0x16b/0x300
[  598.683480]  kvm_set_memslot+0x2a5/0x4b0 [kvm db3c7a88bf101c39d9e215d66cd0ad42c132fef6]
[  598.683666]  kvm_vm_ioctl+0x33f/0xe90 [kvm db3c7a88bf101c39d9e215d66cd0ad42c132fef6]
[  598.683852]  ? __rseq_handle_notify_resume+0x321/0x480
[  598.683873]  __x64_sys_ioctl+0x91/0xc0
[  598.683889]  do_syscall_64+0x5f/0x90
[  598.683904]  ? exc_page_fault+0x74/0x170
[  598.683920]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  598.683935] RIP: 0033:0x7f52f0d07b1f
[  598.683947] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  598.683969] RSP: 002b:00007f52ef168fa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  598.683986] RAX: ffffffffffffffda RBX: 000000004020ae46 RCX: 00007f52f0d07b1f
[  598.683997] RDX: 00007f52ef169140 RSI: 000000004020ae46 RDI: 0000000000000008
[  598.684008] RBP: 00007f52ef169140 R08: 0000000000000000 R09: 0000000000000000
[  598.684019] R10: 00007f52d8000c00 R11: 0000000000000246 R12: 000055d6f8080810
[  598.684030] R13: 0000000000020000 R14: 00007f52ee800000 R15: 00000000000e0000
[  598.684047]  </TASK>
[  598.684054] Modules linked in: act_mirred cls_matchall sch_ingress iptable_security ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat dummy nf_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock rpcrdma rdma_cm iw_cm ib_cm ib_core cls_flower sch_htb tcp_bbr ifb veth ip6_gre ip6_tunnel tunnel6 bridge stp llc tun ip_gre ip_tunnel gre ip6table_raw xt_NETMAP ip6table_nat ip6t_rpfilter xt_DSCP ip6table_mangle ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ts_kmp xt_conntrack xt_string iptable_filter xt_MASQUERADE xt_nat iptable_nat xt_set xt_LOG nf_log_syslog xt_mark xt_TCPMSS xt_tcpudp xt_connmark nfnetlink_cttimeout xt_recent xt_dscp iptable_mangle openvswitch ip_set_hash_ip nsh nf_conncount ip_set_hash_net nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink btrfs blake2b_generic xor raid6_pq libcrc32c snd_hda_codec_realtek snd_hda_codec_generic ath9k ledtrig_audio
[  598.684264]  ath9k_common ath9k_hw snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ath snd_hda_codec nouveau edac_mce_amd mac80211 kvm_amd snd_hda_core ccp snd_hwdep libarc4 wmi_bmof mxm_wmi cfg80211 video kvm snd_pcm drm_ttm_helper irqbypass ttm pcspkr rfkill snd_timer r8169 snd rng_core realtek sp5100_tco k10temp soundcore mdio_devres i2c_piix4 e1000e libphy drm_dp_helper wmi mac_hid acpi_cpufreq wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel nfsd udp_tunnel auth_rpcgss dm_multipath nfs_acl dm_mod lockd grace sg sunrpc fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi pata_atiixp
[  598.684545] CR2: 000000000000000b
[  598.684601] ---[ end trace 0000000000000000 ]---
[  598.684613] RIP: 0010:kvm_replace_memslot+0xc0/0x380 [kvm]
[  598.684824] Code: 04 00 00 48 85 c0 0f 84 3b 02 00 00 48 89 d9 48 c1 e1 04 48 01 c1 48 8b 71 08 48 85 f6 74 1e 48 8b 39 48 89 3e 48 85 ff 74 04 <48> 89 77 08 48 c7 01 00 00 00 00 48 c7 41 08 00 00 00 00 48 8d 0c
[  598.684846] RSP: 0018:ffffbe0bc851bd50 EFLAGS: 00010206
[  598.684859] RAX: ffff96da40977a00 RBX: 0000000000000000 RCX: ffff96da40977a00
[  598.684871] RDX: 0000000000000000 RSI: ffffbe0bc8509110 RDI: 0000000000000003
[  598.684882] RBP: ffff96da40977000 R08: 0000000000000200 R09: ffff96da40977000
[  598.684894] R10: 0000000000000000 R11: fffffffffffffff0 R12: 0000000000000000
[  598.684905] R13: 0000000000000000 R14: 0000000000000000 R15: ffffbe0bc8509000
[  598.684916] FS:  00007f52ef16a640(0000) GS:ffff96da6b880000(0000) knlGS:0000000000000000
[  598.684931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  598.684942] CR2: 000000000000000b CR3: 00000001003e4000 CR4: 00000000000006e0
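
The `#PF` lines in the log above can be cross-checked against the raw error code. This is a small sketch using the architectural x86 page-fault error code bits; `decode_pf_error_code` is a hypothetical helper written for illustration.

```python
def decode_pf_error_code(code: int) -> dict:
    """Split an x86 page-fault error code into its architectural flag bits."""
    return {
        "present": bool(code & 0x01),            # 0 = not-present page, 1 = protection violation
        "write": bool(code & 0x02),              # 0 = read access, 1 = write access
        "user": bool(code & 0x04),               # 0 = supervisor mode, 1 = user mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in a paging structure
        "instruction_fetch": bool(code & 0x10),  # fault caused by an instruction fetch
    }


flags = decode_pf_error_code(0x0002)  # error_code(0x0002) from the oops
print(flags)
```

Only the write bit is set, which agrees with the log's "supervisor write access in kernel mode" and "not-present page" lines.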

CPU:
model name      : AMD Phenom(tm) II X4 965 Processor
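
As a reading aid for the oops above: the `Code:` line marks the first byte of the faulting instruction with `<48>`, so the instruction bytes are `48 89 77 08`. The sketch below decodes only this one encoding form (REX.W, opcode 0x89, ModRM with an 8-bit displacement and no SIB byte) and combines it with the RDI value from the register dump. `decode_mov_store_disp8` is a hypothetical helper and not a general disassembler.

```python
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_store_disp8(code: bytes):
    """Decode 'REX.W 89 /r' (MOV r/m64, r64) with mod=01 (disp8) and no SIB byte."""
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    assert rex == 0x48 and opcode == 0x89, "only REX.W + 0x89 is handled"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 0b01 and rm != 0b100, "only disp8 addressing without SIB is handled"
    disp8 = disp - 256 if disp >= 128 else disp  # sign-extend the 8-bit displacement
    return REGS64[reg], REGS64[rm], disp8


src, base, disp = decode_mov_store_disp8(bytes.fromhex("48897708"))
regs = {"rdi": 0x3, "rsi": 0xFFFFBE0BC8509110}  # values from the register dump
fault_addr = regs[base] + disp
print(f"mov %{src},{disp:#x}(%{base}) -> write to {fault_addr:#x}")
```

The computed address 0xb matches CR2 in the oops. This suggests, but does not prove, that the pointer held in RDI had the invalid value 0x3 when kvm_replace_memslot tried to write through it.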
Comment 7 Borislav Gerassimov 2022-05-30 14:05:28 UTC
I'm getting the exact same error on 5.17.9 and 5.18 too. The only difference is the hardware:
[   32.571294] Hardware name: Hewlett-Packard HP Z600 Workstation/0AE8h, BIOS 786G4 v03.61 03/05/2018
with this processor: Intel(R) Xeon(R) CPU X5550
Comment 8 Alexey Boldyrev 2022-05-30 14:15:53 UTC
If you build your kernel from source, you can apply this patch:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?h=queue&id=51c4476c00c110486a06aae7eb93dec622ed28ed