Bug 217516

Summary: FAIL: TSC reference precision test when do hyperv_clock test of kvm unit test
Product: Virtualization Reporter: Ethan Xie (zjuysxie)
Component: kvmAssignee: virtualization_kvm
Status: NEW ---    
Severity: high CC: bagasdotme, bonzini, vkuznets
Priority: P3    
Hardware: Intel   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:

Description Ethan Xie 2023-06-01 12:04:19 UTC
when i do kvm unit test with -cpu host,hv_time it failed sometime(e.g. not everytime) with error info "FAIL: TSC reference precision"

detail test failed log:
# timeout -k 1s --foreground 90s /usr/libexec/qemu-kvm --no-reboot -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel x86/hyperv_clock.flat -cpu host,hv_time # -initrd /tmp/tmp.kFgWZZHuFw
enabling apic
smp: waiting for 0 APs
paging enabled
cr0 = 80010011
cr3 = 1007000
cr4 = 20
PASS: MSR value after enabling
scale: 10624e40e147ae1 offset: -96899
refcnt 1523884, TSC 1826d03c, TSC reference 1523889
refcnt 21523885 (delta 20000001), TSC 1422cb9da, TSC reference 21523888 (delta 19999999)
delta on CPU 0 was 3...86
PASS: TSC reference precision test
iterations/sec:  47979415
PASS: MSR value after disabling
[root@iZbp1dwjy9g3o14k8rot75Z kvm-unit-tests]# timeout -k 1s --foreground 90s /usr/libexec/qemu-kvm --no-reboot -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel x86/hyperv_clock.flat -cpu host,hv_time # -initrd /tmp/tmp.kFgWZZHuFw
enabling apic
smp: waiting for 0 APs
paging enabled
cr0 = 80010011
cr3 = 1007000
cr4 = 20
PASS: MSR value after enabling
scale: 10624e40e147ae1 offset: -94820
refcnt 1523422, TSC 181d1f72, TSC reference 1523427
refcnt 21523428 (delta 20000006), TSC 142230e40, TSC reference 21523432 (delta 20000005)
suspecting drift on CPU 0? delta = 75, acceptable [0, 45)
FAIL: TSC reference precision test
iterations/sec:  47972169
PASS: MSR value after disabling


os info:
# cat /etc/os-release
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"

# cat /etc/redhat-release
CentOS Linux release 8.5.2111

# uname -r
4.18.0-348.7.1.el8_5.x86_64

I use kvm unit test code with branch master at commit:
commit 02d8befe99f8205d4caea402d8b0800354255681 (HEAD -> master, origin/master, origin/HEAD)
Author: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Date:   Tue Apr 4 20:50:47 2023 +0200

    pretty_print_stacks: modify relative path calculation


cpu info:
# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
stepping	: 7
microcode	: 0x5003102
cpu MHz		: 3800.000
cache size	: 36608 KB
physical id	: 0
siblings	: 52
core id		: 0
cpu cores	: 26
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs taa itlb_multihit
bogomips	: 5000.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

[...]

processor	: 103
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
stepping	: 7
microcode	: 0x5003102
cpu MHz		: 3800.000
cache size	: 36608 KB
physical id	: 1
siblings	: 52
core id		: 29
cpu cores	: 26
apicid		: 123
initial apicid	: 123
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs taa itlb_multihit
bogomips	: 5006.84
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
Comment 1 Ethan Xie 2023-06-01 12:10:42 UTC
qemu version:
# rpm -qa | grep qemu-kvm
qemu-kvm-common-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-gluster-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-curl-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-iscsi-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-rbd-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-ssh-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-core-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
Comment 2 Bagas Sanjaya 2023-06-02 12:13:30 UTC
(In reply to Ethan Xie from comment #0)
> # cat /etc/redhat-release
> CentOS Linux release 8.5.2111
> 
> # uname -r
> 4.18.0-348.7.1.el8_5.x86_64
> 

CentOS 8 has been EOLed in 2021. Can you test with Rocky Linux 9 instead
(and preferably latest mainline kernel)?
Comment 3 Ethan Xie 2023-06-05 03:17:33 UTC
(In reply to Bagas Sanjaya from comment #2)
> (In reply to Ethan Xie from comment #0)
> > # cat /etc/redhat-release
> > CentOS Linux release 8.5.2111
> > 
> > # uname -r
> > 4.18.0-348.7.1.el8_5.x86_64
> > 
> 
> CentOS 8 has been EOLed in 2021. Can you test with Rocky Linux 9 instead
> (and preferably latest mainline kernel)?

I change the OS to Rocky Linux 9 with mainline linux kernel, it also can reproduce this problem:

# timeout -k 1s --foreground 90s /usr/libexec/qemu-kvm --no-reboot -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel x86/hyperv_clock.flat -cpu host,hv_time # -initrd /tmp/tmp.kFgWZZHuFw
qemu-kvm: warning: Machine type 'pc-i440fx-rhel7.6.0' is deprecated: machine types for previous major releases are deprecated
enabling apic
smp: waiting for 0 APs
paging enabled
cr0 = 80010011
cr3 = 1007000
cr4 = 20
PASS: MSR value after enabling
scale: 10624dd2e147ae1 offset: -18011
refcnt 257287, TSC 41a33e4, TSC reference 257293
refcnt 20257287 (delta 20000000), TSC 12e2025fc, TSC reference 20257293 (delta 20000000)
suspecting drift on CPU 0? delta = 103, acceptable [0, 52)
FAIL: TSC reference precision test
iterations/sec:  47953642
PASS: MSR value after disabling

# uname -r
6.4.0-rc5

# cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.0 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.0 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.0"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.0"

# cat /etc/redhat-release
Rocky Linux release 9.0 (Blue Onyx)

# rpm -qa | grep qemu-kvm
qemu-kvm-docs-7.2.0-14.el9_2.x86_64
qemu-kvm-common-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-gpu-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-gpu-pci-7.2.0-14.el9_2.x86_64
qemu-kvm-device-usb-host-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-vga-7.2.0-14.el9_2.x86_64
qemu-kvm-block-rbd-7.2.0-14.el9_2.x86_64
qemu-kvm-tools-7.2.0-14.el9_2.x86_64
qemu-kvm-device-usb-redirect-7.2.0-14.el9_2.x86_64
qemu-kvm-audio-pa-7.2.0-14.el9_2.x86_64
qemu-kvm-core-7.2.0-14.el9_2.x86_64
qemu-kvm-ui-opengl-7.2.0-14.el9_2.x86_64
qemu-kvm-ui-egl-headless-7.2.0-14.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64


linux from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master branch with commit:
commit 9561de3a55bed6bdd44a12820ba81ec416e705a7 (HEAD -> master, tag: v6.4-rc5, origin/master, origin/HEAD)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Jun 4 14:04:27 2023 -0400

    Linux 6.4-rc5
Comment 4 vkuznets 2023-06-12 09:20:49 UTC
It seems this is just an unstable test. It merges the divergence between MSR based clock and TSC page over one second and then expects delta to stay within the measured range over another two seconds. This works well for a completely idle system but if tasks get scheduled out, rescheduled to a different CPU,... the test fails. Widening the range help, e.g.:

diff --git a/x86/hyperv_clock.c b/x86/hyperv_clock.c
index f1e7204a8ea9..57d25770a2d0 100644
--- a/x86/hyperv_clock.c
+++ b/x86/hyperv_clock.c
@@ -79,7 +79,7 @@ static void hv_clock_test(void *data)
                min_delta = delta < min_delta ? delta : min_delta;
                if (t < msr_sample) {
                        max_delta = delta > max_delta ? delta: max_delta;
-               } else if (delta < 0 || delta > max_delta * 3 / 2) {
+               } else if (delta < 0 || delta > max_delta * 1024) {
                        printf("suspecting drift on CPU %d? delta = %d, acceptable [0, %d)\n", smp_id(),
                               delta, max_delta);
                        ok[i] = false;

but I wouldn't be surprised if on a busy system even '1024 * max_delta' is not going to be sufficient. Maybe we should make this a warning and not fail the whole test as I don't see
how we can make it reliable.

Paolo (as you're the author), wdyt?