Bug 217516
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | FAIL: TSC reference precision test when do hyperv_clock test of kvm unit test | | |
| Product | Virtualization | Reporter | Ethan Xie (zjuysxie) |
| Component | kvm | Assignee | virtualization_kvm |
| Status | NEW | | |
| Severity | high | CC | bagasdotme, bonzini, vkuznets |
| Priority | P3 | | |
| Hardware | Intel | | |
| OS | Linux | | |
| Kernel Version | | Subsystem | |
| Regression | No | Bisected commit-id | |
Description
Ethan Xie
2023-06-01 12:04:19 UTC
qemu version:

# rpm -qa | grep qemu-kvm
qemu-kvm-common-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-gluster-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-curl-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-iscsi-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-rbd-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-block-ssh-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64
qemu-kvm-core-4.2.0-59.module_el8.5.0+1063+c9b9feff.1.x86_64

(In reply to Ethan Xie from comment #0)
> # cat /etc/redhat-release
> CentOS Linux release 8.5.2111
>
> # uname -r
> 4.18.0-348.7.1.el8_5.x86_64
>

CentOS 8 has been EOLed in 2021. Can you test with Rocky Linux 9 instead (and preferably latest mainline kernel)?

(In reply to Bagas Sanjaya from comment #2)
> (In reply to Ethan Xie from comment #0)
> > # cat /etc/redhat-release
> > CentOS Linux release 8.5.2111
> >
> > # uname -r
> > 4.18.0-348.7.1.el8_5.x86_64
> >
>
> CentOS 8 has been EOLed in 2021. Can you test with Rocky Linux 9 instead
> (and preferably latest mainline kernel)?

I changed the OS to Rocky Linux 9 with a mainline Linux kernel, and the problem still reproduces:

# timeout -k 1s --foreground 90s /usr/libexec/qemu-kvm --no-reboot -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel x86/hyperv_clock.flat -cpu host,hv_time # -initrd /tmp/tmp.kFgWZZHuFw
qemu-kvm: warning: Machine type 'pc-i440fx-rhel7.6.0' is deprecated: machine types for previous major releases are deprecated
enabling apic
smp: waiting for 0 APs
paging enabled
cr0 = 80010011
cr3 = 1007000
cr4 = 20
PASS: MSR value after enabling
scale: 10624dd2e147ae1 offset: -18011
refcnt 257287, TSC 41a33e4, TSC reference 257293
refcnt 20257287 (delta 20000000), TSC 12e2025fc, TSC reference 20257293 (delta 20000000)
suspecting drift on CPU 0? delta = 103, acceptable [0, 52)
FAIL: TSC reference precision test
iterations/sec: 47953642
PASS: MSR value after disabling

# uname -r
6.4.0-rc5

# cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.0 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.0"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.0 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.0"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.0"

# cat /etc/redhat-release
Rocky Linux release 9.0 (Blue Onyx)

# rpm -qa | grep qemu-kvm
qemu-kvm-docs-7.2.0-14.el9_2.x86_64
qemu-kvm-common-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-gpu-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-gpu-pci-7.2.0-14.el9_2.x86_64
qemu-kvm-device-usb-host-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-vga-7.2.0-14.el9_2.x86_64
qemu-kvm-block-rbd-7.2.0-14.el9_2.x86_64
qemu-kvm-tools-7.2.0-14.el9_2.x86_64
qemu-kvm-device-usb-redirect-7.2.0-14.el9_2.x86_64
qemu-kvm-audio-pa-7.2.0-14.el9_2.x86_64
qemu-kvm-core-7.2.0-14.el9_2.x86_64
qemu-kvm-ui-opengl-7.2.0-14.el9_2.x86_64
qemu-kvm-ui-egl-headless-7.2.0-14.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64

linux from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master branch with commit:

commit 9561de3a55bed6bdd44a12820ba81ec416e705a7 (HEAD -> master, tag: v6.4-rc5, origin/master, origin/HEAD)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Jun 4 14:04:27 2023 -0400

    Linux 6.4-rc5

It seems this is just an unstable test. It measures the divergence between the MSR-based clock and the TSC page over one second and then expects delta to stay within the measured range over another two seconds. This works well for a completely idle system, but if tasks get scheduled out, rescheduled to a different CPU, and so on, the test fails. Widening the range helps, e.g.:

diff --git a/x86/hyperv_clock.c b/x86/hyperv_clock.c
index f1e7204a8ea9..57d25770a2d0 100644
--- a/x86/hyperv_clock.c
+++ b/x86/hyperv_clock.c
@@ -79,7 +79,7 @@ static void hv_clock_test(void *data)
 		min_delta = delta < min_delta ? delta : min_delta;
 		if (t < msr_sample) {
 			max_delta = delta > max_delta ? delta: max_delta;
-		} else if (delta < 0 || delta > max_delta * 3 / 2) {
+		} else if (delta < 0 || delta > max_delta * 1024) {
 			printf("suspecting drift on CPU %d? delta = %d, acceptable [0, %d)\n", smp_id(),
 			       delta, max_delta);
 			ok[i] = false;

but I wouldn't be surprised if on a busy system even '1024 * max_delta' is not going to be sufficient. Maybe we should make this a warning and not fail the whole test, as I don't see how we can make it reliable. Paolo (as you're the author), wdyt?
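
For reference, the "TSC reference" values in the log can be reproduced by hand. The sketch below assumes the standard Hyper-V reference TSC page formula, ((TSC * scale) >> 64) + offset, and assumes the log prints scale and TSC in hex and offset and refcnt in decimal. The function names are hypothetical, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reference time (100 ns units) from a raw TSC value,
 * using the Hyper-V TSC page formula ((tsc * scale) >> 64) + offset. */
static int64_t tsc_page_ref(uint64_t tsc, uint64_t scale, int64_t offset)
{
    /* 64x64 -> 128-bit multiply, then keep the upper 64 bits. */
    unsigned __int128 product = (unsigned __int128)tsc * scale;
    return (int64_t)(uint64_t)(product >> 64) + offset;
}

/* Checks both samples from the log above against the formula. */
static void check_log_samples(void)
{
    const uint64_t scale = 0x10624dd2e147ae1ULL; /* "scale:" line, read as hex */
    const int64_t offset = -18011;               /* "offset:" value, decimal */

    assert(tsc_page_ref(0x41a33e4ULL, scale, offset) == 257293);
    assert(tsc_page_ref(0x12e2025fcULL, scale, offset) == 20257293);
}
```

Both samples match the log, and the reference delta of 20000000 in 100 ns units is the two-second measurement window. The arithmetic of the TSC page itself therefore looks consistent here, and the failure comes from the delta check rather than from the scale/offset values.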
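
To make the numbers in the failing run concrete, here is a minimal sketch of the acceptance condition from the diff above, applied to the logged values (delta = 103, max_delta = 52). The helper `delta_in_range` and its `num`/`den` parameters are hypothetical and not part of kvm-unit-tests; the 64-bit integer types are an assumption as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the condition in hv_clock_test():
 * the sample is rejected when delta < 0 || delta > max_delta * num / den. */
static bool delta_in_range(int64_t delta, int64_t max_delta,
                           int64_t num, int64_t den)
{
    assert(max_delta >= 0 && num > 0 && den > 0);
    /* Same evaluation order as the original: multiply first, then divide. */
    return delta >= 0 && delta <= max_delta * num / den;
}
```

With the original 3/2 factor the effective upper limit is 52 * 3 / 2 = 78, so `delta_in_range(103, 52, 3, 2)` is false and the test fails; with the proposed factor, `delta_in_range(103, 52, 1024, 1)` is true. Judging by the printf arguments in the diff, the "acceptable [0, 52)" message prints max_delta itself rather than the scaled limit actually being enforced.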