Bug 197843

Summary: native_calibrate_tsc(): possibly incorrect TSC frequency on newest Intel Skylake-X CPUs, i7-7820X in particular
Product: Timers
Reporter: Ivan (nekotekina)
Component: Interval Timers
Assignee: timers_interval-timers
Status: NEW
Severity: high
CC: piecuch
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.10 ~ 4.13
Subsystem:
Regression: No
Bisected commit-id:

Description Ivan 2017-11-11 00:24:05 UTC
Hello, I have Intel i7-7820X CPU and ASRock X299 Taichi motherboard.

I had various problems with this platform (broken sound, for example), which were fixed by setting the clocksource=hpet boot option. By default, the TSC frequency was calculated as 3.75 GHz; however, the real TSC frequency for this CPU seems to be 3.6 GHz. I arrived at this by simply observing RDTSC results before and after waiting for a significant period of time (several seconds). My calculations may be wrong, though; I can't be 100% sure.
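The measurement described above amounts to dividing the counter delta by the elapsed time, where the elapsed time has to come from a clock that does not itself depend on the TSC (HPET, for instance). A minimal sketch of that arithmetic follows; the function name and the sample readings are made up for illustration and are not values from this report.

```python
def estimate_tsc_hz(tsc_start: int, tsc_end: int, elapsed_s: float) -> float:
    """Estimate the counter frequency from two readings and the elapsed time.

    elapsed_s must be measured with a clock independent of the TSC,
    otherwise the result only confirms the kernel's own calibration.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (tsc_end - tsc_start) / elapsed_s


# Hypothetical readings: 18e9 ticks counted over 5 seconds.
print(estimate_tsc_hz(1_000_000_000, 19_000_000_000, 5.0))
```

A longer wait reduces the relative error introduced by the time it takes to read both clocks.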

I tried to figure out what the real TSC frequency for my CPU is. The native_calibrate_tsc() function uses a crystal frequency of 25000 kHz for Skylake-X, which yields 3.75 GHz. Other Skylake models use 24000 kHz. I modified native_calibrate_tsc() to use 24000 for Skylake-X, built the kernel, and the TSC frequency then computed to 3.6 GHz. This also fixed the issues, just as setting clocksource=hpet did.

Values from CPUID[0x15]: 2; 300; 0; --

The ratio is 300 / 2 = 150, so:

150 * 25 MHz = 3.75 GHz
150 * 24 MHz = 3.60 GHz
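The same arithmetic as a small sketch, assuming the values listed above are EAX, EBX and ECX of CPUID leaf 0x15 in that order (denominator, numerator, and crystal frequency, where 0 means the crystal frequency is not enumerated). The function name is hypothetical; this is not the kernel's code.

```python
def tsc_khz_from_cpuid15(denominator: int, numerator: int, crystal_khz: int) -> int:
    """TSC frequency in kHz: crystal frequency times the numerator/denominator ratio."""
    if denominator == 0 or numerator == 0:
        raise ValueError("CPUID.0x15 ratio is not enumerated")
    return crystal_khz * numerator // denominator


eax, ebx = 2, 300  # values from the report; ECX was 0, so a crystal value must be assumed

khz_25 = tsc_khz_from_cpuid15(eax, ebx, 25_000)  # with the 25 MHz assumption
khz_24 = tsc_khz_from_cpuid15(eax, ebx, 24_000)  # with the 24 MHz assumption

# Relative error if the crystal is really 24 MHz but 25 MHz is assumed.
error_pct = (khz_25 - khz_24) / khz_24 * 100

print(khz_25, khz_24, round(error_pct, 2))
```

If the crystal really runs at 24 MHz, assuming 25 MHz makes every TSC-derived interval wrong by 25/24, a bit over 4%.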

Does this mean that the newest desktop Skylake-X CPUs are not really supported yet, or is it an actual bug? I'm not attaching any patches or logs. I'm sorry if this is an invalid report; I'm completely new to these things.
Thanks.
Comment 1 bjascob 2017-12-23 16:39:28 UTC
I have a very similar issue with an ASUS Prime X299-A motherboard and i9-7940x CPU.  I'm seeing a 4% clock drift which causes audio output from VLC to be glitchy.  Unlike you, however, my kern.log says tsc detected the processor speed correctly at 3.1GHz.  The clock drift doesn't seem to be causing any instability and I've left my clocksource at the default (which is tsc per the kern.log).

To correct for the glitchy audio I'm using the adjtimex package / adjtimexconfig to apply an automatic correction to the RTC clock.  This works but it seems like a work-around, not a fix.  I'm curious if you've found a more permanent solution.  Have you reviewed Bug#197299  native_calibrate_tsc?  I'm wondering if this issue is related to that one or if this is something totally different.

As of 11/17, Intel pushed out a new microcode release. Have you tried it and seen any difference?
Comment 2 Ivan 2017-12-23 23:53:33 UTC
As far as I can see, the detected TSC frequency isn't written to kern.log by default. I don't think the CPU frequency is reported incorrectly for me either (it's always 3.6 GHz). But I do get the message "vboxdrv: TSC mode is Invariant, tentative frequency 3600000111 Hz", which initially printed something close to 3750000000 Hz until I applied the hack; this comes from vboxdrv, though.

Unfortunately I haven't found any better workaround.
Bug#197299 is interesting, but I can't judge whether it's a different issue or the same.

Yes, I tried this microcode release. No visible changes.
Comment 3 Krzysztof Piecuch 2019-09-27 16:23:10 UTC
Can you please check whether it's a regression introduced somewhere after the 4.4 kernel?

It happens on kernels 4.15 (and later, checked on 5.0.0 and 5.3.1) on Intel Xeon Gold 6146 and Intel Xeon Gold 6154. I'm forced to use HPET on those. Reverting back to 4.4.0 makes the clock stable on tsc.
Comment 4 bjascob 2019-09-27 17:02:49 UTC
FYI if this helps..
My particular issue (see above) was fixed with commit https://github.com/torvalds/linux/commit/b511203093489eb1829cb4de86e8214752205ac6.  This was committed 2018/01/25 and shows up in kernel 4.14.15 and the 4.15 kernel which I have in Ubuntu 18.04.  The fix is in arch/x86/kernel/tsc.c::native_calibrate_tsc(void)
Looking at this file in later kernels, there have been a lot of changes here since, and the latest version uses CPUID to determine the TSC frequency instead of a hard-coded value.

I'd suggest looking at native_calibrate_tsc in arch/x86/kernel/tsc.c for the various kernels you've tried and maybe you can see where the issue comes in.
Comment 5 Krzysztof Piecuch 2019-10-08 15:33:25 UTC
I have managed to bisect the bug:

aa297292d708e89773b3b2cdcaf33f01bfa095d8 is the first bad commit
commit aa297292d708e89773b3b2cdcaf33f01bfa095d8
Author: Len Brown <len.brown@intel.com>
Date:   Fri Jun 17 01:22:51 2016 -0400

    x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
    
    Skylake CPU base-frequency and TSC frequency may differ
    by up to 2%.
    
    Enumerate CPU and TSC frequencies separately, allowing
    cpu_khz and tsc_khz to differ.
    
    The existing CPU frequency calibration mechanism is unchanged.
    However, CPUID extensions are preferred, when available.
    
    CPUID.0x16 is preferred over MSR and timer calibration
    for CPU frequency discovery.
    
    CPUID.0x15 takes precedence over CPU-frequency
    for TSC frequency discovery.
    
    Signed-off-by: Len Brown <len.brown@intel.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/b27ec289fd005833b27d694d9c2dbb716c5cdff7.1466138954.git.len.brown@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

:040000 040000 4961fd66b14c79ad1e56f38f2d6e7468e420bc76 2ce45fca87b8444b3aa82ee60b6b739251249094 M	arch



For the sake of completeness, here's a complete bisection log:

git bisect start
# bad: [694d0d0bb2030d2e36df73e2d23d5770511dbc8d] Linux 4.8-rc2
git bisect bad 694d0d0bb2030d2e36df73e2d23d5770511dbc8d
# good: [b3afc4525a507f21e98cc7571ea8c3f28484241c] Linux 4.7.10
git bisect good b3afc4525a507f21e98cc7571ea8c3f28484241c
# good: [523d939ef98fd712632d93a5a2b588e477a7565e] Linux 4.7
git bisect good 523d939ef98fd712632d93a5a2b588e477a7565e
# bad: [f0c98ebc57c2d5e535bc4f9167f35650d2ba3c90] Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
git bisect bad f0c98ebc57c2d5e535bc4f9167f35650d2ba3c90
# bad: [0e06f5c0deeef0332a5da2ecb8f1fcf3e024d958] Merge branch 'akpm' (patches from Andrew)
git bisect bad 0e06f5c0deeef0332a5da2ecb8f1fcf3e024d958
# bad: [e65805251f2db69c9f67ed8062ab82526be5a374] Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad e65805251f2db69c9f67ed8062ab82526be5a374
# good: [dd9506954539dcedd0294a065ff0976e61386fc6] Merge tag 'hwmon-for-linus-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
git bisect good dd9506954539dcedd0294a065ff0976e61386fc6
# good: [7e4dc77b2869a683fc43c0394fca5441816390ba] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 7e4dc77b2869a683fc43c0394fca5441816390ba
# bad: [c410614c902531d1ce2e46aec8ac91aa4dc89968] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad c410614c902531d1ce2e46aec8ac91aa4dc89968
# good: [0f657262d5f99ad86b9a63fb5dcd29036c2ed916] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 0f657262d5f99ad86b9a63fb5dcd29036c2ed916
# good: [2d724ffddd958f21e2711b7400c63bdfee287d75] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 2d724ffddd958f21e2711b7400c63bdfee287d75
# good: [e99a0745bdf8a5f7e3126a686846af4aeb852cc9] x86/pci, x86/platform/intel_mid_pci: Remove duplicate power off code
git bisect good e99a0745bdf8a5f7e3126a686846af4aeb852cc9
# bad: [c48ec42d6eae08f55685ab660f0743ed33b9f22a] x86/tsc: Remove the unused check_tsc_disabled()
git bisect bad c48ec42d6eae08f55685ab660f0743ed33b9f22a
# good: [05680e7fa8a4e700e031a5e72cd8c18265f0031a] x86/tsc_msr: Correct Silvermont reference clock values
git bisect good 05680e7fa8a4e700e031a5e72cd8c18265f0031a
# good: [05680e7fa8a4e700e031a5e72cd8c18265f0031a] x86/tsc_msr: Correct Silvermont reference clock values
git bisect good 05680e7fa8a4e700e031a5e72cd8c18265f0031a
# good: [02c0cd2dcf7fdc47d054b855b148ea8b82dbb7eb] x86/tsc_msr: Remove irqoff around MSR-based TSC enumeration
git bisect good 02c0cd2dcf7fdc47d054b855b148ea8b82dbb7eb
# bad: [ff4c86635ee12461fd3bd911d7d5253394da8f9d] x86/tsc: Enumerate BXT tsc_khz via CPUID
git bisect bad ff4c86635ee12461fd3bd911d7d5253394da8f9d
# bad: [aa297292d708e89773b3b2cdcaf33f01bfa095d8] x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID
git bisect bad aa297292d708e89773b3b2cdcaf33f01bfa095d8
# first bad commit: [aa297292d708e89773b3b2cdcaf33f01bfa095d8] x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID


I tested these commits on Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz running on a Supermicro X11DPU-XLL v1.02 motherboard.

If there's any more information I can provide you with please feel free to ask for updates.
Comment 6 Krzysztof Piecuch 2019-12-03 13:54:57 UTC
I've managed to trace it down to Hyperspeed being turned on.

[Supermicro claims][1] that Hyperspeed impacts the base clock frequency.

I see that the code in 

`arch/x86/kernel/tsc.c:614:unsigned long native_calibrate_tsc(void)`

is not ideal: it uses CPUID leaf 0x15, which is fine, but if that doesn't work it falls back to leaf 0x16, which is documented as follows:

> * Data is returned from this interface in accordance with the processor's
> specification and does not reflect actual values. Suitable use of this data
> includes the display of processor information in like manner to the processor
> brand string and for determining the appropriate range to use when displaying
> processor information e.g. frequency history graphs. The returned information
> should not be used for any other purpose as the returned information does not
> accurately correlate to information / counters returned by other processor
> interfaces. 

If the 0x15 leaf doesn't work, we should fall back to the behaviour implemented for CPUs that have lower CPUID levels.
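A rough sketch of the selection order proposed here, written in Python only to illustrate the logic; it is not the kernel's implementation. The names `pick_tsc_khz` and `calibrate` are hypothetical, and `calibrate` stands in for whatever measurement-based calibration is used on CPUs without these leaves.

```python
from typing import Callable, Optional, Tuple


def pick_tsc_khz(
    cpuid15: Optional[Tuple[int, int, int]],
    calibrate: Callable[[], int],
) -> int:
    """Return the TSC frequency in kHz.

    cpuid15 is (denominator, numerator, crystal_hz) from leaf 0x15,
    or None if the leaf is not available.
    """
    if cpuid15 is not None:
        denominator, numerator, crystal_hz = cpuid15
        # Leaf 0x15 is only usable when all three fields are enumerated.
        if denominator and numerator and crystal_hz:
            return crystal_hz // 1000 * numerator // denominator
    # Leaf 0x15 gave no usable answer: do not derive the frequency from
    # leaf 0x16 (informational only); measure it instead.
    return calibrate()


# Crystal enumerated: the frequency is computed from the leaf.
print(pick_tsc_khz((2, 300, 24_000_000), lambda: 0))
# Crystal not enumerated (ECX == 0): fall back to measurement.
print(pick_tsc_khz((2, 300, 0), lambda: 3_599_000))
```

The point of the design is that an unusable leaf 0x15 leads to a measured value rather than to one taken from the processor's nominal specification.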

[1]: https://www.supermicro.com/support/faqs/faq.cfm?faq=21337