Bug 208887

Summary: tsc calibration on Ryzen 5 3400G seems a bit off (800-850ppm slow)
Product: Platform Specific/Hardware
Reporter: James Ettle (james)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: NEW
Severity: normal
CC: alexdeucher, madlabman, nix.sasl, parag.lkml, pmenzel+bugzilla.kernel.org, rafael.ristovski, rm+bko
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.7.12
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output

Description James Ettle 2020-08-12 11:39:42 UTC
Created attachment 290841
dmesg output

When using the TSC clocksource on my Ryzen 5 3400G machine, chrony consistently reports the frequency as around 800-850ppm slow (just over a minute per day). For example:

Reference ID    : 1F1F4A23 (mail.spamassassin.cz)
Stratum         : 3
Ref time (UTC)  : Wed Aug 12 10:46:40 2020
System time     : 0.000391718 seconds slow of NTP time
Last offset     : -0.000167522 seconds
RMS offset      : 0.000463384 seconds
Frequency       : 819.625 ppm slow
Residual freq   : -0.001 ppm
Skew            : 0.078 ppm
Root delay      : 0.045108408 seconds
Root dispersion : 0.004234392 seconds
Update interval : 1042.0 seconds
Leap status     : Normal

Once the machine is up and running and chrony has applied its correction, it appears to keep good time. With HPET as the clocksource, the measured frequency offset is more like 20ppm.

The machine is running on an Asus Prime A320M-K motherboard with the latest firmware.
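The "just over a minute per day" figure is simply the frequency error multiplied by the length of a day: 819.625 ppm * 86400 s is about 70.8 s. A minimal C sketch of that conversion, assuming a constant frequency error; the function names are illustrative and do not come from chrony or the kernel:

```c
#include <assert.h>

/* Illustrative helper: seconds gained or lost over interval_s seconds by a
 * clock whose frequency is off by a constant ppm (parts per million). */
double drift_seconds(double ppm, double interval_s)
{
    assert(interval_s >= 0.0);
    return ppm * 1e-6 * interval_s;
}

/* The same, over one day (86400 s). */
double drift_per_day(double ppm)
{
    return drift_seconds(ppm, 86400.0);
}
```

By the same arithmetic, the ~20 ppm seen with HPET works out to about 1.7 s per day.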
Comment 1 Thomas Gleixner 2020-08-12 12:38:58 UTC
So looking at dmesg:

   tsc: Fast TSC calibration failed

that means that the early TSC calibration attempt via the magic PIT
algorithm does not work. That's either caused by some firmware SMI or
some PIT emulation issue in the hardware/ucode/whatever is involved.

Now the later more expensive calibration succeeds:

   tsc: PIT calibration matches HPET. 1 loops
   tsc: Detected 3692.975 MHz processor

and the recalibrated value late in the boot process becomes:

   tsc: Refined TSC clocksource calibration: 3696.144 MHz

which is a massive difference and it's still too slow according to
chrony.

Just for comparison from one of my Zen machines:

   tsc: Fast TSC calibration using PIT
   tsc: Detected 2245.833 MHz processor
   tsc: Refined TSC clocksource calibration: 2245.781 MHz

and from my laptop:

   tsc: Fast TSC calibration using PIT
   tsc: Detected 2294.524 MHz processor
   tsc: Refined TSC clocksource calibration: 2294.684 MHz

So assuming that the refined value is more correct we get the following
ppms between the initial and the refined one:

My Zen:       -23.15
My Laptop:     69.72

These two are in the expected range and close enough to reality.
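The ppm figures here are the relative difference between the two calibrations, taking the refined value as the reference: (refined - early) / refined * 10^6. A small C sketch reproducing them from the dmesg values (the function name is illustrative):

```c
#include <assert.h>

/* Relative difference between the early ("Detected") and the refined TSC
 * calibration in parts per million, relative to the refined value.
 * Positive means the refined frequency is higher than the early one. */
double calib_delta_ppm(double early_mhz, double refined_mhz)
{
    assert(early_mhz > 0.0 && refined_mhz > 0.0);
    return (refined_mhz - early_mhz) / refined_mhz * 1e6;
}
```

With the values quoted in this comment, calib_delta_ppm(2245.833, 2245.781) is about -23.15 (Zen), calib_delta_ppm(2294.524, 2294.684) about +69.7 (laptop) and calib_delta_ppm(3692.975, 3696.144) about +857.38 (Ryzen 5 3400G). A refined frequency that is too high by some fraction makes the kernel turn TSC ticks into too little time, so the clock runs slow by roughly the same ppm, which is consistent with chrony's ~820ppm slow reading.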

But on your machine the delta between the early calibration and the refined
one is 857.38ppm, and with the refined value in use chrony still sees the
clock running about 800ppm slow.

Now what's really odd is that you say:

> With HPET as the clocksource the measured offset is more like 20ppm.

And that's odd because both the initial calibration on your system and
the refined calibration are using HPET as a reference. The refined
calibration does:

       h0 = read_hpet();
       t0 = read_tsc();

       /* wait ~1 second */

       h1 = read_hpet();
       t1 = read_tsc();

and then we use the HPET frequency read from the hardware to calculate
the TSC frequency. The same HPET frequency is used to set up the
conversion value for HPET, which seems to be roughly correct according
to chrony.
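The calculation in that outline reduces to a ratio of counter deltas: TSC ticks divided by the elapsed time according to the HPET. A simplified user-space C sketch of just that arithmetic follows; it is not the kernel's implementation, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified sketch (not the kernel's code) of an HPET-referenced TSC
 * calibration following the read_hpet()/read_tsc() outline above.
 *   h0, h1  : HPET counter samples; the HPET counter on these machines is
 *             32 bits wide, so unsigned 32-bit subtraction handles a wrap.
 *   t0, t1  : TSC samples taken right after each HPET read.
 *   hpet_hz : HPET frequency reported by the hardware (the later logs in
 *             this thread show a 14.318180 MHz counter). */
double tsc_mhz_from_hpet(uint32_t h0, uint64_t t0,
                         uint32_t h1, uint64_t t1, double hpet_hz)
{
    uint32_t hpet_ticks = h1 - h0;   /* wraps modulo 2^32 */
    uint64_t tsc_ticks = t1 - t0;

    assert(hpet_ticks != 0 && hpet_hz > 0.0);

    double elapsed_s = (double)hpet_ticks / hpet_hz;  /* time per the HPET */
    return (double)tsc_ticks / elapsed_s / 1e6;       /* TSC rate in MHz */
}
```

Because the TSC rate is derived directly from the HPET rate, an error in the HPET frequency would show up proportionally in the TSC frequency, which is why a TSC ~820ppm slow next to an HPET that is only ~20ppm off is so puzzling.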

That does not make any sense at all; something is clearly going wrong
with the clock frequencies during boot on that machine.

I have no idea what is going wrong, nor what the kernel could do to
prevent this wreckage.

Since, as you say, the machine keeps time once chrony has fixed it up,
this seems to be some intermediate state during boot.

Just as a shot in the dark, could you test whether postponing the refined
TSC calibration by 10 seconds makes any difference? See patch below.

You might play with that value to find a spot where it actually makes a
difference (if at all).

Thanks,

        tglx
---
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 49d925043171..6dbaad3fec4c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1403,7 +1403,7 @@ static int __init init_tsc_clocksource(void)
 		return 0;
 	}
 
-	schedule_delayed_work(&tsc_irqwork, 0);
+	schedule_delayed_work(&tsc_irqwork, 10 * HZ);
 	return 0;
 }
 /*
Comment 2 James Ettle 2020-08-15 01:04:11 UTC
Another observation (pending investigation with the above patch): the behaviour seems to differ depending on whether the machine is cold-started or rebooted.

From a cold-start, things seem more in line with expectations:


$ dmesg|grep MHz
[    0.007000] tsc: Detected 3693.113 MHz processor
[    0.231743] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
[    1.689376] tsc: Refined TSC clocksource calibration: 3693.051 MHz

$ chronyc tracking
Reference ID    : A29FC87B (time.cloudflare.com)
Stratum         : 4
Ref time (UTC)  : Sat Aug 15 01:01:24 2020
System time     : 0.000044223 seconds slow of NTP time
Last offset     : +0.052436728 seconds
RMS offset      : 0.052436728 seconds
Frequency       : 17.992 ppm fast
Residual freq   : +5.192 ppm
Skew            : 85.760 ppm
Root delay      : 0.022605952 seconds
Root dispersion : 0.006309438 seconds
Update interval : 65.0 seconds
Leap status     : Normal


The ~3 MHz difference between the two calibrations, along with the ~820ppm slowdown originally reported, shows up when the machine is rebooted.

The weirdness deepens. I'll try your patch with a reboot hopefully in a week or so.
Comment 3 James Ettle 2020-09-16 19:41:24 UTC
Sorry for the massive delay in getting back to this. Turns out the cold-start case was just a fluke.

I tried this patch on 5.9-rc5 and rebooted several times, and in each case the frequencies reported by the two calibrations were consistent (within about 0.1-0.2 MHz). By comparison, on 5.8.8 without this patch I (more often than not) see the 3 MHz difference.

Still no idea on the cause. I've seen reports elsewhere of 'Fast TSC calibration failed' being caused by attached USB devices...

Maybe the kernel could detect suspiciously large frequency changes between the early and refined TSC calibrations?
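One way such a check could look, sketched in C under loud assumptions: compare the early and refined calibrations and warn when they disagree by more than a chosen threshold. Neither the function nor any threshold exists in the kernel; both are hypothetical. For scale, most of the well-behaved boots quoted in this thread differ by well under 100 ppm, while the bad warm boots differ by roughly 850 ppm (3400G) up to several thousand ppm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* HYPOTHETICAL sanity check, not an existing kernel feature: return true
 * (and print a warning) if the refined TSC calibration disagrees with the
 * early one by more than max_ppm. The threshold is the caller's choice. */
bool tsc_calib_suspicious(double early_mhz, double refined_mhz, double max_ppm)
{
    assert(early_mhz > 0.0 && refined_mhz > 0.0 && max_ppm >= 0.0);

    double ppm = (refined_mhz - early_mhz) / refined_mhz * 1e6;
    double abs_ppm = ppm < 0.0 ? -ppm : ppm;

    if (abs_ppm <= max_ppm)
        return false;

    fprintf(stderr,
            "tsc: early %.3f MHz and refined %.3f MHz differ by %+.1f ppm\n",
            early_mhz, refined_mhz, ppm);
    return true;
}
```

With an arbitrary 500 ppm threshold, the warm-boot pair 3692.975/3696.144 MHz is flagged while the cold-boot pair 3693.113/3693.051 MHz is not. What the kernel should do after flagging (recalibrate later, prefer HPET, or just warn) is left open here.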
Comment 4 James Ettle 2020-11-14 14:38:23 UTC
Additional information: I've changed the motherboard for this machine to an MSI Mortar Max with a B450 chipset (latest BIOS), and this shows the same inconsistent behaviour.

Things in common are the CPU, RAM (DOCP profile at 2933MHz) and SSD. Both boards are on AMD AGESA 1.0.0.6, but I think this bug predates that update (at least on the Asus board).
Comment 5 James Ettle 2021-03-20 19:54:01 UTC
[Mainly for reference/documentation.] Still happens with the latest board firmware (7B89v2C w/ AGESA 1.2.0.0) on the Mortar MAX B450. I tried a variety of (safe-looking) BIOS options with no effect; still no idea exactly what it might be doing early in a warm boot to upset the TSC calibration like that. The delay workaround above still works, giving a very tight TSC, typically < 10 ppm out.

I wonder how widespread this is with the 3400G/Zen+. Maybe I'm the only one who's noticed... If this is a board firmware issue, I haven't a clue what I'd report to the manufacturer, or how.
Comment 6 Roman Mamedov 2021-06-10 21:29:14 UTC
I appear to have a similar issue with Ryzen 3600 on ASUS Crosshair VI Hero.

"Fast TSC calibration" either fails or succeeds across reboots, reasons unknown.

On the current boot it failed, and:

  * using TSC as the clocksource results in a slow-running clock, roughly 10 seconds behind after 1 hour;
  * forcing HPET as the clocksource results in a fast-running clock, roughly a minute ahead after 12 hours.

Here's a log of recent reboots.

/var/log/dmesg:[    0.000000] Linux version 5.4.124-rm1+ (root@natsu.romanrm.net) (gcc version 8.3.0 (Debian 8.3.0-6)) #205 SMP Thu Jun 3 13:58:45 +05 2021
/var/log/dmesg:[    0.000000] tsc: Fast TSC calibration failed
/var/log/dmesg:[    0.028000] tsc: PIT calibration matches HPET. 1 loops
/var/log/dmesg:[    0.028000] tsc: Detected 3600.088 MHz processor
/var/log/dmesg:[    0.000002] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x33e4a67a12a, max_idle_ns: 440795324680 ns
/var/log/dmesg:[    0.258273] clocksource: Switched to clocksource tsc-early
/var/log/dmesg:[    1.568514] tsc: Refined TSC clocksource calibration: 3613.327 MHz
/var/log/dmesg:[    1.568524] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x341580e1c01, max_idle_ns: 440795277082 ns
/var/log/dmesg:[    1.568558] clocksource: Switched to clocksource tsc
/var/log/dmesg.0:[    0.000000] Linux version 5.4.124-rm1+ (root@natsu.romanrm.net) (gcc version 8.3.0 (Debian 8.3.0-6)) #205 SMP Thu Jun 3 13:58:45 +05 2021
/var/log/dmesg.0:[    0.000000] tsc: Fast TSC calibration using PIT
/var/log/dmesg.0:[    0.000000] tsc: Detected 3599.839 MHz processor
/var/log/dmesg.0:[    0.650464] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x33e3bb20977, max_idle_ns: 440795213615 ns
/var/log/dmesg.0:[    0.906493] clocksource: Switched to clocksource tsc-early
/var/log/dmesg.0:[    2.222806] tsc: Refined TSC clocksource calibration: 3599.998 MHz
/var/log/dmesg.0:[    2.222815] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33e451ab1a6, max_idle_ns: 440795278720 ns
/var/log/dmesg.0:[    2.222844] clocksource: Switched to clocksource tsc
/var/log/dmesg.1.gz:[    0.000000] Linux version 5.4.122-rm1+ (root@natsu.romanrm.net) (gcc version 8.3.0 (Debian 8.3.0-6)) #204 SMP Wed May 26 17:34:27 +05 2021
/var/log/dmesg.1.gz:[    0.000000] tsc: Fast TSC calibration failed
/var/log/dmesg.1.gz:[    0.028000] tsc: PIT calibration matches HPET. 1 loops
/var/log/dmesg.1.gz:[    0.028000] tsc: Detected 3600.063 MHz processor
/var/log/dmesg.1.gz:[    0.000002] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x33e48ecf16a, max_idle_ns: 440795232764 ns
/var/log/dmesg.1.gz:[    0.258393] clocksource: Switched to clocksource tsc-early
/var/log/dmesg.1.gz:[    1.568427] tsc: Refined TSC clocksource calibration: 3599.998 MHz
/var/log/dmesg.1.gz:[    1.568558] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33e451ab1a6, max_idle_ns: 440795278720 ns
/var/log/dmesg.1.gz:[    1.568583] clocksource: Switched to clocksource tsc
/var/log/dmesg.2.gz:[    0.000000] Linux version 5.4.121-rm1+ (root@natsu.romanrm.net) (gcc version 8.3.0 (Debian 8.3.0-6)) #203 SMP Sat May 22 16:56:49 +05 2021
/var/log/dmesg.2.gz:[    0.000000] tsc: Fast TSC calibration failed
/var/log/dmesg.2.gz:[    0.032000] tsc: PIT calibration matches HPET. 2 loops
/var/log/dmesg.2.gz:[    0.032000] tsc: Detected 3599.965 MHz processor
/var/log/dmesg.2.gz:[    0.000002] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x33e4321ce11, max_idle_ns: 440795227019 ns
/var/log/dmesg.2.gz:[    0.258408] clocksource: Switched to clocksource tsc-early
/var/log/dmesg.2.gz:[    1.564340] tsc: Refined TSC clocksource calibration: 3599.998 MHz
/var/log/dmesg.2.gz:[    1.564352] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33e451ab1a6, max_idle_ns: 440795278720 ns
/var/log/dmesg.2.gz:[    1.564392] clocksource: Switched to clocksource tsc
/var/log/dmesg.3.gz:[    0.000000] Linux version 5.4.119-rm1+ (root@natsu.romanrm.net) (gcc version 8.3.0 (Debian 8.3.0-6)) #201 SMP Fri May 14 22:19:31 +05 2021
/var/log/dmesg.3.gz:[    0.000000] tsc: Fast TSC calibration using PIT
/var/log/dmesg.3.gz:[    0.000000] tsc: Detected 3599.993 MHz processor
/var/log/dmesg.3.gz:[    0.648154] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x33e44c68b8d, max_idle_ns: 440795234678 ns
/var/log/dmesg.3.gz:[    0.906525] clocksource: Switched to clocksource tsc-early
/var/log/dmesg.3.gz:[    2.220617] tsc: Refined TSC clocksource calibration: 3599.998 MHz
/var/log/dmesg.3.gz:[    2.220747] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33e451ab1a6, max_idle_ns: 440795278720 ns
/var/log/dmesg.3.gz:[    2.220780] clocksource: Switched to clocksource tsc
/var/log/dmesg.4.gz:[    0.000000] Linux version 5.4.119-rm1+ (root@natsu.romanrm.net) (gcc version 8.3.0 (Debian 8.3.0-6)) #201 SMP Fri May 14 22:19:31 +05 2021
/var/log/dmesg.4.gz:[    0.000000] tsc: Fast TSC calibration using PIT
/var/log/dmesg.4.gz:[    0.000000] tsc: Detected 3599.844 MHz processor
/var/log/dmesg.4.gz:[    0.648464] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x33e3bfba92f, max_idle_ns: 440795278717 ns
/var/log/dmesg.4.gz:[    0.906875] clocksource: Switched to clocksource tsc-early
/var/log/dmesg.4.gz:[    2.220822] tsc: Refined TSC clocksource calibration: 3599.998 MHz
/var/log/dmesg.4.gz:[    2.220833] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33e451ab1a6, max_idle_ns: 440795278720 ns
/var/log/dmesg.4.gz:[    2.220872] clocksource: Switched to clocksource tsc
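To compare boots like these at a glance, the two calibration values can be pulled out of dmesg text. A small C sketch (function names are hypothetical) that scans a stream such as `dmesg` output; it matches the message text exactly as it appears in these logs, and other kernel versions may word it differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* If `line` contains `tag`, parse the number that follows it into *mhz. */
static bool parse_after(const char *line, const char *tag, double *mhz)
{
    const char *p = strstr(line, tag);
    return p != NULL && sscanf(p + strlen(tag), "%lf", mhz) == 1;
}

/* Scan dmesg text and keep the last "Detected" and "Refined" values seen.
 * Returns true only if both were found. */
bool read_tsc_calibrations(FILE *in, double *early_mhz, double *refined_mhz)
{
    char line[1024];
    bool have_early = false, have_refined = false;

    assert(in != NULL && early_mhz != NULL && refined_mhz != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_after(line, "tsc: Detected ", early_mhz))
            have_early = true;
        else if (parse_after(line, "tsc: Refined TSC clocksource calibration: ",
                             refined_mhz))
            have_refined = true;
    }
    return have_early && have_refined;
}
```

A small main() calling read_tsc_calibrations(stdin, ...) can be fed with `dmesg`. Applying the same relative difference Thomas used (refined minus early, over refined) gives roughly 3664 ppm for the first boot above (3600.088 vs 3613.327 MHz) and under 50 ppm for each of the others.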
Comment 7 James Ettle 2021-06-28 23:14:26 UTC
Just upgraded to BIOS 2.D0 05/17/2021 on my Mortar Max, with AGESA 1.2.0.2; the issue is still present. The best workaround still seems to be delaying the calibration by 1 second. I wonder, if I ask nicely, whether this could be added as a quirk...
Comment 8 James Ettle 2021-08-02 22:39:54 UTC
I've not been seeing this since 5.13.6 or maybe 5.13.7. Not sure, hopefully not a fluke. Exactly what change *could* be responsible is not something I can answer though given the suspect early boot interactions.
Comment 9 James Ettle 2021-09-19 00:02:17 UTC
Hmm... it seems this now picks tsc, then marks tsc as unstable and switches to hpet:

[    0.000000] tsc: Detected 3693.390 MHz processor
[    0.033049] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 3821924579961850 ns
[    0.077149] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[    0.087181] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x6a79e303a22, max_idle_ns: 881590710719 ns
[    0.213408] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 3822520892550000 ns
[    0.246818] PTP clock support registered
[    0.251987] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
[    0.253430] clocksource: Switched to clocksource tsc-early
[    0.261649] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.318820] rtc_cmos 00:02: setting system clock to 2021-09-18T23:20:26 UTC (1632007226)
[    0.696293] sched_clock: Marking stable (695958175, 322521)->(699829946, -3549250)
[    1.275199] tsc: Refined TSC clocksource calibration: 3693.061 MHz
[    1.275207] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x6a7777116fa, max_idle_ns: 881590883556 ns
[    1.303201] clocksource: Switched to clocksource tsc
[    1.794259] [drm] DM_PPLIB: values for F clock
[    1.794262] [drm] DM_PPLIB: values for DCF clock
[   22.092420] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[   22.092440] clocksource:                       'hpet' wd_nsec: 496346043 wd_now: 12ca74a7 wd_last: 125e03d3 mask: ffffffff
[   22.092446] clocksource:                       'tsc' cs_nsec: 497230355 cs_now: 1ffde0ac9c cs_last: 1f906ced06 mask: ffffffffffffffff
[   22.092451] clocksource:                       'tsc' is current clocksource.
[   22.092458] tsc: Marking TSC unstable due to clocksource watchdog
[   22.092482] sched_clock: Marking unstable (22092156166, 323186)<-(22096028445, -3549250)
[   22.093065] clocksource: Checking clocksource tsc synchronization from CPU 4 to CPUs 0,7.
[   22.093139] clocksource: Switched to clocksource hpet
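The watchdog message reports the same interval measured twice: wd_nsec by the watchdog clocksource (hpet) and cs_nsec by the clocksource under test (tsc), with wd_now/wd_last and cs_now/cs_last as the raw counter readings wrapped by mask. A C sketch of the arithmetic on those logged values; the names are illustrative, and the acceptance threshold the kernel applies is deliberately not reproduced here:

```c
#include <assert.h>
#include <stdint.h>

/* Counter delta with wraparound, as implied by the "mask" fields. */
uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}

/* Difference between the two interval measurements in ns.
 * Positive: the TSC counted more time than the HPET. */
int64_t skew_ns(int64_t wd_nsec, int64_t cs_nsec)
{
    return cs_nsec - wd_nsec;
}

/* The same difference in ppm of the watchdog interval. */
double skew_ppm(int64_t wd_nsec, int64_t cs_nsec)
{
    assert(wd_nsec > 0);
    return (double)skew_ns(wd_nsec, cs_nsec) / (double)wd_nsec * 1e6;
}
```

On the values above, the HPET advanced 7106772 ticks (about 0.496 s at 14.318180 MHz) while the TSC counted 884312 ns more than that, i.e. roughly 1782 ppm of skew.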
Comment 10 James Ettle 2021-09-19 20:28:37 UTC
(In reply to James Ettle from comment #8)
> I've not been seeing this since 5.13.6 or maybe 5.13.7. Not sure, hopefully
> not a fluke. Exactly what change *could* be responsible is not something I
> can answer though given the suspect early boot interactions.

Seems like it was a fluke, observed again with kernel-5.13.16
Comment 11 Nix\ 2021-12-20 00:00:29 UTC
With acpi=off clocksource=tsc on the kernel command line, the kernel selects tsc-early and does not mark it unstable, but the 4 cores of the Ryzen 3 3200U disappear and only one is shown.
Kernel 5.15.10 and 5.16-rc5
AMD Ryzen 3 3200U Lenovo Ideapad L340-15API

But I remember the same bug with an old Phenom 945 on Fedora 28, so is it an AMD firmware bug?
Comment 12 James Ettle 2021-12-20 10:30:21 UTC
I've not seen this in quite a while on the 3400G now (not in the 5.15 series). I haven't updated the BIOS in that time so it must either be due to a kernel code change or a microcode update. I have not done any bisection to identify which.
Comment 13 James Ettle 2022-01-18 22:25:08 UTC
Hmm... either that was a series of flukes or something changed between 5.15.14 and 5.16.1, where it has come back...
Comment 14 James Ettle 2022-01-20 08:37:45 UTC
OK, here's where it gets weird. I've just gone back to 5.15.14 and noticed that if the modules are stripped when installed, I get the bad TSC calibration (the two values are about 3 MHz apart). With unstripped modules in the initramfs, things are OK.

I can only guess at the moment that the extra time delay needed to load the bigger, unstripped initramfs is what ultimately causes the difference here. Bizarre...
Comment 15 James Ettle 2022-01-23 17:47:12 UTC
Just changed over to a 5700G on the same board:

Reference ID    : C3AB2B0C (ns1.thorcom.net)
Stratum         : 2
Ref time (UTC)  : Sun Jan 23 17:42:53 2022
System time     : 0.000024752 seconds slow of NTP time
Last offset     : -0.000034575 seconds
RMS offset      : 0.349540144 seconds
Frequency       : 6818.986 ppm slow
Residual freq   : -0.176 ppm
Skew            : 1.423 ppm
Root delay      : 0.015526842 seconds
Root dispersion : 0.001106061 seconds
Update interval : 65.2 seconds
Leap status     : Normal

This is pretty whacky!

[    0.000000] tsc: Detected 3792.742 MHz processor
[    1.386167] tsc: Refined TSC clocksource calibration: 3818.728 MHz

on 5.15.16 without the 1 s delay, on the same MSI B450 motherboard. Seriously odd... I wonder how prevalent this is?
Comment 16 James Ettle 2022-01-23 18:06:44 UTC
Addendum: I can confirm that the same fix, delaying the TSC work by 1 s, also works for the 5700G.
Comment 17 Alex Deucher 2022-01-24 20:13:49 UTC
The patch also seems to fix bug 202525.
Comment 18 James Ettle 2022-01-24 20:55:32 UTC
(In reply to Alex Deucher from comment #17)
> patch also seems to fix bug 202525.

I should clarify in case I get any wires crossed here...

1. With the 3400G, its TSC was never marked unstable but it did get the 800-900 ppm skew. This wasn't consistent, but it tended to happen more on warm reboots. I observed this over two different motherboards (Asus A320, MSI B450) and various BIOS and AGESA updates for each. The patch fixes the clock skew consistently.

2. I changed the 3400G for a 5700G on the MSI motherboard; everything else (RAM, peripherals, etc.) was the same. I had tsc=stable left in the kernel cmdline by mistake from some earlier experiments, which led to the wild skew in comment 15. When I removed that, the kernel detected the TSC as unstable and dropped to HPET, just as in bug 202525.

(At some point I'll grab a RAM kit and a micro-ATX case and get the 3400G back in action for testing on the A320 motherboard.)

So, executive summary:

3400G - experiences this bug;
5700G - actually has bug 202525

and tglx's patch from comment 1 fixes both for me (with a 1 * HZ delay).

If this really is a BIOS bug, I wouldn't really know where to start or with whom (AMD? Asus or MSI?).
Comment 19 paragw 2022-02-01 00:38:13 UTC
The patch from comment 1 fixes unstable TSC for me on Ryzen 7 4800U without any noticeable side effects - are there plans for this patch to be in mainline?
Comment 20 madlabman 2022-02-03 17:58:20 UTC
I have the same issue on a Ryzen 5800U, and the patch fixed it as well.
Comment 21 paragw 2022-02-05 02:06:12 UTC
I spoke too soon - now TSC gets disabled after boot with the patch - perhaps only a BIOS fix can cure it.

[   25.124848] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
[   25.124868] clocksource:                       'hpet' wd_nsec: 550182704 wd_now: 1586c308 wd_last: 150e8f19 mask: ffffffff
[   25.124874] clocksource:                       'tsc' cs_nsec: 549776164 cs_now: 16e3dfa6fa cs_last: 16a8f4c754 mask: ffffffffffffffff
[   25.124879] clocksource:                       'tsc' is current clocksource.
[   25.124890] tsc: Marking TSC unstable due to clocksource watchdog
[   25.124907] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Comment 22 paragw 2022-02-07 02:00:35 UTC
Btw, while compiling the patched kernel I also noticed for the first time that turbo boost was disabled by default on various Fedora and mainline kernels (5.15, 5.16 and mainline 5.17.x). /sys/devices/system/cpu/cpufreq/boost said 0, and cpumonitor / turbostat said the frequencies never went above 1.8GHz although the CPU is capable of 4.2GHz. Just adding a udev rule to write 1 to /sys/devices/system/cpu/cpufreq/boost got me some nice speedups.

I guess I will file a separate bug for that or is this normal for AMD CPUs to require manual turbo boost enable?
Comment 23 James Ettle 2022-02-07 08:39:10 UTC
(In reply to paragw from comment #22)
> I guess I will file a separate bug for that or is this normal for AMD CPUs
> to require manual turbo boost enable?

I've not seen any boost/scaling issues - please file a separate bug for this.
Comment 24 James Ettle 2022-02-12 15:32:44 UTC
(In reply to paragw from comment #21)
> I spoke too soon - now TSC gets disabled after boot with the patch - perhaps
> only a BIOS fix can cure it.

Likewise - just tried 5.19.6 on a 5700G with the 1s delay, warm reboot, tsc marked unstable. I see there's a BIOS update to AGESA 1.2.0.5 for my board but I haven't tried that yet. Looks like a related but different bug...
Comment 25 paragw 2022-02-13 04:05:00 UTC
FWIW I have been having better luck with the latest vanilla git plus the patch from comment #1 adjusted to 30 * HZ. TSC is certainly not 100% reliable, but close: over the last many reboots, only once did it declare TSC unstable and switch to HPET.

How is this level of flakiness even possible?! I guess the TSC check is timing-sensitive, and maybe the delay gives things time to settle before the kernel checks. (I also now have CPUFreq boost force-enabled; before, the CPU was stuck at 1.3-1.8GHz, and now it nicely scales up and down to 4.2GHz. Not sure if this has any bearing on TSC sync.)
Comment 26 paragw 2022-02-23 13:02:20 UTC
A delay of 30 * HZ has worked reliably for me over the past few days: it hasn't reverted to HPET across multi-day uptime, a few reboots, compiling several kernels and suspend/resume cycles.

Sounds like we need a quirk for Ryzen based systems to delay / recheck TSC stability?

paragw@pn50 ~ $ dmesg|grep tsc
[    0.000000] tsc: Fast TSC calibration failed
[    0.024000] tsc: PIT calibration matches HPET. 1 loops
[    0.024000] tsc: Detected 1796.701 MHz processor
[    0.000005] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e5fd27545, max_idle_ns: 440795288136 ns
[    0.288361] clocksource: Switched to clocksource tsc-early
[   31.076222] tsc: Refined TSC clocksource calibration: 1797.077 MHz
[   31.076236] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19e76067c3c, max_idle_ns: 440795293890 ns
[   31.076284] clocksource: Switched to clocksource tsc