Bug 216166
| Summary: | TSC marked unstable on AMD Ryzen 4750U | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Erik Quaeghebeur (bugzilla.kernel.org) |
| Component: | x86-64 | Assignee: | Mario Limonciello (AMD) (mario.limonciello) |
| Status: | RESOLVED DOCUMENTED | | |
| Severity: | normal | CC: | diego.ce, fkrueger, hanif.ariffin.4326, kh.bugreport, marco.rodolfi, mario.limonciello, mpearson-lenovo, nix.sasl |
| Priority: | P1 | | |
| Hardware: | AMD | | |
| OS: | Linux | | |
| URL: | https://forums.lenovo.com/t5/Other-Linux-Discussions/Unusable-TSC-on-P14s-and-X13-with-the-latest-LTS-kernel/m-p/5064905 | | |
| See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=202525 https://bugzilla.kernel.org/show_bug.cgi?id=216146 https://bugzilla.kernel.org/show_bug.cgi?id=216161 | | |
| Kernel Version: | 5.15.41 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Erik Quaeghebeur
2022-06-23 08:22:07 UTC
Can you please correlate this to cold/warm boot? Does it only happen in one scenario?

Also, can you please share /sys/kernel/debug/dri/0/amdgpu_firmware_info?

(In reply to Mario Limonciello (AMD) from comment #1)
> Also, can you please share /sys/kernel/debug/dri/0/amdgpu_firmware_info?

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 53, firmware version: 0x000000a6
PFP feature version: 53, firmware version: 0x000000c2
CE feature version: 53, firmware version: 0x0000004f
RLC feature version: 1, firmware version: 0x0000003c
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
MEC feature version: 53, firmware version: 0x000001d0
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x21000072
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x17000028, firmware version: 0x00000000
TA DTM feature version: 0x1200000e, firmware version: 0x00000000
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, firmware version: 0x00374700
SDMA0 feature version: 41, firmware version: 0x00000028
VCN feature version: 0, firmware version: 0x05111002
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x0101001f
TOC feature version: 0, firmware version: 0x00000000
VBIOS version: 113-RENOIR-026

(In reply to Mario Limonciello (AMD) from comment #1)
> Can you please correlate this to cold/warm boot? Does it only happen in one
> scenario?

That'll take some time, but I'll get back to you on this.

> SMC feature version: 0, firmware version: 0x00374700

OK thanks this is the one I needed. 55.69.0

> That'll take some time, but I'll get back to you on this.

Thanks, if it does only correlate with warm boot it's possible the firmware solution for https://bugzilla.kernel.org/show_bug.cgi?id=216146 could be ported for Renoir AGESA as well. If it happens on cold boot too, it's a different issue.

(In reply to Mario Limonciello (AMD) from comment #1)
> Can you please correlate this to cold/warm boot? Does it only happen in one
> scenario?

Happens on warm boot, but not on cold. (One sample each.) This is also mentioned in the thread I linked (but it is long and tedious). Also in that thread is the discussion that previously (earlier firmware), it happened on cold boot only and not on reboot. (I recall that this was the case for me as well at that time.)

Thanks, that gives me what I need. I'll raise some internal discussion to see if the root cause and solution for 216146 can be ported to Ryzen 4000 as well.

Follow up with new processor from: TSC marked unstable on AMD Ryzen 2200G
https://bugzilla.kernel.org/show_bug.cgi?id=216214

Now with AMD Ryzen Pro 5 4650G. TSC detected as unstable following warm reboots, including using UEFI setup before booting to OS. TSC detected as stable after cold boot straight to OS. Same across all boots.
Tested systems:
AMD Ryzen Pro 5 4650G
AsRock B450M Pro4-F R2.0
Bios P3.10 (latest) AGESA 1.2.0.6

OS tested:
Debian Sid: Kernel 5.18.5

Cold boot:
dmesg | egrep -i "tsc|hpet|clocksource"
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 3699.998 MHz processor
[ 0.004645] ACPI: HPET 0x00000000BA73E000 000038 (v01 ALASKA A M I 01072009 AMI 00000005)
[ 0.004675] ACPI: Reserving HPET table memory at [mem 0xba73e000-0xba73e037]
[ 0.012190] ACPI: HPET id: 0x10228201 base: 0xfed00000
[ 0.012259] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.053598] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[ 0.073624] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x6aaaa638e3b, max_idle_ns: 881590585313 ns
[ 0.230440] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.258905] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.258905] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
[ 0.259989] clocksource: Switched to clocksource tsc-early
[ 0.271068] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 0.399596] rtc_cmos 00:02: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[ 1.297651] tsc: Refined TSC clocksource calibration: 3699.998 MHz
[ 1.297661] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x6aaaa638e3b, max_idle_ns: 881590585313 ns
[ 1.469464] clocksource: Switched to clocksource tsc
[ 3.253295] SVM: TSC scaling supported

Warm reboot:
dmesg | egrep -i "tsc|hpet|clocksource"
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 3700.234 MHz processor
[ 0.004147] ACPI: HPET 0x00000000BA73E000 000038 (v01 ALASKA A M I 01072009 AMI 00000005)
[ 0.004177] ACPI: Reserving HPET table memory at [mem 0xba73e000-0xba73e037]
[ 0.011711] ACPI: HPET id: 0x10228201 base: 0xfed00000
[ 0.011779] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.053036] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[ 0.073060] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x6aac6578f53, max_idle_ns: 881590624688 ns
[ 0.229876] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.258559] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.258559] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
[ 0.259418] clocksource: Switched to clocksource tsc-early
[ 0.270569] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 0.399293] rtc_cmos 00:02: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[ 1.297089] tsc: Refined TSC clocksource calibration: 3725.784 MHz
[ 1.297099] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x6b68f6b3d46, max_idle_ns: 881590491450 ns
[ 1.481558] clocksource: Switched to clocksource tsc
[ 2.100147] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 2.100150] clocksource: 'hpet' wd_nsec: 519709138 wd_now: 1c02efb wd_last: 14ea372 mask: ffffffff
[ 2.100152] clocksource: 'tsc' cs_nsec: 516112236 cs_now: c4754c503 cs_last: bd4b74abd mask: ffffffffffffffff
[ 2.100153] clocksource: 'tsc' is current clocksource.
[ 2.100158] tsc: Marking TSC unstable due to clocksource watchdog
[ 2.100164] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 2.101590] clocksource: Checking clocksource tsc synchronization from CPU 1 to CPUs 0,9,11.
[ 2.281080] clocksource: Switched to clocksource hpet
[ 3.119889] SVM: TSC scaling supported

cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 52, firmware version: 0x000000a4
PFP feature version: 52, firmware version: 0x000000bc
CE feature version: 52, firmware version: 0x0000004f
RLC feature version: 1, firmware version: 0x0000003c
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
MEC feature version: 52, firmware version: 0x000001ca
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x2100005a
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700001f
TA DTM feature version: 0x00000000, firmware version: 0x12000009
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 0, firmware version: 0x00375b00 (55.91.0)
SDMA0 feature version: 41, firmware version: 0x00000028
VCN feature version: 0, firmware version: 0x0510e014
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x01010019
TOC feature version: 0, firmware version: 0x00000000
VBIOS version: 113-RENOIR-035

Didn't do cross-motherboard or cross-kernel tests for now. Can still do if asked, but changing processors contains some risk of damage.

Yes both are Ryzen 4000 and should adopt same root cause.

Yes. Just confirming with another manufacturers motherboard for you, and with desktop variant of Renoir.
Which is why I didn't do more cross-testing and just quickly tested with specific combination of hardware.

Can confirm the aforementioned issue and observation with AMD Ryzen 7 PRO 4750U (ThinkPad T14 Gen 1, 20UES00L00, BIOS 1.41):
> sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info|grep -i smc
SMC feature version: 0, program: 0, firmware version: 0x00375b00 (55.91.0)
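The dotted version that newer kernels print next to the SMC firmware word (for example `0x00375b00 (55.91.0)`) appears to be the three low bytes of that word read as decimal numbers. The following is a minimal sketch of that decoding, inferred only from the hex/dotted pairs quoted in this thread; the function name `decode_smc_version` is hypothetical and not a kernel or amdgpu API.

```python
def decode_smc_version(raw: int) -> str:
    """Split a 32-bit SMC firmware word into 'major.minor.revision'.

    Assumption: bits 23..16 are the major number, bits 15..8 the minor
    number and bits 7..0 the revision, as suggested by the pairs that
    appear in this bug report.
    """
    major = (raw >> 16) & 0xFF
    minor = (raw >> 8) & 0xFF
    revision = raw & 0xFF
    return f"{major}.{minor}.{revision}"


# Firmware words reported in this thread.
for word in (0x00374700, 0x00375B00, 0x00375D00):
    print(f"0x{word:08x} -> {decode_smc_version(word)}")
```

This is handy on older kernels whose `amdgpu_firmware_info` output shows only the hex word without the dotted form.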
(In reply to Frank Kruger from comment #9)
> Can confirm the aforementioned issue and observation with AMD Ryzen 7 PRO
> 4750U (ThinkPad T14 Gen 1, 20UES00L00, BIOS 1.41):
>
> > sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info|grep -i smc
> SMC feature version: 0, program: 0, firmware version: 0x00375b00 (55.91.0)

...openSUSE Tumbleweed with kernel 5.19.

*** Bug 216967 has been marked as a duplicate of this bug. ***

I can reproduce the issue here. Happens on warm boot.

Ryzen 4600G
Gigabyte A520M-DS3H (BIOS version F17b AGESA V2 1.2.0.8.)

# cat /sys/kernel/debug/dri/1/amdgpu_firmware_info | grep SMC
SMC feature version: 0, program: 0, firmware version: 0x00375b00 (55.91.0)

(In reply to Mario Limonciello (AMD) from comment #7)
> Yes both are Ryzen 4000 and should adopt same root cause.

I've confirmed with internal team the fix for Ryzen 5000 also works on Ryzen 4000.

> SMC feature version: 0, program: 0, firmware version: 0x00375b00 (55.91.0)

Yes, this version is affected. The fix should be in a newer version.

Same bug ryzen 3 3200u Lenovo L340-15API Kernel 6.3.3 Manjaro.
journalctl -k -b -2 -o short-monotonic | grep -i tsc
[ 0.000000] lenovo kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux-zen root=UUID=36b0852f-0fab-4c28-8bf4-c60d613cad31 rw tsc=directsync amdgpu.ppfeaturemask=0xfff7ffff pcie_aspm.policy=performance rootfstype=ext4 zswap.enabled=1 zswap.compressor=lz4hc apparmor=1 security=apparmor module_blacklist=sp5100_tco nmi_watchdog=0 nowatchdog watchdog=0 audit=0 amd_iommu=on iommu=pt random.trust_cpu=on mitigations=off retbleed=off resume=UUID=d2cc43e0-3ea0-47ab-a5cb-dcc3bb5084e0 udev.log_priority=3 preempt=full pcie_aspm.policy=performance btusb.enable_autosuspend=n 3
[ 0.000000] lenovo kernel: tsc: Fast TSC calibration using PIT
[ 0.000000] lenovo kernel: tsc: Detected 2595.004 MHz processor
[ 0.051230] lenovo kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux-zen root=UUID=36b0852f-0fab-4c28-8bf4-c60d613cad31 rw tsc=directsync amdgpu.ppfeaturemask=0xfff7ffff pcie_aspm.policy=performance rootfstype=ext4 zswap.enabled=1 zswap.compressor=lz4hc apparmor=1 security=apparmor module_blacklist=sp5100_tco nmi_watchdog=0 nowatchdog watchdog=0 audit=0 amd_iommu=on iommu=pt random.trust_cpu=on mitigations=off retbleed=off resume=UUID=d2cc43e0-3ea0-47ab-a5cb-dcc3bb5084e0 udev.log_priority=3 preempt=full pcie_aspm.policy=performance btusb.enable_autosuspend=n 3
[ 0.133230] lenovo kernel: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2567cc8bdeb, max_idle_ns: 440795324972 ns
[ 0.298271] lenovo kernel: clocksource: Switched to clocksource tsc-early
[ 0.375917] lenovo kernel: tsc: tsc_khz exported in sysfs
[ 1.373604] lenovo kernel: tsc: Refined TSC clocksource calibration: 2597.301 MHz
[ 1.374493] lenovo kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2570468ff78, max_idle_ns: 440795327844 ns
[ 1.375421] lenovo kernel: clocksource: Switched to clocksource tsc
[ 2.133603] lenovo kernel: clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 2.135072] lenovo kernel: clocksource: 'tsc' cs_nsec: 496013964 cs_now: 1c7d274766 cs_last: 1c305d6b2c mask: ffffffffffffffff
[ 2.135806] lenovo kernel: clocksource: Clocksource 'tsc' skewed -415749 ns (18446744073709 ms) over watchdog 'hpet' interval of 496429713 ns (496 ms)
[ 2.136517] lenovo kernel: clocksource: 'tsc' is current clocksource.
[ 2.137230] lenovo kernel: tsc: Marking TSC unstable due to clocksource watchdog
[ 2.137940] lenovo kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 2.140079] lenovo kernel: clocksource: Checking clocksource tsc synchronization from CPU 1 to CPUs 0,3.
[ 21.167957] lenovo kernel: vboxdrv: TSC mode is Invariant, tentative frequency 2595252515 Hz
[ 29.244899] lenovo kernel: kvm_amd: TSC scaling supported

(In reply to Mario Limonciello (AMD) from comment #14)
> (In reply to Mario Limonciello (AMD) from comment #7)
> > Yes both are Ryzen 4000 and should adopt same root cause.
>
> I've confirmed with internal team the fix for Ryzen 5000 also works on Ryzen
> 4000.
>
> > SMC feature version: 0, program: 0, firmware version: 0x00375b00 (55.91.0)
>
> Yes, this version is affected. The fix should be in a newer version.

Is there any news from your side? Thx.

> Is there any news from your side? Thx.
The fix is in newer versions than the one reported.
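The watchdog messages pasted in this thread compare how much time the TSC thinks has elapsed (`cs_nsec`) against what the HPET watchdog measured over the same interval (`wd_nsec`, or the "interval of ... ns" figure in newer kernels). As a rough sketch, the skew can be recomputed from those two numbers as shown below. The helper name `watchdog_skew` and the sample labels are hypothetical, and the kernel's actual rejection threshold is deliberately not modeled here because it differs between kernel versions.

```python
def watchdog_skew(cs_nsec: int, wd_nsec: int):
    """Return (skew in ns, skew in parts per million) of the clocksource
    relative to the watchdog over one check interval."""
    skew_ns = cs_nsec - wd_nsec
    skew_ppm = skew_ns / wd_nsec * 1_000_000
    return skew_ns, skew_ppm


# (cs_nsec, wd_nsec) pairs copied from the logs in this thread.
SAMPLES = {
    "Ryzen Pro 5 4650G, kernel 5.18.5": (516112236, 519709138),
    "Ryzen 3 3200U, kernel 6.3.3": (496013964, 496429713),
    "Ryzen 7 4800U, kernel 6.4.9": (519941146, 512193030),
}

for name, (cs, wd) in SAMPLES.items():
    skew_ns, skew_ppm = watchdog_skew(cs, wd)
    print(f"{name}: skew {skew_ns} ns ({skew_ppm:+.0f} ppm)")
```

For the two newer kernels the recomputed values (-415749 ns and 7748116 ns) agree with the "skewed ... ns" figures the kernel printed itself, which suggests the subtraction above mirrors what the log reports.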
(In reply to Mario Limonciello (AMD) from comment #17)
> > Is there any news from your side? Thx.
>
> The fix is in newer versions than the one reported.

The issue has been fixed for Ryzen 7 PRO 4750U as well, given "SMC feature version: 0, program: 0, firmware version: 0x00375d00 (55.93.0)". Thanks a lot for your efforts, including the AMD team!

Happens on

hbina@akarin ~> screenfetch (base)
hbina@akarin
OS: Linuxmint 21.1 vera
Kernel: x86_64 Linux 6.4.9-060409-generic
Uptime: 7m
Packages: 2798
Shell: fish 3.3.1
Resolution: 1920x1080
DE: GNOME
WM: Muffin
WM Theme: Linux Mint (Mint-Y)
GTK Theme: Mint-Y-Aqua [GTK2/3]
Icon Theme: Mint-Y-Aqua
Font: Ubuntu 10
Disk: 265G / 439G (64%)
CPU: AMD Ryzen 7 4800U with Radeon Graphics @ 16x 1.8GHz
GPU: RENOIR (renoir, LLVM 15.0.7, DRM 3.52, 6.4.9-060409-generic)
RAM: 4963MiB / 15396MiB

hbina@akarin ~> journalctl -k -b -2 -o short-monotonic | grep -i tsc (base)
[ 0.000000] akarin kernel: tsc: Fast TSC calibration failed
[ 0.028000] akarin kernel: tsc: PIT calibration matches HPET. 2 loops
[ 0.028000] akarin kernel: tsc: Detected 1796.669 MHz processor
[ 0.000006] akarin kernel: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x19e5dedc306, max_idle_ns: 440795217416 ns
[ 0.386335] akarin kernel: clocksource: Switched to clocksource tsc-early
[ 1.472059] akarin kernel: tsc: Refined TSC clocksource calibration: 1796.624 MHz
[ 1.472089] akarin kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19e5b473752, max_idle_ns: 440795288133 ns
[ 1.472386] akarin kernel: clocksource: Switched to clocksource tsc
[ 2.023964] akarin kernel: clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 2.023984] akarin kernel: clocksource: 'tsc' cs_nsec: 519941146 cs_now: a8a1ce440 cs_last: a526f10d8 mask: ffffffffffffffff
[ 2.023991] akarin kernel: clocksource: Clocksource 'tsc' skewed 7748116 ns (7 ms) over watchdog 'hpet' interval of 512193030 ns (512 ms)
[ 2.023998] akarin kernel: clocksource: 'tsc' is current clocksource.
[ 2.024017] akarin kernel: tsc: Marking TSC unstable due to clocksource watchdog
[ 2.024057] akarin kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 2.024426] akarin kernel: clocksource: Checking clocksource tsc synchronization from CPU 11 to CPUs 0,3,5,9,12,15.
[ 11.825083] akarin kernel: kvm_amd: TSC scaling supported

> Happens on
As mentioned above, you need to report your firmware for SMC.
# cat /sys/kernel/debug/dri/1/amdgpu_firmware_info | grep SMC
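When collecting this from several machines, the `grep SMC` step and the comparison against the firmware level discussed in this thread can be scripted. The sketch below is only illustrative: the names `smc_version`, `has_reported_fix` and `ASSUMED_FIXED` are hypothetical, and the threshold 55.93.0 is an assumption taken from the single Ryzen 7 PRO 4750U report above, not an official minimum version from AMD.

```python
import re
from typing import Optional, Tuple

# Matches both output styles seen in this thread, with or without
# the "program: 0," field and the trailing dotted version.
SMC_LINE = re.compile(
    r"^SMC feature version:.*firmware version:\s*0x([0-9a-fA-F]{8})",
    re.MULTILINE,
)

# Assumption: first version a commenter reported as fixed (4750U only).
ASSUMED_FIXED = (55, 93, 0)


def smc_version(firmware_info: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, revision) from amdgpu_firmware_info text."""
    match = SMC_LINE.search(firmware_info)
    if match is None:
        return None
    raw = int(match.group(1), 16)
    return ((raw >> 16) & 0xFF, (raw >> 8) & 0xFF, raw & 0xFF)


def has_reported_fix(firmware_info: str) -> Optional[bool]:
    """True if the SMC version is at least ASSUMED_FIXED, None if unknown."""
    version = smc_version(firmware_info)
    return None if version is None else version >= ASSUMED_FIXED


sample = (
    "VCN feature version: 0, firmware version: 0x0510e014\n"
    "SMC feature version: 0, program: 0, firmware version: 0x00375b00 (55.91.0)\n"
)
print(smc_version(sample), has_reported_fix(sample))
```

On a real system the text would come from reading `/sys/kernel/debug/dri/<N>/amdgpu_firmware_info` as root; the card index varies between machines (both `0` and `1` appear in this thread).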
Oh sorry, this is what it says:

root@akarin:/sys/kernel/debug/dri# cat 0/amdgpu_firmware_info | grep 'SMC'
SMC feature version: 0, program: 0, firmware version: 0x00374700 (55.71.0)

OK. That's too old, it doesn't pick up the fix. You'll need a BIOS upgrade.

I am already on the latest BIOS. This is the device that I am using
https://www.asus.com/my/displays-desktops/mini-pcs/pn-series/mini-pc-pn50/

PN-50 PN50-ASUS-0624 BIOS