Same as a lot of other AMD users, the issue is present also on my laptop if booted with tsc=nowatchdog. Same CPU3 drift, and TSC marked as unstable on cold boot.

# chronyc tracking
Reference ID    : 877DA587 (ntp74.kashra-server.com)
Stratum         : 3
Ref time (UTC)  : Wed Jan 25 14:56:44 2023
System time     : 0.000261786 seconds fast of NTP time
Last offset     : -0.234086469 seconds
RMS offset      : 0.234086469 seconds
Frequency       : 2096.744 ppm slow
Residual freq   : -1438.992 ppm
Skew            : 2061.489 ppm
Root delay      : 0.027643852 seconds
Root dispersion : 0.172473431 seconds
Update interval : 64.6 seconds
Leap status     : Normal

# sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 53, firmware version: 0x000000a6
PFP feature version: 53, firmware version: 0x000000c2
CE feature version: 53, firmware version: 0x00000050
RLC feature version: 1, firmware version: 0x0000003c
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
RLCP feature version: 0, firmware version: 0x00000000
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 53, firmware version: 0x000001d4
IMU feature version: 0, firmware version: 0x00000000
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x21000095
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x17000031
TA DTM feature version: 0x00000000, firmware version: 0x12000013
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x27000005
SMC feature version: 0, program: 0, firmware version: 0x00375000 (55.80.0)
SDMA0 feature version: 41, firmware version: 0x00000028
VCN feature version: 0, firmware version: 0x05113000
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x01010023
TOC feature version: 0, firmware version: 0x00000000
MES_KIQ feature version: 0, firmware version: 0x00000000
MES feature version: 0, firmware version: 0x00000000
VBIOS version: 113-RENOIR-031
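A side note for anyone comparing SMC versions across machines: the dotted version in parentheses looks like the three low bytes of the hexadecimal value (0x00375000 reads as 55.80.0). Here is a minimal Python sketch of that decoding, assuming the byte layout holds for other versions too; the function name is made up for illustration and is not part of any amdgpu tooling.

```python
def decode_smc_version(raw):
    """Split a packed SMC firmware version into 'major.minor.patch'.

    Assumes one byte per component, as suggested by the
    0x00375000 (55.80.0) line in the firmware info dump.
    """
    major = (raw >> 16) & 0xFF
    minor = (raw >> 8) & 0xFF
    patch = raw & 0xFF
    return f"{major}.{minor}.{patch}"


print(decode_smc_version(0x00375000))  # 55.80.0
```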
Lenovo hardware? If so, you should go ask them to send you a BIOS fix. Also, can you upload dmesg from that machine? Thx.
Yes; this should be a duplicate of #216166 and the same fix for that applies here. Lenovo will need to provide it.
Nope, ASUS VivoBook TM420IA. Yea, I guessed that it was a firmware bug, but dealing with ASUS Support is the same as trying to carve water out of a rock, so I hoped there was some alternative route besides firmware updates.
I'm sorry to say but we looked at this in depth and unfortunately in this case there is nothing that can be done from Linux. I recall the problem only happens with "warm boot", and so you can get into the habit of "cold boot" your machine as a workaround until ASUS can get a fix for it in.
(In reply to Mario Limonciello (AMD) from comment #4)
> I'm sorry to say but we looked at this in depth and unfortunately in this
> case there is nothing that can be done from Linux.
>
> I recall the problem only happens with "warm boot", and so you can get into
> the habit of "cold boot" your machine as a workaround until ASUS can get a
> fix for it in.

Unfortunately this always happens, regardless of boot state. I'll try pestering ASUS support again, but I'm sure all I will get out of them is "It's not Windows, best of luck to you".

This is actually the second time that I got shafted by AMD firmware bugs. Earlier this year amd_sfh was changed and caused a regression on my machine: I lost autorotation on my 2-in-1. Somehow, the AMD engineer managed to get ASUS to provide me a fixed BIOS; ASUS sent me the build through email but never updated the build on their website.

I wrote to ASUS providing the email the engineer sent me and the BIOS build, since the USB-C port wasn't working on the new build, and requested a fixed version with both the sensor fix and the USB-C port fixed. All I got was: if it's not on the main site, it's the latest version. I do not know how he did it, but as of today I have a newer build than the site version, still with a non-working USB-C port.

Unfortunately, I can't recommend your products anymore to people, especially if using Linux. Too much stuff is left to vendors to fix, and all they do is forget that they need to actually update the vendor firmware instead of releasing "new" BIOSes with the same AGESA version of two years ago (yes, the 6 BIOSes released in the span of two years never had an updated AGESA version, only my custom build).

Sorry for the vent. If you can't do anything more, feel free to close this.

Marco.
/s/this year/last year/
*** This bug has been marked as a duplicate of bug 216166 ***
(In reply to Marco from comment #5)
> Unfortunately, I can't recommend your products anymore to people

The CPU vendor doesn't matter - it is the OEMs who don't care about their laptops supporting Linux properly. I, myself, wouldn't take ASUS hardware even for free, judging by past experience.

What you could do next time you buy, is search the net whether someone else has run Linux on that hw and what her/his feedback about it is. Or, if you can get your hands on the machine you wanna buy, you boot a live CD on it and stare at dmesg and see whether everything gets detected properly and so on.

I have a Zen2 laptop here from Lenovo and a perfectly fine TSC which is simply f***ed by the OEM BIOS. Guess what'll happen the next time I have to buy a Lenovo machine...

I'm sorry but this is the reality, unfortunately.
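For the "boot a live CD and stare at dmesg" step, a rough Python sketch of what one could look for with respect to this particular bug. It simply flags log lines that mention both "tsc" and "unstable"; the exact kernel wording varies between versions, so the match is deliberately loose, and the function name is hypothetical.

```python
def find_tsc_complaints(log_text):
    """Return kernel log lines that mention the TSC being unstable.

    The match is intentionally loose (both words anywhere on the line)
    because the exact message wording is an assumption here.
    """
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if "tsc" in lowered and "unstable" in lowered:
            hits.append(line.strip())
    return hits
```

Feed it the text of the kernel log, for example the captured output of `dmesg`; an empty list only means nothing matched, not that the TSC is guaranteed to be fine.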
(In reply to Borislav Petkov from comment #8)
> (In reply to Marco from comment #5)
> > Unfortunately, I can't recommend your products anymore to people
>
> The CPU vendor doesn't matter - it is the OEMs who don't care about their
> laptops supporting Linux properly. I, myself, wouldn't take ASUS hardware
> even for free, judging by past experience.
>
> What you could do next time you buy, is search the net whether someone else
> has run Linux on that hw and what her/his feedback about it is. Or, if you
> can get your hands on the machine you wanna buy, you boot a live CD on it
> and stare at dmesg and see whether everything gets detected properly and so
> on.
>
> I have a Zen2 laptop here from Lenovo and a perfectly fine TSC which is
> simply f***ed by the OEM BIOS. Guess what'll happen the next time I have to
> buy a Lenovo machine...
>
> I'm sorry but this is the reality, unfortunately.

If only the firmware was open, all of this would never happen. I would update my god damn AGESA version by hand, and the problem would be solved.

But no, gotta love closed, unsupported, and unreliable crap.

Unfortunately Intel had far fewer BIOS bugs on laptops with Linux compared with the plethora that I had with AMD, and yes, a lot of it is just in their shared codebase. Even if part of the blame is on the vendor side, AMD still has some part in it.

Maybe it's just coincidence that I had more luck with Intel and a too small sample size.

On desktop they are forced to update it, since you can replace the CPU freely, but on laptops it's better to make 400 models a month rather than even do the minimum required to make them work.

Marco.
I'm sorry for your situation. In general, I would suggest purchasing laptops that the manufacturer has certified to work well with a Linux OS vendor. Everything else, we (AMD) do the best we can to help the ecosystem, but the reality is that there are some things completely out of our control.
(In reply to Marco from comment #9)
> If only the firmware was open, all of this would never happen. I would
> update my god damn AGESA version by hand, and the problem would be solved.
>
> But no, gotta love closed, unsupported,and unreliable crap.

I'd love to have open firmware everywhere but it is not that easy. And the more you get involved in this, the more you realize that the rabbit hole doesn't end. Look at all those presentations from the open firmware folks.

> Maybe it's just coincidence that I had more luck with Intel and a too small
> sample size.

Yes, I think it is a coincidence because the kernel is full of fixes for BIOS bugs, regardless of vendors. And we do try to push back harder on those so that they do get fixed before they even reach OEM vendors but it's a win-lose battle.

Hell, it is 2023 and we still don't have a f*ckin' serial port on laptops so that we can get debug dump and you're talking about supporting Linux. Pff.

If you ask me, OEM vendors should have no opportunity to f*ck up the machine. Simply do your damn bling bling shiny clicky toy BIOS window and GTFO.

In the case of the TSC, they should not even have the opportunity to touch the TSC MSR. That thing should be read-only. End of story. But that ship has sailed long ago.

This is what I mean with the endless rabbit hole...

> On desktop they are forced to update it, since you can replace the CPU
> freely, but on laptop its better to make 400 models at month rather than
> even do the minimum required to make them work.

No, desktop is the same snafu.

The only difference is servers where the end customer says, but but, I want this fixed. And more often than not it does get fixed for obvious reasons...
(In reply to Mario Limonciello (AMD) from comment #10)
> I'm sorry for your situation. In general, I would suggest purchasing
> laptops that the manufacturer has certified to work well with a Linux OS
> vendor.
>
> Everything else, we (AMD) do the best we can to help the ecosystem, but the
> reality is that there are some things completely out of our control.

Yea, I know. I have one AMD desktop and a Steam Deck; if the vendor gives a damn, it usually just works. I got biased by the number of issues on this laptop compared to my previous brand (Clevo). However, thanks to the modularity of Clevo barebones, the older one was far easier to mod, even at BIOS level (I updated quite a lot of Intel microcode that way, and there was no signature on the UEFI capsule itself, good times).

You're good, usually, but on this machine it is one problem after another, unfortunately :S
(In reply to Borislav Petkov from comment #11) > (In reply to Marco from comment #9) > > If only the firmware was open, all of this would never happen. I would > > update my god damn AGESA version by hand, and the problem would be solved. > > > > But no, gotta love closed, unsupported,and unreliable crap. > > I'd love to have open firmware everywhere but it is not that easy. And the > more > you get involved in this, the more you realize that the rabbit hole doesn't > end. > Look at all those presentations from the open firmware folks. > > > Maybe it's just coincidence that I had more luck with Intel and a too small > > sample size. > > Yes, I think it is a coincidence because the kernel is full of fixes for BIOS > bugs, regardless of vendors. And we do try to push back harder on those so > that > they do get fixed before they even reach OEM vendors but it's a win-lose > battle. > > Hell, it is 2023 and we still don't have a f*ckin' serial port on laptops so > that we can get debug dump and you're talking about supporting Linux. Pff. > > If you ask me, OEM vendors should have no opportunity to f*ck up the machine. > Simply do your damn bling bling shiny clicky toy BIOS window and GTFO. > > In the case of the TSC, they should not even have the opportunity to touch > the > TSC MSR. That thing should be read-only. End of story. But that ship has > sailed > long ago. > > This is what I mean with the endless rabbit hole... > > > On desktop they are forced to update it, since you can replace the CPU > > freely, but on laptop its better to make 400 models at month rather than > > even do the minimum required to make them work. > > No, desktop is the same snafu. > > The only difference is servers where the end customer says, but but, I want > this > fixed. And more often than not it does get fixed for obvious reasons... 
Oh, no, not as much down as some OS firmware stuff, but if you don't want to support the platform just release the god damn build chain and sources of your modification and let me fix my own machine; that's all I was asking. I am not at the level to want to verify the silicon pattern printed on the silicon itself :P
> And the more you get involved in this, the more you realize that the rabbit
> hole doesn't end

To add to this - even on "open" firmware designs, you still end up with things like FSP that will NEVER open up. Sure, you can do a lot more with these designs, but more knobs don't immediately equate to being able to fix all the problems. You need proper debug tools, documentation, a good understanding of how different IP blocks interact, and knowledge of the bugs you need to dodge when you make solutions.

If tomorrow someone told me there is a source code drop available for all the firmware in your laptop, I still don't expect anyone but a minuscule number of people would be able to fix this TSC issue.

> And we do try to push back harder on those so that they do get fixed before they even reach OEM vendors but it's a win-lose battle.

At the end of the day, the OEM owns the product. Even if we offer firmware fixes for bugs, they might not roll them out. It sucks for everyone involved. When this happens, we do our best to at least offer a workaround in the kernel as well if it makes sense.

For example, this change is going into 6.2-rc for a suspend bug where multiple IRQs are active over s2idle:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/commit/?h=review-hans&id=8e60615e8932167057b363c11a7835da7f007106

You can see it has a guard that will only apply it up until a certain firmware version that we have the fix. If the OEM includes it, the workaround turns off and you have the right behavior. If the OEM doesn't, at least Linux will behave "better" than before.

If I had something like that to offer to you for the TSC I absolutely would, but I don't.

> The only difference is servers where the end customer says, but but, I want
> this fixed. And more often than not it does get fixed for obvious reasons...

The reality is that no matter the ecosystem the platform firmware comes from you need a team to be developing it and testing it. If you just churn out cookie cutter laptops and don't bother to test them with Linux while you're making them this is exactly where you end up.
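To illustrate the shape of such a version-guarded workaround, here is a minimal Python sketch. It is not the kernel code from the commit linked above (that is C in a platform driver); the names and the version number are placeholders invented for illustration.

```python
# Hypothetical placeholder: first firmware version assumed to carry the fix.
FIXED_IN = (1, 2, 3)


def needs_workaround(fw_version, fixed_in=FIXED_IN):
    """Apply the quirk only while the firmware predates the fixed version."""
    # Tuples compare element by element, so (1, 2, 2) < (1, 2, 3) is True.
    return tuple(fw_version) < tuple(fixed_in)


def resume_from_suspend(fw_version):
    """Pick the code path to run; both labels are illustrative only."""
    if needs_workaround(fw_version):
        return "workaround path"
    return "normal path"
```

The point of the guard is that the workaround switches itself off as soon as the firmware reports a version at or above the fixed one, so nothing has to be removed from the OS side later.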
(In reply to Mario Limonciello (AMD) from comment #14)
> You can see it has a guard that will only apply it up until a certain
> firmware version that we have the fix. If the OEM includes it, the
> workaround turns off and you have the right behavior. If the OEM doesn't,
> at least Linux will behave "better" than before.

That one is simple. If it involves touching a lot of code and it becomes real ugly and we have to support it forever, I don't have a problem to shoot it down. The kernel can't be the dumping ground for BIOS f*ckup fixes. At some point, they will have to face the music.

> The reality is that no matter the ecosystem the platform firmware comes from
> you need a team to be developing it and testing it. If you just churn out
> cookie cutter laptops and don't bother to test them with Linux while you're
> making them this is exactly where you end up.

Preach brother! :-)
According to responses in https://forums.lenovo.com/t5/Other-Linux-Discussions/Unusable-TSC-on-P14s-and-X13-with-the-latest-LTS-kernel/m-p/5064905?page=9#6031445 it seems like Lenovo has just started shipping ThinkPad firmware updates with SMC firmware version 55.93.0, e.g. T14 Gen 1 / P14s Gen 1 ver 1.44 on https://support.lenovo.com/us/en/downloads/ds544977-bios-update-utility-bootable-cd-for-windows-10-64-bit-thinkpad-t14-gen-1-types-20ud-20ue (which I don't see on LVFS yet; it will probably be added soon).

Is the required firmware update included in the AGESA packages that desktop motherboard manufacturers get? I've got a Renoir APU in an ASUS board which currently has ComboAM4v2 PI 1.2.0.8 and SMC firmware version 55.91.0 - any chance you could say what version I should be looking for or asking for to get this fix?
> any chance you could say what version I should be looking for or asking for > to get this fix? For desktop, ComboAM4v2 PI 1.2.0.9 picked up the fix.
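For anyone checking a desktop board against that answer, a small Python sketch that compares an AGESA string with ComboAM4v2 PI 1.2.0.9. It assumes the BIOS reports the string in the "ComboAM4v2 PI a.b.c.d" form used in this thread and that all components are numeric; strings with letter components are rejected rather than guessed at. The function names are made up for illustration.

```python
import re

# From the reply above: ComboAM4v2 PI 1.2.0.9 picked up the fix.
FIXED_PI = (1, 2, 0, 9)


def parse_pi_version(agesa):
    """Extract the dotted version after 'PI' as a tuple of ints."""
    match = re.search(r"PI\s+([0-9A-Za-z.]+)", agesa)
    if match is None:
        raise ValueError(f"no PI version found in {agesa!r}")
    # int() raises ValueError on non-numeric parts, which is intended:
    # this sketch does not try to order letter suffixes.
    return tuple(int(part) for part in match.group(1).split("."))


def has_tsc_fix(agesa):
    """True if the reported PI version is at or above the fixed one."""
    return parse_pi_version(agesa) >= FIXED_PI


print(has_tsc_fix("ComboAM4v2 PI 1.2.0.8"))  # False
```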