Bug 217212
Summary: | AMD Ryzen fTPM stutter even after #216989 | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Branko Grubić (bitlord0xff) |
Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | 1138267643, Jason, manliodp, mario.limonciello, reach622, regressions |
Priority: | P1 | ||
Hardware: | AMD | ||
OS: | Linux | ||
See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=216989 | ||
Kernel Version: | 6.2.6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
tpm2_getcap properties-fixed (LENOVO IdeaPad 5 Pro 14ACN6)
change the value to see if it is defected |
Description
Branko Grubić
2023-03-17 18:39:55 UTC
Can you bisect? Does it still happen if the kernel is compiled with CONFIG_HW_RANDOM_TPM=n?

Um, since you are using a Lenovo product, I think there is an option in the BIOS that lets you completely turn off fTPM. The patch itself doesn't truly solve the problem; the goal is to help people who can't disable fTPM in the BIOS (like me). But if you don't have this option in your BIOS, feel free to report this bug.

Quick update: I installed Fedora 37 and updated it to kernel 6.2.6 on my ASUS system. I can confirm this patch works in the 6.2.6 Fedora kernel. I checked your "tpm2_getcap" file, and it looks like you have a newer firmware version that might not be recognized as "defected" by the patch. This needs some investigation.

Created attachment 303978
change the value to see if it is defected
Another quick update:
I grabbed Mario's patch and turned it into a simple C program that takes an fTPM version as input and reports whether it is included in the "defected" list.
It turns out your fTPM version is not included.
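A check of this kind can be sketched roughly as below. This is not the attached C program; it is a minimal Python sketch of the version comparison as recalled from the upstream patch. The manufacturer constant, the two families (3 and 6) and both threshold values are assumptions quoted from memory and should be verified against the kernel source before relying on them. The names `combine_version` and `is_listed_defective` are hypothetical.

```python
# Sketch of the fTPM version check. The constants below are assumptions
# recalled from the upstream kernel patch; verify them against the source.
AMD_MANUFACTURER = 0x414D4400  # "AMD\0", as reported in TPM2_PT_MANUFACTURER

# Per firmware family: first version assumed to contain the fix.
FIXED_FROM = {
    6: 0x0006000000180006,
    3: 0x0003005700000005,
}


def combine_version(fw_version_1: int, fw_version_2: int) -> int:
    """Join TPM2_PT_FIRMWARE_VERSION_1 and _2 into one 64-bit value."""
    return ((fw_version_1 & 0xFFFFFFFF) << 32) | (fw_version_2 & 0xFFFFFFFF)


def is_listed_defective(manufacturer: int, fw_version_1: int, fw_version_2: int) -> bool:
    """Return True if the version is older than the assumed fixed version."""
    if manufacturer != AMD_MANUFACTURER:
        return False
    version = combine_version(fw_version_1, fw_version_2)
    family = version >> 48  # top 16 bits select the firmware family
    fixed_from = FIXED_FROM.get(family)
    if fixed_from is None:
        # Unknown family: not on the list, so the hwrng stays enabled.
        return False
    return version < fixed_from
```

The three inputs correspond to the raw values that `tpm2_getcap properties-fixed` prints for TPM2_PT_MANUFACTURER, TPM2_PT_FIRMWARE_VERSION_1 and TPM2_PT_FIRMWARE_VERSION_2. A version from a family other than 3 or 6, or one at or above the threshold, comes back as "not listed", which is the situation described above.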
It might mean that more fTPM versions need to be included in the patch.

There can very well be multiple issues that manifest as a stutter, but they don't all have the same root cause as the HW RNG functionality. Please let's follow the suggestion above to turn off HW RNG in the kernel config and see whether it still happens for you.

Thanks everyone for the comments and suggestions, and sorry for the late answer. I haven't compiled a kernel for some time. Yesterday I tried rebuilding it as an rpm package rather than building the kernel manually (I was not sure whether re-using the Fedora kernel config would work, in case they have custom downstream or backported patches). But I'm not sure I did everything right. I just manually changed the config, commenting out the line (as other disabled options are):
```
$ grep CONFIG_HW_RANDOM_TPM /boot/config-$(uname -r)
# CONFIG_HW_RANDOM_TPM is not set
```
But I cannot verify whether it is disabled. Is there a way to verify it? The device node (/dev/hwrng) still exists, but if you try to read it:
```
$ ls -l /dev/hwrng
crw-------. 1 root root 10, 183 Mar 20 09:28 /dev/hwrng
$ cat /dev/hwrng > /dev/null
cat: /dev/hwrng: No such device
```
If this confirms it is disabled, I'll continue using this kernel for a few days and verify whether the stutter still happens. Once again, sorry for not answering everyone or testing all options. Bisecting wouldn't be an easy option on this hardware (it takes a lot of time to compile and makes a lot of noise/heat).

Regards,
Branko

Sounds right to me.

You might also look at bug 217158, which is pointing fingers at a very specific combination of graphical software causing it.

(In reply to Mario Limonciello (AMD) from comment #9)
> Sounds right to me.
>
> You might also look at bug 217158, which is pointing fingers at a very
> specific combination of graphical software causing it.
I saw that bug already mentioned on the original fTPM report, but the problem for me started before kernel 6.2 (most likely on 6.1, as for everyone else; I have had this system for 6~8 months). Regarding the kernel built without CONFIG_HW_RANDOM_TPM: so far it has been running for 24h (I haven't tested it for all 24h), and up to this moment I haven't experienced any stutter. I'll continue using it like this and see how it behaves over the next day or two.

After few days of use with kernel built without CONFIG_HW_RANDOM_TPM, I couldn't reproduce this issue. Using computer the same way as I do every day. Is it possible to verify if this specific firmware version is also affected?

> Is it possible to verify if this specific firmware version is also affected?

Yeah, I checked with internal team, but it shouldn't be. Can you confirm the AGESA version in your BIOS?

And you are on the latest BIOS?

> After few days of use with kernel built without CONFIG_HW_RANDOM_TPM, I
> couldn't reproduce this issue. Using computer the same way as I do every day.

Something we can "consider" is to just disable fTPM RNG entirely for AMD.

> Something we can "consider" is to just disable fTPM RNG entirely for AMD.
The quality of bug reporting has dramatically decreased in the last weeks, so I wouldn't make any decisions based on the quasi-information from here. Until we get some really solid bug reports where it's clear that a rigorous process is being carried out, I suspect we should treat these as "user testing error" or "an unrelated bug".
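On the earlier question of how to verify that the TPM is no longer feeding the hardware RNG: one way to make such a report less ambiguous is to include what the hw_random core itself lists as registered backends. The sketch below reads the `rng_current` and `rng_available` files under `/sys/class/misc/hw_random`. It assumes the TPM backend's name contains "tpm" (it usually shows up as something like `tpm-rng-0`); the function names are hypothetical, and this is an illustration rather than an official test procedure.

```python
from pathlib import Path

# sysfs directory of the hw_random core (assumed to be present on the system)
HW_RANDOM_SYSFS = Path("/sys/class/misc/hw_random")


def parse_hwrng_state(current_text: str, available_text: str) -> dict:
    """Turn the raw contents of rng_current / rng_available into a dict."""
    return {
        "current": current_text.strip(),
        "available": available_text.split(),
    }


def read_hwrng_state(sysfs_dir: Path = HW_RANDOM_SYSFS) -> dict:
    """Read both sysfs files; a missing or unreadable file counts as empty."""
    def read(name: str) -> str:
        try:
            return (sysfs_dir / name).read_text()
        except OSError:
            return ""
    return parse_hwrng_state(read("rng_current"), read("rng_available"))


def tpm_rng_registered(state: dict) -> bool:
    """True if any listed backend name mentions 'tpm' (naming is an assumption)."""
    names = [state["current"], *state["available"]]
    return any("tpm" in name.lower() for name in names)
```

Pasting the output of `read_hwrng_state()` next to the `grep CONFIG_HW_RANDOM_TPM` result would show whether any TPM-backed source is still registered. If no backend is listed at all, that would be consistent with the `No such device` error from reading /dev/hwrng, though that interpretation should be confirmed on the system in question.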
(In reply to Mario Limonciello (AMD) from comment #12)
> > Is it possible to verify if this specific firmware version is also affected?
>
> Yeah, I checked with internal team, but it shouldn't be. Can you confirm
> the AGESA version in your BIOS?

What I got from the BIOS is:
CezannePI-FP6 1.0.0.B

> And you are on the latest BIOS?

Yes, this is at the moment the latest officially available BIOS (I double-checked on the support website again, and it's still the same version: https://download.lenovo.com/consumer/mobiles/gecn33ww.txt).

> > After few days of use with kernel built without CONFIG_HW_RANDOM_TPM, I
> > couldn't reproduce this issue. Using computer the same way as I do every
> > day.
>
> Something we can "consider" is to just disable fTPM RNG entirely for AMD.

> What I got from the BIOS is:
> CezannePI-FP6 1.0.0.B

Yes; the AMD fix was introduced two versions before this. You might be hitting an OEM/model specific BIOS bug.

> Until we get some really solid bug reports where it's clear that a rigorous
> process is being carried out, I suspect we should treat these as "user
> testing error" or "an unrelated bug".

Maybe we should just blacklist this system from registering fTPM with RNG?

Hello, I can report the same issue even after the supposed fix and it's very bothersome, as this impacts gaming, browsing and media consumption. Please expose a parameter in some way to make the TPM HW RNG optional without the need to recompile the kernel; I think this is crucial for everyday PC usage. Leave the choice up to the user. Thanks a lot in advance.

It seems some people are still suffering from this problem even though their fTPM version is not listed as "defected". So is it a good idea to add a parameter that lets the user manually control fTPM? Or just block them all? @Mario Limonciello (AMD)

With all of these one-off reports, I think we need a compelling test description to confirm what the actual issue is.
So far I haven't heard a compelling description that this is caused by fTPM other than reporters mentioning it in this bug report.

It's very difficult to reproduce the issue in a systematic way. Is it so "wrong" to leave power to the user? Is there a "convenience" in avoiding parameterization of something that is causing issues for some users? Unfortunately I don't have the possibility to disable TPM in UEFI, so please make it optional at the OS layer. Linux has always been about choice. Just my opinion. Thanks

(In reply to manliodp from comment #19)
> Linux has always been about choice.

Nope: http://www.islinuxaboutchoice.com/

(In reply to manliodp from comment #16)
> I can report the same issue even after the supposed fix

Then please open a separate bug and drop the link here; we already got a few reports that had similar symptoms but turned out to be different issues. Trying to sort this out in a ticket about a bug that solved the issue for some people just gets confusing, hence it's a recipe to make developers ignore a bug.

Sorry, but this bug is in "NEW" status and already separate from the original one with the fix you are mentioning.

My problem was already solved, but I picked this up again to see if I can find more useful information.

First, I finally booted into the 4.15.0 kernel on my laptop (Ubuntu 16.04.7 LTS).
Second, I found that in 4.15.0 /dev/hwrng doesn't exist (No such device or file). I then downloaded 4.16.0-rc1 from the Ubuntu archive, and this time /dev/hwrng is actually there and I can cat it.
Third, "sudo cat /dev/hwrng > /dev/null" or just "sudo cat /dev/hwrng" can still trigger stuttering in 4.16.0-rc1.
Fourth, looking through the 4.15 to 4.16 changelog, I didn't see any change to hwrng or TPM.

So why does 4.15 -> 4.16 make a difference for /dev/hwrng? It needs a git bisect to find out. But after that many tests, I am starting to doubt whether the problem is actually caused by the fTPM. We need a lot more tests (different hardware/settings/kernels).
I'm going to try my best to find more hardware.

Well, I borrowed two Lenovo laptops from my mates, one with a 6800H and one with a 5800H. Both show "defective firmware" (checked by booting into the 6.2.8 Arch ISO and looking at dmesg). However, I can't reproduce any stuttering on the 6.1.0 kernel (tested on a USB Arch installation), even with ten "cat /dev/hwrng" running in parallel for half an hour. So yeah, I have no idea now.

(In reply to Bell from comment #23)
> My problem was already solved. but I pick up this again and see if I can
> find more useful information.
>
> First, I finally boot into the 4.15.0 kernel on my laptop (Ubuntu 16.04.7LTS)

Vendor kernels can go through vendors.
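Since the thread asks for a reproducible test description, one possible way to put numbers on a "stutter" is to record how late short sleeps wake up, once on an idle system and once while `cat /dev/hwrng > /dev/null` runs in another terminal. The sketch below does only that. It is a rough proxy for perceived stutter, not a method anyone in this thread has used; the 5 ms interval and the 50 ms stall threshold are arbitrary assumptions, and the function names are hypothetical.

```python
import time


def measure_overshoot(duration_s: float = 10.0, interval_s: float = 0.005) -> list:
    """Sleep repeatedly and record how late each wakeup was, in milliseconds."""
    overshoots_ms = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        time.sleep(interval_s)
        late_s = time.monotonic() - start - interval_s
        overshoots_ms.append(max(late_s, 0.0) * 1000.0)
    return overshoots_ms


def summarize(overshoots_ms: list, stall_threshold_ms: float = 50.0) -> dict:
    """Reduce the samples to a few numbers that can be pasted into a report."""
    if not overshoots_ms:
        return {"samples": 0, "max_ms": 0.0, "stalls": 0}
    return {
        "samples": len(overshoots_ms),
        "max_ms": max(overshoots_ms),
        # Count wakeups delayed by at least the (arbitrary) threshold.
        "stalls": sum(1 for value in overshoots_ms if value >= stall_threshold_ms),
    }


if __name__ == "__main__":
    print(summarize(measure_overshoot()))
```

Comparing the two summaries on the same kernel, together with the firmware version and the kernel config, would give a report that others can try to repeat on their own hardware.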