Bug 217833
Description
Luca Pigliacampo
2023-08-27 19:57:50 UTC
Created attachment 304954 [details]
dumps of acpi tables
Created attachment 304955 [details]
content of /sys/kernel/debug/gpio in version 6.4.4
Created attachment 304956 [details]
content of /sys/kernel/debug/gpio in version 6.5-rc7 before pressing power button
Created attachment 304957 [details]
content of /sys/kernel/debug/gpio in version 6.5-rc7 after pressing power button
My observations from your logs so far: 1) Your system doesn't use _AEI to program any GPIOs. The values programmed to the firmware are what persist. 2) In your 6.5 logs it seems that bit 30 is active after you've pressed the button. Bit 30 being asserted for GPIO 0 means that the power button was pressed for less than 2 seconds you can reference page 948 of this document: https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/55072_AMD_Family_15h_Models_70h-7Fh_BKDG.pdf --- My next step asks: 1) Can you please turn on the dynamic debugging statements for pinctrl-amd.c and see if you get messages when you press the button? Given the interrupt is "disabled" I think you won't, but I'd like to confirm. 2) Can you try what I suggested with holding the power button longer than 2s but less than 10s to see if it triggers? 3) If nothing from 2, can you try from >10s? 4) I'd like to understand what source you "normally" see the power button from as well. An ACPI device? 5) I notice in the documentation that software needs to clear InterruptSts to clear LessThan2s. As the interrupt isn't enabled (since you don't have _AEI), I'm not sure if that will work. Let me see your findings from 1-4 first, and I can follow up with a patch later if I think this can work for your case. Created attachment 304963 [details]
possible patch v1
Here's a possible patch for my idea. It's only compile tested, and please still provide the logs for the other things I asked to make sure it makes sense.
Created attachment 304964 [details]
possible patch v2
Created attachment 304966 [details]
messages received after enabling dynamic debug
thanks, i'll test the patch now, about 1) when i enabled dynamic debug it started spamming the message i'm attaching in the system journal, multiple times every second, and pressing the power button didin't cause any different messages about 2) sorry i forgot about it before, keeping the power button pressed for a bit longer (between 2 and 4 seconds) didn't make a difference compared to a short press, after keeping it pressed for more than 4 seconds the press is recognized as KEY_LEFTMETA on the keyboard, rather than power button, about 3) keeping the power button pressed for more than 10 seconds cuts power to the computer and causes it to shut down abruptly. about 4) sorry, but i'm not sure what you mean, i know evtest is reading events for the power button from the file at /dev/input/event2, i found that it is also referenced from /sys/devices/LNXSYSTM\:00/LNXPWRBN\:00/, that seems related to acpi, so i think it's an acpi device? about 5) i hope these were useful I'm testing the patch right now and certainly it changed something, before the power button would be ignored until a reboot, now it is recognized again after i log out and log back in, so i can press it once, log out, log in, and i can press it a second time, then i need again to log out and back in to press it a third time also it seems that in some cases the power button can be actually pressed multiple times by waiting a few seconds between presses, but i couldn't find a way to reproduce it consistently (In reply to Luca Pigliacampo from comment #10) > I'm testing the patch right now and certainly it changed something, > before the power button would be ignored until a reboot, > now it is recognized again after i log out and log back in, > > so i can press it once, log out, log in, and i can press it a second time, > then i need again to log out and back in to press it a third time I believe the patch is the right direction then. I think that what you're probably observing is related to the debounce behavior and the driver is expected to be clearing it. The patch might be correct, but I'd still like some more checks (see below). (In reply to Luca Pigliacampo from comment #9) > thanks, i'll test the patch now, > > about 1) when i enabled dynamic debug it started spamming the message i'm > attaching in the system journal, multiple times every second, and pressing > the power button didin't cause any different messages It's tied to movement from your touchpad. It makes sense. > > about 2) sorry i forgot about it before, > keeping the power button pressed for a bit longer (between 2 and 4 seconds) > didn't make a difference compared to a short press, > after keeping it pressed for more than 4 seconds the press is recognized as > KEY_LEFTMETA on the keyboard, rather than power button, Yup, that's exactly what I expected to happen when it's more than 2 and less than 10. It's probably a combination (left meta + something else). > > about 3) keeping the power button pressed for more than 10 seconds cuts > power to the computer and causes it to shut down abruptly. OK. > about 4) sorry, but i'm not sure what you mean, i know evtest is reading > events for the power button from the file at /dev/input/event2, i found that > it is also referenced from /sys/devices/LNXSYSTM\:00/LNXPWRBN\:00/, that > seems related to acpi, so i think it's an acpi device? OK. > > about 5) i hope these were useful Very much so. (In reply to Luca Pigliacampo from comment #11) > also it seems that in some cases the power button can be actually pressed > multiple times by waiting a few seconds between presses, but i couldn't find > a way to reproduce it consistently Is it possible that it only starts working again after the trackpad is used? The patch introduces a new dynamic debug statement. If you turn it on for the whole module while look at this we should be able to better understand it. Can you please try to correlate the exact things you do with power button and touchpad movement along with the logs? What I'm wondering is if the power button only starts working again after you've moved the touchpad. Created attachment 304975 [details]
log from pressing power after using trackpad
> log from pressing power after using trackpad
So that message definitely comes up, but it's not possible to tell purely from the log if it was tied to the touchpad movement or if the interrupt fired on it's own.
Can you compare the GPIO0 line from /sys/kernel/debug/gpio in this specific series of events:
1. Boot up (no power button pressed)
2. Press power button once. DON'T touch the touchpad.
3. Touch the touchpad.
4. Press power button once. DON'T touch the touchpad.
5. Press power button once more. DON'T touch the touchpad.
6. Touch the touchpad.
If after 2, 4, and 5 bit 30 clears on it's own then I think the patch does everything I hope it would. If bit 30 only clears after steps 3 and 6 then it's not an "ideal" fix, but might be good enough.
Created attachment 304976 [details]
log of unrecognized press followed by succesful press
> log of unrecognized press followed by succesful press
Do I follow this right?
1. You pressed once (worked).
2. You pressed again (didn't work).
3. You touched touchpad.
4. You pressed again (worked).
5. You touched touchpad.
> Do I follow this right? > 1. You pressed once (worked). > 2. You pressed again (didn't work). > 3. You touched touchpad. > 4. You pressed again (worked). > 5. You touched touchpad. not exactly, 1. i pressed once (worked) 2. activated debug messages 3. pressed again (didn't work) 4. touched touchpad 5. pressed again (worked) 6. touched touchpad so the first press shouldn't appear i think, still now i'm going to upload the sequence you asked Created attachment 304977 [details]
log of a specific sequence of presses
timings of my actions: 00:03:40 - pressed power 00:03:45 - touched trackpad 00:03:50 - pressed power 00:04:00 - pressed power 00:04:05 - touched trackpad > log of a specific sequence of presses
Sorry I might have been unclear. I'm looking specifically for what happens to this command in that sequence of events.
grep "#0" /sys/kernel/debug/gpio
So capture that command's output 6 times
> Is it possible that it only starts working again after the trackpad is used?
yes, it does
(In reply to Mario Limonciello (AMD) from comment #20) > > log of a specific sequence of presses > > Sorry I might have been unclear. I'm looking specifically for what happens > to this command in that sequence of events. > > grep "#0" /sys/kernel/debug/gpio > > So capture that command's output 6 times sorry, i really should pay more attention Created attachment 304978 [details]
state of gpio0 while using the power button
here it is Created attachment 304979 [details]
possible patch v3
No worries.
The patch you bisected to we definitely can't revert as it fixes other bugs with power button handling relative to what the platform expects.
I think what we should do is:
1. Submit that patch that helps when there is another GPIO interrupt firing.
2. Write a new patch to offer a module parameter
Can you please try this attached patch? This should let you configure the power button behavior back the way it was before (pinctrl_amd.powerbtn=0 on your kernel command line).
If that works and you're happy enough with it I'll submit the two patches.
thank you! i'll try that patch tomorrow, at the moment i feel like i'm going to fall asleep while standing. good night OK. Please also attach a dmidecode, if the module parameter approach works I'll add a 3rd patch that sets the default behavior for your system to the old behavior with it. Actually scratch that, I think I found a dmidecode online to match your system. Here's a branch as I'd like it tested on your system. https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci/kbz-217833/ It should detect your system and automatically put it into the previous behavior. If it doesn't, please try pinctrl_amd.powerbtn=0 on kernel command line to force it and share the dmidecode. Great! the patch works, except my system's product name is 82LM rather than 82FG, attached is the output of dmidecode -t system i'm going to use this computer normally for a few days to see if there's any problem, sorry i didn't realize it was so close to the merge window when i reported the bug Thanks! Created attachment 304983 [details]
output of dmidecode -t system
A bug is a bug; IMO it doesn't matter when it happens. I'll submit the patches to the mailing list for review. Per Hans' feedback on my v1 series, I have some updated patches for you to try. https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci%2Fkbz-217833-v2 Please swap out the v1 ones for these two. hi, i've been running them since this morning, it works well and it doesn't seem to cause problems on my other system. also it's nice that now it doesn't need to be enabled for specific hardware OK thanks! I am going to make one more change to it so it doesn't run on ALL hardware but only the ones that need it. I expect it will still work for you, but please test it instead of v2. https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci%2Fkbz-217833-v3 sorry, but there's another problem, the presses of the power button stop being recognized after putting the system into s3 sleep, i didn't think of testing this before, but now i'll check the previous version of the patch No worries. Can you please upload /sys/kernel/debug/gpio after resuming from suspend? I have an idea what this is, but I want to confirm your registers. Also, did you resume using the power button or another source when the failure happened? i woke up the system using the power button now i'll post the contents of gpio before and after the sleep Created attachment 304994 [details]
gpio before sleep
Created attachment 304995 [details]
gpio after sleep
mmm, they're "the same" for GPIO0 and the master register. I don't know what we can do here. Has this always been there for wakeup from power button? nevermind, it's an unrelated issue, just checked and version 6.4.4, which is before your commits, is also affected. i normally shut the computer down, so i didn't notice it, now i saw it by chance OK. In that case, does v3 work as correctly as expected? yes, it works just fine nonono, sorry, i booted in the wrong kernel, in 6.4.4 it works Can you capture /sys/kernel/debug/gpio before/after suspend on the working kernel? sure thing, attaching them now, next i'll see if the first patch also was the same Created attachment 304997 [details]
gpio before sleep in 6.4.4
Created attachment 304998 [details]
gpio after sleep in 6.4.4
so, the power button after sleep works in the first patch you proposed, so the issue was probably introduced in the patch v2 Can I see the kernel log across a suspend cycle with dynamic debug turned on for pinctrl-amd with version 3? And does moving the touchpad fix it after resume on v2/v3? Created attachment 304999 [details]
log of kernel messages across suspend
> And does moving the touchpad fix it after resume on v2/v3?
sadly no
(In reply to Luca Pigliacampo from comment #53) > Created attachment 304999 [details] > log of kernel messages across suspend It looks like you probably pressed the power button to 'enter' suspend. Is that right? If so; does this also reproduce if you only use suspend using software but resume using power button? > It looks like you probably pressed the power button to 'enter' suspend.
Yes
OK let's try this. Explicitly at suspend let's check if those bits weren't cleared yet and try to clear them again. https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci%2Fkbz-217833-v4 If this reproduces with using power-button for suspend and resume can you share the log again? ok, got the log without touching power before sleep, i only saw messages from the trackpad Created attachment 305000 [details]
log across suspension without touching power before
> It looks like you probably pressed the power button to 'enter' suspend. Is
> that right? If so; does this also reproduce if you only use suspend using
> software but resume using power button?
it also reproduces using software
now i'm compiling the new patch
> If this reproduces with using power-button for suspend and resume can you
> share the log again?
can reproduce bot using and not using the power button
attached is the log when using the power button
Created attachment 305001 [details]
log when using the power button to suspend on patch v4
Well I'm awfully confused now. You are absolutely sure it resolved with just the patch that clears it when you moved the touchpad? Here's one more idea, let's try to force to ensure it clears on suspend. https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci%2Fkbz-217833-v5 no, my finger slipped, i meant the one at https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci/kbz-217833/ that added the commandline option that apparently just restores the old behaviour for specific hardware didn't it? OK. How about this - let's try to clear the bit for suspend and restore it afterwards. https://gitlab.freedesktop.org/superm1/linux/-/commits/mlimonci%2Fkbz-217833-v6 Can you test this? sure, it fixes the issue with suspending, but now i think my pc is messing with me, i just realised that rebooting and powering off and on again are different, and in all patches since the patch v2 the power button is also not recognized after shutting down and turning the pc on again, it works only after a reboot. if it's of any use, the power button also works if i rebooted from the bootloader (refind) before starting the operating system. i'm sorry i keep missing stuff i should have tested also there seems to be no difference in the contents of /sys/kernel/debug/gpio before and after rebooting You're killing me here! :) If /sys/kernel/debug/gpio is the same before/after reboot that's quite baffling. Is there any chance that you can get me information from Windows? Windows To Go is fine as long as you get the AMD drivers installed. If you can do that, please install R/W everything in Windows and follow these steps to get the register. For using R/W everything in Windows you will use the "MMIO" function. 1. Press the MMIO button and then hit the "+" button. 2. Add a mapping for the base address (FED81500) and save it. 3. Add a line item for the master register (FC) Share the value it shows. I'd like to see this value both from cold boot, warm boot, and after suspend in Windows if you can. sorry for the wait, i the last few days i didn't have much access to my computer, the value R/W everything shows is 0xFF00A000 after both booting from powered off, rebooting, and waking from sleep, but it seems that in certain conditions the power button doesn't work in windows as well, i'll have to do more tests > the value R/W everything shows is 0xFF00A000 after both booting from powered > off, rebooting, and waking from sleep, OK - then that means that Windows isn't really messing with this value. > but it seems that in certain conditions the power button doesn't work in > windows as well, i'll have to do more tests OK. Please be cognizant of avoiding a warm boot from Linux so that it doesn't influence registers. Cold boot into Windows, or Warm boot from Windows to Windows. now i did some more tests and confirmed that windows also has that issue, specifically the power button stops working after suspending and resuming windows, then i updated the bios and the issue was fixed both on windows and every linux version i tried (6.4.5, all your patches, 6.5.3, 6.5.3-arch1-1), so apparently it was a bios issue all along and the old behaviour hid the problem i know it's not advised to change too much stuff while investigating an issue, but it seemed too strange that both systems failed almost in the exact same way, so, I'm sorry for wasting your time on an issue that wasn't even in the kernel Wow, great! |