Bug 216208
Summary: | Interrupt storm on Asus UM325UAZ | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Pavel Krc (reg.krn) |
Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | mario.limonciello |
Priority: | P1 | ||
Hardware: | AMD | ||
OS: | Linux | ||
Kernel Version: | 5.18.2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg; /proc/interrupts; ps auxk-cputime (shortly after boot); lshw; grep . /sys/kernel/irq/34/*; acpidump; /sys/kernel/debug/gpio; Possible patch to ignore pin from driver side; /sys/kernel/debug/gpio with changed SSDT; dmesg with changed SSDT; dmesg with blacklisted pinctrl_amd; dmesg with SSDT changed via GRUB; /sys/kernel/debug/gpio with SSDT changed via GRUB; Quirk patch (v2); Patch v3 [1/2]; PATCH v3 [2/2]; dmesg with patch v3 |
Created attachment 301339 [details]
/proc/interrupts
Created attachment 301340 [details]
ps auxk-cputime (shortly after boot)
Created attachment 301341 [details]
lshw
Created attachment 301342 [details]
grep . /sys/kernel/irq/34/*
Please install the latest BIOS version first and see if it helps; yours is very old. Also check whether `sudo rmmod pinctrl-amd` helps. If it does, blacklist this module until there's a better solution.

Thank you for your help. I checked again, and the newest BIOS that Asus provides for this laptop is version 300, which I already have. I am not able to blacklist pinctrl-amd, as it is built into the kernel. I tried to find an init function that I could block using the initcall_blacklist= parameter, but I could not find one. I tried initcall_blacklist=amd_gpio_probe, but that is probably not a suitable initcall. It did print "blacklisting initcall amd_gpio_probe" in dmesg, but nothing about the interrupts changed.

CC'ing Mario Limonciello; maybe he can help. Maybe the pinctrl-amd driver has nothing to do with this issue at all; suspecting it was based only on my limited research.

I don't suppose you have any way to know what is connected to pin 18? Can you share an acpidump? We might be able to get a hint from _AEI.
Created attachment 301364 [details]
acpidump
Attaching acpidump. In the meantime, I am also compiling the same kernel without pinctrl-amd to test that.
Can you look and see what's at GPP4? You can try:
$ grep GPP4 /sys/bus/acpi/devices/*/path
Then you can switch into that directory and see what it's bound to.

root@largo:~/storm# grep GPP4 /sys/bus/acpi/devices/*/path
/sys/bus/acpi/devices/device:0c/path:\_SB_.PCI0.GPP4
/sys/bus/acpi/devices/device:0d/path:\_SB_.PCI0.GPP4.D00C
/sys/bus/acpi/devices/QCOM6390:00/path:\_SB_.PCI0.GPP4.BTH0

Looks like something with Bluetooth:
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_active_time:0
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_active_kids:0
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_usage:0
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_status:unsupported
/sys/bus/acpi/devices/device:0c/device:0d/power/async:disabled
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_suspended_time:0
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_enabled:disabled
/sys/bus/acpi/devices/device:0c/device:0d/power/control:auto
/sys/bus/acpi/devices/device:0c/device:0d/adr:0x000000ff
/sys/bus/acpi/devices/device:0c/device:0d/path:\_SB_.PCI0.GPP4.D00C
/sys/bus/acpi/devices/device:0c/power/runtime_active_time:0
/sys/bus/acpi/devices/device:0c/power/runtime_active_kids:0
/sys/bus/acpi/devices/device:0c/power/runtime_usage:0
/sys/bus/acpi/devices/device:0c/power/runtime_status:unsupported
/sys/bus/acpi/devices/device:0c/power/async:disabled
/sys/bus/acpi/devices/device:0c/power/runtime_suspended_time:0
/sys/bus/acpi/devices/device:0c/power/runtime_enabled:disabled
/sys/bus/acpi/devices/device:0c/power/control:auto
/sys/bus/acpi/devices/device:0c/adr:0x00020002
/sys/bus/acpi/devices/device:0c/path:\_SB_.PCI0.GPP4
/sys/bus/acpi/devices/device:0c/QCOM6390:00/uevent:MODALIAS=
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_active_time:0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_active_kids:0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_usage:0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_status:unsupported
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/async:disabled
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_suspended_time:0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_enabled:disabled
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/control:auto
/sys/bus/acpi/devices/device:0c/QCOM6390:00/hid:QCOM6390
/sys/bus/acpi/devices/device:0c/QCOM6390:00/path:\_SB_.PCI0.GPP4.BTH0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/status:0

This is strange, because the USB Bluetooth device that I know of (13d3:3563), reported as MediaTek and managed by the btusb module, works fine.

> /sys/bus/acpi/devices/device:0c/QCOM6390:00/status:0

This means the device is not present.
https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/06_Device_Configuration/Device_Configuration.html?highlight=_sta#sta-device-status

> QCOM6390

Did you swap out your Wifi module or make any other hardware modifications since it was shipped?

My educated guess as to what is going on here is that, in at least one of the SKUs using this BIOS, whatever is connected to pin 18 notifies the Qualcomm BT controller when events happen. It might be a wakeup pin for BT, for example. However, it seems that in your SKU with the MediaTek wifi there is nothing to notify. Maybe the pin is floating, or maybe it's tied to something else. If you didn't change your HW at all, this is probably an SBIOS bug: _AEI shouldn't be marking that pin to trigger.

Let's see your results with the kernel built without pinctrl-amd. I expect that without pinctrl-amd compiled in, your "high interrupts" will go away, but that's not really a solution; it's just an information-gathering and debugging tactic. Please also add /sys/kernel/debug/gpio to this bug report (with pinctrl-amd compiled in) so we have that as a data point.
Created attachment 301367 [details]
/sys/kernel/debug/gpio
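The GPP4 lookup and the reading of `status:0` as "not present" can be scripted together. Below is a minimal Python sketch, assuming the sysfs layout shown above (a `path` attribute per device under /sys/bus/acpi/devices, plus an optional `status` attribute printed as a plain integer). The helper names are hypothetical, and the bit names follow the ACPI `_STA` definition linked above:

```python
from pathlib import Path
from typing import Iterator, Optional

# Bit names of the ACPI _STA (device status) value, per the ACPI specification.
STA_BITS = {
    0: "present",
    1: "enabled",
    2: "shown in UI",
    3: "functioning",
    4: "battery present",
}


def decode_sta(value: int) -> list[str]:
    """Return the names of the _STA bits set in value (empty list for 0)."""
    return [name for bit, name in STA_BITS.items() if value & (1 << bit)]


def find_acpi_devices(
    fragment: str, root: str = "/sys/bus/acpi/devices"
) -> Iterator[tuple[str, str, Optional[list[str]]]]:
    """Yield (device name, ACPI path, decoded _STA or None) for each device
    whose ACPI namespace path contains fragment, similar in spirit to
    `grep FRAGMENT /sys/bus/acpi/devices/*/path`."""
    for path_file in sorted(Path(root).glob("*/path")):
        acpi_path = path_file.read_text().strip()
        if fragment not in acpi_path:
            continue
        status_file = path_file.parent / "status"
        status = None
        if status_file.exists():
            # Assumed to be printed as a plain integer, e.g. "0" or "15".
            status = decode_sta(int(status_file.read_text().strip(), 0))
        yield path_file.parent.name, acpi_path, status
```

Calling `find_acpi_devices("GPP4")` on the affected laptop would report QCOM6390:00 with an empty bit list: not even the "present" bit is set, which is why the Qualcomm BT node is effectively absent.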
I did not change any HW. Actually we have two identical laptops at our office and both are experiencing the same issue.
In addition to that, we also have a third, slightly older, yet almost identical model (UM325UA, without the Z), which does not experience the issue. Comparing the lshw listings for these models, the differences are:
* WiFi+BT (UA: Intel, UAZ: MediaTek)
* NVMe (UA: SKHynix, UAZ: Intel)
* The BIOS firmware differs (not interchangeable between the models), and most IRQ and memory ranges for devices differ
The laptop shipped with Windows 10, which is still installed (and power consumption is normal there); just FYI, in case it helps.
The PC on which I compile the kernel has become inaccessible; I will have physical access to it on Wednesday.
I discussed this with some colleagues, and I believe that pin 18 is used for WLAN wakeup. Are you comfortable loading a patched SSDT? I think this might fix your issue (the hunk drops the GpioInt entry for pin 0x0012, i.e. pin 18):

$ diff -u ssdt11.dsl.old ssdt11.dsl
--- ssdt11.dsl.old	2022-07-11 12:35:14.075642745 -0500
+++ ssdt11.dsl	2022-07-11 12:35:34.855642149 -0500
@@ -213,12 +213,6 @@
                     "\\_SB.GPIO", 0x00, ResourceConsumer, ,
                     )
                     { // Pin list
-                        0x0012
-                    }
-                GpioInt (Edge, ActiveLow, ExclusiveAndWake, PullNone, 0x0000,
-                    "\\_SB.GPIO", 0x00, ResourceConsumer, ,
-                    )
-                    { // Pin list
                         0x0018
                     }
                 GpioInt (Edge, ActiveHigh, ExclusiveAndWake, PullNone, 0x0000,

Created attachment 301397 [details]
Possible patch to ignore pin from driver side
If you use the SSDT approach and it works, then that at least confirms the root cause is this pin and the proper solution is for ASUS to fix it.
If it doesn't work but you're not sure you loaded it correctly, please share the /sys/kernel/debug/gpio output and a new dmesg so we can check.
Until ASUS fixes it, here is a possible patch for your system that might help as well. To confirm it applied, you should see a new dev_info message (see the patch for details). If I got the DMI information wrong (I just lifted it from your dmesg), please fix it in the patch and report back with the correct info.
Thank you again. I will try to load the modified SSDT now. Just to make sure that I do not mess anything up (this is my first time manipulating ACPI tables):
- I disassembled the SSDT from the acpidump and applied your patch
- I recompiled the table using iasl -sa ssdt11.dsl
- I added the new ssdt11.aml under /kernel/firmware/acpi in a cpio archive prepended to the current initrd
- I will now boot with the updated initrd
Am I doing it right?

Thank you. There are a variety of ways to patch an SSDT. My go-to method is to use GRUB to load an ACPI table (it's easier to swap out on the fly), but your approach sounds like it should work.
Created attachment 301444 [details]
/sys/kernel/debug/gpio with changed SSDT
I tried both the SSDT approach and blacklisting pinctrl_amd (rebuilt as a module). The SSDT patch did not fix the interrupts. Attaching gpio and dmesg. Is it still worth testing the driver patch?
Blacklisting pinctrl_amd did stop the interrupts, and wifi kept working; however, the integrated touchpad stopped working. Attaching dmesg. Let me know if other files are needed for either case.
Created attachment 301445 [details]
dmesg with changed SSDT
Created attachment 301446 [details]
dmesg with blacklisted pinctrl_amd
> pin18 Edge trigger| Active low| interrupt is enabled| interrupt is unmasked|
> enable wakeup in S0i3 state| enable wakeup in S3 state| disable wakeup in S4/S5 state| input is high| pull-up is disabled| Pull-down is disabled| output is disabled| debouncing filter disabled| 0x57a00

I don't believe you properly disabled it from the SSDT. I see that the pin is still edge triggered.

> Is it still worth testing the driver patch?

Yeah, I don't see why not.
Created attachment 301449 [details]
dmesg with SSDT changed via GRUB
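The trailing hex value in the quoted debugfs line (0x57a00) is the raw pin register. Below is a Python sketch that decodes it, assuming bit offsets that mirror the *_OFF macros in drivers/pinctrl/pinctrl-amd.h (worth checking against your own kernel tree); the decoder is illustrative and not part of the driver:

```python
# Bit offsets assumed to mirror the *_OFF macros in drivers/pinctrl/pinctrl-amd.h.
DB_CNTRL_OFF = 5           # 2-bit debounce control field; 0 = no debounce
LEVEL_TRIG_OFF = 8         # 1 = level triggered, 0 = edge triggered
ACTIVE_LEVEL_OFF = 9       # 2-bit field: 0 = high, 1 = low, 2 = both edges
INTERRUPT_ENABLE_OFF = 11
INTERRUPT_MASK_OFF = 12    # debugfs prints "unmasked" when this bit is set
WAKE_CNTRL_OFF_S0I3 = 13
WAKE_CNTRL_OFF_S3 = 14
WAKE_CNTRL_OFF_S4 = 15
PIN_STS_OFF = 16           # current input level
PULL_UP_ENABLE_OFF = 20
PULL_DOWN_ENABLE_OFF = 21
OUTPUT_ENABLE_OFF = 23


def _bit(reg: int, off: int) -> bool:
    """Return True if bit `off` of reg is set."""
    return bool((reg >> off) & 1)


def decode_amd_gpio(reg: int) -> dict:
    """Decode a pin register value into the fields the debugfs line shows."""
    level = (reg >> ACTIVE_LEVEL_OFF) & 0x3
    return {
        "trigger": "level" if _bit(reg, LEVEL_TRIG_OFF) else "edge",
        "active": {0: "high", 1: "low", 2: "both"}.get(level, "reserved"),
        "irq_enabled": _bit(reg, INTERRUPT_ENABLE_OFF),
        "irq_unmasked": _bit(reg, INTERRUPT_MASK_OFF),
        "wake_s0i3": _bit(reg, WAKE_CNTRL_OFF_S0I3),
        "wake_s3": _bit(reg, WAKE_CNTRL_OFF_S3),
        "wake_s4": _bit(reg, WAKE_CNTRL_OFF_S4),
        "input_high": _bit(reg, PIN_STS_OFF),
        "pull_up": _bit(reg, PULL_UP_ENABLE_OFF),
        "pull_down": _bit(reg, PULL_DOWN_ENABLE_OFF),
        "output_enabled": _bit(reg, OUTPUT_ENABLE_OFF),
        "debounce": ((reg >> DB_CNTRL_OFF) & 0x3) != 0,
    }
```

For 0x57a00 this yields an edge-triggered, active-low interrupt that is enabled and unmasked, with wake enabled in S0i3 and S3 but not S4, the input high, and no pull-up, pull-down, output or debounce, which matches the quoted line field by field.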
You are right. I have now loaded the SSDT via GRUB: the interrupts are gone, the touchpad and wifi still work, and I have not noticed any immediate issues. Attaching dmesg and gpio. The next test will be the driver patch, once I compile it.
Created attachment 301450 [details]
/sys/kernel/debug/gpio with SSDT changed via GRUB
TIL that there is actually an interface already available in the kernel for this kind of quirk. Please disregard my patch. You can try using this on your kernel command line:
gpiolib_acpi.ignore_wake=AMDI0030:00@18
Attached is a new patch that should do it automatically based on your DMI data from lshw.
Created attachment 301452 [details]
Quirk patch (v2)
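The value passed to gpiolib_acpi.ignore_wake= above is a controller@pin list. Below is a Python sketch of parsing that format, assuming comma-separated entries with a decimal pin number (so 18 here is the entry the SSDT lists as 0x0012). The function names are hypothetical, and this is not the kernel's own parser:

```python
def parse_ignore_list(value: str) -> set[tuple[str, int]]:
    """Parse "controller@pin[,controller@pin...]" into (controller, pin) pairs.

    Pins are assumed to be decimal, e.g. AMDI0030:00@18 -> ("AMDI0030:00", 18).
    """
    if not value:
        return set()
    entries = set()
    for item in (part.strip() for part in value.split(",")):
        # Split on the last '@' so controller names may contain ':' freely.
        controller, sep, pin = item.rpartition("@")
        if not sep or not controller or not pin.isdigit():
            raise ValueError(f"invalid entry {item!r}, expected controller@pin")
        entries.add((controller, int(pin)))
    return entries


def is_ignored(value: str, controller: str, pin: int) -> bool:
    """Return True if (controller, pin) appears in the ignore-list value."""
    return (controller, pin) in parse_ignore_list(value)
```

With the value suggested above, `is_ignored("AMDI0030:00@18", "AMDI0030:00", 18)` is true, while any other pin on that controller is left alone.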
Created attachment 301453 [details]
Patch v3 [1/2]
Created attachment 301454 [details]
PATCH v3 [2/2]
Actually, I see that this parameter controls wake only, while your problem is a floating interrupt. Try v3 instead, which consists of 2 patches.
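The distinction between v2 and v3 is which capability gets dropped: ignoring wake only stops the pin from acting as a wake source, while v3 is meant to deal with the floating interrupt itself, applied automatically on matching DMI data. Below is a Python sketch of such a DMI-gated quirk table that keeps the two pin lists separate. The field names, the DMI values, and the table layout are hypothetical; matching is by substring, similar to the kernel's DMI_MATCH():

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class GpioQuirk:
    """Hypothetical quirk entry: DMI substrings to match plus pin lists."""
    matches: dict[str, str]     # DMI field name -> required substring
    ignore_wake: str = ""       # pins whose wake capability is ignored
    ignore_interrupt: str = ""  # pins whose interrupt is never set up


# Placeholder DMI values; the real match strings live in the actual patch.
QUIRKS = [
    GpioQuirk(
        matches={"sys_vendor": "ASUSTeK COMPUTER INC.", "product_name": "UM325UAZ"},
        ignore_interrupt="AMDI0030:00@18",
    ),
]


def read_dmi(root: str = "/sys/class/dmi/id",
             keys: tuple[str, ...] = ("sys_vendor", "product_name", "board_name")
             ) -> dict[str, str]:
    """Read the DMI fields used for matching; unreadable fields are skipped."""
    dmi = {}
    for key in keys:
        try:
            dmi[key] = (Path(root) / key).read_text().strip()
        except OSError:
            pass
    return dmi


def find_quirk(dmi: dict[str, str]) -> Optional[GpioQuirk]:
    """Return the first quirk whose DMI substrings are all found, else None."""
    for quirk in QUIRKS:
        if all(want in dmi.get(field, "") for field, want in quirk.matches.items()):
            return quirk
    return None
```

Because the match is a substring test, a product name ending in "UM325UA" (the unaffected model without the Z) would not match the placeholder entry, which is the kind of narrow scoping a hardware quirk needs.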
Created attachment 301457 [details]
dmesg with patch v3
I have thoroughly tested patch v3 and can confirm that everything works flawlessly. The DMI strings are matched and the pin is ignored (see dmesg). I now get almost 20 h of battery life, compared to 3-4 h before. Also, the interrupts used to interfere with suspend (always) and hibernate (sometimes), both of which I can now use reliably.
Thank you very much, Mario, for the quick feedback and fix.
Sure, thanks for checking it! I would also suggest you report this to ASUS to get it fixed in the BIOS. For now I've submitted the quirk upstream here:
https://lore.kernel.org/linux-gpio/20220719142142.247-1-mario.limonciello@amd.com/T/#t

In addition to 5.18.2, I have successfully tested the patch with 5.18.14.
Created attachment 301338 [details]
dmesg

I am experiencing an interrupt storm on my Asus UM325UAZ laptop, which fully occupies one CPU core and shortens battery life by a factor of 3-4. I have tried multiple kernel versions (5.14, 5.15, 5.16, 5.17 and 5.18) from Debian, Ubuntu and Fedora with identical results. The attached listings (dmesg, interrupts, cputime) come from 5.18.2 on Debian. I have also attached an lshw listing and the full contents of /sys/kernel/irq/34 from 5.14.9, but the kernel version should not make much difference.
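To confirm which IRQ is storming, one can sample /proc/interrupts twice and compare the counts. Below is a minimal Python sketch, assuming the usual /proc/interrupts layout (a header row of CPU columns, then one `label:` row per IRQ with one count per CPU, followed by a description). The function names are hypothetical:

```python
import time


def parse_interrupts(text: str) -> dict[str, int]:
    """Map each IRQ label in /proc/interrupts text (e.g. "34", "LOC") to its
    total count summed over all CPU columns."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the chip/handler description
            total += int(field)
        totals[label.strip()] = total
    return totals


def top_irqs(before: dict[str, int], after: dict[str, int],
             interval: float, n: int = 5) -> list[tuple[str, float]]:
    """Return the n IRQs with the highest interrupt rate (per second)."""
    rates = {irq: (count - before.get(irq, 0)) / interval
             for irq, count in after.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]


def sample(interval: float = 1.0,
           path: str = "/proc/interrupts") -> list[tuple[str, float]]:
    """Read /proc/interrupts twice, `interval` seconds apart, and rank IRQs."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return top_irqs(before, after, interval)
```

Running `sample(2.0)` on an affected machine should put the storming line at the top of the list; its label can then be used to inspect /sys/kernel/irq/<label>/ as was done for IRQ 34 above.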