Bug 216208

Summary: Interrupt storm on Asus UM325UAZ
Product: Platform Specific/Hardware
Reporter: Pavel Krc (reg.krn)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED CODE_FIX
Severity: normal
CC: mario.limonciello
Priority: P1
Hardware: AMD
OS: Linux
Kernel Version: 5.18.2
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
dmesg
/proc/interrupts
ps auxk-cputime (shortly after boot)
lshw
grep . /sys/kernel/irq/34/*
acpidump
/sys/kernel/debug/gpio
Possible patch to ignore pin from driver side
/sys/kernel/debug/gpio with changed SSDT
dmesg with changed SSDT
dmesg with blacklisted pinctrl_amd
dmesg with SSDT changed via GRUB
/sys/kernel/debug/gpio with SSDT changed via GRUB
Quirk patch (v2)
Patch v3 [1/2]
PATCH v3 [2/2]
dmesg with patch v3

Description Pavel Krc 2022-07-06 08:26:08 UTC
Created attachment 301338
dmesg

I am experiencing an interrupt storm on my Asus UM325UAZ laptop, which fully occupies one CPU core and shortens battery life by a factor of 3-4. I have tried multiple kernel versions (5.14, 5.15, 5.16, 5.17 and 5.18) from Debian, Ubuntu and Fedora with identical results. The attached listings (dmesg, interrupts, cputime) come from 5.18.2 on Debian. I have also attached an lshw listing and the full contents of /sys/kernel/irq/34 from 5.14.9, but that should not make much difference.
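
One way to put a number on a storm like this is to sample /proc/interrupts twice and compare the counters. The following is only a minimal sketch: the function names irq_total and irq_rate are made up for illustration, and IRQ 34 is simply the number that appears in the attachments.

```shell
#!/bin/sh
# Sum the per-CPU counters for one IRQ line of /proc/interrupts-style input.
# Usage: irq_total <irq-number> < /proc/interrupts
irq_total() {
    awk -v irq="$1:" '
        $1 == irq {
            total = 0
            # Per-CPU counters follow the IRQ label; stop at the first
            # non-numeric field (the interrupt controller name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            printf "%.0f\n", total
        }'
}

# Print the average number of interrupts per second for one IRQ.
# Usage: irq_rate <irq-number> [seconds]
irq_rate() {
    interval="${2:-5}"
    before=$(irq_total "$1" < /proc/interrupts)
    sleep "$interval"
    after=$(irq_total "$1" < /proc/interrupts)
    echo $(( (after - before) / interval ))
}
```

Running `irq_rate 34 5` prints the average rate over five seconds. A rate that stays very high while the machine is idle would be consistent with the storm described here.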
Comment 1 Pavel Krc 2022-07-06 08:27:15 UTC
Created attachment 301339
/proc/interrupts
Comment 2 Pavel Krc 2022-07-06 08:28:38 UTC
Created attachment 301340
ps auxk-cputime (shortly after boot)
Comment 3 Pavel Krc 2022-07-06 08:29:25 UTC
Created attachment 301341
lshw
Comment 4 Pavel Krc 2022-07-06 08:29:50 UTC
Created attachment 301342
grep . /sys/kernel/irq/34/*
Comment 5 Artem S. Tashkinov 2022-07-06 16:31:35 UTC
Please install the latest BIOS version first, see if it helps.

Yours is very old.
Comment 6 Artem S. Tashkinov 2022-07-06 16:35:34 UTC
Also check if `sudo rmmod pinctrl-amd` helps. If it does, blacklist this module, until there's a better solution.
Comment 7 Pavel Krc 2022-07-07 09:57:04 UTC
Thank you for your help. I checked again and the newest BIOS that Asus provides for this laptop is version 300, which I already have.

I am not able to blacklist pinctrl-amd, as it is a built-in module. I tried to find an init function that I would block using initcall_blacklist= param, but I could not find one. I tried initcall_blacklist=amd_gpio_probe, but that is probably not a suitable init call. It did print "blacklisting initcall amd_gpio_probe" in dmesg, but nothing about the interrupts actually changed.
Comment 8 Artem S. Tashkinov 2022-07-07 22:22:22 UTC
CC'ing Mario Limonciello - maybe he could say something.

Maybe the pinctrl-amd driver has nothing to do with this issue at all - it was based on my limited research.
Comment 9 Mario Limonciello (AMD) 2022-07-07 22:24:41 UTC
I don't suppose you have any way to know what is connected to pin 18?  Can you try to share an acpidump?  We might be able to get a hint from _AEI.
Comment 10 Pavel Krc 2022-07-08 08:42:03 UTC
Created attachment 301364
acpidump

Attaching acpidump. In the meantime, I am also compiling the same kernel without pinctrl-amd to test that.
Comment 11 Mario Limonciello (AMD) 2022-07-08 12:53:41 UTC
Can you look and see what's at GPP4?

You can try:
$ grep GPP4 /sys/bus/acpi/devices/*/path

Then you can switch into that directory and see what it's bound with.
Comment 12 Pavel Krc 2022-07-08 15:15:27 UTC
root@largo:~/storm# grep GPP4 /sys/bus/acpi/devices/*/path
/sys/bus/acpi/devices/device:0c/path:\_SB_.PCI0.GPP4
/sys/bus/acpi/devices/device:0d/path:\_SB_.PCI0.GPP4.D00C
/sys/bus/acpi/devices/QCOM6390:00/path:\_SB_.PCI0.GPP4.BTH0

Looks like something with Bluetooth:

/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_active_time:0
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_active_kids:0
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_usage:0
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_status:unsupported
/sys/bus/acpi/devices/device:0c/device:0d/power/async:disabled
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_suspended_time:0
/sys/bus/acpi/devices/device:0c/device:0d/power/runtime_enabled:disabled
/sys/bus/acpi/devices/device:0c/device:0d/power/control:auto
/sys/bus/acpi/devices/device:0c/device:0d/adr:0x000000ff
/sys/bus/acpi/devices/device:0c/device:0d/path:\_SB_.PCI0.GPP4.D00C
/sys/bus/acpi/devices/device:0c/power/runtime_active_time:0
/sys/bus/acpi/devices/device:0c/power/runtime_active_kids:0
/sys/bus/acpi/devices/device:0c/power/runtime_usage:0
/sys/bus/acpi/devices/device:0c/power/runtime_status:unsupported
/sys/bus/acpi/devices/device:0c/power/async:disabled
/sys/bus/acpi/devices/device:0c/power/runtime_suspended_time:0
/sys/bus/acpi/devices/device:0c/power/runtime_enabled:disabled
/sys/bus/acpi/devices/device:0c/power/control:auto
/sys/bus/acpi/devices/device:0c/adr:0x00020002
/sys/bus/acpi/devices/device:0c/path:\_SB_.PCI0.GPP4
/sys/bus/acpi/devices/device:0c/QCOM6390:00/uevent:MODALIAS=
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_active_time:0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_active_kids:0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_usage:0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_status:unsupported
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/async:disabled
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_suspended_time:0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/runtime_enabled:disabled
/sys/bus/acpi/devices/device:0c/QCOM6390:00/power/control:auto
/sys/bus/acpi/devices/device:0c/QCOM6390:00/hid:QCOM6390
/sys/bus/acpi/devices/device:0c/QCOM6390:00/path:\_SB_.PCI0.GPP4.BTH0
/sys/bus/acpi/devices/device:0c/QCOM6390:00/status:0

Which is strange, because the USB Bluetooth device that I know of (13d3:3563), reported as MediaTek and managed by the btusb module, works fine.
Comment 13 Mario Limonciello (AMD) 2022-07-08 15:26:55 UTC
> /sys/bus/acpi/devices/device:0c/QCOM6390:00/status:0

This means the device is not present.
https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/06_Device_Configuration/Device_Configuration.html?highlight=_sta#sta-device-status
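
For readers unfamiliar with the status file: it holds the value returned by the ACPI _STA method, a bit field whose low bits are defined in the section of the specification linked above. A minimal sketch of decoding it follows; the function name decode_sta and the output labels are invented for illustration.

```shell
#!/bin/sh
# Decode the low five bits of an ACPI _STA value, as exposed in
# /sys/bus/acpi/devices/<device>/status.
# Usage: decode_sta <value>
decode_sta() {
    sta=$(( $1 ))
    # Bit 0: device is present
    # Bit 1: device is enabled and decoding its resources
    # Bit 2: device should be shown in the UI
    # Bit 3: device is functioning properly
    # Bit 4: battery is present
    printf 'present=%d enabled=%d show_in_ui=%d functioning=%d battery=%d\n' \
        $(( sta & 1 )) \
        $(( (sta >> 1) & 1 )) \
        $(( (sta >> 2) & 1 )) \
        $(( (sta >> 3) & 1 )) \
        $(( (sta >> 4) & 1 ))
}
```

For example, `decode_sta "$(cat /sys/bus/acpi/devices/QCOM6390:00/status)"` on the reporter's machine would show every flag as 0, which matches the "not present" reading.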

> QCOM6390

Did you swap out your WiFi module or make any other hardware modifications since it was shipped?

My educated guess is that, in at least one of the SKUs for this BIOS, whatever is connected to pin 18 notifies the Qualcomm BT controller when events happen.  It might be a wakeup pin for BT, for example.

However it seems that in your SKU with the Mediatek wifi there is nothing to notify.  Maybe the pin is floating, or maybe it's tied to something else.

If you didn't change your HW at all, this seems like an SBIOS bug: _AEI shouldn't be marking that pin to trigger.

Let's hear what your results are for your kernel without pinctrl-amd.  I expect that if you don't compile in pinctrl-amd that your "high interrupts" go away, but that's not really a solution.  It's just an information and debugging tactic.

Please also add /sys/kernel/debug/gpio to this bug report (when you have pinctrl-amd compiled in) so we have that as a data point.
Comment 14 Pavel Krc 2022-07-08 16:23:20 UTC
Created attachment 301367
/sys/kernel/debug/gpio

I did not change any HW. Actually we have two identical laptops at our office and both are experiencing the same issue.

In addition to that, we also have a third, slightly older, yet almost identical model (UM325UA without the Z), which does not experience the issue. Comparing lshw listing for these models, the differences are:

* WiFi+BT (UA: Intel, UAZ: MediaTek)
* NVMe (UA: SKHynix, UAZ: Intel)
* BIOS firmwares are different (non-interchangeable) and most IRQ+memory ranges for devices differ

The laptop shipped with Windows 10 which is still accessible (and it has normal power consumption), just FYI in case it could be helpful in any way.

The PC on which I compile the kernel has become inaccessible; I will have physical access to it on Wednesday.
Comment 15 Mario Limonciello (AMD) 2022-07-11 17:37:30 UTC
I had a discussion with some colleagues and believe that pin 18 is for WLAN wakeup.  Are you comfortable loading a patched SSDT?  I think this might fix your issue:

$ diff -u ssdt11.dsl.old ssdt11.dsl
--- ssdt11.dsl.old      2022-07-11 12:35:14.075642745 -0500
+++ ssdt11.dsl  2022-07-11 12:35:34.855642149 -0500
@@ -213,12 +213,6 @@
                     "\\_SB.GPIO", 0x00, ResourceConsumer, ,
                     )
                     {   // Pin list
-                        0x0012
-                    }
-                GpioInt (Edge, ActiveLow, ExclusiveAndWake, PullNone, 0x0000,
-                    "\\_SB.GPIO", 0x00, ResourceConsumer, ,
-                    )
-                    {   // Pin list
                         0x0018
                     }
                 GpioInt (Edge, ActiveHigh, ExclusiveAndWake, PullNone, 0x0000,
Comment 16 Mario Limonciello (AMD) 2022-07-11 18:22:21 UTC
Created attachment 301397
Possible patch to ignore pin from driver side

If you use the SSDT approach and it works, then that at least confirms the root cause is this pin and the proper solution is for ASUS to fix it.

If it doesn't work, but you're not sure you loaded it correctly please share the /sys/kernel/debug/gpio output and a new dmesg so we can check.

In lieu of ASUS fixing it, here is a possible patch for your system that might help as well.  To confirm it applied you should see a new dev_info statement (see the patch for detail).  If I got the DMI information wrong (I just lifted it from your dmesg), please fix it in the patch and report back with the right info.
Comment 17 Pavel Krc 2022-07-12 07:21:53 UTC
Thank you again. I will try to upload the modified SSDT now. Just to make sure that I do not mess up anything (this is my first manipulation of ACPI tables):
- I disassembled the SSDT from acpidump and applied your patch
- I recompiled the table using iasl -sa ssdt11.dsl
- I added the new ssdt11.aml inside /kernel/firmware/acpi as cpio in front of current initrd
- I will now boot with the updated initrd.

Am I doing it right? Thank you.
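
For reference, the initrd route listed above can be sketched roughly as follows, based on the kernel's documentation on overriding ACPI tables via initrd. This is an illustration and not a verified recipe for this machine. The function name build_acpi_initrd and all file paths are hypothetical, and the kernel must be built with CONFIG_ACPI_TABLE_UPGRADE. That documentation also says to increase the OEM revision in the DefinitionBlock of the edited .dsl before recompiling, so that the new table supersedes the firmware one.

```shell
#!/bin/sh
# Prepend an uncompressed cpio archive holding one ACPI table to an
# existing initrd. All three arguments are illustrative file paths.
# Usage: build_acpi_initrd ssdt11.aml /boot/initrd.img /boot/initrd-acpi.img
build_acpi_initrd() {
    aml=$1
    orig=$2
    out=$3
    work=$(mktemp -d) || return 1
    # The table must live under kernel/firmware/acpi inside the archive.
    mkdir -p "$work/kernel/firmware/acpi" &&
    cp "$aml" "$work/kernel/firmware/acpi/" &&
    # The table archive must be uncompressed and must come first.
    (cd "$work" && find kernel | cpio -o -H newc) > "$out" &&
    # The original (usually compressed) initrd is concatenated after it.
    cat "$orig" >> "$out"
    status=$?
    rm -rf "$work"
    return $status
}
```

If the override is picked up, the early part of dmesg should mention the replaced table. Checking /sys/kernel/debug/gpio afterwards shows whether the pin configuration actually changed.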
Comment 18 Mario Limonciello (AMD) 2022-07-12 12:55:29 UTC
There are a variety of ways to patch SSDT.  My go-to method is to use GRUB to load an ACPI table (as it's easier to swap out on the fly), but that one sounds like it will work correctly to me.
Comment 19 Pavel Krc 2022-07-17 09:11:03 UTC
Created attachment 301444
/sys/kernel/debug/gpio with changed SSDT

I tried both the SSDT approach and blacklisting pinctrl_amd recompiled as module. The SSDT patch did not fix the interrupts. Attaching gpio and dmesg. Is it still worth testing the driver patch?

Blacklisting pinctrl_amd did fix the interrupts while wifi kept on working, however the integrated touchpad stopped working. Attaching dmesg. Let me know if other files are needed for any of the cases.
Comment 20 Pavel Krc 2022-07-17 09:11:43 UTC
Created attachment 301445
dmesg with changed SSDT
Comment 21 Pavel Krc 2022-07-17 09:13:06 UTC
Created attachment 301446
dmesg with blacklisted pinctrl_amd
Comment 22 Mario Limonciello (AMD) 2022-07-18 14:34:16 UTC
> pin18 Edge trigger| Active low| interrupt is enabled| interrupt is unmasked|
> enable wakeup in S0i3 state| enable wakeup in S3 state|
> disable wakeup in S4/S5 state| input is high| pull-up is disabled| Pull-down is disabled| output is disabled| debouncing filter disabled| 0x57a00

I don't believe you properly disabled it from SSDT.  I see that the pin is still edge triggered.
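
The raw register value at the end of that debugfs line (0x57a00) can be cross-checked against the text flags. The sketch below assumes the bit positions used by the pinctrl-amd driver as best understood here: bit 8 trigger type, bits 9-10 polarity, bit 11 interrupt enable, bit 12 interrupt unmask, bits 13-15 wake control, bit 16 pin state. They are consistent with the quoted line but should be verified against the driver source. The function name decode_amd_gpio_reg is made up for illustration.

```shell
#!/bin/sh
# Decode selected bits of an AMD GPIO pin register value.
# Bit positions are an assumption; see the note above.
# Usage: decode_amd_gpio_reg <value>
decode_amd_gpio_reg() {
    reg=$(( $1 ))
    # Bit 8: 0 = edge triggered, 1 = level triggered
    if [ $(( (reg >> 8) & 1 )) -eq 1 ]; then
        echo "trigger=level"
    else
        echo "trigger=edge"
    fi
    # Bits 9-10: polarity
    case $(( (reg >> 9) & 3 )) in
        0) echo "polarity=active-high" ;;
        1) echo "polarity=active-low" ;;
        2) echo "polarity=both-edges" ;;
        *) echo "polarity=reserved" ;;
    esac
    echo "interrupt_enabled=$(( (reg >> 11) & 1 ))"
    echo "interrupt_unmasked=$(( (reg >> 12) & 1 ))"
    echo "wake_s0i3=$(( (reg >> 13) & 1 ))"
    echo "wake_s3=$(( (reg >> 14) & 1 ))"
    echo "wake_s4_s5=$(( (reg >> 15) & 1 ))"
    echo "input_high=$(( (reg >> 16) & 1 ))"
}
```

Under these assumptions, `decode_amd_gpio_reg 0x57a00` reports an edge-triggered, active-low pin with the interrupt enabled and unmasked, which agrees with the text flags.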

> Is it still worth testing the driver patch?

Yeah, I don't see why not.
Comment 23 Pavel Krc 2022-07-18 16:47:28 UTC
Created attachment 301449
dmesg with SSDT changed via GRUB

You are right. Now I tried loading SSDT via grub and interrupts are gone, touchpad and wifi still work and I did not detect any immediate issue. Attaching dmesg and gpio. Next test will be the driver patch once I compile it.
Comment 24 Pavel Krc 2022-07-18 16:48:54 UTC
Created attachment 301450
/sys/kernel/debug/gpio with SSDT changed via GRUB
Comment 25 Mario Limonciello (AMD) 2022-07-19 02:31:08 UTC
TIL that there is actually an interface available in the kernel already for this quirking.  Please disregard my patch.  You can try to use this on your kernel command line:

gpiolib_acpi.ignore_wake=AMDI0030:00@18
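
The value is a comma-separated list of controller@pin entries; here AMDI0030:00 is the ACPI name of the GPIO controller and 18 is the pin. To confirm after a reboot that the option reached the kernel, the command line can be inspected. This is a minimal sketch, and the helper name list_ignore_wake is invented for illustration:

```shell
#!/bin/sh
# Print "controller pin" pairs from a gpiolib_acpi.ignore_wake= option
# found in a kernel command line string.
# Usage: list_ignore_wake "$(cat /proc/cmdline)"
list_ignore_wake() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^gpiolib_acpi\.ignore_wake=//p' |
        tr ',' '\n' |
        awk -F'@' 'NF == 2 { print $1, $2 }'
}
```

This only shows that the option was passed on the command line. It does not show whether the kernel matched the controller and pin.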

Attached is a new patch that should do it automatically based on your DMI data from lshw.
Comment 26 Mario Limonciello (AMD) 2022-07-19 02:33:03 UTC
Created attachment 301452
Quirk patch (v2)
Comment 27 Mario Limonciello (AMD) 2022-07-19 03:11:19 UTC
Created attachment 301453
Patch v3 [1/2]
Comment 28 Mario Limonciello (AMD) 2022-07-19 03:12:22 UTC
Created attachment 301454
PATCH v3 [2/2]

Actually, I see that ignore_wake controls wake only, and your problem is a floating interrupt.  Try v3 instead, which is 2 patches to apply.
Comment 29 Pavel Krc 2022-07-19 13:39:12 UTC
Created attachment 301457
dmesg with patch v3

I have thoroughly tested patch v3 and I can confirm that everything works flawlessly. DMI strings are matched and the pin is ignored (see dmesg). Now I have almost 20 h battery lifetime as opposed to 3-4 before. Also, the interrupts used to interfere with suspend (always) and hibernate (sometimes), which I can now use reliably.

Thank you very much, Mario, for the quick feedback and fix.
Comment 30 Mario Limonciello (AMD) 2022-07-19 14:37:58 UTC
Sure, thanks for checking it!  I would also suggest you report this to ASUS to get fixed in the BIOS.  For now I've submitted the quirk upstream here:
https://lore.kernel.org/linux-gpio/20220719142142.247-1-mario.limonciello@amd.com/T/#t
Comment 31 Pavel Krc 2022-08-20 15:26:53 UTC
In addition to 5.18.2, I have successfully tested the patch with 5.18.14.