Bug 207749

Summary: irq 9: nobody cared - Icelake i5 1035G1,Ideapad V15-IIL,Ideapad 330-15ICH
Product: ACPI Reporter: Thomas Pfaff (tpfaff)
Component: Config-Interrupts Assignee: Zhang Rui (rui.zhang)
Status: NEEDINFO ---    
Severity: normal CC: accounts.zac+bugzilla, alberto, BrandonMcC, ekofman, jasperh+kernel, ragnarok, rui.zhang, sacbarnett, xiaohuyz, xry111
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 5.6.12 Subsystem:
Regression: No Bisected commit-id:
Attachments: Kernel log
5.6 kernel config
5.7 kernel config
5.7 kernel log
5.8 kernel config
5.8 kernel log
/sys/firmware/acpi/interrupts/
acpidata
Kernel log with wifi disabled
Lenovo V15-IIL 82C5 i3-1005G1 dmesg
Lenovo V15-IIL 82C5 i3-1005G1 acpidump
Lenovo V15-IIL 82C5 i3-1005G1 interrupts
Lenovo V15-IIL inxi output
Picture of failed boot after adding intel_iommu=on as a kernel boot parameter
For comment 37: output of dmesg and grep as requested for grub parameter acpi_mask_gpe=0x6e
Command 37: sudo grep . /sys/firmware/acpi/interrupts/* output
acpidump from Yoga S740
Output of "cat /proc/interrupts" from Yoga S740
ACPIDUMP from Yoga S740, 5.12.13
debug patch for SCI IRQ Nobody care
debug: why GPE43 is enabled
Dmesg from Yoga S740 with both patches applied and no masks
Dmesg from Yoga S740 with both patches applied and GPE 43 masked
dmesg from Yoga S740 with both patches and intel_hid disabled

Description Thomas Pfaff 2020-05-15 07:05:29 UTC
Created attachment 289149 [details]
Kernel log

This always happens immediately after boot:

[    9.764027] irq 9: nobody cared (try booting with the "irqpoll" option)
[    9.764031] CPU: 0 PID: 294 Comm: rngd Not tainted 5.6.12 #1
[    9.764031] Hardware name: LENOVO 82C5/LNVNB161216, BIOS DKCN26WW 03/04/2020
[    9.764032] Call Trace:
[    9.764035]  <IRQ>
[    9.764039]  dump_stack+0x50/0x70
[    9.764041]  __report_bad_irq+0x30/0xa2
[    9.764043]  note_interrupt.cold+0xb/0x64
[    9.764044]  handle_irq_event_percpu+0x6a/0x80
[    9.764046]  handle_irq_event+0x2f/0x4c
[    9.764047]  handle_fasteoi_irq+0x9e/0x140
[    9.764049]  do_IRQ+0x68/0x120
[    9.764050]  common_interrupt+0xf/0xf
[    9.764051]  </IRQ>
[    9.764052] RIP: 0033:0x7f361dd19900
[    9.764054] Code: 48 31 45 f0 48 8b 45 e0 48 c1 e8 1b 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 16 83 e0 01 48 31 45 f0 48 d1 65 e0 48 8b 45 f0 <48> 31 45 e0 83 45 d4 01 83 7d d4 40 0f 86 72 ff ff ff 48 83 45 d8
[    9.764054] RSP: 002b:00007f361d85fcd0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde
[    9.764056] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000001a
[    9.764056] RDX: 000004be94000000 RSI: 0000000000000000 RDI: 00007ffd7b4c7080
[    9.764056] RBP: 00007f361d85fd20 R08: 00007ffd7b4c7080 R09: 0000000000004b4c
[    9.764057] R10: 000000005ebb08e9 R11: 00007f361dd19b98 R12: 00007ffd7b4bbd2e
[    9.764057] R13: 00007ffd7b4bbd2f R14: 000055743804f9b0 R15: 000055743804f9b0
[    9.764058] handlers:
[    9.764061] [<00000000f1cac164>] acpi_irq
[    9.764062] Disabling IRQ #9

The CPU is an Icelake i5 1035G1.
IRQ 9 handling stops after 100000 interrupts.

I am using Gentoo with vanilla kernel 5.6.12, but I get the same with Ubuntu 20.04 booted from USB, which has a 5.4 kernel.
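
For context on the 100000 figure: the trace goes through note_interrupt(), the kernel's spurious-interrupt accounting. It evaluates each IRQ line in windows of 100000 interrupts and disables the line when more than 99900 of them were reported as unhandled. The following is a simplified Python model of that bookkeeping, not the kernel code itself. The class and function names are illustrative, and the model ignores the kernel's time-based reset of the unhandled counter and the polling fallback.

```python
from dataclasses import dataclass

WINDOW = 100_000          # interrupts counted before the line is evaluated
UNHANDLED_LIMIT = 99_900  # unhandled interrupts in one window that trip the check


@dataclass
class IrqState:
    """Per-IRQ counters (illustrative stand-in for the kernel's irq_desc fields)."""
    irq_count: int = 0
    irqs_unhandled: int = 0
    disabled: bool = False


def note_interrupt(state: IrqState, handled: bool) -> None:
    """Account for one interrupt; disable the line if almost none were handled."""
    if state.disabled:
        return
    if not handled:
        state.irqs_unhandled += 1
    state.irq_count += 1
    if state.irq_count < WINDOW:
        return
    # The window is full: evaluate it, then start a new one.
    if state.irqs_unhandled > UNHANDLED_LIMIT:
        state.disabled = True  # corresponds to the "Disabling IRQ #9" message
    state.irq_count = 0
    state.irqs_unhandled = 0


def simulate(total: int, handled_every: int = 0) -> IrqState:
    """Feed `total` interrupts; every `handled_every`-th one is handled (0 = none)."""
    state = IrqState()
    for i in range(1, total + 1):
        handled = handled_every > 0 and i % handled_every == 0
        note_interrupt(state, handled)
    return state
```

In this model, a handler that never claims the interrupt gets the line disabled at exactly the 100000th interrupt, which matches the counter value seen in the report.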
Comment 1 Zhang Rui 2020-06-22 13:41:03 UTC
can you attach the full dmesg output after boot and before the problem occurs?
can you share your kernel config file?
Comment 2 Thomas Pfaff 2020-06-24 07:32:07 UTC
(In reply to Zhang Rui from comment #1)
> can you attach the full dmesg output after boot and before the problem
> occurs?
> can you share your kernel config file?

I attached the full dmesg output already; here is the kernel config for 5.6.
Comment 3 Thomas Pfaff 2020-06-24 07:32:55 UTC
Created attachment 289863 [details]
5.6 kernel config
Comment 4 Thomas Pfaff 2020-06-24 07:34:39 UTC
Now I am on 5.7.5, and I made some changes to the kernel config and boot parameters to support the touchpad:
I had to add "ELAN0633" to struct acpi_device_id elan_acpi_id in include/linux/input/elan-i2c-ids.h, and I need to add i8042.nopnp=1 pci=nocrs to the kernel boot parameters to get the touchpad working.
With these changes it's slightly different; for example, I do not see the errors about the intel-lpss device that I get without pci=nocrs.

And I get two behaviours:
Most of the time I see an "irq 9: nobody cared" message, and the ACPI interrupts go to sci_not.

But sometimes after boot I do not get the irq 9 message immediately, but a lot of gpe6E interrupts instead.
Then ACPI events like charger plugged in/out are handled correctly.
Comment 5 Thomas Pfaff 2020-06-24 07:36:20 UTC
Created attachment 289865 [details]
5.7 kernel config
Comment 6 Thomas Pfaff 2020-06-24 07:36:49 UTC
Created attachment 289867 [details]
5.7 kernel log
Comment 7 Thomas Pfaff 2020-09-28 11:30:13 UTC
I tried again with Kernel 5.8.11 and latest Lenovo BIOS DKCN48WW from August 2020.
Still with irq 9: nobody cared error
Comment 8 Thomas Pfaff 2020-09-28 11:31:24 UTC
Created attachment 292689 [details]
5.8 kernel config
Comment 9 Thomas Pfaff 2020-09-28 11:32:09 UTC
Created attachment 292691 [details]
5.8 kernel log
Comment 10 Thomas Pfaff 2020-09-28 11:33:35 UTC
Created attachment 292693 [details]
/sys/firmware/acpi/interrupts/
Comment 11 Thomas Pfaff 2020-09-28 11:34:07 UTC
Created attachment 292695 [details]
acpidata
Comment 12 Xi Ruoyao 2021-03-08 10:34:07 UTC
On 5.10.19 this issue still occurs on a Lenovo V15-IIL (my model has a 1065-G7 though).

Most of the time the kernel complains "irq 9: nobody cared" after boot.

Occasionally, irq 9 is handled but in a buggy way: the counter of irq 9 rises very quickly and the interrupt handling eats a lot of CPU.  After shutting down and re-plugging the AC adapter, it goes back to "nobody cared".

Is there any progress on resolving this?
Comment 13 Zhang Rui 2021-03-09 05:53:21 UTC
Given that this "irq nobody cared" occurs in the late boot phase, where the system is configuring the wireless, can you please check if the problem still exists with wireless disabled?
Comment 14 Rodrigo Saboya 2021-03-11 12:20:51 UTC
@Zhang Rui: I disabled the Wireless LAN option at the BIOS and booted. It still presented the same error (log attached below).

My laptop is a Lenovo Ideapad 330-15ICH. Some weird behaviours this laptop has:

1 - Brightness keys do not work (even on Windows)
2 - It has an 8750H processor, and the turbo boosting behaviour is bad. I had to use the intel-undervolt tool and mess with MSR registers to make it work correctly.

Not sure if those could be related to the IRQ issue.
Comment 15 Rodrigo Saboya 2021-03-11 12:21:57 UTC
Created attachment 295803 [details]
Kernel log with wifi disabled
Comment 16 Alberto Abrao 2021-03-18 02:09:10 UTC
I have a Lenovo Ideapad V15-IIL (LENOVO_MT_82C5_BU_idea_FM_V15-IIL). I experience the same issue, but what caught my attention was:

I bought the laptop, started the factory install of Windows, ran Lenovo Vantage to update to latest BIOS (version: DKCN51WW, date: 12/23/2020). Then, downloaded Rufus to create a USB stick and install Xubuntu.

After installing, first boot of Xubuntu (Groovy Gorilla, 20.10) worked *perfectly* without any Kernel parameters whatsoever, including all function keys, Wi-Fi, and TouchPad.

It was only after I turned it off, then on again later, that I started having trouble with Wifi, function keys, and everything else.

Right now I am using both parameters aforementioned (i8042.nopnp=1 pci=nocrs). The only function keys that work are volume up, down, and mute. Microphone mute does NOT, only audio.

One more curious thing: at one point, when disabling the Bluetooth adapter manually, I saw the popup for brightness appear. Fn keys did not work, though.

If there are any tests I can run to assist, I am willing to do so.
Comment 17 Zhang Rui 2021-03-19 07:46:53 UTC
what kernel are you using?
I found a kernel problem on my tigerlake machine, which has an interrupt storm of  gpe6E.
Not sure if it is related, but it is better to always stick with latest upstream kernel.
Comment 18 Alberto Abrao 2021-03-20 01:16:25 UTC
(In reply to Zhang Rui from comment #17)
> what kernel are you using?
This was with Xubuntu 20.10, Kernel 5.8 series. It was fully updated as of March 18th.

That said, I've tried many things.

Fedora 33 locks on 

> I found a kernel problem on my tigerlake machine, which has an interrupt
> storm of  gpe6E.
> Not sure if it is related, but it is better to always stick with latest
> upstream kernel.

With 5.11, I started to get random freezes. They always lasted for a little while, around 30s or so. The machine's performance was also noticeably worse even when not locked up.

It happened on both of the following environments, be it with either, neither, or one of the following kernel parameters: pci=nocrs | i8042.nopnp

- Groovy Gorilla (Xubuntu 20.10, original kernel is from 5.8 series), updated to 5.11.7 from https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11.7/ 
- Hirsute Hippo (Xubuntu 21.04 development branch, kernel 5.11.0-11 from Ubuntu repositories, NOT mainline).

here's a snippet:
(...)
kernel: [   28.113886] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [sh:186]
(...)
[   60.113692] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [systemd-udevd:455]



For that, the fix was to use the following parameters (from /etc/default/grub):
GRUB_CMDLINE_LINUX="pci=realloc pci=pcie_scan_all pci=nocrs i8042.nopnp"

using only "pci=realloc" and "pci=pcie_scan_all" gets rid of the freezes, but does not bring the touchpad back, only "pci=nocrs" does.

With all of that said, many times I did try to use the Fn keys. As usual, only the volume ones work.



Should you need more information or logs please let me know. I have a few here already, and I am also willing to run any tests that are needed to fix this issue.
Comment 19 Alberto Abrao 2021-03-20 01:20:45 UTC
Oh, one more thing:

Although the machine is usable with the four parameters aforementioned, I still get the following - the title of this report - on every boot:

[  +5.515282] irq 9: nobody cared (try booting with the "irqpoll" option)
[  +0.000004] CPU: 1 PID: 157 Comm: systemd-udevd Not tainted 5.11.0-11-generic #12-Ubuntu
[  +0.000002] Hardware name: LENOVO 82C5/LNVNB161216, BIOS DKCN51WW 12/23/2020
[  +0.000001] Call Trace:
[  +0.000001]  <IRQ>
[  +0.000002]  show_stack+0x52/0x58
[  +0.000004]  dump_stack+0x70/0x8b
[  +0.000003]  __report_bad_irq+0x3a/0xaf
[  +0.000002]  note_interrupt.cold+0x8/0x5d
[  +0.000002]  handle_irq_event+0xaa/0xc0
[  +0.000003]  handle_fasteoi_irq+0x7d/0x1c0
[  +0.000002]  common_interrupt+0x70/0x140
[  +0.000002]  asm_common_interrupt+0x1e/0x40
[  +0.000002] RIP: 0010:__do_softirq+0x72/0x281
[  +0.000002] Code: ff 89 75 bc 65 81 05 29 7b c1 5a 00 01 00 00 c7 45 d0 0a 00 00 00 45 89 fe 65 66 c7 05 15 ba c2 5a 00 00 fb 66 0f 1f 44 00 00 <b8> ff ff >
[  +0.000002] RSP: 0000:ffffb19cc0144fa0 EFLAGS: 00000206
[  +0.000002] RAX: ffff983d00dfdc40 RBX: 0000000000000000 RCX: 00000000000006e0
[  +0.000001] RDX: 0000000000000000 RSI: 0000000000404140 RDI: 0000000000000000
[  +0.000001] RBP: ffffb19cc0144fe8 R08: 0000000000000001 R09: 0000000078aa8760
[  +0.000001] R10: 00000000786d0080 R11: 0000000000006469 R12: ffffb19cc0213e08
[  +0.000001] R13: 0000000000000000 R14: 0000000000000080 R15: 0000000000000080
[  +0.000001]  asm_call_irq_on_stack+0xf/0x20
[  +0.000002]  </IRQ>
[  +0.000001]  do_softirq_own_stack+0x3d/0x50
[  +0.000002]  irq_exit_rcu+0x95/0xd0
[  +0.000003]  sysvec_apic_timer_interrupt+0x3d/0x90
[  +0.000001]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  +0.000002] RIP: 0010:exit_to_user_mode_loop+0x73/0x160
[  +0.000002] Code: 02 75 7e fa 66 0f 1f 44 00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b 20 41 f7 c4 0e 30 02 00 0f 84 a6 00 00 00 fb 66 0f 1f 44 00 00 <41> f6 c4 >
[  +0.000001] RSP: 0000:ffffb19cc0213eb8 EFLAGS: 00000202
[  +0.000001] RAX: ffff983d00dfdc40 RBX: ffff983d00dfdc40 RCX: 00000000b43efd33
[  +0.000001] RDX: ffff983d009f9ec0 RSI: 0000000000000008 RDI: ffffb19cc0213f58
[  +0.000001] RBP: ffffb19cc0213ed0 R08: 0000000000000000 R09: 0000000000000008
[  +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008
[  +0.000000] R13: ffffb19cc0213f58 R14: 0000000000000000 R15: ffff983d001d6a00
[  +0.000002]  exit_to_user_mode_prepare+0x85/0x90
[  +0.000001]  irqentry_exit_to_user_mode+0x9/0x20
[  +0.000002]  irqentry_exit+0x19/0x30
[  +0.000001]  common_interrupt+0x88/0x140
[  +0.000002]  ? asm_common_interrupt+0x8/0x40
[  +0.000002]  asm_common_interrupt+0x1e/0x40
[  +0.000001] RIP: 0033:0x55e40ceec402
[  +0.000002] Code: 48 89 fd e8 80 fa ff ff 0f b6 45 28 83 e0 1f 75 59 f6 45 7c 02 75 41 48 8b 45 50 48 85 c0 74 06 48 8b 7d 10 ff d0 48 8b 7d 20 <e8> d9 6a >
[  +0.000001] RSP: 002b:00007ffdc36e13f0 EFLAGS: 00000246
[  +0.000001] RAX: 0000000000000000 RBX: 0000000000000005 RCX: 000000000ae7d04e
[  +0.000001] RDX: 000055e40ef53a60 RSI: 00000000000f4240 RDI: 0000000000000000
[  +0.000000] RBP: 000055e40eebd640 R08: 00000000ffffffff R09: 0000000003c0bd58
[  +0.000001] R10: 0000000000000018 R11: 000055e40eeeae80 R12: 00007ffdc36e1540
[  +0.000001] R13: 00007ffdc36e14f0 R14: 00007ffdc36e14ef R15: 0000000000000000
[  +0.000001] handlers:
[  +0.000001] [<0000000076f30fc8>] acpi_irq
[  +0.000003] Disabling IRQ #9
Comment 20 Zhang Rui 2021-03-22 12:28:48 UTC
Okay, then this might be two different issues.
Let's focus on the irq nobody care issue first.

please attach the output of "grep . /sys/firmware/acpi/interrupts/*".
please attach the dmesg output after boot.
please attach the acpidump output
Comment 21 Alberto Abrao 2021-03-23 20:25:07 UTC
Created attachment 296019 [details]
Lenovo V15-IIL 82C5 i3-1005G1 dmesg
Comment 22 Alberto Abrao 2021-03-23 20:25:48 UTC
Created attachment 296021 [details]
Lenovo V15-IIL 82C5 i3-1005G1 acpidump
Comment 23 Alberto Abrao 2021-03-23 20:26:31 UTC
Created attachment 296023 [details]
Lenovo V15-IIL 82C5 i3-1005G1 interrupts
Comment 24 Alberto Abrao 2021-03-23 20:27:28 UTC
Created attachment 296025 [details]
Lenovo V15-IIL inxi output

Hello Zhang,

I have attached the files you asked for. I am also providing the output of inxi with more details.

As usual, if there's anything else needed, please let me know.

Alberto Abrao
Comment 25 Zhang Rui 2021-03-25 02:10:49 UTC
(In reply to Thomas Pfaff from comment #0)
> The CPU is an Icelake i5 1035G1.
> IRQ 9 handling stops after 100000 interrupts.
> 
Hi, Thomas,
is this a laptop? if yes, may I know the model name?
I'm trying to understand if this issue affects the whole Lenovo Ideapad series.
Comment 26 Zhang Rui 2021-03-25 02:19:32 UTC
To other reporters in this thread,
"IRQ 9 nobody care" could be sufficient to cause a series of issues that you run into later. So let's focus on this only.

Question: (some of them have been confirmed by some of you, but I need to double check we're focusing on the same issue with different reporters)

"the issue" means "irq 9 nobody care shown in dmesg"

1. Does this issue occur on EVERY boot?
2. Is there any known kernel that this issue does not exist?
3. Is there any known BIOS version that this issue does not exist?
Comment 27 Rodrigo Saboya 2021-03-25 02:27:05 UTC
Double checking:

Specifically talking about the "IRQ 9 nobody care":

1 - Yes, it happens every single boot, without exception.
2 - I believe I tried 5.10 and 5.11 series, they both exhibited the error.
3 - I tried 2 BIOS versions, the one that came with the laptop and the newest one, same results.

My laptop specifically is a Lenovo Ideapad 330-15ICH.
Comment 28 Alberto Abrao 2021-03-25 03:00:00 UTC
(In reply to Zhang Rui from comment #26)
> To other reporters in this thread,
> "IRQ 9 nobody care" could be sufficient to cause a series of issues that you
> run into later. So let's focus on this only.
> 
> Question: (some of them have been confirmed by some of you, but I need to
> double check we're focusing on the same issue with different reporters)
> 
> "the issue" means "irq 9 nobody care shown in dmesg"
> 
> 1. Does this issue occur on EVERY boot?

Yes.

[  +5.420577] irq 9: nobody cared (try booting with the "irqpoll" option)

Using the irqpoll kernel option suggested by the aforementioned error message makes the computer *really* slow, and all other issues persist.

> 2. Is there any known kernel that this issue does not exist?

Not that I know of.

Tested: 

under Xubuntu:
- 5.8 series, Ubuntu 20.10.
- 5.11.7 mainline, from Ubuntu Kernel PPA.
- 5.11 series, Ubuntu 21.04.

Fedora:
- 5.10 series, Fedora 33 - hangs during the install process, at the Intel GPU backlight probe if I recall correctly. I am sure it was something related to the Intel GPU, though. I was not able to install Fedora 33 on this computer; I did not even reach the installer.

I will try again and report back on that one.

> 3. Is there any known BIOS version that this issue does not exist?

I bought the computer recently, and updated the BIOS under Windows before wiping the drive to install Linux, thus being unable to report on that.
Comment 29 Thomas Pfaff 2021-03-25 08:56:36 UTC
(In reply to Zhang Rui from comment #25)
> (In reply to Thomas Pfaff from comment #0)
> > The CPU is an Icelake i5 1035G1.
> > IRQ 9 handling stops after 100000 interrupts.
> > 
> Hi, Thomas,
> is this a laptop? if yes, may I know the model name?
> I'm trying to understand if this issue affects the whole Lenovo Ideapad series.

Yes, it's a Lenovo Ideapad V15-IIL laptop.

Unfortunately I gave it away, not only because of this, but mainly because of touchpad problems. The touchpad stopped working after suspend and resume.

But to answer your other questions:

1. Does this issue occur on EVERY boot?

Most of the time I had the "irq 9: nobody cared" message, and the ACPI interrupts went to sci_not.
But sometimes after boot I did not get the irq 9 message immediately, but a lot of gpe6E interrupts instead.
Then ACPI events like charger plugged in/out were handled correctly.

That's why I think it's a problem with the embedded controller.

2. Is there any known kernel that this issue does not exist?

I went up to 5.10; it always happened.

3. Is there any known BIOS version that this issue does not exist?

I tried BIOS versions up to DKCN51WW, all of them with this issue.
Comment 30 Alberto Abrao 2021-04-15 01:24:26 UTC
Update:

Since updating to Kernel 5.11.0-14-generic x86_64 (Xubuntu 21.04/Hirsute Hippo) a few minutes ago, I had to **remove** the aforementioned parameters ("pci=realloc pci=pcie_scan_all pci=nocrs i8042.nopnp") from /etc/default/grub in order for my computer to work properly, else I would get the random freezes that used to happen without these parameters.

As it stands - without any extra Kernel parameters - the original message of this thread (IRQ 9: nobody cared) disappeared. Also, all function keys other than the brightness ones are working, and WiFi/Bluetooth are working as well.

Thus, it seems that this issue is resolved.
Comment 31 Rodrigo Saboya 2021-04-15 02:01:17 UTC
I just updated to 5.11.14, and the issue is very much present. Maybe Ubuntu has some kernel patch to solve that issue (which I would be more than happy to test), but using Gentoo, which is fairly close to vanilla, the issue is still there.
Comment 32 Alberto Abrao 2021-04-17 01:51:22 UTC
(In reply to Alberto Abrao from comment #30)
> Update:
> 
> Since updating to Kernel 5.11.0-14-generic x86_64 (Xubuntu 21.04/Hirsute
> Hippo) a few minutes ago, I had to **remove** the aforementioned parameters
> ("pci=realloc pci=pcie_scan_all pci=nocrs i8042.nopnp") from
> /etc/default/grub in order for my computer to work properly, else I would
> get the random freezes that used to happen without these parameters.
> 
> As it stands - without any extra Kernel parameters - the original message of
> this thread (IRQ 9: nobody cared) disappeared. Also, all function keys other
> than the brightness ones are working, and WiFi/Bluetooth are working as well.
> 
> Thus, it seems that this issue is resolved.

IRQ 9: nobody cared message is back.

I can't figure out a reason for that to happen.

Also, only the Fn volume keys seem to work.

That said, I am using no parameters during boot, and Wi-Fi is working OK.
Comment 33 Alberto Abrao 2021-04-21 02:43:02 UTC
Created attachment 296447 [details]
Picture of failed boot after adding intel_iommu=on as a kernel boot parameter

I have re-added `pci=nocrs i8042.nopnp`, as although Wi-Fi worked, the touchpad didn't.

Yet, today I found another strange thing that may be relevant.

I wanted to enable IOMMU, so I added intel_iommu=on as well.

That was enough to break it all. "Waiting for encrypted source device UUID={uuid of boot device}". Please see picture attached.

The only way to have intel_iommu was to remove everything else.
Comment 34 Alberto Abrao 2021-04-21 02:44:43 UTC
Also, let me state that Virtualization *IS* enabled in the BIOS.
Comment 35 Scott Barnett 2021-06-14 14:41:17 UTC
I thought I'd also chime in, I have a Lenovo Yoga S740-14IIL (i5 1035G4, Intel Integrated Graphics) and also experience the "IRQ 9: Nobody cared" issue. This is occurring on the Fedora F33 5.12.9 kernel.

My symptoms appear to be mostly the same as the above: my touchpad works but all the function keys (bar volume) don't work. The laptop also does not detect power button presses, and shutting the lid does not send it to sleep. Fan control also appears to be a bit iffy (e.g. when it does sleep due to a timeout, the fans kick on full for ~30 seconds before shutting off).

Do we know if this issue is caused by buggy firmware from Lenovo or something else?
Comment 36 Zhang Rui 2021-06-17 07:47:01 UTC
@Alberto, please also attach the output of "cat /proc/interrupts" after boot
Comment 37 Zhang Rui 2021-06-17 15:15:48 UTC
@alberto,

[    0.309290] ACPI: EC: GPE=0x6e

can you please try booting with kernel parameter acpi_mask_gpe=0x6e and attach the dmesg output and also the output of "grep . /sys/firmware/acpi/interrupts/*"?
Comment 38 Zhang Rui 2021-06-17 15:18:04 UTC
/sys/firmware/acpi/interrupts/gpe61:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe62:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe66:       5  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe6D:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe6E:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe6F:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe72:       0  EN     enabled      unmasked

If masking the EC GPE does not fix the problem, please try masking the above GPEs one by one and check which one helps when masked.
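
As a convenience for the one-by-one test, the grep output can be turned into a list of acpi_mask_gpe= parameters to try, busiest GPE first. This is only a sketch: the function names are made up for illustration, and the sample text is the listing quoted above. The acpi_mask_gpe=0x6e form is the one already used in this thread.

```python
import re
from typing import List, NamedTuple


class GpeStatus(NamedTuple):
    name: str         # file name under /sys/firmware/acpi/interrupts, e.g. "gpe6E"
    count: int        # number of times the event has fired
    flags: List[str]  # remaining columns, e.g. ["EN", "enabled", "unmasked"]


# Matches one line of `grep . /sys/firmware/acpi/interrupts/*` output.
LINE_RE = re.compile(
    r"^/sys/firmware/acpi/interrupts/(?P<name>[^:]+):\s*(?P<count>\d+)\s*(?P<flags>.*)$"
)

# The listing from this comment, used as sample input.
SAMPLE = """\
/sys/firmware/acpi/interrupts/gpe61:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe62:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe66:       5  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe6D:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe6E:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe6F:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe72:       0  EN     enabled      unmasked
"""


def parse_interrupt_lines(text: str) -> List[GpeStatus]:
    """Parse the grep output into (name, count, flags) entries; skip other lines."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(
                GpeStatus(match["name"], int(match["count"]), match["flags"].split())
            )
    return entries


def mask_candidates(entries: List[GpeStatus]) -> List[str]:
    """Return one acpi_mask_gpe= parameter per enabled GPE, busiest first."""
    gpes = [
        e for e in entries
        if e.name.startswith("gpe") and e.name != "gpe_all" and "enabled" in e.flags
    ]
    gpes.sort(key=lambda e: e.count, reverse=True)
    return [f"acpi_mask_gpe=0x{e.name[3:].lower()}" for e in gpes]
```

Calling mask_candidates(parse_interrupt_lines(SAMPLE)) yields one boot parameter per enabled GPE, so each reboot can test a single mask.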
Comment 39 Alberto Abrao 2021-06-18 18:42:57 UTC
Created attachment 297469 [details]
For comment 37: output of dmesg and grep as requested for grub parameter acpi_mask_gpe=0x6e

Hello Rui,

sorry for the delay. 

Please see attached the results for acpi_mask_gpe=0x6e. Wi-Fi works. Touchpad and function keys such as brightness control do not.
Comment 40 Alberto Abrao 2021-06-18 18:43:40 UTC
Created attachment 297471 [details]
Command 37: sudo grep . /sys/firmware/acpi/interrupts/* output
Comment 41 Alberto Abrao 2021-06-18 19:35:23 UTC
(In reply to Zhang Rui from comment #38)
> /sys/firmware/acpi/interrupts/gpe61:       0  EN     enabled      unmasked
> /sys/firmware/acpi/interrupts/gpe62:       0  EN     enabled      unmasked
> /sys/firmware/acpi/interrupts/gpe66:       5  EN     enabled      unmasked
> /sys/firmware/acpi/interrupts/gpe6D:       0  EN     enabled      unmasked
> /sys/firmware/acpi/interrupts/gpe6E:       0  EN     enabled      unmasked
> /sys/firmware/acpi/interrupts/gpe6F:       0  EN     enabled      unmasked
> /sys/firmware/acpi/interrupts/gpe72:       0  EN     enabled      unmasked
> 
> If masking EC GPE does not fix the problem, please try masking the above
> GPEs one by one and check masking which one help.

Hello again,

yes, I did these individually, without any other parameters.

For all of them: Wi-Fi works, Touchpad does NOT work, function keys do NOT work. Same as 0x6e.

Should you need anything else on my end, please let me know.
Comment 42 Zhang Rui 2021-06-21 01:06:17 UTC
Sorry I was not clear enough in the beginning.

Let's focus on the "IRQ9 nobody care problem" first. Because many things might be wrong if the ACPI interrupt is not working well.
So, please
1. attach the output of "cat /proc/interrupts", so that we can check if the ACPI interrupt is shared with any other device or not.
2. check if the "IRQ 9 nobody care" message is still there, with different acpi_mask_gpe parameters as you tried above.
Comment 43 Scott Barnett 2021-06-22 15:02:22 UTC
I'm not the person you asked for results (and if this is not helpful please let me know and I can stay quiet), but I had a go masking all the enabled GPE's (one at a time) that were displayed with "grep . /sys/firmware/acpi/interrupts/*".

For me at least, I don't believe this made the IRQ Message go away, as IRQ 9 was disabled each time.

I also had a look inside /proc/interrupts and there is only one line that contains "acpi" in it, which had 100000 interrupts under CPU1 (0 under the rest). If you would like me to attach any logs, please let me know.
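
A check like the one described here can be scripted. The sketch below pulls the per-CPU counts of the "acpi" line out of /proc/interrupts text. The function name is illustrative, and the sample input is a made-up four-CPU excerpt modelled on the description above, not taken from any attachment in this bug.

```python
from typing import Dict

# Hypothetical excerpt: 100000 interrupts on CPU1, 0 on the other CPUs.
SAMPLE = """\
            CPU0       CPU1       CPU2       CPU3
   1:          0          0        120          0  IR-IO-APIC    1-edge      i8042
   9:          0     100000          0          0  IR-IO-APIC    9-fasteoi   acpi
 NMI:          0          0          0          0   Non-maskable interrupts
"""


def acpi_irq_counts(proc_interrupts: str) -> Dict[str, int]:
    """Return {cpu_name: count} for the line whose handler list contains "acpi"."""
    lines = proc_interrupts.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        # Need the IRQ label, one count per CPU, and at least one trailing field.
        if len(fields) <= len(cpus) + 1:
            continue
        # Handler names come last; shared lines separate them with commas.
        trailing = [f.rstrip(",") for f in fields[len(cpus) + 1:]]
        if "acpi" in trailing:
            counts = [int(c) for c in fields[1:len(cpus) + 1]]
            return dict(zip(cpus, counts))
    return {}
```

On a live system the input would be the contents of /proc/interrupts; a total that sits at exactly 100000 suggests the line was disabled by the spurious-interrupt check.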
Comment 44 Zhang Rui 2021-06-24 12:48:48 UTC
(In reply to Scott Barnett from comment #43)
> I'm not the person you asked for results (and if this is not helpful please
> let me know and I can stay quiet), but I had a go masking all the enabled
> GPE's (one at a time) that were displayed with "grep .
> /sys/firmware/acpi/interrupts/*".
> 
> For me at least, I don't believe this made the IRQ Message go away, as IRQ 9
> was disabled each time.
> 
> I also had a look inside /proc/interrupts and there is only one line that
> contains "acpi" in it, which had 100000 interrupts under CPU1 (0 under the
> rest). If you would like me to attach any logs, please let me know.

yes, please attach the output of "cat /proc/interrupts"
and please attach the acpidump output of your platform.
Comment 45 Scott Barnett 2021-06-24 15:21:12 UTC
(In reply to Zhang Rui from comment #44)
> (In reply to Scott Barnett from comment #43)
> > I'm not the person you asked for results (and if this is not helpful please
> > let me know and I can stay quiet), but I had a go masking all the enabled
> > GPE's (one at a time) that were displayed with "grep .
> > /sys/firmware/acpi/interrupts/*".
> > 
> > For me at least, I don't believe this made the IRQ Message go away, as IRQ 9
> > was disabled each time.
> > 
> > I also had a look inside /proc/interrupts and there is only one line that
> > contains "acpi" in it, which had 100000 interrupts under CPU1 (0 under the
> > rest). If you would like me to attach any logs, please let me know.
> 
> yes, please attach the output of "cat /proc/interrupts"
> and please attach the acpidump output of your platform.

Please see the attached. For reference, my device is a Lenovo Yoga S740-14IIL (i5-1035G4).
Comment 46 Scott Barnett 2021-06-24 15:22:10 UTC
Created attachment 297591 [details]
acpidump from Yoga S740
Comment 47 Scott Barnett 2021-06-24 15:22:43 UTC
Created attachment 297593 [details]
Output of "cat /proc/interrupts" from Yoga S740
Comment 48 Scott Barnett 2021-06-25 14:31:34 UTC
Some progress!

I decided to fully read all the comments in this bug thread, and noticed that at some point there was a suspected relation to the wifi.

Therefore I followed the suggestion to disable wifi in BIOS and actually, it fairly consistently doesn't cause the IRQ9 error now (before it was 100% failure every time).

I then had a look at the results of "grep . /sys/firmware/acpi/interrupts/*" again and, as was mentioned, GPE6 is very high compared to the rest. I therefore masked GPE6 and kept the wifi disabled in BIOS. However, this seemed to trigger the IRQ9 error again.

Some things to note: When the IRQ9 error hasn't appeared, the interrupts under CPU1 in /proc/interrupts are still high and rise very quickly, however it doesn't stop at 100000.

Also, when the IRQ9 error doesn't appear, more ACPI features work (e.g. AC power status and the power button work), however notably brightness still does not work and shutting the lid fails to send it to sleep.
Comment 49 Scott Barnett 2021-06-25 14:39:52 UTC
Ok now I'm baffled, it now appears to be booting without the IRQ 9 error most of the time with wifi re-enabled. I did run an upgrade yesterday, I don't know whether there was a firmware update that helped?

Something is still wrong though: the laptop actually feels laggy now, unlike before (I guess because acpi hasn't died and is flooding the system with interrupts?).
Comment 50 Scott Barnett 2021-06-25 14:41:20 UTC
Sorry for flooding the thread. A final note: the laptop has now crashed, which it never did after IRQ9 died.
Comment 51 Zhang Rui 2021-06-30 04:46:01 UTC
Actually, I'm still confused. :)
first of all, you did a firmware upgrade, and then the symptom in comment #48 are all with new firmware, right?
then something changed even with new firmware, and the IRQ9 nobody care messages show again, for unknown reason, even with WIFI disabled in BIOS, and GPE6 not masked?
Comment 52 Scott Barnett 2021-06-30 09:54:12 UTC
Sorry, realise that's all a bit of a mess above :D

What happened was I ran a "dnf upgrade" (this was the upgrade I was talking about as I didn't know whether the intel-microcode had been updated as part of it).

After that update, I tried booting with wifi disabled in BIOS. This appeared to stop IRQ9 dying on pretty much every boot. When IRQ9 doesn't die, my laptop receives a lot of interrupts on acpi (far more than 100,000) and becomes laggy, eventually crashing.

These interrupts are on GPE6, but masking GPE6 (even with wifi still disabled) causes IRQ9 to die on boot again.

The confusion was then caused as even with wifi re-enabled, I couldn't reproduce IRQ9 dying when GPE6 was unmasked. However, this is not the case any more and it dies on every boot as per usual now.
Comment 53 Zhang Rui 2021-07-01 02:04:15 UTC
so now, IRQ9 dies on every boot no matter
1. the wifi is disabled or enabled in BIOS
2. the GPE6 is masked or not
right?

please attach the acpidump output and let me check what the GPE is for.
Comment 54 Scott Barnett 2021-07-01 10:33:28 UTC
Masking GPE6 appears to ensure that IRQ9 dies.

Disabling or enabling the wifi in BIOS sometimes seems to temporarily stop IRQ9 dying (regardless of whether it was enabling or disabling), however it does eventually revert back to dying in my experience.

From what I can see, the main deciding factor appears to be the interrupts received on boot. If acpi (in /proc/interrupts) gets 100,000 interrupts before boot has finished, IRQ9 dies. If it manages to keep interrupts under 100,000, IRQ9 will not die.

If IRQ9 does not die, that acpi counter rises and GPE6 is flooded with interrupts. These continue to rise until the system slows down and crashes.

The acpidump is actually from when IRQ9 has not died, please see the attached
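
To put a number on the flood in the case where IRQ9 survives, the GPE counter can be sampled twice and converted to a rate. The sketch below works on the text of a single counter file such as /sys/firmware/acpi/interrupts/gpe6E, whose first column is the event count. The function names and the sample values are illustrative assumptions, not measurements from this machine.

```python
def gpe_count(sysfs_line: str) -> int:
    """Extract the event count (first column) from one GPE counter file's text."""
    return int(sysfs_line.split()[0])


def gpe_rate(before: str, after: str, seconds: float) -> float:
    """Events per second between two reads of the same counter file."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (gpe_count(after) - gpe_count(before)) / seconds


# Made-up readings taken 2 seconds apart, for illustration only.
first_read = "  150000  EN     enabled      unmasked\n"
second_read = "  190000  EN     enabled      unmasked\n"
rate = gpe_rate(first_read, second_read, 2.0)
```

Reading the file once, sleeping for a known interval, and reading it again gives the two inputs. A rate that stays in the thousands per second while the machine is idle would be consistent with the storm described above.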
Comment 55 Scott Barnett 2021-07-01 10:34:16 UTC
Created attachment 297685 [details]
ACPIDUMP from Yoga S740, 5.12.13
Comment 56 Scott Barnett 2021-08-04 16:55:09 UTC
(In reply to Zhang Rui from comment #53)
> so now, IRQ9 dies on every boot no matter
> 1. the wifi is disabled or enabled in BIOS
> 2. the GPE6 is masked or not
> right?
> 
> please attach the acpidump output and let me check what the GPE is for.

Has there been any progress on this issue? If not, is there anything else I can do to help?
Comment 57 Scott Barnett 2021-09-24 20:56:47 UTC
(In reply to Zhang Rui from comment #53)
> so now, IRQ9 dies on every boot no matter
> 1. the wifi is disabled or enabled in BIOS
> 2. the GPE6 is masked or not
> right?
> 
> please attach the acpidump output and let me check what the GPE is for.

As an update, 5.14.7 is unfortunately still showing the same symptoms as before.

Did the ACPIDUMP reveal anything?

As another question, are we likely to be able to do anything or is this a manufacturer fault that will need to be fixed in firmware?
Comment 58 Eduardo Zacour 2021-11-01 19:28:59 UTC
I'm also having these same issues. I own a Lenovo IdeaPad S145-15IIL (82DJ) and am running kernel 5.14.14-zen1-1-zen. I've tried the parameters suggested in the various replies to this thread, also without success. If I can help in any way, just let me know.
Comment 59 Ernesto 2021-11-12 12:10:16 UTC
I'm experiencing the same issue, but it seems that I found a workaround by accident. Last week my laptop froze while executing a program, and after a hard reset everything worked well. The "irq 9: nobody cared" message disappeared from dmesg, but I found instead a message saying that the journal file /var/log/journal/..../user-1000.journal was corrupt (I guess the laptop froze while it was writing that file). Then, after a clean reboot, the problem on irq9 was back.
  So, as a test, I manually removed the user-1000.journal file and did a hard reset, and after that everything started working well again (dmesg complained again about the journal file, but not about irq9). It would be great if someone else could reproduce this (and even better if they can find an explanation).
Comment 60 Scott Barnett 2021-11-20 19:15:33 UTC
To add to this, I've just tested the Fedora test week 5.15.3 kernel on my Lenovo Yoga S740-14IIL and still get the same results.

Going through all the previous testing, I get the same results. Disabling wifi in UEFI generally clears up the IRQ9 error, but GPE 6E then gets a lot of interrupts to the point that the system feels laggy. Masking GPE 6E, regardless of wifi state, causes the IRQ9 error to come back.

I'm not experienced with ACPI tables, but I disassembled them to see if I could identify what GPE 6E was. I found it under ecdt.dsl with the namepath \_SB.PCI0.LPCB.H_EC. I did take note of the OEM ID being Lenovo, so this would indicate to me the issue does lie in Lenovo's firmware. Does this sound correct? If so, do we have any contacts within Lenovo that could potentially sort this out?
Comment 61 Zhang Rui 2022-03-02 15:37:15 UTC
Created attachment 300519 [details]
debug patch for SCI IRQ Nobody care

By checking /proc/interrupts, if SCI is not shared with any other device, can you please apply the debug patch attached and boot with acpi_sci=fake_handle and see if it makes any difference?
Comment 62 Scott Barnett 2022-03-02 19:20:22 UTC
Just given it a go (built vanilla 5.16.12 on Fedora 35) and it definitely changed the symptoms.

IRQ9 now doesn't die (and the number of interrupts it receives is now ~150 rather than 100,000).
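
A minimal sketch for reading that number consistently between boots, assuming the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ with one count per CPU). The function name and the sample text are illustrative and not taken from any attachment in this bug.

```python
def irq_total(proc_interrupts_text, irq="9"):
    """Sum the per-CPU counts for one IRQ row of /proc/interrupts text.

    Returns None if the IRQ is not listed.
    """
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return None
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label.strip() == irq:
            fields = rest.split()
            # The first n_cpus fields are counts; the rest describe the IRQ.
            return sum(int(x) for x in fields[:n_cpus])
    return None


# Illustrative sample only.
SAMPLE = """\
           CPU0       CPU1
  9:        150          0  IR-IO-APIC    9-fasteoi   acpi
 16:         10          5  IR-IO-APIC   16-fasteoi   i801_smbus
"""

# On a live system:
#   with open("/proc/interrupts") as f:
#       print(irq_total(f.read(), "9"))
```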

ACPI still doesn't work as expected, and the dmesg has a lot of lines showing:

[    6.261975] ACPI Error: No handler or method for GPE 43, disabling event (20210930/evgpe-840)

I can provide the full dmesg if required.

As a side note, I don't know if this was related to my kernel build or the patch (I haven't built the kernel before today), but I briefly get the error "error: ../../grub-core/kern/mm.c:376:out of memory." when booting the kernel with the patch applied.
Comment 63 Zhang Rui 2022-06-21 02:53:32 UTC
(In reply to Scott Barnett from comment #62)
> Just given it a go (built vanilla 5.16.12 on Fedora 35) and it definitely
> changed the symptoms.
> 
> IRQ9 now doesn't die (and the number of interrupts it receives is now ~150
> rather than 100,000).
> 
> ACPI still doesn't work as expected,

is there any functional issue besides the error log below?

> and the dmesg has a lot of lines
> showing:
> 
> [    6.261975] ACPI Error: No handler or method for GPE 43, disabling event
> (20210930/evgpe-840)

what if you boot with GPE 43 masked?

> 
> I can provide the full dmesg if required.
>
yes, please.
Comment 64 Zhang Rui 2022-06-21 05:44:54 UTC
Created attachment 301237 [details]
debug: why GPE43 is enabled

and it is better to apply this debug patch while you do the test.
I'd like to see why GPE 43 is enabled
Comment 65 Scott Barnett 2022-06-21 13:56:15 UTC
(In reply to Zhang Rui from comment #63)

Just given it another go on 5.18.5 with the two listed patches and will upload the two dmesg's (with and without masking GPE 43) underneath.

> is there any functional issue besides the error log below?

Yes, whilst the errors are different to before, ACPI is still not working. This involves keyboard function keys like brightness control, and the ability to put the laptop into sleep by shutting the lid or pressing the power button.

> what if you boot with GPE 43 masked?

It doesn't appear to change anything. I ran "grep . /sys/firmware/acpi/interrupts/*" and confirmed that it was masked.
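
A small sketch of that check, assuming each gpeXX file under /sys/firmware/acpi/interrupts/ holds a count followed by whitespace-separated status words, with the last word being "masked" or "unmasked". The exact set of status words can differ between kernel versions, so the sample line and the function names are assumptions.

```python
def parse_gpe_status(text):
    """Split the content of one gpeXX file into (count, list of status words)."""
    tokens = text.split()
    return int(tokens[0]), tokens[1:]


def is_masked(text):
    """True if the status words contain 'masked' (not 'unmasked')."""
    _, flags = parse_gpe_status(text)
    return "masked" in flags


# Assumed example content, not copied from an attachment.
SAMPLE_MASKED = "       0  EN     enabled      masked\n"
SAMPLE_UNMASKED = "  123456  EN STS enabled      unmasked\n"

# On a live system (path built from the GPE number used in this thread):
#   with open("/sys/firmware/acpi/interrupts/gpe43") as f:
#       print(is_masked(f.read()))
```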

Thank you so much for following up on this issue, if there's any more information or tests you'd like me to run, please let me know!
Comment 66 Scott Barnett 2022-06-21 13:57:42 UTC
Created attachment 301241 [details]
Dmesg from Yoga S740 with both patches applied and no masks
Comment 67 Scott Barnett 2022-06-21 13:58:03 UTC
Created attachment 301242 [details]
Dmesg from Yoga S740 with both patches applied and GPE 43 masked
Comment 68 Zhang Rui 2022-06-22 05:47:26 UTC
what if you blacklist intel-hid driver via grub?
does the GPE 43 error go away?
Comment 69 Scott Barnett 2022-06-22 12:58:26 UTC
(In reply to Zhang Rui from comment #68)
> what if you blacklist intel-hid driver via grub?
> does the GPE 43 error go away?

Unfortunately that doesn't appear to make any difference (I added "modprobe.blacklist=intel_hid" to grub and "lsmod" confirmed it wasn't loaded).
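
The same two checks can be sketched in code, assuming the parameter is read back from /proc/cmdline and the loaded modules from /proc/modules (which is what lsmod formats). The function names and sample strings are illustrative.

```python
def blacklisted_modules(cmdline):
    """Return the module names listed in modprobe.blacklist= on a kernel command line."""
    mods = set()
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == "modprobe.blacklist" and sep:
            # Module names treat '-' and '_' as equivalent; normalise to '_'.
            mods.update(m.replace("-", "_") for m in value.split(",") if m)
    return mods


def loaded_modules(proc_modules_text):
    """Return the module names from /proc/modules text (first column of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


# Illustrative samples only.
CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro quiet modprobe.blacklist=intel_hid"
MODULES = "ideapad_laptop 36864 0 - Live 0x0000000000000000\n"

# True when the module is both blacklisted and actually absent.
confirmed = "intel_hid" in blacklisted_modules(CMDLINE) and "intel_hid" not in loaded_modules(MODULES)
```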

I obtained a dmesg with this module disabled and have attached it below
Comment 70 Scott Barnett 2022-06-22 12:59:14 UTC
Created attachment 301254 [details]
dmesg from Yoga S740 with both patches and intel_hid disabled
Comment 71 xiaohuyz 2022-12-07 08:58:21 UTC
  I also use a Xiaoxin laptop from Lenovo and have found a solution to the problem.
  As others mentioned above, the problem is related to the wifi.
  First, run "$ rfkill list all"; four entries are listed.
  Then, run "sudo modprobe -r ideapad_laptop" to disable the first two.
  Then, run "$ rfkill list all" again; this time only two entries are listed.
  Add "sudo modprobe -r ideapad_laptop" to /etc/rc.local so that the command runs automatically on every boot.
  Now the irq 9 error disappears and the system works well.
  As I am not a Linux expert, I can only provide the solution; I do not know why it works.
Comment 72 Scott Barnett 2022-12-07 20:04:38 UTC
I just gave this a go and unfortunately this isn't my experience - I still get vast amounts of interrupts under IRQ9, eventually killing it (though now they are filtered to /sys/firmware/acpi/interrupts/sci_not, which I think means they don't have anything trying to deal with the events).
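
To put a number on that, the two counters can be compared directly. This is a minimal sketch assuming, as the kernel's sysfs ABI notes describe, that "sci" counts SCI calls that claimed an interrupt and "sci_not" counts SCI calls that did not. The function name and sample values are illustrative.

```python
def sci_not_share(sci_text, sci_not_text):
    """Fraction of SCI calls that were not claimed, from the two sysfs file contents."""
    sci = int(sci_text.split()[0])
    sci_not = int(sci_not_text.split()[0])
    total = sci + sci_not
    return sci_not / total if total else 0.0


# On a live system:
#   base = "/sys/firmware/acpi/interrupts/"
#   with open(base + "sci") as a, open(base + "sci_not") as b:
#       print(sci_not_share(a.read(), b.read()))
```

A share close to 1.0 would mean that almost every interrupt on the SCI line arrives without any ACPI event the handler recognises.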

@Zhang Rui, I was wondering if there was anything else I could do to try and help with this issue?
Comment 73 Fernando Toledo 2023-06-10 02:59:40 UTC
I still have the issue on Lenovo V15-IIL on debian 11 with bpo kernel:


Linux ragnarok 6.1.0-0.deb11.7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-2~bpo11+1 (2023-04-23) x86_64 GNU/Linux

bios version: DKCN53WW

If necessary I can send logs, dmidecode, etc. But I don't think they are very different from the ones that have already been sent before.

As this freezing behavior seems to occur when there is wifi traffic, I am trying a locally compiled version of the rtw88 driver from:

https://github.com/lwfinger/rtw88

For now it seems stable, but I need to test it for longer.
Comment 74 Fernando Toledo 2024-04-13 19:04:15 UTC
On debian 12:

[   10.544553] irq 9: nobody cared (try booting with the "irqpoll" option)
[   10.544972] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.0-20-amd64 #1  Debian 6.1.85-1
[   10.545386] Hardware name: LENOVO 82C5/LNVNB161216, BIOS DKCN53WW 05/31/2021
[   10.545799] Call Trace:
[   10.546203]  <IRQ>
[   10.546606]  dump_stack_lvl+0x44/0x5c
[   10.547003]  __report_bad_irq+0x35/0xa7
[   10.547398]  note_interrupt.cold+0xa/0x62
[   10.547800]  handle_irq_event+0x6b/0x70
[   10.548200]  handle_fasteoi_irq+0x78/0x1d0
[   10.548600]  __common_interrupt+0x3c/0xa0
[   10.548995]  common_interrupt+0x7d/0xa0
[   10.549391]  </IRQ>
[   10.549776]  <TASK>
[   10.550155]  asm_common_interrupt+0x22/0x40
[   10.550537] RIP: 0010:cpuidle_enter_state+0xde/0x420
[   10.550922] Code: 00 00 31 ff e8 83 29 97 ff 45 84 ff 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 25 03 00 00 31 ff e8 68 d4 9d ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[   10.551725] RSP: 0018:ffffc030c0107e90 EFLAGS: 00000246
[   10.552131] RAX: ffff99d847a71a40 RBX: ffffe030bfc5b8b0 RCX: 0000000000000000
[   10.552536] RDX: 0000000000000001 RSI: fffffffd3f596474 RDI: 0000000000000000
[   10.552933] RBP: 0000000000000001 R08: 0000000000000000 R09: 000000006b86e1b8
[   10.553332] R10: 0000000000000528 R11: 000000000000000f R12: ffffffffba19ef20
[   10.553730] R13: 00000002747ff605 R14: 0000000000000001 R15: 0000000000000000
[   10.554134]  cpuidle_enter+0x29/0x40
[   10.554531]  do_idle+0x202/0x2a0
[   10.554934]  cpu_startup_entry+0x26/0x30
[   10.555338]  start_secondary+0x12a/0x150
[   10.555743]  secondary_startup_64_no_verify+0xe5/0xeb
[   10.556140]  </TASK>
[   10.556532] handlers:
[   10.556927] [<0000000025cf1726>] acpi_irq
[   10.557315] Disabling IRQ #9