Bug 201817
Description
Bram Coenen
2018-11-29 18:40:08 UTC
Created attachment 279743
dmesg after boot, 4.19.3, working touchscreen
Just checked. The kernel version doesn't matter; a decent number of reboots does the trick to get the touchscreen working, even on 4.19.4.

[ 16.587361] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 16.587366] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 4.19.4-200.fc28.x86_64 #1
[ 16.587367] Hardware name: HP HP ENVY x360 Convertible 15-bq0xx/8311, BIOS F.08 03/30/2018
[ 16.587368] Call Trace:
[ 16.587372] <IRQ>
[ 16.587381] dump_stack+0x5c/0x80
[ 16.587385] __report_bad_irq+0x37/0xae
[ 16.587388] note_interrupt.cold.9+0xa/0x69
[ 16.587390] handle_irq_event_percpu+0x6a/0x80
[ 16.587392] handle_irq_event+0x27/0x44
[ 16.587394] handle_fasteoi_irq+0x7f/0x120
[ 16.587398] handle_irq+0xbf/0x100
[ 16.587400] do_IRQ+0x49/0xd0
[ 16.587403] common_interrupt+0xf/0xf
[ 16.587405] </IRQ>
[ 16.587409] RIP: 0010:native_safe_halt+0x2/0x10
[ 16.587411] Code: ff ff 7f c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c4 eb 8c 90 90 90 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
[ 16.587412] RSP: 0018:ffffffffbd203e18 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd8
[ 16.587415] RAX: 0000000080000000 RBX: ffff887df5a01c00 RCX: 0000000000000034
[ 16.587416] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffbd2dd200 RDI: ffff887df5a01c64
[ 16.587418] RBP: ffff887df5a01c64 R08: 0000000000000002 R09: 0000000000020800
[ 16.587419] R10: 0000000f66cccf99 R11: ffff887df721fde8 R12: 0000000000000001
[ 16.587420] R13: 0000000000000001 R14: 0000000000000001 R15: 00000000d0c2fae0
[ 16.587424] acpi_safe_halt+0x1b/0x30
[ 16.587427] acpi_idle_enter+0x104/0x2a0
[ 16.587431] cpuidle_enter_state+0x71/0x320
[ 16.587435] do_idle+0x226/0x260
[ 16.587438] cpu_startup_entry+0x6f/0x80
[ 16.587442] start_kernel+0x523/0x543
[ 16.587446] secondary_startup_64+0xa4/0xb0
[ 16.587448] handlers:
[ 16.587453] [<00000000e6019074>] amd_gpio_irq_handler [pinctrl_amd]
[ 16.587455] Disabling IRQ #7

I discussed this a bit on the mailing list.
Here are the relevant parts of the discussion:

Me: The amd_gpio chip/driver appears to be the only driver connected to IRQ 7, so I think there is an issue with the amd_gpio driver where it does not properly clear the interrupt source. E.g. it might be that the BIOS requested interrupts on a GPIO which Linux does not monitor, and that the driver does not disable this GPIO IRQ on probe and, since it is not handling that pin in IRQ mode, also does not clear it. Anyway, that is just a theory. It would greatly help if someone who knows the amd_gpio driver better could take a look.

Reply by Daniel Drake: Sorry that I can't be much help here; I don't have access to any useful info beyond the source code already present in Linux. Maybe you could explore your theory by dumping the GPIO/GPIO-INT enable regs and seeing if any of them are marked as enabled by something other than Linux.

I'm afraid I don't have time to look into this myself atm. Maybe someone can add some printk calls to drivers/pinctrl/pinctrl-amd.c to dump the relevant register values, as Daniel suggested, and see if that yields any useful info?

Forcing `amd_gpio_irq_handler()` from `drivers/pinctrl/pinctrl-amd.c` to always return `IRQ_HANDLED` makes the touchscreen work, but now the system receives interrupts at a very high rate (around 100k/s). I am a complete newbie to kernel development, so I don't really know what I'm doing... Any ideas?

Created attachment 280107
cat /proc/interrupts
Seeing the exact same problem on 4.19.10 (HP Envy x360 13-ag0004ng)
Attached the output of /proc/interrupts
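For anyone comparing /proc/interrupts between a good and a bad boot, a small helper can sum the per-CPU counters for IRQ 7 and estimate its rate (earlier in the thread the storm was around 100k/s once the handler always returned IRQ_HANDLED). This is a sketch; the function names are made up for illustration, and it assumes the usual layout of a `CPU0 CPU1 ...` header row followed by `label: count count ... description` rows.

```python
import time


def parse_interrupts(text):
    """Map each /proc/interrupts label to (total count over all CPUs, description)."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an IRQ row
        fields = rest.split()
        counts = []
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break  # first non-numeric field starts the description
            counts.append(int(tok))
        description = " ".join(fields[len(counts):])
        result[label.strip()] = (sum(counts), description)
    return result


def irq_rate(irq="7", interval=1.0, path="/proc/interrupts"):
    """Sample /proc/interrupts twice and return interrupts per second for one IRQ."""
    with open(path) as f:
        before = parse_interrupts(f.read())[irq][0]
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())[irq][0]
    return (after - before) / interval
```

On an affected machine, `print(irq_rate("7"))` gives a rough interrupts-per-second figure that can be compared between a working and a non-working boot.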
The only case where I got the "nobody cared" error on my HP Envy x360 bq-1xx was when I had used Windows 10 on my last boot and rebooted into Linux. My theory is that Windows leaves IRQ 7 in a state that persists across reboots (and triggers the error in Linux) and only gets cleared if you hold down the power button. My laptop is now Linux-only, and since then I have never had this issue again (using 4.19.5 now).

This issue occurs regardless of a previous Windows 10 boot. From a cold power-up, booting fails in about 2 out of 3 attempts. Any way to remove this touchscreen driver? I think it should be backed out until the regression is fixed.

(In reply to JerryD from comment #8)
> This issue occurs regardless of a previous Windows 10 boot. From a cold power-up, booting fails in about 2 out of 3 attempts. Any way to remove this touchscreen driver? I think it should be backed out until the regression is fixed.

I don't even have Windows any more and still get the bug sometimes. However, I do not agree that the driver should be removed. I still use the touchscreen daily because for me it is working most of the time. Besides, it does not hurt having it there, does it?

Are there any workarounds? Some people suggest here that switching to legacy BIOS boot helped them: https://github.com/linuxwacom/wacom-hid-descriptors/issues/12

I can confirm that always returning IRQ_HANDLED fixes the error, but it spams the system with a lot of these interrupts (it never stops).

Sometimes booting with the kernel option noirqdebug helped me to get the touchscreen up and running again.

I then noticed the following. See the three attached files: one with a modified amd_gpio_irq_handler and two dmesg outputs, one with a working touchscreen and one where the touchscreen does not work. I can see that in the working case all interrupts are handled correctly, while in the "not working" case there are A LOT of interrupts that are not handled at all.

Created attachment 280583
modified function
Created attachment 280585
modified - dmesg - working
Not working - dmesg (too large to attach directly): https://bit.ly/2FHChBb

(In reply to nospamming11+kernel from comment #12)
> I can confirm that always returning IRQ_HANDLED fixes the error, but it spams the system with a lot of these interrupts (it never stops).
>
> Sometimes booting with the kernel option noirqdebug helped me to get the touchscreen up and running again.
>
> I then noticed the following. See the three attached files: one with a modified amd_gpio_irq_handler and two dmesg outputs, one with a working touchscreen and one where the touchscreen does not work. I can see that in the working case all interrupts are handled correctly, while in the "not working" case there are A LOT of interrupts that are not handled at all.

What does this imply? Does the driver need to actually handle this interrupt? What hardware is actually generating this interrupt? Or is IRQ 7 an unused pin that is floating and therefore must be disabled?

Appears to be fixed on Fedora kernel 4.20.6-200.fc29.x86_64.

Sadly, I can't confirm that. The problem is not fixed for me on either 4.20.6.arch1-1-ARCH or 4.20.7-arch1-1-ARCH (laptop firmware: F.32).

noirqdebug is still a workaround.

(In reply to nospamming11+kernel from comment #18)
> Sadly, I can't confirm that. The problem is not fixed for me on either 4.20.6.arch1-1-ARCH or 4.20.7-arch1-1-ARCH (laptop firmware: F.32).
>
> noirqdebug is still a workaround.

I am on an HP Envy laptop with BIOS F19, which HP evidently pulled because it has some other big problem. But since everything is stable for me at the moment, I am just leaving it alone. From what I have read, the only way to back out that BIOS is with a special USB stick which they will send you.
My current kernel boot line is:

BOOT_IMAGE=/vmlinuz-4.20.6-200.fc29.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait processor.max_cstate=5

(In reply to JerryD from comment #17)
> Appears to be fixed on Fedora kernel 4.20.6-200.fc29.x86_64.

It isn't fixed for me on 4.20.6-200.fc29.x86_64. But it doesn't always happen; sometimes it can go days without the bug appearing.

(In reply to Bram Coenen from comment #20)
> (In reply to JerryD from comment #17)
> > Appears to be fixed on Fedora kernel 4.20.6-200.fc29.x86_64.
>
> It isn't fixed for me on 4.20.6-200.fc29.x86_64. But it doesn't always happen; sometimes it can go days without the bug appearing.

You are right, it is intermittent; sometimes I get it and sometimes I don't. A side note: I upgraded to BIOS F20. Don't do this. 4.20.xx fails to boot; 4.18 seems to boot fine.

Can you guys confirm that "noirqdebug" as a kernel boot param works for you too?

(In reply to nospamming11+kernel from comment #22)
> Can you guys confirm that "noirqdebug" as a kernel boot param works for you too?

With Linux version 4.18.16-300.fc29.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) I see no irq7 issue.

With 4.20.7-200.fc29.x86_64 it locks up right away, and I am unable to get any sort of backtrace, with or without "noirqdebug". ABRT reports insufficient information to generate a report and says to contact the kernel mailing list.

For clarity on comment 23: HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.20 12/25/2018

Created attachment 281207
[RFC] pinctrl/amd: Clear interrupt enable bits on probe
Good news, Leonard Crestez has come up with a patch which likely fixes this.
I'm attaching the patch here, please give it a try.
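To make the idea behind "clear interrupt enable bits on probe" concrete, here is a userspace model. It is not Leonard's patch, only an illustration of the concept. It assumes the per-pin register layout from drivers/pinctrl/pinctrl-amd.h with INTERRUPT_ENABLE_OFF = 11 and INTERRUPT_MASK_OFF = 12 (please verify these against your kernel tree), and the `in_use` parameter is hypothetical: it stands for pins that Linux has already set up as interrupts and therefore must not be touched.

```python
# Assumed bit offsets, named after drivers/pinctrl/pinctrl-amd.h;
# verify them against your own kernel tree.
INTERRUPT_ENABLE_OFF = 11
INTERRUPT_MASK_OFF = 12
IRQ_ENABLE_BITS = (1 << INTERRUPT_ENABLE_OFF) | (1 << INTERRUPT_MASK_OFF)


def clear_boot_irq_enables(regs, in_use=frozenset()):
    """Clear the interrupt enable/mask bits of every pin not claimed by Linux.

    regs   -- list of 32-bit per-pin register values, modified in place
    in_use -- hypothetical set of pins Linux has already set up as IRQs
    Returns the pins that were disabled (one per "Pin N interrupt
    enabled on boot: disable" message).
    """
    disabled = []
    for pin, val in enumerate(regs):
        if pin in in_use:
            continue  # leave pins owned by Linux alone
        if val & (1 << INTERRUPT_ENABLE_OFF):
            regs[pin] = val & ~IRQ_ENABLE_BITS
            disabled.append(pin)
    return disabled
```

For each returned pin, a real driver would print something like `f"Pin {pin} interrupt enabled on boot: disable"`. Skipping `in_use` pins matters because firmware may legitimately leave many GPIO interrupts enabled.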
Unfortunately this still does not fix it for me. I applied it to 4.20.10-arch1-1 (Archlinux kernel) and I still get the error "irq 7: nobody cared (try booting with the "irqpoll" option)" with a call trace afterwards. I can see that the new function is getting called and reports that a bunch of pins get disabled: "amd_gpio: AMD10030:00: Pin 67 interrupt enabled on boot: disable ....." But right after that, IRQ 7 starts to spam my logs again (with the logging described above still added).

Created attachment 281231
dmesg with leonard patch and a non-working touchscreen
dmesg output from the first boot with Leonard Crestez's patch, when it didn't work. It worked when booting into the kernel the second time, and I think the relevant error message was the one shown below. I'll come back and give an update if/when the touchscreen stops working again.
The "nobody cared" error is gone, at least!
[ 2.964272] i2c_hid i2c-ELAN0732:00: HID over i2c has not been provided an Int IRQ
[ 2.964330] i2c_hid: probe of i2c-ELAN0732:00 failed with error -22
(This is my first time compiling a kernel in fedora. So I could also have done something wrong.)
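Kernel probe functions report failures as negative errno values, so "error -22" is -EINVAL ("Invalid argument"), which fits the preceding "has not been provided an Int IRQ" message. A tiny helper to decode such lines; the function name and regex are made up for illustration and assume the "probe of X failed with error -N" wording shown above.

```python
import errno
import os
import re

PROBE_ERR = re.compile(r"probe of (\S+) failed with error (-\d+)")


def decode_probe_error(line):
    """Return (device, errno number, symbolic name, message), or None if no match."""
    m = PROBE_ERR.search(line)
    if m is None:
        return None
    code = -int(m.group(2))  # the kernel logs negative errno values
    return m.group(1), code, errno.errorcode.get(code, "?"), os.strerror(code)
```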
I have seen the i2c error too, when my touchscreen worked, but the patch does not work for me (even after several Linux-only boots).

Do you have a Windows system installed, @Bram Coenen? Can you boot into it and test whether the touchscreen still works when you boot back into Linux?

(In reply to nospamming11+kernel from comment #28)
> I have seen the i2c error too, when my touchscreen worked, but the patch does not work for me (even after several Linux-only boots).
>
> Do you have a Windows system installed, @Bram Coenen? Can you boot into it and test whether the touchscreen still works when you boot back into Linux?

No, unfortunately I don't have Windows installed. However, the bug happens now and again anyway.

Created attachment 281233
dmesg with leonard patch and a working touchscreen

I think the interesting parts here are these prints:

[ 2.988099] i2c_hid i2c-ELAN0732:00: i2c-ELAN0732:00 supply vdd not found, using dummy regulator
[ 2.988146] i2c_hid i2c-ELAN0732:00: Linked as a consumer to regulator.0
[ 2.988148] i2c_hid i2c-ELAN0732:00: i2c-ELAN0732:00 supply vddl not found, using dummy regulator
[ 2.993265] audit: type=1130 audit(1550670938.923:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 3.015329] acpi PNP0C14:01: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
[ 3.015354] wmi_bus wmi_bus-PNP0C14:01: WQBJ data block query control method not found
[ 3.017236] input: ELAN0732:00 04F3:24BC Touchscreen as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input13
[ 3.017377] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input14
[ 3.017434] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input15
[ 3.017492] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input16
[ 3.017589] hid-generic 0018:04F3:24BC.0004: input,hidraw3: I2C HID v1.00 Device [ELAN0732:00 04F3:24BC] on i2c-ELAN0732:00
[ 3.064691] nvme nvme0: pci function 0000:03:00.0
[ 3.107262] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[ 3.213210] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input18
[ 3.213350] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input21
[ 3.213445] hid-multitouch 0018:04F3:24BC.0004: input,hidraw2: I2C HID v1.00 Device [ELAN0732:00 04F3:24BC] on i2c-ELAN0732:00

I got the bug again, but no print from the patch. So I probably messed up the compilation of the kernel with the patch. I'll try to patch it correctly tomorrow!

(In reply to JerryD from comment #23)
> (In reply to nospamming11+kernel from comment #22)
> > Can you guys confirm that "noirqdebug" as a kernel boot param works for you too?
>
> With Linux version 4.18.16-300.fc29.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) I see no irq7 issue.
>
> With 4.20.7-200.fc29.x86_64 it locks up right away, and I am unable to get any sort of backtrace, with or without "noirqdebug". ABRT reports insufficient information to generate a report and says to contact the kernel mailing list.

Turns out the problem I have with the 4.20 kernel is not BIOS-related, and BIOS F.20 is OK. I ran into this bug:

https://bugs.freedesktop.org/show_bug.cgi?id=109206

Kernel 4.19.15-300.fc29.x86_64 is working fine.

(In reply to JerryD from comment #32)
> (In reply to JerryD from comment #23)
> > (In reply to nospamming11+kernel from comment #22)
> > > Can you guys confirm that "noirqdebug" as a kernel boot param works for you too?
> >
> > With Linux version 4.18.16-300.fc29.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) I see no irq7 issue.
> >
> > With 4.20.7-200.fc29.x86_64 it locks up right away, and I am unable to get any sort of backtrace, with or without "noirqdebug". ABRT reports insufficient information to generate a report and says to contact the kernel mailing list.
>
> Turns out the problem I have with the 4.20 kernel is not BIOS-related, and BIOS F.20 is OK. I ran into this bug:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=109206
>
> Kernel 4.19.15-300.fc29.x86_64 is working fine.

Try a higher version of 4.20. Mine is working on 4.20.10, for example.

Created attachment 281253
dmesg_leonard_patch_no_touchscreen
I applied the patch made by Leonard correctly this time. The touchscreen does not work, and the kernel prints the following for pins 67 to 148:
[ 3.080467] amd_gpio AMD0030:00: Pin 67 interrupt enabled on boot: disable
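To check whether the disabled pins really form one contiguous block from 67 to 148 or have gaps, the "interrupt enabled on boot" lines can be pulled out of dmesg and collapsed into ranges. A sketch with made-up function names; the regex assumes the exact message wording shown above.

```python
import re

PIN_MSG = re.compile(r"Pin (\d+) interrupt enabled on boot: disable")


def boot_enabled_pins(dmesg_text):
    """Sorted, de-duplicated pin numbers from the patch's dmesg messages."""
    return sorted({int(m.group(1)) for m in PIN_MSG.finditer(dmesg_text)})


def to_ranges(pins):
    """Collapse sorted pin numbers into (first, last) runs of consecutive pins."""
    runs = []
    for pin in pins:
        if runs and pin == runs[-1][1] + 1:
            runs[-1][1] = pin  # extend the current run
        else:
            runs.append([pin, pin])  # start a new run
    return [tuple(run) for run in runs]
```

For example, `to_ranges(boot_enabled_pins(open("boot.log").read()))` on a saved `dmesg` output lists each block of disabled pins, which makes boots easy to compare.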
(In reply to Bram Coenen from comment #34)
> Created attachment 281253
> dmesg_leonard_patch_no_touchscreen
>
> I applied the patch made by Leonard correctly this time. The touchscreen does not work, and the kernel prints the following for pins 67 to 148:
>
> [ 3.080467] amd_gpio AMD0030:00: Pin 67 interrupt enabled on boot: disable

"irq 7: nobody cared" is also still present.

(In reply to Bram Coenen from comment #35)
> (In reply to Bram Coenen from comment #34)
> > Created attachment 281253
> > dmesg_leonard_patch_no_touchscreen
> >
> > I applied the patch made by Leonard correctly this time. The touchscreen does not work, and the kernel prints the following for pins 67 to 148:
> >
> > [ 3.080467] amd_gpio AMD0030:00: Pin 67 interrupt enabled on boot: disable
>
> "irq 7: nobody cared" is also still present.

Now my touchscreen is working with the patch. It still prints that the same pins are disabled, and the "nobody cared" error is gone. So I don't think the patch made any difference.

Giving a quick update: on Linux 5.1.15 (Archlinux) with firmware F.32, the bug still persists reliably (when booting Linux after Windows).

Still present on 5.1.19-300 (Fedora 30).

I just faced a similar problem on a new platform: the BIOS boots with a GPIO IRQ enabled, and the boot-time GPIO state causes the IRQ to fire. pinctrl-amd tries to handle the IRQ but doesn't call any handler, and as a result we get a boot-time interrupt storm.

I considered the patch here, but I checked Windows vs Linux. After boot, both OSes have the same 41 GPIO IRQs enabled. This is many more than the ones listed in the DSDT. I believe this means that Windows does not disable all GPIO IRQs at boot time, and hence the patch posted here (based on an earlier suggestion that I made) is somewhat risky. I guess from the comments above that approach didn't help either.
I took an alternative approach, to just disable the spurious IRQ, which solves the issue I was facing:

[PATCH] pinctrl/amd: disable spurious-firing GPIO IRQs

But if the disable-all approach didn't work, then I don't think this patch will help your case either.

If anyone wants to investigate further, I think the next step here is to dump the contents of the "status" variable in amd_gpio_irq_handler(). The fact that you are getting "nobody cared" indicates that the pinctrl-amd driver itself doesn't even know which GPIO is causing the interrupt to be fired. From that angle it's understandable that disabling all the GPIO interrupts didn't make a difference. Checking the exact value of "status" will give us more clarity.

You can already find logs with the status variable dumped in this thread. I attached them: one from a boot where the touchscreen was working and one from a boot where it was not (plus the source code of the modified handler that creates the dump).

The problem still persists, and a workaround is booting with the kernel option noirqdebug or leaving the device powered off for a few days. Then the touchscreen works again (until you boot into Windows).

Ah, I see it in comment #15. So when your issue bites, the pinctrl-amd status register is zero. The GPIO controller is indicating that it did not generate any interrupt at all. So, as you probably already grasped, the cause of your issue almost certainly resides outside of pinctrl-amd.

To look for clues here, two more ideas:

1. Check closely when the interrupt spam starts appearing. Is it right after pinctrl-amd loads, or is it triggered somewhat later in boot? If it's triggered later, then you can try to figure out precisely what causes it.
2. Look for differences elsewhere between the working and non-working setups. Perhaps differences in the APIC registers (or something like that) could explain the weird interrupts on irq 7. I'm a bit out of my knowledge area there though.
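A simplified model of the dispatch described above may help show why a zero status ends in "nobody cared". It is illustrative Python, not code from pinctrl-amd.c. The bit offsets (INTERRUPT_ENABLE_OFF = 11, INTERRUPT_STS_OFF = 28, WAKE_STS_OFF = 29) are assumptions based on the names in pinctrl-amd.h, the status bits are assumed to be write-1-to-clear, and `disable_unclaimed` is a hypothetical switch standing in for the "disable spurious-firing GPIO IRQs" idea. As modeled here, a pending pin counts as handled even without a consumer (the boot-time storm case), while an interrupt with no pending pin at all yields IRQ_NONE.

```python
IRQ_NONE, IRQ_HANDLED = 0, 1

# Assumed bit offsets (names from pinctrl-amd.h); check your kernel tree.
INTERRUPT_ENABLE_OFF = 11
INTERRUPT_STS_OFF = 28
WAKE_STS_OFF = 29
PIN_IRQ_PENDING = (1 << INTERRUPT_STS_OFF) | (1 << WAKE_STS_OFF)


def model_gpio_irq(regs, handlers, disable_unclaimed=False):
    """Model one invocation of the GPIO controller's parent IRQ handler.

    regs     -- list of per-pin register values, modified in place
    handlers -- dict: pin -> callable, for pins that have a consumer
    disable_unclaimed -- hypothetical: turn off pending pins with no consumer
    Returns (IRQ_HANDLED or IRQ_NONE, list of pins disabled).
    """
    ret = IRQ_NONE
    disabled = []
    for pin, val in enumerate(regs):
        if not val & PIN_IRQ_PENDING:
            continue  # this pin did not raise the interrupt
        handler = handlers.get(pin)
        if handler is not None:
            handler(pin)
        elif disable_unclaimed:
            val &= ~(1 << INTERRUPT_ENABLE_OFF)
            disabled.append(pin)
        # Status bits modeled as write-1-to-clear: writing them back clears them.
        regs[pin] = val & ~PIN_IRQ_PENDING
        ret = IRQ_HANDLED
    # No pending pin at all ("status" == 0): nothing was claimed, so the
    # IRQ core counts this interrupt as unhandled.
    return ret, disabled
```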
I have a similar problem with Lenovo Thinkpad E495 (see below for dmesg snippet). This laptop doesn't even have a touchscreen display.

The error doesn't really cause any hangs or slowdown (I'm running 5.4-rc6), only that message in the log (shown during boot). Is there a way though to get rid of it without using Windows? I don't have access to Windows installation on this computer anymore.

Thanks!

Here is what I see in dmesg:

[ 2.380193] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 2.380216] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G E 5.4.0-rc5+ #25
[ 2.380217] Hardware name: LENOVO 20NECTO1WW/20NECTO1WW, BIOS R11ET30W (1.10 ) 10/11/2019
[ 2.380218] Call Trace:
[ 2.380220] <IRQ>
[ 2.380225] dump_stack+0x5c/0x80
[ 2.380228] __report_bad_irq+0x38/0xad
[ 2.380230] note_interrupt.cold+0xb/0x6e
[ 2.380232] handle_irq_event_percpu+0x72/0x80
[ 2.380233] handle_irq_event+0x3c/0x5c
[ 2.380234] handle_fasteoi_irq+0xa3/0x160
[ 2.380236] do_IRQ+0x53/0xe0
[ 2.380238] common_interrupt+0xf/0xf
[ 2.380238] </IRQ>
[ 2.380241] RIP: 0010:cpuidle_enter_state+0xc4/0x450
[ 2.380243] Code: e8 c1 94 ad ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 61 03 00 00 31 ff e8 e3 b1 b3 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 8c 02 00 00 49 63 cc 4c 2b 6c 24 10 48 8d 04 49 48
[ 2.380243] RSP: 0018:ffffaea10015fe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 2.380245] RAX: ffff8f99708ea6c0 RBX: ffffffff9babcb00 RCX: 000000008dcbf47f
[ 2.380246] RDX: 000000008e08fd75 RSI: 000000008dcbf47f RDI: 0000000000000000
[ 2.380246] RBP: ffff8f996ddcf800 R08: 000000008dcbf8a5 R09: 00000049b4d1ac6b
[ 2.380246] R10: ffff8f99708e95a0 R11: ffff8f99708e9580 R12: 0000000000000002
[ 2.380247] R13: 000000008dcbf8a5 R14: 0000000000000002 R15: ffff8f996efe8000
[ 2.380249] ? cpuidle_enter_state+0x9f/0x450
[ 2.380251] cpuidle_enter+0x29/0x40
[ 2.380253] do_idle+0x1dc/0x270
[ 2.380254] cpu_startup_entry+0x19/0x20
[ 2.380257] start_secondary+0x15f/0x1b0
[ 2.380259] secondary_startup_64+0xa4/0xb0
[ 2.380260] handlers:
[ 2.380270] [<0000000027c08871>] amd_gpio_irq_handler
[ 2.380283] Disabling IRQ #7

(In reply to Shmerl from comment #42)
> I have a similar problem with Lenovo Thinkpad E495 (see below for dmesg snippet). This laptop doesn't even have a touchscreen display.
>
> The error doesn't really cause any hangs or slowdown (I'm running 5.4-rc6), only that message in the log (shown during boot). Is there a way though to get rid of it without using Windows? I don't have access to Windows installation on this computer anymore.
>
> Thanks!

Did you try to start your kernel with noirqdebug (as a kernel option)? It is a workaround for the issue with touchscreens.

(In reply to nospamming11+kernel from comment #43)
>
> Did you try to start your kernel with noirqdebug (as a kernel option)? It is a workaround for the issue with touchscreens.

Just tried it. With noirqdebug, the message doesn't pop up anymore during boot, but it returns on the next boot if I boot without noirqdebug, so it just masks the issue rather than changing something permanently like in some examples above.

(In reply to Shmerl from comment #44)
> (In reply to nospamming11+kernel from comment #43)
> >
> > Did you try to start your kernel with noirqdebug (as a kernel option)? It is a workaround for the issue with touchscreens.
>
> Just tried it. With noirqdebug, the message doesn't pop up anymore during boot, but it returns on the next boot if I boot without noirqdebug, so it just masks the issue rather than changing something permanently like in some examples above.

For the touchscreen: with noirqdebug the touchscreen works; without it (and thus with the error appearing) the touchscreen does not work. So it has to do more than just suppress the error message in dmesg.
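As a rough model of why noirqdebug does more than hide the message: the kernel's spurious-interrupt detector (note_interrupt() in kernel/irq/spurious.c) disables a line when almost all of its recent interrupts went unhandled. Since IRQ 7 is the parent line for the pinctrl-amd GPIO interrupts, disabling it presumably also cuts off the touchscreen's GPIO interrupt, whereas noirqdebug skips the check so the line stays enabled. The sketch uses the 100,000 / 99,900 thresholds as I understand them from spurious.c and omits details such as the time-based reset of the unhandled counter; the class is hypothetical.

```python
# Thresholds as I understand them from kernel/irq/spurious.c (assumption).
SPURIOUS_WINDOW = 100_000  # interrupts per evaluation window
SPURIOUS_LIMIT = 99_900    # unhandled interrupts in a window that trigger disabling


class IrqLine:
    """Very simplified model of the per-IRQ spurious-interrupt bookkeeping."""

    def __init__(self, noirqdebug=False):
        self.noirqdebug = noirqdebug
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        if self.noirqdebug or self.disabled:
            return  # check switched off, or the line is already masked
        self.count += 1
        if not handled:
            self.unhandled += 1
        if self.count < SPURIOUS_WINDOW:
            return
        if self.unhandled > SPURIOUS_LIMIT:
            # Real kernel: "irq N: nobody cared" ... "Disabling IRQ #N"
            self.disabled = True
        self.count = 0
        self.unhandled = 0
```

In this model, a handler that always reports the interrupt as handled (the IRQ_HANDLED hack from earlier in the thread) also avoids the disable, but the interrupt storm itself continues.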
Another data point, when waking up from suspend, for a Thinkpad T495 with a Ryzen 3700 (which has an amdgpu):

[ 373.071812] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 373.071822] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.5.0-rc2 #1-NixOS
[ 373.071823] Hardware name: LENOVO 20NJCTO1WW/20NJCTO1WW, BIOS R12ET46W(1.16 ) 10/28/2019
[ 373.071824] Call Trace:
[ 373.071828] <IRQ>
[ 373.071837] dump_stack+0x66/0x90
[ 373.071842] __report_bad_irq+0x37/0xb1
[ 373.071846] note_interrupt.cold.10+0xa/0x6d
[ 373.071849] handle_irq_event_percpu+0x6a/0x80
[ 373.071852] handle_irq_event+0x3c/0x5c
[ 373.071855] handle_fasteoi_irq+0xa3/0x150
[ 373.071859] do_IRQ+0x51/0xe0
[ 373.071862] common_interrupt+0xf/0xf
[ 373.071863] </IRQ>
[ 373.071868] RIP: 0010:cpuidle_enter_state+0xbe/0x3f0
[ 373.071872] Code: e8 27 c6 b3 ff 80 7c 24 13 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d5 02 00 00 31 ff e8 f9 d3 b9 ff fb 66 0f 1f 44 00 00 <85> ed 0f 88 42 02 00 00 48 63 c5 4c 8b 3c 24 4c 2b 7c 24 08 48 8d
[ 373.071873] RSP: 0018:ffffffff94203e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffc8
[ 373.071877] RAX: ffff98d838a2c300 RBX: ffff98d83677c000 RCX: 000000000000001f
[ 373.071878] RDX: 00000056dccfecc1 RSI: 0000000037b5a6f8 RDI: 0000000000000000
[ 373.071879] RBP: 0000000000000001 R08: 0000000000000002 R09: 000000000002bb80
[ 373.071880] R10: 0000000285c499ea R11: ffff98d838a2b3e4 R12: ffffffff942b9da0
[ 373.071881] R13: ffffffff942b9e20 R14: 0000000000000001 R15: 0000000000000001
[ 373.071886] ? cpuidle_enter_state+0x99/0x3f0
[ 373.071889] cpuidle_enter+0x29/0x40
[ 373.071894] do_idle+0x22b/0x260
[ 373.071898] cpu_startup_entry+0x19/0x20
[ 373.071901] start_kernel+0x4e2/0x504
[ 373.071906] secondary_startup_64+0xb6/0xc0
[ 373.071908] handlers:
[ 373.071916] [<0000000085173049>] amd_gpio_irq_handler [pinctrl_amd]
[ 373.071918] Disabling IRQ #7

I have the same problem: HP desktop M01-F0xxx, Ryzen 5 3400G, no touchscreen:

[ 5.799887] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 5.799890] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.8-zen1-1-zen #1
[ 5.799891] Hardware name: HP HP Desktop M01-F0xxx/8643, BIOS F.11 11/26/2019
[ 5.799892] Call Trace:
[ 5.799894] <IRQ>
[ 5.799898] dump_stack+0x66/0x90
[ 5.799901] __report_bad_irq+0x35/0xaa
[ 5.799902] note_interrupt.cold+0xb/0x69
[ 5.799903] handle_irq_event+0xa9/0xb0
[ 5.799904] handle_fasteoi_irq+0xcc/0x1e0
[ 5.799906] do_IRQ+0x84/0x140
[ 5.799907] common_interrupt+0xf/0xf
[ 5.799908] </IRQ>
[ 5.799910] RIP: 0010:cpuidle_enter_state+0xc4/0xa20
[ 5.799911] Code: e8 41 cb 87 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 06 09 00 00 31 ff e8 13 10 8f ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 86 02 00 00 49 63 cc 4c 2b 6c 24 10 48 8d 04 49 48
[ 5.799912] RSP: 0018:ffffafbf40177e50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
[ 5.799913] RAX: ffff936a509c0000 RBX: ffffffffbb4c1ba0 RCX: 000000000000001f
[ 5.799913] RDX: 0000000000000000 RSI: 0000000022a8e515 RDI: 0000000000000000
[ 5.799914] RBP: ffff936a481a9800 R08: 0000000159b326b3 R09: 00000001687d2800
[ 5.799914] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000002
[ 5.799915] R13: 0000000159b326b3 R14: 0000000000000002 R15: ffff936a4efe9e40
[ 5.799917] cpuidle_enter+0x29/0x40
[ 5.799919] do_idle+0x202/0x2b0
[ 5.799920] cpu_startup_entry+0x19/0x20
[ 5.799922] start_secondary+0x1c6/0x220
[ 5.799923] secondary_startup_64+0xb6/0xc0
[ 5.799924] handlers:
[ 5.799927] [<00000000abfe7a71>] amd_gpio_irq_handler [pinctrl_amd]
[ 5.799928] Disabling IRQ #7

If I add the irqpoll boot option, my HPET breaks:

hpet: Lost 9601 RTC interrupts

Archlinux, kernel linux-zen 5.4.8-zen1.

Ditto, Archlinux kernel 5.4.14-arch1-1, ThinkPad E595 Ryzen 3500U:

[ 11.749810] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 11.749814] CPU: 5 PID: 444 Comm: systemd-journal Not tainted 5.4.14-arch1-1 #1
[ 11.749816] Hardware name: LENOVO 20NFCTO1WW/20NFCTO1WW, BIOS R11ET31W (1.11 ) 11/20/2019
[ 11.749817] Call Trace:
[ 11.749820] <IRQ>
[ 11.749827] dump_stack+0x66/0x90
[ 11.749831] __report_bad_irq+0x35/0xaa
[ 11.749834] note_interrupt.cold+0xb/0x69
[ 11.749836] handle_irq_event_percpu+0x6f/0x80
[ 11.749839] handle_irq_event+0x37/0x54
[ 11.749842] handle_fasteoi_irq+0xb5/0x160
[ 11.749845] do_IRQ+0x84/0x140
[ 11.749848] common_interrupt+0xf/0xf
[ 11.749849] </IRQ>
[ 11.749851] RIP: 0033:0x7f3e23058524
[ 11.749854] Code: 48 89 ca 4c 89 c1 45 85 c9 41 0f 9f c0 48 85 f6 74 0e 45 0f b6 c0 47 8d 44 00 04 e9 16 a6 f6 ff 50 e8 a0 04 00 00 f3 0f 1e fa <48> 81 ec d8 00 00 00 41 89 d2 4c 89 c2 4c 89 4c 24 48 84 c0 74 >
[ 11.749855] RSP: 002b:00007ffd0eedf6d8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[ 11.749858] RAX: 0000000000000000 RBX: 000000000000001a RCX: 0000000000000030
[ 11.749859] RDX: 0000000000000001 RSI: 000000000000002c RDI: 00005637261d73f0
[ 11.749860] RBP: 0000000000000000 R08: 00005637254a7008 R09: 000000000000004d
[ 11.749861] R10: 0000000000000000 R11: 00007f3e2310ba40 R12: 0000000000000000
[ 11.749861] R13: 00007ffd0eedfa40 R14: 00005637261d7360 R15: 00007ffd0eedfd28
[ 11.749864] handlers:
[ 11.749869] [<000000000dea8798>] amd_gpio_irq_handler [pinctrl_amd]
[ 11.749870] Disabling IRQ #7

Ditto on 5.6.0-0.rc3.git0.1.fc32.x86_64. This went away for a while and then the 5.5 kernels started showing it again for me. The "irq 7: nobody cared" statement is quite true. I have no touchscreen function now either.

Hi there!
Just to advise those who were having this issue with Lenovo ThinkPads: my T495 () in particular had this IRQ#7 issue with 5.5.7-200.fc31.x86_64 and lower. I updated the BIOS to version 1.19 (for the T495) and the issue appears to have gone away. I'll continue to observe and report back if the problem returns. This update also fixed Linux not being able to see the battery's status (unsure if related to IRQ#7).

Thankfully, Lenovo are releasing stand-alone ISOs for BIOS updating; just make sure to disable Secure Boot before booting into one off a USB stick. From a quick check, some of the E series Ryzen ThinkPads have also had a BIOS update in the last week or so. I hope other hardware vendors have released, or will release, a fix for this! I'm wondering if AMD have pushed some newer byte-code for these chips?

Cheers,

Hi,

For me this issue is still present, even though there is not much impact as far as I know.

Machine: Lenovo ThinkPad E495, model 20NECTO1WW, ThinkPad BIOS R11ET35W (1.15 ), EC R11HT35W

OS:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=19.10
DISTRIB_CODENAME=eoan

Kernel:
Linux e495 5.5.8-050508-generic #202003051633 SMP Thu Mar 5 16:37:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Exception:
[Wed Mar 11 08:39:44 2020] Hardware name: LENOVO 20NECTO1WW/20NECTO1WW, BIOS R11ET35W (1.15 ) 02/19/2020
[Wed Mar 11 08:39:44 2020] Call Trace:
[Wed Mar 11 08:39:44 2020] <IRQ>
[Wed Mar 11 08:39:44 2020] dump_stack+0x6d/0x9a
[Wed Mar 11 08:39:44 2020] __report_bad_irq+0x3a/0xaf
[Wed Mar 11 08:39:44 2020] note_interrupt.cold+0xb/0x61
[Wed Mar 11 08:39:44 2020] handle_irq_event_percpu+0x73/0x80
[Wed Mar 11 08:39:44 2020] handle_irq_event+0x3b/0x5a
[Wed Mar 11 08:39:44 2020] handle_fasteoi_irq+0x9c/0x150
[Wed Mar 11 08:39:44 2020] do_IRQ+0x55/0xf0
[Wed Mar 11 08:39:44 2020] common_interrupt+0xf/0xf
[Wed Mar 11 08:39:44 2020] </IRQ>
[Wed Mar 11 08:39:44 2020] RIP: 0010:__call_rcu+0xd3/0x1d0
[Wed Mar 11 08:39:44 2020] Code: 0f a3 05 e0 5b 72 01 73 1c 49 8b 94 24 98 00 00 00 48 8b 05 1f ab 54 01 49 03 84 24 b0 00 00 00 48 39 c2 7f 77 4c 89 ef 57 9d <0f> 1f 44 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 d4
[Wed Mar 11 08:39:44 2020] RSP: 0018:ffffa8f2405f7e08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[Wed Mar 11 08:39:44 2020] RAX: 0000000000002710 RBX: 000000000002df00 RCX: ffffffffa3ae5b90
[Wed Mar 11 08:39:44 2020] RDX: 0000000000000014 RSI: ffff91bb81d92e00 RDI: 0000000000000246
[Wed Mar 11 08:39:44 2020] RBP: ffffa8f2405f7e40 R08: ffff91bb8d5d3a40 R09: 0000000000000064
[Wed Mar 11 08:39:44 2020] R10: 0000000040000010 R11: ffff91bb8d009068 R12: ffff91bb8f2edf00
[Wed Mar 11 08:39:44 2020] R13: 0000000000000246 R14: ffff91bb8f2edf50 R15: ffff91bb81d92e00
[Wed Mar 11 08:39:44 2020] ? get_max_files+0x20/0x20
[Wed Mar 11 08:39:44 2020] ? get_max_files+0x20/0x20
[Wed Mar 11 08:39:44 2020] call_rcu+0x10/0x20
[Wed Mar 11 08:39:44 2020] __fput+0x150/0x260
[Wed Mar 11 08:39:44 2020] ____fput+0xe/0x10
[Wed Mar 11 08:39:44 2020] task_work_run+0x8f/0xb0
[Wed Mar 11 08:39:44 2020] exit_to_usermode_loop+0x131/0x160
[Wed Mar 11 08:39:44 2020] do_syscall_64+0x170/0x1b0
[Wed Mar 11 08:39:44 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Wed Mar 11 08:39:44 2020] RIP: 0033:0x7f2302ff1ab7
[Wed Mar 11 08:39:44 2020] Code: ff ff e8 3c 13 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 33 5e f8 ff
[Wed Mar 11 08:39:44 2020] RSP: 002b:00007ffdf4a630e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[Wed Mar 11 08:39:44 2020] RAX: 0000000000000000 RBX: 00007f2302a887a0 RCX: 00007f2302ff1ab7
[Wed Mar 11 08:39:44 2020] RDX: 00007ffdf4a63150 RSI: 00007ffdf4a63150 RDI: 0000000000000000
[Wed Mar 11 08:39:44 2020] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000562965a2f720
[Wed Mar 11 08:39:44 2020] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Wed Mar 11 08:39:44 2020] R13: 0000000000000005 R14: 00007ffdf4a63238 R15: 0000000000000005
[Wed Mar 11 08:39:44 2020] handlers:
[Wed Mar 11 08:39:44 2020] [<000000003328b550>] amd_gpio_irq_handler
[Wed Mar 11 08:39:44 2020] Disabling IRQ #7

Philipp

Phillip,

bugzilla-daemon@bugzilla.kernel.org writes:
> Kernel:
> Linux e495 5.5.8-050508-generic #202003051633 SMP Thu Mar 5 16:37:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> Exception:
> [Wed Mar 11 08:39:44 2020] Hardware name: LENOVO 20NECTO1WW/20NECTO1WW, BIOS R11ET35W (1.15 ) 02/19/2020
> [Wed Mar 11 08:39:44 2020] handlers:
> [Wed Mar 11 08:39:44 2020] [<000000003328b550>] amd_gpio_irq_handler
> [Wed Mar 11 08:39:44 2020] Disabling IRQ #7

Can you please enable CONFIG_DEBUG_FS and provide the output of

cat /sys/kernel/debug/gpio

Thanks,

tglx

(In reply to Alois Nespor from comment #47)
> I have the same problem: HP desktop M01-F0xxx, Ryzen 5 3400G, no touchscreen:
>
> [ 5.799887] irq 7: nobody cared (try booting with the "irqpoll" option)
> [ 5.799890] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.8-zen1-1-zen #1
> [ 5.799891] Hardware name: HP HP Desktop M01-F0xxx/8643, BIOS F.11 11/26/2019
> [ 5.799892] Call Trace:
> [ 5.799894] <IRQ>
> [ 5.799898] dump_stack+0x66/0x90
> [ 5.799901] __report_bad_irq+0x35/0xaa
> [ 5.799902] note_interrupt.cold+0xb/0x69
> [ 5.799903] handle_irq_event+0xa9/0xb0
> [ 5.799904] handle_fasteoi_irq+0xcc/0x1e0
> [ 5.799906] do_IRQ+0x84/0x140
> [ 5.799907] common_interrupt+0xf/0xf
> [ 5.799908] </IRQ>
> [ 5.799910] RIP: 0010:cpuidle_enter_state+0xc4/0xa20
> [ 5.799911] Code: e8 41 cb 87 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 06 09 00 00 31 ff e8 13 10 8f ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 86 02 00 00 49 63 cc 4c 2b 6c 24 10 48 8d 04 49 48
> [ 5.799912] RSP: 0018:ffffafbf40177e50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> [ 5.799913] RAX: ffff936a509c0000 RBX: ffffffffbb4c1ba0 RCX: 000000000000001f
> [ 5.799913] RDX: 0000000000000000 RSI: 0000000022a8e515 RDI: 0000000000000000
> [ 5.799914] RBP: ffff936a481a9800 R08:
0000000159b326b3 R09: > 00000001687d2800 > [ 5.799914] R10: 0000000000000008 R11: 0000000000000008 R12: > 0000000000000002 > [ 5.799915] R13: 0000000159b326b3 R14: 0000000000000002 R15: > ffff936a4efe9e40 > [ 5.799917] cpuidle_enter+0x29/0x40 > [ 5.799919] do_idle+0x202/0x2b0 > [ 5.799920] cpu_startup_entry+0x19/0x20 > [ 5.799922] start_secondary+0x1c6/0x220 > [ 5.799923] secondary_startup_64+0xb6/0xc0 > [ 5.799924] handlers: > [ 5.799927] [<00000000abfe7a71>] amd_gpio_irq_handler [pinctrl_amd] > [ 5.799928] Disabling IRQ #7 > > If i try add irqpoll option for boot, my hpet will be broken: > hpet: Lost 9601 RTC interrupts > > Archlinux, kernel linux-zen 5.4.8-zen1, @Thomas Gleixner, if you help my output of 'cat /sys/kernel/debug/gpio', linux-zen 5.5.8, please see gpio.txt Created attachment 287877 [details]
cat /sys/kernel/debug/gpio - linux-zen 5.5.8
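Every trace in this thread runs through note_interrupt() and __report_bad_irq() before printing "Disabling IRQ #7". As far as I understand kernel/irq/spurious.c, the kernel counts interrupts on a line in windows of 100,000 and, if more than 99,900 of them were not claimed by any handler (here the only registered handler is amd_gpio_irq_handler), it prints "nobody cared" and disables the line. The sketch below is a simplified Python model of that heuristic, not the kernel code: the names `IrqLine` and `note_interrupt` are made up for illustration, the thresholds are from memory, and it ignores the real code's time-based reset of the unhandled counter as well as the irqpoll/misrouted-IRQ paths.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as I recall them from kernel/irq/spurious.c (assumption; check your tree).
WINDOW = 100_000          # interrupts per evaluation window
UNHANDLED_LIMIT = 99_900  # unhandled interrupts per window that count as "stuck"


@dataclass
class IrqLine:
    """Hypothetical, heavily simplified stand-in for the kernel's per-IRQ descriptor."""
    irq: int
    irq_count: int = 0
    irqs_unhandled: int = 0
    disabled: bool = False


def note_interrupt(line: IrqLine, handled: bool) -> List[str]:
    """Account for one interrupt on `line`; return the log lines the model would print."""
    if line.disabled:
        return []  # a disabled line no longer delivers interrupts
    if not handled:
        line.irqs_unhandled += 1  # no handler claimed this interrupt
    line.irq_count += 1
    if line.irq_count < WINDOW:
        return []
    # End of a window: decide whether the line is stuck, then start a new window.
    line.irq_count = 0
    messages: List[str] = []
    if line.irqs_unhandled > UNHANDLED_LIMIT:
        messages = [
            f'irq {line.irq}: nobody cared (try booting with the "irqpoll" option)',
            f"Disabling IRQ #{line.irq}",
        ]
        line.disabled = True
    line.irqs_unhandled = 0
    return messages
```

Feeding the model 100,000 interrupts that no handler claims reproduces the two lines seen in every report here, while a line on which a good share of interrupts is claimed never trips it. For comparison, the kernel-parameters documentation describes irqpoll as searching all handlers when an interrupt is not handled and also checking them on each timer interrupt, which plausibly explains the extra CPU load people report with that option.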
Issue still persists on my Lenovo ThinkPad E595, BIOS (1.15), kernel 5.5.8-arch1-1. I've attached my /sys/kernel/debug/gpio; I will run it again with CONFIG_DEBUG_FS on tomorrow.

Created attachment 287879 [details]
cat /sys/kernel/gpio result
Created attachment 287883 [details]
cat /sys/kernel/debug/gpio
Output of cat /sys/kernel/debug/gpio
Machine:
Hardware name: LENOVO 20NECTO1WW/20NECTO1WW, BIOS R11ET35W (1.15 ) 02/19/2020
OS:
Ubuntu 19.10
Kernel:
Linux e495 5.5.8-050508-generic #202003051633 SMP Thu Mar 5 16:37:27 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
bugzilla-daemon@bugzilla.kernel.org writes:

Thanks for the files!

This is starting to be really puzzling. The files from Alois and Phillip have dozens of interrupts enabled, but ALL of them are masked, which means they cannot raise an interrupt unless the masking is broken.

Jose's file does not have a single interrupt line enabled, which means that the interrupt fires for completely different reasons.

I'm trying to get some more detailed information about the inner workings of these chips. Will take a while.

Thanks,

tglx

Similar issue. Fedora 31, Linux version 5.5.11, ThinkPad E495.

[ 5.809023] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 5.809027] CPU: 4 PID: 31 Comm: ksoftirqd/4 Not tainted 5.5.11-200.fc31.x86_64 #1
[ 5.809028] Hardware name: LENOVO 20NECTO1WW/20NECTO1WW, BIOS R11ET35W (1.15 ) 02/19/2020
[ 5.809029] Call Trace:
[ 5.809033] <IRQ>
[ 5.809044] dump_stack+0x66/0x90
[ 5.809051] __report_bad_irq+0x35/0xa7
[ 5.809053] note_interrupt.cold+0xb/0x63
[ 5.809056] handle_irq_event_percpu+0x6f/0x80
[ 5.809058] handle_irq_event+0x36/0x53
[ 5.809060] handle_fasteoi_irq+0x8b/0x130
[ 5.809063] do_IRQ+0x50/0xe0
[ 5.809067] common_interrupt+0xf/0xf
[ 5.809069] </IRQ>
[ 5.809072] RIP: 0010:finish_task_switch+0x80/0x2a0
[ 5.809074] Code: 8b 1c 25 c0 8b 01 00 0f 1f 44 00 00 0f 1f 44 00 00 41 c7 45 38 00 00 00 00 4c 89 e7 c6 07 00 0f 1f 40 00 fb 66 0f 1f 44 00 00 <65> 48 8b 04 25 c0 8b 01 00 0f 1f 44 00 00 4d 85 f6 74 21 65 48 8b
[ 5.809076] RSP: 0018:ffffbb45c025be40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[ 5.809078] RAX: ffff9efd68482680 RBX: ffff9efd7260a680 RCX: 0000000000000000
[ 5.809079] RDX: 0000000002068000 RSI: 0000000000000000 RDI: ffff9efd74b2ae00
[ 5.809080] RBP: ffffbb45c025be68 R08: ffff9efd68482730 R09: ffff9efd68482730
[ 5.809080] R10: 00000000000000bb R11: ffff9efd74b2aeb8 R12: ffff9efd74b2ae00
[ 5.809081] R13: ffff9efd68482680 R14: 0000000000000000 R15: 0000000000000000
[ 5.809087] __schedule+0x2cf/0x740
[ 5.809089] schedule+0x4a/0xb0
[ 5.809093] smpboot_thread_fn+0x10b/0x160
[ 5.809097] kthread+0xf9/0x130
[ 5.809099] ? sort_range+0x20/0x20
[ 5.809100] ? kthread_park+0x90/0x90
[ 5.809102] ret_from_fork+0x22/0x40
[ 5.809104] handlers:
[ 5.809109] [<00000000463020c1>] amd_gpio_irq_handler [pinctrl_amd]
[ 5.809110] Disabling IRQ #7

Created attachment 288097 [details]
cat /sys/kernel/debug/gpio. Cuz why not
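tglx's reading of these dumps hinges on two properties per GPIO pin: whether its interrupt is enabled and whether it is masked. A rough Python sketch for pulling that out of a dump, assuming (not verified against every kernel) the older pinctrl-amd debugfs format in which each pin's record starts with `pin<N>` and contains the phrases `interrupt is enabled` / `interrupt is disabled` and `interrupt is masked` / `interrupt is unmasked`. The layout differs between kernel versions, so the patterns may need adjusting; `summarize_gpio_dump` is a made-up name.

```python
import re
from typing import List, Tuple

# One record per pin: from a line starting with "pinN" up to the next such line or end of text.
PIN_RECORD = re.compile(r"^pin(\d+)\b(.*?)(?=^pin\d+\b|\Z)", re.MULTILINE | re.DOTALL)


def summarize_gpio_dump(text: str) -> Tuple[List[int], List[int]]:
    """Return (enabled_and_unmasked, enabled_but_masked) pin numbers."""
    unmasked: List[int] = []
    masked: List[int] = []
    for match in PIN_RECORD.finditer(text):
        pin, record = int(match.group(1)), match.group(2)
        if "interrupt is enabled" not in record:
            continue  # interrupt disabled on this pin, so it should not be a source
        if "interrupt is unmasked" in record:
            unmasked.append(pin)
        elif "interrupt is masked" in record:
            masked.append(pin)
    return unmasked, masked
```

For example, `summarize_gpio_dump(open("/sys/kernel/debug/gpio").read())` (as root, with debugfs mounted). An empty first list combined with a firing IRQ 7 is exactly the puzzling case described above: no enabled, unmasked pin should be able to raise the interrupt.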
Same problem here with IRQ 7 on a Lenovo E495 with BIOS 1.15, running CentOS 8 Stream, Linux 5.6.3-1.el8.elrepo.x86_64. Enabling the noapic kernel option makes the "IRQ 7" warning disappear. I haven't noticed any issues with the system; it seems to run the same as without this tweak.

To all here who also have an E495: BIOS 1.15 was removed from Lenovo's pages some time ago, but there is now a newer 1.16 available. https://pcsupport.lenovo.com/us/de/products/laptops-and-netbooks/thinkpad-edge-laptops/thinkpad-e495-type-20ne/downloads/DS539418 If someone feels like trying this one out, please ping me if you had success. I'm still on BIOS 1.10 for now, since I have no real problems and the laptop really needs to keep working until I finish my thesis ;)

Just for reference:
- E495 with BIOS 1.10
- IRQ 7 nobody cared
- Never booted Windows, only Linux (Xubuntu 19.10)
- Kernel 5.3.0-51-generic #44-Ubuntu SMP

(In reply to Jan Sordid from comment #62)
> If someone feels like trying this one out, please ping me if you had success.
[ 4.985889] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 4.985894] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.6.8-200.fc31.x86_64 #1
[ 4.985895] Hardware name: LENOVO 20NECTO1WW/20NECTO1WW, BIOS R11ET36W (1.16 ) 03/30/2020
[ 4.985896] Call Trace:
[ 4.985900] <IRQ>
[ 4.985911] dump_stack+0x66/0x90
[ 4.985917] __report_bad_irq+0x35/0xa7
[ 4.985920] note_interrupt.cold+0xb/0x63
[ 4.985923] handle_irq_event_percpu+0x4f/0x60
[ 4.985925] handle_irq_event+0x36/0x53
[ 4.985927] handle_fasteoi_irq+0x8b/0x130
[ 4.985931] do_IRQ+0x50/0xe0
[ 4.985934] common_interrupt+0xf/0xf
[ 4.985935] </IRQ>
[ 4.985941] RIP: 0010:cpuidle_enter_state+0xc9/0x3e0
[ 4.985943] Code: e8 3c 2d 91 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 3e 5f 97 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 40 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[ 4.985944] RSP: 0018:ffffad9c4015fe78 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[ 4.985947] RAX: ffff9258f4b2ae80 RBX: ffff9258ea173400 RCX: 00000001292e3489
[ 4.985948] RDX: 00000001293d76b5 RSI: 00000001292e3489 RDI: 0000000000000000
[ 4.985949] RBP: ffffffff90769f40 R08: 00000001292e349d R09: 000000007fffffff
[ 4.985950] R10: 0000000000000001 R11: ffff9258f4b29ca4 R12: 00000001292e349d
[ 4.985950] R13: 0000000000000002 R14: 0000000000000002 R15: ffff9258f2ff0000
[ 4.985955] ? cpuidle_enter_state+0xa4/0x3e0
[ 4.985957] cpuidle_enter+0x29/0x40
[ 4.985961] do_idle+0x1c0/0x260
[ 4.985964] cpu_startup_entry+0x19/0x20
[ 4.985967] start_secondary+0x152/0x190
[ 4.985972] secondary_startup_64+0xb6/0xc0
[ 4.985974] handlers:
[ 4.985979] [<00000000cf504361>] amd_gpio_irq_handler [pinctrl_amd]
[ 4.985980] Disabling IRQ #7

Nothing new.

I have this problem too on a Lenovo ThinkPad E595 with Linux 5.6 from the Manjaro and Arch kernels. It doesn't appear to cause any problems at all... but I don't know for sure.

Created attachment 289205 [details]
dmesg with linux 5.6ck
Created attachment 289537 [details]
dmesg
Created attachment 289539 [details]
interrupts
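The "interrupts" attachment is a copy of /proc/interrupts. A small Python sketch to total the count for one IRQ across all CPUs, assuming the usual layout: a header with one `CPUn` column per online CPU, then rows starting with `N:`, followed by one count per CPU and a free-form description. `irq_total` is a made-up helper, and the sample in its usage is illustrative rather than copied from the attachment.

```python
from typing import Optional, Tuple


def irq_total(proc_interrupts: str, irq: int) -> Optional[Tuple[int, str]]:
    """Return (total count across CPUs, description) for `irq`, or None if absent."""
    lines = proc_interrupts.splitlines()
    if not lines:
        return None
    ncpu = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == f"{irq}:":
            counts = [int(x) for x in fields[1:1 + ncpu]]
            return sum(counts), " ".join(fields[1 + ncpu:])
    return None
```

For example, `irq_total(open("/proc/interrupts").read(), 7)` returns the total for IRQ 7 together with its chip and handler-name column; reading it twice a few seconds apart shows whether the line is still being counted.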
Hi,

For me this issue is still present. I use a ThinkPad E595 with an AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx.

Machine: Lenovo ThinkPad E595, model 20NF001QTX, ThinkPad BIOS R11ET36W (1.16 )
Kernel: 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux

dmesg errors like this:
[ 1.048960] pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
[ 3.693667] snd_pci_acp3x 0000:05:00.5: Invalid ACP audio mode : 2
[ 3.814530] tpm tpm0: tpm_try_transmit: send(): error -5
[ 3.814532] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
[ 5.117604] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 5.117669] handlers:
[ 5.117673] [<000000009876dd1a>] amd_gpio_irq_handler [pinctrl_amd]

and also, at emerg level:
[ 5.117674] Disabling IRQ #7

I have attached the full dmesg and interrupts output above.

Hi all,

Just to add my 2 cents... I basically have the same system as Alois from comment 47, just with a Ryzen 3 instead of a Ryzen 5 (I'll attach a dmesg). This is an HP desktop system, not a notebook, and hence does not feature a touchscreen. The "description" of HP's custom mainboard Erica2 is here: https://support.hp.com/us-en/product/hp-desktop-pc-m01-f0000i/29014486/model/31450166/document/c06418906

To describe the system a little: most of the hardware is onboard. These are AMD systems with an AMD Promontory B550A (which seems to be a rebranded 450 series chipset). The system has 4 back and 4 front USB ports and an SD card reader. Wi-Fi, Bluetooth and Ethernet are onboard. The only other connectors are a VGA port, an HDMI port, an audio-out and a mic-in. Apart from the power button, there are no other external LEDs.

I get the kernel warning that is described here (see dmesg) during boot, but afterwards the system seems to work normally. If I use the irqpoll option as advised, the desktop system becomes unusable: YouTube videos hang, LibreOffice misses key presses and scrolling lags, all of this due to very high (100%) Xorg CPU usage. Given the price point of this system, I would not count on HP releasing any BIOS update that helps in that regard...

Feel free to ask for any help I can give with the stock openSUSE kernel, but booting different ones is not really an option since I want to put the system into productive use.

Created attachment 289547 [details]
dmesg output of HP M01-F0001 System
The dmesg output of Kernel 5.3.18 from OpenSuse 15.2 on an HP M01-F0001 System.
Created attachment 289565 [details]
Part of the Mainboard "Erica2" from HP
Part of the Mainboard "Erica2" from HP. Note that it has a large SPI field of Pins.
I added a shot of the mainboard. Maybe it's interesting that it has a large pin field called "SPI - Debug".

Hello everyone,

It's been a while since I used Fedora and newer kernel versions, but it seems like the "irq 7: nobody cared" message is gone from my dmesg output! Has anyone else noticed the same thing?

[bram@loki ~]$ dmesg | grep irq
[ 0.067335] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.067337] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.146820] NR_IRQS: 524544, nr_irqs: 1000, preallocated irqs: 16
[ 1.312580] AMD0020:00: ttyS4 at MMIO 0xfedc6000 (irq = 10, base_baud = 3000000) is a 16550A
[ 1.314985] ata1: SATA max UDMA/133 abar m1024@0xf0d6c000 port 0xf0d6c100 irq 19
[ 1.315627] ehci-pci 0000:00:12.0: irq 18, io mem 0xf0d6d000
[ 1.330943] i8042: PNP: PS/2 Controller [PNP0303:KBD0,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[ 1.336886] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 1.336890] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 1.337381] rtc_cmos 00:01: alarms up to one month, 114 bytes nvram, hpet irqs

Kernel version: 5.7.8-200.fc32.x86_64

I'll see if it shows up again after a few shutdowns ;)

I have to test it on my Lenovo E595 to see if it is fixed or not.

I can confirm that the issue is still present on the Lenovo E595. A recent boot using 5.7.10 still showed the same 'disabling IRQ#7' message.

*** Bug 208469 has been marked as a duplicate of this bug. ***

I'm still experiencing this on my ThinkPad X395 running Debian Testing.
Maybe a slightly different dmesg error:

[ 3382.895489] PM: suspend exit
[ 3382.921981] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 3382.921985] CPU: 0 PID: 9977 Comm: i3bar Tainted: G OE 5.7.0-1-amd64 #1 Debian 5.7.6-1
[ 3382.921986] Hardware name: LENOVO 20NL001RMX/20NL001RMX, BIOS R13ET44W(1.18 ) 05/08/2020
[ 3382.921987] Call Trace:
[ 3382.921990] <IRQ>
[ 3382.921997] dump_stack+0x66/0x90
[ 3382.922001] __report_bad_irq+0x38/0xad
[ 3382.922003] note_interrupt.cold+0xb/0x6e
[ 3382.922005] handle_irq_event_percpu+0x72/0x80
[ 3382.922006] handle_irq_event+0x3c/0x5c
[ 3382.922008] handle_fasteoi_irq+0xa3/0x160
[ 3382.922013] do_IRQ+0x53/0xe0
[ 3382.922015] common_interrupt+0xf/0xf
[ 3382.922016] </IRQ>
[ 3382.922018] RIP: 0033:0x7f663e593401
[ 3382.922020] Code: 00 b8 ff ff ff ff c3 48 83 ec 08 bf 10 00 00 00 e8 e4 09 ff ff b8 ff ff ff ff 48 83 c4 08 c3 66 2e 0f 1f 84 00 00 00 00 00 55 <53> 48 83 ec 18 83 fe 05 77 3d 41 f6 c0 03 75 4f 44 89 44 24 04 81
[ 3382.922021] RSP: 002b:00007ffe3c655da0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffd9
[ 3382.922023] RAX: 00007f663e616608 RBX: 00007ffe3c655eb8 RCX: 000000000000000c
[ 3382.922024] RDX: 0000000000000008 RSI: 0000000000000002 RDI: 000055e7513fde30
[ 3382.922024] RBP: 000055e7513fde30 R08: 0000000000000008 R09: 000055e7513fde90
[ 3382.922025] R10: 000000000000000c R11: 0000000000000008 R12: 00007ffe3c655f60
[ 3382.922025] R13: 0000000000000002 R14: 000000000000000c R15: 0000000000000008
[ 3382.922028] handlers:
[ 3382.922030] [<00000000af5d2529>] amd_gpio_irq_handler
[ 3382.922031] Disabling IRQ #7
....

I have been tackling this same problem on my ThinkPad E595 for at least a year, trying a plethora of boot options. The noapic setting worked well for some kernel versions, but lately it has messed with the ability to resume after suspension. The irqpoll option gives a constant 100% load on one of the CPU threads.
However, when using fwupd I noticed that "capsule updates" were disabled, which in the BIOS settings corresponds to allowing "Windows UEFI update". After enabling that setting, my IRQ 7 problem has disappeared.

Hopefully this is of use to someone else.

(In reply to Joakim R from comment #78)
> I have been tackling this same problem on my Thinkpad E595 for at least a
> year, trying a plethora of boot options. The noapic setting worked well for
> some kernel versions, but lately it has messed with the ability to resume
> after suspension. The irqpoll option gives a constant 100% load on one of
> the CPU threads.
>
> However, when using fwupd I noticed that "capsule updates" were disabled,
> which in the BIOS settings corresponds to allowing "Windows UEFI update".
> After enabling that setting, my IRQ 7 problem has disappeared.
>
>
> Hopefully this is of use to someone else.

Interesting, did you perhaps also boot a newer kernel at the same time? Either a newer "z" release (as in kernel version expressed as x.y.z) or maybe a 5.11-rc kernel? There have been some recent AMD GPIO interrupt handling changes which might be related.

If you did also boot a new kernel and have an older kernel still installed, it would be interesting to know if the problem is also gone with the older kernel.

p.s. You want "capsule updates" to be enabled anyway, since these are used by lvfs / fwupd; and in general it is best to change as few BIOS options as possible, since typically only the default settings are properly tested.

(In reply to Hans de Goede from comment #79)
> (In reply to Joakim R from comment #78)
> > I have been tackling this same problem on my Thinkpad E595 for at least a
> > year, trying a plethora of boot options. The noapic setting worked well for
> > some kernel versions, but lately it has messed with the ability to resume
> > after suspension. The irqpoll option gives a constant 100% load on one of
> > the CPU threads.
> >
> > However, when using fwupd I noticed that "capsule updates" were disabled,
> > which in the BIOS settings corresponds to allowing "Windows UEFI update".
> > After enabling that setting, my IRQ 7 problem has disappeared.
> >
> >
> > Hopefully this is of use to someone else.
>
> Interesting, did you perhaps also boot a newer kernel at the same time?
> Either a newer "z" release (as in kernel version expresses as x.y.z) or
> maybe a 5.11-rc kernel? There have been some recent AMD GPIO interrupt
> handling changes which might be related.
>
> If you did also boot a new kernel and have an older kernel still installed
> it would be interesting to know if the problem is also gone with the older
> kernel.

I have been testing this with kernel 5.9.16 and 5.10.6. Initially I started to fool around with all this again because suspend wasn't working properly after going to 5.10. I have currently booted from kernel 5.10.6 and can verify that suspend is now working as it should, as well as no "irq 7 nobody cared" error.

(In reply to Joakim R from comment #81)
> (In reply to Hans de Goede from comment #79)
> > (In reply to Joakim R from comment #78)
> > > I have been tackling this same problem on my Thinkpad E595 for at least a
> > > year, trying a plethora of boot options. The noapic setting worked well
> for
> > > some kernel versions, but lately it has messed with the ability to resume
> > > after suspension. The irqpoll option gives a constant 100% load on one of
> > > the CPU threads.
> > >
> > > However, when using fwupd I noticed that "capsule updates" were disabled,
> > > which in the BIOS settings corresponds to allowing "Windows UEFI update".
> > > After enabling that setting, my IRQ 7 problem has disappeared.
> > >
> > >
> > > Hopefully this is of use to someone else.
> >
> > Interesting, did you perhaps also boot a newer kernel at the same time?
> > Either a newer "z" release (as in kernel version expresses as x.y.z) or
> > maybe a 5.11-rc kernel? There have been some recent AMD GPIO interrupt
> > handling changes which might be related.
> >
> > If you did also boot a new kernel and have an older kernel still installed
> > it would be interesting to know if the problem is also gone with the older
> > kernel.
>
> I have been testing this with kernel 5.9.16 and 5.10.6. Initially I started
> to fool around with all this again because suspend wasn't working properly
> after going to 5.10. I have currently booted from kernel 5.10.6 and can
> verify that suspend is now working as it should, as well as no "irq 7 nobody
> cared" error.

It seems like I may have spoken too soon. The suspend error is still there on kernel 5.10, so it seems to be intermittent. Also, dmesg still shows the irq 7 error, but abrt no longer reports on it, leading me to believe it was gone. So I guess this wasn't very helpful after all.

The same IRQ 7 error still shows up on my ThinkPad X395, even with the latest 5.10 kernel and BIOS updates.

My brand new AMD 4700G APU with integrated Radeon GPU.
[ 35.466776] CPU: 3 PID: 0 Comm: swapper/3 Tainted: P O 5.12.5-pclos1 #1
[ 35.466778] Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.03 09/23/2020
[ 35.466779] Call Trace:
[ 35.466781] <IRQ>
[ 35.466782] dump_stack+0x64/0x7c
[ 35.466787] __report_bad_irq+0x35/0xaa
[ 35.466789] note_interrupt.cold+0xb/0x64
[ 35.466791] handle_irq_event+0xa0/0xb0
[ 35.466793] handle_fasteoi_irq+0x7f/0x1d0
[ 35.466795] __common_interrupt+0x3e/0xa0
[ 35.466797] common_interrupt+0x7e/0xa0
[ 35.466800] </IRQ>
[ 35.466800] asm_common_interrupt+0x1e/0x40
[ 35.466803] RIP: 0010:cpuidle_reflect+0x10/0x20
[ 35.466805] Code: fc ff ff 48 c7 43 08 00 00 00 00 5b 5d 41 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 05 ac a7 40 01 48 8b 40 40 <48> 85 c0 74 09 85 f6 78 05 e9 e2 4b 46 00 c3 90 0f 1f 44 00 00 48
[ 35.466807] RSP: 0018:ffffa6bcc017ff00 EFLAGS: 00000292
[ 35.466808] RAX: ffffffffae79ef40 RBX: 0000000000000003 RCX: 0000000000000002
[ 35.466809] RDX: ffff8ebc831a00c0 RSI: 0000000000000003 RDI: ffff8ebc831a0000
[ 35.466810] RBP: ffff8ebc80893c80 R08: 0000000841f946f4 R09: 0000000000000008
[ 35.466811] R10: 0000000000000003 R11: 0000000000000002 R12: ffffffffaf53c460
[ 35.466811] R13: ffff8ebc831a0000 R14: 0000000000000003 R15: 0000000000000000
[ 35.466812] ? ladder_select_state+0x1a0/0x1a0
[ 35.466814] do_idle+0x1ed/0x290
[ 35.466817] cpu_startup_entry+0x19/0x20
[ 35.466818] secondary_startup_64_no_verify+0xb0/0xbb
[ 35.466820] handlers:
[ 35.466821] [<00000000466e3e82>] amd_gpio_irq_handler
[ 35.466824] Disabling IRQ #7

On Wed, May 19 2021 at 21:15, bugzilla-daemon wrote:
> --- Comment #84 from Bill Reyolds (texstar@gmail.com) ---
> My brand new AMD 4700G GPU with included Radeon GPU.
>
> [ 35.466820] handlers:
> [ 35.466821] [<00000000466e3e82>] amd_gpio_irq_handler
> [ 35.466824] Disabling IRQ #7

Can you please add 'apic=verbose' to the kernel command line and provide the full output of dmesg?

Thanks,

tglx

Created attachment 296887 [details]
dmesg with apic=verbose kernel 5.12.5
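Several suggestions in this thread are boot options (irqpoll, noapic, apic=verbose). Before attaching a dmesg, it can be worth confirming what the running kernel actually booted with, which Linux exposes in /proc/cmdline. A minimal Python sketch; it splits on whitespace only and does not handle quoted values, and `parse_cmdline` is a made-up name.

```python
from typing import Dict, Union


def parse_cmdline(cmdline: str) -> Dict[str, Union[str, bool]]:
    """Map each boot option to its value, or to True for bare flags such as 'irqpoll'."""
    options: Dict[str, Union[str, bool]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # only the first '=' separates key and value
        options[key] = value if sep else True
    return options
```

For example, `parse_cmdline(open("/proc/cmdline").read()).get("apic") == "verbose"` tells you whether the apic=verbose request actually made it onto the command line.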
Created attachment 296959 [details]
dmesg apic=verbose, ryzen 3500u, thinkpad e495
I'd like to share mine too. Hope it's useful.
Kernel 5.12.8. I doubt I will have much success getting HP to update their locked-down BIOS to fix this.

[ 35.884729] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 35.884734] CPU: 3 PID: 0 Comm: swapper/3 Tainted: P S O 5.12.8-pclos1 #1
[ 35.884736] Hardware name: HP HP Desktop M01-F1xxx/87D6, BIOS F.03 09/23/2020
[ 35.884737] Call Trace:
[ 35.884739] <IRQ>
[ 35.884741] dump_stack+0x64/0x7c
[ 35.884745] __report_bad_irq+0x35/0xaa
[ 35.884747] note_interrupt.cold+0xb/0x64
[ 35.884749] handle_irq_event+0xa0/0xb0
[ 35.884752] handle_fasteoi_irq+0x7f/0x1d0
[ 35.884754] __common_interrupt+0x3e/0xa0
[ 35.884756] common_interrupt+0x7e/0xa0
[ 35.884758] </IRQ>
[ 35.884759] asm_common_interrupt+0x1e/0x40
[ 35.884761] RIP: 0010:do_idle+0x65/0x290
[ 35.884764] Code: 48 8b 45 00 89 db a8 08 74 28 e9 e9 00 00 00 e8 c1 24 06 00 e8 9c c6 91 00 e8 67 ff ff ff 65 48 8b 04 25 00 6d 01 00 48 8b 00 <a8> 08 0f 85 c6 00 00 00 0f ae e8 fa 48 0f a3 1d 67 23 51 01 0f 83
[ 35.884765] RSP: 0018:ffffab1c8017ff08 EFLAGS: 00000202
[ 35.884767] RAX: 0000000000204000 RBX: 0000000000000003 RCX: 0000000000000002
[ 35.884768] RDX: 0000000000000007 RSI: 0000000000000003 RDI: ffff9c1a438bdc00
[ 35.884768] RBP: ffff9c1a40895ac0 R08: 0000000000000000 R09: 0000000000000008
[ 35.884769] R10: 0000000000000003 R11: 0000000000000003 R12: ffffffffa553c480
[ 35.884770] R13: ffff9c1a438bdc00 R14: 0000000000000003 R15: 0000000000000000
[ 35.884771] ? do_idle+0x59/0x290
[ 35.884773] cpu_startup_entry+0x19/0x20
[ 35.884775] secondary_startup_64_no_verify+0xb0/0xbb
[ 35.884777] handlers:
[ 35.884777] [<00000000bbf5ab51>] amd_gpio_irq_handler
[ 35.884781] Disabling IRQ #7

I don't know if this is actually helpful to the situation, but I noticed recently that the DSDT for SMB00001 includes IRQ7. IOW, commit 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa may be part of the reason that IRQ7 isn't serviced by anything. It might be useful, however, as a data point if someone involved here could check whether reverting it helps.
As there is a lot of history behind that, though, I don't think it can simply be reverted without causing problems for a number of older machines.

Another +1 from me. Linux 5.13.12-200.fc34.x86_64 on a Lenovo E595 (Ryzen 3500U).

[ 1.830326] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 1.830328] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.13.12-200.fc34.x86_64 #1
[ 1.830330] Hardware name: LENOVO 20NFCTO1WW/20NFCTO1WW, BIOS R11ET40W (1.20 ) 11/17/2020
[ 1.830331] Call Trace:
[ 1.830334] <IRQ>
[ 1.830336] dump_stack+0x76/0x94
[ 1.830341] __report_bad_irq+0x35/0xa7
[ 1.830344] note_interrupt.cold+0xb/0x61
[ 1.830346] handle_irq_event+0x88/0x90
[ 1.830350] handle_fasteoi_irq+0x78/0x1c0
[ 1.830352] __common_interrupt+0x3e/0xa0
[ 1.830355] common_interrupt+0x7e/0xa0
[ 1.830359] </IRQ>
[ 1.830359] asm_common_interrupt+0x1e/0x40
[ 1.830363] RIP: 0010:note_page+0x66/0x660
[ 1.830366] Code: 85 c9 74 08 48 63 c2 4c 8b 64 c7 30 8b 43 18 48 8b 53 20 48 8b 4b 28 83 f8 ff 0f 84 a4 02 00 00 48 39 d5 0f 95 c2 3b 44 24 04 <0f> 95 c0 08 c2 75 09 49 39 cc 0f 84 65 03 00 00 80 7b 71 00 74 2a
[ 1.830367] RSP: 0018:ffffb21d00067cb8 EFLAGS: 00000246
[ 1.830369] RAX: 0000000000000004 RBX: ffffb21d00067ea8 RCX: 0000000000000000
[ 1.830371] RDX: 0000000000000001 RSI: ffffff64261a8000 RDI: ffffb21d00067ea8
[ 1.830372] RBP: 8000000000000161 R08: ffffffff8407dda0 R09: 0000000000000000
[ 1.830372] R10: ffff880000000000 R11: ffffffff85d468c8 R12: 8000000000000000
[ 1.830373] R13: ffffff64261a8000 R14: ffffff64261a8000 R15: 0000000000000000
[ 1.830374] ? ptdump_walk_pgd_level_debugfs+0x40/0x40
[ 1.830377] ? hugetlb_get_unmapped_area+0x300/0x300
[ 1.830378] ptdump_pte_entry+0x57/0x60
[ 1.830382] __walk_page_range+0xb85/0xc70
[ 1.830386] walk_page_range_novma+0x57/0x70
[ 1.830388] ptdump_walk_pgd+0x48/0xb0
[ 1.830390] ptdump_walk_pgd_level_core+0xb3/0xd0
[ 1.830391] ? ptdump_walk_pgd_level_debugfs+0x40/0x40
[ 1.830392] ? hugetlb_get_unmapped_area+0x300/0x300
[ 1.830394] ? rest_init+0xb4/0xb4
[ 1.830395] ? rest_init+0xb4/0xb4
[ 1.830397] kernel_init+0x36/0x11c
[ 1.830398] ret_from_fork+0x22/0x30
[ 1.830402] handlers:
[ 1.830402] [<00000000e88fc910>] amd_gpio_irq_handler
[ 1.830406] Disabling IRQ #7

I've done some testing with respect to commit 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa, and can confirm that reverting it fixes the "irq 7: nobody cared" message for me. I compared the current Fedora 34 kernel (v5.13.14) against a build with the commit reverted.

If others would like to test, my kernel build is available here (use at your own risk): http://static.pauldoo.com/kernel/kernel-5.13.14-0.paultest1.fc34.x86_64.tar.gz

After superficial testing, the laptop (Lenovo E595) works normally with this commit reverted and there are no new errors appearing in the dmesg output.

Created attachment 298741 [details]
Attempt to work between a rock and a hard place

Paul, thanks for trying that. Looking at this some today, the workaround from commit 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa to fix the touchscreen on the BIOS using both legacy IRQ and extended IRQ notation has non-obvious side effects. In addition to the "IRQ7 nobody cared" message, I would suspect this is the reason that touchpads that support SMBus aren't binding on a number of laptops.

Reverting it will cause the breakage from https://bugzilla.kernel.org/show_bug.cgi?id=198715 to return, but leaving it in place causes this bug. As we're stuck with a very subtle workaround for a very old BIOS from over 13 years ago, I have a thought on how we can avoid it.

Can some folks affected by this please try the attached patch? If this helps, I'll send it out to the mailing lists for further feedback.

I built a kernel using attachment 298741 [details] from comment 92, and unfortunately the "irq 7: nobody cared" message is present again. Here is my build should anyone want to test: http://static.pauldoo.com/kernel/kernel-5.13.14-0.paultest2.fc34.x86_64.tar.gz

That's... surprising, considering the patch has the revert of 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa, which worked for you. Are you sure it helped, and are you sure you ran the right test kernel? I did see the patch included in your source rpm (within patch-5.13-redhat.patch). Assuming it's all built up right and you tested the right thing, can you please share your dmesg output with that patch in place and an acpidump?

I am new to the workflows of building kernels, so if the result is suspicious it’s very likely I messed up. I’ll do a rebuild and update again in a day or two.

Created attachment 298745 [details]
dmesg after paultest1 build
Hello, I tried the 5.13.14-0.paultest1.fc34.x86_64 RPM packages, and there is no more irq 7 message.
ThinkPad E495, Ryzen 3500U
I also installed the paultest2 rpm packages but they have the irq7 nobody cared message.
Now, just to be clear: I don't have a touchscreen on my ThinkPad ;)
Unrelated: hopefully Lenovo soon fixes the HPET issue with our CPUs getting stuck, and patches the AMD memory clock being stuck at 100%.
@Neil,
I see this in your dmesg:
>[ 0.406508] ACPI: IRQ 7 override to edge, high
which is what I expect from the first test (revert 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa)
Can you please share your dmesg from the second test kernel and an acpidump?
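Mario tells the test kernels apart by the "ACPI: IRQ 7 override to edge, high" line, and the Fedora `dmesg | grep irq` output earlier in the thread shows the firmware's own MADT overrides as INT_SRC_OVR lines. A Python sketch that extracts both kinds of line from dmesg text so two test kernels can be compared, assuming the message formats exactly as they appear in this thread (polarity before trigger in INT_SRC_OVR, trigger before polarity in the override line). The function names are hypothetical. As I understand the ACPI convention, an ISA IRQ with no INT_SRC_OVR entry (IRQ 7 in that Fedora output) is expected to stay edge-triggered, active-high.

```python
import re
from typing import Dict, Optional, Tuple

# Firmware (MADT) overrides, e.g. "INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)".
MADT_OVR = re.compile(r"INT_SRC_OVR \(bus \d+ bus_irq (\d+) global_irq (\d+) (\w+) (\w+)\)")
# Kernel-side overrides, e.g. "ACPI: IRQ 7 override to edge, high".
KERNEL_OVR = re.compile(r"ACPI: IRQ (\d+) override to (edge|level), (high|low)")


def madt_overrides(dmesg: str) -> Dict[int, Tuple[int, str, str]]:
    """bus_irq -> (global_irq, polarity, trigger)."""
    return {int(irq): (int(gsi), pol, trig)
            for irq, gsi, pol, trig in MADT_OVR.findall(dmesg)}


def kernel_overrides(dmesg: str) -> Dict[int, Tuple[str, str]]:
    """irq -> (trigger, polarity)."""
    return {int(irq): (trig, pol) for irq, trig, pol in KERNEL_OVR.findall(dmesg)}


def diff_kernel_overrides(before: str, after: str) -> Dict[int, Tuple[Optional[tuple], Optional[tuple]]]:
    """IRQs whose kernel-side override differs between two dmesg logs."""
    a, b = kernel_overrides(before), kernel_overrides(after)
    return {irq: (a.get(irq), b.get(irq))
            for irq in sorted(a.keys() | b.keys()) if a.get(irq) != b.get(irq)}
```

Running `diff_kernel_overrides` on the paultest1 and paultest2 dmesg attachments would show at a glance whether the IRQ 7 override line is present in one build and missing from the other.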
Created attachment 298747 [details]
dmesg paultest2 build
Created attachment 298749 [details]
acpidump kernel paultest2
There you go, @Mario.
Later, if needed, I can share the acpidump from the first build.
https://bugzilla.kernel.org/show_bug.cgi?id=213031 is actually pretty much the same problem, with the legacy IRQ getting set up wrong. However, the commit from that bug was reverted because it caused regressions. So I suspect that reverting 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa and applying 0ec4e55e9f571f08970ed115ec0addc691eda613 on top of master again would have helped this bug as well.

(In reply to Neil from comment #96)
>
> NOW. Just to be clear I don't have a touchscreen on my thnkpad ;)
>

Fixed the title ;)

@Mario,

Since the build for "paultest2" was suspicious, I created another build, "paultest3", using the same diff from attachment 298741 [details] in comment 92. I used a clean checkout of the code to rule out the possibility of me not rebuilding correctly last time.

The result is that the message "irq 7: nobody cared" is present, which is the same as with "paultest2". I'll attach the dmesg and acpidump outputs (note to Fedora folks: this tool is found in the "acpica-tools" package).

Here is a link to the build should anyone wish to test: http://static.pauldoo.com/kernel/kernel-5.13.14-0.paultest3.fc34.x86_64.tar.gz (it should give identical results to paultest2)

Created attachment 298755 [details]
Lenovo E595 dmesg output using paultest3 kernel
Created attachment 298757 [details]
Lenovo E595 acpidump output using paultest3 kernel
Created attachment 298761 [details]
dmesg from paultest3 build
Tried paultest3 build.
Created attachment 298763 [details]
acpidump from paultest3
I'm just wondering, is there another patch I can test?

Does the failure happen every time for you, or just on some boots? Can you please double-check your 'paultest1' kernel again? I'm still perplexed why reverting 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa helped but a commit with 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa plus ignoring the "bad" codepath for legacy devices didn't.

With an unmodified Fedora 34 kernel, the warning occurs on every boot. I'll do another build with only 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa reverted and get back to you.

Another build to report: the F34 kernel (v5.13.16) with only 2bbb5fa37475d7aa5fa62f34db1623f3da2dfdfa reverted, "paultest4": http://static.pauldoo.com/kernel/kernel-5.13.16-0.paultest4.fc34.x86_64.tar.gz

I can confirm that this eliminates the "irq 7: nobody cared" message. I'll attach fresh dmesg and acpidump output from this build. This confirms what we saw with "paultest1".

Created attachment 298953 [details]
Lenovo E595 dmesg output using paultest4 kernel
Created attachment 298955 [details]
Lenovo E595 acpidump output using paultest4 kernel
Created attachment 299015 [details]
dmesg from paultest4
Howdy
Created attachment 299017 [details]
acpidump from paultest4
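The paultest3 vs. paultest4 comparison above comes down to three things in a boot's kernel log: whether the "irq N: nobody cared" line appears, whether the IRQ is then disabled, and which handler was registered on that line. The sketch below uses a hypothetical helper, `spurious_irq_report`, that pulls those out of a saved `dmesg` or `journalctl -k` dump. It is a minimal illustration that assumes the message formats quoted in this thread, not a tool used by anyone here.

```python
import re

# Lines the kernel prints when it gives up on an interrupt line, as quoted in this thread:
#   irq 7: nobody cared (try booting with the "irqpoll" option)
#   Disabling IRQ #7
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")
# Handler entries follow a "handlers:" line, e.g.
#   [<00000000e6019074>] amd_gpio_irq_handler [pinctrl_amd]
HANDLER = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)")


def spurious_irq_report(log: str) -> dict:
    """Summarize 'nobody cared' events found in a dmesg / journalctl -k dump."""
    report = {"nobody_cared": [], "disabled": [], "handlers": []}
    in_handlers = False
    for line in log.splitlines():
        if m := NOBODY_CARED.search(line):
            report["nobody_cared"].append(int(m.group(1)))
        if m := DISABLING.search(line):
            # "Disabling IRQ #N" closes the handler list.
            report["disabled"].append(int(m.group(1)))
            in_handlers = False
        elif line.rstrip().endswith("handlers:"):
            in_handlers = True
        elif in_handlers and (m := HANDLER.search(line)):
            report["handlers"].append(m.group(1))
    return report
```

A log from a build like paultest3 would give `[7]` for both IRQ lists and `["amd_gpio_irq_handler"]` for the handlers. A log from a build like paultest4 would give empty lists.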
Hi,

For me the issue is gone.

Machine: Lenovo ThinkPad E495, model 20NECTO1WW, ThinkPad BIOS R11ET44W (1.24 ), EC R11HT44W
OS: DISTRIB_ID=Ubuntu DISTRIB_RELEASE=21.10 DISTRIB_CODENAME=impish
Kernel: Linux e495 5.13.0-35-generic #40-Ubuntu SMP Mon Mar 7 08:03:10 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Hi,

For me the issue is gone too.

HP HP Desktop M01-F1xxx/87D6, BIOS F.13 03/29/2021
EFI v2.70 by American Megatrends
Linux version 5.16.13-pclos1
OS: PCLinuxOS 2022

For me, the issue is still present.

[ 1.573670] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 1.573694] CPU: 3 PID: 115 Comm: modprobe Not tainted 5.16.0-4-amd64 #1 Debian 5.16.12-1
[ 1.573697] Hardware name: LENOVO 20NE001QRT/20NE001QRT, BIOS R11ET36W (1.16 ) 03/30/2020
[ 1.573698] Call Trace:
...
[ 1.573745] handlers:
[ 1.573755] [<00000000ba607cbb>] amd_gpio_irq_handler
[ 1.573772] Disabling IRQ #7

I am using Debian GNU/Linux bookworm/sid, with kernel 5.16.12-1.

Forgot to mention that I am using a Lenovo ThinkPad E495, model 20NE001QRT.

The bug is gone for me after updating the BIOS from 1.16 to 1.24.

Fixed for me too on my Lenovo E595, after updating the BIOS from 1.21 to 1.24.

BIOS v1.21:
```
> uname -a ;and dmesg | grep -P -i 'nobody cared|Disabling IRQ|DMI:'
Linux len 5.16.14-200.fc35.x86_64 #1 SMP PREEMPT Fri Mar 11 20:31:18 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
[ 0.000000] DMI: LENOVO 20NFCTO1WW/20NFCTO1WW, BIOS R11ET41W (1.21 ) 06/07/2021
[ 1.463917] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 1.464069] Disabling IRQ #7
```

BIOS v1.24:
```
> uname -a ;and dmesg | grep -P -i 'nobody cared|Disabling IRQ|DMI:'
Linux len 5.16.14-200.fc35.x86_64 #1 SMP PREEMPT Fri Mar 11 20:31:18 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
[ 0.000000] DMI: LENOVO 20NFCTO1WW/20NFCTO1WW, BIOS R11ET44W (1.24 ) 01/26/2022
```

I'm still facing this issue with both kernel 5.17.1-arch1-1 and kernel 5.15.32-1-lts on a T495 with an AMD Ryzen 7 PRO 3700U.
irq 7: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Tainted: P OE 5.15.32-1-lts #1 bb8765a1c0d>
Hardware name: LENOVO 20NKS28F00/20NKS28F00, BIOS R12ET55W(1.25 ) 07/06/2020
Call Trace:
<IRQ>
dump_stack_lvl+0x46/0x5a
__report_bad_irq+0x35/0xaa
note_interrupt.cold+0xb/0x64
handle_irq_event+0xab/0xc0
handle_fasteoi_irq+0x8a/0x1f0
__common_interrupt+0x41/0xa0
common_interrupt+0x7b/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x1e/0x40
RIP: 0010:native_sched_clock+0x34/0x70
Code: c2 65 44 8b 05 0d c4 3f 46 44 89 c0 83 e0 01 48 c1 e0 04 48 8d 88 40 0a 03 00 65>
RSP: 0018:ffffffffbb803df8 EFLAGS: 00000256
RAX: 0000000000000000 RBX: ffffffffbb81a940 RCX: 000000000000001f
RDX: 000000020fb62397 RSI: 0000000037c1aab9 RDI: 0000000fe790ff1c
RBP: ffff955a435bb000 R08: 0000000000000002 R09: 0000000000000007
R10: 0000000000000001 R11: 0000000000000000 R12: 00000010cd4f834a
R13: 0000000000000000 R14: 0000000000098968 R15: 0000000000000000
sched_clock_cpu+0x9/0xa0
poll_idle+0xa5/0xb3
cpuidle_enter_state+0x89/0x350
cpuidle_enter+0x29/0x40
do_idle+0x1e1/0x270
cpu_startup_entry+0x19/0x20
start_kernel+0x9bb/0x9e2
secondary_startup_64_no_verify+0xc2/0xcb
</TASK>
handlers:
[<000000009e238fe9>] amd_gpio_irq_handler [pinctrl_amd]
Disabling IRQ #7

Does anyone have an idea how this issue could be investigated further? I've read many of the comments here, and it seemed that the issue was eventually fixed, but I don't know how it was fixed or why I'm still facing it.

Created attachment 300742 [details]
attachment-2695-0.html

The problem was fixed with a BIOS update. Have you checked whether one is available for your machine?

On Fri, 8 Apr 2022, 3:49 p.m., <bugzilla-daemon@kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201817
>
> --- Comment #121 from Lahfa Samy (samy@lahfa.xyz) ---
> I'm still facing this issue on both kernel 5.17.1-arch1-1 and 5.15.32-1-lts on
> a T495 with an AMD Ryzen 7 PRO 3700U.

I'm on the third-newest BIOS from Lenovo. I wasn't keen on upgrading because the second-newest BIOS update restricts how much memory I can allocate to the iGPU. This is the relevant changelog item: "Remove UMA buffer size item 128MB/256MB/512MB on BIOS setup."
So I became somewhat reluctant to upgrade, even though I know it's not good to skip BIOS updates, especially considering how rare they are. Here are the changelogs of the last two updates:

<1.28>
- [Important] Update Phoenix Security Issue.
- [Important] Sync system base board version to SMBIOS type2.
- [Important] Update CopyRight to 2021.
- (Fix) Fixed an issue that BTS SPI protection fail.

<1.27>
- [Important] Remove UMA buffer size item 128MB/256MB/512MB on BIOS setup.
- [Important] Remove no BIOS setup item into WMI list.
- [Important] Modify Strange diagnostics error.
- [New] Added Support SMBios release to support country code.
- [New] Added back flash prevention for Type11 Country code.
- [New] Added support for Nuvoton TPM firmware update function.
- [New] Added (LEN-27581) Security:changed SmmOEMInt15 to return an error after ExitBootServices.Removed USB-API function.
- (Fix) Fixed an issue that Fn+Tab have no function.
- (Fix) Fixed an issue that it can't be waked by WOL in S4/S5 from TBT TR dock(TBT work station).
- (Fix) Fixed an issue that battery icon in System task tray shows yellow mark when plug out 65W/90W AC adapter then plug in 135W AC adapter.

Is it the "Remove no BIOS setup item into WMI list" item that fixes this bug? Or the (Fix) in 1.28? Or neither of these?

Finally, do you have a T495, or has anyone with a ThinkPad T495 tested that these BIOS updates actually fix the issue? Or is it only assumed that they do? I could always update and roll back, but I'm not a fan of flashing low-level firmware: if any step fails, I'm in for a rollercoaster.
There is one comment (#50) saying that updating the BIOS to 1.19 on a T495 got rid of this issue. So I don't know whether a regression was introduced by another BIOS upgrade or by a kernel upgrade, or whether the issue lies elsewhere: https://bugzilla.kernel.org/show_bug.cgi?id=201817#c50

A very similar issue has popped up (https://bugzilla.kernel.org/show_bug.cgi?id=216230) with an interesting finding: when pinctrl-amd loads late, some IRQs are not getting serviced. Moving it into the initramfs appears to help in that case.

Anyone who is still experiencing this issue, can you please do the following:
1) First reproduce it on the latest 5.18.y (or 5.19-rc).
2) Change your kernel config so that CONFIG_PINCTRL_AMD is built in.
3) See if you can still reproduce it.

If you can still reproduce it, please turn on dynamic debugging for pinctrl-amd (dyndbg="module pinctrl_amd +p" on the kernel command line) and then share another updated dmesg.

A somewhat similar issue has popped up again with the Arch Linux kernel 6.0.1-arch2-1. It hadn't been showing up for a while, though I don't know exactly when it stopped on 5.19.x. According to journalctl, the issue occurred on:
5.16.2-arch1-1, 5.16.8-arch1-1, 5.16.10-arch1-1, 5.16.11-arch1-1, 5.16.15-arch1-1, 5.15.32-lts-1, 5.15.35-lts-1, 5.15.55-lts-1, 5.15.69-lts, 5.17.1-arch1-1, 5.17.5-arch1-1, 5.17.9-arch1-1, 5.18.7-arch1-1, 5.19.2-arch1-1, 5.19.4-arch1-1, 5.19.7-arch1-1, 5.19.10-arch1-1, and finally, just now, 6.0.1-arch2-1.
On the latest kernel, 6.0.1-arch2-1:

kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: P OE 6.0.1-arch2-1 #1 ca4c4b1e174d24f1d562eb4d3f9de5bf01e98574
kernel: Hardware name: LENOVO 20NKS28F00/20NKS28F00, BIOS R12ET55W(1.25 ) 07/06/2020
kernel: Call Trace:
kernel: <IRQ>
kernel: dump_stack_lvl+0x48/0x60
kernel: __report_bad_irq+0x35/0xaa
kernel: note_interrupt.cold+0xa/0x65
kernel: handle_irq_event+0x75/0x80
kernel: handle_fasteoi_irq+0x8e/0x1f0
kernel: __common_interrupt+0x46/0xa0
kernel: common_interrupt+0x43/0xa0
kernel: asm_common_interrupt+0x26/0x40
kernel: RIP: 0010:__do_softirq+0x7c/0x2ca
kernel: Code: 14 81 67 2c ff f7 ff ff be 00 01 00 00 e8 4c c3 2f ff c7 44 24 10 0a 00 00 00 65 66 c7 05 ca 24 e3 4c 00 00 fb 0f 1f 44 00 00 >
kernel: RSP: 0018:ffffb60bc0003f90 EFLAGS: 00000246
kernel: RAX: 0000000000000000 RBX: ffffffffb4003de8 RCX: 00000001000f634f
kernel: RDX: 0000000000000001 RSI: 0000000000000100 RDI: ffffffffb401a9c0
kernel: RBP: 0000000000000043 R08: 000000023ece886f R09: 8996bb55954eaf17
kernel: R10: 0000000000000040 R11: ffff933375ca7300 R12: 0000000000000001
kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000040
kernel: ? handle_edge_irq+0x9a/0x260
kernel: __irq_exit_rcu+0xb7/0xe0
kernel: common_interrupt+0x86/0xa0
kernel: </IRQ>
kernel: <TASK>
kernel: asm_common_interrupt+0x26/0x40
kernel: RIP: 0010:tick_nohz_idle_enter+0x45/0x50
kernel: Code: 81 35 ab 4d 48 83 bb b0 00 00 00 00 75 22 80 4b 4c 01 e8 5e f0 fe ff 80 4b 4c 04 48 89 43 78 e8 71 cf f9 ff fb 0f 1f 44 00 00 >
kernel: RSP: 0018:ffffffffb4003e90 EFLAGS: 00000282
kernel: RAX: 000003548cd93c16 RBX: ffff9335f0a24800 RCX: 000000000000260a
kernel: RDX: 000003548cd93c16 RSI: 000003548cd93c16 RDI: 000003548cd93c16
kernel: RBP: 0000000000000000 R08: ffffffffffcdab20 R09: 0000000037c1c8f8
kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9335feff2001
kernel: R13: 0000000000000000 R14: ffffffffb401a120 R15: 00000000bdf51000
kernel: do_idle+0x42/0x270
kernel: cpu_startup_entry+0x1d/0x20
kernel: rest_init+0xc8/0xd0
kernel: arch_call_rest_init+0xe/0x1c
kernel: start_kernel+0x97a/0x9a3
kernel: secondary_startup_64_no_verify+0xe5/0xeb
kernel: </TASK>
kernel: handlers:
kernel: [<00000000d191bbef>] amd_gpio_irq_handler
kernel: Disabling IRQ #7

I will enable dynamic debugging for pinctrl-amd and try to trigger the exact same issue reliably again. There are two different oopses that seem to be triggered randomly; see the file attached after this comment. Then I'll share another dmesg, in the hope that a fix or patch can come of it.

Created attachment 303006 [details]
multiple oops for 5.16.11-arch1-1,5.16.14-arch1-1,5.15.28-1-lts,6.0.1-arch2-1
All oops traces were captured on a ThinkPad T495 (Ryzen 7 3700U with Vega RX 10), BIOS R12ET55W(1.25 ).
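Steps 2 and 3 of the reproduction request earlier in the thread require two things: CONFIG_PINCTRL_AMD must be built in, and `dyndbg="module pinctrl_amd +p"` must be on the kernel command line. Both can be checked before capturing a new dmesg. The sketch below is minimal and makes two assumptions. First, the running kernel exposes its config as `/proc/config.gz`, which requires CONFIG_IKCONFIG_PROC. Second, the command line is read from `/proc/cmdline`. All three function names are hypothetical.

```python
import gzip
from pathlib import Path


def read_running_config(path: str = "/proc/config.gz") -> str:
    """Decompress the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    return gzip.decompress(Path(path).read_bytes()).decode()


def pinctrl_amd_builtin(config_text: str) -> bool:
    """True only if pinctrl-amd is built in (=y); a module build (=m) gives False."""
    return any(line.strip() == "CONFIG_PINCTRL_AMD=y" for line in config_text.splitlines())


def pinctrl_amd_dyndbg(cmdline: str) -> bool:
    """True if the command line carries the dyndbg query requested in this thread.

    Double quotes are dropped first, so both dyndbg="module pinctrl_amd +p" and
    "dyndbg=module pinctrl_amd +p" are accepted.
    """
    return "dyndbg=module pinctrl_amd +p" in cmdline.replace('"', "")
```

On a live system, the inputs would be `read_running_config()` and `Path("/proc/cmdline").read_text()`. Note that piping the compressed `/proc/config.gz` straight into a text search without decompressing it will not find the option.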
Created attachment 303007 [details]
1-dmesg-oops-T495-Ryzen7PRO3700U-6.0.1-arch2-1 with dyndbg="module pinctrl_amd +p"

T495, Ryzen 7 PRO 3700U with Vega RX10, 6.0.1-arch2-1

cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-linux zfs=zroot-ext/djqdje_arch/ROOT/default rw radeon.si_support=0 amdgpu.si_support=1 radeon.cik_support=0 amdgpu.cik_support=1 loglevel=3 quiet "dyndbg=module pinctrl_amd +p"

zcat /proc/config.gz | rg PINCTRL_AMD
> CONFIG_PINCTRL_AMD=y

Created attachment 303008 [details]
2-dmesg-oops-T495-Ryzen7PRO3700U-6.0.1-arch2-1 with dyndbg="module pinctrl_amd +p"
Same settings as 1-dmesg, but the oops trace is longer; I compared the two with meld.
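The two oops traces above were compared by hand in meld. A scripted alternative can help when comparing many traces across kernels. It keeps only the sequence of function names and strips timestamps, `kernel:` prefixes and `+0x7c/0x2ca`-style offsets, so that two traces differ only where their call chains do. This is a minimal sketch using Python's standard `difflib`. The helper names are hypothetical, and the frame pattern is an assumption based on the trace format shown in this thread.

```python
import difflib
import re

# One frame, e.g. "note_interrupt.cold+0xa/0x65" or "? handle_edge_irq+0x9a/0x260".
# RIP lines such as "RIP: 0010:__do_softirq+0x7c/0x2ca" also match and are kept.
FRAME = re.compile(r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def trace_frames(log: str) -> list[str]:
    """Function names of each frame, without timestamps, prefixes or offsets."""
    frames = []
    for line in log.splitlines():
        m = FRAME.search(line)
        if m:
            # Keep the "? " marker: the unwinder uses it for unreliable entries.
            frames.append(("? " if m.group(1) else "") + m.group(2))
    return frames


def diff_traces(log_a: str, log_b: str) -> list[str]:
    """Unified diff of two oops traces at function granularity (empty if identical)."""
    return list(
        difflib.unified_diff(
            trace_frames(log_a),
            trace_frames(log_b),
            fromfile="trace-a",
            tofile="trace-b",
            lineterm="",
        )
    )
```

With this approach, the 5.15 and 6.0 traces in this thread would show a common prefix (`dump_stack_lvl`, `__report_bad_irq`, `note_interrupt.cold`, ...). Differences would appear only where the interrupted context diverges.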
Is it possible that this only happens on warm boots, or does it happen on cold boots too? If it only happens on warm boots, maybe we are missing some cleanup of the GPIO controller on shutdown.
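Answering the warm-vs-cold question requires tallying the warning separately for each kind of boot. The kernel log does not reliably record whether a boot followed a reboot or a full power-off, so this sketch assumes the tester labels each boot by hand. Each boot's kernel log is assumed to be saved, for example with `journalctl -k -b -1` right after the following boot. The `Boot` record and `warning_counts_by_kind` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Boot:
    kind: str        # "warm" (reboot) or "cold" (from power-off); labelled by hand
    kernel_log: str  # saved kernel log for that boot, e.g. from `journalctl -k -b -1`


def warning_counts_by_kind(boots: list[Boot]) -> dict[str, tuple[int, int]]:
    """Map each boot kind to (boots showing 'nobody cared', total boots of that kind)."""
    hits, totals = Counter(), Counter()
    for boot in boots:
        totals[boot.kind] += 1
        if "nobody cared" in boot.kernel_log:
            hits[boot.kind] += 1
    return {kind: (hits[kind], totals[kind]) for kind in totals}
```

Consider a result such as `{"warm": (n, n), "cold": (0, m)}`, meaning every warm boot showed the warning and no cold boot did. That would point towards missing GPIO cleanup on shutdown. Similar counts for both kinds would argue against it.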