Bug 201817 - irq 7: nobody cared for HP laptop with touchscreen and AMD processor
Summary: irq 7: nobody cared for HP laptop with touchscreen and AMD processor
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on: 198715
Blocks:
Reported: 2018-11-29 18:40 UTC by Bram Coenen
Modified: 2019-02-21 10:24 UTC

See Also:
Kernel Version: 4.19.4
Tree: Mainline
Regression: No


Attachments
dmesg after boot, 4.19.4, no touchscreen (75.64 KB, text/plain)
2018-11-29 18:40 UTC, Bram Coenen
dmesg after boot, 4.19.3, working touchscreen (73.19 KB, text/plain)
2018-11-29 18:51 UTC, Bram Coenen
cat /proc/interrupts (1.22 KB, text/plain)
2018-12-20 21:09 UTC, nospamming11+kernel
modified function (1.81 KB, text/plain)
2019-01-18 18:20 UTC, nospamming11+kernel
modified - dmesg - working (322.06 KB, text/plain)
2019-01-18 18:22 UTC, nospamming11+kernel
[RFC] pinctrl/amd: Clear interrupt enable bits on probe (10.20 KB, patch)
2019-02-19 13:24 UTC, Hans de Goede
dmesg with leonard patch and a non-working touchscreen (72.48 KB, text/plain)
2019-02-20 14:26 UTC, Bram Coenen
dmesg with leonard patch and a working touchscreen (72.85 KB, text/plain)
2019-02-20 16:41 UTC, Bram Coenen
dmesg_leonard_patch_no_touchscreen (77.45 KB, text/plain)
2019-02-21 05:05 UTC, Bram Coenen

Description Bram Coenen 2018-11-29 18:40:08 UTC
Created attachment 279741 [details]
dmesg after boot, 4.19.4, no touchscreen

The Elan touchscreen on HP laptops with an AMD processor was recently fixed and worked properly for a while in 4.19.3. However, after installing a new kernel version, 4.19.4, the touchscreen stopped working and new errors appeared.

I once got this in 4.19.3 as well, but after a few shutdowns, the last one by holding the power button, this was resolved. I did have issues logging in as well (Don't)

I'm on an HP ENVY x360 Convertible 15-bq0xx/8311, BIOS F.08, with only Fedora 28.

The ACPI config was fixed in https://bugzilla.kernel.org/show_bug.cgi?id=198715 .
Comment 1 Bram Coenen 2018-11-29 18:51:30 UTC
Created attachment 279743 [details]
dmesg after boot, 4.19.3, working touchscreen
Comment 2 Bram Coenen 2018-11-29 18:58:55 UTC
Just checked. The kernel version doesn't matter; a decent number of reboots does the trick to get the touchscreen working, even on 4.19.4.
Comment 3 Bram Coenen 2018-11-29 19:11:53 UTC
[   16.587361] irq 7: nobody cared (try booting with the "irqpoll" option)
[   16.587366] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        4.19.4-200.fc28.x86_64 #1
[   16.587367] Hardware name: HP HP ENVY x360 Convertible 15-bq0xx/8311, BIOS F.08 03/30/2018
[   16.587368] Call Trace:
[   16.587372]  <IRQ>
[   16.587381]  dump_stack+0x5c/0x80
[   16.587385]  __report_bad_irq+0x37/0xae
[   16.587388]  note_interrupt.cold.9+0xa/0x69
[   16.587390]  handle_irq_event_percpu+0x6a/0x80
[   16.587392]  handle_irq_event+0x27/0x44
[   16.587394]  handle_fasteoi_irq+0x7f/0x120
[   16.587398]  handle_irq+0xbf/0x100
[   16.587400]  do_IRQ+0x49/0xd0
[   16.587403]  common_interrupt+0xf/0xf
[   16.587405]  </IRQ>
[   16.587409] RIP: 0010:native_safe_halt+0x2/0x10
[   16.587411] Code: ff ff 7f c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c4 eb 8c 90 90 90 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
[   16.587412] RSP: 0018:ffffffffbd203e18 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd8
[   16.587415] RAX: 0000000080000000 RBX: ffff887df5a01c00 RCX: 0000000000000034
[   16.587416] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffbd2dd200 RDI: ffff887df5a01c64
[   16.587418] RBP: ffff887df5a01c64 R08: 0000000000000002 R09: 0000000000020800
[   16.587419] R10: 0000000f66cccf99 R11: ffff887df721fde8 R12: 0000000000000001
[   16.587420] R13: 0000000000000001 R14: 0000000000000001 R15: 00000000d0c2fae0
[   16.587424]  acpi_safe_halt+0x1b/0x30
[   16.587427]  acpi_idle_enter+0x104/0x2a0
[   16.587431]  cpuidle_enter_state+0x71/0x320
[   16.587435]  do_idle+0x226/0x260
[   16.587438]  cpu_startup_entry+0x6f/0x80
[   16.587442]  start_kernel+0x523/0x543
[   16.587446]  secondary_startup_64+0xa4/0xb0
[   16.587448] handlers:
[   16.587453] [<00000000e6019074>] amd_gpio_irq_handler [pinctrl_amd]
[   16.587455] Disabling IRQ #7
Comment 4 Hans de Goede 2018-11-30 09:45:15 UTC
I discussed this a bit on the mailing list. Here are the relevant parts of the discussion:

Me:

The amd_gpio chip/driver appears to be the only driver
connected to IRQ 7, so I think there is an issue with the
amd_gpio driver where it does not properly clear the interrupt
source. E.g. it might be that the BIOS requested interrupts
on a GPIO which Linux does not monitor and that the driver
does not disable this GPIO-IRQ on probe and since it is not
handling that pin in IRQ mode also does not clear it.

Anyways that is just a theory. It would greatly help if
someone who knows the amd_gpio driver better could take
a look.

Reply by Daniel Drake:

Sorry that I can't be much help here - I don't have access to any
useful info beyond the source code already present in Linux.

Maybe you could explore your theory by dumping the GPIO/GPIO-INT
enable regs, see if any of them are marked as enabled by something
other than Linux.

###

I'm afraid I don't have time to look into this myself atm. Maybe someone can add some printk calls to drivers/pinctrl/pinctrl-amd.c to dump relevant register values as Daniel suggested and see if that yields any useful info?
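A minimal sketch of how a register value dumped by such a printk could be decoded to see whether a pin has its interrupt enabled. The bit offsets are assumptions recalled from `drivers/pinctrl/pinctrl-amd.h` and should be verified against the kernel tree in use; the helper names are hypothetical.

```python
# Bit offsets within one 32-bit AMD GPIO pin register.
# ASSUMPTION: recalled from drivers/pinctrl/pinctrl-amd.h; verify against your tree.
INTERRUPT_ENABLE_OFF = 11
INTERRUPT_MASK_OFF = 12
INTERRUPT_STS_OFF = 28


def decode_pin_reg(value: int) -> dict:
    """Return the interrupt-related flags of one pin register value."""
    return {
        "irq_enabled": bool(value & (1 << INTERRUPT_ENABLE_OFF)),
        "irq_unmasked": bool(value & (1 << INTERRUPT_MASK_OFF)),
        "irq_pending": bool(value & (1 << INTERRUPT_STS_OFF)),
    }


def irq_armed(value: int) -> bool:
    """True if the pin's interrupt is enabled or unmasked in hardware."""
    flags = decode_pin_reg(value)
    return flags["irq_enabled"] or flags["irq_unmasked"]
```

Running `irq_armed()` over the values of all pins right at driver probe, before Linux has requested any GPIO interrupt, would show whether firmware or a previous OS left some of them armed.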
Comment 5 mruize85 2018-12-16 15:20:55 UTC
Forcing `amd_gpio_irq_handler()` from `drivers/pinctrl/pinctrl-amd.c` to always return `IRQ_HANDLED` makes the touchscreen work, but now the system is seeing interrupts at a very high rate (around 100 k/s). I am a complete newbie to kernel development, so I don't really know what I'm doing... Any ideas?
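For anyone who wants to reproduce the rate measurement, here is a small sketch that samples `/proc/interrupts` twice and reports interrupts per second for one IRQ line. The function names are hypothetical, and it assumes the usual layout of one header row of CPU columns followed by one row per IRQ.

```python
import time


def irq_total(proc_interrupts: str, irq: str) -> int:
    """Sum the per-CPU counters of one IRQ row from /proc/interrupts text."""
    lines = proc_interrupts.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label.strip() == irq:
            return sum(int(count) for count in rest.split()[:ncpus])
    raise KeyError(f"IRQ {irq} not found")


def irq_rate(irq: str = "7", interval: float = 1.0) -> float:
    """Interrupts per second for `irq`, measured over `interval` seconds."""
    with open("/proc/interrupts") as f:
        before = irq_total(f.read(), irq)
    time.sleep(interval)
    with open("/proc/interrupts") as f:
        after = irq_total(f.read(), irq)
    return (after - before) / interval
```

Calling `irq_rate("7")` on an affected machine should give a figure comparable to the rate quoted above if the interrupt storm is present.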
Comment 6 nospamming11+kernel 2018-12-20 21:09:07 UTC
Created attachment 280107 [details]
cat /proc/interrupts

Seeing the exact same problem on 4.19.10 (HP Envy x360 13-ag0004ng)

Attached the output of /proc/interrupts
Comment 7 Lukas Kahnert 2018-12-24 20:01:51 UTC
The only case where I got the "nobody cared" error on my HP Envy x360 bq-1xx was when I had used Windows 10 on my last boot and rebooted into Linux.
My theory is that Windows puts IRQ 7 into a state that persists across reboots (and triggers the error in Linux) and only gets cleared if you hold down the power button.
My laptop is now Linux-only, and since then I have never had this issue again (using 4.19.5 now).
Comment 8 JerryD 2019-01-15 02:33:41 UTC
This issue occurs regardless of whether Windows 10 was booted previously. From a cold power-up, boot fails in about 2 out of 3 attempts. Is there any way to remove this touchscreen driver? I think it should be backed out until the regression is fixed.
Comment 9 Bram Coenen 2019-01-15 07:22:10 UTC
(In reply to JerryD from comment #8)
> This issue occurs without regard to Windows 10 previous boot. From cold
> powerup I get failure to boot about 2 out of 3 attempts. Any way to remove
> this touchscreen driver? I think it should be backed out until the
> regression is fixed.

I don't even have Windows any more and get the bug sometimes. However, I do not agree that the driver should be removed. I still use the touchscreen daily because for me it is working most of the time. Besides, it does not hurt having it there, does it?
Comment 10 JerryD 2019-01-17 02:22:07 UTC
Are there any workarounds?
Comment 11 nospamming11+kernel 2019-01-17 13:34:38 UTC
Some suggest here: https://github.com/linuxwacom/wacom-hid-descriptors/issues/12
that switching to Legacy BIOS boot helped them.
Comment 12 nospamming11+kernel 2019-01-18 18:19:59 UTC
I can confirm that always returning IRQ_HANDLED fixes the error, but it spams the system with a lot of these interrupts (it never stops).

Sometimes booting with the kernel option noirqdebug helped me to get the touchscreen up and running again.

I then noticed the following (see the three attached files: one with a modified amd_gpio_irq_handler and two dmesg outputs, one with a working touchscreen and one where the touchscreen does not work): in the working case all interrupts are handled correctly, while in the "not-working" case A LOT of interrupts are not handled at all.
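To illustrate why both workarounds hide the message, here is a simplified model of the kernel's unhandled-interrupt accounting. It assumes the thresholds in `kernel/irq/spurious.c` are 100000 interrupts per window and more than 99900 unhandled to trigger "nobody cared", and it leaves out the time-based decay of the counter; the class and function names are hypothetical.

```python
class SpuriousDetector:
    """Simplified model of per-IRQ unhandled-interrupt accounting.

    ASSUMPTION: window and limit follow kernel/irq/spurious.c as I understand it.
    """

    WINDOW = 100_000
    LIMIT = 99_900

    def __init__(self, noirqdebug: bool = False):
        self.noirqdebug = noirqdebug
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled: bool) -> None:
        # With noirqdebug the accounting is skipped entirely.
        if self.noirqdebug or self.disabled:
            return
        if not handled:
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count < self.WINDOW:
            return
        if self.irqs_unhandled > self.LIMIT:
            self.disabled = True  # "irq N: nobody cared" / "Disabling IRQ #N"
        self.irq_count = 0
        self.irqs_unhandled = 0


def storm(detector: SpuriousDetector, n: int, handled: bool) -> bool:
    """Feed n interrupts with the same handler result; return whether the IRQ got disabled."""
    for _ in range(n):
        detector.note_interrupt(handled)
    return detector.disabled
```

In this model a handler that returns IRQ_NONE for a whole window gets the line disabled, while forcing IRQ_HANDLED or booting with noirqdebug only suppresses the check. Neither stops the interrupt source itself, which matches the never-ending interrupts reported here.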
Comment 13 nospamming11+kernel 2019-01-18 18:20:41 UTC
Created attachment 280583 [details]
modified function
Comment 14 nospamming11+kernel 2019-01-18 18:22:35 UTC
Created attachment 280585 [details]
modified - dmesg - working
Comment 15 nospamming11+kernel 2019-01-18 18:27:24 UTC
Not working - dmesg (too large to attach directly): https://bit.ly/2FHChBb
Comment 16 JerryD 2019-01-20 00:21:56 UTC
(In reply to nospamming11+kernel from comment #12)
> I can confirm that always returning IRQ_HANDLED fixes the error, but spams
> the system with a lot of these interrupts (it never stops)
> 
> Sometimes booting with the kernel option noirqdebug helped me to get the
> touchscreen up and running again.
> 
> I then noticed the following: See attached three files. One with a modified
> amd_gpio_irq_handler and two dmesg outputs. One of them with a working
> touchscreen and one where the touchscreen does not work. I can see that in
> the working case all interrupts are handled correctly while in the
> "not-working"-case there are A LOT of interrupts not handled at all.

What does this imply? Does the driver need to actually handle this interrupt? What hardware is actually generating this interrupt? Or is IRQ 7 an unused pin that is floating and therefore must be disabled?
Comment 17 JerryD 2019-02-09 19:35:50 UTC
Appears to be fixed on Fedora kernel 4.20.6-200.fc29.x86_64
Comment 18 nospamming11+kernel 2019-02-09 19:59:22 UTC
Sadly, I can't confirm that. Neither on 4.20.6.arch1-1-ARCH nor on 4.20.7-arch1-1-ARCH the problem is fixed for me. (Laptop Firmware: F.32)

Still noirqdebug is a workaround.
Comment 19 JerryD 2019-02-10 01:11:38 UTC
(In reply to nospamming11+kernel from comment #18)
> Sadly, I can't confirm that. Neither on 4.20.6.arch1-1-ARCH nor on
> 4.20.7-arch1-1-ARCH the problem is fixed for me. (Laptop Firmware: F.32)
> 
> Still noirqdebug is a workaround.

I am on HP Envy Laptop with Bios F19 which HP pulled evidently because it has some other big problem. But since everything is stable for me at the moment I am just leaving it alone. From what I am reading the only way to back that bios out is with a special USB stick which they will send you. My current kernel boot line is:

BOOT_IMAGE=/vmlinuz-4.20.6-200.fc29.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait processor.max_cstate=5
Comment 20 Bram Coenen 2019-02-14 04:48:29 UTC
(In reply to JerryD from comment #17)
> Appears to be fixed on Fedora kernel 4.20.6-200.fc29.x86_64

It isn't fixed for me on the 4.20.6-200.fc29.x86_64. But it doesn't always happen, sometimes it can go days without the bug appearing.
Comment 21 JerryD 2019-02-16 03:16:40 UTC
(In reply to Bram Coenen from comment #20)
> (In reply to JerryD from comment #17)
> > Appears to be fixed on Fedora kernel 4.20.6-200.fc29.x86_64
> 
> It isn't fixed for me on the 4.20.6-200.fc29.x86_64. But it doesn't always
> happen, sometimes it can go days without the bug appearing.

You are right, it is intermittent; sometimes I get it and sometimes I don't.

A side note: I upgraded to BIOS F.20. Don't do this. 4.20.xx fails to boot. 4.18 seems to boot fine.
Comment 22 nospamming11+kernel 2019-02-16 14:59:58 UTC
Can you guys confirm that "noirqdebug" as a kernel boot param works for you too?
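When comparing results, it helps to confirm that the parameter actually reached the running kernel. A small sketch, with hypothetical helper names, that checks a kernel command line for a given parameter:

```python
def has_boot_param(cmdline: str, name: str) -> bool:
    """True if `name` appears as a bare flag or as name=value on the command line."""
    return any(
        token == name or token.startswith(name + "=")
        for token in cmdline.split()
    )


def running_with(name: str, path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running kernel."""
    with open(path) as f:
        return has_boot_param(f.read(), name)
```

On a booted system, `running_with("noirqdebug")` tells whether the test run really used the workaround.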
Comment 23 JerryD 2019-02-17 16:53:37 UTC
(In reply to nospamming11+kernel from comment #22)
> Can you guys confirm that "noirqdebug" as a kernel boot param works for you
> too?

With Linux version 4.18.16-300.fc29.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org)

I see no irq7 issue.

With 4.20.7-200.fc29.x86_64 it locks up right away, and I am unable to get any sort of backtrace, with or without "noirqdebug". ABRT reports insufficient information to generate a report and says to contact the kernel mailing list.
Comment 24 JerryD 2019-02-17 16:59:33 UTC
For clarity on comment 23: HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.20 12/25/2018
Comment 25 Hans de Goede 2019-02-19 13:24:40 UTC
Created attachment 281207 [details]
[RFC] pinctrl/amd: Clear interrupt enable bits on probe

Good news, Leonard Crestez has come up with a patch which likely fixes this.

I'm attaching the patch here, please give it a try.
Comment 26 nospamming11+kernel 2019-02-19 18:35:13 UTC
Unfortunately this still does not fix it for me.

I applied it to 4.20.10-arch1-1 (Archlinux kernel) and I still get the error:

"irq 7:nobody cared (try booting with the "irqpoll" option)"
with a Call Trace afterwards.

I can see that the new function is getting called and tells me that a bunch of PINs get disabled

"amd_gpio: AMD10030:00: Pin 67 interrupt enabled on boot: disable
.....
"

But right after that, IRQ 7 starts to spam my logs again (I still have the log outputs described above in place).
Comment 27 Bram Coenen 2019-02-20 14:26:55 UTC
Created attachment 281231 [details]
dmesg with leonard patch and a non-working touchscreen

Dmesg output when trying Leonard Crestez's patch the first time and it didn't work. It worked when booting into the kernel the second time and I think the relevant error message was the one shown below. I'll come back and give an update if/when the touchscreen stops working again. 

The nobody cared error is gone at least! 

[    2.964272] i2c_hid i2c-ELAN0732:00: HID over i2c has not been provided an Int IRQ
[    2.964330] i2c_hid: probe of i2c-ELAN0732:00 failed with error -22

(This is my first time compiling a kernel in fedora. So I could also have done something wrong.)
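The "failed with error -22" in the i2c_hid line is a negated errno value. A quick sketch for translating such codes, using a hypothetical helper name:

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Translate a negative kernel return code such as -22 into errno name and message."""
    number = abs(code)
    name = errno.errorcode.get(number, "unknown")
    return f"{name}: {os.strerror(number)}"
```

For -22 this should report EINVAL, which fits the preceding log line: the driver was not handed a usable interrupt and rejected the probe.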
Comment 28 nospamming11+kernel 2019-02-20 14:47:32 UTC
I have seen the i2c-error too - when my touchscreen worked, but the patch does not work for me (even after several linux-only-boots).

Do you have a windows system installed @Bram Coenen? Can you boot into it and test if the touchscreen still works, when you boot back into linux?
Comment 29 Bram Coenen 2019-02-20 16:39:29 UTC
(In reply to nospamming11+kernel from comment #28)
> I have seen the i2c-error too - when my touchscreen worked, but the patch
> does not work for me (even after several linux-only-boots).
> 
> Do you have a windows system installed @Bram Coenen? Can you boot into it
> and test if the touchscreen still works, when you boot back into linux?

No, unfortunately I don't have Windows installed. However, the bug happens now and again anyway.
Comment 30 Bram Coenen 2019-02-20 16:41:30 UTC
Created attachment 281233 [details]
dmesg with leonard patch and a working touchscreen

I think the interesting parts here are these prints.

[    2.988099] i2c_hid i2c-ELAN0732:00: i2c-ELAN0732:00 supply vdd not found, using dummy regulator
[    2.988146] i2c_hid i2c-ELAN0732:00: Linked as a consumer to regulator.0
[    2.988148] i2c_hid i2c-ELAN0732:00: i2c-ELAN0732:00 supply vddl not found, using dummy regulator
[    2.993265] audit: type=1130 audit(1550670938.923:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    3.015329] acpi PNP0C14:01: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
[    3.015354] wmi_bus wmi_bus-PNP0C14:01: WQBJ data block query control method not found
[    3.017236] input: ELAN0732:00 04F3:24BC Touchscreen as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input13
[    3.017377] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input14
[    3.017434] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input15
[    3.017492] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input16
[    3.017589] hid-generic 0018:04F3:24BC.0004: input,hidraw3: I2C HID v1.00 Device [ELAN0732:00 04F3:24BC] on i2c-ELAN0732:00
[    3.064691] nvme nvme0: pci function 0000:03:00.0
[    3.107262] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    3.213210] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input18
[    3.213350] input: ELAN0732:00 04F3:24BC as /devices/platform/AMD0010:00/i2c-0/i2c-ELAN0732:00/0018:04F3:24BC.0004/input/input21
[    3.213445] hid-multitouch 0018:04F3:24BC.0004: input,hidraw2: I2C HID v1.00 Device [ELAN0732:00 04F3:24BC] on i2c-ELAN0732:00
Comment 31 Bram Coenen 2019-02-20 19:52:49 UTC
I got the bug again, but no print from the patch. So I probably messed up the compilation of the kernel with the patch. I'll try to patch it correctly tomorrow!
Comment 32 JerryD 2019-02-21 03:22:58 UTC
(In reply to JerryD from comment #23)
> (In reply to nospamming11+kernel from comment #22)
> > Can you guys confirm that "noirqdebug" as a kernel boot param works for you
> > too?
> 
> With Linux version 4.18.16-300.fc29.x86_64
> (mockbuild@bkernel04.phx2.fedoraproject.org)
> 
> I see no irq7 issue.
> 
> With 4.20.7-200.fc29.x86_64 it locks up right away and unable to get any
> sort of backtrace with or without "noirqdebug" ABRT reports insufficient
> information to generate a report and to contact kernel mailing list.

Turns out the problem I have with the 4.20 kernel is not bios related and bios F.20 is OK. I ran into this bug:

https://bugs.freedesktop.org/show_bug.cgi?id=109206

Kernel 4.19.15-300.fc29.x86_64 is working fine.
Comment 33 Bram Coenen 2019-02-21 04:55:25 UTC
(In reply to JerryD from comment #32)
> (In reply to JerryD from comment #23)
> > (In reply to nospamming11+kernel from comment #22)
> > > Can you guys confirm that "noirqdebug" as a kernel boot param works for
> you
> > > too?
> > 
> > With Linux version 4.18.16-300.fc29.x86_64
> > (mockbuild@bkernel04.phx2.fedoraproject.org)
> > 
> > I see no irq7 issue.
> > 
> > With 4.20.7-200.fc29.x86_64 it locks up right away and unable to get any
> > sort of backtrace with or without "noirqdebug" ABRT reports insufficient
> > information to generate a report and to contact kernel mailing list.
> 
> Turns out the problem I have with the 4.20 kernel is not bios related and
> bios F.20 is OK. I ran into this bug:
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=109206
> 
> Kernel 4.19.15-300.fc29.x86_64 is working fine.

Try a higher version of 4.20. Mine is working on 4.20.10 for example.
Comment 34 Bram Coenen 2019-02-21 05:05:04 UTC
Created attachment 281253 [details]
dmesg_leonard_patch_no_touchscreen

I applied the patch made by Leonard correctly this time. The touchscreen does not work and prints the following for pin 67 to 148.

[    3.080467] amd_gpio AMD0030:00: Pin 67 interrupt enabled on boot: disable
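To compare boots, it can help to pull the relevant facts out of each dmesg attachment automatically. A minimal sketch, with hypothetical function names, that lists the pins the patch reported as disabled and any IRQ that hit "nobody cared"; the first regular expression assumes the message format shown above.

```python
import re

# Matches the message printed by the RFC patch, as quoted in this comment.
PIN_RE = re.compile(r"amd_gpio \S+ Pin (\d+) interrupt enabled on boot: disable")
NOBODY_CARED_RE = re.compile(r"irq (\d+): nobody cared")


def disabled_pins(dmesg: str) -> list:
    """Sorted pin numbers the patch reported as enabled on boot and then disabled."""
    return sorted({int(m.group(1)) for m in PIN_RE.finditer(dmesg)})


def nobody_cared(dmesg: str) -> list:
    """IRQ numbers for which the kernel logged 'nobody cared'."""
    return [int(n) for n in NOBODY_CARED_RE.findall(dmesg)]
```

Feeding the working and non-working logs through both functions shows at a glance whether the set of disabled pins differs between boots and whether IRQ 7 was still shut off.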
Comment 35 Bram Coenen 2019-02-21 05:11:12 UTC
(In reply to Bram Coenen from comment #34)
> Created attachment 281253 [details]
> dmesg_leonard_patch_no_touchscreen
> 
> I applied the patch made by Leonard correctly this time. The touchscreen
> does not work and prints the following for pin 67 to 148.
> 
> [    3.080467] amd_gpio AMD0030:00: Pin 67 interrupt enabled on boot: disable

"irq 7: nobody cared" is also still present.
Comment 36 Bram Coenen 2019-02-21 10:24:27 UTC
(In reply to Bram Coenen from comment #35)
> (In reply to Bram Coenen from comment #34)
> > Created attachment 281253 [details]
> > dmesg_leonard_patch_no_touchscreen
> > 
> > I applied the patch made by Leonard correctly this time. The touchscreen
> > does not work and prints the following for pin 67 to 148.
> > 
> > [    3.080467] amd_gpio AMD0030:00: Pin 67 interrupt enabled on boot:
> disable
> 
> "irq 7: nobody cared" is also still present.

Now my touchscreen is working with the patch. It still prints that the same pins are disabled, and the "nobody cared" error is gone. Since the same patched kernel showed the error on the previous boot, I don't think the patch made any difference.
