Created attachment 279741 [details]
dmesg after boot, 4.19.4, no touchscreen
The Elan touchscreen on HP laptops with an AMD processor was recently fixed and worked properly for a while in 4.19.3. However, after installing a new kernel version, 4.19.4, the touchscreen stopped working and new errors appeared.
I once got this in 4.19.3 as well, but it was resolved after a few shutdowns, the last one done by holding the power button. I did have issues logging in as well (Don't)
I'm on an HP ENVY x360 Convertible 15-bq0xx/8311, BIOS F.08, with only Fedora 28 installed.
The ACPI config was fixed in https://bugzilla.kernel.org/show_bug.cgi?id=198715 .
Created attachment 279743 [details]
dmesg after boot, 4.19.3, working touchscreen
Just checked. The kernel version doesn't matter; a decent number of reboots does the trick to get the touchscreen working, even on 4.19.4.
[ 16.587361] irq 7: nobody cared (try booting with the "irqpoll" option)
[ 16.587366] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 4.19.4-200.fc28.x86_64 #1
[ 16.587367] Hardware name: HP HP ENVY x360 Convertible 15-bq0xx/8311, BIOS F.08 03/30/2018
[ 16.587368] Call Trace:
[ 16.587372] <IRQ>
[ 16.587381] dump_stack+0x5c/0x80
[ 16.587385] __report_bad_irq+0x37/0xae
[ 16.587388] note_interrupt.cold.9+0xa/0x69
[ 16.587390] handle_irq_event_percpu+0x6a/0x80
[ 16.587392] handle_irq_event+0x27/0x44
[ 16.587394] handle_fasteoi_irq+0x7f/0x120
[ 16.587398] handle_irq+0xbf/0x100
[ 16.587400] do_IRQ+0x49/0xd0
[ 16.587403] common_interrupt+0xf/0xf
[ 16.587405] </IRQ>
[ 16.587409] RIP: 0010:native_safe_halt+0x2/0x10
[ 16.587411] Code: ff ff 7f c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c4 eb 8c 90 90 90 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
[ 16.587412] RSP: 0018:ffffffffbd203e18 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd8
[ 16.587415] RAX: 0000000080000000 RBX: ffff887df5a01c00 RCX: 0000000000000034
[ 16.587416] RDX: 4ec4ec4ec4ec4ec5 RSI: ffffffffbd2dd200 RDI: ffff887df5a01c64
[ 16.587418] RBP: ffff887df5a01c64 R08: 0000000000000002 R09: 0000000000020800
[ 16.587419] R10: 0000000f66cccf99 R11: ffff887df721fde8 R12: 0000000000000001
[ 16.587420] R13: 0000000000000001 R14: 0000000000000001 R15: 00000000d0c2fae0
[ 16.587424] acpi_safe_halt+0x1b/0x30
[ 16.587427] acpi_idle_enter+0x104/0x2a0
[ 16.587431] cpuidle_enter_state+0x71/0x320
[ 16.587435] do_idle+0x226/0x260
[ 16.587438] cpu_startup_entry+0x6f/0x80
[ 16.587442] start_kernel+0x523/0x543
[ 16.587446] secondary_startup_64+0xa4/0xb0
[ 16.587448] handlers:
[ 16.587453] [<00000000e6019074>] amd_gpio_irq_handler [pinctrl_amd]
[ 16.587455] Disabling IRQ #7
I discussed this a bit on the mailing list. Here are the relevant parts of the discussion:
The amd_gpio chip/driver appears to be the only driver
connected to IRQ 7, so I think there is an issue with the
amd_gpio driver where it does not properly clear the interrupt
source. E.g. it might be that the BIOS requested interrupts
on a GPIO which Linux does not monitor and that the driver
does not disable this GPIO-IRQ on probe and since it is not
handling that pin in IRQ mode also does not clear it.
Anyways that is just a theory. It would greatly help if
someone who knows the amd_gpio driver better could take
Reply by Daniel Drake:
Sorry that I can't be much help here - I don't have access to any
useful info beyond the source code already present in Linux.
Maybe you could explore your theory by dumping the GPIO/GPIO-INT
enable regs, see if any of them are marked as enabled by something
other than Linux.
I'm afraid I don't have time to look into this myself atm. Maybe someone can add some printk calls to drivers/pinctrl/pinctrl-amd.c to dump relevant register values as Daniel suggested and see if that yields any useful info?
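To make Daniel's suggestion a bit more concrete, here is a minimal user-space sketch of the check one could run on the register values once they have been dumped (for example by the added printk calls). It is not driver code. The bit positions are my recollection of drivers/pinctrl/pinctrl-amd.h and are an assumption that must be verified against the header of the kernel being debugged. `decode_pin_reg` and `stray_irq_pins` are hypothetical helper names.

```python
# Assumed bit positions, recalled from drivers/pinctrl/pinctrl-amd.h.
# Verify them against the header of the kernel you are debugging.
INTERRUPT_ENABLE_BIT = 11   # assumption: interrupt enabled for this pin
INTERRUPT_MASK_BIT = 12     # assumption: bit set means the interrupt is unmasked
INTERRUPT_STS_BIT = 28      # assumption: interrupt status (pending) bit


def decode_pin_reg(value):
    """Decode the interrupt-related bits of one per-pin register value."""
    return {
        "irq_enabled": bool(value & (1 << INTERRUPT_ENABLE_BIT)),
        "irq_unmasked": bool(value & (1 << INTERRUPT_MASK_BIT)),
        "irq_pending": bool(value & (1 << INTERRUPT_STS_BIT)),
    }


def stray_irq_pins(regs, linux_pins):
    """Return pins whose interrupt is enabled although Linux never requested them.

    regs:       dict mapping pin number -> dumped register value
    linux_pins: set of pin numbers that Linux drivers requested as IRQs
    """
    stray = []
    for pin, value in sorted(regs.items()):
        if decode_pin_reg(value)["irq_enabled"] and pin not in linux_pins:
            stray.append(pin)
    return stray
```

Feeding it the dumped values together with the set of pins Linux actually requested would list the candidates for "enabled by something other than Linux", which is exactly the case the theory above is about.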
Forcing `amd_gpio_irq_handler()` in `drivers/pinctrl/pinctrl-amd.c` to always return `IRQ_HANDLED` makes the touchscreen work, but now the system receives interrupts at a very high rate (around 100 k/s). I am a complete newbie to kernel development, so I don't really know what I'm doing... Any ideas?
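For anyone who wants to put a number on the interrupt rate, a rough sketch like the following samples /proc/interrupts twice and reports interrupts per second for one IRQ. It assumes the usual layout (a header row of CPU columns followed by one row per IRQ); `irq_count` and `irq_rate` are names made up for this example.

```python
import time


def irq_count(text, irq):
    """Sum the per-CPU counters of one IRQ row in /proc/interrupts-style text."""
    lines = text.splitlines()
    if not lines:
        return None
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if sep and label.strip() == str(irq):
            fields = rest.split()[:ncpus]
            return sum(int(f) for f in fields if f.isdigit())
    return None  # IRQ not listed


def irq_rate(irq, interval=1.0, path="/proc/interrupts"):
    """Return interrupts per second for `irq`, measured over `interval` seconds."""
    def read():
        with open(path) as f:
            return irq_count(f.read(), irq)

    before = read()
    time.sleep(interval)
    after = read()
    if before is None or after is None:
        return None
    return (after - before) / interval
```

On the affected machine, `irq_rate(7)` should then give the rate for the pinctrl_amd line.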
Created attachment 280107 [details]
Seeing the exact same problem on 4.19.10 (HP Envy x360 13-ag0004ng)
Attached the output of /proc/interrupts
The only case where I got the "nobody cared" panic on my HP Envy x360 bq-1xx was when I had used Windows 10 on my previous boot and then rebooted into Linux.
My theory is that Windows puts IRQ 7 into a state that persists across reboots (and triggers the panic in Linux) and only gets cleared if you hold down the power button.
My laptop is now Linux-only, and since then I have never had this issue again (using 4.19.5 now).
This issue occurs regardless of whether Windows 10 was booted previously. From a cold power-up, I get a failure to boot in about 2 out of 3 attempts. Is there any way to remove this touchscreen driver? I think it should be backed out until the regression is fixed.
(In reply to JerryD from comment #8)
> This issue occurs without regard to Windows 10 previous boot. From cold
> powerup I get failure to boot about 2 out of 3 attempts. Any way to remove
> this touchscreen driver? I think it should be backed out until the
> regression is fixed.
I don't even have Windows any more and get the bug sometimes. However, I do not agree that the driver should be removed. I still use the touchscreen daily because for me it is working most of the time. Besides, it does not hurt having it there, does it?
Are there any workarounds?
Some suggest here: https://github.com/linuxwacom/wacom-hid-descriptors/issues/12
that switching to Legacy BIOS boot helped them.
I can confirm that always returning IRQ_HANDLED makes the error go away, but the system is then flooded with these interrupts (it never stops).
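As far as I understand kernel/irq/spurious.c, this matches how the "nobody cared" check works: the kernel looks at an IRQ in windows of 100,000 interrupts and disables it if more than 99,900 of them were reported as unhandled. The following is a simplified model of that bookkeeping, not kernel code. It ignores the time-based reset of the unhandled counter and everything related to polling, and `simulate_spurious_detector` is a made-up name.

```python
def simulate_spurious_detector(results, window=100_000, threshold=99_900):
    """Simplified model of the kernel's spurious-IRQ heuristic.

    results: iterable of bools, True if the handler returned IRQ_HANDLED.
    Returns the index of the interrupt at which the IRQ would be disabled,
    or None if it never would be.
    """
    count = 0
    unhandled = 0
    for i, handled in enumerate(results):
        if not handled:
            unhandled += 1
        count += 1
        if count < window:
            continue
        # End of a window: decide, then start counting again.
        count = 0
        if unhandled > threshold:
            return i  # "irq N: nobody cared" and "Disabling IRQ #N"
        unhandled = 0
    return None
```

In this model, a handler that always claims the interrupt never trips the check, no matter how many interrupts arrive. That would explain why forcing IRQ_HANDLED hides the message while the underlying interrupt source keeps firing.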
Sometimes booting with the kernel option noirqdebug helped me to get the touchscreen up and running again.
I then noticed the following (see the three attached files: one with a modified amd_gpio_irq_handler and two dmesg outputs, one with a working touchscreen and one where the touchscreen does not work). In the working case all interrupts are handled correctly, while in the not-working case there are A LOT of interrupts that are not handled at all.
Created attachment 280583 [details]
Created attachment 280585 [details]
modified - dmesg - working
Not working - dmesg (too large to attach directly): https://bit.ly/2FHChBb