Bug 200529

Summary: Elantech touchpad stops working after a while, shows "irq 16: nobody cared"
Product: Drivers Reporter: guimarcalsilva
Component: I2C Assignee: Drivers/I2C virtual user (drivers-i2c)
Status: NEW ---    
Severity: normal CC: benjamin.tissoires, guimarcalsilva, jwrdegoede, mika.westerberg
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.17.4-1-default to 5.4.0-72-generic Subsystem:
Regression: No Bisected commit-id:
Attachments: Devices available
Dmesg after boot
Dmesg diff with the bug
Xorg log
Full dmesg log with bug
My computer's interrupts
ACPI
Extracted
All hardware log
DMESG Again
Dmesg with Irqpool boot option is different.

Description guimarcalsilva 2018-07-17 19:41:36 UTC
Created attachment 277381 [details]
Devices available
Comment 1 guimarcalsilva 2018-07-17 19:48:43 UTC
My I2C Elantech touchpad stops working after a while on my Acer F5-573. Dmesg diff shows "irq 16: nobody cared (try booting with the "irqpoll" option)". I reported the bug to the libinput developers and we came to the conclusion that it's probably a kernel bug. I can reproduce it on many different kernels and Linux distros; the kernel version I'm using right now is "4.17.4-1-default". I must say the touchpad works fine under Windows, and I only get this bug on Linux.

Here's the libinput issue with more information: https://gitlab.freedesktop.org/libinput/libinput/issues/77

I'll attach some logs, new ones made with kernel 4.17.4-1 instead of the ones on the link above.

Thanks!
Comment 2 guimarcalsilva 2018-07-17 19:49:26 UTC
Created attachment 277383 [details]
Dmesg after boot
Comment 3 guimarcalsilva 2018-07-17 19:50:03 UTC
Created attachment 277385 [details]
Dmesg diff with the bug
Comment 4 guimarcalsilva 2018-07-17 19:51:35 UTC
Created attachment 277387 [details]
Xorg log
Comment 5 guimarcalsilva 2018-07-17 19:53:05 UTC
Created attachment 277389 [details]
Full dmesg log with bug
Comment 6 guimarcalsilva 2018-07-17 20:03:24 UTC
I also need to say that sometimes when it stops, if I run...
modprobe -r i2c_hid
modprobe i2c_hid

...it starts working again, but that workaround doesn't always work.

Usually, once the cursor stops moving, moving my finger makes it start clicking like crazy, as noted in my bug report to libinput.

It frequently happens when I'm playing on the Dolphin emulator (every time, actually), but it has also happened when I was looking at some screensavers. It hasn't happened (yet) when using the laptop for other activities.

I found some other people with the exact same problem as me on the internet:

https://askubuntu.com/questions/1004769/touchpad-works-slow-disabling-irq-16
https://bbs.archlinux.org/viewtopic.php?id=221463

Thanks.
Comment 7 Benjamin Tissoires 2018-07-19 08:13:51 UTC
It seems to be the same kind of issue we have in https://bugzilla.kernel.org/show_bug.cgi?id=198473

In the two links in the previous comment, we see:
cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       
  16:     126908      27190     205425      53703  IR-IO-APIC   16-fasteoi   idma64.0, i801_smbus, i2c_designware.0
  82:          4          2         10          1  IR-IO-APIC   82-edge      SYNA7DB5:00

And I believe this is far too many interrupts for an I2C chip.
Mika, Hans, any ideas if the pinctrl is at fault here too? I would have expected c41eb2c7f93531b8 which is in v4.17 to fix that, but it seems it's not the case.
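
For comparing counts like the ones quoted above, here is a minimal Python sketch. It assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ with one count per CPU). The function name "irq_totals" is hypothetical and not part of any tool mentioned in this report.

```python
def irq_totals(text):
    """Sum per-CPU counts for each IRQ row of /proc/interrupts-style text.

    Returns {irq_label: (total_count, description)}.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    # The header row lists one column per CPU (CPU0, CPU1, ...).
    ncpu = sum(1 for tok in lines[0].split() if tok.startswith("CPU"))
    if ncpu == 0:
        return {}
    totals = {}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = fields[:ncpu]
        # Skip rows that do not carry one numeric count per CPU.
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue
        totals[label.strip()] = (sum(map(int, counts)), " ".join(fields[ncpu:]))
    return totals


# Sample copied from the output quoted in this comment.
SAMPLE = """\
            CPU0       CPU1       CPU2       CPU3
  16:     126908      27190     205425      53703  IR-IO-APIC   16-fasteoi   idma64.0, i801_smbus, i2c_designware.0
  82:          4          2         10          1  IR-IO-APIC   82-edge      SYNA7DB5:00
"""

if __name__ == "__main__":
    for irq, (total, desc) in irq_totals(SAMPLE).items():
        print(irq, total, desc)
```

On a live system the same function could be fed open("/proc/interrupts").read(). Running it twice a few seconds apart and comparing the totals would show how fast the shared IRQ 16 line is firing.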
Comment 8 guimarcalsilva 2018-07-19 10:20:27 UTC
(In reply to Benjamin Tissoires from comment #7)
> It seems to be the same kind of issue we have in
> https://bugzilla.kernel.org/show_bug.cgi?id=198473

Well, in my case the touchpad works normally until it stops moving and starts clicking every time I move my finger.


> In the two links in the previous comment, we see:
> cat /proc/interrupts
>             CPU0       CPU1       CPU2       CPU3       
>   16:     126908      27190     205425      53703  IR-IO-APIC   16-fasteoi  
> idma64.0, i801_smbus, i2c_designware.0
>   82:          4          2         10          1  IR-IO-APIC   82-edge     
> SYNA7DB5:00

I can't post my "cat /proc/interrupts" now, but if I remember correctly it looks exactly the same on my computer.
Comment 9 Mika Westerberg 2018-07-23 16:32:41 UTC
Looking at the /proc/interrupts the touchpad seems to use APIC interrupt (not GPIO) so I don't think pinctrl driver is involved in this at all.
Comment 10 Hans de Goede 2018-07-27 12:46:28 UTC
Hmm, the IOAPIC being used for the interrupt is weird, can you attach an acpidump for the machine please?
Comment 11 guimarcalsilva 2018-07-30 17:22:26 UTC
Created attachment 277611 [details]
My computer's interrupts
Comment 12 guimarcalsilva 2018-07-30 17:28:40 UTC
Created attachment 277613 [details]
ACPI

I don't know if I'm doing this correctly, but here's my acpidump (learned here: http://smackerelofopinion.blogspot.com/2009/10/dumping-acpi-tables-using-acpidump-and.html)
Comment 13 guimarcalsilva 2018-07-30 17:30:04 UTC
Created attachment 277615 [details]
Extracted

Guess all files are here
Comment 14 guimarcalsilva 2018-08-02 11:24:46 UTC
Created attachment 277663 [details]
All hardware log

Here's the SUSE hardware log. Interestingly enough, some parts say it's a PS/2 mouse, but if I remember correctly Mint doesn't show that.
Comment 15 guimarcalsilva 2018-08-02 13:05:20 UTC
Created attachment 277665 [details]
DMESG Again

Bug happened again, here's a new dmesg log. I don't know if this one is any different, but this time I was benchmarking with Unigine Valley GPU benchmark to confirm a suspicion I had, and I confirmed it:

Before that I used the computer about 3 times, about 40 minutes each, without any problems. I'm certain the bug only happens when I'm stressing the GPU (possibly the CPU too). It has happened when using both OpenGL and Vulkan in Dolphin-emulator, when running the Valley benchmark, and when I was looking at some 3D screensavers. When I use the computer to browse the web or for any other simple task, the bug doesn't get triggered.
Comment 16 guimarcalsilva 2018-08-02 13:43:29 UTC
Created attachment 277667 [details]
Dmesg with Irqpool boot option is different.

Now I tried with the irqpoll option and dmesg seems to be different after the bug happens. Before, it would show "CPU: 1 PID: 0 Comm: swapper/1 Not tainted"; now it shows "CPU: 1 PID: 3280 Comm: Audio thread -  Not tainted".

This part is also different:

Before irqpoll:

[ 1022.761949] RIP: 0010:cpuidle_enter_state+0xbc/0x2e0
[ 1022.761950] RSP: 0018:ffff97d000d23eb0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdb
[ 1022.761952] RAX: 0000000000000001 RBX: 000000ee215c3334 RCX: 000000000000001f
[ 1022.761953] RDX: 000000ee215c3334 RSI: 0000000000022740 RDI: 0000000000000000
[ 1022.761953] RBP: 0000000000000001 R08: 0000024532af73c8 R09: 0000000000000005
[ 1022.761954] R10: 00000000ffffffff R11: ffff89a8d14a1ea8 R12: ffffb7cfffc90870
[ 1022.761955] R13: ffffffff920d96d8 R14: 000000ee215c1668 R15: 0000000000000000
[ 1022.761960]  do_idle+0x21d/0x270
[ 1022.761962]  cpu_startup_entry+0x5f/0x70
[ 1022.761964]  start_secondary+0x1a0/0x1e0
[ 1022.761966]  secondary_startup_64+0xa5/0xb0

After irqpoll:

[ 1011.718836] RIP: 0033:0x7f1a9dcc7720
[ 1011.718837] RSP: 002b:00007f1a40de63c8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdb
[ 1011.718839] RAX: 0000000000000000 RBX: 00007f1a3c003640 RCX: 00007f1a3c000d58
[ 1011.718840] RDX: 0000000000000001 RSI: 00007f1a40de63d7 RDI: 000000000000002a
[ 1011.718841] RBP: 0000000000000400 R08: 00000000000000fb R09: 0000000000000000
[ 1011.718841] R10: 0000000000000fb0 R11: 0000000000000202 R12: 0000000000000000
[ 1011.718842] R13: 0000000000000000 R14: 00007f1a3c012cd0 R15: 0000000000000000


With or without irqpoll the bug still happens.
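
One way to read the difference between the two traces: the value before the colon in the RIP line is the code segment selector, and its low two bits give the privilege level the CPU was running at when the interrupt arrived. On x86-64 Linux, 0010 is the kernel code segment and 0033 is the 64-bit user code segment. That suggests the first trace was taken while CPU 1 was idle in the kernel and the second while it was running a userspace thread. A minimal Python sketch of that check follows; the helper name "rip_mode" is hypothetical.

```python
import re

# Matches the "RIP: <cs>:<location>" field of an x86-64 trace line.
RIP_RE = re.compile(r"RIP:\s*([0-9a-fA-F]{4}):(\S+)")


def rip_mode(line):
    """Return (mode, location) for a RIP line, or None if there is none."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    selector = int(m.group(1), 16)
    # The low two bits of a segment selector are the privilege level:
    # 0 is ring 0 (kernel), 3 is ring 3 (userspace).
    ring = selector & 0x3
    mode = {0: "kernel", 3: "user"}.get(ring, "ring%d" % ring)
    return (mode, m.group(2))


if __name__ == "__main__":
    print(rip_mode("[ 1022.761949] RIP: 0010:cpuidle_enter_state+0xbc/0x2e0"))
    print(rip_mode("[ 1011.718836] RIP: 0033:0x7f1a9dcc7720"))
```

So the different register dump by itself may only reflect what CPU 1 happened to be doing when the spurious interrupt was detected, not a change in the underlying problem.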
Comment 17 guimarcalsilva 2018-08-02 15:51:00 UTC
If I execute "modprobe -r i2c_hid" and "modprobe -f i2c_hid" the following shows up in dmesg (tried several times, doesn't work):

[ 7869.023842] i2c_designware i2c_designware.0: controller timed out
[ 7869.053169] i2c_designware i2c_designware.0: timeout in disabling adapter
[ 7869.053181] hid (null): reading report descriptor failed
[ 7869.053192] i2c_hid i2c-ELAN0501:00: can't add hid device: -5
[ 7869.053322] i2c_hid: probe of i2c-ELAN0501:00 failed with error -5
[ 7870.845461] i2c_designware i2c_designware.0: timeout in disabling adapter
[ 7934.239998] i2c_designware i2c_designware.0: controller timed out
[ 7934.269769] i2c_designware i2c_designware.0: timeout in disabling adapter
[ 7934.269781] hid (null): reading report descriptor failed
[ 7934.269792] i2c_hid i2c-ELAN0501:00: can't add hid device: -5
[ 7934.269992] i2c_hid: probe of i2c-ELAN0501:00 failed with error -5
[ 7935.837163] i2c_designware i2c_designware.0: timeout in disabling adapter
[ 7956.607886] i2c_designware i2c_designware.0: controller timed out
[ 7956.637430] i2c_designware i2c_designware.0: timeout in disabling adapter
[ 7956.637442] hid (null): reading report descriptor failed
[ 7956.637453] i2c_hid i2c-ELAN0501:00: can't add hid device: -5
[ 7956.637662] i2c_hid: probe of i2c-ELAN0501:00 failed with error -5
[ 7957.853402] i2c_designware i2c_designware.0: timeout in disabling adapter
[ 7968.314580] i2c_hid: module_layout: kernel tainted.
[ 7968.314582] Disabling lock debugging due to kernel taint
[ 7968.314639] i2c_hid: module verification failed: signature and/or required key missing - tainting kernel
[ 7970.431926] i2c_designware i2c_designware.0: controller timed out
[ 7970.463044] i2c_designware i2c_designware.0: timeout in disabling adapter
[ 7970.463059] hid (null): reading report descriptor failed
[ 7970.463071] i2c_hid i2c-ELAN0501:00: can't add hid device: -5
[ 7970.463289] i2c_hid: probe of i2c-ELAN0501:00 failed with error -5
[ 7971.870741] i2c_designware i2c_designware.0: timeout in disabling adapter
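
The error code in the log above is a negated errno value: -5 is -EIO, a generic I/O error. That fits the i2c_designware controller timing out before the report descriptor could be read. A minimal Python sketch for pulling such failures out of a dmesg capture follows. The function name "probe_failures" is hypothetical, and the regular expression only assumes the "probe of ... failed with error N" wording seen above.

```python
import errno
import re

PROBE_RE = re.compile(r"(\S+): probe of (\S+) failed with error (-?\d+)")


def probe_failures(dmesg_text):
    """Return a list of (driver, device, errno_name) for failed probes."""
    found = []
    for m in PROBE_RE.finditer(dmesg_text):
        code = abs(int(m.group(3)))
        # errno.errorcode maps numbers to symbolic names, e.g. 5 -> "EIO".
        name = errno.errorcode.get(code, "unknown")
        found.append((m.group(1), m.group(2), name))
    return found


# Two lines copied from the log in this comment.
SAMPLE_LOG = """\
[ 7869.053192] i2c_hid i2c-ELAN0501:00: can't add hid device: -5
[ 7869.053322] i2c_hid: probe of i2c-ELAN0501:00 failed with error -5
"""

if __name__ == "__main__":
    for driver, device, name in probe_failures(SAMPLE_LOG):
        print(driver, device, name)
```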
Comment 18 guimarcalsilva 2021-04-24 19:34:52 UTC
Sorry to revive such an old bug but in the past few days I've been using the same laptop again and the same thing happened. I used the command "journalctl --since "2021-04-24 16:20" | grep i2c" and this line from the time the bug happened could indicate something:

i2c_hid i2c-ELAN0501:00: i2c_hid_get_input: IRQ triggered but there's no data

I'm on kernel 5.4.0-72-generic using KDE Neon with Ubuntu 20.04 LTS as a base.


Please note I'm only reporting this now because this bug may affect more people, so it's mostly for informational purposes, as this laptop is on its last legs anyway.
Comment 19 guimarcalsilva 2023-02-27 00:32:15 UTC
I found a workaround for this bug. I'll post it here in case someone else suffers from the same problem in the future:

In the UEFI setup, change the touchpad from Advanced to Basic mode. Acer laptops, like the one where I experienced this bug (Aspire F5-573 series), have this option.

On my particular laptop, this created another problem: sometimes the CPU would be flooded with IRQs and would run at 100% load, mostly while playing games or doing anything with 3D.

To fix that, blacklist the intel_lpss_pci module:

Add the file /etc/modprobe.d/lpss.conf with the following line: 

blacklist intel_lpss_pci

This will fix the high CPU usage problem and the touchpad will work fine this way. Please note that if you keep the touchpad in Advanced mode, it will stop working if you blacklist that module. Only do this if your BIOS allows changing it to Basic. Also note that if you dual boot with Windows, the touchpad will lose some functionality there.
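
To double-check that the blacklist entry is written the way modprobe expects, here is a minimal Python sketch that scans modprobe.d-style text for a "blacklist" line. The helper name "is_blacklisted" is hypothetical. The only behavior assumed is that lines starting with # are comments and that - and _ are interchangeable in module names, as described in modprobe.d(5).

```python
def is_blacklisted(conf_text, module):
    """Return True if conf_text contains a 'blacklist <module>' line."""
    # modprobe treats '-' and '_' in module names as equivalent.
    wanted = module.replace("-", "_")
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            if parts[1].replace("-", "_") == wanted:
                return True
    return False


if __name__ == "__main__":
    conf = "# workaround from this bug report\nblacklist intel_lpss_pci\n"
    print(is_blacklisted(conf, "intel_lpss_pci"))
```

On the affected machine the text could come from open("/etc/modprobe.d/lpss.conf").read(). The check only confirms the file contents, not whether the module is currently loaded.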