Bug 215253

Summary: i2c-ELAN071B:00: can't add hid device: -5 (ELAN071B:00 touchpad randomly doesn't work), along with ACPI errors
Product: Drivers Reporter: coronagraph (antintin0)
Component: I2C Assignee: Drivers/I2C virtual user (drivers-i2c)
Status: NEW ---    
Severity: normal CC: marco.rodolfi, nehal-bakulchandra.shah, shyam-sundar.s-k
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 5.15.6 Subsystem:
Regression: No Bisected commit-id:
Attachments: relevant dmesg output

Description coronagraph 2021-12-07 04:12:52 UTC
Created attachment 299925
relevant dmesg output

Tested on kernel 5.15.6 on HP Probook x360 435 g7

Description: 
My Elantech touchpad randomly stops working completely. I do not remember with 100% certainty whether it only happens after suspend, but the issue can sometimes persist after a reboot (and a full shutdown is then the only way to resolve it). Other times my touchpad is detected as a mouse instead, and features like disable-while-typing, scrolling, and other gestures do not work.

Doing 

rmmod i2c_hid_acpi i2c_hid
modprobe i2c-hid-acpi
modprobe i2c-hid

seems to fix the issue most of the time, but sometimes, if my touchpad was not working at all, it only half-fixes the problem: the touchpad is then detected as a mouse.
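
For anyone who wants to script the reload workaround above, here is a minimal sketch in Python 3. The three commands are the ones from the report; the function name and the dry-run switch are hypothetical additions for illustration. Actually running the commands needs root, and nothing here makes the workaround more reliable than doing it by hand.

```python
import subprocess

# Commands copied from the report. modprobe treats '-' and '_' in module
# names as equivalent, so the mixed spelling is harmless.
RELOAD_COMMANDS = [
    ["rmmod", "i2c_hid_acpi", "i2c_hid"],
    ["modprobe", "i2c-hid-acpi"],
    ["modprobe", "i2c-hid"],
]


def reload_i2c_hid(dry_run=True):
    """Run the reload commands in order and return them as strings.

    With dry_run=True (the default) nothing is executed; the commands
    are only returned so they can be inspected first.
    """
    executed = []
    for argv in RELOAD_COMMANDS:
        if not dry_run:
            # Raises CalledProcessError if a command fails (needs root).
            subprocess.run(argv, check=True)
        executed.append(" ".join(argv))
    return executed


if __name__ == "__main__":
    for command in reload_i2c_hid(dry_run=True):
        print(command)
```

Calling reload_i2c_hid(dry_run=False) as root performs the actual reload.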

Attached below is the relevant dmesg output. 
The most relevant line seems to be
i2c_hid_acpi i2c-ELAN2513:00: i2c_hid_get_input: IRQ triggered but there's no data

, but my touchpad is ELAN071B:00 not ELAN2513:00. 

ELAN071B:00 is mentioned in the following lines:
i2c_hid_acpi i2c-ELAN071B:00: failed to change power setting.
i2c_hid_acpi i2c-ELAN071B:00: can't add hid device: -5
i2c_hid_acpi: probe of i2c-ELAN071B:00 failed with error -5
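
To make it easier to see which i2c_hid_acpi messages belong to which device, the lines quoted above can be grouped by device ID. This is only a rough sketch: the regular expression is an assumption based on the four lines quoted in this report, not on a full knowledge of the driver's message formats, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches both message shapes quoted in the report, for example:
#   i2c_hid_acpi i2c-ELAN071B:00: can't add hid device: -5
#   i2c_hid_acpi: probe of i2c-ELAN071B:00 failed with error -5
DEVICE_RE = re.compile(r"i2c_hid_acpi\S*\s.*?i2c-(?P<dev>[A-Z0-9]{4,}:[0-9A-F]{2})")


def group_i2c_hid_lines(dmesg_text):
    """Group i2c_hid_acpi lines from dmesg output by device ID."""
    by_device = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = DEVICE_RE.search(line)
        if match:
            by_device[match.group("dev")].append(line.strip())
    return dict(by_device)


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for device, lines in group_i2c_hid_lines(sys.stdin.read()).items():
        print(device)
        for line in lines:
            print("   ", line)
```

Run against the lines above, this separates the ELAN2513:00 message from the three ELAN071B:00 messages.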

How reproducible:
Seems random, but once it starts it often persists after reboots. 

Steps to Reproduce:
Unknown

Actual results:
Touchpad either fails to work entirely or is detected as a mouse, causing touchpad-specific features to stop working.

Expected results:
Touchpad works.

dmesg is attached
Comment 1 Marco 2021-12-19 12:37:31 UTC
Yes, it's either an AMD platform firmware bug or a driver issue. I have the same randomly occurring issue with a 4500U on an Asus Vivobook 14 TP420IA. This system also has an Elan touchscreen on i2c, and when the Elan touchpad (which is also on i2c) stops working, the touchscreen stops working completely as well.

I've found people who fixed the issue on their systems with this modprobe configuration, but in my case it hasn't changed anything:

cat /etc/modprobe.d/10.i2c-precedence.conf
softdep i2c_hid_acpi pre: pinctrl_amd

This forces the system to load i2c_hid_acpi after pinctrl_amd, which is the actual driver that prepares the i2c system on the APU.

If I had to guess, the cause is probably in pinctrl_amd rather than in i2c_hid_acpi, since it's the actual provider of the interrupts, if I remember correctly. In my case, communication on the i2c bus shuts down completely. (This also happens, rarely, on the internal USB port on the M.2 module where the Bluetooth receiver is attached, although that is usually fixable with a sleep/wake cycle of the machine and does not persist across reboots.)

If anyone from AMD can shed some light on this issue, I would be glad.

Marco.
Comment 2 Marco 2021-12-19 12:41:05 UTC
I've added the two contacts to this issue. I'm sorry if this is unsolicited, but I do not know which other avenue to pursue to get this fixed. As soon as the issue presents itself again, I'll post dmesg logs.
Comment 3 coronagraph 2021-12-19 20:28:22 UTC
(In reply to Marco from comment #2)
> I've added the two contacts information to this issue, I'm sorry if this is
> unsolicited, but I do not know which other avenue to pursue to fix this. As
> soon as this issue will represent itself, I'll post logs for dmesg.

I looked up some more stuff and thought it might be related to faulty ACPI tables (not 100% sure what those do though) because of an ACPI error I get every boot and because "acpi" is in the name of "i2c_hid_acpi"...

This error is printed on the screen during every boot:
ACPI BIOS ERROR (bug): Could not resolve symbol [\_SB.PCI0.BUSB.SAT1], AE_NOT_FOUND (20210730/dswload2-162)
ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20210730/psobject-220)

And this one only shows in dmesg:
ACPI Warning: SystemIO range 0x0000000000000B00-0x0000000000000B08 conflicts with OpRegion 0x0000000000000B00-0x0000000000000B06 (\_SB.PCI0.SMBS.SMBO) (20210730/utaddress-204)

Here's the link to the ACPI bug report: https://bugzilla.kernel.org/show_bug.cgi?id=215345

I tried the loading pinctrl_amd first fix already, but it had no effect.
Comment 4 Nehal Shah 2022-01-13 09:01:56 UTC
Hi

Is the touchpad connected via SMBus?

Regards
Nehal
Comment 5 coronagraph 2022-01-15 04:30:02 UTC
(In reply to Nehal Shah from comment #4)
> Hi
> 
> is the touch pad connected with SMBUS?
> 
> Regards
> Nehal

I'm not sure how to check, but my output from "lspci" includes a line with "SMBus"

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device 24: Function 7
01:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c2)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
04:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
04:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
04:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
04:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor (rev 01)
04:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
04:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Sensor Fusion Hub
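
The lspci line only shows that an SMBus controller exists, not what is attached to it. One possible way to check which bus the touchpad sits on is to follow its sysfs entry to the parent adapter and read that adapter's name. The sketch below assumes the usual Linux sysfs layout, where /sys/bus/i2c/devices/i2c-ELAN071B:00 is a symlink into a directory whose parent is the adapter (i2c-N) with a "name" file. The function name is hypothetical, and the adapter name strings depend on the driver, so none are taken from this report.

```python
import os


def adapter_of(device, sysfs_root="/sys/bus/i2c/devices"):
    """Return (adapter directory, adapter name) for an I2C client.

    Returns None if the device entry does not exist under sysfs_root.
    """
    entry = os.path.join(sysfs_root, device)
    if not os.path.exists(entry):
        return None
    # Resolve the symlink; the parent directory of the client is
    # assumed to be the adapter it is attached to.
    parent = os.path.dirname(os.path.realpath(entry))
    try:
        with open(os.path.join(parent, "name")) as handle:
            adapter_name = handle.read().strip()
    except OSError:
        adapter_name = "unknown"
    return os.path.basename(parent), adapter_name
```

For example, print(adapter_of("i2c-ELAN071B:00")) prints the adapter directory and its name, or None if the device is not currently registered. If the adapter name mentions SMBus, the touchpad is on the SMBus controller; otherwise it is on a regular I2C controller.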
Comment 6 coronagraph 2022-01-18 07:10:48 UTC
Also, in the original report I said restarting i2c_hid_acpi usually fixes the issue, but now I think it only works around 20% of the time.
Comment 7 Marco 2022-01-27 20:46:34 UTC
After the latest firmware update for my product, BIOS 305, I haven't been able to reproduce the issue. I'm not sure if it's just luck or what, but both the touchpad and the touchscreen have not stopped working since, even after a lot of sleep/wake cycles, which seemed to be the simplest way to trigger it.

I'll keep it updated if I see this again,

Marco.
Comment 8 coronagraph 2022-01-30 07:47:03 UTC
(In reply to Marco from comment #7)
> After the latest firmware update for my product, BIOS 305, I haven't been
> able to reproduce the issue; I'm not sure if it's just luck of what, but
> both touchpad and touchscreen never stop working, even with a lot of
> sleep/wake cycles, that seems to be the simpler way to trigger it.
> 
> I'll keep it updated if I see this again,
> 
> Marco.

Yeah, that probably confirms that it's an ACPI protocol error: based on what I've read, there's some kind of interaction between the BIOS and ACPI, but the implementation for Linux is buggy because of proprietary Microsoft standards.

This bug report is pretty inactive, so if you know where I could file a bug specifically for pinctrl_amd (since that must still use ACPI even though it's not in the name), that would be great.
Comment 9 Marco 2022-01-30 17:54:49 UTC
(In reply to coronagraph from comment #8)
> ...

The help text for CONFIG_PINCTRL_AMD mentions this:

...
Requires ACPI/FDT device enumeration code to set up a platform device.
...

So, yes, I actually think it's just an ACPI platform bug on our specific boards. The issue is that ACPI is heavily customized by the ODM, so even if this is reported to AMD, I kind of doubt they will be able to help.

Frankly, I'm actually thinking of two options:

1) Pester your ODM (HP) for a firmware update for this issue (and try to apply the latest BIOS update for your platform, if available).
2) Post this under X86-64 platform-specific issues? It's probably the closest category that comes to mind for something like this.

Luckily for me the issue seems to be gone, I hope it will stay that way. The errors from ACPI are still identical even with the new firmware, so I don't really know what changed, just enough to make the hardware work correctly (?). Don't really know.

Hope that you will be able to solve this,

Marco.
Comment 10 coronagraph 2022-01-31 01:51:46 UTC
(In reply to Marco from comment #9)
> (In reply to coronagraph from comment #8)
> > ...
> 
> Taken from the help text from CONFIG_PINCTRL_AMD, this is mentioned
> 
> ...
> Requires ACPI/FDT device enumeration code to set up a platform device.
> ...
> 
> So, yes, I actually think it's just an ACPI platform bug on our specific
> boards. The issue is that ACPI is heavily customized from the ODM, so even
> reporting this to AMD, I kinda doubt they will be able to help.
> 
> Frankly, I'm actually thinking of two options:
> 
> 1) Pester your ODM (HP) for a firmware update for this issue (and try to
> apply the latest BIOS update for your platform, if available).
> 2) Post this under X86-64 platform-specific issues? It's probably the close
> one that it actually comes to mind to me for something like this.
> 
> Luckily for me the issue seems to be gone, I hope it will stay that way. The
> errors from ACPI are still identical even with the new firmware, so I don't
> really know what changed, just enough to make the hardware work correctly
> (?). Don't really know.
> 
> Hope that you will be able to solve this,
> 
> Marco.

Thanks for the advice, but I'm kind of starting to lose hope...
I opened an HP support ticket and obviously just got copy + paste responses, and they wouldn't forward my request to someone higher up. 
I scoured the internet for any kind of email or contact form for HP BIOS developers but found nothing.
The only hope I have left is a ticket I submitted on developers.hp.com, but that seems to be more focused on printers and other consumer software :(