Bug 219440

Summary: REGRESSION: Touchscreen stops working after Suspend
Product: Drivers Reporter: Michael (auslands-kv)
Component: Input DevicesAssignee: drivers_input-devices
Status: RESOLVED CODE_FIX    
Severity: high CC: jarkko.nikula, kl
Priority: P3    
Hardware: IA-64   
OS: Linux   
Kernel Version: Subsystem:
Regression: Yes Bisected commit-id: 7d6f065de37c
Attachments: dmesg after suspend with touchscreen not working on 6.12-rc4
dmesg after supend with touchscreen working on 6.9.12
Weida workaround using a pre-power-on delay
Weida workaround through resume-specific PWR_ON command retry
dmesg after suspend with touchscreen not working on 6.12-rc6 with patch2 from Kenny
Patch for additional logging to ensure that power retry is active
dmesg after suspend with touchscreen not working on 6.12-rc4 with patch2 and patch 3from Kenny
dmesg after supend with touchscreen working on 6.9.12 (with DEBUG output)
WIP patch to retry resume power-on multiple times
WIP add-on patch to also skip bus probe for Weida
dmesg after suspend with touchscreen not working on 6.12-rc7 with multiple retries
dmesg after suspend with touchscreen working on 6.12-rc7 with multiple retries and skip bus probe

Description Michael 2024-10-29 08:43:55 UTC
Just noticed that the touchscreen on my ASUS vivobook S14 stops working after a suspend-to-idle. As this is something, I clearly didn't have before, I tested every kernel version released in the last six months and found the kernel, where the bug was introduced: 6.10. The last 6.9.12 is still working correctly. Since 6.10 all kernel versions have the problem.

Some more info:

Hardware: ASUS Vivobook S14 (TP3402VA)
Kernel working: up to 6.9.12
Kernel defect: from 6.10
OS: nixos

I do not have much knowledge about the input devices. I tested that i2c_hid_acpi seems to be relevant for the touchscreen (and also the touchpad), as, when I remove it, both stop working. Reloading the kernel module restores functionality (but NOT after a suspend-to-idle!!!). Otherwise, I do not see any error messages or so. (Or do not recognize them...)

Any help I can offer to identify the regression bug?
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-10-31 12:48:33 UTC
Please check if 6.12-rc5 is still affected. If it does, please provide a dmesg output from a working kernel and 6.12 (both after a suspend)
Comment 2 Michael 2024-10-31 16:24:05 UTC
OK, I need to wait a couple of days until it is available in the nixpkgs testing repo. I'll report back then.
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-10-31 16:43:46 UTC
-rc4 is good enough, too; latest 6.11.y release might work as well, but better to test 6.12-rc
Comment 4 Michael 2024-10-31 17:00:59 UTC
Created attachment 307102 [details]
dmesg after suspend with touchscreen not working on 6.12-rc4
Comment 5 Michael 2024-10-31 17:05:39 UTC
Created attachment 307103 [details]
dmesg after supend with touchscreen working on 6.9.12
Comment 6 Michael 2024-10-31 17:32:05 UTC
6.12-rc4 does not work either. The regression started with 6.10. I have attached  two dmesg outputs after the following sequence: boot, login, suspend (Gnome via touchscreen), wake up via keyboard, login again, test touchscreen.

One for 6.9.12, where touuchscreen works correctly after suspend, and one for 6.12-rc4 with touchscreen inactive after suspend.

I hope that helps.
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-11-01 08:45:07 UTC
(In reply to Michael from comment #6)
> 6.12-rc4 does not work either.

Thx. Can I CC you when forwarding this bug by mail? This would expose your address to the public.

> I hope that helps.

It does, as it shows the problem:

i2c_designware i2c_designware.1: controller timed out
i2c_designware i2c_designware.1: timeout in disabling adapter
i2c_hid_acpi i2c-WDHT1F01:00: failed to change power setting.
i2c_hid_acpi i2c-WDHT1F01:00: PM: dpm_run_callback(): acpi_subsys_resume returns -110
i2c_hid_acpi i2c-WDHT1F01:00: PM: failed to resume async: error -110
Comment 8 Michael 2024-11-04 06:47:16 UTC
Happy that it helps. I hope the fix for the issue will be easy then.

I get notifications when the bug is updated, so no need to put me anywhere on CC (I guess).

Cheers

Michael
Comment 9 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-11-04 07:02:34 UTC
(In reply to Michael from comment #8)
> I get notifications when the bug is updated, so no need to put me anywhere
> on CC (I guess).

If it is updated -- which is unlikely, as a lot (most) developer do not care. Saying this in the hope that you might reconsider.
Comment 10 Michael 2024-11-04 08:17:34 UTC
If it helps, put me on CC. I thought it wouldn't make a difference, but if it does, yes.
Comment 11 Jarkko Nikula 2024-11-06 10:28:22 UTC
Hmm.. there were only three minor i2c-designware changes for the v6.10 and they won't explain the regression here as far as I can see. Those 'controller timed out errors may indicate the I2C signals SCL and SDA are pulled down and controller is not able to drive them.

Bisecting between v6.9 and v6.10 should reveal the regressing commit.

Before full bisect I'd try first can some of these i2c-hid changes explain it. I'm purely guessing but at least commits 0eafc58f2194 and 7d6f065de37c seems to bu touching i2c-hid power sequencing.

git log --oneline  v6.9..v6.10 drivers/hid/i2c-hid/

0eafc58f2194 HID: i2c-hid: elan: fix reset suspend current leakage
c216843ca4cf Merge branch 'for-6.10/i2c-hid' into for-linus
d2b34fa81445 HID: i2c-hid: Remove unused label in i2c_hid_set_power
7d6f065de37c HID: i2c-hid: Use address probe to wake on resume
ab5ec06a7070 HID: i2c-hid: Retry address probe after delay
Comment 12 Michael 2024-11-06 16:06:25 UTC
Short update after some hours of compiling and testing:

ab5ec06a7070: still good
7d6f065de37c: bad

So, the culprit is 7d6f065de37c.
Comment 13 Jarkko Nikula 2024-11-08 07:52:34 UTC
Please report your finding to the patch author and linux-input@vger.kernel.org.
Comment 14 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-11-08 08:08:54 UTC
(In reply to Jarkko Nikula from comment #13)
> Please report your finding to the patch author and
> linux-input@vger.kernel.org.

let me handle that.
Comment 15 Michael 2024-11-10 16:54:22 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #14)
> (In reply to Jarkko Nikula from comment #13)
> > Please report your finding to the patch author and
> > linux-input@vger.kernel.org.
> 
> let me handle that.

Thanks!
Comment 16 Kenny Levinsen 2024-11-12 23:27:08 UTC
Created attachment 307210 [details]
Weida workaround using a pre-power-on delay
Comment 17 Kenny Levinsen 2024-11-12 23:27:59 UTC
Created attachment 307211 [details]
Weida workaround through resume-specific PWR_ON command retry
Comment 18 Kenny Levinsen 2024-11-12 23:38:52 UTC
I have added two different patches:

The first patch applies a delay quirk, waiting a (currently rather long) while before trying to send a command. This would work if the issue is just that Weida devices need time to settle somehow.

The second patch effectively reintroduces a retry of the power-on command, but only for the resume path. This one presumably works as well as before the regression.

Michael, would you mind testing if these? If the former works, it would also be very useful to know the ballpark the delay needs to be, as the current 1.5 second quirk meant for a Goodix device might be a tad long.
Comment 19 Michael 2024-11-13 12:00:58 UTC
Hi Kenny

Thanks for the patches, which I tested on 6.12-rc6. Strangely, none of them restored the touchscreen functionality.

As this surprised me (I expected the second one to work for sure), I retested the commits Jarkko mentioned, however with the same result as before, so ab5ec06a7070 works, but 7d6f065de37c not.

I have no idea, why your second patch doesn't work now. Maybe something else was changed in i2c-hid after that patch, which also disables the correct function.

I will get a dmesg from 6.12-rc6 with your patch. Maybe it helps.
Comment 20 Michael 2024-11-13 12:05:33 UTC
Hi Kenny

Thanks for the patches, which I tested on 6.12-rc6. Strangely, none of them restored the touchscreen functionality.

As this surprised me (I expected the second one to work for sure), I retested the commits Jarkko mentioned, however with the same result as before, so ab5ec06a7070 works, but 7d6f065de37c not.

I have no idea, why your second patch doesn't work now. Maybe something else was changed in i2c-hid after that patch, which also disables the correct function.

I will get a dmesg from 6.12-rc6 with your patch. Maybe it helps.
Comment 21 Michael 2024-11-13 12:06:14 UTC
Created attachment 307215 [details]
dmesg after suspend with touchscreen not working on 6.12-rc6 with patch2 from Kenny
Comment 22 Kenny Levinsen 2024-11-13 13:02:45 UTC
Created attachment 307216 [details]
Patch for additional logging to ensure that power retry is active
Comment 23 Kenny Levinsen 2024-11-13 13:04:19 UTC
> I will get a dmesg from 6.12-rc6 with your patch. Maybe it helps.

Hmm. With the second patch, if set_power failed both times I was expecting to see the following message twice in your dmesg:

[   51.517343] i2c_hid_acpi i2c-WDHT1F01:00: failed to change power setting.

If on the other hand the second attempt had succeeded, I would have expected to see the 60ms delay from i2c_hid_set_power take effect and delay further messages for this device (on top of hopefully succeeding).

Before investigating further, are you sure that the shared dmesg did have the patches take effect? I added a third debug patch that helps verify if the code is active.
Comment 24 Michael 2024-11-18 12:28:42 UTC
(In reply to Kenny Levinsen from comment #23)
> > I will get a dmesg from 6.12-rc6 with your patch. Maybe it helps.
> 
> Hmm. With the second patch, if set_power failed both times I was expecting
> to see the following message twice in your dmesg:
> 
> [   51.517343] i2c_hid_acpi i2c-WDHT1F01:00: failed to change power setting.
> 
> If on the other hand the second attempt had succeeded, I would have expected
> to see the 60ms delay from i2c_hid_set_power take effect and delay further
> messages for this device (on top of hopefully succeeding).
> 
> Before investigating further, are you sure that the shared dmesg did have
> the patches take effect? I added a third debug patch that helps verify if
> the code is active.

Yeah, I was surprised, too. Therefore I not only compiled the module, but also the whole kernel, to be somewhat sure. But having a debug message is better, so I'll give it a try and report back.
Comment 25 Kenny Levinsen 2024-11-18 15:22:23 UTC
You can enable additional dynamic debug messages with with:

sudo mount -t debugfs none /sys/kernel/debug
echo 'file i2c-hid-core.c +p' | sudo tee /sys/kernel/debug/dynamic_debug/control


To give a reference, I simulated a failure of the first and second i2c_hid_set_power call on resume with the debug patch and the i2c-hid-core dynamic debug. If first fails, but second succeeds:

[   26.275877] ACPI: PM: Waking up from system sleep state S3
[   26.339544] ACPI: EC: interrupt unblocked
[   26.361816] ACPI: EC: event unblocked
[   26.364636] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_set_power
[   26.364643] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_xfer: cmd=05 00 00 08
[   26.365125] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting.
[   26.425548] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting, retrying once.
[   26.425554] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_set_power
[   26.425557] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_xfer: cmd=05 00 00 08


When both fail:

[  194.735681] ACPI: PM: Waking up from system sleep state S3
[  194.795460] ACPI: EC: interrupt unblocked
[  194.817259] ACPI: EC: event unblocked
[  194.820358] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_set_power
[  194.820367] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_xfer: cmd=05 00 00 08
[  194.820584] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting.
[  195.434896] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting, retrying once.
[  195.434900] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_set_power
[  195.434902] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_xfer: cmd=05 00 00 08
[  195.435113] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting.
[  195.497064] i2c_hid_acpi i2c-ELAN24EE:00: resume failed.
[  195.497070] i2c_hid_acpi i2c-ELAN24EE:00: PM: dpm_run_callback(): acpi_subsys_resume returns -1
[  195.497080] i2c_hid_acpi i2c-ELAN24EE:00: PM: failed to resume async: error -1
Comment 26 Michael 2024-11-19 07:04:03 UTC
Created attachment 307229 [details]
dmesg after suspend with touchscreen not working on 6.12-rc4 with patch2 and patch 3from Kenny
Comment 27 Michael 2024-11-19 07:10:29 UTC
I only compiled the module this time, as we now have the debug log. Seems that the patch is active, but not working.

Maybe there are additional changes to i2c_hid somewhere after 7d6f065de37c that pose additional problems for the correct working on the ASUS?

For that to check, I would probably need to test different commits (e.g. the ones that Jarkko mentioned) and apply the patch to all of them. Then we may see, if there is another commit that interferes?

I'm open for suggestions.
Comment 28 Michael 2024-11-19 07:14:51 UTC
Created attachment 307231 [details]
dmesg after supend with touchscreen working on 6.9.12 (with DEBUG output)
Comment 29 Michael 2024-11-19 07:15:42 UTC
For completeness I added a dmesg with a working touchscreen on 6.9.12 with DEBUG output.
Comment 30 Kenny Levinsen 2024-11-19 12:33:29 UTC
Thanks for the logs. Observations:

1. We definitely see the patch active now, and interestingly the second failure is that the bus is still busy. Assuming the Weida device is the only device on that bus, it would mean that the device is hanging for an extended period of time. Hopefully not indefinitely.

2. The 6.9.12 log did *not* rely on the old retry within i2c_hid_set_power at all.

The power-on sequence debug looks like this:

[   94.642854] i2c_hid_acpi i2c-WDHT1F01:00: i2c_hid_set_power
[   94.642864] i2c_hid_acpi i2c-WDHT1F01:00: i2c_hid_xfer: cmd=22 00 00 08

(cmd=22 00 01 08 is power off)

That his worked on first try on the older kernel would either mean that it is only occasionally a problem (e.g., tied to how the system is woken up, timing, or the alignment of the stars), or that another change has caused the failure which may or may not be solvable with the retry.

I don't have quite enough info to draw a conclusion yet. It would certainly be interesting to know if 6.9.12 just *never* needs the retry, and if retry suddenly ends up needed in an older kernel.

However, we can at least try two things from within i2c-hid: First, retrying many more times to see if the device lets go of the data line a bit later. Second, removing the initial bus probe to effectively revert all resume behavior changes and see if that's what causes the Weida device to hang.
Comment 31 Kenny Levinsen 2024-11-19 12:34:18 UTC
Created attachment 307239 [details]
WIP patch to retry resume power-on multiple times
Comment 32 Kenny Levinsen 2024-11-19 12:35:01 UTC
Created attachment 307240 [details]
WIP add-on patch to also skip bus probe for Weida
Comment 33 Michael 2024-11-19 16:26:25 UTC
Created attachment 307242 [details]
dmesg after suspend with touchscreen not working on 6.12-rc7 with multiple retries
Comment 34 Michael 2024-11-19 16:27:07 UTC
Created attachment 307243 [details]
dmesg after suspend with touchscreen working on 6.12-rc7 with multiple retries and skip bus probe
Comment 35 Michael 2024-11-19 16:29:16 UTC
So, using the multiple retry patch on 6.12-rc7 does not help. Adding the skip bus probe patch restores functionality of the touchscreen. Apparently the bus probe hangs the touchscreen.
Comment 36 Michael 2024-11-19 16:37:25 UTC
(Awfully 6.12 seems to have another regression, as the /sys/class/power_supply interface is now missing. This will then be the next investigation I probably need to do as soon as we fixed the touchscreen. :-) )
Comment 37 Kenny Levinsen 2024-11-19 20:39:37 UTC
On one hand I'm glad that we have found the trigger, on the other hand knowing what the trigger is makes me really sad. :(

If our I2C drivers had more consistent error reporting we would be able to do away with the bus probe read entirely and always use the power command for wake-up, but wouldn't it be nice if devices would just compliant with even basic I2C...
Comment 38 Kenny Levinsen 2024-11-19 23:59:58 UTC
Final patch posted here: https://lore.kernel.org/regressions/20241119235615.23902-1-kl@kl.wtf/T/#u

Michael, a confirmation of this patch would on that thread would be nice. Feel free to specify if you want the Reported-by (and later, Tested-by) to be presented differently, name- and email-wise.

Thanks for all your testing!
Comment 39 Michael 2024-11-20 15:45:35 UTC
Hi Kenny

I can confirm that the last patch alone restores touchscreen functionality after suspend. Pity that the devices are so incompliant to specs, but I'm happy we could get it working again.

Thanks for all your help and work (also to Jarkko and Thorsten)!

(Now I need to see where I report the other regression with some missing power_supply entries (e.g. charge_control_end_threshold).