Bug 219440
Description
Michael
2024-10-29 08:43:55 UTC
Please check if 6.12-rc5 is still affected. If it does, please provide a dmesg output from a working kernel and 6.12 (both after a suspend) OK, I need to wait a couple of days until it is available in the nixpkgs testing repo. I'll report back then. -rc4 is good enough, too; latest 6.11.y release might work as well, but better to test 6.12-rc Created attachment 307102 [details]
dmesg after suspend with touchscreen not working on 6.12-rc4
Created attachment 307103 [details]
dmesg after supend with touchscreen working on 6.9.12
6.12-rc4 does not work either. The regression started with 6.10. I have attached two dmesg outputs after the following sequence: boot, login, suspend (Gnome via touchscreen), wake up via keyboard, login again, test touchscreen. One for 6.9.12, where touuchscreen works correctly after suspend, and one for 6.12-rc4 with touchscreen inactive after suspend. I hope that helps. (In reply to Michael from comment #6) > 6.12-rc4 does not work either. Thx. Can I CC you when forwarding this bug by mail? This would expose your address to the public. > I hope that helps. It does, as it shows the problem: i2c_designware i2c_designware.1: controller timed out i2c_designware i2c_designware.1: timeout in disabling adapter i2c_hid_acpi i2c-WDHT1F01:00: failed to change power setting. i2c_hid_acpi i2c-WDHT1F01:00: PM: dpm_run_callback(): acpi_subsys_resume returns -110 i2c_hid_acpi i2c-WDHT1F01:00: PM: failed to resume async: error -110 Happy that it helps. I hope the fix for the issue will be easy then. I get notifications when the bug is updated, so no need to put me anywhere on CC (I guess). Cheers Michael (In reply to Michael from comment #8) > I get notifications when the bug is updated, so no need to put me anywhere > on CC (I guess). If it is updated -- which is unlikely, as a lot (most) developer do not care. Saying this in the hope that you might reconsider. If it helps, put me on CC. I thought it wouldn't make a difference, but if it does, yes. Hmm.. there were only three minor i2c-designware changes for the v6.10 and they won't explain the regression here as far as I can see. Those 'controller timed out errors may indicate the I2C signals SCL and SDA are pulled down and controller is not able to drive them. Bisecting between v6.9 and v6.10 should reveal the regressing commit. Before full bisect I'd try first can some of these i2c-hid changes explain it. I'm purely guessing but at least commits 0eafc58f2194 and 7d6f065de37c seems to bu touching i2c-hid power sequencing. git log --oneline v6.9..v6.10 drivers/hid/i2c-hid/ 0eafc58f2194 HID: i2c-hid: elan: fix reset suspend current leakage c216843ca4cf Merge branch 'for-6.10/i2c-hid' into for-linus d2b34fa81445 HID: i2c-hid: Remove unused label in i2c_hid_set_power 7d6f065de37c HID: i2c-hid: Use address probe to wake on resume ab5ec06a7070 HID: i2c-hid: Retry address probe after delay Short update after some hours of compiling and testing: ab5ec06a7070: still good 7d6f065de37c: bad So, the culprit is 7d6f065de37c. Please report your finding to the patch author and linux-input@vger.kernel.org. (In reply to Jarkko Nikula from comment #13) > Please report your finding to the patch author and > linux-input@vger.kernel.org. let me handle that. (In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #14) > (In reply to Jarkko Nikula from comment #13) > > Please report your finding to the patch author and > > linux-input@vger.kernel.org. > > let me handle that. Thanks! Created attachment 307210 [details]
Weida workaround using a pre-power-on delay
Created attachment 307211 [details]
Weida workaround through resume-specific PWR_ON command retry
I have added two different patches: The first patch applies a delay quirk, waiting a (currently rather long) while before trying to send a command. This would work if the issue is just that Weida devices need time to settle somehow. The second patch effectively reintroduces a retry of the power-on command, but only for the resume path. This one presumably works as well as before the regression. Michael, would you mind testing if these? If the former works, it would also be very useful to know the ballpark the delay needs to be, as the current 1.5 second quirk meant for a Goodix device might be a tad long. Hi Kenny Thanks for the patches, which I tested on 6.12-rc6. Strangely, none of them restored the touchscreen functionality. As this surprised me (I expected the second one to work for sure), I retested the commits Jarkko mentioned, however with the same result as before, so ab5ec06a7070 works, but 7d6f065de37c not. I have no idea, why your second patch doesn't work now. Maybe something else was changed in i2c-hid after that patch, which also disables the correct function. I will get a dmesg from 6.12-rc6 with your patch. Maybe it helps. Hi Kenny Thanks for the patches, which I tested on 6.12-rc6. Strangely, none of them restored the touchscreen functionality. As this surprised me (I expected the second one to work for sure), I retested the commits Jarkko mentioned, however with the same result as before, so ab5ec06a7070 works, but 7d6f065de37c not. I have no idea, why your second patch doesn't work now. Maybe something else was changed in i2c-hid after that patch, which also disables the correct function. I will get a dmesg from 6.12-rc6 with your patch. Maybe it helps. Created attachment 307215 [details]
dmesg after suspend with touchscreen not working on 6.12-rc6 with patch2 from Kenny
Created attachment 307216 [details]
Patch for additional logging to ensure that power retry is active
> I will get a dmesg from 6.12-rc6 with your patch. Maybe it helps.
Hmm. With the second patch, if set_power failed both times I was expecting to see the following message twice in your dmesg:
[ 51.517343] i2c_hid_acpi i2c-WDHT1F01:00: failed to change power setting.
If on the other hand the second attempt had succeeded, I would have expected to see the 60ms delay from i2c_hid_set_power take effect and delay further messages for this device (on top of hopefully succeeding).
Before investigating further, are you sure that the shared dmesg did have the patches take effect? I added a third debug patch that helps verify if the code is active.
(In reply to Kenny Levinsen from comment #23) > > I will get a dmesg from 6.12-rc6 with your patch. Maybe it helps. > > Hmm. With the second patch, if set_power failed both times I was expecting > to see the following message twice in your dmesg: > > [ 51.517343] i2c_hid_acpi i2c-WDHT1F01:00: failed to change power setting. > > If on the other hand the second attempt had succeeded, I would have expected > to see the 60ms delay from i2c_hid_set_power take effect and delay further > messages for this device (on top of hopefully succeeding). > > Before investigating further, are you sure that the shared dmesg did have > the patches take effect? I added a third debug patch that helps verify if > the code is active. Yeah, I was surprised, too. Therefore I not only compiled the module, but also the whole kernel, to be somewhat sure. But having a debug message is better, so I'll give it a try and report back. You can enable additional dynamic debug messages with with: sudo mount -t debugfs none /sys/kernel/debug echo 'file i2c-hid-core.c +p' | sudo tee /sys/kernel/debug/dynamic_debug/control To give a reference, I simulated a failure of the first and second i2c_hid_set_power call on resume with the debug patch and the i2c-hid-core dynamic debug. If first fails, but second succeeds: [ 26.275877] ACPI: PM: Waking up from system sleep state S3 [ 26.339544] ACPI: EC: interrupt unblocked [ 26.361816] ACPI: EC: event unblocked [ 26.364636] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_set_power [ 26.364643] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_xfer: cmd=05 00 00 08 [ 26.365125] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting. [ 26.425548] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting, retrying once. [ 26.425554] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_set_power [ 26.425557] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_xfer: cmd=05 00 00 08 When both fail: [ 194.735681] ACPI: PM: Waking up from system sleep state S3 [ 194.795460] ACPI: EC: interrupt unblocked [ 194.817259] ACPI: EC: event unblocked [ 194.820358] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_set_power [ 194.820367] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_xfer: cmd=05 00 00 08 [ 194.820584] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting. [ 195.434896] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting, retrying once. [ 195.434900] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_set_power [ 195.434902] i2c_hid_acpi i2c-ELAN24EE:00: i2c_hid_xfer: cmd=05 00 00 08 [ 195.435113] i2c_hid_acpi i2c-ELAN24EE:00: failed to change power setting. [ 195.497064] i2c_hid_acpi i2c-ELAN24EE:00: resume failed. [ 195.497070] i2c_hid_acpi i2c-ELAN24EE:00: PM: dpm_run_callback(): acpi_subsys_resume returns -1 [ 195.497080] i2c_hid_acpi i2c-ELAN24EE:00: PM: failed to resume async: error -1 Created attachment 307229 [details]
dmesg after suspend with touchscreen not working on 6.12-rc4 with patch2 and patch 3from Kenny
I only compiled the module this time, as we now have the debug log. Seems that the patch is active, but not working. Maybe there are additional changes to i2c_hid somewhere after 7d6f065de37c that pose additional problems for the correct working on the ASUS? For that to check, I would probably need to test different commits (e.g. the ones that Jarkko mentioned) and apply the patch to all of them. Then we may see, if there is another commit that interferes? I'm open for suggestions. Created attachment 307231 [details]
dmesg after supend with touchscreen working on 6.9.12 (with DEBUG output)
For completeness I added a dmesg with a working touchscreen on 6.9.12 with DEBUG output. Thanks for the logs. Observations: 1. We definitely see the patch active now, and interestingly the second failure is that the bus is still busy. Assuming the Weida device is the only device on that bus, it would mean that the device is hanging for an extended period of time. Hopefully not indefinitely. 2. The 6.9.12 log did *not* rely on the old retry within i2c_hid_set_power at all. The power-on sequence debug looks like this: [ 94.642854] i2c_hid_acpi i2c-WDHT1F01:00: i2c_hid_set_power [ 94.642864] i2c_hid_acpi i2c-WDHT1F01:00: i2c_hid_xfer: cmd=22 00 00 08 (cmd=22 00 01 08 is power off) That his worked on first try on the older kernel would either mean that it is only occasionally a problem (e.g., tied to how the system is woken up, timing, or the alignment of the stars), or that another change has caused the failure which may or may not be solvable with the retry. I don't have quite enough info to draw a conclusion yet. It would certainly be interesting to know if 6.9.12 just *never* needs the retry, and if retry suddenly ends up needed in an older kernel. However, we can at least try two things from within i2c-hid: First, retrying many more times to see if the device lets go of the data line a bit later. Second, removing the initial bus probe to effectively revert all resume behavior changes and see if that's what causes the Weida device to hang. Created attachment 307239 [details]
WIP patch to retry resume power-on multiple times
Created attachment 307240 [details]
WIP add-on patch to also skip bus probe for Weida
Created attachment 307242 [details]
dmesg after suspend with touchscreen not working on 6.12-rc7 with multiple retries
Created attachment 307243 [details]
dmesg after suspend with touchscreen working on 6.12-rc7 with multiple retries and skip bus probe
So, using the multiple retry patch on 6.12-rc7 does not help. Adding the skip bus probe patch restores functionality of the touchscreen. Apparently the bus probe hangs the touchscreen. (Awfully 6.12 seems to have another regression, as the /sys/class/power_supply interface is now missing. This will then be the next investigation I probably need to do as soon as we fixed the touchscreen. :-) ) On one hand I'm glad that we have found the trigger, on the other hand knowing what the trigger is makes me really sad. :( If our I2C drivers had more consistent error reporting we would be able to do away with the bus probe read entirely and always use the power command for wake-up, but wouldn't it be nice if devices would just compliant with even basic I2C... Final patch posted here: https://lore.kernel.org/regressions/20241119235615.23902-1-kl@kl.wtf/T/#u Michael, a confirmation of this patch would on that thread would be nice. Feel free to specify if you want the Reported-by (and later, Tested-by) to be presented differently, name- and email-wise. Thanks for all your testing! Hi Kenny I can confirm that the last patch alone restores touchscreen functionality after suspend. Pity that the devices are so incompliant to specs, but I'm happy we could get it working again. Thanks for all your help and work (also to Jarkko and Thorsten)! (Now I need to see where I report the other regression with some missing power_supply entries (e.g. charge_control_end_threshold). |