Bug 199821 - Initialization issues with some i2c touchpads related to PM QoS
Summary: Initialization issues with some i2c touchpads related to PM QoS
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: Run-Time-PM (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-24 08:41 UTC by Hans de Goede
Modified: 2018-10-30 10:23 UTC (History)
8 users (show)

See Also:
Kernel Version: 4.16
Tree: Mainline
Regression: No


Attachments
Trimmed patch to fix the touchpad issue (12.25 KB, patch)
2018-05-24 17:20 UTC, Rafael J. Wysocki
Details | Diff
Trimmed patch to fix the touchpad issue (6.46 KB, patch)
2018-05-24 17:22 UTC, Rafael J. Wysocki
Details | Diff
fix PM_QOS_RESUME_LATENCY_NO_CONSTRAINT value (729 bytes, patch)
2018-05-24 22:02 UTC, Olivier Fock Ping
Details | Diff
PM / QoS: Device resume latency debug patch (1.87 KB, patch)
2018-05-25 09:11 UTC, Rafael J. Wysocki
Details | Diff
PM / QoS: Device resume latency debug patch (1.92 KB, patch)
2018-05-25 09:15 UTC, Rafael J. Wysocki
Details | Diff
PM / QoS: Device resume latency debug patch (v2) (2.42 KB, patch)
2018-05-28 09:46 UTC, Rafael J. Wysocki
Details | Diff
dmesg_after_patch_attachment_276235 (63.73 KB, text/plain)
2018-06-07 05:20 UTC, Arliton Rocha
Details
dmesg_touchpad_ok_after_bootup (125.87 KB, text/plain)
2018-06-09 05:35 UTC, Arliton Rocha
Details
dmesg_touchpad_bug_after_bootup (92.44 KB, text/plain)
2018-06-09 05:36 UTC, Arliton Rocha
Details
combined patch for Yepo (2.69 KB, patch)
2018-07-31 12:52 UTC, Philip Müller
Details | Diff
Yepo 373a dmesg (59.34 KB, text/plain)
2018-07-31 12:56 UTC, Philip Müller
Details
pm-qos patch on top of cf7eeea and 0fc784f (612 bytes, patch)
2018-08-03 05:38 UTC, Philip Müller
Details | Diff
Desc-Dump Application (1.13 KB, text/plain)
2018-08-05 09:28 UTC, Philip Müller
Details

Description Hans de Goede 2018-05-24 08:41:50 UTC
There are various users reporting of the touchpad not working on various models Apollo Lake based laptops:

https://bugs.launchpad.net/bugs/1728244

Olivier Fock Ping, has been doing some great detective work on this lately.

There were reports that 4.14-rc7 works where as 4.14-rc8 does not.

Olivier has pinned the difference down to 4.14-rc7 having an earlier version of the "PM / QoS: Fix device resume latency PM QoS"
patch:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=0cc2b4e5a020fc7f4d1795741c116c983e9467d7

In 4.14-rc8 that got dropped and later a different version got
merged:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=0759e80b84e34a84e7e46e2b1adb528c83d84a47

If Olivier reverts the later version and then re-applies the older
version on top of a 4.16 kernel everything starts to work, the resulting diff from this revert + cherry-pick old (broken) version is here:
https://launchpadlibrarian.net/371428658/kernel-4.16.10-thomson-x6.patch

Note I'm not suggesting this is the right way to fix this, but this
is the closest we've come to finding a solution for the Apollo
Lake based laptops touchpad problems in months.

###

This has already been discussed on the list, this might be related to the pm_qos handling in drivers/mfd/intel_lpss.c, as the touchpads on the laptop models in question are all connected to the i2c-designware controller which is part of the LPSS block.

Things tried so far are, without them helping:

1) Commenting out the intel_lpss_ltr_expose() call in drivers/mfd/intel-lpss.c:
intel_lpss_probe() (around line 424).

2) Comparing the output of:
cat /sys/kernel/debug/intel_lpss/0000\:00\:16.1/*
Between working and non working kernels results in the same output:

> root@debian:/home/pi# cat /sys/kernel/debug/intel_lpss/0000\:00\:16.1/*
> 0x00000800
> 0x00000601
> 0x00000800
Comment 1 Rafael J. Wysocki 2018-05-24 08:49:35 UTC
So I'm assuming that

https://launchpadlibrarian.net/371428658/kernel-4.16.10-thomson-x6.patch

fixes the touchpad problem on top of the current mainline.  Is that correct?
Comment 2 Hans de Goede 2018-05-24 08:52:00 UTC
(In reply to Rafael J. Wysocki from comment #1)
> So I'm assuming that
> 
> https://launchpadlibrarian.net/371428658/kernel-4.16.10-thomson-x6.patch
> 
> fixes the touchpad problem on top of the current mainline.  Is that correct?

Yes, this has been confirmed on 2 different Apollo Lake models now, and a test Ubuntu kernel has been build with that patch, so hopefully we will get results from other reporters / models too. TL;DR: so far that patch on top of mainline fixes things for 2 out of 2 models tested.
Comment 3 Rafael J. Wysocki 2018-05-24 17:20:45 UTC
Created attachment 276167 [details]
Trimmed patch to fix the touchpad issue

There is a lot of irrelevant stuff in that patch, though, so let's drop some of it and see if it still works.
Comment 4 Rafael J. Wysocki 2018-05-24 17:22:41 UTC
Created attachment 276169 [details]
Trimmed patch to fix the touchpad issue

Actually trim it this time.
Comment 5 Rafael J. Wysocki 2018-05-24 17:48:45 UTC
I don't think that the latency tolerance PM QoS has anything to do with it as it is not touched by the changes in the patch.

Most likely, changing PM_QOS_RESUME_LATENCY_DEFAULT_VALUE from 0 to PM_QOS_LATENCY_ANY is the source of the problem.
Comment 6 Rafael J. Wysocki 2018-05-24 18:16:56 UTC
That said, we first need to figure out what's the effect of device PM QoS on the affected systems.
Comment 7 Olivier Fock Ping 2018-05-24 22:00:01 UTC
Rafael, you're right.

I found the line which causes troubles.

In the file "include/linux/pm_qos.h", if you set the PM_QOS_RESUME_LATENCY_NO_CONSTRAINT constant to value 0, then it works.

I used a vanilla 4.16.10 kernel.
Comment 8 Olivier Fock Ping 2018-05-24 22:02:28 UTC
Created attachment 276175 [details]
fix PM_QOS_RESUME_LATENCY_NO_CONSTRAINT value

fix PM_QOS_RESUME_LATENCY_NO_CONSTRAINT value
Comment 9 Rafael J. Wysocki 2018-05-25 08:09:44 UTC
You cannot do that, though, as it will break other things.
Comment 10 Rafael J. Wysocki 2018-05-25 09:10:24 UTC
PM_QOS_RESUME_LATENCY_NO_CONSTRAINT is used just in a few places and some of them don't matter at all.  Let's try to figure out which one of them is at fault here.
Comment 11 Rafael J. Wysocki 2018-05-25 09:11:53 UTC
Created attachment 276181 [details]
PM / QoS: Device resume latency debug patch

Please apply this one and see if it still makes the problem go away.
Comment 12 Rafael J. Wysocki 2018-05-25 09:15:50 UTC
Created attachment 276183 [details]
PM / QoS: Device resume latency debug patch

Sorry, actually, please apply this one instead and report back.
Comment 13 Rafael J. Wysocki 2018-05-25 09:40:26 UTC
Well, I think I know what the problem is.

rpm_check_suspend_allowed() calls __dev_pm_qos_read_value(dev) and returns -EPERM if it is equal to 0.

With the patch in comment #8, __dev_pm_qos_read_value() will return 0 for non-existing dev->power.qos, as the dev_pm_qos_raw_read_value() return value, and that will effectively disable runtime PM for devices not having it.

So the reason why things work with this patch is because runtime PM is then effectively disabled for the devices in question.

It should be easy enough to confirm this theory by trying to disable runtime PM for the affected devices (eg. via the "control" attribute in sysfs), so please try to do that.
Comment 14 Rafael J. Wysocki 2018-05-25 10:15:48 UTC
Alternatively, you can replace PM_QOS_RESUME_LATENCY_NO_CONSTRAINT in dev_pm_qos_raw_read_value() with 0 and see if that makes the touchpad problem go away then (but of course, that will effectively disable runtime PM for many other devices too, so it is a no-go in general).
Comment 15 Olivier Fock Ping 2018-05-26 07:22:51 UTC
#comment 13 : 
I disabled runtime PM with this command line (+ kernel without modifications) :
echo 'on' > '/sys/bus/i2c/devices/i2c-SYNA3602:00/power/control'

Touchpad is still not working.

I point out, maybe it's important, that the touchpad works after a suspend/resume (and PM is enabled) but does not work only after a shutdown/startup.

#comment 14 :
Replacing PM_QOS_RESUME_LATENCY_NO_CONSTRAINT with 0 in dev_pm_qos_raw_read_value(), sorry, touchpad is still not working
Comment 16 Olivier Fock Ping 2018-05-26 07:34:38 UTC
#comment 12 :
I applied the trimmed patch, there are 2 files that have dependency issues : drivers/base/power/domain_governor.c and drivers/base/power/domain.c. 


I added these 2 lines, but I'm not sure if it's correct : 
#define PM_QOS_LATENCY_ANY_NS	((s64)PM_QOS_LATENCY_ANY * NSEC_PER_USEC)
#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS	PM_QOS_LATENCY_ANY_


So, the touchpad works.
Comment 17 Rafael J. Wysocki 2018-05-27 09:04:15 UTC
Let's focus on your finding from comment #8 as that is key.

You said that the patch in comment #8 fixed the problem.  I assume that this was on top of the current mainline.

At the same time, you are saying that replacing PM_QOS_RESUME_LATENCY_NO_CONSTRAINT with 0 in dev_pm_qos_raw_read_value() alone doesn't help.

In that case please test the patch from comment #12.
Comment 18 Rafael J. Wysocki 2018-05-27 09:15:04 UTC
In any case, that's just for debugging.  PM_QOS_RESUME_LATENCY_NO_CONSTRAINT must be set the way it it for the device PM QoS to work, but we need to find out why changing it to 0 magically fixed the touchpad issue for you.

The fact that it works after a suspend-resume cycle strongly indicates that this is related to power management somehow.
Comment 19 Rafael J. Wysocki 2018-05-27 09:27:09 UTC
I would like to know one more thing.

If after the

# echo 'on' > '/sys/bus/i2c/devices/i2c-SYNA3602:00/power/control'

test in comment #13 you do

# echo 'auto' > '/sys/bus/i2c/devices/i2c-SYNA3602:00/power/control'

does this make any difference?
Comment 20 Hans de Goede 2018-05-27 14:13:12 UTC
Hi,

(In reply to Olivier Fock Ping from comment #15)
> #comment 13 : 
> I disabled runtime PM with this command line (+ kernel without
> modifications) :
> echo 'on' > '/sys/bus/i2c/devices/i2c-SYNA3602:00/power/control'

I missed this before, but you probably want to (also) try disabling runtime-pm on the i2c-controller, not just on the i2c-client/device.

ls -l /sys/bus/i2c/devices/i2c-SYNA3602:00 should show you that the touchpad is on controller i2c-# then do:

echo 'on' > '/sys/bus/i2c/devices/i2c-#/device/power/control'

Regards,

Hans
Comment 21 Olivier Fock Ping 2018-05-27 19:13:24 UTC
#comment 19:
no difference

#comment 20:
no difference

#comment 17:
> You said that the patch in comment #8 fixed the problem.  I assume that this
> was on top of the current mainline.
Yes, that's it

> In that case please test the patch from comment #12.
I already reported the result, see comment #16.
With the patch in comment #12, the problem is resolved. Touchpad is fully working at bootup.

Note, I need to add 2 #define to compile 
#define PM_QOS_LATENCY_ANY_NS	((s64)PM_QOS_LATENCY_ANY * NSEC_PER_USEC)
#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS	PM_QOS_LATENCY_ANY_
Comment 22 Rafael J. Wysocki 2018-05-28 08:21:10 UTC
(In reply to Olivier Fock Ping from comment #21)
> #comment 17:
> > You said that the patch in comment #8 fixed the problem.  I assume that
> this
> > was on top of the current mainline.
> Yes, that's it

OK

> > In that case please test the patch from comment #12.
> I already reported the result, see comment #16.

OK (It wasn't clear to me that you were talking about the patch from comment #12, because it is not "trimmed", it is just a different patch).

> With the patch in comment #12, the problem is resolved. Touchpad is fully
> working at bootup.
> 
> Note, I need to add 2 #define to compile 
> #define PM_QOS_LATENCY_ANY_NS ((s64)PM_QOS_LATENCY_ANY * NSEC_PER_USEC)
> #define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT_NS        PM_QOS_LATENCY_ANY_NS

I wonder why.  The patch from comment #12 doesn't modify or remove any symbol definitions.  In fact, it only replaces symbols with zeros in a couple of places.
Comment 23 Rafael J. Wysocki 2018-05-28 08:23:22 UTC
So actually I don't think you really tested the patch from comment #12, that is attachment #276183 [details], and so please test it.
Comment 24 Rafael J. Wysocki 2018-05-28 09:15:40 UTC
Overall, it looks like an initialization problem somewhere and apparently some initialization is skipped if PM_QOS_RESUME_LATENCY_NO_CONSTRAINT is not 0.

Since you have verified that replacing PM_QOS_RESUME_LATENCY_NO_CONSTRAINT with 0 in dev_pm_qos_raw_read_value() alone doesn't help, the problem is pm_qos_read_value() returning a nonzero value for the affected device.

What probably happens is that the device has the dev->power.qos pointer set, but the target_value in there is equal to PM_QOS_RESUME_LATENCY_NO_CONSTRAINT for some reason and if that is 0, then runtime PM is effectively disabled at init time and the device gets initialized properly then.

Unfortunately, though, I cannot explain how target_value can become equal to PM_QOS_RESUME_LATENCY_NO_CONSTRAINT as the latter is only used for initializing c->no_constraint_value in dev_pm_qos_constraints_allocate() and the no_constraint_value is only used in pm_qos_get_value() for an empty constraints list.  But pm_qos_get_value() is local and effectively only used by pm_qos_update_target().
Comment 25 Rafael J. Wysocki 2018-05-28 09:32:25 UTC
It might happen if the last resume latency PM QoS request was removed for a device without removing its dev->power.qos object, because in that case the request list in pm_qos_update_target() would be empty before calling pm_qos_get_value() and that would return PM_QOS_RESUME_LATENCY_NO_CONSTRAINT as the no_constraint_value for the class.
Comment 26 Rafael J. Wysocki 2018-05-28 09:38:09 UTC
And it looks like the only place that can happen is in st1232_ts_irq_handler() where the request is removed for count == 0.
Comment 27 Rafael J. Wysocki 2018-05-28 09:46:07 UTC
Created attachment 276235 [details]
PM / QoS: Device resume latency debug patch (v2)

Let's find out if the no_constraint_value for resume latency is used and where.

Please apply this patch instead of the patch from comment #12 and report back.
Comment 28 Rafael J. Wysocki 2018-05-28 10:02:34 UTC
If you see stack dunps triggered by the WARN() added by the patch in comment #27, please attach dmesg output containing them.
Comment 29 Olivier Fock Ping 2018-06-02 16:20:07 UTC
Comment #27:

The touchpad works with the patch (attachment 276235 [details]) at bootup (and after sleep / resume).

In kern.log file :

kern.log:Jun  2 18:08:06 debian kernel: [    0.060264] WARNING: CPU: 0 PID: 1 at kernel/power/qos.c:155 pm_qos_update_target+0x20e/0x220
kern.log:Jun  2 18:08:06 debian kernel: [    0.060560] WARNING: CPU: 0 PID: 1 at kernel/power/qos.c:155 pm_qos_update_target+0x20e/0x220
kern.log:Jun  2 18:08:06 debian kernel: [    3.398933] WARNING: CPU: 1 PID: 100 at kernel/power/qos.c:155 pm_qos_update_target+0x20e/0x220
Comment 30 Olivier Fock Ping 2018-06-02 18:31:02 UTC
Please forget my previous reports. I made a mistake applying the patches.
Comment 31 Olivier Fock Ping 2018-06-02 19:13:59 UTC
Sorry, I was afraid I did not use the modified kernel and finally there is no problem.


So, I canceled all modifications and I found that only one line "c-> no_constraint_value = 0;" in drivers / base / power / qos.c file is relevant.

This change is enough for everything to work.
Comment 32 Rafael J. Wysocki 2018-06-06 09:49:31 UTC
(In reply to Olivier Fock Ping from comment #29)
> Comment #27:
> 
> The touchpad works with the patch (attachment 276235 [details]) at bootup
> (and after sleep / resume).

OK

> In kern.log file :
> 
> kern.log:Jun  2 18:08:06 debian kernel: [    0.060264] WARNING: CPU: 0 PID:
> 1 at kernel/power/qos.c:155 pm_qos_update_target+0x20e/0x220
> kern.log:Jun  2 18:08:06 debian kernel: [    0.060560] WARNING: CPU: 0 PID:
> 1 at kernel/power/qos.c:155 pm_qos_update_target+0x20e/0x220
> kern.log:Jun  2 18:08:06 debian kernel: [    3.398933] WARNING: CPU: 1 PID:
> 100 at kernel/power/qos.c:155 pm_qos_update_target+0x20e/0x220

There should be a stack trace below each of these, aren't there any?

Please attach full dmesg output containing the above lines as requested previously.
Comment 33 Arliton Rocha 2018-06-07 05:20:09 UTC
Created attachment 276365 [details]
dmesg_after_patch_attachment_276235
Comment 34 Arliton Rocha 2018-06-07 05:24:58 UTC
(In reply to Arliton Rocha from comment #33)
> Created attachment 276365 [details]
> dmesg_after_patch_attachment_276235

Hi Rafael, I'm following this topic... I'm providing the requested dmesg.
Proposed patch (attachment 276235 [details]) applied to kernel version 4.16.14:
https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.14.tar.xz

Sadly doesn't work for me.
In fact, works, but at some moment (turn off and waiting for 10 minutes or more),
after turn on the touchpad doesn't respond. Certainly happen when booting up with
power supply from battery.
When the touchpad works, I can reboot any times I want and all will be ok.
But after turn off, waiting, and turn on again... :(

When doesn't work, if I do:
  # echo 'on' > /sys/bus/i2c/devices/i2c-SYNA3602\:00/power/control
  # rmmod i2c_hid
  # modprobe i2c_hid
Works! Not always, but works.
For example when not works, I could see in dmesg, after the commands:
   ... unexpected HID descriptor bcdVersion (0xff)
   or
   ... can't add hid device

Seems that the device is not properly powering on at bootup.

I see messages saying: "failed to change power state".
This message always appears, even when the touchpad works.
I was trying to understand some parts in the code (drivers/hid/i2c-hid/i2c_hid.c)... 
In these conditions eventually the reset command will fail and the power will be set to PWR_SLEEP...
Functions i2c_hid_set_power and i2c_hid_hwreset.
The all mess follow...

I did some tries with the code (function i2c_hid_set_power), like two attemps in sequence to wakeup...:

    int retry;

    if (power_state == I2C_HID_PWR_ON) {
        for (retry = 0; retry < 2; retry++) {
            ret = __i2c_hid_command(client, &hid_set_power_cmd,
                power_state, 0, NULL, 0, NULL, 0);

            if (!ret)
                goto set_pwr_exit;
        }
    }

Doesn't work. :)

Well, I can help providing more data and I can apply patches/compile.
Thanks in advance.
Comment 35 Rafael J. Wysocki 2018-06-07 08:37:40 UTC
(In reply to Arliton Rocha from comment #34)
> (In reply to Arliton Rocha from comment #33)
> > Created attachment 276365 [details]
> > dmesg_after_patch_attachment_276235
> 
> Hi Rafael, I'm following this topic... I'm providing the requested dmesg.
> Proposed patch (attachment 276235 [details]) applied to kernel version
> 4.16.14:
> https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.14.tar.xz
> 
> Sadly doesn't work for me.
> In fact, works, but at some moment (turn off and waiting for 10 minutes or
> more),
> after turn on the touchpad doesn't respond. Certainly happen when booting up
> with
> power supply from battery.
> When the touchpad works, I can reboot any times I want and all will be ok.
> But after turn off, waiting, and turn on again... :(
> 
> When doesn't work, if I do:
>   # echo 'on' > /sys/bus/i2c/devices/i2c-SYNA3602\:00/power/control
>   # rmmod i2c_hid
>   # modprobe i2c_hid
> Works! Not always, but works.
> For example when not works, I could see in dmesg, after the commands:
>    ... unexpected HID descriptor bcdVersion (0xff)
>    or
>    ... can't add hid device
> 
> Seems that the device is not properly powering on at bootup.

That's exactly right and if runtime PM is disabled for it to start with, it will work most of the time.

> I see messages saying: "failed to change power state".

That is suspicious.  Do you have dmesg output containing that?

> This message always appears, even when the touchpad works.
> I was trying to understand some parts in the code
> (drivers/hid/i2c-hid/i2c_hid.c)... 
> In these conditions eventually the reset command will fail and the power
> will be set to PWR_SLEEP...
> Functions i2c_hid_set_power and i2c_hid_hwreset.
> The all mess follow...

OK, I'm neither a HID or an i2c expert, so we'll need input from someone more familiar with that code I suppose.

In any case, the issue seems to be there in the i2c-hid driver and the PM QoS change only affected it because it affected runtime PM.
Comment 36 Benjamin Tissoires 2018-06-07 09:30:07 UTC
> OK, I'm neither a HID or an i2c expert, so we'll need input from someone more
> familiar with that code I suppose.

Regarding the HID point of view, we expect the device to follow a precise specification, and we only communicate over I2C (there is no side channels). IIRC, a touchpad is supposed to always handle the SET_POWER command, even when put to sleep, and it should also always handle at any time the get_report_descriptor command.
So when there are issues, it's very rarely a i2c-hid issue, but either a firmware bug or a i2c-designware bug.

From the I2C point of view, I am not that familiar with i2c-designware, but if the driver fails at being resumed, then of course it will block any communication to the device, and then the errors.

> In any case, the issue seems to be there in the i2c-hid driver and the PM QoS
> change only affected it because it affected runtime PM.

Hans suggested to disable runtime PM for Apollo Lake, but I honestly would rather have a proper fix. If disabling runtime PM makes the whole thing working, the issue must lie in the i2c-designware runtime PM, so we should better have an opinion from someone more knowledgeable than me in this field.

Olivier/Arliton, have you tried disabling runtime PM for only i2c-designware?
Comment 37 Rafael J. Wysocki 2018-06-07 10:01:26 UTC
It would have to be disabled at init time, though, ie. not via sysfs.
Comment 38 Arliton Rocha 2018-06-08 09:07:45 UTC
(In reply to Rafael J. Wysocki from comment #35)
> (In reply to Arliton Rocha from comment #34)
> > (In reply to Arliton Rocha from comment #33)
> > > Created attachment 276365 [details]
> > > dmesg_after_patch_attachment_276235

...
> That is suspicious.  Do you have dmesg output containing that?
Yes, look at the attachment dmesg_after_patch_attachment_276235.
The messages "failed to change power state" is in there.
Also the WARN and stack traces you ask on comment #32...

I have noted that my config was missing CONFIG_CPU_IDLE_GOV_LADDER.
The hid_multitouch was not loading before I set it and recompile.
Despite it was loading along with i2c_hid, this did not prevent the error from continuing.
Comment 39 Rafael J. Wysocki 2018-06-08 09:21:27 UTC
(In reply to Arliton Rocha from comment #38)
>
> I have noted that my config was missing CONFIG_CPU_IDLE_GOV_LADDER.
> The hid_multitouch was not loading before I set it and recompile.
> Despite it was loading along with i2c_hid, this did not prevent the error
> from continuing.

Wait.  Does hid_multitouch not load until you set CONFIG_CPU_IDLE_GOV_LADDER?
Comment 40 Arliton Rocha 2018-06-08 10:06:51 UTC
(In reply to Rafael J. Wysocki from comment #39)
> (In reply to Arliton Rocha from comment #38)
> >
> > I have noted that my config was missing CONFIG_CPU_IDLE_GOV_LADDER.
> > The hid_multitouch was not loading before I set it and recompile.
> > Despite it was loading along with i2c_hid, this did not prevent the error
> > from continuing.
> 
> Wait.  Does hid_multitouch not load until you set CONFIG_CPU_IDLE_GOV_LADDER?

Yes, I did not notice it in my listing (lsmod) before.
Comment 41 Arliton Rocha 2018-06-08 10:07:34 UTC
(In reply to Benjamin Tissoires from comment #36)
> > OK, I'm neither a HID or an i2c expert, so we'll need input from someone
> more
> > familiar with that code I suppose.
> 
> Regarding the HID point of view, we expect the device to follow a precise
> specification, and we only communicate over I2C (there is no side channels).
> IIRC, a touchpad is supposed to always handle the SET_POWER command, even
> when put to sleep, and it should also always handle at any time the
> get_report_descriptor command.
> So when there are issues, it's very rarely a i2c-hid issue, but either a
> firmware bug or a i2c-designware bug.
> 
> From the I2C point of view, I am not that familiar with i2c-designware, but
> if the driver fails at being resumed, then of course it will block any
> communication to the device, and then the errors.
> 
> > In any case, the issue seems to be there in the i2c-hid driver and the PM
> QoS
> > change only affected it because it affected runtime PM.
> 
> Hans suggested to disable runtime PM for Apollo Lake, but I honestly would
> rather have a proper fix. If disabling runtime PM makes the whole thing
> working, the issue must lie in the i2c-designware runtime PM, so we should
> better have an opinion from someone more knowledgeable than me in this field.
> 
> Olivier/Arliton, have you tried disabling runtime PM for only i2c-designware?

You mean, do the follow?
   # echo 'on' > /sys/bus/hid/devices/0018:0911:5288:0001/power/control
   Assuming that 0018:0911:5288:0001 represents the device created...

If Yes, yes! I did. But no effect.
The real effect happens if I do this on /sys/bus/i2c/devices/i2c-SYNA3602\:00/power/control, followed by unload and load the module i2c_hid.
When the touchpad does not work it is almost certain that there is nothing in /sys/bus/hid/devices/. 

I made a few attempts in these files...:
drivers/hid/i2c-hid/i2c-hid.c
drivers/hid/hid-multitouch.c
I tried to remove the CONFIG_PM session and associated functions.
In one and then the other. In both. No effect.

I tried blacklist hid_multitouch but the touchpad does not work without it.

Using lsmod, I noticed that when it works the two modules (i2c_hid and hid_multitouch) are loaded, but when it does not work almost always the module hid_multitouch is not loaded.

The bug (at least in my case, I do not know if for everyone) happens almost always when leaving the notebook off for a while and turn on again. Especially when using battery power.
Comment 42 Benjamin Tissoires 2018-06-08 10:09:29 UTC
(In reply to Arliton Rocha from comment #40)
> (In reply to Rafael J. Wysocki from comment #39)
> > (In reply to Arliton Rocha from comment #38)
> > >
> > > I have noted that my config was missing CONFIG_CPU_IDLE_GOV_LADDER.
> > > The hid_multitouch was not loading before I set it and recompile.
> > > Despite it was loading along with i2c_hid, this did not prevent the error
> > > from continuing.
> > 
> > Wait.  Does hid_multitouch not load until you set
> CONFIG_CPU_IDLE_GOV_LADDER?
> 
> Yes, I did not notice it in my listing (lsmod) before.

Well, if i2c-hid was failing because of an error at the I2C level, there is nothing that will tell the system to load hid-multitouch. So i2c-hid must have been failing there, and it could be interesting to have a look at the dmesg in such configuration.
Comment 43 Rafael J. Wysocki 2018-06-08 10:58:38 UTC
(In reply to Arliton Rocha from comment #40)
> (In reply to Rafael J. Wysocki from comment #39)
> > (In reply to Arliton Rocha from comment #38)
> > >
> > > I have noted that my config was missing CONFIG_CPU_IDLE_GOV_LADDER.
> > > The hid_multitouch was not loading before I set it and recompile.
> > > Despite it was loading along with i2c_hid, this did not prevent the error
> > > from continuing.
> > 
> > Wait.  Does hid_multitouch not load until you set
> CONFIG_CPU_IDLE_GOV_LADDER?
> 
> Yes, I did not notice it in my listing (lsmod) before.

So please try to unset CONFIG_CPU_IDLE_GOV_LADDER again and boot with something like intel_idle.max_cstate=1 in the kernel command line.  Please see if the driver loads then.
Comment 44 Arliton Rocha 2018-06-09 05:35:41 UTC
Created attachment 276411 [details]
dmesg_touchpad_ok_after_bootup
Comment 45 Arliton Rocha 2018-06-09 05:36:29 UTC
Created attachment 276413 [details]
dmesg_touchpad_bug_after_bootup
Comment 46 Arliton Rocha 2018-06-09 06:00:46 UTC
(In reply to Rafael J. Wysocki from comment #43)
> (In reply to Arliton Rocha from comment #40)
> > (In reply to Rafael J. Wysocki from comment #39)
> > > (In reply to Arliton Rocha from comment #38)
> > > >
> > > > I have noted that my config was missing CONFIG_CPU_IDLE_GOV_LADDER.
> > > > The hid_multitouch was not loading before I set it and recompile.
> > > > Despite it was loading along with i2c_hid, this did not prevent the
> error
> > > > from continuing.
> > > 
> > > Wait.  Does hid_multitouch not load until you set
> > CONFIG_CPU_IDLE_GOV_LADDER?
> > 
> > Yes, I did not notice it in my listing (lsmod) before.
> 
> So please try to unset CONFIG_CPU_IDLE_GOV_LADDER again and boot with
> something like intel_idle.max_cstate=1 in the kernel command line.  Please
> see if the driver loads then.

Ok. I did and same behavior.

I've set the following parameters on the kernel command line...:
i2c_designware_core.dyndbg=+p i2c-hid.debug=1 intel_idle.max_cstate=1

The following attachments contains the dmesg output:
attachment 276411 [details] (touchpad working)
attachment 276413 [details] (touchpad doesn't working)

Interesting this line in attachment 276413 [details]:
i2c_designware i2c_designware.2: i2c_dw_handle_tx_abort: data not acknowledged
Always followed by:
i2c_hid i2c-SYNA3602:00: failed to change power setting.

Also I'm changing some BIOS parameters related to the touchpad... I need to try to see if they change anything.
Comment 47 Philip Müller 2018-07-31 12:52:59 UTC
Created attachment 277629 [details]
combined patch for  Yepo

Partial revert of 0759e80 and revert of c523c68.
Comment 48 Philip Müller 2018-07-31 12:54:14 UTC
Somehow with given patch from comment #47 it works on following machine:

[joed@joe-pc ~]$ inxi -Fxyc0
Error 10: Unsupported value: 0 for option: y
Check -h for correct parameters.
[joed@joe-pc ~]$ inxi -Fxzc0
System:
  Host: joe-pc Kernel: 4.17.11-2-MANJARO x86_64 bits: 64 compiler: gcc 
  v: 8.1.1 Desktop: KDE Plasma 5.13.3 Distro: Manjaro Linux 
Machine:
  Type: Laptop System: YEPO product: YEPO v: N/A serial: <filter> 
  Mobo: INET model: P313R serial: <filter> UEFI: American Megatrends 
  v: YEPOM10x.13.WP313R.NHNAUHL01 date: 10/09/2017 
Battery:
  ID-1: BAT0 charge: 27.2 Wh condition: 40.0/40.0 Wh (100%) 
  model: Intel SR 1 SR Real Battery status: Discharging 
CPU:
  Topology: Quad Core model: Intel Celeron N3450 bits: 64 type: MCP 
  arch: N/A L2 cache: 1024 KiB 
  flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 8756 
  Speed: 978 MHz min/max: 800/2200 MHz Core speeds (MHz): 1: 1112 2: 945 
  3: 1293 4: 1293 
Graphics:
  Card-1: Intel driver: i915 v: kernel bus ID: 00:02.0 
  Display: x11 server: X.Org 1.19.6 driver: intel unloaded: modesetting 
  tty: N/A 
  OpenGL: renderer: Mesa DRI Intel HD Graphics 500 (Broxton 2x6) 
  v: 4.5 Mesa 18.1.5 direct render: Yes 
Audio:
  Card-1: Intel Celeron N3350/Pentium N4200/Atom E3900 Series Audio 
  Cluster 
  driver: snd_hda_intel v: kernel bus ID: 00:0e.0 
  Sound Server: ALSA v: k4.17.11-2-MANJARO 
Network:
  Card-1: Intel Wireless 3165 driver: iwlwifi v: kernel bus ID: 01:00 
  IF: wlp1s0 state: up mac: <filter> 
Drives:
  Local Storage: total: 117.87 GiB used: 6.61 GiB (5.6%) 
  ID-1: /dev/mmcblk1 model: DF4064 size: 58.24 GiB 
  ID-2: /dev/sda model: NT-64 size: 59.63 GiB 
Partition:
  ID-1: / size: 48.13 GiB used: 6.61 GiB (13.7%) fs: ext4 
  dev: /dev/mmcblk1p2 
  ID-2: swap-1 size: 8.80 GiB used: 0 KiB (0.0%) fs: swap 
  dev: /dev/mmcblk1p3 
Sensors:
  System Temperatures: cpu: 43.0 C mobo: N/A 
  Fan Speeds (RPM): N/A 
Info:
  Processes: 175 Uptime: 7m Memory: 5.67 GiB used: 888.7 MiB (15.3%) 
  Init: systemd Compilers: gcc: 8.1.1 Shell: bash v: 4.4.23 inxi: 3.0.18 
[joed@joe-pc ~]$
Comment 49 Philip Müller 2018-07-31 12:56:21 UTC
Created attachment 277631 [details]
Yepo 373a dmesg

Here the DMESG with fully functional working touchpad after applying patch from comment #47.
Comment 50 Arliton Rocha 2018-08-02 04:53:57 UTC
Hi, 

I'd like to share what ended up working for me.
I believe in something related to definition of clock frequency in i2c designware bus.
In drivers/i2c/busses/i2c-designware-platdrv.c, function dw_i2c_plat_probe, I did the following modification:

-       /*
-        * Find bus speed from the "clock-frequency" device property, ACPI
-        * or by using fast mode if neither is set.
-        */
-       if (acpi_speed && dev->clk_freq)
-               dev->clk_freq = min(dev->clk_freq, acpi_speed);
-       else if (acpi_speed || dev->clk_freq)
-               dev->clk_freq = max(dev->clk_freq, acpi_speed);
-       else
-               dev->clk_freq = 400000;
+       // Forcing bus speed "fast mode" to Hantick Touchpad [0911:5288]
+       dev->clk_freq = 400000;

I forced the bus operation to 400000 (400KHz).
I think that somehow the speed being set was not ideal.
Obviously I made the change to a kernel running on this particular equipment.
Only this change, without any of the patches discussed previously.
Unfortunately, I'm no longer with the equipment, so I can't do tests anymore. 

Hope it helps in some way.
Comment 51 Philip Müller 2018-08-03 05:38:39 UTC
Created attachment 277675 [details]
pm-qos patch on top of cf7eeea and 0fc784f

Having '0 : pm_qos_read_value(&dev->power.qos->resume_latency);' for 's32 dev_pm_qos_raw_read_value' seems to work on top of cf7eeea and 0fc784f. However the patch from #50 makes even more sense. Needs to be optimized to be only active on the specific hardware, though. Will take a look at it. Thx for sharing Arliton.
Comment 52 Philip Müller 2018-08-04 07:04:49 UTC
Arliton, seems changing the bus speed to 400KHz doesn't fix it on my end with Linux 4.18-rc7. Only having '0 : pm_qos_read_value(&dev->power.qos->resume_latency);' for 's32 dev_pm_qos_raw_read_value' seems to work.
Comment 53 Philip Müller 2018-08-05 09:28:13 UTC
Created attachment 277683 [details]
Desc-Dump Application

After checking more about the issue, I found some interesting on my end. Therefore I used Hans' Desc-Dump application (attached). This was my results:

[joed@joe-pc dev]$ gcc -o i2c_hid_desc_dump i2c_hid_desc_dump.c
[joed@joe-pc dev]$ sudo modprobe i2c_dev
[joed@joe-pc dev]$ sudo ./i2c_hid_desc_dump
i2c_rdwr ret: 2 errno: 0
001e 0100 01bb 0021 0024 001b 0025 0011 0022 0023 0911 5288 0002 

I noticed after executing, that the touchpad was off again. I had to close the lid of the laptop and resume from this short standby to get it back working.

Changing it to 0x01 as given by MS example code results in:

[joed@joe-pc dev]$ gcc -o i2c_hid_desc_dump i2c_hid_desc_dump.c
[joed@joe-pc dev]$ sudo ./i2c_hid_desc_dump
[sudo] password for joed: 
i2c_rdwr ret: 2 errno: 0
0400 9801 c700 0004 0000 0000 0000 0000 0000 0000 0000 005c 0001 

However, this changed anything to the touchpad. So the interesting part is having this modified, makes it instantly work on first boot, but may create issues on other ends, like having no sound at all (based on reports other users might have with other machines).

 static inline s32 dev_pm_qos_raw_read_value(struct device *dev)
 {
 	return IS_ERR_OR_NULL(dev->power.qos) ?
		0 : pm_qos_read_value(&dev->power.qos->resume_latency);
 }

Having this instead, you have to close the lid first to get the touchpad working:

 static inline s32 dev_pm_qos_raw_read_value(struct device *dev)
 {
 	return IS_ERR_OR_NULL(dev->power.qos) ?
		PM_QOS_RESUME_LATENCY_NO_CONSTRAINT :
		pm_qos_read_value(&dev->power.qos->resume_latency);
 }
Comment 54 Robert Schüttler 2018-08-16 11:00:49 UTC
Were are we at with this issue now?

I have hold of a Trekstor Primebook C13 for the next week or so and could do some testing, but kinda lost track here. I could also try stuff on a Jumper EZbook 3 as featured in https://bugs.launchpad.net/linux/+bug/1728244/ (is this really the same touchpad device?). Both still have a Windows 10 partition (touchpad works) and are also running Ubuntu 18.04 where the touchpad neither works with kernel 4.15 (that comes with Ubuntu 18-04) nor with 4.18.1 (which was installed via ukuu = quick way to install mainline kernels). 

Anything I can do to help?

Philip, you seem to have tried different things with varying results, but I fail to see what you suggest needs to be patched at the end.
Comment 55 Philip Müller 2018-08-16 15:49:09 UTC
Hi Robert,

I tested the suggestion with >clk_freq = 400000, which didn't helped on my end. Only when I had the patch of comment 51 added on top of commit cf7eeea and 0fc784f it worked instantly. However, this created some regressions on other hardware, like sound cards.

So I removed my patch and use this workaround:

  # echo 'on' > /sys/bus/i2c/devices/i2c-SYNA3602\:00/power/control
  # rmmod i2c_hid
  # modprobe i2c_hid

Currently it works to get the touchpad working. You may also close the lid for some seconds and resume from suspend to RAM. After that most likely it will work also, at least with kernel 4.18 series.

So having:

 static inline s32 dev_pm_qos_raw_read_value(struct device *dev)
 {
 	return IS_ERR_OR_NULL(dev->power.qos) ?
		0 : pm_qos_read_value(&dev->power.qos->resume_latency);
 }

in 'include/linux/pm_qos.h' seems to work, but is not a solution. Hardware may differ and you may have different results. See also dump-desc-application in comment 53.
Comment 56 Robert Schüttler 2018-08-24 13:21:29 UTC
FTR: Desc-dump didn't work (for me) on the Primebook C13. Neither did the workaround from comment 55. The touchpad on that particular model works though using the "brute force" solution (replacing i2c-hid.c) from "b" found at:

https://github.com/brotfessor/sipodev
Comment 57 Philip Müller 2018-08-24 13:38:25 UTC
Hi Robert,

seems you have a SIPODEV SP1064 touchpad which gets wrongly detected as SYNA3602 touchpad. Therefore your "fix" is totally differnt as given by my hardware I currently have here.
Comment 58 Philip Müller 2018-09-12 16:49:51 UTC
Just got an email from Hans, pointing me to https://patchwork.kernel.org/patch/10597519/, which fixes this. Please test and report back ...
Comment 59 Hans de Goede 2018-09-26 17:19:01 UTC
I believe that this is fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid.git/commit?id=807588ac92018bde88a1958f546438e840eb0158

Which is queued for merging into 4.19:
https://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid.git/log/?h=for-4.19/fixes

The patch has a Cc stable, so this will hopefully show up in stable kernels soonish.
Comment 60 Hans de Goede 2018-10-30 10:23:11 UTC
I believe that this is resolved by this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=807588ac92018bde88a1958f546438e840eb0158

Closing. If you are still seeing issues with 4.19 with i2c-hid touchpads not working or only working after a rmmod + modprobe i2c-hid (possibly done after a suspend/resume), please file a new bug.

Please file separate bugs for each laptop model which still has issues in 4.19 and please attach an acpidump for your model laptop to the bug.

Note You need to log in before you can comment on or make changes to this bug.