Bug 216706 - USBC resume callback takes far too long, between 650ms and 1200ms
Summary: USBC resume callback takes far too long, between 650ms and 1200ms
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Heikki Krogerus
URL:
Keywords:
: 216788 (view as bug list)
Depends on:
Blocks: 178231
  Show dependency tree
 
Reported: 2022-11-18 20:24 UTC by Todd Brandt
Modified: 2023-02-16 09:32 UTC (History)
4 users (show)

See Also:
Kernel Version: 6.1.0-rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
issue.def (369 bytes, text/plain)
2022-11-18 20:54 UTC, Todd Brandt
Details
Schedule work for resume (2.70 KB, patch)
2022-11-21 11:25 UTC, Heikki Krogerus
Details | Diff

Description Todd Brandt 2022-11-18 20:24:14 UTC
Created attachment 303218 [details]
lenb-Dell-XPS-13-9310_freeze.html

The new USBC resume callback is taking far too long in the resume phase. I bisected the behavior to this commit:

[10402] 4e3a50293c2b21961f02e1afa2f17d3a1a90c7c8 is the first bad commit
[10402] commit 4e3a50293c2b21961f02e1afa2f17d3a1a90c7c8
[10402] Author: Heikki Krogerus <heikki.krogerus@linux.intel.com>
[10402] Date:   Fri Oct 7 13:09:51 2022 +0300
[10402]
[10402] usb: typec: ucsi: acpi: Implement resume callback
[10402]
[10402] The ACPI driver needs to resume the interface by calling
[10402] ucsi_resume(). Otherwise we may fail to detect connections
[10402] and disconnections that happen while the system is
[10402] suspended.
[10402]
[10402] Link: https://bugzilla.kernel.org/show_bug.cgi?id=210425
[10402] Fixes: a94ecde41f7e ("usb: typec: ucsi: ccg: enable runtime pm support")
[10402] Cc: <stable@vger.kernel.org>
[10402] Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
[10402] Link: https://lore.kernel.org/r/20221007100951.43798-3-heikki.krogerus@linux.intel.com
[10402] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

I have timelines from 7 separate machines that are now affected but this behavior. We have a guidline that neither suspend or resume should take longer than a second, and this is causing us problems.
Comment 1 Todd Brandt 2022-11-18 20:24:54 UTC
Created attachment 303219 [details]
lenb-Dell-XPS-13-9315_freeze.html
Comment 2 Todd Brandt 2022-11-18 20:25:16 UTC
Created attachment 303220 [details]
otcpl-adl-m-1_freeze.html
Comment 3 Todd Brandt 2022-11-18 20:25:38 UTC
Created attachment 303221 [details]
otcpl-adl-n-1_freeze.html
Comment 4 Todd Brandt 2022-11-18 20:26:00 UTC
Created attachment 303222 [details]
otcpl-dell-9370-kbl_mem.html
Comment 5 Todd Brandt 2022-11-18 20:26:23 UTC
Created attachment 303223 [details]
otcpl-hp-spectre-tgl_freeze.html
Comment 6 Todd Brandt 2022-11-18 20:26:43 UTC
Created attachment 303224 [details]
otcpl-lenovo-tix1-tgl_freeze.html
Comment 7 Todd Brandt 2022-11-18 20:38:11 UTC
Created attachment 303225 [details]
otcpl-adl-m-1_freeze.html
Comment 8 Todd Brandt 2022-11-18 20:38:32 UTC
Created attachment 303226 [details]
otcpl-adl-n-1_freeze.html
Comment 9 Todd Brandt 2022-11-18 20:54:15 UTC
Created attachment 303227 [details]
issue.def
Comment 10 gregkh 2022-11-19 08:53:52 UTC
On Fri, Nov 18, 2022 at 08:24:14PM +0000, bugzilla-daemon@kernel.org wrote:
> I have timelines from 7 separate machines that are now affected but this
> behavior. We have a guidline that neither suspend or resume should take
> longer
> than a second, and this is causing us problems.

What is making this arbritrary requirement?  If it is an issue, please
provide a working change to resolve it as your group can help implement
and test this.

thanks,

greg k-h
Comment 11 Heikki Krogerus 2022-11-21 11:25:11 UTC
Created attachment 303247 [details]
Schedule work for resume

I got a report about this from our internal validation team earlier, and I've been working on a fix for it already. If someone could test it out, I would much appreciate.
Comment 13 Todd Brandt 2022-11-21 16:43:45 UTC
(In reply to gregkh from comment #10)
> On Fri, Nov 18, 2022 at 08:24:14PM +0000, bugzilla-daemon@kernel.org wrote:
> > I have timelines from 7 separate machines that are now affected but this
> > behavior. We have a guidline that neither suspend or resume should take
> > longer
> > than a second, and this is causing us problems.
> 
> What is making this arbritrary requirement?  If it is an issue, please
> provide a working change to resolve it as your group can help implement
> and test this.
> 
> thanks,
> 
> greg k-h

It's a general in-house requirement that our linux power & performance team works by (I believe Len Brown initially set the standard based on android and chrome requirements). Any time resume goes beyond a second we flag the callbacks responsible and work to speed them up.
Comment 15 Todd Brandt 2022-11-21 17:00:35 UTC
(In reply to Heikki Krogerus from comment #11)
> Created attachment 303247 [details]
> Schedule work for resume
> 
> I got a report about this from our internal validation team earlier, and
> I've been working on a fix for it already. If someone could test it out, I
> would much appreciate.

I'm building a test kernel now with your patch over 6.1.0-rc5, I'll try to have the results of a 1 hour test today.
Comment 16 Todd Brandt 2022-11-21 21:18:17 UTC
I've just tested on 27 machines using kernel 6.1.0-rc6 plus your attached patch and am very happy to say the issue is completely resolved by your patch. The issue no longer shows up in any of our machines (USBC resume callback is miniscule again):

Kernel            Host                 Mode-xRuns   Issue-Count Rate
6.1.0-rc6-usbcfix otcpl-tgl-rvp-u      mem-x51      0 0%
6.1.0-rc6-usbcfix otcpl-tgl-rvp-u      freeze-x115  0 0%
6.1.0-rc6-usbcfix otcpl-tgl-h          mem-x58      0 0%
6.1.0-rc6-usbcfix otcpl-tgl-h          freeze-x138  0 0%
6.1.0-rc6-usbcfix otcpl-rpl-s-1        freeze-x137  0 0%
6.1.0-rc6-usbcfix otcpl-rpl-p-1        mem-x16      0 0%
6.1.0-rc6-usbcfix otcpl-rpl-p-1        freeze-x36   0 0%
6.1.0-rc6-usbcfix otcpl-rkl-s-prod     mem-x22      0 0%
6.1.0-rc6-usbcfix otcpl-rkl-s-prod     freeze-x145  0 0%
6.1.0-rc6-usbcfix otcpl-rkl-s          mem-x56      0 0%
6.1.0-rc6-usbcfix otcpl-rkl-s          freeze-x130  0 0%
6.1.0-rc6-usbcfix otcpl-lkf            freeze-x140  0 0%
6.1.0-rc6-usbcfix otcpl-lenovo-tix1-tgl freeze-x180 0 0%
6.1.0-rc6-usbcfix otcpl-jsl-p          mem-x63      0 0%
6.1.0-rc6-usbcfix otcpl-jsl-p          freeze-x141  0 0%
6.1.0-rc6-usbcfix otcpl-hp-spectre-tgl freeze-x145  0 0%
6.1.0-rc6-usbcfix otcpl-dell-9370-kbl  mem-x4       0 0%
6.1.0-rc6-usbcfix otcpl-dell-9370-kbl  freeze-x127  0 0%
6.1.0-rc6-usbcfix otcpl-dell-9360-kbl  freeze-x28   0 0%
6.1.0-rc6-usbcfix otcpl-dell-7390-cmlu mem-x50      0 0%
6.1.0-rc6-usbcfix otcpl-dell-7390-cmlu freeze-x51   0 0%
6.1.0-rc6-usbcfix otcpl-dell-7386-whl  mem-x3       0 0%
6.1.0-rc6-usbcfix otcpl-dell-7386-whl  freeze-x103  0 0%
6.1.0-rc6-usbcfix otcpl-aml-y          mem-x74      0 0%
6.1.0-rc6-usbcfix otcpl-aml-y          freeze-x154  0 0%
6.1.0-rc6-usbcfix otcpl-adl-s-5        mem-x52      0 0%
6.1.0-rc6-usbcfix otcpl-adl-s-5        freeze-x38   0 0%
6.1.0-rc6-usbcfix otcpl-adl-p-3        mem-x47      0 0%
6.1.0-rc6-usbcfix otcpl-adl-p-3        freeze-x109  0 0%
6.1.0-rc6-usbcfix otcpl-adl-n-1        freeze-x139  0 0%
6.1.0-rc6-usbcfix otcpl-adl-m-1        mem-x29      0 0%
6.1.0-rc6-usbcfix otcpl-adl-m-1        freeze-x78   0 0%

So I can confidently confirm this patch is a good fix and is ready to be upstreamed. Thanks for quick fix!

 Acked-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Comment 17 Todd Brandt 2022-11-21 21:45:10 UTC
Created attachment 303252 [details]
otcpl-lenovo-tix1-tgl_freeze-with-usbc-fix.html

This is a suspend/resume run with the fix which and it shows the USBC resume callback is no longer a problem.
Comment 18 Len Brown 2022-11-21 23:13:15 UTC
re: comment #10

> What is making this arbitrary requirement?

System and OS vendors state the needs of their end-users as requirements.

Most vendors today accept 1000 ms, but desire 500 ms.  The end-goal, of course, is the perception of "instant-on".

Some systems have been delivering 500 ms for years -- since we migrated to low-power-idle states where the responsibility is in the OS, and less in the BIOS.
Comment 19 Ignacio Hernandez 2022-12-10 01:40:21 UTC
*** Bug 216788 has been marked as a duplicate of this bug. ***
Comment 20 Todd Brandt 2022-12-10 14:33:54 UTC Comment hidden (obsolete)

Note You need to log in before you can comment on or make changes to this bug.