Bug 198909 - acpi_pm_finish: 10x performance regression on XPS 13 9360
Summary: acpi_pm_finish: 10x performance regression on XPS 13 9360
Status: REOPENED
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend (show other bugs)
Hardware: Intel Linux
: P1 normal
Assignee: Zhang Rui
URL:
Keywords:
: 198583 (view as bug list)
Depends on:
Blocks: 178231
  Show dependency tree
 
Reported: 2018-02-23 20:42 UTC by Esokrarkose
Modified: 2024-02-08 20:18 UTC (History)
9 users (show)

See Also:
Kernel Version: 4.16
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Errors appearing during testing (22.87 KB, text/plain)
2018-04-23 08:45 UTC, Esokrarkose
Details
Errors appearing during testing (425.98 KB, text/plain)
2018-04-23 08:50 UTC, Esokrarkose
Details
Output of today (1) (4.20 KB, text/plain)
2018-04-28 13:30 UTC, Esokrarkose
Details
Output of today (ftrace) (418.18 KB, text/plain)
2018-04-28 13:32 UTC, Esokrarkose
Details
Kernel config (196.60 KB, text/x-mpsub)
2018-04-28 13:40 UTC, Esokrarkose
Details
Dmidecode output (28.24 KB, text/plain)
2018-04-28 13:41 UTC, Esokrarkose
Details
dmidecode (Bios v2.6.2) (28.24 KB, text/plain)
2018-05-08 21:18 UTC, Esokrarkose
Details
dmesg of when I did the analyze_suspend (102.72 KB, text/plain)
2018-05-08 21:25 UTC, Esokrarkose
Details
Trying to track acpi_pm_finish() (131.29 KB, application/zip)
2018-05-08 21:42 UTC, Esokrarkose
Details
Bad case, Linux 4.17.17, BIOS 2.8.1 (140.98 KB, application/zip)
2018-08-22 09:23 UTC, Esokrarkose
Details
Bad case, Linux 4.17.17, BIOS 2.8.1 (142.17 KB, application/zip)
2018-08-25 09:23 UTC, Esokrarkose
Details
Output of `acpidump` (Firmware 2.10.0) (1.01 MB, text/plain)
2019-02-06 15:14 UTC, Paul Menzel
Details
Screenshot of HTML output of `./analyze_suspend.py` (290.73 KB, image/png)
2020-06-25 08:18 UTC, Paul Menzel
Details
issue.def for long _WAK evaluation in acpi_pm_finish() (457 bytes, text/plain)
2020-06-28 02:34 UTC, Zhang Rui
Details

Description Esokrarkose 2018-02-23 20:42:41 UTC
Outsourcing the relevant info from the closed Bug 185611.
The XPS 13 resume time from suspend can be heavily optimized by activating busy polling during noirq suspend/resume phases. Doing the latter one could reduce the time spent in acpi_pm_finish to ~40ms, however only when booted with nopti.

In the following I present the benchmarks that proof that pti considerably slows down the wake up from suspend on the machine:

Resume from suspend without PTI (acpi_pm_finish: 38.897 ms):
https://bugzilla.kernel.org/attachment.cgi?id=274139
Resume from suspend with PTI (acpi_pm_finish: 391.431 ms)
https://bugzilla.kernel.org/attachment.cgi?id=274141

Could kpti be safely be disabled just before entering suspend at a very late stage and enabled again in the very early stage of activating user space?
Comment 1 Chen Yu 2018-03-12 01:18:42 UTC
Humm, why PTI would impact acpi_pm_finish is interesting! But I don't think PTI could be dynamically turned off/on during suspend/resume, could you provide pm-suspend with -f to see the backtrace of this stage
Comment 2 Esokrarkose 2018-03-12 22:37:42 UTC
Thanks very much for having a look at the bug. I tested 4.16rc5 today and could not reproduce the issue anymore :-).

@pmenzel: Btw: without any special boot parameters resume from suspend takes now 31ms.
Comment 3 Esokrarkose 2018-03-12 22:49:33 UTC
Ok it turns out I was too fast :-(. I don't know why but suddenly the problem reappeared. I'll provide the debug output as soon as possible.
Comment 4 Paul Menzel 2018-03-19 13:05:57 UTC
(In reply to Esokrarkose from comment #3)
> Ok it turns out I was too fast :-(. I don't know why but suddenly the
> problem reappeared. I'll provide the debug output as soon as possible.

On my system acpi_pm_finish takes very long, if I suspend the system the *second* time. Can you verify that?
Comment 5 superm1 2018-03-19 13:22:02 UTC
Created attachment 274817 [details]
attachment-23468-0.html

?
Hello,

I will be on vacation through 3/26/18. Expect delayed response.
Comment 6 Zhang Rui 2018-03-21 13:13:44 UTC
can you attach your dmidecode?

I also have a Dell XPS 13 9360 on hand, but can not reproduce the issue.
I'm using 4.16-rc4 kernel, and the acpi_pm_finish always takes around 21ms.

in your case, does the long latency and short latency happens in the same boot? or is the behavior consistent (always long or always short) for a single boot?
Comment 7 Chen Yu 2018-03-26 08:09:00 UTC
*** Bug 198583 has been marked as a duplicate of this bug. ***
Comment 8 Chen Yu 2018-03-26 08:11:07 UTC
@Esokrarkose @Paul
Do you have a change to test with pm-graph using -f option? It might take some time to generate/parse the html result, but it is very helpful to narrow down at which stage it is most time-consuming.
Comment 9 Chen Yu 2018-03-26 08:16:12 UTC
Wait...or maybe you can just test only tracking acpi_pm_finish() by:
1. append no_console_suspend in grub
2. after boot up,
  echo acpi_pm_finish > /sys/kernel/debug/tracing/set_graph_function
3. do the test
4. provide dmesg and /sys/kernel/debug/tracing/trace
Comment 10 Chen Yu 2018-04-12 00:58:07 UTC
@Esokrarkose@gmail.com
Could you provide your kernel config? We have a xps9360 in hand and could not reproduce, maybe we can have a try based on your config.
Comment 11 Chen Yu 2018-04-23 07:53:52 UTC
ping Esokrarkose@gmail.com
Comment 12 Esokrarkose 2018-04-23 08:43:56 UTC
Hi Chen,

First of all kudos for your commitment!
Sorry for the very late response, I am very busy with university recently.

The last kernel I tested was 4.16.0-rc6 which I did not build myself (from Debian experimental) and I did not observe the issue there strangely (I tried to invoke pm-graph a few times randomly and never got the inconsistent results anymore). The prior kernel (rc5) was built by me and I observed the problem there with long latency and short latency in the the SAME boot.

So to be honest I need to test again, I just did not have the time to look into it yet.

Just before writing this I invoked pm-graph again and got the attached error output.
Comment 13 Esokrarkose 2018-04-23 08:44:50 UTC
Created attachment 275519 [details]
Errors appearing during testing
Comment 14 Esokrarkose 2018-04-23 08:45:42 UTC
Created attachment 275521 [details]
Errors appearing during testing
Comment 15 Esokrarkose 2018-04-23 08:50:06 UTC
Created attachment 275523 [details]
Errors appearing during testing

I can reproduce the errors appearing during resume right now, have not yet rebooted.
Comment 16 Paul Menzel 2018-04-23 08:51:54 UTC
(In reply to Esokrarkose from comment #12)

[…]

> Just before writing this I invoked pm-graph again and got the attached error
> output.

Thank you for testing, but to keep this bug focused, these errors should be dealt with in separate bug reports. Looking at the output, the `acpi_pm_finish()` took 40 ms.

The traces in the Linux kernel log should be good enough to point developers in the right direction.
Comment 17 Esokrarkose 2018-04-28 13:30:05 UTC
Created attachment 275637 [details]
Output of today

I tested again today and observed the issue.
Comment 18 Esokrarkose 2018-04-28 13:30:44 UTC
Created attachment 275639 [details]
Output of today (1)
Comment 19 Esokrarkose 2018-04-28 13:32:05 UTC
Created attachment 275641 [details]
Output of today (ftrace)
Comment 20 Esokrarkose 2018-04-28 13:33:36 UTC
I could not try the -f option for pm-graph, everytime I try I get:

CAPTURING DMESG
PROCESSING DATA
ERROR: Not enough test data to build a timeline
DONE
Traceback (most recent call last):
  File "./analyze_suspend.py", line 5929, in <module>
    runTest()
  File "./analyze_suspend.py", line 5265, in runTest
    testruns, stamp = processData(True)
  File "./analyze_suspend.py", line 5227, in processData
    data = testruns[0]
IndexError: list index out of range

I will attach the files it generated while throwing that errors.
Comment 21 Esokrarkose 2018-04-28 13:40:31 UTC
Created attachment 275643 [details]
Kernel config
Comment 22 Esokrarkose 2018-04-28 13:41:45 UTC
Created attachment 275645 [details]
Dmidecode output
Comment 23 Esokrarkose 2018-04-28 19:25:18 UTC
Please find the big ftrace file where python threw the traceback (comment 20) here:
https://drive.google.com/file/d/1DL7Sq4fxqCnnZ38SnXTjh6DhOHQe0C0c/view?usp=sharing
Comment 24 Chen Yu 2018-05-02 06:43:46 UTC
HI Esokrarkose,
You can test without analyze_suspend tool - to track ftrace result directly
via #Comment 9. Please make sure to append 'no_console_suspend ignore_loglevel' in the boot commandline then after test provide both the dmesg and ftrace.
It is likely to be time costly in device power on phase, say, in _ON method.
Comment 25 Zhang Rui 2018-05-07 06:51:54 UTC
I'm confused, so the 10X delay is not 100% reproducible with KPI enabled?
how many suspend cycles have you tried to reproduce the problem?
and I'd suggest we do the similar number of suspend cycles in kpi disabled, to see if the problem is really related with kpti or not.

BTW, usually I use "analyze_suspend -m mem -multi 50 2 -rtcwake 10 -dev"
to run suspend-to-mem for 50 cycles, with rtcwake set to 10 seconds, or even 5 seconds.
Comment 26 Esokrarkose 2018-05-08 21:16:30 UTC
@Zhang, @Chen, To be honest the tests back then were statistically not significant, I got the feeling now that it was a coincidence that on the boot with pti disabled I got the shorter resume cycle. I did not have enough time to test back then.

However things have changed. In the meantime I have upgraded to the latest Bios from Dell (2.6.2) and now acpi_pm_finish() always takes around 300ms reproducibly without exception! :-(
Pti does not change that at all.

@Paul: Which Bios version are you at? Can you confirm what I am getting?
Comment 27 Esokrarkose 2018-05-08 21:18:40 UTC
Created attachment 275861 [details]
dmidecode (Bios v2.6.2)
Comment 28 Esokrarkose 2018-05-08 21:22:25 UTC
Created attachment 275863 [details]
analyze_suspend with kernel 4.16.7, Bios 2.6.2

This is 100% reproducible for me. I do not get the 40ms anymore. I get 400ms everytime I try.
Kernel is from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16.7/
Comment 29 Esokrarkose 2018-05-08 21:25:02 UTC
Created attachment 275865 [details]
dmesg of when I did the analyze_suspend
Comment 30 Esokrarkose 2018-05-08 21:42:49 UTC
Created attachment 275867 [details]
Trying to track acpi_pm_finish()

(In reply to Chen Yu from comment #9)
> Wait...or maybe you can just test only tracking acpi_pm_finish() by:
> 1. append no_console_suspend in grub
> 2. after boot up,
>   echo acpi_pm_finish > /sys/kernel/debug/tracing/set_graph_function
> 3. do the test
> 4. provide dmesg and /sys/kernel/debug/tracing/trace

I tried that, you can find all the files in the attached zip.
However I am really skeptical this worked, the trace does not look much like anything.
Comment 31 Esokrarkose 2018-05-10 08:54:53 UTC
As this is still tagged NEEDINFO, which information would you need?
Anything I could try for you or do you have a system to debug?
Comment 32 Esokrarkose 2018-05-10 09:16:09 UTC
(In reply to Esokrarkose from comment #28)
> This is 100% reproducible for me. I do not get the 40ms anymore. I get 400ms
> everytime I try.
> Kernel is from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16.7/

Today's test shows me that sometimes I do get 40ms again.
When I get 40ms I always seem to get 40ms and when I get 400ms I always seem to get 400ms until the machine is restarted, I still have not carried out enough tests though.

Could someone of your address the following?
> I could not try the -f option for pm-graph, everytime I try I get:
> 
> CAPTURING DMESG
> PROCESSING DATA
> ERROR: Not enough test data to build a timeline
> DONE
> Traceback (most recent call last):
>   File "./analyze_suspend.py", line 5929, in <module>
>     runTest()
>   File "./analyze_suspend.py", line 5265, in runTest
>     testruns, stamp = processData(True)
>   File "./analyze_suspend.py", line 5227, in processData
>     data = testruns[0]
> IndexError: list index out of range

What's wronge with using the -f option?
Comment 33 Zhang Rui 2018-06-20 05:50:18 UTC
hmmm, then can you please attach the tarball for the sleepgraph tool output in -dev mode? for both good and bad case?
say "sleepgraph.py -m mem -dev"
Comment 34 Chen Yu 2018-08-20 04:51:22 UTC
ping
Comment 35 Esokrarkose 2018-08-22 09:23:05 UTC
Created attachment 278021 [details]
Bad case, Linux 4.17.17, BIOS 2.8.1

Could at the moment not reproduce the good case.
Comment 36 Esokrarkose 2018-08-25 09:23:28 UTC
Created attachment 278085 [details]
Bad case, Linux 4.17.17, BIOS 2.8.1

I could not manage to get the 40ms case from half a year ago, often I get 400ms.
Comment 37 Zhang Rui 2018-12-27 14:34:29 UTC
what's the status in latest upstream kernel?
it seems that the 400ms is took in _WAK method. can you please attach the acpidump output?
Comment 38 Paul Menzel 2019-02-06 15:14:58 UTC
Created attachment 281019 [details]
Output of `acpidump` (Firmware 2.10.0)

(In reply to Zhang Rui from comment #37)

[…]

> it seems that the 400ms is took in _WAK method. can you please attach the
> acpidump output?

Please find it attached from BIOS 2.10.0 09/27/2018.
Comment 39 superm1 2019-02-06 15:26:54 UTC
Created attachment 281021 [details]
attachment-21503-0.html

?
?Hello,

I'm home sick Feb 6, expect delayed response..
Comment 40 Zhang Rui 2020-06-24 02:19:53 UTC
Is this still a problem in the latest upstream?
We have made quite some EC related changes recently.

Plus, we have 2 XPS 13 9360 machines running s3/s2idle stress test almost every week, but didn't observe this issue.
Comment 42 Paul Menzel 2020-06-24 04:50:20 UTC
(In reply to Zhang Rui from comment #40)
> Is this still a problem in the latest upstream?

I’ll have to check.

> We have made quite some EC related changes recently.

Are these changes in the 5.7.x release?

> Plus, we have 2 XPS 13 9360 machines running s3/s2idle stress test almost
> every week, but didn't observe this issue.

Are the results public? How long does `acpi_pm_finish()` take in the first and second resume cycle?
Comment 43 Paul Menzel 2020-06-25 08:18:51 UTC
Created attachment 289881 [details]
Screenshot of HTML output of `./analyze_suspend.py`

I just tested with Ubuntu 20.04 LTS with Linux 5.4.0-37-generic, and ran `./sleepgraph.py -config config/suspend-callgraph.cfg` and `maxdepth: 15`.

Both, the first and second suspend/resume cycle, `acpi_pm_finish()` took around 350 ms.

> acpi_pm_finish (351.081 ms) resume_machine

It could be though, that this is caused by active tracing.

The method below is the hotspot:

> acpi_ps_parse_aml (326.995 ms @ 121.989604)

It calls hundreds(?) of methods, each taking less than 1 ms, but a few take even up to 15 ms.
Comment 44 Zhang Rui 2020-06-28 02:27:41 UTC
I rechecked our s2ram test results on three different 9360 laptops, and confirms that this problem does exists, but occasionally. The average resume time is around 40ms, but it can increase to 400ms in some cases.

Todd, I will create a issue.def first so that we can get an overview of this problem among our test results.
Comment 45 Zhang Rui 2020-06-28 02:34:05 UTC
Created attachment 289903 [details]
issue.def for long _WAK evaluation in acpi_pm_finish()
Comment 46 Paul Menzel 2020-06-29 12:26:41 UTC
I can confirm your results. Even with no callgraph, there are outliers, in my case 350 ms. I ran `./sleepgraph.py -config/suspend.cfg -multi 50 15`.
Comment 47 Paul Menzel 2024-02-08 20:18:11 UTC
Zhang Rui, any update on this issue? Were you able to get an overview?

Note You need to log in before you can comment on or make changes to this bug.