Bug 217955 - Performance regression: resume_console takes 100ms longer in S2idle/S3 resume in v6.6-rc1
Summary: Performance regression: resume_console takes 100ms longer in S2idle/S3 resume...
Alias: None
Product: Drivers
Classification: Unclassified
Component: Serial (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: Russell King
Depends on:
Blocks: 178231
  Show dependency tree
Reported: 2023-09-27 20:15 UTC by Todd Brandt
Modified: 2023-09-30 10:23 UTC (History)
2 users (show)

See Also:
Kernel Version: 6.6.0-rc1
Regression: Yes
Bisected commit-id: 9e70a5e109a4a23367810de09be826c52d27ee2f

otcpl-asus-e200-cht_freeze-6.5.html (757.85 KB, text/html)
2023-09-27 20:26 UTC, Todd Brandt
otcpl-asus-e200-cht_freeze-6.6-rc1.html (724.35 KB, text/html)
2023-09-27 20:26 UTC, Todd Brandt
otcpl-dell-p3520_freeze-6.5.html (560.03 KB, text/html)
2023-09-27 20:26 UTC, Todd Brandt
otcpl-dell-p3520_freeze-6.6-rc1.html (567.17 KB, text/html)
2023-09-27 20:27 UTC, Todd Brandt
otcpl-z170x-ud5_freeze-6.5.html (474.01 KB, text/html)
2023-09-27 20:27 UTC, Todd Brandt
otcpl-z170x-ud5_freeze-6.6-rc1.html (476.63 KB, text/html)
2023-09-27 20:28 UTC, Todd Brandt

Description Todd Brandt 2023-09-27 20:15:53 UTC
Ever since 6.6.0-rc1 we've seen S3 and S2idle resume take 100ms longer because of resume_comsole. resume_console ordinarily takes only a few milliseconds, but now it's consistently 100ms. I've bisected the issue to this commit:

commit 9e70a5e109a4a23367810de09be826c52d27ee2f
Author: John Ogness <john.ogness@linutronix.de>
Date:   Mon Jul 17 21:52:06 2023 +0206

    printk: Add per-console suspended state
    Currently the global @console_suspended is used to determine if
    consoles are in a suspended state. Its primary purpose is to allow
    usage of the console_lock when suspended without causing console
    printing. It is synchronized by the console_lock.
    Rather than relying on the console_lock to determine suspended
    state, make it an official per-console state that is set within
    console->flags. This allows the state to be queried via SRCU.
    Remove @console_suspended. Console printing will still be avoided
    when suspended because console_is_usable() returns false when
    the new suspended flag is set for that console.

We are seeing this on roughly 2/3 of our machines, both on test systems and production systems.

I will attach sleepgraph timelines showing the effect. The dmesg logs and system data are contained in the html timelines and can be viewed by clicking the log and dmesg buttons.
Comment 1 Todd Brandt 2023-09-27 20:26:21 UTC
Created attachment 305152 [details]
Comment 2 Todd Brandt 2023-09-27 20:26:39 UTC
Created attachment 305153 [details]
Comment 3 Todd Brandt 2023-09-27 20:26:58 UTC
Created attachment 305154 [details]
Comment 4 Todd Brandt 2023-09-27 20:27:17 UTC
Created attachment 305155 [details]
Comment 5 Todd Brandt 2023-09-27 20:27:45 UTC
Created attachment 305156 [details]
Comment 6 Todd Brandt 2023-09-27 20:28:07 UTC
Created attachment 305157 [details]
Comment 7 Todd Brandt 2023-09-27 20:32:41 UTC
The effect is most pronounced in the GigaByte z170x UD5. It goes from 300ms to 400ms because of an msleep 100 in the resume_console code. This might not seem like much but it's in series with everything else so it will always be there. Our goal is to keep both suspend and resume under 1 second if at all possible, so every bit counts.
Comment 8 John Ogness 2023-09-29 09:33:10 UTC
Thanks for reporting. I can reproduce this with Qemu. I am looking into it.
Comment 10 Todd Brandt 2023-09-29 20:51:05 UTC
I'll test this in this weekend's 48 hour run. Thanks.
Comment 11 Todd Brandt 2023-09-30 02:24:40 UTC
I've just completed an hour block of testing on the affected systems and can confirm that the issue is resolved. Thanks!

Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>

Note You need to log in before you can comment on or make changes to this bug.