Ever since 6.6.0-rc1 we've seen S3 and S2idle resume take 100ms longer because of resume_console. resume_console ordinarily takes only a few milliseconds, but now it consistently takes 100ms. I've bisected the issue to this commit:

commit 9e70a5e109a4a23367810de09be826c52d27ee2f
Author: John Ogness <john.ogness@linutronix.de>
Date:   Mon Jul 17 21:52:06 2023 +0206

    printk: Add per-console suspended state

    Currently the global @console_suspended is used to determine if
    consoles are in a suspended state. Its primary purpose is to allow
    usage of the console_lock when suspended without causing console
    printing. It is synchronized by the console_lock.

    Rather than relying on the console_lock to determine suspended
    state, make it an official per-console state that is set within
    console->flags. This allows the state to be queried via SRCU.

    Remove @console_suspended. Console printing will still be avoided
    when suspended because console_is_usable() returns false when
    the new suspended flag is set for that console.

We are seeing this on roughly 2/3 of our machines, both test systems and production systems. I will attach sleepgraph timelines showing the effect. The dmesg logs and system data are contained in the html timelines and can be viewed by clicking the log and dmesg buttons.
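For anyone reading along without the printk source open, here is a minimal userspace sketch of the idea in the commit message: the suspended state lives in each console's flags, and a usability check refuses to print while it is set. This is a model only, not the kernel code. The model_* names and the flag values are hypothetical, and the SRCU read side and locking are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only; not the kernel's definitions. */
#define MODEL_CON_ENABLED   (1u << 0)
#define MODEL_CON_SUSPENDED (1u << 1)

struct model_console {
	const char *name;
	unsigned int flags;
};

/* A console may print only if it is enabled and not suspended. */
static bool model_console_is_usable(const struct model_console *con)
{
	if (!(con->flags & MODEL_CON_ENABLED))
		return false;
	if (con->flags & MODEL_CON_SUSPENDED)
		return false;
	return true;
}

/* Suspend: mark every console individually instead of setting one global. */
static void model_suspend_consoles(struct model_console *cons, size_t n)
{
	for (size_t i = 0; i < n; i++)
		cons[i].flags |= MODEL_CON_SUSPENDED;
}

/* Resume: clear the per-console suspended bit again. */
static void model_resume_consoles(struct model_console *cons, size_t n)
{
	for (size_t i = 0; i < n; i++)
		cons[i].flags &= ~MODEL_CON_SUSPENDED;
}

/* Count the consoles that would currently be allowed to print. */
static size_t model_count_usable(const struct model_console *cons, size_t n)
{
	size_t usable = 0;

	for (size_t i = 0; i < n; i++)
		if (model_console_is_usable(&cons[i]))
			usable++;
	return usable;
}

/* Walk one suspend/resume cycle with one enabled and one disabled console. */
static void model_demo(void)
{
	struct model_console cons[2] = {
		{ "enabled-console", MODEL_CON_ENABLED },
		{ "disabled-console", 0 },
	};

	assert(model_count_usable(cons, 2) == 1);
	model_suspend_consoles(cons, 2);
	assert(model_count_usable(cons, 2) == 0);
	model_resume_consoles(cons, 2);
	assert(model_count_usable(cons, 2) == 1);
}
```

The sketch only shows why printing stays off while suspended. It does not model the flush/wait path in resume_console where the extra 100ms shows up.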
Created attachment 305152 [details] otcpl-asus-e200-cht_freeze-6.5.html
Created attachment 305153 [details] otcpl-asus-e200-cht_freeze-6.6-rc1.html
Created attachment 305154 [details] otcpl-dell-p3520_freeze-6.5.html
Created attachment 305155 [details] otcpl-dell-p3520_freeze-6.6-rc1.html
Created attachment 305156 [details] otcpl-z170x-ud5_freeze-6.5.html
Created attachment 305157 [details] otcpl-z170x-ud5_freeze-6.6-rc1.html
The effect is most pronounced on the GigaByte z170x UD5. Resume goes from 300ms to 400ms because of an msleep(100) in the resume_console code. This might not seem like much, but it's in series with everything else, so it will always be there. Our goal is to keep both suspend and resume under 1 second if at all possible, so every bit counts.
Thanks for reporting. I can reproduce this with Qemu. I am looking into it.
Patch posted: https://lore.kernel.org/lkml/20230929113233.863824-1-john.ogness@linutronix.de/
I'll test this in this weekend's 48 hour run. Thanks.
I've just completed an hour block of testing on the affected systems and can confirm that the issue is resolved. Thanks!

Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>