Bug 202679

Summary: disabling secondary CPU hangs / system fails to suspend with kernel 4.19+ on Lenovo ThinkPad X1 Carbon 5th
Product: Drivers
Reporter: Thomas (thomas)
Component: Watchdog
Assignee: drivers_watchdog (drivers_watchdog)
Status: RESOLVED OBSOLETE
Severity: normal
CC: chrisjohgorman, rui.zhang
Priority: P1
Hardware: Intel
OS: Linux
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1671504
Kernel Version: 4.19
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg from failed suspend attempt with 4.20.6 and wifi and LTE disabled

Description Thomas 2019-02-25 20:12:01 UTC
Created attachment 281343
dmesg from failed suspend attempt with 4.20.6 and wifi and LTE disabled

Starting with kernel 4.19, my Lenovo ThinkPad X1 Carbon 5th fails to suspend to RAM.
When closing the lid or executing "systemctl suspend", the screen goes black and the status LED starts to blink rapidly (just like when power is plugged in).
The keyboard lights can still be toggled using Fn+space, so the firmware appears to be (partly) alive.

When I execute
> echo core > /sys/power/pm_test
or
> echo processors > /sys/power/pm_test
and then
> echo mem > /sys/power/state
the system immediately goes blank and freezes just like when I really try to activate suspend.
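
For anyone trying to narrow this down on similar hardware, the pm_test procedure above can be wrapped in a small helper. This is only a sketch: the function name run_pm_test and the POWER_DIR override are made up for illustration, it has to run as root, and it assumes a kernel built with PM debugging support so that /sys/power/pm_test exists.

```shell
#!/bin/sh
# Sketch only: run one suspend test level, then switch pm_test back to "none".
# POWER_DIR is a hypothetical override (not a kernel feature) so the function
# can be pointed at a scratch directory instead of the real sysfs files.
POWER_DIR="${POWER_DIR:-/sys/power}"

run_pm_test() {
    level="$1"    # one of: freezer, devices, platform, processors, core

    # Select the test level; the suspend sequence stops at that stage.
    echo "$level" > "$POWER_DIR/pm_test" || return 1

    # Start the partial suspend. On a working system this write returns
    # once the kernel has resumed from the test.
    echo mem > "$POWER_DIR/state"
    status=$?

    # Restore normal suspend behaviour for later attempts.
    echo none > "$POWER_DIR/pm_test"
    return "$status"
}
```

Calling run_pm_test with freezer first, then devices, platform, processors and core, should show the first level at which the machine stops coming back. On the affected machine the restore step is never reached for processors or core, because the system freezes during the write to state.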

If I try to execute
# chcpu -d 1,2,3
or
# echo 0 > /sys/devices/system/cpu/cpu1/online
the command blocks, while the system itself remains (mostly) usable.
Unfortunately, no message whatsoever is shown in the kernel logs. Also, reboot and poweroff no longer work and the system needs a hard reset. :(
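
Because the write to the online file never returns, it is easier to reproduce this without losing the terminal by issuing the write from a background subshell and polling for its result. The following is a rough sketch under that assumption; try_offline and the CPU_DIR override are hypothetical names chosen for illustration, and the real sysfs path needs root.

```shell
#!/bin/sh
# Sketch only: request a CPU offline in the background and report whether the
# request completed within a time limit. CPU_DIR is a hypothetical override
# so the function can be exercised against a scratch directory.
CPU_DIR="${CPU_DIR:-/sys/devices/system/cpu}"

try_offline() {
    cpu="$1"
    wait_s="${2:-10}"
    marker=$(mktemp) || return 1

    # The subshell records the exit status of the write once it returns.
    ( echo 0 > "$CPU_DIR/cpu$cpu/online"; echo "$?" > "$marker" ) &

    i=0
    while [ "$i" -lt "$wait_s" ]; do
        # A non-empty marker means the write has finished.
        if [ -s "$marker" ]; then
            rc=$(cat "$marker")
            rm -f "$marker"
            return "$rc"
        fi
        sleep 1
        i=$((i + 1))
    done

    # The background write is left running; it cannot be cancelled from here.
    echo "cpu$cpu: offline request still blocked after ${wait_s}s" >&2
    return 124
}
```

For example, "try_offline 1 10" returns 0 if cpu1 went offline within ten seconds and 124 if the write was still blocked. The return value 124 is just a convention borrowed from timeout(1); the blocked subshell stays behind either way.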


I have bisected the kernel and found the culprit (or at least something that appears to trigger the bad behavior):

[be45bf5395e0886a93fc816bbe41a008ec2e42e2] watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug
be45bf5395e0886a93fc816bbe41a008ec2e42e2 is the first bad commit
commit be45bf5395e0886a93fc816bbe41a008ec2e42e2
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Jul 13 12:42:08 2018 +0200

    watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug
    
    When scheduling is delayed for longer than the softlockup interrupt
    period it is possible to double-queue the cpu_stop_work, causing list
    corruption.
    
    Cure this by adding a completion to track the cpu_stop_work's
    progress.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Tested-by: Rong Chen <rong.a.chen@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
    Link: http://lkml.kernel.org/r/20180713104208.GW2494@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

:040000 040000 6aca2dbb84bc33fe442b18b3d0a135c27adff7b9 2710af12d32e4b98df07768716689b213bce45fc M      kernel

Comment 1 Zhang Rui 2019-03-11 03:17:23 UTC
*** Bug 202137 has been marked as a duplicate of this bug. ***

Comment 2 Thomas 2019-04-12 05:29:05 UTC
Good news: starting with 5.0.6 suspend is working again.