Created attachment 255063 [details]
dmesg log with failed suspend

On an Intel KBL machine, suspend fails with the following stack trace. I attached the dmesg log.

[ 244.186724] PM: Preparing system for sleep (mem)
[ 244.190427] Freezing user space processes ...
[ 264.199520] Freezing of tasks failed after 20.009 seconds (1 tasks refusing to freeze, +wq_busy=0):
[ 264.199722] systemd-udevd D 0 319 281 0x00000004
[ 264.199785] Call Trace:
[ 264.199819] __schedule+0x30e/0xbd0
[ 264.199848] schedule+0x3b/0x90
[ 264.199867] schedule_timeout+0x23b/0x490
[ 264.199886] ? _raw_spin_unlock_irq+0x27/0x50
[ 264.199904] ? __this_cpu_preempt_check+0x13/0x20
[ 264.199923] ? trace_hardirqs_on_caller+0xe7/0x200
[ 264.199947] wait_for_common+0x11a/0x1d0
[ 264.200075] ? wake_up_q+0x70/0x70
[ 264.200100] wait_for_completion+0x18/0x20
[ 264.200118] cpuhp_issue_call+0x9b/0xd0
[ 264.200137] __cpuhp_setup_state+0xe9/0x170
[ 264.200159] ? coretemp_remove+0x60/0x60 [coretemp]
[ 264.200174] ? 0xffffffffa0275000
[ 264.200197] coretemp_init+0x8b/0x1000 [coretemp]
[ 264.200217] do_one_initcall+0x3f/0x170
[ 264.200236] ? rcu_read_lock_sched_held+0x75/0x80
[ 264.200255] ? kmem_cache_alloc_trace+0x274/0x2e0
[ 264.200269] ? do_init_module+0x22/0x1fb
[ 264.200311] load_module+0x2091/0x2410
[ 264.200326] ? symbol_put_addr+0x60/0x60
[ 264.200356] ? kernel_read_file+0x105/0x190
[ 264.200389] SyS_finit_module+0xbc/0xf0
[ 264.200424] entry_SYSCALL_64_fastpath+0x1c/0xb1
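Since the failure is intermittent, it can help to scan saved dmesg logs for the freeze failure automatically. The following is only a minimal sketch, assuming the log lines look like the ones quoted in the report above; the function name `find_freeze_failures` and both regular expressions are hypothetical helpers written for this illustration, not part of any kernel tooling.

```python
import re

# Assumption: the summary line has the shape seen in the report above,
# "[ ts] Freezing of tasks failed after N.NNN seconds (K tasks refusing to freeze, ...".
FREEZE_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Freezing of tasks failed after "
    r"(?P<secs>\d+\.\d+) seconds \((?P<count>\d+) tasks refusing to freeze"
)

# Assumption: the task line has the shape "comm state stack pid ppid flags",
# e.g. "systemd-udevd D 0 319 281 0x00000004".
TASK_LINE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+\d+\s+"
    r"(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+"
)


def find_freeze_failures(lines):
    """Return one dict per freeze failure found in an iterable of dmesg lines."""
    failures = []
    current = None
    for line in lines:
        match = FREEZE_FAIL.search(line)
        if match:
            # Start a new failure record at each summary line.
            current = {
                "timestamp": float(match["ts"]),
                "seconds": float(match["secs"]),
                "count": int(match["count"]),
                "tasks": [],
            }
            failures.append(current)
            continue
        if current is not None:
            task = TASK_LINE.search(line)
            if task:
                # Record the name and PID of each task reported after the summary.
                current["tasks"].append((task["comm"], int(task["pid"])))
    return failures


if __name__ == "__main__":
    import sys

    # Usage: python find_freeze.py < dmesg.log
    for failure in find_freeze_failures(sys.stdin):
        print(failure)
```

Run against a saved log via standard input, this prints one record per failed freeze, so a series of boots can be checked quickly for whether the hang occurred.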
CC'ing more maintainers.
Maybe related to hotplug state machine. Copying tglx@ for input.
Created attachment 255099 [details]
4.11-rc1 dmesg

4.11-rc1 hung systemd-udev dmesg
The above attached dmesg is from this morning's drm-tip kernel; building 4.11-rc1 ATM to verify.
Seeing the same error under 4.11-rc1, but not on every boot.
Created attachment 255107 [details]
tlp configuration

Disabling tlp via systemctl disable tlp seems to fix the issue for me (3 successful boots in a row). Attaching my /etc/tlp/default as a reference.
Created attachment 255285 [details]
dmesg-disabled-tlp

The bug still happens with tlp disabled, although it is triggered a lot less often.
Adding Thorsten Leemhuis for potential regression tracking.
(In reply to Rouven Czerwinski from comment #8)
> Adding Thorsten Leemhuis for potential regression tracking.

Thx for this, but I'm sorry, it won't make it to the regression reports. I only do them in my spare time and the time I have to compile those is quite limited. So to get at least something done I'm focussing on mainline regressions in the current development version (4.11); it seems this regression was introduced in 4.10, so it won't make the list I'm currently compiling, as that is for 4.11 only. Sorry to disappoint you. I hope to have things a bit more optimized in the future to keep track of regressions in older releases as well. But one step at a time.
Commit dc434e056fe1dada20df7ba07f32739d3a701adf from 4.11-rc3 seems to have fixed the problem for me.
Imre, can you please check if applying commit dc434e056fe1dada20df7ba07f32739d3a701adf on top of your 4.10 kernel also solves the problem for you?
(In reply to Jean Delvare from comment #11)
> Imre, can you please check if applying commit
> dc434e056fe1dada20df7ba07f32739d3a701adf on top of your 4.10 kernel also
> solves the problem for you?

The problem doesn't seem to happen any more on the KBL machine I reported with a 4.11-rc4 based drm-tip kernel (which then has dc434e056fe1da). If it's also ok with you we could close this bug based on this result. Getting access to that machine is cumbersome (it's in our CI farm).
If 4.11-rc4 works for you then we can indeed close this as fixed. If it could be confirmed that commit dc434e056fe1dada20df7ba07f32739d3a701adf fixes it, then that commit should be considered for stable and longterm kernel branches.
For the record, commit dc434e056fe1 ("cpu/hotplug: Serialize callback invocations proper") was included in stable kernel 4.10.14.