Created attachment 232871 [details]
dmesg of multiple suspend/resume cycles with 4.4.9

I cannot suspend multiple times on my laptop (Samsung 305V4A) after upgrading from kernel 4.4.9 to 4.5.3 (also tested with 4.7.2) when I have some custom power-saving udev rules enabled.

Udev rule:

ACTION=="add", SUBSYSTEM=="pci", ATTR{power/control}="auto"
Created attachment 232881 [details] journalctl -k of suspend/resume cycle until freeze in 4.5.3
Created attachment 232891 [details] journalctl of suspend/resume cycles until freeze in 4.7.2
bisection of kernel:

# bad: [fbc310e9c553412ebe72c14e5a7bb9807a3d1109] Linux 4.5.3
# good: [1a1a512b983108015ced1e7a7c7775cfeec42d8c] Linux 4.4.9
git bisect start 'v4.5.3' 'v4.4.9'
# good: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4
git bisect good afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
# good: [e535d74bc50df2357d3253f8f3ca48c66d0d892a] Merge tag 'docs-4.5' of git://git.lwn.net/linux
git bisect good e535d74bc50df2357d3253f8f3ca48c66d0d892a
# bad: [d43421565bf0510d35e6a39ebf96586ad486f3aa] Merge tag 'pci-v4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
git bisect bad d43421565bf0510d35e6a39ebf96586ad486f3aa
# skip: [984065055e6e39f8dd812529e11922374bd39352] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect skip 984065055e6e39f8dd812529e11922374bd39352
# good: [6c03a6bd0dd836db388feb28fda1868037491ee7] drm/i915: Don't register CRT connector when it's fused off
git bisect good 6c03a6bd0dd836db388feb28fda1868037491ee7
# good: [d0710abbcd88b1ff17760e97d74a673e67b49ea1] drm/i915: Set the map-and-fenceable flag for preallocated objects
git bisect good d0710abbcd88b1ff17760e97d74a673e67b49ea1
# good: [2d663b55816e5c1d211a77fff90687053fe78aac] Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
git bisect good 2d663b55816e5c1d211a77fff90687053fe78aac
# good: [1305eda751d7df3069b1fcb6f62036185acd24a0] Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 1305eda751d7df3069b1fcb6f62036185acd24a0
# good: [f0dba77620368d154bff9542675c6844e4678761] Merge tag 'davinci-for-v4.5/dts' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into next/dt
git bisect good f0dba77620368d154bff9542675c6844e4678761
# good: [f9cd69fe5eb6347b4de56458d0378bc0fa44bce9] Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good f9cd69fe5eb6347b4de56458d0378bc0fa44bce9
# bad: [30f05309bde49295e02e45c7e615f73aa4e0ccc2] Merge tag 'pm+acpi-4.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 30f05309bde49295e02e45c7e615f73aa4e0ccc2
# good: [ce96cb7386a57b270648f9ba6003065329a26bd3] Merge tag 'samsung-clk-exynos4-4.5' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into next/drivers
git bisect good ce96cb7386a57b270648f9ba6003065329a26bd3
# bad: [f11aef69b235bc30c323776d75ac23b43aac45bb] Merge branch 'pm-cpuidle'
git bisect bad f11aef69b235bc30c323776d75ac23b43aac45bb
# bad: [6efd3f8cde1d6acc20a715ac6ea17e01421742df] Merge branch 'pm-core'
git bisect bad 6efd3f8cde1d6acc20a715ac6ea17e01421742df
# good: [a72aea722f1b43442c9e219de824d5975dcdaa61] Merge branches 'acpica', 'acpi-video' and 'acpi-fan'
git bisect good a72aea722f1b43442c9e219de824d5975dcdaa61
# bad: [aa8e54b559479d0cb7eb632ba443b8cacd20cd4b] PM / sleep: Go direct_complete if driver has no callbacks
git bisect bad aa8e54b559479d0cb7eb632ba443b8cacd20cd4b
# good: [6b9cb42752dafba3761dde0002ca58ca518b6311] device core: add device_is_bound()
git bisect good 6b9cb42752dafba3761dde0002ca58ca518b6311
# good: [989561de9b5112999475b406557d9c7e9e59c041] PM / Domains: add setter for dev.pm_domain
git bisect good 989561de9b5112999475b406557d9c7e9e59c041
# first bad commit: [aa8e54b559479d0cb7eb632ba443b8cacd20cd4b] PM / sleep: Go direct_complete if driver has no callbacks
does the problem go away if you revert this commit aa8e54b559479d0cb7eb632ba443b8cacd20cd4b?
After reverting the commit on kernel 4.5.3, the problem seems to go away.
This seems indeed a regression to me. I will ping the patch author to look at this issue. BTW, can you please confirm the problem still exists in the latest upstream kernel?
The problem still persists with Fedora kernels 4.8.0-0.rc8.git0.1 and 4.7.4.
Don't have access to the HW, so we should get some debug logging. Maybe work with the Fedora kernel team to provide those?
Tomeu, what should we do for this bug? If you want something to be tested, please ask n0000b.n000b@gmail.com directly :)
Please supply the output from lsmod. It appears that a driver needs to be updated to suspend properly.
Created attachment 242631 [details] lsmod lsmod before suspend and freeze
can you please check if the problem still exists with the latest kernel?
(In reply to Zhang Rui from comment #12)
> can you please check if the problem still exists with the latest kernel?

The problem still persists in Fedora 23 with kernel 4.9 rc5.
Please try the test modes of system suspend (as described in Documentation/power/basic-pm-debugging.txt) and see if you can reproduce the problem in any of them. It also would be good to check if unloading any driver modules before suspending the system makes the problem go away.
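A minimal sketch of how the pm_test modes mentioned above could be stepped through from a shell. The mode names are the ones listed in basic-pm-debugging.txt; the POWER_SYSFS variable and the function names are hypothetical, introduced only so the helpers can be tried against a scratch directory before touching the real /sys/power. On a real system this needs root, and each write to the state file starts a test suspend.

```shell
#!/bin/sh
# POWER_SYSFS is a hypothetical override; it defaults to the real sysfs path.
POWER_SYSFS="${POWER_SYSFS:-/sys/power}"

# Test modes, ordered from the shallowest test (freezer) to the deepest (core).
PM_TEST_MODES="freezer devices platform processors core"

# run_pm_test MODE: select a test mode, then start a suspend-to-RAM test cycle.
# With a test mode selected, the kernel stops at that stage and then resumes.
run_pm_test() {
    echo "$1" > "$POWER_SYSFS/pm_test" || return 1
    echo mem > "$POWER_SYSFS/state"
}

# run_all_pm_tests: try every mode in turn, stop at the first failing write,
# and restore normal suspend behaviour ("none") when all modes passed.
run_all_pm_tests() {
    for m in $PM_TEST_MODES; do
        echo "pm_test mode: $m"
        run_pm_test "$m" || { echo "failed in mode: $m"; return 1; }
    done
    echo none > "$POWER_SYSFS/pm_test"
}
```

Calling run_all_pm_tests as root runs one cycle per mode; a hang in a given mode narrows the problem down to that stage of the suspend path.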
ping...
Testing in the different modes, I can suspend and resume multiple times (tested 5 times each) without hanging the laptop. However, with the options 'devices', 'platform', 'processors' and 'core', the screen goes blank and doesn't come back. As for unloading driver modules, I cannot find one that makes the problem go away.
Created attachment 248041 [details] journalctl -b after echo freezer > /sys/power/pm_test
Created attachment 248051 [details] journalctl -b after echo devices > /sys/power/pm_test screen goes blank after suspend
Created attachment 248061 [details] journalctl -b after echo platform > /sys/power/pm_test
Created attachment 248071 [details] journalctl -b after echo processor > /sys/power/pm_test
Created attachment 248081 [details] journalctl -b after echo core > /sys/power/pm_test
Created attachment 250841 [details]
debug patch

Please apply this patch to the latest kernel, and attach the dmesg output after multiple suspends. (Note that the suspend failure should not be reproducible with this debug patch applied.)
Created attachment 251071 [details] dmesg after multiple suspend with patched kernel
It seems that there are really a lot of devices impacted by this.

(In reply to n0000b.n000b from comment #0)
> Created attachment 232871 [details]
> dmesg of multiple suspend/cycles with 4.4.9
>
> Cannot suspend multiple times in my laptop (samsung 305V4A) after upgrading
> from kernel 4.4.9 to 4.5.3 (also tested in 4.7.2) when i have some custom
> power saving udev rules.
>

When you say "cannot suspend", what do you mean? Does the system freeze during suspend? During resume?

> Udev rule:
>
> ACTION=="add", SUBSYSTEM=="pci", ATTR{power/control}="auto"

What if you disable this udev rule?
(In reply to Zhang Rui from comment #24)
> when saying "Cannot suspend", what do you mean? does the system freezes
> during suspend?resume?

I can suspend successfully two times; on the third attempt the system hangs during the suspend cycle and the LEDs don't turn off.

> what if you disable this udev rule?

If I disable the udev rule, I can suspend multiple times without problems (every time I've tried).
Then I think the problem can be reproduced if you remove the udev rule and enable runtime PM for the PCI devices explicitly, using a script like:

for device in $(ls /sys/bus/pci/devices)
do
    echo auto > /sys/bus/pci/devices/$device/power/control
done

If yes, then please:

1. Disable runtime PM for all the PCI devices, using:

for device in $(ls /sys/bus/pci/devices)
do
    echo on > /sys/bus/pci/devices/$device/power/control
done

2. Then enable runtime PM for the devices one by one, and check which device's runtime PM causes the suspend hang.
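The two steps above can be sketched as a few shell helpers. This is only an illustration under stated assumptions: PCI_SYSFS and the function names are hypothetical (the variable exists so the helpers can be exercised on a scratch directory), and the suspend cycles between steps still have to be run by hand, as root, on the affected machine.

```shell
#!/bin/sh
# PCI_SYSFS is a hypothetical override; it defaults to the real sysfs path.
PCI_SYSFS="${PCI_SYSFS:-/sys/bus/pci/devices}"

# set_all_pci_control MODE: write MODE ("on" or "auto") to power/control
# of every PCI device that exposes the attribute.
set_all_pci_control() {
    for dev in "$PCI_SYSFS"/*; do
        [ -e "$dev/power/control" ] || continue
        echo "$1" > "$dev/power/control"
    done
}

# set_one_pci_control DEVICE MODE: change the policy of a single device,
# e.g. "set_one_pci_control 0000:00:1f.0 auto" (example address only).
set_one_pci_control() {
    echo "$2" > "$PCI_SYSFS/$1/power/control"
}

# list_pci_control: print "<device> <policy>" for each device, to verify
# which devices currently have runtime PM enabled ("auto").
list_pci_control() {
    for dev in "$PCI_SYSFS"/*; do
        [ -r "$dev/power/control" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/power/control")"
    done
}
```

A possible session: run set_all_pci_control on, then for each device call set_one_pci_control with auto, suspend and resume several times, and set the device back to on before moving to the next one.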
Created attachment 251251 [details]
output of lspci

The system hangs on the third suspend after running the following command:

echo auto > /sys/bus/pci/devices/0000\:00\:14.4/power/control

Attached is the output of lspci on this machine.
And before aa8e54b559479d0cb7eb632ba443b8cacd20cd4b ("PM / sleep: Go direct_complete if driver has no callbacks"), suspend always worked even if you set 0000:00:14.4 to auto, right?

Please attach the output of "lspci -xv" instead.
Created attachment 251351 [details]
lspci -xv

> and before aa8e54b559479d0cb7eb632ba443b8cacd20cd4b ("PM / sleep: Go
> direct_complete if driver has no callbacks"), suspend always works even if
> you set auto to 0000:00:14.4, right?

Yes, suspend always works.

Attached is lspci -xv.
we need PCI expert on this issue.
Now we've confirmed that the problem can only be reproduced
1. with commit aa8e54b55947 ("PM / sleep: Go direct_complete if driver has no callbacks") applied, AND
2. with runtime PM enabled for pci:0000:00:14.4.

The difference brought by commit aa8e54b55947 is that the device->direct_complete flag is set. And the difference brought by runtime PM is that the device can be in the runtime-suspended state when the system suspends.
Please attach the output of "tree /sys/bus/pci/devices/0000\:00\:14.4".

Please check if the problem can be reproduced with async PM disabled (echo 0 > /sys/power/pm_async).

Please check if this is also a regression for suspend-to-idle (echo freeze > /sys/power/state).
Created attachment 252411 [details]
tree /sys/bus/pci/devices/0000\:00\:14.4

> please check if the problem can be reproduced with async PM disabled
> (echo 0 > /sys/power/pm_async

The problem can be reproduced (the system freezes on the third suspend) with pm_async set to 0.

> please check if this is also a regression to suspend-to-idle (echo freeze > /sys/power/state)

I cannot test this at the moment because the screen doesn't come back after the first suspend-to-idle attempt.
Created attachment 252481 [details] dmesg output after various suspend to idle cycles suspend to idle seems to work, but the screen doesn't come back. Attached is the dmesg after various cycles (extracted via ssh)
ping?
I have a question about commit aa8e54b55947 ("PM / sleep: Go direct_complete if driver has no callbacks").

The commit checks whether the driver for device A has any PM callbacks; if not, device A is marked as go_direct_complete, so A's parent P will ignore A. But what about A's children? If A's children have PM callbacks, will they be ignored as a result of this patch? (The original patch that introduced go_direct_complete mentioned A's children: if A and A's children are OK to remain runtime-suspended, then prepare() will return a non-zero value.)
problem is still present in fedora 25 with kernel 4.11.0-0.rc5.git0.1 from rawhide
system still hangs in the third suspend cycle with the udev rule in fedora 26 beta and kernel 4.12.0-0.rc5 from rawhide
Can you test Linux-4.14-rc2 or newer? It may contain a patch that fixes this.
tested kernel 4.14.0-0.rc2.git1.2.fc28.x86_64 in fedora 26, the bug is still present, the laptop hangs in the third suspend.
Well, OK Can you please test kernels that don't come from Fedora? Like something you compiled yourself? I'm asking, because I'm wondering if you can test patches.
Some updates:

I'm now running Fedora 27 with rawhide kernels. Since kernel version 4.15-rc1 I have been able to suspend more than 3 times, but after some days of use (and several suspend and resume cycles) the system hangs.

I'm testing the package 4.15.0-0.rc4.git0.1.fc28.x86_64 from Fedora. I've been able to suspend for the third time, but I'm seeing some errors in the log:

[37312.538944] swiotlb_tbl_map_single: 6 callbacks suppressed
[37312.538954] radeon 0000:00:01.0: swiotlb buffer is full (sz: 2097152 bytes)
[37312.538957] swiotlb: coherent allocation failed for device 0000:00:01.0 size=2097152
[37312.538964] CPU: 1 PID: 10832 Comm: kworker/u8:25 Not tainted 4.15.0-0.rc4.git0.1.fc28.x86_64 #1
[37312.538967] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 305V4A/305V5A/3415VA/305V4A/305V4A, BIOS 09PW.ME13.20121101.SKK 11/01/2012
[37312.538980] Workqueue: events_unbound async_run_entry_fn
[37312.538983] Call Trace:
[37312.538999] dump_stack+0x5c/0x85
[37312.539006] swiotlb_alloc_coherent+0xe0/0x150
[37312.539027] ttm_dma_pool_get_pages+0x20b/0x5e0 [ttm]
[37312.539043] ttm_dma_populate+0x24d/0x340 [ttm]
[37312.539055] ttm_tt_bind+0x23/0x50 [ttm]
[37312.539070] ttm_bo_handle_move_mem+0x5cd/0x600 [ttm]
[37312.539083] ttm_bo_evict+0x147/0x310 [ttm]
[37312.539097] ttm_mem_evict_first+0x15b/0x1d0 [ttm]
[37312.539109] ttm_bo_force_list_clean+0x67/0x110 [ttm]
[37312.539180] radeon_suspend_kms+0xb5/0x3b0 [radeon]
[37312.539189] pci_pm_suspend+0x76/0x120
[37312.539194] ? pci_pm_freeze+0xb0/0xb0
[37312.539198] dpm_run_callback+0x4b/0x130
[37312.539203] __device_suspend+0x116/0x420
[37312.539207] async_suspend+0x1a/0x90
[37312.539213] async_run_entry_fn+0x33/0x160
[37312.539218] process_one_work+0x182/0x3a0
[37312.539223] worker_thread+0x2e/0x380
[37312.539228] ? process_one_work+0x3a0/0x3a0
[37312.539231] kthread+0x111/0x130
[37312.539236] ? kthread_create_worker_on_cpu+0x70/0x70
[37312.539242] ret_from_fork+0x1f/0x30
swiotlb: coherent allocation failed for device 0000:02:00.0 size=2097152
[37313.853268] CPU: 1 PID: 10825 Comm: kworker/u8:18 Not tainted 4.15.0-0.rc4.git0.1.fc28.x86_64 #1
[37313.853271] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 305V4A/305V5A/3415VA/305V4A/305V4A, BIOS 09PW.ME13.20121101.SKK 11/01/2012
[37313.853285] Workqueue: events_unbound async_run_entry_fn
[37313.853288] Call Trace:
[37313.853303] dump_stack+0x5c/0x85
[37313.853310] swiotlb_alloc_coherent+0xe0/0x150
[37313.853333] ttm_dma_pool_get_pages+0x20b/0x5e0 [ttm]
[37313.853349] ttm_dma_populate+0x24d/0x340 [ttm]
[37313.853362] ttm_bo_move_memcpy+0x17f/0x600 [ttm]
[37313.853369] ? acpi_os_release_object+0xa/0x10
[37313.853445] radeon_bo_move+0x1a7/0x220 [radeon]
[37313.853460] ttm_bo_handle_move_mem+0x2ae/0x600 [ttm]
[37313.853473] ttm_bo_evict+0x147/0x310 [ttm]
[37313.853530] ? radeon_pm_compute_clocks_dpm+0xf3/0x500 [radeon]
[37313.853551] ? drm_kms_helper_poll_enable.part.4+0x50/0xb0 [drm_kms_helper]
[37313.853557] ? find_next_iomem_res+0x33/0x100
[37313.853570] ttm_mem_evict_first+0x15b/0x1d0 [ttm]
[37313.853582] ttm_bo_force_list_clean+0x67/0x110 [ttm]
[37313.853622] radeon_suspend_kms+0x112/0x3b0 [radeon]
[37313.853630] pci_pm_suspend+0x76/0x120
[37313.853634] ? pci_pm_freeze+0xb0/0xb0
[37313.853638] dpm_run_callback+0x4b/0x130
[37313.853643] __device_suspend+0x116/0x420
[37313.853648] async_suspend+0x1a/0x90
[37313.853652] async_run_entry_fn+0x33/0x160
[37313.853658] process_one_work+0x182/0x3a0
[37313.853663] worker_thread+0x2e/0x380
[37313.853668] ? process_one_work+0x3a0/0x3a0
[37313.853671] kthread+0x111/0x130
[37313.853675] ? kthread_create_worker_on_cpu+0x70/0x70
[37313.853681] ret_from_fork+0x1f/0x30

And yes, I can test kernels from outside Fedora and patches, but at a slow pace.
What if you blacklist the radeon driver? You may get a VGA console only, but does suspend/resume work many times in that case?
I'm on kernel 4.15-rc4, and since rc1 I can suspend multiple times (I'm on the sixth cycle right now). I haven't tested whether this still works if I keep the laptop running for several days.
good news. Please confirm if the problem is gone in latest upstream kernel.
Can you please confirm if the problem is gone in the latest upstream kernel?
I've been using kernel 4.15-rc9 and I can suspend and resume multiple times. I will now test the final release and report back.
With recent kernels i can suspend and resume multiple times, sometimes the session crashes but the bug reported is solved. Thanks.