Bug 201519 - Using direct-complete with radeon breaks system-wide suspend - HP ProBook 4540s
Summary: Using direct-complete with radeon breaks system-wide suspend - HP ProBook 4540s
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-25 09:55 UTC by Ярослав Семченко
Modified: 2019-02-14 23:28 UTC (History)
2 users (show)

See Also:
Kernel Version: v4.17
Tree: Mainline
Regression: Yes


Attachments
suspend log (116.56 KB, text/plain)
2019-02-07 20:49 UTC, Ярослав Семченко
Details
direct_complete debug patch (645 bytes, patch)
2019-02-07 23:13 UTC, Rafael J. Wysocki
Details | Diff
suspend log with direct_complete patch (147.20 KB, text/plain)
2019-02-08 13:58 UTC, Ярослав Семченко
Details
direct_complete debug without direct_complete propagation (693 bytes, patch)
2019-02-11 11:50 UTC, Rafael J. Wysocki
Details | Diff
suspend log with second patch (145.77 KB, text/plain)
2019-02-12 19:26 UTC, Ярослав Семченко
Details
radeon: Set DPM_FLAG_NEVER_SKIP (589 bytes, patch)
2019-02-13 23:18 UTC, Rafael J. Wysocki
Details | Diff

Description Ярослав Семченко 2018-10-25 09:55:15 UTC
Overview:

After kernel update, system no longer resumes from suspend.

Steps to Reproduce:

1) Suspend (e.g. by closing notebook lid)
2) Hit power button to resume

Actual Results:

Power LED is switched ON and nothing more happens. Screen remains black,
system is not responding even to SysRq commands, no requests to HDD is made.

Expected Results: 

System resumes to working state.

Build:

All kernels since c62ec4610c40bcc44f2d3d5ed1c312737279e2f3, including latest
upstream - v4.19 at the moment. Does not occur on 4.16.* kernels.

git bisect output:
c62ec4610c40bcc44f2d3d5ed1c312737279e2f3 is the first bad commit
commit c62ec4610c40bcc44f2d3d5ed1c312737279e2f3
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Tue May 22 13:02:17 2018 +0200

    PM / core: Fix direct_complete handling for devices with no callbacks
    
    Commit 08810a4119aa (PM / core: Add NEVER_SKIP and SMART_PREPARE
    driver flags) inadvertently prevented the power.direct_complete flag
    from being set for devices without PM callbacks and with disabled
    runtime PM which also prevents power.direct_complete from being set
    for their parents.  That led to problems including a resume crash on
    HP ZBook 14u.
    
    Restore the previous behavior by causing power.direct_complete to be
    set for those devices again, but do that in a more direct way to
    avoid overlooking that case in the future.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=199693
    Fixes: 08810a4119aa (PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags)
    Reported-by: Thomas Martitz <kugel@rockbox.org>
    Tested-by: Thomas Martitz <kugel@rockbox.org>
    Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
    Reviewed-by: Johan Hovold <johan@kernel.org>

:040000 040000 24388033a6e19486a816ae483b7a30dc8feb3de2 51921e87fbf3715cd3705f59f2216a886f4ca5d5 M      drivers

Additional Information:

Systemd journal does not have anything after entering suspend.
Applying this patch on upstream (or any other revision) seems to fix the issue:
diff --git a/drivers/base/power/main.c b/dric
index a690fd400260..ad827119c280 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1944,7 +1944,7 @@ static int device_prepv, pm_message_t state)
 unlock:
        device_unlock(dev);
 
-       if (ret < 0) {
+       if (ret < 0 || dev->power.no_pm_call
                suspend_report_result(callba
                pm_runtime_put(dev);
                return ret;

However, it is effectively the same as reverting commit
c62ec4610c40bcc44f2d3d5ed1c312737279e2f3, so I have no idea how to investigate
this further.

System:

HP ProBook 4540s
openSUSE Tumbleweed
Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz

lspci output:

00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1c.5 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 6 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM76 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Thames [Radeon HD 7550M/7570M/7650M] (rev ff)
03:00.0 System peripheral: JMicron Technology Corp. SD/MMC Host Controller (rev 30)
03:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller (rev 30)
03:00.3 System peripheral: JMicron Technology Corp. MS Host Controller (rev 30)
04:00.0 Network controller: Ralink corp. RT3290 Wireless 802.11n 1T/1R PCIe
04:00.1 Bluetooth: Ralink corp. RT3290 Bluetooth
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
Comment 1 Ярослав Семченко 2018-10-25 10:00:53 UTC
Whoops, patch is truncated in original message. Here it is:

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index a690fd400260..ad827119c280 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1944,7 +1944,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
 unlock:
        device_unlock(dev);
 
-       if (ret < 0) {
+       if (ret < 0 || dev->power.no_pm_callbacks) {
                suspend_report_result(callback, ret);
                pm_runtime_put(dev);
                return ret;
Comment 2 Rafael J. Wysocki 2019-02-06 22:28:12 UTC
Please try to do (as root)

# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state

without reverting commit c62ec4610c40bcc44f and see if you get back to the command prompt after that (after several seconds).
Comment 3 Ярослав Семченко 2019-02-07 14:21:25 UTC
Yes, I am returned to command prompt after 5 seconds. I also tried to write another options into /sys/power/pm_test, all are working fine except 'none'.
Comment 4 Rafael J. Wysocki 2019-02-07 18:08:34 UTC
OK, so please do (as root):

# echo 1 > /sys/power/pm_debug_messages
# echo 1 > /sys/power/pm_print_times
# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state
# dmesg > suspend.log

and attach suspend.log to this bug entry.
Comment 5 Ярослав Семченко 2019-02-07 20:49:25 UTC
Created attachment 281051 [details]
suspend log
Comment 6 Rafael J. Wysocki 2019-02-07 23:13:32 UTC
Created attachment 281055 [details]
direct_complete debug patch

Thanks, but I still need more information.

Please apply this patch (on top of 5.0-rc5, for example), repeat the steps from the previous comment and attach the log.
Comment 7 Ярослав Семченко 2019-02-08 13:58:15 UTC
Created attachment 281061 [details]
suspend log with direct_complete patch

Here it is (patch applied on top of 4.20.4 kernel version).
Comment 8 Ярослав Семченко 2019-02-09 13:00:39 UTC
I've also checked suggestion that suspend isn't working with kernel versions before v4.15, but it is working fine with both v4.14.39 and v4.14.98, none of them include 08810a4119aa commit.
Comment 9 Dimitri 2019-02-09 18:38:52 UTC
(In reply to Ярослав Семченко from comment #8)
> I've also checked suggestion that suspend isn't working with kernel versions
> before v4.15, but it is working fine with both v4.14.39 and v4.14.98, none
> of them include 08810a4119aa commit.

In my asus zenbook ux331ual almost the same problem with resume. The difference is in type of sleep. S2Idle is set by default - and with it my laptop suspends and wakeups good. But if I manually set echo deep > /sys/power/mem_sleep and then push device to sleep (by lid or Fn+F1) it starts to slowly blink by led, but then if I try to wake it up - led on power button activates, then led on F2 (flight mode) blinks ones and then nothing happens, no reaction on anything except long press on power button.
But, if AC power cable is plugged the "deep" sleep pass well as well resume. The cable must be plugged all stages. If I put laptop to deep sleep and unplug cable - then resume is never happens.
I checked kernels 4.14.98 and the latests - always the same behaviour
Comment 10 Rafael J. Wysocki 2019-02-11 11:50:05 UTC
Created attachment 281101 [details]
direct_complete debug without direct_complete propagation

Thanks!

Please apply this patch (instead of the previous one), repeat the previous steps and attach the resultant log.

Also please check if full suspend/resume work with this patch applied.
Comment 11 Rafael J. Wysocki 2019-02-11 12:02:38 UTC
(In reply to Ярослав Семченко from comment #8)
> I've also checked suggestion that suspend isn't working with kernel versions
> before v4.15, but it is working fine with both v4.14.39 and v4.14.98, none
> of them include 08810a4119aa commit.

Well, according to your report, setting direct_complete for devices without PM callbacks is the source of the problem.

However, that used to be done in the mainline before commit 08810a4119aa (or in the stable kernels without it), so they also should fail in theory.

Now, if they don't, this indicates that there's an additional change needed to trigger the failure that was made in the meantime.
Comment 12 Ярослав Семченко 2019-02-12 19:26:42 UTC
Created attachment 281115 [details]
suspend log with second patch

Here is the log, and with this patch full suspend/resume works correctly.
Comment 13 Rafael J. Wysocki 2019-02-13 23:18:27 UTC
Created attachment 281131 [details]
radeon: Set DPM_FLAG_NEVER_SKIP

Please check if suspend works with this patch applied (and without the previous patches).
Comment 14 Ярослав Семченко 2019-02-14 14:29:28 UTC
Yes, it works correctly with this patch.
Comment 15 Rafael J. Wysocki 2019-02-14 21:49:00 UTC
Good, thank you!

Let me add a changelog to this patch and post it, then.
Comment 16 Rafael J. Wysocki 2019-02-14 23:28:34 UTC
Patch submitted: https://patchwork.kernel.org/patch/10813959/

Note You need to log in before you can comment on or make changes to this bug.