Bug 201095

Summary: HP Pavilion x360: Hang on freeze after disabling runtime suspend for DMA controller (INTL9C60)
Product: Power Management
Reporter: Todd Brandt (todd.e.brandt)
Component: Run-Time-PM
Assignee: Andy Shevchenko (andy.shevchenko)
Status: CLOSED CODE_FIX
Severity: normal
CC: rui.zhang
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.19.0-rc2
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 178231    
Attachments: A dmesg log captured prior to a test run

Description Todd Brandt 2018-09-11 23:04:38 UTC
Created attachment 278461
A dmesg log captured prior to a test run

I found this issue during our weekly stress testing. I run sleepgraph with the "-rs off" option, which disables runtime suspend on all devices prior to suspend. Freeze failed every time on the HP Pavilion x360, so I narrowed the issue down to one device: the DMA controller (INTL9C60:01).

If I disable runtime suspend on INTL9C60:01 and then issue a freeze, the system hangs. Note that this doesn't happen with mem (mem works fine after the disable).

echo "on" > /sys/devices/pci0000:00/INTL9C60:01/power/control
sudo sleepgraph -m freeze
*** system hangs ****
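
The same reproduction can be wrapped so that the previous setting is put back after a run that does not hang. This is only a minimal sketch and is not part of sleepgraph: the function names set_rpm_control and with_rpm_on and the SYSFS_DEV variable are made up for illustration, and writing to sysfs needs root.

```shell
# Hypothetical helpers; only the power/control file and its "on"/"auto"
# values come from the kernel's runtime PM sysfs interface.

# Device directory; defaults to the DMA controller from this report.
SYSFS_DEV="${SYSFS_DEV:-/sys/devices/pci0000:00/INTL9C60:01}"

# set_rpm_control VALUE: write "on" or "auto" to power/control and
# print the value that was there before.
set_rpm_control() {
    case "$1" in
        on|auto) ;;
        *) echo "usage: set_rpm_control on|auto" >&2; return 2 ;;
    esac
    old=$(cat "$SYSFS_DEV/power/control") || return 1
    echo "$1" > "$SYSFS_DEV/power/control" || return 1
    echo "$old"
}

# with_rpm_on CMD...: disable runtime suspend, run CMD, then restore
# the previous setting (only reached if the system did not hang).
with_rpm_on() {
    prev=$(set_rpm_control on) || return 1
    "$@"
    status=$?
    set_rpm_control "$prev" > /dev/null
    return "$status"
}
```

As root, something like "with_rpm_on sleepgraph -m freeze" would then run the failing case and return the device to its earlier setting if the system comes back.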

I'm not sure if the issue is with allowing runtime suspend to be disabled at all or with the code, but in either case it shouldn't be possible to hang the system with an echo.
Comment 1 Zhang Rui 2018-09-19 06:54:02 UTC
What kernel version are you using?

There is a known issue with the DMA controller, and quite a few fixes have been proposed recently.
The problem is that if we don't force power on the DMA controller, then for the freeze state we must force it to shut down during suspend and bring it back up during resume, or else the system hangs on resume from freeze.
Comment 2 Todd Brandt 2018-09-20 04:59:14 UTC
This test was on 4.19.0-rc2, but it also happens on 4.19.0-rc3. I've verified this error occurs on kernels 4.17.0-rc7 through 4.19.0-rc3. 

However, on kernels 4.17.0-rc6 and earlier, the freeze hangs regardless of which device settings I use (which is apparently a different issue). So something was corrected in 4.17.0-rc7 that allowed this other issue to be detected.

Could the answer simply be to disable user control over runtime suspend for the DMA Controller? That would actually solve it (at least in the interim).
Comment 3 Zhang Rui 2018-12-27 15:36:10 UTC
Well, I don't think we can simply disable user control over runtime suspend.

Let's see if Andy has any thoughts.
Comment 4 Zhang Rui 2019-03-25 07:50:51 UTC
I think the issue should have been fixed in the latest upstream kernel.
Thus I will close this bug.
Anyway, please feel free to reopen it if you can reproduce it with the latest upstream kernel.
Comment 5 Andy Shevchenko 2019-03-25 09:37:27 UTC
The DMA controller there has an auto-gating mechanism for power. When one disables RPM on it, the system no longer follows the actual state of the controller and fails to recognize that the device is actually being powered off.
There are more details about the power auto-gating mechanism in the comments in drivers/acpi/acpi_lpss.c.
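
For reference, the PM core's own view of a device can be read from sysfs. The sketch below is only an illustration: the function name rpm_state is made up, and the two files it reads (power/control and power/runtime_status) report what the kernel believes, so they cannot show whether the hardware has gated itself behind the kernel's back.

```shell
# rpm_state DEVDIR: print the runtime PM setting and the status the PM
# core currently records for one device directory under /sys/devices.
# Hypothetical helper; a missing file is reported as "unknown".
rpm_state() {
    dev=$1
    control=$(cat "$dev/power/control" 2>/dev/null) || control=unknown
    status=$(cat "$dev/power/runtime_status" 2>/dev/null) || status=unknown
    echo "control=$control runtime_status=$status"
}
```

For the device in this report, it would be called as: rpm_state /sys/devices/pci0000:00/INTL9C60:01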
Comment 6 Andy Shevchenko 2019-03-25 09:51:05 UTC
JFYI: I guess this is the commit that "fixes something"

c62ec4610c40 PM / core: Fix direct_complete handling for devices with no callbacks
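
Whether a given kernel tree already contains that commit can be checked with git. This is a minimal sketch: the function name has_commit is made up, and it assumes a local clone of the kernel tree with the revision of interest checked out or tagged.

```shell
# has_commit REPO COMMIT [REV]: succeed if COMMIT is an ancestor of REV
# (default HEAD) in the git repository at REPO. Hypothetical helper
# around "git merge-base --is-ancestor".
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" "${3:-HEAD}"
}
```

With a clone at a path such as ~/linux (an assumed location), "has_commit ~/linux c62ec4610c40 && echo present" reports whether the checked-out kernel includes the commit.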
Comment 7 Todd Brandt 2019-05-09 12:59:38 UTC
I just tried in 5.1.0-rc7 and it appears to work fine, so the issue appears to be fixed, thanks!