Created attachment 278461
A dmesg log captured prior to a test run

I found this issue during our weekly stress testing. I run sleepgraph with the "-rs off" option, which disables runtime suspend on all devices prior to suspend. Freeze failed every time on the HP Pavilion x360, so I debugged the issue to one device: the DMA controller (INTL9C60:01). If I disable runtime suspend on INTL9C60:01 and then issue a freeze, the system hangs. Note that this doesn't happen with mem (mem works fine after the disable).

echo "on" > /sys/devices/pci0000:00/INTL9C60:01/power/control
sudo sleepgraph -m freeze
*** system hangs ***

I'm not sure if the issue is with allowing runtime suspend to be disabled at all or with the code, but in either case it shouldn't be possible to hang the system with an echo.
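For anyone reproducing this, it may help to record the device's runtime PM settings right before issuing the freeze. The following is only a minimal sketch: it assumes the standard power/control and power/runtime_status sysfs attributes are present for the device, and the rpm_state helper name is hypothetical (it is not part of sleepgraph).

```shell
#!/bin/sh
# Print the runtime PM attributes of one sysfs device directory.
# rpm_state is a hypothetical helper name used for illustration only.
rpm_state() {
    dev="$1"
    for attr in control runtime_status; do
        f="$dev/power/$attr"
        if [ -r "$f" ]; then
            # "control" reads "on" when runtime suspend is disabled, "auto" otherwise
            printf '%s=%s\n' "$attr" "$(cat "$f")"
        else
            printf '%s=unavailable\n' "$attr"
        fi
    done
}

# Example call, using the device path from the report:
# rpm_state /sys/devices/pci0000:00/INTL9C60:01
```

Running it once before and once after the echo shows whether the control setting actually changed from "auto" to "on" before the freeze is attempted.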
What kernel version are you using? There is a known issue with the DMA controller, and quite a few fixes have been proposed recently. The problem is that if we don't force the DMA controller to be powered on, then for the freeze state we must force it to shut down during suspend and bring it back up during resume, or else the system hangs upon freeze resume.
This test was on 4.19.0-rc2, but it also happens on 4.19.0-rc3. I've verified that this error occurs on kernels 4.17.0-rc7 through 4.19.0-rc3. However, on kernels 4.17.0-rc6 and earlier, the freeze hangs regardless of the device settings I use (which is apparently a different issue). So something was corrected in 4.17.0-rc7 which allowed this other issue to be detected. Could the answer simply be to disable user control over runtime suspend for the DMA controller? That would actually solve it (at least in the interim).
Well, I don't think we can simply disable user control over runtime suspend. Let's see if Andy has any thoughts.
I think the issue should have been fixed in the latest upstream kernel, so I will close this bug. Please feel free to reopen it if you can reproduce it with the latest upstream kernel.
The DMA controller there has an automatic power-gating mechanism. When one disables RPM on it, the system no longer follows the actual state of the controller and fails to recognize that the device is actually being powered off. There are more details about the power auto-gating mechanism in the comments in drivers/acpi/acpi_lpss.c.
JFYI: I guess this is the commit that "fixes something": c62ec4610c40 ("PM / core: Fix direct_complete handling for devices with no callbacks").
I just tried in 5.1.0-rc7 and it appears to work fine, so the issue appears to be fixed, thanks!