"]# uname -r 4.16.17 [root@master2 ken]# cat /sys/kernel/debug/vgaswitcheroo/switch 0:IGD:+:Pwr:0000:00:02.0 1:DIS: :DynOff:0000:01:00.0 2:DIS-Audio: :Off:0000:01:00.1 ]# uname -r 4.17.0 [root@master2 ken]# cat /sys/kernel/debug/vgaswitcheroo/switch 0:IGD:+:Pwr:0000:00:02.0 1:DIS: :DynPwr:0000:01:00.0 2:DIS-Audio: :DynPwr:0000:01:00.1 ]# uname -r 5.3.7 ]# cat /sys/kernel/debug/vgaswitcheroo/switch 0:IGD:+:Pwr:0000:00:02.0 1:DIS: :DynPwr:0000:01:00.0 2:DIS-Audio: :DynPwr:0000:01:00.1 ]# inxi -G Graphics: Device-1: Intel Mobile 4 Series Integrated Graphics driver: i915 v: kernel Device-2: AMD RV710/M92 [Mobility Radeon HD 4330/4350/4550] driver: radeon v: kernel Display: server: X.Org 1.20.5 driver: ati,intel,modesetting,v4l unloaded: radeon resolution: 1366x768~60Hz OpenGL: renderer: Mesa DRI Mobile Intel GM45 Express v: 2.1 Mesa 19.2.1" The problem is present with all kernels, official or distribution from 4.17.0 on, so that is where the regression appears to have occurred. The laptop runs hot and the battery drains quickly on newer kernel series as a result. I'm not an expert on kernel code - compiling is about my limit - but I'll help in any way I can. I hope I have the right component - Power Management describes the function, but not necessarily the source of the problem.
Can you bisect?
Please attach your dmesg output in both the working and non-working cases.
Created attachment 288637
dmesg for 4.16.17

I no longer had those kernels installed, so in the interest of simplicity and speed I used the nearest distribution kernels I could find. The choice of official or distribution kernel doesn't seem to make a difference, and compiling kernels takes a while on this machine. I hope that's OK. If not, it might take a while to get round to compiling the kernels again.
Created attachment 288639
dmesg for 4.17.2
(In reply to Alex Deucher from comment #1)
> Can you bisect?

Probably not. I can just about compile a kernel, but finding my way around anything other than a standard source download is likely to be beyond me.
Comment on attachment 288639
dmesg for 4.17.2

Scrub this. It has radeon.runpm=0 on its command line. I'll upload a corrected version.
Created attachment 288641
Corrected dmesg for 4.17.2

This one doesn't have radeon.runpm=0 in its command line!
Starting with v4.17, the power management of HDA controllers on discrete GPUs was changed such that the HDA keeps the GPU awake as long as it's in use:
https://lists.freedesktop.org/archives/nouveau/2018-February/029851.html

This exposed an issue with some ATI cards which was fixed in June 2018:
https://git.kernel.org/linus/57cb54e53bdd

So if you still experience GPU insomnia with v5.3 (which contains that fix), then it's a different problem.

In one case, a user reported GPU insomnia with an Nvidia card and it turned out that it was caused by a user space tool called "tlp" which disabled runtime power management of the HDA via sysfs. Naturally, this caused the GPU to stay awake. The solution in this case was to change the configuration of "tlp". But it was also possible to manually override disablement of runtime PM on the HDA by echoing "auto" to the "power/control" file in the HDA PCI device's sysfs directory (a value of "on" in that file means runtime PM is disabled and the device stays powered):
https://bugs.freedesktop.org/show_bug.cgi?id=75985#c116

So you first may want to check whether runtime PM is disabled in sysfs, try to manually enable it and see if the GPU runtime suspends, and if that works, find out which user space tool disabled runtime PM on the HDA.

It's also possible that you've got a user space tool running which has opened the HDA and thereby keeps the GPU awake. Some audio mixers do that.

If none of that fixes the problem, then we may indeed be dealing with a kernel bug. The other bugs related to runtime PM of the HDA contain all the steps and several debug patches to understand what's keeping the HDA awake, so we need you to follow those instructions and report the results back. Here are the relevant bugzillas:
https://bugs.freedesktop.org/show_bug.cgi?id=106597#c4
https://bugs.freedesktop.org/show_bug.cgi?id=106957#c1

One oddity I notice in your dmesg output is that there's only a single HDA controller detected in your machine and that's the one on the discrete GPU. Normally there are two HDAs: one is part of the Intel chipset and is responsible for headphones, loudspeakers, mic and so on, and the other one is on the discrete GPU and is only responsible for HDMI audio. On your machine, there's no Intel chipset HDA and the one on the discrete GPU has a Line Out for loudspeakers, headphone out, digital out and two microphone inputs. So that's a little odd and may contribute to this issue.
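The sysfs check suggested above can be scripted. This is a minimal sketch, not an official tool: the helper names (read_attr, runtime_pm_state, diagnose) are made up for illustration, and the two PCI addresses are the ones reported for this laptop. It only reads power/control and power/runtime_status and prints a rough interpretation.

```python
from pathlib import Path

# Standard sysfs location of PCI devices on Linux.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        # Covers missing files, permission problems and I/O errors.
        return None


def runtime_pm_state(address, root=SYSFS_PCI):
    """Collect the two runtime PM attributes discussed in this thread."""
    power = Path(root) / address / "power"
    return {
        "control": read_attr(power / "control"),
        "runtime_status": read_attr(power / "runtime_status"),
    }


def diagnose(state):
    """Rough, hypothetical interpretation of the attribute values."""
    if state["control"] == "on":
        return "runtime PM disabled from user space; write 'auto' to power/control"
    if state["control"] == "auto" and state["runtime_status"] == "active":
        return "runtime PM allowed but device is held active; look for a user or kernel reference"
    if state["runtime_status"] == "suspended":
        return "device is runtime suspended"
    return "state unknown or attribute unreadable"


if __name__ == "__main__":
    # Addresses of the discrete GPU and its HDA function on this machine.
    for address in ("0000:01:00.0", "0000:01:00.1"):
        state = runtime_pm_state(address)
        print(address, state, "->", diagnose(state))
```

If control reads "on" for the HDA, some user space tool has probably changed it, and writing "auto" back as root is the manual override. If it already reads "auto" and the device is still active, the remaining suspects are a process holding the audio device open or a kernel-side reference.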
Well, it'll take time to patch and recompile the kernel, but in the meantime here are the full contents of the power directories:

AMD Radeon GPU

cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status
active
cat /sys/bus/pci/devices/0000\:01\:00.0/power/control
auto
cat /sys/bus/pci/devices/0000\:01\:00.0/power/autosuspend_delay_ms
5000
cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_active_time
1744516
cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_suspended_time
184197

AMD HDA

cat /sys/bus/pci/devices/0000\:01\:00.1/power/runtime_status
active
cat /sys/bus/pci/devices/0000\:01\:00.1/power/control
auto
cat /sys/bus/pci/devices/0000\:01\:00.1/power/autosuspend_delay_ms
cat: '/sys/bus/pci/devices/0000:01:00.1/power/autosuspend_delay_ms': Input/output error
cat /sys/bus/pci/devices/0000\:01\:00.1/power/runtime_active_time
2415454
cat /sys/bus/pci/devices/0000\:01\:00.1/power/runtime_suspended_time
214644

I double-checked that IO error, in case it was just a fluke, but it's consistent.
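As a rough reading of those counters: runtime_active_time and runtime_suspended_time are cumulative millisecond totals, so their ratio shows how much of the tracked time each device actually spent suspended. The sketch below uses the figures pasted above; the function name is made up for illustration, and the result says nothing about when the suspended time was accumulated (for example early in boot).

```python
def suspended_fraction(active_ms, suspended_ms):
    """Share of tracked time spent runtime suspended (0.0 to 1.0)."""
    total = active_ms + suspended_ms
    if total == 0:
        return 0.0
    return suspended_ms / total


# (runtime_active_time, runtime_suspended_time) as reported in this thread.
readings = {
    "GPU 0000:01:00.0": (1744516, 184197),
    "HDA 0000:01:00.1": (2415454, 214644),
}

for name, (active, suspended) in readings.items():
    share = suspended_fraction(active, suspended)
    print(f"{name}: suspended {share:.1%} of tracked time")
```

Both devices come out below 10% suspended, which is consistent with the DynPwr state shown by vgaswitcheroo on the affected kernels.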
I suspect I need to recompile with CONFIG_PM_ADVANCED_DEBUG.
I decided to compile 5.6.6 with CONFIG_PM_ADVANCED_DEBUG. To my amazement, this gave me:

# uname -r
5.6.6
# cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynOff:0000:01:00.0
2:DIS-Audio: :DynOff:0000:01:00.1
(!!!!!!)

(GPU)
# cat /sys/bus/pci/devices/0000\:01\:00.0/power/control
auto
# cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status
suspended
cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_usage
0
# cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_active_kids
0

(HDA)
# cat /sys/bus/pci/devices/0000\:01\:00.1/power/control
auto
# cat /sys/bus/pci/devices/0000\:01\:00.1/power/runtime_status
suspended
# cat /sys/bus/pci/devices/0000\:01\:00.1/power/runtime_usage
0
# cat /sys/bus/pci/devices/0000\:01\:00.1/power/runtime_active_kids
0

So, either enabling CONFIG_PM_ADVANCED_DEBUG affects this or it's fixed in the latest kernels. As a quick check I installed the distribution's latest kernel:

# uname -r
5.5.19-pclos1
[root@master2 ken]# cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynOff:0000:01:00.0
2:DIS-Audio: :DynOff:0000:01:00.1

So it looks as if it was fixed in the 5.5 kernels.

However, both the 5.5 (distro) and 5.6 (mainline) kernels emit a terrible clattering sound during services start up. I'm unsure whether that's coming from the speakers, HDD or optical drive. If the former, it's just a nuisance, if one of the latter it's not good news. I hope that's not related to the fix!
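For comparing kernels, the vgaswitcheroo switch output pasted in this thread can be parsed mechanically. The sketch below assumes the colon-separated layout seen above (index, client type, a "+" for the active client, power state, PCI address); the SwitchEntry type and helper names are made up for illustration. It lists inactive clients that are not powered down, which is the symptom reported here.

```python
from typing import NamedTuple


class SwitchEntry(NamedTuple):
    index: int
    kind: str         # e.g. "IGD", "DIS", "DIS-Audio"
    active: bool      # True when the flag column is "+"
    power: str        # e.g. "Pwr", "Off", "DynPwr", "DynOff"
    pci_address: str  # e.g. "0000:01:00.0"


def parse_switch(text):
    """Parse the contents of /sys/kernel/debug/vgaswitcheroo/switch."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The PCI address itself contains colons, so split only four times.
        index, kind, flag, power, address = line.split(":", 4)
        entries.append(SwitchEntry(int(index), kind, flag == "+", power, address))
    return entries


def awake_inactive(entries):
    """Inactive clients that are neither Off nor DynOff."""
    return [e for e in entries if not e.active and e.power not in ("Off", "DynOff")]


# Output reported for kernel 4.17.0 in this thread.
sample_4_17 = """\
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynPwr:0000:01:00.0
2:DIS-Audio: :DynPwr:0000:01:00.1
"""

for entry in awake_inactive(parse_switch(sample_4_17)):
    print(entry.kind, entry.pci_address, entry.power)
```

Run against the 4.16.17, 5.5.19 or 5.6.6 output, the same check returns an empty list, since both discrete clients report Off or DynOff there.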
Reopening as the problem has returned with 5.6.11, although the rattling sound from the speaker (bug 207437) has now gone!