Created attachment 288137 [details] dmesg of boot. After a troubleshooting session with the folks over at nouveau (thanks Karol&Ilia), we concluded that the snd_hda_intel driver prevents my laptop from entering deeper sleep states than pc3. Unbinding snd_hda_intel from the GPUs HDA subdevice resolves this issue. From the top: this Asus K501UB laptop contains an NVIDIA GeForce 940M alongside Intel IGP. This particular laptop has no display outputs connected to this NVIDIA GPU, it's only useful for Optimus render-offloading. However, the GPU does expose a HDA subdevice for... reasons. Upon boot, my dmesg contains the following lines: [ 542.704216] snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002) [ 542.704282] snd_hda_intel 0000:01:00.1: Disabling MSI [ 542.704288] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client [ 542.708085] snd_hda_intel 0000:01:00.1: no codecs found! Of course, not having a codec is an exceptional situation, but on itself no problem as the HDA device doesn't actually have a physical audio input/output port I can use. Unfortunately, because snd_hda_intel binds to this device but fails subsequent initialisation, it is left in a limbo where neither snd_hda_intel nor the pci subsystem claims to control RunPM. As a consequence, the audio device is left in D0 despite runpm set to auto. This keeps the GPU itself from entering D3Hot, which keeps the PCIe bridge on, which keeps the CPU package in state pc3. Manually unbinding snd_hda_intel from this particular PCI(e) device using the SysFS interface instantly solves the problem. The PCI subsystem takes charge and kicks the sound device into D3hot, which sets the entire chain of runpm in motion. Suddenly, powertop reports that my CPU is in the lower-power pc8 state 80% of the time. How to reproduce: - Make sure powertop is enabled in SystemD (or a similar laptop power management service) such that all runpm options are enabled on boot - Boot Linux 5.5.11 Expected: CPU to spend 50%+ in pc8. Observed result: CPU spends all its time in pc3 and above. Workaround: Use the SysFS interface to unbind snd_hda_intel from the HDA audio device associated with the headless GeForce 940M GPU. In my case: # cd /sys/module/snd_hda_intel/drivers/pci\:snd_hda_intel/ # echo 0000:01:00.1 > unbind I attached a dmesg from the current boot. I presume the gap in timestamps is the result of me not entering my disk encryption keys for a few minutes and thus not a problem on itself. I truncated the logs to remove WIFI authentication info, but I did not remove anything before the last line in the text file.
Unfortunately at the time of codec probe, we can't return an error from the driver probe itself because it's already done in the delayed probe in a workqueue. For the easiness, we should create a probe blacklist to return -ENODEV at the beginning of the probe. But it shouldn't harm also even without codecs. Could you try the following? --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2333,6 +2333,10 @@ static int azx_probe_continue(struct azx *chip) display_power(chip, false); if (err < 0) hda->init_failed = 1; + if (hda->vga_switcheroo_registered) { + vga_switcheroo_unregister_client(pci); + hda->vga_switcheroo_registered = false; + } complete_all(&hda->probe_wait); to_hda_bus(bus)->bus_probing = 0; return err; Also, please give alsa-info.sh output. Run it with --no-upload option and attach to Bugzilla. Thanks.
Created attachment 288141 [details] alsa-info.txt
Thank you for your reply Takashi. I just built and ran a kernel with the proposed change. Unfortunately, it doesn't solve the issue of the HDA bus not going into D3. It just removes the HDA device from switcheroo, but I don't think it was in the way there. From my limited understanding: azx_probe_continue probes codecs in line 2295. If it fails (which it does, the azx_probe_codecs() returns -ENXIO), the prober will jump straight to the error path without running setup_vga_switcheroo_runtime_pm() or any runpm code after that needed at init time. Perhaps the absence of codecs should not be treated as an error from the hda bus point of view. I could alter line 2296 to ignore the ENXIO error rather than fail, and then presumably make another change around line 2311-2315 to make sure azx_codec_configure()'s -ENODEV is ignored too. That way initialisation of the HDA device would succeed and hopefully make it respond to runpm events properly. But I'm not 100% sure what the consequences could be further down the line. Is this a good approach towards fixing this issue? Or do you think snd_hda_intel should find a way to unbind from this HDA device instead?
OK, then it's really tricky and fragile to make the system runtime-aware. As you figured out, the driver goes straight to the error exit, but it keeps staying because the probe already returned success. The later heavy task is done in a workqueue, and this error happens at this stage, hence the driver can't abort probing any longer. And unfortunately there is no way to kill the driver itself. In theory it's possible to enable the runtime PM and all other stuff, but it's a half-baked state, hence it's error-prone. Moreover, this is just a waste of resources as is. So, I guess the simplest and best way would be: - Blacklist the known broken device; it returns -ENODEV immediately at probe - Show the proper error message to suggest user unbind / blacklist the driver
On the second thought, it's not entirely simple at all to detect such a device, judging from alsa-info.sh output now... Hmm. Then we might need to go to a way to jungle.
OK, below is another attempt. Let me know if this works better, then I'll brush it up as a proper patchset.
Created attachment 288151 [details] test fix patch
Created attachment 288167 [details] Continue probing when no codec is found The bad news: unfortunately your patch did not do the trick. The good news: I crafted and attached a simpler patch that does. It simply stops treating absence of codecs as an error during probe, so that all the switcheroo and runpm initialisation methods at the end of azx_probe_continue() are successfully run. I don't observe any adversary effects. I am able to use the GPU for work-loads, showing it succesfully wakes up from D3hot as well as goes back into it. The bad news: after a suspend-to-RAM cycle it stops working, and my GPU+HDA controller remains in D0 (or in VGASwitcheroo terms: DynPwr).
Great, that's promising. I'd skip rather the calls of azx_probe_codecs() and azx_codec_configure() if bus->codec_mask=0 instead of overwriting the probe_mask option, but it's just a detail. About the issue after S3: maybe this can be addressed by using the generic pm_runtime_force_*() call. A revised test patch is below (totally untested).
Created attachment 288245 [details] A revised test patch
And for more complete fixes and cleanups around PM, I created a patch set, found in topic/hda-pm-fixes branch of sound git tree. git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git Hopefully this gives something better...
Thank you for all your work so far. Patch 288245 behaves the same as the one I posted (288167), so the hda device stays in D3cold as long as I don't suspend-to-RAM my device. After a suspend-to-RAM, the HDA devices stays in D0, vgaswitcheroo reports DynPwr for the hda device. This keeps the rest of the chain awake too. It's better than upstream, but not perfect yet. One thing I noticed is that upon boot, powertop.service makes sure the power/control settings for all devices are set to auto. After a suspend-to-RAM, these controls revert to off for quite a few devices including the HDA audio bus. Shouldn't these settings be restored after resume? Setting them to auto in powertop does not change the problem unfortunately, the HDA audio device stays in DynPwr (D0) after a suspend-to-RAM cycle, regardless of the power/control value. Is it worth trying your full git tree?
Pardon, when I said D3cold in the second sentence, I meant D3hot of course!
Yes, testing the git repo would be helpful in any case, as it contains a few more other fixes. It's interesting actually who changes the power state. The patch should make the device resumed via pm_runtime_force_resume() and this should skip the actual resume if the device has been runtime-suspended. You can check the runtime PM state of the device after the system resume. If it's runtime-suspended but has a D0, something else is wrong. Or if the runtime suspend after the resume doesn't work, we may need to track down that cause.
Ok, I applied the top 5 patches from that tree onto the 5.6.2 kernel and rebuilt, but to no avail. Runpm works until I do a suspend-to-RAM cycle, after which both the GPU and the audio device remain in D0/DynPwr - even after setting all the power/control bits to auto using powertop. At this point I can confirm that force-unbinding snd_hda_intel using the commands in comment #1 "solves" the problem in that the GPU will go into D3hot/DynOff.
Oh, and for what it's worth: rebinding the snd_hda_intel driver to the HDA device after that does bring the device to (or keeps the device in) D3hot.
Could you check the following points? - Check whether force_resume flag is false in azx_resume() for this device - Check the runtime PM states via sysfs before and after S3; that is, if the device shows it's being runtime-suspended but it blocks D3hot, this comes from something else than the sound driver itself. Also, one possible test is to combine vga_switcheroo unregister, e.g. apply the following on top of the five patches: --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2327,6 +2327,11 @@ static int azx_probe_continue(struct azx *chip) goto out_free; } + if (!bus->codec_mask && hda->vga_switcheroo_registered) { + vga_switcheroo_unregister_client(chip->pci); + hda->vga_switcheroo_registered = false; + } + err = snd_card_register(chip->card); if (err < 0) goto out_free;
Created attachment 288397 [details] ALSA: hda: Explicitly permit using autosuspend if runtime PM is supported This patch, applied on top of your hda-pm-fixes, solves the problem of the HDA device on my GPU not falling asleep after a suspend-to-RAM cycle. From what I understand, the function call merely informs the runtime PM framework that autosuspend is supported. It is still up to the user to enable or disable it through the power/control handle. Hence I don't see a reason why this shouldn't just be called unconditionally. Aside: I am under the impression that codec_set_power_save() is trying to be way more clever than it needs to be. Calling set_autosuspend_delay() with a negative delay should already have the implicit effect of calling pm_runtime_forbid(), and it will be "automatically" undone when it's being called with a positive delay (or 0). The use_autosuspend flag doesn't have to be updated when the driver requests setting the delay to negative (e.g. disable), and doing to has the nasty side-effects that 1) the driver-controlled settings are lost after a suspend-to-RAM, and 2) the user ("root") or power saving daemon cannot override the delay or the controller (auto or on) if desired.
(In reply to Roy from comment #18) > Created attachment 288397 [details] > ALSA: hda: Explicitly permit using autosuspend if runtime PM is supported > > This patch, applied on top of your hda-pm-fixes, solves the problem of the > HDA device on my GPU not falling asleep after a suspend-to-RAM cycle. From > what I understand, the function call merely informs the runtime PM framework > that autosuspend is supported. It is still up to the user to enable or > disable it through the power/control handle. Hence I don't see a reason why > this shouldn't just be called unconditionally. Good to hear. Now I refreshed the branch with this patch. I think we can submit and merge this for 5.7 cycle, as most of them are trivial and only relevant with the no-codec case that was tested here. > Aside: I am under the impression that codec_set_power_save() is trying to be > way more clever than it needs to be. Calling set_autosuspend_delay() with a > negative delay should already have the implicit effect of calling > pm_runtime_forbid(), and it will be "automatically" undone when it's being > called with a positive delay (or 0). The use_autosuspend flag doesn't have > to be updated when the driver requests setting the delay to negative (e.g. > disable), and doing to has the nasty side-effects that 1) the > driver-controlled settings are lost after a suspend-to-RAM, and 2) the user > ("root") or power saving daemon cannot override the delay or the controller > (auto or on) if desired. I'm willing to take any improvements, so patches are welcome. But beware that we're dealing mostly the case delay=0 there, which indicates stopping the runtime PM, and a negative value is almost never passed. The value is passed at the init and at changing the power_save module option, which is a major controlling knob that is used by many applications including the system setup. So it needs a bit careful handling.
(In reply to Takashi Iwai from comment #19) > Good to hear. Now I refreshed the branch with this patch. > I think we can submit and merge this for 5.7 cycle, as most of them are > trivial and only relevant with the no-codec case that was tested here. Thank you very much for your help! I'm glad we could get this resolved so quickly. > > > Aside: I am under the impression that codec_set_power_save() is trying to > be > > way more clever than it needs to be. Calling set_autosuspend_delay() with a > > negative delay should already have the implicit effect of calling > > pm_runtime_forbid(), and it will be "automatically" undone when it's being > > called with a positive delay (or 0). The use_autosuspend flag doesn't have > > to be updated when the driver requests setting the delay to negative (e.g. > > disable), and doing to has the nasty side-effects that 1) the > > driver-controlled settings are lost after a suspend-to-RAM, and 2) the user > > ("root") or power saving daemon cannot override the delay or the controller > > (auto or on) if desired. > I'm willing to take any improvements, so patches are welcome. > But beware that we're dealing mostly the case delay=0 there, which indicates > stopping the runtime PM, and a negative value is almost never passed. The > value is passed at the init and at changing the power_save module option, > which is a major controlling knob that is used by many applications > including the system setup. So it needs a bit careful handling. I will think about it, but before I make any proposals I should study the prioritisation of controls very carefully. It looks as if the power_save module parameter has functionality that overlaps with the sysfs power/[control|autosuspend_delay_ms] nodes, but the state of the two mechanisms might not always be synced/compatible. This could lead to unexpected behaviour in the untested code-paths. I completely understand your caution, it is easy to break the users' assumptions of the power_save knob by introducing changes.
As the patches have landed to the upstream, let's close this bug once. The rest details can be discussed on ML or another bug, if any. Thanks!