Bug 208649

Summary:	Bisected - System wakes up asap after standby
Product:	Drivers	Reporter:	Martin (mwolf)
Component:	Sound(ALSA)	Assignee:	Jaroslav Kysela (perex)
Status:	RESOLVED CODE_FIX
Severity:	normal	CC:	kai.heng.feng, tiwai
Priority:	P1
Hardware:	All
OS:	Linux
Kernel Version:	5.7.x	Subsystem:
Regression:	No	Bisected commit-id:
Attachments:	Alsa-Info Test fix patch Debug patch full dmesg __azx_runtime_suspend(chip); debug power-save debug power-save-controller-debug power-save-controller-corrected-debug snd_hda_intel.enable=1,0 snd_hda_intel.enable=0,1 Test fix patch #2 Another try, minimalistic changes Revised patch

Description Martin 2020-07-21 16:35:48 UTC

After installing Kernel 5.7.x my system asap wakes up from standby. This was not the case till 5.7.0-rc1

I bisected the Kernel and found out that this commit seems to be the problem:

c4c8dd6ef807663e42a5f04ea77cd62029eb99fa is the first bad commit
commit c4c8dd6ef807663e42a5f04ea77cd62029eb99fa
Author: Takashi Iwai <tiwai@suse.de>
Date:   Mon Apr 13 10:20:33 2020 +0200

    ALSA: hda: Skip controller resume if not needed
    
    The HD-audio controller does system-suspend and resume operations by
    directly calling its helpers __azx_runtime_suspend() and
    __azx_runtime_resume().  However, in general, we don't have to resume
    always the device fully at the system resume; typically, if a device
    has been runtime-suspended, we can leave it to runtime resume.
    
    Usually for achieving this, the driver would call
    pm_runtime_force_suspend() and pm_runtime_force_resume() pairs in the
    system suspend and resume ops.  Unfortunately, this doesn't work for
    the resume path in our case.  For handling the jack detection at the
    system resume, a child codec device may need the (literally) forcibly
    resume even if it's been runtime-suspended, and for that, the
    controller device must be also resumed even if it's been suspended.
    
    This patch is an attempt to improve the situation.  It replaces the
    direct __azx_runtime_suspend()/_resume() calls with with
    pm_runtime_force_suspend() and pm_runtime_force_resume() with a slight
    trick as we've done for the codec side.  More exactly:
    
    - azx_has_pm_runtime() check is dropped from azx_runtime_suspend() and
      azx_runtime_resume(), so that it can be properly executed from the
      system-suspend/resume path
    
    - The WAKEEN handling depends on the card's power state now; it's set
      and cleared only for the runtime-suspend
    
    - azx_resume() checks whether any codec may need the forcible resume
      beforehand.  If the forcible resume is required, it does temporary
      PM refcount up/down for actually triggering the runtime resume.
    
    - A new helper function, hda_codec_need_resume(), is introduced for
      checking whether the codec needs a forcible runtime-resume, and the
      existing code is rewritten with that.
    
    BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207043
    Link: https://lore.kernel.org/r/20200413082034.25166-6-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>

 include/sound/hda_codec.h |  5 +++++
 sound/pci/hda/hda_codec.c |  2 +-
 sound/pci/hda/hda_intel.c | 38 +++++++++++++++++++++++++++-----------
 3 files changed, 33 insertions(+), 12 deletions(-)

Comment 1 Takashi Iwai 2020-07-21 16:48:10 UTC

This should have been addressed in the later commit 8d6762af302d69f76fa788a277a56a9d9cd275d5
    ALSA: hda: Always use jackpoll helper for jack update after resume

Comment 2 Martin 2020-07-21 16:57:48 UTC

that is odd, because I still have it with Kernel 5.7.9 and Kernel 5.8.0-rc5

Comment 3 Takashi Iwai 2020-07-21 18:14:45 UTC

So it might be another issue?  Try to run bisect with applying 8d6762af302 manually when c4c8dd6e is committed.  IOW, if you apply 8d6762af302 on top of c4c8dd6e, does the problem still appear?

Comment 4 Martin 2020-07-21 18:51:56 UTC

Can you please elaborate in detail how I do that?
I don't want to make a mistake here ...

Comment 5 Takashi Iwai 2020-07-21 19:09:40 UTC

You tried bisection and it pointed to the commit c4c8dd6e.
And try to cherry-pick 8d6762af302 on top of the commit c4c8dd6e, and see whether this state still shows the problem.  If yes, it implies that the fix in commit 8d6762af302 doesn't suffice.  Or if it doesn't show the problem, the problem is something different, introduced by another commit at a later point.

Comment 6 Martin 2020-07-21 20:55:40 UTC

thank you I hope I did it right.
1. git reset --hard c4c8dd6e
2. git cherry-pick 8d6762af302

3. make localmodconfig
4. make -j10

sadly it does not compile. 

Did I do something wrong?

make[1]: *** [scripts/Makefile.build:101: arch/x86/kernel/asm-offsets.s] Fehler 1
make: *** [Makefile:1113: prepare0] Fehler 2
make: *** Es wird auf noch nicht beendete Prozesse gewartet....

In Datei, eingebunden von ./include/linux/fs.h:27,
                 von ./include/linux/cgroup.h:17,
                 von ./include/linux/memcontrol.h:13,
                 von ./include/linux/swap.h:9,
                 von ./include/linux/suspend.h:5,
                 von arch/x86/kernel/asm-offsets.c:13:
./include/linux/fiemap.h:23:3: Fehler: Unbekannter Typname »get_block_t«; meinten Sie »seqlock_t«?
   23 |   get_block_t *get_block);
      |   ^~~~~~~~~~~
      |   seqlock_t
  HDRINST usr/include/linux/atm.h
  HDRINST usr/include/linux/minix_fs.h
  HDRINST usr/include/linux/if_hippi.h
  HDRINST usr/include/linux/igmp.h
  HDRINST usr/include/linux/nbd-netlink.h
  HDRINST usr/include/linux/tls.h
In Datei, eingebunden von ./include/linux/cgroup.h:17,
                 von ./include/linux/memcontrol.h:13,
                 von ./include/linux/swap.h:9,
                 von ./include/linux/suspend.h:5,
                 von arch/x86/kernel/asm-offsets.c:13:
./include/linux/fs.h:1746:8: Fehler: Redefinition von »struct fiemap_extent_info«
 1746 | struct fiemap_extent_info {
      |        ^~~~~~~~~~~~~~~~~~
In Datei, eingebunden von ./include/linux/fs.h:27,
                 von ./include/linux/cgroup.h:17,
                 von ./include/linux/memcontrol.h:13,
                 von ./include/linux/swap.h:9,
                 von ./include/linux/suspend.h:5,
                 von arch/x86/kernel/asm-offsets.c:13:
./include/linux/fiemap.h:8:8: Anmerkung: ursprünglich hier definiert
    8 | struct fiemap_extent_info {
      |        ^~~~~~~~~~~~~~~~~~
In Datei, eingebunden von ./include/linux/cgroup.h:17,
                 von ./include/linux/memcontrol.h:13,
                 von ./include/linux/swap.h:9,
                 von ./include/linux/suspend.h:5,
                 von arch/x86/kernel/asm-offsets.c:13:
./include/linux/fs.h:1753:5: Fehler: In Konflikt stehende Typen für »fiemap_fill_next_extent«
 1753 | int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
      |     ^~~~~~~~~~~~~~~~~~~~~~~
In Datei, eingebunden von ./include/linux/fs.h:27,
                 von ./include/linux/cgroup.h:17,
                 von ./include/linux/memcontrol.h:13,
                 von ./include/linux/swap.h:9,
                 von ./include/linux/suspend.h:5,
                 von arch/x86/kernel/asm-offsets.c:13:
./include/linux/fiemap.h:18:5: Anmerkung: Vorherige Deklaration von »fiemap_fill_next_extent« war hier
   18 | int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,

Comment 7 Takashi Iwai 2020-07-22 06:29:39 UTC

(In reply to Martin from comment #6)
> 1. git reset --hard c4c8dd6e

At this point you should try to build the kernel once and confirm that
it's broken.  Do you get the same build error now?

> 2. git cherry-pick 8d6762af302

After confirming the bad kernel, do cherry-pick, build and retest.

Comment 8 Martin 2020-07-22 09:20:28 UTC

Hi, I probably have broken something in the git tree while figuring out how to proceed, so I checked it out again and ran both commands.
(git reset --hard c4c8dd6e && git cherry-pick 8d6762af302)

Now I got a bootable build. 

(Just for me to double-check, was this the correct way?)

Sadly my system wakes up with that kernel.

What do I do now?

Comment 9 Martin 2020-07-24 10:01:50 UTC

@Takashi
I have some free time over the weekend, is there something I can do?
This Problem is really annoying for me.

Comment 10 Takashi Iwai 2020-07-24 10:42:45 UTC

Hm, then please double-check whether the commit 9479e75fca does *not* show the problem.

% git reset --hard 9479e75fca
make, install and test

Comment 11 Martin 2020-07-24 11:13:12 UTC

I can confirm, that a kernel built from what you suggested, does NOT show the problem and stays therefore in standby.

Comment 12 Takashi Iwai 2020-07-24 11:39:02 UTC

OK, then it's likely the change to call pm_runtime_force_suspend() in azx_suspend().  Try the change below:

--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1026,7 +1026,7 @@ static int azx_suspend(struct device *dev)
 	chip = card->private_data;
 	bus = azx_bus(chip);
 	snd_power_change_state(card, SNDRV_CTL_POWER_D3hot);
-	pm_runtime_force_suspend(dev);
+	__azx_runtime_suspend(chip);
 	if (bus->irq >= 0) {
 		free_irq(bus->irq, chip);
 		bus->irq = -1;

This may possibly break the resume side, but at least we can know whether that's the cause.

BTW, is it about S3 (suspend/resume), or really stand-by mode?

Last but not least: this must be very specific to the platform.  Please give alsa-info.sh output to Bugzilla.  Run the script with --no-upload option and attach to Bugzilla.

Comment 13 Martin 2020-07-24 11:51:55 UTC

Created attachment 290495 [details]
Alsa-Info

Comment 14 Martin 2020-07-24 11:52:48 UTC

Yeah, it is about S3.
I am patching now and will try again.

Comment 15 Martin 2020-07-24 12:01:25 UTC

patch -p0 < ../patch.patch 
can't find file to patch at input line 4
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|
|--- a/sound/pci/hda/hda_intel.c
|+++ b/sound/pci/hda/hda_intel.c
--------------------------


"azx_runtime_suspend(chip);" is already there
did I do something wrong?

Comment 16 Takashi Iwai 2020-07-24 12:16:52 UTC

Apply the patch with -p1.  But it'd be faster for you to edit the code directly, it's just oneliner.

Oh, maybe it wasn't clear: try to modify the code that is broken (i.e. commit c4c8dd6e and cherry-pick 8d6762af3), then replace the call of
  pm_runtime_force_suspend(dev);
in azx_suspend() with
  __azx_runtime_suspend(chip);

Comment 17 Martin 2020-07-24 12:45:55 UTC

right, I mixed it up.

c4c8dd6e + 8d6762af3 + "__azx_runtime_suspend(chip);"
keeps my System in Standby.

Comment 18 Takashi Iwai 2020-07-24 12:51:03 UTC

So it's a positive effect?

The next test to try is the patch below instead of the previous oneliner.
Scratch the previous change and apply it on the top of c4c8dd6e + 8d6762af3.

Comment 19 Takashi Iwai 2020-07-24 12:51:24 UTC

Created attachment 290497 [details]
Test fix patch

Comment 20 Martin 2020-07-24 13:20:15 UTC

With your new patch ontop of c4c8dd6e + 8d6762af3 my system sadly wakes up automatically.

Comment 21 Takashi Iwai 2020-07-24 15:33:21 UTC

Hmm, strange.  Could you try to remove or comment out the two lines in azx_runtime_suspend()?

	if (snd_power_get_state(card) == SNDRV_CTL_POWER_D0)
		enable_wakeen(chip);

The spurious wake up should be driven from this WAKEEN bit.  But this must have been cleared in azx_suspend().

If removing enable_wakeen() works, it implies that disable_wakeen() call in azx_suspend() had no effect.

Comment 22 Martin 2020-07-24 15:53:23 UTC

if I comment these two out, the module does not build.

Comment 23 Takashi Iwai 2020-07-24 16:19:03 UTC

It can't be.  Check it again.  You might have deleted too much or too little.

Comment 24 Martin 2020-07-24 17:05:55 UTC

you were right, I used # instead of /* */ sorry for that.
But sadly it still wakes up.

Comment 25 Takashi Iwai 2020-07-24 17:20:07 UTC

OK, then apply the patch below instead of the previous one.
The enable_wakeen() is already commented out there.
This patch contains the debug prints during runtime and system-suspend/resume.
So, boot with "no_console_suspend" boot option and get the dmesg output after suspend/resume, and attach to Bugzilla.

Comment 26 Takashi Iwai 2020-07-24 17:22:14 UTC

Created attachment 290541 [details]
Debug patch

Comment 27 Martin 2020-07-24 17:49:03 UTC

Created attachment 290543 [details]
full dmesg

Comment 28 Martin 2020-07-24 17:59:41 UTC

just curious, is no_console_suspend or no_console_suspend=1 the correct way to set it?

Comment 29 Takashi Iwai 2020-07-24 19:35:07 UTC

Thanks.  It's no_console_suspend, without anything else.

There is nothing obviously wrong in the call pattern.  The only slight concern is the azx_runtime_resume() call just before azx_runtime_suspend() call at the system suspend time.

Could you replace again the call of pm_runtime_force_suspend(dev) with __azx_runtime_suspend(chip) on this already patched kernel, retest (and confirm it working), and give the dmesg output?

Comment 30 Martin 2020-07-24 21:42:42 UTC

ok, I did the following:
        snd_power_change_state(card, SNDRV_CTL_POWER_D3hot);
/*      pm_runtime_force_suspend(dev); */
        __azx_runtime_suspend(chip);

now I will install the kernel and reboot reboot.

Comment 31 Martin 2020-07-24 21:52:32 UTC

Created attachment 290547 [details]
__azx_runtime_suspend(chip); debug

It stayed in standby, but when I woke it, it took ages to wake up.

Comment 32 Takashi Iwai 2020-07-26 09:28:02 UTC

Thanks.  The result indicated that the latter case didn't contain the runtime resume before suspend.

To make it clearer, could you do a couple of more test?  Namely:
- Go back to c4c8dd6e + 8d6762af3 + debug patch in comment 26.
  Boot with no_console_suspend option.
  Then, turn off the runtime suspend (as root):
    echo 0 > /sys/module/snd_hda_intel/parameters/power_save

  Test suspend/resume, save the dmesg output.

- If the result is positive (no spurious wakeup), try
    echo 1 > /sys/module/snd_hda_intel/parameters/power_save

  Then test suspend/resume and confirm that the spurious wakeup returns.
  Save the dmesg output again.

- Now, try another in addition to the previous state:
    echo 0 > /sys/module/snd_hda_intel/parameters/power_save_controller

  Then test again suspend/resume, and again save the dmesg output.

Comment 33 Martin 2020-07-26 13:52:55 UTC

Created attachment 290571 [details]
power-save debug

it woke up by itself when I set
echo 0 > /sys/module/snd_hda_intel/parameters/power_save

Comment 34 Martin 2020-07-26 13:58:41 UTC

Created attachment 290573 [details]
power-save-controller-debug

fresh boot 
(just to be sure)
set echo 1 > /sys/module/snd_hda_intel/parameters/power_save
standby 
woke up on itself
set echo 0 > /sys/module/snd_hda_intel/parameters/power_save_controller
standby
stayed asleep

Comment 35 Martin 2020-07-26 14:02:32 UTC

Created attachment 290575 [details]
power-save-controller-corrected-debug

sorry I made an error.
ignore my last post.
I forgot for a moment, that my pc only wakes once on itself. the following standby cycles are normal.

this time I only set
echo 0 > /sys/module/snd_hda_intel/parameters/power_save_controller

and system woke again automatically

Comment 36 Takashi Iwai 2020-07-26 17:24:34 UTC

Thanks.  So the all tests with those options don't seem influencing on the spurious wakeup behavior.  Interesting.  The problem is much deeper (and more mysterious) than I expected, unfortunately.

Your system has two different HD-audio controllers (the onboard audio on PCH and HDMI audio from AMDGPU), and we need to identify which causes the problem.  Judging from the pattern, my guess is the latter one.

For testing it, please check the following:
- The same kernel as the previous test, pass snd_hda_intel.enable=1,0 boot option together with no_console_suspend, and test suspend/resume.
  This should disable HDMI audio while keeping the onboard audio

- Similarly, test with snd_hda_intel.enable=0,1 option.  This will disable the onboard audio instead.

Comment 37 Martin 2020-07-26 17:59:54 UTC

Created attachment 290579 [details]
snd_hda_intel.enable=1,0

dmesg with snd_hda_intel.enable=1,0

Comment 38 Martin 2020-07-26 18:01:46 UTC

and I forgot to mention, that it woke up automatically.

Comment 39 Martin 2020-07-26 18:11:13 UTC

Created attachment 290581 [details]
snd_hda_intel.enable=0,1

with snd_hda_intel.enable=0,1 the system stays in standby.

Also about my Mainboard ... it is some horrible piece of hardware.
I had to disable all wakeup triggers except ps/2 because networking, usb2 and usb3 ports cause the exact same problem as the hd-audio causes now. 
"a one time wakeup but on linux AND windows"

I really appreciate your help!

Comment 40 Takashi Iwai 2020-07-26 19:39:35 UTC

Thanks!  Then I must have looked at the wrong direction.  The trace difference might be rather a red herring.

Now, scratch the previous debug patch, but apply the below instead, and retest.
You don't need to take dmesg output at this time because it has no debug prints added.

Comment 41 Takashi Iwai 2020-07-26 19:41:42 UTC

Created attachment 290585 [details]
Test fix patch #2

Comment 42 Martin 2020-07-26 21:16:24 UTC

c4c8dd6e + 8d6762af3 + attachment 290585 [details]
does not fix it. My PC still wakes up.

Comment 43 Takashi Iwai 2020-07-27 13:15:22 UTC

Too bad.  It's really weird that a spurious wakeup is triggered...

Does the simple replacement below work instead?

Comment 44 Takashi Iwai 2020-07-27 13:15:49 UTC

Created attachment 290609 [details]
Another try, minimalistic changes

Comment 45 Martin 2020-07-27 14:35:27 UTC

this one does the trick. My PC stays in standby.

Comment 46 Takashi Iwai 2020-07-27 15:33:23 UTC

OK, then let's take this way for now, although I'm not confident what actually caused this spurious wakeup.

Below is the slightly revised patch for the formal submission.  Please confirm this one.  Thanks.

Comment 47 Takashi Iwai 2020-07-27 15:33:48 UTC

Created attachment 290621 [details]
Revised patch

Comment 48 Martin 2020-07-27 15:54:58 UTC

Yes, this patch also works fine. The System stays asleep.
Thank you very much!
I really appreciate what you did ;)

Since you work for Suse, if you are ever in Nuernberg, I will buy you a beer ;)

Comment 49 Takashi Iwai 2020-07-27 16:48:46 UTC

Good to hear, I submitted the fix and will merge it soon.
I'm not sure whether I'll send a pull request to Linus for 5.8-final (or rc8), but it'll be certainly in 5.9-rc1.

I'm in Nuremberg, but it's no good time to go out now.  Let's postpone after Corona crisis :)

Now closing as CODE_FIX.  Thank you for your patient testing!

Comment 50 Martin 2020-07-27 16:56:36 UTC

sure ;)
I patched my latest fedora kernel with it. building right now. that should work as well right?

Comment 51 Kai-Heng Feng 2020-10-15 13:15:26 UTC

Martin,

I am currently working on refactoring HDA PM, can you please test the following branch and see if it can keep your system stay in S3?
https://github.com/khfeng/linux/tree/hda-pm-refactor

Comment 52 Martin 2020-10-15 13:16:50 UTC

Hello,
I am quite busy at the moment. I will compile the kernel at the weekend if that is fine?

Comment 53 Kai-Heng Feng 2020-10-15 13:31:40 UTC

Of course, I'll wait for your testing before sending it to upstream.

Comment 54 Martin 2020-10-15 16:49:35 UTC

I had a few minutes so I thought I at least can compile the kernel.
Sadly I was unable to build it.
I ran:
make localmodconfig (accepted the default for all new stuff)
make -j12

pager.c: In Funktion »pager_preexec«:
pager.c:36:12: Fehler: Argument 2 wird an den mit »restrict« qualifizierten Parameter übergeben, dadurch wird es zum Alias für Argument 4 [-Werror=restrict]
   36 |  select(1, &in, NULL, &in, NULL);
      |            ^~~        ~~~
cc1: Alle Warnungen werden als Fehler behandelt
mv: der Aufruf von stat für '/home/martin/linux/tools/objtool/.pager.o.tmp' ist nicht möglich: Datei oder Verzeichnis nicht gefunden

Comment 55 Kai-Heng Feng 2020-10-15 18:51:15 UTC

Weird, I am not seeing this issue on my system.

I rebased the patches on top of 5.8.15, please see if it works for you:
https://github.com/khfeng/linux/tree/hda-pm-refactor-5.8.15

Comment 56 Martin 2020-10-15 21:03:37 UTC

this time it worked.

First I can say, that it stays asleep (tested 3 times), but on the first try my system locked up after wakeup. It didnt happen on the two other tries.

Comment 57 Kai-Heng Feng 2020-10-16 03:50:50 UTC

Can you please attach the log from the locked up session?
Do you see the same issue on stable kernel 5.8.15?

Comment 58 Martin 2020-10-16 05:07:19 UTC

I am sorry, but I do not see anything related to that in the logs. I have tested now 10 times and it did not occur again. 

I will test over the next few days and if it occurs again, I will prepare something for you.

Comment 59 Martin 2020-10-20 11:20:03 UTC

I rebooted my pc since friday at least 15 times and sent it to standby 40 times.
It did not occur again. So maybe it was something else?

Comment 60 Kai-Heng Feng 2020-10-27 05:43:07 UTC

Martin,

Can you please test my v2 series:
https://patchwork.kernel.org/project/alsa-devel/list/?series=371143

Comment 61 Martin 2020-10-27 17:49:21 UTC

I will do in the next few days.
Will this Patch work with Kernel 5.8.x?

Comment 62 Kai-Heng Feng 2020-10-28 05:50:32 UTC

Martin,

I sent v3 and you were in the cc list. The patch series can be applied to 5.8.x cleanly.

Comment 63 Martin 2020-10-30 16:30:53 UTC

Building the kernel now. I will let you know tonight or tomorrow if it is ok.

Comment 64 Martin 2020-10-31 15:50:44 UTC

@Kai-Heng Feng
it wakes up when it should, have not noticed anything out of the ordinary.
I will test till Monday and then report back.

Comment 65 Martin 2020-11-03 09:57:36 UTC

I had no issues over the weekend. I think you can send it to production.
ty for giving me the chance to test it.

Comment 66 Kai-Heng Feng 2020-11-03 11:41:24 UTC

Martin, nice to hear it doesn't break your system. These patch will be in v5.10.