Bug 209379 - No MSI azx_interrupt after HDA is runtime-suspended and not runtime-resumed by PME#
Summary: No MSI azx_interrupt after HDA is runtime-suspended and not runtime-resumed b...
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Sound(ALSA) (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Jaroslav Kysela
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-24 10:50 UTC by Kai-Heng Feng
Modified: 2020-10-16 16:29 UTC (History)
3 users (show)

See Also:
Kernel Version: mainline
Subsystem:
Regression: No
Bisected commit-id:


Attachments
alsa-info (64.27 KB, text/plain)
2020-09-24 10:58 UTC, Kai-Heng Feng
Details
alsa-info (64.21 KB, text/plain)
2020-10-05 11:29 UTC, Kai-Heng Feng
Details
Candidate patch to fix the bug (1.33 KB, application/mbox)
2020-10-07 13:08 UTC, Kai Vehmanen
Details
Request codec resume (688 bytes, patch)
2020-10-07 18:24 UTC, Kai-Heng Feng
Details | Diff

Description Kai-Heng Feng 2020-09-24 10:50:26 UTC
Reproduce steps:
1. Enable runtime pm:
$ echo auto | sudo tee /sys/bus/pci/devices/0000\:00\:1f.3/power/control

2. Make sure it's runtime suspended and enters D3:
$ sudo lspci -vv -s 00:1f.3
0000:00:1f.3 Audio device: Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller (rev 20)
        Subsystem: Dell Tiger Lake-LP Smart Sound Technology Audio Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64
        Interrupt: pin A routed to IRQ 177
        IOMMU group: 12
        Region 0: Memory at 601f288000 (64-bit, non-prefetchable) [size=16K]
        Region 4: Memory at 601f000000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
        Capabilities: [80] Vendor Specific Information: Len=14 <?>
        Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee007d8  Data: 0000
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel, snd_sof_pci

3. Runtime resume the HDA devie by any method other than plugging a jack:
$ echo on | sudo tee /sys/bus/pci/devices/0000\:00\:1f.3/power/control 
Or something like open Gnome Control Center's sound tab.

4. Plug a jack and nothing happens. azx_interrupt() doesn't get called.
Comment 1 Kai-Heng Feng 2020-09-24 10:54:06 UTC
1. If we plug the jack when HDA controller is runtime suspended, it can wake from D3 PME# and can find jack.

2. If the HDA never gets runtime suspended, plugging jack will trigger azx_interrupt() and can detect jack correctly.
Comment 2 Kai-Heng Feng 2020-09-24 10:58:09 UTC
Created attachment 292621 [details]
alsa-info
Comment 3 Takashi Iwai 2020-09-24 11:16:07 UTC
Is it specific to runtime-suspend/resume?  Do you see the same problem with the system suspend/resume, too?
Comment 4 Kai-Heng Feng 2020-09-24 11:57:17 UTC
Only runtime suspend/resume is affected. System suspend/resume doesn't have this issue.
Comment 5 Kai Vehmanen 2020-10-02 15:26:59 UTC
I just tried this scenario on Dell TGL laptop with ALC289 codec, and I can't reproduce the debug. Aside the codec, everything else seems the same. I'll continue trying on other devices (I don't have the exact device in report). We'd need to filter down a bit in which conditions this happens.
Comment 6 Kai-Heng Feng 2020-10-05 11:29:19 UTC
Created attachment 292825 [details]
alsa-info
Comment 7 Kai Vehmanen 2020-10-07 09:37:16 UTC
Interesting. I now found one TGL system where I can trigger this. The strange thing is that the bug doesn't occur with SOF, only with snd-hda-intel driver. @Kai-Heng Feng, can you try the same experiment on the system you have -- i.e. force to use SOF and see whether the bug reproduces
Comment 8 Kai Vehmanen 2020-10-07 11:32:09 UTC
Hmm, this seems to be missing code in the legacy driver.

The problem only occurs for me if the controller is active but the codec driver is in runtime suspend. In this case, SOF kicks the jack_poll workqueue to check jack status, but as far I can see, the snd-hda-intel does not do the same thing.

If both controller and codec drivers are forced to be on, then jack detection works. Similarly, if both controller and codec driver are suspended, then detection again works.

Here's the patch for SOF:
https://lore.kernel.org/r/20190722141402.7194-13-pierre-louis.bossart@linux.intel.com
Comment 9 Kai-Heng Feng 2020-10-07 11:36:57 UTC
Thanks for pointing this out!
Are you already working on a patch? Otherwise I can work on it.
Comment 10 Kai Vehmanen 2020-10-07 13:08:53 UTC
Created attachment 292887 [details]
Candidate patch to fix the bug

Here's an experimental patch. Can you give it a go? 

This does fix the problem on the system I have, but something is still not right. Basicly jackpoll_interval is zero, so all this does is to power up the codec once when controller is resumed, and then codec goes back to suspend. But with this action in controller resume, jack detection is fixed.
Comment 11 Kai-Heng Feng 2020-10-07 14:38:58 UTC
Wait, I think it's just a workaround.

It never relies on hda_jackpoll_work to detect jack event. Instead, MSI IRQ triggers azx_interrupt() and everything works fine.

So I think we should find out why MSI stops working after a runtime suspend/resume cycle.
Comment 12 Takashi Iwai 2020-10-07 16:16:41 UTC
Yes, we need to figure out the culprit.  The workaround in comment 10 won't bring back the interrupt at re-plugging after runtime resume, right?

Do I understand correctly that this happens only when the HD-audio controller is runtime-resumed?  i.e. What if you keep HD-audio controller awaken (e.g. just passing power_save_controller=0 option to snd-hda-intel)?

And, if WAKEEN setup is the key, what value is used in the working case (e.g. after system resume) and the non-working case?
Comment 13 Kai Vehmanen 2020-10-07 17:01:06 UTC
@Kai-Heng Ack on that. My patch is just that, an experimental patch, nothing more.

Can you confirm patch in comment 10 helps on your system? This info will help to confirm we are debugging the same case.

Can you also confirm whether you see the problem in the same scenario? I.e. 
 - controller "active" and codec "suspended" -> FAIL, no irq on jack insert 
 - controller "active" and codec "active" -> OK, irq normally on jack insert
 - controller "suspended" and codec "suspended" -> OK, irq normally on jack insert

@Takashi: the comment 10 workaround does bring back the interrupt! There's no actual polling done as jackpoll_interval is zero in these cases, but jackpoll_work is run once for each controller resume, and this seems to help. It powers up the codec.

I now tested with power_save_controller=0 and I could _not_ trigger the problem, so it would seem further narrowed to case where controller was suspended, then active and only then you hit the problem.

I tried forcing WAKEEN on controller side to different values, but that had no impact. Somehow we need to "refresh the codec" after controller resets the link in resume.
Comment 14 Kai-Heng Feng 2020-10-07 18:20:09 UTC
(In reply to Kai Vehmanen from comment #13)
> @Kai-Heng Ack on that. My patch is just that, an experimental patch, nothing
> more.

Ok, testing result below.

> 
> Can you confirm patch in comment 10 helps on your system? This info will
> help to confirm we are debugging the same case.

Yes and it does bring back the MSI interrupt.

> 
> Can you also confirm whether you see the problem in the same scenario? I.e. 
>  - controller "active" and codec "suspended" -> FAIL, no irq on jack insert 
There are actually two scenario here:
1) right after boot, controller is active while codec is suspended, there's IRQ for jack event.

2) Runtime suspend/resume the controller, controller is active, codec is suspended, no IRQ for jack event.

>  - controller "active" and codec "active" -> OK, irq normally on jack insert

Yes.

>  - controller "suspended" and codec "suspended" -> OK, irq normally on jack
> insert

Yes.

> 
> @Takashi: the comment 10 workaround does bring back the interrupt! There's
> no actual polling done as jackpoll_interval is zero in these cases, but
> jackpoll_work is run once for each controller resume, and this seems to
> help. It powers up the codec.
> 
> I now tested with power_save_controller=0 and I could _not_ trigger the
> problem, so it would seem further narrowed to case where controller was
> suspended, then active and only then you hit the problem.
> 
> I tried forcing WAKEEN on controller side to different values, but that had
> no impact. Somehow we need to "refresh the codec" after controller resets
> the link in resume.
Comment 15 Kai-Heng Feng 2020-10-07 18:24:43 UTC
Created attachment 292893 [details]
Request codec resume

I wonder if this makes the intention clearer.
Comment 16 Takashi Iwai 2020-10-08 09:36:14 UTC
Well, we want to resume only the codec that needs the wakeup, and this was identified by STATESTS bit.  If any, I'd keep the STATESTS check for other platforms.

After all it looks like a bug of Tigerlake that doesn't update STATESTS bit.
Comment 17 Kai Vehmanen 2020-10-08 09:38:47 UTC
Thanks @Kai-Heng. So we are seeing the exact same case. 

I investigated a bit more and made a test where I polled the STATESTS register when the problem occurs, and I see no update to its value. It seems codec is not raising SDI in this case when a headset is connected.

I did a further test where I modified __azx_runtime_resume() to not init the controller (BCLK not running). When I run the bug sequence with this modification, now I can see STATESTS change as expected.

When BCLK is running, codec will send a normal unsol message to controller. This is now not working, if we have reset the controller while the codec was in suspend.

I agree the patch in comment 15 looks cleaner, but what I'm not sure is how wide we should make this. Due to covid restrictions, I have somewhat limited gear to test headset detection with. Is this only with Intel controller and/or only with (some?) Realtek codecs. OTOH, this change seems harmless even if applied more generally (and that's what we do in SOF already).

I'll do some more testing on older Intel platforms.
Comment 18 Kai-Heng Feng 2020-10-08 09:53:16 UTC
> After all it looks like a bug of Tigerlake that doesn't update STATESTS bit.

FWIW, the same issue is also observed on a Whisky Lake platform, Dell OptiPlex 7070 Ultra, which also uses ALC236.
Comment 19 Kai Vehmanen 2020-10-08 09:57:42 UTC
Confirmed, at least not specific to Tiger Lake. I can trigger the same exact problem on a i5-10210U Comet Lake with same Realtek ALC3204 codec.
Comment 20 Takashi Iwai 2020-10-08 10:02:09 UTC
OK, then most of recent Intel platforms are broken there :-<
Then let's simplify the check.
Comment 21 Kai-Heng Feng 2020-10-08 10:18:39 UTC
Subscribed Kailang, maybe he knows what's wrong here. Didn't think this was related to codec...
Comment 22 Kai Vehmanen 2020-10-08 10:31:22 UTC
@Takashi: I'm not fully convinced the platform is necessarily broken here. I can't test it just now, but I'm fairly certain, given test results so far, you can go back many, many generations and behaviour will still be same. Without a dump of the bus signals, hard to say for sure, but I'd be suprised if STATESTS value would be wrong. Could it be this is just a case that has gone undetected? In the typical use-cases, either runtime pm is disabled for both, or they are both in suspend when jack is inserted. 
@Kai-Heng if I just run GNOME sound settings, the codec will be active, not suspended and this bug does not happen. Some more steps are needed to trigger this, so I don't think this is all that common case.
Comment 23 Kai-Heng Feng 2020-10-08 10:44:10 UTC
(In reply to Kai Vehmanen from comment #22)
> @Kai-Heng if I just run GNOME sound settings, the codec will be active, not
> suspended and this bug does not happen. Some more steps are needed to
> trigger this, so I don't think this is all that common case.

For desktops don't have internal speaker/mic like this one, the codec stays suspended when I open GNOME sound settings. Since many desktops don't have builtin speaker/mic, I'd say it's a rather common case.
Comment 24 Kai Vehmanen 2020-10-08 10:50:47 UTC
@Kai-Heng: ack, that explains. Do you have any configuration at hand, where this scenario works (older Intel or non-Intel)?
Comment 25 Kai Vehmanen 2020-10-08 11:07:00 UTC
One more data point, i5-9600/Z390 and ALC887-VD -> bug occurs. Not saying this should not be fixed, just trying to understand the scope. It would help to find at least some system where this does not trigger.
Comment 26 Kai-Heng Feng 2020-10-08 11:24:47 UTC
Intel J4105 and ALC233, same bug.
AMD 2500U and ALC3246, same bug.
Intel i7-8700 and CX20632, doesn't have this issue.

So looks like the issue only happens on Realtek codecs.
Comment 27 Takashi Iwai 2020-10-08 12:59:13 UTC
OK that's a convincing argument, given that the majority of devices are equipped with Realtek codecs.
Comment 28 Kai Vehmanen 2020-10-09 10:56:25 UTC
I started putting together a patch. I can send it later to alsa-devel for proper review:
https://github.com/thesofproject/linux/pull/2489
Comment 29 Kai Vehmanen 2020-10-09 14:07:27 UTC
Now sent a version to alsa-devel:
https://mailman.alsa-project.org/pipermail/alsa-devel/2020-October/175572.html
Let's continue discussion on the list.

Hopefully Kailang you have chance to take a look and comment.
Comment 30 Kai Vehmanen 2020-10-16 16:12:10 UTC
Patches merged in ALSA upstream master, so this can be closed.
Comment 31 Kai-Heng Feng 2020-10-16 16:29:02 UTC
It'll still be great if Kailang can share some insights.

Note You need to log in before you can comment on or make changes to this bug.