1. Enable runtime pm:
$ echo auto | sudo tee /sys/bus/pci/devices/0000\:00\:1f.3/power/control
2. Make sure it's runtime suspended and enters D3:
$ sudo lspci -vv -s 00:1f.3
0000:00:1f.3 Audio device: Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller (rev 20)
Subsystem: Dell Tiger Lake-LP Smart Sound Technology Audio Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 177
IOMMU group: 12
Region 0: Memory at 601f288000 (64-bit, non-prefetchable) [size=16K]
Region 4: Memory at 601f000000 (64-bit, non-prefetchable) [size=1M]
Capabilities:  Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
Capabilities:  Vendor Specific Information: Len=14 <?>
Capabilities:  MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee007d8 Data: 0000
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel, snd_sof_pci
3. Runtime resume the HDA devie by any method other than plugging a jack:
$ echo on | sudo tee /sys/bus/pci/devices/0000\:00\:1f.3/power/control
Or something like open Gnome Control Center's sound tab.
4. Plug a jack and nothing happens. azx_interrupt() doesn't get called.
1. If we plug the jack when HDA controller is runtime suspended, it can wake from D3 PME# and can find jack.
2. If the HDA never gets runtime suspended, plugging jack will trigger azx_interrupt() and can detect jack correctly.
Created attachment 292621 [details]
Is it specific to runtime-suspend/resume? Do you see the same problem with the system suspend/resume, too?
Only runtime suspend/resume is affected. System suspend/resume doesn't have this issue.
I just tried this scenario on Dell TGL laptop with ALC289 codec, and I can't reproduce the debug. Aside the codec, everything else seems the same. I'll continue trying on other devices (I don't have the exact device in report). We'd need to filter down a bit in which conditions this happens.
Created attachment 292825 [details]
Interesting. I now found one TGL system where I can trigger this. The strange thing is that the bug doesn't occur with SOF, only with snd-hda-intel driver. @Kai-Heng Feng, can you try the same experiment on the system you have -- i.e. force to use SOF and see whether the bug reproduces
Hmm, this seems to be missing code in the legacy driver.
The problem only occurs for me if the controller is active but the codec driver is in runtime suspend. In this case, SOF kicks the jack_poll workqueue to check jack status, but as far I can see, the snd-hda-intel does not do the same thing.
If both controller and codec drivers are forced to be on, then jack detection works. Similarly, if both controller and codec driver are suspended, then detection again works.
Here's the patch for SOF:
Thanks for pointing this out!
Are you already working on a patch? Otherwise I can work on it.
Created attachment 292887 [details]
Candidate patch to fix the bug
Here's an experimental patch. Can you give it a go?
This does fix the problem on the system I have, but something is still not right. Basicly jackpoll_interval is zero, so all this does is to power up the codec once when controller is resumed, and then codec goes back to suspend. But with this action in controller resume, jack detection is fixed.
Wait, I think it's just a workaround.
It never relies on hda_jackpoll_work to detect jack event. Instead, MSI IRQ triggers azx_interrupt() and everything works fine.
So I think we should find out why MSI stops working after a runtime suspend/resume cycle.
Yes, we need to figure out the culprit. The workaround in comment 10 won't bring back the interrupt at re-plugging after runtime resume, right?
Do I understand correctly that this happens only when the HD-audio controller is runtime-resumed? i.e. What if you keep HD-audio controller awaken (e.g. just passing power_save_controller=0 option to snd-hda-intel)?
And, if WAKEEN setup is the key, what value is used in the working case (e.g. after system resume) and the non-working case?
@Kai-Heng Ack on that. My patch is just that, an experimental patch, nothing more.
Can you confirm patch in comment 10 helps on your system? This info will help to confirm we are debugging the same case.
Can you also confirm whether you see the problem in the same scenario? I.e.
- controller "active" and codec "suspended" -> FAIL, no irq on jack insert
- controller "active" and codec "active" -> OK, irq normally on jack insert
- controller "suspended" and codec "suspended" -> OK, irq normally on jack insert
@Takashi: the comment 10 workaround does bring back the interrupt! There's no actual polling done as jackpoll_interval is zero in these cases, but jackpoll_work is run once for each controller resume, and this seems to help. It powers up the codec.
I now tested with power_save_controller=0 and I could _not_ trigger the problem, so it would seem further narrowed to case where controller was suspended, then active and only then you hit the problem.
I tried forcing WAKEEN on controller side to different values, but that had no impact. Somehow we need to "refresh the codec" after controller resets the link in resume.
(In reply to Kai Vehmanen from comment #13)
> @Kai-Heng Ack on that. My patch is just that, an experimental patch, nothing
Ok, testing result below.
> Can you confirm patch in comment 10 helps on your system? This info will
> help to confirm we are debugging the same case.
Yes and it does bring back the MSI interrupt.
> Can you also confirm whether you see the problem in the same scenario? I.e.
> - controller "active" and codec "suspended" -> FAIL, no irq on jack insert
There are actually two scenario here:
1) right after boot, controller is active while codec is suspended, there's IRQ for jack event.
2) Runtime suspend/resume the controller, controller is active, codec is suspended, no IRQ for jack event.
> - controller "active" and codec "active" -> OK, irq normally on jack insert
> - controller "suspended" and codec "suspended" -> OK, irq normally on jack
> @Takashi: the comment 10 workaround does bring back the interrupt! There's
> no actual polling done as jackpoll_interval is zero in these cases, but
> jackpoll_work is run once for each controller resume, and this seems to
> help. It powers up the codec.
> I now tested with power_save_controller=0 and I could _not_ trigger the
> problem, so it would seem further narrowed to case where controller was
> suspended, then active and only then you hit the problem.
> I tried forcing WAKEEN on controller side to different values, but that had
> no impact. Somehow we need to "refresh the codec" after controller resets
> the link in resume.
Created attachment 292893 [details]
Request codec resume
I wonder if this makes the intention clearer.
Well, we want to resume only the codec that needs the wakeup, and this was identified by STATESTS bit. If any, I'd keep the STATESTS check for other platforms.
After all it looks like a bug of Tigerlake that doesn't update STATESTS bit.
Thanks @Kai-Heng. So we are seeing the exact same case.
I investigated a bit more and made a test where I polled the STATESTS register when the problem occurs, and I see no update to its value. It seems codec is not raising SDI in this case when a headset is connected.
I did a further test where I modified __azx_runtime_resume() to not init the controller (BCLK not running). When I run the bug sequence with this modification, now I can see STATESTS change as expected.
When BCLK is running, codec will send a normal unsol message to controller. This is now not working, if we have reset the controller while the codec was in suspend.
I agree the patch in comment 15 looks cleaner, but what I'm not sure is how wide we should make this. Due to covid restrictions, I have somewhat limited gear to test headset detection with. Is this only with Intel controller and/or only with (some?) Realtek codecs. OTOH, this change seems harmless even if applied more generally (and that's what we do in SOF already).
I'll do some more testing on older Intel platforms.
> After all it looks like a bug of Tigerlake that doesn't update STATESTS bit.
FWIW, the same issue is also observed on a Whisky Lake platform, Dell OptiPlex 7070 Ultra, which also uses ALC236.
Confirmed, at least not specific to Tiger Lake. I can trigger the same exact problem on a i5-10210U Comet Lake with same Realtek ALC3204 codec.
OK, then most of recent Intel platforms are broken there :-<
Then let's simplify the check.
Subscribed Kailang, maybe he knows what's wrong here. Didn't think this was related to codec...
@Takashi: I'm not fully convinced the platform is necessarily broken here. I can't test it just now, but I'm fairly certain, given test results so far, you can go back many, many generations and behaviour will still be same. Without a dump of the bus signals, hard to say for sure, but I'd be suprised if STATESTS value would be wrong. Could it be this is just a case that has gone undetected? In the typical use-cases, either runtime pm is disabled for both, or they are both in suspend when jack is inserted.
@Kai-Heng if I just run GNOME sound settings, the codec will be active, not suspended and this bug does not happen. Some more steps are needed to trigger this, so I don't think this is all that common case.
(In reply to Kai Vehmanen from comment #22)
> @Kai-Heng if I just run GNOME sound settings, the codec will be active, not
> suspended and this bug does not happen. Some more steps are needed to
> trigger this, so I don't think this is all that common case.
For desktops don't have internal speaker/mic like this one, the codec stays suspended when I open GNOME sound settings. Since many desktops don't have builtin speaker/mic, I'd say it's a rather common case.
@Kai-Heng: ack, that explains. Do you have any configuration at hand, where this scenario works (older Intel or non-Intel)?
One more data point, i5-9600/Z390 and ALC887-VD -> bug occurs. Not saying this should not be fixed, just trying to understand the scope. It would help to find at least some system where this does not trigger.
Intel J4105 and ALC233, same bug.
AMD 2500U and ALC3246, same bug.
Intel i7-8700 and CX20632, doesn't have this issue.
So looks like the issue only happens on Realtek codecs.
OK that's a convincing argument, given that the majority of devices are equipped with Realtek codecs.
I started putting together a patch. I can send it later to alsa-devel for proper review:
Now sent a version to alsa-devel:
Let's continue discussion on the list.
Hopefully Kailang you have chance to take a look and comment.
Patches merged in ALSA upstream master, so this can be closed.
It'll still be great if Kailang can share some insights.