Bug 199007 - hda driver initialization takes too much time on the machine with coffeelake audio controller [8086:a348]
Summary: hda driver initialization takes too much time on the machine with coffeelake ...
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Sound(ALSA) (show other bugs)
Hardware: Intel Linux
: P1 blocking
Assignee: Jaroslav Kysela
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-05 10:16 UTC by Hui Wang
Modified: 2018-03-21 08:19 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.16-rc4 and all old version kernel
Subsystem:
Regression: No
Bisected commit-id:


Attachments
trace of snd_hda_codec_build_controls (570.60 KB, text/plain)
2018-03-09 03:21 UTC, Hui Wang
Details
dmesg about snd_hdac_bus_exec_verb_unlocked() entry and exit (91.32 KB, text/plain)
2018-03-14 10:15 UTC, Hui Wang
Details
trace event for test 1 (66.47 KB, text/plain)
2018-03-15 06:28 UTC, Hui Wang
Details
trace event for test 2 (73.29 KB, text/plain)
2018-03-15 06:32 UTC, Hui Wang
Details
test on another CFL machine (53.30 KB, text/plain)
2018-03-16 10:05 UTC, Hui Wang
Details
the trace events log of 1st test on the laptop with alc289 (98.79 KB, text/plain)
2018-03-16 10:06 UTC, Hui Wang
Details
the trace events log of 2nd test on the laptop with alc289 (98.79 KB, text/plain)
2018-03-16 10:07 UTC, Hui Wang
Details
Untested fix patch (771 bytes, patch)
2018-03-19 08:05 UTC, Takashi Iwai
Details | Diff

Description Hui Wang 2018-03-05 10:16:25 UTC
We have a couple of laptop and desktop computers, all of them are cfl platforms and all of them have cfl audio controller [8086:a348], and these computers have different realtek codecs on them (alc256, alc255, alc289 and ...). The hda driver takes too much time (3 secs - 8 secs) to initialize on these machines. This problem never happened on kbl, skl, bdl, ... before.

If I do some change in the alsa hda driver, the time of initialization can be within 0.6 seconds, but even 0.6 seconds is much longer than before on other platforms.

The change I did:
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 96143df..2460af7 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2422,7 +2422,7 @@ static const struct pci_device_id azx_ids[] = {
          .driver_data = AZX_DRIVER_SKL | AZX_DCAPS_INTEL_SKYLAKE },
        /* Coffelake */
        { PCI_DEVICE(0x8086, 0xa348),
-         .driver_data = AZX_DRIVER_SKL | AZX_DCAPS_INTEL_SKYLAKE},
+         .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH_NOPM},
        /* Cannonlake */
        { PCI_DEVICE(0x8086, 0x9dc8),
          .driver_data = AZX_DRIVER_SKL | AZX_DCAPS_INTEL_SKYLAKE},
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 7a9a867..2b97b04 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7016,7 +7016,7 @@ static int patch_alc269(struct hda_codec *codec)
 
        spec = codec->spec;
        spec->gen.shared_mic_vref_pin = 0x18;
-       codec->power_save_node = 1;
+       codec->power_save_node = 0;
 
 #ifdef CONFIG_PM
        codec->patch_ops.suspend = alc269_suspend;
Comment 1 Hui Wang 2018-03-05 10:20:07 UTC
The driver initialization takes too much time, the dmesg like below shown (from 6.27 to 12.99):

[    6.272797] snd_hda_codec_realtek hdaudioC0D0: autoconfig for ALC3246: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:speaker
[    6.272803] snd_hda_codec_realtek hdaudioC0D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[    6.272807] snd_hda_codec_realtek hdaudioC0D0:    hp_outs=1 (0x21/0x0/0x0/0x0/0x0)
[    6.272809] snd_hda_codec_realtek hdaudioC0D0:    mono: mono_out=0x0
[    6.272811] snd_hda_codec_realtek hdaudioC0D0:    inputs:
[    6.272816] snd_hda_codec_realtek hdaudioC0D0:      Headset Mic=0x19
[    6.272819] snd_hda_codec_realtek hdaudioC0D0:      Headphone Mic=0x1a
[    6.272822] snd_hda_codec_realtek hdaudioC0D0:      Internal Mic=0x12
[    6.281028] iwlwifi 0000:00:14.3: base HW address: 34:13:e8:6f:61:4d
[    6.285470] Adding 32894972k swap on /dev/nvme0n1p4.  Priority:-1 extents:1 across:32894972k SSFS

...

[   12.998515] input: HDA Intel PCH Headphone Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input18
[   12.998695] input: HDA Intel PCH HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input19
[   12.998861] input: HDA Intel PCH HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input20
[   12.998969] input: HDA Intel PCH HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input21
[   12.999051] input: HDA Intel PCH HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input22
[   12.999133] input: HDA Intel PCH HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:1f.3/sound/card0/i
Comment 2 Hui Wang 2018-03-05 10:21:39 UTC
And the audio device looks like below show:

00:1f.3 Audio device [0403]: Intel Corporation Device [8086:a348] (rev 01) (prog-if 80)
        Subsystem: Dell Device [1028:0826]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 449
        Region 0: Memory at ed418000 (64-bit, non-prefetchable) [size=16K]
        Region 4: Memory at ed100000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
Comment 3 Hui Wang 2018-03-05 10:26:41 UTC
Sometimes this can make pulseaudio can't work well.

pulseaudio starts, but the driver (as kernel modules) initialization has not finished, then there is no valid sink in the pulseaudio. At this moment, some UI Apps already start using pulseaudio to play sound, but users can't hear the sound as expected. This problem never happened before, it is exposed on cfl platforms, and all cfl platforms (with audio controller 8086:a348) have this problem.
Comment 4 Takashi Iwai 2018-03-05 11:31:34 UTC
Could you identify which code actually takes so long?  You can check via perf or such.
Comment 5 Hui Wang 2018-03-05 15:02:47 UTC
It is not always same for every time.

But most of time, the hdac_regmap.c:hda_reg_read()--->hda_codec.c:codec_exec_verb() will takes more than 1 second to finish randomly. I don't know which time of calling hda_reg_read() takes more than 1 second, but there are several times that takes more than 1 second within 100 times of calling this function.
Comment 6 Takashi Iwai 2018-03-05 15:10:13 UTC
For going further from that point, you need to identify which codec verb triggers such an issue.  You can turn on tracing HD-audio verbs to catch it.  In most cases, it's some unexpected verb to perform, hence either codec or controller chip stalls.
Comment 7 Hui Wang 2018-03-05 15:18:03 UTC
Ok, I will try to trace the verbs.
Comment 8 Hui Wang 2018-03-09 03:21:17 UTC
Created attachment 274629 [details]
trace of snd_hda_codec_build_controls

set max_depth to be 10.

We can find $ in this txt, all functions take long time to run are snd_hdac_exec_verb().
Comment 9 Hui Wang 2018-03-14 10:15:09 UTC
Created attachment 274723 [details]
dmesg about snd_hdac_bus_exec_verb_unlocked() entry and exit

I added printk at entry and exit of snd_hdac_bus_exec_verb_unlocked(), this function will randomly take 1 second to finish like this:

[   16.171904] wwwwwwwwwww enter snd_hdac_bus_exec_verb_unlocked, addr = 0x0, cmd = 0x1ef0500
[   17.176324] ============ exit snd_hdac_bus_exec_verb_unlocked


After reading the hda_verbs.h, I found that the verb AC_VERB_GET_AMP_GAIN_MUTE and AC_VERB_GET_POWER_STATE will take 1 second to finish sometimes (snd_hdac_bus_exec_verb_unlocked), but not all these two verbs take 1 second to finish, most of time, these two verbs work normally.

This problem only happens on CFL platforms (with audio controller 8086:a348).
Comment 10 Takashi Iwai 2018-03-14 10:23:21 UTC
Check tracing of verbs to see exactly which verb (and to which NID) stalls.  It would work without adding debug prints.  See "Tracepoints" section in Documentation/sound/hd-audio/notes.rst.

And such a problem isn't necessarily depending on the controller.  Unless it's exactly the same codec chip, it might be a slight difference of codec chip models.  For example, some chips don't like accesses to a NID that is disabled only for some variants of chip models.
Comment 11 Hui Wang 2018-03-15 06:28:15 UTC
Created attachment 274735 [details]
trace event for test 1

This is codec info:
Vendor Id: 0x10ec0255
Revision Id: 0x100002

I did the test of comment#10 twice, and the verbs which took 1 second to finsh are different between 2 tests (does not look like a problem of codec side). In the 1st time, we can see the verb took 1 second to finish are shown as below:

raw value = 0x00bb0000
cid = 0, nid = 0x0b, verb = 0xb00, parm = 0x00
raw value: verb = 0xb00, parm = 0x0
verbname = get_amp_gain_mute
amp raw val = 0x0
input, right, idx=0

raw value = 0x020c0000
cid = 0, nid = 0x20, verb = 0xc00, parm = 0x00
raw value: verb = 0xc00, parm = 0x0
verbname = get_proc_coef

raw value = 0x017f000c
cid = 0, nid = 0x17, verb = 0xf00, parm = 0x0c
raw value: verb = 0xf00, parm = 0xc
verbname = get_parameters
parameter = pin_cap

raw value = 0x00db0000
cid = 0, nid = 0x0d, verb = 0xb00, parm = 0x00
raw value: verb = 0xb00, parm = 0x0
verbname = get_amp_gain_mute
amp raw val = 0x0
input, right, idx=0

raw value = 0x008b0000
cid = 0, nid = 0x08, verb = 0xb00, parm = 0x00
raw value: verb = 0xb00, parm = 0x0
verbname = get_amp_gain_mute
amp raw val = 0x0
input, right, idx=0
...
Comment 12 Hui Wang 2018-03-15 06:32:52 UTC
Created attachment 274737 [details]
trace event for test 2

The verbs took 1 sec to finish are shown as below:

raw value = 0x00bb2003
cid = 0, nid = 0x0b, verb = 0xb20, parm = 0x03
raw value: verb = 0xb20, parm = 0x3
verbname = get_amp_gain_mute
amp raw val = 0x2003
input, left, idx=3

raw value = 0x00bb2004
cid = 0, nid = 0x0b, verb = 0xb20, parm = 0x04
raw value: verb = 0xb20, parm = 0x4
verbname = get_amp_gain_mute
amp raw val = 0x2004
input, left, idx=4

raw value = 0x020c0000
cid = 0, nid = 0x20, verb = 0xc00, parm = 0x00
raw value: verb = 0xc00, parm = 0x0
verbname = get_proc_coef

raw value = 0x00cb2001
cid = 0, nid = 0x0c, verb = 0xb20, parm = 0x01
raw value: verb = 0xb20, parm = 0x1
verbname = get_amp_gain_mute
amp raw val = 0x2001
input, left, idx=1

raw value = 0x009b0000
cid = 0, nid = 0x09, verb = 0xb00, parm = 0x00
raw value: verb = 0xb00, parm = 0x0
verbname = get_amp_gain_mute
amp raw val = 0x0
input, right, idx=0

raw value = 0x002f0500
cid = 0, nid = 0x02, verb = 0xf05, parm = 0x00
raw value: verb = 0xf05, parm = 0x0
verbname = get_power_state
...
Comment 13 Takashi Iwai 2018-03-15 09:04:18 UTC
Hm, at least, a common pattern is that NID 0x0b triggers the stall at first.
And NID 0x20 is another one commonly seen, which is a COEF, so it's no wonder.

Could you give alsa-info.sh output?
Comment 14 Takashi Iwai 2018-03-15 09:36:32 UTC
... and another question is why it still "works" after one second stall.
Doesn't the controller driver go to the polling mode after timeout?  If yes, the kernel should have given the warning message.
(But also it doesn't explain why multiple stalls happen, and each of them works after 1 second.)
Comment 15 Hui Wang 2018-03-16 10:05:07 UTC
Created attachment 274775 [details]
test on another CFL machine

This is the alsa-info.txt generated on another CFL laptop, this laptop also has this issue. the codec on this machine is alc289, is different from the codec (alc255) on the old machine (desktop)。
Comment 16 Hui Wang 2018-03-16 10:06:44 UTC
Created attachment 274777 [details]
the trace events log of 1st test on the laptop with alc289

also has 1 sec stall in some verbs
Comment 17 Hui Wang 2018-03-16 10:07:20 UTC
Created attachment 274779 [details]
the trace events log of 2nd test on the laptop with alc289
Comment 18 Hui Wang 2018-03-16 10:08:55 UTC
reply comment#14:

looks like the hda driver did not complain any timeout, so it did not enter polling mode.
Comment 19 Takashi Iwai 2018-03-16 10:10:24 UTC
Then try to force polling mode from the beginning.  If this works, it's either the missing interrupt in the CFL controller, or the memory cache issue in CORB/RIRB.
Comment 20 Hui Wang 2018-03-16 10:48:26 UTC
OK, got it. will test polling mode.
Comment 21 Hui Wang 2018-03-19 05:02:37 UTC
Correct the comment #18, after enabling dev_dbg(), I saw "azx_get_response timeout, polling the codec once: last cmd=" in the dmesg, looks like all snd_hdac_bus_exec_verb_unlocked() which take 1 second to finish will print out this warning message.

And after making the driver work in polling mode, the problem disappears, the driver initialization will finish within 0.1s, just like other platforms.

So how do we fix this problem?
Comment 22 Takashi Iwai 2018-03-19 08:04:53 UTC
Well, a quick-and-easy fix would be to set polling_mode when CFL chip is probed, an (untested) patch like below.

But I'd rather like to check more details.  For example, whether the problem still appears when you probe with snoop=0 option to snd-hda-intel module.
Comment 23 Takashi Iwai 2018-03-19 08:05:33 UTC
Created attachment 274805 [details]
Untested fix patch
Comment 24 Hui Wang 2018-03-19 11:15:27 UTC
Ok, will test snoop=0.
Comment 25 Hui Wang 2018-03-21 03:42:16 UTC
Reply comment #22,

set snoop=0 when loading snd-hda-intel doesn't help fix this problem.
Comment 26 Hui Wang 2018-03-21 04:02:31 UTC
Reply comment#23,

tested the untested patch, it worked very well on my CFL machines.
Comment 27 Takashi Iwai 2018-03-21 07:28:43 UTC
OK.  The impact of polling mode is much smaller than disabling the power-save, so we can take it for now as a tentative solution.

I'll submit the patch to ML later.

Note You need to log in before you can comment on or make changes to this bug.