With the update from 5.4.36 to 5.4.37, the snd_hda_intel driver fails to work on Dell Wyse 5070 and Futro S740 hardware (other hardware may also be affected, but these are the two I tested).
The error appears in the kernel log with the following messages:
Jul 07 13:37:17 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: azx_get_response timeout, switching to polling mode: last cmd=0x20bf8100
Jul 07 13:37:18 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: No response from codec, disabling MSI: last cmd=0x20bf8100
Jul 07 13:37:19 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: No response from codec, resetting bus: last cmd=0x20bf8100
Jul 07 13:37:20 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: No response from codec, resetting bus: last cmd=0x20bf8100
Jul 07 13:37:22 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: No response from codec, resetting bus: last cmd=0x20170500
Jul 07 13:37:23 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: No response from codec, resetting bus: last cmd=0x20270500
Jul 07 13:37:24 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: No response from codec, resetting bus: last cmd=0x20370500
Jul 07 13:39:40 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: azx_get_response timeout, switching to single_cmd mode: last cmd=0x20170503
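For reading the "last cmd" values in the messages above, here is a minimal sketch that splits each 32-bit command word into its fields. It assumes the usual HD-audio layout for 12-bit verbs (codec address in bits 31:28, node ID in bits 27:20, verb in bits 19:8, parameter in bits 7:0); the function name decode_hda_cmd is only illustrative.

```python
def decode_hda_cmd(cmd):
    """Split a 32-bit HD-audio command word (12-bit verb form) into fields."""
    return {
        "codec": (cmd >> 28) & 0xF,   # codec address on the HDA link
        "nid": (cmd >> 20) & 0xFF,    # node ID inside that codec
        "verb": (cmd >> 8) & 0xFFF,   # 12-bit verb
        "param": cmd & 0xFF,          # 8-bit parameter
    }


# The "last cmd" values taken from the log above.
for word in (0x20bf8100, 0x20170500, 0x20270500, 0x20370500, 0x20170503):
    f = decode_hda_cmd(word)
    print(
        f"0x{word:08x}: codec={f['codec']} nid=0x{f['nid']:02x} "
        f"verb=0x{f['verb']:03x} param=0x{f['param']:02x}"
    )
```

Under that assumed layout, every command in the log addresses codec 2, and 0x705 is the standard Set Power State verb.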
This happens with Ubuntu and with our IGEL system, but it is also somewhat timing related: on Ubuntu it does not occur every time, which made debugging quite annoying.
After some testing and searching for the commit causing the issue, I found this one:
Reverting only this commit fixed the issue on the affected hardware also with newer kernels (up to 5.4.48).
According to the log, your machine is an Intel Gemini Lake platform. I remember we hit a lot of audio issues on this platform before, and most of them had something to do with the i915 driver (HDMI audio).
Like this one: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1826868
So could you please check whether all affected machines are Gemini Lake machines? Could you also please test the latest mainline kernel and see whether you can still reproduce the problem there (like this one, 5.8-rc4:
Created attachment 290203
dmesg output of 5.4-rc4 kernel test on Dell Wyse 5070
The Dell Wyse 5070 has a J5005 and the Futro S740 has a J4105; both are Gemini Lake platforms. But the issue was also present when using a DP to DVI adapter, so it seems not to be related to DP/HDMI audio directly. I know it could still be an issue in this area.
The tests with kernel 5.8-rc4 were also done with a DP to DVI adapter, so no DP/HDMI audio was involved. With 5.8-rc4 the system still runs into the same issue.
You could disable HDMI audio temporarily by adding "snd_hda_intel.probe_mask=1" to the boot args, or, within Ubuntu, you could edit /etc/modprobe.d/alsa-base.conf and add "options snd-hda-intel probe_mask=1".
After booting up, you could verify that the HDMI codec is disabled with "ls /proc/asound/card0/"; there should be only codec#0 in that folder. Then you could run the test and let us see whether the issue can still be reproduced after disabling the HDMI codec.
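The codec check described above can also be scripted. This is only a sketch: it assumes the card is card0 and that codecs show up as codec#N entries in the card directory, as the "ls" check implies. The helper names list_codecs and only_codec0_present are made up for this example.

```python
from pathlib import Path


def list_codecs(card_dir="/proc/asound/card0"):
    """Return the sorted names of all codec#N entries in an ALSA card directory."""
    return sorted(p.name for p in Path(card_dir).glob("codec#*"))


def only_codec0_present(card_dir="/proc/asound/card0"):
    """True if codec#0 is the only codec entry, i.e. the HDMI codec was not probed."""
    return list_codecs(card_dir) == ["codec#0"]


# Prints an empty list on systems without that card directory.
print(list_codecs())
print(only_codec0_present())
```

The card directory is a parameter so the same check can be pointed at another card (card1, for example) if the controller is not card0 on a given system.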
Created attachment 290211
dmesg with disabled HDMI audio support
I tested the 5.8-rc4 kernel with HDMI audio disabled, and there the problem does not occur. Attached is the dmesg of the test with the added kernel parameter "snd_hda_intel.probe_mask=1".
So this seems to be HDMI audio related. But I remember playing with the snd_hda_intel parameters before on a 5.4.x kernel, and there probe_mask=1 additionally needed single_cmd=1 to work, and even then it was not always stable (but this was on the Futro S740 with the J4105 CPU).
So yes, the option seems to help, but I don't know how reliably.
Could you give your kernel config? Just to make sure that you've enabled the proper drivers, especially for HDMI codecs.
Created attachment 290213
kernel log with drm.debug=0x07
I did another run with drm.debug=0x07 active to possibly see a connection to the HDMI part. But this seems to have changed some timings, so I needed 3 tries to hit the error case again; the whole thing seems to have a timing component.
Attached is the compressed log from the run with error.
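Since the error does not show up on every try, it can help to check each captured kernel log for the failure signature automatically. The following is a rough sketch that only looks for the two message types quoted at the top of this report; the function name codec_failures and the third sample line are made up for illustration.

```python
import re

# The two snd_hda_intel message types seen in the failing case.
FAILURE_PATTERNS = (
    re.compile(r"snd_hda_intel .*azx_get_response timeout"),
    re.compile(r"snd_hda_intel .*No response from codec"),
)


def codec_failures(lines):
    """Return the log lines that match one of the known failure messages."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in FAILURE_PATTERNS)
    ]


SAMPLE = [
    "Jul 07 13:37:17 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: azx_get_response timeout, switching to polling mode: last cmd=0x20bf8100",
    "Jul 07 13:37:18 wyse5070 kernel: snd_hda_intel 0000:00:0e.0: No response from codec, disabling MSI: last cmd=0x20bf8100",
    "kernel: unrelated message (placeholder line, not from a real log)",
]

hits = codec_failures(SAMPLE)
print("error case hit" if hits else "no codec errors found", len(hits))
```

In practice the lines would come from a saved dmesg or journal file instead of the SAMPLE list.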
Created attachment 290215
Kernel config for the 5.8-rc4 kernel
Hope this helps and this is the config you wanted.
If you remove snd_hda_intel.probe_mask=1 and instead add snd_hda_intel.power_save_controller=0, does it help? Maybe we could add a workaround by setting power_save_controller for geminilake platforms.
I tested the snd_hda_intel.power_save_controller=0 setting and it also works, which was to be expected as the reverted commit was in the power save area.
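When switching between boot parameters like this, it is easy to test with a different setting than intended, so it may be worth confirming what the loaded module actually uses. The sketch below assumes the parameter is exported under /sys/module/snd_hda_intel/parameters/ (which depends on how the driver declares it) and that boolean parameters read back as "Y" or "N"; the helper name read_module_param is hypothetical.

```python
from pathlib import Path


def read_module_param(name, module="snd_hda_intel", sysfs_root="/sys/module"):
    """Return the current value of a module parameter, or None if it is not readable."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, parameter not exported, or no permission.
        return None


for param in ("power_save_controller", "probe_mask", "single_cmd"):
    print(param, "=", read_module_param(param))
```

A value of None only means the file could not be read; it does not say anything about the setting itself.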
Created attachment 290251
Testing patch (disable the audio controller's runtime pm on geminilake platforms)
Could you please test this patch?
Sorry for the delay. I tested the patch and it solves the problem on the Dell Wyse 5070 and the Futro S740; I tested both just a few minutes ago.
It's OK to merge the fix as a temporary workaround in the short term, but I'd like to get this addressed properly on the i915 side (or somewhere in HD-audio, if applicable).
Kai, could Intel take a look?
Hmm, this looks tricky. 5.8-rc2 had a few related patches in the i915 driver (forcing codec wake), but that does not seem to help. The patch to disable PM will also keep the display active, so if merged, it will impact PM outside audio as well. Could Stefan try the i915 driver parameter i915.disable_power_well=0? This will have a PM hit as well. I'll go through the logs in this bug more thoroughly tomorrow.
Hi, I will try i915.disable_power_well=0 on Thursday as I don't have access to the hardware tomorrow. If you need additional logs with other drm.debug settings, just let me know and I will see if I can provide them.
But you said "The patch to disable PM will also keep the display active, so if merged, it will impact PM also outside audio." If you are referring to the i915 display_power_get/put, this patch will not impact that, since it only disables the controller's PM, not the codec's PM; the codec's runtime PM still works as before.
And it would be best if we could fix the root cause of this issue instead of using a workaround like my patch.
Stefan, ok great! Test w/ normal drm.debug is ok. Gemini Lake platform has had its own type of issues with HDMI probe as GLK supports much lower display clock (CDCLK), and we can hit some unique clocking constraint issues on the HDA link between audio and display.
One additional thing to try is this old Chrome patch (it basically limits the lowest CDCLK and thus avoids the transitions to the lowest clock):
The original Chrome issue has since been fixed upstream, and the above patch is no longer used. But in case there are some board-specific issues with clock stability, this is worth a shot.
Hui, ack, I stand corrected. Indeed, with this hw, keeping controller up won't block i915.
Takashi, with the above note from Hui, I think it's OK to merge this patch. Unless the above tests reveal something about the behaviour, I fear this is going to be a time-consuming task to debug the hw, and it will likely require access to the specific hw and/or a way to reproduce the issue. Currently our test benches (audio and graphics) don't hit this.
I got a report from a customer who hit the same issue with an Intel NUC, but I will not have access to that hardware. It may still be interesting for you in case you can get one of these devices more easily.
The dmesg says this about the system:
Intel(R) Client Systems NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0053.2019.1015.1510 10/15/2019
CPU0: Intel(R) Celeron(R) J4005 CPU @ 2.00GHz (family: 0x6, model: 0x7a, stepping: 0x1)
I will also try the Chromium patch tomorrow to see if it solves the problem.
i915.disable_power_well=0 does not fix the problem and does not seem to have any influence on the issue.
I tried the Chrome patch and it solves the issue too.
Thanks Stefan for testing! I'll try to find hw where I could reproduce the issue. But until then, I recommend merging Hui's patch.
Recently, users reported that the audio jack can't detect plug/unplug events anymore. The two machines both have the alc662 codec, and reverting commit 9a6418487b566 (ALSA: hda: call runtime_allow() for all hda controllers) makes jack detection work again.
It looks like "call runtime_allow() for all hda controllers" is too aggressive, so let me revert this patch.