Bug 114341 - CORB reset timeout #1 on Skylake
Summary: CORB reset timeout #1 on Skylake
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Sound(ALSA)
Hardware: Intel Linux
Importance: P1 normal
Assignee: Jaroslav Kysela
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-11 07:22 UTC by Patrick Steinhardt
Modified: 2019-09-15 09:27 UTC

See Also:
Kernel Version: v4.4.5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel config (69.24 KB, application/octet-stream)
2016-03-11 07:22 UTC, Patrick Steinhardt
dmesg for a non-working configuration (51.53 KB, application/octet-stream)
2016-03-11 07:23 UTC, Patrick Steinhardt
lspci -vvn (32.32 KB, application/octet-stream)
2016-03-11 07:23 UTC, Patrick Steinhardt
dmesg - pin nid 7 not registered (52.66 KB, application/octet-stream)
2016-03-14 20:55 UTC, Patrick Steinhardt
dmesg - no sound but sound card is recognised (51.56 KB, application/octet-stream)
2016-03-16 06:28 UTC, Patrick Steinhardt
patch for debug (1.05 KB, application/octet-stream)
2016-04-29 13:50 UTC, han lu
lspci -vvn (24.55 KB, text/plain)
2019-03-07 08:26 UTC, Alexander P

Description Patrick Steinhardt 2016-03-11 07:22:01 UTC
Created attachment 208651 [details]
Kernel config

I've got an Asus Z170-A motherboard with Realtek ALC892 onboard audio and an Intel Core i5-6600K. Sound usually works fine until, after some reboot, it stops working and then stays broken across further reboots and power-downs.

Unfortunately I don't have the kernel log for the working configuration at hand right now. But I know that when I reboot from a working into a non-working configuration, the kernel log prints something like "HDMI: Invalid ELD buf size -1", and I remember a line like "azx_get_response timeout, switching to polling mode". These messages do not usually appear when rebooting from a working to a working configuration or from a non-working to a non-working configuration.

When booting from non-working to non-working configurations snd_hda_intel gives the following messages:

[    0.843829] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops ffffffffae7586f0)
[    0.954574] snd_hda_intel 0000:00:1f.3: CORB reset timeout#1, CORBRP = 0
[    0.957012] snd_hda_intel 0000:00:1f.3: no codecs found!
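
To tell the failure modes in this report apart across many boots, the kernel log can be scanned for the messages quoted here. The following is only a minimal sketch: the function name classify_dmesg and the symptom labels are made up for illustration, and the patterns are just the message fragments that appear in this bug.

```python
import re

# Message fragments quoted in this report. The labels on the left are
# illustrative names, not anything the kernel or ALSA defines.
PATTERNS = {
    "corb_reset_timeout": re.compile(r"CORB reset timeout"),
    "no_codecs": re.compile(r"no codecs found"),
    "response_timeout": re.compile(r"azx_get_response timeout"),
    "invalid_eld": re.compile(r"invalid ELD buf size", re.IGNORECASE),
}


def classify_dmesg(text):
    """Count, per symptom label, how many log lines contain that message."""
    counts = {name: 0 for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the output of dmesg (for example via subprocess or a saved log file) gives a quick per-boot summary of which of these messages occurred.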

Regaining proper sound requires powering off the PC and resetting the motherboard multiple times. I have not been able to determine what actually triggers getting a working configuration again, but it seems rather random.

I've tried setting snd_hda_intel.single_cmd=1 and snd_hda_intel.probe_mask=1 as proposed in https://bugzilla.redhat.com/show_bug.cgi?id=1297003 but this does not change anything.
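
When testing boot options like these, it helps to confirm that they actually reached the running kernel. A rough sketch that reads /proc/cmdline follows; the helper names parse_cmdline and read_cmdline are hypothetical, and the parser is deliberately simple (it does not handle quoted values containing spaces).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' map to None. Quoted values containing spaces are
    not handled; this is only a sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the currently running kernel."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

For example, read_cmdline().get("snd_hda_intel.single_cmd") should return "1" if the option was passed at boot.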

My current kernel configuration has everything built in. But I've also tested with an Arch Linux live USB stick, which uses modules, without any success.
Comment 1 Patrick Steinhardt 2016-03-11 07:23:00 UTC
Created attachment 208661 [details]
dmesg for a non-working configuration
Comment 2 Patrick Steinhardt 2016-03-11 07:23:40 UTC
Created attachment 208671 [details]
lspci -vvn
Comment 3 Takashi Iwai 2016-03-11 07:31:40 UTC
Hmm, such a communication error was caused by the missing clock management in the past, and it was recently fixed by the commit
  7e31a0159461818a1bda49662921b98a29c1187b
    ALSA: hda - Apply clock gate workaround to Skylake, too

This should have been backported to 4.4.x, too.  Make sure that your kernel contains this backport and confirm that the issue really still happens with it.
Comment 4 Patrick Steinhardt 2016-03-11 07:33:31 UTC
I've already seen your commit and was happy to see it included in v4.4.5. Unfortunately it didn't change the outcome for me.
Comment 5 Takashi Iwai 2016-03-11 07:41:33 UTC
Then I have no idea on the audio side, for now...

Which Skylake CPU do you have?  There have been regressions in recent kernels due to the C-state management.  Try the "intel_idle.max_cstate=0" and "intel_pstate=disable" boot options once when it happens.  I don't think it matters, but just to be sure.
Comment 6 Patrick Steinhardt 2016-03-11 07:48:40 UTC
No, adding "intel_idle.max_cstate=0" and "intel_pstate=disable" changes nothing.

I've got an Intel(R) Core(TM) i5-6600K CPU. One (maybe?) relevant tidbit I forgot to mention is that I use VT-d. I've got an Nvidia GeForce GTX 750 Ti and stub its VGA controller and audio device with vfio-pci during boot with "vfio-pci.ids=10de:1380,10de:0fbc".
Comment 7 Patrick Steinhardt 2016-03-14 20:55:03 UTC
Created attachment 209151 [details]
dmesg - pin nid 7 not registered

I've just noticed another new message popping up which might be related to the issue:

[   16.866681] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
[   16.881795] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered

It's the first time I've actively noticed this message, and it happened after a hard reset. I have no idea whether it is actually connected to the issue, as I still have working sound, but I'm reporting it nevertheless since there are no other ideas floating around currently.
Comment 8 Patrick Steinhardt 2016-03-16 06:28:31 UTC
Created attachment 209431 [details]
dmesg - no sound but sound card is recognised

And another dmesg with different behavior. The sound card is recognized but sound does not work. Highlights from dmesg:

[  +2.288323] snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to polling mode: last cmd=0x208f8100
[  +1.003403] snd_hda_intel 0000:00:1f.3: No response from codec, disabling MSI: last cmd=0x208f8100
[  +1.003405] snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to single_cmd mode: last cmd=0x208f8100
[  +0.008846] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register 0x2f0d00. -5
[  +0.000172] snd_hda_codec_hdmi hdaudioC0D2: HDMI: invalid ELD buf size -1
(repeated a few times)
[  +0.242124] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
[  +0.019740] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
[  +0.018615] snd_hda_codec_hdmi hdaudioC0D2: HDMI: invalid ELD buf size -1
(repeated a few times)
[  +9.734267] azx_single_send_cmd: 117 callbacks suppressed
[  +0.178726] snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2b8000. -5
[  +0.000088] snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2b8000. -5
[  +5.287777] azx_single_send_cmd: 779 callbacks suppressed
[Mar16 07:23] azx_single_send_cmd: 4 callbacks suppressed
[  +8.002868] azx_single_send_cmd: 8 callbacks suppressed
[ +11.918226] azx_single_send_cmd: 84 callbacks suppressed
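
The "last cmd=0x208f8100" value in these messages is a raw HD-audio command word. A small sketch for splitting it into fields is shown below, assuming the usual layout of codec address in bits 31:28, node ID in bits 26:20, a 12-bit verb in bits 19:8 and an 8-bit parameter in bits 7:0. The function name decode_hda_cmd is made up for illustration, and verbs that use the 4-bit verb / 16-bit parameter form would need a different split.

```python
def decode_hda_cmd(cmd):
    """Split a 32-bit HD-audio command word into its fields.

    Assumes the 12-bit-verb / 8-bit-parameter form. Bit 27 (indirect
    node addressing) is ignored in this sketch.
    """
    return {
        "codec_addr": (cmd >> 28) & 0xF,
        "nid": (cmd >> 20) & 0x7F,
        "verb": (cmd >> 8) & 0xFFF,
        "param": cmd & 0xFF,
    }
```

Under these assumptions, 0x208f8100 decodes to codec address 2, node 0x08, verb 0xf81, parameter 0x00. Codec address 2 is consistent with the hdaudioC0D2 (HDMI) device named in the surrounding messages.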
Comment 9 Libin Yang 2016-03-25 08:04:32 UTC
Based on the description, neither the analog audio (Realtek ALC892) nor the digital audio works, right? I'm wondering whether the digital audio codec not responding to commands is what causes audio to stop working entirely.
Comment 10 Patrick Steinhardt 2016-03-26 08:03:40 UTC
(In reply to Libin Yang from comment #9)
> Based on the description, both analog audio (Realtek ALC892) and the digital
> audio don't work, right? I'm thinking whether digital audio no response cmd
> cause the audio totally doesn't work.

This is correct, there is not a single output available for the Realtek card when the bug occurs. I am able to use audio of the dedicated graphics card though when it is not stubbed with vfio.

I think it is likely to have something to do with digital audio. Now that I think about it, the error did not occur when my primary monitor was connected via HDMI, but only when it was connected via DisplayPort.
Comment 11 Patrick Steinhardt 2016-04-16 22:00:18 UTC
So by now I think I've got a better understanding as to when the error occurs. 

As said before, I use VT-d to pass my external GPU to qemu and access it inside of the VM. I start up the kernel with my external GPU and its audio controller added to the vfio-pci framework: "vfio-pci.ids=10de:1380,10de:0fbc". I then pass these devices to qemu via "-device vfio-pci,host=01:00.0 -device vfio-pci,host=01:00.1". After starting up the qemu VM I have sound working on both the host OS and the guest OS via "-soundhw hda" from qemu; that is, qemu passes audio via the emulated sound hardware to the host's pulseaudio instance.
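
For anyone trying to reproduce this setup in a script, the qemu arguments described above could be assembled roughly as follows. This is only a sketch: the helper name qemu_passthrough_args is hypothetical, the PCI addresses are the ones from this report, and the rest of the qemu command line (machine type, disks, and so on) is not given here and therefore omitted.

```python
def qemu_passthrough_args(host_addrs, soundhw="hda"):
    """Build the qemu options for VFIO passthrough plus emulated sound.

    host_addrs: host PCI addresses such as "01:00.0", one per device
    that is bound to vfio-pci on the host.
    """
    args = []
    for addr in host_addrs:
        args += ["-device", f"vfio-pci,host={addr}"]
    # "-soundhw hda" is the option quoted in this report.
    args += ["-soundhw", soundhw]
    return args
```

The returned list can be appended to the remaining qemu arguments and handed to subprocess.run().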

Now, when I shut down the VM after some time and later try to play sound on the host machine, it more often than not stops working. dmesg then spams "HDMI: invalid ELD buf size -1". When I then reboot the machine, sound still does not work, as now I am getting the "CORB reset timeout #1" messages.

Usually the only way to fix it at that point is to "Load optimized defaults" in UEFI (Asus Z170-A) and let the computer reset.
Comment 12 han lu 2016-04-29 13:50:34 UTC
Created attachment 214721 [details]
patch for debug
Comment 13 han lu 2016-04-29 13:54:40 UTC
Could you please
  (1) Apply the attached patch, check if it makes any difference, and attach dmesg;
  (2) Load optimized defaults in BIOS, and attach dmesg.
Thanks.
Comment 14 Patrick Steinhardt 2016-04-30 11:05:11 UTC
(In reply to han lu from comment #13)
> Could you please
>   (1) Apply attached patch and check if it make any difference, and attach
> dmesg;
>   (2) Load optimized default in BIOS, and attach dmesg.
> Thanks.

I'm not at home over the weekend, but I'll do so on Monday. Thanks
Comment 15 Patrick Steinhardt 2016-05-02 20:37:58 UTC
Well, now that I want to actively reproduce the issue I'm unable to do so. In the meantime I've upgraded to v4.5.2, so maybe the issue is fixed by now. I'll report back when the issue comes back to bite me.
Comment 16 han lu 2016-05-05 07:53:16 UTC
(In reply to Patrick Steinhardt from comment #15)
> Well, now that I want to actively reproduce the issue I'm unable to do so.
> In the meantime I've upgraded to v4.5.2, so maybe the issue is fixed by now.
> I'll report back when the issue comes back to bite me.

So can we close this issue for now? It can be reopened if the issue is reproduced in the future.
Comment 17 Patrick Steinhardt 2016-05-05 09:10:09 UTC
(In reply to han lu from comment #16)
> (In reply to Patrick Steinhardt from comment #15)
> > Well, now that I want to actively reproduce the issue I'm unable to do so.
> > In the meantime I've upgraded to v4.5.2, so maybe the issue is fixed by now.
> > I'll report back when the issue comes back to bite me.
> 
> so can we close this issue at the moment? It can be reopened if issue be
> reproduced in future.

Yes, will set to unreproducible for now. Thanks.
Comment 18 Alexander P 2019-03-07 08:24:21 UTC
This issue seems to have manifested for me yesterday, seemingly out of the blue (I guess after an S3 resume; I don't think any updates could have been applied automatically), on the stock 4.15.0-46 kernel (Ubuntu 18.04) in a similar configuration (an i7-6700K CPU on Z-170A):

[ 5789.856394] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[ 5789.968962] snd_hda_intel 0000:00:1f.3: CORB reset timeout#1, CORBRP = 0
[ 5789.970538] snd_hda_intel 0000:00:1f.3: no codecs found!

I have tried seemingly every recipe out there, to no avail. I've also upgraded Ubuntu to 18.10 (the 4.18.0-16-generic kernel), but this hasn't helped either.
Comment 19 Alexander P 2019-03-07 08:26:59 UTC
Created attachment 281567 [details]
lspci -vvn
Comment 20 Alexander P 2019-03-07 08:37:44 UTC
It's also worth mentioning that multiple resets as well as disabling and re-enabling the audio device in the BIOS haven't helped my situation.
Comment 21 Qiming HOU 2019-09-15 09:27:22 UTC
Encountered this problem and successfully fixed it. But due to the method I used to fix it, I don't want to reproduce it.

Full story:

I'm debugging a VFIO setup with a Windows VM whose snapshot is in the middle of an update. That means, I'm repeatedly rebooting a Windows VM with full control of the audio device *during Windows update*.

The `CORB reset timeout` problem manifested after one such reboot. I tried a reboot, a cold reboot, rmmod then insmod, and deleting and re-adding the device in the Windows Device Manager. Nothing worked.

At this point I suspected firmware corruption and did a BIOS update, as suggested by a forum post. It fixed the problem.

TL;DR Don't reboot when your Windows VM is updating!
