Bug 114341
Summary: | CORB reset timeout #1 on Skylake | ||
---|---|---|---|
Product: | Drivers | Reporter: | Patrick Steinhardt (ps) |
Component: | Sound(ALSA) | Assignee: | Jaroslav Kysela (perex) |
Status: | RESOLVED UNREPRODUCIBLE | ||
Severity: | normal | CC: | han.lu, hqm03ster, libin.yang, pavlov81, tiwai |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | v4.4.5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Kernel config
dmesg for a non-working configuration
lspci -vvn
dmesg - pin nid 7 not registered
dmesg - no sound but sound card is recognised
patch for debug
lspci -vvn |
Description
Patrick Steinhardt
2016-03-11 07:22:01 UTC
Created attachment 208661 [details]
dmesg for a non-working configuration
Created attachment 208671 [details]
lspci -vvn
Hmm, such a communication error was caused by missing clock management in the past, and it was recently fixed by commit 7e31a0159461818a1bda49662921b98a29c1187b ("ALSA: hda - Apply clock gate workaround to Skylake, too"). This should have been backported to 4.4.x, too. Make sure that your kernel contains this backport and confirm that the issue really still happens with it.
I've already seen your commit and was happy to see it included in v4.4.5. Unfortunately it didn't change the outcome for me.
Then I have no idea on the audio side, for now... Which Skylake CPU do you have? There have been regressions in recent kernels due to the C-state management. Try the "intel_idle.max_cstate=0" and "intel_pstate=disable" boot options once when it happens. I don't think it matters, but just to be sure.
No, adding "intel_idle.max_cstate=0" and "intel_pstate=disable" changes nothing. I've got an Intel(R) Core(TM) i5-6600K CPU. One (maybe?) relevant tidbit I forgot to mention is that I use VT-d. I've got an Nvidia GeForce GTX 750 Ti and stub its VGA controller and audio device with vfio-pci during boot with "vfio-pci.ids=10de:1380,10de:0fbc".
Created attachment 209151 [details]
dmesg - pin nid 7 not registered
I've just noticed another new message popping up which might be related to the issue:
[ 16.866681] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
[ 16.881795] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
It's the first time I have actively noticed this message, and it appeared after a hard reset. I have no idea whether it is actually connected to the issue, as I still have working sound, but I'm reporting it nevertheless since there are no other ideas floating around currently.
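When comparing boots, it can help to confirm which of the boot parameters mentioned in this thread ("intel_idle.max_cstate=0", "intel_pstate=disable", "vfio-pci.ids=...") were actually in effect. The following is a minimal sketch, not part of the original report: the helper names `parse_cmdline` and `vfio_pci_ids` are hypothetical, and the sample string (including `root=/dev/sda1`) is made up for illustration. On a live system the string would come from `/proc/cmdline`.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags (tokens without "=") map to None.
    """
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def vfio_pci_ids(params):
    """Return the vendor:device pairs listed in vfio-pci.ids, if any."""
    raw = params.get("vfio-pci.ids") or ""
    return [pair for pair in raw.split(",") if pair]


# Hypothetical command line; on a live system read /proc/cmdline instead.
sample = (
    "root=/dev/sda1 intel_idle.max_cstate=0 intel_pstate=disable "
    "vfio-pci.ids=10de:1380,10de:0fbc"
)
params = parse_cmdline(sample)
print(params.get("intel_idle.max_cstate"))
print(params.get("intel_pstate"))
print(vfio_pci_ids(params))
```

A parameter that is absent from the command line simply comes back as `None` from `params.get(...)`, which makes it easy to tell "not set" apart from "set to 0".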
Created attachment 209431 [details]
dmesg - no sound but sound card is recognised
And another dmesg with different behavior. The sound card is recognized but sound does not work. Highlights from dmesg:
[ +2.288323] snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to polling mode: last cmd=0x208f8100
[ +1.003403] snd_hda_intel 0000:00:1f.3: No response from codec, disabling MSI: last cmd=0x208f8100
[ +1.003405] snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to single_cmd mode: last cmd=0x208f8100
[ +0.008846] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register 0x2f0d00. -5
[ +0.000172] snd_hda_codec_hdmi hdaudioC0D2: HDMI: invalid ELD buf size -1
(repeated a few times)
[ +0.242124] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
[ +0.019740] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
[ +0.018615] snd_hda_codec_hdmi hdaudioC0D2: HDMI: invalid ELD buf size -1
(repeated a few times)
[ +9.734267] azx_single_send_cmd: 117 callbacks suppressed
[ +0.178726] snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2b8000. -5
[ +0.000088] snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2b8000. -5
[ +5.287777] azx_single_send_cmd: 779 callbacks suppressed
[Mar16 07:23] azx_single_send_cmd: 4 callbacks suppressed
[ +8.002868] azx_single_send_cmd: 8 callbacks suppressed
[ +11.918226] azx_single_send_cmd: 84 callbacks suppressed
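Since the failure shows up as a handful of recurring messages, a small script can tally them across saved dmesg files to compare good and bad boots. This is only a sketch under assumptions: the signature names and the `count_signatures` helper are hypothetical, and the patterns are taken from the messages quoted in this report (plus the "CORB reset timeout" text from the bug summary), not from any authoritative list of HDA driver errors.

```python
import re
from collections import Counter

# Patterns copied from messages quoted in this bug report (assumed labels).
SIGNATURES = {
    "corb_reset_timeout": re.compile(r"CORB reset timeout"),
    "response_timeout": re.compile(r"azx_get_response timeout"),
    "no_codec_response": re.compile(r"No response from codec"),
    "sync_register_failed": re.compile(r"Unable to sync register"),
    "invalid_eld": re.compile(r"HDMI: invalid ELD buf size"),
    "pin_not_registered": re.compile(r"HDMI: pin nid \d+ not registered"),
}


def count_signatures(log_text):
    """Count how many log lines match each known failure signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines taken from the dmesg highlights above.
sample = """\
[ +2.288323] snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to polling mode: last cmd=0x208f8100
[ +0.008846] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register 0x2f0d00. -5
[ +0.000172] snd_hda_codec_hdmi hdaudioC0D2: HDMI: invalid ELD buf size -1
[ +0.242124] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
[ +0.019740] snd_hda_codec_hdmi hdaudioC0D2: HDMI: pin nid 7 not registered
"""
counts = count_signatures(sample)
for name in SIGNATURES:
    print(name, counts[name])
```

To use it on a real log, pass the contents of a saved dmesg file to `count_signatures` instead of the sample string.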
Based on the description, both analog audio (Realtek ALC892) and the digital audio don't work, right? I'm wondering whether the unanswered command on the digital audio side causes audio to stop working entirely.
(In reply to Libin Yang from comment #9)
> Based on the description, both analog audio (Realtek ALC892) and the digital
> audio don't work, right? I'm wondering whether the unanswered command on the
> digital audio side causes audio to stop working entirely.
This is correct, there is not a single output available for the Realtek card when the bug occurs. I am able to use the audio of the dedicated graphics card, though, when it is not stubbed with vfio.
I think it is likely to have something to do with digital audio.
Now that I think about it, the error did not occur when I connected my primary monitor via HDMI, but only when it was connected via DisplayPort.
So by now I think I've got a better understanding of when the error occurs. As said before, I use VT-d to pass my external GPU to qemu and access it inside the VM. I boot the kernel with my external GPU and its audio controller bound to vfio-pci: "vfio-pci.ids=10de:1380,10de:0fbc". I then pass these devices to qemu via "-device vfio-pci,host=01:00.0 -device vfio-pci,host=01:00.1". After starting the qemu VM, sound works on both the host OS and the guest OS via "-soundhw hda" from qemu; that is, qemu passes audio from the emulated sound hardware to the host's pulseaudio instance.
Now, when I shut down the VM after some time and later try to play sound on the host machine, sound more often than not stops working. dmesg is then spammed with "HDMI: invalid ELD buf size -1".
When I then reboot the machine, sound still does not work, as I now get the "CORB reset timeout #1" messages. Usually the only way to fix it at that point is to "Load optimized defaults" in UEFI (Asus Z170-A) and let the computer reset.
Created attachment 214721 [details]
patch for debug
Could you please
(1) apply the attached patch, check whether it makes any difference, and attach dmesg;
(2) load optimized defaults in BIOS, and attach dmesg.
Thanks.
(In reply to han lu from comment #13)
> Could you please
> (1) apply the attached patch, check whether it makes any difference, and
> attach dmesg;
> (2) load optimized defaults in BIOS, and attach dmesg.
> Thanks.
I'm not at home over the weekend, but I'll do so on Monday.
Thanks
Well, now that I want to actively reproduce the issue I'm unable to do so. In the meantime I've upgraded to v4.5.2, so maybe the issue is fixed by now. I'll report back when the issue comes back to bite me.
(In reply to Patrick Steinhardt from comment #15)
> Well, now that I want to actively reproduce the issue I'm unable to do so.
> In the meantime I've upgraded to v4.5.2, so maybe the issue is fixed by now.
> I'll report back when the issue comes back to bite me.
So can we close this issue for now? It can be reopened if the issue is reproduced in the future.
(In reply to han lu from comment #16)
> (In reply to Patrick Steinhardt from comment #15)
> > Well, now that I want to actively reproduce the issue I'm unable to do so.
> > In the meantime I've upgraded to v4.5.2, so maybe the issue is fixed by
> > now.
> > I'll report back when the issue comes back to bite me.
>
> So can we close this issue for now? It can be reopened if the issue is
> reproduced in the future.
Yes, will set to unreproducible for now. Thanks.
This issue seems to have manifested for me yesterday, seemingly out of the blue (I guess after an S3 resume, and I don't think any updates could have been applied automatically), on the stock 4.15.0-46 kernel (Ubuntu 18.04) in a similar configuration (an i7-6700K CPU on Z-170A):
[ 5789.856394] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[ 5789.968962] snd_hda_intel 0000:00:1f.3: CORB reset timeout#1, CORBRP = 0
[ 5789.970538] snd_hda_intel 0000:00:1f.3: no codecs found!
I have seemingly tried every recipe out there, to no avail. I've also updated my Ubuntu to 18.10 (the 4.18.0-16-generic kernel), but this hasn't helped either.
Created attachment 281567 [details]
lspci -vvn
It's also worth mentioning that multiple resets, as well as disabling and re-enabling the audio device in the BIOS, haven't helped my situation.
I encountered this problem and successfully fixed it. But because of the method I used to fix it, I don't want to reproduce it.
Full story: I was debugging a VFIO setup with a Windows VM whose snapshot is in the middle of an update. That means I was repeatedly rebooting a Windows VM with full control of the audio device *during Windows update*. The `CORB reset timeout` problem manifested after one such reboot. I tried a reboot, a cold reboot, rmmod then insmod, and deleting and re-adding the device in the Windows device manager. Nothing worked. At that point I suspected firmware corruption and did a BIOS update, as suggested by a forum post. It fixed the problem.
TL;DR Don't reboot when your Windows VM is updating!