Bug 219087
Summary: | Broken Audio After Commit f5ff79fddf0 | ||
---|---|---|---|
Product: | Drivers | Reporter: | orbea (orbea) |
Component: | IOMMU | Assignee: | drivers_iommu |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | regressions, tiwai |
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 6.10.0 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | f5ff79fddf0efecca538046b5cc20fb3ded2ec4f |
Attachments: |
dmesg output from 6.10.1
Workaround patch Test patch for HD-audio dmesg output from the patched 6.10.2 kernel Another test fix Revised test fix Yet another possible workaround Additional fix patch Test fix patch 1 Test fix patch 2 Yet another test fix Fix patch for AMD HD-audio Cleanup patch 1 Cleanup patch 2 |
Description
orbea
2024-07-23 15:54:52 UTC
That commit exposed some bugs in drivers. Please share a dmesg from a very recent kernel, ideally after triggering the bug to see if this caused any kernel messages. Created attachment 306613 [details]
dmesg output from 6.10.1
I attached the dmesg output from kernel 6.10.1, but I am not sure it provides any helpful clues what is wrong?
Created attachment 306614 [details]
Workaround patch
This patch tested against 6.10.0 and 6.6.41 kernels works around the issues for me, but probably isn't the correct fix.
Reassigned this to Alsa Sound driver; yes, a mm commit exposed the regression, but this likely needs to be fixed somewhere in the sound drivers It's not about the sound driver, but rather the AMD IOMMU stuff about SG buffer management. Reassigned. ... meanwhile, for AMD systems, we might be able to use the fallback page allocations, and it might work if prefer this instead of the non-contiguous page allocations over IOMMU. Does the patch below work? Created attachment 306622 [details]
Test patch for HD-audio
I tested that patch against 6.10.2, but unfortunately the result is no sound at all, just silence. Just to be sure, if you pass snd_hda_intel.fallback_alloc=0 option to the patched kernel, it goes back to the original state? Also, does the below change make any difference on the vanilla kernel? --- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -554,8 +554,7 @@ static void *snd_dma_noncontig_alloc(struct snd_dma_buffer *dmab, size_t size) if (!sgt) return NULL; - dmab->dev.need_sync = dma_need_sync(dmab->dev.dev, - sg_dma_address(sgt->sgl)); + dmab->dev.need_sync = true; p = dma_vmap_noncontiguous(dmab->dev.dev, size, sgt); if (p) { dmab->private_data = sgt; > Just to be sure, if you pass snd_hda_intel.fallback_alloc=0 option to the
> patched kernel, it goes back to the original state?
No, with the previously patched 6.10.2 kernel I get the same result, just silence. I neglected to check the dmesg output in my first test which does show new output that seems related and will be attached below.
I next rebuilt the vanilla 6.10.2 kernel with the above patch, but it didn't make a difference and I get the original audio issue.
To be clear I am using the vanilla kernels from kernel.org.
Created attachment 306625 [details] dmesg output from the patched 6.10.2 kernel dmesg output with the patch from comment #7. OK, then with IOMMU, this workaround doesn't work. That's a kind of expected. Another blind shot: how about the change below? Created attachment 306626 [details]
Another test fix
Created attachment 306628 [details]
Revised test fix
I rebuilt the 6.10.2 kernel with the revised fix, but unfortunately it didn't make a difference. Hmm, I wonder whether dma_alloc_coherent() works at all now for your device. Could you try the patch below, instead? Created attachment 306629 [details]
Yet another possible workaround
I built the 6.10.2 kernel again with the new patch, but it results with no sound. aplay fails: ALSA lib /var/tmp/portage/media-libs/alsa-lib-1.2.12/work/alsa-lib-1.2.12/src/confmisc.c:165:(snd_config_get_card) Cannot get card index for 0 aplay: main:850: audio open error: No such file or directory Where 'aplay -l' doesn't even show the TV I am using for audio and dmesg has a few relevant lines. $ dmesg | grep snd [ 4.142593] ata10.00: Features: Trust Dev-Sleep NCQ-sndrcv [ 15.053529] snd_hda_intel 0000:03:00.1: enabling device (0000 -> 0002) [ 15.053626] snd_hda_intel 0000:03:00.1: Force to non-snoop mode [ 15.053705] snd_hda_intel 0000:1b:00.1: enabling device (0000 -> 0002) [ 15.053763] snd_hda_intel 0000:1b:00.6: enabling device (0000 -> 0002) [ 15.056311] snd_hda_intel 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x10753a000 flags=0x0000] [ 15.056320] snd_hda_intel 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x10753a808 flags=0x0020] [ 15.056327] snd_hda_intel 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x10753a800 flags=0x0000] [ 15.097760] snd_hda_codec_realtek hdaudioC2D0: autoconfig for ALC897: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line [ 15.097763] snd_hda_codec_realtek hdaudioC2D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0) [ 15.097765] snd_hda_codec_realtek hdaudioC2D0: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0) [ 15.097766] snd_hda_codec_realtek hdaudioC2D0: mono: mono_out=0x0 [ 15.097767] snd_hda_codec_realtek hdaudioC2D0: inputs: [ 15.097768] snd_hda_codec_realtek hdaudioC2D0: Front Mic=0x19 [ 15.097769] snd_hda_codec_realtek hdaudioC2D0: Rear Mic=0x18 [ 15.097770] snd_hda_codec_realtek hdaudioC2D0: Line=0x1a [ 16.057497] snd_hda_intel 0000:03:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x000f0000 [ 17.060749] snd_hda_intel 0000:03:00.1: No response from codec, disabling MSI: last cmd=0x000f0000 [ 18.061488] snd_hda_intel 0000:03:00.1: Codec #0 probe error; disabling it... [ 18.061492] snd_hda_intel 0000:03:00.1: no codecs initialized [ 18.384597] snd_hda_intel 0000:1b:00.1: bound 0000:1b:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu]) Hmm, weird. Then scratch the previous patch and just try to disable S/G pages: --- a/sound/core/Kconfig +++ b/sound/core/Kconfig @@ -255,7 +255,7 @@ config SND_VMASTER bool config SND_DMA_SGBUF - def_bool y + def_bool n depends on X86 config SND_CTL_LED Ah, this might be due to the current (bad) way of x86-specific wc-page allocations. Could you try the patch below together with the previous one? (Or only with this one + vanilla and apply Kconfig change in comment 20.) Created attachment 306630 [details]
Additional fix patch
Grrr, the previous one would have likely another issue. Yet more revised patches below (the one replacing comment 22 and another for replacing comment 18). Created attachment 306632 [details]
Test fix patch 1
Created attachment 306633 [details]
Test fix patch 2
First I rebuilt the 6.10.2 kernel with only the Kconfig patch in comment 10, this does fix the sound. Second I tried only the patches from both comments 24 & 25, this results with no changes in the original issue. Could you double-check? Those results contradict. The patch in comment 25 essentially uses only dma_alloc_wc(), so it should have the same effect as the Kconfig change in comment 20. (I suppose you're referring to comment 20 instead of comment 10, right?) ... and if it's confirmed that applying both patches in comment 24 and 25 on top of the vanilla kernel doesn't work -- then try only the first patch in comment 24, then apply kconfig change and recheck. If this *breaks* the thing (while only applying kconfig still works), then it implies that it's the explicit WC page setups that matters. (In reply to Takashi Iwai from comment #28) > ... and if it's confirmed that applying both patches in comment 24 and 25 on > top of the vanilla kernel doesn't work -- then try only the first patch in > comment 24, then apply kconfig change and recheck. If this *breaks* the > thing (while only applying kconfig still works), then it implies that it's > the explicit WC page setups that matters. Scratch this comment. Instead -- if applying both patches comment 24 and comment 25 doesn't work, try to apply only the first patch and change like the following instead of the Kconfig change: --- a/sound/pci/hda/hda_controller.c +++ b/sound/pci/hda/hda_controller.c @@ -709,7 +709,7 @@ int snd_hda_attach_pcm_stream(struct hda_bus *_bus, struct hda_codec *codec, int pcm_dev = cpcm->device; unsigned int size; int s, err; - int type = SNDRV_DMA_TYPE_DEV_SG; + int type = SNDRV_DMA_TYPE_DEV; list_for_each_entry(apcm, &chip->pcm_list, list) { if (apcm->pcm->device == pcm_dev) { @@ -749,7 +749,7 @@ int snd_hda_attach_pcm_stream(struct hda_bus *_bus, struct hda_codec *codec, if (size > MAX_PREALLOC_SIZE) size = MAX_PREALLOC_SIZE; if (chip->uc_buffer) - type = SNDRV_DMA_TYPE_DEV_WC_SG; + type = SNDRV_DMA_TYPE_DEV_WC; snd_pcm_set_managed_buffer_all(pcm, type, chip->card->dev, size, MAX_PREALLOC_SIZE); return 0; If this still shows the original problem, it's the WC page things. Yes, I meant comment 20. I again rebuilt 6.10.2 with the only the patch from comment 20, this still works. Next I built it with the patches from comment 24 and comment 25, this still does not work. And then I built it with the patch from comment 24 and the change from comment 29, this also does not work. OK, thanks. Then what happens if the patch in comment 24 and comment 29, and apply the following in addition? --- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -525,13 +525,13 @@ static void *snd_dma_wc_alloc(struct snd_dma_buffer *dmab, size_t size) p = dma_alloc_wc(dmab->dev.dev, size, &dmab->addr, DEFAULT_GFP); if (!p) return p; - mark_wc_pages(p, size); + //mark_wc_pages(p, size); return p; } static void snd_dma_wc_free(struct snd_dma_buffer *dmab) { - unmark_wc_pages(dmab->area, dmab->bytes); + //unmark_wc_pages(dmab->area, dmab->bytes); dma_free_wc(dmab->dev.dev, dmab->bytes, dmab->area, dmab->addr); } Yet one more question: what if you boot with a clean kernel without patches, but passing snd_hda_intel.snoop=1 boot option? The change in comment 31 doesn't seem to apply to the 6.10.2 kernel? I rebuilt the kernel without patches and created a file in /ect/modprobe.d/ with: options snd_hda_intel snoop=1 This does work, I confirmed by commenting the line and unloading and reloading the module which reintroduced the problem. # rmmod snd_hda_intel && modprobe snd_hda_intel And then uncommenting the line fixed it again. (In reply to orbea from comment #33) > The change in comment 31 doesn't seem to apply to the 6.10.2 kernel? It's applied on top of other two patches. But this test can be skipped, if the next test works. > I rebuilt the kernel without patches and created a file in /ect/modprobe.d/ > with: > > options snd_hda_intel snoop=1 > > This does work, I confirmed by commenting the line and unloading and > reloading the module which reintroduced the problem. > > # rmmod snd_hda_intel && modprobe snd_hda_intel > > And then uncommenting the line fixed it again. Thanks for checking. Then we may need another approach. You can try the patch below instead, apply to a clean kernel. Hopefully this makes it working without snoop=1 option. Created attachment 306642 [details]
Yet another test fix
I commented the "snoop=1" line and rebuilt the kernel with the patch in comment 35, but it didn't make a difference for the original issue. > It's applied on top of other two patches. But this test can be skipped, if > the next test works. I then rebuilt the kernel with the patches from comment 24, comment 29 and comment 31, but it also did not work. (In reply to orbea from comment #36) > I commented the "snoop=1" line and rebuilt the kernel with the patch in > comment 35, but it didn't make a difference for the original issue. > > > It's applied on top of other two patches. But this test can be skipped, if > > the next test works. > > I then rebuilt the kernel with the patches from comment 24, comment 29 and > comment 31, but it also did not work. Interesting. Then it's rather about the mmap? For the vanilla kernel, try the following oneliner: --- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -699,7 +699,7 @@ static void snd_dma_sg_wc_free(struct snd_dma_buffer *dmab) static int snd_dma_sg_wc_mmap(struct snd_dma_buffer *dmab, struct vm_area_struct *area) { - area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); + // area->vm_page_prot = pgprot_writecombine(area->vm_page_prot); return dma_mmap_noncontiguous(dmab->dev.dev, area, dmab->bytes, dmab->private_data); } I rebuilt the 6.10.2 kernel with that change and it does fix the sound. I confirmed the snoop line is commented before and after rebooting. OK, thanks for confirmation! I believe this extra pgprot setup is still needed in the case of direct DMA, so the above can't be applied as is. Since the snoop mode works for AMD with IOMMU now, we can avoid that workaround conditionally, instead. Below is the fix patch I planned to submit. Could you give it a quick shot? More cleanup / fix of the core ALSA memalloc code will follow after that, but let's fix the regression at first. Created attachment 306646 [details]
Fix patch for AMD HD-audio
I rebuilt the 6.10.2 kernel with that patch while disabling the boot option and it does work! Thank you very much for all your time and energy spent debugging and fixing this. Is it possible a change like this could be backported to a LTS kernel? If there is anything else I can test, please let me know. Thanks for your quick and patient testing! I'm going to submit the fix patch now, likely included in this week's PR to Linus, so hopefully included in 6.11-rc2. Then it'll be backported to stable trees. For more favor: could you try the two patches below together with the previous fix, and verify that it doesn't break things again? Those are rather cleanups and a material for 6.12. Created attachment 306647 [details]
Cleanup patch 1
Created attachment 306648 [details]
Cleanup patch 2
i built the 6.10.2 kernel with patches from comment 40, comment 43 and comment 44 and the sound still works. Also no errors or warnings in the dmesg output. Awesome. Now the patch landed in sound.git tree https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=478689b5990deb626a0b3f1ebf165979914d6be4 Let's close the bug. Unfortunately, this seems causing regressions on other machines, so we have to revert it. But 6.12 received other changes and they might address this problem. Namely, PCM buffers are allocated via contiguous pages at first, and this can work better. I tested 6.12-rc1 and there are no sound issues, should this be new enough to test? (Unfortunately my mouse doesn't work in 6.12-rc1 so I can't use it to test more thoroughly...) Could you try to set snoop=0 option to snd-hda-intel and check whether it still works, too? Yes, I added 'options snd_hda_intel snoop=0' to /etc/modprobe.d/ and rebooted into the 6.12-rc1 kernel. Then I tried the same flac files in mpv that I originally used to bisect this issue and found no audio issues. I tried rapidly changing workspaces and starting firefox which used to worsen the audio issues before without any problems. However given the lack of a working mouse in 6.12-rc1 means I can't do any more thorough testing like starting steam games. Good to hear. So it implies that the allocation change in 6.12 made it working without an extra workaround, so we can safely revert the previous workaround. Feel free to reopen if you encounter the issue on 6.12-rc2 or later. |