Bug 219087

Summary: Broken Audio After Commit f5ff79fddf0
Product: Drivers Reporter: orbea (orbea)
Component: IOMMUAssignee: drivers_iommu
Status: RESOLVED CODE_FIX    
Severity: normal CC: regressions, tiwai
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version: 6.10.0 Subsystem:
Regression: Yes Bisected commit-id: f5ff79fddf0efecca538046b5cc20fb3ded2ec4f
Attachments: dmesg output from 6.10.1
Workaround patch
Test patch for HD-audio
dmesg output from the patched 6.10.2 kernel
Another test fix
Revised test fix
Yet another possible workaround
Additional fix patch
Test fix patch 1
Test fix patch 2
Yet another test fix
Fix patch for AMD HD-audio
Cleanup patch 1
Cleanup patch 2

Description orbea 2024-07-23 15:54:52 UTC
OS: Gentoo
GPU: [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c1)
CPU: x86_64 AMD Ryzen 9 7900X 12-Core Processor
Motherboard: ASRock X670E PG Lightning

I am having severe audio issues when using ALSA and HDMI audio which include skipping, stuttering and white noise. This can be reproduced by playing a flac music file with mpv where increased even minor CPU usage greatly affects audio quality. For example opening a web browser (Firefox) will break the sound momentarily and normal usage with many programs open will cause the audio to be entirely broken. If only mpv is running with no other usage including changing focus in the window manager then audio is mostly okay.

I exposed this issue when I switched to the above motherboard and CPU where it was previously working for me (With the same GPU), but after switching hardware I noticed that the 5.15.x kernels were unaffected while the 6.1.x kernels were so I bisected it to commit f5ff79fddf0efecca538046b5cc20fb3ded2ec4f.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f5ff79fddf0efecca538046b5cc20fb3ded2ec4f

This might be related to issue 216103 which was bisected to the same commit?

https://bugzilla.kernel.org/show_bug.cgi?id=216103
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-24 08:28:30 UTC
That commit exposed some bugs in drivers. Please share a dmesg from a very recent kernel, ideally after triggering the bug to see if this caused any kernel messages.
Comment 2 orbea 2024-07-24 15:15:59 UTC
Created attachment 306613 [details]
dmesg output from 6.10.1

I attached the dmesg output from kernel 6.10.1, but I am not sure it provides any helpful clues what is wrong?
Comment 3 orbea 2024-07-24 15:19:34 UTC
Created attachment 306614 [details]
Workaround patch

This patch tested against 6.10.0 and 6.6.41 kernels works around the issues for me, but probably isn't the correct fix.
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-24 15:31:08 UTC
Reassigned this to Alsa Sound driver; yes, a mm commit exposed the regression, but this likely needs to be fixed somewhere in the sound drivers
Comment 5 Takashi Iwai 2024-07-24 15:44:03 UTC
It's not about the sound driver, but rather the AMD IOMMU stuff about SG buffer management.  Reassigned.
Comment 6 Takashi Iwai 2024-07-27 07:39:45 UTC
... meanwhile, for AMD systems, we might be able to use the fallback page allocations, and it might work if prefer this instead of the non-contiguous page allocations over IOMMU.

Does the patch below work?
Comment 7 Takashi Iwai 2024-07-27 07:40:23 UTC
Created attachment 306622 [details]
Test patch for HD-audio
Comment 8 orbea 2024-07-27 17:32:18 UTC
I tested that patch against 6.10.2, but unfortunately the result is no sound at all, just silence.
Comment 9 Takashi Iwai 2024-07-28 07:23:21 UTC
Just to be sure, if you pass snd_hda_intel.fallback_alloc=0 option to the patched kernel, it goes back to the original state?
Comment 10 Takashi Iwai 2024-07-28 07:34:19 UTC
Also, does the below change make any difference on the vanilla kernel?

--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -554,8 +554,7 @@ static void *snd_dma_noncontig_alloc(struct snd_dma_buffer *dmab, size_t size)
        if (!sgt)
                return NULL;
 
-       dmab->dev.need_sync = dma_need_sync(dmab->dev.dev,
-                                           sg_dma_address(sgt->sgl));
+       dmab->dev.need_sync = true;
        p = dma_vmap_noncontiguous(dmab->dev.dev, size, sgt);
        if (p) {
                dmab->private_data = sgt;
Comment 11 orbea 2024-07-28 16:20:44 UTC
> Just to be sure, if you pass snd_hda_intel.fallback_alloc=0 option to the
> patched kernel, it goes back to the original state?

No, with the previously patched 6.10.2 kernel I get the same result, just silence. I neglected to check the dmesg output in my first test which does show new output that seems related and will be attached below.

I next rebuilt the vanilla 6.10.2 kernel with the above patch, but it didn't make a difference and I get the original audio issue.

To be clear I am using the vanilla kernels from kernel.org.
Comment 12 orbea 2024-07-28 16:22:14 UTC
Created attachment 306625 [details]
dmesg output from the patched 6.10.2 kernel

dmesg output with the patch from comment #7.
Comment 13 Takashi Iwai 2024-07-29 08:47:14 UTC
OK, then with IOMMU, this workaround doesn't work.  That's a kind of expected.

Another blind shot: how about the change below?
Comment 14 Takashi Iwai 2024-07-29 08:47:45 UTC
Created attachment 306626 [details]
Another test fix
Comment 15 Takashi Iwai 2024-07-29 11:08:20 UTC
Created attachment 306628 [details]
Revised test fix
Comment 16 orbea 2024-07-29 15:29:17 UTC
I rebuilt the 6.10.2 kernel with the revised fix, but unfortunately it didn't make a difference.
Comment 17 Takashi Iwai 2024-07-29 16:10:34 UTC
Hmm, I wonder whether dma_alloc_coherent() works at all now for your device.
Could you try the patch below, instead?
Comment 18 Takashi Iwai 2024-07-29 16:11:09 UTC
Created attachment 306629 [details]
Yet another possible workaround
Comment 19 orbea 2024-07-29 22:16:27 UTC
I built the 6.10.2 kernel again with the new patch, but it results with no sound.

aplay fails:

ALSA lib /var/tmp/portage/media-libs/alsa-lib-1.2.12/work/alsa-lib-1.2.12/src/confmisc.c:165:(snd_config_get_card) Cannot get card index for 0
aplay: main:850: audio open error: No such file or directory

Where 'aplay -l' doesn't even show the TV I am using for audio and dmesg has a few relevant lines.


$ dmesg | grep snd
[    4.142593] ata10.00: Features: Trust Dev-Sleep NCQ-sndrcv
[   15.053529] snd_hda_intel 0000:03:00.1: enabling device (0000 -> 0002)
[   15.053626] snd_hda_intel 0000:03:00.1: Force to non-snoop mode
[   15.053705] snd_hda_intel 0000:1b:00.1: enabling device (0000 -> 0002)
[   15.053763] snd_hda_intel 0000:1b:00.6: enabling device (0000 -> 0002)
[   15.056311] snd_hda_intel 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x10753a000 flags=0x0000]
[   15.056320] snd_hda_intel 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x10753a808 flags=0x0020]
[   15.056327] snd_hda_intel 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0012 address=0x10753a800 flags=0x0000]
[   15.097760] snd_hda_codec_realtek hdaudioC2D0: autoconfig for ALC897: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
[   15.097763] snd_hda_codec_realtek hdaudioC2D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[   15.097765] snd_hda_codec_realtek hdaudioC2D0:    hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[   15.097766] snd_hda_codec_realtek hdaudioC2D0:    mono: mono_out=0x0
[   15.097767] snd_hda_codec_realtek hdaudioC2D0:    inputs:
[   15.097768] snd_hda_codec_realtek hdaudioC2D0:      Front Mic=0x19
[   15.097769] snd_hda_codec_realtek hdaudioC2D0:      Rear Mic=0x18
[   15.097770] snd_hda_codec_realtek hdaudioC2D0:      Line=0x1a
[   16.057497] snd_hda_intel 0000:03:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x000f0000
[   17.060749] snd_hda_intel 0000:03:00.1: No response from codec, disabling MSI: last cmd=0x000f0000
[   18.061488] snd_hda_intel 0000:03:00.1: Codec #0 probe error; disabling it...
[   18.061492] snd_hda_intel 0000:03:00.1: no codecs initialized
[   18.384597] snd_hda_intel 0000:1b:00.1: bound 0000:1b:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Comment 20 Takashi Iwai 2024-07-30 05:16:39 UTC
Hmm, weird.  Then scratch the previous patch and just try to disable S/G pages:

--- a/sound/core/Kconfig
+++ b/sound/core/Kconfig
@@ -255,7 +255,7 @@ config SND_VMASTER
        bool
 
 config SND_DMA_SGBUF
-       def_bool y
+       def_bool n
        depends on X86
 
 config SND_CTL_LED
Comment 21 Takashi Iwai 2024-07-30 06:33:00 UTC
Ah, this might be due to the current (bad) way of x86-specific wc-page allocations.  Could you try the patch below together with the previous one?
(Or only with this one + vanilla and apply Kconfig change in comment 20.)
Comment 22 Takashi Iwai 2024-07-30 06:33:54 UTC
Created attachment 306630 [details]
Additional fix patch
Comment 23 Takashi Iwai 2024-07-30 12:46:07 UTC
Grrr, the previous one would have likely another issue.  Yet more revised patches below (the one replacing comment 22 and another for replacing comment 18).
Comment 24 Takashi Iwai 2024-07-30 12:46:41 UTC
Created attachment 306632 [details]
Test fix patch 1
Comment 25 Takashi Iwai 2024-07-30 12:47:12 UTC
Created attachment 306633 [details]
Test fix patch 2
Comment 26 orbea 2024-07-30 15:44:27 UTC
First I rebuilt the 6.10.2 kernel with only the Kconfig patch in comment 10, this does fix the sound.

Second I tried only the patches from both comments 24 & 25, this results with no changes in the original issue.
Comment 27 Takashi Iwai 2024-07-30 16:22:12 UTC
Could you double-check?  Those results contradict.  The patch in comment 25 essentially uses only dma_alloc_wc(), so it should have the same effect as the Kconfig change in comment 20.

(I suppose you're referring to comment 20 instead of comment 10, right?)
Comment 28 Takashi Iwai 2024-07-30 17:40:00 UTC
... and if it's confirmed that applying both patches in comment 24 and 25 on top of the vanilla kernel doesn't work -- then try only the first patch in comment 24, then apply kconfig change and recheck.  If this *breaks* the thing (while only applying kconfig still works), then it implies that it's the explicit WC page setups that matters.
Comment 29 Takashi Iwai 2024-07-30 17:47:43 UTC
(In reply to Takashi Iwai from comment #28)
> ... and if it's confirmed that applying both patches in comment 24 and 25 on
> top of the vanilla kernel doesn't work -- then try only the first patch in
> comment 24, then apply kconfig change and recheck.  If this *breaks* the
> thing (while only applying kconfig still works), then it implies that it's
> the explicit WC page setups that matters.

Scratch this comment.  Instead -- if applying both patches comment 24 and comment 25 doesn't work, try to apply only the first patch and change like the following instead of the Kconfig change:

--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -709,7 +709,7 @@ int snd_hda_attach_pcm_stream(struct hda_bus *_bus, struct hda_codec *codec,
 	int pcm_dev = cpcm->device;
 	unsigned int size;
 	int s, err;
-	int type = SNDRV_DMA_TYPE_DEV_SG;
+	int type = SNDRV_DMA_TYPE_DEV;
 
 	list_for_each_entry(apcm, &chip->pcm_list, list) {
 		if (apcm->pcm->device == pcm_dev) {
@@ -749,7 +749,7 @@ int snd_hda_attach_pcm_stream(struct hda_bus *_bus, struct hda_codec *codec,
 	if (size > MAX_PREALLOC_SIZE)
 		size = MAX_PREALLOC_SIZE;
 	if (chip->uc_buffer)
-		type = SNDRV_DMA_TYPE_DEV_WC_SG;
+		type = SNDRV_DMA_TYPE_DEV_WC;
 	snd_pcm_set_managed_buffer_all(pcm, type, chip->card->dev,
 				       size, MAX_PREALLOC_SIZE);
 	return 0;

If this still shows the original problem, it's the WC page things.
Comment 30 orbea 2024-07-30 18:25:20 UTC
Yes, I meant comment 20.

I again rebuilt 6.10.2 with the only the patch from comment 20, this still works.

Next I built it with the patches from comment 24 and comment 25, this still does not work.

And then I built it with the patch from comment 24 and the change from comment 29, this also does not work.
Comment 31 Takashi Iwai 2024-07-30 18:55:58 UTC
OK, thanks.  Then what happens if the patch in comment 24 and comment 29, and apply the following in addition?

--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -525,13 +525,13 @@ static void *snd_dma_wc_alloc(struct snd_dma_buffer *dmab, size_t size)
        p = dma_alloc_wc(dmab->dev.dev, size, &dmab->addr, DEFAULT_GFP);
        if (!p)
                return p;
-       mark_wc_pages(p, size);
+       //mark_wc_pages(p, size);
        return p;
 }
 
 static void snd_dma_wc_free(struct snd_dma_buffer *dmab)
 {
-       unmark_wc_pages(dmab->area, dmab->bytes);
+       //unmark_wc_pages(dmab->area, dmab->bytes);
        dma_free_wc(dmab->dev.dev, dmab->bytes, dmab->area, dmab->addr);
 }
Comment 32 Takashi Iwai 2024-07-30 18:57:49 UTC
Yet one more question: what if you boot with a clean kernel without patches, but passing snd_hda_intel.snoop=1 boot option?
Comment 33 orbea 2024-07-30 19:25:46 UTC
The change in comment 31 doesn't seem to apply to the 6.10.2 kernel?

I rebuilt the kernel without patches and created a file in /ect/modprobe.d/ with:

options snd_hda_intel snoop=1

This does work, I confirmed by commenting the line and unloading and reloading the module which reintroduced the problem.

# rmmod snd_hda_intel && modprobe snd_hda_intel

And then uncommenting the line fixed it again.
Comment 34 Takashi Iwai 2024-07-30 20:07:10 UTC
(In reply to orbea from comment #33)
> The change in comment 31 doesn't seem to apply to the 6.10.2 kernel?

It's applied on top of other two patches.  But this test can be skipped, if the next test works.
 
> I rebuilt the kernel without patches and created a file in /ect/modprobe.d/
> with:
> 
> options snd_hda_intel snoop=1
> 
> This does work, I confirmed by commenting the line and unloading and
> reloading the module which reintroduced the problem.
> 
> # rmmod snd_hda_intel && modprobe snd_hda_intel
> 
> And then uncommenting the line fixed it again.

Thanks for checking.  Then we may need another approach.

You can try the patch below instead, apply to a clean kernel.
Hopefully this makes it working without snoop=1 option.
Comment 35 Takashi Iwai 2024-07-30 20:08:01 UTC
Created attachment 306642 [details]
Yet another test fix
Comment 36 orbea 2024-07-30 21:03:02 UTC
I commented the "snoop=1" line and rebuilt the kernel with the patch in comment 35, but it didn't make a difference for the original issue.

> It's applied on top of other two patches.  But this test can be skipped, if
> the next test works.

I then rebuilt the kernel with the patches from comment 24, comment 29 and comment 31, but it also did not work.
Comment 37 Takashi Iwai 2024-07-31 06:54:19 UTC
(In reply to orbea from comment #36)
> I commented the "snoop=1" line and rebuilt the kernel with the patch in
> comment 35, but it didn't make a difference for the original issue.
> 
> > It's applied on top of other two patches.  But this test can be skipped, if
> > the next test works.
> 
> I then rebuilt the kernel with the patches from comment 24, comment 29 and
> comment 31, but it also did not work.

Interesting.  Then it's rather about the mmap?  For the vanilla kernel, try the following oneliner:

--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -699,7 +699,7 @@ static void snd_dma_sg_wc_free(struct snd_dma_buffer *dmab)
 static int snd_dma_sg_wc_mmap(struct snd_dma_buffer *dmab,
 			      struct vm_area_struct *area)
 {
-	area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
+	// area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
 	return dma_mmap_noncontiguous(dmab->dev.dev, area,
 				      dmab->bytes, dmab->private_data);
 }
Comment 38 orbea 2024-07-31 16:09:27 UTC
I rebuilt the 6.10.2 kernel with that change and it does fix the sound. I confirmed the snoop line is commented before and after rebooting.
Comment 39 Takashi Iwai 2024-07-31 16:14:10 UTC
OK, thanks for confirmation!

I believe this extra pgprot setup is still needed in the case of direct DMA, so the above can't be applied as is.

Since the snoop mode works for AMD with IOMMU now, we can avoid that workaround conditionally, instead.  Below is the fix patch I planned to submit.
Could you give it a quick shot?

More cleanup / fix of the core ALSA memalloc code will follow after that, but let's fix the regression at first.
Comment 40 Takashi Iwai 2024-07-31 16:14:55 UTC
Created attachment 306646 [details]
Fix patch for AMD HD-audio
Comment 41 orbea 2024-07-31 16:51:20 UTC
I rebuilt the 6.10.2 kernel with that patch while disabling the boot option and it does work!

Thank you very much for all your time and energy spent debugging and fixing this. Is it possible a change like this could be backported to a LTS kernel?

If there is anything else I can test, please let me know.
Comment 42 Takashi Iwai 2024-07-31 17:02:51 UTC
Thanks for your quick and patient testing!  I'm going to submit the fix patch now, likely included in this week's PR to Linus, so hopefully included in 6.11-rc2.  Then it'll be backported to stable trees.

For more favor: could you try the two patches below together with the previous fix, and verify that it doesn't break things again?  Those are rather cleanups and a material for 6.12.
Comment 43 Takashi Iwai 2024-07-31 17:03:52 UTC
Created attachment 306647 [details]
Cleanup patch 1
Comment 44 Takashi Iwai 2024-07-31 17:04:12 UTC
Created attachment 306648 [details]
Cleanup patch 2
Comment 45 orbea 2024-07-31 18:11:59 UTC
i built the 6.10.2 kernel with patches from comment 40, comment 43 and comment 44 and the sound still works. Also no errors or warnings in the dmesg output.
Comment 46 Takashi Iwai 2024-08-01 06:28:29 UTC
Awesome.  Now the patch landed in sound.git tree
  https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=478689b5990deb626a0b3f1ebf165979914d6be4

Let's close the bug.