Created attachment 301151
Boot messages with loglevel=7 showing the crash on the Broadwell machine

Since kernel 5.18, two of my machines panic reproducibly as soon as snd_hda_intel is loaded, unless I set snd_hda_intel.snoop=1.

The Haswell machine:
- CPU: i5-4570 (a model with iGPU)
- Motherboard: Intel DH87RL (with the latest BIOS 0332)
- Dedicated GPU: Sapphire Radeon RX550

The Broadwell machine:
- CPU: Xeon E3-1270 v5 (a model without iGPU)
- Motherboard: Dell Precision T3620 (with the latest BIOS 2.21.0)
- Dedicated GPU: Dell FirePro W7100

I suspect that the AMD GPU has something to do with the panics, because I also have an otherwise identical Broadwell machine with an NVidia GPU on which this crash does not occur.

I do use the "linux-hardened" patchset that is currently maintained by Levente Polyak, but this bug is not related to the patchset: I see exactly the same panics with an unmodified vanilla kernel compiled from the same kernel config as my hardened kernel.

I've been able to bisect this down to the range between commit 37fcacb50be7071d146144a6c5c5bf0194b9a1cf (good/no crashes) and commit 9ae2a143081fa8fba5042431007b33d9a855b7a2 (bad/crashes), but any further bisecting step results in compiler errors because the remaining range of commits is one contiguous patchset/merge:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.18.3&id=9ae2a143081fa8fba5042431007b33d9a855b7a2

I have attached a full loglevel=7 boot log (with only a few hardware serial numbers redacted) that shows the kernel panic on the Broadwell machine. The messages are essentially the same on the Haswell machine, but that machine sadly doesn't have a serial port, and using a USB-to-serial adapter for console=ttyUSB0 turned out to be useless because the output stops 2-3 lines after the "cut here" line when the crash occurs.
Created attachment 301152
kernel config with which the boot log was created
Created attachment 301153
reduced kernel config to accelerate the kernel build process when bisecting
Could you try flipping it as below?

--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -471,7 +471,7 @@ static const struct snd_malloc_ops snd_dma_dev_ops = {
 /*
  * Write-combined pages
  */
-#ifdef CONFIG_X86
+#if 0 // def CONFIG_X86
 /* On x86, share the same ops as the standard dev ops */
 #define snd_dma_wc_ops snd_dma_dev_ops
 #else /* CONFIG_X86 */

If it doesn't crash, it would be interesting to know whether HDMI audio output still works without glitches with this change.
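The effect of the one-liner can be modeled in plain user-space C. This is a sketch under stated assumptions, not the actual memalloc.c code: `struct model_dma_ops`, `model_pick_ops()`, `model_ops_is()` and the `MODEL_X86` macro are hypothetical stand-ins for `struct snd_malloc_ops`, the buffer-type dispatch and `CONFIG_X86`. With the alias active, a write-combined request resolves to the very same table as the standard device ops; with it disabled (the `#if 0` variant), WC requests go through a separate table.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct snd_malloc_ops. */
struct model_dma_ops {
	const char *name;
};

/* Stand-in for snd_dma_dev_ops (standard device DMA pages). */
static const struct model_dma_ops model_dev_ops = { "dev" };

#ifdef MODEL_X86
/* Mirrors "#define snd_dma_wc_ops snd_dma_dev_ops": WC requests reuse the dev ops. */
#define model_wc_ops model_dev_ops
#else
/* Mirrors the "#if 0" variant: WC pages get a separate ops table. */
static const struct model_dma_ops model_wc_ops = { "wc" };
#endif

/* Pick the ops table for a buffer; want_wc is a boolean flag (0 or 1). */
static const struct model_dma_ops *model_pick_ops(int want_wc)
{
	assert(want_wc == 0 || want_wc == 1);
	return want_wc ? &model_wc_ops : &model_dev_ops;
}

/* Returns nonzero if the ops table carries the given name. */
static int model_ops_is(const struct model_dma_ops *ops, const char *name)
{
	return strcmp(ops->name, name) == 0;
}
```

Built with `-DMODEL_X86`, `model_pick_ops(0)` and `model_pick_ops(1)` return the same pointer, which is the shared-ops behavior that the `#if 0` test change turns off.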
Thanks a lot! A very quick preliminary test shows that your patch does indeed prevent the crash. I'll give you more feedback on HDMI audio, probably some time this evening when I have more time. :)
Sorry, testing took a bit longer than expected. The results:

- I can't test this on the Broadwell machine, since that machine has a CPU without an integrated GPU, and the dedicated video card has no HDMI ports, only DisplayPort ports. Also, AFAICT, I don't have any audio-capable displays near that machine. :/
- On the Haswell machine, HDMI audio on the AMD GPU works just fine on a kernel with your patch, and also on a kernel without your patch (the latter booted with snd_hda_intel.snoop=1 to prevent the crash). However, I seem unable to get HDMI audio to work at all via the onboard HDMI port (connected to the GPU integrated into the Haswell CPU), regardless of whether a kernel with or without your patch is used. Both pavucontrol and "aplay -L" simply don't show any HDMI audio output for the Intel GPU (whilst they both *do* show one for the AMD GPU). I have also tried booting with intel_iommu=off, but that didn't help with getting HDMI audio to work either.
- Apart from the two machines mentioned above, I also have a ThinkPad T440p with a Haswell CPU but without any dedicated GPU. On that machine, when I connect an audio-capable HDMI display via a mini-DP to HDMI adapter, HDMI audio works fine on a kernel without your patch, but it becomes extremely choppy on a kernel with your patch (probably best described as "60-80% silence with an occasional audio chunk being played for a fraction of a second", with those chunks occurring without a discernible timing pattern).

So we're clearly on the right track, but your patch seems to break HDMI audio on Intel GPUs (at least on Haswell).
Forgot to mention: unless stated otherwise, I boot my machines with intel_iommu=on.
OK, thanks for testing. The stuttering sound with the patch means that dma_alloc_wc() still doesn't work as expected on x86, so we need to keep the hack in some other way. Could you try the patch below instead of my previous one-liner and see whether it breaks anything? It'll show the buffer pointer and address, so we can see what values are passed in the working and non-working cases.
Created attachment 301172
Test fix patch
Another, somewhat more invasive change would look like the one below. It's only compile-tested :)
Created attachment 301173
Another test fix
Created attachment 301187
netconsole logs from booting the Haswell machine with various kernels

Hi, sorry this took so long, but I seem to be unable to get HDMI audio to work at all using only aplay on any of my machines (regardless of the kernel, even with 5.15.47; this is probably some form of PEBCAK on my side, since in the meantime I've been able to get HDMI audio to work with GNOME on all three of my machines at least once).

I've attached a bunch of netconsole logs obtained from booting the Haswell machine with various kernels, each once with and once without the "snd_hda_intel.snoop=1" boot parameter. The "Haswell_5.18.4-*-debug_*.txt" files will likely be the most interesting to you:

5.18.4-0-debug is a vanilla 5.18.4 kernel
5.18.4-1-debug was built using your patch from comment 8
5.18.4-2-debug was built using your patch from comment 10

I've also taken the liberty of cranking up some other debugging options in the kernel config, in the vague hope that the additional information might be helpful.

IIRC, all of the logs were obtained with the internal Intel GPU configured as the primary GPU in the BIOS, but *without* removing the AMD GPU from the PCIe slot.
(In reply to Pascal Ernster from comment #11)
> Created attachment 301187
> netconsole logs from booting the Haswell machine with various kernels
>
> Hi, sorry this took so long, but I seem to be unable to get HDMI audio to
> work at all using only aplay on any of my machines (regardless of the
> kernel, even with 5.15.47 - this is probably some form of PEBCAK on my side,
> since in the meantime, I've been able to get HDMI audio to work with GNOME
> on all three of my machines at least once).

Hm, but at least this isn't a regression, right?

> I've attached a bunch of netconsole logs obtained from booting the Haswell
> machine with various kernels, each once with and once without the
> "snd_hda_intel.snoop=1" boot parameter. The "Haswell_5.18.4-*-debug_*.txt"
> will likely be most interesting to you.
>
> 5.18.4-0-debug is a vanilla 5.18.4 kernel
> 5.18.4-1-debug was built using your patch from comment 8
> 5.18.4-2-debug was built using your patch from comment 10

If I understand correctly, only patch 3 worked without crashing (with snoop=off)? If so, this looks like the way to go.

The next step would then be to check whether patch 3 brings any regressions, e.g. check with the ThinkPad and HDMI audio whether it causes the stuttering problem you saw with the first patch.
Since I haven't been able to get things to work with only aplay and without the full-blown GNOME desktop, I've built a "normal" kernel with your patches to be able to test HDMI audio.

With the patch from comment 8, it crashes on boot (just like the debug kernel did).

With the patch from comment 10, it doesn't crash, but HDMI audio via the Intel GPU is still choppy/unusable on the Haswell machine. Also, it appears as if the AMD GPU isn't listed as a sound device anymore at all, though this might also be related to the Intel GPU being configured as the primary GPU in the BIOS. I'll dig further into this and post the results here.

So far, I've only tested the "normal" kernel with this patch on the Haswell machine, not on the Broadwell machine or the T440p.
I'll be damned, now I'm seeing intermittent choppy audio and similar glitches with *all* kernels on the Haswell machine, even with 5.15.48 (the current Arch Linux LTS kernel in their testing repo). It appears to be random, even within the same "boot session": sometimes audio ends up choppy, sometimes it doesn't, regardless of whether I use VLC or the audio test built into the GNOME settings GUI. I'd really like to cut the whole GNOME and PipeWire stack out of the equation to have fewer variables and more reproducibility in my test setup… :/
Basically, the Intel GPU always has snooping enabled, so none of the patches should influence its behavior. So your explanation is a relief on one hand (though on the other hand it makes me wonder how the stuttering happens at all). The remaining question is basically whether HDMI audio works with the AMD GPU. And you have no machine that can play it, right? I guess we need to ask other people to test.
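A minimal model of that reasoning, assuming that the snoop decision is made once per controller and that only non-snooping controllers use write-combined buffers. All names (`struct model_hda_ctrl`, `model_effective_snoop()`, `model_pick_buf_type()`) and the -1/0/1 encoding of the snoop= option are hypothetical; the real snd-hda-intel logic has more device-specific cases. It illustrates why a controller that snoops by default never reaches the WC code the test patches change, while a controller that defaults to non-snoop does, unless snd_hda_intel.snoop=1 forces snooping (which matches the workaround reported above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer kinds, loosely modeled on SNDRV_DMA_TYPE_DEV and SNDRV_DMA_TYPE_DEV_WC. */
enum model_buf_type { MODEL_BUF_DEV, MODEL_BUF_DEV_WC };

/* Hypothetical controller description. */
struct model_hda_ctrl {
	bool default_snoop; /* device default when no snoop= option is given */
};

/* snoop_opt models the snoop= module option: -1 = auto, 0 = off, 1 = on. */
static bool model_effective_snoop(const struct model_hda_ctrl *c, int snoop_opt)
{
	assert(snoop_opt >= -1 && snoop_opt <= 1);
	if (snoop_opt >= 0)
		return snoop_opt == 1; /* an explicit option overrides the default */
	return c->default_snoop;
}

/* Only non-snooping controllers take the write-combined path the patches touch. */
static enum model_buf_type model_pick_buf_type(const struct model_hda_ctrl *c,
					       int snoop_opt)
{
	return model_effective_snoop(c, snoop_opt) ? MODEL_BUF_DEV : MODEL_BUF_DEV_WC;
}
```

For instance, a controller described as `{ .default_snoop = false }` gets `MODEL_BUF_DEV_WC` in auto mode and `MODEL_BUF_DEV` once the option is set to 1.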
After some more testing:
- HDMI audio via the Intel GPU is intermittently choppy with both your patch from comment 3 and the patch from comment 10, but in both cases the choppiness seems to wear off after a while (perhaps 20-60 seconds).
- The patch from comment 3 does not break HDMI audio on the AMD card.
- The patch from comment 10 does seem to break HDMI audio on the AMD card.

So right now, the patch from comment 3 looks like the best option: it prevents the crashes, and it doesn't stop HDMI audio on the AMD card from being detected or working at all. I'm also no longer sure whether the intermittent choppiness is a) related to your patches, and b) a kernel issue at all, rather than some issue with GNOME, PipeWire, PipeWire's PulseAudio implementation or some config issue specific to my machines.

My best hope to narrow this down further is to somehow get aplay to work on my stripped-down testing/debugging system/partition, preferably with raw ALSA and without the whole PipeWire and GNOME stack.
Additional clarifications: all the tests from comments 11, 13, 14 and 16 were only run on the Haswell machine, not yet on the Broadwell machine or the T440p. I hope to get around to testing my "non-debugging", "normal" 5.18.5 kernel with your patch from comment 3 on the Broadwell machine and the T440p later today.

Also, as stated in comment 11, and just to be sure this is clear: whilst I did originally have some issues getting HDMI audio to work on some machines/GPUs, I have been able to use HDMI audio with at least *some* kernel with the AMD GPU on both the Haswell and the Broadwell machine (the T440p does not have a dedicated GPU), and I have also been able to use it with the Intel GPU on both the Haswell machine and the T440p (the Broadwell machine has a CPU without an integrated GPU). The AMD card in the Broadwell machine only has DisplayPort ports, but those seem to support HDMI signalling with audio if a DP-to-HDMI cable/adapter is used. So I am now absolutely certain that HDMI audio is possible in principle with all GPUs in all of my machines, either via an HDMI port or a DP/DP++ port.
OK, thanks. I asked for testing of the patch equivalent to the one in comment 3 on the alsa-devel ML. Let's see whether we get any feedback.
https://lore.kernel.org/r/87bkur1nil.wl-tiwai@suse.de
Okay, I've tested the same 5.18.5 kernel with the patch from comment 3 on the Haswell machine, the T440p, the Broadwell machine with the AMD card, and the Broadwell machine's twin with the NVidia card and the nouveau driver.

HDMI audio stutters intermittently on:
- the Haswell machine when using the Intel GPU
- the T440p (only has an Intel GPU)

HDMI audio works fine on:
- the Haswell machine when using the AMD GPU
- the Broadwell machine (AMD GPU)
- the Broadwell machine's twin (only has an NVidia GPU, using the nouveau driver)
OK, if the AMD GPU is working on your machine with my patch, it must be fine. I'll submit and merge the fix.

Regarding the stuttering on Intel HDMI: you could try passing the snd_hda_intel.snoop=0 option and check whether it helps. If not, it may rather be about the buffer position reporting, and the position_fix option might have an influence.
Strangely, over the weekend, the stuttering issues have completely disappeared. I'm not sure yet whether this is related to a bunch of package updates (among other things, various alsa and gstreamer packages) or something else. I've already tried downgrading the alsa and gstreamer packages and rebooting, but so far I'm unable to reproduce the stuttering, and thus also unable to figure out what caused it in the first place… :/
The fix for the crash has been merged upstream, so let's close this.
I have bisected down from 5.18.8 to this commit because my Oxygen Express card (01:00.0 Audio device: C-Media Electronics Inc CM8888 [Oxygen Express]) is no longer detected.

WITH this bug fix I have:

[ 7.105344] snd_hda_intel 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.108201] snd_hda_intel 0000:01:00.0: Disabling MSI
[ 7.111097] snd_hda_intel 0000:01:00.0: Force to non-snoop mode
[ 7.121781] snd_hda_intel 0000:01:00.0: no codecs initialized

and nothing more.

Running the kernel at the commit prior to this one, I have:

[ 7.194960] snd_hda_intel 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.207262] snd_hda_intel 0000:01:00.0: Disabling MSI
[ 7.211772] snd_hda_intel 0000:01:00.0: Force to non-snoop mode

and the C-Media codec is initialized:

[ 7.296422] input: HDA C-Media Mic as etc.

So, at least for me, this bug fix causes a regression.
Yet another fix has been submitted and merged into the sound git tree today:

a8d302a0b77057568350fe0123e639d02dba0745
ALSA: memalloc: Revive x86-specific WC page allocations again

Give it a try.
Kind thanks for the hint. I applied the patch to 5.19.3, and my PCIe card is detected again.
(In reply to Takashi Iwai from comment #24)

Sorry for crashing the party, but commit a8d302a0b77057568350fe0123e639d02dba0745 seems to break HDMI audio on the Radeon RX550 in my Haswell machine (the Radeon's HDMI out isn't recognized as a sound device anymore at all).

I write "seems to" because I've only tested this with a modified 5.19.3 kernel without the commit and with a modified 5.19.4 kernel with the commit, so in theory the culprit *might* be one of the other changes (though this seems unlikely to me). I'll attach the output of alsa-info.sh for both kernels.
Created attachment 301667
Output of alsa-info.sh on a modified kernel 5.19.3 without commit a8d302a0b77057568350fe0123e639d02dba0745
Created attachment 301668
Output of alsa-info.sh on a modified kernel 5.19.4 with commit a8d302a0b77057568350fe0123e639d02dba0745
Commit a8d302a0b770 switches the page allocation back to the legacy method using the raw alloc_pages_exact(), so I don't think it would break things. Please double-check.
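For reference, alloc_pages_exact() allocates a power-of-two block of pages and then gives the unused tail pages back, so the caller keeps exactly the page-rounded size. The sketch below models only that page arithmetic in user space; it assumes 4 KiB pages as on x86, and the `model_*` helpers are hypothetical stand-ins for the kernel's get_order() and the tail-freeing step, not the real implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86 */
#define MODEL_PAGE_SIZE ((size_t)1 << MODEL_PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) >= size, like get_order(). */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Pages the caller keeps: size rounded up to whole pages. */
static size_t model_pages_kept(size_t size)
{
	return (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Tail pages returned to the allocator after the power-of-two allocation. */
static size_t model_pages_freed(size_t size)
{
	return ((size_t)1 << model_get_order(size)) - model_pages_kept(size);
}
```

For a 20 KiB buffer, the model gives order 3 (8 pages), keeps 5 pages and frees the remaining 3.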
(In reply to Pascal Ernster from comment #26)
> (In reply to Takashi Iwai from comment #24)
>
> Sorry for crashing the party, but commit
> a8d302a0b77057568350fe0123e639d02dba0745 seems to break HDMI audio on the
> Radeon RX550 in my Haswell machine (the Radeon's HDMI out isn't recognized
> anymore as sound device at all).
>
> I write "seems to" because I've only tested this with a modified 5.19.3
> kernel without the commit and with a modified 5.19.4 kernel with the commit,
> so in theory the culprit *might* be one of the other changes (though this
> seems unlikely to me). I'll attach the output of alsa-info.sh for both
> kernels.

For me, it was the other way round: on my "Ryzen 5 3400G with Radeon Vega Graphics", there was no HDMI output anymore starting with roughly 5.19. I have now installed 5.19.3 _WITH_ this patch and it works; however, I do not know whether the change from 5.19.2 to 5.19.3 made my HDMI sound appear again or whether it was a side effect of the bug.

In principle, I had bisected the problem down to commit 512881eacfa72c2136b27b9934b7b27504a9efc2 (bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management) on linux-stable, but then did not investigate further when HDMI sound reappeared on v5.19.3 with this patch included.
The commit "bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management" is known to break AMD HDMI audio. Please refer to the upstream discussion at https://lore.kernel.org/r/874jy4cqok.wl-tiwai@suse.de
I've rebuilt my modified kernel 5.19.4 with only commit a8d302a0b77057568350fe0123e639d02dba0745 removed, and HDMI audio is working flawlessly again on my Radeon RX550.

In case it matters, here's an excerpt from my /proc/cmdline (with only the parameters pertaining to my rootfs removed):

i915.fastboot=1 rw add_efi_memmap mitigations=auto,nosmt l1tf=full kvm-intel.vmentry_l1d_flush=always spectre_v2=on spec_store_bypass_disable=on tsx=off intel_iommu=on iomem=strict iommu.forcedac=1 iommu.strict=1 iommu.passthrough=0 lockdown=confidentiality efi=disable_early_pci_dma

I'll also attach the output of alsa-info.sh and the kernel config.
Created attachment 301680
Output of alsa-info.sh on a modified kernel 5.19.4 without commit a8d302a0b77057568350fe0123e639d02dba0745
Created attachment 301681
decompressed /proc/config.gz from my modified kernel 5.19.4

Note that this config is identical for both 5.19.4 kernels, with and without commit a8d302a0b77057568350fe0123e639d02dba0745.
Hmm, then the commit needs yet another fix.

Pascal, could you check whether either of the two patches I posted below fixes your problem (keeping commit a8d302a0b77 and applying one of the two on top)?
https://lore.kernel.org/r/87ilm3vbzq.wl-tiwai@suse.de
https://lore.kernel.org/r/875yi3froa.wl-tiwai@suse.de
(In reply to Takashi Iwai from comment #35)
> Hmm, then the commit needs yet more fix.
>
> Pascal, could you check whether one of two patches I posted below fixes your
> problem (while keeping the commit a8d302a0b77 and applying one of two on the
> top)?
> https://lore.kernel.org/r/87ilm3vbzq.wl-tiwai@suse.de
> https://lore.kernel.org/r/875yi3froa.wl-tiwai@suse.de

Both of the patches appear to fix the issue when applied on top of commit a8d302a0b77, though I'm currently only able to test them on the Haswell machine since the Broadwell machine is at another physical location. So AFAIAC, feel free to select the patch that you like best. :)

And of course, thanks a lot for the effort you're putting into this! :)
That's good news. Could you try the third one? It's a simplified version of the second one, and that's what I'm going to submit and merge.
https://lore.kernel.org/r/874jxml7a4.wl-tiwai@suse.de
(In reply to Takashi Iwai from comment #37)
> It's a good news. Could you try the third one? It's a simplified version
> of the second, and that's what I'm going to submit and merge.
> https://lore.kernel.org/r/874jxml7a4.wl-tiwai@suse.de

With that patch, I reproducibly get a build error 2-3 minutes after starting the build process:

2022-09-05_10:30:40 GEN /build/linux-hardened/src/linux-5.19.4/tools/objtool/arch/x86/lib/inat-tables.c
2022-09-05_10:30:40 free(): double free detected in tcache 2
2022-09-05_10:30:40 make[4]: *** [arch/x86/Build:9: /build/linux-hardened/src/linux-5.19.4/tools/objtool/arch/x86/lib/inat-tables.c] Error 134
2022-09-05_10:30:40 make[4]: *** Deleting file '/build/linux-hardened/src/linux-5.19.4/tools/objtool/arch/x86/lib/inat-tables.c'
2022-09-05_10:30:40 make[3]: *** [/build/linux-hardened/src/linux-5.19.4/tools/build/Makefile.build:139: arch/x86] Error 2
2022-09-05_10:30:40 make[2]: *** [Makefile:54: /build/linux-hardened/src/linux-5.19.4/tools/objtool/objtool-in.o] Error 2
2022-09-05_10:30:40 make[1]: *** [Makefile:73: objtool] Error 2
2022-09-05_10:30:40 make: *** [Makefile:1348: tools/objtool] Error 2
2022-09-05_10:30:40 ==> ERROR: A failure occurred in build().
2022-09-05_10:30:40 Aborting...
Hm, what about the patch below instead?
Created attachment 301749
revised fix patch
(In reply to Takashi Iwai from comment #40)
> Created attachment 301749
> revised fix patch

Nope :(

2022-09-05_14:15:00 GEN /build/linux-hardened/src/linux-5.19.4/tools/objtool/arch/x86/lib/inat-tables.c
2022-09-05_14:15:00 free(): double free detected in tcache 2
2022-09-05_14:15:00 make[4]: *** [arch/x86/Build:9: /build/linux-hardened/src/linux-5.19.4/tools/objtool/arch/x86/lib/inat-tables.c] Error 134
2022-09-05_14:15:00 make[4]: *** Deleting file '/build/linux-hardened/src/linux-5.19.4/tools/objtool/arch/x86/lib/inat-tables.c'
2022-09-05_14:15:00 make[3]: *** [/build/linux-hardened/src/linux-5.19.4/tools/build/Makefile.build:139: arch/x86] Error 2
2022-09-05_14:15:00 make[2]: *** [Makefile:54: /build/linux-hardened/src/linux-5.19.4/tools/objtool/objtool-in.o] Error 2
2022-09-05_14:15:00 make[1]: *** [Makefile:73: objtool] Error 2
2022-09-05_14:15:00 make: *** [Makefile:1348: tools/objtool] Error 2
2022-09-05_14:15:00 ==> ERROR: A failure occurred in build().
2022-09-05_14:15:00 Aborting...
Oh wait, this might actually be caused by something unrelated to your patch. I'll get back to you once I've found the cause of the build issues.
It appears that my build issues were caused by a recent update to gawk 5.2.0-1:
https://archlinux.org/packages/testing/x86_64/gawk/
https://github.com/archlinux/svntogit-packages/commit/581a814672fc56ef3756e272aebd875b2cc95a93

I've downgraded to gawk 5.1.1-1 and I'm currently waiting for the build with your patch from comment #37 to finish (this will likely take around 1-2 hours).
Takashi Iwai: Both the patch from comment #37 and the patch from comment #40 work for me.