Bug 218780

Summary: Regression: kernel version <= 6.8.5 breaks ACP/ACP3x/ACP6x Audio Coprocessor driver on Huawei Matebooks (KLVL-WXXW, HVY-WXX9, KLVL-WXX9)
Product: Drivers Reporter: scarecat
Component: Sound(ALSA)Assignee: Jaroslav Kysela (perex)
Status: RESOLVED CODE_FIX    
Severity: normal CC: antoineok, benjarobin+kernel, bkosinski01, broonie, kernel, tiwai, Vijendar.Mukunda
Priority: P3    
Hardware: AMD   
OS: Linux   
Kernel Version: 6.8.5 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg
inxi -Axxx
dmidecode
Patch to add debug traces
dmesg with applied patch with traces
lspci -vvv result
lspci -vvv (KLVL-WXXW)
Fix for ACP PCI driver failure
Failed dmesg for the patch (KLVL-WXXW)(null pointer dereference)
ACP PCI driver probe fails results in pm runtime underflow
signature.asc

Description scarecat 2024-04-26 13:11:08 UTC
Created attachment 306217 [details]
dmesg

Driver: snd_acp_pci
Likely source files where changes were introduced (I'm not sure of this though):
/sound/soc/sof/amd/acp.c
/sound/soc/amd/acp/acp-pci.c

dmesg reports:
[    7.970090] snd_acp_pci 0000:04:00.5: enabling device (0000 -> 0002)
[    7.970397] snd_acp_pci 0000:04:00.5: Runtime PM usage count underflow!
[    7.970401] snd_acp_pci: probe of 0000:04:00.5 failed with error -22
Comment 1 scarecat 2024-04-26 13:11:33 UTC
Created attachment 306218 [details]
inxi -Axxx
Comment 2 Benjamin Robin 2024-04-28 13:00:12 UTC
Reverting the commit 4af565de9f8c74b9f6035924ce0d40adec211246 fix the problem.

Here additional information that I was able to obtain by adding some trace to the kernel:

pci->revision = 1
flags = FLAG_AMD_LEGACY (8)
DMI_BOARD_VENDOR = "HUAWEI"
DMI_PRODUCT_NAME = "KLVL-WXX9"
DMI_PRODUCT_VERSION = "M1010"
chip->name = "acp_asoc_renoir"
chip->acp_rev = ACP3X_DEV (3)

readl(chip->base + ACP_PIN_CONFIG) = 0 (ACP_CONFIG_0)
pdm_addr = ACP_RENOIR_PDM_ADDR (2)
acpi_find_child_device() -> NULL
Comment 3 Benjamin Robin 2024-04-28 13:01:23 UTC
Created attachment 306227 [details]
dmidecode
Comment 4 Benjamin Robin 2024-04-28 13:03:23 UTC
Created attachment 306228 [details]
Patch to add debug traces
Comment 5 Benjamin Robin 2024-04-28 13:04:05 UTC
Created attachment 306229 [details]
dmesg with applied patch with traces
Comment 6 Takashi Iwai 2024-04-28 18:15:10 UTC
The unbalanced PM runtime sounds nasty, maybe it's triggered by an unexpected error path?
Comment 7 Benjamin Robin 2024-04-28 18:28:07 UTC
I forget to specify that:
 - The sound card is working for attached dmesg with kernel patched to add trace. Indeed,
   I ignored the error code returned by check_acp_pdm()
 - Unbalanced PM runtime error message occurs also with unpatched code. In this case
   the sound card does not work since the probe failed.
Comment 8 Vijendar Mukunda 2024-04-28 19:58:36 UTC
Recent changes added for ACP 7.0 platform. We are going to provide fix for PM underflow.

We want to confirm ACP version being used on Huawei Matebooks.

@Benjamin Robin: Could you please share lspci -vvv output?
Comment 9 Vijendar Mukunda 2024-04-28 20:03:29 UTC
@Benjamin Robin: By checking pci revision, it seems platform is Renoir(ACP 3.1).
It seems this is targeted for I2S endpoint support.

We will provide fix for it.
Comment 10 Benjamin Robin 2024-04-28 20:06:44 UTC
I kind of lie, I do not own the computer, I am trying to help an Arch Linux user [1]

For know I only have that part of lspci (I will ask for the information):

04:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 01)
        Subsystem: Huaqin Technology Co.Ltd Device [1e83:3e33]

[1] https://forums.archlinux.fr/viewtopic.php?p=181319
Comment 11 Vijendar Mukunda 2024-04-28 20:10:33 UTC
(In reply to Vijendar Mukunda from comment #9)
> @Benjamin Robin: By checking pci revision, it seems platform is Renoir(ACP
> 3.1).
> It seems this is targeted for I2S endpoint support.
> 
> We will provide fix for it.

Here root cause is problem with check_acp_pdm() API. It should only return whether PDM endpoint support exists.

ACP PCI driver shouldn't fail when PDM endpoint PIN config is not selected.
This check_acp_pdm() should be modified. It has to be verified for other endpoint configurations like I2S as well.

we will provide a fix tomorrow.
Comment 12 Vijendar Mukunda 2024-04-28 20:11:23 UTC
(In reply to Benjamin Robin from comment #10)
> I kind of lie, I do not own the computer, I am trying to help an Arch Linux
> user [1]
> 
> For know I only have that part of lspci (I will ask for the information):
> 
> 04:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD]
> ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 01)
>         Subsystem: Huaqin Technology Co.Ltd Device [1e83:3e33]
> 
> [1] https://forums.archlinux.fr/viewtopic.php?p=181319

Got it. We will provide fix for validation tomorrow.
Comment 13 antoineok 2024-04-28 20:58:21 UTC
Created attachment 306233 [details]
lspci -vvv result

Here is the result of the lspci -vvv command
Comment 14 scarecat 2024-04-28 20:59:26 UTC
Created attachment 306234 [details]
lspci -vvv (KLVL-WXXW)

Here's the output of lspci -vvv on KLVL-WXXW
Comment 15 antoineok 2024-04-28 21:07:14 UTC
(In reply to antoineok from comment #13)
> Created attachment 306233 [details]
> lspci -vvv result
> 
> Here is the result of the lspci -vvv command

Forgot to mention, it's on KLVL-WXX9 (the previous dmesg logs and dmidecode came from my computer)
Comment 16 Vijendar Mukunda 2024-04-28 21:50:59 UTC
(In reply to antoineok from comment #15)
> (In reply to antoineok from comment #13)
> > Created attachment 306233 [details]
> > lspci -vvv result
> > 
> > Here is the result of the lspci -vvv command
> 
> Forgot to mention, it's on KLVL-WXX9 (the previous dmesg logs and dmidecode
> came from my computer)

could you please double confirm ACP_PIN_CONFIG value?
Because ACP_PIN_CONFIG value CONFIG_0 points to different config.
Comment 17 antoineok 2024-04-28 21:58:19 UTC
(In reply to Vijendar Mukunda from comment #16)
> (In reply to antoineok from comment #15)
> > (In reply to antoineok from comment #13)
> > > Created attachment 306233 [details]
> > > lspci -vvv result
> > > 
> > > Here is the result of the lspci -vvv command
> > 
> > Forgot to mention, it's on KLVL-WXX9 (the previous dmesg logs and dmidecode
> > came from my computer)
> 
> could you please double confirm ACP_PIN_CONFIG value?
> Because ACP_PIN_CONFIG value CONFIG_0 points to different config.

sure, what's the command?
Comment 18 Vijendar Mukunda 2024-04-28 22:03:22 UTC
(In reply to antoineok from comment #17)
> (In reply to Vijendar Mukunda from comment #16)
> > (In reply to antoineok from comment #15)
> > > (In reply to antoineok from comment #13)
> > > > Created attachment 306233 [details]
> > > > lspci -vvv result
> > > > 
> > > > Here is the result of the lspci -vvv command
> > > 
> > > Forgot to mention, it's on KLVL-WXX9 (the previous dmesg logs and
> dmidecode
> > > came from my computer)
> > 
> > could you please double confirm ACP_PIN_CONFIG value?
> > Because ACP_PIN_CONFIG value CONFIG_0 points to different config.
> 
> sure, what's the command?

for ACP3x variants, ACP_PING_CONFIG register value is different i.e 0x1400
where as for other ACP variants, it will be 0x1440.

For quick check modify the register macro value as 0x1400 in 
https://elixir.bootlin.com/linux/v6.9-rc5/source/sound/soc/amd/acp/chip_offset_byte.h#L22
and add a debug print and confirm the value?
Comment 19 antoineok 2024-04-28 22:04:15 UTC
so i have to recompile a kernel?
Comment 20 Vijendar Mukunda 2024-04-28 22:05:34 UTC
(In reply to antoineok from comment #19)
> so i have to recompile a kernel?

yes.
Comment 21 Benjamin Robin 2024-04-28 22:18:18 UTC
@antoineok See the Arch forum post, I build a new kernel package with added trace.
You should have a line in dmesg starting with:
"check_acp_pdm: ACP_PIN_CONFIG at 0x"
Comment 22 Vijendar Mukunda 2024-04-28 22:28:21 UTC
(In reply to Benjamin Robin from comment #21)
> @antoineok See the Arch forum post, I build a new kernel package with added
> trace.
> You should have a line in dmesg starting with:
> "check_acp_pdm: ACP_PIN_CONFIG at 0x"

yes correct. this is for debug purpose only to know the ACP_PIN_CONFIG value.
I am going to provide a patch which will be generic for all platforms.

Existing function implementation is wrong i.e check_acp_pdm().
ACP platform device should be created only when I2S/PDM configuration is selected.
For other scenario, it has to skip platform device creation logic and it shouldn't return failure.

ACP PIN configuration varies from platform to platform.
I am going to replace existing function check_acp_pdm() with check_acp_config() and add platform specific config check functions.
Comment 23 antoineok 2024-04-28 22:54:56 UTC
[    6.222672] check_acp_pdm: ACP_PIN_CONFIG at 0x1400 = 4, acp_rev 3
[    6.222675] check_acp_pdm: acpi_find_child_device 0000000000000000, pdm_addr 2
[    6.222677] check_acp_pdm: ret -19
Comment 24 Vijendar Mukunda 2024-04-28 23:17:24 UTC
Created attachment 306235 [details]
Fix for ACP PCI driver failure

We don't have a Huawei Matebook to verify the fix on reported target platform.
Could you please try attached patch and let us know result?
This patch is going to be split in to further.
Mean while, we will try to validate on our other ACP variants.
Comment 25 antoineok 2024-04-28 23:20:35 UTC
I'm downloading the 6.8.8 kernel and patching it
Comment 26 scarecat 2024-04-29 08:19:15 UTC
Created attachment 306236 [details]
Failed dmesg for the patch (KLVL-WXXW)(null pointer dereference)

I tried compiling the 6.8.8 kernel on KLVL-WXXW, it ended up with a null pointer dereference and no audio device is being detected.

interesingly, both lspci and inxi now hang when i try to run them
Comment 27 Vijendar Mukunda 2024-04-29 08:33:01 UTC
(In reply to scarecat from comment #26)
> Created attachment 306236 [details]
> Failed dmesg for the patch (KLVL-WXXW)(null pointer dereference)
> 
> I tried compiling the 6.8.8 kernel on KLVL-WXXW, it ended up with a null
> pointer dereference and no audio device is being detected.
> 
> interesingly, both lspci and inxi now hang when i try to run them

will try on our platform(different ACP variant) and let you know the result.
Comment 28 scarecat 2024-04-29 09:24:10 UTC
I'll also try patching the mainline kernel to see if it changes anything
Comment 29 Vijendar Mukunda 2024-04-29 10:38:07 UTC
Created attachment 306239 [details]
ACP PCI driver probe fails results in pm runtime underflow

Problem found. Issue is with the below condition check.

+	check_acp_config(pci, chip);
+	if (!chip->is_pdm_dev && !chip->is_i2s_config)
+		goto skip_pdev_creation;

Earlier it was 'if (!chip->is_pdm_dev || !chip->is_i2s_config)'
Comment 30 Vijendar Mukunda 2024-04-29 10:38:48 UTC
(In reply to Vijendar Mukunda from comment #29)
> Created attachment 306239 [details]
> ACP PCI driver probe fails results in pm runtime underflow
> 
> Problem found. Issue is with the below condition check.
> 
> +     check_acp_config(pci, chip);
> +     if (!chip->is_pdm_dev && !chip->is_i2s_config)
> +             goto skip_pdev_creation;
> 
> Earlier it was 'if (!chip->is_pdm_dev || !chip->is_i2s_config)'

Please verify the attached patch.
Comment 31 Mangliev Timur 2024-04-29 10:39:32 UTC
I also have a Matebook that's affected by this regression and am willing to test patches.
Comment 32 Vijendar Mukunda 2024-04-29 10:41:12 UTC
Platform device creation should be skipped only when there is no PDM dev or I2S config is not selected.
Comment 33 scarecat 2024-04-29 11:07:21 UTC
(In reply to Vijendar Mukunda from comment #30)
> (In reply to Vijendar Mukunda from comment #29)
> > Created attachment 306239 [details]
> > ACP PCI driver probe fails results in pm runtime underflow
> > 
> > Problem found. Issue is with the below condition check.
> > 
> > +     check_acp_config(pci, chip);
> > +     if (!chip->is_pdm_dev && !chip->is_i2s_config)
> > +             goto skip_pdev_creation;
> > 
> > Earlier it was 'if (!chip->is_pdm_dev || !chip->is_i2s_config)'
> 
> Please verify the attached patch.

Will try that soon
Comment 34 scarecat 2024-04-29 12:12:21 UTC
(In reply to Vijendar Mukunda from comment #30)
> (In reply to Vijendar Mukunda from comment #29)
> > Created attachment 306239 [details]
> > ACP PCI driver probe fails results in pm runtime underflow
> > 
> > Problem found. Issue is with the below condition check.
> > 
> > +     check_acp_config(pci, chip);
> > +     if (!chip->is_pdm_dev && !chip->is_i2s_config)
> > +             goto skip_pdev_creation;
> > 
> > Earlier it was 'if (!chip->is_pdm_dev || !chip->is_i2s_config)'
> 
> Please verify the attached patch.

Works perfectly on KLVL-WXXW on kernel 6.8.8. Thank you
Comment 35 broonie 2024-04-29 14:22:46 UTC
Created attachment 306242 [details]
signature.asc

On Sun, Apr 28, 2024 at 12:57:23PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218780
> 
> Benjamin Robin (benjarobin+kernel@gmail.com) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |broonie@kernel.org

This has spammed me with more than 40 messages, please don't do this.
Comment 36 antoineok 2024-04-29 16:15:16 UTC
(In reply to Vijendar Mukunda from comment #30)
> (In reply to Vijendar Mukunda from comment #29)
> > Created attachment 306239 [details]
> > ACP PCI driver probe fails results in pm runtime underflow
> > 
> > Problem found. Issue is with the below condition check.
> > 
> > +     check_acp_config(pci, chip);
> > +     if (!chip->is_pdm_dev && !chip->is_i2s_config)
> > +             goto skip_pdev_creation;
> > 
> > Earlier it was 'if (!chip->is_pdm_dev || !chip->is_i2s_config)'
> 
> Please verify the attached patch.

works perfectly on KLVL-WXX9, thanks a lot
Comment 37 scarecat 2024-05-14 12:26:45 UTC
Can this be treated as resolved?
Comment 38 Vijendar Mukunda 2024-05-14 13:24:41 UTC
Yes, fixes has been upstream and merged in to stable kernel release(v6.8.9).
Comment 39 Bart 2024-05-15 17:33:19 UTC
Is this patch still waiting to be added to the main kernel or was it already done?
Product Name: HVY-WXX9
Dummy output is still present in audio settings.
Kernel version: 6.8.9-arch1-2
Comment 40 Vijendar Mukunda 2024-05-16 09:33:00 UTC
(In reply to Bart from comment #39)
> Is this patch still waiting to be added to the main kernel or was it already
> done?
> Product Name: HVY-WXX9
> Dummy output is still present in audio settings.
> Kernel version: 6.8.9-arch1-2

Fix has been merged in to Mark Brown asoc tree. Yet to land in stable kernel version. 

https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git/commit/sound/soc/amd/acp?id=09068d624c490c0e89f33f963c402f1859964467
Comment 41 antoineok 2024-09-13 08:52:40 UTC
it seems it landed in stable kernel.