Bug 218961

Summary: Lenovo ThinkPad X1 Carbon Gen 10 first S2idle fails, S0ix fails in all further suspends
Product: Drivers
Reporter: Todd Brandt (todd.e.brandt)
Component: Sound(ALSA)
Assignee: Jaroslav Kysela (perex)
Status: RESOLVED CODE_FIX
Severity: normal
CC: pierre-louis.bossart
Priority: P3
Hardware: All
OS: Linux
Kernel Version: 6.10.0-rc1
Subsystem:
Regression: Yes
Bisected commit-id: d5263dbbd8af026159b16a08a94bedfe51b5f67b
Bug Depends on:
Bug Blocks: 178231
Attachments: otcpl-thinkpad-x1_freeze.html
otcpl-thinkpad-x1_freeze_6.10.0-rc1.html
issue.def
otcpl-lenovo-tix1-tgl_freeze.html
otcpl-hp-spectre-tgl_freeze.html
sof-fix.patch

Description Todd Brandt 2024-06-14 03:57:18 UTC
Created attachment 306459 [details]
otcpl-thinkpad-x1_freeze.html

We have a Lenovo ThinkPad X1 Carbon Gen 10 in our lab, and ever since 6.10.0-rc1 it has failed its first S2idle suspend and has stopped reaching S0ix on the subsequent, otherwise successful, S2idle suspends. The first suspend failure originates in the audio driver (relevant dmesg section shown):

sof-audio-pci-intel-tgl 0000:00:1f.3: Code loader DMA did not complete
sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
sof-audio-pci-intel-tgl 0000:00:1f.3: Firmware download failed
sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_READY_OK (6)
sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
sof-audio-pci-intel-tgl 0000:00:1f.3: extended rom status:  0x5 0x0 0x4000 0x0 0x0 0x0 0x2560521 0x0
sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
sof-audio-pci-intel-tgl 0000:00:1f.3: Failed to start DSP
sof-audio-pci-intel-tgl 0000:00:1f.3: error: failed to boot DSP firmware after resume -110
sof-audio-pci-intel-tgl 0000:00:1f.3: error: hda_dsp_core_reset_enter: timeout on HDA_DSP_REG_ADSPCS read
sof-audio-pci-intel-tgl 0000:00:1f.3: error: dsp core reset failed: core_mask 1
sof-audio-pci-intel-tgl 0000:00:1f.3: failed to power down DSP during suspend
sof-audio-pci-intel-tgl 0000:00:1f.3: error: suspending dsp
sof-audio-pci-intel-tgl 0000:00:1f.3: error: failed to power down DSP during suspend -110
sof-audio-pci-intel-tgl 0000:00:1f.3: PM: pci_pm_suspend(): snd_sof_suspend [snd_sof] returns -110
sof-audio-pci-intel-tgl 0000:00:1f.3: PM: dpm_run_callback(): pci_pm_suspend returns -110
sof-audio-pci-intel-tgl 0000:00:1f.3: PM: failed to suspend async: error -110
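
The -110 in the PM lines above is a negated errno; on Linux, 110 is ETIMEDOUT, which matches the "timeout on HDA_DSP_REG_ADSPCS read" message. As a minimal sketch, assuming the "PM: failed to suspend" line format shown in this log (the function name suspend_failures is hypothetical), the failing device and error code could be pulled out of a saved dmesg log like this:

```shell
#!/bin/sh
# Print "<device> <errno>" for each device whose suspend callback failed.
# Assumes lines of the form:
#   <driver> <device>: PM: failed to suspend <mode>: error -<N>
suspend_failures() {
    sed -n 's/^.* \([^ ]*\): PM: failed to suspend.*: error -\([0-9][0-9]*\)$/\1 \2/p'
}
```

It reads log text on standard input, for example `dmesg | suspend_failures`.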

This is the commit:

commit d5263dbbd8af026159b16a08a94bedfe51b5f67b
Author: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Date:   Thu Apr 4 13:54:47 2024 -0500

    ASoC: SOF: Intel: don't ignore IOC interrupts for non-audio transfers

Reverting the commit as follows fixes the problem on every kernel up to 6.10.0-rc3:

%> git diff 6cbf086143cf9674c7f029e1cf435c65a537066a d5263dbbd8af026159b16a08a94bedfe51b5f67b > ../revert.patch
%> patch -p1 -R < ../revert.patch

I've attached the sleepgraph timeline of the failure. The dmesg log is accessible by clicking the "dmesg" button in the upper right-hand corner. The "log" button shows all the system details.
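
The loss of S0ix can also be checked without sleepgraph. The following is a minimal sketch, assuming an Intel platform where the pmc_core driver exposes its S0ix residency counter in debugfs at the path below; the variable and function names are hypothetical and not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Path of the PMC S0ix residency counter (microseconds). Needs debugfs
# mounted and root access; override S0IX_COUNTER if your system differs.
S0IX_COUNTER=${S0IX_COUNTER:-/sys/kernel/debug/pmc_core/slp_s0_residency_usec}

# Print the current counter value.
read_s0ix() {
    cat "$S0IX_COUNTER"
}

# Succeed if the counter advanced between two readings ($1 before, $2 after).
s0ix_reached() {
    [ "$2" -gt "$1" ]
}
```

Typical use, as root: take one reading with read_s0ix, run a single S2idle cycle (for example `rtcwake -m freeze -s 30`), take a second reading, and pass both to s0ix_reached. A counter that does not move across a successful suspend is the symptom described above.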
Comment 1 Todd Brandt 2024-06-14 04:06:25 UTC
Created attachment 306460 [details]
otcpl-thinkpad-x1_freeze_6.10.0-rc1.html

sleepgraph timeline on 6.10.0-rc1 with dev mode data.
Comment 2 Pierre Bossart 2024-06-19 06:20:52 UTC
moved to https://github.com/thesofproject/linux/issues/5072

We don't track bugzilla...
Comment 3 Todd Brandt 2024-06-28 11:05:28 UTC
Created attachment 306507 [details]
issue.def
Comment 4 Todd Brandt 2024-06-28 13:13:18 UTC
This issue is also affecting the Lenovo ThinkPad X1 Titanium Gen 1. I've attached a failing timeline. The issue is the very same commit.
Comment 5 Todd Brandt 2024-06-28 13:15:02 UTC
Created attachment 306508 [details]
otcpl-lenovo-tix1-tgl_freeze.html
Comment 6 Todd Brandt 2024-06-28 17:32:56 UTC
This issue is also affecting the Hewlett Packard Spectre x360 Convertible 14-ea0xxx. So 3 machines out of 50 in our lab are affected in the same manner by this commit. I've attached a timeline of the Spectre failure.
Comment 7 Todd Brandt 2024-06-28 17:33:39 UTC
Created attachment 306511 [details]
otcpl-hp-spectre-tgl_freeze.html
Comment 8 Todd Brandt 2024-07-03 21:08:46 UTC
It turns out the issue is that this particular commit exposed a bug in the old 2.0.0 intel-sof firmware. By updating to firmware v2.2.6 or higher this issue seems to be fixed. Here is a simple script to upgrade the firmware over your existing firmware-sof-signed package:

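#!/bin/sh
# Replace the distro-installed SOF firmware with a newer sof-bin release.

cd /tmp
# Move the existing firmware and topology files aside as a backup.
sudo mv /lib/firmware/intel/sof /tmp/
sudo mv /lib/firmware/intel/sof-tplg /tmp/
# Fetch the upstream firmware binaries and install the chosen release.
git clone https://github.com/thesofproject/sof-bin
cd sof-bin
sudo ./install.sh v2.2.x/v2.2-rc1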
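
Before replacing anything, it may help to check which firmware version is actually loaded. This is a minimal sketch, assuming the SOF driver logs a line of the form "Firmware info: version 2:0:0-<hash>" to dmesg; that format and both function names are assumptions, not something taken from this report. It compares the reported version against the 2.2.6 threshold mentioned above.

```shell
#!/bin/sh
# Extract "major.minor.micro" from the last dmesg line that looks like
# "... Firmware info: version 2:0:0-57864" (format assumed, see above).
sof_fw_version() {
    sed -n 's/.*Firmware info: version \([0-9]*\):\([0-9]*\):\([0-9]*\).*/\1.\2.\3/p' | tail -n 1
}

# Succeed if dotted version $1 is greater than or equal to dotted version $2.
# Components are compared numerically, so 2.10.0 sorts after 2.2.6.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}
```

For example, `v=$(dmesg | sof_fw_version)` followed by `version_ge "$v" 2.2.6 || echo "SOF firmware $v is older than 2.2.6"` would flag a machine that still needs the update.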
Comment 9 Todd Brandt 2024-07-09 23:51:40 UTC
Created attachment 306552 [details]
sof-fix.patch
Comment 10 Todd Brandt 2024-07-09 23:53:47 UTC
I've just attached a patch which applies to the kernel and fixes this issue for older versions of the firmware. It will likely not make it upstream until after 6.10 but it can be applied to stable versions. The primary link to it is here:

https://github.com/thesofproject/linux/pull/5089/commits/7a8379a0d960ea48ef4ec8e682f0ea46e27e8020
Comment 11 Todd Brandt 2024-07-31 04:04:20 UTC
The fix appears to be available in upstream 6.11.0-rc1. I'm marking this as closed. Thanks for the fix!