Bug 207893 - S0ix: Unable to achieve S0ix on Dell XPS 13 9365
Summary: S0ix: Unable to achieve S0ix on Dell XPS 13 9365
Status: NEW
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: Intel Linux
Importance: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-25 23:22 UTC by Daniel Holz
Modified: 2023-01-06 01:58 UTC
CC List: 3 users

See Also:
Kernel Version: 5.13-rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci -vvv (31.15 KB, text/plain)
2020-05-25 23:22 UTC, Daniel Holz

Description Daniel Holz 2020-05-25 23:22:37 UTC
Created attachment 289281
lspci -vvv

The Dell XPS 13 9365 does not support S3 sleep, so only S0ix is available. I went through these guides from Intel:

https://01.org/blogs/qwang59/2018/how-achieve-s0ix-states-linux
https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting

Unfortunately, even though the laptop suspends reliably and consumes very little power while suspended, it never enters the S0ix state.

$ sudo turbostat --show Pkg%pc2,Pkg%pc3,Pkg%pc6,Pkg%pc7,Pkg%pc8,Pkg%pc9,Pk%pc10,SYS%LPI rtcwake -m freeze -s 60

results in

Pkg%pc2	Pkg%pc3	Pkg%pc6	Pkg%pc7	Pkg%pc8	Pkg%pc9	Pk%pc10	SYS%LPI
0.90	0.31	0.17	0.14	2.45	0.00	94.15	0.00
0.90	0.31	0.17	0.14	2.45	0.00	94.15	0.00

$ sudo turbostat --Summary --show GFX%rc6 sleep 10

shows 

GFX%rc6
91.90

/sys/kernel/debug/pmc_core/mphy_core_lanes_power_gating_status
shows that all MPHY lanes are power gated

Here is the output of
$ sudo cat /sys/kernel/debug/pmc_core/pch_ip_power_gating_status | grep On

PCH IP: 0  - PMC                             	State: On
PCH IP: 1  - OPI-DMI                         	State: On
PCH IP: 2  - SPI / eSPI                      	State: On
PCH IP: 3  - XHCI                            	State: On
PCH IP: 17 - ISH                             	State: On
PCH IP: 22 - FUSE                            	State: On

Following the guides, I ended up disabling the SD card reader in the BIOS, which resulted in SBC getting power gated.
According to the guide, SPI / eSPI should be power gated. But the laptop does not have any SPI devices at all, which might be the reason why it is not reported as power gated.
According to the guide the ISH "is expected to be power gated for S0ix. The prerequisite is the latest ISH FW should be loaded." Unfortunately, there doesn't seem to be any guide anywhere on the internet on how to actually do that, and no downloadable firmware seems to be available either. Blacklisting the module intel_ish_ipc didn't change the power gating state.
I have no idea what FUSE means or what state it should report. Intel's guide lists it as "off" but gives no further information.
The other parts seem to behave as expected, I guess.
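
To make the list of IPs that are still on easier to compare between runs, the status file can be filtered down to just the IP names. This is only a rough sketch: the function name pch_ips_on is made up for illustration, and it assumes the line format shown above ("PCH IP: <n> - <name> State: On").

```shell
# Hypothetical helper: reads pch_ip_power_gating_status text on stdin and
# prints the name of every PCH IP whose state is "On", one per line.
# Assumes the "PCH IP: <n> - <name> <whitespace> State: On" layout shown above.
pch_ips_on() {
    sed -n 's/^PCH IP:[[:space:]]*[0-9][0-9]*[[:space:]]*-[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*State:[[:space:]]*On[[:space:]]*$/\1/p'
}
```

It could be used as: sudo cat /sys/kernel/debug/pmc_core/pch_ip_power_gating_status | pch_ips_on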

I followed this page from the Arch Wiki
https://wiki.archlinux.org/index.php/Power_management#PCI_Runtime_Power_Management
on how to activate power management for PCI and USB. PowerTOP shows everything as good. I also forced ASPM on and set the policy to powersupersave. PSR is active.

DMC firmware is loaded

$ sudo cat /sys/kernel/debug/dri/0/i915_dmc_info
fw loaded: yes
path: i915/kbl_dmc_ver1_04.bin
version: 1.4
DC3 -> DC5 count: 28450
DC5 -> DC6 count: 27029
program base: 0x09004040
ssp base: 0x00002fc0
htp: 0x00b40068

$ dmesg | grep DMC
[    1.070930] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)

Some additional information:
$ dmesg | grep LPI
[    0.026996] ACPI: LPIT 0x000000006AFC7000 000094 (v01 INTEL  KBL-ULT  00000000 MSFT 0000005F)
[ 1386.229903] ACPI: \_PR_.PR00: LPI: Device not power manageable
[ 1386.229907] ACPI: \_PR_.PR01: LPI: Device not power manageable
[ 1386.229910] ACPI: \_PR_.PR02: LPI: Device not power manageable
[ 1386.229913] ACPI: \_PR_.PR03: LPI: Device not power manageable
[ 1386.229916] ACPI: \_SB_.PCI0.GFX0: LPI: Device not power manageable
[ 1386.229923] ACPI: \_SB_.PCI0.RP05.PXSX: LPI: Device not power manageable
[ 1386.229926] ACPI: \_SB_.PCI0.RP10.PXSX: LPI: Device not power manageable
Comment 1 Daniel Holz 2021-05-23 21:23:39 UTC
So, it's been almost a year since I reported this issue and the situation has not changed. I'm currently using kernel 5.13-rc2 on Arch Linux, and even though the device suspends successfully most of the time, it never reaches the S0ix state.
Comment 2 wendy.wang 2021-05-25 06:02:09 UTC
Hi Daniel,
I guess your Dell XPS 13 9365 is 7th Generation (Kaby Lake), right?
Have you tried disabling the ISH in the BIOS setup to see whether that helps S0ix?
And what does the pmc_core directory show on your machine?
ls /sys/kernel/debug/pmc_core/

Suggestion: please check the PCI device D3 status during suspend with the commands below:
echo -n "file pci-driver.c +p" > /sys/kernel/debug/dynamic_debug/control
echo N > /sys/module/printk/parameters/console_suspend
echo 1 > /sys/power/pm_debug_message
turbostat -o tc.out rtcwake -m freeze -s 60
After resuming, check the turbostat log (tc.out) and the dmesg log: dmesg | grep "PCI PM"
Comment 3 Daniel Holz 2021-05-25 16:35:17 UTC
Thanks for your help. My XPS is using a Kaby Lake CPU. Unfortunately, it doesn't seem to offer a BIOS option to disable the ISH.

Here is the output of:
ls /sys/kernel/debug/pmc_core/

ltr_ignore  ltr_show  mphy_core_lanes_power_gating_status  package_cstate_show	pch_ip_power_gating_status  pll_status	slp_s0_residency_usec
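
Since slp_s0_residency_usec is available here, one way to double check the SYS%LPI value from turbostat might be to read that counter before and after a suspend cycle and look at the difference. The following is only a sketch under that assumption; the function names slp_s0_delta and measure_slp_s0 and the RESIDENCY_FILE variable are made up for illustration, and measure_slp_s0 has to run as root.

```shell
# Path taken from the directory listing above; overridable for experiments.
RESIDENCY_FILE=${RESIDENCY_FILE:-/sys/kernel/debug/pmc_core/slp_s0_residency_usec}

# Hypothetical helper: prints the difference between two counter readings
# (arguments: <before> <after>), in microseconds.
slp_s0_delta() {
    echo "$(( $2 - $1 ))"
}

# Hypothetical helper: reads the counter, suspends to idle for the given
# number of seconds (default 60), reads the counter again and prints the delta.
measure_slp_s0() {
    before=$(cat "$RESIDENCY_FILE") || return 1
    rtcwake -m freeze -s "${1:-60}" >/dev/null || return 1
    after=$(cat "$RESIDENCY_FILE") || return 1
    slp_s0_delta "$before" "$after"
}
```

A result of 0 after "measure_slp_s0 60" would be consistent with the SYS%LPI value of 0.00 reported by turbostat.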

And here is the output of the following command after suspending:

dmesg | grep "PCI PM"

[  376.637032] nvme 0000:3a:00.0: PCI PM: Suspend power state: D0
[  376.637036] nvme 0000:3a:00.0: PCI PM: Skipped
[  376.638850] i801_smbus 0000:00:1f.4: PCI PM: Suspend power state: D0
[  376.638853] i801_smbus 0000:00:1f.4: PCI PM: Skipped
[  376.641822] pcieport 0000:00:1c.0: PCI PM: Suspend power state: D0
[  376.641823] pcieport 0000:00:1c.4: PCI PM: Suspend power state: D0
[  376.641824] pcieport 0000:00:1c.0: PCI PM: Skipped
[  376.641825] pcieport 0000:00:1c.4: PCI PM: Skipped
[  376.641883] intel_ish_ipc 0000:00:13.0: PCI PM: Suspend power state: D0
[  376.641886] intel_ish_ipc 0000:00:13.0: PCI PM: Skipped
[  376.650632] snd_hda_intel 0000:00:1f.3: PCI PM: Suspend power state: D3hot
[  376.650644] i915 0000:00:02.0: PCI PM: Suspend power state: D3hot
[  376.657236] proc_thermal 0000:00:04.0: PCI PM: Suspend power state: D3hot
[  376.657359] iwlwifi 0000:3b:00.0: PCI PM: Suspend power state: D3hot
[  376.657359] mei_me 0000:00:16.0: PCI PM: Suspend power state: D3hot
[  376.657362] intel_pch_thermal 0000:00:14.2: PCI PM: Suspend power state: D3hot
[  376.658043] intel-lpss 0000:00:15.1: PCI PM: Suspend power state: D3hot
[  376.658102] intel-lpss 0000:00:15.0: PCI PM: Suspend power state: D3hot
[  376.673886] pcieport 0000:00:1d.0: PCI PM: Suspend power state: D3hot
[  376.684023] xhci_hcd 0000:00:14.0: PCI PM: Suspend power state: D3hot

For reference, here is the output of lspci -tvv

-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
           +-02.0  Intel Corporation HD Graphics 615
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
           +-13.0  Intel Corporation Sunrise Point-LP Integrated Sensor Hub
           +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
           +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem
           +-15.0  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0
           +-15.1  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1
           +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1
           +-1c.0-[01-39]--
           +-1c.4-[3a]----00.0  SK hynix PC401 NVMe Solid State Drive 256GB
           +-1d.0-[3b]----00.0  Intel Corporation Wireless 8265 / 8275
           +-1f.0  Intel Corporation Device 9d4b
           +-1f.2  Intel Corporation Sunrise Point-LP PMC
           +-1f.3  Intel Corporation Sunrise Point-LP HD Audio
           \-1f.4  Intel Corporation Sunrise Point-LP SMBus

So it seems the NVMe SSD, the SMBus, the ISH and some PCIe ports won't go into D3.
As far as I know, this SK Hynix SSD is supposed to stay in D0 during suspend because it uses its own internal power management and breaks the specification. Its PCIe port is also staying in D0. Then there is another PCIe port that stays in D0 even though nothing is connected to it. It also reports that it doesn't support ASPM. Maybe it is the Thunderbolt connector?
Then there are the SMBus (not sure how it's supposed to behave) and the ISH. A quick Google search suggested that for s2idle the ISH is supposed to go into D0i3. Is there a way to check whether it's doing that, or would that also be printed as D3 in dmesg?
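
To pick the devices that stay in D0 out of the log without reading it line by line, the "PCI PM" messages can be filtered. This is a small sketch that assumes the dmesg format shown above (timestamp in brackets, then driver name and PCI address); the function name pci_d0_devices is made up for illustration.

```shell
# Hypothetical helper: reads dmesg output on stdin and prints
# "<PCI address> <driver>" for every device whose suspend power state is D0.
# Assumes lines like:
#   [  376.637032] nvme 0000:3a:00.0: PCI PM: Suspend power state: D0
pci_d0_devices() {
    sed -n 's/^\[[^]]*\][[:space:]]*\([^[:space:]]*\) \([^[:space:]]*\): PCI PM: Suspend power state: D0[[:space:]]*$/\2 \1/p'
}
```

It could be used as: dmesg | pci_d0_devices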
Comment 4 wendy.wang 2021-05-26 02:34:00 UTC
SMBus and ISH staying in D0 is expected for S0ix. They are OK.

[  376.638850] i801_smbus 0000:00:1f.4: PCI PM: Suspend power state: D0
[  376.638853] i801_smbus 0000:00:1f.4: PCI PM: Skipped
[  376.641883] intel_ish_ipc 0000:00:13.0: PCI PM: Suspend power state: D0
[  376.641886] intel_ish_ipc 0000:00:13.0: PCI PM: Skipped

So it looks like your S0ix blocker comes from the NVMe.
For 1c.0, I'm not sure whether it is the TBT root port. If it is, and TBT is disabled and not in use, then the unused PCIe root port will block S0ix as well.
You can try disabling the root port if the TBT controller is disabled.
Comment 5 Daniel Holz 2021-05-26 11:18:28 UTC
Disabling Thunderbolt in the BIOS did not make the weird port go away, but with

echo 1 > /sys/bus/pci/devices/0000:00:1c.0/remove

I could get rid of it while the system still seems to be working fine. Unfortunately I still get

intel_pmc_core INT33A1:00: CPU did not enter SLP_S0!!! (S0ix cnt=0)

when suspending the device. 
Maybe the SSD then? I'm not entirely sure how it is supposed to behave. There were some patches for it back in the day that would stop it from being disabled or put into D3 so that it could use its own power saving system, but they didn't get merged. Then there was this patch

nvme/pci: Use host managed power state for suspend
https://lore.kernel.org/lkml/20190514061141.GA7059@lst.de/T/

which was a more general solution for this kind of SSD. So it might still be supposed to stay in D0 to do its thing, which should also keep its PCIe port in D0.

Not sure where to go from here though.
Comment 6 wendy.wang 2021-05-26 13:31:59 UTC
Maybe you can apply the general solution patch you mentioned to see whether the NVMe and its PCIe root port can reach D3. If they always stay in D0, your machine will not enter S0ix; in that case, please go ahead and talk to the NVMe engineers.
Comment 7 Daniel Holz 2021-05-26 14:01:22 UTC
Sorry, I forgot to mention that this patch was merged in kernel 5.3, so I'm already using it. It's the reason why the SSD stays in D0. Before kernel 5.3 I had around 5% battery drain per hour during suspend, which made suspend pretty useless. With the patch I'm down to 1-2% per hour, which is at least okay.

I think you might even be referring to it in your S0ix troubleshooting guide:

https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting
 
"Beginning with v5.3, Linux kernel handles NVMe devices in a special way in the suspend to idle flow, which makes NVMe devices support S0ix more stable."

Not sure why the device still doesn't enter S0ix though. Is there a way to diagnose ASPM or APST power states? Or could it be something entirely different?
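
For the ASPM part, one thing that can at least be verified is which policy the kernel is actually using. The following sketch assumes the policy file lists all policies on one line and marks the active one with square brackets; the function name aspm_active_policy is made up for illustration.

```shell
# Hypothetical helper: reads the content of
# /sys/module/pcie_aspm/parameters/policy on stdin and prints the active
# policy, i.e. the entry enclosed in square brackets.
aspm_active_policy() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

It could be used as: aspm_active_policy < /sys/module/pcie_aspm/parameters/policy
For APST, nvme-cli can probably show the feature with something like "nvme get-feature /dev/nvme0 -f 0x0c -H", where /dev/nvme0 is only an assumed device name.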

For reference, I also found the older patch set for the SK Hynix SSD's troubles with D3, which didn't get merged:

https://lore.kernel.org/patchwork/patch/1007283/
Comment 8 wendy.wang 2021-05-27 02:18:57 UTC
I talked to our kernel developer: the NVMe staying in D0 is expected on Kaby Lake and should still allow S0ix. Something else on the board blocks S0ix.
Our engineer will try to reproduce the problem first, then update here.
Comment 9 Daniel Holz 2021-10-10 11:13:53 UTC
I found your selftest tool on GitHub: https://github.com/qwang59/S0ixSelftestTool. Here is its output:

sudo sh s0ix-selftest-tool.sh -s

---Check S2idle path S0ix Residency---:

---Check whether your system supports S0ix or not---:

Low Power S0 Idle is:1
Your system supports low power S0 idle capability.



---Check whether intel_pmc_core sysfs files exit---:

The pmc_core debug sysfs files are OK on your system.



---Judge PC10, S0ix residency available status---:
cat: /sys/kernel/debug/pmc_core/substate_residencies: No such file or directory
grep: /sys/kernel/debug/pmc_core/substate_residencies: No such file or directory
Test system does not support S0ix.y substate

Turbostat output: 
15.630675 sec
CPU%c1	CPU%c6	CPU%c7	GFX%rc6	Pkg%pc2	Pkg%pc3	Pkg%pc6	Pkg%pc7	Pkg%pc8	Pkg%pc9	Pk%pc10	SYS%LPI
1.97	0.73	95.53	26091.96	2.84	0.82	0.59	0.02	4.99	0.00	85.63	0.00
1.87	0.44	95.50	26098.00	2.84	0.82	0.59	0.02	4.99	0.00	85.63	0.00
2.51
1.82	1.02	95.56
1.65

CPU Core C7 residency after S2idle is: 95.53
GFX RC6 residency after S2idle is: 26091.96
CPU Package C-state 2 residency after S2idle is: 2.84
CPU Package C-state 3 residency after S2idle is: 0.82
CPU Package C-state 8 residency after S2idle is: 4.99
CPU Package C-state 9 residency after S2idle is: 0.00
CPU Package C-state 10 residency after S2idle is: 85.63
S0ix residency after S2idle is: 0.00
cat: /sys/kernel/debug/pmc_core/substate_residencies: No such file or directory

Need to debug which IP blocked S0ix since PC10 is observed.

---Debug S0ix failure scenario--PCH IP power gating check---:

Your system south port controller power gating state is OK after 30 seconds runtime check.


But it seems it can't find the source of the problem.
Comment 10 Daniel Holz 2023-01-06 01:58:35 UTC
S0ix still doesn't work on this device. I'm running kernel 6.1.3 now, and for some reason the output of s0ix-selftest-tool.sh has changed.

sudo ./s0ix-selftest-tool.sh -s

---Check S2idle path S0ix Residency---:

The system OS Kernel version is:
Linux holzi-dell 6.1.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 04 Jan 2023 16:28:15 +0000 x86_64 GNU/Linux

---Check whether your system supports S0ix or not---:

Low Power S0 Idle is:1
Your system supports low power S0 idle capability.



---Check whether intel_pmc_core sysfs files exit---:

The pmc_core debug sysfs files are OK on your system.



---Judge PC10, S0ix residency available status---:
cat: /sys/kernel/debug/pmc_core/substate_residencies: No such file or directory
grep: /sys/kernel/debug/pmc_core/substate_residencies: No such file or directory
Test system does not support S0ix.y substate

Turbostat output: 
14.643011 sec
CPU%c1	CPU%c6	CPU%c7	GFX%rc6	Pkg%pc2	Pkg%pc3	Pkg%pc6	Pkg%pc7	Pkg%pc8	Pkg%pc9	Pk%pc10	SYS%LPI
0.91	0.30	97.67	1180.80	1.21	0.06	0.21	0.02	1.71	0.00	94.06	0.00
0.91	0.32	97.66	1181.09	1.21	0.06	0.21	0.02	1.71	0.00	94.06	0.00
0.74
1.02	0.28	97.69
0.96

CPU Core C7 residency after S2idle is: 97.67
GFX RC6 residency after S2idle is: 1180.80
CPU Package C-state 2 residency after S2idle is: 1.21
CPU Package C-state 3 residency after S2idle is: 0.06
CPU Package C-state 8 residency after S2idle is: 1.71
CPU Package C-state 9 residency after S2idle is: 0.00
CPU Package C-state 10 residency after S2idle is: 94.06
S0ix residency after S2idle is: 0.00
cat: /sys/kernel/debug/pmc_core/substate_residencies: No such file or directory

Need to debug which IP blocked S0ix since PC10 is observed.

---Debug S0ix failure scenario--PCH IP power gating check---:

Your system south port controller did not meet S0ix requirement: SPB
PCH IP: 5  - SPB                             	State: On

---Debug S0ix failure scenario--Setting No ACPI DSM Callback---:

Setting no ACPI DSM callback is not helpful to the S0ix residency.

---Debug PCIeport D states and link PM states---

Checking PCI Devices D3 States:
[  207.243386] nvme 0000:3a:00.0: PCI PM: Suspend power state: D0
[  207.243397] nvme 0000:3a:00.0: PCI PM: Skipped
[  207.247171] i801_smbus 0000:00:1f.4: PCI PM: Suspend power state: D0
[  207.247176] i801_smbus 0000:00:1f.4: PCI PM: Skipped
[  207.251318] intel_ish_ipc 0000:00:13.0: PCI PM: Suspend power state: D0
[  207.251322] pcieport 0000:00:1c.4: PCI PM: Suspend power state: D0
[  207.251324] intel_ish_ipc 0000:00:13.0: PCI PM: Skipped
[  207.251334] pcieport 0000:00:1c.4: PCI PM: Skipped
[  207.251605] snd_hda_intel 0000:00:1f.3: PCI PM: Suspend power state: D3hot
[  207.251623] i915 0000:00:02.0: PCI PM: Suspend power state: D3hot
[  207.263408] proc_thermal 0000:00:04.0: PCI PM: Suspend power state: D3hot
[  207.263419] mei_me 0000:00:16.0: PCI PM: Suspend power state: D3hot
[  207.263689] iwlwifi 0000:3b:00.0: PCI PM: Suspend power state: D3hot
[  207.263715] intel_pch_thermal 0000:00:14.2: PCI PM: Suspend power state: D3hot
[  207.266059] intel-lpss 0000:00:15.0: PCI PM: Suspend power state: D3hot
[  207.266255] intel-lpss 0000:00:15.1: PCI PM: Suspend power state: D3hot
[  207.274491] pcieport 0000:00:1d.0: PCI PM: Suspend power state: D3hot
[  207.291388] xhci_hcd 0000:00:14.0: PCI PM: Suspend power state: D3hot


Checking PCI Devices tree diagram:
-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
           +-02.0  Intel Corporation HD Graphics 615
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
           +-13.0  Intel Corporation Sunrise Point-LP Integrated Sensor Hub
           +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
           +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem
           +-15.0  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0
           +-15.1  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1
           +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1
           +-1c.4-[3a]----00.0  SK hynix PC401 NVMe Solid State Drive 256GB
           +-1d.0-[3b]----00.0  Intel Corporation Wireless 8265 / 8275
           +-1f.0  Intel Corporation Device 9d4b
           +-1f.2  Intel Corporation Sunrise Point-LP PMC
           +-1f.3  Intel Corporation Sunrise Point-LP HD Audio
           \-1f.4  Intel Corporation Sunrise Point-LP SMBus

The pcieport 0000:00:1c.4 ASPM enable status:
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+

Pcieport is not in D3cold:          
0000:00:1c.4

Pcieport is not in D3cold:     
0000:00:1d.0

Available bridge device: 0000:00:1c.4 0000:00:1d.0

The PCIe bridge link power management state is:
0000:00:1c.4 Link is in L0

The link power management state of PCIe bridge: 0000:00:1c.4 is not expected. 
which is expected to be L1.1 or L1.2, or user would run this script again.


The L1SubCap of the failed 0000:00:1c.4 is:
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+

The L1SubCtl1 of the failed 0000:00:1c.4 is:
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+


Checking PCI Devices tree diagram:
-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
           +-02.0  Intel Corporation HD Graphics 615
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
           +-13.0  Intel Corporation Sunrise Point-LP Integrated Sensor Hub
           +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
           +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem
           +-15.0  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0
           +-15.1  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1
           +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1
           +-1c.4-[3a]----00.0  SK hynix PC401 NVMe Solid State Drive 256GB
           +-1d.0-[3b]----00.0  Intel Corporation Wireless 8265 / 8275
           +-1f.0  Intel Corporation Device 9d4b
           +-1f.2  Intel Corporation Sunrise Point-LP PMC
           +-1f.3  Intel Corporation Sunrise Point-LP HD Audio
           \-1f.4  Intel Corporation Sunrise Point-LP SMBus


The pcieroot port 0000:00:1c.4 ASPM setting is Enabled, its D state and Link PM are not expected,
please investigate or report a bug.
