Bug 217321 - Intel platforms can't sleep deeper than PC3 during long idle
Summary: Intel platforms can't sleep deeper than PC3 during long idle
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: Intel Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-11 08:32 UTC by KobaKo
Modified: 2023-08-29 15:36 UTC
CC List: 6 users

See Also:
Kernel Version: 6.1.0-1007-oem (kobako@barbatos) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.1.0-2ubuntu1~22.04)
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci_vvv_202304061800 (13.97 KB, text/plain)
2023-04-11 08:32 UTC, KobaKo
turbostat_202304111627 (4.99 KB, text/plain)
2023-04-11 08:32 UTC, KobaKo
dmesg_202304111633 (79.48 KB, text/plain)
2023-04-11 08:35 UTC, KobaKo
dmesg_202304121049 (79.78 KB, text/plain)
2023-04-12 02:53 UTC, KobaKo
lspci_nvvv_202304121041 (42.75 KB, text/plain)
2023-04-12 02:54 UTC, KobaKo
lspci_vvnn_202306090022.txt (44.89 KB, text/plain)
2023-06-08 16:23 UTC, KobaKo
lspci_xxxx_202306100805 (121.43 KB, text/plain)
2023-06-10 00:07 UTC, KobaKo

Description KobaKo 2023-04-11 08:32:04 UTC
[Symptom]
Intel CPU can't sleep deeper than PC3 during long idle
~~~
Pkg%pc2	Pkg%pc3	Pkg%pc6	Pkg%pc7	Pkg%pc8	Pkg%pc9	Pk%pc10
15.08	75.02	0.00	0.00	0.00	0.00	0.00
15.09	75.02	0.00	0.00	0.00	0.00	0.00
^CPkg%pc2	Pkg%pc3	Pkg%pc6	Pkg%pc7	Pkg%pc8	Pkg%pc9	Pk%pc10
15.38	68.97	0.00	0.00	0.00	0.00	0.00
15.38	68.96	0.00	0.00	0.00	0.00	0.00
~~~
[How to Reproduce]
1. Run turbostat to monitor package C-state residency.
2. Leave the machine idle.
3. turbostat shows the CPU only reaching PC2 and PC3.
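
To check the symptom from a saved turbostat log, a minimal sketch along these lines could help. It assumes the log contains only the package C-state columns, as in the output above; the function name deepest_package_cstate is hypothetical and not part of turbostat.

```python
def deepest_package_cstate(turbostat_text):
    """Return the deepest package C-state number with non-zero residency.

    Assumes every column is a Pkg%pcN (or Pk%pcN) residency column, as in
    the output pasted in this report. Returns None if nothing was parsed.
    """
    header = None
    peak = {}  # column name -> highest residency percentage seen
    for line in turbostat_text.splitlines():
        # A Ctrl-C at the terminal leaves "^C" glued to the next header line.
        fields = line.replace("^C", "").split()
        if not fields:
            continue
        if "%pc" in fields[0]:
            header = fields  # header row, e.g. Pkg%pc2 Pkg%pc3 ...
            continue
        if header is None or len(fields) != len(header):
            continue
        for name, value in zip(header, fields):
            peak[name] = max(peak.get(name, 0.0), float(value))
    depths = [int(name.split("%pc")[1]) for name, pct in peak.items() if pct > 0.0]
    return max(depths) if depths else None
```

For the output in this description the function returns 3, i.e. nothing deeper than PC3 was reached.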

[Misc]
The culprit is this commit:
a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")

If a7152be79b62 is reverted, the issue is gone.
Comment 1 KobaKo 2023-04-11 08:32:33 UTC
Created attachment 304116 [details]
lspci_vvv_202304061800
Comment 2 KobaKo 2023-04-11 08:32:54 UTC
Created attachment 304117 [details]
turbostat_202304111627
Comment 3 KobaKo 2023-04-11 08:35:10 UTC
Created attachment 304118 [details]
dmesg_202304111633
Comment 4 Artem S. Tashkinov 2023-04-11 16:39:17 UTC
This patch must have been reverted, please check kernel 6.1.23 or 6.2.10.

https://www.spinics.net/lists/stable-commits/msg285503.html
Comment 5 KobaKo 2023-04-12 02:53:27 UTC
@Artem,
that patch reverts "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume",
and the revert is what keeps the CPU from sleeping deeper than PC3.

After I revert that patch (the one you mentioned), the issue is gone.
Comment 6 KobaKo 2023-04-12 02:53:55 UTC
Created attachment 304124 [details]
dmesg_202304121049
Comment 7 KobaKo 2023-04-12 02:54:16 UTC
Created attachment 304125 [details]
lspci_nvvv_202304121041
Comment 8 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-04-12 12:18:45 UTC
KobaKo, FWIW, Bjorn had some questions for you here:

https://lore.kernel.org/all/20230411204229.GA4168208@bhelgaas/
Comment 9 KobaKo 2023-05-04 08:19:04 UTC
@Bjorn,
1. The issue can be observed by just booting the machine and watching the PCx state;
there is no need to trigger suspend/resume.
2. I will try to find which device is responsible.
3. sudo lspci output uploaded (lspci_nvvv_202304121041).
Comment 10 Bjorn Helgaas 2023-05-23 21:48:35 UTC
I'm removing the "regression" tag on this bug report.

This bug is definitely real.  We want to be able to sleep deeper than PC3 (I don't actually know what PC3 is, but I take your word for it :)).

4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume") gave us that deep sleep capability temporarily, but 4ff116d0d5fd *also* broke suspend/resume completely on some machines.  So 4ff116d0d5fd is basically just a patch that needs a little more work before we merge it again.
Comment 11 Neo Wong 2023-06-05 15:02:44 UTC
@KobaKo,

Please try to reproduce this issue on the latest RC kernel and see whether it remains.
Comment 12 KobaKo 2023-06-05 15:51:58 UTC
@Neo, has any fix landed in the vanilla kernel?
Comment 13 Neo Wong 2023-06-05 15:53:13 UTC
No, but that is what the engineering team requests before taking this issue.

thanks
Comment 14 KobaKo 2023-06-06 01:32:39 UTC
@Neo, I tried the latest generic kernel and pci/next before, and
I can try one more time.
Comment 16 Neo Wong 2023-06-06 01:36:21 UTC
@Koba, thanks, please make sure to upload the log if you see the same issue.
Comment 17 David Box 2023-06-06 23:07:43 UTC
I believe the patch that was reverted should have only corrected the system state after the first suspend/resume since not saving state on suspend was the issue. Do you find that you can get deeper than PC3 after boot but not after resume from suspend? If you can get deeper than PC3 after boot (before suspend) can you send an lspci output of this?
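
For comparing lspci dumps taken after boot and after resume, a small sketch like the following could be used. It assumes the `lspci -vvv` text layout seen in the dumps posted in this bug; the helper names l1ss_settings and changed_fields are hypothetical.

```python
import re


def l1ss_settings(lspci_text):
    """Extract L1 PM Substates control fields from `lspci -vvv` text."""
    settings = {}
    ctl1 = re.search(r"L1SubCtl1:\s*(.*)", lspci_text)
    if ctl1:
        for flag in ctl1.group(1).split():
            # Flags look like "ASPM_L1.2+" (enabled) or "ASPM_L1.2-" (disabled).
            settings[flag[:-1]] = flag.endswith("+")
    threshold = re.search(r"LTR1\.2_Threshold=(\d+)ns", lspci_text)
    if threshold:
        settings["LTR1.2_Threshold_ns"] = int(threshold.group(1))
    power_on = re.search(r"L1SubCtl2:\s*T_PwrOn=(\d+)us", lspci_text)
    if power_on:
        settings["T_PwrOn_us"] = int(power_on.group(1))
    return settings


def changed_fields(before_text, after_text):
    """Return {field: (before, after)} for fields that differ between dumps."""
    before = l1ss_settings(before_text)
    after = l1ss_settings(after_text)
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }
```

Calling changed_fields() on two saved dumps of the same device lists only the L1 substate fields that differ, which keeps the comparison short.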
Comment 18 KobaKo 2023-06-07 03:26:38 UTC
@David, I tried with the patch applied:
1. After boot, the package can't get deeper than PC3.
2. After suspend/resume, it can't either.

I could revert the patch and capture the lspci output, is that OK?
Comment 19 David Box 2023-06-07 04:00:06 UTC
(In reply to KobaKo from comment #18)
> @David, i tried w/ patch
> 1. boot, can't get into deeper than pc3.
> 2. suspend/resume, cant get too.
> 
> i could revert the patch and get lspci output, is it ok?

Sure.
Comment 20 KobaKo 2023-06-08 16:23:32 UTC
Created attachment 304384 [details]
lspci_vvnn_202306090022.txt
Comment 21 KobaKo 2023-06-08 16:23:55 UTC
@David, please check lspci_vvnn_202306090022.txt
Comment 22 KobaKo 2023-06-08 16:29:08 UTC
Also tried 6.4-rc5, VP.
Comment 23 KobaKo 2023-06-09 13:59:04 UTC
@David, it seems ltr1.2_threshold is configured during boot (see the debug prints below),
but lspci still reports LTR1.2_Threshold=0ns for the NVMe device.
~~~
[    0.000000] Linux version 6.1.0-1014-oem (kobako@barbatos) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.1.0-2ubuntu1~22.04) 12.1.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #14 SMP PREEMPT_DYNAMIC Fri Jun  9 08:03:27 UTC 2023
[    0.897741] pci 0000:02:00.0: aspm l1ss 120
[    0.897749] pci 0000:02:00.0: l1_2_threshold 118
[    2.165297] pci 10000:e1:00.0: aspm l1ss 112
[    2.165313] pci 10000:e1:00.0: l1_2_threshold 616
~~~
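
The l1_2_threshold numbers in the debug prints above look like microseconds, while lspci reports LTR1.2_Threshold in nanoseconds. The sketch below shows how a microsecond threshold maps onto the LTR_L1.2_THRESHOLD value/scale fields, assuming the PCIe encoding (10-bit value, scale multipliers 1, 32, 1024, 32768, 1048576 and 33554432 ns) and assuming the value is rounded up. The function names are illustrative, not kernel API.

```python
# Nanoseconds per unit for LTR_L1.2_THRESHOLD_Scale encodings 0..5.
SCALE_NS = [1, 32, 1024, 32768, 1048576, 33554432]
MAX_VALUE = 0x3FF  # LTR_L1.2_THRESHOLD_Value is a 10-bit field


def encode_l12_threshold(threshold_us):
    """Return (scale, value) for a threshold given in microseconds."""
    threshold_ns = threshold_us * 1000
    for scale, unit_ns in enumerate(SCALE_NS):
        if threshold_ns <= MAX_VALUE * unit_ns:
            # Round up so the programmed threshold is never below the request
            # (an assumption of this sketch).
            return scale, -(-threshold_ns // unit_ns)
    return len(SCALE_NS) - 1, MAX_VALUE  # saturate at the largest encoding


def decode_l12_threshold(scale, value):
    """Return the threshold in nanoseconds, the unit lspci prints."""
    return value * SCALE_NS[scale]
```

Under these assumptions, 616 us encodes as scale 2, value 602, which decodes to 616448 ns, the LTR1.2_Threshold that lspci shows in comment 24.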
~~~
$ sudo lspci -s 10000:e1:00.0 -vvv
10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp Device 5015 (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Sandisk Corp Device 5015
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 0
	NUMA node: 0
	Region 0: Memory at 72000000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [b0] MSI-X: Enable+ Count=65 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00002000
	Capabilities: [c0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <8us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 16GT/s (ok), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [1b8 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [300 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [900 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
			  PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [910 v1] Data Link Feature <?>
	Capabilities: [920 v1] Lane Margining at the Receiver <?>
	Capabilities: [9c0 v1] Physical Layer 16.0 GT/s <?>
	Kernel driver in use: nvme
	Kernel modules: nvme

~~~
Comment 24 KobaKo 2023-06-10 00:06:37 UTC
With David's patch (only save and restore l1.2_threshold), VP.
~~~
u@u-Precision-3460:~$ sudo lspci -s 10000:e1:00.0 -vvv
10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp Device 5015 (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Sandisk Corp Device 5015
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 0
	NUMA node: 0
	Region 0: Memory at 72000000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [b0] MSI-X: Enable+ Count=65 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00002000
	Capabilities: [c0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <8us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 16GT/s (ok), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [1b8 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [300 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [900 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
			  PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=616448ns
		L1SubCtl2: T_PwrOn=500us
	Capabilities: [910 v1] Data Link Feature <?>
	Capabilities: [920 v1] Lane Margining at the Receiver <?>
	Capabilities: [9c0 v1] Physical Layer 16.0 GT/s <?>
	Kernel driver in use: nvme
	Kernel modules: nvme

u@u-Precision-3460:~$ sudo rtcwake -m mem -s 10
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Jun  9 23:59:57 2023
u@u-Precision-3460:~$ sudo cat /sys/kernel/debug/pmc_core/package_cstate_show
Package C2 : 181494097
Package C3 : 45220243
Package C6 : 0
Package C7 : 0
Package C8 : 0
Package C9 : 0
Package C10 : 0

~~~
Comment 25 KobaKo 2023-06-10 00:07:38 UTC
Created attachment 304392 [details]
lspci_xxxx_202306100805
Comment 26 Mika Westerberg 2023-06-26 10:42:31 UTC
Can you try the patch in:

https://bugzilla.kernel.org/show_bug.cgi?id=216782#c73

It should add back the save/restore functionality but also follow the spec more closely in the restore phase. It also skips restore on the ASUS system involved just in case.
Comment 27 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-08-29 15:21:54 UTC
(In reply to Mika Westerberg from comment #26)
> Can you try the patch in:

Mika, what happened to all of this? It looks like it fell through the cracks, but maybe I'm missing something.
Comment 28 Mika Westerberg 2023-08-29 15:36:39 UTC
Hi, it is still being worked on (but thanks for reminding). There is one system (see: https://bugzilla.kernel.org/show_bug.cgi?id=216877) where this fails and we are trying to figure out what might be causing it.
