Bug 209361 - PTMControl Enabled blocks Low Power Idle in 5% of s2idle attempts - Dell XPS-13 9300 Ice Lake
Summary: PTMControl Enabled blocks Low Power Idle in 5% of s2idle attempts - Dell XPS-...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 178231
  Show dependency tree
 
Reported: 2020-09-23 15:22 UTC by Len Brown
Modified: 2022-10-21 01:39 UTC (History)
5 users (show)

See Also:
Kernel Version: 5.9-rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Disable PTM during suspend (2.76 KB, patch)
2020-09-23 18:51 UTC, David Box
Details | Diff
PTM disable patch with ICL port ids (2.81 KB, patch)
2020-09-23 18:56 UTC, David Box
Details | Diff

Description Len Brown 2020-09-23 15:22:47 UTC
System: Dell XPS-13 9300, Intel(R) Core(TM) i7-1065G7

Up through Linux 5.9-rc5, this system will typically get very good residency
in Low-Power-Idle upon s2idle.  But on 5% of attempts, it gets ZERO
residency in low-power-idle.

I have confirmed that this does not change if the time sleeping
is extended, or the time between experiments is extended.

David Box suggested disabling PTM.
Doing so resulted -- for the first time --
the system entering LPI on s2idle 100% of s2idle attempts
(770 attempts over 8 hours)

~/bin/pci_write8 0 0x1d 0 0x158 0
~/bin/pci_write8 0 0x1d 7 0x158 0

is what I used to disable PTM on my system
(write a 0, where pci_read8 previously showed the value 3)

This change is visible in the output of lspci as for both bridges:

< PTMControl: Enabled:+ RootSelected:+
> PTMControl: Enabled:- RootSelected:-
Comment 1 David Box 2020-09-23 18:51:03 UTC
Created attachment 292597 [details]
Disable PTM during suspend

Please try the following patch that disables PTM during suspend and enables again on resume on the PCIe port.
Comment 2 David Box 2020-09-23 18:56:07 UTC
Created attachment 292599 [details]
PTM disable patch with ICL port ids

Same patch with ICL ids added.
Comment 3 Len Brown 2020-09-24 14:08:22 UTC
Tested-by: Len Brown <len.brown@intel.com>

tested this patch on 5.9.0-rc6 

out of 1066 attempts, the 9300 achieved LPI 1065 times.
The one failure remaining also did not achieve PC10, and so
it seems unrelated.
Comment 4 Bjorn Helgaas 2020-09-24 14:51:42 UTC
Based on the patch, I assume this is a hardware defect.  If it's a documented erratum, please cite it in the commit log when posting the patch.  I want to avoid quirks that will need to be updated for every new Intel chipset.

s/comsumption/consumption/ in the comment.

The comment mentions "S0ix", but this term is not defined or used in Linux.  I suspect it's related to [1] (but even that document doesn't contain the term "S0i").  It would be better if the comment could be phrased in terms of existing Linux PM concepts or if Linux PM could be updated to comprehend S0ix.

I don't understand the purpose of the "intel_ptm_restore" global in the patch.

The quirks could check the current state, and emit the message, do the config write, and update the pdev-> bits only when necessary.

[1] https://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
Comment 5 David Box 2020-09-24 16:56:20 UTC
(In reply to Bjorn Helgaas from comment #4)
> Based on the patch, I assume this is a hardware defect.  If it's a
> documented erratum, please cite it in the commit log when posting the patch.
> I want to avoid quirks that will need to be updated for every new Intel
> chipset.

There is no erratum as this does not affect Windows. We believe we are hitting a race condition somewhere in the suspend flow in Linux. The actual solution may be another kernel patch. But we have not got to a root cause yet.

> 
> s/comsumption/consumption/ in the comment.
> 
> The comment mentions "S0ix", but this term is not defined or used in Linux. 
> I suspect it's related to [1] (but even that document doesn't contain the
> term "S0i").  It would be better if the comment could be phrased in terms of
> existing Linux PM concepts or if Linux PM could be updated to comprehend
> S0ix.

Yes, this is the relevant document.  Will change the language.

> 
> I don't understand the purpose of the "intel_ptm_restore" global in the
> patch.
> 
> The quirks could check the current state, and emit the message, do the
> config write, and update the pdev-> bits only when necessary.

The purpose is to save the state so the ptm bits are only enabled on resume if they were disabled during suspend. I did look at other quirks and noticed that no one else did this type of check, but I thought it best not to assume that it was enabled before. If this is not a concern, it could just be enabled without the check. Thanks.
Comment 6 David Box 2020-11-19 00:25:20 UTC
Per discussion on lkml, instead of a quirk it would be better to do this as part of save/restore of the PTM control register (which was missing anyway). New patches [1] have been submitted.

[1] https://lore.kernel.org/linux-pci/20201119001822.31617-1-david.e.box@linux.intel.com/
Comment 7 David Box 2022-10-21 01:39:53 UTC
This bug can be marked as resolved. It is fixed by the following commit:

commit c01163dbd1b8aa016c163ff4bf3a2e90311504f1
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Fri Sep 9 15:25:05 2022 -0500

   PCI/PM: Always disable PTM for all devices during suspend

Note You need to log in before you can comment on or make changes to this bug.