System: Dell XPS-13 9300, Intel(R) Core(TM) i7-1065G7 Up through Linux 5.9-rc5, this system will typically get very good residency in Low-Power-Idle upon s2idle. But on 5% of attempts, it gets ZERO residency in low-power-idle. I have confirmed that this does not change if the time sleeping is extended, or the time between experiments is extended. David Box suggested disabling PTM. Doing so resulted -- for the first time -- the system entering LPI on s2idle 100% of s2idle attempts (770 attempts over 8 hours) ~/bin/pci_write8 0 0x1d 0 0x158 0 ~/bin/pci_write8 0 0x1d 7 0x158 0 is what I used to disable PTM on my system (write a 0, where pci_read8 previously showed the value 3) This change is visible in the output of lspci as for both bridges: < PTMControl: Enabled:+ RootSelected:+ > PTMControl: Enabled:- RootSelected:-
Created attachment 292597 [details] Disable PTM during suspend Please try the following patch that disables PTM during suspend and enables again on resume on the PCIe port.
Created attachment 292599 [details] PTM disable patch with ICL port ids Same patch with ICL ids added.
Tested-by: Len Brown <len.brown@intel.com> tested this patch on 5.9.0-rc6 out of 1066 attempts, the 9300 achieved LPI 1065 times. The one failure remaining also did not achieve PC10, and so it seems unrelated.
Based on the patch, I assume this is a hardware defect. If it's a documented erratum, please cite it in the commit log when posting the patch. I want to avoid quirks that will need to be updated for every new Intel chipset. s/comsumption/consumption/ in the comment. The comment mentions "S0ix", but this term is not defined or used in Linux. I suspect it's related to [1] (but even that document doesn't contain the term "S0i"). It would be better if the comment could be phrased in terms of existing Linux PM concepts or if Linux PM could be updated to comprehend S0ix. I don't understand the purpose of the "intel_ptm_restore" global in the patch. The quirks could check the current state, and emit the message, do the config write, and update the pdev-> bits only when necessary. [1] https://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
(In reply to Bjorn Helgaas from comment #4) > Based on the patch, I assume this is a hardware defect. If it's a > documented erratum, please cite it in the commit log when posting the patch. > I want to avoid quirks that will need to be updated for every new Intel > chipset. There is no erratum as this does not affect Windows. We believe we are hitting a race condition somewhere in the suspend flow in Linux. The actual solution may be another kernel patch. But we have not got to a root cause yet. > > s/comsumption/consumption/ in the comment. > > The comment mentions "S0ix", but this term is not defined or used in Linux. > I suspect it's related to [1] (but even that document doesn't contain the > term "S0i"). It would be better if the comment could be phrased in terms of > existing Linux PM concepts or if Linux PM could be updated to comprehend > S0ix. Yes, this is the relevant document. Will change the language. > > I don't understand the purpose of the "intel_ptm_restore" global in the > patch. > > The quirks could check the current state, and emit the message, do the > config write, and update the pdev-> bits only when necessary. The purpose is to save the state so the ptm bits are only enabled on resume if they were disabled during suspend. I did look at other quirks and noticed that no one else did this type of check, but I thought it best not to assume that it was enabled before. If this is not a concern, it could just be enabled without the check. Thanks.
Per discussion on lkml, instead of a quirk it would be better to do this as part of save/restore of the PTM control register (which was missing anyway). New patches [1] have been submitted. [1] https://lore.kernel.org/linux-pci/20201119001822.31617-1-david.e.box@linux.intel.com/
This bug can be marked as resolved. It is fixed by the following commit: commit c01163dbd1b8aa016c163ff4bf3a2e90311504f1 Author: Bjorn Helgaas <bhelgaas@google.com> Date: Fri Sep 9 15:25:05 2022 -0500 PCI/PM: Always disable PTM for all devices during suspend