Hi, I used https://github.com/qwang59/S0ixSelftestTool to find out that my NVMe SSD (Non-Volatile memory controller: KIOXIA Corporation Device 0001), with VMD and RAID mode enabled, can't reach S0ix ("Pcieport is not in D3cold"), resulting in a huge battery drain on suspend while running the mainline 5.15 kernel.

Then I came across https://bugzilla.kernel.org/show_bug.cgi?id=213717, built my own kernel with the 3 patches proposed there, and it fixed the issue. Running the self test tool again now gives:

```
S0ix substates residency delta value: S0i2.0 13284122
Congratulations! Your system achieved the deepest S0ix substate!
CPU Core C7 residency after S2idle is: 95.59
GFX RC6 residency after S2idle is: 617.37
CPU Package C-state 2 residency after S2idle is: 1.96
CPU Package C-state 3 residency after S2idle is: 1.55
CPU Package C-state 8 residency after S2idle is: 3.35
CPU Package C-state 9 residency after S2idle is: 0.00
CPU Package C-state 10 residency after S2idle is: 86.32
```

versus before:

```
CPU Core C7 residency after S2idle is: 95.59
GFX RC6 residency after S2idle is: 617.37
CPU Package C-state 2 residency after S2idle is: 1.96
CPU Package C-state 3 residency after S2idle is: 86.32
CPU Package C-state 8 residency after S2idle is: 0.00
CPU Package C-state 9 residency after S2idle is: 0.00
CPU Package C-state 10 residency after S2idle is: 0.00
```

The patches also add some debug lines, for which the output is:

```
[adhitya@XPS13 self]$ sudo dmesg | grep VMD_DEBUG
[ 0.624100] pci 10000:e0:06.0: VMD_DEBUG: features = 46
[ 0.624101] pci 10000:e1:00.0: VMD_DEBUG: features = 46
[ 0.624103] pci 10000:e1:00.0: VMD_DEBUG: pos = 608
[ 0.624104] pci 10000:e1:00.0: VMD_DEBUG: Setting LTRs and policy...
[ 0.624107] pci 10000:e1:00.0: VMD_DEBUG: Inside aspm_override_policy, link = 0000000033512dab
[ 0.624109] pci 10000:e1:00.0: VMD_DEBUG: Found link, setting policy...
[adhitya@XPS13 self]$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
199014273
```

I opened this separate report because my device is different, so I am not entirely sure whether I should add my result to that report.

TL;DR: the mainline kernel has broken suspend behavior; the patches from https://bugzilla.kernel.org/show_bug.cgi?id=213717 fix it. I am also attaching the patch files in case they help anyone else with the same device.
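In case it's useful, here is a minimal sketch of how the residency delta can be checked across one suspend cycle without the full self-test script. It reads the same sysfs counter as above and assumes `rtcwake` is installed; it's only a rough check, not a replacement for the tool.

```
#!/bin/sh
# Minimal sketch: measure how long the system sat in S0ix across one s2idle cycle.
# Reads the same counter as quoted above; rtcwake programs an RTC alarm and suspends.
res=/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
before=$(cat "$res")
sudo rtcwake -m freeze -s 30        # suspend-to-idle for ~30 seconds
after=$(cat "$res")
echo "S0ix residency delta: $((after - before)) us"   # 0 means S0ix was never reached
```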
Created attachment 299631 [details] Subject: [PATCH 1/3] PCI/ASPM: Create ASPM override
Created attachment 299633 [details] Subject: [PATCH 2/3] PCI: vmd: Override ASPM on TGL/ADL VMD devices
Created attachment 299635 [details] Subject: [PATCH 3/3] Add pci warnings everywhere
I also had this issue on my Dell XPS 15 9510 (2021), so probably a machine very similar to the one described in this bug. After applying these same patches, the S0ixSelftestTool script reports that the computer can now reach an S0ix substate. I'll report back if the battery still drains too much during suspend.
This is still happening with 5.15.32 and 5.16.18 for me (XPS 15 9510 from 2021). I even booted into Windows to make sure all firmware was updated. I could unbitrot this patch and apply it on top of 5.16.18, and it still fixes the problem for me. How can we move forward?
Created attachment 300726 [details] patch for 5.16.18 Here is the updated patch for 5.16.18, as a concatenation of the 3 others. Note that I don't know what I'm doing :-)
Hi, this patch has changed significantly since the version I posted. I recommend checking the mailing list for the patch series titled "[PATCH V5 0/3] PCI: vmd: Enable PCIE ASPM and LTR". Unfortunately it doesn't look like the module maintainer is very interested in working with Intel on merging the patches.

(In reply to Julien Wajsberg from comment #5)
> This is still happening with 5.15.32 and 5.16.18 for me (XPS 15 9510 from
> 2021). I even booted on Windows to make sure I'd update all firmwares.
>
> I could unbitrot this patch and apply it on top of 5.16.18 and it still
> fixes the problem for me.
>
> How can we move forward ?
Ah, that's good to know, thanks. I'm a complete newbie in kernel development; is there a better way to import them than copy-pasting them from the mailing list archives? In particular, do you know whether some public git tree contains them?
Ah, I even found the v6 version of the patches; the 3 parts are at https://patchwork.kernel.org/project/linux-pci/cover/20220301041943.2935892-1-david.e.box@linux.intel.com/. v5 doesn't apply cleanly, so let's see how v6 goes.
The first patch seems to be in-tree now. The second and third patches need some edits to apply, but nothing crazy. Let's see how it goes...
I confirm this patchset also works for me! Great! Since patch 1 is in-tree now, let's hope 2 and 3 will land too.
Patches 2 and 3 apply cleanly on 5.17.1 (not sure why I missed this stable version).
Great! Maybe send the updated patch files to this thread? It would be useful for someone else.
Forget my previous messages: patch 1 isn't in-tree. I believe I had it applied locally and got confused. The full patchset applies cleanly on 5.17.1.
Here is how I applied it:

1. Go to https://patchwork.kernel.org/project/linux-pci/cover/20220301041943.2935892-1-david.e.box@linux.intel.com/
2. Click on "series" at the top right.
3. Run `git am download_file.patch`

Then build the kernel as usual. For me this is:

```
# update config by running make menuconfig and immediately exiting
make menuconfig
# build updated debian packages
make bindeb-pkg -j$(nproc)
```
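As an alternative sketch (not what I did here), if the `b4` tool happens to be installed, the whole series can be fetched by its message-id instead of downloading it from patchwork by hand:

```
# Alternative sketch, assuming the b4 tool is available:
# fetch the entire series by message-id and apply it on top of the stable tree.
b4 am 20220301041943.2935892-1-david.e.box@linux.intel.com
git am ./*.mbx    # b4 writes the collected series to an .mbx file in the current directory
```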
I took a shot at backporting vmd.c to 5.15.y, but PCI device defs hunk appeared too complex, so I'm bumping myself to 5.17.y to see if it helps bug 215367.
(In reply to Leho Kraav from comment #16)
> I took a shot at backporting vmd.c to 5.15.y, but PCI device defs hunk
> appeared too complex, so I'm bumping myself to 5.17.y to see if it helps bug
> 215367.

This bug is present on any device that uses the VMD driver, i.e. whenever the NVMe drive is placed behind Intel VMD, on all Alder Lake and Tiger Lake platforms, so yes, it should help. To make this easier: if you are on 5.15, use the patches I posted; if you are on 5.17 (which I recommend bumping to), use the newer patches from David.
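If anyone wants to quickly check whether their NVMe drive is actually behind VMD, here is a rough sketch (the exact device names lspci prints may differ per platform):

```
# Rough check: VMD-attached devices live in a separate PCI domain (e.g. 10000:)
lspci -D | grep -i -e "Volume Management Device" -e "Non-Volatile memory"
# If the NVMe controller shows a 10000:xx:xx.x address, it is behind VMD;
# a 0000:xx:xx.x address means it is exposed directly (plain NVMe/AHCI mode).
ls /sys/bus/pci/drivers/vmd/ 2>/dev/null   # lists devices bound to the vmd driver, if any
```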
I have now figured out that my XPS 13 7390 2-in-1 is running its NVMe drive in plain AHCI mode, not VMD. Adding this patchset unfortunately does nothing for it.

```
The pcieport 0000:00:1d.0 0000:00:1d.7 ASPM enable status:
Pcieport is not in D3cold: 0000:00:1d.0 0000:00:1d.7
The PCIe bridge link power management state is:
0000:00:1d.0 Link is in L0
The link power management state of PCIe bridge: 0000:00:1d.0 is not expected.
which is expected to be L1.1 or L1.2, or user would run this script again.
```

I'll continue my investigation in bug 215367. Glad y'all got something moving forward for VMD though.
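For reference, the ASPM state the script complains about can also be inspected directly. A sketch (substitute your own bridge address for 0000:00:1d.0):

```
# Sketch: inspect the ASPM capability and current link control of the flagged bridge.
sudo lspci -vv -s 0000:00:1d.0 | grep -E 'LnkCap:|LnkCtl:'
# Current kernel-wide ASPM policy (default/performance/powersave/powersupersave)
cat /sys/module/pcie_aspm/parameters/policy
```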
I see there's a new version of the patches; see https://patchwork.kernel.org/project/linux-pci/cover/20221025004411.2910026-1-david.e.box@linux.intel.com/. I'm applying it on the latest stable and currently building.
I forgot to comment, but I've been running this v7 series since then with no issues.
My understanding is that David's work has been merged in v6.3.