Bug 215832
Description
Toke Høiland-Jørgensen
2022-04-12 10:30:08 UTC
Created attachment 300749 [details]
S0ixSelftestTool output
Created attachment 300830 [details]
PCIe link state script
Created attachment 300831 [details]
PCIe link state script
Updated to collect over 10 seconds.
This doesn't look like an ASPM issue to me. The nvme at least shows that ASPM is enabled. To be sure, I've added the script used by the S0ix Self Test tool so that you can check it manually. It samples the link states across the PCIe ports over a 10-second period. You can then grep the output by port to see if any are staying in L0.
Please also run a turbostat collection for about a minute while your system is idle. You can do this with:
sudo turbostat -n 12 -out ts.out
This is to observe how your runtime package C-state residencies compare to those during suspend.
Created attachment 300832 [details]
turbostat output
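You can pull the package C-state residencies out of a turbostat --out file with a small awk filter, so you don't have to read every column by hand. Below is a minimal bash sketch. It assumes that turbostat repeats its header line at every interval and that the first row after each header is the system-wide summary row. Copy the exact column name from the header of your ts.out, for example Pkg%pc3; PC10 may appear abbreviated, such as Pk%pc10. pkg_residency is a hypothetical helper name, not part of turbostat.

```bash
#!/usr/bin/env bash
# pkg_residency COLUMN FILE  (hypothetical helper)
# Prints COLUMN (e.g. Pkg%pc3) from the row that follows every header
# line in a turbostat --out file. Assumes that row is the summary row.
pkg_residency() {
    local column="$1" file="$2"
    awk -v col="$column" '
        # Line right after a header: print the remembered column once.
        want { print $idx; want = 0; next }
        # A header line is any line containing the column name as a field.
        {
            for (i = 1; i <= NF; i++) {
                if ($i == col) { idx = i; want = 1; break }
            }
        }
    ' "$file"
}
```

For example, `pkg_residency 'Pkg%pc3' ts.out` prints one value per interval. Run it again with the PC10 column to see whether the package ever gets past PC3.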
Created attachment 300833 [details]
PCIE link port status
You're right that there don't seem to be any PCIe ports that stay in L0; I've attached the output, as well as the one from turbostat.
Yeah, based on this I don't think the issue is with your nvme. Looking at the lspci output again, I see it could be the WiFi: it's reporting 0s for its LTR values. Since this is a PCH-connected device, the small LTR value may appear in:
cat /sys/kernel/debug/pmc_core/ltr_show
Please attach this output. There's a mechanism to ignore these LTRs for testing purposes. You can do this (as root) with:
for i in {0..25}; do echo $i > /sys/kernel/debug/pmc_core/ltr_ignore; done
You may see some failed writes. That's okay. The actual number of IPs that can be ignored depends on the platform; 25 is a deliberately large number to make sure all of them are covered, and the writes beyond the last valid index will fail. After this you can run the turbostat command again to see if you can enter deeper package C-states. Alternatively, you can continuously cat the package C-state file (package_cstate_show) in the same pmc_core folder to see which states are updating.
Created attachment 300834 [details]
turbostat output after ltr_ignore
Hmm, doesn't seem to help? IIRC the S0ixSelftestTool also does the ltr_ignore dance, doesn't it? Don't think that helped either.
I got two write errors writing to the ltr_ignore file, BTW, so I suppose it goes up to 23?
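The number of valid ltr_ignore indices depends on the platform. The loop can report which writes the kernel rejects, instead of only printing errors. Below is a minimal bash sketch. ltr_ignore_all is a hypothetical helper name. It has to run in a root shell: with `sudo echo $i > file`, the redirection is performed by the unprivileged shell, so sudo doesn't help.

```bash
#!/usr/bin/env bash
# ltr_ignore_all [FILE] [MAX]  (hypothetical helper; run as root)
# Writes 0..MAX to the pmc_core ltr_ignore file and lists the indices
# whose write failed.
ltr_ignore_all() {
    local file="${1:-/sys/kernel/debug/pmc_core/ltr_ignore}"
    local max="${2:-25}" i
    local -a failed=()
    for ((i = 0; i <= max; i++)); do
        # Grouping silences stderr both for a failed open and for a
        # write the kernel rejects; the exit status still reports it.
        if ! { echo "$i" > "$file"; } 2>/dev/null; then
            failed+=("$i")
        fi
    done
    echo "rejected indices: ${failed[*]:-none}"
}
```

Running `ltr_ignore_all` with no arguments uses the pmc_core path and the 0 to 25 range from the instructions above. Presumably, the rejected indices are the ones past the end of the platform's list of IPs.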
Created attachment 300835 [details]
contents of /sys/kernel/debug/pmc_core/ltr_show
Sorry for the delay. Being stuck in Package C3 is difficult to debug without hardware tools, so we'll have to search for the cause by trial and error. Here are some other things to try:
1. Check in the log that the DMC firmware is loaded for the i915 driver.
2. Check in the log that firmware is loaded for WiFi/Bluetooth.
3. Report any errors in dmesg.
4. Run powertop --auto-tune to turn on runtime PM for all devices, and check the package residency.
5. Unplug any attached USB devices and check the package residency.
6. Check the package residency with the display off, running turbostat in the background.
7. Physically remove the WiFi module and check the package residency. It's still suspicious that the LTR is zero.
Created attachment 300879 [details]
dmesg after cold boot (working PC10)
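You can check items 1 to 3 of the list against a saved log such as the attached dmesg. Below is a minimal bash sketch. The grep patterns are assumptions about how recent i915, iwlwifi and Bluetooth drivers word their firmware messages, so adjust them if they don't match your kernel. check_firmware is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# check_firmware LOG  (hypothetical helper)
# Reports whether firmware-load messages appear in a saved dmesg log,
# then prints any lines mentioning errors or failures.
check_firmware() {
    local log="$1" pattern
    # Assumed message wording; adjust to your kernel's actual output.
    for pattern in 'DMC firmware' \
                   'iwlwifi.*loaded firmware version' \
                   'Bluetooth.*[Ff]irmware'; do
        if grep -qE "$pattern" "$log"; then
            echo "found:   $pattern"
        else
            echo "MISSING: $pattern"
        fi
    done
    # Item 3: surface anything that looks like an error.
    grep -iE 'error|fail' "$log" || true
}
```

A "MISSING" line only means that the assumed wording was not found. Confirm it with a broader grep before concluding that the firmware did not load.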
Created attachment 300880 [details]
dmesg after a reboot (stuck in pc3)
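Comparing a cold-boot log with a reboot log by eye is hard because every line carries a different timestamp. Below is a minimal bash sketch that strips the leading "[   12.345678] " stamps before diffing. strip_ts and dmesg_diff are hypothetical helper names. Alternatively, capture the logs with `dmesg -t`, which leaves the stamps out in the first place.

```bash
#!/usr/bin/env bash
# strip_ts FILE: drop the "[    1.234567] " prefix from each dmesg line.
strip_ts() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+] //' "$1"
}

# dmesg_diff COLD REBOOT: unified diff of two logs, timestamps ignored.
# Exit status follows diff: 0 if identical, 1 if they differ.
dmesg_diff() {
    diff -u <(strip_ts "$1") <(strip_ts "$2")
}
```

Expect some noise from device probe ordering even when nothing meaningful has changed.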
Okay, so I played around with enabling/disabling the WLAN module (in BIOS). At first I actually thought this helped, since after disabling it the machine went into PC10 just fine. However, after re-enabling it, that was *still* the case. And then after rebooting, it got stuck in PC3 again. Playing around a bit more, it seems it was simply *the act of toggling* the WLAN module that "fixed" things. Or rather, whether the machine gets stuck in PC3 depends on how it was booted: if I cold boot it (i.e., shut it off completely, then start it back up) I get PC10 working, but if I then issue a "systemctl reboot", it's stuck in PC3 again. This is reproducible with or without the WLAN module enabled, and seems to be quite consistent. So I guess toggling the WLAN module causes a hardware reset that corresponds to a full power-off? I'm attaching the dmesg output after both a cold boot and a reboot; I couldn't really spot any meaningful difference, but maybe you can?
Well, the act of changing any setting in BIOS causes a hardware reset, so it may not be related to WiFi at all. Did you do any suspends before rebooting? If you're not sure, can you confirm that after a cold boot you get PC10, and then after a reboot (without having suspended/resumed first) you only get PC3?
Ah, totally missed your comment, sorry about that! Yeah, cold boot gets me PC10, then rebooting without suspending gets me only PC3. Just suspending (after a cold boot) also gets me stuck in PC3...
Ping? Any updates on this?
Sorry about the delayed reply. We are setting up the system in the lab to debug this issue further. Can you please confirm when it stops getting PC10 residency? Check multiple times after a reboot and confirm that the PC10 counter is still incrementing. Also, what distribution are you using?
Yeah, seems pretty consistent: cold boot, goes into PC10 every time I run turbostat (over a period of ~10-15 minutes after boot). Reboot, only PC3.
I'm running Arch Linux; these latest tests were on the 5.19.13-arch1-1 kernel.
Can you do a cold boot and make sure you're getting PC10? Then write 1 to /sys/power/pm_debug_messages. Also enable some PCI debug messages by doing the following as root:
echo -n "file pci-driver.c +p" > /sys/kernel/debug/dynamic_debug/control
Then do a 1-minute suspend. When you come back, check that you are now only getting PC3, and send the dmesg log of the suspend/resume cycle.
Also, as a separate test, please set nvme.noacpi=1 on the GRUB kernel command line. Before this change, /sys/module/nvme/parameters/noacpi is N; it will be Y after this change. There's a message in your log that indicates a quirk is being used for your nvme device during suspend. This will disable that quirk.
Created attachment 303448 [details]
dmesg during suspend with debug enabled
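You can collect the debug setup requested above into one helper, so that each run is done the same way. Below is a minimal bash sketch, to be run as root. enable_suspend_debug and noacpi_state are hypothetical helper names. SYSROOT is a hypothetical override that only points the writes at a scratch directory; leave it unset on a real system.

```bash
#!/usr/bin/env bash
# enable_suspend_debug: turn on PM debug messages and pci-driver.c
# dynamic debug output. Run as root. SYSROOT (hypothetical) prefixes
# the paths for testing and is empty on a real system.
enable_suspend_debug() {
    local root="${SYSROOT:-}"
    echo 1 > "$root/sys/power/pm_debug_messages" &&
        echo -n "file pci-driver.c +p" \
            > "$root/sys/kernel/debug/dynamic_debug/control"
}

# noacpi_state: print the nvme noacpi parameter (N by default, Y once
# nvme.noacpi=1 is on the kernel command line).
noacpi_state() {
    cat "${SYSROOT:-}/sys/module/nvme/parameters/noacpi"
}
```

On the real machine, `enable_suspend_debug && rtcwake -m freeze -s 60 && dmesg > suspend-debug.txt` does a 60-second s2idle suspend and saves the log. rtcwake is part of util-linux.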
Attaching the dmesg with debug enabled as requested.
I also tried the nvme.noacpi=1 kernel parameter. With this, I get PC10 even after a suspend; but after a (soft) reboot, I'm stuck in PC3 again...
Did you modify your bootloader to apply the kernel parameter every time? Apart from that, this looks like the issue. Please attach a copy of your ACPI dump (`sudo acpidump > acpidump.out`). The flag that the driver uses to apply this quirk is in one of the ACPI tables.
Created attachment 303451 [details]
acpidump output after cold boot
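To look for that flag in the attached dump, extract and decompile the tables with acpica-tools (acpixtract and iasl). The _DSD property that Linux's acpi_storage_d3() checks when choosing D3 for storage devices is StorageD3Enable. Below is a minimal bash sketch; decompile_acpidump and find_storage_d3 are hypothetical helper names.

```bash
#!/usr/bin/env bash
# decompile_acpidump DUMP OUTDIR  (hypothetical helper)
# Extracts every table from an acpidump text file with acpixtract and
# disassembles them to .dsl with iasl (both from acpica-tools).
decompile_acpidump() {
    local dump="$1" outdir="$2"
    mkdir -p "$outdir" || return 1
    cp "$dump" "$outdir/tables.out" || return 1
    (
        cd "$outdir" || exit 1
        acpixtract -a tables.out > /dev/null || exit 1
        # Some tables may need external definitions; keep going anyway.
        iasl -d ./*.dat > /dev/null 2>&1
        exit 0
    )
}

# find_storage_d3 DIR: list the decompiled tables that mention the
# StorageD3Enable property.
find_storage_d3() {
    grep -l 'StorageD3Enable' "$1"/*.dsl
}
```

For example, `decompile_acpidump acpidump.out cold && find_storage_d3 cold` names the table(s) that request D3 for the NVMe drive.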
Yeah, the noacpi parameter is in the bootloader; I double-checked the module param even after a reboot.
So, to summarise, with nvme.noacpi:
- Cold boot: PC10
- Cold boot + suspend: PC10
- Reboot: PC3
- Reboot + suspend: PC3
Attaching the acpidump output for both a cold boot and a reboot (a couple of bytes differ; not sure if they're significant).
Created attachment 303452 [details]
acpidump output after reboot
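To find which table holds the bytes that differ between the cold-boot and reboot dumps, extract each dump into its own directory, for example with `acpixtract -a`, and compare the resulting .dat files table by table. Below is a minimal bash sketch; changed_tables is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# changed_tables DIR_A DIR_B  (hypothetical helper)
# Compares the .dat files acpixtract produced for two dumps and names
# the tables that differ or exist in only one of them.
changed_tables() {
    local a="$1" b="$2" f name
    for f in "$a"/*.dat; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=$(basename "$f")
        if [ ! -e "$b/$name" ]; then
            echo "$name: only in $a"
        elif ! cmp -s "$f" "$b/$name"; then
            echo "$name: differs"
        fi
    done
    for f in "$b"/*.dat; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        [ -e "$a/$name" ] || echo "$name: only in $b"
    done
}
```

Once a table is flagged, `cmp -l cold/<table>.dat reboot/<table>.dat` lists the differing byte positions. It prints them as 1-based decimal offsets, with octal values.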
Friendly ping? :)
Sorry for the delay. Your tables show that your nvme is being forced to use D3 for suspend rather than ASPM, which is the default for an S0ix system. The noacpi option ignores the BIOS request for D3, and this seemed to help, at least for suspends after a cold boot. The presumption was that the D3 flow is not restoring the device properly on resume. But if this is true, I don't know why it's not helping with reboots. That flow is different, so it's likely the device is being put into D3, but I'd expect the restart to reset the device configuration. Anyway, to see if any of this is the case, please capture lspci -vvv -xxxx output from your system under both PC10 and PC3 conditions.
Created attachment 303673 [details]
lspci output in PS3-only state
Created attachment 303674 [details]
lspci output in PS10 state
There doesn't appear to be any difference on the NVMe device itself, but there's a flag on the host bridge that's different between the two?
Sorry, I should have mentioned that you need to run the command as root to get the extended config space detail.
Created attachment 303675 [details]
lspci output in PS3-only state (as root)
Ah, doh, should have realised. Okay, trying again... :)
Created attachment 303676 [details]
lspci output in PS10 state (as root)
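With the root captures, you can compare the hex-dump part of two lspci outputs byte by byte to get the exact config-space offsets that changed. Below is a minimal bash sketch. It assumes that each file holds the output for a single device, for example `sudo lspci -s 00:1d.0 -vvv -xxxx`, where the device address is only an example. This matters because offsets repeat across devices in a full dump. changed_bytes is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# changed_bytes BEFORE AFTER  (hypothetical helper)
# Both files: `lspci -s <device> -vvv -xxxx` output for ONE device.
# Prints every config-space byte whose value differs, as
# "0xOFFSET: old -> new".
changed_bytes() {
    local -A before=()
    local off bytes i
    local -a old new
    # Hex rows look like "1a0: 00 11 22 ..." (row offset, then bytes).
    while read -r off bytes; do
        before[$off]="$bytes"
    done < <(grep -E '^[0-9a-f]+: ' "$1")
    while read -r off bytes; do
        read -r -a old <<< "${before[$off]:-}"
        read -r -a new <<< "$bytes"
        for ((i = 0; i < ${#new[@]}; i++)); do
            if [ "${old[i]:-}" != "${new[i]}" ]; then
                printf '0x%03x: %s -> %s\n' \
                    "$((0x${off%:} + i))" "${old[i]:-??}" "${new[i]}"
            fi
        done
    done < <(grep -E '^[0-9a-f]+: ' "$2")
}
```

The offsets it prints can then be matched against the "Capabilities: [xxx]" headers in the -vvv part of the same output. This shows which capability (L1 PM substates, LTR and so on) each changed byte belongs to.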
The diffs showed that the extended config space of both the PCIe root port and the NVMe device changed, but it's not clear what the cause is; the change didn't happen in the area I was suspecting. But since it looks like not doing D3 for NVMe suspend allows your system to continue getting PC10 on resume, let's try something similar for reboot. Please try the attached patch. It treats shutdown the same as suspend and uses NVMe APST (PCIe ASPM) instead of D3. With this patch you don't need to use the noacpi flag, as it will not do D3 even if it's set. This patch is intended solely to test this theory. I've tested it a few times with no observable issues on the drive, but I can't say that it's safe. If you have important data on your drive, you may want to swap it out first. While on the topic, the drive could very well be the issue too. If you have another drive, preferably a different model or from a different vendor, you should try it without this patch.
Created attachment 303741 [details]
[PATCH] Test using NVMe APST for shutdown instead of D3.
Okay, that sounds a little scary :) What's the risk here? Frying the drive, overheating, or something else? If I'm understanding you correctly, your patch should just do the same as the noacpi flag, but for reboot as well? I've been running with the noacpi flag on since you suggested it back in December, and it does not appear to have broken anything, so I guess it should be relatively safe? Or is there some additional risk in applying this to reboots? I do have an old Samsung NVMe drive lying around from a previous laptop (the current one is a Seagate drive), but this is my daily-driver laptop, so it's not quite trivial for me to find the time to replace the drive and do a fresh install to test it out; it's probably easier to try the patch, assuming it doesn't fry the drive :)
Created attachment 303745 [details]
Test using NVMe APST for shutdown instead of D3
Sorry. It should be fine. On shutdown the NVMe is normally put into D3 (off) and the system is powered down from that state. With this patch it is instead put into a very low-power idle state and powered down from there. There are no accesses to the device in either case; it's just a different condition under which the plug is getting pulled, as it were. The caution is only that this is not the standard flow. To your NVMe, it would be the equivalent of removing the power from your laptop while your system is suspended (as if the battery died).
Hi. I have a similar (?) problem on a ThinkPad T14s (Gen 3 Intel). Resume from S3 suspend leads to an immediate kernel panic, and the battery drain in s2idle was inconsistent (ranging from 10% to 50% overnight). It turned out to be a side effect of laptop-mode-tools playing with /sys/module/pcie_aspm/parameters/policy. For some reason the policies "default" and "powersupersave" correspond to low battery drain, while "performance" and "powersave" prevent PC8. I can provide S0ixSelftestTool -s logs for all cases if necessary. Sorry if this is unrelated and I should rather have opened a separate ticket.
(In reply to Lev Melnikovsky from comment #38)
Please do create a new ticket. The symptoms you shared are not the same as this one.