Bug 216877 - Regression in PCI powermanagement breaks resume after suspend
Summary: Regression in PCI powermanagement breaks resume after suspend
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: Intel Linux
: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL: https://lkml.org/lkml/2023/1/4/599
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-02 14:02 UTC by Thomas Witt
Modified: 2023-12-21 00:46 UTC (History)
7 users (show)

See Also:
Kernel Version: 6.1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
output of git bisect log (2.46 KB, text/plain)
2023-01-02 14:02 UTC, Thomas Witt
Details
lspci -v (6.76 KB, text/plain)
2023-01-02 14:03 UTC, Thomas Witt
Details
output of netconsole after the issue happens (12.10 KB, text/plain)
2023-01-02 14:03 UTC, Thomas Witt
Details
lspci -vv, as root user (36.48 KB, text/plain)
2023-01-02 17:31 UTC, Thomas Witt
Details
journal before suspend (78.38 KB, text/plain)
2023-01-02 17:32 UTC, Thomas Witt
Details
output of dmidecode (15.43 KB, text/plain)
2023-06-28 10:20 UTC, Thomas Witt
Details
[PATCH] PCI/ASPM: Add back L1 PM Substate save and restore with delay (6.27 KB, patch)
2023-07-05 20:45 UTC, David Box
Details | Diff
debug log for comment #27 (47.27 KB, application/gzip)
2023-11-05 13:31 UTC, Thomas Witt
Details
debug log for comment #29 (53.57 KB, application/gzip)
2023-11-06 22:42 UTC, Thomas Witt
Details
acpidump -b (68.84 KB, application/gzip)
2023-11-06 22:43 UTC, Thomas Witt
Details
Reconfigure ASPM after device enable (2.19 KB, patch)
2023-11-10 16:45 UTC, David Box
Details | Diff
Disable L1SS and L1 before restore (8.35 KB, patch)
2023-12-15 20:42 UTC, David Box
Details | Diff
Save and restore on POLICY_DEFAULT only (10.56 KB, patch)
2023-12-15 20:43 UTC, David Box
Details | Diff
Add back save and restore. Disable child before parent. (11.13 KB, patch)
2023-12-21 00:46 UTC, David Box
Details | Diff

Description Thomas Witt 2023-01-02 14:02:51 UTC
Created attachment 303512 [details]
output of git bisect log

After commit 5e85eba6f50dc288c22083a7e213152bcc4b8208 "PCI/ASPM: Refactor L1 PM Substates Control Register programming" my Laptop does not resume PCI devices back from suspend.

My Laptop is a Tuxedo Infinitybook S 14 v5, as far as I can tell they use a Clevo L140CU Mainboard.

The main symptom is:
iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible

after that, the level of interaction I still have with the laptop varies, but It cannot run dmesg and it cannot do a clean reboot. The issue occurs on every suspend/resume cycle.
Comment 1 Thomas Witt 2023-01-02 14:03:22 UTC
Created attachment 303513 [details]
lspci -v
Comment 2 Thomas Witt 2023-01-02 14:03:52 UTC
Created attachment 303514 [details]
output of netconsole after the issue happens
Comment 3 Thomas Witt 2023-01-02 17:31:22 UTC
Created attachment 303515 [details]
lspci -vv, as root user
Comment 4 Thomas Witt 2023-01-02 17:32:13 UTC
Created attachment 303516 [details]
journal before suspend

this is the output of "journalctl -k -b -1"
Comment 5 Artem S. Tashkinov 2023-01-08 09:46:19 UTC
Is this still broken in 6.1.4?
Comment 6 Thomas Witt 2023-01-08 10:17:52 UTC
(In reply to Artem S. Tashkinov from comment #5)
> Is this still broken in 6.1.4?

Yes, same symptoms.
Comment 7 Artem S. Tashkinov 2023-01-09 12:24:24 UTC
It's being discussed on LKML as well, thanks!

https://lkml.org/lkml/2023/1/4/599
Comment 8 Bjorn Helgaas 2023-02-10 23:18:31 UTC
This should be fixed by
https://git.kernel.org/linus/ff209ecc376a
which will appear in v6.2-rc8.

Please reopen if you can reproduce the problem on v6.2-rc8 or later.
Comment 9 Raghav Shankar 2023-03-04 07:47:38 UTC
This still affects my system on v6.2.2 (with zen patches). I have to restart NetworkManager.service to get any networking working again. Additionally, startup got a lot slower after updating to v6.2.2.
Comment 10 Raghav Shankar 2023-03-04 07:48:23 UTC
To clarify, the startup problem seems to be caused by network manager taking too long.
Comment 11 Bjorn Helgaas 2023-03-09 21:47:56 UTC
Raghav, thanks very much for your report.  I'm not sure the problem you're seeing is the same as what Thomas reported here.  Thomas reported that his system was completely unusable after suspend/resume, and the only thing he could really do was reboot (and even that didn't work reliably).

In your case, it sounds like the system is slow and something is wrong with NetworkManager.  Is this a regression?  If it used to work, and simply upgrading the kernel to v6.2.2 caused problems, then we should look for a kernel issue.

If it seems like a kernel issue, can you please open a new report with more details (distro details, dmesg log, "sudo lspci -vv" output)?  Most kernel subsystems don't pay attention to bugzilla, so email to linux-kernel@vger.kernel.org and whatever other list seems relevant is probably best.  Maybe netdev@vger.kernel.org if you think it's network-related, or linux-pci@vger.kernel.org if you think it's PCI-related.
Comment 12 Raghav Shankar 2023-03-10 14:55:42 UTC
Ah, thank you for your response. I actually managed to fix it by passing ibt=off to the kernel cmdline, as that feature was causing issues with systemd service bringup. Thank you again for taking the time to help me here.
Comment 13 Bjorn Helgaas 2023-03-10 15:35:40 UTC
Hmmm.  That's horrible.  "ibt=off" isn't documented at all, and even if it were, users should not be required to diagnose the slowdown and somehow figure out to use "ibt=off" to avoid it, so I would definitely consider "ibt=off" as an interim *workaround*, but not an actual *fix*.

I did find a couple bug reports that mention "ibt=off" as a workaround:

  https://bugs.archlinux.org/task/74886
  https://bugs.archlinux.org/task/74891

Both are related to the nvidia driver, and it looks like you should see "Missing ENDBR" in your dmesg log if you are seeing the same problem.  So, if you're seeing that problem, I guess using "ibt=off" is OK.

But if you're seeing something different, i.e., you're not using nvidia, please report it to linux-kernel@vger.kernel.org and Peter Zijlstra <peterz@infradead.org> (and cc: me).  We would want to see the complete dmesg log to help figure this out.
Comment 14 Raghav Shankar 2023-03-10 17:09:07 UTC
I came across this through some discussions around that, like this one: https://bbs.archlinux.org/viewtopic.php?id=276805. Before this, I had tried out a few other kernel versions, and I had this same issue in 5.19 as well, I believe. The one I've linked has to do with 5.18. I do use NVidia hardware, which is what could have caused this. I'm not sure if I saw the string you describe when I was looking at my dmesg.
Comment 15 Thomas Witt 2023-06-28 10:20:53 UTC
Created attachment 304494 [details]
output of dmidecode
Comment 16 David Box 2023-07-05 20:45:12 UTC
Created attachment 304553 [details]
[PATCH] PCI/ASPM: Add back L1 PM Substate save and restore with delay

Hi Thomas.

Can you please try the attached patch. It is Mika's patch that does the save and restore but adds a delay between steps to check if the failure is related to timing.
Comment 17 Werner Sembach [TUXEDO] 2023-08-24 14:33:02 UTC
Hi,
so I unsuccessfuly tried to reproduce the bug on a TUXEDO InfinityBook S 14 v5.
I'm using 6.5-rc7 with this revert reverted: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff209ecc376a
and
BIOS 1.07.09RTR1
EC 1.07.08TR5
Nothing bad happened for me after resume.
Obligatory question: Which BIOS and EC firmware are you on?
Comment 18 Mika Westerberg 2023-08-31 09:21:51 UTC
Hi Werner. This is the same system Thomas has, right (InfinityBook S 14 v5)? And the original problem does not reproduce? Do you use "mem_sleep=deep" in the command line or the default s2idle?
Comment 19 Werner Sembach [TUXEDO] 2023-09-04 13:26:45 UTC
Yes the same device.
I did use the default s2idle.
Comment 20 Thomas Witt 2023-09-04 13:41:53 UTC
(In reply to Werner Sembach [TUXEDO] from comment #19)
> Yes the same device.
> I did use the default s2idle.

The bug triggers for me when I set mem_sleep=deep. Without that setting, I get normal idle power consumption in suspend to ram state.
Comment 21 Mika Westerberg 2023-09-05 11:32:25 UTC
Thomas, did you check the firmware versions, are they same and if no, is there an update that could allow you to use the default s2idle instead?

Werner, is this system expected to use s2idle? I would think so and if that's the case entering s2idle should not consume much more power compared to S3. We saw some weird behavior with the Thomas' system that the CPUs did not enter low power states when he run https://github.com/intel/S0ixSelftestTool.
Comment 22 Thomas Witt 2023-09-05 13:30:49 UTC
Sure, I have the same Firmware versions: 1.07.09RTR1 and 1.07.08TR5
Comment 23 Werner Sembach [TUXEDO] 2023-09-08 17:42:38 UTC
We also could not reproduce the issue with deep-sleep
Comment 24 Mika Westerberg 2023-09-09 06:08:06 UTC
Hm, that's weird. The you both have the same system with same firmwares but different results? We can't really add that system to denylist because it defaults to s2idle and in order to save power and enter deeper C-states the L1SS needs to be saved/restore properly. So I'm kind of confused what we should do with this :(
Comment 25 Werner Sembach [TUXEDO] 2023-09-12 15:21:41 UTC
@Thomas Witt
Can you try a fresh TUXEDO OS install?
Which nvme are you using?
Comment 26 Thomas Witt 2023-10-05 15:57:33 UTC
(In reply to Werner Sembach [TUXEDO] from comment #25)
> @Thomas Witt
> Can you try a fresh TUXEDO OS install?
> Which nvme are you using?

I tried with TUXEDO OS in a live-system, but I cannot curently reimage the whole computer.

The NVME is a Samsung 970 EVO, I believe its original.
Comment 27 David Box 2023-10-24 20:27:30 UTC
Hi Thomas,

Following up from the conversation with Bjorn, we need to examine that state of you PCI devices before and after suspend in the deep and s2idle case. Steps are below. I've also added a command to enable ACPI debug to view the ACPI commands that are executed during suspend and resume. These commands require CONFIG_ACPI_DEBUG be set in your kernel. If that isn't set or the commands don't work, you can skip them. Also note the lspci commands have 3 v's and 4 x's. This enables a full decoding and dump of config space.

Reboot

1. sudo lspci -vvvxxxx > lspci_before_suspend.txt
2. echo 0x70 | sudo tee /sys/module/acpi/parameters/debug_layer
3. echo 0x8000C | sudo tee /sys/module/acpi/parameters/debug_level
5. Short suspend, mem_sleep=deep
6. dmesg > dmesg_log_s3 # May need to be root to do this.
7. sudo lspci -vvvxxxx > lspci_after_s3.txt

Reboot

1. echo 0x70 | sudo tee /sys/module/acpi/parameters/debug_layer
2. echo 0x8000C | sudo tee /sys/module/acpi/parameters/debug_level
3. do a short suspend, mem_sleep=s2idle
4. dmesg > dmesg_log_s2idle
5. sudo lspci -vvvxxxx > lspci_after_s2idle.txt

Attach files
lspci_before_suspend.txt
lspci_after_s3.txt
lspci_after_s2idle.txt
dmesg_log_s3
dmesg_log_s2idle
Comment 28 Thomas Witt 2023-11-05 13:31:38 UTC
Created attachment 305369 [details]
debug log for comment #27

I cannot get dmesg and lscpi after s3, the system does not provide enough valid IO options. I have a netconsole log, but I don't see a way to make lspci happen.
Comment 29 David Box 2023-11-06 22:11:52 UTC
Hi Thomas,

Thanks for the files.

For the netconsole output, it's unclear to me if this includes both suspend and resume output. Given the iwlwifi errors it looks like resume had started. How long did you suspend for?

Also, it looks like you sent the S3 files for a kernel with Mika's patch applied since it failed. Can you send them without the patch applied (the way you normally run the system) so we can see how it behaves differently in the working case?

1. echo 0x70 | sudo tee /sys/module/acpi/parameters/debug_layer
2. echo 0x8000C | sudo tee /sys/module/acpi/parameters/debug_level
3. Short suspend, mem_sleep=deep
4. dmesg > dmesg_log_s3
5. sudo lspci -vvvxxxx > lspci_after_s3.txt

Finally, can you attach a tarball of the output from "acpidump -b". Thanks.
Comment 30 Thomas Witt 2023-11-06 22:42:25 UTC
Created attachment 305374 [details]
debug log for comment #29

Sorry, I misunderstood. Here are all five files on my working kernel without Mika's patch.
Comment 31 Thomas Witt 2023-11-06 22:43:01 UTC
Created attachment 305375 [details]
acpidump -b
Comment 32 David Box 2023-11-08 01:05:24 UTC
Hi Thomas,

Thanks for the additional files. Before suspend lspci shows L1 substates are disabled, but after resume they are enabled which is strange. Your working kernel wouldn't be touching these registers without Mika's patch. So it could BIOS changing it or maybe a daemon that's controlling PM policy. The kernel log does show that your system allows the OS to control ASPM policy. Please run the following to check it.

      cat /sys/module/pcie_aspm/parameters/policy

Please check before (on first boot) and after suspend with your working kernel. If it's anything other than default, you likely have a daemon that's changing it. Either way, it indicates that feature can be set without hosing your PCI devices.

If your system starts with the capability disabled, you can try to enable it explicitly by writing 'powersupersave' to the policy file. Try this with Mika's kernel to see if it works. That is, set powersupersave before suspending and see if you can come back.

David
Comment 33 Thomas Witt 2023-11-08 17:41:39 UTC
(In reply to David Box from comment #32)
> Hi Thomas,
> 
> Thanks for the additional files. Before suspend lspci shows L1 substates are
> disabled, but after resume they are enabled which is strange. Your working
> kernel wouldn't be touching these registers without Mika's patch. So it
> could BIOS changing it or maybe a daemon that's controlling PM policy. The
> kernel log does show that your system allows the OS to control ASPM policy.
> Please run the following to check it.
> 
>       cat /sys/module/pcie_aspm/parameters/policy
> 
> Please check before (on first boot) and after suspend with your working
> kernel. If it's anything other than default, you likely have a daemon that's
> changing it. Either way, it indicates that feature can be set without hosing
> your PCI devices.
> 
> If your system starts with the capability disabled, you can try to enable it
> explicitly by writing 'powersupersave' to the policy file. Try this with
> Mika's kernel to see if it works. That is, set powersupersave before
> suspending and see if you can come back.
> 
> David

That looks like it triggers the issue. It is set to "powersave" right after boot and stays there after a suspend/resume cycle. I believe that setting comes from this .config snippet:

CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEFAULT is not set
CONFIG_PCIEASPM_POWERSAVE=y
# CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set

I don't remember why I configured this, the system cycles to suspend/resume flawlessly with Mika's patch when I set the policy to either "default", "performance" or "powersupersave" only "performance" leads to the NVME not coming back up.

Which is probably why Werner could not reproduce my issue, if he kept Tuxedo's default .config, he would have used "default".

Thomas
Comment 34 David Box 2023-11-09 00:18:50 UTC
Did you mean only "powersave" leads to the NVME not coming back?

I wasn't able to reproduce the failure you saw but on one of my systems I was able to see L1SS enabled after suspend with policy set to "powersave". This looks like a bug as this should happen only with "powersupersave".
Comment 35 David Box 2023-11-10 16:45:25 UTC
Created attachment 305395 [details]
Reconfigure ASPM after device enable
Comment 36 David Box 2023-11-10 16:54:55 UTC
Hi Thomas,

There's at least one issue I can see here that's not specific to your system. After S3 suspend it looks like BIOS is configuring L1 substates back to preboot settings. But the OS is unaware of this because it doesn't not reconfigure L1SS if the ASPM policy hasn't changed. I can see this on my system as well. Suspend with powersave, L1SS disabled, resume with powersave, L1SS enabled. This is a bug and I've attached a proposed patch to fix it. I'm hoping it can resolve your issue too. Please apply both Mika's patch and the Reconfigure ASPM patch and try again with powersave to see if your system can come back.
Comment 37 Thomas Witt 2023-11-12 10:20:23 UTC
(In reply to David Box from comment #36)
> Hi Thomas,
> 
> There's at least one issue I can see here that's not specific to your
> system. After S3 suspend it looks like BIOS is configuring L1 substates back
> to preboot settings. But the OS is unaware of this because it doesn't not
> reconfigure L1SS if the ASPM policy hasn't changed. I can see this on my
> system as well. Suspend with powersave, L1SS disabled, resume with
> powersave, L1SS enabled. This is a bug and I've attached a proposed patch to
> fix it. I'm hoping it can resolve your issue too. Please apply both Mika's
> patch and the Reconfigure ASPM patch and try again with powersave to see if
> your system can come back.

Yes, works. With Mika's patch (removed the whitelist) and your patch, my system comes back from suspend even with 

- /sys/power/mem_sleep [deep]
- /sys/module/pcie_aspm/parameters/policy [powersave]

set.
Comment 38 Thomas Witt 2023-11-12 10:56:40 UTC
(In reply to David Box from comment #36)
> Hi Thomas,
> 
> There's at least one issue I can see here that's not specific to your
> system. After S3 suspend it looks like BIOS is configuring L1 substates back
> to preboot settings. But the OS is unaware of this because it doesn't not
> reconfigure L1SS if the ASPM policy hasn't changed. I can see this on my
> system as well. Suspend with powersave, L1SS disabled, resume with
> powersave, L1SS enabled. This is a bug and I've attached a proposed patch to
> fix it. I'm hoping it can resolve your issue too. Please apply both Mika's
> patch and the Reconfigure ASPM patch and try again with powersave to see if
> your system can come back.

Please disregard comment #37, I had the wrong kernel booted.
The patch from comment #36 does not change the behaviour.
Comment 39 David Box 2023-11-13 17:30:15 UTC
Okay. Can you try only the patch from comment #35?
Comment 40 Thomas Witt 2023-12-08 14:24:39 UTC
(In reply to David Box from comment #39)
> Okay. Can you try only the patch from comment #35?

Yes, the system can suspend and resume with the patch and /sys/module/pcie_aspm/parameters/policy set to powersave
Comment 41 David Box 2023-12-14 16:47:40 UTC
Thanks. I will add your Tested-by to the patch.
Comment 42 David Box 2023-12-15 20:42:07 UTC
Created attachment 305612 [details]
Disable L1SS and L1 before restore
Comment 43 David Box 2023-12-15 20:43:15 UTC
Created attachment 305613 [details]
Save and restore on POLICY_DEFAULT only
Comment 44 David Box 2023-12-15 20:45:21 UTC
Hi Thomas,

Thanks a lot for your help. I have another request if you can find the time. It's possible that your system didn't work because L1SS was enabled by BIOS after suspend and the restore patch didn't disable it first. So I've got two patches to test that theory. They need to be applied and tested separately. If either works, then they are a preferred solution over the last patch you tested.
Comment 45 David Box 2023-12-21 00:46:54 UTC
Created attachment 305639 [details]
Add back save and restore. Disable child before parent.

Here's the proposed final fix I'm sending to the mailing list. Because it differs from patches that you guys have tested before (in order to fix the issue discovered by Thomas) would appreciate if you can test again. Thanks.

Note You need to log in before you can comment on or make changes to this bug.