Bug 216773
Summary: | PM: resume devices took 60.044 seconds || Dell Inspiron 14 5425 | ||
---|---|---|---|
Product: | Power Management | Reporter: | David Alvarez Lombardi (dqalombardi) |
Component: | Hibernation/Suspend | Assignee: | Mario Limonciello (AMD) (mario.limonciello) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | dqalombardi, elvis.angelaccio, hurricanepootis, jwrdegoede, mario.limonciello, mayurchoksione, victor.bonnelle |
Priority: | P1 | ||
Hardware: | AMD | ||
OS: | Linux | ||
Kernel Version: | 6.0.10-300.fc37.x86_64 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Output of `dmidecode` on Inspiron 14 5425
Output of `acpidump` on Inspiron 14 5425
Patch v1
Patch v2
acpidump+dmesg+dmidecode+lspci on HP Elitebook 645 G9
amd_s2idle.py report on HP Elitebook 645 G9
amd_s2idle.py report on HP Elitebook 645 G9 (with v2 patch)
sleepstudy report on HP Elitebook 645 G9
Module parameter for nvme v1
DSDT extracted from acpidump on Windows 11 (HP Elitebook 645 G9) |
Description
David Alvarez Lombardi
2022-12-04 18:43:17 UTC
Created attachment 303355 [details]
Output of `dmidecode` on Inspiron 14 5425
Kernel: 6.0.10-300.fc37.x86_64
Created attachment 303356 [details]
Output of `acpidump` on Inspiron 14 5425
Kernel: 6.0.10-300.fc37.x86_64
Thanks, looks the same to me indeed. Please have a try with the attached patch.

Created attachment 303363 [details]
Patch v1
How would I go about trying out the patch? (Newbie here.) Would I have to install some patched version of the kernel? Wouldn't that necessarily entail upgrading to kernel 6.1 since the patch is against mainline? Couldn't that cause some incompatibilities in my system? I'm nearing finals week in my masters program so I can't afford to brick my computer if I mess something up which, knowing myself, I will. When could I count on this fix making it downstream to a Fedora release? As mentioned in above comments, I'm using 6.0.10-300.fc37.x86_64.

David, I have started a Fedora test-kernel build with Mario's patch from comment 4 added here:
https://koji.fedoraproject.org/koji/taskinfo?taskID=95052279
This is still building at the moment, it will be done in a couple of hours.
Here are some generic instructions for installing a kernel directly from koji (Fedora's buildsystem):
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt
Installing the kernel from rpms is pretty safe, so there is no need to worry about "bricking" your computer. If for some reason the test build gives problems (which I don't expect) then you can always select the previous kernel at the grub boot menu. See here for how to get the grub menu to show at boot if necessary:
https://hansdegoede.dreamwidth.org/19180.html
Regards,
Hans

Note the kernel build is done now. If you don't have time to test right away please at least download the rpms from:
https://koji.fedoraproject.org/koji/taskinfo?taskID=95052279
koji will remove the rpms for test builds after about a week to reclaim the disk space.

I finally have time to try this patch out. I downloaded the following files back when the build finished.
- kernel-core-6.0.11-300.bko216773.fc37.x86_64.rpm
- kernel-modules-6.0.11-300.bko216773.fc37.x86_64.rpm
Before I give it a try, I have a few questions.
1. How can I be sure that the kernel version I am using now will be available for me to revert to?
2. Going forward, can I still update my system in the same way (dnf update)?
3. When this patch gets merged and comes to a Fedora release, can I move back to the normal Fedora kernel? How will I do that?
Thank you.

(In reply to David Alvarez Lombardi from comment #8)
> I finally have time to try this patch out. I downloaded the following files
> back when the build finished.
>
> - kernel-core-6.0.11-300.bko216773.fc37.x86_64.rpm
> - kernel-modules-6.0.11-300.bko216773.fc37.x86_64.rpm
>
> Before I give it a try, I have a few questions.
>
> 1. How can I be sure that the kernel version I am using now will be
> available for me to revert to ?
Fedora always keeps the last 3 kernels installed. You can select one of the 2 other kernels in the grub menu. For how to get the grub menu see:
https://hansdegoede.dreamwidth.org/19180.html
> 2. Going forward, can I still update my system in the same way (dnf update)?
Yes.
> 3. When this patch gets merged and comes to a fedora release, can I move
> back to the normal fedora kernel ? How will I do that ?
This kernel has a lower version than future kernels will have. dnf will automatically install newer kernels, and once this is not one of the last 3 kernels it will remove this kernel (unless you are running this kernel, then it will remove the first older kernel).

Did you ever get a chance to test this?

I have the same laptop, and I tested the changes. It fixes the issue! Any clue on which release will include it?

We've been waiting for testing results to submit it into the kernel, I'll submit something next week.

Great! Thanks for your work

Sent up this patch for review:
https://lore.kernel.org/linux-acpi/20230213213537.6121-1-mario.limonciello@amd.com/T/#t

Created attachment 303729 [details]
Patch v2
The list is growing too big, after 6.3-rc1 will submit this patch instead.
Great! This one works as well

(In reply to Mario Limonciello (AMD) from comment #15)
> Created attachment 303729 [details]
> Patch v2
>
> The list is growing too big, after 6.3-rc1 will submit this patch instead.
Hi, would this patch also work on "Barcelo" CPUs? I have an HP laptop with Ryzen 7 PRO 5875U (which should be Barcelo and not Cezanne) and I think I'm also affected by this bug.

Yes it should, but this has not been seen outside Dell. Can we please see an acpidump and kernel log?

Created attachment 303760 [details]
acpidump+dmesg+dmidecode+lspci on HP Elitebook 645 G9
Sure, in this zip you can find:
- acpidump
- dmesg output *after* the 60 seconds freeze on resume
- dmidecode
- lspci -v
The laptop is an HP Elitebook 645 G9. Linux is installed on the main M2 slot which has a Samsung 970 Evo Plus 1TB. I have a somewhat "peculiar" setup as I have also added a Western Digital SN520 in the M2 2242 WWAN slot of the laptop and I had to add `nvme_core.default_ps_max_latency_us=5500` as kernel parameter on boot, otherwise the kernel would not see the SSD at all. I don't know if this could be related.
Another thing I noticed: this should be the Samsung SSD:
$ cat /sys/bus/pci/drivers/nvme/0000\:04\:00.0/firmware_node/path
\_SB_.PCI0.GP24.NVME
while this should be the WD SSD:
$ cat /sys/bus/pci/drivers/nvme/0000\:05\:00.0/firmware_node
cat: '/sys/bus/pci/drivers/nvme/0000:05:00.0/firmware_node': No such file or directory
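
The manual check above can be repeated for every NVMe controller at once. The following is only a minimal sketch, assuming the sysfs layout shown in the two `cat` commands; the function name `nvme_acpi_nodes` and the `sysfs_root` parameter are hypothetical and exist only so the logic can be exercised against a fake directory tree.

```python
from pathlib import Path


def nvme_acpi_nodes(sysfs_root="/sys"):
    """Map each PCI device bound to the nvme driver to its ACPI path, or None.

    A device without a firmware_node entry has no ACPI companion node,
    which is what the second `cat` command above ran into.
    """
    driver_dir = Path(sysfs_root) / "bus" / "pci" / "drivers" / "nvme"
    result = {}
    if not driver_dir.is_dir():
        return result
    for entry in sorted(driver_dir.iterdir()):
        # Bound devices are named like 0000:04:00.0; skip control files
        # such as bind, unbind and new_id.
        if entry.name.count(":") != 2:
            continue
        path_file = entry / "firmware_node" / "path"
        result[entry.name] = (
            path_file.read_text().strip() if path_file.is_file() else None
        )
    return result


if __name__ == "__main__":
    for address, acpi_path in nvme_acpi_nodes().items():
        print(address, acpi_path or "no ACPI companion node")
```

On the setup described above this would be expected to list an ACPI path for 0000:04:00.0 and none for 0000:05:00.0, but it has not been run on that machine.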
> Western Digital SN520 in the M2 2242 WWAN slot of the laptop and I had to add
> `nvme_core.default_ps_max_latency_us=5500` as kernel parameter on boot,
> otherwise the kernel would not see the SSD at all. I don't know if this could
> be related.
I think that added SSD is definitely the reason for this. More below.
> Another thing I noticed: this should be the Samsung SSD:
> $ cat /sys/bus/pci/drivers/nvme/0000\:04\:00.0/firmware_node/path
> \_SB_.PCI0.GP24.NVME
> while this should be the WD SSD:
> $ cat /sys/bus/pci/drivers/nvme/0000\:05\:00.0/firmware_node
> cat: '/sys/bus/pci/drivers/nvme/0000:05:00.0/firmware_node': No such file or
> directory
If you look at the DSDT in your acpidump you can see there is only one node that describes an SSD (GP24.NVME). This node has the `StorageD3Enable` property set and so you can see in your kernel log the matching line:
[ 0.676597] nvme 0000:04:00.0: platform quirk: setting simple suspend
Meanwhile if you look at lines related to your other SSD:
[ 0.676695] nvme nvme1: pci function 0000:05:00.0
[ 489.166438] nvme 0000:05:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -16
[ 489.166457] nvme 0000:05:00.0: PM: failed to resume async: error -16
That is, the added SSD doesn't have anything in the BIOS to describe what properties to apply to it. The system vendor probably hadn't actually tested a second SSD being added to that slot and so they didn't add anything like this.
The good news however is the new v2 patch that I had proposed should actually help your configuration. It will apply that property to all SSDs in the system whether they have that property assigned or not.
I have a few asks from you:
1) Can you please confirm the AMD s2idle testing script https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py doesn't catch your case when you test it? I don't think it will and I'd like to extend it for your case if so.
2) Can you try the v2 patch and see if it helps? If it does, can you please get me a full dmesg for it working?
3) I'll follow up with changes to that debugging script if 1 didn't catch it and 2 does fix it.

(In reply to Mario Limonciello (AMD) from comment #20)
> The system vendor probably hadn't actually
> tested a second SSD being added to that slot and so they didn't add anything
> like this.
That would be surprising, since HP itself advertises in the laptop specs the support for a second SSD in the WWAN slot.
>
> The good news however is the new v2 patch that I had proposed should
> actually help your configuration. It will apply that property to all SSDs
> in the system whether they have that property assigned or not.
>
Awesome!
> I have a few asks from you:
Sure, I'll report back in a few days. Thanks!

> Sure, I'll report back in a few days. Thanks!
OK.
> 3) I'll follow up with changes to that debugging script if 1 didn't catch it
> and 2 does fix it.
I've modified the script already for what I think should cover your case, we'll see when you test it.
> That would be surprising, since HP itself advertises in the laptop specs the
> support for a second SSD in the WWAN slot.
In this case, I'd like to also see a sleep study report performed from Windows that includes a suspend cycle of at least 15 minutes. I want to see if the system entered into a hardware sleep state or not.
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-sleepstudy

Hi, I have a Dell Inspiron 5425 and have the same issue, i.e. resuming after suspend takes 60-odd seconds. I understand I need to patch the kernel using the following rpms:
- kernel-core-6.0.11-300.bko216773.fc37.x86_64.rpm
- kernel-modules-6.0.11-300.bko216773.fc37.x86_64.rpm
I am not sure where I can locate these files. Your help is much appreciated.

Created attachment 303786 [details]
amd_s2idle.py report on HP Elitebook 645 G9
@Mario: here's the report generated by `sudo python amd_s2idle.py` accepting the default arguments.
(In reply to Mario Limonciello (AMD) from comment #22)
> In this case, I'd like to also see a sleep study report performed from
> Windows that includes a suspend cycle of at least 15 minutes. I want to see
> if the system entered into a hardware sleep state or not.
>
> https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-sleepstudy
I can try. Can I keep the 2nd SSD formatted with Ext4, or do I need to format it with NTFS for this test?

> I can try. Can I keep the 2nd SSD formatted with Ext4, or do I need to format
> it with NTFS for this test?
You can keep it as is.
> @Mario: here's the report generated by `sudo python amd_s2idle.py` accepting
> the default arguments.
Is this a patched kernel? I would guess so - Both nvme are configured for s2idle but you're still having some major problems with sleep residency and gpio 18 is active.

(In reply to Mario Limonciello (AMD) from comment #26)
> > I can try. Can I keep the 2nd SSD formatted with Ext4, or do I need to
> > format it with NTFS for this test?
>
> You can keep it as is.
>
> > @Mario: here's the report generated by `sudo python amd_s2idle.py`
> > accepting the default arguments.
>
> Is this a patched kernel? I would guess so - Both nvme are configured for
> s2idle but you're still having some major problems with sleep residency and
> gpio 18 is active.
No, I ran the script with the unpatched 6.1.12 kernel from the archlinux package. I'm building right now the 6.1.12 kernel with your v2 patch applied.

Created attachment 303787 [details]
amd_s2idle.py report on HP Elitebook 645 G9 (with v2 patch)
@Mario: the v2 patch seems to work also on my laptop :)
Resume from sleep is now instant, no more 60 seconds freeze.
Here's the report from amd_s2idle.py ran on patched 6.1.12 kernel.
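
For anyone comparing a saved kernel log from before and after the patch, the resume failure quoted earlier in this thread (`PM: failed to resume async: error -16`) can be searched for mechanically. This is only a sketch: the regular expression is modelled on the two log lines quoted in this bug and may not cover other variants of the message, and the name `resume_failures` is hypothetical. Error 16 decodes to EBUSY via Python's standard `errno` table.

```python
import errno
import re

# Modelled on the line quoted in this bug, for example:
#   nvme 0000:05:00.0: PM: failed to resume async: error -16
RESUME_FAILURE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PM: failed to resume(?: async)?: error (?P<code>-\d+)"
)


def resume_failures(log_text):
    """Return (driver, PCI address, errno name) for each failed resume."""
    failures = []
    for match in RESUME_FAILURE.finditer(log_text):
        code = -int(match.group("code"))
        name = errno.errorcode.get(code, "unknown")
        failures.append((match.group("driver"), match.group("device"), name))
    return failures


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python this_script.py
    for driver, device, name in resume_failures(sys.stdin.read()):
        print(f"{driver} {device} failed to resume: {name}")
```

An empty result on the patched kernel would be consistent with the instant resume reported above, though it does not by itself prove that the system reached a hardware sleep state.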
> No I ran the script with unpatched 6.1.12 kernel from the archlinux package.
> I'm building right now the 6.1.12 kernel with your v2 patch applied.
OK, the script seems to have detected the broken case wrong then. If you don't mind I'd like to get this fixed so future people can rely upon it too.
I've pushed a change, can you refresh to the new version and see if it helps detect your broken case?
> @Mario: the v2 patch seems to work also on my laptop :)
> Resume from sleep is now instant, no more 60 seconds freeze.
> Here's the report from amd_s2idle.py ran on patched 6.1.12 kernel.
Looks great! So we have a confirmed root cause then for you too. I'll adjust the changelog to cover your case and will be sending this patch out shortly after 6.3-rc1.

(In reply to Mario Limonciello (AMD) from comment #29)
> > No I ran the script with unpatched 6.1.12 kernel from the archlinux
> > package.
> > I'm building right now the 6.1.12 kernel with your v2 patch applied.
>
> OK the script seems to have detected the broken case wrong then. If you
> don't mind I'd like to get this fixed so future people can rely upon it too.
>
> I've pushed a change, can refresh to the new version and see see if it helps
> detect your broken case?
Yep, the updated script now detects the broken case on the unpatched kernel! Excerpt of the output:
NVME Sandisk Corp PC SN520 NVMe SSD is not configured for s2idle in BIOS
Your system does not meet s2idle prerequisites!
S0i3 failures reported on your system
Sandisk Corp PC SN520 NVMe SSD missing ACPI attributes

Created attachment 303788 [details]
sleepstudy report on HP Elitebook 645 G9
@Mario: here's the report generated by powercfg.exe /SleepStudy after a ~40 minutes sleep.
Note: Windows is installed on another NVMe SSD (the OEM one), which I had to swap with the Samsung 970 SSD (where I installed Linux) for doing this test.
Let me know if you need something else from Windows.
> Yep, the updated script now detects the broken case on the unpatched kernel!
Great, thanks!
> @Mario: here's the report generated by powercfg.exe /SleepStudy after a ~40
> minutes sleep.
So on Windows somehow it knows to use D3 for both SSDs even though the BIOS is only set for one of them. AFAICT there is a separate OS configuration key. Perhaps this is set and arranging such a policy.
https://learn.microsoft.com/en-us/windows/configuration/wcd/wcd-storaged3inmodernstandby
> Let me know if you need something else from Windows.
If you're able to figure out whether such a setting is configured it might be indicative of how we should handle this for Linux going forward. If Windows does really offer a knob for this, we may want to do the same in Linux as well. Otherwise we might be doing the same patch for Rembrandt, and then another for Mendocino, etc.

Created attachment 303796 [details]
Module parameter for nvme v1
Attached is a patch that should introduce a module parameter for the NVME module. I'm still planning on adding the CPU ID after 6.3-rc1, but can you please see if this patch instead with nvme.simple_suspend=1 on kernel command line also helps?
We can upstream both of them and then offer more options in the future for users to (hopefully) match Windows behavior.
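
Before testing, it can help to confirm that the parameter actually reached the kernel command line. The following is a minimal sketch that parses a command line string such as the contents of `/proc/cmdline`; the helper name `module_param` is hypothetical, and `nvme.simple_suspend` only exists on kernels built with the patch attached here. The split is naive and does not handle quoted values containing spaces.

```python
def module_param(cmdline, module, param):
    """Return the value given for module.param on a kernel command line.

    Returns None when the parameter is absent and "" when it is present
    without a value. When it appears more than once, the last one is kept.
    """
    # The kernel treats '-' and '_' in parameter names as equivalent.
    wanted = f"{module}.{param}".replace("-", "_")
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key.replace("-", "_") == wanted:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        current = f.read()
    print("nvme.simple_suspend =", module_param(current, "nvme", "simple_suspend"))
```

The same helper can be used to check the `nvme_core.default_ps_max_latency_us=5500` setting mentioned earlier in this thread.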
(In reply to Mario Limonciello (AMD) from comment #32)
> > Yep, the updated script now detects the broken case on the unpatched
> > kernel!
>
> Great, thanks!
>
> > @Mario: here's the report generated by powercfg.exe /SleepStudy after a ~40
> > minutes sleep.
>
> So on Windows somehow it knows to use D3 for both SSD's even though the BIOS
> is only set for one of them. AFAICT there is a separate OS configuration
> key.
> Perhaps this is set and arranging such a policy.
>
> https://learn.microsoft.com/en-us/windows/configuration/wcd/wcd-storaged3inmodernstandby
>
> > Let me know if you need something else from Windows.
>
> If you're able to figure out whether such a setting is configured it might
> be indicative of how we should handle this for Linux going forward. If
> Windows does really offer a knob for this, we may want to do the same in
> Linux as well. Otherwise we might be doing the same patch for Rembrandt,
> and then another for Mendocino, etc.
I've looked into this and found interesting things. The StorageD3InModernStandby registry key was NOT set on my system. According to the MS docs at [1], "if the registry key is not configured, then Storport will check the _DSD configuration to determine whether to enable D3. If the _DSD is not implemented, then Storport will check whether the platform is on the allowlist for D3 support."
So, the next thing I did was to get an acpidump from Windows to compare the dsdt.dsl file with the one extracted from Linux. I was expecting to get the same file, but to my surprise the dsdt.dsl extracted on Windows is different! In particular, the StorageD3Enable property occurs twice, while in the Linux file only once...
I'm just guessing here, I don't know the acpi "language" and I can't say this is the reason why Windows enables D3, but it looks strongly related... Is it even possible that the DSDT is different depending on the OS?
Anyway, I'll attach the Windows dsdt.dsl file just for the record.
[1]: https://learn.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro

Created attachment 303803 [details]
DSDT extracted from acpidump on Windows 11 (HP Elitebook 645 G9)
> So, the next thing I did was to get an acpidump from Windows to compare the
> dsdt.dsl file with the one extracted from Linux. I was expecting to get the
> same file, but to my surprise the dsdt.dsl extracted on Windows is different!
Off the cuff that seems like a surprising result but I think it's actually a red herring. The DSDT shows each of them is conditionally loaded depending upon "how new" the Windows version is.
As the registry key isn't set in Windows but the properties are applied to both disks I have to wonder if perhaps on Windows it's a "global setting". That is when one disk uses D3 the OS assumes all disks should.
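
For reference, the lookup order described in the Microsoft documentation quoted earlier in this thread (registry key first, then `_DSD`, then the platform allowlist) can be written down as a small decision function. This is purely a model of that one quoted sentence, not of Storport's actual implementation; the function and parameter names are hypothetical, and it does not capture the "global setting" guess above.

```python
from typing import Optional


def storport_d3_enabled(
    registry_key: Optional[bool],
    dsd_storage_d3_enable: Optional[bool],
    on_allowlist: bool,
) -> bool:
    """Model the documented precedence for enabling D3 on a storage device.

    None means "not configured" (registry key) or "not implemented" (_DSD).
    """
    if registry_key is not None:
        # An explicit StorageD3InModernStandby setting is consulted first.
        return registry_key
    if dsd_storage_d3_enable is not None:
        # Otherwise the StorageD3Enable value from the device's _DSD applies.
        return dsd_storage_d3_enable
    # With neither present, only the platform allowlist remains.
    return on_allowlist
```

Under this model, a disk with no registry key and no `_DSD` entry, like the added SSD in the Linux DSDT, would get D3 only through the allowlist, which is why the behavior observed on Windows needs some other explanation.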
Ok, I see. Anyway, I'll try your new patch with the simple_suspend argument. Do I need to apply both the previous v2 patch and the simple_suspend one? Or just the latter?

Just this one.

Hi,
How do I apply the patch? I am running Ubuntu 22.04. Downloaded the patch (Patch V2). Then executed the following command:
patch < ~/Downloads/0001-ACPI-x86-Add-Cezanne-to-the-list-for-forcing-Storage.patch

> How do I apply the patch?
You would need to obtain the kernel sources, apply the patch to them and compile and install your own kernel binary.
I just applied patch v2 to linux 6.2.1. I am running Arch Linux on a Dell Inspiron 14 5425 with an AMD Ryzen 5 5625U. I was able to suspend and resume without the 60 second delay. I give this patch a beautiful thumbs up 👍.

@hurricanepootis@protonmail.com
Thanks! Would you please also test the NVME version of the patch with the module parameter nvme.simple_suspend=1 on your kernel command line and without the v2 patch applied?

I can confirm that the nvme.simple_suspend patch also works.

OK thanks! I'll send out both then.

(In reply to Mario Limonciello (AMD) from comment #42)
> @hurricanepootis@protonmail.com
>
> Thanks! Would you please also test the NVME version of the patch with the
> module parameter nvme.simple_suspend=1 on your kernel command line and
> without the v2 patch applied?
I would, but it doesn't work for linux 6.2.1, and the building process I use (since I am not a kernel developer, just an end user) only allows for building stable point releases. Sorry.

No worry, Elvis tested it so I sent it out.

Hi,
Is there a view on the timeline of when this fix will be merged to a mainline kernel version? Ball-park timeline is good for me.

The workaround is queued up for 6.3, so one of the RCs.

Hi Mario,
Can you please suggest if there is a certain timeframe in which the fix will be merged into one of the mainline versions (6.3)? I have Windows working fine but prefer Linux... Hence if it's going to take a few days/weeks I can live with the suspend issue... but if there is no release date in sight for version 6.3, I may have to, for now, go back to Windows.

Why not build it yourself and use it until 6.3 is released?

The Cezanne workaround for this BIOS bug is part of 6.3-rc1.
https://github.com/torvalds/linux/commit/e2a56364485e7789e7b8f342637c7f3a219f7ede
The NVME module parameter is still under discussion, but I'll close this issue as we at least have a solution for these Dell Cezanne systems and the HP 2 disk issue.

Fix verified with Linux Kernel 6.3-rc3. Suspend working as expected. No need to go back to Windows now. Will also take a Timeshift backup so that if any future updates/patches break anything I can go back to the working state using Timeshift Restore. My laptop version: Dell Inspiron 14 5425, with AMD Ryzen 7 5825 |