Bug 216773 - PM: resume devices took 60.044 seconds || Dell Inspiron 14 5425
Summary: PM: resume devices took 60.044 seconds || Dell Inspiron 14 5425
Status: RESOLVED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: AMD Linux
Importance: P1 normal
Assignee: Mario Limonciello (AMD)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-12-04 18:43 UTC by David Alvarez Lombardi
Modified: 2023-03-19 22:56 UTC

See Also:
Kernel Version: 6.0.10-300.fc37.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Output of `dmidecode` on Inspiron 14 5425 (28.01 KB, text/plain)
2022-12-04 18:44 UTC, David Alvarez Lombardi
Output of `acpidump` on Inspiron 14 5425 (974.64 KB, text/plain)
2022-12-04 18:46 UTC, David Alvarez Lombardi
Patch v1 (1.65 KB, patch)
2022-12-05 21:13 UTC, Mario Limonciello (AMD)
Patch v2 (3.00 KB, patch)
2023-02-14 21:56 UTC, Mario Limonciello (AMD)
acpidump+dmesg+dmidecode+lspci on HP Elitebook 645 G9 (312.99 KB, application/zip)
2023-02-19 11:12 UTC, Elvis Angelaccio
amd_s2idle.py report on HP Elitebook 645 G9 (280.93 KB, text/plain)
2023-02-25 16:27 UTC, Elvis Angelaccio
amd_s2idle.py report on HP Elitebook 645 G9 (with v2 patch) (87.45 KB, text/plain)
2023-02-25 17:50 UTC, Elvis Angelaccio
sleepstudy report on HP Elitebook 645 G9 (506.63 KB, text/html)
2023-02-26 13:45 UTC, Elvis Angelaccio
Module parameter for nvme v1 (3.16 KB, application/mbox)
2023-02-27 16:43 UTC, Mario Limonciello (AMD)
DSDT extracted from acpidump on Windows 11 (HP Elitebook 645 G9) (618.20 KB, text/x-csrc)
2023-02-27 22:55 UTC, Elvis Angelaccio

Description David Alvarez Lombardi 2022-12-04 18:43:17 UTC
My device (Inspiron 14 5425, Ryzen 7 5825U, Radeon Graphics) consistently takes over a minute to wake from suspension.

I believe this is the same bug as was fixed in the following issue, but I am not 100% sure. There are just many similarities in the hardware in use and the logs shared.

https://bugzilla.kernel.org/show_bug.cgi?id=216440

I would send output from `inxi` or another command to give some context, but I don't want to overload with logs and command output, so for now I will just include a crucial extract from `dmesg` along with the output of `dmidecode` and `acpidump` as was requested on the above-mentioned ticket. Please let me know what other info I can include to be of more help.

Relevant `dmesg` output extract:

[Sun Dec  4 17:27:14 2022] nvme nvme0: I/O 477 (I/O Cmd) QID 5 timeout, aborting
[Sun Dec  4 17:27:14 2022] nvme nvme0: I/O 478 (I/O Cmd) QID 5 timeout, aborting
[Sun Dec  4 17:27:14 2022] nvme nvme0: I/O 479 (I/O Cmd) QID 5 timeout, aborting
[Sun Dec  4 17:27:14 2022] nvme nvme0: I/O 480 (I/O Cmd) QID 5 timeout, aborting
[Sun Dec  4 17:27:35 2022] nvme nvme0: I/O 27 QID 0 timeout, reset controller
[Sun Dec  4 17:27:35 2022] nvme nvme0: Abort status: 0x371
[Sun Dec  4 17:27:35 2022] nvme nvme0: Abort status: 0x371
[Sun Dec  4 17:27:35 2022] nvme nvme0: Abort status: 0x371
[Sun Dec  4 17:27:35 2022] nvme nvme0: Abort status: 0x371
[Sun Dec  4 17:27:35 2022] nvme 0000:02:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16
[Sun Dec  4 17:27:35 2022] nvme 0000:02:00.0: PM: failed to resume async: error -16
[Sun Dec  4 17:27:35 2022] PM: resume devices took 60.044 seconds
[Sun Dec  4 17:27:35 2022] ------------[ cut here ]------------
[Sun Dec  4 17:27:35 2022] Component: resume devices, time: 60044
[Sun Dec  4 17:27:35 2022] WARNING: CPU: 12 PID: 86400 at kernel/power/suspend_test.c:53 suspend_test_finish+0x70/0x80
[Sun Dec  4 17:27:35 2022] Modules linked in: tls uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc vfat fat snd_soc_dmic snd_acp3x_rn snd_acp3x_pdm_dma mt7921e intel_rapl_msr snd_sof_amd_renoir intel_rapl_common snd_ctl_led snd_sof_amd_acp mt7921_common snd_sof_pci snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_sof mt76_connac_lib snd_hda_intel snd_sof_utils snd_intel_dspcfg mt76 dell_laptop snd_intel_sdw_acpi dell_smm_hwmon snd_soc_core uvcvideo btusb snd_hda_codec btrtl videobuf2_vmalloc videobuf2_memops snd_compress mac80211 edac_mce_amd btbcm videobuf2_v4l2 snd_hda_core ac97_bus snd_pcm_dmaengine videobuf2_common btintel snd_rpl_pci_acp6x snd_hwdep kvm_amd snd_pci_acp6x libarc4 snd_seq btmtk kvm videodev
[Sun Dec  4 17:27:35 2022]  snd_seq_device dell_wmi irqbypass bluetooth snd_pci_acp5x snd_pcm cfg80211 dell_smbios rapl mc joydev snd_rn_pci_acp3x snd_timer dcdbas snd_acp_config ledtrig_audio snd sparse_keymap dell_wmi_descriptor pcspkr wmi_bmof snd_soc_acpi dell_rbtn soundcore snd_pci_acp3x i2c_piix4 k10temp rfkill acpi_tad amd_pmc zram amdgpu drm_ttm_helper ttm iommu_v2 nvme gpu_sched drm_buddy nvme_core drm_display_helper ucsi_acpi hid_multitouch crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel ccp typec_ucsi serio_raw cec sp5100_tco typec nvme_common wmi video i2c_hid_acpi i2c_hid ip6_tables ip_tables fuse
[Sun Dec  4 17:27:35 2022] CPU: 12 PID: 86400 Comm: systemd-sleep Not tainted 6.0.10-300.fc37.x86_64 #1
[Sun Dec  4 17:27:35 2022] Hardware name: Dell Inc. Inspiron 14 5425/0J9C2M, BIOS 1.5.0 09/13/2022
[Sun Dec  4 17:27:35 2022] RIP: 0010:suspend_test_finish+0x70/0x80
[Sun Dec  4 17:27:35 2022] Code: 03 00 00 29 c1 e8 0e 4f bf 00 81 fb 10 27 00 00 77 07 5b 5d c3 cc cc cc cc 89 da 48 89 ee 48 c7 c7 eb 16 75 98 e8 18 ed be 00 <0f> 0b 5b 5d c3 cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 0f
[Sun Dec  4 17:27:35 2022] RSP: 0018:ffff9ec554d0fd68 EFLAGS: 00010286
[Sun Dec  4 17:27:35 2022] RAX: 0000000000000026 RBX: 000000000000ea8c RCX: 0000000000000000
[Sun Dec  4 17:27:35 2022] RDX: 0000000000000001 RSI: ffffffff987b08f2 RDI: 00000000ffffffff
[Sun Dec  4 17:27:35 2022] RBP: ffffffff9875160b R08: ffffffff990662e0 R09: 0000000000000004
[Sun Dec  4 17:27:35 2022] R10: ffffffffffffffff R11: ffffffff99c04bfe R12: ffffffff987515c6
[Sun Dec  4 17:27:35 2022] R13: ffff89324131d450 R14: 0000000000000004 R15: ffff893246551b20
[Sun Dec  4 17:27:35 2022] FS:  00007fb523029b40(0000) GS:ffff89353e900000(0000) knlGS:0000000000000000
[Sun Dec  4 17:27:35 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Dec  4 17:27:35 2022] CR2: 00007f77f95d47e0 CR3: 000000023abae000 CR4: 0000000000750ee0
[Sun Dec  4 17:27:35 2022] PKRU: 55555554
[Sun Dec  4 17:27:35 2022] Call Trace:
[Sun Dec  4 17:27:35 2022]  <TASK>
[Sun Dec  4 17:27:35 2022]  suspend_devices_and_enter+0x1bd/0x870
[Sun Dec  4 17:27:35 2022]  pm_suspend.cold+0x2d2/0x35e
[Sun Dec  4 17:27:35 2022]  state_store+0x68/0xd0
[Sun Dec  4 17:27:35 2022]  kernfs_fop_write_iter+0x11e/0x1f0
[Sun Dec  4 17:27:35 2022]  vfs_write+0x222/0x3e0
[Sun Dec  4 17:27:35 2022]  ksys_write+0x5b/0xd0
[Sun Dec  4 17:27:35 2022]  do_syscall_64+0x5b/0x80
[Sun Dec  4 17:27:35 2022]  ? exc_page_fault+0x70/0x170
[Sun Dec  4 17:27:35 2022]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[Sun Dec  4 17:27:35 2022] RIP: 0033:0x7fb52331e0c4
[Sun Dec  4 17:27:35 2022] Code: 15 71 7d 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d 3d 05 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[Sun Dec  4 17:27:35 2022] RSP: 002b:00007ffdd6eaa7c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[Sun Dec  4 17:27:35 2022] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fb52331e0c4
[Sun Dec  4 17:27:35 2022] RDX: 0000000000000004 RSI: 00007ffdd6eaa8b0 RDI: 0000000000000004
[Sun Dec  4 17:27:35 2022] RBP: 00007ffdd6eaa8b0 R08: 000055de8f7261c0 R09: 0000000000000000
[Sun Dec  4 17:27:35 2022] R10: 000055de8ee78158 R11: 0000000000000202 R12: 0000000000000004
[Sun Dec  4 17:27:35 2022] R13: 000055de8f7223b0 R14: 0000000000000004 R15: 00007fb5233f2a00
[Sun Dec  4 17:27:35 2022]  </TASK>
[Sun Dec  4 17:27:35 2022] ---[ end trace 0000000000000000 ]---
[Sun Dec  4 17:27:35 2022] OOM killer enabled.
[Sun Dec  4 17:27:35 2022] Restarting tasks ... done.
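
For anyone comparing their own logs against this excerpt, the two key facts (how long the device resume took, and which device failed to resume) can be pulled out with a short script. This is only a sketch: the regular expressions are modeled on the lines quoted above, the function name `summarize_resume` is made up for illustration, and other kernel versions may word these messages differently. Error -16 is `EBUSY`.

```python
import re

# Patterns modeled on the log lines quoted above (an assumption; wording may vary by kernel).
RESUME_TOOK = re.compile(r"PM: resume devices took (\d+\.\d+) seconds")
RESUME_FAILED = re.compile(
    r"(\S+) (\S+): PM: failed to resume(?: async)?: error (-?\d+)"
)


def summarize_resume(log_text):
    """Return (seconds, failures) found in a dmesg excerpt.

    seconds is the last reported device-resume duration, or None if absent.
    failures is a list of (driver, device, errno) tuples.
    """
    seconds = None
    failures = []
    for line in log_text.splitlines():
        took = RESUME_TOOK.search(line)
        if took:
            seconds = float(took.group(1))
        failed = RESUME_FAILED.search(line)
        if failed:
            failures.append(
                (failed.group(1), failed.group(2), int(failed.group(3)))
            )
    return seconds, failures
```

Feeding it the output of `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`) would report the NVMe device at 0000:02:00.0 for the log above.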

Also, for anyone interested in some context, I have been troubleshooting in the following ask.fedoraproject post up until I found the above-mentioned bug report.

https://ask.fedoraproject.org/t/fedora-36-slow-wake-time-dell-inspiron-ryzen-7-radeon-graphics/28903

Thank you.
Comment 1 David Alvarez Lombardi 2022-12-04 18:44:44 UTC
Created attachment 303355 [details]
Output of `dmidecode` on Inspiron 14 5425

Kernel: 6.0.10-300.fc37.x86_64
Comment 2 David Alvarez Lombardi 2022-12-04 18:46:37 UTC
Created attachment 303356 [details]
Output of `acpidump` on Inspiron 14 5425

Kernel: 6.0.10-300.fc37.x86_64
Comment 3 Mario Limonciello (AMD) 2022-12-05 21:12:28 UTC
Thanks, looks the same to me indeed.  Please have a try with the attached patch.
Comment 4 Mario Limonciello (AMD) 2022-12-05 21:13:02 UTC
Created attachment 303363 [details]
Patch v1
Comment 5 David Alvarez Lombardi 2022-12-07 10:24:13 UTC
How would I go about trying out the patch ? (Newbie here.) Would I have to install some patched version of the kernel ? Wouldn't that necessarily entail upgrading to kernel 6.1 since the patch is against mainline ? Couldn't that cause some incompatibilities in my system ? I'm nearing finals week in my masters program so I can't afford to brick my computer if I mess something up which, knowing myself, I will.

When could I count on this fix making it downstream to a Fedora release ? As mentioned in above comments, I'm using 6.0.10-300.fc37.x86_64 .
Comment 6 Hans de Goede 2022-12-07 14:44:35 UTC
David,

I have started a Fedora test-kernel build with Mario's patch from comment 4 added here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=95052279

This is still building at the moment, it will be done in a couple of hours.

Here are some generic instructions for installing a kernel directly from koji (Fedora's buildsystem):

https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

Installing the kernel from rpms is pretty safe, so there is no need to worry about "bricking" your computer.

If for some reason the test build gives problems (which I don't expect), you can always select the previous kernel at the grub boot menu. See here for how to get the grub menu to show at boot if necessary:

https://hansdegoede.dreamwidth.org/19180.html

Regards,

Hans
Comment 7 Hans de Goede 2022-12-09 08:11:42 UTC
Note the kernel build is done now. If you don't have time to test right away please at least download the rpms from:
https://koji.fedoraproject.org/koji/taskinfo?taskID=95052279

koji will remove the rpms for test builds after about a week to reclaim the disk space.
Comment 8 David Alvarez Lombardi 2023-01-04 20:06:17 UTC
I finally have time to try this patch out. I downloaded the following files back when the build finished.

- kernel-core-6.0.11-300.bko216773.fc37.x86_64.rpm
- kernel-modules-6.0.11-300.bko216773.fc37.x86_64.rpm

Before I give it a try, I have a few questions.

1. How can I be sure that the kernel version I am using now will be available for me to revert to ?
2. Going forward, can I still update my system in the same way (dnf update)?
3. When this patch gets merged and comes to a fedora release, can I move back to the normal fedora kernel ? How will I do that ?

Thank you.
Comment 9 Hans de Goede 2023-01-09 09:32:33 UTC
(In reply to David Alvarez Lombardi from comment #8)
> I finally have time to try this patch out. I downloaded the following files
> back when the build finished.
> 
> - kernel-core-6.0.11-300.bko216773.fc37.x86_64.rpm
> - kernel-modules-6.0.11-300.bko216773.fc37.x86_64.rpm
> 
> Before I give it a try, I have a few questions.
> 
> 1. How can I be sure that the kernel version I am using now will be
> available for me to revert to ?

Fedora always keeps the last 3 kernels installed. You can select one of the 2 other kernels in the grub menu. For how to get the grub-menu see:

https://hansdegoede.dreamwidth.org/19180.html

> 2. Going forward, can I still update my system in the same way (dnf update)?

Yes.

> 3. When this patch gets merged and comes to a fedora release, can I move
> back to the normal fedora kernel ? How will I do that ?

This kernel has a lower version than future kernels will have. dnf will automatically install newer kernels, and once this is not one of the last 3 kernels it will remove this kernel (unless you are running this kernel, in which case it will remove the first older kernel).
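
The retention rule described in this answer can be modeled in a few lines. This is a simplified sketch, not dnf's actual implementation: it assumes the caller already knows the oldest-to-newest order of the installed kernels, the function name and the version labels in the usage note are hypothetical, and the limit of 3 corresponds to dnf's `installonly_limit` setting.

```python
def kernels_after_update(installed, new_kernel, running, limit=3):
    """Simplified model of the kernel retention described above.

    installed: kernel versions ordered oldest to newest.
    new_kernel: the version being installed by the update.
    running: the version currently booted, which is never removed.
    Returns the list of kernels kept after the update.
    """
    kept = list(installed) + [new_kernel]
    # Walk from oldest to newest, dropping kernels until we are within the limit.
    for candidate in list(kept):
        if len(kept) <= limit:
            break
        if candidate != running:
            kept.remove(candidate)
    return kept
```

For example, with `["A", "B", "test"]` installed, `"test"` running and `"C"` arriving, the result is `["B", "test", "C"]`. If `"test"` is the oldest installed kernel and still running, the next-oldest kernel is dropped instead.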
Comment 10 Mario Limonciello (AMD) 2023-01-26 04:12:47 UTC
Did you ever get a chance to test this?
Comment 11 Victor Bonnelle 2023-02-11 20:02:20 UTC
I have the same laptop, and I tested the changes. It fixes the issue!

Any clue on which release will include it?
Comment 12 Mario Limonciello (AMD) 2023-02-11 20:03:44 UTC
We've been waiting for testing results to submit it into the kernel, I'll submit something next week.
Comment 13 Victor Bonnelle 2023-02-11 20:04:45 UTC
Great! Thanks for your work
Comment 14 Mario Limonciello (AMD) 2023-02-13 21:37:29 UTC
Sent up this patch for review:
https://lore.kernel.org/linux-acpi/20230213213537.6121-1-mario.limonciello@amd.com/T/#t
Comment 15 Mario Limonciello (AMD) 2023-02-14 21:56:46 UTC
Created attachment 303729 [details]
Patch v2

The list is growing too big; after 6.3-rc1 I will submit this patch instead.
Comment 16 Victor Bonnelle 2023-02-15 09:41:32 UTC
Great! This one works as well
Comment 17 Elvis Angelaccio 2023-02-18 10:55:12 UTC
(In reply to Mario Limonciello (AMD) from comment #15)
> Created attachment 303729 [details]
> Patch v2
> 
> The list is growing too big, after 6.3-rc1 will submit this patch instead.

Hi, would this patch also work on "Barcelo" CPUs?

I have an HP laptop with Ryzen 7 PRO 5875U (which should be Barcelo and not Cezanne) and I think I'm also affected by this bug.
Comment 18 Mario Limonciello (AMD) 2023-02-18 13:24:36 UTC
Yes it should, but this has not been seen outside Dell. Can we please see an acpidump and kernel log?
Comment 19 Elvis Angelaccio 2023-02-19 11:12:22 UTC
Created attachment 303760 [details]
acpidump+dmesg+dmidecode+lspci on HP Elitebook 645 G9

Sure, in this zip you can find:
- acpidump
- dmesg output *after* the 60 seconds freeze on resume
- dmidecode
- lspci -v

The laptop is an HP Elitebook 645 G9. Linux is installed on the main M2 slot which has a Samsung 970 Evo Plus 1TB. I have a somewhat "peculiar" setup as I have also added a Western Digital SN520 in the M2 2242 WWAN slot of the laptop and I had to add `nvme_core.default_ps_max_latency_us=5500` as kernel parameter on boot, otherwise the kernel would not see the SSD at all. I don't know if this could be related.

Another thing I noticed: this should be the Samsung SSD:

$ cat /sys/bus/pci/drivers/nvme/0000\:04\:00.0/firmware_node/path 
\_SB_.PCI0.GP24.NVME

while this should be the WD SSD:

$ cat /sys/bus/pci/drivers/nvme/0000\:05\:00.0/firmware_node
cat: '/sys/bus/pci/drivers/nvme/0000:05:00.0/firmware_node': No such file or directory
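
The same check can be run across every NVMe device at once. The sketch below assumes the sysfs layout shown in the two commands above (bound devices appear under the driver directory, each with an optional `firmware_node/path`); the function name `nvme_acpi_nodes` is made up for illustration.

```python
from pathlib import Path


def nvme_acpi_nodes(driver_dir="/sys/bus/pci/drivers/nvme"):
    """Map each bound PCI address to its ACPI path, or None if it has no firmware_node."""
    base = Path(driver_dir)
    if not base.is_dir():
        # Driver not loaded, or not running on Linux.
        return {}
    result = {}
    for entry in sorted(base.iterdir()):
        # Bound devices are named like 0000:04:00.0; control files
        # such as bind and unbind contain no colon.
        if ":" not in entry.name:
            continue
        path_file = entry / "firmware_node" / "path"
        if path_file.is_file():
            result[entry.name] = path_file.read_text().strip()
        else:
            result[entry.name] = None
    return result
```

On the laptop described here, this would be expected to report an ACPI path for 0000:04:00.0 and `None` for 0000:05:00.0.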
Comment 20 Mario Limonciello (AMD) 2023-02-19 15:44:42 UTC
> Western Digital SN520 in the M2 2242 WWAN slot of the laptop and I had to add
> `nvme_core.default_ps_max_latency_us=5500` as kernel parameter on boot,
> otherwise the kernel would not see the SSD at all. I don't know if this could
> be related.

I think that added SSD is definitely the reason for this.  More below.

> Another thing I noticed: this should be the Samsung SSD:
> $ cat /sys/bus/pci/drivers/nvme/0000\:04\:00.0/firmware_node/path 
> \_SB_.PCI0.GP24.NVME

> while this should be the WD SSD:

> $ cat /sys/bus/pci/drivers/nvme/0000\:05\:00.0/firmware_node
> cat: '/sys/bus/pci/drivers/nvme/0000:05:00.0/firmware_node': No such file or 
> directory

If you look at the DSDT in your acpidump you can see there is only one node that describes an SSD (GP24.NVME).  This node has the `StorageD3Enable` property set and so you can see in your kernel log the matching line:

[    0.676597] nvme 0000:04:00.0: platform quirk: setting simple suspend

Meanwhile if you look at lines related to your other SSD:

[    0.676695] nvme nvme1: pci function 0000:05:00.0
[  489.166438] nvme 0000:05:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -16
[  489.166457] nvme 0000:05:00.0: PM: failed to resume async: error -16

That is, the added SSD doesn't have anything in the BIOS describing which properties to apply to it.  The system vendor probably hadn't actually tested a second SSD being added to that slot and so they didn't add anything like this.

The good news however is the new v2 patch that I had proposed should actually help your configuration.  It will apply that property to all SSDs in the system whether they have that property assigned or not.
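
The two log signatures discussed in this comment (the "setting simple suspend" quirk line and the failed resume) can be matched up per device with a small script. This is a sketch under the assumption that the kernel log uses the same wording as the lines quoted above; `classify_nvme` is a hypothetical helper name.

```python
import re

# Patterns modeled on the kernel log lines quoted in this comment (an assumption).
PROBED = re.compile(r"nvme \S+: pci function (\S+)")
QUIRK = re.compile(r"nvme (\S+): platform quirk: setting simple suspend")
FAILED = re.compile(r"nvme (\S+): PM: failed to resume")


def classify_nvme(log_text):
    """Return {pci_address: {"simple_suspend": bool, "resume_failed": bool}}."""
    devices = {}

    def entry(address):
        return devices.setdefault(
            address, {"simple_suspend": False, "resume_failed": False}
        )

    for line in log_text.splitlines():
        probed = PROBED.search(line)
        if probed:
            entry(probed.group(1))
        quirk = QUIRK.search(line)
        if quirk:
            entry(quirk.group(1))["simple_suspend"] = True
        failed = FAILED.search(line)
        if failed:
            entry(failed.group(1))["resume_failed"] = True
    return devices
```

A device that shows `resume_failed` without `simple_suspend` matches the pattern seen here for the added SSD at 0000:05:00.0.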

I have a few asks from you:

1) Can you please confirm the AMD s2idle testing script https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py doesn't catch your case when you test it?
I don't think it will and I'd like to extend it for your case if so.

2) Can you try the v2 patch and see if it helps?  If it does, can you please get me a full dmesg for it working?

3) I'll follow up with changes to that debugging script if 1 didn't catch it and 2 does fix it.
Comment 21 Elvis Angelaccio 2023-02-20 09:20:23 UTC
(In reply to Mario Limonciello (AMD) from comment #20)
> The system vendor probably hadn't actually
> tested a second SSD being added to that slot and so they didn't add anything
> like this.

That would be surprising, since HP itself advertises in the laptop specs the support for a second SSD in the WWAN slot.

> 
> The good news however is the new v2 patch that I had proposed should
> actually help your configuration.  It will apply that property to all SSDs
> in the system whether they have that property assigned or not.
> 

Awesome! 

> I have a few asks from you:

Sure, I'll report back in a few days. Thanks!
Comment 22 Mario Limonciello (AMD) 2023-02-20 15:11:15 UTC
> Sure, I'll report back in a few days. Thanks!

OK.

> 3) I'll follow up with changes to that debugging script if 1 didn't catch it
> and 2 does fix it.

I've modified the script already for what I think should cover your case, we'll see when you test it.

> That would be surprising, since HP itself advertises in the laptop specs the
> support for a second SSD in the WWAN slot.

In this case, I'd like to also see a sleep study report performed from Windows that includes a suspend cycle of at least 15 minutes.  I want to see if the system entered into a hardware sleep state or not.

https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-sleepstudy
Comment 23 mayurchoksione 2023-02-23 12:12:38 UTC
Hi
I have a Dell Inspiron 5425 and have the same issue, i.e. resuming after suspend takes 60-odd seconds.

I understand I need to patch the kernel using the following rpms

- kernel-core-6.0.11-300.bko216773.fc37.x86_64.rpm
- kernel-modules-6.0.11-300.bko216773.fc37.x86_64.rpm

I am not sure where I can locate these files. 
Your help is much appreciated.
Comment 24 Elvis Angelaccio 2023-02-25 16:27:10 UTC
Created attachment 303786 [details]
amd_s2idle.py report on HP Elitebook 645 G9

@Mario: here's the report generated by `sudo python amd_s2idle.py` accepting the default arguments.
Comment 25 Elvis Angelaccio 2023-02-25 16:29:41 UTC
(In reply to Mario Limonciello (AMD) from comment #22)
> In this case, I'd like to also see a sleep study report performed from
> Windows that includes a suspend cycle of at least 15 minutes.  I want to see
> if the system entered into a hardware sleep state or not.
> 
> https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/
> modern-standby-sleepstudy

I can try. Can I keep the 2nd SSD formatted with Ext4, or do I need to format it with NTFS for this test?
Comment 26 Mario Limonciello (AMD) 2023-02-25 16:41:44 UTC
> I can try. Can I keep the 2nd SSD formatted with Ext4, or do I need to format
> it with NTFS for this test?

You can keep it as is.

> @Mario: here's the report generated by `sudo python amd_s2idle.py` accepting
> the default arguments.

Is this a patched kernel? I would guess so - Both nvme are configured for s2idle but you're still having some major problems with sleep residency and gpio 18 is active.
Comment 27 Elvis Angelaccio 2023-02-25 17:16:47 UTC
(In reply to Mario Limonciello (AMD) from comment #26)
> > I can try. Can I keep the 2nd SSD formatted with Ext4, or do I need to
> format
> > it with NTFS for this test?
> 
> You can keep it as is.
> 
> > @Mario: here's the report generated by `sudo python amd_s2idle.py`
> accepting
> > the default arguments.
> 
> Is this a patched kernel? I would guess so - Both nvme are configured for
> s2idle but you're still having some major problems with sleep residency and
> gpio 18 is active.

No I ran the script with unpatched 6.1.12 kernel from the archlinux package. I'm building right now the 6.1.12 kernel with your v2 patch applied.
Comment 28 Elvis Angelaccio 2023-02-25 17:50:11 UTC
Created attachment 303787 [details]
amd_s2idle.py report on HP Elitebook 645 G9 (with v2 patch)

@Mario: the v2 patch seems to work also on my laptop :)
Resume from sleep is now instant, no more 60 seconds freeze.

Here's the report from amd_s2idle.py run on the patched 6.1.12 kernel.
Comment 29 Mario Limonciello (AMD) 2023-02-25 20:23:21 UTC
> No I ran the script with unpatched 6.1.12 kernel from the archlinux package.
> I'm building right now the 6.1.12 kernel with your v2 patch applied.

OK the script seems to have detected the broken case wrong then.  If you don't mind I'd like to get this fixed so future people can rely upon it too.

I've pushed a change, can you refresh to the new version and see if it helps detect your broken case?

> @Mario: the v2 patch seems to work also on my laptop :)
> Resume from sleep is now instant, no more 60 seconds freeze.
> Here's the report from amd_s2idle.py ran on patched 6.1.12 kernel.

Looks great! So we have a confirmed root cause then for you too.  I'll adjust the changelog to cover your case and will be sending this patch out shortly after 6.3-rc1.
Comment 30 Elvis Angelaccio 2023-02-26 10:33:41 UTC
(In reply to Mario Limonciello (AMD) from comment #29)
> > No I ran the script with unpatched 6.1.12 kernel from the archlinux
> package.
> > I'm building right now the 6.1.12 kernel with your v2 patch applied.
> 
> OK the script seems to have detected the broken case wrong then.  If you
> don't mind I'd like to get this fixed so future people can rely upon it too.
> 
> I've pushed a change, can refresh to the new version and see see if it helps
> detect your broken case?

Yep, the updated script now detects the broken case on the unpatched kernel!

Excerpt of the output:

NVME Sandisk Corp PC SN520 NVMe SSD is not configured for s2idle in BIOS
Your system does not meet s2idle prerequisites!
S0i3 failures reported on your system
Sandisk Corp PC SN520 NVMe SSD missing ACPI attributes
Comment 31 Elvis Angelaccio 2023-02-26 13:45:57 UTC
Created attachment 303788 [details]
sleepstudy report on HP Elitebook 645 G9

@Mario: here's the report generated by powercfg.exe /SleepStudy after a ~40 minutes sleep.

Note: Windows is installed on another NVME SSD (the OEM one), which I had to swap with the Samsung 970 SSD (where I installed Linux) for doing this test.

Let me know if you need something else from Windows.
Comment 32 Mario Limonciello (AMD) 2023-02-26 14:22:15 UTC
> Yep, the updated script now detects the broken case on the unpatched kernel!

Great, thanks!

> @Mario: here's the report generated by powercfg.exe /SleepStudy after a ~40
> minutes sleep.

So on Windows it somehow knows to use D3 for both SSDs even though the BIOS is only set for one of them.  AFAICT there is a separate OS configuration key.
Perhaps this key is set and is establishing such a policy.

https://learn.microsoft.com/en-us/windows/configuration/wcd/wcd-storaged3inmodernstandby

> Let me know if you need something else from Windows.

If you're able to figure out whether such a setting is configured it might be indicative of how we should handle this for Linux going forward.  If Windows does really offer a knob for this, we may want to do the same in Linux as well.  Otherwise we might be doing the same patch for Rembrandt, and then another for Mendocino, etc.
Comment 33 Mario Limonciello (AMD) 2023-02-27 16:43:37 UTC
Created attachment 303796 [details]
Module parameter for nvme v1

Attached is a patch that should introduce a module parameter for the NVME module.  I'm still planning on adding the CPU ID after 6.3-rc1, but can you please check whether this patch on its own, with nvme.simple_suspend=1 on the kernel command line, also helps?

We can upstream both of them and then offer more options in the future for users to (hopefully) match Windows behavior.
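
When testing, it helps to confirm that the parameter actually reached the kernel command line. The sketch below parses `/proc/cmdline`; note that `nvme.simple_suspend` exists only with the attached patch applied, and the helper names are made up for illustration.

```python
import shlex


def cmdline_params(cmdline):
    """Parse a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()
```

For example, `cmdline_params(read_cmdline()).get("nvme.simple_suspend")` should return `"1"` after booting with the parameter set, and `None` if it is missing.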
Comment 34 Elvis Angelaccio 2023-02-27 22:51:43 UTC
(In reply to Mario Limonciello (AMD) from comment #32)
> > Yep, the updated script now detects the broken case on the unpatched
> kernel!
> 
> Great, thanks!
> 
> > @Mario: here's the report generated by powercfg.exe /SleepStudy after a ~40
> > minutes sleep.
> 
> So on Windows somehow it knows to use D3 for both SSD's even though the BIOS
> is only set for one of them.  AFAICT there is a separate OS configuration
> key.
> Perhaps this is set and arranging such a policy.
> 
> https://learn.microsoft.com/en-us/windows/configuration/wcd/wcd-
> storaged3inmodernstandby
> 
> > Let me know if you need something else from Windows.
> 
> If you're able to figure out whether such a setting is configured it might
> be indicative of how we should handle this for Linux going forward.  If
> Windows does really offer a knob for this, we may want to do the same in
> Linux as well.  Otherwise we might be doing the same patch for Rembrandt,
> and then another for Mendocino, etc.

I've looked into this and found interesting things. The StorageD3InModernStandby registry key was NOT set on my system. According to the MS docs at [1], "if the registry key is not configured, then Storport will check the _DSD configuration to determine whether to enable D3. If the _DSD is not implemented, then Storport will check whether the platform is on the allowlist for D3 support."
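
The precedence in that quoted passage can be written down as a small decision function. This is only a model of the quoted sentence, not of Storport itself: the parameter names are hypothetical, `None` stands for "not configured" or "not implemented", and treating the registry key as a plain boolean is an assumption.

```python
def d3_enabled(registry_key=None, dsd_storage_d3_enable=None, on_allowlist=False):
    """Model the precedence quoted above: registry key, then _DSD, then allowlist.

    None for the first two arguments means "not configured / not implemented".
    """
    if registry_key is not None:
        # An explicit OS-level setting wins over firmware data.
        return bool(registry_key)
    if dsd_storage_d3_enable is not None:
        # Otherwise the firmware's _DSD property decides.
        return bool(dsd_storage_d3_enable)
    # With neither present, only the platform allowlist can enable D3.
    return on_allowlist
```

Under this model, a disk with no registry key and no `_DSD` entry would only get D3 through the allowlist.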

So, the next thing I did was to get an acpidump from Windows to compare the dsdt.dsl file with the one extracted from Linux. I was expecting to get the same file, but to my surprise the dsdt.dsl extracted on Windows is different! In particular, the StorageD3Enable property occurs twice, while in the Linux file only once...
I'm just guessing here, I don't know the acpi "language" and I can't say this is the reason why Windows enables D3, but it looks strongly related...

Is it even possible that the DSDT is different depending on the OS?

Anyway, I'll attach the Windows dsdt.dsl file just for the record.

[1]: https://learn.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Comment 35 Elvis Angelaccio 2023-02-27 22:55:05 UTC
Created attachment 303803 [details]
DSDT extracted from acpidump on Windows 11 (HP Elitebook 645 G9)
Comment 36 Mario Limonciello (AMD) 2023-02-27 22:58:56 UTC
> So, the next thing I did was to get an acpidump from Windows to compare the
> dsdt.dsl file with the one extracted from Linux. I was expecting to get the
> same file, but to my surprise the dsdt.dsl extracted on Windows is different! 

Off the cuff that seems like a surprising result but I think it's actually a red herring.  The DSDT shows each of them is conditionally loaded depending upon "how new" the Windows version is.

As the registry key isn't set in Windows but the properties are applied to both disks I have to wonder if perhaps on Windows it's a "global setting".  That is when one disk uses D3 the OS assumes all disks should.
Comment 37 Elvis Angelaccio 2023-02-27 23:11:46 UTC
Ok, I see.

Anyway, I'll try your new patch with the simple_suspend argument. Do I need to apply both the previous v2 patch and the simple_suspend one? Or just the latter?
Comment 38 Mario Limonciello (AMD) 2023-02-27 23:12:53 UTC
Just this one.
Comment 39 mayurchoksione 2023-02-28 12:08:07 UTC
Hi,
How do I apply the patch?
I am running Ubuntu 22.04.

Downloaded the patch (Patch V2). Then executed the following command:


patch <  ~/Downloads/0001-ACPI-x86-Add-Cezanne-to-the-list-for-forcing-Storage.patch
Comment 40 Mario Limonciello (AMD) 2023-02-28 16:33:05 UTC
> How do I apply the patch?

You would need to obtain the kernel sources, apply the patch to them and compile and install your own kernel binary.
Comment 41 hurricanepootis 2023-02-28 19:47:14 UTC
I just applied patch v2 to linux 6.2.1. I am running Arch Linux on a Dell Inspiron 14 5425 with an AMD Ryzen 5 5625U. I was able to suspend and resume without the 60 second delay. I give this patch a beautiful thumbs up 👍.
Comment 42 Mario Limonciello (AMD) 2023-02-28 19:54:11 UTC
@hurricanepootis@protonmail.com 

Thanks!  Would you please also test the NVME version of the patch with the module parameter nvme.simple_suspend=1 on your kernel command line and without the v2 patch applied?
Comment 43 Elvis Angelaccio 2023-02-28 22:03:24 UTC
I can confirm that the nvme.simple_suspend patch also works.
Comment 44 Mario Limonciello (AMD) 2023-02-28 22:04:06 UTC
OK thanks!  I'll send out both then.
Comment 45 hurricanepootis 2023-02-28 22:20:23 UTC
(In reply to Mario Limonciello (AMD) from comment #42)
> @hurricanepootis@protonmail.com 
> 
> Thanks!  Would you please also test the NVME version of the patch with the
> module parameter nvme.simple_suspend=1 on your kernel command line and
> without the v2 patch applied?

I would, but it doesn't work for linux 6.2.1, and the building process I use (since I am not a kernel developer, just an end user), only allows for building stable point releases. Sorry.
Comment 46 Mario Limonciello (AMD) 2023-02-28 22:21:30 UTC
No worry, Elvis tested it so I sent it out.
Comment 47 mayurchoksione 2023-03-04 10:50:25 UTC
Hi,
Is there a view on the timeline of when this fix will be merged into a mainline kernel version? A ballpark timeline is good enough for me.
Comment 48 Mario Limonciello (AMD) 2023-03-04 13:06:28 UTC
The workaround is queued up for 6.3, so it should land in one of the RCs.
Comment 49 mayurchoksione 2023-03-04 19:28:24 UTC
Hi Mario,
Can you please say whether there is a definite timeframe in which the fix will be merged into one of the mainline versions (6.3)? I have Windows working fine but prefer Linux. If it's going to take a few days or weeks, I can live with the suspend issue, but if there is no release date in sight for version 6.3, I may have to go back to Windows for now.
Comment 50 Victor Bonnelle 2023-03-04 20:36:13 UTC
Why not build it yourself and use it until 6.3 is released?
Comment 51 Mario Limonciello (AMD) 2023-03-06 02:00:18 UTC
The Cezanne workaround for this BIOS bug is part of 6.3-rc1.
https://github.com/torvalds/linux/commit/e2a56364485e7789e7b8f342637c7f3a219f7ede

The NVME module parameter is still under discussion, but I'll close this issue as we at least have a solution for these Dell Cezanne systems and the HP 2 disk issue.
Comment 52 mayurchoksione 2023-03-19 22:56:44 UTC
Fix verified with Linux Kernel 6.3-rc3.
Suspend working as expected. No need to go back to Windows now. Will also take a Timeshift backup so that if any future updates/patches break anything I can go back to the working state using TimeShift Restore.

My Laptop version:
Dell Inspiron 14 5425, with AMD Ryzen 7 5825
