Created attachment 301728: dmesg with resume issue
The issue occurred on a brand-new Dell Inspiron 14 7425 2-in-1 laptop equipped with a Ryzen 7 5825U processor, a refresh of AMD's Cezanne series codenamed Barcelo. After a short time, resuming took about 60 seconds. Here is an extract:

Aug 31 21:24:54 kernel: nvme nvme0: I/O 914 QID 12 timeout, aborting
Aug 31 21:24:54 kernel: nvme nvme0: I/O 915 QID 12 timeout, aborting
Aug 31 21:24:54 kernel: nvme nvme0: I/O 916 QID 12 timeout, aborting
Aug 31 21:24:54 kernel: nvme nvme0: I/O 917 QID 12 timeout, aborting
Aug 31 21:24:54 kernel: nvme nvme0: I/O 918 QID 12 timeout, aborting
Aug 31 21:24:54 kernel: nvme nvme0: I/O 919 QID 12 timeout, aborting
Aug 31 21:24:54 kernel: nvme nvme0: I/O 920 QID 12 timeout, aborting
Aug 31 21:24:54 kernel: nvme nvme0: I/O 921 QID 12 timeout, aborting
Aug 31 21:24:54 kernel: nvme nvme0: I/O 6 QID 0 timeout, reset controller
Aug 31 21:24:54 kernel: nvme nvme0: Abort status: 0x371
Aug 31 21:24:54 kernel: nvme nvme0: Abort status: 0x371
Aug 31 21:24:54 kernel: nvme nvme0: Abort status: 0x371
Aug 31 21:24:54 kernel: nvme nvme0: Abort status: 0x371
Aug 31 21:24:54 kernel: nvme nvme0: Abort status: 0x371
Aug 31 21:24:54 kernel: nvme nvme0: Abort status: 0x371
Aug 31 21:24:54 kernel: nvme nvme0: Abort status: 0x371
Aug 31 21:24:54 kernel: nvme nvme0: Abort status: 0x371
Aug 31 21:24:54 kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16
Aug 31 21:24:54 kernel: nvme 0000:02:00.0: PM: failed to resume async: error -16
Aug 31 21:24:54 kernel: PM: resume devices took 60.047 seconds
Aug 31 21:24:54 kernel: ------------[ cut here ]------------
Aug 31 21:24:54 kernel: Component: resume devices, time: 60047
Aug 31 21:24:54 kernel: WARNING: CPU: 6 PID: 4458 at kernel/power/suspend_test.c:53 suspend_test_finish+0x6d/0x80
Aug 31 21:24:54 kernel: Modules linked in: tls uinput snd_seq_dummy snd_hrtimer nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct>
Aug 31 21:24:54 kernel: dell_smbios snd_pci_acp5x snd_timer dcdbas snd snd_rn_pci_acp3x pcspkr mc ecdh_generic sparse_keymap dell_wmi_descriptor wmi_bmof joydev k10temp i2c_piix4 snd_pci_ac>
Aug 31 21:24:54 kernel: CPU: 6 PID: 4458 Comm: systemd-sleep Not tainted 5.17.5-300.fc36.x86_64 #1
Aug 31 21:24:54 kernel: Hardware name: Dell Inc. Inspiron 14 7425 2-in-1/063KWG, BIOS 1.4.0 06/27/2022
Aug 31 21:24:54 kernel: RIP: 0010:suspend_test_finish+0x6d/0x80
Aug 31 21:24:54 kernel: Code: 69 c2 e8 03 00 00 29 c1 e8 ef 5e b5 00 81 fb 10 27 00 00 77 04 5b 5d c3 cc 89 da 48 89 ee 48 c7 c7 b0 e8 60 8f e8 4c 01 b5 00 <0f> 0b 5b 5d c3 cc cc cc cc cc cc>
Aug 31 21:24:54 kernel: RSP: 0018:ffffbae58038bd80 EFLAGS: 00010282
Aug 31 21:24:54 kernel: RAX: 0000000000000026 RBX: 000000000000ea8f RCX: 0000000000000000
Aug 31 21:24:54 kernel: RDX: 0000000000000001 RSI: ffffffff8f665ad5 RDI: 00000000ffffffff
Aug 31 21:24:54 kernel: RBP: ffffffff8f60e7f9 R08: ffffffff8fe65580 R09: 0000000000000004
Aug 31 21:24:54 kernel: R10: ffffffffffffffff R11: ffffffff90a8d21e R12: ffffffff8f60e7ae
Aug 31 21:24:54 kernel: R13: ffff9f8888477fd0 R14: 0000000000000004 R15: ffff9f88486deda0
Aug 31 21:24:54 kernel: FS: 00007f652635bb40(0000) GS:ffff9f8f2e780000(0000) knlGS:0000000000000000
Aug 31 21:24:54 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 21:24:54 kernel: CR2: 00007fe3932fdd06 CR3: 000000016e5c0000 CR4: 0000000000750ee0
Aug 31 21:24:54 kernel: PKRU: 55555554
Aug 31 21:24:54 kernel: Call Trace:
Aug 31 21:24:54 kernel: <TASK>
Aug 31 21:24:54 kernel: suspend_devices_and_enter+0x18f/0x830
Aug 31 21:24:54 kernel: pm_suspend.cold+0x2fa/0x343
Aug 31 21:24:54 kernel: state_store+0x68/0xc0
Aug 31 21:24:54 kernel: kernfs_fop_write_iter+0x11b/0x1f0
Aug 31 21:24:54 kernel: new_sync_write+0x102/0x180
Aug 31 21:24:54 kernel: vfs_write+0x209/0x2a0
Aug 31 21:24:54 kernel: ksys_write+0x53/0xd0
Aug 31 21:24:54 kernel: do_syscall_64+0x3a/0x80
Aug 31 21:24:54 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Aug 31 21:24:54 kernel: RIP: 0033:0x7f6526f628f7
Aug 31 21:24:54 kernel: Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83>
Aug 31 21:24:54 kernel: RSP: 002b:00007ffd6c577718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Aug 31 21:24:54 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f6526f628f7
Aug 31 21:24:54 kernel: RDX: 0000000000000004 RSI: 00007ffd6c577800 RDI: 0000000000000004
Aug 31 21:24:54 kernel: RBP: 00007ffd6c577800 R08: 0000557c7d6d71f0 R09: 0000000000000000
Aug 31 21:24:54 kernel: R10: 0000557c7d578158 R11: 0000000000000246 R12: 0000000000000004
Aug 31 21:24:54 kernel: R13: 0000557c7d6d33e0 R14: 0000000000000004 R15: 00007f65270559e0
Aug 31 21:24:54 kernel: </TASK>
Aug 31 21:24:54 kernel: ---[ end trace 0000000000000000 ]---
Aug 31 21:24:54 kernel: OOM killer enabled.
Is this a regression? Did kernel 5.18.x work? It would be great if you could perform regression testing using git bisect.
Possibly related: bug 216438
Please try booting with iommu=pt on the kernel command line.
Created attachment 301732: dmesg with resume issue using iommu=pt
Requested dmesg with the iommu=pt parameter.
(In reply to Artem S. Tashkinov from comment #1)
> Is this a regression? Did kernel 5.18.x work?
>
> Would be great if you could perform regression testing using git bisect.

It is not a regression: the issue also happened on kernel 5.17.x on a fresh installation, meaning all kernel versions are affected. git bisect returned nothing.
Created attachment 301733: updated dmesg with iommu=pt on kernel 5.19.6
Please share an acpidump. I suspect a BIOS issue and need to check something to confirm it.
Created attachment 301735: acpidump from kernel 5.19.6 on Dell Inspiron 14 7425 2-in-1
As requested, here is the acpidump from kernel 5.19.6.
The kernel isn't detecting StorageD3Enable for the NVME storage in your system. Your kernel log is missing this message:

> nvme 0000:BB:DD.F: platform quirk: setting simple suspend

Did you change your hardware at all from how it was shipped? Can you please confirm that

# cat /sys/bus/pci/drivers/nvme/0000:02:00.0/firmware_node/path

returns \_SB_.PCI0.GPP1.NVME?
Yes, I replaced the original 512GB NVME from Dell with a 1TB Samsung 970 Evo Plus NVME.

# cat /sys/bus/pci/drivers/nvme/0000\:02\:00.0/firmware_node/path

returns \_SB_.PCI0.GPP1.DEV0
It is odd that the result is \_SB_.PCI0.GPP1.DEV0, since fdisk -l reports:

Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 970 EVO Plus 1TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
> Yes, I replaced the original 512GB NVME from Dell with a 1TB Samsung 970 Evo
> Plus NVME.

Did the original NVME have this problem too? I would think so.

Do things work properly in Windows?

> \_SB_.PCI0.GPP1.DEV0

It seems to me that the NVME is not being assigned the proper ACPI companion from the firmware; it should be GPP1.NVME. Maybe this is due to the ambiguity of two nodes, .DEV0 and .NVME, both having an _ADR of 0.

Hans, any ideas?
(In reply to Mario Limonciello (AMD) from comment #13)
> > Yes, I replaced the original 512GB NVME from Dell with a 1TB Samsung 970 Evo
> > Plus NVME.
>
> Did the original NVME have this problem too? I would think so.

Yes, it does.

> Do things work properly in Windows?

Yes, suspend/resume runs smoothly on Windows even after swapping NVME drives. The issue appears specific to the Linux kernel.
To me this looks like a spec violation, and the kernel only makes a best effort to handle two devices sharing an _ADR. Neither node has _STA and neither has a _HID, so the kernel's existing best effort is helpless in your case. Perhaps we can go off the fact that the NVME device has device properties? Can you see if the change below helps link the right ACPI companion?

diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
index 204fe94c7e45..8346ad564904 100644
--- a/drivers/acpi/glue.c
+++ b/drivers/acpi/glue.c
@@ -96,8 +96,12 @@ static int find_child_checks(struct acpi_device *adev, bool check_children)
 		return -ENODEV;
 
 	status = acpi_evaluate_integer(adev->handle, "_STA", NULL, &sta);
-	if (status == AE_NOT_FOUND)
+	if (status == AE_NOT_FOUND) {
+		/* Make this more appealing if it has properties declared */
+		if (acpi_dev_has_props(adev))
+			return FIND_CHILD_MAX_SCORE;
 		return FIND_CHILD_MIN_SCORE;
+	}
 
 	if (ACPI_FAILURE(status) || !(sta & ACPI_STA_DEVICE_ENABLED))
 		return -ENODEV;
Sure, I applied the patch and will report the result once the build is complete.
After applying the patch from #15:

# cat /sys/bus/pci/drivers/nvme/0000\:02\:00.0/firmware_node/path
\_SB_.PCI0.GPP1.DEV0

Same result, tested on 6.0.0-0.rc4.20220909git506357871c18.34
Created attachment 301800: Patch to add a quirk
It's unfortunate that didn't help. Let's try the quirk approach instead for now. Give this patch a try, and if it fails, please share both an updated dmesg log and dmidecode output (so I can get the strings right).
Sadly, the patch has no effect:

# cat /sys/bus/pci/drivers/nvme/0000\:02\:00.0/firmware_node/path
\_SB_.PCI0.GPP1.DEV0

I will include both dmesg and dmidecode shortly.
Created attachment 301803: dell inspiron 7425 dmesg via journalctl
Here is the dmesg report from kernel 6.0.0.rc5.
Created attachment 301804: dell inspiron 7425 dmidecode
dmidecode from the Dell Inspiron running kernel 6.0.0.rc5
Sep 13 10:14:12 kernel: nvme 0000:02:00.0: platform quirk: setting simple suspend

The quirk shouldn't change the device node, only set the policy. From your log, it looks like it's setting the policy properly. Are you still hitting the functional bug with it?
Created attachment 301810: dmesg showing successful speeding resume
I confirm that resume is quicker with the patch, as highlighted in the attached dmesg:

[   52.991877] PM: resume devices took 1.030 seconds
[...]
[  109.103704] PM: resume devices took 1.035 seconds
(In reply to Mario Limonciello (AMD) from comment #22)
> Sep 13 10:14:12 kernel: nvme 0000:02:00.0: platform quirk: setting simple
> suspend
>
> It shouldn't change device node, only set quirk. It looks like it's setting
> policy properly from your log. Are you still hitting functional but with it?

Yes, the quirk works as intended, and resume is now much quicker instead of taking 60 seconds. See comment #23 for the full dmesg log.
OK, the solution is queued up for kernel 6.1. https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=018d6711c26e4bd26e20a819fcc7f8ab902608f3
Thank you, Mario, for the quick response. Hans, would it be possible to carry the fix as a patch in the Fedora kernel? Thanks in advance.
(In reply to Luya Tshimbalanga from comment #26)
> Thank you Mario for the quick response. For Hans, will it possible to copy
> the fix as patch for the Fedora kernel? Thanks in advance.

I'm sorry, but my workload lately has been too high for me to be able to cherry-pick individual fixes into the Fedora kernels, so you will have to wait for this to get upstream through the stable series.

This patch should find its way into the 5.19.x / 6.0.x stable series as soon as 6.1-rc1 is out.
(In reply to Hans de Goede from comment #27)
> (In reply to Luya Tshimbalanga from comment #26)
> > Thank you Mario for the quick response. For Hans, will it possible to copy
> > the fix as patch for the Fedora kernel? Thanks in advance.
>
> I'm sorry but my workload lately has been too high for me to be able to
> cherry-pick individual fixes into the Fedora kernels, so you will have to
> wait for this to get upstream through the stable series.
>
> This patch should find its way into the 5.19.x / 6.0.x stable series as soon
> as 6.1-rc1 is out.

Understood.
(In reply to Mario Limonciello (AMD) from comment #25)
> OK, the solution is queued up for kernel 6.1.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=018d6711c26e4bd26e20a819fcc7f8ab902608f3

Every issue Luya has described is also happening on my Inspiron 16 5625. Will the solution also work for me? I'm wondering because the hardcoded product name is different.
Can you please share your dmidecode and acpidump? We can certainly try to add your system to the workaround list.
Created attachment 302987: Inspiron 16 5625 dmidecode
Created attachment 302988: acpidump from kernel 5.19.14 on Inspiron 16 5625
Created attachment 302989: dmidecode-inspiron-16-5625.txt
Created attachment 302990: acpidump-inspiron-16-5625.txt
Created attachment 302994: Patch for Inspiron 16 5625
OK, yes, it does look the same. Give this patch a try.
It worked like a charm, thank you. If you don't mind a newbie asking some questions: Can I somehow keep this patch until it is integrated into the Fedora kernel? Is this advisable? And out of curiosity, does this change need to be hardcoded, or is it just a temporary solution?
> Can I somehow keep this patch until it is integrated into the fedora kernel?
> Is this advisable?

You can talk to the Fedora kernel maintainers about carrying it in the Fedora kernel ahead of the release, but otherwise, yes, you need to keep compiling and using your own kernels until it lands.

> And out of curiosity, does this change need to be hardcoded or is this just a
> temporary solution?

The BIOS offers two ACPI nodes with the same _ADR, and the ACPI spec doesn't define what to do in that case, so the kernel ends up taking a guess and picking the wrong one. If Dell issues a BIOS update that fixes this, you would no longer need this hardcoded change.
Since this fix seems to be a hardcoded list of exceptions, I wanted to mention that my device (Dell Inspiron 14 5425) is also affected. I would happily share more device details to ensure my machine is covered by this patch. I had been troubleshooting in the following ask.fedoraproject.org thread until I found this bug report; anyone interested can find many logs and device details there:
https://ask.fedoraproject.org/t/fedora-36-slow-wake-time-dell-inspiron-ryzen-7-radeon-graphics/28903
Created attachment 303353: Output of `dmidecode` on Inspiron 14 5425
Kernel: 6.0.10-300.fc37.x86_64
Created attachment 303354: Output of `acpidump` on Inspiron 14 5425
Kernel: 6.0.10-300.fc37.x86_64
This bug report is closed. Although it is *likely* the same type of solution, please open your own bug, place those attachments there, and we can track it there. You can CC me on that bug.
Sounds good. Sorry, I missed that the issue was closed. Here is the bug report I opened: https://bugzilla.kernel.org/show_bug.cgi?id=216773