Created attachment 301712 [details]
dmesg errors after resuming from suspend

After resuming from suspend, the system is in an unusable state. Several hardware components fail; in particular, the SSD can no longer write data and the wireless card is unusable. Please see the attached file "dmesg-after-suspend.txt" for all the errors that appear after resuming from suspend.

I could only collect the attached dmesg errors using netconsole with a USB Ethernet adapter, which also shows that not all hardware failed to resume. The system couldn't write the errors to the local log files because the SSD failed.

I tried this from a console with systemctl suspend, no GUI running. I can still type in the console that was open before entering suspend, but, e.g., systemctl poweroff only returns an "Input/Output error".

Note that I am running the vanilla Arch Linux kernel with a simple patch applied (which is already part of 5.20) to get the keyboard working [0]. The reported behavior happens with or without this patch, so I assume it is unrelated.

I also tried several kernel versions (5.18.16, 5.19.4, and 5.19.6), which all show the same behavior. The most recent Ubuntu 22.04.1 LTS live system booted from a USB stick also hangs after suspend, so I assume it is not a regression.

If I have miscategorized this bug, please change it.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=216118
Created attachment 301713 [details] dmesg output before entering suspend
Created attachment 301714 [details] lspci -vv
Created attachment 301715 [details] lsmod
Created attachment 301716 [details] lshw
Created attachment 301717 [details] dmesg output before entering suspend (only warnings+)
Possibly related: bug 216440
Please have a try with 'iommu=pt' on your kernel command line.
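After rebooting, it can be worth double-checking that the parameter actually reached the running kernel by looking at /proc/cmdline. The following is a minimal Python sketch under that assumption; the helper names `has_kernel_param` and `read_cmdline` are made up for illustration and are not part of any existing tool.

```python
from typing import Optional


def has_kernel_param(cmdline: str, name: str, value: Optional[str] = None) -> bool:
    """Return True if `name` (or `name=value`, when value is given) is present.

    Simplification: tokens are split on whitespace, so quoted values that
    contain spaces are not handled.
    """
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key != name:
            continue
        if value is None or (sep and val == value):
            return True
    return False


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path, encoding="utf-8") as fh:
        return fh.read().strip()
```

On the affected machine, `has_kernel_param(read_cmdline(), "iommu", "pt")` should return True once the boot loader entry has been updated. Matching on whole tokens keeps a parameter like `amd_iommu=on` from being mistaken for `iommu`.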
Thanks for the quick reply. Unfortunately, I will only be able to try it on Sept 13 at the earliest. Will give an update then.
'iommu=pt' does indeed solve the main problems, thanks! Now the system comes back quickly with a working SSD.

One new issue came up: the built-in keyboard stops working after resuming from suspend. This wasn't the case without 'iommu=pt'. The relevant dmesg output is:

    atkbd serio0: Failed to enable keyboard on isa0060/serio0

The wifi chip also still gets reset, with dmesg showing

    mt7921e 0000:02:00.0: Message 00020007 (seq 13) timeout
    mt7921e 0000:02:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -110
    mt7921e 0000:02:00.0: PM: failed to resume async: error -110
    mt7921e 0000:02:00.0: chip reset

But it still seems functional; NetworkManager successfully reestablishes the wifi connection.

I will attach an updated full dmesg with 'iommu=pt' enabled. Note that after resuming from suspend, I attached the laptop to a USB-C monitor for power and an external keyboard, resulting in errors like

    ACPI Error: No handler for Region [ECSI] (00000000e7945f01) [EmbeddedControl] (20220331/evregion-130)
    ACPI Error: Region EmbeddedControl (ID=3) has no handler (20220331/exfldio-261)
    ACPI Error: Aborting method \_SB.UBTC.ECRD due to previous error (AE_NOT_EXIST) (20220331/psparse-529)
    ACPI Error: Aborting method \_SB.UBTC.NTFY due to previous error (AE_NOT_EXIST) (20220331/psparse-529)
    ACPI Error: Aborting method \_SB.PCI0.LPC0.EC0._Q4F due to previous error (AE_NOT_EXIST) (20220331/psparse-529)

But that was also the case before and needs separate investigation.
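For anyone comparing several suspend cycles, the resume failures can be pulled out of a saved dmesg capture automatically. This is a rough Python sketch that assumes the message format shown in this report; the function name `find_resume_failures` and the small error table are illustrative only, not an existing tool.

```python
import re

# Matches lines such as:
#   mt7921e 0000:02:00.0: PM: failed to resume async: error -110
RESUME_FAILURE = re.compile(
    r"(?P<driver>\S+) (?P<device>\S+): PM: failed to resume(?: \w+)?: "
    r"error (?P<err>-?\d+)"
)

# Only the code seen in this report; -110 is ETIMEDOUT on Linux.
KNOWN_ERRORS = {-110: "ETIMEDOUT"}


def find_resume_failures(dmesg_text):
    """Return (driver, device, errno, name) for every failed resume callback."""
    failures = []
    for line in dmesg_text.splitlines():
        match = RESUME_FAILURE.search(line)
        if match is None:
            continue
        err = int(match.group("err"))
        failures.append(
            (
                match.group("driver"),
                match.group("device"),
                err,
                KNOWN_ERRORS.get(err, "unknown"),
            )
        )
    return failures
```

Feeding it the four mt7921e lines from this report yields a single entry for device 0000:02:00.0 with error -110, i.e. the resume callback timed out while waiting for the chip.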
Created attachment 301814 [details] dmesg with iommu=pt (including entering and leaving suspend)
Created attachment 301815 [details] dmesg with iommu=pt (including entering and leaving suspend, only warnings+)
I suspect you might be hitting two separate bugs.

The necessity of iommu=pt to avoid the SSD issue is a pure BIOS bug that Lenovo should fix. Something similar happened in earlier generations too. It's worked around in the earlier generations by adding platforms to this list:
https://github.com/torvalds/linux/commit/455cd867b85b53fd3602345f9b8a8facc551adc9

To confirm there are two bugs can you please have a try with:
1) 6.0-rc5
2) This series applied: https://lore.kernel.org/linux-acpi/20220912172401.22301-1-mario.limonciello@amd.com/T/#t
3) The following on the kernel command line:
iommu=pt acpi.prefer_microsoft_guid=1
To 1) I can confirm that the keyboard works after resuming with 6.0-rc5. Also, the last line "mt7921e 0000:02:00.0: chip reset" is gone from the mt7921e errors. The first three are still there though.

To 3) 'acpi.prefer_microsoft_guid=1' doesn't fix the keyboard after resuming with kernel 5.19.6 (keyboard fix for ZEN devices applied, as mentioned before).

To 2) do you still need this info? If yes, applied to which kernel version? 5.19? 6.0-rc5?

Are these mt7921e errors another bug that should be reported?

And what's the procedure to get Lenovo to fix this BIOS bug? Are you reporting these bugs to Lenovo?

I observe a strange behavior on both, 5.19 and 6.0, when pressing any key on an external keyboard to resume the laptop from suspend. The laptop's keyboard backlights light up for half a second, then go dark again and the laptop stays suspended. If I now press any key on the laptop, it resumes from suspend.
I actually had meant to test the combination of those 3 things, but now that you mention your results below let's break it down a little better.

> To 1) I can confirm that the keyboard works after resuming with 6.0-rc5.
> Also, the last line "mt7921e 0000:02:00.0: chip reset" is gone from the
> mt7921e errors. The first three are still there though.

OK so if the keyboard works after resuming in 6.0-rc5 with nothing beyond iommu=pt on kernel command line, it's probably fixed via
https://github.com/torvalds/linux/commit/9946e39fe8d0a5da9eb947d8e40a7ef204ba016e

> To 3) 'acpi.prefer_microsoft_guid=1' doesn't fix the keyboard after resuming
> with kernel 5.19.6 (keyboard fix for ZEN devices applied, as mentioned
> before).
> To 2) do you still need this info? If yes, applied to which kernel version?
> 5.19? 6.0-rc5?

Using 6.0-rc5 I'd like to see if applying that patch series and adding "acpi.prefer_microsoft_guid=1" improves anything. Given the keyboard is working in 6.0-rc5 now I suspect it won't change anything for you though and thus there is no reason to add your system to the quirk list in that series.

> Are these mt7921e errors another bug that should be reported?

Yes; separate bug for that driver's maintainer to address.

> And what's the procedure to get Lenovo to fix this BIOS bug? Are you
> reporting these bugs to Lenovo?

The BIOS bug necessitating "iommu=pt" is pretty well understood by Lenovo. It hit a TON of models, the fix just needs to be ported to yours. You can report it to Lenovo's forums and see if they can do that for you.

> I observe a strange behavior on both, 5.19 and 6.0, when pressing any key on
> an external keyboard to resume the laptop from suspend. The laptop's keyboard
> backlights light up for half a second, then go dark again and the laptop
> stays suspended. If I now press any key on the laptop, it resumes from
> suspend.

Can you please contrast if this behavior still happens with that series I mentioned and kernel command line change?

If it does, I think we'll need a debugging log with the following to characterize it:

pm_debug_messages acpi.dyndbg='file drivers/acpi/x86/s2idle.c +p' pinctrl_amd.dyndbg='+p'
BTW, *make sure* that last debugging step I mentioned is run in 6.0-rc kernels. The reason is this commit addresses a related bug to wakeup from USB keyboard. https://github.com/torvalds/linux/commit/b8c824a869f220c6b46df724f85794349bafbf23
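If rebooting with the extra parameters is inconvenient, roughly the same debug output can usually be switched on at runtime before suspending. The Python sketch below is only an approximation of the boot parameters requested above: it assumes a kernel built with CONFIG_PM_SLEEP_DEBUG and CONFIG_DYNAMIC_DEBUG, debugfs mounted at /sys/kernel/debug, and root privileges. The function names are invented for illustration.

```python
PM_DEBUG_PATH = "/sys/power/pm_debug_messages"
DYNDBG_CONTROL = "/sys/kernel/debug/dynamic_debug/control"


def debug_writes():
    """Return (path, text) pairs approximating the three boot parameters."""
    return [
        # pm_debug_messages
        (PM_DEBUG_PATH, "1"),
        # acpi.dyndbg='file drivers/acpi/x86/s2idle.c +p'
        (DYNDBG_CONTROL, "file drivers/acpi/x86/s2idle.c +p"),
        # pinctrl_amd.dyndbg='+p'
        (DYNDBG_CONTROL, "module pinctrl_amd +p"),
    ]


def apply_debug_writes(writes):
    """Write each setting; raises OSError without root or if a file is missing."""
    for path, text in writes:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
```

Calling `apply_debug_writes(debug_writes())` as root and then suspending should produce the extra messages in dmesg. The boot parameters remain the safer choice for this bug, since they also capture messages emitted before userspace is up.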
Apologies if I've missed the obvious - but which platform is this on?

Thanks
Mark
LENOVO 82TL (Lenovo Yoga Slim 7 ProX 14ARH7)
Thanks (and argh....)

It's not in the Linux program so no promises - but I'll see if I can flag this to them as it is something that should be fixed. At least they can reference the fix from the Thinkpads...

Tracked with internal ticket LO-2023

Mark
> OK so if the keyboard works after resuming in 6.0-rc5 with nothing beyond
> iommu=pt on kernel command line, it's probably fixed via
> https://github.com/torvalds/linux/commit/9946e39fe8d0a5da9eb947d8e40a7ef204ba016e

I have to retract my previous observation. Booting 6.0-rc5 another time, the keyboard issue was back: no keyboard on resume. Note also that I run a 5.19.6 with this patch applied, where it also doesn't work. This patch just fixes the keyboard working at all on initial boot.

(This time with 6.0-rc5, the backlight didn't light up briefly when I pressed a key on an external keyboard. It seems that either the keyboard works randomly, but then has this backlight flicker, or it doesn't work at all.)

> Using 6.0-rc5 I'd like to see if applying that patch series and adding
> "acpi.prefer_microsoft_guid=1" improves anything. Given the keyboard is
> working in 6.0-rc5 now I suspect it won't change anything for you though and
> thus there is no reason to add your system to the quirk list in that series.

This time I can confirm that 6.0-rc5 with this patch series applied and 'acpi.prefer_microsoft_guid=1' solves all issues!
- keyboard works on resume
- even all mt7921e dmesg errors on resume are gone, so it's apparently related
- no short light-up of the laptop keyboard on an external keyboard press during suspend

So it looks like adding my model to the quirks list makes sense after all. Note that the only difference between Travis's laptop from #216473 and mine is that his has a touch screen.

> > I observe a strange behavior on both, 5.19 and 6.0, when pressing any key on
> > an external keyboard to resume the laptop from suspend. The laptop's keyboard
> > backlights light up for half a second, then go dark again and the laptop
> > stays suspended. If I now press any key on the laptop, it resumes from
> > suspend.
>
> Can you please contrast if this behavior still happens with that series I
> mentioned and kernel command line change?
>
> If it does, I think we'll need a debugging log with the following to
> characterize it:
>
> pm_debug_messages acpi.dyndbg='file drivers/acpi/x86/s2idle.c +p'
> pinctrl_amd.dyndbg='+p'

Now that everything works with the patchset, do you still need this info?
Just to confirm, running 6.0-rc5 with the patchset, but without 'acpi.prefer_microsoft_guid=1', still has all the issues (keyboard randomly works or doesn't on resume, mt7921e throwing errors).
Can you please try to modify the last patch from this line:

    DMI_MATCH(DMI_PRODUCT_NAME, "82V2"),

to

    DMI_MATCH(DMI_PRODUCT_NAME, "82"),

Then remove acpi.prefer_microsoft_guid=1 from kernel command line.

That should hopefully make the patch series apply automatically to your system too. If that works I'll revise it for a v3 to loop your system in too.
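The reason shortening the string helps is that DMI_MATCH in the kernel is a substring match on the DMI field, whereas DMI_EXACT_MATCH compares the full string. A small Python sketch of those semantics follows; `dmi_match` is an illustrative stand-in, not kernel code.

```python
def dmi_match(field_value: str, pattern: str, exact: bool = False) -> bool:
    """Mimic DMI_MATCH (substring) or, with exact=True, DMI_EXACT_MATCH."""
    if exact:
        return field_value == pattern
    return pattern in field_value


def quirk_applies(product_name: str, pattern: str) -> bool:
    """Hypothetical helper: would a quirk entry with `pattern` hit this model?"""
    return dmi_match(product_name, pattern)
```

With the original pattern, `quirk_applies("82TL", "82V2")` is False, so the 82TL reported here is skipped. With the shortened pattern, `quirk_applies("82TL", "82")` and `quirk_applies("82V2", "82")` are both True. The trade-off is that any product name containing "82" now matches, so the entry is broader than the two models discussed in this thread.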
Can confirm that this works, thanks!
Rolled that change into v3.
The kernel solution has been queued up for 6.1 (https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=888ca9c7955e3969df84f5a1bda2143be9fa365a)