Bug 218024 - broken suspend to idle on Lenovo V15 G4 AMN (and related laptops)
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Platform_x86
Hardware: AMD Linux
Importance: P3 normal
Assignee: drivers_platform_x86@kernel-bugs.osdl.org
Reported: 2023-10-18 16:17 UTC by David Lazar
Modified: 2023-11-02 18:23 UTC
Regression: No


Attachments
dmesg-6.6.0-rc6-v2-suspend.txt (109.34 KB, text/plain)
2023-10-18 16:17 UTC, David Lazar
dmesg-6.6.0-rc6-v2-suspend-2.txt (126.60 KB, text/plain)
2023-10-18 16:17 UTC, David Lazar
add-s2idle-quirk.patch (518 bytes, patch)
2023-10-18 17:43 UTC, David Lazar
dmesg-6.6.0-rc6-quirks.txt (88.65 KB, text/plain)
2023-10-18 17:44 UTC, David Lazar

Description David Lazar 2023-10-18 16:17:05 UTC
Created attachment 305259 [details]
dmesg-6.6.0-rc6-v2-suspend.txt

After suspending to RAM and resuming on the Lenovo V15 G4 AMN, multiple NVMe IOMMU page faults occur and show up in dmesg as repeated errors:

nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xb6674000 flags=0x0000]

The system is unstable afterwards. Of three attempts, one left the file system unusable (read and write attempts resulted in I/O errors), one appeared fine but still logged the above errors (see attachment dmesg-6.6.0-rc6-v2-suspend.txt), and one could not restart the wifi card (dmesg-6.6.0-rc6-v2-suspend-2.txt).
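
For anyone comparing their own logs against the attachments, here is a minimal sketch of how the fault lines could be counted per PCI device. The regular expression is modelled only on the single log line quoted above, and the function name count_io_page_faults is made up for illustration; other kernels may format the event differently.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted in this report (an assumption:
# other AMD-Vi event formats exist and will not match).
FAULT_RE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def count_io_page_faults(dmesg_text):
    """Map each PCI address (e.g. '0000:01:00.0') to its IO_PAGE_FAULT count."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = FAULT_RE.search(line)  # search() tolerates a leading timestamp
        if match:
            counts[match.group("bdf")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for device, n in count_io_page_faults(sys.stdin.read()).items():
        print(f"{device}: {n} IO_PAGE_FAULT event(s)")
```

An empty result after a suspend/resume cycle only means that no line matched this particular pattern.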

This was discovered while investigating bug 218003 (malfunctioning keyboard caused by uninitialized PIC), but it's believed to be a separate issue.
Comment 1 David Lazar 2023-10-18 16:17:43 UTC
Created attachment 305260 [details]
dmesg-6.6.0-rc6-v2-suspend-2.txt
Comment 2 Mario Limonciello (AMD) 2023-10-18 16:27:56 UTC
Can you experiment with adding the system into this quirk list?

https://github.com/torvalds/linux/blob/master/drivers/platform/x86/amd/pmc/pmc-quirks.c#L25

It's not exactly the same bug, but it is similarish.
Comment 3 David Lazar 2023-10-18 17:43:26 UTC
Created attachment 305262 [details]
add-s2idle-quirk.patch

This is the patch I've used to add it to the quirks list.  Suspend to RAM seems to work fine now, and the page faults no longer appear in the dmesg output (attached, below).
Comment 4 David Lazar 2023-10-18 17:44:03 UTC
Created attachment 305263 [details]
dmesg-6.6.0-rc6-quirks.txt
Comment 5 Mario Limonciello (AMD) 2023-10-18 17:46:22 UTC
Well that's great!  I think we need something from Lenovo to confirm what values represent all those Mendocino systems.

We don't want that patch to apply to "ALL Lenovo laptops" from this year, only the Mendocino ones that are affected by this issue.

@Mark,

Can you please confirm the strings for the Mendocino ones?
Comment 6 Mark Pearson 2023-10-18 19:22:03 UTC
I'll try, though as a note this is a BU I don't deal with, so it might take me a bit to track down.

As a note - looking at https://psref.lenovo.com/Product/Lenovo/Lenovo_V15_G4_AMN

If you go to the models tab and click on Machine type there are 82YU and 83CQ model numbers for this platform - so at least for this one probably good to cover both.

Wish there was a way of filtering on CPU across all the portfolios.

Mark
Comment 7 Mario Limonciello (AMD) 2023-10-18 19:25:05 UTC
Thanks Mark!  The other way to attack this is of course to fix the BIOS.

This appears to be the same issue you fixed on all those other systems, but I suspected this one wasn't in the list getting the fix.

If that way is preferable, we can close this issue from Linux side and you can go that way too.
Comment 8 Mark Pearson 2023-10-18 19:30:47 UTC
Well either way we have to identify the systems :) 

For my reference: Internal ticket is LO-2698
I have noted that preferred fix is to fix BIOS - but as these platform(s) aren't in the Linux program I can't promise anything.
Comment 9 Mark Pearson 2023-10-19 19:28:15 UTC
Not guaranteed conclusive - but there are no Mendocino used on Think platforms, AIO, SMB or desktop.
The only ones we found referenced are Ideapad1 and V15

Checking psref.lenovo.com, the only matches I could find for 7x20 (which I think is what Mendocino is) are:

V15 AMN (82YU and 83CQ model)
Ideapad1 14 AMN7 (82VF model)
Ideapad1 15 AMN7 (82VG & 82X5 model)

Chances of me getting FW fixes for these platforms are very, very low, so I recommend going with the kernel patch.

Mark
Comment 10 Mario Limonciello (AMD) 2023-10-19 19:31:25 UTC
Thanks Mark!

David, do you mind squeezing all those into your patch and sending it out to the mailing lists?
Comment 11 David Lazar 2023-10-19 20:03:50 UTC
(In reply to Mark Pearson from comment #9)
> Checking psref.lenovo.com, the only matches I could find for 7x20 (which I
> think is what Mendocino is) are:
> 
> V15 AMN (82YU and 83CQ model)
> Ideapad1 14 AMN7 (82VF model)
> Ideapad1 15 AMN7 (82VG & 82X5 model)

Isn't V14 G4 AMN (82YT, 83GE) also affected?
IdeaPad Slim 3 14AMN8 (82XN)?
IdeaPad Slim 3 14AMN8 (82XQ)?

I'm starting to suspect that "AMN" stands for "AMD MendociNo".

(In reply to Mario Limonciello (AMD) from comment #10)
> David, do you mind squeezing all those into your patch and sending it out to
> the mailing lists?

Happy to do it, but it's my first kernel patch since the early 2000s, so I'll have to read up on the process first. :-)  Any pointers greatly appreciated.
Comment 12 Mario Limonciello (AMD) 2023-10-19 20:07:14 UTC
> Happy to do it, but it's my first kernel patch since the early 2000s, so I'll
> have to read up on the process first. :-)  Any pointers greatly appreciated.

Thanks!

https://www.kernel.org/doc/html/latest/process/submitting-patches.html
Comment 13 Mark Pearson 2023-10-20 02:00:37 UTC
> I'm starting to suspect that "AMN" stands for "AMD MendociNo".

Oh man - can't believe I didn't notice that! I think you're right. I have no experience with Ideapad naming (I've always thought ThinkBook was strange...I think it follows similar)

I checked on PSREF and it gave the platforms you found. I think you're good to go.
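
To check whether a given machine is one of the models discussed here, a rough user-space sketch follows. It assumes that Lenovo exposes the machine type (such as 82YU) in the DMI product name under /sys/class/dmi/id, and it uses a substring test. The list contains only the machine types named in this thread; it is not copied from the merged patch, and the helper names are hypothetical.

```python
from pathlib import Path

# Machine types mentioned in comments 6, 9 and 11 of this report.
MENDOCINO_MACHINE_TYPES = (
    "82YU", "83CQ",          # V15 G4 AMN
    "82VF",                  # Ideapad1 14 AMN7
    "82VG", "82X5",          # Ideapad1 15 AMN7
    "82YT", "83GE",          # V14 G4 AMN
    "82XN", "82XQ",          # IdeaPad Slim 3 14AMN8 (as listed in comment 11)
)


def is_listed_machine(product_name, vendor):
    """True if vendor is Lenovo and product_name contains a listed machine type."""
    if vendor.strip().upper() != "LENOVO":
        return False
    return any(mt in product_name for mt in MENDOCINO_MACHINE_TYPES)


def read_dmi(field, base=Path("/sys/class/dmi/id")):
    """Read one DMI field from sysfs; return '' if it cannot be read."""
    try:
        return (base / field).read_text().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    vendor = read_dmi("sys_vendor")
    product = read_dmi("product_name")
    print(f"{vendor} {product}: listed = {is_listed_machine(product, vendor)}")
```

A "listed = False" result says nothing about whether the kernel applies a quirk; it only reflects the machine types collected in this thread.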
Comment 14 Mario Limonciello (AMD) 2023-10-28 00:13:25 UTC
Merged as 
https://github.com/torvalds/linux/commit/3bde7ec13c971445faade32172cb0b4370b841d9
Comment 15 Hans de Goede 2023-11-02 16:32:44 UTC
Note that the fixes for this have now landed in 6.5.10, which should reach Arch and Fedora users through their distro soon.
Comment 16 David Lazar 2023-11-02 18:23:29 UTC
The fixes also landed in 6.1.61, for distros that follow that older branch.

And, as a note of caution, make sure the module containing this code is loaded, otherwise you'll still see those page faults.  Ask me how I know...

For the 6.5+ kernels, the module is called 'amd-pmc', while in 6.1 this code still lived in 'thinkpad_acpi'.
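
A quick way to check for the module is to look through /proc/modules, as in the sketch below. Note that /proc/modules spells module names with underscores, so 'amd-pmc' shows up as amd_pmc. The function names are made up for illustration, and code built into the kernel image (rather than built as a module) will not be listed there, so an empty result is not conclusive on such kernels.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def present_quirk_modules(proc_modules_text,
                          candidates=("amd_pmc", "thinkpad_acpi")):
    """Return which of the candidate modules are currently loaded, in order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in candidates if name in loaded]


if __name__ == "__main__":
    with open("/proc/modules") as f:
        found = present_quirk_modules(f.read())
    if found:
        print("loaded:", ", ".join(found))
    else:
        print("neither amd_pmc nor thinkpad_acpi is listed in /proc/modules")
```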
