Bug 217950

Summary: [Regression] S3 Sleep Mode failures since Linux 6.x on Dell Inspiron 15 5593
Product: ACPI
Reporter: Arnas (arnasz616)
Component: Power-Sleep-Wake
Assignee: acpi_power-sleep-wake
Status: NEW
Severity: normal
CC: bagasdotme, regressions
Priority: P3
Hardware: Intel
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description Arnas 2023-09-25 15:34:37 UTC
I'm having some weird issues with sleep mode on any 6.x Linux kernel version: when I close the lid, it's a toss-up whether the laptop will sleep properly or not. When it fails, the screen locks, but the laptop does not actually enter S3 sleep; it just blanks the screen and stays on (and so does the fan).

Opening the lid after a failed sleep attempt turns on the screen instantaneously, and the laptop doesn't even need to reconnect to WiFi. This doesn't happen when actually resuming from sleep: then it takes a couple of seconds for the screen to come on, and the network has to reconnect.

Following the failed attempt to enter sleep mode (closing the lid), the following entries appear in the system log -

arkiron kernel: ACPI Error: Thread 3233415168 cannot release Mutex [ECMX] acquired by thread 3268191936 (20221020/exmutex-378)
arkiron kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/psparse-529)
arkiron kernel: Non-boot CPUs are not disabled

Now, the "Non-boot CPUs are not disabled" line stands out the most to me here, because successful sleep attempts won't have this line in the log.

After the failed attempt above to sleep, I now close the lid again, and it seemingly goes to sleep successfully. After checking the log following this, I find two new error lines in the log -

arkiron kernel: ACPI Error: Thread 3233415168 cannot release Mutex [ECMX] acquired by thread 3268191936 (20221020/exmutex-378)
arkiron kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/psparse-529)

Note that this time the CPU line is missing, as expected for a successful sleep attempt.
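
The distinction described above (a failed attempt logs "Non-boot CPUs are not disabled", a successful one does not, while the ACPI mutex error shows up either way) can be checked mechanically. Below is a minimal Python sketch, assuming the kernel log lines for one lid-close attempt have already been collected into a list, for example from the journal. The function name `classify_attempt` and its return labels are hypothetical, and the marker string is simply the one observed in this report, not a documented guarantee.

```python
# Marker strings copied from the log excerpts in this report.
FAILED_MARKER = "Non-boot CPUs are not disabled"
MUTEX_MARKER = "cannot release Mutex [ECMX]"


def classify_attempt(lines):
    """Return (status, mutex_errors) for the kernel log lines of one suspend attempt.

    status is "failed" if the non-boot-CPU line is present, otherwise "ok".
    mutex_errors counts the ECMX mutex errors, which appear in both cases.
    """
    failed = any(FAILED_MARKER in line for line in lines)
    mutex_errors = sum(MUTEX_MARKER in line for line in lines)
    return ("failed" if failed else "ok", mutex_errors)


# Log lines from the first (failed) attempt quoted above.
failed_attempt = [
    r"arkiron kernel: ACPI Error: Thread 3233415168 cannot release Mutex [ECMX] acquired by thread 3268191936 (20221020/exmutex-378)",
    r"arkiron kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/psparse-529)",
    r"arkiron kernel: Non-boot CPUs are not disabled",
]

# Log lines from the second (seemingly successful) attempt quoted above.
second_attempt = failed_attempt[:2]

print(classify_attempt(failed_attempt))  # ('failed', 1)
print(classify_attempt(second_attempt))  # ('ok', 1)
```

Because the mutex error is counted separately from the status, the output makes it visible that the ACPI error alone does not distinguish a failed attempt from a successful one.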

This happens on both latest stable Linux kernel 6.5 as well as the latest Linux LTS 6.1 kernel. The last kernel that this didn't happen on was Linux LTS 5.15 (any version), which is what I was running up until the Linux LTS 6.1 upgrade. At that point I tried switching back to mainline (6.5) to see if it would fix sleep issues, but it didn't help. Downgrading to Linux LTS 5.15 did fix the sleep issues and the laptop seems to sleep reliably now. Running LTS 5.15.131-1 without issue as I am making this report.

I'm on a Dell Inspiron 15 5593 using BIOS ver 1.27.0 (latest as of now), running Arch Linux x86_64.
Comment 1 Bagas Sanjaya 2023-09-26 00:12:54 UTC
(In reply to Arnas from comment #0)
> I'm having some weird issues with sleep mode on any 6.x Linux kernel version
> - it's a toss-up when I close the lid as to whether it will sleep properly
> or not - when it fails, the screen will lock, but it will not actually enter
> S3 sleep - it just blanks the screen, but the laptop stays on (and fan does
> too).
> 
> Opening the lid after a failed sleep attempt turns on the screen
> instantaneously, and it doesn't even need to reconnect to WiFi - this
> doesn't happen when actually resuming from sleep, it takes a couple seconds
> for the screen to come on, and it then needs to reconnect the network.
> 
> Following the failed attempt to enter sleep mode (closing the lid), the
> following entries appear in the system log -
> 
> arkiron kernel: ACPI Error: Thread 3233415168 cannot release Mutex [ECMX]
> acquired by thread 3268191936 (20221020/exmutex-378)
> arkiron kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to
> previous error (AE_AML_NOT_OWNER) (20221020/psparse-529)
> arkiron kernel: Non-boot CPUs are not disabled
> 
> Now, the "Non-boot CPUs are not disabled" line stands out the most to me
> here, because successful sleep attempts won't have this line in the log.
> 
> After the failed attempt above to sleep, I now close the lid again, and it
> seemingly goes to sleep successfully. After checking the log following this,
> I find two new error lines in the log -
> 
> arkiron kernel: ACPI Error: Thread 3233415168 cannot release Mutex [ECMX]
> acquired by thread 3268191936 (20221020/exmutex-378)
> arkiron kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to
> previous error (AE_AML_NOT_OWNER) (20221020/psparse-529)
> 
> Note that this time the CPU line is missing, as expected for a successful
> sleep attempt.
> 
> This happens on both latest stable Linux kernel 6.5 as well as the latest
> Linux LTS 6.1 kernel. The last kernel that this didn't happen on was Linux
> LTS 5.15 (any version), which is what I was running up until the Linux LTS
> 6.1 upgrade. At that point I tried switching back to mainline (6.5) to see
> if it would fix sleep issues, but it didn't help. Downgrading to Linux LTS
> 5.15 did fix the sleep issues and the laptop seems to sleep reliably now.
> Running LTS 5.15.131-1 without issue as I am making this report.
> 

Please do bisection (see Documentation/admin-guide/bug-bisect.rst in the
kernel sources).

> I'm on a Dell Inspiron 15 5593 using BIOS ver 1.27.0 (latest as of now),
> running Arch Linux x86_64.

Since you have to compile your own kernel during bisection, please see
ArchWiki guide [1].

[1]: https://wiki.archlinux.org/title/Kernel/Traditional_compilation
Comment 2 Arnas 2023-09-26 00:37:51 UTC
(In reply to Bagas Sanjaya from comment #1)

> Please do bisection (see Documentation/admin-guide/bug-bisect.rst in the
> kernel sources).
> 
> > I'm on a Dell Inspiron 15 5593 using BIOS ver 1.27.0 (latest as of now),
> > running Arch Linux x86_64.
> 
> Since you have to compile your own kernel during bisection, please see
> ArchWiki guide [1].
> 
> [1]: https://wiki.archlinux.org/title/Kernel/Traditional_compilation

Sure, I can do this, but it may have to be sometime this week as I'll need to set aside time to read about this and compile a kernel. Last time took me about 3-4 hours when I was compiling my 5.15-LTS kernel.
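
For a rough idea of the effort involved: git bisect halves the candidate range at each step, so it needs roughly log2(N) build-and-test rounds for N commits. Below is a back-of-the-envelope Python sketch under stated assumptions. The commit count of 100000 is a placeholder, not the real number of commits between 5.15 and 6.1. The 3-4 hours per build is the figure quoted above for a full build, and rebuilds during a bisection may well be faster. The 50% per-attempt failure rate is only a guess based on sleep being described as a toss-up. Both function names are hypothetical.

```python
import math


def bisect_steps(commit_count):
    """Approximate number of kernels to build and test: ceil(log2(commit_count))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n > 1, without float rounding.
    return (commit_count - 1).bit_length() if commit_count > 1 else 0


def attempts_to_trust_good(failure_rate, max_miss_chance=0.05):
    """Suspend attempts that must all succeed before calling a kernel 'good'.

    On a bad kernel that fails with probability failure_rate per attempt,
    the chance of n successes in a row is (1 - failure_rate) ** n.
    This returns the smallest n that pushes that chance to max_miss_chance or below.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss_chance) / math.log(1.0 - failure_rate))


steps = bisect_steps(100_000)          # placeholder commit count (assumption)
low_hours, high_hours = steps * 3, steps * 4
attempts = attempts_to_trust_good(0.5)  # assumed 50% failure rate per lid close

print(steps, low_hours, high_hours, attempts)  # 17 51 68 5
```

Since the failure is intermittent, a single successful sleep is not enough to mark a kernel as good during bisection; under the assumptions above, about five clean lid-close cycles in a row would be needed per step.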

I'd like to also make one more note - The Mutex error line and aborting method line both show up in 5.15 as well - however, the `Non-boot CPUs are not disabled` line never does, and sleep works every time.
Comment 3 Bagas Sanjaya 2023-10-10 09:05:51 UTC
(In reply to Arnas from comment #2)
> (In reply to Bagas Sanjaya from comment #1)
> 
> > Please do bisection (see Documentation/admin-guide/bug-bisect.rst in the
> > kernel sources).
> > 
> > > I'm on a Dell Inspiron 15 5593 using BIOS ver 1.27.0 (latest as of now),
> > > running Arch Linux x86_64.
> > 
> > Since you have to compile your own kernel during bisection, please see
> > ArchWiki guide [1].
> > 
> > [1]: https://wiki.archlinux.org/title/Kernel/Traditional_compilation
> 
> Sure, I can do this, but it may have to be sometime this week as I'll need
> to set aside time to read about this and compile a kernel. Last time took me
> about 3-4 hours when I was compiling my 5.15-LTS kernel.
> 
> I'd like to also make one more note - The Mutex error line and aborting
> method line both show up in 5.15 as well - however, the `Non-boot CPUs are
> not disabled` line never does, and sleep works every time.

Arnas, have you done the bisection?
Comment 4 Arnas 2023-10-10 13:09:26 UTC
(In reply to Bagas Sanjaya from comment #3)
> Arnas, have you done the bisection?

Ahh, my apologies. I've been very busy with work, so I didn't get around to it. I'll try to do it asap.
Comment 5 Arnas 2023-11-22 04:56:14 UTC
(In reply to Bagas Sanjaya from comment #3)

Just wanted to throw up a quick status update. I can try doing this once my current college semester is over, as I'll have a winter break to mess around with this then. Should be sometime early December.
Comment 6 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-22 16:56:53 UTC
(In reply to Arnas from comment #5)
> 
> I can try doing this once my current college semester is over, [...]

No big deal, that's how it is sometimes. Good luck with the semester. And reminder: before trying a bisection, check if latest mainline works any better for you.
Comment 7 Arnas 2024-05-14 13:25:24 UTC
Alright, so quick status update, I switched to Fedora Linux 40 since I needed it for a few things, and got rid of my Arch install.

Current kernel is 6.8.9-300.fc40.x86_64, and annoyingly enough, this issue still occurs on the laptop, even though this is a different distro and a fresh install.

So, I do think this is an upstream kernel issue. Anyway, it's summer break - is there anything you want me to do? I don't know if I can really roll back anymore, since I don't have my Arch install with the old 5.x kernel.

All I can confirm right now is that the latest kernel indeed has the issue, and it seems to occur on any and all Linux distros.
Comment 8 Arnas 2024-05-14 13:28:09 UTC
Same exact issues as well in the log file after closing lid:

(Attempt 1, successful sleep):
May 14 08:17:30 fediron kernel: ACPI Error: Thread 3438522176 cannot release Mutex [ECMX] acquired by thread 3239915328 (20230628/exmutex-378)
May 14 08:17:30 fediron kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20230628/psparse-5>

(Attempt 2, unsuccessful sleep):
May 14 08:18:24 fediron kernel: ACPI Error: Thread 3439165440 cannot release Mutex [ECMX] acquired by thread 3411128128 (20230628/exmutex-378)
May 14 08:18:24 fediron kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20230628/psparse-5>
May 14 08:18:25 fediron kernel: Non-boot CPUs are not disabled
Comment 9 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-05-14 13:41:08 UTC
(In reply to Arnas from comment #7)
> So, I do think this is a upstream kernel issue. Anyway, it's summer break -
> is there anything you want me to do?

Try at least 6.9 now; you can get it for Fedora from these repos: https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories

> I don't know if I can really roll back
> anymore, since I don't have my Arch install with the old 5.x kernel.

Without a bisection (https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html ) I guess no developer will look into this. Maybe try to compile a 5.15 kernel yourself using a .config file from the time when Fedora used such a version (might be possible to extract from old RPMs found in the build system: https://koji.fedoraproject.org/koji/packageinfo?packageID=8 ), with a bit of luck it will boot.
Comment 10 Arnas 2024-05-14 14:02:06 UTC
Well, I swapped my NVMe, went from Arch+Win10 to Fedora+Win11. Since I was swapping SSDs, I also updated my Windows and swapped to Fedora as main.

I might be able to pull out my old NVMe with Arch for testing, it should still boot fine.

(Hopefully I can get it to boot from external SSD? I don't want to take apart my laptop again)

I can try 6.9 on Fedora, will do so and report back. (Although I have a feeling it will make no difference)
Comment 11 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-05-14 14:08:23 UTC
(In reply to Arnas from comment #10)

> I might be able to pull out my old NVMe with Arch for testing, it should
> still boot fine.

Might complicate things, *if* the NVMe device is part of the problem.
 
> (Although I have a feeling it will make no difference)

Yeah, it's unlikely, but it's worth a shot.