We run weekly 48+ hour stress testing on various systems with the latest upstream kernel. Starting in 5.0.0-rc4 we started to notice recurring ACPI errors on several Dell platforms during S3 suspend/resume. Here's some snippets of dmesg log where the issue occurs on 3 dells: Dell XPS 9380: [95784.411872] PM: Saving platform NVS memory [95784.411893] Disabling non-boot CPUs ... [95784.413309] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266) [95784.413332] [95784.413335] Local0: 00000000608d0738 <Obj> Integer 0000000000000000 [95784.413349] Initialized Arguments for Method [ECR1]: (1 arguments defined for method invocation) [95784.413351] Arg0: 000000005ec5fa8f <Obj> Integer 000000000000000B [95784.413361] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECR1, AE_BAD_PARAMETER (20181213/psparse-531) [95784.413375] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECR2, AE_BAD_PARAMETER (20181213/psparse-531) [95784.413387] ACPI Error: Method parse/execution failed \ECRW, AE_BAD_PARAMETER (20181213/psparse-531) [95784.413397] ACPI Error: Method parse/execution failed \ECGD, AE_BAD_PARAMETER (20181213/psparse-531) [95784.413407] ACPI Error: Method parse/execution failed \NEVT, AE_BAD_PARAMETER (20181213/psparse-531) [95784.413418] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV._Q66, AE_BAD_PARAMETER (20181213/psparse-531) Dell Precision 3520: [95893.925428] PM: Saving platform NVS memory [95893.925564] Disabling non-boot CPUs ... [95893.927607] smpboot: CPU 1 is now offline [95893.927622] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266) [95893.927653] [95893.927656] Local0: 00000000686401e3 <Obj> Integer 0000000000000000 [95893.927675] Initialized Arguments for Method [ECR1]: (1 arguments defined for method invocation) [95893.927677] Arg0: 00000000b17d0acd <Obj> Integer 000000000000000B [95893.927692] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECR1, AE_BAD_PARAMETER (20181213/psparse-531) [95893.927712] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECR2, AE_BAD_PARAMETER (20181213/psparse-531) [95893.927728] ACPI Error: Method parse/execution failed \ECRW, AE_BAD_PARAMETER (20181213/psparse-531) [95893.927744] ACPI Error: Method parse/execution failed \ECGD, AE_BAD_PARAMETER (20181213/psparse-531) [95893.927759] ACPI Error: Method parse/execution failed \NEVT, AE_BAD_PARAMETER (20181213/psparse-531) [95893.927775] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV._Q66, AE_BAD_PARAMETER (20181213/psparse-531) Dell XPS 9370: [ 33.196759] PM: Saving platform NVS memory [ 33.196815] Disabling non-boot CPUs ... [ 33.197105] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266) [ 33.197112] No Local Variables are initialized for Method [ECW1] [ 33.197114] Initialized Arguments for Method [ECW1]: (2 arguments defined for method invocation) [ 33.198145] Arg0: 0000000047ba53b8 <Obj> Integer 000000000000008A [ 33.198151] Arg1: 00000000100f24ce <Obj> Integer 0000000000000002 [ 33.198156] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1 [ 33.198159] smpboot: CPU 1 is now offline [ 33.198160] , AE_BAD_PARAMETER (20181213/psparse-531) [ 33.198163] ACPI Error: Method parse/execution failed \ECWB, AE_BAD_PARAMETER (20181213/psparse-531) [ 33.198167] ACPI Error: Method parse/execution failed \_SB.UBTC._DSM, AE_BAD_PARAMETER (20181213/psparse-531) [ 33.198176] ACPI: \_SB_.UBTC: failed to evaluate _DSM (0x1001) [ 33.198178] ucsi_acpi USBC000:00: ucsi_acpi_dsm: failed to evaluate _DSM 1 [ 33.198180] ucsi_acpi USBC000:00: ucsi_ack: failed [ 33.198181] ucsi_acpi USBC000:00: con2: failed to get status [ 33.198523] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266) [ 33.198533] No Local Variables are initialized for Method [ECW1] [ 33.198534] Initialized Arguments for Method [ECW1]: (2 arguments defined for method invocation) [ 33.198535] Arg0: 00000000100f24ce <Obj> Integer 00000000000000A0 [ 33.198539] Arg1: 0000000047ba53b8 <Obj> Integer 0000000000000000 [ 33.198542] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_BAD_PARAMETER (20181213/psparse-531) [ 33.198548] ACPI Error: Method parse/execution failed \ECWB, AE_BAD_PARAMETER (20181213/psparse-531) [ 33.198552] ACPI Error: Method parse/execution failed \_SB.UBTC._DSM, AE_BAD_PARAMETER (20181213/psparse-531) [ 33.198564] ACPI: \_SB_.UBTC: failed to evaluate _DSM (0x1001) [ 33.198566] ucsi_acpi USBC000:00: ucsi_acpi_dsm: failed to evaluate _DSM 1 [ 33.200027] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266) [ 33.200049] smpboot: CPU 2 is now offline [ 33.200052] No Local Variables are initialized for Method [ECW1] [ 33.200053] Initialized Arguments for Method [ECW1]: (2 arguments defined for method invocation) [ 33.200054] Arg0: 0000000047ba53b8 <Obj> Integer 00000000000000A0 [ 33.200057] Arg1: 00000000100f24ce <Obj> Integer 0000000000000000 [ 33.200060] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_BAD_PARAMETER (20181213/psparse-531) [ 33.200064] ACPI Error: Method parse/execution failed \ECWB, AE_BAD_PARAMETER (20181213/psparse-531) [ 33.200068] ACPI Error: Method parse/execution failed \_SB.UBTC._DSM, AE_BAD_PARAMETER (20181213/psparse-531) [ 33.200075] ACPI: \_SB_.UBTC: failed to evaluate _DSM (0x1001) I am attaching three sleepgraph timelines, one from each of the three systems.
Created attachment 281149 [details] dell-XPS-13-9380_mem.html
Created attachment 281151 [details] otcpl-dell-9370-kblr_mem.html
Created attachment 281153 [details] otcpl-dell-p3520_mem.html
The occurrence rate of this issue is as follows: XPS 9380 : 12.7% Precision 3520 : 7.5% XPS 9370 : 3.5%
Created attachment 281223 [details] dell-precision-3520-acpidump.out
Created attachment 281225 [details] dell-xps-9370-acpidump.out
Created attachment 281227 [details] dell-xps-9380-acpidump.out
P.S. the dmesg logs are embedded in the html timelines attached. Open them in a browser and click the "dmesg" button in the upper right hand corner.
Created attachment 281241 [details] otcpl-dell-9370-kblr_bootdmesg.log
Created attachment 281243 [details] otcpl-dell-p3520_bootdmesg.log
Created attachment 281245 [details] otcpl-dell-xps-13-9380_bootdmesg.log
For this particular error, please try to go into BIOS setup and turn off C-states and repeat this test run. See if it happens again.
Linux, via the intel_idle driver, ignores ACPI BIOS C-states, and instead invokes them directly with known MWAIT hints. So if we "turn off C-states" in BIOS SETUP, we would not expect that to have any impact on how Linux invokes C-states. What is it that we are trying to do? If the answer is to "not invoke C-states", we have full control over that from within Linux -- just need to know which states to disable. (indeed, we can disable all the way down to C1 and poll in idle instead)
I tried disabling cstate switching from the kernel command line: processor.max_cstate=0 intel_idle.max_cstate=0 No change.
The problem is that we are handling EC events/read/write after EC stopped. So this is a regression? say, we don't have the problem with 5.0-rc3 and earlier?
Created attachment 282371 [details] issue.def
Created attachment 282401 [details] issue.def
We are seeing this on over 10% of suspends on the Dell 9380, and we are seeing also as much on the Dell p3520 and Dell 9370: 5.3.0-rc3+ otcpl-dell-9380-cfl mem-x2157 245 11.36% html 5.3.0-rc3+ otcpl-dell-p3520 mem-x2381 178 7.48% html 5.3.0-rc3+ otcpl-dell-9370-kblr mem-x611 43 7.04% html
i have a similar issue this is what i found in the log file Apr 14 15:51:57 sqar-7642 NetworkManager[1111]: <info> [1586904716.9747] dhcp4 (wlp3s0): state changed bound -> bound Apr 14 18:32:21 sqar-7642 kernel: [894764.840250] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20170831/evregion-300) Apr 14 18:32:21 sqar-7642 kernel: [894764.840273] No Local Variables are initialized for Method [ECW1] Apr 14 18:32:21 sqar-7642 kernel: [894764.840274] Initialized Arguments for Method [ECW1]: (2 arguments defined for method invocation) Apr 14 18:32:21 sqar-7642 kernel: [894764.840275] Arg0: 000000000db5b4fd <Obj> Integer 0000000000000033 Apr 14 18:32:21 sqar-7642 kernel: [894764.840279] Arg1: 0000000033909f7b <Obj> Integer 0000000000000003 Apr 14 18:32:21 sqar-7642 kernel: [894764.840284] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_TIME (20170831/psparse-550) Apr 14 18:32:21 sqar-7642 kernel: [894764.840291] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.KDRT, AE_TIME (20170831/psparse-550) Apr 14 18:32:21 sqar-7642 kernel: [894764.840295] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.NGFF._TMP, AE_TIME (20170831/psparse-550) Apr 14 18:32:23 sqar-7642 kernel: [894767.412237] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20170831/evregion-300) Apr 14 18:32:23 sqar-7642 kernel: [894767.412254] No Local Variables are initialized for Method [ECW1] Apr 14 18:32:23 sqar-7642 kernel: [894767.412255] Initialized Arguments for Method [ECW1]: (2 arguments defined for method invocation) Apr 14 18:32:23 sqar-7642 kernel: [894767.412256] Arg0: 00000000e0cd19f7 <Obj> Integer 0000000000000033 Apr 14 18:32:23 sqar-7642 kernel: [894767.412260] Arg1: 000000002cbf10dc <Obj> Integer 0000000000000001 Apr 14 18:32:23 sqar-7642 kernel: [894767.412265] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_TIME (20170831/psparse-550) Apr 14 18:32:23 sqar-7642 kernel: [894767.412271] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.KDRT, AE_TIME (20170831/psparse-550) Apr 14 18:32:23 sqar-7642 kernel: [894767.412274] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.TSKN._TMP, AE_TIME (20170831/psparse-550) \00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] microcode: microcode updated early to revision 0xca, date = 2019-09-26 Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] Linux version 4.15.0-96-generic (buildd@lgw01-amd64-007) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)) #97~16.04.1-Ubuntu SMP Wed Apr 1 03:03:31 UTC 2020 (Ubuntu 4.15.0-96.97~16.04.1-generic 4.15.18) Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-96-generic root=UUID=33c159b3-cdef-4ad4-9aef-09c65915144c ro quiet splash vt.handoff=7 Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] KERNEL supported cpus: Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] Intel GenuineIntel Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] AMD AuthenticAMD Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] Centaur CentaurHauls Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
This bug happens about 6% of the time on my Dell 9365, and has done so for all kernels tested: 5.7.0-rc3+ lenb-Dell-XPS-13-9365 mem-x2450 156 6.37% html 5.7.0-rc2+ lenb-Dell-XPS-13-9365 mem-x2042 140 6.86% html 5.6.0+ lenb-Dell-XPS-13-9365 mem-x1015 65 6.40% html 5.6.0+ lenb-Dell-XPS-13-9365 mem-x2040 117 5.74% html 5.5.0+ lenb-Dell-XPS-13-9365 mem-x995 63 6.33% html 5.4.0+ lenb-Dell-XPS-13-9365 mem-x10 1 10.00% html 5.4.0+ lenb-Dell-XPS-13-9365 mem-x212 14 6.60% html 5.4.0+ lenb-Dell-XPS-13-9365 mem-x987 58 5.88% html 5.3.0-45-generic lenb-Dell-XPS-13-9365 mem-x977 54 5.53% html
Dell 9365 results above were not with very latest Dell BIOS: BIOS : 2.9.0 BIOSDATE : 07/24/2019 Updating today to BIOS 2.10.0, and will re-test. Dell's release notes do not suggest this problem will go away, but always good to test latest BIOS for a BIOS-related issue.
Confirmed with a 3-hour stress test, that the Dell 9365 with latest BIOS (2.10.0) still has these issues on about 7% of ACPI S3's: 5.7.0-rc4+ lenb-Dell-XPS-13-9365 mem-x582 43 7.39%
Seeing something similar on a XPS 13 9310 2-in-1 (Bios 2.6.0, Linux 5.16.7) during suspend: [67202.879700] PM: suspend entry (s2idle) [67202.902166] Filesystems sync: 0.022 seconds [67203.024292] Freezing user space processes ... (elapsed 0.002 seconds) done. [67203.027156] OOM killer disabled. [67203.027158] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [67203.028401] printk: Suspending console(s) (use no_console_suspend to debug) [67203.355481] ACPI: EC: interrupt blocked [71671.346603] ACPI: EC: interrupt unblocked [71672.212171] ACPI Error: Thread 983940608 cannot release Mutex [ECMX] acquired by thread 3944857088 (20210730/exmutex-378) [71672.212185] No Local Variables are initialized for Method [_Q66] [71672.212187] No Arguments are initialized for method [_Q66] [71672.212189] ACPI Error: Aborting method \_SB.PC00.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20210730/psparse-529) Should I file a separate bug for this one?
Edit: typo, meant Linux 5.15.7
Same issue on Dell Precision 7560, Ubuntu 22.04.2, Kernel 5.19.0-10.generic .
dell xps 7390 2 in 1, arch, latest BIOS Linux arazu 6.2.12-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 20 Apr 2023 16:11:27 +0000 x86_64 GNU/Linux ``` [ +0.008768] ACPI Error: Thread 20096704 cannot release Mutex [ECMX] acquired by thread 18972672 (20221020/exmutex-37> [ +0.000009] ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/> [ +0.490374] PM: suspend entry (deep) [ +0.009559] Filesystems sync: 0.009 seconds ```