Bug 202583

Summary: Dell ACPI error in \_SB.PCI0.LPCB.ECDV.ECR1
Product: ACPI Reporter: Todd Brandt (todd.e.brandt)
Component: BIOSAssignee: Zhang Rui (rui.zhang)
Status: NEEDINFO ---    
Severity: normal CC: antonio.petricca, Esokrarkose, jabame55, kernel, lenb, Perry.Yuan, rui.zhang, serg
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.0.0-rc5 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 178231    
Attachments: dell-XPS-13-9380_mem.html
otcpl-dell-9370-kblr_mem.html
otcpl-dell-p3520_mem.html
dell-precision-3520-acpidump.out
dell-xps-9370-acpidump.out
dell-xps-9380-acpidump.out
otcpl-dell-9370-kblr_bootdmesg.log
otcpl-dell-p3520_bootdmesg.log
otcpl-dell-xps-13-9380_bootdmesg.log
issue.def
issue.def

Description Todd Brandt 2019-02-14 23:51:20 UTC
We run weekly 48+ hour stress testing on various systems with the latest upstream kernel. Starting in 5.0.0-rc4 we started to notice recurring ACPI errors on several Dell platforms during S3 suspend/resume.

Here's some snippets of dmesg log where the issue occurs on 3 dells:

Dell XPS 9380:
[95784.411872] PM: Saving platform NVS memory
[95784.411893] Disabling non-boot CPUs ...
[95784.413309] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266)
[95784.413332] 
[95784.413335]   Local0: 00000000608d0738 <Obj>           Integer 0000000000000000
[95784.413349] Initialized Arguments for Method [ECR1]:  (1 arguments defined for method invocation)
[95784.413351]   Arg0:   000000005ec5fa8f <Obj>           Integer 000000000000000B
[95784.413361] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECR1, AE_BAD_PARAMETER (20181213/psparse-531)
[95784.413375] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECR2, AE_BAD_PARAMETER (20181213/psparse-531)
[95784.413387] ACPI Error: Method parse/execution failed \ECRW, AE_BAD_PARAMETER (20181213/psparse-531)
[95784.413397] ACPI Error: Method parse/execution failed \ECGD, AE_BAD_PARAMETER (20181213/psparse-531)
[95784.413407] ACPI Error: Method parse/execution failed \NEVT, AE_BAD_PARAMETER (20181213/psparse-531)
[95784.413418] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV._Q66, AE_BAD_PARAMETER (20181213/psparse-531)

Dell Precision 3520:
[95893.925428] PM: Saving platform NVS memory
[95893.925564] Disabling non-boot CPUs ...
[95893.927607] smpboot: CPU 1 is now offline
[95893.927622] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266)
[95893.927653] 
[95893.927656]   Local0: 00000000686401e3 <Obj>           Integer 0000000000000000
[95893.927675] Initialized Arguments for Method [ECR1]:  (1 arguments defined for method invocation)
[95893.927677]   Arg0:   00000000b17d0acd <Obj>           Integer 000000000000000B
[95893.927692] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECR1, AE_BAD_PARAMETER (20181213/psparse-531)
[95893.927712] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECR2, AE_BAD_PARAMETER (20181213/psparse-531)
[95893.927728] ACPI Error: Method parse/execution failed \ECRW, AE_BAD_PARAMETER (20181213/psparse-531)
[95893.927744] ACPI Error: Method parse/execution failed \ECGD, AE_BAD_PARAMETER (20181213/psparse-531)
[95893.927759] ACPI Error: Method parse/execution failed \NEVT, AE_BAD_PARAMETER (20181213/psparse-531)
[95893.927775] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV._Q66, AE_BAD_PARAMETER (20181213/psparse-531)


Dell XPS 9370:
[   33.196759] PM: Saving platform NVS memory
[   33.196815] Disabling non-boot CPUs ...
[   33.197105] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266)
[   33.197112] No Local Variables are initialized for Method [ECW1]
[   33.197114] Initialized Arguments for Method [ECW1]:  (2 arguments defined for method invocation)
[   33.198145]   Arg0:   0000000047ba53b8 <Obj>           Integer 000000000000008A
[   33.198151]   Arg1:   00000000100f24ce <Obj>           Integer 0000000000000002
[   33.198156] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1
[   33.198159] smpboot: CPU 1 is now offline
[   33.198160] , AE_BAD_PARAMETER (20181213/psparse-531)
[   33.198163] ACPI Error: Method parse/execution failed \ECWB, AE_BAD_PARAMETER (20181213/psparse-531)
[   33.198167] ACPI Error: Method parse/execution failed \_SB.UBTC._DSM, AE_BAD_PARAMETER (20181213/psparse-531)
[   33.198176] ACPI: \_SB_.UBTC: failed to evaluate _DSM (0x1001)
[   33.198178] ucsi_acpi USBC000:00: ucsi_acpi_dsm: failed to evaluate _DSM 1
[   33.198180] ucsi_acpi USBC000:00: ucsi_ack: failed
[   33.198181] ucsi_acpi USBC000:00: con2: failed to get status
[   33.198523] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266)
[   33.198533] No Local Variables are initialized for Method [ECW1]
[   33.198534] Initialized Arguments for Method [ECW1]:  (2 arguments defined for method invocation)
[   33.198535]   Arg0:   00000000100f24ce <Obj>           Integer 00000000000000A0
[   33.198539]   Arg1:   0000000047ba53b8 <Obj>           Integer 0000000000000000
[   33.198542] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_BAD_PARAMETER (20181213/psparse-531)
[   33.198548] ACPI Error: Method parse/execution failed \ECWB, AE_BAD_PARAMETER (20181213/psparse-531)
[   33.198552] ACPI Error: Method parse/execution failed \_SB.UBTC._DSM, AE_BAD_PARAMETER (20181213/psparse-531)
[   33.198564] ACPI: \_SB_.UBTC: failed to evaluate _DSM (0x1001)
[   33.198566] ucsi_acpi USBC000:00: ucsi_acpi_dsm: failed to evaluate _DSM 1
[   33.200027] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20181213/evregion-266)
[   33.200049] smpboot: CPU 2 is now offline
[   33.200052] No Local Variables are initialized for Method [ECW1]
[   33.200053] Initialized Arguments for Method [ECW1]:  (2 arguments defined for method invocation)
[   33.200054]   Arg0:   0000000047ba53b8 <Obj>           Integer 00000000000000A0
[   33.200057]   Arg1:   00000000100f24ce <Obj>           Integer 0000000000000000
[   33.200060] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_BAD_PARAMETER (20181213/psparse-531)
[   33.200064] ACPI Error: Method parse/execution failed \ECWB, AE_BAD_PARAMETER (20181213/psparse-531)
[   33.200068] ACPI Error: Method parse/execution failed \_SB.UBTC._DSM, AE_BAD_PARAMETER (20181213/psparse-531)
[   33.200075] ACPI: \_SB_.UBTC: failed to evaluate _DSM (0x1001)

I am attaching three sleepgraph timelines, one from each of the three systems.
Comment 1 Todd Brandt 2019-02-14 23:53:48 UTC
Created attachment 281149 [details]
dell-XPS-13-9380_mem.html
Comment 2 Todd Brandt 2019-02-14 23:54:05 UTC
Created attachment 281151 [details]
otcpl-dell-9370-kblr_mem.html
Comment 3 Todd Brandt 2019-02-14 23:54:30 UTC
Created attachment 281153 [details]
otcpl-dell-p3520_mem.html
Comment 4 Todd Brandt 2019-02-15 00:00:12 UTC
The occurrence rate of this issue is as follows:

XPS 9380       : 12.7%
Precision 3520 : 7.5%
XPS 9370       : 3.5%
Comment 5 Todd Brandt 2019-02-19 22:44:07 UTC
Created attachment 281223 [details]
dell-precision-3520-acpidump.out
Comment 6 Todd Brandt 2019-02-19 22:44:32 UTC
Created attachment 281225 [details]
dell-xps-9370-acpidump.out
Comment 7 Todd Brandt 2019-02-19 22:44:51 UTC
Created attachment 281227 [details]
dell-xps-9380-acpidump.out
Comment 8 Todd Brandt 2019-02-20 19:46:28 UTC
P.S. the dmesg logs are embedded in the html timelines attached. Open them in a browser and click the "dmesg" button in the upper right hand corner.
Comment 9 Todd Brandt 2019-02-20 20:05:09 UTC
Created attachment 281241 [details]
otcpl-dell-9370-kblr_bootdmesg.log
Comment 10 Todd Brandt 2019-02-20 20:05:33 UTC
Created attachment 281243 [details]
otcpl-dell-p3520_bootdmesg.log
Comment 11 Todd Brandt 2019-02-20 20:05:56 UTC
Created attachment 281245 [details]
otcpl-dell-xps-13-9380_bootdmesg.log
Comment 12 Mario Limonciello 2019-03-07 17:03:38 UTC
For this particular error, please try to go into BIOS setup and turn off C-states and repeat this test run.  See if it happens again.
Comment 13 Len Brown 2019-03-28 20:31:09 UTC
Linux, via the intel_idle driver, ignores ACPI BIOS C-states,
and instead invokes them directly with known MWAIT hints.

So if we "turn off C-states" in BIOS SETUP, we would not expect
that to have any impact on how Linux invokes C-states.

What is it that we are trying to do?  If the answer is to
"not invoke C-states", we have full control over that from
within Linux -- just need to know which states to disable.
(indeed, we can disable all the way down to C1 and poll in idle instead)
Comment 14 Todd Brandt 2019-04-01 18:44:13 UTC
I tried disabling cstate switching from the kernel command line:

processor.max_cstate=0 intel_idle.max_cstate=0

No change.
Comment 15 Zhang Rui 2019-04-02 03:57:18 UTC
The problem is that we are handling EC events/read/write after EC stopped.
So this is a regression? say, we don't have the problem with 5.0-rc3 and earlier?
Comment 16 Todd Brandt 2019-04-17 19:20:51 UTC
Created attachment 282371 [details]
issue.def
Comment 17 Todd Brandt 2019-04-19 07:25:42 UTC
Created attachment 282401 [details]
issue.def
Comment 18 Len Brown 2019-08-15 14:47:03 UTC
We are seeing this on over 10% of suspends on the Dell 9380,
and we are seeing also as much on the Dell p3520 and Dell 9370:

5.3.0-rc3+	otcpl-dell-9380-cfl	mem-x2157	245	11.36%	html
5.3.0-rc3+	otcpl-dell-p3520	mem-x2381	178	7.48%	html
5.3.0-rc3+	otcpl-dell-9370-kblr	mem-x611	43	7.04%	html
Comment 19 Rup 2020-04-15 20:30:23 UTC
i have a similar issue this is what i found in the log file

Apr 14 15:51:57 sqar-7642 NetworkManager[1111]: <info>  [1586904716.9747] dhcp4 (wlp3s0): state changed bound -> bound
Apr 14 18:32:21 sqar-7642 kernel: [894764.840250] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20170831/evregion-300)
Apr 14 18:32:21 sqar-7642 kernel: [894764.840273] No Local Variables are initialized for Method [ECW1]
Apr 14 18:32:21 sqar-7642 kernel: [894764.840274] Initialized Arguments for Method [ECW1]:  (2 arguments defined for method invocation)
Apr 14 18:32:21 sqar-7642 kernel: [894764.840275]   Arg0:   000000000db5b4fd <Obj>           Integer 0000000000000033
Apr 14 18:32:21 sqar-7642 kernel: [894764.840279]   Arg1:   0000000033909f7b <Obj>           Integer 0000000000000003
Apr 14 18:32:21 sqar-7642 kernel: [894764.840284] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_TIME (20170831/psparse-550)
Apr 14 18:32:21 sqar-7642 kernel: [894764.840291] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.KDRT, AE_TIME (20170831/psparse-550)
Apr 14 18:32:21 sqar-7642 kernel: [894764.840295] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.NGFF._TMP, AE_TIME (20170831/psparse-550)
Apr 14 18:32:23 sqar-7642 kernel: [894767.412237] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20170831/evregion-300)
Apr 14 18:32:23 sqar-7642 kernel: [894767.412254] No Local Variables are initialized for Method [ECW1]
Apr 14 18:32:23 sqar-7642 kernel: [894767.412255] Initialized Arguments for Method [ECW1]:  (2 arguments defined for method invocation)
Apr 14 18:32:23 sqar-7642 kernel: [894767.412256]   Arg0:   00000000e0cd19f7 <Obj>           Integer 0000000000000033
Apr 14 18:32:23 sqar-7642 kernel: [894767.412260]   Arg1:   000000002cbf10dc <Obj>           Integer 0000000000000001
Apr 14 18:32:23 sqar-7642 kernel: [894767.412265] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_TIME (20170831/psparse-550)
Apr 14 18:32:23 sqar-7642 kernel: [894767.412271] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.KDRT, AE_TIME (20170831/psparse-550)
Apr 14 18:32:23 sqar-7642 kernel: [894767.412274] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.TSKN._TMP, AE_TIME (20170831/psparse-550)
\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00Apr 15 07:38:37 sqar-7642 kernel: [    0.000000] microcode: microcode updated early to revision 0xca, date = 2019-09-26
Apr 15 07:38:37 sqar-7642 kernel: [    0.000000] Linux version 4.15.0-96-generic (buildd@lgw01-amd64-007) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)) #97~16.04.1-Ubuntu SMP Wed Apr 1 03:03:31 UTC 2020 (Ubuntu 4.15.0-96.97~16.04.1-generic 4.15.18)
Apr 15 07:38:37 sqar-7642 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-96-generic root=UUID=33c159b3-cdef-4ad4-9aef-09c65915144c ro quiet splash vt.handoff=7
Apr 15 07:38:37 sqar-7642 kernel: [    0.000000] KERNEL supported cpus:
Apr 15 07:38:37 sqar-7642 kernel: [    0.000000]   Intel GenuineIntel
Apr 15 07:38:37 sqar-7642 kernel: [    0.000000]   AMD AuthenticAMD
Apr 15 07:38:37 sqar-7642 kernel: [    0.000000]   Centaur CentaurHauls
Apr 15 07:38:37 sqar-7642 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Comment 20 Len Brown 2020-05-06 15:06:57 UTC
This bug happens about 6% of the time on my Dell 9365, and has done so for all kernels tested:

5.7.0-rc3+	lenb-Dell-XPS-13-9365	mem-x2450	156	6.37%	html
5.7.0-rc2+	lenb-Dell-XPS-13-9365	mem-x2042	140	6.86%	html
5.6.0+	lenb-Dell-XPS-13-9365	mem-x1015	65	6.40%	html
5.6.0+	lenb-Dell-XPS-13-9365	mem-x2040	117	5.74%	html
5.5.0+	lenb-Dell-XPS-13-9365	mem-x995	63	6.33%	html
5.4.0+	lenb-Dell-XPS-13-9365	mem-x10	1	10.00%	html
5.4.0+	lenb-Dell-XPS-13-9365	mem-x212	14	6.60%	html
5.4.0+	lenb-Dell-XPS-13-9365	mem-x987	58	5.88%	html
5.3.0-45-generic	lenb-Dell-XPS-13-9365	mem-x977	54	5.53%	html
Comment 21 Len Brown 2020-05-06 15:44:33 UTC
Dell 9365 results above were not with very latest Dell BIOS:
    BIOS     : 2.9.0
    BIOSDATE : 07/24/2019

Updating today to BIOS 2.10.0, and will re-test.

Dell's release notes do not suggest this problem will go away,
but always good to test latest BIOS for a BIOS-related issue.
Comment 22 Len Brown 2020-05-07 05:02:56 UTC
Confirmed with a 3-hour stress test, that the Dell 9365 with latest BIOS (2.10.0) still has these issues on about 7% of ACPI S3's:

5.7.0-rc4+	lenb-Dell-XPS-13-9365	mem-x582	43	7.39%
Comment 23 Esokrarkose 2021-12-12 20:20:35 UTC
Seeing something similar on a XPS 13 9310 2-in-1 (Bios 2.6.0, Linux 5.16.7) during suspend:

[67202.879700] PM: suspend entry (s2idle)
[67202.902166] Filesystems sync: 0.022 seconds
[67203.024292] Freezing user space processes ... (elapsed 0.002 seconds) done.
[67203.027156] OOM killer disabled.
[67203.027158] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[67203.028401] printk: Suspending console(s) (use no_console_suspend to debug)
[67203.355481] ACPI: EC: interrupt blocked
[71671.346603] ACPI: EC: interrupt unblocked
[71672.212171] ACPI Error: Thread 983940608 cannot release Mutex [ECMX] acquired by thread 3944857088 (20210730/exmutex-378)

[71672.212185] No Local Variables are initialized for Method [_Q66]

[71672.212187] No Arguments are initialized for method [_Q66]

[71672.212189] ACPI Error: Aborting method \_SB.PC00.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20210730/psparse-529)


Should I file a separate bug for this one?
Comment 24 Esokrarkose 2021-12-12 20:23:09 UTC
Edit: typo, meant Linux 5.15.7
Comment 25 Antonio Petricca 2023-04-19 21:02:28 UTC
Same issue on Dell Precision 7560, Ubuntu 22.04.2, Kernel 5.19.0-10.generic .
Comment 26 Serg Podtynnyi 2023-04-23 12:10:50 UTC
dell xps 7390 2 in 1, arch, latest BIOS
Linux arazu 6.2.12-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 20 Apr 2023 16:11:27 +0000 x86_64 GNU/Linux

```
[  +0.008768] ACPI Error: Thread 20096704 cannot release Mutex [ECMX] acquired by thread 18972672 (20221020/exmutex-37>
[  +0.000009] ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/>
[  +0.490374] PM: suspend entry (deep)
[  +0.009559] Filesystems sync: 0.009 seconds
```