Bug 202583
Summary: | Dell ACPI error in \_SB.PCI0.LPCB.ECDV.ECR1 | ||
---|---|---|---|
Product: | ACPI | Reporter: | Todd Brandt (todd.e.brandt) |
Component: | BIOS | Assignee: | Zhang Rui (rui.zhang) |
Status: | NEEDINFO --- | ||
Severity: | normal | CC: | antonio.petricca, Esokrarkose, jabame55, kernel, lenb, Perry.Yuan, rui.zhang, serg |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.0.0-rc5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 178231 | ||
Attachments: |
dell-XPS-13-9380_mem.html
otcpl-dell-9370-kblr_mem.html otcpl-dell-p3520_mem.html dell-precision-3520-acpidump.out dell-xps-9370-acpidump.out dell-xps-9380-acpidump.out otcpl-dell-9370-kblr_bootdmesg.log otcpl-dell-p3520_bootdmesg.log otcpl-dell-xps-13-9380_bootdmesg.log issue.def issue.def |
Description
Todd Brandt
2019-02-14 23:51:20 UTC
Created attachment 281149 [details]
dell-XPS-13-9380_mem.html
Created attachment 281151 [details]
otcpl-dell-9370-kblr_mem.html
Created attachment 281153 [details]
otcpl-dell-p3520_mem.html
The occurrence rate of this issue is as follows: XPS 9380 : 12.7% Precision 3520 : 7.5% XPS 9370 : 3.5% Created attachment 281223 [details]
dell-precision-3520-acpidump.out
Created attachment 281225 [details]
dell-xps-9370-acpidump.out
Created attachment 281227 [details]
dell-xps-9380-acpidump.out
P.S. the dmesg logs are embedded in the html timelines attached. Open them in a browser and click the "dmesg" button in the upper right hand corner. Created attachment 281241 [details]
otcpl-dell-9370-kblr_bootdmesg.log
Created attachment 281243 [details]
otcpl-dell-p3520_bootdmesg.log
Created attachment 281245 [details]
otcpl-dell-xps-13-9380_bootdmesg.log
For this particular error, please try to go into BIOS setup and turn off C-states and repeat this test run. See if it happens again. Linux, via the intel_idle driver, ignores ACPI BIOS C-states, and instead invokes them directly with known MWAIT hints. So if we "turn off C-states" in BIOS SETUP, we would not expect that to have any impact on how Linux invokes C-states. What is it that we are trying to do? If the answer is to "not invoke C-states", we have full control over that from within Linux -- just need to know which states to disable. (indeed, we can disable all the way down to C1 and poll in idle instead) I tried disabling cstate switching from the kernel command line: processor.max_cstate=0 intel_idle.max_cstate=0 No change. The problem is that we are handling EC events/read/write after EC stopped. So this is a regression? say, we don't have the problem with 5.0-rc3 and earlier? Created attachment 282371 [details]
issue.def
Created attachment 282401 [details]
issue.def
We are seeing this on over 10% of suspends on the Dell 9380, and we are seeing also as much on the Dell p3520 and Dell 9370: 5.3.0-rc3+ otcpl-dell-9380-cfl mem-x2157 245 11.36% html 5.3.0-rc3+ otcpl-dell-p3520 mem-x2381 178 7.48% html 5.3.0-rc3+ otcpl-dell-9370-kblr mem-x611 43 7.04% html i have a similar issue this is what i found in the log file Apr 14 15:51:57 sqar-7642 NetworkManager[1111]: <info> [1586904716.9747] dhcp4 (wlp3s0): state changed bound -> bound Apr 14 18:32:21 sqar-7642 kernel: [894764.840250] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20170831/evregion-300) Apr 14 18:32:21 sqar-7642 kernel: [894764.840273] No Local Variables are initialized for Method [ECW1] Apr 14 18:32:21 sqar-7642 kernel: [894764.840274] Initialized Arguments for Method [ECW1]: (2 arguments defined for method invocation) Apr 14 18:32:21 sqar-7642 kernel: [894764.840275] Arg0: 000000000db5b4fd <Obj> Integer 0000000000000033 Apr 14 18:32:21 sqar-7642 kernel: [894764.840279] Arg1: 0000000033909f7b <Obj> Integer 0000000000000003 Apr 14 18:32:21 sqar-7642 kernel: [894764.840284] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_TIME (20170831/psparse-550) Apr 14 18:32:21 sqar-7642 kernel: [894764.840291] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.KDRT, AE_TIME (20170831/psparse-550) Apr 14 18:32:21 sqar-7642 kernel: [894764.840295] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.NGFF._TMP, AE_TIME (20170831/psparse-550) Apr 14 18:32:23 sqar-7642 kernel: [894767.412237] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] (20170831/evregion-300) Apr 14 18:32:23 sqar-7642 kernel: [894767.412254] No Local Variables are initialized for Method [ECW1] Apr 14 18:32:23 sqar-7642 kernel: [894767.412255] Initialized Arguments for Method [ECW1]: (2 arguments defined for method invocation) Apr 14 18:32:23 sqar-7642 kernel: [894767.412256] Arg0: 00000000e0cd19f7 <Obj> Integer 0000000000000033 Apr 14 18:32:23 sqar-7642 kernel: [894767.412260] Arg1: 000000002cbf10dc <Obj> Integer 0000000000000001 Apr 14 18:32:23 sqar-7642 kernel: [894767.412265] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.ECW1, AE_TIME (20170831/psparse-550) Apr 14 18:32:23 sqar-7642 kernel: [894767.412271] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.KDRT, AE_TIME (20170831/psparse-550) Apr 14 18:32:23 sqar-7642 kernel: [894767.412274] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.ECDV.TSKN._TMP, AE_TIME (20170831/psparse-550) \00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] microcode: microcode updated early to revision 0xca, date = 2019-09-26 Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] Linux version 4.15.0-96-generic (buildd@lgw01-amd64-007) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)) #97~16.04.1-Ubuntu SMP Wed Apr 1 03:03:31 UTC 2020 (Ubuntu 4.15.0-96.97~16.04.1-generic 4.15.18) Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-96-generic root=UUID=33c159b3-cdef-4ad4-9aef-09c65915144c ro quiet splash vt.handoff=7 Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] KERNEL supported cpus: Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] Intel GenuineIntel Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] AMD AuthenticAMD Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] Centaur CentaurHauls Apr 15 07:38:37 sqar-7642 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' This bug happens about 6% of the time on my Dell 9365, and has done so for all kernels tested: 5.7.0-rc3+ lenb-Dell-XPS-13-9365 mem-x2450 156 6.37% html 5.7.0-rc2+ lenb-Dell-XPS-13-9365 mem-x2042 140 6.86% html 5.6.0+ lenb-Dell-XPS-13-9365 mem-x1015 65 6.40% html 5.6.0+ lenb-Dell-XPS-13-9365 mem-x2040 117 5.74% html 5.5.0+ lenb-Dell-XPS-13-9365 mem-x995 63 6.33% html 5.4.0+ lenb-Dell-XPS-13-9365 mem-x10 1 10.00% html 5.4.0+ lenb-Dell-XPS-13-9365 mem-x212 14 6.60% html 5.4.0+ lenb-Dell-XPS-13-9365 mem-x987 58 5.88% html 5.3.0-45-generic lenb-Dell-XPS-13-9365 mem-x977 54 5.53% html Dell 9365 results above were not with very latest Dell BIOS: BIOS : 2.9.0 BIOSDATE : 07/24/2019 Updating today to BIOS 2.10.0, and will re-test. Dell's release notes do not suggest this problem will go away, but always good to test latest BIOS for a BIOS-related issue. Confirmed with a 3-hour stress test, that the Dell 9365 with latest BIOS (2.10.0) still has these issues on about 7% of ACPI S3's: 5.7.0-rc4+ lenb-Dell-XPS-13-9365 mem-x582 43 7.39% Seeing something similar on a XPS 13 9310 2-in-1 (Bios 2.6.0, Linux 5.16.7) during suspend: [67202.879700] PM: suspend entry (s2idle) [67202.902166] Filesystems sync: 0.022 seconds [67203.024292] Freezing user space processes ... (elapsed 0.002 seconds) done. [67203.027156] OOM killer disabled. [67203.027158] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [67203.028401] printk: Suspending console(s) (use no_console_suspend to debug) [67203.355481] ACPI: EC: interrupt blocked [71671.346603] ACPI: EC: interrupt unblocked [71672.212171] ACPI Error: Thread 983940608 cannot release Mutex [ECMX] acquired by thread 3944857088 (20210730/exmutex-378) [71672.212185] No Local Variables are initialized for Method [_Q66] [71672.212187] No Arguments are initialized for method [_Q66] [71672.212189] ACPI Error: Aborting method \_SB.PC00.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20210730/psparse-529) Should I file a separate bug for this one? Edit: typo, meant Linux 5.15.7 Same issue on Dell Precision 7560, Ubuntu 22.04.2, Kernel 5.19.0-10.generic . dell xps 7390 2 in 1, arch, latest BIOS Linux arazu 6.2.12-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 20 Apr 2023 16:11:27 +0000 x86_64 GNU/Linux ``` [ +0.008768] ACPI Error: Thread 20096704 cannot release Mutex [ECMX] acquired by thread 18972672 (20221020/exmutex-37> [ +0.000009] ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/> [ +0.490374] PM: suspend entry (deep) [ +0.009559] Filesystems sync: 0.009 seconds ``` |