Bug 200011

Summary: BISECTED 5a8361f7ecce: /proc/acpi/wakeup LID0 doesn't work - MacBookAir6,2
Product: ACPI Reporter: Paulo Nascimento (paulo.ulusu)
Component: Power-LidAssignee: Zhang Rui (rui.zhang)
Status: RESOLVED DUPLICATE    
Severity: normal CC: a3at.mail, adam, adrien-xx-kernel-bz, andrey+kernel, archlinux, lenb, muellerjona, paulo.ulusu, rui.zhang, vlasov, yu.c.chen
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.17.0-1 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg | grep ACPI
acpidump.out
dmesg output kernel 4.14 LID0
dmesg output kernel 4.17 power_button
dmesg_output_kernel_4.19rc7_macbookair62.txt
Dmesg output
dmesg without my patch
dmesg with my patch
Add operation region addresses to the global address list
dmesg with es-add-address patch
Stop clearing GPE's when
Bob's _REG patch
dmesg with both patches
acpidump
Load table at the same time regardless of module-level code option
custom dsdt to return EACP for AC status
dmesg with patch comment 52
dmesg 4.20_rc1
acpi.debug_level=0x420 acpi.debug_layer=0x05 on good kernel
acpi.debug_level=0x420 acpi.debug_layer=0x05 on bad kernel
debug patch to force re-install EC address space handler

Description Paulo Nascimento 2018-06-09 17:53:57 UTC
After updating to kernel 4.17.0-1 LID0 doesn’t wake up laptop after suspend. In order to wake up the laptop I have to type any key or use the mouse.
After going back to kernel 4.14.48-1 the laptop wakes up when opening the lid.
Comment 1 Chen Yu 2018-06-11 02:32:25 UTC
Could you provide cat /proc/acpi/wakeup and check if LID0 is enabled or not please?
Comment 2 Zhang Rui 2018-06-12 03:30:19 UTC
and please attach the acpidump output
Comment 3 Paulo Nascimento 2018-06-17 21:38:39 UTC
[pmmn@pn-macbook ~]$ inxi -F
System:
  Host: pn-macbook Kernel: 4.17.0-2-MANJARO x86_64 bits: 64 
  Desktop: KDE Plasma 5.12.5 
  Distro: Manjaro Linux 17.1.10 Hakoila 
Machine:
  Type: Laptop System: Apple product: MacBookPro9,2 v: 1.0 
  serial: <root required> 
  Mobo: Apple model: Mac-6F01561E16C75D06 v: MacBookPro9,2 
  serial: <root required> UEFI: Apple 
  v: MBP91.88Z.00D3.B0C.1509111653 date: 09/11/2015 
CPU:
  Topology: Dual Core model: Intel Core i5-3210M bits: 64 
  type: MT MCP L2 cache: 3072 KiB 
  Speed: 2789 MHz min/max: 1200/3100 MHz Core speeds (MHz): 
  1: 2789 2: 1645 3: 2409 4: 2431 
Graphics:
  Card-1: Intel 3rd Gen Core processor Graphics driver: i915 
  v: kernel 
  Display: x11 server: X.Org 1.19.6 driver: intel 
  unloaded: fbdev,modesetting,vesa resolution: 1280x800~60Hz 
  OpenGL: renderer: Mesa DRI Intel Ivybridge Mobile 
  v: 4.2 Mesa 18.1.1 
Audio:
  Card-1: Intel 7 Series/C216 Family High Definition Audio 
  driver: snd_hda_intel 
  Sound Server: ALSA v: k4.17.0-2-MANJARO 
Network:
  Card-1: Broadcom Limited NetXtreme BCM57765 Gigabit Ethernet 
  PCIe 
  driver: tg3 
  IF: enp1s0f0 state: down mac: 68:5b:35:bc:93:7b 
  Card-2: Broadcom Limited BCM4331 802.11a/b/g/n 
  driver: bcma-pci-bridge 
  IF-ID-1: wlp2s0b1 state: up mac: 60:03:08:ed:ae:c0 
Drives:
  HDD Total Size: 698.65 GiB used: 94.16 GiB (13.5%) 
  ID-1: /dev/sda vendor: ^APPLE model: APPLE HDD HTS545050A7E362 
  size: 465.76 GiB 
  ID-2: /dev/sdb vendor: Seagate model: ST9250315AS 
  size: 232.89 GiB 
Partition:
  ID-1: / size: 26.17 GiB used: 8.52 GiB (32.5%) fs: ext4 
  dev: /dev/sdb6 
  ID-2: /home size: 151.01 GiB used: 85.63 GiB (56.7%) fs: ext4 
  dev: /dev/sdb5 
  ID-3: swap-1 size: 4.42 GiB used: 0 KiB (0.0%) fs: swap 
  dev: /dev/sdb4 
Sensors:
  System Temperatures: cpu: 52.0 C mobo: N/A 
  Fan Speeds (RPM): N/A 
Info:
  Processes: 290 Uptime: 9m Memory: 7.70 GiB 
  used: 997.6 MiB (12.6%) Shell: bash inxi: 3.0.10
Comment 4 Paulo Nascimento 2018-06-17 21:39:15 UTC
[pmmn@pn-macbook ~]$ cat /proc/acpi/wakeup
Device  S-state   Status   Sysfs node
P0P2      S3    *disabled  pci:0000:00:01.0
PEG1      S3    *disabled
EC        S4    *disabled  platform:PNP0C09:00
GMUX      S3    *disabled  pnp:00:03
HDEF      S3    *disabled  pci:0000:00:1b.0
RP01      S3    *disabled  pci:0000:00:1c.0
GIGE      S3    *disabled  pci:0000:01:00.0
SDXC      S3    *disabled  pci:0000:01:00.1
RP02      S3    *disabled  pci:0000:00:1c.1
ARPT      S3    *disabled  pci:0000:02:00.0
RP03      S3    *disabled  pci:0000:00:1c.2
EHC1      S3    *enabled   pci:0000:00:1d.0
EHC2      S3    *enabled   pci:0000:00:1a.0
XHC1      S3    *enabled   pci:0000:00:14.0
ADP1      S4    *disabled  platform:ACPI0003:00
LID0      S4    *enabled   platform:PNP0C0D:00
Comment 5 Paulo Nascimento 2018-06-17 21:40:29 UTC
[pmmn@pn-macbook ~]$ dmesg | grep ACPI

Attached as file ACPI.info
Comment 6 Paulo Nascimento 2018-06-17 21:42:26 UTC
Created attachment 276627 [details]
dmesg | grep ACPI
Comment 7 Paulo Nascimento 2018-06-17 21:47:28 UTC
I didn't find acpidump in AUR
Comment 8 Zhang Rui 2018-06-21 00:50:29 UTC
you need to install acpidump tool and run "acpidump > acpidump.out" with root privilege.
Comment 9 Paulo Nascimento 2018-06-22 17:52:13 UTC
acpidump > acpidump.out
Comment 10 Paulo Nascimento 2018-06-22 17:52:50 UTC
Created attachment 276747 [details]
acpidump.out
Comment 11 Zhang Rui 2018-06-25 08:39:24 UTC
please attach the dmesg output after resume in both 4.14 kernel that woken up by LID and 4.17 kernel that woken up by power button.
Comment 12 Paulo Nascimento 2018-06-26 18:13:53 UTC
Created attachment 276883 [details]
dmesg output kernel 4.14 LID0
Comment 13 Paulo Nascimento 2018-06-26 18:14:42 UTC
Created attachment 276885 [details]
dmesg output kernel 4.17 power_button
Comment 14 Zhang Rui 2018-06-28 02:44:53 UTC
can you please do bisect to find out which commit introduces the problem?
Comment 15 Andrey Melentyev 2018-07-08 19:27:14 UTC
(In reply to Zhang Rui from comment #14)
> can you please do bisect to find out which commit introduces the problem?

Not the bug reporter but I've got the same issue with a MacBookAir6,2. Resume on lid open stopped working after upgrading from 4.16.13 to 4.17.2.

Bisect says: first bad commit: [5a8361f7ecceaed64b4064000d16cb703462be49] ACPICA: Integrate package handling with module-level code
Comment 16 Jona Müller 2018-07-08 21:08:59 UTC
Also not the bug reporter, but I can confirm that the issue is also introduced in 5a8361f7ecceaed64b4064000d16cb703462be49 on a MacBookPro12,1.
Comment 17 Erik Kaneda 2018-07-26 20:47:43 UTC
Could you test the latest 4.18 rc and post the dmesg if it still does not work?

Thanks
Comment 18 Paulo Nascimento 2018-07-28 21:03:02 UTC
Sorry. But with this heat my Macbook died. Thank you very much for your efforts. You have to ask another Macbook user.

Regards, ulusu
Comment 19 Erik Kaneda 2018-07-30 15:57:24 UTC
(In reply to Paulo Nascimento from comment #18)
> Sorry. But with this heat my Macbook died. Thank you very much for your
> efforts. You have to ask another Macbook user.

Sorry that this happened. Let's leave this one open for the other reporters.

Jona and Andrey, can you try the latest rc? We've been submitting fixes for the module-level code change.
Comment 20 Jona Müller 2018-07-30 19:02:37 UTC
Sorry, I was quite busy, I will build the latest RC today evening and report back. 

(In reply to Erik Schmauss from comment #19)
> (In reply to Paulo Nascimento from comment #18)
> > Sorry. But with this heat my Macbook died. Thank you very much for your
> > efforts. You have to ask another Macbook user.
> 
> Sorry that this happened. Let's leave this one open for the other reporters.
> 
> Jona and Andrey, can you try the latest rc? We've been submitting fixes for
> the module-level code change.
Comment 21 Andrey Melentyev 2018-07-30 20:56:27 UTC
Created attachment 277621 [details]
dmesg_output_kernel_4.19rc7_macbookair62.txt

$ dmesg > dmesg_output_kernel_4.19rc7_macbookair62.txt
Comment 22 Andrey Melentyev 2018-07-30 20:57:56 UTC
Sorry about the delay, I've tried 4.19-rc7 and for me the issue still persists (Macbook Air 6,2).

Please find dmesg output attached.
Comment 23 Jona Müller 2018-07-30 21:16:10 UTC
I have built d72e90f33aa4709ebecc5005562f52335e106a60 (tag v4.18-rc6) a few minutes ago and the issue still persists on my MacbookPro12,1. My ACPI wakeup configuration is 

> muellerjona@MacbookPro:~$ cat /proc/acpi/wakeup 
> Device        S-state   Status   Sysfs node
> PEG0    S3    *disabled
> EC      S4    *disabled  platform:PNP0C09:00
> HDEF    S3    *disabled  pci:0000:00:1b.0
> RP01    S3    *disabled  pci:0000:00:1c.0
> RP02    S3    *disabled  pci:0000:00:1c.1
> RP03    S3    *disabled  pci:0000:00:1c.2
> ARPT    S4    *enabled   pci:0000:03:00.0
> RP05    S3    *disabled  pci:0000:00:1c.4
> RP06    S3    *disabled  pci:0000:00:1c.5
> SPIT    S3    *disabled
> XHC1    S3    *disabled  pci:0000:00:14.0
> ADP1    S4    *disabled  platform:ACPI0003:00
> LID0    S4    *enabled   platform:PNP0C0D:00

Find my dmesg output attached.
Comment 24 Jona Müller 2018-07-30 21:17:20 UTC
Created attachment 277623 [details]
Dmesg output

$ sudo dmesg > dmesg-4.18.0-rc6.txt
Comment 25 Erik Kaneda 2018-07-31 16:48:49 UTC
Hi,

I'm not a LID expert but the dmesg outputs that both of you sent does not have an error like the following in comment 12:

[  172.092019] ACPI: EC: event unblocked
...
[  172.099293] ACPI Error: [\_SB_.PCI0.P0P2.GFX0] Namespace lookup failure, AE_NOT_FOUND (20170728/psargs-364)
[  172.099311] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.EC._Q80, AE_NOT_FOUND (20170728/psparse-550)

This error indicates that the EC tried to query a non-existent "device" called GFX0. The only difference that I see in the execution of AML is the above. I know that we thinkg that pointing to the module-level code change but I am curious about this missing error message. Rui, do you know why newer versions of the kernel do not have the above error message? Was there something significant that changed in GPE processing?
Comment 26 Erik Kaneda 2018-07-31 17:06:21 UTC
Rui, another thought: we need to record dmesg output when the lid opens/system wakes up. This will allow us to understand what's going on in the AML. Do you know of any debugging options to dump dmesg contents as the lid is opening and the system is waking?
Comment 28 Zhang Rui 2018-08-30 02:15:05 UTC
(In reply to Erik Schmauss from comment #25)
> Hi,
> 
> I'm not a LID expert but the dmesg outputs that both of you sent does not
> have an error like the following in comment 12:
> 
> [  172.092019] ACPI: EC: event unblocked
> ...
> [  172.099293] ACPI Error: [\_SB_.PCI0.P0P2.GFX0] Namespace lookup failure,
> AE_NOT_FOUND (20170728/psargs-364)
> [  172.099311] ACPI Error: Method parse/execution failed
> \_SB.PCI0.LPCB.EC._Q80, AE_NOT_FOUND (20170728/psparse-550)
> 
> This error indicates that the EC tried to query a non-existent "device"
> called GFX0. The only difference that I see in the execution of AML is the
> above. I know that we thinkg that pointing to the module-level code change
> but I am curious about this missing error message. Rui, do you know why
> newer versions of the kernel do not have the above error message? Was there
> something significant that changed in GPE processing?

first of all, GFX0 is not defined, thus we should get this error whenever _Q80 is evaluated. so, I guess this is because _Q80 is not invoked in new kernel at all.

here is the output from 4.14 kernel.
[  172.092019] ACPI: EC: event unblocked
...
[  172.098258] pciehp 0000:05:03.0:pcie204: Timeout on hotplug command 0x1038 (issued 171423 msec ago)
[  172.099293] ACPI Error: [\_SB_.PCI0.P0P2.GFX0] Namespace lookup failure, AE_NOT_FOUND (20170728/psargs-364)
[  172.099311] ACPI Error: Method parse/execution failed \_SB.PCI0.LPCB.EC._Q80, AE_NOT_FOUND (20170728/psparse-550)
[  172.287432] OOM killer enabled.
[  172.287433] Restarting tasks ... done.

and this is the output from 4.17 kernel
[  126.041792] pcieport 0000:05:06.0: quirk_apple_wait_for_thunderbolt+0x0/0xe0 took 38188 usecs
[  126.042372] ACPI: EC: event unblocked
[  126.048653] sd 0:0:0:0: [sda] Starting disk
[  126.048689] sd 1:0:0:0: [sdb] Starting disk
[  126.237365] OOM killer enabled.
[  126.237367] Restarting tasks ... done.

We can see that EC event is unblocked earlier in 4.14 kernel, which results in _Q80 evaluated.
So I think this error message is caused by the EC code changes, but I'm not sure if it is related with the lid problem or not.
Comment 29 Zhang Rui 2018-08-30 02:19:25 UTC
(In reply to Erik Schmauss from comment #26)
> Rui, another thought: we need to record dmesg output when the lid
> opens/system wakes up. This will allow us to understand what's going on in
> the AML. Do you know of any debugging options to dump dmesg contents as the
> lid is opening and the system is waking?

no, here is the process of S3 wakeup
1. user opens lid
2. an SCI/GPE fired, and wakes up the system
3. system starts to run code from the waking vector (FACS)
4. all the GPEs are cleared
5. system is resumed.

thus, for the GPE that wakes up system from S3, its AML/method handler is not invoked at all.
Comment 30 Zhang Rui 2018-08-30 02:21:02 UTC
for this particular issue, I'd suggest to introduce code to dump the GPE register status just before suspend and right after resume. This would help me to know if a specific GPE is enabled as wakeup GPE or not, and which GPE is fired during resume.
I think Yu has a patch for this before, let me find it out.
Comment 31 Andrey Melentyev 2018-09-08 18:47:22 UTC
I have one of the affected by the bug laptops and the bug is in "NEEDINFO". I'm up for installing a patch with debugging output, just point me to it.
Comment 32 Jean-Marc Lenoir 2018-09-30 13:13:41 UTC
I have the same bug on my MacBook Pro 9,2. I also have the battery bug (https://bugzilla.kernel.org/show_bug.cgi?id=200141), but only when I put the kernel parameter acpi_osi=!Darwin.

I am able to solve these two bugs by applying the following patch (I just reverted the part of the faulty commit that causes the bug). But it's probably not a long term solution, because if I understand I have enabled an obsolete configuration which will be removed in the future.

```
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -161,7 +161,7 @@
  * NOTE, this is essentially obsolete and will be removed soon
  * (01/2018).
  */
-ACPI_INIT_GLOBAL(u8, acpi_gbl_group_module_level_code, FALSE);
+ACPI_INIT_GLOBAL(u8, acpi_gbl_group_module_level_code, TRUE);
 
 /*
  * Optionally support module level code by parsing an entire table as
@@ -169,7 +169,7 @@
  * NOTE, this is essentially obsolete and will be removed soon
  * (01/2018).
  */
-ACPI_INIT_GLOBAL(u8, acpi_gbl_execute_tables_as_methods, TRUE);
+ACPI_INIT_GLOBAL(u8, acpi_gbl_execute_tables_as_methods, FALSE);
 
 /*
  * Optionally use 32-bit FADT addresses if and when there is a conflict
```
Comment 33 Erik Kaneda 2018-10-02 19:34:41 UTC
(In reply to Jean-Marc Lenoir from comment #32)
> I have the same bug on my MacBook Pro 9,2. I also have the battery bug
> (https://bugzilla.kernel.org/show_bug.cgi?id=200141), but only when I put
> the kernel parameter acpi_osi=!Darwin.
> 
> I am able to solve these two bugs by applying the following patch (I just
> reverted the part of the faulty commit that causes the bug). But it's
> probably not a long term solution, because if I understand I have enabled an
> obsolete configuration which will be removed in the future.

Thanks for this information and yes, we are going to get rid of this sometime next year. We should get to the bottom of this before we do. Could you post your dmesg with and without the changes below?

> 
> ```
> --- a/include/acpi/acpixf.h
> +++ b/include/acpi/acpixf.h
> @@ -161,7 +161,7 @@
>   * NOTE, this is essentially obsolete and will be removed soon
>   * (01/2018).
>   */
> -ACPI_INIT_GLOBAL(u8, acpi_gbl_group_module_level_code, FALSE);
> +ACPI_INIT_GLOBAL(u8, acpi_gbl_group_module_level_code, TRUE);
>  
>  /*
>   * Optionally support module level code by parsing an entire table as
> @@ -169,7 +169,7 @@
>   * NOTE, this is essentially obsolete and will be removed soon
>   * (01/2018).
>   */
> -ACPI_INIT_GLOBAL(u8, acpi_gbl_execute_tables_as_methods, TRUE);
> +ACPI_INIT_GLOBAL(u8, acpi_gbl_execute_tables_as_methods, FALSE);
>  
>  /*
>   * Optionally use 32-bit FADT addresses if and when there is a conflict
> ```
Comment 34 Jean-Marc Lenoir 2018-10-03 19:03:27 UTC
Created attachment 278909 [details]
dmesg without my patch

I don't know if it is useful for you, but I've suspended by closing lid and resumed (manually) my laptop just before saving dmesg.
Comment 35 Jean-Marc Lenoir 2018-10-03 19:06:55 UTC
Created attachment 278911 [details]
dmesg with my patch

I've also suspended (by closing lid) and resumed (by opening lid) my laptop before saving dmesg.
Comment 36 Erik Kaneda 2018-10-04 17:39:29 UTC
Thanks for doing this. I think I found the problem.

The region initialization isn't adding the address range properly. This causes the warning below to NOT appear which might be causing your issue. I'll work on a fix and post a patch (hopefully) in a few hours.

ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x0000000000000400-0x000000000000047F (\PMIO) (20180531/utaddress-213)
ACPI: If an ACPI driver is available for this device, you should use it
Comment 37 Erik Kaneda 2018-10-04 20:42:08 UTC
Created attachment 278923 [details]
Add operation region addresses to the global address list

Please give this a try
Comment 38 Jean-Marc Lenoir 2018-10-04 22:01:25 UTC
Created attachment 278925 [details]
dmesg with es-add-address patch

Now I can see the ACPI warning, but unfortunately it is not sufficient to solve the bug.

I'm posting my dmesg with your patch.
Comment 39 Erik Kaneda 2018-10-05 19:10:26 UTC
(In reply to Jean-Marc Lenoir from comment #38)
> Created attachment 278925 [details]
> dmesg with es-add-address patch
> 
> Now I can see the ACPI warning, but unfortunately it is not sufficient to
> solve the bug.

Thanks for testing, I think we'll include something like this in upstream anyway.

> 
> I'm posting my dmesg with your patch.
Comment 40 Erik Kaneda 2018-10-05 19:13:39 UTC
Created attachment 278935 [details]
Stop clearing GPE's when

Does your kernel version have this patch? If so, could to try this patch in addition to the one I posted?
Comment 41 Jean-Marc Lenoir 2018-10-05 19:27:43 UTC
Yes, I already have this patch in my kernel.
Comment 42 Erik Kaneda 2018-10-05 21:49:40 UTC
Ok. What happens when you remove the acpi_osi=!Darwin from the boot parameters with the patch from comment 37?
Comment 43 Jean-Marc Lenoir 2018-10-06 06:28:31 UTC
Exactly the same thing than without the patch: the LID0 bug is still present, but the battery bug (reported as absent) is fixed.

For information, I've just tested on Linux 4.19-rc6 and there is exactly the same result than my current kernel.
Comment 44 Erik Kaneda 2018-10-08 15:41:07 UTC
(In reply to Jean-Marc Lenoir from comment #43)
> Exactly the same thing than without the patch: the LID0 bug is still
> present, but the battery bug (reported as absent) is fixed.

How about the message regarding Ac Adapter (ADP1)? Is ADP1 on-line?

> 
> For information, I've just tested on Linux 4.19-rc6 and there is exactly the
> same result than my current kernel.
Comment 45 Erik Kaneda 2018-10-08 16:24:17 UTC
Created attachment 278959 [details]
Bob's _REG patch

Could you try this patch on top of the patch from comment 37 without any OSI boot parameters?
Comment 46 Erik Kaneda 2018-10-08 17:39:29 UTC
@Jean-Marc, can you post your acpidump? I think you might have different firmware than the original poster...
Comment 47 Jean-Marc Lenoir 2018-10-08 19:22:28 UTC
> How about the message regarding Ac Adapter (ADP1)? Is ADP1 on-line?

I assume ADP1 is on-line if /sys/class/power_supply/ADP1/online contains 1. It's interesting, because when I boot with the adapter connected, this "file" contains 0. If I unplug and re-plug the adapter , it becomes 1.

If I use the kernel with my patch to revert the faulty commit, /sys/class/power_supply/ADP1/online contains 1 without the need to unplug and re-plug the adapter.
Comment 48 Jean-Marc Lenoir 2018-10-08 19:28:44 UTC
Created attachment 278961 [details]
dmesg with both patches

With both patches, it doesn't work better.

I'm posting my dmesg with these patches and without OSI boot parameters.
Comment 49 Jean-Marc Lenoir 2018-10-08 19:41:28 UTC
Created attachment 278963 [details]
acpidump

My acpidump, taken from the kernel with both patches and without OSI boot parameters (I don't know if it matters).
Comment 50 Erik Kaneda 2018-10-08 20:33:25 UTC
(In reply to Jean-Marc Lenoir from comment #47)
> > How about the message regarding Ac Adapter (ADP1)? Is ADP1 on-line?
> 
> I assume ADP1 is on-line if /sys/class/power_supply/ADP1/online contains 1.
> It's interesting, because when I boot with the adapter connected, this
> "file" contains 0. If I unplug and re-plug the adapter , it becomes 1.
> 
> If I use the kernel with my patch to revert the faulty commit,
> /sys/class/power_supply/ADP1/online contains 1 without the need to unplug
> and re-plug the adapter.

Ok. So this might be the next step. Your dmesg from comment 48 contains the line 

[    7.763316] ACPI: AC Adapter [ADP1] (off-line)
[    7.779700] apple_gmux: gmux device not present

but your dmesg with your patch contains

[   10.219707] ACPI: AC Adapter [ADP1] (on-line)
[   10.252324] apple_gmux: gmux device not present

So the next step would be to get these two messages to display the same. I'll take a look at this.

Thanks for your acpidump. It matches the other acpidump files posted to this bugzilla.
Comment 51 Erik Kaneda 2018-10-08 22:28:17 UTC
I'll get a macbook pro tomorrow so I will (hopefully) be able to debug from my end. Stay tuned...
Comment 52 Erik Kaneda 2018-10-09 22:35:58 UTC
Created attachment 278975 [details]
Load table at the same time regardless of module-level code option

Try this one. I found that the table load occurs at a different place in the code depending on the acpi_gbl_execute_tables_as_methods variable. This could be affecting the EC (which is tied to AC adapter and LID0)...
Comment 53 Zhang Rui 2018-10-10 02:54:03 UTC
            Method (_PSR, 0, NotSerialized)  // _PSR: Power Source
            {
                Return (PWRS) /* \PWRS */
            }

so it is PWRS returns different values, causing AC Adapter to be online vs. offline. PWRS is set in three places
1. EC _REG method
2. EC _Q21 method
3. _WAK method
and PWRS should always equal \_SB_.PCI0.LPCB.EC__.EACP according to the ASL.

So there are two possibilities that PWRS could return different value,
the first one is that OS behaves differently to evaluate _REG/_Q21/_WAK during boot, the second one is that EACP has different values.

I'd suggest we verify the second one first, I will make a custom dsdt, which makes AC _PSR always return \_SB_.PCI0.LPCB.EC__.EACP directly.


BTW, I also don't see why module level code matters here. @Erik, is there anyway that I can filter out the non-module-level-code from the dsdt.dsl, or else we don't know what module level code does on this platform.
Comment 54 Zhang Rui 2018-10-10 02:57:38 UTC
Created attachment 278977 [details]
custom dsdt to return EACP for AC status

please follow step 5,6,7 in the below page to override the dsdt and check if the problem still exists.
https://01.org/linux-acpi/documentation/overriding-dsdt
Comment 55 Erik Kaneda 2018-10-10 16:25:12 UTC
(In reply to Zhang Rui from comment #53)
>             Method (_PSR, 0, NotSerialized)  // _PSR: Power Source
>             {
>                 Return (PWRS) /* \PWRS */
>             }
> 
> so it is PWRS returns different values, causing AC Adapter to be online vs.
> offline. PWRS is set in three places
> 1. EC _REG method
> 2. EC _Q21 method
> 3. _WAK method
> and PWRS should always equal \_SB_.PCI0.LPCB.EC__.EACP according to the ASL.
> 
> So there are two possibilities that PWRS could return different value,
> the first one is that OS behaves differently to evaluate _REG/_Q21/_WAK
> during boot, the second one is that EACP has different values.
> 
> I'd suggest we verify the second one first, I will make a custom dsdt, which
> makes AC _PSR always return \_SB_.PCI0.LPCB.EC__.EACP directly.
> 
> 

Hi Rui,

> BTW, I also don't see why module level code matters here.

The reporter says he can fix the LID0 issue by reverting to the old module-level code or to flip the flags stated in comment 32. One thing that is different between old module-level code support and new module-level code is the load time of the ECDT so comment 52 alters the ECDT load order to test if there are differences.

> @Erik, is there
> anyway that I can filter out the non-module-level-code from the dsdt.dsl, or
> else we don't know what module level code does on this platform.

We don't currently have a way but I think I can write something. This would be valuable to have in the future anyway. I've glanced at the reporter's acpi tables and I didn't see any module-level code...
Comment 56 Jean-Marc Lenoir 2018-10-11 23:26:55 UTC
Great! The patch comment 52 fixes the two bugs (LID0 and battery status) for me, with and without the acpi_osi=!Darwin parameter.
Comment 57 Erik Kaneda 2018-10-12 00:00:13 UTC
(In reply to Jean-Marc Lenoir from comment #56)
> Great! The patch comment 52 fixes the two bugs (LID0 and battery status) for
> me, with and without the acpi_osi=!Darwin parameter.

Just for reference, could you post the dmesg with this patch?
Comment 58 Jean-Marc Lenoir 2018-10-12 05:49:32 UTC
Created attachment 278999 [details]
dmesg with patch comment 52
Comment 59 Erik Kaneda 2018-10-12 17:56:51 UTC
As I suspected, comment 52 patch is the wrong order of loading the ACPI tables and ECDT. It goes directly against the spec but somehow it works... It's likely that this apple platform may not be spec compliant. Unfortunately, this patch cannot go upstream because it goes against the spec (ECDT should be loaded before other ACPI tables).. However, I gave you credit on testing the patch in comment 37 and that is going upstream.

From what you say, comment 52 fixes your issue so I am tempted to close this bug. However, Rui's suggestion in comment 54 (in addition to comment 37 and removing the osi flag from the boot parameters) might be a better approach. I'll let you decide if you want to take this approach. If you're fine with the patch in comment 52, let us know and I'll close the bug.

Thanks for your effort in testing. I think it was a productive bug report.
Comment 60 Jean-Marc Lenoir 2018-10-14 19:46:16 UTC
Ok. I read some informations about Rui's approach this week-end and I think it's a more elegant solution. In addition, with my distribution I am able to change the DSDT without recompiling the kernel (using these instructions: https://wiki.archlinux.org/index.php/DSDT#Using_a_CPIO_archive). So I would like to use this approach.

I tried to compile the kernel including the DSDT.hex file but it failed with the error:
```
drivers/acpi/tables.c: In function 'acpi_os_table_override':
drivers/acpi/tables.c:727:44: error: 'AmlCode' undeclared (first use in this function)
   *new_table = (struct acpi_table_header *)AmlCode;
                                            ^~~~~~~
```
Do you have any idea what is wrong? Or, @Zhang Rui, can you send me the corresponding dsdt.dsl file? I will include it with my distribution method, without recompiling the kernel.
Comment 61 Erik Kaneda 2018-10-15 16:33:36 UTC
(In reply to Jean-Marc Lenoir from comment #60)
> Ok. I read some informations about Rui's approach this week-end and I think
> it's a more elegant solution. In addition, with my distribution I am able to
> change the DSDT without recompiling the kernel (using these instructions:
> https://wiki.archlinux.org/index.php/DSDT#Using_a_CPIO_archive). So I would
> like to use this approach.
> 
> I tried to compile the kernel including the DSDT.hex file but it failed with
> the error:
> ```
> drivers/acpi/tables.c: In function 'acpi_os_table_override':
> drivers/acpi/tables.c:727:44: error: 'AmlCode' undeclared (first use in this
> function)
>    *new_table = (struct acpi_table_header *)AmlCode;
>                                             ^~~~~~~
> ```

I think yuo need to look in the DSDT.hex and change the name of the array to "AmlCode".
> Do you have any idea what is wrong? Or, @Zhang Rui, can you send me the
> corresponding dsdt.dsl file? I will include it with my distribution method,
> without recompiling the kernel.
Comment 62 Jean-Marc Lenoir 2018-10-15 21:10:26 UTC
Thanks. Now, /sys/class/power_supply/ADP1/online equals 1 just after reboot, without the need to unplug/re-plug the adapter. But it doesn't fix my battery and LID bugs.
Comment 63 Erik Kaneda 2018-10-17 23:44:33 UTC
(In reply to Jean-Marc Lenoir from comment #62)
> Thanks. Now, /sys/class/power_supply/ADP1/online equals 1 just after reboot,
> without the need to unplug/re-plug the adapter. But it doesn't fix my
> battery and LID bugs.

Ok, I think this means that you might want to go with the patch then...
Comment 64 Erik Kaneda 2018-11-05 16:20:49 UTC
After more investigation, we found errors that are similar to this laptop lid issue so this may not be a mac-specific issue and it is possible that it's an issue with the EC driver.

Hi Jean-Marc, could you try 4.20-rc1?

@Rui, I think this may be an issue with the EC similar to https://bugzilla.kernel.org/show_bug.cgi?id=201351 but I'm not sure if it's the same issue. I've assigned the bug for you to look at. We can't do anything about this bug from ACPICA perspective so the fix may have to come from the EC driver...
Comment 65 Jean-Marc Lenoir 2018-11-05 20:50:01 UTC
Created attachment 279329 [details]
dmesg 4.20_rc1

Linux 4.20-rc1 doesn't fix my issues (lid and battery) but now I can see the warning line in comment 36.

I join my new dmesg.
Comment 66 Zhang Rui 2018-11-27 04:44:10 UTC
So, first of all, please make sure your kernel is built with CONFIG_ACPI_DEBUG=Y, and then boot with kernel parameter "acpi.debug_level=0x420 acpi.debug_layer=0x05", and attach the full dmesg output after boot, for both good and bad kernels.
Comment 67 Zhang Rui 2018-11-27 04:51:02 UTC
(In reply to Zhang Rui from comment #53)
>             Method (_PSR, 0, NotSerialized)  // _PSR: Power Source
>             {
>                 Return (PWRS) /* \PWRS */
>             }
> 
> so it is PWRS returns different values, causing AC Adapter to be online vs.
> offline. PWRS is set in three places
> 1. EC _REG method
> 2. EC _Q21 method
> 3. _WAK method
> and PWRS should always equal \_SB_.PCI0.LPCB.EC__.EACP according to the ASL.
> 
> So there are two possibilities that PWRS could return different value,
> the first one is that OS behaves differently to evaluate _REG/_Q21/_WAK
> during boot, the second one is that EACP has different values.
> 
> I'd suggest we verify the second one first, I will make a custom dsdt, which
> makes AC _PSR always return \_SB_.PCI0.LPCB.EC__.EACP directly.
> 
> 
(In reply to Jean-Marc Lenoir from comment #62)
> Thanks. Now, /sys/class/power_supply/ADP1/online equals 1 just after reboot,
> without the need to unplug/re-plug the adapter. But it doesn't fix my
> battery and LID bugs.

So it means that EACP always returns the correct value.
So we are either missing _REG evaluation, or missing the EC _Q21 event.
Comments #66 will show when the EC _REG will be evaluated, and please do attach the test result.
Comment 68 Jean-Marc Lenoir 2018-11-27 20:59:39 UTC
Created attachment 279681 [details]
acpi.debug_level=0x420 acpi.debug_layer=0x05 on good kernel
Comment 69 Jean-Marc Lenoir 2018-11-27 21:00:30 UTC
Created attachment 279683 [details]
acpi.debug_level=0x420 acpi.debug_layer=0x05 on bad kernel
Comment 70 Zhang Rui 2018-11-28 06:50:21 UTC
for good kernel
[    0.546501] ACPI: EC: EC started
[    0.546501] ACPI: EC: interrupt blocked
[    0.546906]     Running _REG methods for SpaceId EmbeddedControl
[    0.547078] Executing  Method       \_SB.PCI0.LPCB.EC._REG
[    0.547115] Initializing Region       \GNVS
[    0.547131] utaddress-0072 ut_add_address_range  : 
               Added [GNVS] address range: 0x000000008AD40C10-0x000000008AD40D96
[    0.547167] Initializing Region       \_SB.PCI0.LPCB.EC.ECOR
[    0.547528] Initializing Region       \_SB.PCI0.IGPU.IGDM
[    0.547549] utaddress-0072 ut_add_address_range  : 
               Added [IGDM] address range: 0x000000008AD15190-0x000000008AD1718F
[    0.547973]     Executed 1 _REG methods for SpaceId EmbeddedControl
[    0.547976] ACPI: \: Used as first EC
[    0.547977] ACPI: \: GPE=0x17, EC_CMD/EC_SC=0x66, EC_DATA=0x62
[    0.547978] ACPI: \: Used as boot ECDT EC to handle transactions
and for bad kernel
[    0.540038] ACPI: EC: EC started
[    0.540038] ACPI: EC: interrupt blocked
[    0.540038]     Running _REG methods for SpaceId EmbeddedControl
[    0.540038]     Executed 0 _REG methods for SpaceId EmbeddedControl
[    0.540038] ACPI: \: Used as first EC
[    0.540038] ACPI: \: GPE=0x17, EC_CMD/EC_SC=0x66, EC_DATA=0x62
[    0.540038] ACPI: \: Used as boot ECDT EC to handle transactions

So we can see that, in the early stage, previously we can invoke EC _REG because the ACPI tables have been loaded, but now we can not.
Problem is that we should re-install the EC address space handler, thus invokes EC _REG,  in acpi_ec_add(), but I didn't find such information in the bad kernel.

Let me look into the code and generate a debug patch later to see what is going wrong in acpi_ec_add().
Comment 71 Zhang Rui 2018-11-29 01:42:49 UTC
Created attachment 279729 [details]
debug patch to force re-install EC address space handler

I think I have found the problem but I'm not sure how to fix it properly yet.
Please try this debug patch on top of newer kernel (the patch is made based on 4.20-rc2) and confirm the problem is gone.
Comment 72 Jean-Marc Lenoir 2018-11-29 21:58:36 UTC
Great! I can confirm this patch works on both 4.20-rc4 and 4.19.5 kernels.
Comment 73 Zhang Rui 2018-12-19 15:02:02 UTC
please check if the patch at https://bugzilla.kernel.org/show_bug.cgi?id=199981#c34 helps or not
Comment 74 Jean-Marc Lenoir 2018-12-19 19:01:51 UTC
This patch fix the problem on Linux 4.20-rc7.
Comment 75 Zhang Rui 2018-12-27 15:01:06 UTC
*** Bug 200141 has been marked as a duplicate of this bug. ***
Comment 76 Zhang Rui 2018-12-27 15:27:02 UTC

*** This bug has been marked as a duplicate of bug 199981 ***
Comment 77 Zhang Rui 2019-01-10 09:45:02 UTC
here is the latest patch,
https://patchwork.kernel.org/patch/10753143/
please apply it on top of 5.0-rc1 and check if it works or not.