Bug 89211 (Surface_Pro_Lid) - Laptop Lid status (/proc/acpi/button/lid/LID0/state) on MS Surface Pro 1 remains on "closed" after first use.
Summary: Laptop Lid status (/proc/acpi/button/lid/LID0/state) on MS Surface Pro 1 rema...
Status: CLOSED CODE_FIX
Alias: Surface_Pro_Lid
Product: ACPI
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Lv Zheng
URL:
Keywords:
: 106151 (view as bug list)
Depends on:
Blocks:
 
Reported: 2014-12-03 16:35 UTC by GiH
Modified: 2016-08-24 09:31 UTC (History)
10 users (show)

See Also:
Kernel Version: 3.16.0-4-amd64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (71.29 KB, application/octet-stream)
2014-12-03 16:35 UTC, GiH
Details
acpidump.txt (214.62 KB, text/plain)
2014-12-18 10:48 UTC, GiH
Details
dmesg (75.23 KB, text/plain)
2014-12-18 10:49 UTC, GiH
Details
Behaviour of "gpe1E" file (40.98 KB, text/plain)
2015-02-16 09:37 UTC, GiH
Details
the script to log the behaviour (237 bytes, text/plain)
2015-02-16 09:38 UTC, GiH
Details
Required file (71.61 KB, text/plain)
2015-04-01 08:42 UTC, GiH
Details
dmesg double suspending (61.20 KB, application/octet-stream)
2015-09-22 09:27 UTC, GiH
Details
dmesg still suspending (61.81 KB, application/octet-stream)
2015-11-03 12:41 UTC, GiH
Details
Please apply this debug patch on top of ec debugging (484 bytes, application/octet-stream)
2015-11-04 07:51 UTC, Chen Yu
Details
dmesg enhanced debug (317.89 KB, application/octet-stream)
2015-11-04 09:38 UTC, GiH
Details
dmesg enhanced debug (268.26 KB, application/octet-stream)
2015-11-05 09:01 UTC, GiH
Details
dmesg enhanced debug on 4.3.0 wich acpi.trace_xxx params (221.42 KB, application/octet-stream)
2015-11-06 10:41 UTC, GiH
Details
acpid.log (9.77 KB, text/x-log)
2015-11-09 14:53 UTC, GiH
Details
dmesg parallel to acpid.log (84.35 KB, application/octet-stream)
2015-11-09 14:56 UTC, GiH
Details
systemd.log of 6 suspending test from attacments 192511 and 192521 (297.59 KB, text/x-log)
2015-11-10 09:10 UTC, GiH
Details
/etc/acpi/handler.sh (1.79 KB, application/x-shellscript)
2015-11-10 15:45 UTC, GiH
Details
dmesg for comment 78 (78.79 KB, application/octet-stream)
2015-11-10 19:14 UTC, GiH
Details
acpi.log for comment78 (6.95 KB, text/x-log)
2015-11-10 19:14 UTC, GiH
Details
Please apply this patch and without any other patches applied (1.09 KB, application/octet-stream)
2015-11-13 16:22 UTC, Chen Yu
Details
adjust_lid_status_according_to_suspend_resume_debug.diff (1.54 KB, application/octet-stream)
2015-11-14 01:47 UTC, Chen Yu
Details
dmesg after applying attachment 192911 (258.13 KB, application/octet-stream)
2015-11-14 13:26 UTC, GiH
Details
2015-11-15 lid patch to fix lid state and send correct lid state to input layer/netlink (2.71 KB, application/octet-stream)
2015-11-15 07:36 UTC, Chen Yu
Details
dmesg after applying attachment 193001 (483.48 KB, application/octet-stream)
2015-11-15 20:42 UTC, GiH
Details
export_dmi.diff (1.29 KB, application/octet-stream)
2015-11-18 03:20 UTC, Chen Yu
Details
quirk to provide the cached lid state for broken bios (5.42 KB, application/octet-stream)
2016-02-01 12:27 UTC, Chen Yu
Details
dmi quirk for broken lid state (5.42 KB, application/octet-stream)
2016-02-02 08:17 UTC, Chen Yu
Details
please apply this one (6.45 KB, application/octet-stream)
2016-02-02 10:48 UTC, Chen Yu
Details

Description GiH 2014-12-03 16:35:22 UTC
Created attachment 159571 [details]
dmesg

Hi,
I use a MS Surface Pro with Debian Jessie on it. ;-)
At start a status of /proc/acpi/button/lid/LID0/state is "open", but after first use it changes to "closed" and remains on this status until reboot.

I have tested it with Debian Wheezy, Debian Jessie, Fedora 20, Fedora 21 and Ubuntu 14.10. All of them have the same behaviour. /proc/acpi/button/lid/LID0/state remains on "closed".

The most annoying thing is, that the notebook suspends after ca. 20 seconds after wake up. Just reboot helps.

My suggestion is: A bug in button mod. It recognize a closing of lid but not the opening.
Comment 1 Aaron Lu 2014-12-18 08:22:12 UTC
acpidump please:
# acpidump > acpidump.txt
Thanks
Comment 2 GiH 2014-12-18 10:48:06 UTC
Created attachment 161181 [details]
acpidump.txt
Comment 3 GiH 2014-12-18 10:49:13 UTC
Created attachment 161191 [details]
dmesg

new dmesg (just for case)
Comment 4 GiH 2014-12-18 10:53:58 UTC
Hi,
I have broke my old Debian system...

Now i have started Fedora 21 Live and created acpidump.txt and new dmesg (just for case) for you.

Fedora 21 Live have a different kernel: 3.17.4-301.fc21.x86_64
The behaviour is the same.

Thanks
Comment 5 Aaron Lu 2015-02-15 05:57:17 UTC
Does the below file content change when you close/open the LID?
/sys/firmware/acpi/interrupts/gpe1E

The content is a number and should be at least incremented by 1 with every open or close of the LID.
Comment 6 GiH 2015-02-16 09:37:02 UTC
Created attachment 167041 [details]
Behaviour of "gpe1E" file

hard to say... It changes also without my doing.
I have wrote a simple bash script to log the behaviour. Here is this log file. I have closed and opened the lid every 10 seconds.
Comment 7 GiH 2015-02-16 09:38:45 UTC
Created attachment 167051 [details]
the script to log the behaviour

here is my script to log the behaviour of gpe1E and lid state files.
Comment 8 Chen Yu 2015-03-25 12:23:35 UTC
Seems that there is a dsdt bug on surface pro.

GiH,
could you please test the following steps:

1. # cd /sys/kernel/debug/dynamic_debug

2. 
# cat control | grep acpi_ec_run
drivers/acpi/ec.c:628 [acpi]acpi_ec_run =p "##### Query(0x%02x) stopped #####\012"
drivers/acpi/ec.c:623 [acpi]acpi_ec_run =p "##### Query(0x%02x) started #####\012"

3. file name and line number should be the same as step 2

# echo -n 'file drivers/acpi/ec.c line 628 +p' > control
# echo -n 'file drivers/acpi/ec.c line 623 +p' > control


4. test lid close and open.

5. dmesg > ec_quecy.log, please attach this file
Comment 9 Aaron Lu 2015-03-31 03:21:38 UTC
Ping
Comment 10 GiH 2015-03-31 18:33:01 UTC
Sorry guys, give me a day or two. I will definetly try it. Have no time now...
Comment 11 GiH 2015-04-01 08:42:00 UTC
Created attachment 172901 [details]
Required file

Hi,
I have done what you want in Fedora 21 (live pendrive):

# uname -a
Linux localhost 3.17.4-301.fc21.x86_64 #1 SMP Thu Nov 27 19:09:10 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# cd /sys/kernel/debug/dynamic_debug

# cat control | grep acpi_ec_run
drivers/acpi/ec.c:596 [acpi]acpi_ec_run =_ "stop query execution\012"
drivers/acpi/ec.c:591 [acpi]acpi_ec_run =_ "start query execution\012"

# echo -n 'file drivers/acpi/ec.c line 596 +p' > control
# echo -n 'file drivers/acpi/ec.c line 591 +p' > control

# cat control | grep acpi_ec_run
drivers/acpi/ec.c:596 [acpi]acpi_ec_run =p "stop query execution\012"
drivers/acpi/ec.c:591 [acpi]acpi_ec_run =p "start query execution\012"

# cd /home/liveuser

closed and opened the lid.

# dmesg > ec_quecy.log

Best reggards
Comment 12 Chen Yu 2015-04-01 09:01:38 UTC
(In reply to GiH from comment #11)
> Created attachment 172901 [details]
> Required file
> 
> Hi,
> I have done what you want in Fedora 21 (live pendrive):
> 
> # uname -a
> Linux localhost 3.17.4-301.fc21.x86_64 #1 SMP Thu Nov 27 19:09:10 UTC 2014
> x86_64 x86_64 x86_64 GNU/Linux
> 
> # cd /sys/kernel/debug/dynamic_debug
> 
> # cat control | grep acpi_ec_run
> drivers/acpi/ec.c:596 [acpi]acpi_ec_run =_ "stop query execution\012"
> drivers/acpi/ec.c:591 [acpi]acpi_ec_run =_ "start query execution\012"
> 
> # echo -n 'file drivers/acpi/ec.c line 596 +p' > control
> # echo -n 'file drivers/acpi/ec.c line 591 +p' > control
> 
> # cat control | grep acpi_ec_run
> drivers/acpi/ec.c:596 [acpi]acpi_ec_run =p "stop query execution\012"
> drivers/acpi/ec.c:591 [acpi]acpi_ec_run =p "start query execution\012"
> 
> # cd /home/liveuser
> 
> closed and opened the lid.
> 
> # dmesg > ec_quecy.log
> 
> Best reggards

Thanks.

According to log there is only one ec query method invoked before s3, so I think this issue is confirmed to be a DSDT _LID method bug.

However, I think kernel should handle this situation? A workaround might be needed..
Comment 13 GiH 2015-04-01 09:32:09 UTC
I am looking at you with hope. ;)
Comment 14 GiH 2015-06-05 11:15:51 UTC
Ping
Comment 15 Chen Yu 2015-07-29 05:18:31 UTC
Hi,
will look at this after https://bugzilla.kernel.org/show_bug.cgi?id=84651 solved.
Yu
Comment 16 Aaron Lu 2015-08-20 08:42:45 UTC
(In reply to Chen Yu from comment #15)
> Hi,
> will look at this after https://bugzilla.kernel.org/show_bug.cgi?id=84651
> solved.

It doesn't seem the two bugs are related?
BTW, do you see a possible workaround for this firmware bug?
Comment 17 Chen Yu 2015-08-20 08:50:19 UTC
(In reply to Aaron Lu from comment #16)
> (In reply to Chen Yu from comment #15)
> > Hi,
> > will look at this after https://bugzilla.kernel.org/show_bug.cgi?id=84651
> > solved.
> 
> It doesn't seem the two bugs are related?
> BTW, do you see a possible workaround for this firmware bug?

No they are not related actually, I was just lacking of bandwidth at that time :)
I think for the workaround, a compensatory invoking of EC query should be carried out, according to LV's suggestion.
Comment 18 Aaron Lu 2015-08-21 05:58:11 UTC
GiH,
Please do not enable suspend on LID close, and then close/open the LID to see if the LID status changes.
Comment 19 GiH 2015-08-25 14:44:38 UTC
Hi,
I have tested it in Fedora 21 (Kernel: 3.17.4-301.fc21.x86_64).
I have bloccked the suspending with:

# systemd-inhibit --what=handle-lid-switch --who="Lid-Close-Test" --why="Test" --mode=block watch -n 1 cat /proc/acpi/button/lid/LID0/state

and then closed the lid. The status changed from "open" to "closed" and doesn't changed back. The same behavior.
Comment 20 Aaron Lu 2015-08-26 03:08:35 UTC
(In reply to Chen Yu from comment #17)
> (In reply to Aaron Lu from comment #16)
> > (In reply to Chen Yu from comment #15)
> > > Hi,
> > > will look at this after https://bugzilla.kernel.org/show_bug.cgi?id=84651
> > > solved.
> > 
> > It doesn't seem the two bugs are related?
> > BTW, do you see a possible workaround for this firmware bug?
> 
> No they are not related actually, I was just lacking of bandwidth at that
> time :)
> I think for the workaround, a compensatory invoking of EC query should be
> carried out, according to LV's suggestion.

It doesn't seem we can somehow know when to carry out the EC query, and which query to carry out. This looks more like an EC bug in that EC firmware doesn't emit events when the LID change its state from close to open.
Comment 21 GiH 2015-08-26 09:18:25 UTC
I can slightly remember that Windows acts similar. It also not reacts on open lid, i need press some button on keyboard or a power button. But it's not going back to sleep in the next 10 seconds. But Linux does. And this is annoying. Can you somehow reset a status of lid on some other action (keyboard key, power key, sensor panel pressed) is performed? Probably with combination of light sensor...
Comment 22 Zvi "CtrlZvi" Effron 2015-08-26 21:44:13 UTC
I'm running F22 with a slightly modified kernel (4.1.5 + https://bugzilla.kernel.org/attachment.cgi?id=171281 + http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/94430). I see the status going back to open after the lid opens back up. So something is working better there.

But I do still experience the system going back to sleep after 10 or so seconds. After waking it up a second time, it stays awake. This happens whether it was put to sleep by the lid or not, though. So maybe it's not related?
Comment 23 Aaron Lu 2015-08-27 02:43:26 UTC
(In reply to GiH from comment #21)
> I can slightly remember that Windows acts similar. It also not reacts on
> open lid, i need press some button on keyboard or a power button. But it's
> not going back to sleep in the next 10 seconds. But Linux does. And this is

I guess that is due to some user space(perhaps systemd?) finds that the LID is closed so it initiates the suspend. But yes, the LID status is wrong in the first place.

> annoying. Can you somehow reset a status of lid on some other action
> (keyboard key, power key, sensor panel pressed) is performed? Probably with
> combination of light sensor...

Maybe we can, but it would be better for us to know why Windows doesn't have this issue(I mean why LID status changed back to open under Windows) before we add these quirks into kernel code.
Comment 24 Aaron Lu 2015-08-27 02:47:25 UTC
(In reply to Zvi "CtrlZvi" Effron from comment #22)
> I'm running F22 with a slightly modified kernel (4.1.5 +
> https://bugzilla.kernel.org/attachment.cgi?id=171281 +
> http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/
> 94430). I see the status going back to open after the lid opens back up. So
> something is working better there.

Good to know this, thanks.

GiH, maybe you should try a v4.1.x kernel?

> 
> But I do still experience the system going back to sleep after 10 or so
> seconds. After waking it up a second time, it stays awake. This happens
> whether it was put to sleep by the lid or not, though. So maybe it's not
> related?

Interesting, we need to find out why it went to sleep again after it was resumed the 1st time. I wonder if there is any log recording why the computer goes to sleep?
Comment 25 GiH 2015-08-27 10:17:48 UTC
Already done. ArchLinux kernel 4.1.6-ARCH. The same thing.
Comment 26 Zvi "CtrlZvi" Effron 2015-08-27 17:21:52 UTC
GiH,

I was just rereading this and realized I don't think you and I are talking about the same device. I ran my test on a Surface Pro 3. And it looks like you're talking about a Surface Pro 1?
Comment 27 GiH 2015-08-27 18:28:30 UTC
That was exactly what I wanted to ask you but in opposite way... Yes, I have a Surface Pro 1.
Comment 28 Lv Zheng 2015-08-28 01:00:58 UTC
Hi, GiH

Have you tried latest upstream linus tree kernels?
Where there syould be some fixes related to the EC event.

Thanks and best regards
-Lv
Comment 29 GiH 2015-08-28 05:50:11 UTC
Hi, I have compiled and tried the 4.2.0-rc7 yesterday. The same thing. I will try the 4.2.0-rc8 today. Or do you mean something else?
Comment 30 Lv Zheng 2015-08-28 07:02:45 UTC
No need to try 4.2.0-rc8. :)

Thanks
-Lv
Comment 31 GiH 2015-09-22 09:25:01 UTC
(In reply to Zvi "CtrlZvi" Effron from comment #22)
> But I do still experience the system going back to sleep after 10 or so
> seconds. After waking it up a second time, it stays awake. This happens
> whether it was put to sleep by the lid or not, though. So maybe it's not
> related?

I can confirm this behavior under Gnome on my Surface Pro 1 ArchLinux 4.2.0 Kernel Patched from here https://bugzilla.kernel.org/show_bug.cgi?id=69661

I have tested this behavior under Cinnamon. Cinnamon wake up correctly - just one time. But there is desktop not locked so i dont need to type my password after waking up the system.

Probably this is not related to the topic, but i wanted to confirm the behavior.
Comment 32 GiH 2015-09-22 09:27:04 UTC
Created attachment 188071 [details]
dmesg double suspending

I putted the empty lines before first and second suspendings. Just for case...
Comment 33 Chen Yu 2015-11-02 08:07:10 UTC
According to the log:
[    1.377076] ACPI : EC: GPE = 0x1e, I/O: command/status = 0x66, data = 0x62
EC is using GPE 0x1e,
can you please help to confirm if this value has changed before/after suspend?
(EC sci should increase by 1 when LID is opened everytime) :
provide: 
# grep . /sys/firmware/acpi/interrupts/gpe*

before/after lid is opened.
Thank you!
Comment 34 Chen Yu 2015-11-02 08:11:41 UTC
BTW, please use the latest kernel.
Comment 35 GiH 2015-11-02 09:30:59 UTC
Hi,
i'm using a patched Kernel 4.2.3 now. Due a Bug with wifi on Surface. See https://bugzilla.kernel.org/show_bug.cgi?id=69661.

Test 1:
I have started:
# watch -n 0,1 grep . /sys/firmware/acpi/interrupts/gpe1E
On close the lid the counter increases by 3.
On open nothing happens.


Test 2:
I have started:
# watch -n 0,1 grep . /sys/firmware/acpi/interrupts/gpe_all
The same behaviour.

I will try it again with new 4.3 Kernel after updating a system.
Comment 36 Chen Yu 2015-11-02 09:39:22 UTC
So no EC sci is not triggered after lid opened..
How about waiting for 20 seconds after you open the lid? I saw
"the notebook suspends after ca. 20 seconds after wake up"
you can disable the reboot by replacing /sbin/reboot with a simple print script.
Comment 37 GiH 2015-11-02 10:00:12 UTC
Kernel is the same 4.2.3
Test3:

# watch -n 0,1 grep . /sys/firmware/acpi/interrupts/gpe1E

Time(min:sec)    gpe1E_counter    my_action
49:30            1081             Closed the lid
49:33            1084             -
49:50            1112             -
50:30            1112             Opened the lid
51:30            1112             -

By the way, the system is not suspends because I disabled the reaktion on "lid close" in /etc/systemd/logind.conf

|   HandleLidSwitch=ignore

and also in Power Settings of Gnome.
Comment 38 Chen Yu 2015-11-02 10:35:13 UTC
So it appears to me that the latest kernel still encounters this problem:
if lid is controlled by an EC, and lid will lost his EC notification if 
EC is used as a wake up S3 resource, because the wake up process is not related 
to general irq system at all, thus no SCI handler will be invoked, no EC query will be invoked.

To fix this, either we should ask Windows to update their _LID implementation to return a actual status of lid, or linux adds a quirk to leverage EC to execute a compensatory EC query, once resumed from S3.

Lv, what do you think?

yu
Comment 39 GiH 2015-11-02 12:24:31 UTC
I think the MS will do nothing. They concentrating on push new Surface on the market. They have already 4th one.

I have already mentioned in 21 post, that Windows also not reacting on opening lid. It still suspended. Its reacts first on press of keyboard, power button or sensor Windows logo just below display.

This behavior is ok for me. We just need to reset a lid status after wake up. So I am definitely for a quirk.

With best regards.
GiH
Comment 40 Chen Yu 2015-11-02 13:50:03 UTC
Hi, Gih, I got some questions about #Comment 21:
On windows, do you mean, if you close the lid, the system will be suspended, and if you  open the lid, the system will not be woken up, unless you press keyboard or power button?
And for linux, if the lid is closed, the system will be suspended to S3, and if you open the lid again, the system will be woken up, but with lid status 'closed'(only with system running you can check lid status..), and after a while(20 second), the system falls asleep again because of incorrect lid status?

yu
Comment 41 GiH 2015-11-02 14:09:50 UTC
(In reply to Chen Yu from comment #40)
> On windows, do you mean, if you close the lid, the system will be suspended,
> and if you  open the lid, the system will not be woken up, unless you press
> keyboard or power button?
Yes.

> And for linux, if the lid is closed, the system will be suspended to S3, and
> if you open the lid again, the system will be woken up, 
No. I also need to press some button to wake up the system. I believe strongly this is some hardware solution.

> but with lid status
> 'closed'(only with system running you can check lid status..), and after a
> while(20 second), the system falls asleep again because of incorrect lid
> status?
Yes.
Comment 42 Aaron Lu 2015-11-03 07:38:52 UTC
(In reply to GiH from comment #37)
> Kernel is the same 4.2.3
> Test3:
> 
> # watch -n 0,1 grep . /sys/firmware/acpi/interrupts/gpe1E
> 
> Time(min:sec)    gpe1E_counter    my_action
> 49:30            1081             Closed the lid
> 49:33            1084             -
> 49:50            1112             -
> 50:30            1112             Opened the lid
> 51:30            1112             -

Looks like _Q37, which is used to update the LID status to "open", never comes...
Comment 43 Chen Yu 2015-11-03 09:28:47 UTC
(In reply to Aaron Lu from comment #42)
> (In reply to GiH from comment #37)
> > Kernel is the same 4.2.3
> > Test3:
> > 
> > # watch -n 0,1 grep . /sys/firmware/acpi/interrupts/gpe1E
> > 
> > Time(min:sec)    gpe1E_counter    my_action
> > 49:30            1081             Closed the lid
> > 49:33            1084             -
> > 49:50            1112             -
> > 50:30            1112             Opened the lid
> > 51:30            1112             -
> 
> Looks like _Q37, which is used to update the LID status to "open", never
> comes...

Yes,seems that openning the LID will not trigger any interrupt when suspended.
Comment 44 Chen Yu 2015-11-03 09:29:44 UTC
Lv suggests reading the LID status by a cached value, named button->cached_status, which is maintained by:

1. acpi_button_suspend and acpi_button_resume
   cached_status is set to 'closed' when acpi_button_suspend , and set to 
   'open' when acpi_button_resume.

2. acpi_button_notify
   When code comes to acpi_button_notify, it means that, the EC query has been  
   evaluated, at this time the _LID result is correct, and we need to update the cached_status to _LID. 

please try https://bugzilla.kernel.org/attachment.cgi?id=191911 if it works.
thanks
Yu
Comment 45 GiH 2015-11-03 12:41:26 UTC
Created attachment 191921 [details]
dmesg still suspending

I have tried. Added your patch. Switched the systemd and gnome to behave as normal.

Now the lid status changes back to "open". Hurray!
But system still suspending every 10-20 seconds after resume. Argh!
Comment 46 Aaron Lu 2015-11-04 00:59:22 UTC
(In reply to Chen Yu from comment #43)
> (In reply to Aaron Lu from comment #42)
> > (In reply to GiH from comment #37)
> > > Kernel is the same 4.2.3
> > > Test3:
> > > 
> > > # watch -n 0,1 grep . /sys/firmware/acpi/interrupts/gpe1E
> > > 
> > > Time(min:sec)    gpe1E_counter    my_action
> > > 49:30            1081             Closed the lid
> > > 49:33            1084             -
> > > 49:50            1112             -
> > > 50:30            1112             Opened the lid
> > > 51:30            1112             -
> > 
> > Looks like _Q37, which is used to update the LID status to "open", never
> > comes...
> 
> Yes,seems that openning the LID will not trigger any interrupt when
> suspended.

Noth that GiH has disabled suspend when LID closed with:
HandleLidSwitch=ignore
So there is no suspend involved here.
Comment 47 Lv Zheng 2015-11-04 06:11:17 UTC
(In reply to Chen Yu from comment #38)
> So it appears to me that the latest kernel still encounters this problem:
> if lid is controlled by an EC, and lid will lost his EC notification if 
> EC is used as a wake up S3 resource, because the wake up process is not
> related 
> to general irq system at all, thus no SCI handler will be invoked, no EC
> query will be invoked.

I'm not sure if the test is correct.
Will gpe1E counter be increased after acpi_enable_all_wakeup_gpes() invoked?

> 
> To fix this, either we should ask Windows to update their _LID
> implementation to return a actual status of lid, or linux adds a quirk to
> leverage EC to execute a compensatory EC query, once resumed from S3.

I don't think so.
Linux simple won't know which kind of compensatory EC query should be executed...
Is the system woken by "power button pressed" event? Or by a "LID open" event?

Thanks and best regards
-Lv
Comment 48 Lv Zheng 2015-11-04 06:50:08 UTC
(In reply to GiH from comment #45)
> Created attachment 191921 [details]
> dmesg still suspending
> 
> I have tried. Added your patch. Switched the systemd and gnome to behave as
> normal.
> 
> Now the lid status changes back to "open". Hurray!
> But system still suspending every 10-20 seconds after resume. Argh!

So what do you mean?
Do you mean this is not a problem related to the default LID state after resuming. The bug is talking about some "freezed" EC event.
So what's the behavior the firmware expects on this? Without the detailed log, we cannot infer.

Could you apply attachment 191911 [details] and enable EC debug to capature the "suspending again" dmesg output for us to learn.

In order to enable the EC debugging, you can uncomment the following line from drivers/acpi/ec.c
   /* #define DEBUG */

Thanks in advance.
Comment 49 Lv Zheng 2015-11-04 06:56:03 UTC
You should be able to capture the dmesg output of the "1st suspending again" process after the 2nd resuming.
Hope you can also leave some hints in the log for us to learn the timings of your user space actions (if any).

Thanks and best regards.
Comment 50 Chen Yu 2015-11-04 07:13:07 UTC
(In reply to Aaron Lu from comment #46)
> (In reply to Chen Yu from comment #43)
> > (In reply to Aaron Lu from comment #42)
> > > (In reply to GiH from comment #37)
> > > > Kernel is the same 4.2.3
> > > > Test3:
> > > > 
> > > > # watch -n 0,1 grep . /sys/firmware/acpi/interrupts/gpe1E
> > > > 
> > > > Time(min:sec)    gpe1E_counter    my_action
> > > > 49:30            1081             Closed the lid
> > > > 49:33            1084             -
> > > > 49:50            1112             -
> > > > 50:30            1112             Opened the lid
> > > > 51:30            1112             -
> > > 
> > > Looks like _Q37, which is used to update the LID status to "open", never
> > > comes...
> > 
> > Yes,seems that openning the LID will not trigger any interrupt when
> > suspended.
> 
> Noth that GiH has disabled suspend when LID closed with:
> HandleLidSwitch=ignore
> So there is no suspend involved here.

Ah, right. So even if the system is normally running, the action to open a lid will be trigger any interrupt.
Comment 51 GiH 2015-11-04 07:37:40 UTC
(In reply to Lv Zheng from comment #48)
> (In reply to GiH from comment #45)
> > Created attachment 191921 [details]
> > dmesg still suspending
> > 
> > I have tried. Added your patch. Switched the systemd and gnome to behave as
> > normal.
> > 
> > Now the lid status changes back to "open". Hurray!
> > But system still suspending every 10-20 seconds after resume. Argh!
> 
> So what do you mean?
> Do you mean this is not a problem related to the default LID state after
> resuming. The bug is talking about some "freezed" EC event.
> So what's the behavior the firmware expects on this? Without the detailed
> log, we cannot infer.

No.

> 
> Could you apply attachment 191911 [details] and enable EC debug to capature
> the "suspending again" dmesg output for us to learn.

My Comment 45 is already after applying attachment 191911 [details].
I have applied the patch to button.c file, recompiled and reinstalled the modules. Enabled a suspending by commenting "HandleLidSwitch" line in /etc/systemd/logind.conf and enabled suspending in Gnome. In other words returned to default settings. Restarted the system and tried to close the lid.

Posted dmesg in attachment 191921 [details] (see commment 45)

Also comments in the last dmesg are added. Search for:

<<<<<<<<<<<<<<<<<<<< 1st SUSPEND >>>>>>>>>>>>>>>>>>>>

and

<<<<<<<<<<<<<<<<<<<< 2nd SUSPEND >>>>>>>>>>>>>>>>>>>>

> 
> In order to enable the EC debugging, you can uncomment the following line
> from drivers/acpi/ec.c
>    /* #define DEBUG */

I can try this today if you still need it.
Comment 52 Chen Yu 2015-11-04 07:43:56 UTC
Hi, Gih,
thanks for your work,

If I understand correctly, current status can be summarised as:
With default setting(suspend when lid is closed)

1. The closing of lid will trigger a sci and a EC query37 will change the LID status to closed, then system falls to alseep.

2. The openning of lid will not trigger any sci and LID status remains closed, after woken up by other device.

3. Lv's patch add hooks in LID's suspend/resume path, which will update the lid_button's status on time, and sysfs will read the new lid_button's status rather than _LID directly.

4.after 10 seonds, the system falls asleep again, but since #Comment 37 shows that, after lid opened, there is no GPE sci triggered for 1 minute, it is unlikely an extra lid open event triggers the suspending.

 In order to find out why it suspends again, please test with 
/* #define DEBUG */
 and plus please try this debug patch attached. 

Yu
Comment 53 Chen Yu 2015-11-04 07:51:50 UTC
Created attachment 192051 [details]
Please apply this debug patch on top of ec debugging
Comment 54 Chen Yu 2015-11-04 07:52:56 UTC
BTW, also with https://bugzilla.kernel.org/attachment.cgi?id=191921 applied
Comment 55 GiH 2015-11-04 08:00:57 UTC
(In reply to Chen Yu from comment #52)
> Hi, Gih,
> thanks for your work,
> 
> If I understand correctly, current status can be summarised as:
> With default setting(suspend when lid is closed)
> 
> 1. The closing of lid will trigger a sci and a EC query37 will change the
> LID status to closed, then system falls to alseep.
> 
> 2. The openning of lid will not trigger any sci and LID status remains
> closed, after woken up by other device.
> 
> 3. Lv's patch add hooks in LID's suspend/resume path, which will update the
> lid_button's status on time, and sysfs will read the new lid_button's status
> rather than _LID directly.

After applying the patch I have tested it first with disabled suspending. The behavior was a old one. The lid status remains "closed".

After enabling the suspending the status starts to change back to "open" but system still catched in the suspending loop (suspends every 10-20 secs).


> 
> 4.after 10 seonds, the system falls asleep again, but since #Comment 37
> shows that, after lid opened, there is no GPE sci triggered for 1 minute, it
> is unlikely an extra lid open event triggers the suspending.

I think the Lv has right in comment 47 about correctnes of the test. I'm using my system permanently with disabled suspending. Otherwise its just impossible to work. With enabled suspending the system is in continuous loop of suspending after each closing the lid. The test in comment 37 was with disabled suspending. But how you can see the counter still reacts on closing the lid.
> 
>  In order to find out why it suspends again, please test with 
> /* #define DEBUG */
>  and plus please try this debug patch attached. 
> 

In additional...

in attachment 191921 [details] the first comment:

<<<<<<<<<<<<<<<<<<<< 1st SUSPEND >>>>>>>>>>>>>>>>>>>>

is on closing the lid. (Expected behavior)

the second:


<<<<<<<<<<<<<<<<<<<< 2nd SUSPEND >>>>>>>>>>>>>>>>>>>>

after 10 seconds. (Wrong behavior) I did not closed the lid at this point - it steys open. Also status in /proc/acpi/button/lid/LID0/state is now "open".
Comment 56 GiH 2015-11-04 08:01:28 UTC
(In reply to Chen Yu from comment #53)
> Created attachment 192051 [details]
> Please apply this debug patch on top of ec debugging

I will do. Give some time...
Comment 57 GiH 2015-11-04 08:06:29 UTC
(In reply to Chen Yu from comment #54)
> BTW, also with https://bugzilla.kernel.org/attachment.cgi?id=191921 applied

of couse...
Comment 58 GiH 2015-11-04 08:09:21 UTC
(In reply to GiH from comment #57)
> (In reply to Chen Yu from comment #54)
> > BTW, also with https://bugzilla.kernel.org/attachment.cgi?id=191921 applied
> 
> of couse...

Ahem... You mean attachment 191911 [details], i suppose...
Comment 59 Chen Yu 2015-11-04 08:22:49 UTC
(In reply to GiH from comment #58)
> (In reply to GiH from comment #57)
> > (In reply to Chen Yu from comment #54)
> > > BTW, also with https://bugzilla.kernel.org/attachment.cgi?id=191921
> applied
> > 
> > of couse...
> 
> Ahem... You mean attachment 191911 [details], i suppose...

ah, yes.
Comment 60 Lv Zheng 2015-11-04 08:45:33 UTC
(In reply to GiH from comment #51)
> > 
> > In order to enable the EC debugging, you can uncomment the following line
> > from drivers/acpi/ec.c
> >    /* #define DEBUG */
> 
> I can try this today if you still need it.

We need to know the cause of the 2nd suspending after applying attachment 191911 [details].
Yu's patch (attachment 192051 [details]) can help to capture the userspace task that has triggered the 2nd suspending. And EC debugging log can help me to figure out if this is triggered by an EC event IRQ.

So we meant you to do such a test:
1. Apply attachment 191911 [details]
2. Apply attachment 192051 [details]
3. Enable EC debugging
4. Rebuild and boot the kernel
5. Do the suspend test again like what you've done in comment 45
6. Upload the dmesg output of the 2nd suspending here, also please help to mark the 1st suspending/2nd suspending like what you've done in attachment 191921 [details]

Hope this test request is clear enough.

Thanks
-Lv
Comment 61 Lv Zheng 2015-11-04 08:51:42 UTC
If you can use latest kernel, let me tune the request to capture more information.

So we meant you to do such a test:
1. Apply attachment 191911 [details]
2. Apply attachment 192051 [details]
3. Enable EC debugging by uncommenting #define DEBUG from drivers/acpi/ec.c
4. Enable CONFIG_ACPI_DEBUG using make menuconfig
5. Rebuild and boot the kernel with acpi.trace_state=method acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID
6. Do the suspend test again like what you've done in comment 45
7. Upload the dmesg output of the 2nd suspending here, also please help to mark the 1st suspending/2nd suspending like what you've done in attachment 191921 [details]

Thanks in advance
-Lv
Comment 62 GiH 2015-11-04 09:09:34 UTC
(In reply to Lv Zheng from comment #60)
> (In reply to GiH from comment #51)

> 5. Rebuild and boot the kernel with acpi.trace_state=method
> acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID

This point is not clear for me. Should I start a compilation like that:
# make -j3  acpi.trace_state=method acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID

or do you mean something else?
Comment 63 GiH 2015-11-04 09:38:10 UTC
Created attachment 192061 [details]
dmesg enhanced debug

(In reply to Lv Zheng from comment #61)
> If you can use latest kernel, let me tune the request to capture more
> information.
> 
> So we meant you to do such a test:
> 1. Apply attachment 191911 [details]
check

> 2. Apply attachment 192051 [details]
check

> 3. Enable EC debugging by uncommenting #define DEBUG from drivers/acpi/ec.c
check

> 4. Enable CONFIG_ACPI_DEBUG using make menuconfig
CONFIG_ACPI_DEBUG=y in .config
check

> 5. Rebuild and boot the kernel with acpi.trace_state=method
> acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID
# make -j3 acpi.trace_state=method acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID
check

> 6. Do the suspend test again like what you've done in comment 45
check

> 7. Upload the dmesg output of the 2nd suspending here, also please help to
> mark the 1st suspending/2nd suspending like what you've done in attachment
> 191921 [details]
check
Comment 64 GiH 2015-11-04 15:30:37 UTC
(In reply to Lv Zheng from comment #61)
> If you can use latest kernel, let me tune the request to capture more
> information.
Since start of October I was using 4.2.3 Kernel.
Also the last test (comment 63) was done with 4.2.3.

Now I have compiled a 4.2.5 Kernel. I need to test the stability of wifi module (Bug: https://bugzilla.kernel.org/show_bug.cgi?id=69661) under this kernel.

Some how I have troubles to move to 4.3. :(
Comment 65 Lv Zheng 2015-11-05 00:37:25 UTC
Hi,

The step 5 check is wrong, let me split:

5. Rebuild and boot the kernel with acpi.trace_state=method acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID

Should be:
5.1 Rebuild the kernel
5.2 Boot the kernel with the following kernel boot parameters appended:
acpi.trace_state=method acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID

Please correct it.

Thanks and best regards
-Lv
Comment 66 Lv Zheng 2015-11-05 00:39:39 UTC
(In reply to GiH from comment #64)
> (In reply to Lv Zheng from comment #61)
> > If you can use latest kernel, let me tune the request to capture more
> > information.
> Since start of October I was using 4.2.3 Kernel.
> Also the last test (comment 63) was done with 4.2.3.
> 
> Now I have compiled a 4.2.5 Kernel. I need to test the stability of wifi
> module (Bug: https://bugzilla.kernel.org/show_bug.cgi?id=69661) under this
> kernel.
> 
> Some how I have troubles to move to 4.3. :(

If you cannot use 4.3+ kernels, tracing AML execution is not possible. Thus acpi.trace_xxx will be invalid boot parameters.

Thanks and best regards
-Lv
Comment 67 Lv Zheng 2015-11-05 00:40:21 UTC
You can try 4.2.x and ignore the step 5.2 for now.

Thanks
-Lv
Comment 68 GiH 2015-11-05 09:01:55 UTC
Created attachment 192171 [details]
dmesg enhanced debug

(In reply to Lv Zheng from comment #61)
> If you can use latest kernel, let me tune the request to capture more
> information.
> 
> So we meant you to do such a test:
> 1. Apply attachment 191911 [details]
check

> 2. Apply attachment 192051 [details]
check

> 3. Enable EC debugging by uncommenting #define DEBUG from drivers/acpi/ec.c
check

> 4. Enable CONFIG_ACPI_DEBUG using make menuconfig
check

> 5. Rebuild and boot the kernel with acpi.trace_state=method
> acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID
partially check.

I'm not able to compile 4.3 now. It just won't. Will try latter.
Compiled 4.2.5 with:
# make -j3
# make modules_install
and so on...
In accordance with comment 67 booted without acpi.trace_xxx params.

> 6. Do the suspend test again like what you've done in comment 45
check

> 7. Upload the dmesg output of the 2nd suspending here, also please help to
> mark the 1st suspending/2nd suspending like what you've done in attachment
> 191921
check (see last attachment)
Comment 69 Lv Zheng 2015-11-06 02:17:29 UTC
I can see 2 facts here:

1. The 2nd suspending was triggered by systemd-sleep.
	Line 2958: [  297.765897] Current(2075,systemd-sleep),on cpu:2,is trying to suspend the system
	Line 3806: [  329.418016] Current(3780,systemd-sleep),on cpu:3,is trying to suspend the system

2. There was no EC events triggered after 1st resuming.
	Line 291: [    0.174742] ACPI : EC: +++++ Starting EC +++++
	Line 1015: [    2.827349] ACPI : EC: ##### Query(0x41) scheduled #####
	Line 1793: [    2.930983] ACPI : EC: ##### Query(0x09) scheduled #####
	Line 1923: [    3.057022] ACPI : EC: ##### Query(0x09) scheduled #####
	Line 2933: [  296.291923] ACPI : EC: ##### Query(0x36) scheduled #####
	Line 2989: [  298.302433] ACPI : EC: +++++ Stopping EC +++++
	Line 2997: [  298.309502] ACPI : EC: +++++ Starting EC +++++
	Line 3845: [  329.969192] ACPI : EC: +++++ Starting EC +++++
	Line 3837: [  329.959669] ACPI : EC: +++++ Stopping EC +++++
All events arrived before 1st suspending. So it wouldn't be LID status change event that had triggered the 2nd suspending.

Since we don't have \_SB.PCI0.LPCB.LID0._LID event log, it's hard to figure out if systemd has accessed stale /proc/acpi/button/lid/LID0/state.

Next step would be to debug the systemd-sleep.

Thanks and best regards
-Lv
Comment 70 GiH 2015-11-06 10:41:30 UTC
Created attachment 192241 [details]
dmesg enhanced debug on 4.3.0 wich acpi.trace_xxx params

Hi guys,
I have managed to compile 4.3.0. Dont ask me how...

I have done all steps from comment 61 on 4.3.0 kernel.
Here is a new dmesg

Lv, how can i debug systemd-sleep?
Comment 71 Lv Zheng 2015-11-09 02:28:37 UTC
Hi,

I think Yu will ask you to try acpid and add debugging information in acpid to catch the cause.

Thanks
-Lv
Comment 72 GiH 2015-11-09 14:53:57 UTC
Created attachment 192511 [details]
acpid.log

(In reply to Lv Zheng from comment #71)
> Hi,
> 
> I think Yu will ask you to try acpid and add debugging information in acpid
> to catch the cause.

Ok...
1. Installed acpid but not enabled a service.
2. Rebooted in to debug patched 4.3.0 kernel with boot options acpi.trace_state=method acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID (comment 61)

3. Started acpid manualy with:

# acpid -d -l >acpid.log 2>&1

4. Performed the test. 

5. have made a notes in log as usual
<<<<<<<<<<<<<<<<<<<< 1st SUSPEND >>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<< 2nd SUSPEND >>>>>>>>>>>>>>>>>>>>
etc.

I have closed the lid just once at
<<<<<<<<<<<<<<<<<<<< 1st SUSPEND >>>>>>>>>>>>>>>>>>>>

All others suspends 
<<<<<<<<<<<<<<<<<<<< 2nd SUSPEND >>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<< 3rd SUSPEND >>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<< 4th SUSPEND >>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<< 5th SUSPEND >>>>>>>>>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<< 6th SUSPEND >>>>>>>>>>>>>>>>>>>>
are automatically and are unwanted.

Have just logged a little more...
Comment 73 GiH 2015-11-09 14:56:43 UTC
Created attachment 192521 [details]
dmesg parallel to acpid.log

Parallel to acpid.log I have also logged the dmesg. Here is it. As usual with comments.
Comment 74 Lv Zheng 2015-11-10 01:39:36 UTC
From the acpid log, I cannot see any kernel ACPI subsystem events triggered the 2nd suspend.
Except the 1st suspend triggering event (input button/lid LID close):
	Line 28: acpid: completed input layer event "button/lid LID close"
	Line 34: acpid: completed netlink event "processor LNXCPU:00 00000081 00000000"
	Line 40: acpid: completed netlink event "processor LNXCPU:01 00000081 00000000"
	Line 46: acpid: completed netlink event "processor LNXCPU:02 00000081 00000000"
	Line 52: acpid: completed netlink event "processor LNXCPU:03 00000081 00000000"
	Line 61: acpid: completed netlink event "processor LNXCPU:00 00000081 00000000"
	Line 67: acpid: completed netlink event "processor LNXCPU:01 00000081 00000000"
	Line 73: acpid: completed netlink event "processor LNXCPU:02 00000081 00000000"
	Line 79: acpid: completed netlink event "processor LNXCPU:03 00000081 00000000"
	Line 88: acpid: completed netlink event "processor LNXCPU:00 00000081 00000000"
	Line 94: acpid: completed netlink event "processor LNXCPU:01 00000081 00000000"
	Line 100: acpid: completed netlink event "processor LNXCPU:02 00000081 00000000"
	Line 106: acpid: completed netlink event "processor LNXCPU:03 00000081 00000000"
	Line 115: acpid: completed netlink event "processor LNXCPU:00 00000081 00000000"
	Line 121: acpid: completed netlink event "processor LNXCPU:01 00000081 00000000"
	Line 127: acpid: completed netlink event "processor LNXCPU:02 00000081 00000000"
	Line 133: acpid: completed netlink event "processor LNXCPU:03 00000081 00000000"
	Line 142: acpid: completed netlink event "processor LNXCPU:00 00000081 00000000"
	Line 148: acpid: completed netlink event "processor LNXCPU:01 00000081 00000000"
	Line 154: acpid: completed netlink event "processor LNXCPU:02 00000081 00000000"
	Line 160: acpid: completed netlink event "processor LNXCPU:03 00000081 00000000"
	Line 169: acpid: completed netlink event "processor LNXCPU:00 00000081 00000000"
	Line 175: acpid: completed netlink event "processor LNXCPU:01 00000081 00000000"
	Line 181: acpid: completed netlink event "processor LNXCPU:02 00000081 00000000"
	Line 187: acpid: completed netlink event "processor LNXCPU:03 00000081 00000000"
All other events are processor related netlink events.

From the acpi tracer log, I can only see 2 _LID evaluations, one is evaluated after boot when the boot "LID open" event is received, the other is evaluated before the 1st suspend when the "LID close" event is received. It doesn't seem to be the cause that the userspace evaluated _LID and obtained old LID status.
	Line 652: [    2.743730]   exdebug-0435 ex_trace_point        : Method Begin [0xffffc90000017f96:\_SB.PCI0.LPCB.LID0._LID] execution.
	Line 653: [    2.746114]   exdebug-0435 ex_trace_point        : Method End [0xffffc90000017f96:\_SB.PCI0.LPCB.LID0._LID] execution.
	Line 783: [  162.147024]   exdebug-0435 ex_trace_point        : Method Begin [0xffffc90000017f96:\_SB.PCI0.LPCB.LID0._LID] execution.
	Line 784: [  162.147070]   exdebug-0435 ex_trace_point        : Method End [0xffffc90000017f96:\_SB.PCI0.LPCB.LID0._LID] execution.

So I have no idea here. Maybe you can provide systemd logs.

Thanks and best regards
-Lv
Comment 75 GiH 2015-11-10 09:10:22 UTC
Created attachment 192631 [details]
systemd.log of 6 suspending test from attacments 192511 and 192521

I uploaded a journal of yesterday

# journalctl -a --until=today --since=yesterday >systemd.log

Added a comments of the suspending times. Its probably not precisly, because I had to reconstruct it some how...

I hope this is what you asking.
Comment 76 Chen Yu 2015-11-10 15:11:36 UTC
Could you please attach your /etc/acpi/*.sh?  thank!
Yu
Comment 77 GiH 2015-11-10 15:45:36 UTC
Created attachment 192651 [details]
/etc/acpi/handler.sh

Of course...
There are just:
/etc/acpi/handler.sh
/etc/acpi/events/anything

The content of /etc/acpi/events/anything:
 | # Pass all events to our one handler script
 | event=.*
 | action=/etc/acpi/handler.sh %e

The symbol "|" is of course not there.
Comment 78 GiH 2015-11-10 19:12:56 UTC
Guys,
I have made another test...
1. I have forced a suspending with "systemctl suspend"
2. Resumed and wait a minute. Everything is ok.
3. Forced second time.
4. In suspended state closed and opened a lid.
5. Resumed. Status of lid still "closed". PC suspended by own after 20 sec.
6. Resumed. Status of lid is "open" now. PC suspended by own after 20 sec.

Any ideas?
Comment 79 GiH 2015-11-10 19:14:24 UTC
Created attachment 192671 [details]
dmesg for comment 78
Comment 80 GiH 2015-11-10 19:14:53 UTC
Created attachment 192681 [details]
acpi.log for comment78
Comment 81 Chen Yu 2015-11-11 12:07:51 UTC
it seems that in #Comment 72 acpid has not replaced systemd, since systemd is quite hard to debug:( I wonder if we can use a much simpler implementation of event handling framework, say, acpid to debug. BTW, which systemd version are you using?
Comment 82 GiH 2015-11-11 12:31:32 UTC
Hi,
# systemctl--version

give me this output:

systemd 226
+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD +IDN
Comment 83 Chen Yu 2015-11-12 15:22:01 UTC
Can you please help to test:

1) boot linux with 'text init=/bin/bash' appended in commandline(you might need to set SATA/XHCI_HCD/EHCI_HCD built-in kernel rather than module) 
2) after booting up into a simple shell, start the acpid by: acpid -d -l
3) test if it will suspend after you close the lid, and wake up the system by other devices than lid, then wait if it will suspend for the second time?

why I do like this is to check if it is a bug from systemd or a bug from linux core, and acpid is much simpler for debugging...

thanks
Yu
Comment 84 Aaron Lu 2015-11-13 02:14:42 UTC
One question: will acpid trigger system suspend when it received a LID close event?
Comment 85 Chen Yu 2015-11-13 09:47:43 UTC
Ah right, default acpid settings seemed not be interested in lid closing event,we should add lid's action in /etc/acpi/events..
but before that, I'm curious if the following step would cause the suspending for the second time:

systemd is using a timeout mechanism, if systemd has only received one lid close event, but does not receive any lid open notification(for example, netlink, input event, or any other events), although it has resumed, it will forcely suspend the system for the second time. Event with lv's patch applied, 
the open notification by acpi_lid_send_state has not been invoked, which might cause expected behavior..
I'll refine lv's patch and send another version later.

Meanwhile, Gih, could you please help to test acpid to see if it is still reproducable? thanks..

 1. vim /etc/acpi/events
event=button/lid
action=/etc/acpi/actions/sleep.sh %e

2. /etc/acpi/actions/sleep.sh 
#!/bin/sh
sync
sleep 5 && echo -n "mem" > /sys/power/state
Comment 86 Chen Yu 2015-11-13 16:22:49 UTC
Created attachment 192901 [details]
Please apply this patch and without any other patches applied

This patch is to adjust lid status according to suspend/resume condition, please test it without any other patch applied.
Comment 87 GiH 2015-11-13 16:46:28 UTC
(In reply to Chen Yu from comment #85)
> 
> Meanwhile, Gih, could you please help to test acpid to see if it is still
> reproducable? thanks..

You mean the description from comment 83 ?
I will try it during weekend.



(In reply to Chen Yu from comment #86)
> Created attachment 192901 [details]
> Please apply this patch and without any other patches applied
> 
> This patch is to adjust lid status according to suspend/resume condition,
> please test it without any other patch applied.

The same: On weekend.

Thanks guys.
Comment 88 Chen Yu 2015-11-14 01:44:59 UTC
(In reply to GiH from comment #87)
> (In reply to Chen Yu from comment #85)
> > 
> > Meanwhile, Gih, could you please help to test acpid to see if it is still
> > reproducable? thanks..
> 
> You mean the description from comment 83 ?
> I will try it during weekend.
> 
> 
> 
> (In reply to Chen Yu from comment #86)
> > Created attachment 192901 [details]
> > Please apply this patch and without any other patches applied
> > 
> > This patch is to adjust lid status according to suspend/resume condition,
> > please test it without any other patch applied.
> 
> The same: On weekend.
> 
> Thanks guys.

Thanks, I've updated the patch(adjust_lid_status_according_to_suspend_resume_debug.diff), plz first apply this patch, and do a normal lid close to suspend the system,then wake up the system by other device, check the status of lid, then wait for a moment to see if it will be suspend for the second time. 
If it will still suspend for the second time, please test #Comment 83 with the same kernel with adjust_lid_status_according_to_suspend_resume_debug.diff applied.
Comment 89 Chen Yu 2015-11-14 01:47:24 UTC
Created attachment 192911 [details]
adjust_lid_status_according_to_suspend_resume_debug.diff

Please use this patch instead of #Comment 86
Comment 90 GiH 2015-11-14 13:26:30 UTC
Created attachment 192931 [details]
dmesg after applying attachment 192911 [details]

I have done like you said in the comment 88. The system is not more reaction on lid close. The status of lid in /proc/acpi/button/lid/LID0/state is still "open" now. I have tried it with enabled and disabled acpid.

Then I have applied all debug options:

1. Enable EC debugging by uncommenting #define DEBUG from drivers/acpi/ec.c
2. Enable CONFIG_ACPI_DEBUG using make menuconfig and rebuid the kernel
3. Boot the kernel with acpi.trace_state=method acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID

and created a new dmesg.

The comments in new dmesg are:
<<<<<<<<<<<<<<<<<<<< close lid >>>>>>>>>>>>>>>>>>>>          - no reaction
<<<<<<<<<<<<<<<<<<<< 1st manual SUSPEND >>>>>>>>>>>>>>>>>>>> - forced suspend by "systemd suspend" and resume
<<<<<<<<<<<<<<<<<<<< 2nd manual SUSPEND >>>>>>>>>>>>>>>>>>>> - forced suspend by "systemd suspend", closed the lid, open the lid, resume
Comment 91 Chen Yu 2015-11-15 07:34:57 UTC
(In reply to GiH from comment #90)
> Created attachment 192931 [details]
> dmesg after applying attachment 192911 [details]
> 
> I have done like you said in the comment 88. The system is not more reaction
> on lid close. The status of lid in /proc/acpi/button/lid/LID0/state is still
> "open" now. I have tried it with enabled and disabled acpid.
> 
> Then I have applied all debug options:
> 
> 1. Enable EC debugging by uncommenting #define DEBUG from drivers/acpi/ec.c
> 2. Enable CONFIG_ACPI_DEBUG using make menuconfig and rebuid the kernel
> 3. Boot the kernel with acpi.trace_state=method
> acpi.trace_method_name=_SB.PCI0.LPCB.LID0._LID
> 
> and created a new dmesg.
> 
> The comments in new dmesg are:
> <<<<<<<<<<<<<<<<<<<< close lid >>>>>>>>>>>>>>>>>>>>          - no reaction
> <<<<<<<<<<<<<<<<<<<< 1st manual SUSPEND >>>>>>>>>>>>>>>>>>>> - forced
> suspend by "systemd suspend" and resume
> <<<<<<<<<<<<<<<<<<<< 2nd manual SUSPEND >>>>>>>>>>>>>>>>>>>> - forced
> suspend by "systemd suspend", closed the lid, open the lid, resume

Thanks, there is something wrong with my previous patch, please apply this one and provide dmesg.
2015_11_15_adjust_lid_status_according_to_suspend_resume_debug.diff
(plz don't apply other patches, and test with systemd first)
Comment 92 Chen Yu 2015-11-15 07:36:22 UTC
Created attachment 193001 [details]
2015-11-15 lid patch to fix lid state and send correct lid state to input layer/netlink

2015-11-15 lid patch to fix lid state and send correct lid state to input layer/netlink
Comment 93 GiH 2015-11-15 20:42:53 UTC
Created attachment 193131 [details]
dmesg after applying attachment 193001 [details]

Yu, it works!!! After resuming the system is not in suspending loop (doesnt suspend every 20 sec). I still some issue with monitor went to dark twice immediately after resume. But this is not a suspending, just monitor going off and it could be a GNOME bug.

I have created a new dmesg. Just for case...

I will test it the days and ask GNOME people. I will give a feedback.

With best regards
GiH
Comment 94 Chen Yu 2015-11-18 03:17:14 UTC
Hi, Gih,
please help apply this debug patch to dump your system dmi info, thanks.
Yu
Comment 95 Chen Yu 2015-11-18 03:20:26 UTC
Created attachment 194751 [details]
export_dmi.diff
Comment 96 Chen Yu 2015-11-18 03:20:39 UTC
$ cat /sys/firmware/dmi/dmi_slot

[0]:(null)

[1]:Intel Corporation

[2]:BRAS.X64.B069.R00.1504230617

[3]:04/23/2015

[4]:Intel Corporation

[5]:CHERRYVIEW C0 PLATFORM

[6]:0.1

[7]:112233445566

[8]:9CFE245E-D0C8-BD45-A79F-54EA5FBD3D97

[9]:Intel Corp.

[10]:Braswell CRB

[11]:2

[12]:1

[13]:Base Board Asset Tag

[14]:Intel Corporation

[15]:9

[16]:0.1

[17]:serial#

[18]:Asset Tag
Comment 97 GiH 2015-11-18 14:57:06 UTC
Hi Yu,

this is my output:

$ cat /sys/firmware/dmi/dmi_slot

slot [0]:(null)
slot [1]:American Megatrends Inc.
slot [2]:1.07.0050
slot [3]:11/28/2014
slot [4]:Microsoft Corporation
slot [5]:Surface with Windows 8 Pro
slot [6]:1
slot [7]:019188131653
slot [8]:A94195F4-E55D-DF58-8359-C97BA087A104
slot [9]:Microsoft Corporation
slot [10]:Surface with Windows 8 Pro
slot [11]:1
slot [12]:0
slot [13]:0
slot [14]:Microsoft Corporation
slot [15]:17
slot [16]:1
slot [17]:0
slot [18]:0
Comment 98 GiH 2015-12-29 22:04:08 UTC
Hi,
until now is everything fine. Lid works as expected.

Is it possible to add this patch to official kernel source code?
Comment 99 Chen Yu 2016-01-04 15:09:12 UTC
Hi, I think a quirk solution would be acceptable at present. If there is no objection I'll send the patch for review soon.
Comment 100 Chen Yu 2016-01-04 15:11:10 UTC
Root cause to be unreliable value of bios's _LID, re-classify to EFI/BIOS issue for now.
Comment 101 GiH 2016-01-06 08:33:35 UTC
Hi,
let me know if you need ыщьу information or my help in this.
Comment 102 GiH 2016-01-06 08:34:11 UTC
Hi,
let me know if you need some information or my help in this.
Comment 103 Chen Yu 2016-02-01 12:27:46 UTC
Created attachment 202621 [details]
quirk to provide the cached lid state for broken bios

Gih can you help test if this patch work for you? I'm planning to sent it out for review. 
(by appending   'acpi_button=cache_lid' in cmdline.
thanks.
Comment 104 Bastien Nocera 2016-02-01 20:57:27 UTC
(In reply to Chen Yu from comment #103)
> Created attachment 202621 [details]
> quirk to provide the cached lid state for broken bios
> 
> Gih can you help test if this patch work for you? I'm planning to sent it
> out for review. 
> (by appending   'acpi_button=cache_lid' in cmdline.
> thanks.

I would use "opened" and "closed" in the commit message[1] instead of 0/1.

Once people have tested this, would it be possible to enable it through a DMI match instead, so it works out of the box?

[1]: Or at least opened ("1"), closed ("0").
Comment 105 Chen Yu 2016-02-02 06:53:23 UTC
Hi Bastien,
(In reply to Bastien Nocera from comment #104)
> (In reply to Chen Yu from comment #103)
> > Created attachment 202621 [details]
> > quirk to provide the cached lid state for broken bios
> > 
> > Gih can you help test if this patch work for you? I'm planning to sent it
> > out for review. 
> > (by appending   'acpi_button=cache_lid' in cmdline.
> > thanks.
> 
> I would use "opened" and "closed" in the commit message[1] instead of 0/1.
do you mean the changelog? I use zero because I mentioned a variable in the exaple DSDT code.
> 
> Once people have tested this, would it be possible to enable it through a
> DMI match instead, so it works out of the box?
OK, then I need three dmi info.
> 
> [1]: Or at least opened ("1"), closed ("0").
Comment 106 Chen Yu 2016-02-02 08:17:16 UTC
Created attachment 202741 [details]
dmi quirk for broken lid state

plz help check if this one is ok, thanks.
Comment 107 GiH 2016-02-02 10:44:10 UTC
After patching with attachment 202741 [details] the file button.c is not compiling.
Here are the errors:


drivers/acpi/button.c:490:13: error: expected declaration specifiers or ‘...’ before string constant
 early_param("acpi_button", acpi_button_setup);
             ^
drivers/acpi/button.c:490:28: error: expected declaration specifiers or ‘...’ before ‘acpi_button_setup’
 early_param("acpi_button", acpi_button_setup);
                            ^
drivers/acpi/button.c:480:19: warning: ‘acpi_button_setup’ defined but not used [-Wunused-function]
 static int __init acpi_button_setup(char *str)
                   ^
scripts/Makefile.build:264: recipe for target 'drivers/acpi/button.o' failed
make[2]: *** [drivers/acpi/button.o] Error 1
scripts/Makefile.build:403: recipe for target 'drivers/acpi' failed
make[1]: *** [drivers/acpi] Error 2
Makefile:941: recipe for target 'drivers' failed
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....
Comment 108 Chen Yu 2016-02-02 10:48:06 UTC
Created attachment 202751 [details]
please apply this one

Sorry I attached the wrong version, plz use this one.
Comment 109 GiH 2016-02-02 11:42:58 UTC
Now it works.
I have used a 4.3.5 Kernel.
Compiled and booted with "acpi_button=cache_lid".
Closed LID. PC going to suspend.
Opened LID, reactivated with sensor button. PC is waked up.
Wait ca 20.Secs.
Status of LID is also back to "open"

Works fine.

I want switch to 4.4.1 tomorow.

I will post here if I have some troubles with this or 4.4.1 Kernels.

Thanks a lot.
Comment 110 Lubomir Kucera 2016-02-21 11:42:18 UTC
Thank you, Chen Yu, for your work on this one. Where can I find repository with your changes ? Are you going to push the patch to linux-next anytime soon ?
Comment 111 Chen Yu 2016-03-07 15:13:46 UTC
According to feedback/discussion in the community, above patch has been obsoleted since it is too 'hack' and hard to maintain,
actually there is a problem in previous acpi button driver that has been
introduced long ago, we need to revert that commit and leverage the systemd
developer for a root cause, I've filed a thread on systemd at:
https://github.com/systemd/systemd/issues/2807
Comment 112 GiH 2016-03-07 21:24:45 UTC
I just want to mention, that I have reported this bug running Debian Wheezy without systemd.
Comment 113 Lv Zheng 2016-06-20 02:34:23 UTC
I just forgot to give you feedback on the bugzilla.
Please download the 3 patches:
https://patchwork.kernel.org/patch/9147055/
https://patchwork.kernel.org/patch/9147063/
https://patchwork.kernel.org/patch/9147067/

Apply them and boot the built kernel with following boot parameter:
lid_init_state.lid_init_state=open.

Thanks in advance.
Comment 114 Lv Zheng 2016-06-20 02:38:50 UTC
Sorry, the boot parameter should be:
button.lid_init_state=open.

Thanks
-Lv
Comment 115 GiH 2016-06-21 08:52:22 UTC
Hi,
I have tested with kernel 4.6.2. PC sleeps just on closing lid. After wakeup stays awaken. But status of lid in /proc/acpi/button/lid/LID0/state remains "closed" after first closing of lid.
Comment 116 Lv Zheng 2016-06-22 03:40:52 UTC
(In reply to GiH from comment #115)
> Hi,
> I have tested with kernel 4.6.2. PC sleeps just on closing lid. After wakeup
> stays awaken.

This means the bug has been resolved.

> But status of lid in /proc/acpi/button/lid/LID0/state remains
> "closed" after first closing of lid.

This is not related.
We are going to document that BIOS returned _LID is not reliable.
Maybe we can even move this file to the debugfs.
I think there is no user space tools relying on the correctness of this file.

So let me close the bug.

Thanks
-Lv
Comment 117 Lv Zheng 2016-06-22 04:31:17 UTC
*** Bug 106151 has been marked as a duplicate of this bug. ***
Comment 118 GiH 2016-06-22 10:52:41 UTC
(In reply to Lv Zheng from comment #116)
> (In reply to GiH from comment #115)
> > Hi,
> > I have tested with kernel 4.6.2. PC sleeps just on closing lid. After
> wakeup
> > stays awaken.
> 
> This means the bug has been resolved.
> 
> > But status of lid in /proc/acpi/button/lid/LID0/state remains
> > "closed" after first closing of lid.
> 
> This is not related.
Probably...
But it was updated in the previous patch (attachment 202751 [details]).

> We are going to document that BIOS returned _LID is not reliable.
> Maybe we can even move this file to the debugfs.
> I think there is no user space tools relying on the correctness of this file.
If it is so, how can I determine, let's say in my script, the real state of lid?

> 
> So let me close the bug.
If everybody is fine with this state you can close the bug.

> 
> Thanks
> -Lv
Comment 119 GiH 2016-06-22 11:22:02 UTC
Are you going to push the patch to linux-next?

Thanks for your work.
Comment 120 Lv Zheng 2016-06-23 03:15:57 UTC
(In reply to GiH from comment #118)
> (In reply to Lv Zheng from comment #116)
> > (In reply to GiH from comment #115)
> > > Hi,
> > > I have tested with kernel 4.6.2. PC sleeps just on closing lid. After
> wakeup
> > > stays awaken.
> > 
> > This means the bug has been resolved.
> > 
> > > But status of lid in /proc/acpi/button/lid/LID0/state remains
> > > "closed" after first closing of lid.
> > 
> > This is not related.
> Probably...
> But it was updated in the previous patch (attachment 202751 [details]).

The previous patch could cause regressions on other platforms.

> 
> > We are going to document that BIOS returned _LID is not reliable.
> > Maybe we can even move this file to the debugfs.
> > I think there is no user space tools relying on the correctness of this
> file.
> If it is so, how can I determine, let's say in my script, the real state of
> lid?
> 
> > 
> > So let me close the bug.
> If everybody is fine with this state you can close the bug.

We've been looking at the tables, and concluded that there are many such BIOS tables providing OSPMs with the wrong initial lid state and Windows doesn't care that too much.

Thanks
-Lv
Comment 121 Lv Zheng 2016-06-23 03:16:38 UTC
I(In reply to GiH from comment #119)
> Are you going to push the patch to linux-next?
> 
> Thanks for your work.

It's in Linux-pm.git/Linux-next, and will appear soon in the upstream.
Let's close it.

Thanks
-Lv
Comment 122 Zhang Rui 2016-06-23 06:56:57 UTC
no, you can only mark it as "RESOLVED" when there is a patch targeting upstream.
The bug can be closed only if the fix patch has been shipped in upstream kernel.
Comment 123 GiH 2016-06-23 07:45:18 UTC
(In reply to Lv Zheng from comment #120)

> We've been looking at the tables, and concluded that there are many such
> BIOS tables providing OSPMs with the wrong initial lid state and Windows
> doesn't care that too much.

That is really sad. It was really comfortable way to determine the lid state in applications/scripts/what ever. 

But I'm happy about that you guys solved the main problem.
Comment 124 Lv Zheng 2016-06-24 00:43:23 UTC
(In reply to GiH from comment #123)
> (In reply to Lv Zheng from comment #120)
> 
> > We've been looking at the tables, and concluded that there are many such
> > BIOS tables providing OSPMs with the wrong initial lid state and Windows
> > doesn't care that too much.
> 
> That is really sad. It was really comfortable way to determine the lid state
> in applications/scripts/what ever. 

However I think no one needs such a script.
Because like there is no need to implement an tray icon on Windows to indicate the lid state (the state of lid can be easily confirmed by human eyes), there should be no user needing such a script.

Another point is ALL AML TABLES can ensure that if the LID is switched once, the value is correct.
But if a user switches the lid once, then he surely should know the current state of the lid.

Thanks and best regards
-Lv
Comment 125 GiH 2016-06-27 08:43:36 UTC
(In reply to Lv Zheng from comment #124)
> 
> However I think no one needs such a script.
> Because like there is no need to implement an tray icon on Windows to
> indicate the lid state (the state of lid can be easily confirmed by human
> eyes), there should be no user needing such a script.
> 
> Another point is ALL AML TABLES can ensure that if the LID is switched once,
> the value is correct.
> But if a user switches the lid once, then he surely should know the current
> state of the lid.
> 
> Thanks and best regards
> -Lv

Why not? Start a scheduled backup just if lid is closed for example... It's not about need, it's about possibility. I'm happy that my Surface acts correct now and that I don't need to compile the kernel after every update in the future.

Best regards
GiH
Comment 126 Lv Zheng 2016-06-28 08:29:47 UTC
(In reply to GiH from comment #125)
> (In reply to Lv Zheng from comment #124)
> > 
> > However I think no one needs such a script.
> > Because like there is no need to implement an tray icon on Windows to
> > indicate the lid state (the state of lid can be easily confirmed by human
> > eyes), there should be no user needing such a script.
> > 
> > Another point is ALL AML TABLES can ensure that if the LID is switched
> once,
> > the value is correct.
> > But if a user switches the lid once, then he surely should know the current
> > state of the lid.
> > 
> > Thanks and best regards
> > -Lv
> 
> Why not? Start a scheduled backup just if lid is closed for example... It's
> not about need, it's about possibility. I'm happy that my Surface acts
> correct now and that I don't need to compile the kernel after every update
> in the future.

The lid close event is guarenteed by all AML tables, because Windows relies on this to issue a suspend. You can see this in Windows power options "Choose what closing the lid does". There is no option "Choose what opening the lid does" and there is no icon indicating lid state on Windows.

So the scheduled backup is possible, you just need to listen to the SW_LID event (we are about to introduce new PM_LID event) as what acpi_listen currently does.

The /proc/acpi/button/lid/LID0/state shouldn't be touched by the userspace.

Thanks and best regards
-Lv
Comment 127 Lv Zheng 2016-08-15 08:59:42 UTC
Patch upstreamed:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3540c32a9ae4cb23ab70f7798f45affc02762fa7
However it require special boot parameter to enable this quirk mechanism.

Closing...

Thanks
Comment 128 GiH 2016-08-24 09:31:58 UTC
Good news! What you think, in which kernel it will be integrated?

Note You need to log in before you can comment on or make changes to this bug.