Bug 187271 - Lid does not "open" after suspend to RAM - Razer Blade Stealth 2016
Summary: Lid does not "open" after suspend to RAM - Razer Blade Stealth 2016
Status: CLOSED DOCUMENTED
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Lid (show other bugs)
Hardware: Intel Linux
: P1 normal
Assignee: Lv Zheng
URL:
Keywords:
: 189171 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-11-08 04:20 UTC by kobrient
Modified: 2017-12-24 17:55 UTC (History)
9 users (show)

See Also:
Kernel Version: 4.8.6-1-ARCH
Tree: Mainline
Regression: No


Attachments
acpidump of New Razer Blade Stealth 4K (905.17 KB, text/plain)
2016-11-18 21:58 UTC, Vic Fryzel
Details
attachment-14171-0.html (1.44 KB, text/html)
2016-11-21 01:47 UTC, Vic Fryzel
Details
attachment-14559-0.html (1.31 KB, text/html)
2016-11-21 02:02 UTC, Vic Fryzel
Details
[PATCH] Some in theory not correct code (3.61 KB, patch)
2017-04-26 01:27 UTC, Lv Zheng
Details | Diff

Description kobrient 2016-11-08 04:20:32 UTC
My system is set up to suspend to ram on a lid close event. The lid state transitions from "open" to "closed" correctly the first time (and the system suspends), but after resuming from suspend by opening the lid, the lid state does not change back to "open". Needless to say, closing the lid once again does not induce a suspend.

Opening the lid now, while the system is NOT sleeping, causes ACPI to notice and register the "open" event. Closing the lid once more successfully suspends the machine again. The behavior is reproducible every other attempt. Everything seems to indicate that ACPI drops or discards the lid open event from the EC when coming up from sleep.

If I run `acpi_listen` with all lid events disabled (no suspend action), I see both the "open" and "close" events successfully recorded multiple times. It is only when I successfully suspend to ram then try to wake that my lid "open" event gets dropped.

The problem is very similar to this issue documented on LKML in 2014:
https://lkml.org/lkml/2014/3/25/196

I am reading the lid state from /proc/acpi/button/lid/LID0/state and using acpid to set up and test lid open/close triggers.

If lends any credence to the story, the Arch linux wiki seems to agree with me:
https://wiki.archlinux.org/index.php/Razer#Suspend_Loop
**Note that their "workaround" here does not actually solve the problem.
Comment 1 Vic Fryzel 2016-11-18 21:58:02 UTC
Created attachment 245081 [details]
acpidump of New Razer Blade Stealth 4K

I've attached the acpidump output for a New Razer Blade Stealth 4K, which exhibits this exact same behavior (never resetting lid state to open).
Comment 2 Vic Fryzel 2016-11-18 21:58:57 UTC
As I noted in #110951, this is still present in 4.8.8 and 4.9.0-rc5.
Comment 3 Lv Zheng 2016-11-21 01:38:39 UTC
This is no longer a kernel issue now.
See Documentation/acpi/acpi-lid.txt.

For LID events.
Input layer requires a "switch event".
While ACPI BIOS is not designed to provide paired lid open/close events.
So ACPI lid driver now is only responsible for delivering lid close event to the userspace.
And the wrong systemd workaround shouldn't be used now.

You should ask systemd developers to handle the special ACPI lid case.
And boot the kernel with acpi.lid_init_state=ignore.
There is no possible way in kernel fixing this with userspace wrong behaviors.

I'll close this issue.
And feel free to re-open it when necessary.
Comment 4 Lv Zheng 2016-11-21 01:41:16 UTC
For this issue, kernel doesn't break userspace.
It's BIOS breaking userspace.
Which means userspace is not proper to handle the BIOS expected behavior.
And kernel acpi lid driver managed to make it possible for a userspace fix.
So IMO, you've filed the bug in a wrong place.
Comment 5 Vic Fryzel 2016-11-21 01:47:51 UTC
Created attachment 245221 [details]
attachment-14171-0.html

If I sent a kernel patch to accommodate this, would it be accepted assuming
it passed technical review? Trying to determine if you're refusing it on
philosophical grounds or technical ones.

On Sun, Nov 20, 2016, 5:41 PM <bugzilla-daemon@bugzilla.kernel.org> wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=187271
>
> --- Comment #4 from Lv Zheng <lv.zheng@intel.com> ---
> For this issue, kernel doesn't break userspace.
> It's BIOS breaking userspace.
> Which means userspace is not proper to handle the BIOS expected behavior.
> And kernel acpi lid driver managed to make it possible for a userspace fix.
> So IMO, you've filed the bug in a wrong place.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
>
Comment 6 Lv Zheng 2016-11-21 01:51:03 UTC
The description in the following link is entirely wrong:
https://wiki.archlinux.org/index.php/Razer#Suspend_Loop

Let me clarify again:
>  There is a bug in the linux kernel which causes the ACPI "lid open" event to
>  be dropped and not modify the stored state of the lid switch.

This is not a kernel bug. It's ACPI bios nature.

And in acpi.lid_init_state=ignore mode, ACPI lid driver managed to send such an "open" event to userspace in order to make the "close" event delivered to the userspace (because lid events are switch events in input layer).

> Since systemd monitors the lid switch state, it will eventually check and see
> that it is "closed" and resuspend the laptop.

Systemd should stop doing this.
This is entirely wrong.
It looks like an unhealthy obsession behavior now...
It is only used for the old days, where lid open event could be lost.
While in the current driver, there are 2 ways to deliver the lost open event to the userspace.

So the old workaround triggers the issue.
If systemd is not able to see the "open" event, it suspends the laptop.
During the period, the systemd hasn't removed this wrong workaround, you can try acpi.lid_init_state=open.

> The correct fix is to not drop the ACPI event in the kernel, but for now a
> workaround is possible.

ACPI lid driver never drops open event now.
But it is not possible for ACPI lid driver to invent an open event in case the BIOS doesn't send any lid event.
So the correct fix should be in userspace:
Do not expect there is always an open event after resuming.

Thanks
Lv
Comment 7 Lv Zheng 2016-11-21 01:52:34 UTC
(In reply to Vic Fryzel from comment #2)
> As I noted in #110951, this is still present in 4.8.8 and 4.9.0-rc5.

Did you try acpi.lid_init_state=open boot parameter?(

The correct fix is Did you try acpi.lid_init_state=ignore.
However that requires userspace fix.

Thanks
Comment 8 Lv Zheng 2016-11-21 01:53:14 UTC
Did you try acpi.lid_init_state=open boot parameter?

The correct fix is "acpi.lid_init_state=ignore".
However that requires userspace fix.
Comment 9 Lv Zheng 2016-11-21 01:54:01 UTC
> If I sent a kernel patch to accommodate this, would it be accepted assuming
it passed technical review? Trying to determine if you're refusing it on
philosophical grounds or technical ones.

Please send such a kernel patch.
Comment 10 Vic Fryzel 2016-11-21 02:02:04 UTC
Created attachment 245231 [details]
attachment-14559-0.html

Sorry our conversation is happening a but out of order. I will try your
other suggestions first. Thanks for the advice.

On Sun, Nov 20, 2016, 5:54 PM <bugzilla-daemon@bugzilla.kernel.org> wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=187271
>
> --- Comment #9 from Lv Zheng <lv.zheng@intel.com> ---
> > If I sent a kernel patch to accommodate this, would it be accepted
> assuming
> it passed technical review? Trying to determine if you're refusing it on
> philosophical grounds or technical ones.
>
> Please send such a kernel patch.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
>
Comment 11 Lv Zheng 2016-11-21 02:04:46 UTC
Let me say something more:

acpi.lid_init_state=open

This workaround may fix your issue when the systemd is not changed.
But we've been considering the coverage of this behavior, and found that it cannot handle some other cases.
So it is not the final fix, feel free to use it if you need it.

But the only correct fix is:
1. Use acpi.lid_init_state=ignore, and
2. Remove systemd wrong obsession behaviors (at least making it an option).
So please don't report the same issue to the kernel community.
It's agreed to be a userspace issue now.

Thanks
Lv
Comment 12 Lv Zheng 2016-11-21 02:06:37 UTC
> Sorry our conversation is happening a but out of order.
> I will try your other suggestions first.
> Thanks for the advice.

No problem, we seem to have logged onto bugzilla in the same time. :)
Just feel free to contribute and reopen this report if you can find a way fixing it in the kernel.

Thanks and best regards
Lv
Comment 13 Lv Zheng 2016-11-21 02:32:30 UTC
I should tell the story in this way:

In the old days, there is a kernel bug, acpi lid driver is not able to handle "switch event" requirement of the lid events.
So if there is no "open" lid event sent from the BIOS, all follow-up "close" lid events won't be delivered to the userspace, causing laptop incapable of re-suspending after resuming.
I guess that's why the userspace added such an obsession behavior as a workaround.

But in acpi.lid_init_state=ignore mode, the kernel bug has been fixed.
While acpi lid driver is still not able to work correctly if the old userspace workaround is there.
So please contact the userspace developer to remove the old workaround, or at least make it an option (enabled for old versioned buggy kernel acpi lid drivers).

During the period when the userspace hasn't made a change, we've provided a new workaround acpi.lid_init_state=open for the users who need it.
But this workaround cannot handle all cases, especially it cannot handle cases that the laptop is actually resumed with lid closed.
So it is not the final fix and that's why it is not enabled by default in the kernel.

Thanks and best regards
Lv
Comment 14 kobrient 2016-11-21 21:34:42 UTC
(In reply to Vic Fryzel from comment #10)
> Created attachment 245231 [details]
> attachment-14559-0.html
> 
> Sorry our conversation is happening a but out of order. I will try your
> other suggestions first. Thanks for the advice.

Using boot parameter "acpi.lid_init_state=open" as a workaround does not seem to be resolving the issue for me.

@Vic, did it fix the issue for you? Have you tried?

Thanks,
K
Comment 15 Lv Zheng 2016-11-22 00:02:33 UTC
(In reply to kobrient from comment #14)
> (In reply to Vic Fryzel from comment #10)
> > Created attachment 245231 [details]
> > attachment-14559-0.html
> > 
> > Sorry our conversation is happening a but out of order. I will try your
> > other suggestions first. Thanks for the advice.
> 
> Using boot parameter "acpi.lid_init_state=open" as a workaround does not
> seem to be resolving the issue for me.

Where did you test this?
In GUI or in CUI?

Thanks
Lv
Comment 16 Lv Zheng 2016-11-22 00:07:10 UTC
> Opening the lid now, while the system is NOT sleeping, causes ACPI to notice
> and register the "open" event.

IMO, in S3, system is woken by a certain wakeup GPE.
And BIOS takes the control ship.
After doing necessary stuffs, BIOS returns the control ship to OSPM via a waking vector.
So there is simply no such "open" event.
I think this scenario is natural in ACPI BIOS.

Thanks
Lv
Comment 17 kobrient 2016-11-22 01:03:04 UTC
(In reply to Lv Zheng from comment #15)
> 
> Where did you test this?
> In GUI or in CUI?
> 
> Thanks
> Lv

I tested in both CUI and GUI(Cinnamon).
Adding "acpi.lid_init_state=open" to the parameters at boot did not seem to work.

But strangely, using parameter "button.lid_init_state=open" DID work.

I think this will be my workaround until systemd changes.
@Vic, please still respond with any results you got.
Comment 18 Lv Zheng 2016-11-23 02:59:14 UTC
> But strangely, using parameter "button.lid_init_state=open" DID work.

...
It's my fault, the acpi button driver name is button, should be button.lid_init_state=open.
I'm sorry for that.

Hi guys, please just raise this issue to systemd community.
It's no longer a kernel acpi community's business now.

Since button.lid_init_state=open still breaks some other cases and we do not want to be reported of new regressions because of making such a non-final workaround , we didn't make it the default behavior.
So users need to enable it manually before systemd fixes accordingly.

Sorry for the inconvenience.

Let me close it.

Thanks
Lv
Comment 19 Lv Zheng 2016-12-16 03:58:57 UTC
*** Bug 189171 has been marked as a duplicate of this bug. ***
Comment 20 Ryan Burns 2017-01-10 05:18:58 UTC
Hi Lv, 

I know you've marked this CLOSED with the `button.lid_init_state=ignore` fix, but I'm also on a late 2016 razer blade stealth with the 4.8.0-2 kernel and `button.lid_init_state=ignore` in my boot params and I still don't see any acpi events emitted when the lid is opened. My test is running `acpid -d -l` and then closing the lid and then reopening it. It appears only one event is  generated for the lid close and no event is generated for lid open:

acpid: received input layer event "button/lid LID close"
acpid: rule from /etc/acpi/events/lid matched
acpid: executing action "/etc/acpi/lid.sh"
BEGIN HANDLER MESSAGES
END HANDLER MESSAGES
acpid: action exited with status 0
acpid: 1 total rule matched
acpid: completed input layer event "button/lid LID close"

[SNIP]


So just to confirm, should ACPI be sending lid open events? I'm not entirely clear on how I would fix this in userspace without it.

Thanks,
Ryan
Comment 21 Lv Zheng 2017-01-10 06:37:26 UTC
There could be 2 causes that you cannot see a lid open event:

1. BIOS developers didn't prepare any lid open event - we have checked several such platforms, there are indeed no lid open events in it.

2. Post resuming SCI is not delivered to the drivers.

However we did see some platforms, where lid open event relied on the EC queries, and Linux did implement EC IRQ polling after resuming, still no lid open events could be seen for such platforms.

So we think the behavior is natural - we shouldn't expect all BIOS can send us a lid open event after resuming. So why do we need to enforce it?

As such we expect userspace to adopt this. When it is adopted, you can use button.lid_init_state=ignore. So I actually didn't mean you should try this quirk.

When the userspace hasn't adopted this, please use button.lid_init_state=open. Have you tried this quirk which I actually meant to fix the issue?

So I just close such bugs upon seeing them.
There is nothing to do here (yes we may be able to improve issue 1, but we still need to handle case 2).

Thanks
Lv
Comment 22 Ryan Burns 2017-01-10 06:56:34 UTC
Hi Lv, 

Thanks for the added explanation! I disabled systemd's lid handling and implemented my own event handler with acpid and sleep/resume now work as expected!

It is still odd to me that the kernel exposes an interface that implies a lid state (/proc/acpi/button/lid/LID0/state [open/closed]). I know its hard to get rid of APIs once they exist, but I imagine others will continue to be confused as long as that interface exists and does not actually report lid state. Maybe a new one should be created that better maps to modern hardware? and the current one eventually phased out.


Thanks again for the help.
Ryan
Comment 23 Lv Zheng 2017-01-10 07:20:46 UTC
I don't know any one using /proc/acpi/button/lid/LID0/state.
It's simply for debugging purpose.
We'll move it to debugfs and no need to add a new one.

systemd actually thinks SW_LID key event is a switch event, and if it sees a LID close event, when the platform is resumed, it __requires__ kernel to send LID open event, otherwise, after having not seen such a event for a while, systemd suspends the system again.
The logic looks strange to us, especially there is really platform siliently resumed with no open events.

The kernel only need to ensure that:
If the BIOS does generate a lid event and it is delivered to the lid driver, the switch event nature wouldn't prevent it to be delivered by the lid driver to the userspace.

That means, the kernel lid driver only ensures to reliably deliver BIOS triggered close events to the userspace. For open events, well, lid driver is not able to ensure this if the underlying BIOS is bugyy or the underlying SCI or IRQ handling is buggy.

Thanks
Lv
Comment 24 Lv Zheng 2017-01-10 07:21:53 UTC
All my test platforms (if s2idle is the only working suspend mean) are using button.lid_init_state=open. :)

Thanks
Lv
Comment 25 Ryan Burns 2017-01-10 07:24:56 UTC
Actually I think I spoke too soon. ACPI events still seem incorrect. Here is what I'm seeing:

1. Close the lid: lid state goes into `closed` and system sleeps (ACPI event received)
2. Open the lid: lid state still closed and system resumes (no lid events)
3. Close the lid again: nothing happens, lid state still closed (no lid events)
4. Open the lid again: lid state goes into `open` and system sleeps (ACPI event received)
5. Press a key to wake up the computer

This happens with and without `button.lid_init_state=ignore`. That doesn't seem like expected behavior, so now I'm thinking that your second idea might be whats actually happening SCI doesn't get delivered. Is there a bug for this or am missing something else?


Thanks,
Ryan
Comment 26 Lv Zheng 2017-01-10 07:51:06 UTC
I have done the same tests as you on my machine.
And I just did it again.

With button.lid_init_state=ignore:
1. close lid, I can see a close event captured by acpi_listen
2. open lid, system resumes, and no events
3. close lid, I can see 2 events captured by acpi_listen - 1st is an open event, 2nd is a close event.

The behavior is brought to us by the code here:
	if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
	    button->last_state != !!state)
		do_update = true;
...
		if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
			do_update = true;
			/*
			 * Do generate complement switch event for "close"
			 * as "close" is reliable and wrong "open" won't
			 * trigger unexpected behaviors.
			 * Do not generate complement switch event for
			 * "open" as "open" is not reliable and wrong
			 * "close" will trigger unexpected behaviors.
			 */
			if (!state) {
				input_report_switch(button->input,
						    SW_LID, state);
				input_sync(button->input);
			}
		}

I don't know what your test is.
Please re-do the test using acpi_listen.

Thanks
Lv
Comment 27 Lv Zheng 2017-01-10 07:52:52 UTC
see, in the above test case, the open event is very late.
it appears just before the 2nd close.
So the systemd timeout mechanism is not suitable for this mode.

Thanks
Lv
Comment 28 Lv Zheng 2017-01-10 08:07:34 UTC
Looking into your test result, it actually sounds like a button.lid_init_state=method result.
And button.lid_init_state=method is currently the default behavior.

However this mode is buggy, and we couldn't fix it.
We've been considering all cases, and it seems it's not a good idea to fix this mode. Fixing this mode messes up other modes.

So you should use button.lid_init_state=open with gapped userspace and button.lid_init_state=ignore with non-gapped userspace.

Thanks
Lv
Comment 29 Lv Zheng 2017-01-11 02:46:01 UTC
Code related to this machine (decoded from attachment 245081 [details]).

A. There are 2 EC devices and 2 LID devices in the DSDT:
1. _SB.LID0: _STA returns 0, we can ignore it.
    Scope (_SB)
    {
        Device (LID0)
        {
            Name (_HID, EisaId ("PNP0C0D") /* Lid Device */)  // _HID: Hardware ID
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                Return (Zero)
            }
        }
    }

2. _SB.PCI0.LPCB: _STA returns 0, we can ignore it.
    Scope (_SB.PCI0.LPCB)
    {
        Device (H_EC)
        {
            Name (_HID, EisaId ("PNP0C09") /* Embedded Controller Device */)  // _HID: Hardware ID
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                ^^^GFX0.CLID = 0x03 <- this line is suspecious
                Return (Zero)
            }
            Name (LIDS, Zero)
        }
    }

3. _SB.PCI0.LPCB.EC0: this is the right EC working in the system
    Scope (_SB.PCI0.LPCB)
    {
        Device (EC0)
        {
            Name (_HID, EisaId ("PNP0C09") /* Embedded Controller Device */)  // _HID: Hardware ID
        }
    }

4. _SB.PCI0.LPCB.EC0.LID0: this is the right LID0 working in the system
    Scope (_SB.PCI0.LPCB)
    {
        Device (EC0)
        {
            Method (_Q14, 0, NotSerialized)  // _Qxx: EC Query
            {
                LIDS = Zero
                Notify (LID0, 0x80) // Status Change
            }
            Method (_Q15, 0, NotSerialized)  // _Qxx: EC Query
            {
                LIDS = One
                Notify (LID0, 0x80) // Status Change
            }
            Device (LID0)
            {
                Name (_HID, EisaId ("PNP0C0D") /* Lid Device */)  // _HID: Hardware ID
                Method (_STA, 0, NotSerialized)  // _STA: Status
                {
                    Return (0x0F)
                }

                Method (_LID, 0, NotSerialized)  // _LID: Lid Status
                {
                    Return (LIDS) /* External reference */
                }
            }
        }
    }
We can see that LIDS is a cached value, so if _LID is invoked before _Q14/_Q15 is invoked, wrong status could be returned.

B. The _LID functionality is related to the variable - LIDS, however there are two LIDS in the decompiled files
1. _SB.PCI0.LPCB.H_EC.LIDS: no one is accessing this variable
    Scope (_SB.PCI0.LPCB)
    {
        Device (H_EC)
        {
            Name (LIDS, Zero)
        }
    }
2. \_LIDS: seems to be the right LIDS accessing by the LID device
    OperationRegion (SANV, SystemMemory, 0x7AE7D358, 0x01FA)
    Field (SANV, AnyAcc, Lock, Preserve)
    {
        ...
        LIDS,   8, 
        ...
    }

C. Confirming LID events
This LID device is driven by EC, I haven't seen a possibility that such event could be lost.
According to comment 1, it seems no event is lost according to the acpi_listen result.
But we can do a double check, please confirm this by booting the kernel with "dyndbg="file ec.c +p" boot parameter, trying suspend/resuming and uploading the dmesg here.

D. Graphics device seems to have something to do with the LID:
	Line 2016:             CLID,   32, 
	Line 2265:                         Local0 = CLID /* \_SB_.PCI0.GFX0.CLID */
	Line 2265:                         Local0 = CLID /* \_SB_.PCI0.GFX0.CLID */
	Line 2268:                             CLID &= 0x0F
	Line 2269:                             GLID (CLID)
	Line 2451:                 CLID = 0x03
	Line 2455:                 CLID = Arg0
	Line 2460:                 CLID |= 0x80000000
	Line 2642:                                 Local0 = CLID /* \_SB_.PCI0.GFX0.CLID */
	Line 2642:                                 Local0 = CLID /* \_SB_.PCI0.GFX0.CLID */
	Line 2645:                                     CLID &= 0x0F
	Line 2646:                                     GLID (CLID)
GFX0 may have something to do with the LID functionality.
Comment 30 Lv Zheng 2017-01-11 02:58:14 UTC
To: Ryan

> Is there a bug for this or am missing something else?

Yes, there are.

In s2ram mode, system is woken up via waking vector, however, Linux doesn't handle the GPEs at this time, instead it clears all GPEs and re-triggers GPEs via an SCI_EN writes done in drivers/acpi/sleep.c (some chips may be able to re-trigger interrupts by doing so, some chips may not).
While the better way might be:
1. Do not clear GPEs from a non-GPE handling driver
2. Do not write SCI_EN
3. Just invoke SCI handler once right after being resumed

In s2freeze mode, situation is same.
After being woken up by an interrupt, Linux wakes up the system but doesn't handle the interrupt.
Working this around is more complicated.
We don't know how many drivers are ready to handle a task context IRQ handler. It seems not only x86, all architectures will be affected by solving this issue.
So I guess no one is going to make such a system-wide change, changes might be made per-irq-chip driver level which is more controllable.

So there are always interrupts dropped after being resumed.
But they are not related to this issue, as the reason below.

For EC, we don't worry about event lost.
EC driver implements polling after being resumed, whatever GPE status is cleared or not, GPE is enabled or not, as long as there is an event raised in the EC device, EC driver can always notice it via reading EC status register. So there is no event lost chance for case like this LID device.
Comment 31 Lv Zheng 2017-01-11 03:01:25 UTC
To Kobrient:

> If I run `acpi_listen` with all lid events disabled (no suspend action), I
> see both the "open" and "close" events successfully recorded multiple times.
> It is only when I successfully suspend to ram then try to wake that my lid
> "open" event gets dropped.

Maybe you need to upload the acpidump of your platform here.
Comment 32 Lv Zheng 2017-01-11 03:06:38 UTC
To Kobrient:

Your case seems not like Vic's case.

Vic's machine returns a cached value via _LID, and there is always notifications generated - during runtime or after being resumed.

While your machine seems to return hardware status from _LID and there are notifications generated during runtime, but no notification is generated after being resumed.
Comment 33 Lv Zheng 2017-01-11 03:17:07 UTC
The historical lid issue is complicated, there are:

A. Variable platform problematic behaviors:

1. cached lid status
2. no lid notification for open
3. no lid notification after being resumed

B. Aariable driver behaviors:

1. button.lid_init_state=ignore
2. button.lid_init_state=open
3. button.lid_init_state=method

C. Variable userspace behaviors:

1. suspend system again if no open arrived after being resumed
2. no such open event enforcement

You can check the list above and enumerate the combinations, only B.1+C.2 can handle all A cases.

For now (C.1), we can use B.2 to handle most of the A cases.
Comment 34 Ryan Burns 2017-01-11 07:29:07 UTC
Hi Lv, 

Thanks again for the continued help! After reading your comments one more time, I continued to debug the problem a bit more. I think I have the exact same issue as Kobrient, so I tried the following based on your suggestions and I think its working!

1. disable lid events for systemd:
Add `HandleLidSwitch=ignore` to /etc/systemd/logind.conf

2. add a lid close event handler to acpid (/etc/acpi/events/lid):
event=button[ /]lid LID close
action=/etc/acpi/lid.sh

3. create /etc/acpi/lid.sh which after prepping for suspend calls:
echo -n "mem" > /sys/power/state

4. add `button.lid_init_state=open` to the kernel params and reboot.


At this point, my laptop successfully sleeps on every lid close and successfully wakes up after opening the lid. And it no longer automatically sleeps again once the lid is open. I've tested this with several close/open cycles and it seems to work consistently.

So I think I'm good to go. I really appreciate the help you gave me in getting this working! 

Thanks again,
Ryan
Comment 35 Lv Zheng 2017-01-12 02:20:12 UTC
button.lid_init_state=open

This mode has a problem.
If the system is woken up by other means with lid closed.
Linux kernel wrongly returns an "open" status.
Comment 36 Joachim Frieben 2017-04-20 20:21:58 UTC
A recent and related kernel commit implementing "ACPI / button: Change default behavior to lid_init_state=open" has some undesirable side effects as exhibited at https://bugzilla.redhat.com/show_bug.cgi?id=1430259#c3.
Since kernel 4.10.11, this change keeps the internal display active even when the lid is always closed for certain docked ThinkPad systems.
Comment 37 Lv Zheng 2017-04-24 01:11:49 UTC
The only correct behavior is button.lid_init_state=ignore.
However we can only do this when Linux userspace is ready.
So please work this issue with systemd community, not here.

For now, the side effects of using button.lid_init_state=open are trivial.
You'd better just to ignore those side effects.
Note that /proc/acpi/button/LID0/state is not meant to be correct.
You can see whether whether it is closed or opened via your eyes.
Sometimes it returns BIOS cached value, so it is not meant to be correct.
It's there for test purpose.

Thanks
Lv
Comment 38 Julian Wiedmann 2017-04-24 18:30:01 UTC
Hi Lv,

as already noted by Joachim, your change in 77e9a4aa9de1 causes a regression for existing setups. So please send a revert, and/or advise what debug info is needed for a fix that doesn't break existing user space. Thank you.
Comment 39 Lv Zheng 2017-04-25 06:23:27 UTC
>> send a revert

That's not possible IMO.
You probably do not know the entire story. :)
Reverting this surely breaks the others.
What we can only do is to see if button.lid_init_state=method is still required.
And if so, just restore that quirks to who wants it.

Thanks and best regards
Lv
Comment 40 Lv Zheng 2017-04-25 06:57:59 UTC
Using button.lid_init_state=open, then we can mute bug reports that suspend/resume cycles can be seen. Those platform contains _LID control method returning cached value.
Using button.lid_init_state=method, then we can mute bug reports like this. These platforms normally doesn't notify userspace after resuming.

So from Linux userspace's point of view, both the platforms are buggy.

However in ACPI spec, it just says:
To implement a control method lid device, AML code should issue a Notify(lid_device, 0x80) for the device whenever the lid status has changed. The _LID control method for the lid device must be implemented to report the current state of the lid as either opened or closed.

It doesn't say there must be a post-resume notification, it doesn't say _LID mustn't return cached value.

So probably the Linux userspace expectation is wrong. They probably should be compliant to button.lid_init_state=ignore, and implement the same use case using different meanings.

The most likely action we'll take is to restore button.lid_init_state=method so that who wants to use it can specify a boot parameter, but we won't change the default bahavior. It's just a choice between 2 conflict use cases. Fixing A triggers B, fixing B triggers A. The only way to fix them is "do not expect ACPI control method lid device can act as what Linux userspace expects".

Hope this can be persuasive. :)
You can also refer to Bug 195455, we'll do more experiments there.

Thanks
Comment 41 Lv Zheng 2017-04-25 07:00:15 UTC
Please also refer to Documentation/acpi/acpi-lid.txt.
Comment 42 Julian Wiedmann 2017-04-25 07:13:36 UTC
(In reply to Lv Zheng from comment #40)
> So probably the Linux userspace expectation is wrong. They probably should
> be compliant to button.lid_init_state=ignore, and implement the same use
> case using different meanings.
> 
That's not how we do things - you can't break user space that worked flawlessly for years.

> The most likely action we'll take is to restore button.lid_init_state=method
> so that who wants to use it can specify a boot parameter, but we won't
> change the default bahavior. It's just a choice between 2 conflict use
> cases. Fixing A triggers B, fixing B triggers A.
?! Yes of course we need to change the default behavior back to what it was before, unless there's a reliable way to detect & fix all platforms that require =method. You can't break old setups just to enable other platforms.
Comment 43 Lv Zheng 2017-04-25 07:19:23 UTC
> That's not how we do things - you can't break user space that worked
> flawlessly for years.

Then can you take care of doing this reversion and take care of handling different regression reports for what you've reverted?
Comment 44 Lv Zheng 2017-04-25 07:22:59 UTC
I already said the root cause fix should happen in the userspace, or in the usage models themselves.
The problem is you couldn't be persuaded by this root cause. So probably you should persuade all the other end users instead of me.
Frankly, I already persuaded myself to accept the root cause and surely after knowing the __FACT__, I'm not be able to persuade the other end users.
So you probably the one having stronger reasons than me to do so.
Comment 45 Lv Zheng 2017-04-25 08:02:52 UTC
>> user space that worked flawlessly for years.

This is not the truth.
If you are talking about the regression.
The regression is actually introduced by this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=device-properties&id=23de5d9

You can refer Bug 326814. It happens in 2007. The solution is not reliable, it's proven to be broken now.

What if we just revert back to the old control method lid device driver behavior and let the old issue to be root-cause-fixed in the userspace?
Comment 46 Lv Zheng 2017-04-25 08:15:25 UTC
Bug 326814 should be a novell.com bug:
https://bugzilla.novell.com/show_bug.cgi?id=326814
Comment 47 Joachim Frieben 2017-04-25 09:30:18 UTC
This issue goes well beyond suspend/resume cases: I am using a ThinkPad T400 which is always docked with a closed lid using an external monitor.
After starting up the system, the internal display is enabled and set up as the primary display. In the case of affected Fedora 25 and Fedora 16, the login panel is shown on the internal display leaving the user with an empty screen showing nothing but the background. Relevant bug reports are

https://bugzilla.redhat.com/show_bug.cgi?id=1430259
https://bugzilla.redhat.com/show_bug.cgi?id=1444229
https://bugzilla.redhat.com/show_bug.cgi?id=1444727.
Comment 48 Julian Wiedmann 2017-04-25 17:25:21 UTC
(In reply to Lv Zheng from comment #43)
> > That's not how we do things - you can't break user space that worked
> > flawlessly for years.
> 
> Then can you take care of doing this reversion [...]

Will do.
Comment 49 Lv Zheng 2017-04-26 01:27:16 UTC
Created attachment 256043 [details]
[PATCH] Some in theory not correct code

Probably you should apply this patch and try again with some boot parameters, for example, nomodeset and see what happens.

In theory, graphics driver is able to make the decision whether a monitor should be lit on without knowing the lid status.
So this sounds more like a bug triggered by a wrong usage model that is trying to utilize lid state to make the decision.
Maybe you guys should also file a bug at freedesktop.org to see what graphics expert will do to help us.

Thanks
Lv

Note You need to log in before you can comment on or make changes to this bug.