Bug 191211

Summary: Razer Blade Stealth laptop fails to send lid open on suspend resume
Product: ACPI Reporter: Jason Dewayne Clinton (me)
Component: ECAssignee: Lv Zheng (lv.zheng)
Status: CLOSED CODE_FIX    
Severity: normal CC: rui.zhang
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.9 Subsystem:
Regression: No Bisected commit-id:
Attachments: [BROKEN] Attempt to clear EC after resume on Razer laptops

Description Jason Dewayne Clinton 2016-12-26 18:20:59 UTC
The Razer Blade Stealth (2016 Kaby Lake) laptop fails to send lid open on suspend resume. A work-around is to close-and-reopen the lid a second time immediately after opening the lid before userspace re-initiates suspend. This would seem to indicate that the lid open event isn't being sent by the platform AML (based on my reading of https://www.kernel.org/doc/Documentation/acpi/acpi-lid.txt). evtest never shows the open event arriving (unless doing the workaround above):

$ sudo evtest /dev/input/event0
Input driver version is 1.0.1
Input device ID: bus 0x19 vendor 0x0 product 0x5 version 0x0
Input device name: "Lid Switch"
Supported events:
  Event type 0 (EV_SYN)
  Event type 5 (EV_SW)
    Event code 0 (SW_LID) state 1

$ sudo cat /proc/acpi/button/lid/LID0/state 
state:      closed

This bug will track implementing an in-kernel work-around.
Comment 1 Jason Dewayne Clinton 2016-12-27 16:25:35 UTC
Also affects earlier Razer Blade (14" units) and Razer Blade Stealth (Skylake): https://www.reddit.com/r/razer/comments/447vrn/razer_blade_stealth_linux/czp1qw3/ . I'm looking at a DMI quirk in ACPI ec.c to see if that will fix it.
Comment 2 Jason Dewayne Clinton 2017-01-02 18:58:31 UTC
Created attachment 249701 [details]
[BROKEN] Attempt to clear EC after resume on Razer laptops

Much to my disappointment, this patch doesn't work. Stranger:

  pr_debug("Detected system needing EC poll on resume.\n");

never shows up after resume so I'm not sure what I'm doing wrong (or if this code path ever worked?).

I've reached the limit of my knowledge of ACPI without having intimate platform knowledge. Hopefully someone can pick this up who knows what might be wrong here.
Comment 3 Lv Zheng 2017-01-03 02:57:29 UTC
I don't think clear_on_resume quirk is useful now, it's quite old, invented for old code base.

Looking at the code:
	if (!test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
		advance_transaction(ec);
The EC driver polls events right after resuming just because of the above code in __acpi_ec_enable_event(), regardless of the quirk:
acpi_ec_resume
  acpi_ec_enable_event
    __acpi_ec_enable_event

You should try to boot the kernel with button.lid_init_state=open to see if this is the known bug related to an old kernel issue and an old userspace workaround.
Please find detailed description in Documentation/acpi/acpi-lid.txt.
Comment 4 Lv Zheng 2017-01-03 03:06:57 UTC
(In reply to Jason Dewayne Clinton from comment #2)
> I've reached the limit of my knowledge of ACPI without having intimate
> platform knowledge. Hopefully someone can pick this up who knows what might
> be wrong here.

There won't be any new clear_on_resume quirks upstreamed to the Linux kernel.
There should be a cleanup deleting all clear_on_resume quirks from the Linux kernel upstream.

Thanks
Lv
Comment 5 Lv Zheng 2017-01-03 07:37:25 UTC
> A work-around is to close-and-reopen the lid a second time immediately after
> opening the lid before userspace re-initiates suspend.

This is an invalid workaround, your brain may be deceived.
It doesn't mean something (post-resume lid open event) is missing.
As long as the system is not suspended, it's likely that the lid event can be triggered and delivered to the OSPM.

But the post-resume lid open event is a different case.
The event is already handled by BIOS to wake the system up.
And it's up to BIOS to deliver such event to the OSPM because Windows cares no about the existence of this event.
You can call it a BIOS bug, but I don't think so.
It's simply a Windows designed behavior.

For such kind of issue, there is simply no mystery in LID.
For resume from S3, considering the case that the lid event is triggered by EC.
From comment 3, we know that EC driver already ensures that no events will be missing after resuming.

So if you cannot see such a lid open event after resuming, it simply means there is no such event.

> pr_debug("Detected system needing EC poll on resume.\n");

This line can only be seen during boot, you won't see the message again after resuming.
And as long as it is shown during boot, it affects EC_FLAGS_CLEAR_ON_RESUME behavior after resuming:
static void acpi_ec_enable_event(struct acpi_ec *ec)
{
	...
	if (EC_FLAGS_CLEAR_ON_RESUME)
		acpi_ec_clear(ec);
}
static int acpi_ec_resume(struct device *dev)
{
	struct acpi_ec *ec =
		acpi_driver_data(to_acpi_device(dev));

	acpi_ec_enable_event(ec);
	return 0;
}

Thanks and best regards
Lv
Comment 6 Lv Zheng 2017-01-03 08:30:49 UTC
> A work-around is to close-and-reopen the lid a second time
> immediately after opening the lid before userspace re-initiates suspend.

Another case is:
After closing lid, lid notification is triggered
1. some BIOSen choose to store hardware lid status in the event handler, and return the cached value via _LID;
2. some BIOSen choose to do nothing in the event handler, and return the hardware lid status via _LID.
For the latter, when OSPM evaluates _LID, since you opened it quickly, the result could be open if the _LID is implemented to read the real hardware status.

So userspace can see an open event.

This is also why we need this logic in drivers/acpi/button.c:
	if (button->last_state == !!state &&
	    ktime_after(ktime_get(), next_report)) {

It's not a proof that the BIOS has implemented a lid open event triggering source.

Thanks
Lv
Comment 7 Lv Zheng 2017-01-04 02:34:55 UTC
(In reply to Lv Zheng from comment #3)
> I don't think clear_on_resume quirk is useful now, it's quite old, invented
> for old code base.
> 
> Looking at the code:
>       if (!test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
>               advance_transaction(ec);
> The EC driver polls events right after resuming just because of the above
> code in __acpi_ec_enable_event(), regardless of the quirk:
> acpi_ec_resume
>   acpi_ec_enable_event
>     __acpi_ec_enable_event

On the other hand, the quirk is harmful.
There are platforms, if OSPMs send EC query command to them without seeing EV_SCI bit set in the EC status register, the EC firmware crashes.
So I believe the above code is much more useful than the quirk as it is safer, we needn't worry about that it could break specific EC firmware.

Thanks
Lv
Comment 8 Lv Zheng 2017-01-06 01:46:31 UTC
Let me close the bug as I couldn't see anything doable in it.
If the workaround is not working, feel free to re-open it and let us know.

Thanks
Lv
Comment 9 Jason Dewayne Clinton 2017-01-21 03:36:32 UTC
button.lid_init_state=open does work around the problem. Would you like me to provide a patch that bakes this quirk into the kernel so that this work out-of-box for other people?
Comment 10 Lv Zheng 2017-01-22 07:36:26 UTC
Ah, you mean this bug?

Could you upload some logs to persuade me that the quirk is useful.
IMO, it is not useful.
Let me write down some debugging steps.

Thanks
Lv
Comment 11 Lv Zheng 2017-01-22 07:41:18 UTC
Do you mean attachment 249701 [details]?
It's not working according to your comment.
So it seems we have communication problem there.

If attachment 249701 [details] is not working:
Could you post the working patch here?

If attachment 249701 [details] working:
Please add dyndbg="file ec.c +p" to your kernel command line.
And help to do the following 2 tests:
Test 1:
1. do not apply attachment 249701 [details]
2. perform the suspend/resume
3. upload the dmesg here
Test 2:
1. apply attachment 249701 [details]
2. perform the suspend/resume
3. upload the dmesg here

Thanks and best regards
Lv
Comment 12 Lv Zheng 2017-01-22 07:43:04 UTC
Also please upload the acpidump output here.
Comment 13 Lv Zheng 2017-01-22 08:30:43 UTC
And if possible, please do the test using latest upstream kernel.

Thanks
Lv
Comment 14 Jason Dewayne Clinton 2017-01-27 03:57:39 UTC
Sorry, I wasn't clear in comment #9.

We know that:
1. all Razer Blade laptops are affected by this problem and that
2. using button.lid_init_state=open works around the problem

can I give you a patch that checks DMI info for this hardware name and automatically activates button.lid_init_state=open when running on a Razer Blade laptops?

If not, that's fine. I just thought that I would offer to write that patch. Feel free to close this bug.
Comment 15 Lv Zheng 2017-02-03 02:28:44 UTC
OK.
I'm closing this bug.

The philosophy is:
This bug looks like a gap between the kernel and the userspace.
And BIOS looks correct.
So if we start to introduce DMI tables into the kernel, the table surely will grow infinitely...

I've submitted a patch to make "button.lid_init_state=open" to make it the default behavior, which should be able to reduce similar reports:
https://patchwork.kernel.org/patch/9512307/

Thanks
Lv