Bug 195651 - Boot EC - Asus BIOS contain broken DSDT EC but working ECDT EC
Summary: Boot EC - Asus BIOS contain broken DSDT EC but working ECDT EC
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Lv Zheng
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-04 05:02 UTC by Lv Zheng
Modified: 2017-07-04 01:07 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.11
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump output (66.03 KB, application/x-gzip-compressed)
2017-05-04 05:02 UTC, Lv Zheng
Details
[PATCH] ACPI / EC: Enhace boot EC sanity check (1.33 KB, patch)
2017-05-05 08:11 UTC, Lv Zheng
Details | Diff
[PATCH] ACPI: EC: Add support to skip boot stage DSDT probe (2.08 KB, patch)
2017-05-05 08:11 UTC, Lv Zheng
Details | Diff
[PATCH] ACPI: EC: Add EC setting preference support (2.67 KB, patch)
2017-05-05 08:12 UTC, Lv Zheng
Details | Diff
[PATCH] ACPI / EC: Fix suspend on lid closing caused by wrong GPE in DSDT (1.19 KB, patch)
2017-06-01 06:26 UTC, Carlo Caione
Details | Diff
acpidump - ASUS GL702VMK (1.12 MB, text/plain)
2017-06-01 06:29 UTC, Carlo Caione
Details

Description Lv Zheng 2017-05-04 05:02:18 UTC
Created attachment 256185 [details]
acpidump output

The Asus laptops X550VXK/FX502VD/FX502VE have a BIOS bug where the
ECDT (correctly) states that EC events trigger GPE 0x23, but the
DSDT _GPE method (incorrectly) says GPE 0x33.

A cursory glance of the code suggests that this should work regardless
(because it looks like Linux would treat the ECDT and DSDT as two separate
ECs supported simultaneously), but in practice it is not working since the
ECDT is ultimately ignored in favour of the DSDT EC. The sequence of
events is:

1. acpi_ec_ecdt_probe() is called in early boot, and calls
   acpi_config_boot_ec(is_ecdt=true) for the ECDT EC.

acpi_config_boot_ec() sets boot_ec to this new EC, and
boot_ec_is_ecdt = true.

2. Later, acpi_ec_dsdt_probe() is called and creates a new ec to represent
   the DSDT EC. Then:
    /*
     * When the DSDT EC is available, always re-configure boot EC to
     * have _REG evaluated. _REG can only be evaluated after the
     * namespace initialization.
     * At this point, the GPE is not fully initialized, so do not to
     * handle the events.
     */
    ret = acpi_config_boot_ec(ec, ec->handle, false, false);

So the DSDT EC is passed to acpi_config_boot_ec() which does:
    /* Unset old boot EC */
    if (boot_ec != ec)
        acpi_ec_free(boot_ec);

acpi_ec_free() sets boot_ec to NULL.
Further down in acpi_config_boot_ec() we reach:

    /* Set new boot EC */
    if (!boot_ec) {
        boot_ec = ec;
        boot_ec_is_ecdt = is_ecdt;
    }

So now boot_ec points to the DSDT EC and boot_ec_is_ecdt is false.

3. Later, ecpi_ec_ecdt_start() is called and this looks like it would
   enable GPEs on our ECDT, but it bails out because of:

    if (!boot_ec_is_ecdt)
        return -ENODEV;


The comment I pasted above from acpi_ec_dsdt_probe() says that it is going
to reconfigure the boot EC, but it instead calls acpi_config_boot_ec() on
the newly probed DSDT EC. It seems like the code is not following the
comment here.

Adjusting that code to work with boot_ec adjusts the sequence of events so
that both boot EC and DSDT are treated independently and as a result, we
get EC interrupts firing correctly.

This fixes media keys on the mentioned laptop models.
Thanks to Chris Chiu for diagnosis.
Comment 1 Daniel Drake 2017-05-04 17:00:31 UTC
As explained in the thread: [PATCH] ACPI / EC: handle ECDT EC and DSDT EC simultaneously

My take on the bug (pasted above) was not really correct. The code has to deal with first_ec / boot_ec / etc for some pretty tricky reasons but ultimately it's just going to work with a single EC, so my suggestion that it should support them independently is not in agreement with the driver design nor ACPI spec.

The DSDT shows two ECs:

        Device (H_EC)
        {
            Name (_HID, EisaId ("PNP0C09") /* Embedded Controller Device */)  // _HID: Hardware ID
            Name (_UID, One)  // _UID: Unique ID
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                ^^^GFX0.CLID = 0x03
                Return (Zero)
            }

        Device (EC0)
        {
            Name (_HID, EisaId ("PNP0C09") /* Embedded Controller Device */)  // _HID: Hardware ID
            Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
            {
                IO (Decode16,
                    0x0062,             // Range Minimum
                    0x0062,             // Range Maximum
                    0x00,               // Alignment
                    0x01,               // Length
                    )
                IO (Decode16,
                    0x0066,             // Range Minimum
                    0x0066,             // Range Maximum
                    0x00,               // Alignment
                    0x01,               // Length
                    )
            })
            Method (_GPE, 0, NotSerialized)  // _GPE: General Purpose Events
            {
                Local0 = 0x33
                Return (Local0)
            }

It is quite easy to see that H_EC isn't usable - it has no _CRS/_GPE and _STA returns 0 (not present). The other one (EC0) looks valid.

Lv, you had this wrong in an email yesterday. You said that Linux is using H_EC and asked me to find a way to skip it so that it uses EC0. However, for whatever reason, Linux is currently already skipping H_EC, so EC0 is the problematic DSDT  that is being used here.

But for whatever reason, Linux doesn't even seem to try that one, and instead it is already only working with EC0. This one has correct _CRS but the _GPE value is wrong (it says 0x33, but the ECDT correctly has 0x22) and hence the laptop's media keys don't work.

It does seem like this bug shows that Windows is preferring the ECDT's GPE number over the DSDT. Lv, do you have any tools that let you modify a DSDT/ECDT on Windows? If so you could test this theory by recreating the situation (ECDT correct GPE, DSDT _GPE wrong).

Either way, changing Linux behaviour could be a bit risky here, and even though it is not ideal, we may have to DMI quirk these systems based on vendor/product info. Based on our previous experience I expect this list may end up getting quite long. We currently have identified 4 Asus models with this problem.
Comment 2 Lv Zheng 2017-05-05 03:27:06 UTC
Yes, I replied in the community.

IMO, possibly the best way is:
Stop probing dsdt ec as boot ec, still need a quirk to favor ecdt ec's settings in pnp0c09 probe.
However I have concerns for not making regressions against bug 119261.

Another possible good solution is:
Stop probing dsdt ec as boot ec, but preparing a default handler for EC opregion. Upload receiving requests, probing pnp0c09 driver.
I also have concerns for the possibility of triggering regressions due to unknown _INI/_STA orders.

For now, we can:
1. Strenthen the check for boot dsdt ec, especially, skip those with wrong IO addresses.
2. Still we may not invoke _STA for boot dsdt ec, but we can add an option to stop probing dsdt ec as boot ec.
3. In pnp0c09 probe, comparing io addresses with boot ec to figure out whether they are the same EC, and leave possibility to favor ecdt ec gpe settings there.

I'll provide you some patches to test today.
Comment 3 Lv Zheng 2017-05-05 08:11:03 UTC
Created attachment 256213 [details]
[PATCH] ACPI / EC: Enhace boot EC sanity check
Comment 4 Lv Zheng 2017-05-05 08:11:32 UTC
Created attachment 256215 [details]
[PATCH] ACPI: EC: Add support to skip boot stage DSDT probe
Comment 5 Lv Zheng 2017-05-05 08:12:06 UTC
Created attachment 256217 [details]
[PATCH] ACPI: EC: Add EC setting preference support
Comment 6 Lv Zheng 2017-05-05 08:17:22 UTC
Hi, Daniel

You can try to see if the patches can fix the issue.
I made the default setting suitable for your case.

Thanks
Lv
Comment 7 Lv Zheng 2017-05-05 08:18:17 UTC
Also please help to upload dmidecode output here.
Comment 8 Lv Zheng 2017-05-05 08:20:53 UTC
Let me copy replies from mailing list.

First reply from Lv:
Linux never thought there should be 2 ACPI ECs provided by BIOS :). How
would it be useful to put 2 ECs in your platforms and export both of them
as ACPI EmbeddedControl operation region? EC operation region itself by
design is not able to allow OS to detect which EC opregion belongs to which
EC device as we did see some platforms defined EC opregion under root but
EC device under some bus nodes.

ACPI spec thinks ECDT EC is meant to be provided to work since early stages
where the ACPI namespace hasn't been prepared.

See spec 6.5.4 _REG (Region).
You can find exceptions of _REG execution mentioned for
SystemMemory/SystemIo/PCI_Config and ECDT EC. So they are served for the
same purpose:
 Before executing any control method (aka., preparing the ACPI namespace,
 some hardware components that have already been driven by the BIOS should
 be allowed to be accessed during the namespace preparation.

And hence all DSDT PNP0C09 ECs (actually should only be 1) should always
work, in most of the cases, ECDT EC may just be used as a quirk to make
some EC opregion accesses working during that special stage - the
namespace preparation. For such kind of use case (early stage, loading
ACPI namespace), obviously no EC event should be handled, so making ECDT
GPE setting correct sounds meaningless to BIOS if the same EC settings can
also be provided via DSDT.

Thus Linux is always trying to override ECDT EC with DSDT settings and
convert the final boot EC to the PNP0C09 driver driven ones. Some code is
there not deleted just due to some historical reasons.

As such, GPE setting in DSDT should always be correct.

There are 2 possibilities for some gray areas:
1. If a BIOS has a wrong EC GPE setting in DSDT, but the OS correctly
   implemented IRQ polling to handle it, you still couldn't see any
   problems (you cannot know whether IRQ is handled via interrupt or via
   polling from user space).
2. If ECDT EC contains a namespace path differing from the DSDT one, the
   OS may choose to keep both. Or if ECDT EC contains IO addresses
   differing from the DSDT one, the OS may choose to keep both.
I don't have knowledge to know how Windows implement things in the above
gray areas. They just sound like some happen-to-work consequences of the
unspecified Windows implementation.

For gray area 1, your report just means to me that some platforms do have
correct ECDT GPE setting but incorrect DSDT GPE setting; and since Linux
doesn't implement EC event polling for some stages, for now Linux may need
a quirk to favor "correct GPE setting". 

However for gray area 2, I don't think it means Linux should keep both ECs.
I cannot see reasons in this case to support to do so unless seeing your
acpidump. Are you able to see 2 ECs in your Windows device tree on these
platforms? So could you paste your acpidump here so that we can check if
they are different ACPI ECs and make a decision closer to the Windows
behavior and tell me your Windows device tree result (you can obtain this
using a tool provided in <<Microsoft Windows Internals>>).

Maybe for your case, DSDT EC is just invalid and we should ignore it, then
if we can find a correct way to ignore the DSDT EC, we needn't change a
single line EC driver code.
Comment 9 Lv Zheng 2017-05-05 08:21:06 UTC
I checked the provided acpi tables.
Indeed, the ECDT EC is correct.
[000h 0000   4]                    Signature : "ECDT"    [Embedded Controller Boot Resources Table]
[004h 0004   4]                 Table Length : 000000C1
[008h 0008   1]                     Revision : 01
[009h 0009   1]                     Checksum : EC
[00Ah 0010   6]                       Oem ID : "_ASUS_"
[010h 0016   8]                 Oem Table ID : "Notebook"
[018h 0024   4]                 Oem Revision : 01072009
[01Ch 0028   4]              Asl Compiler ID : "AMI."
[020h 0032   4]        Asl Compiler Revision : 00000005


[024h 0036  12]      Command/Status Register : [Generic Address Structure]
[024h 0036   1]                     Space ID : 01 [SystemIO]
[025h 0037   1]                    Bit Width : 08
[026h 0038   1]                   Bit Offset : 00
[027h 0039   1]         Encoded Access Width : 00 [Undefined/Legacy]
[028h 0040   8]                      Address : 0000000000000066

[030h 0048  12]                Data Register : [Generic Address Structure]
[030h 0048   1]                     Space ID : 01 [SystemIO]
[031h 0049   1]                    Bit Width : 08
[032h 0050   1]                   Bit Offset : 00
[033h 0051   1]         Encoded Access Width : 00 [Undefined/Legacy]
[034h 0052   8]                      Address : 0000000000000062

[03Ch 0060   4]                          UID : 00000000
[040h 0064   1]                   GPE Number : 23
[041h 0065  19]                     Namepath : "\_SB.PCI0.LPCB.EC0"

While the DSDT EC is in fact invalid as it returns 0 from its _STA:
    Scope (_SB.PCI0.LPCB)
    {
        Device (H_EC)
        {
            Name (_HID, EisaId ("PNP0C09") /* Embedded Controller Device */)  // _HID: Hardware ID
            Name (_UID, One)  // _UID: Unique ID
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                ^^^GFX0.CLID = 0x03
                Return (Zero)
            }
            ...
        }
    }

It doesn't contain _CRS, so it's likely that it will fail in
acpi_ec_dsdt_probe() due to failure of walk _CRS.

Are you sure the same problem can be seen no this platform?

If so, possibly you only need to add some sanity checks in
acpi_ec_dsdt_probe(). After acpi_get_devices() call, check if
its IO addresses are invalid.
Or check its _STA (present && return valid value) after that.
It might not be possible to do such _STA execution.

So IMO, the acpi_ec_dsdt_probe() is not a good choice.
The history is:
1. In ACPI spec 1.0, there is no ECDT, BIOS developers have
   to provide wrappers in EC read/write functions in order to
   be able to access EC firmware before OS loads the EC
   driver. Before EC._REG is invoked, access systemio
   opregion instead of directly accessing EC opregions.
   In ACPI spec 2.0, ECDT is provided to eliminate the
   Complication of this needs.
2. During early Linux EC support, ECDT code is provided
   for the spec purpose. But several un-root-caused kernel
   bugs made the developers to make the wrong choice, starting
   to always override ECDT EC using DSDT EC.
   The culprit commits are c6cb0e87 and a5032bfd.
   The culprit in fact finally also played as one of those
   reasons preventing the AML interpreter from being
   correctly implemented as Windows compliant.
3. However, after switching to the old behavior, and the
   DSDT boot EC mechanism is entirely abandoned. But users
   Reported that they can see EC opregion accessed in some
   _STA/_INI, as Linux executes these control methods very
   Early (earlier before probe EC driver) and there is no
   ECDT on those platforms, we can see errors complaining
   no opregion handler. The bug is kernel Bugzilla 119261.
   TBH, these errors are not such critical IMO, the BIOS
   obviously violates facts mentioned in history 1 and we
   obviously do not have knowledge of the execution order
   of these control methods on Windows and we obviously
   do not know if such error can also be seen by Winodws
   and do not know whether such error can be functional
   failure and reached BIOS developers' attention.
   But we finally have no choice but restoring the boot
   DSDT EC hack back as we do not know how Linux can
   execute all control methods in the exact order as
   Windows...

Then now, it actually makes the problem more complicated.
as it might not be possible to execute _CRS/_GPE/_STA
at that time to sanitize the DSDT EC.
Probably we could just abandon it again?

Anyway we can firstly just add 1 line IO address and _GPE
existence sanity check in acpi_ec_dsdt_probe() to fix
your issue.
Comment 10 Lv Zheng 2017-05-05 08:21:59 UTC
Yes, there are 2 ECs.
        Device (H_EC)
        {
            Name (_HID, EisaId ("PNP0C09") /* Embedded Controller Device */)  // _HID: Hardware ID
            Name (_UID, One)  // _UID: Unique ID
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                ^^^GFX0.CLID = 0x03
                Return (Zero)
            }

        Device (EC0)
        {
            Name (_HID, EisaId ("PNP0C09") /* Embedded Controller Device */)  // _HID: Hardware ID
            Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
            {
                IO (Decode16,
                    0x0062,             // Range Minimum
                    0x0062,             // Range Maximum
                    0x00,               // Alignment
                    0x01,               // Length
                    )
                IO (Decode16,
                    0x0066,             // Range Minimum
                    0x0066,             // Range Maximum
                    0x00,               // Alignment
                    0x01,               // Length
                    )
            })
            Method (_GPE, 0, NotSerialized)  // _GPE: General Purpose Events
            {
                Local0 = 0x33
                Return (Local0)
            }

And linux EC driver only tries the 1st one.
You probably should add sanity check to skip first one, and return the 2nd one.
Comment 11 Lv Zheng 2017-05-05 08:27:11 UTC
If the default setting is not working, please try to boot kernel with:
"ec_no_dsdt_boot_ec ec_use_boot_ec_gpe"

If none of the combinations works for you, please upload the 2 dmesg boot logs here:
1. boot using the default setting (equivelent to "ec_no_dsdt_boot_ec=0 ec_use_boot_ec_gpe=1"
2. boot using "ec_no_dsdt_boot_ec=1 ec_use_boot_ec_gpe=1"
Comment 12 Daniel Drake 2017-05-10 14:00:55 UTC
Thanks. After applying those patches the problem is solved.
Comment 13 Lv Zheng 2017-05-11 00:46:10 UTC
OK.
So could you provide dmidecode output for generating a quirk to favor ECDT GPE settings.
Comment 14 Lv Zheng 2017-05-16 09:22:49 UTC
Ping

When do the upstream work, ec_use_boot_ec_gpe need to remain false in order not to reflect the old behavior.
And we need a DMI quirk to change the its value to true for these platforms.

Thanks
Comment 15 Lv Zheng 2017-05-16 09:44:04 UTC
Linking to our thread:
http://www.spinics.net/lists/linux-acpi/msg73763.html
Comment 16 Daniel Drake 2017-05-16 14:25:42 UTC
The models detected so far all have DMI_SYS_VENDOR == ASUSTeK COMPUTER INC

and DMI_PRODUCT_NAME values are:
X550VXK
FX502VD
FX502VE
X580VD
Comment 17 Daniel Drake 2017-05-16 20:25:35 UTC
sorry, the vendor name ends with a .

ASUSTeK COMPUTER INC.

reference our current workaround for the exact strings: https://github.com/endlessm/linux/commit/5f894d1a51b93d7d6d0c5044c56cac8f93e6a022
Comment 18 Lv Zheng 2017-05-19 00:57:32 UTC
Thanks.
I'll cherry-pick it and help to send it to the community.

Best regards
Lv
Comment 19 Lv Zheng 2017-05-19 07:16:48 UTC
I just sent out with the commit you mentioned included.
https://patchwork.kernel.org/patch/9736009/
https://patchwork.kernel.org/patch/9736011/
https://patchwork.kernel.org/patch/9736019/
And I dropped those not strictly related to this bug (EC_ID preference).
So let me mark this bug as resolved, and please help to review.

Thanks and best regards
Lv
Comment 20 Carlo Caione 2017-05-31 16:03:37 UTC
Lv, today we found another ASUS laptop affected by this issue (GL702VMK). Any idea who should be the right person to review the patchset? I can do it myself but probably some ACPI guys would be better.
Comment 21 Lv Zheng 2017-06-01 01:18:05 UTC
> we found another ASUS laptop affected by this issue (GL702VMK). 

Could you help to send tested update of "https://patchwork.kernel.org/patch/9736019/"?

> who should be the right person to review the patchset?

This is Windows compliance issue.
What's in our mind derived from spec may not be correctly reflecting the real world.
So IMO, reviewers/testers' comments are all important.

Any tested-by/reviewed-by is welcome.

Thanks and best regards
Lv
Comment 22 Lv Zheng 2017-06-01 01:20:25 UTC
Hi, Carlo

Can you upload the acpidump of GL702VMK here for reference?

Thanks
Lv
Comment 23 Carlo Caione 2017-06-01 06:26:30 UTC
Created attachment 256821 [details]
[PATCH] ACPI / EC: Fix suspend on lid closing caused by wrong GPE in DSDT
Comment 24 Carlo Caione 2017-06-01 06:29:00 UTC
Created attachment 256823 [details]
acpidump - ASUS GL702VMK
Comment 25 Carlo Caione 2017-06-01 06:35:03 UTC
> Could you help to send tested update of
> https://patchwork.kernel.org/patch/9736019/?

Attachment 256821 [details] is the internal patch we are using on top of your changes

> Any tested-by/reviewed-by is welcome.

Ok, I'll take a deeper look asap.

> Can you upload the acpidump of GL702VMK here for reference?

Attachment 256823 [details]

Thanks,
Comment 26 Lv Zheng 2017-07-04 01:07:26 UTC
Patches merged in linux-pm.git/linux-next.
You'll see it in upstream after next release cycle.
Closing...

Note You need to log in before you can comment on or make changes to this bug.