Bug 54841 - LID0 switch behaves badly after resume from suspend
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: All Linux
Importance: P1 normal
Assignee: Lv Zheng
Reported: 2013-03-05 11:21 UTC by Tobias Jakobi
Modified: 2013-05-09 16:15 UTC

Kernel Version: 3.7.9
Regression: Yes


Attachments
acpidump output (36.50 KB, application/octet-stream)
2013-03-06 10:06 UTC, Tobias Jakobi
acpidump in plain text (175.60 KB, application/octet-stream)
2013-04-10 03:07 UTC, Aaron Lu
Customized DSDT, fixing Acquire issues (34.12 KB, application/octet-stream)
2013-04-15 08:44 UTC, Lv Zheng
Modified customized DSDT file (315.06 KB, text/plain)
2013-04-15 08:45 UTC, Lv Zheng
[DBG PATCH] ACPI: Add acpi_mutex_no_timeout kernel parameter (2.31 KB, patch)
2013-04-22 04:25 UTC, Lv Zheng

Description Tobias Jakobi 2013-03-05 11:21:06 UTC
Hello,

I noticed a regression when upgrading from 3.6.x to 3.7.x. The problem is related to wakeup events.

I have a line in my boot scripts which disables wakeup from LID0 events, since I don't want my system coming out of suspend when I open/close the lid. This worked fine with 3.6.x, but it doesn't seem to work anymore with 3.7.x.

It works once: after a reboot I can go into suspend and the lid doesn't do anything. But after some more suspends (not always the second one), the wakeup state seems to reset. Once that has happened, writing to /proc/acpi/wakeup no longer has any effect.
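For readers unfamiliar with the interface: writing a device name to /proc/acpi/wakeup toggles that device's wakeup state, so a boot script normally has to check the current state before writing. The Python sketch below illustrates this. parse_wakeup, disable_wakeup, and the SAMPLE text are hypothetical names and data made up for illustration (they are not the reporter's script), and the column layout is an assumption, since it differs between kernel versions.

```python
def parse_wakeup(text):
    """Map device name -> True if wakeup is enabled.

    Assumes /proc/acpi/wakeup-style text: one header line, then lines of
    the form "<device> <S-state> <status> [sysfs node]", where the status
    may carry a leading '*'.
    """
    states = {}
    for line in text.splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 3:
            continue
        name, status = fields[0], fields[2].lstrip("*")
        states[name] = (status == "enabled")
    return states


def disable_wakeup(device, path="/proc/acpi/wakeup"):
    """Toggle the device off only if it is currently enabled (needs root)."""
    with open(path) as f:
        enabled = parse_wakeup(f.read()).get(device, False)
    if enabled:
        with open(path, "w") as f:
            f.write(device)  # writing the name toggles the state
    return enabled


# Illustrative sample only; not taken from the reporter's machine.
SAMPLE = (
    "Device\tS-state\t  Status   Sysfs node\n"
    "LID0\t  S3\t*enabled\n"
    "SLPB\t  S3\t*disabled\n"
)

if __name__ == "__main__":
    print(parse_wakeup(SAMPLE))
```

Checking the state first matters because a blind write would re-enable wakeup if the device were already disabled.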

Inspecting dmesg gives me this:
ACPI Error: Thread 923719504 cannot release Mutex [MUT0] acquired by thread 283920320 (20120711/exmutex-399)
ACPI Error: Method parse/execution failed [\_SB_.LID0._LID] (Node ffff8801370a8f38), AE_AML_NOT_OWNER (20120711/psparse-536)
dpm_run_callback(): acpi_button_resume+0x0/0x18 returns -19
leena kernel: PM: Device PNP0C0D:00 failed to resume: error -19

Like I said, this worked with 3.6.x.

Greets,
Tobias
Comment 1 Aaron Lu 2013-03-06 03:31:18 UTC
Hi Tobias,

Thanks for the report!

Please attach the output of acpidump:
# acpidump > acpidump.out

And if possible, please do a git bisect from tag v3.6 to v3.7 to find the offending commit, thanks.
Comment 2 Tobias Jakobi 2013-03-06 10:06:27 UTC
Created attachment 94661 [details]
acpidump output
Comment 3 Tobias Jakobi 2013-03-06 10:19:14 UTC
Dump added. Concerning bisect, I just don't have the time to do this, since it involves repeatedly going into suspend to trigger the issue (like I said, this sometimes doesn't happen after the first return-from-suspend).

I would only consider this if we could reduce this to a very specific component (e.g. there is only one change from 3.6.x to 3.7.x in drivers/acpi/button.c).

I hope that the dump can already tell you something!
Comment 4 Lv Zheng 2013-03-12 07:41:36 UTC
There is a fix in ACPICA for the OSL mutex.
Please wait until 3.8-rc3 to see if this is solved.
Comment 5 Robert Moore 2013-03-15 16:52:20 UTC
Please post the acpidump as a plain text file.
Comment 6 Tobias Jakobi 2013-03-15 17:53:50 UTC
The output is plain. You just have to unxz it.
Comment 7 Aaron Lu 2013-04-10 03:07:53 UTC
Created attachment 97931 [details]
acpidump in plain text

Hi Bob,
I've unxzed it for you to take a look, thanks.

Hi Tobias,
Next time, please do not compress it, thanks.
Comment 8 Aaron Lu 2013-04-10 03:08:54 UTC
BTW, Tobias, did you try v3.8 as Lv suggested?
Comment 9 Robert Moore 2013-04-10 15:32:38 UTC
Here's the mutex code in question:

Method (_LID, 0, NotSerialized)  // _LID: Lid Status
{
    If (^^PCI0.LPCB.EC0.ECOK)
    {
        Acquire (^^PCI0.LPCB.EC0.MUT0, 0x0200)
        Store (^^PCI0.LPCB.EC0.LIDS, Local1)
        Release (^^PCI0.LPCB.EC0.MUT0)


Notice that there is a timeout specified for the Acquire, but there is no code to check if the mutex has actually been acquired.

I suspect that the Acquire is timing out, and then the Release fails because the thread has in fact not actually acquired the mutex.
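The suspected failure mode can be reproduced in miniature outside ACPI. The sketch below is only an analogy in Python, not ACPICA code: threading.RLock stands in for the AML mutex because, like an AML mutex, it tracks its owner and refuses a release from a thread that does not hold it. The function names and the timing values are assumptions chosen for illustration (the BIOS uses 0x0200, i.e. 512 ms).

```python
import threading


def lid_unchecked(mutex, timeout_s):
    """Mimics the BIOS _LID pattern: the Acquire result is ignored."""
    mutex.acquire(timeout=timeout_s)  # may time out without us noticing
    try:
        mutex.release()  # fails if we never became the owner
    except RuntimeError as err:
        return "release failed: %s" % err
    return "ok"


def lid_checked(mutex, timeout_s):
    """Same access, but the timeout result is tested before releasing."""
    if not mutex.acquire(timeout=timeout_s):
        return "timed out"
    try:
        return "ok"
    finally:
        mutex.release()


def run_with_contention(method, hold_s=0.5, timeout_s=0.05):
    """Call `method` while another thread is holding the mutex."""
    mutex = threading.RLock()
    held = threading.Event()
    done = threading.Event()

    def holder():
        with mutex:
            held.set()
            done.wait(hold_s)  # keep the mutex until the caller is finished

    worker = threading.Thread(target=holder)
    worker.start()
    held.wait()
    result = method(mutex, timeout_s)
    done.set()
    worker.join()
    return result


if __name__ == "__main__":
    print(run_with_contention(lid_unchecked))
    print(run_with_contention(lid_checked))
```

In the unchecked variant the error surfaces at the release, which matches the shape of the "cannot release Mutex ... acquired by thread" message in the report.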

Note that the iASL compiler catches this problem:

dsdt.dsl   8900:                     Acquire (^^PCI0.LPCB.EC0.MUT0, 0x0200)
Warning  1129 -                                                         ^ Result is not used, possible operator timeout will be missed

This appears to me to be a bug in the BIOS ASL code. Note, there are about 14 other instances of this warning in the DSDT, apparently all for the same mutex object.

Perhaps an option to disallow timeouts on AML mutex objects would help in cases like this. A simple patch to the OSL to make all timeouts WAIT_FOREVER could test this entire theory.
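As a rough model of that idea, the Python sketch below maps every AML timeout to an unbounded wait when a hypothetical no_timeout switch is set. It is an illustration only, not the OSL patch itself: effective_timeout and try_acquire are made-up names, and threading.Lock is a stand-in for the OS mutex. The value 0xFFFF is ACPICA's ACPI_WAIT_FOREVER, and -1 is how Python's threading module expresses "block indefinitely".

```python
import threading

ACPI_WAIT_FOREVER = 0xFFFF  # AML timeout value meaning "no timeout"


def effective_timeout(timeout_ms, no_timeout=False):
    """Translate an AML timeout in milliseconds to a threading timeout.

    Returns -1 (wait forever) when the AML value already asks for that,
    or when the hypothetical no_timeout switch overrides every timeout.
    """
    if no_timeout or timeout_ms == ACPI_WAIT_FOREVER:
        return -1
    return timeout_ms / 1000.0


def try_acquire(mutex, timeout_ms, no_timeout=False):
    """Acquire `mutex`; returns True on success, False on timeout."""
    return mutex.acquire(timeout=effective_timeout(timeout_ms, no_timeout))


if __name__ == "__main__":
    m = threading.Lock()
    m.acquire()
    # Another thread releases the mutex later than the 20 ms timeout.
    threading.Timer(0.2, m.release).start()
    print(try_acquire(m, 20))                   # times out
    print(try_acquire(m, 20, no_timeout=True))  # waits for the release
```

With the override in place the unchecked Acquire can no longer fail silently, which is what would make it a usable test of the theory; the cost is that a mutex which is never released would hang the caller.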
Comment 10 Lv Zheng 2013-04-15 08:44:44 UTC
Created attachment 98761 [details]
Customized DSDT, fixing Acquire issues

New kernels ship an ACPICA that checks the return value of the Acquire operation, so incorrect ASL code may behave strangely on them.
Could you please try the customized DSDT to work around this BIOS bug?
You can find hints on how to load it in Documentation/acpi/initrd_table_override.txt.
Comment 11 Lv Zheng 2013-04-15 08:45:57 UTC
Created attachment 98771 [details]
Modified customized DSDT file

The modified dsdt.dsl source, with the usage of Acquire fixed.
Comment 12 Lv Zheng 2013-04-16 03:35:52 UTC
As far as I can see, there are the following 2 bugs in Linux:

1. Orphan _REG method execution
   According to the spec, the OS will run a _REG in a given scope when the operation regions declared in that scope are available for use.
   By my reading, this means orphan _REG methods should be executed as long as all the operation regions declared below their parent scope are available for use. This rule is not only for the EC device, but should be applied to all of the namespace scopes.
   This should be fixed in ACPICA: ACPICA probably should check the operation regions' availability in acpi_install_address_space_handler and call all _REG methods under the given acpi_namespace_node.

2. Embedded Controller space handler registration
   According to the spec, OSPM will make Embedded Controller operation regions, accessed via the Embedded Controller described in the ECDT, available before executing any control method.
   By my reading, this means only the Embedded Controller device pointed to by the ECDT is responsible for registering the EmbeddedControl space handler.
   This should be fixed in Linux: the Linux EC driver needs to check the ECDT to see if it is responsible for handling the EmbeddedControl space, and the parameter passed to acpi_install_address_space_handler should be ACPI_ROOT_OBJECT.

In conclusion, I think none of the above fixes should be merged at this point, as they all appear to be wrong approaches.
So please wait until the Linux ACPI engineers can provide an acceptable solution to handle such bugs.
Comment 13 Lv Zheng 2013-04-16 03:37:28 UTC
Sorry, the previous comment was posted to the wrong bug.
My web page jumped to the wrong place when I logged in again.
Comment 14 Lv Zheng 2013-04-22 04:25:36 UTC
Created attachment 99561 [details]
[DBG PATCH] ACPI: Add acpi_mutex_no_timeout kernel parameter

According to Bob, this can be tested without using the customized DSDT.
Please apply the attached patch and boot the kernel with acpi_mutex_no_timeout to confirm whether this is such a BIOS bug.
Thanks in advance.
Comment 15 Lv Zheng 2013-04-26 06:37:15 UTC
Reassigned this bug to the BIOS category after waiting for 1 week.

Thanks for reporting.
Comment 16 Lv Zheng 2013-05-07 00:25:10 UTC
This is confirmed as a BIOS bug.
There is no indication of whether the available workarounds are effective.
If you still want to work on this issue, you can reopen it.
Thanks for reporting.
Comment 17 Tobias Jakobi 2013-05-09 16:15:42 UTC
Sorry for the late reply. I currently don't have access to the system in question, so I'm leaving this closed.
