Bug 60542 - DSDT: Lenovo Ideapad Z580 Extremely Slow Boot, AML infinite loop
Summary: DSDT: Lenovo Ideapad Z580 Extremely Slow Boot, AML infinite loop
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Lv Zheng
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-10 02:30 UTC by sduddikunta
Modified: 2019-07-11 09:46 UTC
CC List: 7 users

See Also:
Kernel Version: no known working kernel
Subsystem:
Regression: No
Bisected commit-id:


Attachments
DSDT (43.31 KB, application/octet-stream), 2013-07-10 02:30 UTC, sduddikunta
DSDT.aml (43.32 KB, application/octet-stream), 2013-07-10 02:30 UTC, sduddikunta
DSDT.dsl (365.13 KB, text/x-dsl), 2013-07-10 02:31 UTC, sduddikunta
DSDT.dsl.orig (365.11 KB, text/x-dsl), 2013-07-10 02:32 UTC, sduddikunta
DSDT.dsl.diff (266 bytes, patch), 2013-07-10 02:32 UTC, sduddikunta
acpidump.txt (240.70 KB, text/plain), 2013-08-12 20:34 UTC, sduddikunta
118 (31.57 KB, application/pdf), 2019-07-11 09:46 UTC, tomas platz

Description sduddikunta 2013-07-10 02:30:12 UTC
Created attachment 106855 [details]
DSDT

For the past few releases of almost all popular Linux-based distributions, users of Lenovo's Z580 laptop have noticed that, past a particular kernel version (usually a certain build of 3.2), the boot seems to hang. In reality, the boot eventually completes, but only after 15-20 minutes (or more in extreme cases). The dmesg output of boots that hung but eventually finished reports three ACPI timeouts (120s infinite loop timeout in BIOS).

Testing of this bug on my end took place on Ubuntu 12.04 LTS and Fedora 19, running mainline builds (unmodified mainline tree sources with distribution kernel configurations). However, users of Fedora, Arch, Mint, and Gentoo report similar issues both with distribution and mainline kernels.

Ubuntu Bug Report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1093217

From the dmesg output, it was found that booting without the battery physically installed was always successful. Later inserting the battery after boot had no adverse consequences. Furthermore, compiling a kernel without battery support (either not at all or as a module later blacklisted on the command line) would also produce successful boots.

dmesg from above showing impacted area:

[ 840.304049] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 840.304052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 840.304054] swapper/0 D ffffffff81806240 0 1 0 0x00000000
[ 840.304058] ffff880118301e80 0000000000000046 0000000000000000 ffff880114537040
[ 840.304061] ffff880118301fd8 ffff880118301fd8 ffff880118301fd8 00000000000137c0
[ 840.304065] ffff880117c19700 ffff8801182f8000 ffff880118301e70 0000000000000009
[ 840.304068] Call Trace:
[ 840.304074] [<ffffffff8165b50f>] schedule+0x3f/0x60
[ 840.304078] [<ffffffff81092df5>] async_synchronize_cookie_domain+0x75/0x120
[ 840.304082] [<ffffffff8108bd20>] ? add_wait_queue+0x60/0x60
[ 840.304085] [<ffffffff81092ef7>] async_synchronize_full+0x17/0x20
[ 840.304090] [<ffffffff81641107>] init_post+0xe/0xc5
[ 840.304094] [<ffffffff81cfcd74>] kernel_init+0x164/0x164
[ 840.304098] [<ffffffff81667b74>] kernel_thread_helper+0x4/0x10
[ 840.304101] [<ffffffff81cfcc10>] ? start_kernel+0x3bd/0x3bd
[ 840.304104] [<ffffffff81667b70>] ? gs_change+0x13/0x13
[ 947.952877] ACPI Error: Method parse/execution failed [\_SB_.WADR] (Node ffff880118260028), AE_AML_INFINITE_LOOP (20110623/psparse-536)
[ 947.952891] ACPI Error: Method parse/execution failed [\_SB_.BAT1.UPBI] (Node ffff880118260258), AE_AML_INFINITE_LOOP (20110623/psparse-536)
[ 947.952898] ACPI Error: Method parse/execution failed [\_SB_.BAT1._BIF] (Node ffff880118260208), AE_AML_INFINITE_LOOP (20110623/psparse-536)
[ 947.952906] ACPI Exception: AE_AML_INFINITE_LOOP, Evaluating _BIF (20110623/battery-419)
[ 947.952909] ACPI: Battery Slot [BAT1] (battery present)
[ 947.954127] Freeing unused kernel memory: 924k freed
[ 947.954237] Write protecting the kernel read-only data: 12288k
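
For anyone comparing logs from several boots, the failing method chain can be pulled out of dmesg mechanically. The following is only a minimal Python sketch: the function name failed_methods and the regular expression are illustrative assumptions, and the sample text is copied from the three "ACPI Error" lines above. ACPICA appears to log the innermost method first, so the order here would suggest that _BIF called UPBI, which called WADR.

```python
import re

# Matches lines of the form seen in this report, e.g.
#   ACPI Error: Method parse/execution failed [\_SB_.WADR] (Node ...), AE_AML_INFINITE_LOOP (...)
# The pattern is an assumption based only on the log excerpt in this bug.
ACPI_ERROR_RE = re.compile(
    r"ACPI Error: Method parse/execution failed "
    r"\[(?P<method>[^\]]+)\].*?(?P<status>AE_[A-Z_]+)"
)


def failed_methods(dmesg_text):
    """Return (method path, status) pairs in the order they were logged."""
    results = []
    for line in dmesg_text.splitlines():
        match = ACPI_ERROR_RE.search(line)
        if match:
            results.append((match.group("method"), match.group("status")))
    return results


# Sample lines copied from the dmesg excerpt in this report.
SAMPLE = r"""
[ 947.952877] ACPI Error: Method parse/execution failed [\_SB_.WADR] (Node ffff880118260028), AE_AML_INFINITE_LOOP (20110623/psparse-536)
[ 947.952891] ACPI Error: Method parse/execution failed [\_SB_.BAT1.UPBI] (Node ffff880118260258), AE_AML_INFINITE_LOOP (20110623/psparse-536)
[ 947.952898] ACPI Error: Method parse/execution failed [\_SB_.BAT1._BIF] (Node ffff880118260208), AE_AML_INFINITE_LOOP (20110623/psparse-536)
[ 947.952909] ACPI: Battery Slot [BAT1] (battery present)
"""

for method, status in failed_methods(SAMPLE):
    print(method, status)
```

Feeding the full output of dmesg into failed_methods instead of SAMPLE would list every failed method in a given boot, which makes it easier to check whether the same three methods fail each time.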

Specifically, this bug has been reported to appear sometime between 3.2 and 3.3, though the exact commit is unknown. Some temporary workarounds have been proposed, both on the Ubuntu tracker above, and on other forums. The only fix that has consistently worked (no failed boots) was a modification of the DSDT suggested by Tom Thompson on the Ubuntu tracker. His workaround (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1093217/comments/63) was to add a "Sleep (50)" line inside the method WAEC, recompile the DSDT, and use grub2 to replace the DSDT before booting. This fix has consistently worked on every kernel I have tested, including Mainline 3.3, 3.4, 3.8; Ubuntu 3.3, 3.4; and Fedora 3.8. This change seems to indicate either a race condition or other timing issue in the Linux kernel.

Many Z580 models exist based on both 2nd and 3rd generation Intel Core processors with Intel Integrated and optional NVIDIA graphics, and all exhibit this same behavior. Windows 7/8 will boot without modification to the BIOS provided DSDT. As Linux requires the modification to the DSDT to boot normally after kernel 3.2, this is likely a bug. No kernel version past 3.2 boots reliably without it.

The original compiled DSDT, decompiled DSDT.dsl.orig, modified DSDT.dsl, diff DSDT.dsl.diff, and recompiled DSDT.aml are attached to this report. This bug has existed for many months, and most newer versions of distributions will not even boot live media to install. Thank you for your assistance.
Comment 1 sduddikunta 2013-07-10 02:30:46 UTC
Created attachment 106856 [details]
DSDT.aml
Comment 2 sduddikunta 2013-07-10 02:31:53 UTC
Created attachment 106857 [details]
DSDT.dsl
Comment 3 sduddikunta 2013-07-10 02:32:18 UTC
Created attachment 106858 [details]
DSDT.dsl.orig
Comment 4 sduddikunta 2013-07-10 02:32:30 UTC
Created attachment 106859 [details]
DSDT.dsl.diff
Comment 5 Aaron Lu 2013-07-15 00:56:11 UTC
Add Zheng Lv.

In the meantime, it would be good if someone could bisect to find the offending commit.
Comment 6 Lv Zheng 2013-07-15 04:47:13 UTC
From the DSDT, we can see that all BYFG accesses are invoked inside the _BIF method.
I just wonder how the _BIF method is invoked by the OSPM.
We may need to see the bisect result first.
Comment 7 sduddikunta 2013-07-15 16:37:36 UTC
After further testing, this bug may not be a regression. Because of the nature of the problem, boots even with bad kernels often don't hang. However, in the recent testing I did, I found that versions dating back to 3.0 (including the previously thought good 3.2) also exhibited the issue. I'll continue to test versions prior to 3.0.
Comment 8 sduddikunta 2013-07-15 17:01:50 UTC
Attempted to test 2.6.39 and 2.6.38; my system does not boot either one. They do not hang the same way the 3.x kernels do; the kernel doesn't load at all. I get no output to the screen. There is no indication of anything happening.
Comment 9 Aaron Lu 2013-08-12 08:24:42 UTC
Please attach acpidump like this:
# acpidump > acpidump.txt
Comment 10 sduddikunta 2013-08-12 20:34:51 UTC
Created attachment 107191 [details]
acpidump.txt
Comment 11 sduddikunta 2013-09-28 00:03:50 UTC
Has there been any progress on this? There is still active discussion on the Ubuntu bug tracker with no true workarounds or fixes yet.
Comment 12 Lv Zheng 2013-09-29 02:24:38 UTC
Have you tested the kernel with an ACPICA fix that has filled a gap for operation region fields?

The commit is:
Commit 4be4be8fee2ee99a52f94f90d03d2f287ee1db86
Author: Bob Moore <robert.moore@intel.com> 2013-09-06 06:27:15 (GMT) 
Subject: ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field.

Which is shipped in the mainline kernel tagged as 3.12-rc2.

This bug seems to be a duplicate of the issue fixed by this commit.
Comment 13 sduddikunta 2013-09-29 03:05:18 UTC
Thanks for the response. I just installed the Ubuntu mainline 3.12-rc2 build, and initial testing seems to be showing good results. I've asked the folks on the Ubuntu tracker for additional help in testing, and I'll report back once we're done.
Comment 14 Aaron Lu 2013-10-10 01:27:00 UTC
Anything new about the test?
Comment 15 sduddikunta 2013-10-10 01:46:51 UTC
Sorry about the delay. The new kernel does not fix the bug. A few others on the Ubuntu tracker and I have found that it still exhibits the same occasional, seemingly random issues with the infinite loop.
Comment 16 Lv Zheng 2014-03-07 03:28:14 UTC
Hi,

The WAEC method has created a named object inside of it:

        Method (WAEC, 0, NotSerialized)
        {
            Name (CUNT, 0x1E)
            While (LNotEqual (^PCI0.LPCB.EC0.BYFG, Zero))
            {
                Sleep (0x05)
                Decrement (CUNT)
                If (LEqual (CUNT, Zero))
                {
                    Store (Zero, ^PCI0.LPCB.EC0.BYFG)
                    Store (Zero, ^PCI0.LPCB.EC0.DRFG)
                    Break
                }
            }
        }

So this looks like a method that should be marked as Serialized.
Recently we shipped a fix in the Linux upstream that automatically marks such control methods as Serialized.
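
The polling logic of the WAEC method quoted above can be modelled in a few lines of Python to see how long it waits before giving up. This is only a sketch under stated assumptions: read_byfg and sleep_ms are hypothetical stand-ins for the EC field read and the AML Sleep operator, and the model ignores the namespace issue caused by creating CUNT inside a NotSerialized method.

```python
def waec(read_byfg, sleep_ms=lambda ms: None, retries=0x1E, step_ms=0x05):
    """Model of the WAEC loop: poll BYFG until it reads zero or retries run out.

    read_byfg and sleep_ms are hypothetical callables standing in for the
    EC field access and the AML Sleep operator.
    Returns (requested_sleep_ms, forced_clear).
    """
    count = retries              # Name (CUNT, 0x1E)
    waited_ms = 0
    while read_byfg() != 0:      # While (LNotEqual (BYFG, Zero))
        sleep_ms(step_ms)        # Sleep (0x05)
        waited_ms += step_ms
        count -= 1               # Decrement (CUNT)
        if count == 0:
            # The AML stores Zero to BYFG and DRFG here, then breaks.
            return waited_ms, True
    return waited_ms, False


# A flag that never clears: the loop gives up after 30 polls.
print(waec(lambda: 1))

# A flag that clears on the fourth read.
reads = iter([1, 1, 1, 0])
print(waec(lambda: next(reads)))
```

In this model the loop requests at most 0x1E * 0x05 = 150 ms of sleep before it clears the flags itself, so WAEC on its own is bounded at 30 iterations.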

Could you give it a try?

1. Please clone the git repo that this series is based on and check out its linux-next branch (linux-pm/linux-next):
   # git clone https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
   # cd linux-pm
   # git checkout -b linux-next --track origin/linux-next
2. Please boot the kernel without the customized DSDT.

Thanks in advance.
Comment 17 sduddikunta 2014-03-09 14:58:40 UTC
I will test and report back in a few days.
Comment 18 sduddikunta 2014-03-13 00:53:56 UTC
This does not fix the bug. The same condition is seen: a seemingly randomly occurring hang. I also noticed that my computer was unable to shut down via the usual channels with this kernel. It would halt but not power off.
Comment 19 Lv Zheng 2014-03-13 07:19:14 UTC
OK, let's do some basic debugging.

1. Please use the v3.14-rc5 kernel (it is on the linus/master branch);
2. Please apply the following patches:
    attachment 129031 [details]
    attachment 129041 [details]
    attachment 129051 [details]
    attachment 129061 [details]
    attachment 129071 [details]
3. Boot the kernel with "acpi.debug_layer=0x000000E4 acpi.debug_level=0x00000010 acpi_trace_once=_SB_.BAT1._BIF";
4. Post dmesg here.
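
For reference, the debug_layer mask in step 3 can be decoded into component names. The sketch below is Python and rests on an assumption: the bit-to-name table is the ACPICA component list as given in the kernel's ACPI debug documentation, and it should be checked against the tree actually being built. The function name decode_layers is illustrative.

```python
# ACPICA debug_layer component bits. Assumption: this table follows the
# kernel's ACPI debug documentation; verify it against your kernel tree.
ACPI_DEBUG_LAYERS = {
    0x00000001: "ACPI_UTILITIES",
    0x00000002: "ACPI_HARDWARE",
    0x00000004: "ACPI_EVENTS",
    0x00000008: "ACPI_TABLES",
    0x00000010: "ACPI_NAMESPACE",
    0x00000020: "ACPI_PARSER",
    0x00000040: "ACPI_DISPATCHER",
    0x00000080: "ACPI_EXECUTER",
    0x00000100: "ACPI_RESOURCES",
    0x00000200: "ACPI_CA_DEBUGGER",
    0x00000400: "ACPI_OS_SERVICES",
    0x00000800: "ACPI_CA_DISASSEMBLER",
    0x00001000: "ACPI_COMPILER",
    0x00002000: "ACPI_TOOLS",
}


def decode_layers(mask):
    """Return (names of the set bits in ascending order, leftover unknown bits)."""
    names = []
    known = 0
    for bit, name in sorted(ACPI_DEBUG_LAYERS.items()):
        known |= bit
        if mask & bit:
            names.append(name)
    return names, mask & ~known


names, unknown = decode_layers(0x000000E4)
print(names, hex(unknown))
```

Under that assumption, 0xE4 = 0x04 + 0x20 + 0x40 + 0x80, which would select the events, parser, dispatcher and executer components. The debug_level value is not decoded here because its meaning may depend on the attached patches.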

Let's first track what has been executed in this case.

Thanks in advance.
Comment 20 Zhang Rui 2014-06-03 03:49:35 UTC
sduddikunta@gmail.com, any update on this?
Comment 21 Lv Zheng 2014-06-03 04:59:58 UTC
Hi, Rui

This bug is a valid report.
All the information needed is uploaded here.
It reflects a gap in the ACPICA interpreter.
We just don't have time to work on it.
There are 2 ACPICA releases queued up for 3.16, and some urgent issues are being handled this quarter.

Thanks
Comment 22 Zhang Rui 2014-06-03 05:04:24 UTC
Do we have a patch that has been verified to fix this?
If no, we need one, and I think this is what you want to do in comment #19, no?
Comment 23 Lv Zheng 2014-06-03 05:25:02 UTC
No(In reply to Zhang Rui from comment #22)
> Do we have a patch that has been verified to fix this?
> If no, we need one, and I think this is what you want to do in comment #19,
> no?

You are right.
I confused this bug with another one.

That kind of logging message could be useful.

Thanks
Comment 24 Lv Zheng 2014-06-11 00:22:36 UTC
Closing since there has been no response.
You can re-open it if you still suffer from the same issue with a recent kernel.
Comment 25 tomas platz 2019-07-11 09:46:12 UTC
Created attachment 283621 [details]
118
