Bug 156501

Summary: AML InfiniteLoop: Battery state becomes unreadable- AE_AML_INFINITE_LOOP - Hewlett-Packard HP ENVY m4 Notebook
Product: ACPI
Reporter: M Foronda (josemauricioforonda)
Component: ACPICA-Core
Assignee: Lv Zheng (lv.zheng)
Status: CLOSED CODE_FIX
Severity: normal
CC: lenb, lv.zheng
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.8-rc5
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output of my last session with current mainline kernel
acpidump
[PATCH] ACPICA: Dispatcher: Disable infinite loop detection
dmidecode
[PATCH] ACPICA: Dispatcher: Introduce timeout mechanism for infinite loop detection
[PATCH] ACPICA: Dispatcher: Introduce timeout mechanism for infinite loop detection

Description M Foronda 2016-09-09 22:29:00 UTC
Created attachment 232971
dmesg output of my last session with current mainline kernel

I get the following errors on my HP Envy m4-1015dx:

ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC0.MMRD] (Node ffff8802468b4f50), AE_AML_INFINITE_LOOP (20150930/psparse-542)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC0.RDMB] (Node ffff8802468b5938), AE_AML_INFINITE_LOOP (20150930/psparse-542)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC0.RDMW] (Node ffff8802468b5960), AE_AML_INFINITE_LOOP (20150930/psparse-542)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC0.BAT0._BST] (Node ffff8802468b54d8), AE_AML_INFINITE_LOOP (20150930/psparse-542)

After this I can't read the battery state, and every process that depends on it fails, including upowerd, suspend, and shutdown.

I can't find any trigger in dmesg; it always stays silent for many minutes before this happens. It has happened after sessions lasting anywhere from 7 minutes to a couple of days, but never longer than that.
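To check a saved log for the failure, something like the following sketch could be used. It is only an illustration: the pattern is modelled on the four error lines quoted above, the helper name `find_failed_methods` is made up for this example, and other kernel versions may format the message differently.

```python
import re

# Pattern modelled on the error lines quoted in this report (an assumption;
# the exact wording may differ between kernel versions).
ACPI_ERROR_RE = re.compile(
    r"ACPI Error: Method parse/execution failed "
    r"\[(?P<method>[^\]]+)\] \(Node (?P<node>[0-9a-f]+)\), "
    r"(?P<status>AE_[A-Z_]+)"
)


def find_failed_methods(log_text, status="AE_AML_INFINITE_LOOP"):
    """Return the AML method paths that failed with `status`, in log order."""
    return [
        match.group("method")
        for match in ACPI_ERROR_RE.finditer(log_text)
        if match.group("status") == status
    ]


# Two of the lines from this report, used as sample input.
SAMPLE = r"""
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC0.MMRD] (Node ffff8802468b4f50), AE_AML_INFINITE_LOOP (20150930/psparse-542)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC0.BAT0._BST] (Node ffff8802468b54d8), AE_AML_INFINITE_LOOP (20150930/psparse-542)
"""

failed = find_failed_methods(SAMPLE)
```

Feeding it the text of a saved dmesg file lists the methods in the order they failed, which shows whether `_BST` is always the last one in the chain.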

I have had this problem for as long as I have had Linux on this machine, which goes from kernel 3.19 up to current mainline. Currently I use Ubuntu 16.04.1 LTS with Linux 4.8-rc5.

I already filed a bug report on Launchpad for the Ubuntu maintainers, where there may be more information (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1615200).
Comment 1 M Foronda 2016-09-09 22:29:49 UTC
Created attachment 232981
acpidump
Comment 2 Lv Zheng 2016-09-20 02:32:32 UTC
Created attachment 239241
[PATCH] ACPICA: Dispatcher: Disable infinite loop detection

Hi,

Please give this patch a try.

Thanks
Comment 3 Lv Zheng 2016-09-20 02:32:58 UTC
Also please upload dmidecode here.
Comment 4 M Foronda 2016-09-20 23:02:35 UTC
Created attachment 239341
dmidecode
Comment 5 M Foronda 2016-09-20 23:03:12 UTC
Thanks,
I'll try the patch and report in a few days or when something interesting happens.
Comment 6 Lv Zheng 2016-09-21 05:41:19 UTC
OK.
IMO, the infinite loop detection is a compromise for the current Linux ACPI environment. It is not reasonable, but it is useful while there are bugs/gaps that have not yet been root-caused. Introducing this mechanism ensures that when such bugs/gaps cause AML synchronization failures, Linux ACPI can still run and most other ACPI features won't be affected.
Comment 7 M Foronda 2016-09-26 02:23:17 UTC
Hi,

I've been testing the patch on a single session spanning 5 days so far and haven't encountered the issue. I've monitored the output of the acpi command and suspended countless times. The logs have also stayed very quiet and haven't reported anything resembling an AML synchronization failure or any other ACPI error. It's the longest session I have had in memory so I'd say the issue is solved.

Thank you, this was driving me insane
Comment 8 Lv Zheng 2016-09-26 05:23:55 UTC
OK, marking this as RESOLVED.
I'll discuss this with ACPI experts to see how we can get it upstreamed under the current Linux ACPI environment.

Best regards
Comment 9 Lv Zheng 2016-09-27 05:58:32 UTC
I've obtained some instructions.
Let me re-open this bug.

We're about to introduce a timeout mechanism in place of the current loop counting mechanism in order to break possible infinite loops.

I'll post another patch later.

Thanks
Lv
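The difference between the two mechanisms can be sketched roughly as follows. This is an illustrative model only, not code from the attached patches or from ACPICA: the names `run_with_count_guard`, `run_with_timeout_guard` and `LoopAborted` are hypothetical, and the limits used are arbitrary.

```python
import time


class LoopAborted(Exception):
    """Raised when a guarded loop is judged to be stuck (hypothetical name)."""


def run_with_count_guard(condition, body, max_iterations):
    """Loop counting: abort once the body has run max_iterations times."""
    iterations = 0
    while condition():
        if iterations >= max_iterations:
            raise LoopAborted("iteration limit reached")
        body()
        iterations += 1
    return iterations


def run_with_timeout_guard(condition, body, timeout_s, clock=time.monotonic):
    """Timeout: abort once timeout_s seconds have passed since the loop began."""
    deadline = clock() + timeout_s
    iterations = 0
    while condition():
        if clock() >= deadline:
            raise LoopAborted("time limit reached")
        body()
        iterations += 1
    return iterations


def make_poll(needed):
    """Model a polling loop that legitimately needs `needed` fast iterations."""
    state = {"polls": 0}

    def still_busy():
        return state["polls"] < needed

    def poll_once():
        state["polls"] += 1

    return still_busy, poll_once
```

A loop that polls quickly can exceed a fixed iteration count while taking very little wall-clock time, so a count guard may abort work that would have finished, whereas a time guard only fires when the loop really has run for too long. The open question with a time guard is how long the limit must be.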
Comment 10 M Foronda 2016-09-29 21:01:02 UTC
Sure, looking forward to testing the patch.
Comment 11 M Foronda 2016-10-27 20:47:23 UTC
Hi,

Just a reminder that I'll test the patch you mentioned last month when it's ready.

Thanks
Comment 12 Lv Zheng 2016-10-28 19:24:16 UTC
Created attachment 243041
[PATCH] ACPICA: Dispatcher: Introduce timeout mechanism for infinite loop detection

Here you are.
Please check if it is working.
I've configured the timeout value to 120 seconds by default.
It applies cleanly to the linux-pm/linux-next branch.

Thanks in advance.
Comment 13 Lv Zheng 2016-10-28 22:52:25 UTC
Created attachment 243081
[PATCH] ACPICA: Dispatcher: Introduce timeout mechanism for infinite loop detection

I refined the patch, making it eligible for upstream.
Please test the upgraded one.

I also changed the default timeout to 30s.
If that is not suitable for your machine, please modify the value of ACPI_MAX_LOOP_TIMEOUT.

Thanks in advance.
Comment 14 M Foronda 2016-10-29 05:16:35 UTC
Thanks, I am currently testing the patch.
I'll let you know if I find any issues or after a week long session without them.
Comment 15 M Foronda 2016-11-02 04:49:57 UTC
Hi, 

I have tested the patch for 4 days and the logs haven't yet shown anything related to the bug. Nevertheless, 2 of the 9 times I've suspended have gone badly in a new way. Everything seemed to suspend fine except for the fans and the keyboard backlight, and later I wasn't able to resume. The logs don't show anything different from a successful suspend. I applied the patch to next-20161028 and didn't modify ACPI_MAX_LOOP_TIMEOUT.

Thanks
Comment 16 Lv Zheng 2016-11-02 14:27:12 UTC
Can the suspend/resume issue be reproduced with attachment 239241?
Comment 17 M Foronda 2016-11-02 15:52:32 UTC
No, I didn't have that issue with the previous patch. I applied that patch to 4.8-rc7 and this is also the first time I tried a kernel released after that. Should I build the latest kernel on the linux-next branch with no infinite loop detection?
Comment 18 Lv Zheng 2016-11-02 18:27:05 UTC
I thought you were reporting that ktime_get() wasn't working during the suspend/resume process and that the patch therefore needed to be improved.

But now it looks to me as though the default timeout value is simply not long enough (or could never be long enough).
So we should either choose a longer timeout value or not have such an infinite loop detection mechanism upstream at all.

That's a non-technical issue. I'll close the bug and let others determine which approach should go upstream.

Thanks for the report and the test.

Best regards
Lv
Comment 19 M Foronda 2016-11-03 04:10:00 UTC
One last thing. How should I find out when a fix is pushed upstream?
Thanks for your work
Comment 20 Lv Zheng 2016-11-03 16:08:09 UTC
I'll mention the upstream commit ID here before closing the bug. :)
Comment 21 M Foronda 2016-11-03 19:38:41 UTC
That's great, thank you very much!
Comment 22 Len Brown 2016-11-29 00:32:20 UTC
> And I changed the default timeout to 30s.

What is the approximate _minimum_ delay that makes this system work?
E.g., does 3s work?

Unclear if the issue with suspend/resume is related to this one.
Are these suspend/resume problems newly created by the test patch,
or is it possible they were present before the test patch?
Comment 23 M Foronda 2016-11-29 03:00:04 UTC
Hi Len,

If by delay you mean the timeout value on the patch, I used only 30 seconds. If you mean something about the suspend/resume issue, there is no delay at resume, the laptop didn't come back from suspend 2 out of 9 times and I had to force a shutdown.

I only had this issue with this patch on kernel next-20161028. Before this, I used all Ubuntu-supported kernels from 3.19 up to the current 4.4, mainline 4.8-rc4 through rc7, and the first patch in this thread applied to mainline 4.8-rc7.

I might try an unpatched version of next-20161028, a different timeout value or something more specific if you think it's worthwhile.

Thanks
Comment 24 Lv Zheng 2016-12-12 07:22:26 UTC
We changed strategy, and will track ACPICA bugs in ACPICA upstream.
Please monitor the following URL for this bug:
https://github.com/acpica/acpica/pull/185

I'll close this bug due to the strategy change.
Comment 25 M Foronda 2016-12-12 13:04:11 UTC
Fine, I'll keep an eye out for when it gets committed to linuxized ACPICA. Thanks again for your work.
Comment 26 M Foronda 2017-04-10 04:03:46 UTC
Hi, is there any reason why this has not been solved in ACPICA upstream yet?
Thanks