Bug 75581

Summary: Failure of battery detection on Lenovo Z480 with probability 5-10%
Product: ACPI
Reporter: naszar (naszar)
Component: Power-Battery
Assignee: Lan Tianyu (tianyu.lan)
Status: CLOSED CODE_FIX
Severity: normal
CC: aaron.lu, naszar, tianyu.lan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.14.2-1-ARCH
Subsystem:
Regression: No
Bisected commit-id:
Attachments: output of dmesg (last line after rmmod battery&&modprobe battery)
acpidump
dmesg for 3.15-rc5
dmesg for 3.14.6-1 with patch from Lan Tianyu
battery.patch
dmesg for 3.14.6-1 with second patch from Lan Tianyu
battery.patch

Description naszar 2014-05-06 10:03:59 UTC
Created attachment 135211 [details]
output of dmesg (last line after rmmod battery&&modprobe battery)

Battery detection in the battery module sometimes fails; reloading the module solves the problem. I have seen this bug from Linux 3.7.4 through Linux 3.14.2.
Comment 1 Aaron Lu 2014-05-16 06:43:50 UTC
acpidump please:
# acpidump > acpidump.txt
Comment 2 naszar 2014-05-16 06:58:41 UTC
Created attachment 136311 [details]
acpidump
Comment 3 naszar 2014-05-16 06:59:19 UTC
I tried blacklisting the ac module, and the problem seems to be gone.
Comment 4 Aaron Lu 2014-05-16 07:20:22 UTC
(In reply to naszar from comment #0)
> Created attachment 135211 [details]
> output of dmesg (last line after rmmod battery&&modprobe battery)
> 
> Battery detection in the battery module sometimes fails; reloading the
> module solves the problem. I have seen this bug from Linux 3.7.4 through
> Linux 3.14.2.

What about v3.15-rc5? Did you test that?
Comment 5 naszar 2014-05-16 07:59:48 UTC
(In reply to Aaron Lu from comment #4)
> What about v3.15-rc5? Did you test that?
No, my current kernel is 3.14.3-2-ARCH (Arch Linux).
I can try testing v3.15-rc5 tomorrow (is the Arch Linux default config OK?).
But v3.15-rc5 still has this comment at drivers/acpi/acpica/psparse.c:538:
/* Check for possible multi-thread reentrancy problem */
(my dmesg reports the error at psparse.c:536)
Comment 6 naszar 2014-05-17 05:51:19 UTC
Created attachment 136471 [details]
dmesg for 3.15-rc5

Same problem with 3.15-rc5. Five boots had successful battery detection; the attached log is from the 6th boot, where no battery was detected.
Comment 7 naszar 2014-05-17 05:58:58 UTC
After sudo rmmod battery && sudo modprobe battery, everything is OK:
[  889.295123] ACPI: Battery Slot [BAT1] (battery present)
(one extra line compared to attachment 136471 [details])
Comment 8 naszar 2014-06-02 03:35:44 UTC
It seems that the problem still appears when the ac module is blacklisted, but with lower probability: 45 successful boots and one where the battery was not detected (probability ~2%, against ~20% when the ac module is active).
Comment 9 Lan Tianyu 2014-06-10 07:03:30 UTC
Hi, Could you try the following patch?

diff --git a/include/acpi/acconfig.h b/include/acpi/acconfig.h
index 932a60d..dfae872 100644
--- a/include/acpi/acconfig.h
+++ b/include/acpi/acconfig.h
@@ -138,7 +138,7 @@
 
 /* Maximum number of While() loop iterations before forced abort */
 
-#define ACPI_MAX_LOOP_ITERATIONS        0xFFFF
+#define ACPI_MAX_LOOP_ITERATIONS        0xFFFFFFFF
 
 /* Maximum sleep allowed via Sleep() operator */
Comment 10 naszar 2014-06-11 00:03:04 UTC
Created attachment 139001 [details]
dmesg for 3.14.6-1 with patch from Lan Tianyu

(In reply to Lan Tianyu from comment #9)
> Hi, Could you try the following patch?

I applied the patch to the current Arch Linux kernel (3.14.6-1-ARCH). It doesn't work: udev kills the module (see attachment). There were 19 successful boots, and the attachment is the result of the 20th boot. Something is using the battery module:
$ sudo rmmod battery
rmmod: ERROR: Module battery is in use
$ lsmod|grep battery
battery                 7821  1
Comment 11 Lan Tianyu 2014-06-11 02:35:22 UTC
So far, I think the bug is a hardware stability issue. When the battery module is loaded, the driver calls the BIOS AML method _BIF to get battery information. The AML method waits for "^PCI0.LPCB.EC0.DRFG" to be set, which is the hardware's job. When the bug happens and "DRFG" isn't set by the hardware, the following While loop exceeds the loop limit in Linux ACPICA and triggers the infinite loop log you saw. My last patch increased the loop limit, but according to your attached log, DRFG still wasn't set.

From the DSDT table:
        Method (WADR, 0, NotSerialized)
        {
            While (LNotEqual (^PCI0.LPCB.EC0.DRFG, One)) {}
        }
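
A minimal C sketch of the loop-limit behaviour described above, not the ACPICA source. All names (sketch_wadr, fake_ec, SKETCH_MAX_LOOP_ITERATIONS) are hypothetical, and the cap value is taken from the acconfig.h diff in comment 9. It only illustrates why a flag that the hardware never sets turns a polling loop into an AE_AML_INFINITE_LOOP style abort, and why raising the cap cannot help in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; not the ACPICA definitions. */
enum sketch_status { SKETCH_OK = 0, SKETCH_INFINITE_LOOP = 1 };

/* Loop cap, mirroring the 0xFFFF value shown in the acconfig.h diff. */
#define SKETCH_MAX_LOOP_ITERATIONS 0xFFFFu

/*
 * Stand-in for the EC flag DRFG (an assumption for illustration).
 * The flag reads as set once the poll count reaches ready_after;
 * ready_after == 0 means the hardware never sets it.
 */
struct fake_ec {
    uint32_t polls;
    uint32_t ready_after;
};

static bool fake_ec_drfg_is_set(struct fake_ec *ec)
{
    ec->polls++;
    return ec->ready_after != 0 && ec->polls >= ec->ready_after;
}

/* Polls the flag like WADR does, but gives up once the cap is reached. */
static enum sketch_status sketch_wadr(struct fake_ec *ec)
{
    uint32_t iterations = 0;

    assert(ec != NULL);
    while (!fake_ec_drfg_is_set(ec)) {
        if (++iterations >= SKETCH_MAX_LOOP_ITERATIONS)
            return SKETCH_INFINITE_LOOP; /* forced abort of the While() */
    }
    return SKETCH_OK;
}
```

With a flag that never becomes ready, sketch_wadr() returns SKETCH_INFINITE_LOOP once the cap is reached, whatever value the cap has; a larger cap only delays the abort.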
Comment 12 Lan Tianyu 2014-06-11 02:41:04 UTC
Created attachment 139021 [details]
battery.patch

Please try this patch, which retries the _BIF method 5 times when it fails. Do not apply the previous patch.
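
The retry idea can be sketched in plain C as follows. This is not the contents of battery.patch (the attachment is not reproduced in this thread); the names sketch_eval_with_retry, flaky_eval and SKETCH_BIF_RETRIES are hypothetical, and reading "5 times" as at most five attempts in total is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed meaning of "5 times": at most five attempts in total. */
#define SKETCH_BIF_RETRIES 5

/* Evaluation callback: returns 0 on success, nonzero on failure. */
typedef int (*sketch_eval_fn)(void *ctx);

/*
 * Calls eval() until it succeeds or max_tries attempts have been made.
 * Returns the status of the last attempt and, if tries_used is not NULL,
 * stores the number of attempts there.
 */
static int sketch_eval_with_retry(sketch_eval_fn eval, void *ctx,
                                  int max_tries, int *tries_used)
{
    int status = -1;
    int tries = 0;

    assert(eval != NULL && max_tries > 0);
    while (tries < max_tries) {
        tries++;
        status = eval(ctx);
        if (status == 0)
            break; /* success: stop retrying */
    }
    if (tries_used != NULL)
        *tries_used = tries;
    return status;
}

/* Test double: fails while *ctx > 0 (decrementing it), then succeeds. */
static int flaky_eval(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```

If the evaluation fails twice and then succeeds, the helper returns 0 after three attempts; if it keeps failing, the helper gives up after SKETCH_BIF_RETRIES attempts and returns the last error.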
Comment 13 naszar 2014-06-11 09:38:14 UTC
Created attachment 139081 [details]
dmesg for 3.14.6-1 with second patch from Lan Tianyu

(In reply to Lan Tianyu from comment #12)
> 
> Please try this patch, which retries the _BIF method 5 times when it
> fails.

This one solves the problem.
Comment 14 Lan Tianyu 2014-06-12 02:51:03 UTC
Created attachment 139371 [details]
battery.patch

Please test this patch. If it works, I would upstream this one.
Comment 15 naszar 2014-06-12 05:04:40 UTC
(In reply to Lan Tianyu from comment #14)
> Created attachment 139371 [details]
> battery.patch
> 
> Please test this patch. If it works, I would upstream this one.

It works. In case it matters, I applied the patch to 3.14.6 (https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/plain/drivers/acpi/battery.c?id=a1bc295d7a4be9425bbeecc005d0cd013eb46cea).
The result is:

[    8.655932] fuse init (API version 7.22)
[   53.327340] ACPI Error: Method parse/execution failed [\_SB_.WADR] (Node ffff88011a842e60), AE_AML_INFINITE_LOOP (20131218/psparse-536)
[   53.327351] ACPI Error: Method parse/execution failed [\_SB_.BAT1.UPBI] (Node ffff88011a8430a0), AE_AML_INFINITE_LOOP (20131218/psparse-536)
[   53.327357] ACPI Error: Method parse/execution failed [\_SB_.BAT1._BIF] (Node ffff88011a843050), AE_AML_INFINITE_LOOP (20131218/psparse-536)
[   53.327365] ACPI Exception: AE_AML_INFINITE_LOOP, Evaluating _BIF (20131218/battery-444)
[   53.869507] ACPI: Battery Slot [BAT1] (battery present)

(A bit faster than the previous one)
Comment 16 Lan Tianyu 2014-06-12 06:22:19 UTC
The fix has been sent to the linux-acpi mailing list:
https://patchwork.kernel.org/patch/4339621/
Comment 17 Lan Tianyu 2014-06-17 02:02:07 UTC
Hi naszar:
          Could you test the following patch?
https://bugzilla.kernel.org/attachment.cgi?id=106887&action=edit
Comment 18 naszar 2014-06-17 05:44:36 UTC
(In reply to Lan Tianyu from comment #17)
> Hi naszar:
>           Could you test the following patch?
> https://bugzilla.kernel.org/attachment.cgi?id=106887&action=edit

The bug still happens, but the probability has decreased greatly. I had 97 successful boots, and on the 98th boot I saw this error in the log:
[    9.615566] fuse init (API version 7.22)
[   68.963361] ACPI Error: Method parse/execution failed [\_SB_.WADR] (Node ffff88011a842e60), AE_AML_INFINITE_LOOP (20131218/psparse-536)
[   68.963373] ACPI Error: Method parse/execution failed [\_SB_.BAT1.UPBI] (Node ffff88011a8430a0), AE_AML_INFINITE_LOOP (20131218/psparse-536)
[   68.963379] ACPI Error: Method parse/execution failed [\_SB_.BAT1._BIF] (Node ffff88011a843050), AE_AML_INFINITE_LOOP (20131218/psparse-536)
[   68.963388] ACPI Exception: AE_AML_INFINITE_LOOP, Evaluating _BIF (20131218/battery-443)
[   68.963588] wmi: Mapper loaded

The patch was tested against 3.14.6-1 (commit a1bc295d7a4be9 at kernel.org). I can try testing it against mainline if needed.
Could you add some debug messages next time, just for testing purposes? With battery.patch I can see in the log that it works, but the last patch is silent, so it is hard to tell whether it changes anything.

P.S. Sorry, but links like [https://bugzilla.kernel.org/attachment.cgi?id=106887&action=edit] are broken for me; [https://bugzilla.kernel.org/attachment.cgi?id=106887] works much better.
Comment 19 Lan Tianyu 2014-07-08 07:21:26 UTC
The fix has been merged into the linux-pm tree.