Bug 54831 - ACPI Error and fails to detect battery
Summary: ACPI Error and fails to detect battery
Status: CLOSED OBSOLETE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Battery
Hardware: All Linux
Importance: P1 normal
Assignee: Lan Tianyu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-05 09:30 UTC by swda289346
Modified: 2013-04-14 13:06 UTC

See Also:
Kernel Version: 3.7.5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump (408.33 KB, application/octet-stream)
2013-03-07 05:36 UTC, swda289346
enable_ec_debug.patch (297 bytes, patch)
2013-03-25 07:02 UTC, Lan Tianyu
bug-dmesg-kernel-3.7.10 (174.29 KB, application/octet-stream)
2013-03-31 02:32 UTC, swda289346
bug-dmesg-kernel-3.8.5 (194.05 KB, application/octet-stream)
2013-03-31 05:19 UTC, swda289346

Description swda289346 2013-03-05 09:30:35 UTC
Sometimes when I boot my notebook, the kernel fails to detect the battery. I then need to shut down, unplug the battery, and plug it in again.

(Probability is less than 10%.)

Error message:

[    2.034330] ACPI Exception: AE_AML_BUFFER_LIMIT, Index (0x00000000000000FF) is beyond end of object (20120913/exoparg2-418)
[    2.034335] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.BIF9] (Node ffff88022429d758), AE_AML_BUFFER_LIMIT (20120913/psparse-536)
[    2.034341] ACPI Error: Method parse/execution failed [\_SB_.PCI0.BAT0._BIF] (Node ffff88022429d460), AE_AML_BUFFER_LIMIT (20120913/psparse-536)
[    2.034344] ACPI Error: Method parse/execution failed [\_SB_.PCI0.BAT0._BIX] (Node ffff88022429d500), AE_AML_BUFFER_LIMIT (20120913/psparse-536)
[    2.034349] ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _BIX (20120913/battery-441)
Comment 1 Aaron Lu 2013-03-06 02:46:25 UTC
Hello,

Thanks for the report.

Please attach acpidump output:
# acpidump > acpidump.out

Did you have this problem before? Can you identify the kernel version in which it first appeared? Thanks.
Comment 2 swda289346 2013-03-07 05:36:27 UTC
Created attachment 94691
acpidump
Comment 3 swda289346 2013-03-07 05:41:38 UTC
I have had this bug ever since I installed Linux on my notebook, so I can't identify exactly which version introduced it.
My first installation was Ubuntu 12.04.
Comment 4 Lan Tianyu 2013-03-20 03:07:03 UTC
Have you tried Windows?
Comment 5 Lan Tianyu 2013-03-21 03:42:01 UTC
Furthermore, you are using an Ubuntu kernel. Have you tried the latest upstream kernel?
You can get it from https://www.kernel.org/ or clone an upstream tree with
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git


Hi Lv & Bob:
     The following looks like an ACPICA exception log. Could you have a look? The errors are related to the battery.
[    2.034330] ACPI Exception: AE_AML_BUFFER_LIMIT, Index (0x00000000000000FF) is beyond end of object (20120913/exoparg2-418)
[    2.034335] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.BIF9] (Node ffff88022429d758), AE_AML_BUFFER_LIMIT (20120913/psparse-536)
[    2.034341] ACPI Error: Method parse/execution failed [\_SB_.PCI0.BAT0._BIF] (Node ffff88022429d460), AE_AML_BUFFER_LIMIT (20120913/psparse-536)
[    2.034344] ACPI Error: Method parse/execution failed [\_SB_.PCI0.BAT0._BIX] (Node ffff88022429d500), AE_AML_BUFFER_LIMIT (20120913/psparse-536)
[    2.034349] ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _BIX (20120913/battery-441)
Comment 6 Robert Moore 2013-03-21 15:50:10 UTC
It looks like the buffer defined in BIF9 (of length 0x20) is being overrun with an index of 0xFF.

It also looks like there's some EC I/O going on, which may be where this index comes from. It may take an EC driver expert to determine the exact cause.

        Method (BIF9, 0, NotSerialized)
        {
            Name (BSTR, Buffer (0x20) {})
            Store (SMBR (RDBL, BADR, 0x21), Local0)

            If (LNotEqual (DerefOf (Index (Local0, Zero)), Zero))
            {
                Store (MNAM, BSTR)
                Store (Zero, Index (BSTR, 0x04))

            }
            Else
            {
                Store (DerefOf (Index (Local0, 0x02)), BSTR)
                Store (Zero, Index (BSTR, DerefOf (Index (Local0, One))))
            }

            Return (BSTR)
        }
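
The failure described above can be modelled outside the interpreter. The following is a minimal C sketch, not ACPICA code: `store_zero_at`, `enum status` and `BSTR_LEN` are hypothetical names chosen for illustration. It assumes the interpreter rejects any index that is greater than or equal to the buffer length, which is the behaviour the AE_AML_BUFFER_LIMIT message reports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; they only mimic the name seen in the log. */
enum status { STATUS_OK = 0, STATUS_BUFFER_LIMIT = 1 };

#define BSTR_LEN 0x20 /* size of the BSTR buffer declared in BIF9 */

/*
 * Model of "Store (Zero, Index (BSTR, idx))": write a zero byte at idx,
 * refusing any index that lies beyond the end of the buffer.
 */
static enum status store_zero_at(uint8_t *buf, size_t len, size_t idx)
{
    if (idx >= len)
        return STATUS_BUFFER_LIMIT; /* index is beyond end of object */
    buf[idx] = 0;
    return STATUS_OK;
}
```

With a 0x20-byte buffer, an index of 0x04 (the fixed index in the first branch of BIF9) is accepted, while an index of 0xFF is rejected without touching memory.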
Comment 7 swda289346 2013-03-22 06:03:21 UTC
This notebook has Windows preinstalled. I think there is no problem in Windows, though I seldom boot to Windows.

Since bbswitch has a problem with kernel 3.8, I can't use a newer kernel.
Comment 8 Lan Tianyu 2013-03-22 14:57:06 UTC
(In reply to comment #6)
> It looks like the buffer defined in BIF9 (of length 0x20) is being overrun
> with an index of 0xFF.
> 
> It also looks like there's some EC I/O going on, which may be where this
> index comes from. It may take an EC driver expert to determine the exact cause.
Do you mean the EC driver may affect the index?
But from the OS point of view, how would the EC driver know what the ACPI table wants to do?

> 
>         Method (BIF9, 0, NotSerialized)
>         {
>             Name (BSTR, Buffer (0x20) {})
>             Store (SMBR (RDBL, BADR, 0x21), Local0)
> 
>             If (LNotEqual (DerefOf (Index (Local0, Zero)), Zero))
>             {
>                 Store (MNAM, BSTR)
>                 Store (Zero, Index (BSTR, 0x04))
> 
>             }
>             Else
>             {
>                 Store (DerefOf (Index (Local0, 0x02)), BSTR)
>                 Store (Zero, Index (BSTR, DerefOf (Index (Local0, One))))
>             }
> 
>             Return (BSTR)
>         }
Comment 9 Robert Moore 2013-03-22 15:10:10 UTC
The index value is obtained from reading from the EC, see below.



In BIF9, this is most likely the statement that causes the exception:

  Store (Zero, Index (BSTR, DerefOf (Index (Local0, One))))

Local0 is set earlier here, from a call to SMBR:

  Store (SMBR (RDBL, BADR, 0x21), Local0)

SMBR returns this package:

        Method (SMBR, 3, Serialized)
        {
            Store (Package (0x03)
                {
                    0x07, 
                    Zero, 
                    Zero
                }, Local0)

Index 1 of this package is what appears to be set to 0xFF, probably here:

SMBR:
  Store (BCNT, Index (Local0, One))


BCNT is obtained from the EC:

            OperationRegion (SMBX, EmbeddedControl, 0x18, 0x28)
            Field (SMBX, ByteAcc, NoLock, Preserve)
            {
                PRTC,   8, 
                SSTS,   5, 
                    ,   1, 
                ALFG,   1, 
                CDFG,   1, 
                ADDR,   8, 
                CMDB,   8, 
                BDAT,   256, 
                BCNT,   8, 
                    ,   1, 
                ALAD,   7, 
                ALD0,   8, 
                ALD1,   8
            }
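
The EC address of each field follows from the declaration above. The sketch below is an illustration, not part of any driver: the table and the helper names (`smbx_fields`, `smbx_field_address`, `smbx_total_bytes`) are hypothetical. It assumes the field units are packed back to back in declaration order, starting at the region offset 0x18, since the Field declaration contains no Offset() directives.

```c
#include <stddef.h>
#include <string.h>

/* Field widths in bits, in declaration order; "" marks unnamed padding. */
struct field_unit {
    const char *name;
    unsigned bits;
};

static const struct field_unit smbx_fields[] = {
    { "PRTC", 8 }, { "SSTS", 5 }, { "", 1 }, { "ALFG", 1 }, { "CDFG", 1 },
    { "ADDR", 8 }, { "CMDB", 8 }, { "BDAT", 256 }, { "BCNT", 8 },
    { "", 1 }, { "ALAD", 7 }, { "ALD0", 8 }, { "ALD1", 8 },
};

#define SMBX_BASE    0x18u /* region offset in the EC address space */
#define SMBX_NFIELDS (sizeof smbx_fields / sizeof smbx_fields[0])

/* Return the EC byte address where the named field starts, or -1. */
static int smbx_field_address(const char *name)
{
    unsigned bit = 0; /* running bit offset from the region start */

    for (size_t i = 0; i < SMBX_NFIELDS; i++) {
        if (strcmp(smbx_fields[i].name, name) == 0)
            return (int)(SMBX_BASE + bit / 8);
        bit += smbx_fields[i].bits;
    }
    return -1;
}

/* Total size of all declared fields in bytes. */
static unsigned smbx_total_bytes(void)
{
    unsigned bits = 0;

    for (size_t i = 0; i < SMBX_NFIELDS; i++)
        bits += smbx_fields[i].bits;
    return bits / 8;
}
```

Under that packing assumption the fields add up to 0x28 bytes, which matches the declared region length, BDAT occupies EC addresses 0x1C through 0x3B, and BCNT sits at 0x3C.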
Comment 10 Lan Tianyu 2013-03-25 07:00:10 UTC
(In reply to comment #9)
> BCNT is obtained from the EC:
> 
>             OperationRegion (SMBX, EmbeddedControl, 0x18, 0x28)
>             Field (SMBX, ByteAcc, NoLock, Preserve)
>             {
>                 PRTC,   8, 
>                 SSTS,   5, 
>                     ,   1, 
>                 ALFG,   1, 
>                 CDFG,   1, 
>                 ADDR,   8, 
>                 CMDB,   8, 
>                 BDAT,   256, 
>                 BCNT,   8, 
>                     ,   1, 
>                 ALAD,   7, 
>                 ALD0,   8, 
>                 ALD1,   8
>             }
Thanks, Bob. Yes, I see. Not sure why BCNT returns 0xff.
Enabling EC driver debug output may give us more clues.
Comment 11 Lan Tianyu 2013-03-25 07:02:03 UTC
Created attachment 96101
enable_ec_debug.patch

Please try this patch and attach the dmesg output from a boot where the issue occurs.
Thanks in advance.
Comment 12 swda289346 2013-03-31 02:32:06 UTC
Created attachment 96701
bug-dmesg-kernel-3.7.10
Comment 13 swda289346 2013-03-31 05:19:41 UTC
Created attachment 96711
bug-dmesg-kernel-3.8.5
Comment 14 Lan Tianyu 2013-04-03 14:52:00 UTC
[    1.651678] ACPI: EC: transaction start (cmd=0x80, addr=0x3b)
[    1.651679] ACPI: EC: <--- command = 0x80
[    1.651830] ACPI: EC: ---> status = 0x28
[    1.651835] ACPI: EC: ~~~> interrupt, status:0x28
[    1.651837] ACPI: EC: <--- data = 0x3b
[    1.651889] ACPI: EC: ---> status = 0x21
[    1.651890] ACPI: EC: ~~~> interrupt, status:0x21
[    1.651893] ACPI: EC: ---> data = 0xff
[    1.651897] ACPI: EC: ---> status = 0x20
[    1.651901] ACPI: EC: ---> status = 0x20
[    1.651909] ACPI: EC: ---> status = 0x20
[    1.651910] ACPI: EC: transaction end
[    1.651914] ACPI: EC: ---> status = 0x20
[    1.651962] ACPI Exception: AE_AML_BUFFER_LIMIT, Index (0x00000000000000FF) is beyond end of object (20121018/exoparg2-418)
[    1.651966] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.BIF9] (Node ffff88022429c758), AE_AML_BUFFER_LIMIT (20121018/psparse-537)
[    1.651971] ACPI Error: Method parse/execution failed [\_SB_.PCI0.BAT0._BIF] (Node ffff88022429c460), AE_AML_BUFFER_LIMIT (20121018/psparse-537)
[    1.651975] ACPI Error: Method parse/execution failed [\_SB_.PCI0.BAT0._BIX] (Node ffff88022429c500), AE_AML_BUFFER_LIMIT (20121018/psparse-537)
[    1.651979] ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _BIX (20121018/battery-441)

Hi Bob:
         Looking at the log, the EC returns 0xff for BCNT. The EC driver is only in charge of reading BCNT and does not touch the data.
         Do you have any suggestions?
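
The status bytes in the log above (0x28, 0x21, 0x20) can be decoded with the bit layout that the ACPI specification defines for the embedded controller status register. The following is a minimal sketch for reading such logs; `struct ec_status`, `ec_decode` and the `EC_FLAG_*` macro names are hypothetical and not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register bits of the ACPI embedded controller interface. */
#define EC_FLAG_OBF   0x01 /* output buffer full: data is ready for the host */
#define EC_FLAG_IBF   0x02 /* input buffer full: EC has not consumed the last write */
#define EC_FLAG_CMD   0x08 /* last byte written was a command, not data */
#define EC_FLAG_BURST 0x10 /* burst mode is enabled */
#define EC_FLAG_SCI   0x20 /* an SCI event is pending */

struct ec_status {
    bool obf, ibf, cmd, burst, sci;
};

/* Split a raw status byte into its individual flags. */
static struct ec_status ec_decode(uint8_t status)
{
    struct ec_status s = {
        .obf   = (status & EC_FLAG_OBF) != 0,
        .ibf   = (status & EC_FLAG_IBF) != 0,
        .cmd   = (status & EC_FLAG_CMD) != 0,
        .burst = (status & EC_FLAG_BURST) != 0,
        .sci   = (status & EC_FLAG_SCI) != 0,
    };
    return s;
}
```

Read this way, 0x28 means the read command (0x80) was accepted, 0x21 means a data byte is waiting in the output buffer, and 0x20 means the buffer has been drained; an SCI event is pending in all three. That is an ordinary read sequence, so the 0xff comes back as regular data rather than from a failed transaction.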
Comment 15 Robert Moore 2013-04-03 14:56:39 UTC
I would think that it must be either a BIOS/ASL bug or a hardware issue. Certainly we are not going to allow the ASL to index beyond the end of a buffer. Perhaps a new BIOS is available for the machine?
Comment 16 Lan Tianyu 2013-04-03 15:03:20 UTC
Bob, thanks for the reply. I think so, too.

Hi  swda:
       Can you test Windows again? Is there a newer BIOS? As Bob said, this is more likely to be a BIOS or hardware problem. I tend to think it's a hardware issue because it asks for a buffer of length 0xff.
Comment 17 Lan Tianyu 2013-04-09 05:14:34 UTC
ping ...
Comment 18 swda289346 2013-04-11 06:43:46 UTC
There was a BIOS update, but I had already installed it.

I googled the issue and can't find anybody who has this bug in Windows. Maybe it is a hardware bug that only occurs in Linux?
Comment 19 Lan Tianyu 2013-04-13 07:17:10 UTC
Hi, sorry for the late reply.
     As you said, the "Probability is less than 10%", so the battery hardware may be unstable at times and return an erroneous value. Furthermore, these operations all happen in the BIOS code; the Linux battery driver just calls the _BIX method to get battery-related information, and the EC driver is only in charge of fetching the data. From the log, the EC works normally.

(In reply to comment #18)
> I google for the issue and can't find anybody that has this bug in Windows.
> Maybe it is a hardware bug and only occurs in Linux?
I think this is a hardware stability problem rather than a bug. I assume the problem does not occur on all machines of this kind.

At this point, what I can do to help is retry reading the battery info when it fails, until the read succeeds or the limit of five retries is used up.
Please try the following patch.

diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
index c5cd5b5..cefe5ed 100644
--- a/drivers/acpi/battery.c
+++ b/drivers/acpi/battery.c
@@ -1064,7 +1064,7 @@ static int battery_notify(struct notifier_block *nb,

 static int acpi_battery_add(struct acpi_device *device)
 {
-       int result = 0;
+       int result = 0, count = 5;
        struct acpi_battery *battery = NULL;
        acpi_handle handle;
        if (!device)
@@ -1078,12 +1078,17 @@ static int acpi_battery_add(struct acpi_device *device)
        device->driver_data = battery;
        mutex_init(&battery->lock);
        mutex_init(&battery->sysfs_lock);
+
+retry:
        if (ACPI_SUCCESS(acpi_get_handle(battery->device->handle,
                        "_BIX", &handle)))
                set_bit(ACPI_BATTERY_XINFO_PRESENT, &battery->flags);
        result = acpi_battery_update(battery);
-       if (result)
+       if (result && count--)
+               goto retry;
+       else if (result)
                goto fail;
+
 #ifdef CONFIG_ACPI_PROCFS_POWER
        result = acpi_battery_add_fs(device);
 #endif
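
The retry logic in the patch can be tried in isolation. The sketch below is a userspace illustration of the same control flow, not kernel code: `update_with_retry`, `update_fn` and `fail_first_n` are hypothetical names, and `fail_first_n` stands in for acpi_battery_update() as a call that fails a given number of times before succeeding. It leaves out the _BIX handle lookup that the patch repeats on every pass.

```c
/* Hypothetical stand-in for the update call: returns 0 on success. */
typedef int (*update_fn)(void *ctx);

/*
 * Same control flow as "if (result && count--) goto retry;":
 * try once, then retry while the call fails and retries remain.
 * Returns the last result and stores the number of calls in *attempts.
 */
static int update_with_retry(update_fn update, void *ctx,
                             int retries, int *attempts)
{
    int result;

    *attempts = 0;
    do {
        result = update(ctx);
        (*attempts)++;
    } while (result && retries--);

    return result;
}

/* Example update function: fails while the counter in ctx is positive. */
static int fail_first_n(void *ctx)
{
    int *left = ctx;

    if (*left > 0) {
        (*left)--;
        return -1;
    }
    return 0;
}
```

With a retry count of 5, a call that always fails is attempted six times in total (the first attempt plus five retries) before the failure is reported, while a call that fails twice succeeds on the third attempt.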
Comment 20 swda289346 2013-04-14 13:01:02 UTC
Thanks.

Should I close this bug?
Comment 21 Lan Tianyu 2013-04-14 13:06:29 UTC
Does this patch work for you? I am closing the bug.
