Bug 8573 - ACPI Exception (battery-0216): AE_SUPPORT, Extracting _BST [20070126] - Acer Aspire 1640z
Summary: ACPI Exception (battery-0216): AE_SUPPORT, Extracting _BST [20070126] - Acer ...
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Battery
Hardware: i386 Linux
Importance: P2 high
Assignee: Alexey Starikovskiy
URL:
Keywords:
Duplicates: 10202 10339
Depends on:
Blocks:
 
Reported: 2007-06-03 11:18 UTC by jonne
Modified: 2008-04-03 09:35 UTC
CC List: 6 users

See Also:
Kernel Version: 2.6.21-1.3200.fc8
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
acpidump (110.55 KB, text/plain)
2007-06-09 10:19 UTC, jonne
dmesg.2.6.24-0.167-rc8.git4.fc9 (36.98 KB, text/plain)
2008-01-25 01:18 UTC, jonne
acpidump.2.6.24-0.167-rc8.git4.fc9 (110.55 KB, text/plain)
2008-01-25 01:18 UTC, jonne
patch to emit error messages for AE_SUPPORT (655 bytes, patch)
2008-01-31 15:19 UTC, Robert Moore
dmesg output of 2.6.24-custom kernel with utcopy.c patch (36.55 KB, text/plain)
2008-02-09 07:46 UTC, jonne
acpi output of 2.6.24-custom kernel with utcopy.c patch (110.55 KB, text/plain)
2008-02-09 07:47 UTC, jonne
patch vs 2.26.25-rc6 from bug 10202 (1.56 KB, patch)
2008-03-17 19:41 UTC, Len Brown

Description jonne 2007-06-03 11:18:22 UTC
I've had the exact same problem with many many different kernels (also on the
knoppix live dvd). Fedora people could not offer any help.
I cannot even find out whether it's a software or a hardware fault.

Distribution: Fedora 7
Hardware Environment: Acer Aspire 1640z
Software Environment:
Problem Description: Getting a battery exception right after HAL daemon is
started. Makes system unstable (usually it freezes within several minutes)

Steps to reproduce: Just booting, doing nothing special.

dmesg fragment:

ACPI Exception (battery-0216): AE_SUPPORT, Extracting _BST [20070126]
*** SLUB kmalloc-64: Redzone Active@0xf5969cb0 slab 0xc1ae2954 [Not tainted]
    offset=3248 flags=0x400000c3 inuse=10 freelist=0xf5969850
  Bytes b4 0xf5969ca0:  05 09 00 00 c5 0d fd ff 5a 5a 5a 5a 5a 5a 5a 5a
....
Comment 1 Zhang Rui 2007-06-08 00:49:51 UTC
Please attach the acpidump output.
This exception occurs when evaluating _BST method.
Please recompile the kernel without the ACPI battery driver
and see if there is any difference.
Comment 2 jonne 2007-06-09 10:19:02 UTC
Created attachment 11722
acpidump
Comment 3 jonne 2007-06-09 10:20:20 UTC
Thanks very much for your help, I really appreciate it.
I installed acpidump and now upload the attachment.
Further, I'm going to try to compile the kernel without ACPI battery driver like
you said and will report any new findings then.
Comment 4 jonne 2007-06-15 02:09:48 UTC
Also, I compiled kernel 2.6.21.4 without ACPI battery driver and,
well, what can I say. There was no Battery exception. Most things seemed to work just fine (wireless network did not work, but I'm not an expert, there can be many reasons).
It seems the exception does not *always* occur. Yesterday I was able to boot Fedora's newest development kernel and the exception did not occur. Everything worked, except that my battery level seemed to be stuck at 2%. After a reboot several hours later I got the battery exception again.
This weekend I'm going to try swapping the battery with one from somebody who owns the exact same notebook. Thanks for helping me!
Comment 5 jonne 2007-06-17 05:51:49 UTC
Replacing the battery solved the problem. I must conclude it is a hardware failure. Sorry for the noise; the battery exception is quite a clear indication of faulty hardware, I guess. Anyway, I guess I should look up my warranty and hope for the best. Thanks for helping.
I'm rejecting the bug now as INVALID, because I think the battery exception message should have been good enough. (A Windows XP machine with the faulty battery ran *extremely* slowly, and Acer's ePower management tool could not be started.)
Comment 6 Len Brown 2007-07-25 19:51:30 UTC
Linux should not be getting a red-zone error.
This looks like a failure in our error path.

> Replacing the battery solved the problem.

Keep your old battery, we may need it for testing:-)
Comment 7 jonne 2007-08-07 04:21:18 UTC
Just back from holidays.
I did not throw the old battery away,
so you can send me instructions on what to do with it if needed at some time.
Thanks again, Jonne.
Comment 8 Fu Michael 2007-11-06 22:30:04 UTC
Reassigning to Len to see what he wants to do with this malfunctioning battery... Thanks for your patience, Jonne.
Comment 9 Len Brown 2008-01-08 23:03:45 UTC
jonne,
can you reproduce this failure with 2.6.24-rc?
(there have been a lot of updates to battery.c in 2.6.24)
Comment 10 jonne 2008-01-25 00:55:15 UTC
Hi,

I just did another test. However, while writing this I see that you mention a newer kernel version than the one I tested with. I tried Fedora's newest stable kernel:

kernel-devel-2.6.23.14-107.fc8
kernel-2.6.23.14-107.fc8
kernel-headers-2.6.23.14-107.fc8

I booted with the faulty battery, and received a lot of error messages.
I'll try to see whether I can find these in files somewhere, but for now, I could only write some by hand, which was a bit difficult, because the screen was scrolling down all the time.

Here's what I could decipher:

....
BUG: Unable to handle kernel NULL pointer dereference at virtual address 000000001
....
Oops 0000 [#8] SMP
....
Call trace:
  copy_process
  do_filp_open
  do_fork
  copy_to_user
  sys_clone
  syscall_call
  xfrm_alloc_spi

I further noticed that there were several other call traces, which were very different from the one above.

Anyways, I see that you wanted me to test with a newer version, so I'll try to install that version now, and send a message here later.
Finally, if somebody wants to have my battery, it's fine with me, I'll send it over via mail, no problem.

Regards,
Jonne.
Comment 11 jonne 2008-01-25 01:18:02 UTC
Created attachment 14568
dmesg.2.6.24-0.167-rc8.git4.fc9
Comment 12 jonne 2008-01-25 01:18:29 UTC
Created attachment 14569
acpidump.2.6.24-0.167-rc8.git4.fc9
Comment 13 jonne 2008-01-25 01:22:09 UTC
You guys are really funny :)
I'm running the 2.6.24-0.167-rc8.git4.fc9 kernel now, from Fedora's development repository. I see it's an rc, maybe not the very latest, but you didn't specify which one either, so I hope it's okay.

Anyways, ..., I could boot with the faulty battery !!!
I didn't see any error message !!!
Wireless is working ... everything just seems ok.

However, ... ;)
The dmesg output does show some errors, so I'm attaching dmesg and acpidump.

Have a nice day,
Jonne.

ps > battery monitor in the tray also works ;) It says I'm now 50% and charging :P I'm going to pull the plug out now.
By the way, I remember in the past the battery did work sometimes, with a lot of luck, I'll try to reboot another time and see what happens, maybe this is not what you expected to happen ...
Comment 14 jonne 2008-01-25 01:23:30 UTC
I pulled the plug and now battery monitor says 100% charged, with 0:00 hours remaining ... 
Comment 15 Alexey Starikovskiy 2008-01-25 07:53:16 UTC
dmesg shows the _same_ errors, but now they don't break the battery driver.
The slab debug output shows the problem: ACPI wrote past the end of an allocated buffer of 64 bytes.
The buffer is expected to be 4 * sizeof(ACPI_OBJECT) + 1 * sizeof(ACPI_OBJECT), as it is supposed to hold the PBST package of 4 integers (one object per element plus one for the package itself), thus 80 bytes, not 64.
There is a problem with this DSDT which might be connected: PBST references an absent Z00A object. Acpiexec from 20080123 gives the buffer length as 96 bytes.
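
The arithmetic in this comment can be checked with a small sketch. It is only an illustration under stated assumptions: struct fake_acpi_object and package_bytes() are hypothetical names, and the 16-byte object size is inferred from the figures above (80 bytes / 5 objects on i386), not taken from the ACPICA headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one returned ACPI object on i386.
 * The 16-byte size is an assumption derived from 80 / 5. */
struct fake_acpi_object {
	uint32_t type;
	uint32_t payload[3];
};

static_assert(sizeof(struct fake_acpi_object) == 16,
	      "sketch assumes a 16-byte object");

/* Bytes needed to return a package of `count` simple elements:
 * one object for the package itself plus one per element. */
static size_t package_bytes(size_t count)
{
	return (1 + count) * sizeof(struct fake_acpi_object);
}
```

Under these assumptions package_bytes(4) is 80, so the 64-byte allocation reported by slab is short by exactly one object.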
Comment 16 Alexey Starikovskiy 2008-01-25 07:56:14 UTC
jonne,
Are you able to patch the kernel?
Comment 17 Robert Moore 2008-01-29 16:03:13 UTC
I've tried to reproduce this here, no luck so far.

There exists the possibility that the caller to acpi_evaluate_object is passing in an acpi_buffer object with an incorrect length. In other words, the battery driver may be reporting a pre-allocated buffer of sufficient length, but the actual buffer is smaller than reported. This could result in the buffer overflow, as the acpica code will trust the reported length of the buffer.

Alexey, please take a look at what parameters are being passed to evaluate_object.
Thanks.
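
The possibility raised in this comment can be modelled in a few lines. This is a hypothetical sketch, not ACPICA code: struct out_buffer, fill() and the FILL_* codes are invented for illustration. It shows that a callee can only validate a request against the length the caller reports, never against the real size of the storage behind the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-supplied output buffer: a length and a pointer. */
struct out_buffer {
	size_t length;	/* size the caller CLAIMS the storage has */
	void *pointer;	/* the storage itself */
};

enum fill_status { FILL_OK, FILL_TOO_SMALL };

/* Copy `needed` bytes into the caller's buffer, trusting buf->length. */
static enum fill_status fill(struct out_buffer *buf, const void *src,
			     size_t needed)
{
	assert(buf && buf->pointer && src);

	if (buf->length < needed) {
		buf->length = needed;	/* report how much is required */
		return FILL_TOO_SMALL;
	}
	/* If the caller over-reported length, this memcpy overruns. */
	memcpy(buf->pointer, src, needed);
	buf->length = needed;		/* amount actually used */
	return FILL_OK;
}
```

With 64 bytes of storage honestly reported as 64, a request for 80 bytes is refused. An overflow of the kind suspected here needs the caller to report 80 or more while the storage is smaller.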
Comment 18 Alexey Starikovskiy 2008-01-29 16:14:01 UTC
It is called with a NULL/ACPI_ALLOCATE buffer, so it is acpi_evaluate_object that creates the wrong size.
Also, did you notice that ut_initialize_buffer may shrink a preallocated buffer?
Comment 19 Robert Moore 2008-01-31 09:14:42 UTC
Yes, the Buffer->Length is set to the amount of buffer actually used. It doesn't really "shrink" the buffer.
Comment 20 Robert Moore 2008-01-31 10:51:16 UTC
I think we will need an ACPI trace of the _BST execution to determine the reason behind both the AE_SUPPORT and buffer problems.
Comment 21 Robert Moore 2008-01-31 15:19:11 UTC
Created attachment 14667
patch to emit error messages for AE_SUPPORT


This small patch to utcopy.c will emit an informative message when the AE_SUPPORT exception occurs. Please try it, it may give us some more information.
Comment 22 jonne 2008-02-09 05:17:40 UTC
I've been working on this today. I'm following this guide: http://www.howtoforge.com/kernel_compilation_fedora
Downloaded this kernel source package: kernel-2.6.24-2.fc9.src.rpm

I could not apply the patch directly, because the patch was created with another utcopy.c version I guess. However, the comments in the file were unique in my version of the utcopy.c, so I could easily add the two extra statements there.

I'm now compiling the result - fingers crossed.
Comment 23 jonne 2008-02-09 07:46:38 UTC
Created attachment 14766
dmesg output of 2.6.24-custom kernel with utcopy.c patch
Comment 24 jonne 2008-02-09 07:47:19 UTC
Created attachment 14767
acpi output of 2.6.24-custom kernel with utcopy.c patch
Comment 25 jonne 2008-02-09 07:50:43 UTC
I cannot find any "Unsupported object type" messages anywhere
(I expected those due to the patch I applied).
Don't know what to do next, so I'll just replace my battery again
before it explodes and wait for any further suggestions ;)
Comment 26 Alexey Starikovskiy 2008-02-29 00:57:59 UTC
Please check if this patch helps:

From: Lin Ming <ming.m.lin@intel.com>

Fix a memory overflow bug when copying
NULL internal package element object to external.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/acpi/utilities/utobject.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/drivers/acpi/utilities/utobject.c
===================================================================
--- linux-2.6.orig/drivers/acpi/utilities/utobject.c
+++ linux-2.6/drivers/acpi/utilities/utobject.c
@@ -432,7 +432,7 @@ acpi_ut_get_simple_object_size(union acp
 	 * element -- which is legal)
 	 */
 	if (!internal_object) {
-		*obj_length = 0;
+		*obj_length = sizeof(union acpi_object);
 		return_ACPI_STATUS(AE_OK);
 	}
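
Why a one-line change in the sizing function matters can be sketched as follows. This is a simplified model, not the ACPICA implementation: element_size(), sized_bytes(), layout_bytes() and OBJ_SIZE are hypothetical, the 16-byte object size is the i386 figure implied by comment 15, and the model assumes that the copy pass reserves one slot per package element whether or not the element is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed size of one external object on i386 (80 bytes / 5 objects). */
#define OBJ_SIZE ((size_t)16)

/* Sizing pass for one element; a NULL element is legal in a package. */
static size_t element_size(const int *element, bool patched)
{
	if (!element)
		return patched ? OBJ_SIZE : 0;	/* 0 models the pre-patch code */
	return OBJ_SIZE;
}

/* Bytes the sizing pass asks the allocator for. */
static size_t sized_bytes(const int *const *elements, size_t count,
			  bool patched)
{
	size_t total = OBJ_SIZE;	/* the package object itself */

	assert(elements != NULL);
	for (size_t i = 0; i < count; i++)
		total += element_size(elements[i], patched);
	return total;
}

/* Bytes the copy pass lays out: one slot per element, NULL or not. */
static size_t layout_bytes(size_t count)
{
	return OBJ_SIZE * (1 + count);
}
```

For a 4-element package with one NULL element, the unpatched sizing gives 64 bytes while the layout needs 80, which matches the slab report earlier in this bug. With the patched sizing both passes agree.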
 
Comment 27 Alexey Starikovskiy 2008-03-15 00:49:53 UTC
*** Bug 10202 has been marked as a duplicate of this bug. ***
Comment 28 Len Brown 2008-03-17 19:41:36 UTC
Created attachment 15317
patch vs 2.26.25-rc6 from bug 10202

per bug #10132
Lin-Ming's patch in comment #26 is now upstream.

Here is Alexey's patch from bug 10202,
which has been applied to the ACPI tree
to address this issue.
Comment 29 Adrian Bunk 2008-03-18 09:20:22 UTC
now in Linus' tree as commit b8a1bdb14940946fcf0438a6337b2a6c54294fb8
Comment 30 Alexey Starikovskiy 2008-04-03 09:35:43 UTC
*** Bug 10339 has been marked as a duplicate of this bug. ***
