Bug 6980 - Reading battery state causes oops, or AE_NOT_FOUND or AE_AML_OPERAND_TYPE
Summary: Reading battery state causes oops, or AE_NOT_FOUND or AE_AML_OPERAND_TYPE
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core (show other bugs)
Hardware: i386 Linux
: P2 normal
Assignee: Robert Moore
URL:
Keywords:
: 7386 (view as bug list)
Depends on:
Blocks:
 
Reported: 2006-08-09 07:50 UTC by Johan Rutgeerts
Modified: 2007-03-08 00:52 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.15.22, 2.6.17.7
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
oops report with 2.6.18-rc4 (3.71 KB, text/plain)
2006-08-10 06:40 UTC, Johan Rutgeerts
Details
acpidump output (77.16 KB, text/plain)
2006-08-10 07:31 UTC, Johan Rutgeerts
Details

Description Johan Rutgeerts 2006-08-09 07:50:45 UTC
Most recent kernel where this bug did not occur: unknown (also occurs with 
2.6.15.22)
Distribution: Ubuntu
Hardware Environment: 686
Software Environment: kde
Problem Description: frequent (e.g. daily) kernel oopses:

[17186277.420000] BUG: unable to handle kernel NULL pointer dereference at 
virtual address 00000002
 [17186277.420000]  printing eip:
 [17186277.420000] c01f156a
 [17186277.420000] *pde = 00000000
 [17186277.420000] Oops: 0000 [#1]
 [17186277.420000] SMP 
 [17186277.420000] Modules linked in: binfmt_misc rfcomm l2cap bluetooth ppdev 
speedstep_ich speedstep_lib cpufreq_userspace cpufreq_stats freq_table 
cpufreq_powersave cpufreq_ondemand cpufreq_conservative video button battery 
container ac dm_mod ipv6 af_packet md_mod ide_cs lp pcmcia 3c59x snd_intel8x0 
snd_ac97_codec snd_ac97_bus yenta_socket rsrc_nonstatic mii pcmcia_core joydev 
snd_pcm_oss snd_mixer_oss hw_random snd_pcm snd_timer snd parport_pc parport 
psmouse soundcore snd_page_alloc floppy intel_agp agpgart shpchp pci_hotplug 
serio_raw tsdev evdev usbhid ext3 jbd mbcache ide_generic uhci_hcd usbcore 
ide_cd cdrom ide_disk piix generic thermal processor fan vga16fb cfbcopyarea 
vgastate cfbimgblt cfbfillrect
 [17186277.420000] CPU:    0
 [17186277.420000] EIP:    0060:[acpi_ex_store+273/519]    Not tainted VLI
 [17186277.420000] EFLAGS: 00210282   (2.6.17.7 #1) 
 [17186277.420000] EIP is at acpi_ex_store+0x111/0x207
 [17186277.420000] eax: 00000000   ebx: dfe7b8bc   ecx: 00000000   edx: 
00000000
 [17186277.420000] esi: 00000000   edi: dfe7b754   ebp: 00000014   esp: 
d9ee3e04
 [17186277.420000] ds: 007b   es: 007b   ss: 0068
 [17186277.420000] Process hald (pid: 3808, threadinfo=d9ee2000 task=defd1030)
 [17186277.420000] Stack: dfe7b8bc d87ed5bc 00000000 d87ed400 c01f0eea 
d87ed400 d87ed5bc 00000003 
 [17186277.420000]        00000000 dfe7b8bc d87ed400 00000000 c75a93a0 
00000007 c01e84a8 d9ee3e60 
 [17186277.420000]        d87ed400 00000000 d87ed5ec c01f61b3 e080bfac 
00000000 00200286 c75a9288 
 [17186277.420000] Call Trace:
 [17186277.420000]  <c01f0eea> acpi_ex_opcode_2A_1T_1R+0x33b/0x368  <c01e84a8> 
acpi_ds_exec_end_op+0xb3/0x363
 [17186277.420000]  <c01f61b3> acpi_ps_parse_loop+0x55f/0x848  <c01f5acb> 
acpi_ps_parse_aml+0x45/0x1ce
 [17186277.420000]  <c01f68c8> acpi_ps_execute_pass+0x6b/0x7b  <c01f698c> 
acpi_ps_execute_method+0xb4/0x140
 [17186277.420000]  <c01f3fcd> acpi_ns_evaluate_by_handle+0xd1/0x12a  
<c01f4137> acpi_ns_evaluate_relative+0xa7/0xc8
 [17186277.420000]  <c01f38cd> acpi_evaluate_object+0x109/0x1b4  <c0170003> 
__link_path_walk+0x4b2/0xed4
 [17186277.420000]  <e0a5a66b> acpi_battery_read_state+0x94/0x242 [battery]  
<c0180129> seq_read+0x0/0x2ba
 [17186277.420000]  <c0180190> seq_read+0x67/0x2ba  <c0180129> 
seq_read+0x0/0x2ba
 [17186277.420000]  <c01623fd> vfs_read+0xa2/0x158  <c0162ea0> 
sys_read+0x41/0x6a
 [17186277.420000]  <c0102ceb> sysenter_past_esp+0x54/0x75 
 [17186277.420000] Code: 8b 43 18 8b 38 89 e2 89 f0 e8 1d 96 00 00 89 c2 85 c0 
0f 85 01 01 00 00 31 f6 85 ff 75 0a eb 13 89 f8 e8 b2 97 00 00 46 8b 43 10 
<0f> b7 40 02 39 c6 72 ed 8b 53 18 8b 04 24 89 02 be 01 00 00 00 
 [17186277.420000] EIP: [acpi_ex_store+273/519] acpi_ex_store+0x111/0x207 
SS:ESP 0068:d9ee3e04




dmesg output of similar oops'es for other kernel versions can be found at
<http://people.mech.kuleuven.be/~jrutgeer/oopses/>.


Greetings,
Johan
Comment 1 Len Brown 2006-08-09 22:45:41 UTC
Looks like something is reading /proc/acpi/battery/*/state
and that is crashing the AML interpreter.

If this is specific to the battery, then it will go away
if you "rmmod battery".  To make it happen sooner, you could
probably "cat /proc/acpi/battery/*/state" in a loop.

Can you reproduce this failure with 2.6.18-rc4 or later?
It would be interesting to know if this has _always_ happened -- 
ie. if you boot a kernel even older than 2.6.15 do you see this?
(eg. 2.6.9)


Please attach the output from acpidump, available in pmtools here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
Comment 2 Johan Rutgeerts 2006-08-10 06:40:29 UTC
Created attachment 8745 [details]
oops report with 2.6.18-rc4

It also happens with 2.6.18-rc4.
Comment 3 Johan Rutgeerts 2006-08-10 07:28:58 UTC
With 2.6.18-rc4, I get when booting:

[   84.272000] Lukewarm IQ detected in hotplug locking
[   84.272000] BUG: warning at kernel/cpu.c:38/lock_cpu_hotplug()
[   84.272000]  [<c0138fc5>] lock_cpu_hotplug+0x74/0x7d
[   84.272000]  [<c01306f4>] __create_workqueue+0x39/0x11d
[   84.272000]  [<e0a6b441>] cpufreq_governor_dbs+0x29d/0x2e7 
[cpufreq_ondemand]
[   84.272000]  [<c024540f>] __cpufreq_governor+0x3c/0xc2
[   84.272000]  [<c024565e>] __cpufreq_set_policy+0xc5/0x10c
[   84.272000]  [<c0245857>] store_scaling_governor+0xa2/0x1ba
[   84.272000]  [<c0246162>] handle_update+0x0/0x5
[   84.272000]  [<c01d3900>] kobject_set_name+0x80/0x98
[   84.272000]  [<c02457b5>] store_scaling_governor+0x0/0x1ba
[   84.272000]  [<c02452b3>] store+0x2e/0x3e
[   84.272000]  [<c01a38a3>] sysfs_write_file+0x84/0xc9
[   84.272000]  [<c01a381f>] sysfs_write_file+0x0/0xc9
[   84.272000]  [<c0167c2b>] vfs_write+0xa4/0x162
[   84.272000]  [<c0168598>] sys_write+0x41/0x6a
[   84.272000]  [<c0102dd5>] sysenter_past_esp+0x56/0x79

Not sure if this is related.



When issuing "cat /proc/acpi/battery/BAT1/state" in a loop, every once in a 
while i get an error, with each time in dmesg:

ACPI Error (psargs-0355): [STAT] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed 
[\_SB_.PCI0.LPCB.BAT1._BST] (Node df8c4784), AE_NOT_FOUND
ACPI Exception (acpi_battery-0206): AE_NOT_FOUND, Evaluating _BST [20060707]

and sometimes also:

ACPI Error (exstore-0296): Target is not a Reference or Constant object - 
Integer [dfe7ee34] [20060707]
ACPI Error (psparse-0537): Method parse/execution failed 
[\_SB_.PCI0.LPCB.BAT1._BST] (Node df8c4784), AE_AML_OPERAND_TYPE
ACPI Exception (acpi_battery-0206): AE_AML_OPERAND_TYPE, Evaluating _BST 
[20060707]



I didn't succeed in reproducing the oops, but when running the loop long 
enough I get hundreds of thousands of lines in the kernel log saying

ACPI Warning (utdelete-0397): Large Reference Count (801) in object dfe7eaec 
[20060707]


with the count running up. This more or less completely uses up my processing 
power.

The exact command I issued was:
"for i in `seq 1 10000`; do cat /proc/acpi/battery/BAT1/state; dmesg |
tail -10; sleep 0.3; done"

Maybe 0.3 seconds sleep is too fast?


As far as I know, it didn't always happen, or at least not as frequently. It 
must have started sometime with an ubuntu update installing a newer kernel. 
I'll try with older versions and report.


Thanks,

Johan
Comment 4 Johan Rutgeerts 2006-08-10 07:31:03 UTC
Created attachment 8746 [details]
acpidump output
Comment 5 Robert Moore 2006-08-10 16:13:46 UTC
It would be interesting to know how often "once in a while" is. So far, I have 
not been able to reproduce the problem. Of course, it will be interesting to 
know if earlier versions did not show the problem.
Comment 6 Len Brown 2006-08-11 09:15:13 UTC
> earlier versions ...

From above we know that this happens with
ACPICA 20050902 (Linux 2.6.15.stable)
ACPICA 20060127 (Linux 2.6.16 and Linux 2.6.17)
ACPICA 20060707 (Linux 2.6.18)

So it looks like we've been fooled by this machine for some time
and this isn't the result of a recent regression.

Comment 7 Len Brown 2006-08-11 11:05:29 UTC
> [   84.272000] Lukewarm IQ detected in hotplug locking 
 
You can ignore this. 
It isn't related and should be gone at 2.6.18.final. 
 
Comment 8 Johan Rutgeerts 2006-08-12 04:14:27 UTC
> to know how often "once in a while" is. So far, I have 

Sometimes 3 minutes, sometimes 15... It usually doesn't take long.

Note that until now I didn't succeed in reproducing the actual oops by doing 
the 'cat /proc/acpi/.../state'. However, after a few minutes, the kernel 
consistently gets into a loop flooding the logs with "(utdelete-0397): Large 
Reference Count" messages. Sometimes this is preceded by some AE_NOT_FOUND and 
AE_AML_OPERAND_TYPE errors, sometimes not.




> not been able to reproduce the problem.

Just spoke to a colleague of mine who has the same portable as I (Dell 
Inspiron 2650): he gets the same oops'es if he enables the Gnome battery 
status applet.



> Of course, it will be interesting to know if earlier versions did not show 
the problem.

I'll report on this asap, I'm a bit short on time and I'm having troubles 
compiling an older kernel which actually boots, using the .config's I have. 


Thank you!

Johan
Comment 9 Johan Rutgeerts 2006-08-12 13:55:07 UTC
When booting with an Ubuntu Breezy live CD, with kernel version 2.6.12, I don't
seem to be getting any errors.
Comment 10 Johan Rutgeerts 2006-08-12 13:58:24 UTC
Correction:

With 2.6.12, I get many of these:

[4302572.547000] read EC, IB not empty
[4302572.597000] read EC, OB not full
[4302572.597000]     ACPI-0423: *** Error: Handler for [EmbeddedControl]
returned AE_TIME
[4302572.597000]     ACPI-0508: *** Error: Method execution failed
[\_SB_.PCI0.LPCB.BAT1._BST] (Node dfec5f20), AE_TIME


but AFAIK this was another issue which is resolved in more recent kernel versions.
Comment 11 Robert Moore 2006-08-31 14:11:58 UTC
I can reproduce one of the problems when I hit the method with 2 threads:

Creating 2 threads to execute 186A0 times each
ACPI Error (psargs-0459): [STAT] Namespace lookup failure, AE_NOT_FOUND
0 executions
**** AcpiExec: Exception AE_NOT_FOUND during execution of method [_BST] Opcode 
[-NamePath-] @A6

**** Exception AE_NOT_FOUND during execution of method 
[\_SB_.PCI0.LPCB.BAT1._BST] (Node 00461298)

You should try setting the AcpiGbl_AllMethodsSerialized flag to TRUE in order 
to force method serialization.


Comment 12 Johan Rutgeerts 2006-09-01 03:52:49 UTC
> You should try setting the AcpiGbl_AllMethodsSerialized flag to TRUE
> in order to force method serialization.

And how do I do that?


Comment 13 Robert Moore 2006-09-01 11:43:05 UTC
This looks like a bug in the DSDT. The _BST method is 
declared "NotSerialized", yet it creates a namespace object (STAT) AND it 
performs blocking I/O operations on the Embedded Controller that will 
relinquish the interpreter. A second thread that attempts to reenter the 
method at this time (during EC I/O) will fail, in unpredictable (timing-
dependent) ways.

Method (_BST, 0, NotSerialized)
{
    Name (STAT, Package (0x04)
    {
        0x01, 
        0x00, 
        0x0F28, 
        0x39D0
    })

    ... /* EC operations to get battery status */


    Return (STAT)
}

Changing the declaration above to:

Method (_BST, 0, Serialized)

Fixes the problem here.

You can either fix the DSDT and override the BIOS version, or set the 
serialized flag in the Linux configuration, I think it's like this:

acpi_serialize = TRUE
Comment 14 Len Brown 2006-09-01 12:19:17 UTC
Boot with the Linux kernel cmdline option "acpi_serialize".
Internally that sets acpi_gbl_all_methods_serialized = TRUE.
Comment 15 Johan Rutgeerts 2006-09-07 01:08:23 UTC
The "acpi_serialize" indeed seems to do the trick.

i didn't get a chance to test altering the DSDT yet.
Comment 16 Robert Moore 2006-09-12 14:15:21 UTC
The auto-serialize mechanism has been fixed for ACPICA version 20060912:

Fixed a regression where an error was no longer emitted if a control method 
attempts to create 2 objects of the same name. This once again returns 
AE_ALREADY_EXISTS. When this exception occurs, it invokes the mechanism that 
will dynamically serialize the control method to possibly prevent future 
errors.

Comment 17 Vladimir Lebedev 2006-11-12 07:12:01 UTC
*** Bug 7386 has been marked as a duplicate of this bug. ***
Comment 18 Len Brown 2007-03-08 00:52:50 UTC
ACPICA 20060912 shipped in 2.6.21-rc1
closed.

Note You need to log in before you can comment on or make changes to this bug.