Bug 6980
Summary: | Reading battery state causes oops, or AE_NOT_FOUND or AE_AML_OPERAND_TYPE | ||
---|---|---|---|
Product: | ACPI | Reporter: | Johan Rutgeerts (johan.rutgeerts) |
Component: | ACPICA-Core | Assignee: | Robert Moore (Robert.Moore) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | acpi-bugzilla, linux |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.15.22, 2.6.17.7 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
oops report with 2.6.18-rc4
acpidump output |
Description
Johan Rutgeerts
2006-08-09 07:50:45 UTC
Looks like something is reading /proc/acpi/battery/*/state and that is crashing the AML interpreter. If this is specific to the battery, it will go away if you "rmmod battery". To make it happen sooner, you could probably run "cat /proc/acpi/battery/*/state" in a loop.

Can you reproduce this failure with 2.6.18-rc4 or later? It would be interesting to know whether this has _always_ happened, i.e. if you boot a kernel even older than 2.6.15 (e.g. 2.6.9), do you see this?

Please attach the output from acpidump, available in pmtools here: http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

Created attachment 8745
oops report with 2.6.18-rc4
It also happens with 2.6.18-rc4.
With 2.6.18-rc4, I get the following when booting:

    [ 84.272000] Lukewarm IQ detected in hotplug locking
    [ 84.272000] BUG: warning at kernel/cpu.c:38/lock_cpu_hotplug()
    [ 84.272000] [<c0138fc5>] lock_cpu_hotplug+0x74/0x7d
    [ 84.272000] [<c01306f4>] __create_workqueue+0x39/0x11d
    [ 84.272000] [<e0a6b441>] cpufreq_governor_dbs+0x29d/0x2e7 [cpufreq_ondemand]
    [ 84.272000] [<c024540f>] __cpufreq_governor+0x3c/0xc2
    [ 84.272000] [<c024565e>] __cpufreq_set_policy+0xc5/0x10c
    [ 84.272000] [<c0245857>] store_scaling_governor+0xa2/0x1ba
    [ 84.272000] [<c0246162>] handle_update+0x0/0x5
    [ 84.272000] [<c01d3900>] kobject_set_name+0x80/0x98
    [ 84.272000] [<c02457b5>] store_scaling_governor+0x0/0x1ba
    [ 84.272000] [<c02452b3>] store+0x2e/0x3e
    [ 84.272000] [<c01a38a3>] sysfs_write_file+0x84/0xc9
    [ 84.272000] [<c01a381f>] sysfs_write_file+0x0/0xc9
    [ 84.272000] [<c0167c2b>] vfs_write+0xa4/0x162
    [ 84.272000] [<c0168598>] sys_write+0x41/0x6a
    [ 84.272000] [<c0102dd5>] sysenter_past_esp+0x56/0x79

Not sure if this is related.

When running "cat /proc/acpi/battery/BAT1/state" in a loop, every once in a while I get an error, each time with this in dmesg:

    ACPI Error (psargs-0355): [STAT] Namespace lookup failure, AE_NOT_FOUND
    ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.BAT1._BST] (Node df8c4784), AE_NOT_FOUND
    ACPI Exception (acpi_battery-0206): AE_NOT_FOUND, Evaluating _BST [20060707]

and sometimes also:

    ACPI Error (exstore-0296): Target is not a Reference or Constant object - Integer [dfe7ee34] [20060707]
    ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.BAT1._BST] (Node df8c4784), AE_AML_OPERAND_TYPE
    ACPI Exception (acpi_battery-0206): AE_AML_OPERAND_TYPE, Evaluating _BST [20060707]

I didn't succeed in reproducing the oops, but when running the loop long enough I get hundreds of thousands of lines in the kernel log saying

    ACPI Warning (utdelete-0397): Large Reference Count (801) in object dfe7eaec [20060707]

with the count running up.
This more or less completely uses up my processing power. The exact command I issued was:

    for i in `seq 1 10000`; do cat /proc/acpi/battery/BAT1/state; dmesg | tail -10; sleep 0.3; done

Maybe a 0.3-second sleep is too short? As far as I know, it didn't always happen, or at least not as frequently. It must have started with an Ubuntu update that installed a newer kernel. I'll try older versions and report.

Thanks,
Johan

Created attachment 8746
acpidump output
It would be interesting to know how often "once in a while" is. So far, I have not been able to reproduce the problem. Of course, it would also be interesting to know whether earlier versions were free of the problem.

> earlier versions ...
From above we know that this happens with
ACPICA 20050902 (Linux 2.6.15.stable)
ACPICA 20060127 (Linux 2.6.16 and Linux 2.6.17)
ACPICA 20060707 (Linux 2.6.18)
So it looks like we've been fooled by this machine for some time
and this isn't the result of a recent regression.
> [ 84.272000] Lukewarm IQ detected in hotplug locking
You can ignore this.
It isn't related and should be gone at 2.6.18.final.
> to know how often "once in a while" is. So far, I have

Sometimes 3 minutes, sometimes 15... It usually doesn't take long. Note that so far I haven't succeeded in reproducing the actual oops with 'cat /proc/acpi/.../state'. However, after a few minutes, the kernel consistently gets into a loop, flooding the logs with "(utdelete-0397): Large Reference Count" messages. Sometimes this is preceded by some AE_NOT_FOUND and AE_AML_OPERAND_TYPE errors, sometimes not.

> not been able to reproduce the problem.

I just spoke to a colleague of mine who has the same laptop as I do (Dell Inspiron 2650): he gets the same oopses if he enables the GNOME battery status applet.

> Of course, it will be interesting to know if earlier versions did not show the problem.

I'll report on this ASAP. I'm a bit short on time, and I'm having trouble compiling an older kernel that actually boots with the .config files I have.

Thank you!
Johan

When booting an Ubuntu Breezy live CD with kernel version 2.6.12, I don't seem to get any errors.

Correction: with 2.6.12, I get many of these:

    [4302572.547000] read EC, IB not empty
    [4302572.597000] read EC, OB not full
    [4302572.597000] ACPI-0423: *** Error: Handler for [EmbeddedControl] returned AE_TIME
    [4302572.597000] ACPI-0508: *** Error: Method execution failed [\_SB_.PCI0.LPCB.BAT1._BST] (Node dfec5f20), AE_TIME

but AFAIK this was another issue, which is resolved in more recent kernel versions.

I can reproduce one of the problems when I hit the method with 2 threads:

    Creating 2 threads to execute 186A0 times each
    ACPI Error (psargs-0459): [STAT] Namespace lookup failure, AE_NOT_FOUND
    0 executions
    **** AcpiExec: Exception AE_NOT_FOUND during execution of method [_BST] Opcode [-NamePath-] @A6
    **** Exception AE_NOT_FOUND during execution of method [\_SB_.PCI0.LPCB.BAT1._BST] (Node 00461298)

You should try setting the AcpiGbl_AllMethodsSerialized flag to TRUE in order to force method serialization.
> You should try setting the AcpiGbl_AllMethodsSerialized flag to TRUE
> in order to force method serialization.
And how do I do that?
This looks like a bug in the DSDT. The _BST method is declared "NotSerialized", yet it creates a namespace object (STAT) AND performs blocking I/O operations on the Embedded Controller, which relinquish the interpreter. A second thread that attempts to reenter the method at this time (during EC I/O) will fail in unpredictable (timing-dependent) ways.

    Method (_BST, 0, NotSerialized)
    {
        Name (STAT, Package (0x04)
        {
            0x01,
            0x00,
            0x0F28,
            0x39D0
        })
        ...
        /* EC operations to get battery status */
        Return (STAT)
    }

Changing the declaration above to:

    Method (_BST, 0, Serialized)

fixes the problem here.

You can either fix the DSDT and override the BIOS version, or set the serialized flag in the Linux configuration. I think it's like this:

    acpi_serialize = TRUE

Boot with the Linux kernel cmdline option "acpi_serialize". Internally that sets acpi_gbl_all_methods_serialized = TRUE.

The "acpi_serialize" option indeed seems to do the trick. I didn't get a chance to test altering the DSDT yet.

The auto-serialize mechanism has been fixed for ACPICA version 20060912:

    Fixed a regression where an error was no longer emitted if a control method attempts to create 2 objects of the same name. This once again returns AE_ALREADY_EXISTS. When this exception occurs, it invokes the mechanism that will dynamically serialize the control method to possibly prevent future errors.

*** Bug 7386 has been marked as a duplicate of this bug. ***

ACPICA 20060912 shipped in 2.6.21-rc1. Closed.