Most recent kernel where this bug did not occur: unknown (also occurs with 2.6.15.22) Distribution: Ubuntu Hardware Environment: 686 Software Environment: kde Problem Description: frequent (e.g. daily) kernel oopses: [17186277.420000] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000002 [17186277.420000] printing eip: [17186277.420000] c01f156a [17186277.420000] *pde = 00000000 [17186277.420000] Oops: 0000 [#1] [17186277.420000] SMP [17186277.420000] Modules linked in: binfmt_misc rfcomm l2cap bluetooth ppdev speedstep_ich speedstep_lib cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave cpufreq_ondemand cpufreq_conservative video button battery container ac dm_mod ipv6 af_packet md_mod ide_cs lp pcmcia 3c59x snd_intel8x0 snd_ac97_codec snd_ac97_bus yenta_socket rsrc_nonstatic mii pcmcia_core joydev snd_pcm_oss snd_mixer_oss hw_random snd_pcm snd_timer snd parport_pc parport psmouse soundcore snd_page_alloc floppy intel_agp agpgart shpchp pci_hotplug serio_raw tsdev evdev usbhid ext3 jbd mbcache ide_generic uhci_hcd usbcore ide_cd cdrom ide_disk piix generic thermal processor fan vga16fb cfbcopyarea vgastate cfbimgblt cfbfillrect [17186277.420000] CPU: 0 [17186277.420000] EIP: 0060:[acpi_ex_store+273/519] Not tainted VLI [17186277.420000] EFLAGS: 00210282 (2.6.17.7 #1) [17186277.420000] EIP is at acpi_ex_store+0x111/0x207 [17186277.420000] eax: 00000000 ebx: dfe7b8bc ecx: 00000000 edx: 00000000 [17186277.420000] esi: 00000000 edi: dfe7b754 ebp: 00000014 esp: d9ee3e04 [17186277.420000] ds: 007b es: 007b ss: 0068 [17186277.420000] Process hald (pid: 3808, threadinfo=d9ee2000 task=defd1030) [17186277.420000] Stack: dfe7b8bc d87ed5bc 00000000 d87ed400 c01f0eea d87ed400 d87ed5bc 00000003 [17186277.420000] 00000000 dfe7b8bc d87ed400 00000000 c75a93a0 00000007 c01e84a8 d9ee3e60 [17186277.420000] d87ed400 00000000 d87ed5ec c01f61b3 e080bfac 00000000 00200286 c75a9288 [17186277.420000] Call Trace: [17186277.420000] <c01f0eea> acpi_ex_opcode_2A_1T_1R+0x33b/0x368 <c01e84a8> acpi_ds_exec_end_op+0xb3/0x363 [17186277.420000] <c01f61b3> acpi_ps_parse_loop+0x55f/0x848 <c01f5acb> acpi_ps_parse_aml+0x45/0x1ce [17186277.420000] <c01f68c8> acpi_ps_execute_pass+0x6b/0x7b <c01f698c> acpi_ps_execute_method+0xb4/0x140 [17186277.420000] <c01f3fcd> acpi_ns_evaluate_by_handle+0xd1/0x12a <c01f4137> acpi_ns_evaluate_relative+0xa7/0xc8 [17186277.420000] <c01f38cd> acpi_evaluate_object+0x109/0x1b4 <c0170003> __link_path_walk+0x4b2/0xed4 [17186277.420000] <e0a5a66b> acpi_battery_read_state+0x94/0x242 [battery] <c0180129> seq_read+0x0/0x2ba [17186277.420000] <c0180190> seq_read+0x67/0x2ba <c0180129> seq_read+0x0/0x2ba [17186277.420000] <c01623fd> vfs_read+0xa2/0x158 <c0162ea0> sys_read+0x41/0x6a [17186277.420000] <c0102ceb> sysenter_past_esp+0x54/0x75 [17186277.420000] Code: 8b 43 18 8b 38 89 e2 89 f0 e8 1d 96 00 00 89 c2 85 c0 0f 85 01 01 00 00 31 f6 85 ff 75 0a eb 13 89 f8 e8 b2 97 00 00 46 8b 43 10 <0f> b7 40 02 39 c6 72 ed 8b 53 18 8b 04 24 89 02 be 01 00 00 00 [17186277.420000] EIP: [acpi_ex_store+273/519] acpi_ex_store+0x111/0x207 SS:ESP 0068:d9ee3e04 dmesg output of similar oops'es for other kernel versions can be found at <http://people.mech.kuleuven.be/~jrutgeer/oopses/>. Greetings, Johan
Looks like something is reading /proc/acpi/battery/*/state and that is crashing the AML interpreter. If this is specific to the battery, then it will go away if you "rmmod battery". To make it happen sooner, you could probably "cat /proc/acpi/battery/*/state" in a loop. Can you reproduce this failure with 2.6.18-rc4 or later? It would be interesting to know if this has _always_ happened -- ie. if you boot a kernel even older than 2.6.15 do you see this? (eg. 2.6.9) Please attach the output from acpidump, available in pmtools here: http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
Created attachment 8745 [details] oops report with 2.6.18-rc4 It also happens with 2.6.18-rc4.
With 2.6.18-rc4, I get when booting: [ 84.272000] Lukewarm IQ detected in hotplug locking [ 84.272000] BUG: warning at kernel/cpu.c:38/lock_cpu_hotplug() [ 84.272000] [<c0138fc5>] lock_cpu_hotplug+0x74/0x7d [ 84.272000] [<c01306f4>] __create_workqueue+0x39/0x11d [ 84.272000] [<e0a6b441>] cpufreq_governor_dbs+0x29d/0x2e7 [cpufreq_ondemand] [ 84.272000] [<c024540f>] __cpufreq_governor+0x3c/0xc2 [ 84.272000] [<c024565e>] __cpufreq_set_policy+0xc5/0x10c [ 84.272000] [<c0245857>] store_scaling_governor+0xa2/0x1ba [ 84.272000] [<c0246162>] handle_update+0x0/0x5 [ 84.272000] [<c01d3900>] kobject_set_name+0x80/0x98 [ 84.272000] [<c02457b5>] store_scaling_governor+0x0/0x1ba [ 84.272000] [<c02452b3>] store+0x2e/0x3e [ 84.272000] [<c01a38a3>] sysfs_write_file+0x84/0xc9 [ 84.272000] [<c01a381f>] sysfs_write_file+0x0/0xc9 [ 84.272000] [<c0167c2b>] vfs_write+0xa4/0x162 [ 84.272000] [<c0168598>] sys_write+0x41/0x6a [ 84.272000] [<c0102dd5>] sysenter_past_esp+0x56/0x79 Not sure if this is related. When issuing "cat /proc/acpi/battery/BAT1/state" in a loop, every once in a while i get an error, with each time in dmesg: ACPI Error (psargs-0355): [STAT] Namespace lookup failure, AE_NOT_FOUND ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.BAT1._BST] (Node df8c4784), AE_NOT_FOUND ACPI Exception (acpi_battery-0206): AE_NOT_FOUND, Evaluating _BST [20060707] and sometimes also: ACPI Error (exstore-0296): Target is not a Reference or Constant object - Integer [dfe7ee34] [20060707] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.BAT1._BST] (Node df8c4784), AE_AML_OPERAND_TYPE ACPI Exception (acpi_battery-0206): AE_AML_OPERAND_TYPE, Evaluating _BST [20060707] I didn't succeed in reproducing the oops, but when running the loop long enough I get hundreds of thousands of lines in the kernel log saying ACPI Warning (utdelete-0397): Large Reference Count (801) in object dfe7eaec [20060707] with the count running up. This more or less completely uses up my processing power. The exact command I issued was: "for i in `seq 1 10000`; do cat /proc/acpi/battery/BAT1/state; dmesg | tail -10; sleep 0.3; done" Maybe 0.3 seconds sleep is too fast? As far as I know, it didn't always happen, or at least not as frequently. It must have started sometime with an ubuntu update installing a newer kernel. I'll try with older versions and report. Thanks, Johan
Created attachment 8746 [details] acpidump output
It would be interesting to know how often "once in a while" is. So far, I have not been able to reproduce the problem. Of course, it will be interesting to know if earlier versions did not show the problem.
> earlier versions ... From above we know that this happens with ACPICA 20050902 (Linux 2.6.15.stable) ACPICA 20060127 (Linux 2.6.16 and Linux 2.6.17) ACPICA 20060707 (Linux 2.6.18) So it looks like we've been fooled by this machine for some time and this isn't the result of a recent regression.
> [ 84.272000] Lukewarm IQ detected in hotplug locking You can ignore this. It isn't related and should be gone at 2.6.18.final.
> to know how often "once in a while" is. So far, I have Sometimes 3 minutes, sometimes 15... It usually doesn't take long. Note that until now I didn't succeed in reproducing the actual oops by doing the 'cat /proc/acpi/.../state'. However, after a few minutes, the kernel consistently gets into a loop flooding the logs with "(utdelete-0397): Large Reference Count" messages. Sometimes this is preceded by some AE_NOT_FOUND and AE_AML_OPERAND_TYPE errors, sometimes not. > not been able to reproduce the problem. Just spoke to a colleague of mine who has the same portable as I (Dell Inspiron 2650): he gets the same oops'es if he enables the Gnome battery status applet. > Of course, it will be interesting to know if earlier versions did not show the problem. I'll report on this asap, I'm a bit short on time and I'm having troubles compiling an older kernel which actually boots, using the .config's I have. Thank you! Johan
When booting with an Ubuntu Breezy live CD, with kernel version 2.6.12, I don't seem to be getting any errors.
Correction: With 2.6.12, I get many of these: [4302572.547000] read EC, IB not empty [4302572.597000] read EC, OB not full [4302572.597000] ACPI-0423: *** Error: Handler for [EmbeddedControl] returned AE_TIME [4302572.597000] ACPI-0508: *** Error: Method execution failed [\_SB_.PCI0.LPCB.BAT1._BST] (Node dfec5f20), AE_TIME but AFAIK this was another issue which is resolved in more recent kernel versions.
I can reproduce one of the problems when I hit the method with 2 threads: Creating 2 threads to execute 186A0 times each ACPI Error (psargs-0459): [STAT] Namespace lookup failure, AE_NOT_FOUND 0 executions **** AcpiExec: Exception AE_NOT_FOUND during execution of method [_BST] Opcode [-NamePath-] @A6 **** Exception AE_NOT_FOUND during execution of method [\_SB_.PCI0.LPCB.BAT1._BST] (Node 00461298) You should try setting the AcpiGbl_AllMethodsSerialized flag to TRUE in order to force method serialization.
> You should try setting the AcpiGbl_AllMethodsSerialized flag to TRUE > in order to force method serialization. And how do I do that?
This looks like a bug in the DSDT. The _BST method is declared "NotSerialized", yet it creates a namespace object (STAT) AND it performs blocking I/O operations on the Embedded Controller that will relinquish the interpreter. A second thread that attempts to reenter the method at this time (during EC I/O) will fail, in unpredictable (timing- dependent) ways. Method (_BST, 0, NotSerialized) { Name (STAT, Package (0x04) { 0x01, 0x00, 0x0F28, 0x39D0 }) ... /* EC operations to get battery status */ Return (STAT) } Changing the declaration above to: Method (_BST, 0, Serialized) Fixes the problem here. You can either fix the DSDT and override the BIOS version, or set the serialized flag in the Linux configuration, I think it's like this: acpi_serialize = TRUE
Boot with the Linux kernel cmdline option "acpi_serialize". Internally that sets acpi_gbl_all_methods_serialized = TRUE.
The "acpi_serialize" indeed seems to do the trick. i didn't get a chance to test altering the DSDT yet.
The auto-serialize mechanism has been fixed for ACPICA version 20060912: Fixed a regression where an error was no longer emitted if a control method attempts to create 2 objects of the same name. This once again returns AE_ALREADY_EXISTS. When this exception occurs, it invokes the mechanism that will dynamically serialize the control method to possibly prevent future errors.
*** Bug 7386 has been marked as a duplicate of this bug. ***
ACPICA 20060912 shipped in 2.6.21-rc1 closed.