With CONFIG_DEBUG_SLAB=y or CONFIG_SLUB_DEBUG=y, the kernel crashes upon modprobe acpi-cpufreq. This is true from 2.6.20 (earliest tested) through 2.6.23, and includes 2.6.24-rc3 BUG: unable to handle kernel paging request at virtual address 6b6b6b71 printing eip: c025d8ea *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: acpi_cpufreq freq_table dm_mod parport_pc snd_hda_intel parport ehci_hcd button thermal intel_agp agpgart Pid: 2496, comm: modprobe Not tainted (2.6.24-rc3-g2ffbb837 #3) EIP: 0060:[<c025d8ea>] EFLAGS: 00010202 CPU: 0 EIP is at acpi_ns_get_parent_node+0x6/0x14 EAX: 6b6b6b6b EBX: 0000000a ECX: f75ea650 EDX: 00000000 ESI: f75ea650 EDI: f75ea650 EBP: c0266aef ESP: f76f4d94 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process modprobe (pid: 2496, ti=f76f4000 task=f7f5f080 task.ti=f76f4000) Stack: c025d6c9 f76f4dbc c0266a20 f75ea710 00000000 00000000 c0266aef c0266ee3 f76f4dec c0266b12 00000010 f7c04458 00000000 c0264ecf f76f4dec f7c04de8 f75ea590 f76f4e38 f7047c6c 00000000 c0266ac7 f76f4dec 00000074 00000020 Call Trace: [<c025d6c9>] acpi_ns_get_pathname_length+0xd/0x24 [<c0266a20>] acpi_ut_get_simple_object_size+0x70/0xd9 [<c0266aef>] acpi_ut_get_element_length+0x0/0x43 [<c0266ee3>] acpi_ut_create_generic_state+0x24/0x3d [<c0266b12>] acpi_ut_get_element_length+0x23/0x43 [<c0264ecf>] acpi_ut_walk_package_tree+0x4a/0xb7 [<c0266ac7>] acpi_ut_get_object_size+0x3e/0x66 [<c025d0c7>] acpi_evaluate_object+0x1f2/0x297 [<c04265be>] __mutex_lock_slowpath+0x1ee/0x1f6 [<c02718b0>] acpi_processor_preregister_performance+0xb6/0x55e [<c0166e96>] percpu_populate+0x20/0x26 [<c01659ab>] __kmalloc+0xb2/0xe7 [<c0166e96>] percpu_populate+0x20/0x26 [<c022805d>] __next_cpu+0x12/0x1e [<c0166f3b>] __percpu_populate_mask+0x55/0x65 [<f883605d>] acpi_cpufreq_init+0x5d/0x8f [acpi_cpufreq] [<c01426e5>] sys_init_module+0x14c7/0x15ea [<c01688e8>] do_sync_read+0xc7/0x10a [<c0103dc6>] sysenter_past_esp+0x5f/0x89 [<c0420000>] gss_pipe_downcall+0xc7/0x30a ======================= Code: 04 8d 04 82 89 41 0c 5b c3 89 c2 8d 40 ff 83 f8 fd 76 06 a1 24 92 61 c0 c3 31 c0 80 7a 04 0f 75 02 89 d0 c3 c3 31 d2 85 c0 74 0b <f6> 40 06 01 8b 40 10 74 f7 89 c2 89 d0 c3 31 d2 f6 40 06 01 75 EIP: [<c025d8ea>] acpi_ns_get_parent_node+0x6/0x14 SS:ESP 0068:f76f4d94 /etc/rc5.d/S06cpuspeed: line 112: 2496 Segmentation fault /sbin/modprobe acpi-cpufreq 2> /dev/null
Created attachment 13673 [details] console log 2.6.23-rc3+
Created attachment 13674 [details] console log 2.6.24-rc3+ w/ CONFIG_DEBUG_SLAB=n
Created attachment 13675 [details] acpidump
typo: comment #1 is 2.6.24-rc3+, not .23-rc3+ The system under test is a Weybridge SDV with a dual core processor.
Len, Can you attach your config. Also, do yo see the problem with 64 bit kernel as well?
0x6b is POISON_FREE. SLAB and SLUB debug sets the freed area to 0x6b. But, somewhere in the linked list there is still an active pointer to some freed area which we try to access while parsing the list causing this failure. It must be in namespace/ns_alloc.c, in the way we allocate free nodes.
Created attachment 13689 [details] failing config-2.6.24-rc3+ here's a config that fails. disable SLAB debug and it will succeed. I enabled slab debugging on configs for 2.6.23, 2.6.22, 2.6.21, 2.6.20 and they all failed the same way -- so this is easy to reproduce -- at least with this BIOS on this board. Note, however, that I don't recall seeing this failure until after a recent processor & BIOS upgrade.
Created attachment 13690 [details] console log 2.6.24-rc3+ 64-bit Yes, 64-bit mode fails too. Attached is a console log from a 64-bit crash.
CONFIG_SMP=n has same failure. Checking for hardware changes [ OK ] general protection fault: 0000 [1] CPU 0 Modules linked in: acpi_cpufreq pata_acpi e1000e thermal Pid: 2365, comm: modprobe Not tainted 2.6.24-rc3-g2ffbb837-dirty #5 RIP: 0010:[<ffffffff80359c2b>] [<ffffffff80359c2b>] acpi_ns_get_parent_node+0x7/0x15 RSP: 0018:ffff81007b403d10 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000005 RCX: ffff81007b403d98 RDX: ffff81007d956198 RSI: ffff81007b403d40 RDI: a56b6b6b6b6b6b6b RBP: 0000000000000000 R08: ffff81007d956198 R09: ffff81007b5726e8 R10: ffffffff8063dc76 R11: ffffffff803635f2 R12: ffff81007d956198 R13: ffff81007b403d98 R14: ffffffff803631d4 R15: ffff81007b403e48 FS: 00002b1681bbc6f0(0000) GS:ffffffff8070e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002b4176f4d000 CR3: 000000007c09e000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 2365, threadinfo ffff81007b402000, task ffff81007c4cb040) Stack: ffffffff803599ce ffff81007b403d40 ffffffff8036310a ffff81007b403d98 ffffffff803631ff ffff81007b5725b0 0000000000000018 ffff81007b5726e8 ffffffff80361744 0000000000004001 ffff81007b5725b0 ffff81007b403de8 Call Trace: [<ffffffff803599ce>] acpi_ns_get_pathname_length+0xa/0x31 [<ffffffff8036310a>] acpi_ut_get_simple_object_size+0x72/0xd6 [<ffffffff803631ff>] acpi_ut_get_element_length+0x2b/0x4f [<ffffffff80361744>] acpi_ut_walk_package_tree+0x5b/0xd5 [<ffffffff803631a8>] acpi_ut_get_object_size+0x3a/0x66 [<ffffffff8035935f>] acpi_evaluate_object+0x1e0/0x28d [<ffffffff80267a31>] poison_obj+0x26/0x30 [<ffffffff8036dd93>] acpi_processor_preregister_performance+0x9c/0x311 [<ffffffff8802005d>] :acpi_cpufreq:acpi_cpufreq_init+0x5d/0x8c [<ffffffff8024440a>] sys_init_module+0x12dc/0x140b [<ffffffff8020b40e>] system_call+0x7e/0x83 Code: f6 47 0a 01 48 8b 7f 18 74 f6 48 89 f8 c3 31 c0 f6 47 0a 01 RIP [<ffffffff80359c2b>] acpi_ns_get_parent_node+0x7/0x15 RSP <ffff81007b403d10> /etc/rc5.d/S06cpuspeed: line 112: 2365 Segmentation fault /sbin/modprobe acpi-cpufreq 2> /dev/null
The 93,414th call to acpi_ns_get_parent_node() caused the crash. It was passed a namespace node... at address ffff81007d8544c8. node->name "kkkk" node->flags 6b node->peer a56b6b6b6b6b6b6b We had never done an ACPI_FREE on ffff81007d8544c8. However, it had been used for other namespace nodes, most recently "DOMN", a child of "_PSD".: acpi_ns_get_parent_node ffff81007d8544c8 GPL5 0x1 ffff81007d836fb8 ffff81007d836fb8 _CRS 0x1 acpi_ns_get_parent_node ffff81007d8544c8 GP00 0x0 ffff81007d854500 ffff81007d833810 _CRS 0x1 acpi_ns_get_parent_node ffff81007d8544c8 MB03 0x0 ffff81007d854618 ffff81007d833c70 _CRS 0x1 acpi_ns_get_parent_node ffff81007d8544c8 BAS2 0x0 ffff81007d854500 ffff81007d833f80 _CRS 0x1 acpi_ns_get_parent_node ffff81007d8544c8 STS0 0x0 ffff81007d854618 ffff81007d8543e8 _OSC 0x0 acpi_ns_get_parent_node ffff81007d8544c8 DOMN 0x2 ffff81007d854618 ffff81007d854a08 _PSD 0x0 acpi_ns_get_parent_node ffff81007d8544c8 kkkk 0x6b a56b6b6b6b6b6b6b <oops>
Created attachment 13705 [details] console log 2.6.24+ w/o ACPI tracing & debugging enabled all debug flags at the top of acpi_processor_preregister_performance. (hmmm, thought i enabled tracing too, but don't see it here) in any case the delete of the node before access is clear: acpi_ns_delete_node ffff81007d854500 DOMN though it isn't clear to me what the path of control getting to the access actually is... Also, there are some other 6b's evident in the log, so it isn't immediately clear how far before the actual crash things went wrong...
Created attachment 13718 [details] CPU0IST.dat
Created attachment 13719 [details] CPU1IST.dat $ acpidump -a 0x7DFB0720 -l 0x00000359 > CPU1IST.dat
Bob, It looks like _PSD is returning a package that contains temporary objects?! Method (_PSD, 0, NotSerialized) { Name (DOMN, 0x00) Name (CRTP, 0x00) Name (NOPR, 0x00) If (And (PDC0, 0x0800)) { Store (0xFE, CRTP) } Else { Store (0xFC, CRTP) } Divide (0x01, NPCP, Local1, Local2) If (LEqual (Local1, 0x00)) { Store (NPCP, Local1) } Decrement (Local1) Store (Local1, DOMN) Divide (NCPU, NPCP, Local2, Local3) Store (Local3, NOPR) Return (Package (0x01) { Package (0x05) { 0x05, 0x00, DOMN, CRTP, NOPR } }) }
Created attachment 13721 [details] console log 2.6.24+ w/ ACPI tracing & debugging Attached is a re-run with ACPICA function tracing properly enabled. Apparently _PSD is returning references to objects which were freed upon the completion of _PSD evaluation. Subsequent reference to those objects crash the system. psxface-0310 [FFFF81007D852750] [03] ps_execute_method : Method returned ObjDesc=ffff81007d9464b0 exdump-0489 [FFFF81007D852750] [03] ex_dump_operand : ffff81007d9464b0 Package [Len 1] ElementArray ffff81007c5e41f0 exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [1] ffff81007d946e50 Package [Len 5] ElementArray ffff81007c5b3be8 exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946140 Integer 0000000000000005 exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946ea8 Integer 0000000000000000 exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946f58 Reference.Node->Name 6B6B6B6B exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946610 Reference.Node->Name 6B6B6B6B exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946508 Reference.Node->Name 6B6B6B6B psxface-0316 [FFFF81007D852750] [03] ps_execute_method : ----Exit- ****Exception****: AE_CTRL_RETURN_VALUE exutils-0151 [FFFF81007D852750] [03] ex_exit_interpreter : ----Entry utmutex-0291 [FFFF81007D852750] [03] ut_release_mutex : Thread FFFF81007D852750 releasing Mutex [ACPI_MTX_Interpreter] osl-0896 [FFFF81007D852750] [03] os_signal_semaphore : Signaling semaphore[ffff81007d81d3b0|1] exutils-0159 [FFFF81007D852750] [03] ex_exit_interpreter : ----Exit- nseval-0226 [FFFF81007D852750] [02] ns_evaluate : *** Completed evaluation of object _PSD *** nseval-0232 [FFFF81007D852750] [02] ns_evaluate : ----Exit- AE_OK utobject-0607 [FFFF81007D852750] [02] ut_get_package_object_: ----Entry ffff81007d9464b0 utmisc-0918 [FFFF81007D852750] [03] ut_walk_package_tree : ----Entry utstate-0269 [FFFF81007D852750] [04] ut_create_pkg_state : ----Entry ffff81007d9464b0 utstate-0286 [FFFF81007D852750] [04] ut_create_pkg_state : ----Exit- ffff81007d83c3a8 utstate-0099 [FFFF81007D852750] [04] ut_push_generic_state : ----Entry utstate-0106 [FFFF81007D852750] [04] ut_push_generic_state : ----Exit- utstate-0269 [FFFF81007D852750] [04] ut_create_pkg_state : ----Entry ffff81007d946e50 utstate-0286 [FFFF81007D852750] [04] ut_create_pkg_state : ----Exit- ffff81007d83c618 utobject-0428 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Entry ffff81007d946140 utobject-0524 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Exit- AE_OK utobject-0428 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Entry ffff81007d946ea8 utobject-0524 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Exit- AE_OK utobject-0428 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Entry ffff81007d946f58 acpi_ns_get_parent_node ffff81007d854500 kkkk 0x6b a56b6b6b6b6b6b6b acpi_ns_get_parent_node 91231 calls, here is stack for bad one: Call Trace: [<ffffffff803618fc>] acpi_ns_get_parent_node+0x72/0x96 [<ffffffff8036151d>] acpi_ns_get_pathname_length+0x15/0x3c [<ffffffff8036e27b>] acpi_ut_get_simple_object_size+0xcf/0x146 [<ffffffff8036e3f7>] acpi_ut_get_element_length+0x2b/0x4f [<ffffffff8036bdf0>] acpi_ut_walk_package_tree+0x102/0x1b6 [<ffffffff8036ad7b>] acpi_ut_trace_ptr+0x56/0x63 [<ffffffff8036e361>] acpi_ut_get_object_size+0x6f/0xda [<ffffffff80360bc7>] acpi_evaluate_object+0x244/0x30d [<ffffffff80379879>] acpi_processor_preregister_performance+0xba/0x331 [<ffffffff8072edf7>] acpi_cpufreq_init+0x5d/0x8c [<ffffffff8072554c>] kernel_init+0xdf/0x249 [<ffffffff8020bd58>] child_rip+0xa/0x12 [<ffffffff8072546d>] kernel_init+0x0/0x249 [<ffffffff8020bd4e>] child_rip+0x0/0x12
I do see few examples in ACPI spec which has Return (Package(x)) {} syntax. That must be valid. Isn't it?
>I do see few examples in ACPI spec which has >Return (Package(x)) {} >syntax. That must be valid. Isn't it? Yes, _PSD is supposed to return a package. The question is what to do when the the returned package refers to objects of temporary scope. ie. do we just flag it as an error and fail, or do we copy-by-value? Certainly we can't return references to objects that no longer exist, as we do today...
I suspect that this is fixed in version 20071019: Fixed a problem with the Package operator where all named references were created as object references and left otherwise unresolved. According to the ACPI specification, a Package can only contain Data Objects or references to control methods. The implication is that named references to Data Objects (Integer, Buffer, String, Package, BufferField, Field) should be resolved immediately upon package creation. This is the approach taken with this change. References to all other named objects (Methods, Devices, Scopes, etc.) are all now properly created as reference objects. (BZ 5328) I'll validate this, however.
Verified failure on 20070919, works correctly on 20071019: - ex _psd Executing \_PSD Execution of \_PSD returned object 00327E40 Buflen 70 [Package] Contains 1 Elements: [Package] Contains 5 Elements: [Integer] = 0000000000000005 [Integer] = 0000000000000000 [Integer] = 0000000000000000 [Integer] = 00000000000000FC [Integer] = 0000000000000002
Created attachment 13831 [details] patch vs 2.6.24-rc3 verified that this patch, cherry-picked from the ACPICA 20071019 release, fixes the problem.
shipped in linux-2.6.24-rc4 closed. note that this patch is also valid for linux-2.6.21 linux-2.6.22 linux-2.6.23 it is needed by all earlier release, but needs to be tweaked for those.
Created attachment 13882 [details] patch from comment #20 backported to linux-2.6.16, 2.6.17
Created attachment 13883 [details] patch from comment #20 backported to linux-2.6.18, 2.6.19, 2.6.20
*** Bug 9486 has been marked as a duplicate of this bug. ***