Bug 9429
Summary: | acpi-cpufreq boot crash | ||
---|---|---|---|
Product: | ACPI | Reporter: | Len Brown (lenb) |
Component: | ACPICA-Core | Assignee: | Robert Moore (Robert.Moore) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | acpi-bugzilla, craftyguy, Robert.Moore |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.20-2.6.23-rc3 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
console log 2.6.23-rc3+
console log 2.6.24-rc3+ w/ CONFIG_DEBUG_SLAB=n acpidump failing config-2.6.24-rc3+ console log 2.6.24-rc3+ 64-bit console log 2.6.24+ w/o ACPI tracing & debugging CPU0IST.dat CPU1IST.dat console log 2.6.24+ w/ ACPI tracing & debugging patch vs 2.6.24-rc3 patch from comment #20 backported to linux-2.6.16, 2.6.17 patch from comment #20 backported to linux-2.6.18, 2.6.19, 2.6.20 |
Description
Len Brown
2007-11-21 10:58:32 UTC
Created attachment 13673 [details]
console log 2.6.23-rc3+
Created attachment 13674 [details]
console log 2.6.24-rc3+ w/ CONFIG_DEBUG_SLAB=n
Created attachment 13675 [details]
acpidump
typo: comment #1 is 2.6.24-rc3+, not .23-rc3+ The system under test is a Weybridge SDV with a dual core processor. Len, Can you attach your config. Also, do yo see the problem with 64 bit kernel as well? 0x6b is POISON_FREE. SLAB and SLUB debug sets the freed area to 0x6b. But, somewhere in the linked list there is still an active pointer to some freed area which we try to access while parsing the list causing this failure. It must be in namespace/ns_alloc.c, in the way we allocate free nodes. Created attachment 13689 [details]
failing config-2.6.24-rc3+
here's a config that fails. disable SLAB debug and it will succeed.
I enabled slab debugging on configs for 2.6.23, 2.6.22, 2.6.21, 2.6.20
and they all failed the same way -- so this is easy to reproduce --
at least with this BIOS on this board. Note, however, that I don't
recall seeing this failure until after a recent processor & BIOS upgrade.
Created attachment 13690 [details]
console log 2.6.24-rc3+ 64-bit
Yes, 64-bit mode fails too. Attached is a console log from a 64-bit crash.
CONFIG_SMP=n has same failure. Checking for hardware changes [ OK ] general protection fault: 0000 [1] CPU 0 Modules linked in: acpi_cpufreq pata_acpi e1000e thermal Pid: 2365, comm: modprobe Not tainted 2.6.24-rc3-g2ffbb837-dirty #5 RIP: 0010:[<ffffffff80359c2b>] [<ffffffff80359c2b>] acpi_ns_get_parent_node+0x7/0x15 RSP: 0018:ffff81007b403d10 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000005 RCX: ffff81007b403d98 RDX: ffff81007d956198 RSI: ffff81007b403d40 RDI: a56b6b6b6b6b6b6b RBP: 0000000000000000 R08: ffff81007d956198 R09: ffff81007b5726e8 R10: ffffffff8063dc76 R11: ffffffff803635f2 R12: ffff81007d956198 R13: ffff81007b403d98 R14: ffffffff803631d4 R15: ffff81007b403e48 FS: 00002b1681bbc6f0(0000) GS:ffffffff8070e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002b4176f4d000 CR3: 000000007c09e000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 2365, threadinfo ffff81007b402000, task ffff81007c4cb040) Stack: ffffffff803599ce ffff81007b403d40 ffffffff8036310a ffff81007b403d98 ffffffff803631ff ffff81007b5725b0 0000000000000018 ffff81007b5726e8 ffffffff80361744 0000000000004001 ffff81007b5725b0 ffff81007b403de8 Call Trace: [<ffffffff803599ce>] acpi_ns_get_pathname_length+0xa/0x31 [<ffffffff8036310a>] acpi_ut_get_simple_object_size+0x72/0xd6 [<ffffffff803631ff>] acpi_ut_get_element_length+0x2b/0x4f [<ffffffff80361744>] acpi_ut_walk_package_tree+0x5b/0xd5 [<ffffffff803631a8>] acpi_ut_get_object_size+0x3a/0x66 [<ffffffff8035935f>] acpi_evaluate_object+0x1e0/0x28d [<ffffffff80267a31>] poison_obj+0x26/0x30 [<ffffffff8036dd93>] acpi_processor_preregister_performance+0x9c/0x311 [<ffffffff8802005d>] :acpi_cpufreq:acpi_cpufreq_init+0x5d/0x8c [<ffffffff8024440a>] sys_init_module+0x12dc/0x140b [<ffffffff8020b40e>] system_call+0x7e/0x83 Code: f6 47 0a 01 48 8b 7f 18 74 f6 48 89 f8 c3 31 c0 f6 47 0a 01 RIP [<ffffffff80359c2b>] acpi_ns_get_parent_node+0x7/0x15 RSP <ffff81007b403d10> /etc/rc5.d/S06cpuspeed: line 112: 2365 Segmentation fault /sbin/modprobe acpi-cpufreq 2> /dev/null The 93,414th call to acpi_ns_get_parent_node() caused the crash. It was passed a namespace node... at address ffff81007d8544c8. node->name "kkkk" node->flags 6b node->peer a56b6b6b6b6b6b6b We had never done an ACPI_FREE on ffff81007d8544c8. However, it had been used for other namespace nodes, most recently "DOMN", a child of "_PSD".: acpi_ns_get_parent_node ffff81007d8544c8 GPL5 0x1 ffff81007d836fb8 ffff81007d836fb8 _CRS 0x1 acpi_ns_get_parent_node ffff81007d8544c8 GP00 0x0 ffff81007d854500 ffff81007d833810 _CRS 0x1 acpi_ns_get_parent_node ffff81007d8544c8 MB03 0x0 ffff81007d854618 ffff81007d833c70 _CRS 0x1 acpi_ns_get_parent_node ffff81007d8544c8 BAS2 0x0 ffff81007d854500 ffff81007d833f80 _CRS 0x1 acpi_ns_get_parent_node ffff81007d8544c8 STS0 0x0 ffff81007d854618 ffff81007d8543e8 _OSC 0x0 acpi_ns_get_parent_node ffff81007d8544c8 DOMN 0x2 ffff81007d854618 ffff81007d854a08 _PSD 0x0 acpi_ns_get_parent_node ffff81007d8544c8 kkkk 0x6b a56b6b6b6b6b6b6b <oops> Created attachment 13705 [details]
console log 2.6.24+ w/o ACPI tracing & debugging
enabled all debug flags at the top of acpi_processor_preregister_performance.
(hmmm, thought i enabled tracing too, but don't see it here)
in any case the delete of the node before access is clear:
acpi_ns_delete_node ffff81007d854500 DOMN
though it isn't clear to me what the path of control getting
to the access actually is...
Also, there are some other 6b's evident in the log,
so it isn't immediately clear how far before the actual crash things went wrong...
Created attachment 13718 [details]
CPU0IST.dat
Created attachment 13719 [details]
CPU1IST.dat
$ acpidump -a 0x7DFB0720 -l 0x00000359 > CPU1IST.dat
Bob, It looks like _PSD is returning a package that contains temporary objects?! Method (_PSD, 0, NotSerialized) { Name (DOMN, 0x00) Name (CRTP, 0x00) Name (NOPR, 0x00) If (And (PDC0, 0x0800)) { Store (0xFE, CRTP) } Else { Store (0xFC, CRTP) } Divide (0x01, NPCP, Local1, Local2) If (LEqual (Local1, 0x00)) { Store (NPCP, Local1) } Decrement (Local1) Store (Local1, DOMN) Divide (NCPU, NPCP, Local2, Local3) Store (Local3, NOPR) Return (Package (0x01) { Package (0x05) { 0x05, 0x00, DOMN, CRTP, NOPR } }) } Created attachment 13721 [details]
console log 2.6.24+ w/ ACPI tracing & debugging
Attached is a re-run with ACPICA function tracing properly enabled.
Apparently _PSD is returning references to objects which
were freed upon the completion of _PSD evaluation.
Subsequent reference to those objects crash the system.
psxface-0310 [FFFF81007D852750] [03] ps_execute_method : Method returned ObjDesc=ffff81007d9464b0
exdump-0489 [FFFF81007D852750] [03] ex_dump_operand : ffff81007d9464b0 Package [Len 1] ElementArray ffff81007c5e41f0
exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [1] ffff81007d946e50 Package [Len 5] ElementArray ffff81007c5b3be8
exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946140 Integer 0000000000000005
exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946ea8 Integer 0000000000000000
exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946f58 Reference.Node->Name 6B6B6B6B
exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946610 Reference.Node->Name 6B6B6B6B
exdump-0487 [FFFF81007D852750] [03] ex_dump_operand : [2] ffff81007d946508 Reference.Node->Name 6B6B6B6B
psxface-0316 [FFFF81007D852750] [03] ps_execute_method : ----Exit- ****Exception****: AE_CTRL_RETURN_VALUE
exutils-0151 [FFFF81007D852750] [03] ex_exit_interpreter : ----Entry
utmutex-0291 [FFFF81007D852750] [03] ut_release_mutex : Thread FFFF81007D852750 releasing Mutex [ACPI_MTX_Interpreter]
osl-0896 [FFFF81007D852750] [03] os_signal_semaphore : Signaling semaphore[ffff81007d81d3b0|1]
exutils-0159 [FFFF81007D852750] [03] ex_exit_interpreter : ----Exit-
nseval-0226 [FFFF81007D852750] [02] ns_evaluate : *** Completed evaluation of object _PSD ***
nseval-0232 [FFFF81007D852750] [02] ns_evaluate : ----Exit- AE_OK
utobject-0607 [FFFF81007D852750] [02] ut_get_package_object_: ----Entry ffff81007d9464b0
utmisc-0918 [FFFF81007D852750] [03] ut_walk_package_tree : ----Entry
utstate-0269 [FFFF81007D852750] [04] ut_create_pkg_state : ----Entry ffff81007d9464b0
utstate-0286 [FFFF81007D852750] [04] ut_create_pkg_state : ----Exit- ffff81007d83c3a8
utstate-0099 [FFFF81007D852750] [04] ut_push_generic_state : ----Entry
utstate-0106 [FFFF81007D852750] [04] ut_push_generic_state : ----Exit-
utstate-0269 [FFFF81007D852750] [04] ut_create_pkg_state : ----Entry ffff81007d946e50
utstate-0286 [FFFF81007D852750] [04] ut_create_pkg_state : ----Exit- ffff81007d83c618
utobject-0428 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Entry ffff81007d946140
utobject-0524 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Exit- AE_OK
utobject-0428 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Entry ffff81007d946ea8
utobject-0524 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Exit- AE_OK
utobject-0428 [FFFF81007D852750] [04] ut_get_simple_object_s: ----Entry ffff81007d946f58
acpi_ns_get_parent_node ffff81007d854500 kkkk 0x6b a56b6b6b6b6b6b6b
acpi_ns_get_parent_node 91231 calls, here is stack for bad one:
Call Trace:
[<ffffffff803618fc>] acpi_ns_get_parent_node+0x72/0x96
[<ffffffff8036151d>] acpi_ns_get_pathname_length+0x15/0x3c
[<ffffffff8036e27b>] acpi_ut_get_simple_object_size+0xcf/0x146
[<ffffffff8036e3f7>] acpi_ut_get_element_length+0x2b/0x4f
[<ffffffff8036bdf0>] acpi_ut_walk_package_tree+0x102/0x1b6
[<ffffffff8036ad7b>] acpi_ut_trace_ptr+0x56/0x63
[<ffffffff8036e361>] acpi_ut_get_object_size+0x6f/0xda
[<ffffffff80360bc7>] acpi_evaluate_object+0x244/0x30d
[<ffffffff80379879>] acpi_processor_preregister_performance+0xba/0x331
[<ffffffff8072edf7>] acpi_cpufreq_init+0x5d/0x8c
[<ffffffff8072554c>] kernel_init+0xdf/0x249
[<ffffffff8020bd58>] child_rip+0xa/0x12
[<ffffffff8072546d>] kernel_init+0x0/0x249
[<ffffffff8020bd4e>] child_rip+0x0/0x12
I do see few examples in ACPI spec which has Return (Package(x)) {} syntax. That must be valid. Isn't it?
>I do see few examples in ACPI spec which has
>Return (Package(x)) {}
>syntax. That must be valid. Isn't it?
Yes, _PSD is supposed to return a package.
The question is what to do when the the returned
package refers to objects of temporary scope.
ie. do we just flag it as an error and fail,
or do we copy-by-value? Certainly we can't
return references to objects that no longer exist, as we do today...
I suspect that this is fixed in version 20071019: Fixed a problem with the Package operator where all named references were created as object references and left otherwise unresolved. According to the ACPI specification, a Package can only contain Data Objects or references to control methods. The implication is that named references to Data Objects (Integer, Buffer, String, Package, BufferField, Field) should be resolved immediately upon package creation. This is the approach taken with this change. References to all other named objects (Methods, Devices, Scopes, etc.) are all now properly created as reference objects. (BZ 5328) I'll validate this, however. Verified failure on 20070919, works correctly on 20071019: - ex _psd Executing \_PSD Execution of \_PSD returned object 00327E40 Buflen 70 [Package] Contains 1 Elements: [Package] Contains 5 Elements: [Integer] = 0000000000000005 [Integer] = 0000000000000000 [Integer] = 0000000000000000 [Integer] = 00000000000000FC [Integer] = 0000000000000002 Created attachment 13831 [details]
patch vs 2.6.24-rc3
verified that this patch, cherry-picked from the
ACPICA 20071019 release, fixes the problem.
shipped in linux-2.6.24-rc4 closed. note that this patch is also valid for linux-2.6.21 linux-2.6.22 linux-2.6.23 it is needed by all earlier release, but needs to be tweaked for those. Created attachment 13882 [details] patch from comment #20 backported to linux-2.6.16, 2.6.17 Created attachment 13883 [details] patch from comment #20 backported to linux-2.6.18, 2.6.19, 2.6.20 *** Bug 9486 has been marked as a duplicate of this bug. *** |