Bug 19462

Summary: Reproducible Kernel panic/oops during boot and startup - Panasonic Toughbooks
Product: ACPI
Reporter: Anthony Awtrey (tony)
Component: ACPICA-Core
Assignee: Lin Ming (ming.m.lin)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, florian, frozenpoint, john.floyd, lenb, Robert.Moore, rui.zhang, srini.yendeti
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg kernel panic/oops #1
dmesg kernel panic/oops #2
dmesg kernel panic/oops #3
dmesg kernel panic/oops #4
dmesg kernel panic/oops #5
dmesg kernel panic/oops #6
dmesg kernel panic/oops #8
dmesg kernel panic/oops #9
dmesg kernel panic/oops #10
dmesg kernel panic/oops #11
dmesg for toughbook cf30 mk3 problems
cf30 mk3 dsdt
acpi dump from cf30
Mainline kernel dmesg cf30
acpidump output from cf-19m4
dmesg output from cf-19m
acpidump output from cf-74m4
dmesg output from cf-74m4
acpidump output from cf-30m2
dmesg output from cf-30m2
CPU0IST.dat as requested
APIST.dat as requested
CPU0CST.dat as requested
APCST.dat as requested
patch to fix ACPI errors

Description Anthony Awtrey 2010-10-01 22:28:51 UTC
I've got reproducible kernel panics on recent stable kernels (2.6.32.8/16/23 and 2.6.35.2 specifically), but not on 2.6.30.x kernels. The panics/oopses occur on two different hardware platforms: the Panasonic Toughbook CF-19 Mark 3 and the Toughbook CF-74 Mark 4. These are the latest revisions of these hardware models, and we have seen the issue on multiple units.

I'll attach the dmesg output from a few different panics/oopses we've captured. There are two or three variations that repeat, so I'll include representatives of each type. All of the kernel panics I've seen occur during boot (in the initramfs or early in the init process). Once a system is fully up and running, I've not seen a single lockup or oops of this type. I've looked at related bugs in the ACPI subsystem but haven't seen anything quite like our issue, so I am opening a new one here.

FYI, I build mainline kernels using the Debian config and packaging process, with some very minor config tweaks. My base OS is the Squeeze pre-release. My local mirror has the following versions of the key related packages:

 * udev 160-1
 * module-init-tools 3.12-1
 * initramfs-tools 0.98.1

I have been *unable* to reproduce this issue using either a Lenny baseline or a Squeeze baseline running the 2.6.30 kernel. I'll attach the full panic messages for each of the crashes to this ticket, but here is a summary of the fun.


The first type of issue is a kernel BUG in mm/slab.c or kernel/cred.c:

kernel BUG at mm/slab.c:2972!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:02.1/resource

kernel BUG at mm/slab.c:2974!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/class/net/lo/operstate

kernel BUG at mm/slab.c:2974!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/class/power_supply/BAT1/technology

kernel BUG at mm/slab.c:2974!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/class/power_supply/BAT1/technology

kernel BUG at kernel/cred.c:101!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/class/net/lo/operstate


Another type of issue is an "unable to handle kernel paging request" oops:

BUG: unable to handle kernel paging request at 52545346
IP: [<c11696af>] acpi_ut_update_object_reference+0x1d/0x11b
*pdpt = 00000000375f1001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/class/net/lo/operstate


The last type of issue we've seen is a NULL pointer dereference:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<(null)>] (null)
*pdpt = 000000003673e001 *pde = 0000000000000000
Oops: 0010 [#1] SMP
last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:00/power_supply/BAT1/energy_full

BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<c112dee8>] acpi_ds_build_internal_object+0x10/0x12f
*pdpt = 0000000036e03001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/module/processor/initstate

BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<c112dee8>] acpi_ds_build_internal_object+0x10/0x12f
*pdpt = 0000000036ddf001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/net/lo/operstate
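When triaging a pile of captured dmesg files, the three crash signatures above can be separated mechanically. The following is a minimal Python sketch. It assumes the header and "IP:" line formats exactly as quoted in this report. The category labels are informal names made up for this sketch, not kernel terminology.

```python
import re
from collections import Counter

# Header patterns copied from the oops messages quoted in this report.
# The category labels are informal names chosen for this sketch.
PATTERNS = [
    ("kernel BUG", re.compile(r"kernel BUG at (\S+?:\d+)!")),
    ("paging request", re.compile(r"unable to handle kernel paging request at ([0-9a-f]+)")),
    ("NULL dereference", re.compile(r"unable to handle kernel NULL pointer dereference at (\S+)")),
]
# Faulting instruction pointer line, e.g. "IP: [<c11696af>] acpi_ut_...+0x1d/0x11b".
IP_LINE = re.compile(r"IP: \[<[^>]*>\] (\S+)")


def classify_oops(lines):
    """Return (category, detail) for each recognised oops header line."""
    results = []
    for line in lines:
        for category, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                results.append((category, match.group(1)))
                break
    return results


def faulting_functions(lines):
    """Return the symbol name (offset stripped) from each IP: line."""
    symbols = []
    for line in lines:
        match = IP_LINE.search(line)
        if match:
            symbols.append(match.group(1).split("+")[0])
    return symbols


if __name__ == "__main__":
    sample = """\
kernel BUG at mm/slab.c:2974!
BUG: unable to handle kernel paging request at 52545346
IP: [<c11696af>] acpi_ut_update_object_reference+0x1d/0x11b
BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<c112dee8>] acpi_ds_build_internal_object+0x10/0x12f
""".splitlines()
    print(Counter(category for category, _ in classify_oops(sample)))
    print(faulting_functions(sample))
```

This only matches log text. It groups crashes by type and lists the faulting functions, but it says nothing about the root cause.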



Now, it may seem like I'm cluttering up one bug report with too many different bits of information. However, this issue does not appear on any of the dozen or so other manufacturers/models I am testing this kernel on. I've only had issues on these two models, across multiple different hardware units.

The steps to reproduce are not exactly easy, but they involve a soft reboot after disk activity and/or FUSE file system mounting. I can repeat this to test any patches you may be willing to supply.

Tony
Comment 1 Anthony Awtrey 2010-10-01 22:29:51 UTC
Created attachment 32232 [details]
dmesg kernel panic/oops #1
Comment 2 Anthony Awtrey 2010-10-01 22:30:20 UTC
Created attachment 32242 [details]
dmesg kernel panic/oops #2
Comment 3 Anthony Awtrey 2010-10-01 22:30:38 UTC
Created attachment 32252 [details]
dmesg kernel panic/oops #3
Comment 4 Anthony Awtrey 2010-10-01 22:31:18 UTC
Created attachment 32262 [details]
dmesg kernel panic/oops #4
Comment 5 Anthony Awtrey 2010-10-01 22:31:47 UTC
Created attachment 32272 [details]
dmesg kernel panic/oops #5
Comment 6 Anthony Awtrey 2010-10-01 22:33:21 UTC
Created attachment 32282 [details]
dmesg kernel panic/oops #6
Comment 7 Anthony Awtrey 2010-10-01 22:33:41 UTC
Created attachment 32292 [details]
dmesg kernel panic/oops #8
Comment 8 Anthony Awtrey 2010-10-01 22:34:05 UTC
Created attachment 32302 [details]
dmesg kernel panic/oops #9
Comment 9 Anthony Awtrey 2010-10-01 22:34:27 UTC
Created attachment 32312 [details]
dmesg kernel panic/oops #10
Comment 10 Anthony Awtrey 2010-10-01 22:34:59 UTC
Created attachment 32322 [details]
dmesg kernel panic/oops #11
Comment 11 John Floyd 2010-10-13 04:33:31 UTC
Created attachment 33422 [details]
dmesg for toughbook cf30 mk3 problems
Comment 12 John Floyd 2010-10-13 04:44:21 UTC
I have the same problems on the Toughbook CF30-Mk3. I also get ACPI errors on boot, including the EC0_._REG not found error. I added the necessary DMI info to the kernel source and recompiled to allow the early _PDC call. This worked at the kernel level but did not change anything.

The kernel is very unstable, requiring multiple boots before I get a platform stable enough to get the information off it.

2.6.33 seems a bit more stable than 2.6.34.

I have previously attached a dmesg output showing the ACPI errors.

I will attach the DSDT shortly.

I was previously running a CF30 Mk2 with no problems.

Cheers
John
Comment 13 John Floyd 2010-10-13 04:46:15 UTC
Created attachment 33432 [details]
cf30 mk3 dsdt
Comment 14 John Floyd 2010-10-13 04:50:59 UTC
Additionally, while trying to narrow this down, I have updated to the latest Panasonic BIOS and EC firmware.

John
Comment 15 Zhang Rui 2010-10-13 06:45:19 UTC
ming,
would you please look at this issue?
Comment 16 Lin Ming 2010-10-13 08:39:46 UTC
Hi, Anthony and John

Could you guys try the mainline kernel?
And please attach the acpidump output.

Thanks.
Comment 17 John Floyd 2010-10-14 00:08:59 UTC
I have tried a mainline kernel; the dmesg (cf30) and the acpidump results are attached.

The ACPI errors are still there.

Under the mainline kernel the system won't run X, but I expect this is due to both the config I have set up and the DRM and graphics card patches that Red Hat applies.

John
Comment 18 John Floyd 2010-10-14 00:09:59 UTC
Created attachment 33542 [details]
acpi dump from cf30
Comment 19 John Floyd 2010-10-14 00:10:47 UTC
Created attachment 33552 [details]
Mainline kernel dmesg cf30
Comment 20 Lin Ming 2010-10-14 01:17:58 UTC
ACPI Error (dswload-0677): [DD02] Namespace lookup failure, AE_NOT_FOUND
ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20100121/psloop-231)
ACPI Error (psparse-0537): Method parse/execution failed [\] (Node ffffffff81d94ed0), AE_NOT_FOUND
ACPI Error (dswload-0677): [USB0] Namespace lookup failure, AE_NOT_FOUND
ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20100121/psloop-231)
ACPI Error (psparse-0537): Method parse/execution failed [\] (Node ffffffff81d94ed0), AE_NOT_FOUND
ACPI: Executed 3 blocks of module-level executable AML code


ACPI Error (psargs-0359): [\_SB_.PCI0.GFX0.DD02.CUBL] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.GFX0.INIG] (Node ffff880074d446c0), AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_._REG] (Node ffff880074d42d40), AE_NOT_FOUND


\_SB_.PCI0.GFX0.DD02.CUBL is created in the module-level code, but it seems the module-level code is not executed correctly.

Thanks for the info, I'm looking at this issue.
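A small script can pull the unresolved paths and the failing methods out of dmesg. This makes it easier to see which names the interpreter could not resolve, which is the pattern behind the analysis above. The following is a Python sketch that assumes the message formats exactly as quoted in this thread. It does not interpret AML; it only parses log text.

```python
import re

# Message formats copied from the ACPICA errors quoted in comments #20 and #28.
LOOKUP_FAILURE = re.compile(r"\[([^\]]+)\] Namespace lookup failure, (AE_\w+)")
METHOD_FAILURE = re.compile(r"Method parse/execution failed \[([^\]]+)\].*, (AE_\w+)")


def acpi_failures(lines):
    """Return (unresolved names, [(failed method, status), ...]) from dmesg lines."""
    missing, failed_methods = [], []
    for line in lines:
        lookup = LOOKUP_FAILURE.search(line)
        if lookup:
            missing.append(lookup.group(1))
            continue
        method = METHOD_FAILURE.search(line)
        if method:
            failed_methods.append((method.group(1), method.group(2)))
    return missing, failed_methods


if __name__ == "__main__":
    sample = [
        r"ACPI Error (dswload-0677): [DD02] Namespace lookup failure, AE_NOT_FOUND",
        r"ACPI Error (psargs-0359): [\_SB_.PCI0.GFX0.DD02.CUBL] Namespace lookup failure, AE_NOT_FOUND",
        r"ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.GFX0.INIG] (Node ffff880074d446c0), AE_NOT_FOUND",
    ]
    missing, methods = acpi_failures(sample)
    print("unresolved:", missing)
    print("failed methods:", methods)
```

Fed the full comment #20 output, the unresolved names come out as DD02, USB0 and \_SB_.PCI0.GFX0.DD02.CUBL, which all point back at names that should have been created by module-level code.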
Comment 21 Anthony Awtrey 2010-10-14 17:29:22 UTC
Sorry I took so long to respond. I have now built the latest mainline (2.6.35.7) and tested it on the following platforms:

 cf-19m4 - ACPI errors, but no panic/oops anymore
 cf-74m4 - ACPI errors, but no panic/oops anymore
 cf-30m2 - ACPI errors and still seeing oops/panics

I'll upload the dmesg and acpidump output from each model momentarily.

Thanks for looking into this! I've been working with some US-based Panasonic field engineers, so let me know if there is anything you would like me to try to get from them.

Tony
Comment 22 Anthony Awtrey 2010-10-14 17:30:44 UTC
Created attachment 33582 [details]
acpidump output from cf-19m4
Comment 23 Anthony Awtrey 2010-10-14 17:32:14 UTC
Created attachment 33592 [details]
dmesg output from cf-19m
Comment 24 Anthony Awtrey 2010-10-14 17:33:10 UTC
Created attachment 33602 [details]
acpidump output from cf-74m4
Comment 25 Anthony Awtrey 2010-10-14 17:33:33 UTC
Created attachment 33612 [details]
dmesg output from cf-74m4
Comment 26 Anthony Awtrey 2010-10-14 17:34:42 UTC
Created attachment 33622 [details]
acpidump output from cf-30m2
Comment 27 Anthony Awtrey 2010-10-14 17:35:57 UTC
Created attachment 33632 [details]
dmesg output from cf-30m2
Comment 28 Lin Ming 2010-10-15 03:58:10 UTC
Hi, Anthony

There are two issues here:

1. ACPI errors
I have found the root cause (see comment #20) and am working on the fix.

2. oops/panics
[   20.236238] ACPI Error: Target is not a Reference or Constant object - Integer [f70e8cf8] (20100428/exstore-136)
[   20.236255] ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU0._CST] (Node f7026b58), AE_AML_OPERAND_TYPE
[   20.336525] BUG: unable to handle kernel paging request at 52505f60
[   20.337941] IP: [<c03a10cb>] acpi_ns_check_object_type+0x39/0x208

The _CST method is in one of the dynamically loaded tables, described below:

        Name (SSDT, Package (0x0C)
        {
            "CPU0IST ",
            0x77A09C98,
            0x00000223,
            "APIST   ",
            0x77A08E18,
            0x000001CF,
            "CPU0CST ",
            0x77A07598,
            0x00000546,
            "APCST   ",
            0x77A09F18,
            0x0000008D
        })

Would you please also attach these tables?
acpidump --addr 0x77A09C98 --length 0x223 > CPU0IST.dat
acpidump --addr 0x77A08E18 --length 0x1CF > APIST.dat
acpidump --addr 0x77A07598 --length 0x546 > CPU0CST.dat
acpidump --addr 0x77A09F18 --length 0x8D  > APCST.dat
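Each group of three elements in the SSDT package above is treated by the commands as a table name, a physical address and a length. The following Python sketch turns the package into the same acpidump invocations. The triple layout is inferred from the package contents and the commands in this comment, not from firmware documentation. The helper names are hypothetical.

```python
# The SSDT package from this comment, flattened: (name, address, length) x 4.
SSDT_PACKAGE = [
    "CPU0IST ", 0x77A09C98, 0x00000223,
    "APIST   ", 0x77A08E18, 0x000001CF,
    "CPU0CST ", 0x77A07598, 0x00000546,
    "APCST   ", 0x77A09F18, 0x0000008D,
]


def ssdt_entries(package):
    """Group the flat package into (name, address, length) triples."""
    if len(package) % 3:
        raise ValueError("package length must be a multiple of 3")
    return [
        # Names are space-padded to 8 characters in the table; strip the padding.
        (package[i].strip(), package[i + 1], package[i + 2])
        for i in range(0, len(package), 3)
    ]


def acpidump_commands(package):
    """Build one acpidump command per table, in the form used above."""
    return [
        f"acpidump --addr 0x{addr:X} --length 0x{length:X} > {name}.dat"
        for name, addr, length in ssdt_entries(package)
    ]


if __name__ == "__main__":
    for command in acpidump_commands(SSDT_PACKAGE):
        print(command)
```

Deriving the commands from the package avoids transcription mistakes in the addresses and lengths when a machine exposes more dynamic tables.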
Comment 29 Anthony Awtrey 2010-10-15 12:36:55 UTC
Created attachment 33722 [details]
CPU0IST.dat as requested
Comment 30 Anthony Awtrey 2010-10-15 12:37:36 UTC
Created attachment 33732 [details]
APIST.dat as requested
Comment 31 Anthony Awtrey 2010-10-15 12:38:07 UTC
Created attachment 33742 [details]
CPU0CST.dat as requested
Comment 32 Anthony Awtrey 2010-10-15 12:38:39 UTC
Created attachment 33752 [details]
APCST.dat as requested
Comment 33 Lin Ming 2010-10-18 02:04:49 UTC
Created attachment 33902 [details]
patch to fix ACPI errors

Bob wrote a patch to fix the ACPI errors.

Would you guys give it a try?

Thanks.
Comment 34 John Floyd 2010-10-18 05:09:46 UTC
CF30-Mk3 Toughbook -> successful results.

Applied patch to both the mainline kernel and fedora's distributed kernel and everything worked.

Screen backlight adjustment now works!! First time. dmesg shows no ACPI errors!

The machine's instability may have been linked to some display bugs, which Fedora appears to have fixed independently.

Thanks. A great result (for me). Now why do these nasties always show up with Panasonic machines??

Cheers and Thanks
John
Comment 35 Anthony Awtrey 2010-10-18 19:06:03 UTC
Success here as well. I got a clean boot and no obvious ACPI errors in dmesg.

Thanks very much for the assistance! This issue is closed as far as I am concerned.

Tony
Comment 36 Lin Ming 2010-10-19 00:53:06 UTC
Thanks for the testing.

Marking it as RESOLVED.
Comment 37 Florian Mickler 2010-10-31 20:04:13 UTC
This is committed as:
 
commit 8df3fc981dc12d9fdcaef4100a2193b605024d7a
Author: Bob Moore <robert.moore@intel.com>
Date:   Sat Oct 23 01:36:40 2010 -0400

    Subject: [PATCH] ACPICA: Fix Scope() op in module level code

Is this already marked to be backported for stable?