Bug 67901 - Kernel panic - not syncing (on system start)
Summary: Kernel panic - not syncing (on system start)
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core
Hardware: i386 Linux
Importance: P1 high
Assignee: Lv Zheng
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-30 00:34 UTC by Lv Zheng
Modified: 2014-02-10 02:03 UTC

See Also:
Kernel Version: 3.10
Tree: Mainline
Regression: No


Attachments
Screen shot of the crash (666.67 KB, image/pjpeg)
2013-12-30 00:42 UTC, Lv Zheng
acpidum kernel version 3.8.0-19 (161.93 KB, text/plain)
2014-01-09 17:42 UTC, Fabian Wehning
New screenshot of the crash (2.96 MB, image/jpeg)
2014-01-09 18:16 UTC, Fabian Wehning
ACPI: Add boot option to disable auto return object repair (3.57 KB, patch)
2014-01-10 01:30 UTC, Lv Zheng
ACPICA: Namespace: Restore NULL element repair. (2.05 KB, application/octet-stream)
2014-01-13 08:22 UTC, Lv Zheng
ACPICA: Resources: Object can be NULL in a package if repair mechanism is disabled. (1.99 KB, application/octet-stream)
2014-01-13 08:26 UTC, Lv Zheng
dmidecode (8.82 KB, text/plain)
2014-01-13 21:50 UTC, Fabian Wehning

Description Lv Zheng 2013-12-30 00:34:06 UTC
[1.] One line summary of the problem:
[Medion P8610] 13.10: Kernel panic - not syncing (on system start)

[2.] Full description of the problem/report:
While booting, Linux crashes with the message "Kernel panic - not
syncing: Attempt to kill init! exitcode=0x00000009". A bisect (done
by Joseph Salisbury [joseph.salisbury@canonical.com]) indicates that
this bug was introduced by either

commit 76a6225bf0b64572251a8c27d33e84afac6af713
Author: Bob Moore <robert.moore@intel.com>
Date: Fri Mar 8 09:23:16 2013 +0000

    ACPICA: Split object conversion functions to a new file

or

commit d5a36100f62fa6db5541344e08b361b34e9114c5
Author: Bob Moore <robert.moore@intel.com>
Date: Fri Mar 8 09:23:03 2013 +0000

    ACPICA: Add mechanism for early object repairs on a per-name basis


[3.] Keywords (i.e., modules, networking, kernel):
kernel, Ubuntu, ACPICA

[4.] Kernel version (from /proc/version):
--> unable to boot therefore impossible to get

[5.] No Oops message

[6.] Beyond skill level

[7.] Environment
--> unable to boot therefore impossible to execute the command

[7.1.] Software:
--> unable to boot therefore impossible to get

[7.2.] Processor information (from /proc/cpuinfo):
--> unable to boot therefore impossible to get anything useful

[7.3.] Module information (from /proc/modules):
--> unable to boot therefore impossible to get anything useful

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
--> unable to boot therefore impossible to get anything useful

[7.5.] PCI information ('lspci -vvv' as root)
--> unable to boot therefore impossible to get anything useful

[7.6.] SCSI information (from /proc/scsi/scsi)
--> unable to boot therefore impossible to get anything useful

[7.7.] Other information that might be relevant to the problem (please
look in /proc and include all information that you think to be relevant):
--> unable to boot therefore impossible to get anything useful

[X.] Other notes, patches, fixes, workarounds:
https://bugs.launchpad.net/bugs/1242938
Comment 1 Lv Zheng 2013-12-30 00:42:46 UTC
Created attachment 120201
Screen shot of the crash

The screen shot of the crash.
Comment 2 Lv Zheng 2013-12-30 00:47:01 UTC
The commits mentioned above may have broken the old _PRT repair code.

The following commit merged the old _PRT repair code into the new mechanism.
=====
Author Lv Zheng <lv.zheng@intel.com> 2013-06-08 01:01:01 (GMT) 
Commit aa6329c44bccedbd8b17094c1c1aee1d9a9de461 (patch) 
Subject: ACPICA: Move _PRT repair into the standard complex repair module

Moved this longstanding repair to the relatively new predefined
name repair module. ACPICA BZ 783. Lv Zheng.

No functional change.  This change simply moves the repair code from
where it was originally implemented to the (more recent) repair
module where it now belongs.
=====
We need to check whether the new repair code above fixes this bug.

If it does not, we need to investigate this using the acpidump output of the target machine.
Comment 3 Lv Zheng 2014-01-09 07:35:56 UTC
The screenshot is also not sufficient to obtain useful information for this bug.

Note that pci_assign_unassigned_resources() cannot be found in the first kernel that contains the commits mentioned above.
In addition, the dumped "Code" cannot be generated by either an i386 or an x86_64 build of the kernel source that first contains those commits.
A new screenshot taken with the latest kernel is preferred.

To solve this issue, developers need to know:
1. The device for which the _PRT control method is executed. This requires a complete boot log; could it be captured via a serial port or a video clip?
2. The acpidump of the target platform (this can be obtained from a working kernel).
Comment 4 Fabian Wehning 2014-01-09 17:42:44 UTC
Created attachment 121381
acpidum kernel version 3.8.0-19

Sorry for answering so late, but I was badly ill and had to stay in hospital for a while. I cannot create the acpidump with the newest kernel because the system does not start. Therefore I have uploaded an acpidump taken with one of the last official versions that runs on this system.
Comment 5 Fabian Wehning 2014-01-09 17:46:00 UTC
I will try to take another screenshot with the newest kernel version, but this may not work because the newest versions mostly give no information at all anymore (perhaps another bug?).
Comment 6 Fabian Wehning 2014-01-09 18:16:34 UTC
Created attachment 121391
New screenshot of the crash

This screenshot was taken while trying to boot kernel version "3.13.0-031300rc6-generic" (64-bit). I hope this helps. If you need further information, please ask.
Comment 7 Fabian Wehning 2014-01-09 18:17:55 UTC
Is there any way to get the output shown in the screenshot as plain text? Can I obtain it from the hdd?
Comment 8 Lv Zheng 2014-01-10 00:19:56 UTC
(In reply to Fabian Wehning from comment #7)
> Is there any way to get the output shown in the screenshot as plain text?
> Can I obtain it from the hdd?

The new attachments are useful.
I'll start from there.

Normally, we obtain the kernel boot log through a serial port.
If you cannot access a serial port on your platform, you can instead upload a video clip of the kernel boot process.
As long as the kernel output is clearly readable in paused frames, we can find what we need.

Thanks in advance.
Comment 9 Lv Zheng 2014-01-10 01:22:33 UTC
You can also provide more helpful debugging information as follows:
1. Recompile the kernel with CONFIG_DEBUG_INFO enabled.
2. Boot the kernel and take a screenshot to obtain the panic RIP (it is ffffffff81435e09 in your last screenshot).
3. Upload your drivers/acpi/acpica/rscreate.c here.
4. Run addr2line and send us the resulting debug information. For example, for the screenshot you posted, go to the kernel source tree where the booted vmlinux was built and execute addr2line -f -e ./vmlinux ffffffff81435e09
5. Preferably, also upload your drivers/acpi/acpica/rscreate.o here.
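
For anyone repeating step 4 on several addresses, the addr2line call can be wrapped in a small script. This is only a sketch: the helper names build_addr2line_cmd and resolve_address are hypothetical, it assumes binutils' addr2line is on the PATH, and it assumes vmlinux was built with CONFIG_DEBUG_INFO as described in step 1.

```python
import shutil
import subprocess


def build_addr2line_cmd(vmlinux, address):
    """Return the argv list for resolving one kernel address."""
    # -f also prints the enclosing function name; -e selects the ELF image.
    return ["addr2line", "-f", "-e", vmlinux, address]


def resolve_address(vmlinux, address):
    """Run addr2line; return (function, "file:line") or None if unavailable."""
    if shutil.which("addr2line") is None:
        return None  # binutils is not installed on this machine
    result = subprocess.run(
        build_addr2line_cmd(vmlinux, address),
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    if result.returncode != 0 or len(lines) < 2:
        return None
    # With -f, the function name comes first and the file:line second.
    return lines[0], lines[1]


if __name__ == "__main__":
    # The RIP value is the one read off the screenshot in this report.
    print(resolve_address("./vmlinux", "ffffffff81435e09"))
```

Without debug information in vmlinux, addr2line typically reports "??" for the function and location, so the output is only useful after the rebuild from step 1.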
Comment 10 Lv Zheng 2014-01-10 01:30:04 UTC
Created attachment 121491
ACPI: Add boot option to disable auto return object repair

You can also try this patch and specify "acpica_no_return_repair" on the kernel command line to see whether simply disabling the repair mechanism fixes your issue.
Comment 11 Lv Zheng 2014-01-10 06:02:27 UTC
Hi, I did the following using the acpidump you provided.

1. Checked for a wrong _PRT in the DSDT (I didn't check the SSDTs).
   I found a suspicious line:

    Scope (_SB)
    {
        Device (PCI0)
        {
    ...
            Name (_PRT, Package (0x26)  // _PRT: PCI Routing Table
            {
    ...
                Package (0x04)
                {
                    Z002,         <- This line is wrong, it should be an integer
                    0x00, 
                    \_SB.PCI0.LPID, 
                    0x00
                }, 
    ...

2. I customized the DSDT of my computer and booted the machine. No panic happened. I only saw that some drivers failed to get quick responses, plus a one-line warning message:

    ACPI Warning: \_SB_.PCI0._PRT: Return Package type mismatch at index 0 - found Reference, expected Integer (20131218/nspredef-297)

   This behavior matches what I see in the acpiexec environment when executing all _PRTs (there are only 3 in the DSDT).
   So there might be nothing wrong with the updated repair mechanism.  This bug might be caused by other issues.  I hope we can find more useful debugging information in the data provided later.
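
The type check behind that warning can be illustrated with a small model. This is a sketch, not ACPICA code: the Reference class and check_prt_entry are hypothetical names, and the type table only assumes the _PRT entry layout from the ACPI specification (Address, Pin, Source, SourceIndex, where Source may be an integer or a named reference).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Stand-in for an ACPI namespace reference such as Z002."""
    path: str


# Allowed Python stand-in types per _PRT entry element:
# Address, Pin, Source, SourceIndex.
PRT_ENTRY_TYPES = (
    (int,),
    (int,),
    (int, Reference),
    (int,),
)


def type_name(obj):
    """Map a stand-in object to the ACPI type name used in messages."""
    if obj is None:
        return "NULL"
    if isinstance(obj, Reference):
        return "Reference"
    if isinstance(obj, int):
        return "Integer"
    return type(obj).__name__


def check_prt_entry(entry):
    """Return a list of mismatch messages for one 4-element _PRT entry."""
    if len(entry) != len(PRT_ENTRY_TYPES):
        return [f"expected 4 elements, found {len(entry)}"]
    problems = []
    for index, (obj, allowed) in enumerate(zip(entry, PRT_ENTRY_TYPES)):
        if not isinstance(obj, allowed):
            expected = "/".join(
                "Integer" if t is int else "Reference" for t in allowed
            )
            problems.append(
                f"type mismatch at index {index} - "
                f"found {type_name(obj)}, expected {expected}"
            )
    return problems


# The suspicious entry from the DSDT above: Z002 where an integer belongs.
bad_entry = [Reference("Z002"), 0x00, Reference("\\_SB.PCI0.LPID"), 0x00]
print(check_prt_entry(bad_entry))
```

For the entry shown above, only index 0 is reported, because a reference is acceptable in the Source position but not in the Address position.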
Comment 12 Lv Zheng 2014-01-13 00:20:09 UTC
Also, please upload the DMI information output by "dmidecode". It can help us understand which quirk is executed by pci_apply_final_quirks().
Comment 13 Lv Zheng 2014-01-13 08:22:37 UTC
Created attachment 121761
ACPICA: Namespace: Restore NULL element repair.

There is something wrong with the bisected commit.
This diff was never merged.
In the original design change, ACPI_RTYPE_NONE was defined to cover acpi_ns_repair_null_element(), but that part may have been split into another patch that was ultimately not merged.
I couldn't find the original patch, so I regenerated this commit. It could fix the issue.
Please give it a try.
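
The idea of a NULL element repair can be sketched as follows. This is an illustrative model only, not the attached patch: the RType flags loosely mirror the ACPI_RTYPE_* naming, repair_null_element and repair_package are hypothetical helpers, and the choice of replacement values (zero, empty string, empty buffer) is an assumption about how such a repair would typically behave.

```python
from enum import Flag, auto


class RType(Flag):
    """Hypothetical expected-type flags, named after ACPI_RTYPE_*."""
    NONE = 0
    INTEGER = auto()
    STRING = auto()
    BUFFER = auto()
    REFERENCE = auto()


def repair_null_element(expected):
    """Return a harmless default for a NULL element, or None if none fits."""
    if RType.INTEGER in expected:
        return 0
    if RType.STRING in expected:
        return ""
    if RType.BUFFER in expected:
        return b""
    return None  # no safe default; the element stays NULL


def repair_package(package, expected_types):
    """Replace NULL (None) elements according to their expected types."""
    return [
        repair_null_element(expected) if obj is None else obj
        for obj, expected in zip(package, expected_types)
    ]


# Expected types for one _PRT entry: Address, Pin, Source, SourceIndex.
PRT_EXPECTED = [
    RType.INTEGER,
    RType.INTEGER,
    RType.INTEGER | RType.REFERENCE,
    RType.INTEGER,
]

print(repair_package([None, 0, None, 0], PRT_EXPECTED))
```

When no default fits, the element remains NULL, so any code that later walks the package still has to tolerate a NULL entry.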
Comment 14 Lv Zheng 2014-01-13 08:26:33 UTC
Created attachment 121771
ACPICA: Resources: Object can be NULL in a package if repair mechanism is disabled.

This patch, together with the following one:
 121491: ACPI: Add boot option to disable auto return object repair
 (https://bugzilla.kernel.org/attachment.cgi?id=121491&action=diff)
might also allow your system to boot.

Could you give them a try so that I can add "Tested-by" to this patch and submit it to ACPICA upstream?
Comment 15 Fabian Wehning 2014-01-13 21:50:30 UTC
Created attachment 121901
dmidecode

Here is the dmidecode output, but I have to admit that the rest is way beyond my knowledge. I really have no idea where to start following your suggestions. Is there a tutorial somewhere that could guide me?
Comment 16 Lv Zheng 2014-01-14 00:25:41 UTC
I've reproduced your problem in the acpiexec simulation environment, so there is no need to provide any further debugging information.
The problem is described in comment 13.

There are 3 patches in this thread. Let me number them in the order they appear.

PATCH 1: https://bugzilla.kernel.org/attachment.cgi?id=121491
         ACPI: Add boot option to disable auto return object repair
      2: https://bugzilla.kernel.org/attachment.cgi?id=121761
         ACPICA: Namespace: Restore NULL element repair.
      3: https://bugzilla.kernel.org/attachment.cgi?id=121771
         ACPICA: Resources: Object can be NULL in a package if repair mechanism is disabled.

I need you to test whether these fixes resolve the bug, which will help improve the quality of Linux.  Please prepare the latest kernel, then:

1. Apply PATCH 2 only (do not apply PATCH 1 or PATCH 3), then build and boot to see whether the issue is solved.  If it is not solved, take a screenshot or video clip and post it here.
2. Apply PATCH 1 and PATCH 3 (do not apply PATCH 2), then build and boot to see whether the issue is solved.  If it is not solved, take a screenshot or video clip and post it here.

Thanks in advance.
Comment 17 Lv Zheng 2014-01-22 10:49:25 UTC
Pinging...
Fabian, could you give the fixes a try?
Comment 18 Fabian Wehning 2014-01-23 13:14:33 UTC
I still want to try, but I have severe problems applying those patches. I have never built a kernel before, so I am trying to find a good manual that explains the procedure. Additionally, I am currently in the examination period of my studies, which leaves only little time. Therefore this may take more time than expected... sorry.
Comment 19 Fabian Wehning 2014-01-23 13:53:26 UTC
OK, it seems I have found a manual and understood how to build a kernel. I will try to apply your patches now. It may take a long time because the system is really slow, so building several kernels will definitely take a huge amount of time.
Comment 20 Fabian Wehning 2014-01-23 16:07:18 UTC
(In reply to Lv Zheng from comment #16)
> 1. Apply PATCH 2, do not apply PATCH 1 and PATCH 3, do a build/boot test
> again to see if the issue is solved.  If it is not solved, take a...

This solves the bug. My system starts. Some errors are shown, but I am pretty sure they are related to the low disk space (11,4 MB). I will clean up the disk now and build the other kernel with patches 1 and 3 applied.
Comment 21 Fabian Wehning 2014-01-23 21:59:23 UTC
(In reply to Lv Zheng from comment #16)
> 2. Apply PATCH 1 and PATCH 3, do not apply PATCH 2, do a build/boot test
> again to see if the issue is solved.  If it is not solved, take a
> screenshot/videoclip and post here.

This solves the bug, too. Nevertheless, there is a major problem: it slows the system down extremely.
Comment 22 Lv Zheng 2014-01-24 00:07:56 UTC
Thanks for testing.
I'll add your name to the patches with Tested-by and push them to ACPICA upstream.
This bug will be marked as RESOLVED and changed to CLOSED once the necessary patches are upstreamed.
