Bug 67901
| Summary: | Kernel panic - not syncing (on system start) | | |
|---|---|---|---|
| Product: | ACPI | Reporter: | Lv Zheng (lv.zheng) |
| Component: | ACPICA-Core | Assignee: | Lv Zheng (lv.zheng) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | high | CC: | aaron.lu, fabian.wehning |
| Priority: | P1 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Kernel Version: | 3.10 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |

Attachments:
Screen shot of the crash
acpidum kernel version 3.8.0-19
New screenshot of the crash
ACPI: Add boot option to disable auto return object repair
ACPICA: Namespace: Restore NULL element repair.
ACPICA: Resources: Object can be NULL in a package if repair mechanism is disabled.
dmidecode

Description
Lv Zheng
2013-12-30 00:34:06 UTC
Created attachment 120201 [details]
Screen shot of the crash
The screen shot of the crash.
The mentioned commit may break the old _PRT repair code. The following commit merged the old _PRT repair code into the new mechanism.

=====
Author Lv Zheng <lv.zheng@intel.com> 2013-06-08 01:01:01 (GMT)
Commit aa6329c44bccedbd8b17094c1c1aee1d9a9de461 (patch)
Subject: ACPICA: Move _PRT repair into the standard complex repair module

Moved this longstanding repair to the relatively new predefined name repair module. ACPICA BZ 783. Lv Zheng.

No functional change. This change simply moves the repair code from where it was originally implemented to the (more recent) repair module where it now belongs.
=====

We need to check whether the new repair code above can fix this bug. If it cannot, we need to investigate using the acpidump output of the target machine.

The screenshot is also not sufficient to obtain useful information for this bug. Note that pci_assign_unassigned_resources() cannot be found in the first kernel that contains the above-mentioned commits, and the "Code" dumped cannot be generated by either an i386 or an x86_64 build of that kernel's source. A new screenshot from the latest kernel is preferred.

To solve this issue, developers need to know:
1. For which device the _PRT control method is executed. This requires an entire boot log; can it be captured via a serial port or via a video clip?
2. The acpidump of the target platform (this can be obtained from a working kernel).

Created attachment 121381 [details]
acpidum kernel version 3.8.0-19
Sorry for answering so late, but I was badly ill and had to stay in hospital for a while. I cannot create the acpidump on the newest kernel because the system does not start. Therefore I have uploaded an acpidump from one of the last official versions that runs on this system.
I will try to create a further screenshot with the newest kernel version, but this may not work because the newest versions mostly give no information at all anymore (perhaps another bug?).

Created attachment 121391 [details]
New screenshot of the crash
This screenshot was taken while trying to boot into kernel version "3.13.0-031300rc6-generic" (64-bit). Hope this helps. If you need further information, please ask.
Is there any way to get the output shown in the screenshot as plain text? Can I obtain it from the hdd?

(In reply to Fabian Wehning from comment #7)
> Is there any way to get the output shown in the screenshot as plain text?
> Can I obtain it from the hdd?

The new attachments are useful. I'll start from there. Normally, we obtain the kernel boot log through a serial port. If you cannot access a serial port on your platform, you can also upload a video clip of the kernel boot process. As long as we can read the kernel output clearly in a paused frame, we can find what we need. Thanks in advance.

You can also provide more helpful debugging information as follows:
1. Recompile the kernel with CONFIG_DEBUG_INFO enabled.
2. Boot the kernel and take a screenshot to obtain the panic RIP (it is ffffffff81435e09 in your last screenshot).
3. Upload your drivers/acpi/acpica/rscreate.c here.
4. Run addr2line and send us the debug information. For example, for the screenshot you posted, stay in the kernel source tree where the booted vmlinux was built and execute:
   addr2line -f -e ./vmlinux ffffffff81435e09
5. Preferably, also upload your drivers/acpi/acpica/rscreate.o here.

Created attachment 121491 [details]
ACPI: Add boot option to disable auto return object repair
You can also try this patch and specify "acpica_no_return_repair" on the kernel command line to see whether simply disabling the repair mechanism fixes your issue.
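
For readers who have to copy the panic address from a screenshot by hand, the addr2line step requested earlier in this thread can be sketched as a small helper. This is only an illustration: the function names normalize_address and addr2line_argv are hypothetical, and the accepted input forms (bare hex, a "0x" prefix, or the bracketed "[<...>]" form) are assumptions about how the address might be typed in. The only facts taken from the thread are the address and the "addr2line -f -e ./vmlinux" invocation.

```python
import re
import shlex


def normalize_address(text):
    """Return a bare lowercase hex address, or raise ValueError.

    Hypothetical helper: accepts "ffffffff81435e09",
    "0xffffffff81435e09" or "[<ffffffff81435e09>]".
    """
    cleaned = text.strip().strip("[]<>").lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    if not re.fullmatch(r"[0-9a-f]{1,16}", cleaned):
        raise ValueError(f"not a kernel address: {text!r}")
    return cleaned


def addr2line_argv(address, vmlinux="./vmlinux"):
    """Build the argument list for the addr2line call quoted in the thread.

    -f asks addr2line for the function name, -e selects the executable.
    The vmlinux path must be the image that was actually booted and
    must have been built with CONFIG_DEBUG_INFO enabled.
    """
    return ["addr2line", "-f", "-e", vmlinux, normalize_address(address)]


if __name__ == "__main__":
    # Prints the command line to run from the kernel build tree.
    print(shlex.join(addr2line_argv("[<ffffffff81435e09>]")))
```

The helper only builds the command; it must still be run in the source tree where the booted vmlinux was built, as described in the thread.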
Hi,

I did the following using the acpidump you provided.

1. Checked for a wrong _PRT in the DSDT (I didn't check the SSDTs). I found a suspicious line:

   Scope (_SB)
   {
       Device (PCI0)
       {
           ...
           Name (_PRT, Package (0x26)  // _PRT: PCI Routing Table
           {
               ...
               Package (0x04)
               {
                   Z002,            <- This line is wrong, it should be an integer
                   0x00,
                   \_SB.PCI0.LPID,
                   0x00
               },
               ...

2. I customized the DSDT of my computer and booted the machine. No panic happened; I could only see that some drivers failed to get quick responses, and there was a one-line warning message:

   ACPI Warning: \_SB_.PCI0._PRT: Return Package type mismatch at index 0 - found Reference, expected Integer (20131218/nspredef-297)

This behavior matches what I see in the acpiexec environment when executing all _PRTs (only 3 in the DSDT). So there might be nothing wrong with the updated repair mechanism; this bug might be caused by other issues. I hope we can find more useful debugging information in further data.

Also, please upload the DMI information output by "dmidecode"; it can help us understand which quirk is executed by pci_apply_final_quirks().

Created attachment 121761 [details]
ACPICA: Namespace: Restore NULL element repair.
There is something wrong with the bisected commit.
This diff was never merged.
In the original design change, ACPI_RTYPE_NONE was defined to collect acpi_ns_repair_null_element(), but that part may have been split into another patch that was ultimately not merged.
I could not find the original, so I regenerated this commit; it could fix the issue.
Please give it a try.
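
To make the idea of a NULL element repair concrete, here is a minimal sketch in Python rather than the ACPICA C code. It is not the patch itself. The Reference class, PRT_ENTRY_TYPES, repair_null_element and check_entry are hypothetical names, and the expected types and default values (0, empty string, empty buffer) are assumptions made for illustration. The sketch checks one four-element _PRT entry, replaces a missing element with a default of the expected type, and reports a type mismatch like the one quoted in the warning earlier in this thread.

```python
class Reference:
    """Stand-in for an ACPI named reference such as Z002 (hypothetical)."""

    def __init__(self, name):
        self.name = name


# Assumed expected types for one _PRT entry:
# Address, Pin, Source, SourceIndex.
PRT_ENTRY_TYPES = [(int,), (int,), (int, Reference), (int,)]


def repair_null_element(expected):
    """Return a default object for a missing element, or None if no
    default is defined for the expected types (assumed defaults)."""
    if int in expected:
        return 0
    if str in expected:
        return ""
    if bytes in expected:
        return b""
    return None


def check_entry(entry, expected_types=PRT_ENTRY_TYPES):
    """Return (repaired_entry, warnings) for one package entry."""
    repaired, warnings = [], []
    for index, (element, expected) in enumerate(zip(entry, expected_types)):
        if element is None:
            # Missing element: substitute a default so later code never
            # has to dereference a NULL object.
            element = repair_null_element(expected)
            warnings.append(
                f"index {index}: NULL element replaced with {element!r}")
        elif not isinstance(element, expected):
            # Wrong type: report it, but leave the element untouched.
            warnings.append(
                f"index {index}: found {type(element).__name__}, "
                f"expected {expected[0].__name__}")
        repaired.append(element)
    return repaired, warnings


if __name__ == "__main__":
    # Mirrors the suspicious entry from the DSDT: a reference where an
    # integer is expected at index 0.
    _, found = check_entry([Reference("Z002"), 0, Reference("LPID"), 0])
    print(found)
```

The design point is that a consumer which walks the package, such as the resource code in rscreate.c, either sees a repaired element or must itself tolerate a NULL one; the two approaches correspond to the two alternative fixes offered in this thread.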
Created attachment 121771 [details]
ACPICA: Resources: Object can be NULL in a package if repair mechanism is disabled.

This patch, along with the following one:
121491: ACPI: Add boot option to disable auto return object repair (https://bugzilla.kernel.org/attachment.cgi?id=121491&action=diff)
might also allow your system to boot. I wonder if you could give them a try so that I can add "Tested-by" to this patch and submit it to ACPICA upstream.

Created attachment 121901 [details]
dmidecode
Here is the dmidecode, but I really have to admit the rest is way beyond my scope of knowledge. I really have no idea where to start in order to follow your suggestions. Is there some sort of tutorial anywhere that could guide me?
I've reproduced your problem in the acpiexec simulation environment, so there is no need to provide further debugging information. The problem is stated in comment 13.

There are 3 patches in this thread; let me number them in the order they appear:
PATCH 1: https://bugzilla.kernel.org/attachment.cgi?id=121491
  ACPI: Add boot option to disable auto return object repair
PATCH 2: https://bugzilla.kernel.org/attachment.cgi?id=121761
  ACPICA: Namespace: Restore NULL element repair.
PATCH 3: https://bugzilla.kernel.org/attachment.cgi?id=121771
  ACPICA: Resources: Object can be NULL in a package if repair mechanism is disabled.

I need you to test whether these fixes resolve the bug and so help improve the quality of Linux. Please prepare the latest kernel, then:
1. Apply PATCH 2 only (do not apply PATCH 1 or PATCH 3), then do a build/boot test to see if the issue is solved. If it is not solved, take a screenshot or video clip and post it here.
2. Apply PATCH 1 and PATCH 3 (do not apply PATCH 2), then do a build/boot test to see if the issue is solved. If it is not solved, take a screenshot or video clip and post it here.

Thanks in advance.

Pinging... Fabian, could you give the fixes a try?

I still want to try, but I really have severe problems applying those patches. I have never built a kernel before, so I am trying to find a good manual that explains the procedure. Additionally, I am currently in the examination period of my studies, which leaves only little time. Therefore this may take more time than expected... sorry.

OK, it seems I have found a manual and understood how to build a kernel. I will try to apply your patches now. It may take a lot of time because the system is really slow, so building several kernels will definitely take a huge amount of time.

(In reply to Lv Zheng from comment #16)
> 1. Apply PATCH 2, do not apply PATCH 1 and PATCH 3, do a build/boot test
> again to see if the issue is solved. If it is not solved, take a...

This is a solution for the bug. My system starts. There are some errors shown, but I am pretty sure those are related to the low disk space (11,4 MB). I will clean up the disk now and build the other kernel with patches 1 and 3 applied.

(In reply to Lv Zheng from comment #16)
> 2. Apply PATCH 1 and PATCH 3, do not apply PATCH 2, do a build/boot test
> again to see if the issue is solved. If it is not solved, take a
> screenshot/videoclip and post here.

This is a solution for the bug, too. Nevertheless, there is a major problem: it slows the system down extremely.

Thanks for testing. I'll add your name to the patches with Tested-by and push them to ACPICA upstream. This bug is going to be marked as RESOLVED and will be changed to CLOSED after the necessary patches are upstreamed.