Bug 13002
Description
Tiago Requeijo
2009-04-03 14:20:05 UTC
Hi, Tiago
From the problem description it seems that the box can be booted successfully with ACPI enabled on the previous kernel versions (<2.6.28), but it can't be booted on 2.6.28. Right? Please double check it. If so, will you please use git-bisect to identify the commit which causes the regression?
Will you please also try the following boot options on the 2.6.28 kernel and attach the output of dmesg?
a. acpi=noapic
b. acpi=ht
Will you please also attach the output of acpidump and dmesg on the 2.6.27-xx kernel?
Thanks.

Hi,
That's correct, it booted fine with acpi on 2.6.27-xx and lower. I'm attaching the output of dmesg for acpi=ht on 2.6.28. The system didn't boot with acpi=noapic. I'll post the results of acpidump and dmesg on the 2.6.27 kernel as soon as I compile one with ext4 support. I'll start the git bisect process when I finish downloading the whole kernel tree.
Thanks

Created attachment 20847 [details]
output of dmesg on 2.6.28, booting with acpi=ht
What if you boot with pci=noacpi?

Booting with pci=noacpi works.

Sorry that I gave the incorrect boot option. Will you please try the boot option of "noapic" and see whether the box can be booted? Please also attach the output of dmesg on the 2.6.27-xx working kernel.
Thanks.

How about boot option 'acpi=noirq'?

Created attachment 20941 [details]
output of dmesg on 2.6.27
The output of dmesg on 2.6.27 is attached above. On 2.6.28-xx, the options 'acpi=noirq' and 'pci=noapic' do not work. The boot process freezes at the same spot as before.

Hi there,
I've got the same problem with an Inspiron 2500 (2.6.27 works and 2.6.28/29 does not boot except with acpi disabled or acpi=ht). I've dug a bit into the kernel and saw that acpi goes into an endless loop (in acpi_ps_parse_loop) when the pci code tries to switch the cardbus bridge to state D0. If it can help, I can send the results of acpidump & lspci. I've also got a trace of the endless loop with acpi.debug_layer=0x00400010 acpi.debug_level=0x000007ff. But maybe I should open a new report.

rrr please attach the result of acpidump and lspci. It will be very helpful to evaluate your observation.

Created attachment 20956 [details]
acpidump on a Dell Inspiron 2500
Created attachment 20957 [details]
And the result of lspci -vv
Hi,
I've attached the dumps. The acpidump is really a text file (I've checked the wrong option).
And here's a part of the boot log:
...
nsdump-0087 [09] ns_print_pathname : [PMS0]
nssearch-0110 [11] ns_search_one_scope : Searching \_SB_.PCI0.HUB_.CDB0.CAIN (ce81e8e8) For [PMS0] (Untyped)
nssearch-0174 [11] ns_search_one_scope : Name [PMS0] (Untyped) not found in search in scope [CAIN] ce81e8e8 first child (null)
nssearch-0240 [11] ns_search_parent_tree : Searching parent [CDB0] for [PMS0]
nssearch-0110 [12] ns_search_one_scope : Searching \_SB_.PCI0.HUB_.CDB0 (ce81e7e0) For [PMS0] (Untyped)
nssearch-0145 [12] ns_search_one_scope : Name [PMS0] (RegionField) ce81e888 found in scope [CDB0] ce81e7e0
nsaccess-0404 [09] ns_lookup : Seaching relative to prefix scope [CAIN] (ce81e8e8)
nsaccess_0514 [09] ns_lookup : Simple Pathname (1 segment, Flags=3)
nsdump-0087 [09] ns_print_pathname : [PMS0]
......... etc

Created attachment 21188 [details]
2.6.29.2.txt
same issue with 2.6.29.2 on inspiron 2500
As this is a regression, would you please use git-bisect to find out which commit introduces the problem?

The result from git-bisect:

eab4b645769fa2f8703f5a3cb0cc4ac090d347af is first bad commit
commit eab4b645769fa2f8703f5a3cb0cc4ac090d347af
Author: Zhao Yakui <yakui.zhao@intel.com>
Date: Mon Aug 11 14:54:16 2008 +0800

    ACPI: Attach the ACPI device to the ACPI handle as early as possible

    Attach the ACPI device to the ACPI handle as early as possible so that OS can get the corresponding ACPI device by the acpi handle in the course of getting the power/wakeup/performance flags.

    http://bugzilla.kernel.org/show_bug.cgi?id=8049
    http://bugzilla.kernel.org/show_bug.cgi?id=11000

    Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

:040000 040000 edb720b0671993e3ae3956f58710c9470264760c 394ad6c935fa660606bb12082c4e834a232d87be M drivers

Wow, so the problem doesn't exist if you revert this patch, right?

I didn't try reverting this patch on a newer kernel. Nevertheless, the kernel works fine right before that patch. I'll check to see if this patch can be (easily) reverted on a more current kernel. If so, I'll let you know if it solves the problem.

Does reverting eab4b645769fa2f8703f5a3cb0cc4ac090d347af fix the bug?

Reverting the patch fixes the problem.

Please attach the dmesg output after reverting this patch.

Created attachment 21561 [details]
dmesg output after patch reversion on 2.6.30-rc5
Created attachment 21562 [details]
output of acpidump after patch reversion on 2.6.30-rc5
I attached above the outputs of dmesg and acpidump (in case it helps) after reverting the patch. Created attachment 21575 [details]
skip getting the power state for the device CDB0/CDB1
Hi, Tiago
Will you please try the debug patch in comment #26 and see whether the box can be booted?
From the log in comment #24 it seems that the box can be booted if the commit eab4b645769fa2f8703f5a3cb0cc4ac090d347af is reverted. After reverting that commit, the OS won't detect the power state of a device even if it is power_manageable. This is what we did in the 2.6.27 kernel.
From the acpidump we know that there exists the _PSC object for the device CDB0/CDB1, which is evaluated in the course of getting the power state.
Please try the debug patch in comment #26. If the box can be booted, please attach the output of dmesg. The debug patch skips getting the power state for the device CDB0/CDB1.
Thanks.

Created attachment 21577 [details]
skip getting the power state for all the devices in the boot phase
If the box still hangs with the patch in comment #26 applied, will you please try the attached patch and see whether the box can be booted?
Thanks.

No luck with the patch from comment #26, the computer still hangs on boot. Patch #28 lets the computer boot.
Just for testing, I changed the code in your patch #26 so every device is skipped. The following is the relevant part from dmesg:
[ 0.027737] ACPI: EC: Look up EC in DSDT
[ 0.032282] ACPI: Interpreter enabled
[ 0.032442] ACPI: (supports S0 S1 S3 S4 S5)
[ 0.032895] ACPI: Using PIC for interrupt routing
[ 0.033572] skip getting the power state for VGA
[ 0.033924] skip getting the power state for CB1
[ 0.037402] skip getting the power state for FDC
[ 0.038019] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 0.044315] skip getting the power state for PRID
[ 0.044467] skip getting the power state for SECD
[ 0.044848] ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62
[ 0.044998] ACPI: EC: driver started in interrupt mode
[ 0.045399] ACPI: No dock devices found.
[ 0.045588] ACPI: PCI Root Bridge [PCI0] (0000:00)
In particular, there is no CDB0/CDB1 device listed.

Hi, Tiago
Thanks for the test.
It seems that the box can be booted if we skip getting the power state. Sorry, I mixed up the acpidump from your box with the one in comment #12. Will you please try the updated patch and see whether the box can be booted? It would be great if you could also attach the output of lspci -vxxx.
Thanks.

Created attachment 21674 [details]
skip getting the power state for the device CB0/CB1
Will you please try the updated debug patch and see whether the box can be booted?
thanks.
The laptop boots fine with the patch for the device CB0/CB1. Relevant dmesg part:
[ 0.031943] ACPI: EC: Look up EC in DSDT
[ 0.038176] ACPI: Interpreter enabled
[ 0.038362] ACPI: (supports S0 S1 S3 S4 S5)
[ 0.038876] ACPI: Using PIC for interrupt routing
[ 0.040353] skip getting the power state for CB1
[ 0.045538] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 0.051273] ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62
[ 0.051444] ACPI: EC: driver started in interrupt mode
[ 0.051951] ACPI: No dock devices found.
[ 0.052024] ACPI: PCI Root Bridge [PCI0] (0000:00)

Created attachment 21691 [details]
output of lspci -vxxx
I have a Compal CL10 based notebook and am experiencing the same problem with Fedora 11 (kernels 2.6.29.4 and 2.6.29.5). The patch from #31 has helped, with a little change. My CardBus bridges are named CB1 and CB2, so my change is:

    if (!strncmp(acpi_device_bid(device), "CB", 2)) {
        printk(KERN_DEBUG "skip getting the power state for %s\n",
               acpi_device_bid(device));
    } else {
        acpi_bus_get_power(device->handle, &(device->power.state));
    }

Created attachment 22229 [details]
try the debug patch
Will you please try the debug patch on the latest kernel and capture the screenshot when the box can't be booted?
BTW: please add the boot option of "initcall_debug".
Thanks.
Created attachment 22265 [details]
2.6.31-rc2, initcall_debug - hang, screenshot
Created attachment 22266 [details]
2.6.31-rc2, initcall_debug pci=noacpi - boot "somehow", dmesg
Comment on attachment 22265 [details]
2.6.31-rc2, initcall_debug - hang, screenshot
Patch id=22265 has been applied.
Comment on attachment 22265 [details]
2.6.31-rc2, initcall_debug - hang, screenshot
Patch id=22265 has been applied.
Please remove the last two comments (38, 39); they are erroneous. :(

Hi, Nicholas
From the log in comment #37 it seems that the box can be booted with the boot option of "pci=noacpi". Will you please double check it again?
Thanks.

Hi,
Yes, the kernel boots, but then there are problems. "irq 10: nobody cared (try booting with the "irqpoll" option)" - irqpoll doesn't help and the CardBus WiFi adapter doesn't work (no PCI IRQ, CardBus support disabled for this socket).

Created attachment 22435 [details]
try the debug patch
Will you please try the debug patch and attach the output of dmesg after adding the boot option of "pci=noacpi"?
Please also capture the picture when it hangs.
Thanks.
Created attachment 22448 [details]
dmesg file with patch from Comment #43 and pci=noacpi

Created attachment 22449 [details]
dmesg file with new patch and pci=noacpi irqpoll
Sorry, I was wrong - irqpoll option partially helps, usb mouse has smooth movements again, otherwise it has harsh movements, but the notebook still has no CardBus functionality.
Hi, Nicholas
Will you please boot without the "pci=noacpi" option and capture a screenshot when it hangs?
Thanks.

Created attachment 22513 [details]
The screen shot without pci=noacpi option
Kernel version is 2.6.31-rc2 with the last debug patch.
Hi, ykzhao
What are we looking for with the last debugging patches? With the patch from Comment #34 the kernel boots, acpi is enabled, interrupts are assigned and handled, and the CardBus adapter (WiFi) works. What else do we need?

This still affects a Dell Inspiron 2600, as of the latest 2.6.30.5 kernel (I think it was 2.6.30.5; it may have been .4 though). What is the status of the patches? When will a stable kernel be released which works on these old Dells?

Created attachment 23015 [details]
patch to revert "ACPI: Attach the ACPI device to the ACPI handle as early as possible"
Please verify that this patch makes the regression go away, allowing your system to boot properly.

commit f61f925859c57f6175082aeeee17743c68558a6e
Author: Len Brown <len.brown@intel.com>
Date: Sat Sep 5 13:33:23 2009 -0400

    Revert "ACPI: Attach the ACPI device to the ACPI handle as early as possible"

    This reverts commit eab4b645769fa2f8703f5a3cb0cc4ac090d347af.

Hi, Len
I have tested the patch against 2.6.30.5. The kernel boots. A test against 2.6.31-rc8 will follow in a few days.

Kernel 2.6.31-rc8 boots with the patch also.

Linux-2.6.32-git14 (pre 2.6.32-rc1) includes this commit:

commit f61f925859c57f6175082aeeee17743c68558a6e
Author: Len Brown <len.brown@intel.com>
Date: Sat Sep 5 13:33:23 2009 -0400

    Revert "ACPI: Attach the ACPI device to the ACPI handle as early as possible"

    This reverts commit eab4b645769fa2f8703f5a3cb0cc4ac090d347af.