Bug 13002 - kernel 2.6.28 doesn't boot - bisected - dell inspiron 2650
Summary: kernel 2.6.28 doesn't boot - bisected - dell inspiron 2650
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P1 normal
Assignee: ykzhao
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-04-03 14:20 UTC by Tiago Requeijo
Modified: 2009-09-24 21:41 UTC
CC List: 8 users

See Also:
Kernel Version: 2.6.28, 2.6.29
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
output of dmesg on 2.6.28, booting with acpi=ht (33.40 KB, application/octet-stream)
    2009-04-07 04:55 UTC, Tiago Requeijo
output of dmesg on 2.6.27 (28.75 KB, text/plain)
    2009-04-11 23:45 UTC, Tiago Requeijo
acpidump on a Dell Inspiron 2500 (75.73 KB, application/octet-stream)
    2009-04-13 09:07 UTC, RogerRamonRamirez
And the result of lspci -vv (7.96 KB, text/plain)
    2009-04-13 09:09 UTC, RogerRamonRamirez
2.6.29.2.txt (15.53 KB, text/plain)
    2009-05-02 17:56 UTC, GNUtoo
dmesg output after patch reversion on 2.6.30-rc5 (25.98 KB, application/octet-stream)
    2009-05-26 19:39 UTC, Tiago Requeijo
output of acpidump after patch reversion on 2.6.30-rc5 (77.15 KB, application/octet-stream)
    2009-05-26 19:40 UTC, Tiago Requeijo
skip getting the power state for the device CDB0/CDB1 (821 bytes, patch)
    2009-05-27 02:54 UTC, ykzhao
skip getting the power state for all the devices in the boot phase (667 bytes, patch)
    2009-05-27 03:12 UTC, ykzhao
skip getting the power state for the device CB0/CB1 (819 bytes, patch)
    2009-06-01 02:49 UTC, ykzhao
output of lspci -vxxx (6.83 KB, application/octet-stream)
    2009-06-01 15:42 UTC, Tiago Requeijo
try the debug patch (1.46 KB, patch)
    2009-07-06 09:15 UTC, ykzhao
2.6.31-rc2, initcall_debug - hang, screenshot (415.92 KB, image/jpeg)
    2009-07-08 17:51 UTC, Nicholas Kudriavtsev
2.6.31-rc2, initcall_debug pci=noacpi - boot "somehow", dmesg (82.33 KB, application/octet-stream)
    2009-07-08 17:54 UTC, Nicholas Kudriavtsev
try the debug patch (1.52 KB, patch)
    2009-07-22 05:13 UTC, ykzhao
dmesg file with patch from Comment #43 and pci=noacpi (82.12 KB, application/octet-stream)
    2009-07-22 18:48 UTC, Nicholas Kudriavtsev
dmesg file with new patch and pci=noacpi irqpoll (79.15 KB, application/octet-stream)
    2009-07-22 19:02 UTC, Nicholas Kudriavtsev
The screen shot without pci=noacpi option (254.49 KB, image/jpeg)
    2009-07-27 18:10 UTC, Nicholas Kudriavtsev
patch to revert "ACPI: Attach the ACPI device to the ACPI handle as early as possible" (1.39 KB, patch)
    2009-09-05 17:36 UTC, Len Brown

Description Tiago Requeijo 2009-04-03 14:20:05 UTC
The boot process on a Dell Inspiron 2650 stops at the point shown in the image linked below. The same happens with 2.6.29. Booting with acpi=off works. Previous kernel versions (<2.6.28) worked fine.

The output of lspci is:

00:00.0 Host bridge: Intel Corporation 82845 845 [Brookdale] Chipset Host Bridge (rev 05)
00:01.0 PCI bridge: Intel Corporation 82845 845 [Brookdale] Chipset AGP Bridge (rev 05)
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB Controller #2 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corporation 82801CAM ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801CAM IDE U100 Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801CA/CAM AC'97 Audio Controller (rev 02)
00:1f.6 Modem: Intel Corporation 82801CA/CAM AC'97 Modem Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 Go] (rev b2)
02:01.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
02:04.0 CardBus bridge: O2 Micro, Inc. OZ601/6912/711E0 CardBus/SmartCardBus Controller


Bootscreen image:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/327499/comments/1

output of lspci -vv:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/327499/comments/2

I initially submitted this bug to ubuntu's bugtracker at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/327499.  There is someone else on that page with the same exact problem on a similar laptop.
Comment 1 ykzhao 2009-04-07 02:08:42 UTC
hi, Tiago
   From the problem description it seems that the box boots successfully with ACPI enabled on previous kernel versions (<2.6.28), but fails to boot on 2.6.28. Right? Please double check it.
   If so, will you please use git-bisect to identify the commit that causes the regression?
   Will you please also try the following boot option on the 2.6.28 kernel and attach the output of dmesg?
   a. acpi=noapic
   b. acpi=ht
   Will you please also attach the output of acpidump and dmesg on the 2.6.27-xx kernel?
   Thanks.
Comment 2 Tiago Requeijo 2009-04-07 04:53:32 UTC
Hi,

That's correct, it booted fine with acpi on 2.6.27-xx and lower.  I'm attaching the output of dmesg for acpi=ht on 2.6.28. The system didn't boot with acpi=noapic.

I'll post the results of acpidump and dmesg on the 2.6.27 kernel as soon as I compile one with ext4 support.
I'll start the git bisect process when I finish downloading the whole kernel tree. 

Thanks
Comment 3 Tiago Requeijo 2009-04-07 04:55:03 UTC
Created attachment 20847
output of dmesg on 2.6.28, booting with acpi=ht
Comment 4 Zhang Rui 2009-04-07 07:00:12 UTC
what if you boot with pci=noacpi?
Comment 5 Tiago Requeijo 2009-04-07 12:25:56 UTC
Booting with pci=noacpi works.
Comment 6 ykzhao 2009-04-08 01:33:52 UTC
Sorry that I gave the incorrect boot option.
Will you please try the boot option of "noapic" and see whether the box can be booted?

Please also attach the output of dmesg on the 2.6.27-xx working kernel.
 Thanks.
Comment 7 Shaohua 2009-04-08 06:24:30 UTC
how about boot option 'acpi=noirq'?
Comment 8 Tiago Requeijo 2009-04-11 23:45:35 UTC
Created attachment 20941
output of dmesg on 2.6.27
Comment 9 Tiago Requeijo 2009-04-11 23:47:39 UTC
The output of dmesg on 2.6.27 is attached above.

On 2.6.28-xx, the options 'acpi=noirq' and 'pci=noapic' do not work. The boot process freezes at the same spot as before.
Comment 10 RogerRamonRamirez 2009-04-12 16:37:18 UTC
Hi there,
I've got the same problem with an Inspiron 2500 (2.6.27 works and 2.6.28/29 does not boot except with acpi disabled or acpi=ht).

I dug a bit into the kernel and saw that acpi goes into an endless loop (in acpi_ps_parse_loop) when the pci stuff tries to switch the cardbus bridge to state D0.

If it can help, I can send the results of acpidump & lspci. I've got a trace of the endless loop with acpi.debug_layer=0x00400010 acpi.debug_level=0x000007ff also. But maybe I should open a new report.

rrr
Comment 11 Shaohua 2009-04-13 00:49:01 UTC
Please attach the result of acpidump and lspci. It will be very helpful to evaluate your observation.
Comment 12 RogerRamonRamirez 2009-04-13 09:07:06 UTC
Created attachment 20956
acpidump on a Dell Inspiron 2500
Comment 13 RogerRamonRamirez 2009-04-13 09:09:36 UTC
Created attachment 20957
And the result of lspci -vv
Comment 14 RogerRamonRamirez 2009-04-13 09:16:20 UTC
Hi,
I've attached the dumps. The acpidump is really a text file (I've checked the wrong option). And here's a part of the boot log

...
  nsdump-0087 [09] ns_print_pathname         : [PMS0]
nssearch-0110 [11] ns_search_one_scope       : Searching \_SB_.PCI0.HUB_.CDB0.CAIN (ce81e8e8) For [PMS0] (Untyped)
nssearch-0174 [11] ns_search_one_scope       : Name [PMS0] (Untyped) not found in search in scope [CAIN] ce81e8e8 first child (null)
nssearch-0240 [11] ns_search_parent_tree     : Searching parent [CDB0] for [PMS0]
nssearch-0110 [12] ns_search_one_scope       : Searching \_SB_.PCI0.HUB_.CDB0 (ce81e7e0) For [PMS0] (Untyped)
nssearch-0145 [12] ns_search_one_scope       : Name [PMS0] (RegionField) ce81e888 found in scope [CDB0] ce81e7e0
nsaccess-0404 [09] ns_lookup                 : Seaching relative to prefix scope [CAIN] (ce81e8e8)
nsaccess_0514 [09] ns_lookup                 : Simple Pathname (1 segment, Flags=3)
  nsdump-0087 [09] ns_print_pathname         : [PMS0]
        ......... etc
Comment 15 GNUtoo 2009-05-02 17:56:26 UTC
Created attachment 21188
2.6.29.2.txt

same issue with 2.6.29.2 on inspiron 2500
Comment 16 Zhang Rui 2009-05-05 02:23:31 UTC
As this is a regression,
would you please use git-bisect to find out which commit introduces the problem?
Comment 17 Tiago Requeijo 2009-05-14 12:40:18 UTC
The result from git-bisect:

eab4b645769fa2f8703f5a3cb0cc4ac090d347af is first bad commit
commit eab4b645769fa2f8703f5a3cb0cc4ac090d347af
Author: Zhao Yakui <yakui.zhao@intel.com>
Date:   Mon Aug 11 14:54:16 2008 +0800

    ACPI: Attach the ACPI device to the ACPI handle as early as possible
    
    Attach the ACPI device to the ACPI handle as early as possible so that OS
    can get the corresponding ACPI device by the acpi handle in the course
    of getting the power/wakeup/performance flags.
    
    http://bugzilla.kernel.org/show_bug.cgi?id=8049
    http://bugzilla.kernel.org/show_bug.cgi?id=11000
    
    Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

:040000 040000 edb720b0671993e3ae3956f58710c9470264760c 394ad6c935fa660606bb12082c4e834a232d87be M	drivers
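For readers unfamiliar with the procedure, git-bisect arrives at a "first bad commit" like the one above by binary search between a known-good and a known-bad revision. The following is only a minimal sketch of that idea, assuming a strictly linear history numbered by index; `first_bad_commit` and the callback `commit_is_bad_from_42` are hypothetical names for illustration and are not part of git or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test callback: returns nonzero if the kernel built from
 * commit number `idx` shows the regression (for example, hangs on boot). */
typedef int (*is_bad_fn)(size_t idx);

/* Binary search for the first bad commit, assuming `good` is known good,
 * `bad` is known bad, and every commit after the first bad one is bad too. */
static size_t first_bad_commit(size_t good, size_t bad, is_bad_fn is_bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* regression already present: look earlier */
		else
			good = mid;	/* still fine: look later */
	}
	return bad;
}

/* Hypothetical example history in which commit 42 introduced the hang. */
static int commit_is_bad_from_42(size_t idx)
{
	return idx >= 42;
}
```

Each iteration halves the remaining range, so a range of N commits needs roughly log2(N) test boots. Real git-bisect also handles merges and skipped commits, which this sketch does not attempt.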
Comment 18 Zhang Rui 2009-05-15 01:32:16 UTC
wow, so the problem doesn't exist if you revert this patch, right?
Comment 19 Tiago Requeijo 2009-05-15 13:03:31 UTC
I didn't try reverting this patch on a newer kernel.  Nevertheless, the kernel works fine right before that patch.

I'll check to see if this patch can be (easily) reverted on a more current kernel. If so, I'll let you know if it solves the problem.
Comment 20 Len Brown 2009-05-19 01:32:58 UTC
does reverting eab4b645769fa2f8703f5a3cb0cc4ac090d347af fix the bug?
Comment 21 Tiago Requeijo 2009-05-20 13:05:15 UTC
Reverting the patch fixes the problem.
Comment 22 Zhang Rui 2009-05-21 01:47:30 UTC
Please attach the dmesg output after reverting this patch.
Comment 23 Tiago Requeijo 2009-05-26 19:39:56 UTC
Created attachment 21561
dmesg output after patch reversion on 2.6.30-rc5
Comment 24 Tiago Requeijo 2009-05-26 19:40:39 UTC
Created attachment 21562
output of acpidump after patch reversion on 2.6.30-rc5
Comment 25 Tiago Requeijo 2009-05-26 19:42:00 UTC
I attached above the outputs of dmesg and acpidump (in case it helps) after reverting the patch.
Comment 26 ykzhao 2009-05-27 02:54:30 UTC
Created attachment 21575
skip getting the power state for the device CDB0/CDB1
Comment 27 ykzhao 2009-05-27 03:03:34 UTC
Hi, Tiago
    Will you please try the debug patch in comment #26 and see whether the box can be booted?
    From the log in comment #24 it seems that the box can be booted after reverting commit eab4b645769fa2f8703f5a3cb0cc4ac090d347af.
    After reverting that commit, the OS won't detect the power state of a device if it is power_manageable. This is what the 2.6.27 kernel did.
    From the acpidump we know that a _PSC object exists for the devices CDB0/CDB1, and it is evaluated in the course of getting the power state.

    Please try the debug patch in comment #26. If the box boots, please attach the output of dmesg. The debug patch skips getting the power state for the devices CDB0/CDB1.
    Thanks.
Comment 28 ykzhao 2009-05-27 03:12:07 UTC
Created attachment 21577
skip getting the power state for all the devices in the boot phase

If the box still hangs with the patch in comment #26 applied, will you please try the attached patch and see whether the box can be booted?
    Thanks.
Comment 29 Tiago Requeijo 2009-05-28 15:38:51 UTC
No luck with the patch from comment #26, the computer still hangs on boot. 

Patch #28 lets the computer boot.

Just for testing, I changed the code in your patch #26 so every device is skipped. The following is the relevant part from dmesg:

[    0.027737] ACPI: EC: Look up EC in DSDT
[    0.032282] ACPI: Interpreter enabled
[    0.032442] ACPI: (supports S0 S1 S3 S4 S5)
[    0.032895] ACPI: Using PIC for interrupt routing
[    0.033572] skip getting the power state for VGA
[    0.033924] skip getting the power state for CB1
[    0.037402] skip getting the power state for FDC
[    0.038019] ACPI: EC: non-query interrupt received, switching to interrupt mode
[    0.044315] skip getting the power state for PRID
[    0.044467] skip getting the power state for SECD
[    0.044848] ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62
[    0.044998] ACPI: EC: driver started in interrupt mode
[    0.045399] ACPI: No dock devices found.
[    0.045588] ACPI: PCI Root Bridge [PCI0] (0000:00)

In particular, there is no CDB0/CDB1 device listed.
Comment 30 ykzhao 2009-06-01 02:46:59 UTC
Hi, Tiago
    Thanks for the test. It seems that the box can be booted if we skip getting the power state.
    Sorry, I mixed up the acpidump from your box with the one in comment #12.
    Will you please try the updated patch and see whether the box can be booted?
    It would be great if you could also attach the output of lspci -vxxx.
    Thanks.
Comment 31 ykzhao 2009-06-01 02:49:56 UTC
Created attachment 21674
skip getting the power state for the device CB0/CB1

Will you please try the updated debug patch and see whether the box can be booted?
   thanks.
Comment 32 Tiago Requeijo 2009-06-01 15:41:28 UTC
The laptop boots fine with the patch for the device CB0/CB1.

Relevant dmesg part:

[    0.031943] ACPI: EC: Look up EC in DSDT
[    0.038176] ACPI: Interpreter enabled
[    0.038362] ACPI: (supports S0 S1 S3 S4 S5)
[    0.038876] ACPI: Using PIC for interrupt routing
[    0.040353] skip getting the power state for CB1
[    0.045538] ACPI: EC: non-query interrupt received, switching to interrupt mode
[    0.051273] ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62
[    0.051444] ACPI: EC: driver started in interrupt mode
[    0.051951] ACPI: No dock devices found.
[    0.052024] ACPI: PCI Root Bridge [PCI0] (0000:00)
Comment 33 Tiago Requeijo 2009-06-01 15:42:20 UTC
Created attachment 21691
output of lspci -vxxx

output of lspci -vxxx
Comment 34 Nicholas Kudriavtsev 2009-06-28 18:36:35 UTC
I have a Compal CL10 based notebook and am experiencing the same problem with Fedora 11 (kernels 2.6.29.4 and 2.6.29.5). The patch from #31 helped with a small change. My CardBus bridges are named CB1 and CB2, so my change is:

	if (!strncmp(acpi_device_bid(device), "CB", 2)) {
		printk(KERN_DEBUG "skip getting the power state for %s\n",
			acpi_device_bid(device));
	} else {
		acpi_bus_get_power(device->handle, &(device->power.state));
	}
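
The two-character prefix test in the snippet above can be tried out in user space to see which ACPI device names from this thread it would catch. This is only a sketch: `bid_has_cb_prefix` is a hypothetical stand-in for the `strncmp()` condition and does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical user-space stand-in for the condition in the patch above:
 * returns true when the ACPI bus id starts with the two characters "CB". */
static bool bid_has_cb_prefix(const char *bid)
{
	assert(bid != NULL);
	return strncmp(bid, "CB", 2) == 0;
}
```

Under this check the names CB0, CB1 and CB2 match, while CDB0 and CDB1 (the names seen in the Inspiron 2500 log in comment #14) do not, because their second character is D rather than B.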
Comment 35 ykzhao 2009-07-06 09:15:09 UTC
Created attachment 22229
try the debug patch

Will you please try the debug patch on the latest kernel and capture the screenshot when the box can't be booted?
    
BTW: please add the boot option of "initcall_debug".
Thanks.
Comment 36 Nicholas Kudriavtsev 2009-07-08 17:51:07 UTC
Created attachment 22265
2.6.31-rc2, initcall_debug - hang, screenshot
Comment 37 Nicholas Kudriavtsev 2009-07-08 17:54:25 UTC
Created attachment 22266
2.6.31-rc2, initcall_debug pci=noacpi - boot "somehow", dmesg
Comment 38 Nicholas Kudriavtsev 2009-07-08 17:58:27 UTC
Comment on attachment 22265
2.6.31-rc2, initcall_debug - hang, screenshot

Patch id=22265 has been applied.
Comment 39 Nicholas Kudriavtsev 2009-07-08 18:00:45 UTC
Comment on attachment 22265
2.6.31-rc2, initcall_debug - hang, screenshot

Patch id=22265 has been applied.
Comment 40 Nicholas Kudriavtsev 2009-07-08 18:06:50 UTC
Please remove the last two comments (38, 39) - they are erroneous. :(
Comment 41 ykzhao 2009-07-20 12:16:38 UTC
Hi, Nicholas
    From the log in comment #37 it seems that the box can be booted with the boot option of "pci=noacpi". Will you please double check it again?
    Thanks.
Comment 42 Nicholas Kudriavtsev 2009-07-20 13:27:35 UTC
Hi,

Yes, kernel boots, but then there are problems. "irq 10: nobody cared (try booting with the "irqpoll" option)" - irqpoll doesn't help and CardBus WiFi adapter doesn't work (no PCI IRQ, CardBus support disabled for this socket).
Comment 43 ykzhao 2009-07-22 05:13:01 UTC
Created attachment 22435
try the debug patch

Will you please try the debug patch and attach the output of dmesg after adding the boot option of "pci=noacpi"?
Please also capture the picture when it hangs.
Thanks.
Comment 44 Nicholas Kudriavtsev 2009-07-22 18:48:02 UTC
Created attachment 22448
dmesg file with patch from Comment #43 and pci=noacpi
Comment 45 Nicholas Kudriavtsev 2009-07-22 19:02:23 UTC
Created attachment 22449
dmesg file with new patch and pci=noacpi irqpoll

Sorry, I was wrong - irqpoll option partially helps, usb mouse has smooth movements again, otherwise it has harsh movements, but the notebook still has no CardBus functionality.
Comment 46 ykzhao 2009-07-27 02:12:06 UTC
Hi, Nicholas
    Will you please boot without the "pci=noacpi" option and capture a screenshot when it hangs?
   Thanks.
Comment 47 Nicholas Kudriavtsev 2009-07-27 18:10:00 UTC
Created attachment 22513
The screen shot without pci=noacpi option

Kernel version is 2.6.31-rc2 with the last debug patch.
Comment 48 Nicholas Kudriavtsev 2009-07-29 09:54:08 UTC
Hi, ykzhao

What are we looking for with the last debugging patches?

With the patch from Comment #34 the kernel boots, acpi is enabled, interrupts are assigned and handled, and the CardBus adapter (WiFi) works. What else do we need?
Comment 49 david b 2009-09-05 03:06:35 UTC
This still affects a Dell Inspiron 2600 as of the latest 2.6.30.5 kernel (I think it was 2.6.30.5; it may have been 2.6.30.4).

What is the status of the patches?
(When will a stable kernel be released that works on these old Dells?)
Comment 50 Len Brown 2009-09-05 17:36:31 UTC
Created attachment 23015
patch to revert "ACPI: Attach the ACPI device to the ACPI handle as early as possible"

Please verify that this patch makes the regression go away,
allowing your system to boot properly.

commit f61f925859c57f6175082aeeee17743c68558a6e
Author: Len Brown <len.brown@intel.com>
Date:   Sat Sep 5 13:33:23 2009 -0400

    Revert "ACPI: Attach the ACPI device to the ACPI handle as early as possible"
    
    This reverts commit eab4b645769fa2f8703f5a3cb0cc4ac090d347af.
Comment 51 Nicholas Kudriavtsev 2009-09-06 06:33:11 UTC
Hi, Len

Have tested the patch against 2.6.30.5. The kernel boots.

Test against 2.6.31-rc8 will follow in a few days.
Comment 52 Nicholas Kudriavtsev 2009-09-07 07:44:26 UTC
Kernel 2.6.31-rc8 boots with the patch also.
Comment 53 Len Brown 2009-09-24 21:41:27 UTC
Linux-2.6.32-git14 (pre 2.6.32-rc1) includes this commit:

commit f61f925859c57f6175082aeeee17743c68558a6e
Author: Len Brown <len.brown@intel.com>
Date:   Sat Sep 5 13:33:23 2009 -0400

    Revert "ACPI: Attach the ACPI device to the ACPI handle as early as possible"
    
    This reverts commit eab4b645769fa2f8703f5a3cb0cc4ac090d347af.
