Bug 10140

Summary: Kernel blocked after "ACPI: bus type pci registered"
Product: Platform Specific/Hardware
Component: x86-64
Reporter: François Valenduc (francoisvalenduc)
Assignee: Thomas Gleixner (tglx)
Status: CLOSED CODE_FIX
Severity: high
Priority: P1
CC: bunk, francis.moro, mingo, tglx
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc3-git3
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9832
Attachments:
  Output of dmesg (with kernel 2.6.25-rc3)
  Patch for testing
  Patch fixing the problem for me.

Description François Valenduc 2008-03-01 08:13:46 UTC
Latest working kernel version: 2.6.25-rc3
Earliest failing kernel version: 2.6.25-rc3-git3
Distribution: Gentoo
Hardware Environment: Packard Bell EasyNote, Intel Core 2 Duo
Software Environment: Gentoo
Problem Description: 

Steps to reproduce: Simply try to boot the computer.

With the current version of 2.6.25-git, the kernel stops after "ACPI: bus type pci registered". After running git-bisect once again, it seems that the following commit is the culprit:

8be8f54bae3453588011cad06363813a5293af53 is first bad commit
commit 8be8f54bae3453588011cad06363813a5293af53
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sat Feb 23 20:43:21 2008 +0100

    x86: CPA: avoid split of alias mappings

If I revert it, things work a little better, but the kernel stops again when it tries to scan the partition table.
Comment 1 François Valenduc 2008-03-01 08:19:00 UTC
Created attachment 15104
Output of dmesg (with kernel 2.6.25-rc3)
Comment 2 Thomas Gleixner 2008-03-02 11:02:18 UTC
Created attachment 15109
Patch for testing

Can you please apply the attached patch on top of the latest git?
Comment 3 Rafael J. Wysocki 2008-03-02 11:29:09 UTC
With the patch from Comment #2 applied I get:

  CC      arch/x86/kernel/traps_64.o
/home/rafael/src/linux-2.6/arch/x86/mm/pageattr.c: In function ‘cpa_process_alias’:
/home/rafael/src/linux-2.6/arch/x86/mm/pageattr.c:626: error: ‘struct cpa_data’ has no member named ‘tlb_flush’
/home/rafael/src/linux-2.6/arch/x86/mm/pageattr.c:627: error: ‘struct cpa_data’ has no member named ‘tlb_flush’
/home/rafael/src/linux-2.6/arch/x86/mm/pageattr.c:655: error: ‘struct cpa_data’ has no member named ‘tlb_flush’
/home/rafael/src/linux-2.6/arch/x86/mm/pageattr.c:656: error: ‘struct cpa_data’ has no member named ‘tlb_flush’
make[2]: *** [arch/x86/mm/pageattr.o] Error 1
Comment 4 Rafael J. Wysocki 2008-03-02 11:30:44 UTC
The commit mentioned above also breaks my HP nx6325.  The box crashes early in the boot process, 100% of the time.
Comment 5 Rafael J. Wysocki 2008-03-02 11:32:25 UTC
This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.

Handled-By : Thomas Gleixner <tglx@linutronix.de>
Comment 6 Rafael J. Wysocki 2008-03-02 11:38:34 UTC
References : http://lkml.org/lkml/2008/2/28/153
Comment 7 Rafael J. Wysocki 2008-03-02 13:30:11 UTC
Created attachment 15110
Patch fixing the problem for me.

The attached patch fixes the issue for me.  Can you test it, please?
Comment 8 Thomas Gleixner 2008-03-02 16:28:50 UTC
Doh, ignore the patch. We'll revert the commit for now and try to figure out why your boot gets stuck later.
Comment 9 François Valenduc 2008-03-03 00:57:36 UTC
The patch proposed by Rafael does indeed keep the kernel from blocking. In fact, the kernel blocks right when the framebuffer (uvesafb in my case) would be initialized.
I will retry with the commit reverted and see what happens. I don't understand why it can't boot in that case, since at every stage of the bisection the kernel booted whenever git-bisect stopped at a point where this commit was not applied. Maybe I made a mistake with the initramfs image needed to boot from an LVM2 root partition.
Comment 10 François Valenduc 2008-03-03 10:00:48 UTC
Reverting the commit or applying the patch proposed by Rafael solves the problem. With the commit applied and uvesafb disabled, the problem still occurs, so the initialization of the framebuffer is not the cause.