Bug 15075

Summary: [bisected] Boot hangs in cpu_debug_init() call on AMD Athlon XP processors
Product: Platform Specific/Hardware
Reporter: Ozan Caglayan (ozan)
Component: i386
Assignee: platform_i386
Status: CLOSED OBSOLETE
Severity: high
CC: florian, hpa, jaswinder, mingo, rjw, torvalds, yinghai
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31.y
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 13615

Description Ozan Caglayan 2010-01-17 09:20:32 UTC
Hi,

this is an abstract of the relevant LKML thread about $summary.

Boot hangs at the following step for users with an AMD Athlon XP processor (not a 64 or X2 one, plain XP):

TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
initcall inet_init+0x0/0x199 returned 0 after 3585 usecs
calling af_unix_init+0x0/0x47 @ 1
NET: Registered protocol family 1
initcall af_unix_init+0x0/0x47 returned 0 after 101 usecs
calling populate_rootfs+0x0/0x62 @ 1
Unpacking initramfs...
Freeing initrd memory: 5109k freed
initcall populate_rootfs+0x0/0x62 returned 0 after 215338 usecs
calling i8259A_init_sysfs+0x0/0x1d @ 1
initcall i8259A_init_sysfs+0x0/0x1d returned 0 after 42 usecs
calling sbf_init+0x0/0xda @ 1
initcall sbf_init+0x0/0xda returned 0 after 0 usecs
calling i8237A_init_sysfs+0x0/0x1d @ 1
initcall i8237A_init_sysfs+0x0/0x1d returned 0 after 13 usecs
calling add_rtc_cmos+0x0/0x94 @ 1
initcall add_rtc_cmos+0x0/0x94 returned 0 after 4 usecs
calling cache_sysfs_init+0x0/0x55 @ 1
initcall cache_sysfs_init+0x0/0x55 returned 0 after 64 usecs
calling cpu_debug_init+0x0/0xe3 @ 1

Note that this is the point where 2.6.30.9 continues with:
 
[    0.404102] cpu0(1) debug files 5 <--
[    0.404109] Machine check exception polling timer started.
[    0.404119] cpufreq-nforce2: Detected nForce2 chipset revision C1
[    0.404122] cpufreq-nforce2: FSB changing is maybe unstable and can lead to crashes and data loss.
[    0.404135] cpufreq-nforce2: FSB currently at 167 MHz, FID 11.5
[    0.404155] ondemand governor failed, too long transition latency of HW, fallback to performance governor
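
For anyone triaging a similar hang: the stuck initcall can be picked out of such a log mechanically by pairing each "calling X" line with its "initcall X ... returned" line. Below is a rough Python sketch, assuming the log has the same layout as the excerpt above (as printed when booting with initcall_debug). The function name find_unfinished_initcalls is made up for illustration and is not part of any kernel tool.

```python
import re

# Patterns assume the "calling fn+0x0/0x.. @ pid" and
# "initcall fn+0x0/0x.. returned ..." layout seen in the excerpt above.
CALL_RE = re.compile(r"calling\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
RET_RE = re.compile(r"initcall\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+returned")


def find_unfinished_initcalls(log_text):
    """Return initcalls that were entered but never reported a return."""
    pending = []
    for line in log_text.splitlines():
        returned = RET_RE.search(line)
        if returned:
            # The initcall finished, so it is no longer a hang candidate.
            if returned.group(1) in pending:
                pending.remove(returned.group(1))
            continue
        called = CALL_RE.search(line)
        if called:
            pending.append(called.group(1))
    return pending


# Last few lines of the hanging boot log quoted in this report.
sample = """\
calling add_rtc_cmos+0x0/0x94 @ 1
initcall add_rtc_cmos+0x0/0x94 returned 0 after 4 usecs
calling cache_sysfs_init+0x0/0x55 @ 1
initcall cache_sysfs_init+0x0/0x55 returned 0 after 64 usecs
calling cpu_debug_init+0x0/0xe3 @ 1
"""

print(find_unfinished_initcalls(sample))  # ['cpu_debug_init']
```

On a log that ends in a hang, the last entry of the returned list is the initcall to look at first.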


First of all, I disabled CONFIG_X86_CPU_DEBUG and all the faulty systems booted correctly. Then I checked the latest changes to the cpu_debug code and found that reverting the following commit, which was introduced during the 2.6.31 merge window, fixes the problem:

From 5095f59bda6793a7b8f0856096d6893fe98e0e51 Mon Sep 17 00:00:00 2001
From: Jaswinder Singh Rajput <jaswinder@kernel.org>
Date: Fri, 5 Jun 2009 23:27:17 +0530
Subject: [PATCH] x86: cpu_debug: Remove model information to reduce encoding-decoding

Unless it's fixed or reverted, it should also affect the current 2.6.32.y and linux-2.6 master branches.
Comment 1 Rafael J. Wysocki 2010-01-17 13:41:38 UTC
This is a regression from 2.6.30, so I'm linking it to bug #13615.
Comment 2 Florian Mickler 2010-10-07 21:02:07 UTC
It seems there has been no action on this bugreport?
Comment 3 Linus Torvalds 2010-10-07 21:47:54 UTC
On Thu, Oct 7, 2010 at 2:02 PM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
>
> It seems there has been no action on this bugreport?

On the report? No. But X86_CPU_DEBUG was removed back in January as a
result of the lkml thread (commit
b160091802d4a76dd063facb09fcf10bf5d5d747: "x86: Remove "x86 CPU
features in debugfs" (CONFIG_X86_CPU_DEBUG)").

People need to understand that bugzilla is the _secondary_ bug
tracking thing, not the primary one. Nobody goes to bugzilla to search
for these things. If the reporter didn't close it as fixed (or, update
it to "still pending"), then the bugzilla entry is dead.

                    Linus
Comment 4 Florian Mickler 2010-10-07 22:58:21 UTC
No problem, just trying to clean up.

Regards,
Flo 

p.s.: 
I think I actually saw all the /^-/ when going through the git log -p output for this...
Comment 5 Florian Mickler 2010-10-07 23:04:20 UTC
Oh and btw, my procedure for cleaning up is to first check the CCs, then check Google and the git log for obvious fixes. Only if I can't determine the status of a bugzilla entry and have not seen any related mailing list traffic from the reporter regarding the issue will I ask.

Otherwise, just shutting down a bug report may give the wrong impression.

Sorry if I startled you.
Comment 6 Ozan Caglayan 2010-10-08 07:01:32 UTC
Sorry, I had completely forgotten about this bug report. BTW, isn't RESOLVED/CODE_FIX more appropriate, as the bug is resolved by the commit that Linus quoted?
Comment 7 Florian Mickler 2010-10-08 08:47:38 UTC
It does not matter in the end. But the feature got removed, so this is not really a fix, is it? I mean, you can't use it any more after the removal.

(Note: I'm totally not saying the removal was wrong, it seems to have been redundant anyway)
Comment 8 H. Peter Anvin 2010-10-08 16:01:58 UTC
We could use a CLOSED/OBSOLETE or CLOSED/MOOT.