Bug 11224

Summary: Only three cores found on quad-core machine.
Product: Platform Specific/Hardware
Reporter: Rafael J. Wysocki (rjw)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: CLOSED CODE_FIX
Severity: normal
CC: cebbert, davej, tglx
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27-rc1
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 11167

Description Rafael J. Wysocki 2008-08-01 15:18:29 UTC
Subject    : Only three cores found on quad-core machine.
Submitter  : Dave Jones <davej@redhat.com>
Date       : 2008-08-01 18:15
References : http://marc.info/?l=linux-kernel&m=121761475224719&w=4

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Thomas Gleixner 2008-09-04 12:42:40 UTC
Dave, still not fixed ?
Comment 2 Chuck Ebbert 2008-09-06 17:59:30 UTC
0000000000000000 <.text>:
   0:   4a 8b 14 ea             mov    (%rdx,%r13,8),%rdx

rdx: ffff880001027f80
r13: 000a81a000000002

Non-canonical address?
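
As a quick check of the question above, the following sketch recomputes the effective address of `mov (%rdx,%r13,8),%rdx` from the two register values and tests whether it is canonical. It assumes 48-bit virtual addressing (an assumption, not stated in the report), and the helper names are purely illustrative, not kernel functions.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide; arithmetic wraps


def effective_address(base, index, scale=8):
    """Address formed by the (%base,%index,scale) operand, wrapped to 64 bits."""
    return (base + index * scale) & MASK64


def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) are all zeros or all ones.

    va_bits=48 is an assumption about the CPU's virtual address width.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


rdx = 0xFFFF880001027F80  # value from the oops
r13 = 0x000A81A000000002  # value from the oops

addr = effective_address(rdx, r13)
print(f"effective address: {addr:#018x}")
print("rdx alone canonical:", is_canonical(rdx))
print("rdx + r13*8 canonical:", is_canonical(addr))
```

Under that assumption, rdx on its own is a valid kernel address, while the sum with the scaled r13 is not, which is consistent with r13 being the corrupted register.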
Comment 3 H. Peter Anvin 2008-09-06 20:37:30 UTC
Looks like r13 contains crap, there...
Comment 4 Chuck Ebbert 2008-09-12 19:00:09 UTC
Looks like CPU2 never finished its initialization:

Booting processor 2/1 ip 6000
Initializing CPU#2
Stuck ??
Inquiring remote APIC #1...
... APIC #1 ID: failed
... APIC #1 VERSION: failed
... APIC #1 SPIV: failed
migration/2 used greatest stack depth: 7224 bytes left
Comment 5 Chuck Ebbert 2008-09-12 19:14:57 UTC
And then later it oopsed in the CPU init code, probably due to ftrace.

After that, things go wrong because the CPU init never finished, but that processor is still trying to allocate memory to handle the oops.
Comment 6 Rafael J. Wysocki 2008-10-10 12:33:46 UTC
On Friday, 10 of October 2008, Dave Jones wrote:
> On Wed, Oct 08, 2008 at 03:24:40PM +1100, Nick Piggin wrote:
>  > On Wednesday 08 October 2008 02:32, Dave Jones wrote:
>  > > On Wed, Oct 08, 2008 at 02:18:18AM +1100, Nick Piggin wrote:
>  > >  > On Sunday 05 October 2008 04:32, Rafael J. Wysocki wrote:
>  > >  > > This message has been generated automatically as a part of a report
>  > >  > > of recent regressions.
>  > >  > >
>  > >  > > The following bug entry is on the current list of known regressions
>  > >  > > from 2.6.26.  Please verify if it still should be listed and let me
>  > >  > > know (either way).
>  > >  > >
>  > >  > >
>  > >  > > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=11224
>  > >  > > Subject             : Only three cores found on quad-core machine.
>  > >  > > Submitter   : Dave Jones <davej@redhat.com>
>  > >  > > Date                : 2008-08-01 18:15 (65 days old)
>  > >  > > References  : http://marc.info/?l=linux-kernel&m=121761475224719&w=4
>  > >  >
>  > >  > Dave, is your CPU2 getting stuck in calibrate_delay? Can you still
>  > >  > reproduce the bug? What if you boot the kernel with lpj=<something sane
>  > >  > like 2666785> in order to skip the calibrate_delay code?
>  > >  >
>  > >  > If that helps, can you try adding some printks to narrow down where it
>  > >  > is getting stuck?
>  > >
>  > > This is going to sound strange, but I've forgotten which box that was.
>  > > I'm due to reinstall all my test boxes now that the next Fedora beta
>  > > has come out anyway, so I'll hopefully figure out by the end of the week :)
>  > 
>  > That does sound strange ;) But I can't believe that! It would be really
>  > interesting if it was stuck in calibrate_delay, but maybe the backtrace
>  > is just messed up...
>  > 
>  > If you do manage to reproduce it, I would be quite interested. Otherwise,
>  > I think we can say it's not a showstopper for 2.6.27.
> 
> Ok, this can be closed out. 2.6.27 seems to work fine on all the
> quad core machines I have in my cube.
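
For readers wondering what a "sane" lpj= value looks like, the sketch below converts a loops-per-jiffy figure into the BogoMIPS number the kernel prints at boot, using the same integer arithmetic as the calibrate_delay message. The HZ value is an assumption (it depends on the kernel configuration and is not given in this report), and the function name is illustrative.

```python
def bogomips(lpj, hz=1000):
    """Return (whole, hundredths) of BogoMIPS for a given loops_per_jiffy.

    Mirrors the kernel's integer formula:
        lpj / (500000 / HZ)  and  (lpj / (5000 / HZ)) % 100
    hz=1000 is an assumed CONFIG_HZ; 100, 250 and 300 are also common.
    """
    whole = lpj // (500000 // hz)
    hundredths = (lpj // (5000 // hz)) % 100
    return whole, hundredths


# The value suggested in the thread, under two assumed HZ settings.
for hz in (1000, 250):
    whole, frac = bogomips(2666785, hz)
    print(f"HZ={hz}: {whole}.{frac:02d} BogoMIPS (lpj=2666785)")
```

Passing lpj= on the kernel command line presets this value, so the boot CPU skips the calibration loop rather than measuring it.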