Bug 2 - NUMA-Q hangs during TSC initialization on boot.
Status: CLOSED CODE_FIX
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: IA-32 Linux
Importance: P2 normal
Assigned To: Martin J. Bligh
Depends on:
Blocks:
Reported: 2002-11-13 15:56 UTC by Martin J. Bligh
Modified: 2003-01-05 11:28 UTC (History)
0 users

See Also:
Kernel Version:
Tree: Mainline
Regression: ---


Attachments

Description Martin J. Bligh 2002-11-13 15:56:16 UTC
Exact Kernel version: 2.5.46
Distribution: debian woody
Hardware Environment: 16-way NUMA-Q
Problem Description: Hangs during TSC initialization on boot.

There's a garbled panic during IO-APIC init, then a hang during TSC sync.
Comment 1 Martin J. Bligh 2002-11-13 15:58:49 UTC
Here's an ungarbled panic, derived by putting some delays into the IO-APIC init
code:

CPU:    1
EIP:    0060:[<ffffff97>]    Not tainted
EFLAGS: 00010286
EIP is at E ipv4_config+0x3fc828af/0xffe42c88
eax: 00000000   ebx: c3934940   ecx: c031f178   edx: 036147a0
esi: 00000000   edi: f019c000   ebp: 00000001   esp: f019def0
ds: 0068   es: 0068   ss: 0068
Process swapper (pid: 0, threadinfo=f019c000 task=f01c5740)
Stack: f019c000 00000000 00000001 f019df10 c3934940 036147a0 c031f178 00000000
       c011d8b5 00000000 00000011 c02db960 ffffffee 00000020 c031f178 c031f178
       c011d5ba c02db960 f019c000 00000000 c02aff00 f019df64 00000046 c010904d
Call Trace:
 [<c011d8b5>] tasklet_hi_action+0x85/0xe0
 [<c011d5ba>] do_softirq+0x5a/0xac
 [<c010904d>] do_IRQ+0x18d/0x1b0
 [<c01078cc>] common_interrupt+0x18/0x20
 [<c0118bc8>] _call_console_drivers+0x50/0x58
 [<c0118ca9>] call_console_drivers+0xd9/0xe0
 [<c0118ee2>] release_console_sem+0x42/0xa4
Code:  Bad EIP value.
<0>Kernel panic: Aiee, killing interrupt handler!
Comment 2 Martin J. Bligh 2002-11-13 16:02:17 UTC
A non-boot CPU is getting a timer interrupt and trying to do
softirq processing in irq_exit. However, the per-CPU data that
this relies on is not initialised yet.

This patch fixes the problem by disabling interrupts for the
secondary CPU from before we program the IO-APIC until after
we've done the init in __cpu_up.

diff -purN -X /home/mbligh/.diff.exclude virgin/arch/i386/kernel/smpboot.c noearlyirq/arch/i386/kernel/smpboot.c
--- virgin/arch/i386/kernel/smpboot.c   Mon Nov  4 14:30:27 2002
+++ noearlyirq/arch/i386/kernel/smpboot.c       Wed Nov  6 15:22:12 2002
@@ -419,6 +419,7 @@ void __init smp_callin(void)
        smp_store_cpu_info(cpuid);
 
        disable_APIC_timer();
+       local_irq_disable();
        /*
         * Allow the master to continue.
         */
@@ -1186,6 +1187,7 @@ int __devinit __cpu_up(unsigned int cpu)
        if (!test_bit(cpu, &cpu_callin_map))
                return -EIO;
 
+       local_irq_enable();
        /* Unleash the CPU! */
        set_bit(cpu, &smp_commenced_mask);
        while (!test_bit(cpu, &cpu_online_map))
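
To illustrate the ordering problem described in this comment, here is a minimal userspace sketch. It is not kernel code: struct sim_cpu, sim_deliver and sim_bringup are hypothetical names invented for the illustration, and the "crash" is only a counter standing in for the panic shown in comment 1. The model assumes that a masked interrupt stays pending until interrupts are re-enabled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one secondary CPU; not the kernel's real state. */
struct sim_cpu {
    bool irqs_enabled;   /* models the CPU's interrupt-enable flag */
    bool percpu_ready;   /* true once the per-CPU softirq data is set up */
    bool timer_pending;  /* a timer interrupt is waiting to be delivered */
    int  handled;        /* interrupts that found initialised state */
    int  crashed;        /* interrupts that touched uninitialised state */
};

/* Deliver the pending timer interrupt if the CPU currently accepts it. */
static void sim_deliver(struct sim_cpu *c)
{
    if (!c->timer_pending || !c->irqs_enabled)
        return;                  /* nothing pending, or masked: stays pending */
    c->timer_pending = false;
    if (c->percpu_ready)
        c->handled++;            /* softirq processing sees valid data */
    else
        c->crashed++;            /* models the panic from the early interrupt */
}

/*
 * Simulate bringing up a secondary CPU. mask_early models the patch:
 * interrupts are disabled before the early timer interrupt can arrive
 * and re-enabled only after per-CPU initialisation has finished.
 * Returns 0 if every interrupt was handled safely, -1 otherwise.
 */
static int sim_bringup(bool mask_early)
{
    struct sim_cpu c = { .irqs_enabled = true };

    if (mask_early)
        c.irqs_enabled = false;  /* stands in for local_irq_disable() */

    c.timer_pending = true;      /* timer interrupt arrives too early */
    sim_deliver(&c);

    c.percpu_ready = true;       /* per-CPU initialisation completes */

    c.irqs_enabled = true;       /* stands in for local_irq_enable() */
    sim_deliver(&c);             /* deferred interrupt is now safe */

    return c.crashed ? -1 : 0;
}
```

In this model, sim_bringup(false) reports a failure because the interrupt is taken before the per-CPU state exists, while sim_bringup(true) defers the same interrupt until the state is ready.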
