Bug 9473

Summary: [parisc] 2.6.24-rc3 (64-bit, smp) fails to boot on 9000/785/J5600
Product: Platform Specific/Hardware
Reporter: Rafael J. Wysocki (rjwysocki)
Component: PA-RISC
Assignee: Kyle McMartin (kyle)
Status: CLOSED CODE_FIX
Severity: normal
CC: elendil, kyle, mingo
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24-rc3
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9243

Description Rafael J. Wysocki 2007-11-28 15:50:55 UTC
Subject         : [parisc] 2.6.24-rc3 (64-bit, smp) fails to boot on 9000/785/J5600
Submitter       : Frans Pop <elendil@planet.nl>
References      : http://lkml.org/lkml/2007/11/27/72
Handled-By      : Kyle McMartin <kyle@mcmartin.ca>
Comment 1 Ingo Molnar 2007-11-30 06:46:12 UTC
Is there any way on HP PA-RISC to find out where such hangs happen (something like nmi_watchdog=1/2 on x86)?

If there's no such mechanism and you've got time, you could try the latency tracer and its print_functions feature:

http://people.redhat.com/mingo/latency-tracing-patches/latency-tracing-v2.6.24-rc3.combo.patch

But this would need some hacking from an HP PA-RISC developer: the architecture needs an mcount stub before boot-hang debugging works. With that in place, mcount_enabled=1 and print_functions=1 together can help locate such hangs.
Comment 2 Kyle McMartin 2007-11-30 06:52:03 UTC
Yes, rather easily. All PA-RISC machines have a TOC switch (Transfer of Control) that boots control back to firmware, saves registers to nvram and reboots. Won't provide a stack trace, but at least provides the program counter.

In this case, the problem was that a recent commit added a check for IRQ_DISABLED to the IRQ_PER_CPU code path, and for some reason we were accidentally doing |= IRQ_PER_CPU instead of setting it, so IRQ_DISABLED leaked through.
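The failure described above can be modelled in a few lines of C. This is a simplified sketch, not the parisc code or the mainline fix: the flag values, `struct irq_desc_model`, and the helper names are hypothetical, and it assumes (as "leaked through" implies) that the descriptor already carried IRQ_DISABLED before IRQ_PER_CPU was applied. OR-ing the flag in preserves the stale disable bit, so a per-CPU path that checks IRQ_DISABLED skips the handler; assigning the status outright clears it.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL bit values for illustration only; the real constants
 * in the 2.6.24 <linux/irq.h> are different. */
#define IRQ_DISABLED 0x1u
#define IRQ_PER_CPU  0x2u

/* Hypothetical stand-in for the kernel's struct irq_desc:
 * only the status word matters for this sketch. */
struct irq_desc_model {
    unsigned int status;
};

/* Models a per-CPU dispatch path that, like the newly added check,
 * skips the handler when IRQ_DISABLED is set.
 * Returns true if the handler would run. */
static bool per_cpu_handler_runs(const struct irq_desc_model *desc)
{
    assert(desc->status & IRQ_PER_CPU); /* only the per-CPU path is modelled */
    return !(desc->status & IRQ_DISABLED);
}

/* Buggy pattern: OR-ing in the flag keeps every bit that was already
 * set, including an initial IRQ_DISABLED. */
static void mark_per_cpu_buggy(struct irq_desc_model *desc)
{
    desc->status |= IRQ_PER_CPU;
}

/* Fixed pattern as described in the comment: set the status outright,
 * which drops the stale IRQ_DISABLED bit. */
static void mark_per_cpu_fixed(struct irq_desc_model *desc)
{
    desc->status = IRQ_PER_CPU;
}
```

Starting both descriptors at `{ IRQ_DISABLED }`, `mark_per_cpu_buggy()` leaves `status == (IRQ_DISABLED | IRQ_PER_CPU)` and `per_cpu_handler_runs()` returns false, while `mark_per_cpu_fixed()` leaves `status == IRQ_PER_CPU` and the handler would run. In this model, assignment also discards any other bits that were set.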

(Actually, you've inspired me to check whether I can get a high-priority interrupt on TOC, so I could dump the stack trace...)
Comment 4 Ingo Molnar 2007-11-30 06:57:07 UTC
> Yes, rather easily. All PA-RISC machines have a TOC switch (Transfer 
> of Control) that boots control back to firmware, saves registers to 
> nvram and reboots. Won't provide a stack trace, but at least provides 
> the program counter.

heh, that's easier than x86, where the NMI watchdog only becomes active 
some time into boot (so we cannot easily debug early-bootup hangs).

> In this case, the problem was that a recent commit adds a check for 
> IRQ_DISABLED to the IRQ_PER_CPU codepath, and for some reason, we were 
> accidentally |= IRQ_PER_CPU instead of setting it, so IRQ_DISABLED 
> leaked through.
> 
> (Actually, you've inspired me to go check to see if I can get a high 
> priority interrupt on TOC, so I could dump the stack trace...)

Yeah, that would be useful, I guess.
Comment 5 Ingo Molnar 2007-12-06 03:18:14 UTC
I guess this fix will hit 2.6.24, right? I haven't seen it in -rc4 yet.
Comment 6 Frans Pop 2007-12-07 13:52:37 UTC
This is in mainline now and can be closed again:
2421ba5b57ddbc3a972b9d6fb884817c39d2fff7
Comment 7 Ingo Molnar 2007-12-07 14:10:42 UTC
Great. The fix will first show up in 2.6.24-rc5.