Bug 41522

Summary: Sun Blade 100 kernel panics during boot
Product: Platform Specific/Hardware
Reporter: Jim Faulkner (jfaulkne)
Component: SPARC64
Assignee: platform_sparc64
Status: RESOLVED INVALID
Severity: normal
CC: akpm, jfaulkne
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.0.3
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: blade 100 kernel panic image 1
blade 100 kernel panic image 2
blade 100 kernel panic image 3
My .config
kernel panic w/ boot_delay image 1
kernel panic w/ boot_delay image 2
kernel panic w/ boot_delay image 3
kernel panic w/ boot_delay image 4
kernel panic w/ boot_delay image 5
kernel panic w/ boot_delay image 6
kernel panic w/ boot_delay image 7
kernel panic w/ boot_delay image 8
updated .config

Description Jim Faulkner 2011-08-21 19:28:51 UTC
Created attachment 69562 [details]
blade 100 kernel panic image 1

My Sun Blade 100 hangs when I boot any recent kernel.  I've tried many Debian and Gentoo installation CDs, but every CD made since 2008 or so prints "console [tty0] enabled, bootconsole disabled" and hangs.  I have to use Gentoo's 2007.0 install CD in order to complete booting and not disable the bootconsole.

I've built Linux 3.0.3 and tried it with the same results.  When I boot with the "keep_bootcon" kernel parameter I see that Linux is kernel panicking in pci_scan_one_pbm.  I'm attaching three photos of the full kernel panic.
Comment 1 Jim Faulkner 2011-08-21 19:29:41 UTC
Created attachment 69572 [details]
blade 100 kernel panic image 2
Comment 2 Jim Faulkner 2011-08-21 19:30:21 UTC
Created attachment 69582 [details]
blade 100 kernel panic image 3
Comment 3 Jim Faulkner 2011-08-21 23:08:41 UTC
Created attachment 69592 [details]
My .config
Comment 4 Andrew Morton 2011-08-22 22:04:17 UTC
I'm not sure that any sparc maintainers follow bugzilla.  You'll probably have better luck using the sparclinux@vger.kernel.org mailing list.
Comment 5 Jim Faulkner 2011-08-24 19:40:23 UTC
Created attachment 70002 [details]
kernel panic w/ boot_delay image 1
Comment 6 Jim Faulkner 2011-08-24 19:40:59 UTC
Created attachment 70012 [details]
kernel panic w/ boot_delay image 2
Comment 7 Jim Faulkner 2011-08-24 19:42:39 UTC
Created attachment 70042 [details]
kernel panic w/ boot_delay image 3
Comment 8 Jim Faulkner 2011-08-24 19:43:27 UTC
Created attachment 70062 [details]
kernel panic w/ boot_delay image 4
Comment 9 Jim Faulkner 2011-08-24 19:44:10 UTC
Created attachment 70082 [details]
kernel panic w/ boot_delay image 5
Comment 10 Jim Faulkner 2011-08-24 19:44:26 UTC
Created attachment 70092 [details]
kernel panic w/ boot_delay image 6
Comment 11 Jim Faulkner 2011-08-24 19:45:12 UTC
Created attachment 70102 [details]
kernel panic w/ boot_delay image 7
Comment 12 Jim Faulkner 2011-08-24 19:45:36 UTC
Created attachment 70112 [details]
kernel panic w/ boot_delay image 8
Comment 13 Jim Faulkner 2011-08-25 17:28:55 UTC
Created attachment 70212 [details]
updated .config
Comment 14 Jim Faulkner 2011-09-02 01:12:55 UTC
On Wed, Aug 31, 2011 at 03:59:56PM -0400, David Miller wrote:
>
> There is a non-trivial problem with the device tree reported by your
> firmware, several PCI device nodes are reported multiple times.
>
> If you'll look 'isa' appears twice, and so does 'pmu'.
>
> This is a very serious issue.
>
> And that's what is causing all of these boot failures.
>
> I have two theories, either the Tulip card makes the firmware corrupt
> the device tree like this.  Or, alternatively, there is some bug in
> the version of OBP installed on this machine.
>
> This really isn't a kernel bug.  And if it worked in the past, it worked
> entirely by accident.

Thanks so much for the info!  I finally have 3.0.3 booting normally.
First thing I did was update to the latest firmware, 4.17.1, which can
be found at http://ftp.sunet.se (the readme is here:
http://ftp.sunet.se/pub/security/vendor/sun/patches/all_unsigned/119235-01.README).
This was a bit of a chore because neither SILO nor the Solaris
loader wanted to boot the firmware update.  I had to set up a tftp
netboot server in order to patch the firmware.

But, that didn't fix the problem.

Next, I yanked the tulip card.  That didn't fix the problem either.

Finally, I reset nvram by typing this at openprom's "ok" prompt:
set-defaults

A reset-all and... it works!  I can boot 3.0.3!
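
For anyone who wants to check their own machine for the duplicated nodes David describes above, here is a rough sketch. It assumes you have saved a text dump of the firmware device tree with one full node path per line (for example, copied from the output of a device listing at the "ok" prompt). The function name and the sample paths are made up for illustration and are not taken from this machine.

```python
from collections import Counter


def duplicate_nodes(dump_lines):
    """Return device paths that appear more than once, in first-seen order."""
    # Keep only lines that look like device paths (they start with "/").
    paths = [line.strip() for line in dump_lines if line.strip().startswith("/")]
    counts = Counter(paths)
    duplicates = []
    for path in paths:
        if counts[path] > 1 and path not in duplicates:
            duplicates.append(path)
    return duplicates


# Hypothetical dump: the 'isa' and 'pmu' nodes are each listed twice.
sample = """\
/pci@1f,0
/pci@1f,0/isa@7
/pci@1f,0/pmu@3
/pci@1f,0/isa@7
/pci@1f,0/pmu@3
/pci@1f,0/ebus@c
""".splitlines()

print(duplicate_nodes(sample))
```

An empty list means no path was reported twice in the dump. A non-empty list only points at where to look; it does not say whether the firmware or the nvram settings are at fault.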