Bug 61501

Summary: [BISECTED] Kernels greater than 3.9.11 (all 3.10 and 3.11 kernels I have tried) will not operate in SMP with more than one processor.
Product: Platform Specific/Hardware
Reporter: dustin (dustin.glidden)
Component: SPARC64
Assignee: platform_sparc64
Status: RESOLVED CODE_FIX
Severity: high
CC: adrien.dessemond, alan, joel.bertrand
Priority: P1
Hardware: Sparc64
OS: Linux
Kernel Version: >3.9.11
Tree: Mainline
Regression: No

Description dustin 2013-09-16 17:37:43 UTC
Testing has been done on a Sun Fire T2000 (Niagara). Booting with maxcpus=1 works on newer kernel versions, but any more CPUs than that results in a hang during the boot process. The boot hangs at "console [tty0] enabled, bootconsole disabled", whereas a working kernel continues to load past that point. Booting with keep_bootcon results in the kernel hanging at "pci 0001:05:02.0: Activating ISA DMA hang workarounds".
Comment 1 dustin 2013-11-06 20:55:11 UTC
This commit caused the issue: e0972916e8fe943f342b0dd1c9d43dbf5bc261c2. Unfortunately, it is the merge of the perf-core-for-linus branch, and reverting the files breaks everything.
Comment 2 BERTRAND Joël 2014-02-09 20:18:17 UTC
Same observation here with several T1000 machines and kernel 3.12.9. Have you found a workaround?
Comment 3 dustin 2014-02-17 18:03:22 UTC
I have not, though not for a lack of trying.  I am currently running on 3.4.78.


(In reply to BERTRAND Joël from comment #2)
> Same observation here with several T1000 machines and kernel 3.12.9. Have you
> found a workaround?
Comment 4 Adrien Dessemond 2014-03-23 22:58:13 UTC
Same problem here with a T1000 machine al
Comment 5 Adrien Dessemond 2014-03-23 23:47:32 UTC
(Sorry for the previous bogus post, my mistake)

I have a very similar issue here with a T1000 machine. I tried Linux 3.14-rc7: in my case the kernel boots without any trouble or special error message until it tries to initialize the SAS controller. The last messages are:

[   36.843027] Copyright (c) 1999-2008 LSI Corporation
[   36.843104] Fusion MPT SAS Host driver 3.04.20
[   81.965701] Fusion MPT SAS Host driver 3.04.20
[   81.966264] mptbase: ioc0: Initiating bringup
[  115.865372] ioc0: LSISAS1064 A3: Capabilities={Initiator}

Then, a few seconds later, the kernel complains about CPU stalls:

[  141.865227] INFO: rcu_sched detected stalls on CPUs/tasks: { 12 14 16 17 18 20 21 22 23 24 25 26 27 28 29 30 31} (detected by 0, t=60431 jiffies, g=18446744073709551320, c=18446744073709551319, q=1766)
[  141.866285] * CPU[  0]: TSTATE[0000000080001603] TPC[000000000042c174] TNPC[000000000042c178] TASK[swapper/0:0]
[  141.866596]              TPC[arch_cpu_idle+0x74/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[start_kernel+0x3b4/0x3c4]
[  141.866892]   CPU[  1]: TSTATE[0000000080001602] TPC[000000000042c170] TNPC[000000000042c174] TASK[swapper/1:0]
[  141.867033]              TPC[arch_cpu_idle+0x70/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[0x951378]
[  141.867148]   CPU[  2]: TSTATE[0000000080001602] TPC[000000000042c170] TNPC[000000000042c174] TASK[swapper/2:0]
[  141.867286]              TPC[arch_cpu_idle+0x70/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[0x951378]
[  141.867399]   CPU[  3]: TSTATE[0000000080001602] TPC[000000000042c170] TNPC[000000000042c174] TASK[swapper/3:0]

(..)

[  186.891368] scsi0 : ioc0: LSISAS1064 A3, FwRev=010a0000h, Ports=1, MaxQ=511, IRQ=25
[  186.949673] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x*******************
[  186.952366] scsi 0:0:0:0: Direct-Access     SEAGATE  ST973401LSUN72G  0556 PQ: 0 ANSI: 3
[  186.956645] sd 0:0:0:0: [sda] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
[  186.958001] sd 0:0:0:0: [sda] Write Protect is off
[  186.959832] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x*******************
[  186.960482] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
[  186.962365] scsi 0:0:1:0: Direct-Access     SEAGATE  ST973401LSUN72G  0556 PQ: 0 ANSI: 3
[  186.966631] sd 0:0:1:0: [sdb] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
[  186.968692] sd 0:0:1:0: [sdb] Write Protect is off
[  186.969007] Fusion MPT misc device (ioctl) driver 3.04.20
[  186.969298] mptctl: Registered with Fusion MPT base driver
[  186.969361] mptctl: /dev/mptctl @ (major,minor=10,220)
[  186.970212] sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[  186.970525] mousedev: PS/2 mouse device common for all mice
[  186.974788] rtc-sun4v rtc-sun4v: rtc core: registered sun4v as rtc0
[  186.976103]  sda: sda1 sda2 sda3 sda4
[  186.979675] TCP: cubic registered
[  186.979723] NET: Registered protocol family 17
[  186.979854] Key type dns_resolver registered
[  186.980828] registered taskstats version 1
[  186.983061] sd 0:0:0:0: [sda] Attached SCSI disk
[  186.985611] rtc-sun4v rtc-sun4v: setting system clock to 2014-03-23 23:19:15 UTC (1395616755)
[  187.010470]  sdb: sdb1 sdb2 sdb3
[  187.016016] sd 0:0:1:0: [sdb] Attached SCSI disk
[  187.054958] EXT4-fs (sda4): mounted filesystem with ordered data mode. Opts: (null)
[  187.055058] VFS: Mounted root (ext4 filesystem) readonly on device 8:4.
INIT: version 2.88 booting


[  247.865477] INFO: rcu_sched detected stalls on CPUs/tasks: { 10} (detected by 0, t=60883 jiffies, g=18446744073709551325, c=18446744073709551324, q=424)
[  247.865846] * CPU[  0]: TSTATE[0000000080001603] TPC[000000000042c174] TNPC[000000000042c178] TASK[swapper/0:0]
[  247.866181]              TPC[arch_cpu_idle+0x74/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[start_kernel+0x3b4/0x3c4]
(...)
[  141.878969]   CPU[ 31]: TSTATE[0000000080001602] TPC[000000000042c170] TNPC[000000000042c174] TASK[swapper/31:0]
[  141.879105]              TPC[arch_cpu_idle+0x70/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[0x951378]
[  186.891368] scsi0 : ioc0: LSISAS1064 A3, FwRev=010a0000h, Ports=1, MaxQ=511, IRQ=25
[  186.949673] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x********************
[  186.952366] scsi 0:0:0:0: Direct-Access     SEAGATE  ST973401LSUN72G  0556 PQ: 0 ANSI: 3
[  186.956645] sd 0:0:0:0: [sda] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)

(....)

[  187.016016] sd 0:0:1:0: [sdb] Attached SCSI disk
[  187.054958] EXT4-fs (sda4): mounted filesystem with ordered data mode. Opts: (null)
[  187.055058] VFS: Mounted root (ext4 filesystem) readonly on device 8:4.
INIT: version 2.88 booting


[  247.865477] INFO: rcu_sched detected stalls on CPUs/tasks: { 10} (detected by 0, t=60883 jiffies, g=18446744073709551325, c=18446744073709551324, q=424)
[  247.865846] * CPU[  0]: TSTATE[0000000080001603] TPC[000000000042c174] TNPC[000000000042c178] TASK[swapper/0:0]
[  247.866181]              TPC[arch_cpu_idle+0x74/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[start_kernel+0x3b4/0x3c4]
[  247.866450]   CPU[  1]: TSTATE[0000000080001602] TPC[000000000042c170] TNPC[000000000042c174] TASK[swapper/1:0]
[  247.866758]              TPC[arch_cpu_idle+0x70/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[0x951378]
[  247.867043]   CPU[  2]: TSTATE[0000000080001602] TPC[000000000042c170] TNPC[000000000042c174] TASK[swapper/2:0]
[  247.867328]              TPC[arch_cpu_idle+0x70/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[0x951378]

(...)

[  247.878323]              TPC[arch_cpu_idle+0x70/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[0x951378]
[  247.878437]   CPU[ 31]: TSTATE[0000000080001602] TPC[000000000042c170] TNPC[000000000042c174] TASK[swapper/31:0]
[  247.878574]              TPC[arch_cpu_idle+0x70/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[0x951378]
[  248.119922] random: nonblocking pool is initialized
[  308.865618] INFO: rcu_sched detected stalls on CPUs/tasks: { 1 2 3 4 5 6 8 9 10 11 12 14 15 20 22 23 24 25 26 27 28 29 30 31} (detected by 0, t=60986 jiffies, g=18446744073709551326, c=18446744073709551325, q=423)

(...)

[  369.947353]              TPC[arch_cpu_idle+0x70/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[0x951378]

   OpenRC 0.10 is starting up Funtoo Linux (sparc64)

 * Mounting /proc ...
 [ ok ]
 * Mounting /run ...
 * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
[  430.865908] INFO: rcu_sched detected stalls on CPUs/tasks: { 1 4 8 15 16 24 25 26 27 28 29 30 31} (detected by 0, t=60651 jiffies, g=18446744073709551332, c=18446744073709551331, q=355)
[  430.866927] * CPU[  0]: TSTATE[0000000080001603] TPC[000000000042c174] TNPC[000000000042c178] TASK[swapper/0:0]
[  430.867189]              TPC[arch_cpu_idle+0x74/0xa0] O7[arch_cpu_idle+0x5c/0xa0] I7[cpu_startup_entry+0x114/0x1a0] RPC[start_kernel+0x3b4/0x3c4]

(...)

The boot gets this far but then seems to stall forever.

With maxcpus=1, the kernel boots without any complaint, as explained in a previous comment.
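
For anyone comparing logs like the one above, the stall reports can be summarised with a short script. This is only a sketch: the names STALL_RE, parse_stalls and SAMPLE are made up for illustration, and the regular expression assumes the exact "INFO: rcu_sched detected stalls on CPUs/tasks" line format shown in this comment, which other kernel versions may print differently.

```python
import re

# Assumed format, copied from the log in this comment:
# "[  141.865227] INFO: rcu_sched detected stalls on CPUs/tasks: { 12 14} (detected by 0, t=60431 jiffies, ...)"
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: rcu_sched detected stalls on CPUs/tasks: "
    r"\{(?P<cpus>[^}]*)\}\s+\(detected by (?P<by>\d+), t=(?P<jiffies>\d+) jiffies"
)

# Two stall headers taken verbatim from the log above.
SAMPLE = """\
[  141.865227] INFO: rcu_sched detected stalls on CPUs/tasks: { 12 14 16 17 18 20 21 22 23 24 25 26 27 28 29 30 31} (detected by 0, t=60431 jiffies, g=18446744073709551320, c=18446744073709551319, q=1766)
[  247.865477] INFO: rcu_sched detected stalls on CPUs/tasks: { 10} (detected by 0, t=60883 jiffies, g=18446744073709551325, c=18446744073709551324, q=424)
"""


def parse_stalls(log_text):
    """Return one dict per rcu_sched stall header found in log_text."""
    reports = []
    for match in STALL_RE.finditer(log_text):
        # Keep only purely numeric entries (CPU numbers); anything else
        # inside the braces is skipped rather than guessed at.
        cpus = [int(tok) for tok in match.group("cpus").split() if tok.isdigit()]
        reports.append({
            "timestamp": float(match.group("ts")),
            "cpus": cpus,
            "detected_by": int(match.group("by")),
            "jiffies": int(match.group("jiffies")),
        })
    return reports


if __name__ == "__main__":
    for report in parse_stalls(SAMPLE):
        print(report["timestamp"], len(report["cpus"]), "stalled CPUs:", report["cpus"])
```

Running it over a full console capture (for example one saved from the serial console) lists which CPUs are reported as stalled at each timestamp, which makes it easier to see whether the same CPUs are affected on every report.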
Comment 6 Adrien Dessemond 2014-03-25 02:40:10 UTC
I reported this on my side as bug #72841, although we might have the exact same issue. I also found a discussion here (started 2 days ago):

http://www.spinics.net/lists/sparclinux/msg11805.html

I hope it will help you.
Comment 7 Adrien Dessemond 2014-03-25 04:04:54 UTC
May I suggest that you try Linux 3.14-rc8? It solved the issue on my side. FYI:

https://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv3.x%2Ftesting%2Fpatch-3.14-rc8.xz;z=2363
Comment 8 dustin 2014-03-25 16:55:56 UTC
I can confirm that 3.14-rc8 fixed the issue, thanks!