Created attachment 21732 [details] SUN4D-SMP mode patch SparcServer 1000E/SUN4D machines will cause a kernel-nullpointer when running in SMP-mode with certain hw configurations. According to Sun documentation Slot A on the systemboards should be filled with cpu modules first. This will fill physical cpu-slots 0,2,4,8 with cpus first. The current smp code will then fill the init_tasks-array using the physical cpu-mapping. The scheduler on the other hand uses the logical cpu-mapping. So in a two cpu system with two systemboards cpu-slots 0 and 2 will be occupied and init_tasks-array position 0 and 2 will contain a idle_task. But the scheduler will access init_tasks position 0 and 1 which will cause an error. Any sparcserver 1000(E) system with 2 systemboards and 2 cpus, with 1 cpu per board will cause this error.
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 3 Jun 2009 16:45:33 GMT bugzilla-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=13444 > > Summary: SparcServer 1000E SMP can cause kernel-nullpointer > with some hw configurations > Product: Platform Specific/Hardware > Version: 2.5 > Kernel Version: 2.4.37 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: SPARC32 > AssignedTo: zaitcev@yahoo.com > ReportedBy: oftedal@gmail.com > Regression: No > > > Created an attachment (id=21732) > --> (http://bugzilla.kernel.org/attachment.cgi?id=21732) > SUN4D-SMP mode patch > > SparcServer 1000E/SUN4D machines will cause a kernel-nullpointer when running > in SMP-mode with certain hw configurations. > According to Sun documentation Slot A on the systemboards should be filled > with > cpu modules first. This will fill physical cpu-slots 0,2,4,8 with cpus first. > The current smp code will then fill the init_tasks-array using the physical > cpu-mapping. > The scheduler on the other hand uses the logical cpu-mapping. > So in a two cpu system with two systemboards cpu-slots 0 and 2 will be > occupied > and init_tasks-array position 0 and 2 will contain a idle_task. But the > scheduler will access init_tasks position 0 and 1 which will cause an error. > Any sparcserver 1000(E) system with 2 systemboards and 2 cpus, with 1 cpu per > board will cause this error. > Thanks, but handling patches via bugzilla is far from preferred. Please send the patch via email, as a reply-to-all to this email. The patch should include a full description and a Signed-off-by:, as per Documentation/SubmittingPatches.
On Wed, 3 Jun 2009, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Wed, 3 Jun 2009 16:45:33 GMT > bugzilla-daemon@bugzilla.kernel.org wrote: > >> http://bugzilla.kernel.org/show_bug.cgi?id=13444 >> >> Summary: SparcServer 1000E SMP can cause kernel-nullpointer >> with some hw configurations >> Product: Platform Specific/Hardware >> Version: 2.5 >> Kernel Version: 2.4.37 >> Platform: All >> OS/Version: Linux >> Tree: Mainline >> Status: NEW >> Severity: normal >> Priority: P1 >> Component: SPARC32 >> AssignedTo: zaitcev@yahoo.com >> ReportedBy: oftedal@gmail.com >> Regression: No >> >> >> Created an attachment (id=21732) >> --> (http://bugzilla.kernel.org/attachment.cgi?id=21732) >> SUN4D-SMP mode patch >> >> SparcServer 1000E/SUN4D machines will cause a kernel-nullpointer when >> running >> in SMP-mode with certain hw configurations. >> According to Sun documentation Slot A on the systemboards should be filled >> with >> cpu modules first. This will fill physical cpu-slots 0,2,4,8 with cpus >> first. >> The current smp code will then fill the init_tasks-array using the physical >> cpu-mapping. >> The scheduler on the other hand uses the logical cpu-mapping. >> So in a two cpu system with two systemboards cpu-slots 0 and 2 will be >> occupied >> and init_tasks-array position 0 and 2 will contain a idle_task. But the >> scheduler will access init_tasks position 0 and 1 which will cause an error. >> Any sparcserver 1000(E) system with 2 systemboards and 2 cpus, with 1 cpu >> per >> board will cause this error. >> > > Thanks, but handling patches via bugzilla is far from preferred. > > Please send the patch via email, as a reply-to-all to this email. The > patch should include a full description and a Signed-off-by:, as per > Documentation/SubmittingPatches. > > As mentioned in the bug-report(#13444) the sun4d-SMP code on 2.4-series kernels(2.4.37) uses the physical cpu-mapping when inserting tasks into the init_tasks array. And the scheduler uses the logical cpu-mapping which will cause a kernel nullpointer when there are gaps in the physical cpu-mapping. The attached patch uses the cpucount variable, which can be used to find the next logical slot in the init_tasks array. As it is only incremented when a cpu is successfully started. (I hope this was more compliant with kernel-bugfixing standards. I am new to this) Signed-off-by: Kjetil Oftedal <oftedal@gmail.com> --- diff --git a/arch/sparc/kernel/sun4d_smp.c b/arch/sparc/kernel/sun4d_smp.c index 9bb7f78..5b9b34e 100644 --- a/arch/sparc/kernel/sun4d_smp.c +++ b/arch/sparc/kernel/sun4d_smp.c @@ -221,7 +221,9 @@ void __init smp4d_boot_cpus(void) cpucount++; p = init_task.prev_task; - init_tasks[i] = p; + + /* The scheduler uses the logical cpu mapping when accessing this array */ + init_tasks[cpucount] = p; p->processor = i; p->cpus_runnable = 1 << i; /* we schedule the first task manually */
From: oftedal <oftedal@gmail.com> Date: Thu, 4 Jun 2009 01:14:33 +0200 (CEST) > As mentioned in the bug-report(#13444) the sun4d-SMP code on > 2.4-series kernels(2.4.37) uses the physical cpu-mapping when > inserting tasks into the init_tasks array. > And the scheduler uses the logical cpu-mapping which will cause a > kernel nullpointer when there are gaps in the physical cpu-mapping. > > The attached patch uses the cpucount variable, which can be used to > find the next logical slot in the init_tasks array. As it is only > incremented > when a cpu is successfully started. > > (I hope this was more compliant with kernel-bugfixing standards. I am > new to this) > > Signed-off-by: Kjetil Oftedal <oftedal@gmail.com> I don't think we're handling 2.4.x patches at this point. :-) If the bug exists in 2.6.x as well I'd be happy to review such a patch and apply it for you.
On Wed, 3 Jun 2009, David Miller wrote: > From: oftedal <oftedal@gmail.com> > Date: Thu, 4 Jun 2009 01:14:33 +0200 (CEST) > >> As mentioned in the bug-report(#13444) the sun4d-SMP code on >> 2.4-series kernels(2.4.37) uses the physical cpu-mapping when >> inserting tasks into the init_tasks array. >> And the scheduler uses the logical cpu-mapping which will cause a >> kernel nullpointer when there are gaps in the physical cpu-mapping. >> >> The attached patch uses the cpucount variable, which can be used to >> find the next logical slot in the init_tasks array. As it is only >> incremented >> when a cpu is successfully started. >> >> (I hope this was more compliant with kernel-bugfixing standards. I am >> new to this) >> >> Signed-off-by: Kjetil Oftedal <oftedal@gmail.com> > > I don't think we're handling 2.4.x patches at this point. :-) > > If the bug exists in 2.6.x as well I'd be happy to review such > a patch and apply it for you. > Yeah. 2.4.x is a "bit" old. But it was easier to fix. As the 2.6.x-series is giving me all kinds of trouble. The SMP-code was redesigned for 2.6, and I havn't been able to confirm if it works or not. I could have a go at it if someone could give me a small pointer in how to get the piggyback_32 utility working with 2.6s System.map. (looks like "A end" is missing from it)
> > Yeah. 2.4.x is a "bit" old. But it was easier to fix. As the 2.6.x-series > is giving me all kinds of trouble. The SMP-code was redesigned for 2.6, > and I havn't been able to confirm if it works or not. > I could have a go at it if someone could give me a small pointer in how to > get the piggyback_32 utility working with 2.6s System.map. > (looks like "A end" is missing from it) Just a quick guess.. Try to add: end = .; to arch/sparc/kernel/vmlinux.lds.S Do this above the definition of _end. Or better change piggyback_32 to support _end. Sam