Bug 80251
Summary: Crash early in boot that is likely scheduler related.
Product: Process Management
Component: Scheduler
Reporter: Bruno Wolff III (bruno)
Assignee: Ingo Molnar (mingo)
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P1
Hardware: i386
OS: Linux
Kernel Version: 3.16
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
Config file used to build from first bad commit
lshw output from the machine exhibiting the problem
lshw from another i686 machine that doesn't exhibit the problem
/proc/cpuinfo
/proc/sys/kernel/sched_domain/cpu*/domain*/*
/proc/schedstat output
Boot picture
dmesg output with Peter's debug patch and earlyprintk=keep sched_debug
dmesg output with Peter's updated debug patch and earlyprintk=keep sched_debug
cpuid output
cpuid -r output
dmesg output with latest test patches
The previous dmesg output had these differences from 3.16-rc6
Description
Bruno Wolff III
2014-07-15 04:38:34 UTC
caffcdd8d27ba78730d5540396ce72ad022aff2c is the first bad commit
commit caffcdd8d27ba78730d5540396ce72ad022aff2c
Author: Dietmar Eggemann <Dietmar.Eggemann@arm.com>
Date: Wed Apr 30 14:39:38 2014 +0100

    sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()

    There is no need to zero struct sched_group member cpumask and struct
    sched_group_power member power since both structures are already allocated
    as zeroed memory in __sdt_alloc().

    This patch has been tested with
    BUG_ON(!cpumask_empty(sched_group_cpus(sg))); and BUG_ON(sg->sgp->power);
    in build_sched_groups() on ARM TC2 and INTEL i5 M520 platform including
    CPU hotplug scenarios.

    Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/1398865178-12577-1-git-send-email-dietmar.eggemann@arm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

:040000 040000 8d78e3f468e8bd4a51ba53750ca53d16583e4b53 d42eabda6d8a22ec6ee830a739aa7ac408883184 M kernel

git bisect start
# good: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
git bisect good 1860e379875dfe7271c649058aeddffe5afd9d0d
# good: [fad01e866afdbe01a1f3ec06a39c3a8b9e197014] Linux 3.15-rc8
git bisect good fad01e866afdbe01a1f3ec06a39c3a8b9e197014
# bad: [7171511eaec5bf23fb06078f59784a3a0626b38f] Linux 3.16-rc1
git bisect bad 7171511eaec5bf23fb06078f59784a3a0626b38f
# bad: [aaeb2554337217dfa4eac2fcc90da7be540b9a73] Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media into next
git bisect bad aaeb2554337217dfa4eac2fcc90da7be540b9a73
# good: [5142c33ed86acbcef5c63a63d2b7384b9210d39f] Merge tag 'staging-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging into next
git bisect good 5142c33ed86acbcef5c63a63d2b7384b9210d39f
# bad: [b05d59dfceaea72565b1648af929b037b0f96d7f] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next
git bisect bad b05d59dfceaea72565b1648af929b037b0f96d7f
# good: [e13cccfd86481bd4c0499577f44c570d334da79b] Merge tag 'spi-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi into next
git bisect good e13cccfd86481bd4c0499577f44c570d334da79b
# good: [3d521f9151dacab566904d1f57dcb3e7080cdd8f] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
git bisect good 3d521f9151dacab566904d1f57dcb3e7080cdd8f
# good: [f82393426afb7c82f7618b3b4e440d8dd2b40c08] MIPS: KVM: Add master disable count interface
git bisect good f82393426afb7c82f7618b3b4e440d8dd2b40c08
# bad: [4aef77b2fe373cdba461925589b9d1d4468ee016] Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
git bisect bad 4aef77b2fe373cdba461925589b9d1d4468ee016
# good: [3944a9274ef6cda0cc282daf0739832f661670f7] sched: Fix exec_start/task_hot on migrated tasks
git bisect good 3944a9274ef6cda0cc282daf0739832f661670f7
# bad: [3d1a3bda65d2f48fead6f0727f2f392c15206852] Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
git bisect bad 3d1a3bda65d2f48fead6f0727f2f392c15206852
# bad: [a803f0261bb2bb57aab5542af3174db43b2a3887] sched: Initialize rq->age_stamp on processor start
git bisect bad a803f0261bb2bb57aab5542af3174db43b2a3887
# good: [c515db8cd311ef77b2dc7cbd6b695022655bb0f3] sched/numa: Fix initialization of sched_domain_topology for NUMA
git bisect good c515db8cd311ef77b2dc7cbd6b695022655bb0f3
# bad: [52a08ef1f13a11289c9e18cd4cfb4e51c024058b] sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
git bisect bad 52a08ef1f13a11289c9e18cd4cfb4e51c024058b
# bad: [a9467fa3cd2d5bf39e7cb7d0706d29d7ef4df212] sched: Use clamp() and clamp_val() to make sys_nice() more readable
git bisect bad a9467fa3cd2d5bf39e7cb7d0706d29d7ef4df212
# bad: [caffcdd8d27ba78730d5540396ce72ad022aff2c] sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
git bisect bad caffcdd8d27ba78730d5540396ce72ad022aff2c
# first bad commit: [caffcdd8d27ba78730d5540396ce72ad022aff2c] sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
Created attachment 143211 [details]
Config file used to build from first bad commit
Created attachment 143221 [details]
lshw output from the machine exhibiting the problem
Created attachment 143231 [details]
lshw from another i686 machine that doesn't exhibit the problem
A simple revert (against commit 1795cd9b3a91d4b5473c97f491d63892442212ab) didn't build:
kernel/sched/core.c: In function ‘build_sched_groups’:
kernel/sched/core.c:5851:5: error: ‘struct sched_group’ has no member named ‘sgp’
sg->sgp->power = 0;
^
scripts/Makefile.build:257: recipe for target 'kernel/sched/core.o' failed
I have been using gcc-4.9.0 to do kernel builds.
Adding back just the cpumask_clear(sched_group_cpus(sg)) (to rc5) gets things working again.
git diff v3.16-rc5
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3bdf01b..7c3674d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5847,6 +5847,7 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 			continue;

 		group = get_group(i, sdd, &sg);
+		cpumask_clear(sched_group_cpus(sg));
 		cpumask_setall(sched_group_mask(sg));
Created attachment 143261 [details]
/proc/cpuinfo
Created attachment 143311 [details]
/proc/sys/kernel/sched_domain/cpu*/domain*/*
Created attachment 143321 [details]
/proc/schedstat output
This is from 3.16-rc5 with the following diff:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3bdf01b..21ba65c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5847,6 +5847,10 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 			continue;

 		group = get_group(i, sdd, &sg);
+		cpumask_clear(sched_group_cpus(sg));
+		sg->sgc->capacity = 0;
+		BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
+		BUG_ON(sg->sgc->capacity);
 		cpumask_setall(sched_group_mask(sg));

 		for_each_cpu(j, span) {
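For anyone following along without a kernel tree, here is a minimal userspace sketch of the invariant the BUG_ON() checks in the diff above are probing. It is only an illustration under stated assumptions: every toy_* name is hypothetical, the struct is not the kernel's struct sched_group, and calloc() merely stands in for the zeroed allocation that the commit message attributes to __sdt_alloc(). It does not claim to reproduce the root cause of this crash. It only shows that "allocated as zero" and "still empty when the build code reaches it" are different guarantees.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* All toy_* names are hypothetical; they only mimic the kernel helpers. */
#define TOY_NR_CPUS 8
#define TOY_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define TOY_MASK_WORDS ((TOY_NR_CPUS + TOY_BITS_PER_WORD - 1) / TOY_BITS_PER_WORD)

/* Stand-in for a scheduler group: just a CPU bitmask. */
struct toy_group {
	unsigned long cpus[TOY_MASK_WORDS];
};

/* Zeroed allocation, playing the role the commit message gives __sdt_alloc(). */
struct toy_group *toy_group_alloc(void)
{
	return calloc(1, sizeof(struct toy_group));
}

/* Analogue of cpumask_empty(): true when no CPU bit is set. */
bool toy_mask_empty(const struct toy_group *g)
{
	for (size_t i = 0; i < TOY_MASK_WORDS; i++)
		if (g->cpus[i])
			return false;
	return true;
}

/* Analogue of cpumask_clear(). */
void toy_mask_clear(struct toy_group *g)
{
	for (size_t i = 0; i < TOY_MASK_WORDS; i++)
		g->cpus[i] = 0;
}

/* Analogue of cpumask_set_cpu(). */
void toy_mask_set(struct toy_group *g, unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	g->cpus[cpu / TOY_BITS_PER_WORD] |= 1UL << (cpu % TOY_BITS_PER_WORD);
}

/* Analogue of cpumask_test_cpu(). */
bool toy_mask_test(const struct toy_group *g, unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	return (g->cpus[cpu / TOY_BITS_PER_WORD] >> (cpu % TOY_BITS_PER_WORD)) & 1UL;
}

/*
 * Add CPUs [first, first + count) to the group.
 * With defensive_clear == false the function relies on the mask still
 * being empty on entry, which is the assumption the debug BUG_ON() tests.
 * Returns whether the mask was empty just before the CPUs were added.
 */
bool toy_build_group(struct toy_group *g, unsigned int first,
		     unsigned int count, bool defensive_clear)
{
	bool was_empty;

	if (defensive_clear)
		toy_mask_clear(g);
	was_empty = toy_mask_empty(g);

	for (unsigned int cpu = first; cpu < first + count; cpu++)
		toy_mask_set(g, cpu);

	return was_empty;
}
```

On a freshly allocated group, toy_build_group() reports an empty mask. If the same group is built a second time without the defensive clear, it reports a non-empty mask, which is the condition BUG_ON(!cpumask_empty(...)) would trip on. With the clear in place the result no longer depends on what happened to the group between allocation and use.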
Created attachment 143331 [details]
Boot picture
The picture DSCN1530.JPG shows the BUG_ON output triggered at line 5850, which is: BUG_ON(!cpumask_empty(sched_group_cpus(sg)));
Created attachment 143361 [details]
dmesg output with Peter's debug patch and earlyprintk=keep sched_debug
Created attachment 143381 [details]
dmesg output with Peter's updated debug patch and earlyprintk=keep sched_debug
Created attachment 143871 [details]
cpuid output
Created attachment 143881 [details]
cpuid -r output
Created attachment 143961 [details]
dmesg output with latest test patches
Created attachment 143971 [details]
The previous dmesg output had these differences from 3.16-rc6
This last test was a success. Peter is going to formally write up a patch and send it to Linus. I'll test the formal patch when it shows up.
This is in 3.16-rc7 as commit 2a2261553dd1472ca574acadbd93e12f44c4e6d5.