Bug 11209
Summary: | 2.6.27-rc1 process time accounting | ||
---|---|---|---|
Product: | Process Management | Reporter: | Rafael J. Wysocki (rjw) |
Component: | Scheduler | Assignee: | Ingo Molnar (mingo) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | a.p.zijlstra, alistair, andi-bz, avi, grant_lkml, jonsmirl, ole, parag.lkml, xhejtman |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.27-rc1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 11167 | ||
Attachments: | full .config of problem system |
Description
Rafael J. Wysocki
2008-07-31 14:36:08 UTC
Handled-By : Peter Zijlstra <a.p.zijlstra@chello.nl> http://article.gmane.org/gmane.linux.kernel/717899 is most likely the same thing. Adding rant_lkml <at> dodo.com.au . On Thu, 2008-08-07 at 16:56 -0700, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=11209 > > > parag.warudkar@gmail.com changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |grant_lkml@dodo.com.au > > > > > ------- Comment #2 from parag.warudkar@gmail.com 2008-08-07 16:56 ------- > http://article.gmane.org/gmane.linux.kernel/717899 is most likely the same > thing. Adding rant_lkml <at> dodo.com.au . most likely related to sched_clock breakage, please try the revert from bug 11269 On Fri, 2008-08-08 at 03:39 -0700, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=11209 > > > > > > ------- Comment #3 from a.p.zijlstra@chello.nl 2008-08-08 03:39 ------- > On Thu, 2008-08-07 at 16:56 -0700, bugme-daemon@bugzilla.kernel.org > wrote: > > http://bugzilla.kernel.org/show_bug.cgi?id=11209 > > > > > > parag.warudkar@gmail.com changed: > > > > What |Removed |Added > > > ---------------------------------------------------------------------------- > > CC| |grant_lkml@dodo.com.au > > > > > > > > > > ------- Comment #2 from parag.warudkar@gmail.com 2008-08-07 16:56 ------- > > http://article.gmane.org/gmane.linux.kernel/717899 is most likely the same > > thing. Adding rant_lkml <at> dodo.com.au . > > > most likely related to sched_clock breakage, please try the revert from > bug 11269 Bah, wrong number - let me get a patch Does this fix? --- diff --git a/include/linux/sched.h b/include/linux/sched.h index 5270d44..ea436bc 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1572,28 +1572,13 @@ static inline void sched_clock_idle_sleep_event(void) static inline void sched_clock_idle_wakeup_event(u64 delta_ns) { } - -#ifdef CONFIG_NO_HZ -static inline void sched_clock_tick_stop(int cpu) -{ -} - -static inline void sched_clock_tick_start(int cpu) -{ -} -#endif - -#else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */ +#else extern void sched_clock_init(void); extern u64 sched_clock_cpu(int cpu); extern void sched_clock_tick(void); extern void sched_clock_idle_sleep_event(void); extern void sched_clock_idle_wakeup_event(u64 delta_ns); -#ifdef CONFIG_NO_HZ -extern void sched_clock_tick_stop(int cpu); -extern void sched_clock_tick_start(int cpu); #endif -#endif /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */ /* * For kernel-internal use: high-speed (but slightly incorrect) per-cpu diff --git a/kernel/Kconfig.hz b/kernel/Kconfig.hz index 382dd5a..94fabd5 100644 --- a/kernel/Kconfig.hz +++ b/kernel/Kconfig.hz @@ -55,4 +55,4 @@ config HZ default 1000 if HZ_1000 config SCHED_HRTICK - def_bool HIGH_RES_TIMERS && USE_GENERIC_SMP_HELPERS + def_bool HIGH_RES_TIMERS && (!SMP || USE_GENERIC_SMP_HELPERS) diff --git a/kernel/sched.c b/kernel/sched.c index 04160d2..f014194 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -834,7 +834,7 @@ static inline u64 global_rt_period(void) static inline u64 global_rt_runtime(void) { - if (sysctl_sched_rt_period < 0) + if (sysctl_sched_rt_runtime < 0) return RUNTIME_INF; return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC; diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c index 22ed55d..857a129 100644 --- a/kernel/sched_clock.c +++ b/kernel/sched_clock.c @@ -32,14 +32,18 @@ #include <linux/ktime.h> #include <linux/module.h> +/* + * Scheduler clock - returns current time in nanosec units. + * This is default implementation. + * Architectures and sub-architectures can override this. + */ +unsigned long long __attribute__((weak)) sched_clock(void) +{ + return (unsigned long long)jiffies * (NSEC_PER_SEC / HZ); +} #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK -#define MULTI_SHIFT 15 -/* Max is double, Min is 1/2 */ -#define MAX_MULTI (2LL << MULTI_SHIFT) -#define MIN_MULTI (1LL << (MULTI_SHIFT-1)) - struct sched_clock_data { /* * Raw spinlock - this is a special case: this might be called @@ -49,14 +53,9 @@ struct sched_clock_data { raw_spinlock_t lock; unsigned long tick_jiffies; - u64 prev_raw; u64 tick_raw; u64 tick_gtod; u64 clock; - s64 multi; -#ifdef CONFIG_NO_HZ - int check_max; -#endif }; static DEFINE_PER_CPU_SHARED_ALIGNED(struct sched_clock_data, sched_clock_data); @@ -84,90 +83,39 @@ void sched_clock_init(void) scd->lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED; scd->tick_jiffies = now_jiffies; - scd->prev_raw = 0; scd->tick_raw = 0; scd->tick_gtod = ktime_now; scd->clock = ktime_now; - scd->multi = 1 << MULTI_SHIFT; -#ifdef CONFIG_NO_HZ - scd->check_max = 1; -#endif } sched_clock_running = 1; } -#ifdef CONFIG_NO_HZ -/* - * The dynamic ticks makes the delta jiffies inaccurate. This - * prevents us from checking the maximum time update. - * Disable the maximum check during stopped ticks. - */ -void sched_clock_tick_stop(int cpu) -{ - struct sched_clock_data *scd = cpu_sdc(cpu); - - scd->check_max = 0; -} - -void sched_clock_tick_start(int cpu) -{ - struct sched_clock_data *scd = cpu_sdc(cpu); - - scd->check_max = 1; -} - -static int check_max(struct sched_clock_data *scd) -{ - return scd->check_max; -} -#else -static int check_max(struct sched_clock_data *scd) -{ - return 1; -} -#endif /* CONFIG_NO_HZ */ - /* * update the percpu scd from the raw @now value * * - filter out backward motion * - use jiffies to generate a min,max window to clip the raw values */ -static void __update_sched_clock(struct sched_clock_data *scd, u64 now, u64 *time) +static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now) { unsigned long now_jiffies = jiffies; long delta_jiffies = now_jiffies - scd->tick_jiffies; u64 clock = scd->clock; u64 min_clock, max_clock; - s64 delta = now - scd->prev_raw; + s64 delta = now - scd->tick_raw; WARN_ON_ONCE(!irqs_disabled()); - - /* - * At schedule tick the clock can be just under the gtod. We don't - * want to push it too prematurely. - */ - min_clock = scd->tick_gtod + (delta_jiffies * TICK_NSEC); - if (min_clock > TICK_NSEC) - min_clock -= TICK_NSEC / 2; + min_clock = scd->tick_gtod + delta_jiffies * TICK_NSEC; if (unlikely(delta < 0)) { clock++; goto out; } - /* - * The clock must stay within a jiffie of the gtod. - * But since we may be at the start of a jiffy or the end of one - * we add another jiffy buffer. - */ - max_clock = scd->tick_gtod + (2 + delta_jiffies) * TICK_NSEC; - - delta *= scd->multi; - delta >>= MULTI_SHIFT; + max_clock = min_clock + TICK_NSEC; - if (unlikely(clock + delta > max_clock) && check_max(scd)) { + if (unlikely(clock + delta > max_clock)) { if (clock < max_clock) clock = max_clock; else @@ -180,12 +128,10 @@ static void __update_sched_clock(struct sched_clock_data *scd, u64 now, u64 *tim if (unlikely(clock < min_clock)) clock = min_clock; - if (time) - *time = clock; - else { - scd->prev_raw = now; - scd->clock = clock; - } + scd->tick_jiffies = now_jiffies; + scd->clock = clock; + + return clock; } static void lock_double_clock(struct sched_clock_data *data1, @@ -222,33 +168,24 @@ u64 sched_clock_cpu(int cpu) lock_double_clock(scd, my_scd); - now -= my_scd->tick_raw; - now += scd->tick_raw; - - now += my_scd->tick_gtod; - now -= scd->tick_gtod; + now += scd->tick_raw - my_scd->tick_raw; + now += my_scd->tick_gtod - scd->tick_gtod; __raw_spin_unlock(&my_scd->lock); - - __update_sched_clock(scd, now, &clock); - - __raw_spin_unlock(&scd->lock); - } else { __raw_spin_lock(&scd->lock); - __update_sched_clock(scd, now, NULL); - clock = scd->clock; - __raw_spin_unlock(&scd->lock); } + clock = __update_sched_clock(scd, now); + + __raw_spin_unlock(&scd->lock); + return clock; } void sched_clock_tick(void) { struct sched_clock_data *scd = this_scd(); - unsigned long now_jiffies = jiffies; - s64 mult, delta_gtod, delta_raw; u64 now, now_gtod; if (unlikely(!sched_clock_running)) @@ -260,29 +197,14 @@ void sched_clock_tick(void) now = sched_clock(); __raw_spin_lock(&scd->lock); - __update_sched_clock(scd, now, NULL); + __update_sched_clock(scd, now); /* * update tick_gtod after __update_sched_clock() because that will * already observe 1 new jiffy; adding a new tick_gtod to that would * increase the clock 2 jiffies. */ - delta_gtod = now_gtod - scd->tick_gtod; - delta_raw = now - scd->tick_raw; - - if ((long)delta_raw > 0) { - mult = delta_gtod << MULTI_SHIFT; - do_div(mult, delta_raw); - scd->multi = mult; - if (scd->multi > MAX_MULTI) - scd->multi = MAX_MULTI; - else if (scd->multi < MIN_MULTI) - scd->multi = MIN_MULTI; - } else - scd->multi = 1 << MULTI_SHIFT; - scd->tick_raw = now; scd->tick_gtod = now_gtod; - scd->tick_jiffies = now_jiffies; __raw_spin_unlock(&scd->lock); } @@ -310,9 +232,7 @@ void sched_clock_idle_wakeup_event(u64 delta_ns) * rq clock: */ __raw_spin_lock(&scd->lock); - scd->prev_raw = now; scd->clock += delta_ns; - scd->multi = 1 << MULTI_SHIFT; __raw_spin_unlock(&scd->lock); touch_softlockup_watchdog(); @@ -321,16 +241,6 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event); #endif -/* - * Scheduler clock - returns current time in nanosec units. - * This is default implementation. - * Architectures and sub-architectures can override this. - */ -unsigned long long __attribute__((weak)) sched_clock(void) -{ - return (unsigned long long)jiffies * (NSEC_PER_SEC / HZ); -} - unsigned long long cpu_clock(int cpu) { unsigned long long clock; diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index cf2cd6c..0fe94ea 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -899,7 +899,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p) * doesn't make sense. Rely on vruntime for fairness. */ if (rq->curr != p) - delta = max(10000LL, delta); + delta = max_t(s64, 10000LL, delta); hrtick_start(rq, delta); } diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 825b4c0..f5da526 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -289,7 +289,6 @@ void tick_nohz_stop_sched_tick(int inidle) ts->tick_stopped = 1; ts->idle_jiffies = last_jiffies; rcu_enter_nohz(); - sched_clock_tick_stop(cpu); } /* @@ -392,7 +391,6 @@ void tick_nohz_restart_sched_tick(void) select_nohz_load_balancer(0); now = ktime_get(); tick_do_update_jiffies64(now); - sched_clock_tick_start(cpu); cpu_clear(cpu, nohz_cpu_mask); /* patch does not apply :( this one seems to be fixed in -rc2. I didn't see wrong time for couple of days. >
> ------- Comment #6 from xhejtman@fi.muni.cz 2008-08-12 13:37 -------
> this one seems to be fixed in -rc2. I didn't see wrong time for couple of
> days.
>
Nope - this is from latest git.
parag@parag-desktop:~$ uname -a
Linux parag-desktop 2.6.27-rc2 #4 SMP Mon Aug 11 20:06:12 EDT 2008
i686 GNU/Linux
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7172 root 20 0 2952 684 548 S 958 0.0 1093:54 kdm
On Sunday, 17 of August 2008, Peter Zijlstra wrote:
> On Sat, 2008-08-16 at 21:02 +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.26. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11209
> > Subject : 2.6.27-rc1 process time accounting
> > Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz>
> > Date : 2008-07-31 10:43 (17 days old)
> > References : http://marc.info/?l=linux-kernel&m=121750102917490&w=4
> > Handled-By : Peter Zijlstra <a.p.zijlstra@chello.nl>
>
>
> Should be fixed by: e26b33e9552c29c1d3fe67dc602c6264c29f5dc7
Not fixed, please reopen. Just saw 9999% usage for Firefox. Ubuntu 8.04.1 / x86, kernel - 2.6.27-rc3. So that commit e26b33e9552c29c1d3fe67dc602c6264c29f5dc7 at least did not fix it. (It's not that blatantly obviously wrong - but still once in a while it clearly is although there is no telling if the seemingly correct values are correct. Yikes.) With 2.6.24-19 (The Ubuntu default kernel) ksoftirqd does not consume any CPU even after the system has run for a while with various activities. With 2.6.27-rc3 it routinely shows up as consuming 52% and gains quite some CPU time. Same story with hald-addon-storage - with .24 - it never shows up in top. With -rc3 it is quite frequently using more than 28% CPU. Something isn't quite right. Me no likey this. I'm still seeing this on 2.6.27-rc5. /proc/$PID/stat is randomly garbage for 2 or so kernel threads per boot. Me too. v2.6.27-rc4-216-g49fadb5 on x86_64 (Core 2 Duo). NO_HZ, hrtimers, HPET and stable TSC for clocksource. $ ps aux | grep "[^ ][0-9][0-9][:]" root 2 92292178 0.0 0 0 ? S< 12:24 21114580:11 [kthreadd] root 3049 103014463 0.0 0 0 ? S< 12:24 21114580:11 [md_d1_raid1] It's not limited to kernel threads. I guess it's just more likely to happen to long lived processes. $ uptime 15:18:04 up 56 min, 2 users, load average: 0.30, 0.17, 0.20 $ ps aux | grep "[^ ][0-9][0-9][:]" root 2 116264927 0.0 0 0 ? S< 14:21 21114580:11 [kthreadd] alan 7598 127853454 0.7 175644 16132 ? S 14:22 24205791:03 kdesktop [kdeinit] alan 7814 123955781 0.5 145816 11840 ? S 14:22 21111716:44 kttsd On Sunday, 7 of September 2008, Peter Zijlstra wrote:
> On Sat, 2008-09-06 at 23:30 +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.26. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11209
> > Subject : 2.6.27-rc1 process time accounting
> > Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz>
> > Date : 2008-07-31 10:43 (38 days old)
> > References : http://marc.info/?l=linux-kernel&m=121750102917490&w=4
> > Handled-By : Peter Zijlstra <a.p.zijlstra@chello.nl>
>
> Should be fixed in the latest -git
Fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=49048622eae698e5c4ae61f7e71200f265ccc529 Fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=56c7426b3951e4f35a71d695f1c982989399d6fd *** Bug 11553 has been marked as a duplicate of this bug. *** Jon Smirl reports* that this is still not fixed. Please re-open. [*] http://lkml.org/lkml/2008/9/30/199 Done. References : http://lkml.org/lkml/2008/9/30/199 On 2.6.27-rc8 root 8 86736080 0.0 0 0 ? S< 08:17 21133795:38 [watchdog/1] Still there on 2.6.27-rc9. Q6600 64b root 7 289719524 0.0 0 0 ? S< 08:05 21133795:38 [ksoftirqd/1] Still okay here with -rc9 with tickless turned off I have tickless off too. # # Processor type and features # CONFIG_TICK_ONESHOT=y # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_SMP=y CONFIG_X86_FIND_SMP_CONFIG=y CONFIG_X86_MPPARSE=y CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set okay, my .config is in http://bugsplatter.id.au/kernel/boxen/pooh/ -- slack-12.1 on Core2Duo with 4GB in 32-bit mode. Grant, you are 32b. I am running 64b. That is a major difference. Okay Jon, I rebooted into slamd64-12.1 with -rc9, after a minute 'top' is okay, I'll check after some hours and report if there's oddness or not. Info, config + dmesg: http://bugsplatter.id.au/kernel/boxen/pooh64/ Created attachment 18205 [details]
full .config of problem system
I get the bogus time immediately on booting. Or least as fast as X comes up and I can get to a window. It stays and never goes away. It is not always on ksoftirq, sometimes is it is one watchdog. Here's a clue: This is from this morning: root 7 289719524 0.0 0 0 ? S< 08:05 21133795:38 [ksoftirqd/1] And now: root 7 39547874 0.0 0 0 ? S< 08:05 21133795:38 [ksoftirqd/1] Note that the time has not changed, but the %CPU has. A few seconds later: root 7 39322796 0.0 0 0 ? S< 08:05 21133795:38 [ksoftirqd/1] Again: root 7 39286787 0.0 0 0 ? S< 08:05 21133795:38 [ksoftirqd/1] root 7 39259197 0.0 0 0 ? S< 08:05 21133795:38 [ksoftirqd/1] %CPU is counting down at a very rapid pace still nothing odd here On Tue, 2008-10-07 at 15:23 -0700, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=11209 > > > > > > ------- Comment #26 from gcoady.lk@gmail.com 2008-10-07 15:23 ------- > Okay Jon, I rebooted into slamd64-12.1 with -rc9, after a minute 'top' is > okay, > I'll check after some hours and report if there's oddness or not. > Info, config + dmesg: http://bugsplatter.id.au/kernel/boxen/pooh64/ Just to be sure, People see this abnormality only on 32bit? (with and without NO_HZ=y?) No, Jon is seeing it on 64bit, I'm not seeing it any more. I'm seeing it on rc9 I see this on 64 bit, 2.6.27.3, without NO_HZ but with the high resolution timers turned on. With a day of runtime without CONFIG_HIGH_RES_TIMERS I have been unable to reproduce it. Notify-Also : Andi Kleen <andi@firstfloor.org> Notify-Also : Michael Lyle <mlyle@translattice.com> No response from the reporters, so the issue appears to be fixed. Closing. |