Bug 10648

Summary: CONFIG_PRINTK_TIME broken on git HEAD ?
Product: Process Management Reporter: Rafael J. Wysocki (rjw)
Component: SchedulerAssignee: Ingo Molnar (mingo)
Status: CLOSED CODE_FIX    
Severity: normal CC: a.p.zijlstra, crazy, mingo, randy.dunlap, spamtrap
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26-rc1-00166-gc0a1811 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 10492    

Description Rafael J. Wysocki 2008-05-08 14:26:01 UTC
Subject    : CONFIG_PRINTK_TIME broken on git HEAD ?
Submitter  : Gabriel C <nix.or.die@googlemail.com>
Date       : 2008-05-08 00:26
References : http://lkml.org/lkml/2008/5/7/352
Handled-By : Peter Zijlstra <a.p.zijlstra@chello.nl>

This entry is being used for tracking a regression from 2.6.25.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2008-05-08 14:27:16 UTC
Caused by:

 commit 3e51f33fcc7f55e6df25d15b55ed10c8b4da84cd
 Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
 Date:   Sat May 3 18:29:28 2008 +0200
 
     sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
     Signed-off-by: Ingo Molnar <mingo@elte.hu>
Comment 2 Martin Knoblauch 2008-05-15 09:27:03 UTC
Related to the 2.6.25 regression reported in 10656?
Comment 3 Peter Zijlstra 2008-05-15 09:49:01 UTC
Subject: sched: fix sched_clock_cpu()
From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Make sched_clock_cpu() return 0 before it has been initialised and avoid
corrupting its state due to doing so.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched_clock.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/sched_clock.c
===================================================================
--- linux-2.6.orig/kernel/sched_clock.c	2008-05-13 18:40:35.000000000 +0200
+++ linux-2.6/kernel/sched_clock.c	2008-05-13 18:42:05.000000000 +0200
@@ -60,22 +60,26 @@
 	return &per_cpu(sched_clock_data, cpu);
 }
 
+static __read_mostly int sched_clock_running;
+
 void sched_clock_init(void)
 {
 	u64 ktime_now = ktime_to_ns(ktime_get());
-	u64 now = 0;
+	unsigned long now_jiffies = jiffies;
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
 		struct sched_clock_data *scd = cpu_sdc(cpu);
 
 		scd->lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
-		scd->prev_jiffies = jiffies;
-		scd->prev_raw = now;
-		scd->tick_raw = now;
+		scd->prev_jiffies = now_jiffies;
+		scd->prev_raw = 0;
+		scd->tick_raw = 0;
 		scd->tick_gtod = ktime_now;
 		scd->clock = ktime_now;
 	}
+
+	sched_clock_running = 1;
 }
 
 /*
@@ -137,6 +141,9 @@
 	struct sched_clock_data *scd = cpu_sdc(cpu);
 	u64 now, clock;
 
+	if (unlikely(!sched_clock_running))
+		return 0ull;
+
 	/*
 	 * Normally this is not called in NMI context - but if it is,
 	 * trying to do any locking here is totally lethal.
@@ -182,6 +189,9 @@
 	struct sched_clock_data *scd = this_scd();
 	u64 now, now_gtod;
 
+	if (unlikely(!sched_clock_running))
+		return;
+
 	WARN_ON_ONCE(!irqs_disabled());
 
 	now = sched_clock();
Comment 4 Martin Knoblauch 2008-05-19 04:32:02 UTC
Add myself dur to similarity to 10656.
Comment 5 Rafael J. Wysocki 2008-05-20 15:12:19 UTC
Regressions list annotation:
Handled-By : Peter Zijlstra <peterz@infradead.org>
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=10648#c3
Comment 6 Gabriel C 2008-05-20 17:50:50 UTC
On what tree does the patch apply ? I get rejects on 2.6.26-rc3-00154-gc110a2b
Comment 7 Martin Knoblauch 2008-06-02 06:49:06 UTC
The timestamps seem to be monotonic in 2.6.26-rc4, but the numbers are still bogus, not showing a relationship to "seconds after bootup":

[    0.000000/0] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000/0] Using ACPI (MADT) for SMP configuration information
[    0.000000/0] Allocating PCI resources starting at c2000000 (gap: c0000000:20000000)
[    0.000000/0] PERCPU: Allocating 429616 bytes of per cpu data
[    0.000000/0] NR_CPUS: 8, nr_cpu_ids: 4
[17180642.907074/0] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 2038662
[17180642.907074/0] Policy zone: Normal
[17180642.907074/0] Kernel command line: ro root=/dev/sda3 rhgb quiet
[17180642.907074/0] Initializing CPU#0
[17180642.907074/0] PID hash table entries: 4096 (order: 12, 32768 bytes)
[17180642.907074/0] TSC calibrated against PM_TIMER
[17180642.907074/0] time.c: Detected 2327.497 MHz processor.
[17180642.907074/0] Console: colour VGA+ 80x25
[17180642.907074/0] console [tty0] enabled
[17180642.907074/0] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[17180642.907074/0] ... MAX_LOCKDEP_SUBCLASSES:    8
[17180642.907074/0] ... MAX_LOCK_DEPTH:          48
[17180642.907074/0] ... MAX_LOCKDEP_KEYS:        2048
[17180642.907074/0] ... CLASSHASH_SIZE:           1024
[17180642.907074/0] ... MAX_LOCKDEP_ENTRIES:     8192
[17180642.907074/0] ... MAX_LOCKDEP_CHAINS:      16384
[17180642.907074/0] ... CHAINHASH_SIZE:          8192
[17180642.907074/0]  memory used by lock dependency info: 1712 kB
[17180642.907074/0]  per task-struct memory footprint: 3456 bytes
[17180642.907074/0] Checking aperture...
[17180642.907074/0] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[17180642.907074/0] Placing software IO TLB between 0x11af000 - 0x51af000
[17180642.907074/0] Memory: 8110356k/9437184k available (2977k kernel code, 277632k reserved, 1585k data, 872k init)
[17180642.907074/0] CPA: page pool initialized 1 of 1 pages preallocated
[17180642.907074/0] hpet clockevent registered
Comment 8 Martin Knoblauch 2008-06-03 00:35:37 UTC
*** Bug 10656 has been marked as a duplicate of this bug. ***
Comment 9 Gabriel C 2008-06-03 01:15:05 UTC
This is fixed by commit a381759d6ad5c5dea5a981918e0b4493e9b66ac7 , at least for me.
Comment 10 Martin Knoblauch 2008-06-03 01:20:49 UTC
Maybe I am just naive, but shouldn't "sched_clock()" and "cpu_clock(cpu-num)" return *roughly* the same numbers? If I just call "sched_clock()" in the printk routine, the timestamps are as expected.
Comment 11 Peter Zijlstra 2008-06-03 02:02:33 UTC
On Tue, 2008-06-03 at 01:20 -0700, bugme-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10648

> ------- Comment #10 from spamtrap@knobisoft.de  2008-06-03 01:20 -------
> Maybe I am just naive, but shouldn't "sched_clock()" and "cpu_clock(cpu-num)"
> return *roughly* the same numbers? If I just call "sched_clock()" in the
> printk
> routine, the timestamps are as expected.

Depends on your hardware - x86 TSC can do the weirdest things ;-)

Anyway - in recent kernels cpu_clock() works like you expect.
Comment 12 Martin Knoblauch 2008-06-03 02:13:48 UTC
OK, forget my last comment. Caffeine level to low. -rc4-git5 looks a lot better now. The numbers smell like "seconds after bootup", but are running about 33% to fast. The timestamps of below dmesg entries are supposed to be 5 seconds apart:

[ 2019.973255/2] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
[ 2027.090397/1] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
[ 2033.957576/0] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
[ 2040.590016/0] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
Comment 13 Peter Zijlstra 2008-06-03 02:21:18 UTC
On Tue, 2008-06-03 at 02:13 -0700, bugme-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10648

> ------- Comment #12 from spamtrap@knobisoft.de  2008-06-03 02:13 -------
> OK, forget my last comment. Caffeine level to low. -rc4-git5 looks a lot
> better
> now. The numbers smell like "seconds after bootup", but are running about 33%
> to fast. The timestamps of below dmesg entries are supposed to be 5 seconds
> apart:
> 
> [ 2019.973255/2] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> showTasks Unmount shoW-blocked-tasks
> [ 2027.090397/1] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> showTasks Unmount shoW-blocked-tasks
> [ 2033.957576/0] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> showTasks Unmount shoW-blocked-tasks
> [ 2040.590016/0] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> showTasks Unmount shoW-blocked-tasks

How did you generate these?

something like:

while sleep 5 ; do echo h > /proc/sysrq-trigger; done

Does

while sleep 5; do date; done

Show similar results?
Comment 14 Martin Knoblauch 2008-06-03 02:38:54 UTC
(In reply to comment #13)
> On Tue, 2008-06-03 at 02:13 -0700, bugme-daemon@bugzilla.kernel.org
> wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10648
> 
> > ------- Comment #12 from spamtrap@knobisoft.de  2008-06-03 02:13 -------
> > OK, forget my last comment. Caffeine level to low. -rc4-git5 looks a lot
> better
> > now. The numbers smell like "seconds after bootup", but are running about
> 33%
> > to fast. The timestamps of below dmesg entries are supposed to be 5 seconds
> > apart:
> > 
> > [ 2019.973255/2] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> > kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> > showTasks Unmount shoW-blocked-tasks
> > [ 2027.090397/1] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> > kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> > showTasks Unmount shoW-blocked-tasks
> > [ 2033.957576/0] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> > kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> > showTasks Unmount shoW-blocked-tasks
> > [ 2040.590016/0] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> > kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync
> > showTasks Unmount shoW-blocked-tasks
> 
> How did you generate these?
> 
> something like:
> 
> while sleep 5 ; do echo h > /proc/sysrq-trigger; done
> 
> Does
> 
> while sleep 5; do date; done
> 
> Show similar results?
> 

 I am doing:

while sleep 5; do date; echo h > /proc/sysrq-trigger ; done
Tue Jun  3 11:33:41 CEST 2008
Tue Jun  3 11:33:46 CEST 2008
Tue Jun  3 11:33:51 CEST 2008
Tue Jun  3 11:33:56 CEST 2008
Tue Jun  3 11:34:01 CEST 2008


 So, "sleep 5" works as expected. It is the output of cpu_clock that is running fast. Actually by 47% over 50 minutes uptime.
Comment 15 Martin Knoblauch 2008-06-03 02:59:33 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > On Tue, 2008-06-03 at 02:13 -0700, bugme-daemon@bugzilla.kernel.org
> > wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=10648
> > 
> > > ------- Comment #12 from spamtrap@knobisoft.de  2008-06-03 02:13 -------
> > > OK, forget my last comment. Caffeine level to low. -rc4-git5 looks a lot
> better
> > > now. The numbers smell like "seconds after bootup", but are running about
> 33%
> > > to fast. The timestamps of below dmesg entries are supposed to be 5
> seconds
> > > apart:
> > > 
> > > [ 2019.973255/2] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> > > kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw
> Sync
> > > showTasks Unmount shoW-blocked-tasks
> > > [ 2027.090397/1] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> > > kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw
> Sync
> > > showTasks Unmount shoW-blocked-tasks
> > > [ 2033.957576/0] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> > > kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw
> Sync
> > > showTasks Unmount shoW-blocked-tasks
> > > [ 2040.590016/0] SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm
> Full
> > > kIll saK aLlcpus showMem Nice powerOff showPc show-all-timers(Q) unRaw
> Sync
> > > showTasks Unmount shoW-blocked-tasks
> > 
> > How did you generate these?
> > 
> > something like:
> > 
> > while sleep 5 ; do echo h > /proc/sysrq-trigger; done
> > 
> > Does
> > 
> > while sleep 5; do date; done
> > 
> > Show similar results?
> > 
> 
>  I am doing:
> 
> while sleep 5; do date; echo h > /proc/sysrq-trigger ; done
> Tue Jun  3 11:33:41 CEST 2008
> Tue Jun  3 11:33:46 CEST 2008
> Tue Jun  3 11:33:51 CEST 2008
> Tue Jun  3 11:33:56 CEST 2008
> Tue Jun  3 11:34:01 CEST 2008
> 
> 
>  So, "sleep 5" works as expected. It is the output of cpu_clock that is
>  running
> fast. Actually by 47% over 50 minutes uptime.
> 

 Just to repeat: if I use sched_clock as the source for the printk timestamps, they are 5 seconds apart, as expected.
Comment 17 Martin Knoblauch 2008-06-06 03:16:47 UTC
Just for the record :-) IMO the problem is *not* completely fixed. The timestamps are monotonic now, but are running to fast. This is a regression from 2.6.24 behaviour.
Comment 18 Martin Knoblauch 2008-06-09 04:16:58 UTC
FWIW: even if the timestamps are not what *I* expect, the patch should be backported to 2.6.25-stable to fix the jumping.