Bug 8027

Summary: bogomips reports incorrectly and gettimeofday increments too slowly
Product: Timers Reporter: David L (idht4n)
Component: OtherAssignee: Thomas Gleixner (tglx)
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: bunk, john.stultz
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.21-rc1 Subsystem:
Regression: --- Bisected commit-id:
Attachments: dmesg output
config.gz
pit setup fix
dmesg output from a kernel with no timing problem in response to commeng #32
patch emailed to me by tglx - seemed to fix timing problem

Description David L 2007-02-17 09:12:28 UTC
I'm new to using the real-time patches and I'm seeing
some things that seem odd to me.  Namely, the
"bogomips" listed in /proc/cpuinfo is 7324.40 for a 266MHz
processor instead of the 538.65 value that it has without
the real-time patch.  Also, calls to gettimeofday return
times that increment at ~1/160 of the actual elapsed time.

Here's the processor info:
processor       : 0
vendor_id       : CyrixInstead
cpu family      : 5
model           : 9
model name      : Unknown
stepping        : 1
cpu MHz         : 266.708
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu tsc msr cx8 cmov mmx cxmmx
bogomips        : 538.70
clflush size    : 32
Comment 1 Thomas Gleixner 2007-02-17 11:50:13 UTC
Can you please send a full boot log ?
Comment 2 David L 2007-02-17 18:12:18 UTC
Created attachment 10444 [details]
dmesg output
Comment 3 David L 2007-02-17 18:20:57 UTC
*** Bug 8029 has been marked as a duplicate of this bug. ***
Comment 4 David L 2007-02-17 18:21:33 UTC
*** Bug 8030 has been marked as a duplicate of this bug. ***
Comment 5 David L 2007-02-18 09:09:21 UTC
Here's some more info about this problem:

usleep 100000 takes about 42 seconds to execute.
During that period, about 751 timer interrupts occur.
CONFIG_HZ=250, so I think ~25 timer interrupts should
have accumulated during a 0.1 seconds sleep.  But since it
took 42 seconds instead of 0.1 seconds, about 10500
should have accumulated.  It's odd that it's not
consistent with either.

BTW, sorry about the duplicate bug reports 8029 and 8030...
I think I refreshed my browser window on the original
bug report page instead of the "show bug" page where I
though I was.
Comment 6 David L 2007-02-19 07:22:39 UTC
Created attachment 10461 [details]
config.gz
Comment 7 David L 2007-02-20 13:56:02 UTC
I've experimented with different config settings in the 2.6.20-rt8 kernel, but
even when they are nearly identical to the 2.6.20 unpatched kernel
configuration, the patched one still has the timing problems.  Here is the
current diff between a working 2.6.20 config and the most recent config I tried
in the rt8 patched version:

diff .config /tmp/linux-2.6.20/.config | grep -v "#"
3,4c3,4
---
8,10d7
< CONFIG_CLOCKSOURCE_WATCHDOG=y
< CONFIG_GENERIC_CLOCKEVENTS=y
< CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
102,104d98
141a136
> CONFIG_RWSEM_XCHGADD_ALGORITHM=y
156,157d150
< CONFIG_PREEMPT_DESKTOP=y
159,161d151
163,167d152
< CONFIG_CLASSIC_RCU=y
< CONFIG_ASM_SEMAPHORES=y
< CONFIG_RWSEM_XCHGADD_ALGORITHM=y
196d180
< CONFIG_REGPARM=y
Comment 8 Thomas Gleixner 2007-02-21 13:03:17 UTC
Can you please verify against 2.6.21-rc1 ?
Comment 9 David L 2007-02-21 17:10:34 UTC
In reply to comment #8:
Ok... is there a new real-time patch for 2.6.21-rc1 or should I just apply
patch-2.6.20-rt8 to it?
Comment 10 David L 2007-02-21 20:29:58 UTC
OK... I tried the vanilla 2.6.21-rc1 kernel (no RT patches) and it has the same
problem as 2.6.20-rt8.
Comment 11 Thomas Gleixner 2007-02-21 23:52:10 UTC
No, I meant that you try 2.6.21-rc1 as is (no -rt patch). The bogomips
calibration should not be influenced by -rt, but there have been similar
mysteries around calibration reported against mainline.

Thanks

    tglx
Comment 12 David L 2007-02-23 07:20:05 UTC
Should this bug be changed to 2.6.21-rc1?  I don't know the right way to do
that.  Can I just change the kernel version to 2.6.21-rc1 and the category to
"other"?

BTW, I've confirmed that this problem was happening back as far as 2.6.20-rt1,
so whatever mainline changes were picked up by rt1 had the problem.

Comment 13 Thomas Gleixner 2007-02-23 07:25:10 UTC
> Should this bug be changed to 2.6.21-rc1?  I don't know the right way to do
> that.  Can I just change the kernel version to 2.6.21-rc1 and the category to
> "other"?

No. Just keep it.

> BTW, I've confirmed that this problem was happening back as far as 2.6.20-rt1,
> so whatever mainline changes were picked up by rt1 had the problem.

Can you try the following please:

- plug in a PS/2 keyboard and check, if something changes
- switch off USB support in the BIOS and check, if something changes

Thanks,

	tglx


Comment 14 David L 2007-02-23 09:03:21 UTC
In reply to comment #13.  This is an embedded device with no monitor and no
keyboard connector (at least not without building custom cables).  I can change
any .config parameters, but it is quite difficult for me to change the BIOS or
plug in a keyboard.  :(  Let me know if disabling USB support in the kernel
would be useful.

One thing I noticed even on older kernels (eg 2.6.19.1-rt15) on this system is
that the assembly call that seems to be used by get_cycles:

__asm__ __volatile__("rdtsc" : "=A" (x));

doesn't return values that increment at the CPU clock rate like it does on my
desktop system... they increment at ~42.445764MHz on this 266MHz system.  I'm
not sure if this is any way related to this problem... I probably just don't
understand what get_cycles really does.

FYI - I think this board is a Compulab CM-i686:
http://www.compulab.co.il/i686/html/i686-cm-m-datasheet.htm
Comment 15 David L 2007-02-28 13:02:47 UTC
The bug still exists in 2.6.21-rc2.
Comment 16 Thomas Gleixner 2007-03-04 08:49:11 UTC
On Fri, 2007-02-23 at 09:03 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> n reply to comment #13.  This is an embedded device with no monitor and no
> keyboard connector (at least not without building custom cables).  I can change
> any .config parameters, but it is quite difficult for me to change the BIOS or
> plug in a keyboard.  :(  Let me know if disabling USB support in the kernel
> would be useful.

Nope.

> One thing I noticed even on older kernels (eg 2.6.19.1-rt15) on this system is
> that the assembly call that seems to be used by get_cycles:
> 
> __asm__ __volatile__("rdtsc" : "=A" (x));
> 
> doesn't return values that increment at the CPU clock rate like it does on my
> desktop system... they increment at ~42.445764MHz on this 266MHz system.  I'm
> not sure if this is any way related to this problem... I probably just don't
> understand what get_cycles really does.

It reads the TimeStampCounter (TSC), which usually increments with the
CPU frequency.

There is probably only bisecting Linus git tree, which will give us an
answer. Could you try that ?

Thanks,

	tglx



Comment 17 David L 2007-03-04 14:52:32 UTC
>There is probably only bisecting Linus git tree, which will give us an
>answer. Could you try that ?

I'm not sure how to try what your asking for.  I tried downloading the patch at:

http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.21-rc2-git3.bz2

and applying it to 2.6.20, but it failed.  Am I supposed to apply it to 2.6.20.1
or 2.6.21-rc2 or something else?  Is that even the right patch?
Comment 18 Thomas Gleixner 2007-03-05 01:44:44 UTC
On Sun, 2007-03-04 at 14:52 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> >There is probably only bisecting Linus git tree, which will give us an
> >answer. Could you try that ?
> 
> I'm not sure how to try what your asking for.  I tried downloading the patch at:
> 
> http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.21-rc2-git3.bz2
> 
> and applying it to 2.6.20, but it failed.  Am I supposed to apply it to 2.6.20.1
> or 2.6.21-rc2 or something else?  Is that even the right patch?

Sorry. I should have explained that.

http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt

It would be great, if you could go through that.

	tglx


Comment 19 David L 2007-03-05 16:09:02 UTC
git bisect good
e9e2cdb412412326c4827fc78ba27f410d837e6e is first bad commit
commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Feb 16 01:28:04 2007 -0800

    [PATCH] clockevents: i386 drivers
    
    Add clockevent drivers for i386: lapic (local) and PIT/HPET (global).  Update
    the timer IRQ to call into the PIT/HPET driver's event handler and the
    lapic-timer IRQ to call into the lapic clockevent driver.  The assignement of
    timer functionality is delegated to the core framework code and replaces the
    compile and runtime evalution in do_timer_interrupt_hook()
    
    Use the clockevents broadcast support and implement the lapic_broadcast
    function for ACPI.
    
    No changes to existing functionality.
    
    [ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ]
    [ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ]
    Cleanups-from: Adrian Bunk <bunk@stusta.de>
    Build-fixes-from: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Cc: john stultz <johnstul@us.ibm.com>
    Cc: Roman Zippel <zippel@linux-m68k.org>
    Cc: Andi Kleen <ak@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 8cd88ef4dc6e976c589deda1c558e19ed8dcafde
88ed93cefb145aa545ff4561636cc9342fc7bb42 M      arch
:040000 040000 5a1342027bc77019bb3ca9295d02bf483410e3d4
3de4aedc5bb5d86287b7a5bad4b673e8ffbc1e44 M      drivers
:040000 040000 88ade982261c3ce0590c939694994796f02d2e73
d7221468ac776ccc7804f9a46d6d53fa7c294d43 M      include
Comment 20 Thomas Gleixner 2007-03-05 17:05:48 UTC
On Mon, 2007-03-05 at 16:09 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> git bisect good
> e9e2cdb412412326c4827fc78ba27f410d837e6e is first bad commit
> commit e9e2cdb412412326c4827fc78ba27f410d837e6e
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Fri Feb 16 01:28:04 2007 -0800
> 
>     [PATCH] clockevents: i386 drivers

Sigh. I have no clue, how this should influence the bogomips
calibration.

Is the bogomips value correct, when you boot with 

commit 79bf2bb335b85db25d27421c798595a2fa2a0e82 

and wrong after applying the e9e2cdb412412326c4827fc78ba27f410d837e6e
one ?

Thanks

	tglx


Comment 21 David L 2007-03-05 19:31:47 UTC
>Is the bogomips value correct, when you boot with 
>commit 79bf2bb335b85db25d27421c798595a2fa2a0e82 
>and wrong after applying the e9e2cdb412412326c4827fc78ba27f410d837e6e
>one ? 
>Is the bogomips value correct, when you boot with 
>commit 79bf2bb335b85db25d27421c798595a2fa2a0e82 
>and wrong after applying the e9e2cdb412412326c4827fc78ba27f410d837e6e
>one ?

I'm a git novice, so I don't know exactly how to try what you're suggesting.
But I backed up each bzImage as I ran the git binary search, so I can show you
what the last and second to last kernels were reporting.  It looks like the
reported kernel version shows the beginning of the git tag... is that enough
for you to confirm which patch was bad?:

# cat /proc/version
Linux version 2.6.20-g79bf2bb3 (nospam@nospam.com) (gcc version 4.1.1 20061011
(Red Hat 4.1.1-30)) #15 PREEMPT Mon Mar 5 16:02:20 PST 2007
# cat /proc/cpuinfo
processor       : 0
<snip>
cpu MHz         : 266.789
<snip>
bogomips        : 537.44


# cat /proc/version
Linux version 2.6.20-ge9e2cdb4 (nospam@nospam.com) (gcc version 4.1.1 20061011
(Red Hat 4.1.1-30)) #14 PREEMPT Mon Mar 5 15:44:09 PST 2007
# cat /proc/cpuinfo
processor       : 0
<snip>
bogomips        : 7326.54
Comment 22 David L 2007-03-19 09:01:18 UTC
2.6.21-rc4 still has the problem, but the reported bogomips is now even bigger:

cat /proc/cpuinfo
processor       : 0
vendor_id       : CyrixInstead
cpu family      : 5
model           : 9
model name      : Unknown
stepping        : 1
cpu MHz         : 266.769
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu tsc msr cx8 cmov mmx cxmmx
bogomips        : 29296.35
clflush size    : 32
Comment 23 Thomas Gleixner 2007-03-21 08:28:12 UTC
> 2.6.21-rc4 still has the problem, but the reported bogomips is now even
> bigger:
> bogomips        : 29296.35

Hmm. Is this reproducible ? If yes, may I ask you to do a biscetion
again ?

Start with good=e9e2cdb412412326c4827fc78ba27f410d837e6e and mark the
latest one as bad. It would be really helpful, if we could find the
change, which made this problem surface.

Thanks,

	tglx
Comment 24 David L 2007-03-21 14:13:17 UTC
>> 2.6.21-rc4 still has the problem, but the reported bogomips is now even
>> bigger:
>> bogomips        : 29296.35

>Hmm. Is this reproducible ? If yes, may I ask you to do a biscetion
>again ?

Argh... I screwed up when I tested 2.6.21-rc4.  I changed the timer 
frequency from 250Hz to 1000Hz and the bogomips calibration error 
changed by a factor of 4.  So my last comment was bogus, but maybe
the fact that the calibration error scales with the timer frequency
is a clue??
Comment 25 David L 2007-03-21 14:31:21 UTC
I confirmed that when I changed the timer frequency to 100Hz, the
bogomips calibration changes by a factor of ~10 from when it was
at 1000Hz:
cat /proc/cpuinfo
<snip>
bogomips        : 2930.48
Comment 26 David L 2007-03-22 09:40:19 UTC
>> 2.6.21-rc4 still has the problem, but the reported bogomips is now even
>> bigger:
>> bogomips        : 29296.35

>Hmm. Is this reproducible ? If yes, may I ask you to do a biscetion
>again ?

Did you still want me to run that bisection or were you asking me to 
look for the change that cause the calibration error to get bigger 
(which we now understand was a .config change)?
Comment 27 Thomas Gleixner 2007-03-22 09:52:39 UTC
> ------- Additional Comments From idht4n@hotmail.com  2007-03-22 09:40 -------
> >> 2.6.21-rc4 still has the problem, but the reported bogomips is now even bigger:
> >> bogomips        : 29296.35
> 
> >Hmm. Is this reproducible ? If yes, may I ask you to do a biscetion
> >again ?
> 
> Did you still want me to run that bisection or were you asking me to 
> look for the change that cause the calibration error to get bigger 
> (which we now understand was a .config change)?

No. We know that it is due to HZ=100/250/1000

I found a box, where I can reproduce. I try to track this down. Thanks.

	tglx


Comment 28 Thomas Gleixner 2007-03-22 12:24:49 UTC
Created attachment 10911 [details]
pit setup fix

Can you please test this patch ?
Comment 29 David L 2007-03-22 15:32:05 UTC
The patch seemed to fix the bogomips calibration
but not the timing problems that I thought were
related.  Tests with usleep seem to indicate that 
the system time is incrementing at about 1/350 of 
real time.  I'll do some more tests with older
kernels and make sure I'm not doing something wrong.
Comment 30 john stultz 2007-03-22 15:46:02 UTC
Does gettimeofday() increment slowly w/ 2.6.18 or 2.6.19?
Comment 31 David L 2007-03-22 16:12:02 UTC
> Does gettimeofday() increment slowly w/ 2.6.18 or 2.6.19?

I don't think so.  I just confirmed the timing seemed ok in
2.6.20.  I'm starting to bisect... I'll post the result later.
Comment 32 john stultz 2007-03-22 17:27:40 UTC
Could you attach a dmesg from a post-2.6.18 kernel that does not have any timing
issue?
Comment 33 David L 2007-03-22 18:41:08 UTC
Created attachment 10920 [details]
dmesg output from a kernel with no timing problem in response to commeng #32
Comment 34 David L 2007-03-22 19:50:30 UTC
Unless I screwed up, here's the git bisect results for the timing problem:

92c7e00254b2d0efc1e36ac3e45474ce1871b6b2 is first bad commit
commit 92c7e00254b2d0efc1e36ac3e45474ce1871b6b2
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Feb 16 01:27:33 2007 -0800

    [PATCH] Simplify the registration of clocksources

    Enqueue clocksources in rating order to make selection of the clocksource
    easier.  Also check the match with an user override at enqueue time.

    Preparatory patch for the generic clocksource verification.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Cc: john stultz <johnstul@us.ibm.com>
    Cc: Roman Zippel <zippel@linux-m68k.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 2554fcacae8f3778b85140c6ee08e62551a4dd05
cc95d97744405c5c4c39c47aff7504d73cfaa139 M      arch
:040000 040000 b8cb0e2c18a396c5d1e49f4ce7d6402e15952135
90b0289adc69bc487f27faa9ff05b34b83466051 M      include
:040000 040000 28703eadd46c9133b3a569b03529eadde75f0507
ab98a4ef4c813976ed26d1238166e23a39990577 M      kernel
Comment 35 john stultz 2007-03-23 15:16:30 UTC
So looking at the dmesg, I can see that the TSC is being used on the troublesome
kernel and the PIT is being used otherwise. It looks like w/ the newer kernel
the slow TSC detection code isn't catching the slow TSC.

Quick workaround: Boot w/ "clocksource=pit" to see if that fixes the time issue.
Comment 36 David L 2007-03-23 19:32:44 UTC
Created attachment 10929 [details]
patch emailed to me by tglx - seemed to fix timing problem