I'm new to using the real-time patches and I'm seeing some things that seem odd to me. Namely, the "bogomips" listed in /proc/cpuinfo is 7324.40 for a 266MHz processor instead of the 538.65 value that it has without the real-time patch. Also, calls to gettimeofday return times that increment at ~1/160 of the actual elapsed time. Here's the processor info: processor : 0 vendor_id : CyrixInstead cpu family : 5 model : 9 model name : Unknown stepping : 1 cpu MHz : 266.708 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu tsc msr cx8 cmov mmx cxmmx bogomips : 538.70 clflush size : 32
Can you please send a full boot log ?
Created attachment 10444 [details] dmesg output
*** Bug 8029 has been marked as a duplicate of this bug. ***
*** Bug 8030 has been marked as a duplicate of this bug. ***
Here's some more info about this problem: usleep 100000 takes about 42 seconds to execute. During that period, about 751 timer interrupts occur. CONFIG_HZ=250, so I think ~25 timer interrupts should have accumulated during a 0.1 seconds sleep. But since it took 42 seconds instead of 0.1 seconds, about 10500 should have accumulated. It's odd that it's not consistent with either. BTW, sorry about the duplicate bug reports 8029 and 8030... I think I refreshed my browser window on the original bug report page instead of the "show bug" page where I though I was.
Created attachment 10461 [details] config.gz
I've experimented with different config settings in the 2.6.20-rt8 kernel, but even when they are nearly identical to the 2.6.20 unpatched kernel configuration, the patched one still has the timing problems. Here is the current diff between a working 2.6.20 config and the most recent config I tried in the rt8 patched version: diff .config /tmp/linux-2.6.20/.config | grep -v "#" 3,4c3,4 --- 8,10d7 < CONFIG_CLOCKSOURCE_WATCHDOG=y < CONFIG_GENERIC_CLOCKEVENTS=y < CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y 102,104d98 141a136 > CONFIG_RWSEM_XCHGADD_ALGORITHM=y 156,157d150 < CONFIG_PREEMPT_DESKTOP=y 159,161d151 163,167d152 < CONFIG_CLASSIC_RCU=y < CONFIG_ASM_SEMAPHORES=y < CONFIG_RWSEM_XCHGADD_ALGORITHM=y 196d180 < CONFIG_REGPARM=y
Can you please verify against 2.6.21-rc1 ?
In reply to comment #8: Ok... is there a new real-time patch for 2.6.21-rc1 or should I just apply patch-2.6.20-rt8 to it?
OK... I tried the vanilla 2.6.21-rc1 kernel (no RT patches) and it has the same problem as 2.6.20-rt8.
No, I meant that you try 2.6.21-rc1 as is (no -rt patch). The bogomips calibration should not be influenced by -rt, but there have been similar mysteries around calibration reported against mainline. Thanks tglx
Should this bug be changed to 2.6.21-rc1? I don't know the right way to do that. Can I just change the kernel version to 2.6.21-rc1 and the category to "other"? BTW, I've confirmed that this problem was happening back as far as 2.6.20-rt1, so whatever mainline changes were picked up by rt1 had the problem.
> Should this bug be changed to 2.6.21-rc1? I don't know the right way to do > that. Can I just change the kernel version to 2.6.21-rc1 and the category to > "other"? No. Just keep it. > BTW, I've confirmed that this problem was happening back as far as 2.6.20-rt1, > so whatever mainline changes were picked up by rt1 had the problem. Can you try the following please: - plug in a PS/2 keyboard and check, if something changes - switch off USB support in the BIOS and check, if something changes Thanks, tglx
In reply to comment #13. This is an embedded device with no monitor and no keyboard connector (at least not without building custom cables). I can change any .config parameters, but it is quite difficult for me to change the BIOS or plug in a keyboard. :( Let me know if disabling USB support in the kernel would be useful. One thing I noticed even on older kernels (eg 2.6.19.1-rt15) on this system is that the assembly call that seems to be used by get_cycles: __asm__ __volatile__("rdtsc" : "=A" (x)); doesn't return values that increment at the CPU clock rate like it does on my desktop system... they increment at ~42.445764MHz on this 266MHz system. I'm not sure if this is any way related to this problem... I probably just don't understand what get_cycles really does. FYI - I think this board is a Compulab CM-i686: http://www.compulab.co.il/i686/html/i686-cm-m-datasheet.htm
The bug still exists in 2.6.21-rc2.
On Fri, 2007-02-23 at 09:03 -0800, bugme-daemon@bugzilla.kernel.org wrote: > n reply to comment #13. This is an embedded device with no monitor and no > keyboard connector (at least not without building custom cables). I can change > any .config parameters, but it is quite difficult for me to change the BIOS or > plug in a keyboard. :( Let me know if disabling USB support in the kernel > would be useful. Nope. > One thing I noticed even on older kernels (eg 2.6.19.1-rt15) on this system is > that the assembly call that seems to be used by get_cycles: > > __asm__ __volatile__("rdtsc" : "=A" (x)); > > doesn't return values that increment at the CPU clock rate like it does on my > desktop system... they increment at ~42.445764MHz on this 266MHz system. I'm > not sure if this is any way related to this problem... I probably just don't > understand what get_cycles really does. It reads the TimeStampCounter (TSC), which usually increments with the CPU frequency. There is probably only bisecting Linus git tree, which will give us an answer. Could you try that ? Thanks, tglx
>There is probably only bisecting Linus git tree, which will give us an >answer. Could you try that ? I'm not sure how to try what your asking for. I tried downloading the patch at: http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.21-rc2-git3.bz2 and applying it to 2.6.20, but it failed. Am I supposed to apply it to 2.6.20.1 or 2.6.21-rc2 or something else? Is that even the right patch?
On Sun, 2007-03-04 at 14:52 -0800, bugme-daemon@bugzilla.kernel.org wrote: > >There is probably only bisecting Linus git tree, which will give us an > >answer. Could you try that ? > > I'm not sure how to try what your asking for. I tried downloading the patch at: > > http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.21-rc2-git3.bz2 > > and applying it to 2.6.20, but it failed. Am I supposed to apply it to 2.6.20.1 > or 2.6.21-rc2 or something else? Is that even the right patch? Sorry. I should have explained that. http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt It would be great, if you could go through that. tglx
git bisect good e9e2cdb412412326c4827fc78ba27f410d837e6e is first bad commit commit e9e2cdb412412326c4827fc78ba27f410d837e6e Author: Thomas Gleixner <tglx@linutronix.de> Date: Fri Feb 16 01:28:04 2007 -0800 [PATCH] clockevents: i386 drivers Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update the timer IRQ to call into the PIT/HPET driver's event handler and the lapic-timer IRQ to call into the lapic clockevent driver. The assignement of timer functionality is delegated to the core framework code and replaces the compile and runtime evalution in do_timer_interrupt_hook() Use the clockevents broadcast support and implement the lapic_broadcast function for ACPI. No changes to existing functionality. [ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ] [ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ] Cleanups-from: Adrian Bunk <bunk@stusta.de> Build-fixes-from: Andrew Morton <akpm@osdl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> :040000 040000 8cd88ef4dc6e976c589deda1c558e19ed8dcafde 88ed93cefb145aa545ff4561636cc9342fc7bb42 M arch :040000 040000 5a1342027bc77019bb3ca9295d02bf483410e3d4 3de4aedc5bb5d86287b7a5bad4b673e8ffbc1e44 M drivers :040000 040000 88ade982261c3ce0590c939694994796f02d2e73 d7221468ac776ccc7804f9a46d6d53fa7c294d43 M include
On Mon, 2007-03-05 at 16:09 -0800, bugme-daemon@bugzilla.kernel.org wrote: > git bisect good > e9e2cdb412412326c4827fc78ba27f410d837e6e is first bad commit > commit e9e2cdb412412326c4827fc78ba27f410d837e6e > Author: Thomas Gleixner <tglx@linutronix.de> > Date: Fri Feb 16 01:28:04 2007 -0800 > > [PATCH] clockevents: i386 drivers Sigh. I have no clue, how this should influence the bogomips calibration. Is the bogomips value correct, when you boot with commit 79bf2bb335b85db25d27421c798595a2fa2a0e82 and wrong after applying the e9e2cdb412412326c4827fc78ba27f410d837e6e one ? Thanks tglx
>Is the bogomips value correct, when you boot with >commit 79bf2bb335b85db25d27421c798595a2fa2a0e82 >and wrong after applying the e9e2cdb412412326c4827fc78ba27f410d837e6e >one ? >Is the bogomips value correct, when you boot with >commit 79bf2bb335b85db25d27421c798595a2fa2a0e82 >and wrong after applying the e9e2cdb412412326c4827fc78ba27f410d837e6e >one ? I'm a git novice, so I don't know exactly how to try what you're suggesting. But I backed up each bzImage as I ran the git binary search, so I can show you what the last and second to last kernels were reporting. It looks like the reported kernel version shows the beginning of the git tag... is that enough for you to confirm which patch was bad?: # cat /proc/version Linux version 2.6.20-g79bf2bb3 (nospam@nospam.com) (gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)) #15 PREEMPT Mon Mar 5 16:02:20 PST 2007 # cat /proc/cpuinfo processor : 0 <snip> cpu MHz : 266.789 <snip> bogomips : 537.44 # cat /proc/version Linux version 2.6.20-ge9e2cdb4 (nospam@nospam.com) (gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)) #14 PREEMPT Mon Mar 5 15:44:09 PST 2007 # cat /proc/cpuinfo processor : 0 <snip> bogomips : 7326.54
2.6.21-rc4 still has the problem, but the reported bogomips is now even bigger: cat /proc/cpuinfo processor : 0 vendor_id : CyrixInstead cpu family : 5 model : 9 model name : Unknown stepping : 1 cpu MHz : 266.769 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu tsc msr cx8 cmov mmx cxmmx bogomips : 29296.35 clflush size : 32
> 2.6.21-rc4 still has the problem, but the reported bogomips is now even > bigger: > bogomips : 29296.35 Hmm. Is this reproducible ? If yes, may I ask you to do a biscetion again ? Start with good=e9e2cdb412412326c4827fc78ba27f410d837e6e and mark the latest one as bad. It would be really helpful, if we could find the change, which made this problem surface. Thanks, tglx
>> 2.6.21-rc4 still has the problem, but the reported bogomips is now even >> bigger: >> bogomips : 29296.35 >Hmm. Is this reproducible ? If yes, may I ask you to do a biscetion >again ? Argh... I screwed up when I tested 2.6.21-rc4. I changed the timer frequency from 250Hz to 1000Hz and the bogomips calibration error changed by a factor of 4. So my last comment was bogus, but maybe the fact that the calibration error scales with the timer frequency is a clue??
I confirmed that when I changed the timer frequency to 100Hz, the bogomips calibration changes by a factor of ~10 from when it was at 1000Hz: cat /proc/cpuinfo <snip> bogomips : 2930.48
>> 2.6.21-rc4 still has the problem, but the reported bogomips is now even >> bigger: >> bogomips : 29296.35 >Hmm. Is this reproducible ? If yes, may I ask you to do a biscetion >again ? Did you still want me to run that bisection or were you asking me to look for the change that cause the calibration error to get bigger (which we now understand was a .config change)?
> ------- Additional Comments From idht4n@hotmail.com 2007-03-22 09:40 ------- > >> 2.6.21-rc4 still has the problem, but the reported bogomips is now even bigger: > >> bogomips : 29296.35 > > >Hmm. Is this reproducible ? If yes, may I ask you to do a biscetion > >again ? > > Did you still want me to run that bisection or were you asking me to > look for the change that cause the calibration error to get bigger > (which we now understand was a .config change)? No. We know that it is due to HZ=100/250/1000 I found a box, where I can reproduce. I try to track this down. Thanks. tglx
Created attachment 10911 [details] pit setup fix Can you please test this patch ?
The patch seemed to fix the bogomips calibration but not the timing problems that I thought were related. Tests with usleep seem to indicate that the system time is incrementing at about 1/350 of real time. I'll do some more tests with older kernels and make sure I'm not doing something wrong.
Does gettimeofday() increment slowly w/ 2.6.18 or 2.6.19?
> Does gettimeofday() increment slowly w/ 2.6.18 or 2.6.19? I don't think so. I just confirmed the timing seemed ok in 2.6.20. I'm starting to bisect... I'll post the result later.
Could you attach a dmesg from a post-2.6.18 kernel that does not have any timing issue?
Created attachment 10920 [details] dmesg output from a kernel with no timing problem in response to commeng #32
Unless I screwed up, here's the git bisect results for the timing problem: 92c7e00254b2d0efc1e36ac3e45474ce1871b6b2 is first bad commit commit 92c7e00254b2d0efc1e36ac3e45474ce1871b6b2 Author: Thomas Gleixner <tglx@linutronix.de> Date: Fri Feb 16 01:27:33 2007 -0800 [PATCH] Simplify the registration of clocksources Enqueue clocksources in rating order to make selection of the clocksource easier. Also check the match with an user override at enqueue time. Preparatory patch for the generic clocksource verification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> :040000 040000 2554fcacae8f3778b85140c6ee08e62551a4dd05 cc95d97744405c5c4c39c47aff7504d73cfaa139 M arch :040000 040000 b8cb0e2c18a396c5d1e49f4ce7d6402e15952135 90b0289adc69bc487f27faa9ff05b34b83466051 M include :040000 040000 28703eadd46c9133b3a569b03529eadde75f0507 ab98a4ef4c813976ed26d1238166e23a39990577 M kernel
So looking at the dmesg, I can see that the TSC is being used on the troublesome kernel and the PIT is being used otherwise. It looks like w/ the newer kernel the slow TSC detection code isn't catching the slow TSC. Quick workaround: Boot w/ "clocksource=pit" to see if that fixes the time issue.
Created attachment 10929 [details] patch emailed to me by tglx - seemed to fix timing problem