Hardware: Intel S5000PAL/PSL motherboard, dual 5160 and dual E5450. PCIe plug-in card to IDT switch.
Software Environment:
Problem Description:

We noticed that some of our systems were coming up with bad system time, i.e. the CPU MHz value in /proc/cpuinfo (derived from cpu_khz) was wrong and time was not being kept correctly (losing several seconds per minute). This problem happens more often on some machines than others.

The problem is that the number of CPU cycles counted over the 30 ms calibration window is occasionally too large, so the cpu_khz calculated in tsc.c ends up too large. I suspect that an NMI is sneaking in, delaying the time at which the OUT pin is sampled high (inb_p(0x61) & 0x20). The 2.6.25 code is the same as 2.6.18 for the sampling of the OUT pin.

Test results:

kernel           result
============     =================
2.6.18.solace    Fails approx 1 in 3 reboots
2.6.18.8         Failed after 16 reboots
2.6.25           2 fails in 244 reboots

The interesting thing is that 2.6.25 had an extremely rare "cpu looks slow" event. Normally the CPU looks too fast, because cycles keep being counted while the CPU is distracted by an NMI. The too-slow case can only happen if the NMI arrives at a very specific time: between programming the timer and reading the 'start' cycle count. So in this case we know exactly how long the NMI was: 5.84 ms.

We plan to fix this by repeating the measurement some number of times and selecting the best results.

bogomips are also independently calculated incorrectly. This is a separate problem that I believe is of little consequence.

Steps to reproduce: reboot and look at /proc/cpuinfo for an incorrect CPU MHz value.
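To make the arithmetic and the proposed "repeat and select" idea concrete, here is a minimal userspace sketch. It is not the kernel code and not the fix that was eventually merged. The helper names (cycles_to_khz, pick_cpu_khz) are hypothetical, and taking the median is only one possible reading of "selecting the best results". Assuming a nominal 3.0 GHz clock, a clean 30 ms window counts 90,000,000 cycles (3,000,000 kHz). If 5.84 ms of the window elapse before the start count is read, only 24.16 ms are counted, i.e. 72,480,000 cycles, which the same division reports as 2,416,000 kHz.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CALIBRATE_MS 30u  /* calibration window length from the report */

/* cpu_khz is the cycle count over the window divided by the window in ms. */
static uint32_t cycles_to_khz(uint64_t cycles)
{
    return (uint32_t)(cycles / CALIBRATE_MS);
}

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical selection step: sort the samples in place and return the
 * median, so that a minority of too-fast or too-slow readings cannot
 * decide the result.
 */
static uint32_t pick_cpu_khz(uint32_t *khz, size_t n)
{
    assert(n > 0);
    qsort(khz, n, sizeof khz[0], cmp_u32);
    return khz[n / 2];
}
```

With five samples of which one is inflated and one is the too-slow case, the median still lands on the clean value. Taking the minimum instead would handle the common too-fast case but would pick exactly the wrong sample in the too-slow case.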
Reassigning to platform/x86_64. ISTR this was reported and discussed a couple of months ago, but I don't recall any conclusion. Could it be an SMI?
Is there any update on this bug? I'm seeing it as well.
Sorry, my bad. It's only an incorrect KDE setting for me.
(In reply to comment #1)
> Reassigning to platform/x86_64.
>
> ISTR this was reported and discussed a couple of months
> ago, but I don't recall any conclusion.
>
> Could it be an SMI?

Yes, we have seen this in apic timer calibration as well. What I wonder is why this happens on 64bit. The 64 bit code has a detection for that already when TSC/cpu_khz is calibrated.
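One common way to detect this kind of disturbance is to bracket the reference-counter read with two cycle-counter reads and retry when the two are too far apart. The sketch below illustrates that general technique in userspace with a simulated clock. It is not the kernel's 64-bit code, and every name in it (read_ref_checked, fake_clock, GAP_THRESHOLD, MAX_RETRIES) and the threshold value are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RETRIES   5
#define GAP_THRESHOLD 50000u  /* cycles; illustrative value only */

typedef uint64_t (*read_fn)(void *ctx);

/*
 * Read the reference counter between two cycle-counter reads. If the two
 * cycle reads are far apart, something (e.g. an SMI or NMI) interrupted
 * the sequence, so the pair is discarded and the read is retried.
 */
static bool read_ref_checked(read_fn read_cycles, read_fn read_ref, void *ctx,
                             uint64_t *cycles_out, uint64_t *ref_out)
{
    for (int i = 0; i < MAX_RETRIES; i++) {
        uint64_t t1 = read_cycles(ctx);
        uint64_t ref = read_ref(ctx);
        uint64_t t2 = read_cycles(ctx);

        if (t2 - t1 < GAP_THRESHOLD) {
            *cycles_out = t2;
            *ref_out = ref;
            return true;
        }
    }
    return false;  /* every attempt was disturbed */
}

/* Simulated hardware, present only so the sketch can be exercised. */
struct fake_clock {
    uint64_t now;           /* simulated cycle counter */
    int stalls_left;        /* number of reference reads that get stalled */
    uint64_t stall_cycles;  /* length of each simulated stall */
};

static uint64_t fake_cycles(void *ctx)
{
    struct fake_clock *c = ctx;
    c->now += 100;          /* each read costs a few cycles */
    return c->now;
}

static uint64_t fake_ref(void *ctx)
{
    struct fake_clock *c = ctx;
    if (c->stalls_left > 0) {
        c->stalls_left--;
        c->now += c->stall_cycles;  /* simulated SMI/NMI stall */
    }
    return c->now / 3000;   /* arbitrary cycles-per-reference-tick ratio */
}
```

A calibration routine built this way would take one checked pair at the start of the window and one at the end, and fall back to another method or repeat the whole measurement if either read reports failure.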
The code in arch/i386/kernel/tsc.c has no checks for failures - there is a one-time reading of a start/end pair. In init/calibrate.c there is an attempt to catch bad readings and not include them in an average - but that is for bogomips.
> ------- Comment #5 from charles.mitchell@solacesystems.com 2008-05-07 12:07 -------
> The code in arch/i386/kernel/tsc.c has no checks for failures - there is a
> one-time reading of a start/end pair.
>
> In init/calibrate.c there is an attempt to catch bad readings and not include
> them in an average - but that is for bogomips.

Yeah, I know. I planned to unify tsc_32/64, but I have not gotten around to it yet. Maybe this bug report is a good enough priority boost to go for it.

Thanks,
tglx
Fixed in mainline: fbb16e243887332dd5754e48ffe5b963378f3cd2