Bug 14280

Summary: slow clock on Amilo Pro 2030 with NO_HZ=y
Product: Platform Specific/Hardware Reporter: Bernhard M. Wiedemann (kernelbmw2017)
Component: i386Assignee: platform_i386
Status: RESOLVED DOCUMENTED    
Severity: low CC: rjw, tglx
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.31 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg with NO_HZ active
Debug patch against 2.6.31

Description Bernhard M. Wiedemann 2009-09-30 18:28:07 UTC
On my Amilo Pro 2030 I run a vanilla 2.6.31 kernel (the one from ftp://ftp.suse.com/pub/projects/kernel/kotd/master/i586/)  with NO_HZ=y
On that setup, my system clock is slow by approx 1-2 seconds per minute.

So far, I have tried booting with nolapic_timer option and with

/etc/modprobe.d/40test.conf
options processor max_cstate=1

but the only one that helped was nohz=off

FYI: I'm relatively new to kernel-bug-reporting, so I might have put this into the wrong category.


====

# lspci
00:00.0 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. PT890 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237/VX700 PCI Bridge
00:06.0 Network controller: Broadcom Corporation BCM4318 [AirForce One 54g] 802.11g Wireless LAN Controller (rev 02)
00:0c.0 CardBus bridge: Texas Instruments PCIxx21/x515 Cardbus Controller
00:0c.2 FireWire (IEEE 1394): Texas Instruments OHCI Compliant IEEE 1394 Host Controller
00:0f.0 IDE interface: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
01:00.0 VGA compatible controller: VIA Technologies, Inc. CN700/P4M800 Pro/P4M800 CE/VN800 [S3 UniChrome Pro] (rev 01)

more testing/information available on request
Comment 1 john stultz 2009-09-30 18:43:54 UTC
What was most recent kernel version that worked properly on your hardware?

Also could you provide the output from:
  cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
  cat /sys/devices/system/clocksource/clocksource0/available_clocksource 


thanks
-john
Comment 2 Bernhard M. Wiedemann 2009-09-30 19:16:41 UTC
> cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm tsc
> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm

I ran a two-year-old openSUSE-10.3 with a 2.6.22 kernel before and never had clock issues, but this might well be, because NO_HZ was not in it.

I could probably boot up the old system with kernels from in-between (new one is on ext4 so harder there) or I could use live systems to test.


oh, I just found that this seems to amend the problem:
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource

is there a boot-parameter for that? or some way the kernel will automatically figure out to use that?
Comment 3 john stultz 2009-09-30 19:29:34 UTC
Huh. Can you provide the full dmesg?

It *sounds* like NO_HZ  is somehow causing the acpi_pm to slow down or halt occasionally. I'm not sure how that would be connected to NO_HZ.

When you boot with nohz=off, do you see the same acpi_pm return for current_clocksource?
Comment 4 Bernhard M. Wiedemann 2009-09-30 19:42:30 UTC
it has same clocksource:
cat /proc/cmdline /sys/devices/system/clocksource/clocksource0/current_clocksource
root=/dev/sda3 resume=/dev/sda1 splash=silent quiet vga=0x317 nohz=off
acpi_pm


maybe the system could miss timer ticks with NO_HZ?
I think the chipset does not support cpufreq, so that should not be an issue.
Comment 5 Bernhard M. Wiedemann 2009-09-30 19:44:33 UTC
Created attachment 23212 [details]
dmesg with NO_HZ active
Comment 6 john stultz 2009-09-30 19:51:15 UTC
Well, one of the expected side effects of NO_HZ is to miss timer ticks, however the timekeeping code is fine with that since it doesn't use ticks to account for time, instead we accumulate time directly from the clocksource, so as long as the clocksource doesn't wrap, it shouldn't be an issue.

I guess it could be that you're going tickless for more then 2.5 seconds and the ACPI PM is wrapping and you lose a bit of time  that way. Seems a bit unlikely. 

If you do something like the following:

$ while true; do echo "" > /dev/null;  done

Do you see the time drift behind?

Also, what are you using to measure the time drift? ntpdate?

thanks
-john
Comment 7 Bernhard M. Wiedemann 2009-10-01 04:06:26 UTC
I'm using the current equivalent of ntpdate:
> sntp -v -P no -r pool.ntp.orgs
sntp options: a=2 v=1 e=0.100 E=5.000 P=2147483647.000                 
    d=15 c=5 x=0 op=1 l=/etc/sntp.pid f= pool.ntp.org                  
sntp: time changed by 2.901 secs to 2009 Oct 01 05:46:30.323 +/- 0.012+0.041


The very interesting part is that with running your busy-loop line as well as my favourite perl -e 'while(1){}'   (which does not go to kernel) there is no more clock-drift. But as soon as I stop it, drift comes back.
Should help to narrow it down.

I also think, if some counter was wrapping, I would be seeing time jump in distinct 2.5sec-steps instead of slewing nearly continuously over time.

tests with
while sleep 10 ; do sntp -v -P no -r $server ; done
show some variation between 0.32 and 0.4 with at most 0.025sec steps.
Comment 8 john stultz 2009-10-06 20:13:50 UTC
Created attachment 23290 [details]
Debug patch against 2.6.31

Could you test this patch to see if it changes behavior? I'm trying to determine if there's a bug in the NOHZ code that is causing the clock adjustment code to go crazy, or if the clocksource is actually slowing down when we go into oneshot mode.
Comment 9 Bernhard M. Wiedemann 2009-10-08 04:42:35 UTC
I built a 2.6.31 with the patch and drift is still the same.
4% when idle and 0% when busy.
Comment 10 john stultz 2009-11-02 22:13:04 UTC
Bernhard: Sorry for the late response, I didn't see your last data point.

So it seems your hardware is actually speeding up and slowing down. I suspect its some sort of chipset bug.

You might try seeing if there are any BIOS power-management options that could be affecting this. Something like "Spread-spectrum timing" maybe?

For now I guess nohz=off will be required for your box. Thomas, would you be opposed to some sort of pci quirk to force this on?
Comment 11 Thomas Gleixner 2009-11-03 11:54:55 UTC
> So it seems your hardware is actually speeding up and slowing down. I suspect
> its some sort of chipset bug.
> 
> You might try seeing if there are any BIOS power-management options that
> could
> be affecting this. Something like "Spread-spectrum timing" maybe?
> 
> For now I guess nohz=off will be required for your box. Thomas, would you be
> opposed to some sort of pci quirk to force this on?

Not convinced yet :)

Bernhard, can you please let the machine idle and just run:

# while true; do sleep 1; done

Does that still expose the problem ?

Thanks,

	tglx
Comment 12 Bernhard M. Wiedemann 2009-11-03 17:43:34 UTC
I'm back on opensuse-11.2-RC2's 2.6.31.5-0.1-pae kernel now, for other testing reasons.
I went init 2 and killed even cron and cupsd and did
while sleep 1; do true; done which is pretty much the same.

bug was still there. acpi_pm clock is 4% slow.
I also saw that TSC timer is ~1% fast, too.

This machine comes with phoenix BIOS which has an amazingly user-friendly settings dialog with 8 (eight) settings, none of which comes close to "spread spectrum" (closest is NX bit enable)

I'll go for nohz=off for the meanwhile.

If the above debug-patch is important for this test, please tell, so that I can retry.
Comment 13 Rafael J. Wysocki 2010-01-04 22:55:32 UTC
What's the status here?

Have you tried any newer kernels since then?  Is the problem still present in there?
Comment 14 Bernhard M. Wiedemann 2010-01-07 18:17:52 UTC
tried today http://ftp.suse.com/pub/projects/kernel/kotd/master/i586/kernel-default-2.6.32.2-0.0.25.2acc4d7.i586.rpm

It still shows same behaviour. Only acpi_pm clocksource available there and it is slow with NOHZ, but runs fine with nohz=off.
Comment 15 Bernhard M. Wiedemann 2010-12-05 19:05:33 UTC
update: installed 2.6.37-rc3-git6 on this machine and the bug is still there. Luckily nohz=off still helps.
Comment 16 Rafael J. Wysocki 2011-02-18 21:02:33 UTC
acpi_pm is not really a good clocksource.

Have you tried to add tsc=reliable to the kernel command line?
Comment 17 Rafael J. Wysocki 2011-02-18 21:05:24 UTC
Also you may need to add hpet=force to the kernel command line.  Can you
please test that too?
Comment 18 Bernhard M. Wiedemann 2011-02-19 08:22:14 UTC
So I booted 2.6.37-22-default without nohz=off but with tsc=reliable hpet=force
and my viafb framebuffer cursor still would not blink, unless I produced keyboard interrupts.


/sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm
/sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

I did echo hpet > current_clocksource
and dmesg had "Switching to clocksource hpet"


/proc/interrupts showed as few as 20 "Local timer interrupts" in 3 minutes.
with i8042 being the only other interrupt


echo tsc > current_clocksource
caused dmesg to have
"Override clocksource tsc is not HRT compatible. Cannot switch while in HRT/NOHZ mode
Switching to clocksource tsc"

It is hard to tell much about timing accuracy, because network does not work without interrupts. Comparing `date` between machines indicates, that it is not drifting too much with either tsc or hpet.
Comment 19 Rafael J. Wysocki 2011-02-25 22:57:41 UTC
Have you tried hpet=force without tsc=reliable ?
Comment 20 Bernhard M. Wiedemann 2011-03-06 07:46:41 UTC
both hpet=force alone
and tsc=reliable alone
help to get reliable timekeeping.

echo acpi_pm > current_clocksource makes it return to the buggy behaviour, so this indicates that acpi_pm is bad on this machine with NOHZ.

The other side effects described in comment 18 are probably from viafb.
Comment 21 Rafael J. Wysocki 2011-03-12 21:38:08 UTC
Yes, that most likely is the case.

Please always run with hpet=force alone as a workaround, we can't do much
about fixing your machine.