Bug 10446

Summary: Clock stopped (time/date freeze or looping over 5 seconds)
Product: Timers Reporter: Roland Kletzing (devzero)
Component: OtherAssignee: john stultz (john.stultz)
Status: REJECTED INVALID    
Severity: normal CC: akataria, devzero, dhecht, mingo, rjw, tglx
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: up to 2.6.26-rc1 Subsystem:
Regression: --- Bisected commit-id:
Attachments: requested data
vmware issue

Description Roland Kletzing 2008-04-12 08:03:27 UTC
Latest working kernel version: n/a

Earliest failing kernel version: 2.6.16 or maybe even 2.6.5

Distribution: n/a

Hardware Environment: real hardware & especially virtual machine

Software Environment: n/a

Problem Description:
on certain circumstances, the internal linux clock/timer goes nuts and time either freezing completely or running in a loop, i.e. jumps back in time by 5 seconds. vmstat, watch, top and any timer triggered utility dont work anymore.

Steps to reproduce:
get vmware workstation 6.5 beta, install on windoze, install a linux VM (opensuse) and get it up and running. hibernate/suspend to disk the windows machine then. wait some time, and then resume the host machine. clock inside linux guest shows those sympthoms then (have seen both freeze and jumpbacks), maybe because vmware wants catch up time with the host system.linux vm recovers from this problem after some time - so not sure if it is the same problem as on real hardware, but at least its the same sympthom.
this may be also be reproduced with earlier vmware versions, maybe on linux too - maybe even with different virtualization product, but didnt try.

discussion at:
http://marc.info/?l=linux-kernel&m=120734451401191&w=2
http://lkml.org/lkml/2008/3/14/178

adding it to bugzilla, so this wont get lost in lkml deepness
Comment 1 john stultz 2008-04-16 15:32:47 UTC
~5second loop looks like nanosecond overflows. This likely points to update_wall_time not getting called, which is probably due to the timer interrupt not firing or being delivered.

From email thread:
On Thu, 2008-04-17 at 00:12 +0200, devzero@web.de wrote:
> >The looping ~5seconds sounds like nanosecond overflows. Might it be
> >that vmware isn't delivering timer interrupts? 
> 
> seems so - at least i don`t see LOC (local timer interrupts) increase in
> /proc/interrupts - the only value increasing there is
> 
> 1:  <value> IO-APIC-EDGE  i8042
> 
> >Could this be an interaction w/ hrt or no_hz?
> don`t know - not sure what hrt or no_hz is all about in detail, so please 
> you may tell me what i can check - i will happily deliver all input you 
> need or do any test for you.

I'd try disabling CONFIG_NO_HZ in your .config, and if that doesn't help, try disabling CONFIG_HIGH_RES_TIMERS. 

These are newish features, and since the report goes back to 2.6.16 or earlier, it may not help, but lets make sure they're not involved in the issue first. 

If they do not change the behavior, try booting w/ noapic to see if it effects the issue.

As an aside, for real (non-virtualized) systems, there was a very similar issue years ago on a few Pentium 4/HT IBM Thinkcentre desktop systems. In that case, there was a BIOS SMI that was firing and mis-programming the apic, leaving it to fire at something like HZ=0.5. A BIOS update resolve the issue. You might want to check that the latest BIOS is installed.
Comment 2 Roland Kletzing 2008-04-18 13:47:13 UTC
ok - i can boot with nohz=off, which should be equivalent to unsetting CONFIG_NO_HZ, right ?

what about the second config option - doesn`t that have a commandline param?
maybe hpet=disable ?

i`m just asking because i don`t want to go trough a complete kernel compile - inside that VM on windows host it`s something which runs for quite a while.....

will try nohz=off now
Comment 3 Thomas Gleixner 2008-04-18 15:15:28 UTC
> ------- Comment #2 from devzero@web.de  2008-04-18 13:47 -------
> ok - i can boot with nohz=off, which should be equivalent to unsetting
> CONFIG_NO_HZ, right ?
> 
> what about the second config option - doesn`t that have a commandline param?
> maybe hpet=disable ?

yep hpet=disable is about correct.
 
Thanks,
	tglx
Comment 4 Roland Kletzing 2008-04-18 16:57:49 UTC
i tried nohz=off and it seems that this fixed the issue.
i cannot tell for 100% sure yet, but it looks different with nohz=off
Comment 5 Thomas Gleixner 2008-04-19 02:08:44 UTC
> i tried nohz=off and it seems that this fixed the issue.
> i cannot tell for 100% sure yet, but it looks different with nohz=off

Does hpet=disable help as well ?

Thanks,
	tglx
Comment 6 Roland Kletzing 2008-04-20 13:36:47 UTC
no - it doesn`t seem to help. nohz=off helps for sure, i tried several hibernate/resume and after resume, time was always ok at once. with hpet or with hpet=disable, the linux vm needs an amount of time for recover from the "freezed" or the "jumpy" state - the time needed seems to depend from the duration of hibernation - the larger the duration of hibernation, the longer it needs for recover.
Comment 7 Roland Kletzing 2008-04-28 11:44:46 UTC
any further advice?
Comment 8 Thomas Gleixner 2008-04-28 13:54:22 UTC
> any further advice?

Can you please provide the output of /proc/timer_list and of
/sys/devices/system/clocksource/clocksource0/current_clocksource
for both the nohz=off and the nohz=on case ?

Is there any change to the devices listed in those files after a
resume ?

Thanks,
	tglx
Comment 9 Roland Kletzing 2008-04-30 16:28:42 UTC
Created attachment 15991 [details]
requested data

here it is.

no, it looks that the devices don`t change
Comment 10 john stultz 2008-04-30 16:49:48 UTC
So one difference from the data:
So in the nohz=off, the TSC is used as the clocksource, while w/ nohz=on, its the acpi_pm.  I wonder if the vmware emulation isn't incrementing the emulated acpi_pm counter properly after a resume? 

I considered maybe the acpi_pm counter isn't emulated, and so if the host suspends and resumes and the acpi_pm counter is reset, the guest linux image won't get the suspend/resume message, so it would seem as if the counter went back in time. However, the acpi_pm can only count for ~5 seconds, so any reset would only cause a short term glitch for those 5 seconds and then continue on without issue.

Roland: Does booting w/ "nohz=off clocksource=acpi_pm" also illustrate the issue?
Comment 11 Roland Kletzing 2008-05-01 03:22:22 UTC
i can see no issues after resume with nohz=off and clocksource=acpi_pm !
Comment 12 Roland Kletzing 2008-05-07 12:45:29 UTC
still curious if this is a linux problem or a vmware problem - or a linux problem being triggered more easily in vmware....
Comment 13 Roland Kletzing 2008-05-10 05:26:21 UTC
time freeze happened today after upgrading to 2.6.26-rc1 (suse kotd, head) - but it happened during some kernel module load/unload testing, i.e. i did NOT supsend/resume my host before. 

i assume the freeze is related to detecting tsc to be unstable, at least i see "Clocksource tsc unstable" and it seems the message appeared shortly before the freeze. 
Comment 14 Roland Kletzing 2008-05-10 05:44:45 UTC
i can always reproduce - freeze.
will check vanilla, too.

side note:
as i have seen more than one posting, that the automatic detection of unstable clocksource and the automatic change at runtime may hang a system, wouldn`t it be helpful, to add additional messages for this until it works absolutely reliable ?


we just see:

kernel: Clocksource tsc unstable (delta = ....)  


this doesn`t really attract attention.
most users don`t even know what`s it all about...

what about:

kernel: WARNING! Clocksource tsc unstable (delta = ....)
kernel: You may want to add "clocksource=...." to the kernel commandline to force useage of a specific clocksource.
Comment 15 Roland Kletzing 2008-05-10 05:59:28 UTC
i see that current_clocksource is acpi_pm after that message - but doesn`t change from tsc to acpi_pm print a kernel message that a new clocksource being installed? 
i don`t find such message...
Comment 16 Roland Kletzing 2008-05-10 06:23:58 UTC
now some real weird - looks like some obscure bug to me.

i booted with clocksource=acpi_pm

did my module testing and after i while i got:

Clocksource tsc unstable (delta = -1782476009 ns)

and time froze again.

so, if i have acpi_pm as current_clocksource,  why can detection of unstable tsc freeze my clock then ?

i made sure that acpi_pm actually was actually used by polling current_clocksource in a 1-second loop - and for the whole time i got acpi_pm.

so if kernel detects tsc as unstable when acpi_pm is installed - what does that matter and how can time freeze then ?
Comment 17 Thomas Gleixner 2008-05-10 07:39:06 UTC
> ------- Comment #16 from devzero@web.de  2008-05-10 06:23 -------
> now some real weird - looks like some obscure bug to me.
> 
> i booted with clocksource=acpi_pm
> 
> did my module testing and after i while i got:
> 
> Clocksource tsc unstable (delta = -1782476009 ns)
> 
> and time froze again.
> 
> so, if i have acpi_pm as current_clocksource,  why can detection of unstable
> tsc freeze my clock then ?

It's not the detection of the unstable TSC. There is something else
going wrong. What does your module do ?

Thanks,
	tglx
Comment 18 Roland Kletzing 2008-05-10 07:43:41 UTC
whoops - sorry for that noise.

please ignore my last 4 comments - that issue isn`t related to the original reported one but was a side-effect of my module testing.

if someone can delete these, please be so kind.

i found that module l440gx is the original cause for the reported time freeze. 

it didn`t happen  with kernels <2.6.26rc1 - that made me think that the recent changes made the reported freeze-bug show up differently - but i think it`s an completely different issue and it was recent changes in 2.6.26rc1 wich made the _l440gx_ issue appear differently. 

will file a different bugzilla entry for the l440gx module, as i`m 100% sure now that l440gx _has_ issues. 
Comment 19 Roland Kletzing 2008-05-10 07:48:28 UTC
Created attachment 16094 [details]
vmware issue

anyway - while struggling with the freeze-issue caused by l440gx, i catched this one (see screenshot) exactly once - just after the time-freeze and a soft reset afterwards. this was with "clocksource=acpi_pm notsc" bootparm. unfortunately i cannot reproduce.
Comment 20 Thomas Gleixner 2008-05-10 07:54:21 UTC
> it didn`t happen  with kernels <2.6.26rc1 - that made me think that the
> recent
> changes made the reported freeze-bug show up differently - but i think it`s
> an
> completely different issue and it was recent changes in 2.6.26rc1 wich made
> the
> _l440gx_ issue appear differently. 
> 
> will file a different bugzilla entry for the l440gx module, as i`m 100% sure
> now that l440gx _has_ issues. 

Hmm, that's really weird.
Comment 21 Thomas Gleixner 2008-05-10 07:57:19 UTC
> anyway - while struggling with the freeze-issue caused by l440gx, i catched
> this one (see screenshot) exactly once - just after the time-freeze and a
> soft
> reset afterwards. this was with "clocksource=acpi_pm notsc" bootparm.
> unfortunately i cannot reproduce.

Yeah, that looks like the CPU/chipset emulation of VMware was put into
a weird state.
Comment 22 Roland Kletzing 2008-05-10 11:14:07 UTC
i could test 2.6.19.7 today and i don`t see the issue there, so it`s most likely a regression.
Comment 23 Thomas Gleixner 2008-05-10 11:56:48 UTC
> ------- Comment #22 from devzero@web.de  2008-05-10 11:14 -------
> i could test 2.6.19.7 today and i don`t see the issue there, so it`s most
> likely a regression.

Well, if the kernel now uses a hardware feature which was not used in
2.6.19 then it's hard to blame the kernel for something which does not
work in vmware.
Comment 24 Roland Kletzing 2008-05-11 02:31:18 UTC
ok 2.6.19.7 doesn`t have dynticks - it appeared in 2.6.21 and that kernel shows the problem:

clocksource=tsc nohz=off -> ok
clocksource=tsc nohz=on  -> not ok
clocksource=acpi_pm nohz=off -> ok
clocksource=acpi_pm nohz=on -> not ok

zcat /proc/config.gz |grep HIGH_RES
# CONFIG_HIGH_RES_TIMERS is not set

so it seems dynticks has issues with vmware
Comment 25 Thomas Gleixner 2008-05-11 06:49:07 UTC
> ------- Comment #24 from devzero@web.de  2008-05-11 02:31 -------
> ok 2.6.19.7 doesn`t have dynticks - it appeared in 2.6.21 and that kernel
> shows
> the problem:

And it shows it also with anything >= 2.6.21, right ?

> clocksource=tsc nohz=off -> ok
> clocksource=tsc nohz=on  -> not ok
> clocksource=acpi_pm nohz=off -> ok
> clocksource=acpi_pm nohz=on -> not ok
> 
> zcat /proc/config.gz |grep HIGH_RES
> # CONFIG_HIGH_RES_TIMERS is not set
> 
> so it seems dynticks has issues with vmware

or vmware has issues with dynticks :)

Seriously, this looks like a problem with the one shot mode. Can you
please check /proc/interrupts when the system is in that state,
whether there is any increment on the timer related interrupts ?

Thanks,
	tglx
Comment 26 Roland Kletzing 2008-05-11 08:31:42 UTC
>And it shows it also with anything >= 2.6.21, right ?

correct

>or vmware has issues with dynticks :)

indeed :)
btw, i just found this posting in vmware community: http://communities.vmware.com/message/851454


>Can you please check /proc/interrupts when the system is in that state,
>whether there is any increment on the timer related interrupts ?

there is absolutely no change in that file during the freeze-state.
later on it recovers automagically - time for recover seems proportional to the time spent in suspend state.
Comment 27 Roland Kletzing 2008-05-15 11:35:55 UTC
issue still exists with latest VMware Workstation - e.x.p build-91182
Comment 28 Dan Hecht 2008-05-15 12:10:15 UTC
hi Roland, which Hal dll is your Windows host using?
Comment 29 Roland Kletzing 2008-05-15 12:20:50 UTC
on my system, hal.dll is 5.1.2600.2180 (xpsp_sp2_rtm.040803-2158), internal name halacpi.dll

under control panel ->computer->acpi-pc

i see the following information:

driver files:
C:\WINDOWS\system32\hal.dll  5.1.2600.2180 (xpsp_sp2_rtm.040803-2158)
C:\WINDOWS\system32\ntkrnlpa.exe 5.1.2600.3093 (xpsp_sp2_qfe.070227-2300)
C:\WINDOWS\system32\ntoskrnl.exe 5.1.2600.3093 (xpsp_sp2_qfe.070227-2300)
Comment 30 Roland Kletzing 2008-05-15 12:26:03 UTC
being curious why i have different versions, i checked the other system (which also has the issue)

those 3 files are the same version there:  5.1.2600.3311 (xpsp.080212-0005)
Comment 31 Roland Kletzing 2008-05-16 00:42:02 UTC
http://bugzilla.kernel.org/show_bug.cgi?id=9314
similar issue!?
Comment 32 Dan Hecht 2008-05-20 16:10:30 UTC
Hi Roland,

We've been able to repro this issue.  The problem occurs because the value of the Windows routine QueryPerformanceCounter() is restored to reflect the amount of wallclock time that has passed while the machine was hibernated, rather than picking up where it left off prior to the hibernation.

On Windows hosts, when the physical TSCs are not well behaved, VMware Workstation will use QueryPerformanceCounter() to track the passage of time.  The large jump of QPC() that occurs while the machine is hibernated (and therefore VMware, as an application on the Windows host, is not executing) caused a problem with the VMware timer emulation code.  As expected, the problem is particular to emulating one-shot timers and would cause a one-shot timer to be delayed.

I expect the fix will make the final Workstation 6.5 release.  Thanks for reporting this problem.  As you know, until this is fixed, the workaround is to do a VMware suspend before hibernating the host.

Since this is not a Linux kernel bug, let's go ahead and close this kernel.org bug (I don't see a way for me to do that myself).

Thanks,
Dan
Comment 33 Roland Kletzing 2008-05-20 23:58:41 UTC
Hello Dan, 

thanks very much for the fantastic explanation. I`m glad that the problem has been identified and will be fixed.

regards
roland