Bug 2544

Summary: Time seemingly stops on IBM x305
Product: Timers
Reporter: john stultz (john.stultz)
Component: gettimeofday
Assignee: john stultz (john.stultz)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: gone, lcm, niclas.gustafsson, s.prasad, shishz, spamalltheway
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.5
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Proposed fix by Maciej W. Rozycki

Description john stultz 2004-04-19 09:32:11 UTC
As reported on lkml by niclas.gustafsson@codesense.com 
 
Hardware Environment: IBM x305 
Software Environment: Linux 2.6.5 kernel 
Problem Description: 
After running for a while (~12-24 hours) time seemingly stops/slows to a 
crawl, jiffies is incremented very slowly and the following is output into 
dmesg. 
	Losing too many ticks! 
	TSC cannot be used as a timesource. <4> Possible reasons for this are: 
	   You're running with Speedstep, 
	   You don't have DMA enabled for your hard disk (see hdparm), 
	   Incorrect TSC synchronization on an SMP system (see dmesg). 
 
Steps to reproduce: 
1. Boot 2.6.5 on IBM x305 
2. Watch the time of day
3. Wait 24 hours
Comment 1 john stultz 2004-04-19 09:36:23 UTC
I've reproduced this issue in our labs. It seems that after 12-24 hours, the 
PIT suddenly stops sending interrupts at HZ frequency, and instead drops to 
something like once every 2 seconds. The TSC time source code sees this drop, 
assumes it's running on a CPU whose frequency is being changed by cpufreq, and 
falls back to the PIT as its time source. 
 
In my tests, once we fall back to the PIT, the PIT returns to sending 
interrupts at HZ frequency after some period of time. So the issue looks 
sporadic. It is possibly related to BIOS SMM/SMI activity. 
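The fallback described in comment 1 can be illustrated with a small, self-contained C simulation. This is a minimal sketch under stated assumptions, not the kernel's timer_tsc.c code: the struct, the functions monitor_init() and monitor_tick(), and the LOST_TICK_LIMIT threshold of 100 are all hypothetical, and the caller supplies the TSC readings instead of reading hardware. It shows why time crawls once the TSC is abandoned. While the TSC is trusted, the gap between interrupts is used to credit the ticks whose interrupts never arrived. After the fallback, each interrupt advances jiffies by exactly one, so a PIT firing once every 2 seconds advances time by only 1/(2*HZ) of real time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical limit: consecutive late interrupts tolerated before
 * the TSC is abandoned as a time source. */
#define LOST_TICK_LIMIT 100

struct tick_monitor {
    uint64_t cycles_per_tick; /* TSC cycles per 1/HZ tick (assumed pre-calibrated) */
    uint64_t last_tsc;        /* TSC reading at the previous timer interrupt */
    uint64_t jiffies;         /* tick count maintained by this sketch */
    unsigned lost_count;      /* consecutive interrupts that arrived late */
    bool tsc_usable;          /* false once we have fallen back to the PIT */
};

static void monitor_init(struct tick_monitor *m, uint64_t cycles_per_tick,
                         uint64_t tsc_now)
{
    assert(cycles_per_tick > 0);
    m->cycles_per_tick = cycles_per_tick;
    m->last_tsc = tsc_now;
    m->jiffies = 0;
    m->lost_count = 0;
    m->tsc_usable = true;
}

/* Simulated timer interrupt handler; tsc_now is the current TSC reading. */
static void monitor_tick(struct tick_monitor *m, uint64_t tsc_now)
{
    /* Whole ticks that elapsed since the previous interrupt, per the TSC. */
    uint64_t elapsed = (tsc_now - m->last_tsc) / m->cycles_per_tick;
    m->last_tsc = tsc_now;

    m->jiffies += 1; /* every interrupt accounts for one tick */

    if (!m->tsc_usable)
        return; /* PIT-only mode: a late interrupt still counts as one tick */

    if (elapsed >= 2) {
        /* Interrupts went missing: credit the ticks we never saw. */
        m->jiffies += elapsed - 1;
        if (++m->lost_count > LOST_TICK_LIMIT) {
            printf("Losing too many ticks!\n");
            m->tsc_usable = false; /* fall back to counting PIT interrupts */
        }
    } else {
        m->lost_count = 0; /* an on-time interrupt resets the streak */
    }
}
```

For example, with an unrealistically small calibration of 1000 cycles per tick (chosen only for round numbers), 50 on-time interrupts followed by interrupts spaced 2000 ticks apart keep the TSC in use for 100 late interrupts. On the 101st, the sketch prints "Losing too many ticks!", and from then on jiffies advances by only one per interrupt.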
Comment 2 john stultz 2004-04-19 09:38:50 UTC
Created attachment 2626
Proposed fix by Maciej W. Rozycki

Email message and patch sent to lkml by Maciej W. Rozycki with a proposed fix
(currently under testing).
Comment 3 john stultz 2004-04-20 14:02:15 UTC
Maciej's fix appears to resolve the issue (according to the original bug 
reporter, and also verified in our lab). I'm following up with the IBM hardware 
folks to see if we can get a BIOS fix for the issue as well. Maciej's workaround 
still needs to be propagated into mainline and the distros to handle existing 
systems with this issue. 
Comment 4 john stultz 2004-06-28 16:54:59 UTC
*** Bug 2964 has been marked as a duplicate of this bug. ***
Comment 5 Fabien Chevalier 2004-07-22 11:32:01 UTC
Hi,

We have one system (a firewall) affected by this issue (x305, Linux 2.6.3). It
makes our system hang, or makes the iptables hash tables fill up so that the
firewall doesn't accept any new network connections. Quite funny for a firewall
to have "DROP ALL" as its only policy, no? ;-)

Do you have any news about the x305 firmware fix?
The latest release seems to be 1.65, released in February 2004, and it doesn't
contain the fix.

Regards,

Fabien
Comment 6 Andy Duplain 2004-08-26 06:20:32 UTC
I have this problem on a Netvista 8317 with the latest BIOS (version 
28KT26A). I have tried kernel versions 2.6.8 and 2.6.9-rc1 with the tsc and 
pmtmr time sources. When running 2.6.8 with tsc, I received the "Losing too 
many ticks" kernel message; however, I have not yet seen it under 2.6.9-rc1 
with pmtmr, even though time has stopped.
Comment 7 john stultz 2004-08-26 10:28:46 UTC
Andy: Does Maciej's fix solve the problem for you? (see comment #2 for the 
attachment) 
 
 
Comment 8 Paul Larson 2004-08-26 13:56:27 UTC
I just ran into a similar problem on an x305 too. It hangs partway through the
init scripts on boot. If you leave it sitting for about a day, it eventually
comes back with the "Losing too many ticks!" message. I tried the patch
(against 2.6.9-rc1), but it did not fix anything for me.
Comment 9 john stultz 2004-08-26 14:02:27 UTC
Paul: Does this happen every boot? Or just occasionally? 
Comment 10 Andy Duplain 2004-08-31 01:43:47 UTC
John: Yes, Maciej's fix does solve the problem. I applied it to 2.6.9-rc1 and 
it has now been running for 4 days without a problem.
Comment 11 john stultz 2005-01-19 10:17:47 UTC
*** Bug 3693 has been marked as a duplicate of this bug. ***
Comment 12 john stultz 2005-04-01 11:15:29 UTC
I've been told a BIOS update fixes what appears to be the same issue on
ThinkCentre A50/M50 systems. I'd be interested if the folks seeing this problem
could confirm or deny whether the BIOS update helps.

thanks
Comment 13 john stultz 2005-07-27 15:28:39 UTC
I believe the BIOS fix resolves this issue. Please reopen if this is not the case.