Bug 6296

Summary: Time stops on IBM Netvista 8317
Product: Timers
Reporter: Andy Duplain (andy)
Component: gettimeofday
Assignee: john stultz (john.stultz)
Status: CLOSED CODE_FIX
Severity: normal
CC: crook, mingo, protasnb
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.16
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Kernel logging
Kernel log with "noapic" boot option
Patch to fix problem in 2.6.16 and 2.6.16.1
x86 ioapic timer ACK fix

Description Andy Duplain 2006-03-28 05:28:36 UTC
Most recent kernel where this bug did not occur: 2.6.15 (with corrective
patches)
Distribution: Debian Etch
Hardware Environment: IBM Netvista 8317
Software Environment: Linux 2.6.16
Problem Description: Time slows down and stops when using TSC timesource.

Steps to reproduce:
Boot vanilla kernel 2.6.16
Wait 12-24 hours.
Note time difference from actual time.
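The reproduction above relies on noticing the drift by hand. The following is a minimal sketch of an automated stall check, assuming that a fixed amount of busy work should always take a nonzero amount of clock time on a healthy system. `clock_stalled` and its parameters are hypothetical names for illustration, not part of any kernel or distribution tool. It spins the CPU instead of sleeping, on the assumption that a sleep may itself depend on the timer interrupt that has stopped. It only detects a complete stall; it cannot measure a slow-down without an external reference such as NTP.

```python
import time
from typing import Callable


def clock_stalled(
    clock: Callable[[], float] = time.monotonic,
    spin_iterations: int = 200_000,
    samples: int = 3,
) -> bool:
    """Return True if `clock` never advances across any of `samples` busy loops.

    Hypothetical helper: busy work is used instead of time.sleep(), because a
    sleep may rely on the very timer interrupt suspected of stopping.
    """
    for _ in range(samples):
        start = clock()
        sum(range(spin_iterations))  # CPU-bound busy work
        if clock() > start:
            return False  # the clock moved at least once: not stalled
    return True


if __name__ == "__main__":
    print("clock stalled" if clock_stalled() else "clock advancing")
```

On an affected machine this could be run periodically; passing a different `clock` (for example `time.time`) checks the wall clock instead of the monotonic clock.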
Comment 1 Andy Duplain 2006-03-28 05:38:36 UTC
This bug is the same as 2544, but the patch from Maciej W. Rozycki no longer 
works under 2.6.16.  I have the latest BIOS installed - dated July 2004.
Comment 2 Andy Duplain 2006-03-29 01:01:51 UTC
Created attachment 7699 [details]
Kernel logging

This is my kernel log file during boot-up.  No errors are reported, however.
Comment 3 john stultz 2006-03-29 10:45:48 UTC
Hmm. This is a uniprocessor system without HT? That is different than other
similar reports. Does booting w/ noapic avoid the issue?
Comment 4 Andy Duplain 2006-03-29 11:48:16 UTC
I will try it and let you know, John.
Comment 5 Andy Duplain 2006-03-29 12:06:29 UTC
Created attachment 7711 [details]
Kernel log with "noapic" boot option
Comment 6 Andy Duplain 2006-03-30 06:01:52 UTC
The system has been running for 15 hours now.  No errors are reported and time
is keeping well (NTP corrects by 1.0 to 1.5 sec/hr; normally it's 0.15 sec/hr,
so no big change there).  The system is very sluggish, though, and atsar showed
absolutely zero CPU usage over that period, even though several heavy CPU
processes are running.  Overall, the system seems better with the "noapic"
kernel boot flag than without.
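To put the NTP correction figures above on a common scale, a correction rate in seconds per hour can be converted to parts per million (ppm), the unit ntpd uses for frequency offsets: ppm = (s/hr) / 3600 × 10^6. The sketch below is only that arithmetic; `drift_ppm` is a hypothetical helper name.

```python
SECONDS_PER_HOUR = 3600


def drift_ppm(seconds_per_hour: float) -> float:
    """Convert a clock correction rate in seconds/hour to parts per million."""
    return seconds_per_hour / SECONDS_PER_HOUR * 1_000_000


# Rates quoted in the comment above: normal, and the range seen with "noapic".
for rate in (0.15, 1.0, 1.5):
    print(f"{rate} s/hr = {drift_ppm(rate):.1f} ppm")
```

This gives about 41.7 ppm for the normal rate and about 277.8 to 416.7 ppm with "noapic". All of these are below ntpd's documented 500 ppm frequency tolerance, which is consistent with NTP still being able to keep the clock on time.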
Comment 7 Andy Duplain 2006-03-31 01:45:13 UTC
Created attachment 7729 [details]
Patch to fix problem in 2.6.16 and 2.6.16.1

This is a modified version of the original patch from Maciej W. Rozycki for
2.6.16 (and .1).  My system has been running without problem for 14 hours now
with it applied.
Comment 8 Andy Crook 2006-10-02 00:39:05 UTC
The same problem on NetVista 8309 (latest BIOS) with 2.6.17.*
Patch fixed the problem for me.
Comment 9 john stultz 2006-10-18 13:15:16 UTC
I suspect this issue still exists, but I'm curious if the behavior has changed
w/ 2.6.18 and greater?
Comment 10 Andy Crook 2007-01-11 01:31:53 UTC
With unpatched 2.6.18 and 2.6.19 time still stopped occasionally.
Hopefully, the patch still helps fix the problem on my NetVista.
Comment 11 Natalie Protasevich 2007-07-07 12:55:04 UTC
Have you tried running later kernels? Can you please confirm that the problem has been resolved?
Thanks.
Comment 12 Andy Crook 2007-07-17 01:06:46 UTC
It seems 2.6.21.6 has been running OK on my NetVista for 2 days now.

I recently swapped my NetVista 8309 for an 8319, but still had to apply the patch to the 2.6.20 kernel.

So I'll wait 2-3 days more and let you know if everything is good now.
Comment 13 Ingo Molnar 2007-11-18 07:35:51 UTC
This patch is still not upstream - but we are now defaulting to NMI watchdog disabled, which might hide the bug.

I've ported the patch to arch/x86 and we've added it to the x86 queue of patches.
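Because the NMI watchdog default may hide the bug, and "noapic" was used as a workaround earlier in this report, it helps to record which of these boot options were in effect when comparing results. The following is a minimal sketch that reads `/proc/cmdline` (the command line the running kernel was booted with) and picks out only `noapic` and an explicit `nmi_watchdog=` setting. `timer_related_boot_options` is a hypothetical helper. A missing `nmi_watchdog=` entry means the kernel's built-in default applies, and per the comment above, that default has changed between versions.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def timer_related_boot_options(cmdline: str) -> Dict[str, Union[bool, Optional[str]]]:
    """Extract the `noapic` flag and any `nmi_watchdog=` value from a kernel command line.

    Hypothetical helper: all other boot options are ignored.
    """
    tokens = cmdline.split()
    nmi: Optional[str] = None
    for tok in tokens:
        if tok.startswith("nmi_watchdog="):
            nmi = tok.split("=", 1)[1]  # keep the last occurrence found
    return {"noapic": "noapic" in tokens, "nmi_watchdog": nmi}


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():  # only present on Linux
        print(timer_related_boot_options(cmdline_path.read_text()))
```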
Comment 14 Ingo Molnar 2007-11-18 07:37:08 UTC
Created attachment 13597 [details]
x86 ioapic timer ACK fix

Attached the 2.6.24-rc3 ported patch.