Most recent kernel where this bug did not occur: 2.6.15 (with correcting patches)
Distribution: Debian Etch
Hardware Environment: IBM Netvista 8317
Software Environment: Linux 2.6.16
Problem Description: Time slows down and stops when using the TSC timesource.
Steps to reproduce:
1. Boot vanilla kernel 2.6.16.
2. Wait 12-24 hours.
3. Note the time difference from actual time.
This bug is the same as bug 2544, but the patch from Maciej W. Rozycki no longer works under 2.6.16. I have the latest BIOS installed (dated July 2004).
Created attachment 7699: Kernel logging
This is my kernel log file from boot-up. No errors are reported, however.
Hmm. This is a uniprocessor system without HT? That is different from other similar reports. Does booting with noapic avoid the issue?
I will try it and let you know John.
Created attachment 7711: Kernel log with "noapic" boot option
The system has been running for 15 hours now. No errors have been reported and time is keeping well (NTP corrects by 1.0-1.5 sec/hr, whereas normally it's 0.15 sec/hr, so no big change there). The system is very sluggish, though, and atsar showed absolutely zero CPU usage over that period, even though several CPU-heavy processes are running. Overall, the system seems better with the "noapic" kernel boot flag than without it.
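For reference, the NTP correction rates quoted above can be expressed in parts per million (ppm), the unit ntpd uses for its frequency offset. The sketch below is only an arithmetic conversion (ppm = seconds per hour / 3600 × 10^6) applied to the figures reported in this comment; the function name is illustrative, not part of any tool.

```python
SECONDS_PER_HOUR = 3600


def sec_per_hour_to_ppm(correction_s_per_hr: float) -> float:
    """Convert a clock correction rate in seconds per hour to parts per million."""
    return correction_s_per_hr / SECONDS_PER_HOUR * 1e6


# Correction rates reported in this comment, in seconds per hour.
rates = {
    "normal": 0.15,
    "noapic, low": 1.0,
    "noapic, high": 1.5,
}

for label, rate in rates.items():
    print(f"{label:>12}: {rate:.2f} s/hr = {sec_per_hour_to_ppm(rate):.1f} ppm")
```

This gives roughly 41.7 ppm for the normal rate and roughly 277.8-416.7 ppm with "noapic". All three values are below the 500 ppm maximum frequency offset that ntpd can discipline, which is consistent with NTP still being able to keep the clock in check.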
Created attachment 7729: Patch to fix problem in 2.6.16 and 2.6.16.1
This is a modified version of the original patch from Maciej W. Rozycki for 2.6.16 (and 2.6.16.1). My system has been running without problems for 14 hours now with it applied.
Same problem here on a NetVista 8309 (latest BIOS) with 2.6.17.*. The patch fixed the problem for me.
I suspect this issue still exists, but I'm curious whether the behavior has changed with 2.6.18 and later.
With unpatched 2.6.18 and 2.6.19, time still stopped occasionally. Fortunately, the patch still fixes the problem on my NetVista.
Have you tried running later kernels? Can you please confirm whether the problem has been resolved? Thanks.
It seems that 2.6.21.6 has been running OK on my NetVista for 2 days now. I recently swapped my NetVista 8309 for an 8319, but still had to apply the patch to the 2.6.20 kernel. I'll wait another 2-3 days and let you know if everything is good now.
This patch is still not upstream, but the NMI watchdog is now disabled by default, which might hide the bug. I've ported the patch to arch/x86 and we've added it to the x86 patch queue.
Created attachment 13597: x86 ioapic timer ACK fix
Attached is the patch ported to 2.6.24-rc3.