As reported on lkml by niclas.gustafsson@codesense.com
Hardware Environment: IBM x305
Software Environment: Linux 2.6.5 kernel
Problem Description: After running for a while (~12-24 hours), time seemingly stops or slows to a crawl. jiffies is incremented very slowly, and the following is output to dmesg:
Losing too many ticks!
TSC cannot be used as a timesource.
<4> Possible reasons for this are:
You're running with Speedstep,
You don't have DMA enabled for your hard disk (see hdparm),
Incorrect TSC synchronization on an SMP system (see dmesg).
Steps to reproduce:
1. Boot 2.6.5 on an IBM x305
2. Watch the time of day
3. Wait 24 hours
I've reproduced this issue in our labs. After 12-24 hours, the PIT suddenly stops sending interrupts at HZ frequency and instead drops to roughly one interrupt every 2 seconds. The TSC time source code sees this drop, assumes it's running on a CPU whose frequency is being changed by cpufreq, and falls back to the PIT as its timesource. In my tests, once we fall back to the PIT, the PIT resumes sending interrupts at HZ frequency after some period of time. So the issue appears to be sporadic, possibly related to BIOS SMM/SMI activity.
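To make the failure mode concrete, here is a minimal C sketch of a lost-tick heuristic of the kind described: the TSC is used to measure how many tick periods elapsed between timer interrupts, missing ticks are credited while the TSC is trusted, and after too many consecutive late interrupts the code stops trusting the TSC and counts one tick per PIT interrupt. All names (`struct tick_monitor`, `on_timer_interrupt`, `LOST_TICK_LIMIT`) and the threshold of 100 are hypothetical, chosen for illustration; this is not the kernel's actual TSC timer code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold for illustration only. */
#define LOST_TICK_LIMIT 100

struct tick_monitor {
    uint64_t cycles_per_tick; /* TSC cycles per timer tick (assumed already calibrated) */
    uint64_t last_tsc;        /* TSC value at the previous timer interrupt */
    unsigned lost_count;      /* consecutive interrupts that arrived late */
    bool     use_tsc;         /* false once we have fallen back to the PIT */
};

/* Called on each timer interrupt with the current TSC reading.
 * Returns the number of ticks to add to jiffies. */
static uint64_t on_timer_interrupt(struct tick_monitor *m, uint64_t now_tsc)
{
    uint64_t elapsed = (now_tsc - m->last_tsc) / m->cycles_per_tick;
    m->last_tsc = now_tsc;

    if (!m->use_tsc)
        return 1; /* PIT mode: one tick per interrupt, however late it was */

    if (elapsed >= 2) {
        /* Interrupts went missing: credit them, but track how often this happens. */
        if (++m->lost_count > LOST_TICK_LIMIT)
            m->use_tsc = false; /* "Losing too many ticks!": fall back to the PIT */
        return elapsed;
    }

    m->lost_count = 0; /* an on-time interrupt resets the streak */
    return 1;
}
```

Assuming HZ=1000 and a PIT that fires only about every 2 seconds, the sketch credits 2000 ticks per interrupt while the TSC is trusted, but after the fallback each interrupt adds a single tick, so jiffies advances roughly 2000 times too slowly. That matches the "time stops/slows to a crawl" symptom reported above.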
Created attachment 2626 Proposed fix by Maciej W. Rozycki Email message and patch sent to lkml by Maciej W. Rozycki with a proposed fix (currently under testing).
Maciej's fix appears to resolve the issue (according to the original bug reporter, and also verified in the lab here). I'm following up with the IBM hardware folks to see if we can get a BIOS fix for the issue as well. Maciej's workaround needs to be propagated into mainline and the distros to handle existing systems with this issue.
*** Bug 2964 has been marked as a duplicate of this bug. ***
Hi, We have one system (a firewall) affected by this issue (x305, Linux 2.6.3). It makes our system hang, or makes the iptables hash tables fill up so that the firewall doesn't accept any new network connections. Quite funny for a firewall to have "DROP ALL" as its only policy, no? ;-) Do you have any news about the x305 firmware fix? The latest release seems to be 1.65, released in February 2004, and it doesn't contain the fix. Regards, Fabien
I have this problem on a Netvista 8317 with the latest BIOS (version 28KT26A). I have tried kernel versions 2.6.8 and 2.6.9-rc1 with the tsc and pmtmr timesources. When running 2.6.8 with tsc, I received the "losing too many ticks" kernel message; however, I have yet to see it under 2.6.9-rc1 with pmtmr, even though time has stopped.
Andy: Does Maciej's fix solve the problem for you? (see comment #2 for the attachment)
I just ran into a similar problem on an x305 too. It hangs partway through the init scripts on boot. If you leave it sitting for about a day, it eventually comes back with the "losing too many ticks" message. I tried the patch (against 2.6.9-rc1), but it did not fix anything for me.
Paul: Does this happen every boot? Or just occasionally?
John: Yes, Maciej's patch does fix the problem. I applied it to 2.6.9-rc1, and it has now been running for 4 days without problems.
*** Bug 3693 has been marked as a duplicate of this bug. ***
I've been told a BIOS update fixes what appears to be the same issue on ThinkCentre A50/M50 systems. I'd be interested if the folks seeing this problem could confirm or deny whether the BIOS update helps. Thanks.
I believe the BIOS fix resolves this issue. Please reopen if this is not the case.