|Summary:||Clock is running too fast^Wslow using acpi_pm clocksource|
|Component:||gettimeofday||Assignee:||john stultz (john.stultz)|
|Severity:||normal||CC:||bunk, macro, mingo, tglx|
|Bug Depends on:|
Dump APIC and PIC state
dmesg with patch-2.6.26-rc1-20080505-all-pic-dump-3
Always try the timer both directly and through the 8259A
Description tosn00j02 2008-01-21 12:38:09 UTC
Commit d4d25deca49ec2527a634557bf5a6cf449f85deb broke acpi_pm as a clocksource on my Shuttle SK41G. It makes the clock run a couple of seconds too fast per minute. Reverting that commit fixes the problem.
Comment 1 Anonymous Emailer 2008-01-21 12:49:45 UTC
Reply-To: email@example.com On Mon, 21 Jan 2008 12:38:09 -0800 (PST) firstname.lastname@example.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9791 > > Summary: Clock is running too fast using acpi_pm clocksource > Product: Timers > Version: 2.5 > KernelVersion: 2.6.24-rc8-ga7da60f4 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: gettimeofday > AssignedTo: email@example.com > ReportedBy: firstname.lastname@example.org > > > Commit d4d25deca49ec2527a634557bf5a6cf449f85deb broke acpi_pm as a > clocksource > on my Shuttle SK41G. It makes the clock run a couple of seconds too fast per > minute. > > Reverting that commit fixes the problem. > So I assume this was a post-2.6.23 regression?
Comment 2 john stultz 2008-01-21 12:53:04 UTC
Please attach dmesg output. Here is the git link to the patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d4d25deca49ec2527a634557bf5a6cf449f85deb
Comment 3 tosn00j02 2008-01-21 13:06:58 UTC
Yes, post-2.6.23 regression.
Comment 5 Thomas Gleixner 2008-01-22 01:46:51 UTC
On Mon, 21 Jan 2008, Andrew Morton wrote: > > ReportedBy: email@example.com > > > > Commit d4d25deca49ec2527a634557bf5a6cf449f85deb broke acpi_pm as a > > clocksource on my Shuttle SK41G. It makes the clock run a couple > > of seconds too fast per minute. > > > > Reverting that commit fixes the problem. > > > > So I assume this was a post-2.6.23 regression? Yes it is a post-2.6.23 regression. This commit was done to fix long standing bugzilla entries. We definitely need to revert it for .24 and find a solution for .25 which fixes the old bugs and does not introduce new ones. Linus, can you please pull the revert from: ssh://master.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git Thanks, tglx
Comment 6 Adrian Bunk 2008-01-22 20:38:31 UTC
Reverted in commit 4960c9df1407c2723459c60ff13e6efe0c209c62.
Comment 7 tosn00j02 2008-05-03 05:09:47 UTC
Commit 6e908947b4995bc0e551a8257c586d5c3e428201 breaks this again.
Comment 8 Rafael J. Wysocki 2008-05-16 14:32:25 UTC
Regressions list annotation: Date : 2008-05-03 05:09
Comment 9 Maciej W. Rozycki 2008-05-16 19:48:03 UTC
Created attachment 16168 [details] Dump APIC and PIC state (In reply to comment #7) > Commit 6e908947b4995bc0e551a8257c586d5c3e428201 breaks this again. Can you please try the patch attached, which makes an elaborate dump of all registers of all the interrupt controllers to be output to the bootstrap log, and post the result? I find the problem with your system strange as the lone effect of reverting the change is disabling OCW3 poll cycles to the 8259A. But your system has its 8254 timer interrupt routed directly to an I/O APIC and therefore timer interrupts should be unaffected no matter what the change does to the 8259A as the chip is not in the timer interrupt path on this system. My first suspect is some interrupt routing and/or masking misconfiguration which makes the 8259A post interrupts to the CPU somehow. Maciej
Comment 10 Thomas Gleixner 2008-05-17 02:18:22 UTC
> I find the problem with your system strange as the lone effect of > reverting the change is disabling OCW3 poll cycles to the 8259A. But > your system has its 8254 timer interrupt routed directly to an I/O > APIC and therefore timer interrupts should be unaffected no matter > what the change does to the 8259A as the chip is not in the timer > interrupt path on this system. My first suspect is some interrupt > routing and/or masking misconfiguration which makes the 8259A post > interrupts to the CPU somehow. It's even more odd because it affects the acpi_pm timer, which is read only and neither related to the 8254 nor to the 8259. The time is solely retrieved from the acpi_pm timer and should not be affected by timer interrupts going too fast. Does adding "nohz=off" to the kernel command line change anything? Thanks, tglx
Comment 11 tosn00j02 2008-05-17 03:02:39 UTC
nohz=off makes the clock stable for 2.6.26-rc2-00433-gf26a398. Furthermore, the drift was never as big with 2.6.26-rc? as it was with 2.6.24-rc?, but big enough for ntpd not to keep it stable. Will try your patch next (without nohz=off).
Comment 12 tosn00j02 2008-05-17 03:25:25 UTC
Created attachment 16171 [details] dmesg with patch-2.6.26-rc1-20080505-all-pic-dump-3
Comment 13 Thomas Gleixner 2008-05-17 08:31:22 UTC
> nohz=off makes the clock stable for 2.6.26-rc2-00433-gf26a398. > > Furthermore, the drift was never as big with 2.6.26-rc? as it was with > 2.6.24-rc?, but big enough for ntpd not to keep it stable. Hmm. That's odd. John, any idea how to decode this ? Thanks, tglx
Comment 14 tosn00j02 2008-05-17 12:02:11 UTC
To correct myself, the clock is about 1 second too _slow_ per minute, not too fast. I went back to 2.6.24-rc? and that was indeed slow too. Sorry for the confusion.
Comment 15 Maciej W. Rozycki 2008-05-17 12:50:18 UTC
(In reply to comment #14) > To correct myself, the clock is about 1 second too _slow_ per minute, not too > fast. I went back to 2.6.24-rc? and that was indeed slow too. Well, that is an important update. Such a symptom usually means a timer tick is lost somewhere, typically because interrupts are held disabled for too long. Perhaps the NMI watchdog could be used to track down this place now that we have more choice about timer sources than a few years ago. Your dump of IC registers looks sane. To rule out the influence of the output of the 8259A (the PIT interrupt is enabled in the chip), could you please try the "disable_8254_timer" option of the kernel? It should change the PIC IMR to ffff.
Comment 16 tosn00j02 2008-05-17 13:44:23 UTC
> could you please try the "disable_8254_timer" option of the kernel? > It should change the PIC IMR to ffff. It did, and the problem is gone now.
Comment 17 Maciej W. Rozycki 2008-05-17 14:07:49 UTC
Hmm, so now we have a solution that works, though we do not know why. I smell some SMM interaction. Anybody who could shed some light here is welcome to speak -- without the system at hands it is quite hard to figure out what is going on there. Anyway, the kernel option and the associated code is unnecessary as there is no need to unmask the timer interrupt in the PIC in the first place unless you want to make use of the I/O APIC NMI watchdog. I shall cook a reasonable patch shortly so that the option is not needed for you. I think the unmasking has been there since the day one and I always felt a bit uneasy about doing that unnecessarily. Whoever did this "disable_8254_timer" option must have not been aware that there is really no need for this piece of code here at all. ;-)
Comment 18 Maciej W. Rozycki 2008-05-17 14:27:54 UTC
Self-correction -- for the record. The interrupt in the PIC actually had to be unmasked for an obscure piece of code that used to live in do_slow_gettimeoffset() that did a read from the PIC to check for the value of the IRR bit corresponding to the timer interrupt. The code is now gone together with the whole function, so the unmasking can be removed from here.
Comment 19 Thomas Gleixner 2008-05-17 15:52:12 UTC
> To correct myself, the clock is about 1 second too _slow_ per minute, not too > fast. I went back to 2.6.24-rc? and that was indeed slow too. > > Sorry for the confusion. That makes a lot more sense.
Comment 20 Maciej W. Rozycki 2008-05-17 18:32:55 UTC
Created attachment 16180 [details] Always try the timer both directly and through the 8259A This is a change to clean up the timer setup through the I/O APIC a little bit. If a single candidate I/O APIC interrupt input pin is found only (either in the MP table or in the ACPI table), then it is always tried directly first and then, if failed, through the 8259A. This is to take care of broken BIOSes which report an ExtINTA I/O APIC input (typically the pin #0 of the APIC #0) as a native timer interrupt input. Care is taken that the 8259A has its timer IRQ input unmasked only if known to be needed. Please try this patch and see whether it fixes the clock being late for you. Note with this set of changes the "disable_8254_timer" option is now gone.
Comment 21 tosn00j02 2008-05-18 02:51:53 UTC
Works perfectly. Thank you very much.
Comment 22 Maciej W. Rozycki 2008-05-18 10:25:20 UTC
Good! It's on the way to upstream now (split into two as it comprises close, but functionally separate changes), but I suspect it will take a while to get integrated as it tackles historically one of the most fragile pieces of code dealing with the APIC and therefore needs a considerable amount of testing. Thanks for your feedback.
Comment 23 Adrian Bunk 2008-06-22 02:24:12 UTC
Handled-By : Maciej W. Rozycki <firstname.lastname@example.org> Patch : http://bugzilla.kernel.org/attachment.cgi?id=16180
Comment 24 Adrian Bunk 2008-06-22 02:24:51 UTC
Caused by: commit 6e908947b4995bc0e551a8257c586d5c3e428201 Author: Ingo Molnar <email@example.com> Date: Fri Mar 21 14:32:36 2008 +0100 x86: fix ioapic bug again un-revert: commit 4960c9df1407c2723459c60ff13e6efe0c209c62 Author: Thomas Gleixner <firstname.lastname@example.org> Date: Tue Jan 22 10:23:01 2008 +0100 Revert "x86: fix NMI watchdog & 'stopped time' problem" This reverts commit d4d25deca49ec2527a634557bf5a6cf449f85deb. needs a proper fix though ... Signed-off-by: Ingo Molnar <email@example.com>
Comment 25 john stultz 2008-07-10 15:12:40 UTC
Is this issue now resolved or not?
Comment 26 tosn00j02 2008-07-31 10:06:14 UTC
It is resolved. 2.6.27-rc1, which has the attached patch, works perfectly. Thanks again.