Bug 5366
Summary: | synchronize_tsc_bp can zero the TSC during a lost ticks measurement | ||
---|---|---|---|
Product: | Timers | Reporter: | Tim Mann (mann) |
Component: | Other | Assignee: | john stultz (john.stultz) |
Status: | CLOSED CODE_FIX | ||
Severity: | low | CC: | andi-bz, weissman |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.11 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | [UNTESTED] Use flag to avoid early lost tick compensation |
Description
Tim Mann
2005-10-04 18:16:37 UTC
p.s. Boris Weissman found this bug; the detailed description above is due to him. So to summarize, and try to verify I understand the issue: Since timer interrupts occur before the TSCs are synced at boot, its possible to get large TSC offsets before and for one tick after the TSCs have been synchronized. This causes a larger jiffies tick count at boot then there should be. This appears to be a fairly small window before init, so I'm not sure how the issue manifests itself. Further I'm not sure I understand the claim: "This in turn can cause timeouts to go off before they should." Could these issues be clarified a bit? Thanks! Yes, you understand the bug correctly. We've seen the issue manifest itself because some timers do get set before the TSC sync-up, so they can go off spuriously. I don't claim to know how many instances there are of timers that get set that early, but http://bugme.osdl.org/show_bug.cgi?id=5397 gives one example: The blank_screen_t timer gets scheduled early, and we've seen it go off prematurely because of this bug. That would be fairly harmless except for the additional bug (which bug 5397 describes) that the timer callback uses workqueues, which aren't initialized yet at the point where the timer is set. It's bogus - timer interrupts only occur on the BP, which is synced by definition. No, it's not bogus. Syncing the TSCs is implemented by setting them all to 0, not by setting the APs to the BP's value. So the BP's TSC does change when the TSCs are synced up. (You can't do the latter because on older processors, it's impossible to write a value to the TSC with a nonzero value in the high-order 32 bits. That is, you can write values only between 0x0000000000000000 and 0x00000000ffffffff.) That was only before 2.6.12, but hasn't been true since then. With the CPU hotplug changes a new TSC sync algorithm came too that doesn't do this -> INVALID Andi, your patch in 2.6.12 was only for x86_64, but I'm talking about i386. The i386 arch still has the old algorithm that zeros all the TSCs. It would be possible to port the new algorithm to i386, but you'd have to insert a round of zeroing all the TSCs first, in case the BP's existing TSC value happens to be more than 0x00000000ffffffff. So the bug I reported here would still exist. p.s. My second paragraph above isn't quite right, but you probably get the point -- because of the limitation in writing the TSC, in general i386 will still sometimes need to change the BP's TSC value when syncing up with the APs. By the way, the limitation to writing values <= 0x00000000ffffffff exists in all Intel processors older than P4 model 3 and all AMD processors older than Opteron. That's funny, just today Jim Houston has posted a patch to lkml converting i386 to use the x86-64 TSC sync method (look for "improved boot time TSC synchronization"). Tim, might you be able to test Jim's patch to see if it resolves this issue? Unlike the ia64 and x86_64 versions, Jim's patch for i386 zeros the TSC on the BP (of necessity, as I explained above), so it can't fix this bug. Quoting a few lines (dropping some - lines): + * Writing the TSC resets the upper 32-bits, so we need to be careful + * that all of the cpus can be synchronized before we overflow the + * 32-bit count. ... +/* + * The boot processor set its own TSC to zero and then gives each + * slave processor the chance to synchronize itself. + */ +static void __init synchronize_tsc_bp (void) +{ + unsigned int tsc_low, tsc_high, error; + int cpu; + printk("start TSC synchronization\n"); + write_tsc(0, 0); Ah, good point. Sorry for that goose hunt. Basically it seems we need a flag to disable lost tick compensation until after the TSCs are synced. Does this sound reasonable? I'm a bit busy at the moment, but I'll try to come up with something soon. Yes, that sounds quite reasonable. Thanks. Created attachment 6872 [details]
[UNTESTED] Use flag to avoid early lost tick compensation
Hey Tim, Sorry for the very slow response on this one(things are finally
slowing down a bit for the holidays). Here's a untested patch that I think
should resolve this. Let me know if you continue to see the problem.
Patch has been sent to akpm and lkml. Patch was included in Linus' -git tree today. Closing. I don't have an easy reproducer for the bug, but the code change looks good. Sorry for not commenting sooner. Just for reference (in case Boris cares), 2.6.16 is the first Linus release with the fix. |