Bug 7992
Summary: | Calibration of APIC timer fails on x86_64 SMP due to bugs in apic.c | ||
---|---|---|---|
Product: | ACPI | Reporter: | David P. Reed (dpreed) |
Component: | Config-Other | Assignee: | acpi_config-other |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | andi-bz, bunk, mingo, tglx |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.19 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
File containing patch cited
proper patch for bug simpler patch Simple patch for this bug |
Description
David P. Reed
2007-02-12 13:40:40 UTC
Created attachment 10394 [details]
File containing patch cited
dyntick code completely rewrites this and it should go in soon into .21. Andi Kleen <ak@suse.de> writes: > On Wednesday 14 February 2007 21:59, Andrew Morton wrote: >> >> Sigh, apics. >> >> Could you please send this patch as per >> http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt via email to >> >> discuss@x86-64.org >> linux-kernel@vger.kernel.org >> Andi Kleen <ak@suse.de> >> "Eric W. Biederman" <ebiederm@xmission.com> >> Andrew Morton <akpm@linux-foundation.org> > > dyntick changes this code quite a lot -- good chance that it is already fixed > there Well the patch really doesn't change the current logic only makes the operations unused and introduces a few extra temporaries and comments to be certain all of the steps happen as expected. How far out is the dyntick work? Otherwise this patch (with maybe a little sprucing up) seems to meet the obviously correct criteria. Eric Reply-To: akpm@linux-foundation.org On Wed, 14 Feb 2007 14:16:30 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote: > How far out is the dyntick work? As far as I'm concerned it goes in as soon as we can persuade Linus to merge the x86 pull he was sent yesterday. But Andi has said that he wants to re-review the dynticks/hrtimer and/or convert-x86_64-to-use-generic_time patches, so I've been meaning to ask what to do there.. It would be interesting to know whether 2.6.20-rc6-mm3 fixes David's bug.
> Well the patch really doesn't change the current logic only makes the operations
> unused and introduces a few extra temporaries and comments to be certain all
> of the steps happen as expected.
It makes it interrupt driven. You're just thinking of the preceeding cleanup
patch.
-Andi
On Wednesday 14 February 2007 22:36, Andrew Morton wrote: > On Wed, 14 Feb 2007 14:16:30 -0700 > ebiederm@xmission.com (Eric W. Biederman) wrote: > > > How far out is the dyntick work? > > As far as I'm concerned it goes in as soon as we can persuade Linus to > merge the x86 pull he was sent yesterday. > > But Andi has said that he wants to re-review the dynticks/hrtimer and/or > convert-x86_64-to-use-generic_time patches, so I've been meaning to ask > what to do there.. I did a quick review yesterday -- didn't see anything i didn't like. Perhaps could be done more closely, but I don't want to block it on that. > It would be interesting to know whether 2.6.20-rc6-mm3 fixes David's bug. Yep. -Andi This bug continues in 2.6.21-rc7, and it is getting really annoying to have to patch it each time, I have to say. The patch here continues to work well. What do I need to do to get the patch accepted? Hmm, I still hope dyntick in .22 will obsolete this. Thomas? But I might put it into a late merge for .21. Can you submit a patch with Signed-off-by lines and no debug printks and proper changelog? Created attachment 11260 [details]
proper patch for bug
Per Andi's request, this is updated to be a "proper patch" as I understand it.
I signed off and wrote a changelog, added diffstat, etc.
The printk(KERN_WARNING...) included is a legitimate warning, not a debug
printk, since a wraparound of the apic timer is almost certainly a signal of a
problem calibrating which will cause an issue later. Not serious enough to
panic or crash, since the system will just get the wrong apic timer interrupt
rate as a result.
Created attachment 11262 [details]
simpler patch
Actually does this simpler patch work too? The TSC overflows can be simply
avoided by doing it all in 64bit
Andi - the patch you suggested doesn't seem to quite work reliably on my machine. But I get the general approach, which makes sense, and will tune-up that patch so that it does work, if you prefer it. The previous "proper patch" does work, though, so you could submit that one as is. I'm just eager to see this issue done with. (And I'm looking forward to dyntick when it is merged). Created attachment 11280 [details]
Simple patch for this bug
Andi - as noted above, this is now a simpler patch, using 64-bit TSC reads as
you suggested, and fixing the buggy loop termination (apic timer counts down,
not up). I've put it in proper form.
I've tested it on my machine, and it works fine as a patch to the just-released
2.6.21, and should work on prior kernels as well, if anyone wants to backport
the fix to earlier kernels.
- (apic - apic_start) < TICK_COUNT); + (apic_start - apic) < TICK_COUNT); That change doesn't look correct. I think i will stay with the patch i posted + the change for initializing the counter higher. Also probably similar change to i386 Nah actually the change was correct since apic counts down The patch from this bug was included in 2.6.22. |