Bug 11153
Summary: | The HPET emulation of rtc UIE interrupt is badly broken. | ||
---|---|---|---|
Product: | Timers | Reporter: | W Unruh (unruh) |
Component: | Realtime Clock | Assignee: | timers_realtime-clock |
Status: | RESOLVED OBSOLETE | ||
Severity: | high | CC: | alan, dbrownell, goneri, ossi, serge.bets |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.26 with the patches from bug 11112 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Bug Depends on: | 11312 | ||
Bug Blocks: | |||
Attachments: |
updated test program
photo of test output before laptop wedged
switch rtc-cmos to dispatch through ACPI and mostly ignore HPET
updated patch, works around ACPI buglet |
Description
W Unruh
2008-07-23 16:28:04 UTC
I have now run the same test with the same kernel on a different machine (D915GAG Intel motherboard) and here I saw no anomalous results. Ie, the bug seems to be hardware dependent. Is there any way that the hpet can be turned off entirely and allow the rtc interrupts to be visible to the rtc-cmos driver?

Created attachment 16958 [details]
updated test program

> Read using select(2) on /dev/rtc:

Program needs to be fixed to give the right RTC path ... /dev/rtc0 is usually correct, /dev/rtc rarely is!!

I also changed the logic of how the "interesting" ticks are identified, so it ignores deltas within scope of the 64 Hz sampling. And made sure the left column mostly lines up.

One small point to remember is that the HPET emulation is normally driven at 64 Hz, so these measurements have only 1000/64 msecs of precision ... call it 16 msecs. That large a difference is noise.

But most of these are larger differences, suggesting that something is delaying either (a) handling the IRQ, or maybe (b) getting the notification up to userspace. Plus of course (c) something else. Since this phenomenon has not been observed with real RTC interrupts (on boards with no HPET), I'm thinking (b) is a non-issue, leaving (a) and (c) as the likely causes.

If IRQ delays happened at the times marked "#" that would explain most of these cases nicely. These delays seem long for SMI, but I suspect a pattern might be observable...

# 130 msec delay (?)
>  1    0.782692 0190
> -3 *  1.652687 0190
> -2    2.652684 0190
> -1    3.652682 0190
>  0    4.652685 0190
>  1    5.652683 0190
>  2    6.652683 0190
# 400 msec delay
> -2 *  8.152767 0190
> -1    8.652708 0190
>  0    9.652697 0190
>  1   10.652695 0190
>  2   11.652697 0190
>  3   12.652641 0190
# 500 msec delay
> -1 * 14.152733 0190
>  0   14.652642 0190
>  1   15.652642 0190
>  2   16.652792 0190
>  3   17.652640 0190
... this is odd: 90 msec *early*
> -1 * 18.562086 0190
>  0   19.562064 0190
>  1   20.562039 0190
>  2   21.562015 0190
>  3   22.561989 0190
>  4   23.561966 0190
>  5   24.561940 0190

Now, that 90 msec is VERY odd. If it were even as much as 16 msec it could be explained easily by the 64 Hz measurement precision.

a) On the same machine that generated those horrible errors, if I loaded the kernel with the nohpet kernel option, everything worked just fine (127 different runs and nary a problem). (strangely there is still a /dev/hpet file.)

b) I am seriously perturbed by your apparent claim that the hpet only reports the interrupt in units of 1/64th of a second. That is really horrible for any kind of high precision timing! With ntp/chrony controlling the system clock to microsecond accuracy, to have to make do with 16msec accuracy on the rtc is terrible. (why is it not called a LPET?) Is there any reason at all why anyone should enable HPET on their systems?

Created attachment 16959 [details]
photo of test output before laptop wedged
OK, two more odd data points. With that new test program, I have twice observed a system lockup (!!), details currently unknown. This is running a 2.6.26 kernel with my patches, Ubuntu latest, and X11 on a Core2 laptop:
- One time I ran the test program in one window with a (partial) kernel rebuild in another ... no "*" alerts got to the screen. I had to run the test a lot since it was acting just fine. (Until the lockup!)
- Another time I ran *just* that test program, and it must have done something very "right" because it only seemed to see oddly mistimed "*" alerts ... the attached photo shows everything up to the instant it wedged.
> b) I am seriously perturbed by your apparent claim that the hpet only reports
> the interrupt in units of 1/64th of a second. That is really horrible for any
> kind of high precision timing!

Agreed. It's the RTC_UIE emulation which sucks that badly ... although given the issues we've observed, I wonder what else may be going on. In other roles, HPET works nicely.

> With ntp/chrony controlling the system clock to microsecond accuracy, to
> have to make do with 16msec accuracy on the rtc is terrible.

I always call the "legacy replacement" IRQ mode "broken mode", since it prevents effective use of the RTC IRQs ... all of them, including in this case the update IRQs with their 30.518 usec accuracy.

> (why is it not called a LPET?)

If it weren't interfering with the RTC, it would make a nice system timebase. But ... it's interfering.

> Is there any reason at all why anyone
> should enable HPET on their systems?

It's truly a strange bit of hardware design, which you can evidently blame at least in part on Microsoft.

One thing you can get out of HPET with no trouble at all is a 10+ MHz monotonic counter ... exactly what you want for a good clock source, you get very precise event timestamps. But right now Linux can't get that without also kicking in its interrupts, and that's what causes the problems for anyone trying to leverage RTC capabilities.

If you didn't care about RTCs you'd be happy to have a clockevent source that could give you either periodic or oneshot IRQs, suitable for NO_HZ operation (saving power) and also high precision timers (IRQ when that fast counter matches your selected value, better than 100 nsec precision).

The good bits are complicated by the way most BIOS programmers haven't bothered to set up routing so the non-broken HPET IRQ mode can be used. Sigh. If the "sane" mode were usable, the system timekeeping could happily use it for a good tick source -- and NO_HZ mode -- and forget about it otherwise.
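To put numbers on the granularities mentioned in this comment, here is a minimal Python sketch. It assumes the emulation polls at exactly 64 Hz and that the 30.518 usec figure comes from the usual 32768 Hz RTC crystal. The helper names (`period_ms`, `deviation_ms`, `is_noise`) are made up for illustration, and the sample timestamps are copied from the test output quoted earlier in this report.

```python
from fractions import Fraction

HPET_EMULATION_HZ = 64   # polling rate of the emulated UIE, per the discussion above
RTC_CRYSTAL_HZ = 32768   # assumed RTC crystal rate; 1/32768 s is about 30.518 usec


def period_ms(hz):
    """Length of one tick at the given rate, in milliseconds."""
    return float(Fraction(1000, hz))


def deviation_ms(prev_ts, cur_ts, expected_s=1.0):
    """How far one update-interrupt interval strays from the expected second."""
    return (cur_ts - prev_ts - expected_s) * 1000.0


def is_noise(dev_ms, hz=HPET_EMULATION_HZ):
    """A deviation within one polling period is indistinguishable from sampling jitter."""
    return abs(dev_ms) <= period_ms(hz)


if __name__ == "__main__":
    print(f"64 Hz period:    {period_ms(HPET_EMULATION_HZ):.3f} ms")
    print(f"32768 Hz period: {period_ms(RTC_CRYSTAL_HZ) * 1000:.3f} us")
    # Consecutive timestamps taken from the quoted test output.
    samples = [(5.652683, 6.652683), (6.652683, 8.152767), (17.652640, 18.562086)]
    for prev, cur in samples:
        dev = deviation_ms(prev, cur)
        print(f"{prev:.6f} -> {cur:.6f}: {dev:+.3f} ms, noise={is_noise(dev)}")
```

Under these assumptions, anything beyond roughly 16 msec (such as the intervals marked "*" in the output) cannot be blamed on the 64 Hz sampling alone.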
Just ran a much longer test of that bad machine -- ran 1000 sequences of the rtc-uie altered program on that bad machine (the one that gave those terrible results) and no problems. Looks like booting the kernel with nohpet is at least one answer to the hpet rtc problems.

OK, I have been reading the hpet.c code and the HPET timer specifications,
and am confused.
a) The Specifications state that the HPET disables the RTC IRQ but only for
the PIE. It states that the Alarm IRQ from the RTC is still delivered by
the SCI:
"BIOS sets LegacyReplacement Route bit (LEG_RT_CNF)
> LegacyReplacement IRQ Routing Enabled for Comparator_1
If present, RTC Periodic Interrupt Function will not cause any
interrupts.
RTC Alarm function (still required) will signal interrupts via SCI
RTC CMOS function (still required) will consume i/o range"
Now this says nothing about the UIE, but I would expect that that would be
an alarm rather than a Periodic interrupt (but one could make arguments
either way).
However, there is also the Legacy Replacement Route bit which the OS
(linux) can set or unset. This would enable or disable the Legacy
Replacement. Ie, it would seem that a much more sensible default for the
Linux kernel should be to disable this bit, so that Legacy Replacement does
NOT occur, unless there is no RTC on the motherboard, in which case they can
enable this. Or they (the Linux kernel writers responsible for hpet.c) can
include another kernel option, hpetlegacy or nohpetlegacy, to enable or
disable the Legacy Replacement bit. That way people who have hpet but no
RTC, or a broken RTC, can use the Legacy Replacement mode, and those (almost
all motherboards?) who have a working rtc can run with it disabled.
Or a subroutine to switch on and off this bit could be called by the
rtc-cmos routine to switch off the legacy replacement bit. (There is
already an hpet_enable_legacy_int and we would need a
hpet_disable_legacy_int routine that the rtc code could call.)
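As a rough illustration of what switching that bit on and off amounts to, here is a minimal Python sketch that treats the HPET General Configuration register as a plain integer. The bit positions (ENABLE_CNF at bit 0, LEG_RT_CNF at bit 1) follow the HPET specification; the helper names are hypothetical and are not kernel functions. Real code would have to read and write the memory-mapped register from kernel C, and this sketch says nothing about whether the timer IRQ keeps working afterwards.

```python
# Bit positions in the HPET General Configuration register, per the HPET spec.
HPET_CFG_ENABLE = 1 << 0   # ENABLE_CNF: main counter runs and interrupts may fire
HPET_CFG_LEGACY = 1 << 1   # LEG_RT_CNF: LegacyReplacement routing is active


def set_legacy_route(cfg):
    """Return the register value with LegacyReplacement routing switched on."""
    return cfg | HPET_CFG_LEGACY


def clear_legacy_route(cfg):
    """Return the register value with LegacyReplacement routing switched off."""
    return cfg & ~HPET_CFG_LEGACY


def legacy_route_enabled(cfg):
    """True if the LEG_RT_CNF bit is set in the given register value."""
    return bool(cfg & HPET_CFG_LEGACY)


if __name__ == "__main__":
    cfg = HPET_CFG_ENABLE | HPET_CFG_LEGACY   # hypothetical starting value
    cfg = clear_legacy_route(cfg)
    print(f"config={cfg:#x} legacy={legacy_route_enabled(cfg)}")
```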
That kludge of having the RTC UIE mimicked by using HPET counter 1 in 64 Hz
Periodic mode and reading the rtc each time to see if it has changed, could
be used only in the case of totally brain dead rtcs.
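The polling scheme described above can be modelled with a small Python sketch. This is a toy model and not the kernel's hpet.c logic: it assumes the RTC seconds value rolls over at a known instant and that a 64 Hz poller notices the change on its first poll at or after that instant. The function names and the chosen poll phases are invented for illustration.

```python
import math

POLL_HZ = 64  # rate of the periodic HPET comparator, per the description above


def detection_time(rollover_s, first_poll_s, poll_hz=POLL_HZ):
    """Time of the first poll at or after an RTC seconds rollover."""
    period = 1.0 / poll_hz
    polls_needed = max(0, math.ceil((rollover_s - first_poll_s) / period))
    return first_poll_s + polls_needed * period


def latency_ms(rollover_s, first_poll_s, poll_hz=POLL_HZ):
    """Delay between the real rollover and the poll that reports it."""
    return (detection_time(rollover_s, first_poll_s, poll_hz) - rollover_s) * 1000.0


if __name__ == "__main__":
    # Hypothetical poll phases: offset of the first poll from t = 0.
    for phase_ms in (0.0, 1.0, 5.0, 10.0, 15.0):
        lat = latency_ms(rollover_s=1.0, first_poll_s=phase_ms / 1000.0)
        print(f"poll phase {phase_ms:5.1f} ms -> rollover reported {lat:6.3f} ms late")
```

In this model the reported time is never early and is late by less than one polling period (1000/64 msec), which is why deviations much larger than that point to something other than the polling itself.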
I would try writing this, but my limited knowledge of both the HPET and the
linux kernel, and of competent coding, precludes this.
I may also be talking nonsense due to my very poor understanding of the HPET
and of the kernel code.
Reply-To: david-b@pacbell.net

> and am confused.
> a) The Specifications state that the HPET disables the RTC IRQ but only for
> the PIE. It states that the Alarm IRQ from the RTC is still delivered by
> the SCI

Linux doesn't/can't/mustn't intercept SCI though. Agreed that the spec seems a bit ambiguous. I'm new to HPET details, the folk who came before me seem to have observed the obvious behavior: there's only one RTC IRQ signal, gated by LEG_RT_CNF, except that alarms have some special logic since they're wake events.

> However, there is also the Legacy Replacement Route bit which the OS
> (linux) can set or unset. This would enable or disable the Legacy
> Replacement.

That would deeply goof up the system timer code though. If you look for example at Intel's southbridge docs -- ICH5 and newer have HPET modules, basically all the same except ICH9 and/or ICH10 add another comparator -- the ONLY way to have the dedicated-by-Linux "timer" IRQ fed by the HPET is to use "legacy replacement mode".

> Ie, it would seem that a much more sensible default for the
> Linux kernel should be to disable this bit, so that Legacy Replacement does
> NOT occur,

As I already noted: most BIOS writers don't support this entirely laudable goal. They don't set up routing for the HPET IRQs. Which means switching IRQ routing modes isn't very practical.

> unless there is no RTC on the motherboard,

All x86 PCs have an RTC, except maybe ones predating the PC/AT.

On Thu, 24 Jul 2008, bugme-daemon@bugzilla.kernel.org wrote:

> Reply-To: david-b@pacbell.net
>
>> and am confused.
>> a) The Specifications state that the HPET disables the RTC IRQ but only for
>> the PIE. It states that the Alarm IRQ from the RTC is still delivered by
>> the SCI
>
> Linux doesn't/can't/mustn't intercept SCI though. Agreed that the spec

OK, I am way out of my depth.

> seems a bit ambiguous. I'm new to HPET details, the folk who came before
> me seem to have observed the obvious behavior: there's only one RTC IRQ
> signal, gated by LEG_RT_CNF, except that alarms have some special logic
> since they're wake events.
>
>> However, there is also the Legacy Replacement Route bit which the OS
>> (linux) can set or unset. This would enable or disable the Legacy
>> Replacement.
>
> That would deeply goof up the system timer code though. If you look
> for example at Intel's southbridge docs -- ICH5 and newer have HPET
> modules, basically all the same except ICH9 and/or ICH10 add another
> comparator -- the ONLY way to have the dedicated-by-Linux "timer" IRQ
> fed by the HPET is to use "legacy replacement mode".

Ah, so Linux uses the 8254 timer int 0/2 stuff in some crucial way. But what happens if I disable hpet (use the nohpet kernel option). The system still runs fine. And the rtc now behaves itself. That would seem to switch off the legacy mode, but the timer IRQ still works.

But now I am still confused. On my machine, the timer, which is an IO-ACPI interrupt is interrupt 0, while the HPET docs state that the HPET 0 which takes over the 8254 functions is interrupt 2 in IO-APIC mode. Anyway, my confusion is probably going to remain high, since my ignorance is so deep.

>> Ie, it would seem that a much more sensible default for the
>> Linux kernel should be to disable this bit, so that Legacy Replacement does
>> NOT occur,
>
> As I already noted: most BIOS writers don't support this entirely
> laudable goal. They don't set up routing for the HPET IRQs. Which
> means switching IRQ routing modes isn't very practical.

Again, I am unclear on what the BIOS has to do with it. It seems from the specs that Linux can set up the HPET however they want, and if the Bios does it badly, Linux can fix it.

>> unless there is no RTC on the motherboard,
>
> All x86 PCs have an RTC, except maybe ones predating the PC/AT.

Ran a test on an older system.
Mandriva 2007.1 2.6.17 kernel. HPET and HPET_EMULATE_RTC are both on in the kernel. The rtc module is not loaded. I do this simply to provide an older benchmark.

13 out of 100 runs of the rtc-uie program had glitches, but all were at the 1ms level (although that does not fit in with the 64Hz explanation for the glitches, since one would expect them to actually be 16ms glitches). All are 1.0-1.8ms glitches, and in one of them the glitch goes earlier rather than later by 1ms. In all cases the system settles in to the new value after the glitch. Since the HPET/rtc code has changed since 2.6.17 this is provided in the hope that it might be useful in tracking down the causes of such glitches.

The config file claims that the system has a 250Hz clock, adjtimex reports that tick: 10000. This suggests it is not just a lost tick. (ntp which is running on the system reports no glitches but controls the clock to a few usec. If it were 11 min mode, I would not expect the random distribution of the glitches.)

Read using select(2) on /dev/rtc:
 1   0.765250 01d0
 2   1.765224 01d0
 3   2.765207 01d0
-1 * 3.763878 01d0
 0   4.763862 01d0
 1   5.763844 01d0
 2   6.763827 01d0
 3   7.763810 01d0
 4   8.763793 01d0
 5   9.763776 01d0

Read using select(2) on /dev/rtc:
 1   0.658835 01d0
 2   1.658809 01d0
 3   2.658792 01d0
-1 * 3.659842 01d0
 0   4.659825 01d0
 1   5.659809 01d0
 2   6.659792 01d0
 3   7.659773 01d0
 4   8.659758 01d0
 5   9.659741 01d0

Hello David,

(In reply to comment #6)
> the [RTC] update IRQs with their 30.518 usec accuracy.

The 30 microsecond granularity does not limit accuracy, which is better than this when using smart tools like hwclock 2.33 or adjtimex 1.24.

Their method for reading the RTC more accurately is to measure the offset at the first UIE following RTC_SET_TIME, in order to correct the time elapsed since then (between this first UIE and the current UIE). The interval between two UIEs is of course an integer number of seconds, and an integer number of 30 usec grains. But the correction goes below the grain.
Unfortunately even this smart method can do nothing against the HPET emulated 64 Hz granularity, which adds a different gigantic random noise to the timestamps of each one of the emulated UIEs.

Serge.
--
Serge point Bets arobase laposte point net

> But now I am still confused. On my machine, the timer, which is an IO-ACPI
> interrupt is interrupt 0, while the HPET docs state that the HPET 0 which
> takes over the 8254 functions is interrupt 2 in IO-APIC mode.

The IO-APIC is an IRQ *ROUTER* ... so the input pin #2 gets rerouted to output IRQ #0. Most pins, thankfully, use an identity mapping.

> > As I already noted: most BIOS writers don't support this entirely
> > laudable goal. They don't set up routing for the HPET IRQs. Which
> > means switching IRQ routing modes isn't very practical.
>
> Again, I am unclear on what the BIOS has to do with it. It seems from the
> specs that Linux can set up the HPET however they want, and if the Bios does
> it badly, Linux can fix it.

Linux *could* fix this ... but as a general policy, it doesn't second-guess the IRQ routing set up by the BIOS. I could imagine that changing someday; but it'd be a fair amount of work to do well, given the amount of hardware braindamage workarounds that are hidden in such BIOS code.

I don't know anything about RTCs but I suspect I am suffering from this bug, quite often at bootup or shutdown when hwclock is used my system freezes. Sometimes I can just press a key or move the mouse and it will continue with a complaint about lost interrupts, sometimes it locks up completely and I need to turn off the computer.

Is this bug being worked on? I'm just wondering, because now I need to disable hpet to get a working system.

Thanks

I don't know that anyone "owns" HPET, so it's unclear to me who ought to work on that aspect. For now, just disable HPET -- that's a robust workaround, with minimal downside.
I'm looking at something that might work if the HPET docs are wrong about one part of the IRQ routing: if *all* RTC irqs go through SCI and thence ACPI, rather than just alarms, then the ACPI event mechanisms should let us get "real RTC IRQs". (Since the RTC only has one IRQ signal, it would be very strange to require HPET support to modify the RTC silicon to split its IRQ source into three parts...) ACPI would add at least some delays; I don't know how significant they would be.

Thing is, I've never observed that part of ACPI to work. I've got the experiment set up now, I've just got to make time to run it and evaluate the results. Then make a clean solution, if it works, and hope that other platforms don't have blocking ACPI or hardware bugs.

Created attachment 17185 [details]
switch rtc-cmos to dispatch through ACPI and mostly ignore HPET
Turns out it's not so hard to get that to work. It seems that ACPI will intercept RTC interrupts even if it's not forced to do so via the HPET legacy replacement mode. And the overhead of going through ACPI really isn't much...
I've tested this on an old non-HPET machine, an HPET machine with and without HPET active, and it seems to behave fine.
Unfortunately, see bug 11312 ... ACPI seems to spontaneously go AWOL, causing trouble for this otherwise clean fix.

Created attachment 17222 [details]
updated patch, works around ACPI buglet
I took a different strategy (more cautious) in this version: only filter through ACPI when emulating HPET (so it works on my system which doesn't handle ACPI IRQs), and keep that ACPI support disabled until some real RTC interrupts are enabled (hoping it will help avoid the cases where ACPI spontaneously disables the RTC event handler). Works fine so far in light testing.
By the way, I think I can confirm that the system lockups mentioned in comment #5 have nothing at all to do with flakey HPET emulation. Other folk have seen them, and I just saw one with the patch from comment #18 applied ... no RTC stuff at all was running, and the only HPET interaction was comparator 0 running in NO_HZ mode. (System rebooted because the TCO watchdog fired. More user tasks than just my X11 desktop had stopped running...) One theory is that it's caused by recent RCU updates.

Status ?

Status? Don't know that anyone's looked at this recently. Best workaround is to never use that emulation logic; unfortunately, that doesn't make a good default (one wants HPET clocksources).

There were some mostly-working patches to use real UIE interrupts, filtered through ACPI, that got broken (can't apply any more) by some strange rtc-cmos changes that somehow got merged ... plus of course by those ACPI IRQ handling bugs. (I don't think Linux uses that IRQ mechanism much at all -- don't know about MS-Windows -- making broken ACPI behavior there be less of a surprise than usual.)

If the ACPI IRQs worked OK, I'd almost suggest just using them all the time instead of "native" RTC IRQs ... where "almost" is constrained by me thinking it's bad to make this driver become PC-specific (and depend on ACPI). But some folk seem to believe that's the way to go. Maybe they're right, but in that case there'd need to be an almost-identical driver for other MC146818 clones... ugh.

Another solution might be fixing HPET's emulation code itself, not just its interaction with the RTC code.

If this is still seen on modern kernels then please re-open/update

On Tue, 30 Oct 2012, bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=11153
>
> --- Comment #22 from Alan <alan@lxorguk.ukuu.org.uk> 2012-10-30 14:57:16 ---
> If this is still seen on modern kernels then please re-open/update
>

I love this response to bug reports. Do nothing for years, and then ask the OP whether the problem is still there. We spend time and effort to report bugs and all we get is further requests to test, not in order to fix but in order for us to determine if the problem is still there. It is like companies who have a complaint line, (never an 800 line) where you get put on hold for hours racking up costs on your telco bill.

would somebody with the necessary permissions bother to re-open this report?
the git log has some evidence that some work might have been done on acpi interrupts (i didn't look too closely). however, there is no indication that the hpet code has been adjusted in any way. therefore it does not seem that this report is obsolete at all. |