Most recent kernel where this bug did not occur:
Distribution: Debian testing
Hardware Environment: IBM T60p
Software Environment:
Problem Description: Trying to read from /dev/rtc blocks indefinitely. If I use the simple test program in rtc.txt.gz, it hangs on the read and never returns. The program appears to turn on update interrupts, before reading. I see the same problem in Debian's latest 2.6.16 and a vanilla 2.6.18-rc4 kernel.
Steps to reproduce:
fd = open("/dev/rtc", O_RDONLY);
ioctl(fd, RTC_UIE_ON, 0);
read(fd, &data, sizeof(unsigned long));
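For anyone reproducing this, a variant of the three calls in the report that waits with select() and a timeout will report the missing interrupt instead of hanging. This is only a sketch and not the test program from rtc.txt.gz. The names wait_readable, rtc_irq_count and rtc_uie_probe are made up for illustration, and it assumes the driver supports select() on the RTC device node.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/rtc.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_sec seconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_sec)
{
    assert(timeout_sec >= 0);

    fd_set rfds;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    int n = select(fd + 1, &rfds, NULL, NULL, &tv);
    return n > 0 ? 1 : n;
}

/* The value read from the RTC device carries the interrupt type in its
 * low byte and the number of interrupts since the last read above it. */
unsigned long rtc_irq_count(unsigned long data)
{
    return data >> 8;
}

/* Hypothetical helper: enable update interrupts on the RTC at path and
 * wait for one. Returns 0 if an interrupt arrived, 1 if none arrived
 * within timeout_sec (the symptom in this report), -1 on error. */
int rtc_uie_probe(const char *path, int timeout_sec)
{
    unsigned long data = 0;
    int rc = -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    if (ioctl(fd, RTC_UIE_ON, 0) < 0) {
        perror("RTC_UIE_ON");
        close(fd);
        return -1;
    }

    int ready = wait_readable(fd, timeout_sec);
    if (ready == 0) {
        fprintf(stderr, "%s: no update interrupt within %d s\n",
                path, timeout_sec);
        rc = 1;
    } else if (ready == 1 &&
               read(fd, &data, sizeof data) == (ssize_t)sizeof data) {
        printf("%s: %lu interrupt(s) since last read\n",
               path, rtc_irq_count(data));
        rc = 0;
    }

    ioctl(fd, RTC_UIE_OFF, 0);
    close(fd);
    return rc;
}
```

Calling rtc_uie_probe("/dev/rtc", 5) from a small main() should return 0 on a working system, since update interrupts fire once per second. A return of 1 corresponds to the hang described in this report.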
Is this still happening with latest mainline ? tglx
It does still happen with 2.6.18-4 as packaged by debian. I will have to test the latest kernel separately.
Jerry, did you get a chance to test? Is the problem still there? Thanks.
Yes, I still see it in Debian's 2.6.21-1-686
David wonders "Does it happen with rtc-cmos, or only with the legacy driver?" and adds "If the general issue were that RTC irqs aren't arriving, I'd suspect it's HPET interfering..."
Created attachment 12166: kernel dmesg output
Does the following answer which rtc driver this is?
naga:/usr/src/madwifi-svn/tools# modinfo rtc
filename: /lib/modules/2.6.21-2-686/kernel/drivers/char/rtc.ko
alias: char-major-10-135
license: GPL
author: Paul Gortmaker
depends:
vermagic: 2.6.21-2-686 SMP mod_unload 686
If not, let me know how to find the info. I'm also attaching my dmesg output.
Reply-To: david-b@pacbell.net
On Thursday 26 July 2007, bugme-daemon@bugzilla.kernel.org wrote:
> Does the following answer which rtc driver this is?
That was never an issue. My question was whether this ONLY showed up with the legacy driver (i.e. the one you were very clearly using). To try rtc-cmos, disable the char/rtc.c in Kconfig, and then go to the RTC framework menu. Enable all the interfaces there, and "rtc-cmos". Now try.
Reply-To: david-b@pacbell.net
> ACPI: HPET id: 0x8086a201 base: 0xfed00000
OK, I was right. This uses HPET. Until a bunch of clock patches merge, HPET prevents the CMOS clock from delivering IRQs. Your workaround: don't use HPET. The short version of the story is that HPET has two modes, which I tend to call "sane" and "broken". Unfortunately, "broken" is the default ... the breakage involves preventing the RTC (and something else, maybe PIT) from working properly, by taking over their IRQs. The patches I'm thinking of switch HPET over to use "sane" mode, with IRQs routed normally (not clobbering other devices). And then, conveniently enough, allocate one HPET to each CPU so they can serve as per-CPU clockevent sources. I understand that those patches have been deferred for now, along with all other x86_64 clockevent patches.
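One way to see whether RTC interrupts are arriving at all is to compare the "rtc" counter in /proc/interrupts before and after a few seconds with update interrupts enabled. The sketch below is only an illustration: irq_line_total and irq_total_for are hypothetical names, and it assumes the usual line layout of an IRQ number, a colon, one counter per CPU, then the controller and device names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line, for example
 * "  8:          3          1   IO-APIC-edge      rtc".
 * Returns -1 if the line does not start with "<number>:". */
long irq_line_total(const char *line)
{
    assert(line != NULL);

    char *end;
    (void)strtol(line, &end, 10);
    if (end == line || *end != ':')
        return -1;

    long total = 0;
    const char *p = end + 1;
    for (;;) {
        long n = strtol(p, &end, 10);
        /* Stop at the first token that is not a plain number. */
        if (end == p || (*end != ' ' && *end != '\t' &&
                         *end != '\n' && *end != '\0'))
            break;
        total += n;
        p = end;
    }
    return total;
}

/* Return the counter total for the first numbered line in the file at
 * path that mentions name, or -1 if there is none. Lines longer than
 * the buffer (very many CPUs) are not handled in this sketch. */
long irq_total_for(const char *path, const char *name)
{
    char line[1024];
    long total = -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, name) != NULL) {
            total = irq_line_total(line);
            if (total >= 0)
                break;
        }
    }
    fclose(f);
    return total;
}
```

If irq_total_for("/proc/interrupts", "rtc") returns the same value before and after several seconds with RTC_UIE_ON set, that would be consistent with the IRQ never reaching the driver.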
(added Thomas to cc) Thanks, David. You know everything. Jerry, if you're keen you could test 2.6.22-rc6-mm1 which has the patches which David refers to. Otherwise, please wait until we get x86_64 dynticks/clockevents back in, which will probably be a few weeks from now. Thanks.
Reply-To: david-b@pacbell.net On Thursday 26 July 2007, you wrote: > Thanks, David.
Guys, what's the status of this one? AFAICT we expect that the x86_64 clockevents patches will fix this bug, but Jerry has disappeared on us? David, if Jerry is still working on this, it might be useful to recap exactly what tests you'd like him to perform - it isn't terribly clear... Thanks.
bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #12 from akpm@osdl.org 2007-10-04 13:47 -------
> Guys, what's the status of this one?
>
> AFAICT we expect that the x86_64 clockevents patches will fix this
> bug, but Jerry has disappeared on us?
Sorry, I'm here.
> David, if Jerry is still working on this, it might be useful
> to recap exactly what tests you'd like him to perform - it
> isn't terribly clear...
That would be useful. Thanks.
That's strange. HPET should do this ugly emulate RTC thing. Can you please put your .config into the bugzilla ? tglx
I thought Comment #8 described it completely. Kernel config should have ONLY the rtc-cmos driver (not the legacy driver) ... that'll come out as /dev/rtc0 on most systems, which can be symlinked as /dev/rtc if you like. Verify that the same problem shows up. The legacy driver "should", as Thomas said, do the ugly thing given HPET in what I called "broken" mode. The new one does NOT, but I expect it will act fine in "sane" mode when all the x86_64 clockevent and HPET patches merge. (I don't recall hearing when that merge is planned. One hopes it's soon...)
Jerry, is this still happening with 2.6.24-rc2 ? tglx
Yes, I'm still seeing the same trouble. I'm attaching the config I used for building this kernel.
Created attachment 13589: Kernel config
Hmm, it builds both the legacy rtc and the new rtc driver. David, is there any conflict ?
There "should" be no confusion any more, but only one of the two RTC drivers will bind to the hardware. Both are modular, so there is at least a simple choice of which to use, via "modprobe". (Though you may want to make sure you've got the latest version of hwclock, from util-linux-ng, since older ones don't understand that they may need to use "/dev/rtc0" not always "/dev/rtc".) I observe HPET still isn't completely disabled in this config, and since it's still being used in "broken" steal-the-rtc-IRQs mode, I'm thinking that will most likely still be making this trouble. Until HPET is used only in "sane" mode, there should probably be a way to make sure it's never enabled ... otherwise these "my IRQ's been stolen!" failures will persist.
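A test program can sidestep the /dev/rtc versus /dev/rtc0 question by trying both nodes. The following is a minimal sketch under that assumption. open_first and open_rtc are hypothetical names, and the order of the candidates is a choice made for illustration, not a description of what hwclock does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Try each path in order and return a read-only fd for the first one
 * that opens, or -1 if none does. If chosen is non-NULL it receives
 * the path that was opened, or NULL on failure. */
int open_first(const char *const paths[], size_t n, const char **chosen)
{
    assert(paths != NULL);

    if (chosen != NULL)
        *chosen = NULL;
    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDONLY);
        if (fd >= 0) {
            if (chosen != NULL)
                *chosen = paths[i];
            return fd;
        }
    }
    return -1;
}

/* Candidate nodes: /dev/rtc0 (RTC framework, e.g. rtc-cmos) first,
 * then /dev/rtc (legacy driver, or a symlink to rtc0). */
int open_rtc(const char **chosen)
{
    static const char *const nodes[] = { "/dev/rtc0", "/dev/rtc" };
    return open_first(nodes, sizeof nodes / sizeof nodes[0], chosen);
}
```

Reporting the chosen path alongside the test result makes it clear which driver was exercised, which is the distinction the comments above keep coming back to.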
David, what would you suggest we do to get the mainline kernel working out of the box?
Long term, stop using HPET in "legacy replacement"/broken mode and use it in "standard"/sane mode. And get rid of that "ugly emulation" thing, and its support in various places. There's really no point to trying to use that, except possibly as a (nasty) workaround for hardware bugs in "standard" mode. (Only "standard" mode is guaranteed to exist, too...) I suspect that fix could not be for 2.6.24 ... ISTR some other RTC/IRQ bug report that was similar, except that it said x86_32 worked while x86_64 didn't ... there may be some merge issues yet to resolve. I noticed that x86_64 treats HPET differently. (Not just in terms of init, but also arch/x86/Kconfig and HPET_TIMER...) Near term there may be some ways to tweak the RTC code to coexist better with HPET. "rtc-cmos" doesn't insist on having an IRQ ... so if PNP correctly reported that it doesn't have an IRQ, it should behave. The legacy RTC driver is (as usual) a mess, but evidently it *used* to work (for some definition of "work") and something broke it. Someone with HPET skillz should be able to sort this out ... at least the regression aspect of this bug should be fixable, although the botch of not using "sane" HPET mode will still remain. (Remember that my insights here are restricted to RTC framework and code ... I've touched neither the legacy driver with which this problem is appearing, nor the x86 arch code that seems to be troublesome here. So I can't help resolve this bug except by noting what must be the root cause: using this "broken" HPET mode setting up a fragile house-of-cards, which this bug reports as starting to fall down.)
> ------- Comment #22 from dbrownell@users.sourceforge.net 2007-11-18 15:10 -------
> Long term, stop using HPET in "legacy replacement"/broken mode and use it in
> "standard"/sane mode. And get rid of that "ugly emulation" thing, and its
> support in various places. There's really no point to trying to use that,
> except possibly as a (nasty) workaround for hardware bugs in "standard" mode.
> (Only "standard" mode is guaranteed to exist, too...) I suspect that fix
> could not be for 2.6.24 ...
We cannot use the "non legacy mode" in a sane way as long as BIOSes are not providing irq routing for the non legacy case. Venki tried to enforce this, but it is really troublesome.
> ISTR some other RTC/IRQ bug report that was similar, except that it said
> x86_32 worked while x86_64 didn't ... there may be some merge issues yet
> to resolve. I noticed that x86_64 treats HPET differently. (Not just in
> terms of init, but also arch/x86/Kconfig and HPET_TIMER...)
The 32/64 bit hpet related code is the same now.
> Near term there may be some ways to tweak the RTC code to coexist better with
> HPET. "rtc-cmos" doesn't insist on having an IRQ ... so if PNP correctly
> reported that it doesn't have an IRQ, it should behave. The legacy RTC
> driver is (as usual) a mess, but evidently it *used* to work (for some
> definition of "work") and something broke it. Someone with HPET skillz
> should be able to sort this out ... at least the regression aspect of this
> bug should be fixable, although the botch of not using "sane" HPET mode
> will still remain.
>
> (Remember that my insights here are restricted to RTC framework and code ...
> I've touched neither the legacy driver with which this problem is appearing,
> nor the x86 arch code that seems to be troublesome here. So I can't help
> resolve this bug except by noting what must be the root cause: using this
> "broken" HPET mode setting up a fragile house-of-cards, which this bug
> reports as starting to fall down.)
Very helpful :) The only pitfall as far as I can tell is that the old x86_64 code did not enforce the RTC emulation when HPET was enabled, which is stupid. The 32 bit code did. This is fixed in .24-rc. tglx
Reply-To: david-b@pacbell.net
> ------- Comment #23 from tglx@linutronix.de 2007-11-18 17:31 -------
>
> We can not use the "non legacy mode" in a sane way as long as BIOSes
> are not providing irq routing for the non legacy case. Venki tried to
> enforce this, but it is really troublesome.
Well that's rude. Gotta love the extent to which BIOS vendors cripple the hardware. :( Next time I get a new PC, I'll hope it has an HPET so I can see how this affects the new RTC framework (and the rtc-cmos driver).
Jerry, have you tested current kernel, does the rtc work for you now?
I still see the problem in Debian kernel 2.6.24-1-686 v 2.6.24-5. I can give a vanilla kernel a try too.
Reply-To: david-b@pacbell.net On Monday 31 March 2008, bugme-daemon@bugzilla.kernel.org wrote: > I still see the problem in Debian kernel 2.6.24-1-686 v 2.6.24-5.
The simple test program works on Debian's 2.6.25
Reply-To: david-b@pacbell.net On Monday 19 May 2008, you wrote: > ------- Comment #28 from jlquinn@optonline.net
(In reply to comment #29)
> Reply-To: david-b@pacbell.net
>
> On Monday 19 May 2008, you wrote:
> > ------- Comment #28 from jlquinn@optonline.net 2008-05-19 16:24 -------
> > The simple test program works on Debian's 2.6.25
>
> So ... the bug is fixed in 2.6.25?
> Or is this a Debian-specific patch/config issue?
I see nothing in Debian's kernel package changelog to indicate they apply a patch for the problem, so I'd conclude it is fixed in 2.6.25. Thanks