Bug 10729
| Summary: | Freezes with kernel 2.6.25, worked perfectly with 2.6.24 | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Andreas Juch (kernel-bt) |
| Component: | x86-64 | Assignee: | David Brownell (dbrownell) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | bunk, celejar, dbrownell, tglx |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.25-2-amd64 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | dmesg on kernel 2.6.24; dmesg on kernel 2.6.25; configs and dmesg outputs of the tests i ran; kconfig tweaks | | |
Description
Andreas Juch
2008-05-16 20:13:20 UTC
Created attachment 16169 [details]
dmesg on kernel 2.6.24
Created attachment 16170 [details]
dmesg on kernel 2.6.25
I marked this as a regression and tentatively recategorised it to x86_64. Thomas, we've seen a few of those rtc messages - is it likely to be associated with this problem?

Yes, this might be the RTC changes in drivers/rtc vs. the legacy rtc code. David ?

Found it. I stopped chrony, the NTP synchronization daemon, and the system has now run without the lost interrupts messages for four and a half hours. No freezes up to now, so this might be a duplicate of bug #10369. If it still crashes, I'll write a comment. Thanks for now!

I speculate that the legacy RTC has HPET-related bugs still/again/... One experiment would be to try using a proper "RTC framework" config with rtc-cmos statically linked. Then try diddling HPET config options. I don't know that anything has changed recently, but as usual it would be good to verify the current (2.6.26-rc2) kernel has these issues...

I just tested 2.6.26-rc2 with the same config as 2.6.25 (make oldconfig) and it ran for 7 hours with chrony without freeze and without any lost interrupts messages. So the current development version doesn't seem to have that problem.

rtc-cmos doesn't seem to be used here. It's compiled as a module here and lsmod | grep rtc doesn't give any results.
Here's a grep of the kernel rtc config with 2.6.26-rc2:

    CONFIG_HPET_EMULATE_RTC=y
    # CONFIG_HPET_RTC_IRQ is not set
    CONFIG_RTC_LIB=m
    CONFIG_RTC_CLASS=m
    # RTC interfaces
    CONFIG_RTC_INTF_SYSFS=y
    CONFIG_RTC_INTF_PROC=y
    CONFIG_RTC_INTF_DEV=y
    # CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
    # CONFIG_RTC_DRV_TEST is not set
    # I2C RTC drivers
    CONFIG_RTC_DRV_DS1307=m
    CONFIG_RTC_DRV_DS1374=m
    CONFIG_RTC_DRV_DS1672=m
    CONFIG_RTC_DRV_MAX6900=m
    CONFIG_RTC_DRV_RS5C372=m
    CONFIG_RTC_DRV_ISL1208=m
    CONFIG_RTC_DRV_X1205=m
    CONFIG_RTC_DRV_PCF8563=m
    CONFIG_RTC_DRV_PCF8583=m
    CONFIG_RTC_DRV_M41T80=m
    # CONFIG_RTC_DRV_M41T80_WDT is not set
    CONFIG_RTC_DRV_S35390A=m
    # SPI RTC drivers
    CONFIG_RTC_DRV_MAX6902=m
    CONFIG_RTC_DRV_R9701=m
    CONFIG_RTC_DRV_RS5C348=m
    # Platform RTC drivers
    CONFIG_RTC_DRV_CMOS=m
    CONFIG_RTC_DRV_DS1511=m
    CONFIG_RTC_DRV_DS1553=m
    CONFIG_RTC_DRV_DS1742=m
    CONFIG_RTC_DRV_STK17TA8=m
    CONFIG_RTC_DRV_M48T86=m
    CONFIG_RTC_DRV_M48T59=m
    CONFIG_RTC_DRV_V3020=m
    # on-CPU RTC drivers

diff -u between .25 and .26 rtc config:

    --- /tmp/25.rtc 2008-05-19 06:49:54.000000000 +0200
    +++ /tmp/26.rtc 2008-05-19 06:50:04.000000000 +0200
    @@ -1,11 +1,7 @@
     CONFIG_HPET_EMULATE_RTC=y
    -CONFIG_RTC=y
     # CONFIG_HPET_RTC_IRQ is not set
    -CONFIG_SND_RTCTIMER=m
    -CONFIG_SND_SEQ_RTCTIMER_DEFAULT=y
     CONFIG_RTC_LIB=m
     CONFIG_RTC_CLASS=m
    -# Conflicting RTC option has been selected, check GEN_RTC and RTC
     # RTC interfaces
     CONFIG_RTC_INTF_SYSFS=y
     CONFIG_RTC_INTF_PROC=y

I don't know if I have time for playing with all HPET options since every option needs my computer to run for ~4 hours with it, but I'll try some of them.

Reply-To: david-b@pacbell.net
On Sunday 18 May 2008, bugme-daemon@bugzilla.kernel.org wrote:
> I just tested 2.6.26-rc2 with the same config as 2.6.25 (make oldconfig) and
> it
> ran for 7 hours with chrony without freeze and without any lost interrupts
> messages. So the current development version doesn't seem to have that
> problem.
>
> rtc-cmos doesn't seem to be used here.
> It's compiled as a module here and
> lsmod | grep rtc doesn't give any results.

You should try this with two different configs:

 - Your current RC2 config with the RTC framework, and rtc-cmos, statically linked;

 - Alternate config, without the RTC framework but with the legacy RTC driver statically linked

The freeze seemed to be related to the kernel handling the RTC update IRQ. Your "it works" configuration doesn't have the kernel handling that IRQ, so it's not a good verification that the problem went away ...

Ok. I tried the two configs with 2.6.25.4 and 2.6.26rc3. Here are the results:

* the second config crashed with both kernels
* the first config had lost interrupts in kernel 2.6.25.4, but they seem to disappear in .26

I hope that you meant "device drivers/character devices/enhanced real time clock support" with "legacy RTC driver"... I'll attach the complete kernel config of every tested kernel and config and the dmesg output of each of them. I hope I've done it right this time.

Created attachment 16267 [details]
configs and dmesg outputs of the tests i ran
Created attachment 16269 [details]
kconfig tweaks

> Ok. I tried the two configs with 2.6.25.4 and 2.6.26rc3.
> Here are the results:
>
> * the second config crashed with both kernels

Where "second config" is "legacy RTC driver in use", also HPET ... but not CONFIG_HPET_RTC_IRQ. I suggest you enable that (char drivers, under HPET); near as I can tell, disabling HPET_RTC_IRQ while HPET is in use should not be permitted. (If that checks out, this patch should probably get merged ... given an ACK from someone more HPET-savvy than me.)

> * the first config had lost interrupts in kernel 2.6.25.4,
> but they seem to disappear in .26

Where "first config/2.6.25.4" is invalid but ends up using the legacy RTC driver without HPET_RTC_IRQ ... while the "first config/2.6.26-rc3" is valid and uses only the RTC framework.

> I hope that you meant "device drivers/character devices/enhanced real time
> clock support" with "legacy RTC driver"

Yes. That's also clarified in this patch.

The patch seems to work. I have applied the patch to 2.6.25.4 and used the "goofy" config from the debian 2.6.25 kernels with make oldconfig. HPET_RTC_IRQ became y automagically and the system has been running with chrony for 18.5 hours. If you want, I can do the two test cases from above again with the patch, but I'll need a few days for that. Thanks a lot for your help!

So the current working config is:

    CONFIG_HPET_EMULATE_RTC=y
    CONFIG_RTC=y
    CONFIG_HPET_RTC_IRQ=y
    CONFIG_SND_RTCTIMER=m
    CONFIG_SND_SEQ_RTCTIMER_DEFAULT=y
    CONFIG_RTC_LIB=m
    CONFIG_RTC_CLASS=m
    # Conflicting RTC option has been selected, check GEN_RTC and RTC
    # RTC interfaces

Reply-To: david-b@pacbell.net
On Sunday 25 May 2008, bugme-daemon@bugzilla.kernel.org wrote:
> So the current working config is:

Actually there are now two working configs. The one you list is the legacy driver config. The other uses the newer code; your previous note reported that it works ok.
The question in my mind is now: does this "working" legacy config behave right on hardware that *doesn't* have HPET?

The newer code only worked in 2.6.26-rc3. I had lost interrupts with it in 2.6.25.4, but I didn't wait hours to see if it crashes. I'm now testing that "working" legacy config with HPET disabled in BIOS. I don't know if that's equal to a system without HPET.

Reply-To: david-b@pacbell.net
On Sunday 25 May 2008, bugme-daemon@bugzilla.kernel.org wrote:
> I'm now testing that "working" legacy config with HPET disabled in BIOS. I
> don't know if that's equal to a system without HPET.

If the kernel doesn't find and enable the HPET, it should be equivalent to an HPET-less system.

The system ran fine with HPET disabled for about 10 hours without lost interrupts or crash, so I guess it's ok on systems without HPET too.

Hi, I believe that I'm also seeing this bug on my laptop (Acer Aspire 690-2672, Intel 82801G (ICH7 Family)). Running Debian Sid, I get complete freezes (no screen activity, no response to input, can't ssh in) with nothing particularly helpful (I don't recall if I got the 'lost n interrupts' message) in the logs, after various amounts of uptime, but always within about fifteen minutes. I'm pretty sure that it is this bug, since after disabling chrony, as per Andreas' suggestion, the problem seems to have gone away. Also, the earliest I see the problem is immediately after init starts chrony.

Possibly useful information:
I) I'm seeing this with i386 (686)
II) The freezes come much earlier for me than the four hours reported by Andreas.

Had the same issue on IBM x336 running Debian Etch. The system stopped working (like explained by the others on this report) after doing a manual ntpdate. I have a cron job doing it every hour and 10 minutes. The server crashed 2 times after 25 hours of uptime. I'm now back on 2.6.24.2 and everything is fine again.
This is my RTC config:

    $ grep -i RTC linux-2.6.25.4/.config
    CONFIG_HPET_EMULATE_RTC=y
    CONFIG_RTC=y
    # CONFIG_RTC_CLASS is not set

So if you apply the Kconfig patch then rebuild your kernel, does it work? If not, I'd assume it's a different bug.

fixed by commit e6d2bb2bacb43ff03b0f458108d71981d58e775a
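
For anyone checking their own kernel config against the combination discussed in this thread (legacy RTC driver with HPET RTC emulation but without HPET_RTC_IRQ), the following is a rough Python sketch, not part of the fix itself. The function names `parse_config` and `suspect_rtc_combo` are made up for illustration, and treating an option that is absent from the file the same as one marked "is not set" is a simplifying assumption.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines of a .config.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def suspect_rtc_combo(options):
    """True for legacy RTC + HPET RTC emulation without HPET_RTC_IRQ.

    Assumption: an option missing from the file counts as not set.
    """
    legacy_rtc = options.get("CONFIG_RTC", "n") == "y"
    hpet_emulation = options.get("CONFIG_HPET_EMULATE_RTC", "n") == "y"
    hpet_rtc_irq = options.get("CONFIG_HPET_RTC_IRQ", "n") == "y"
    return legacy_rtc and hpet_emulation and not hpet_rtc_irq


if __name__ == "__main__":
    # RTC lines of the 2.6.25 config, as recoverable from the diff in this report.
    sample = (
        "CONFIG_HPET_EMULATE_RTC=y\n"
        "CONFIG_RTC=y\n"
        "# CONFIG_HPET_RTC_IRQ is not set\n"
    )
    print(suspect_rtc_combo(parse_config(sample)))
```

To check a real tree, read the file first, e.g. `parse_config(open("linux-2.6.25.4/.config").read())`. A positive result only says the config matches the pattern reported here; it does not prove a given machine will freeze.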