Hello,

after upgrading from 2.6.37.5 to 2.6.38.7, the hwclock command times out on a kvm guest system:

[root@hwclock ~]# hwclock
select() to /dev/rtc to wait for clock tick timed out

Relevant strace output:

ioctl(4, RTC_UIE_ON, 0) = 0
select(5, [4], NULL, NULL, {5, 0}) = 0 (Timeout)

The kvm host system is CentOS 5.6. Another virtual machine running on a Fedora 14 kvm host works fine.

I've bisected it down to this commit:

# git bisect bad
6610e0893b8bc6f59b14fed7f089c5997f035f88 is the first bad commit
commit 6610e0893b8bc6f59b14fed7f089c5997f035f88
Author: John Stultz <john.stultz@linaro.org>
Date: Thu Sep 23 15:07:34 2010 -0700

    RTC: Rework RTC code to use timerqueue for events

    This patch reworks a large portion of the generic RTC code to in-effect virtualize the rtc interrupt code.

    The current RTC interface is very much a raw hardware interface. Via the proc, /dev/, or sysfs interfaces, applciations can set the hardware to trigger interrupts in one of three modes:

    AIE: Alarm interrupt
    UIE: Update interrupt (ie: once per second)
    PIE: Periodic interrupt (sub-second irqs)

    The problem with this interface is that it limits the RTC hardware so it can only be used by one application at a time.

    The purpose of this patch is to extend the RTC code so that we can multiplex multiple applications event needs onto a single RTC device. This is done by utilizing the timerqueue infrastructure to manage a list of events, which cause the RTC hardware to be programmed to fire an interrupt for the next event in the list.

    In order to preserve the functionality of the exsting proc,/dev/ and sysfs interfaces, we emulate the different interrupt modes as follows:

    AIE: We create a rtc_timer dedicated to AIE mode interrupts. There is only one per device, so we don't change existing interface semantics.

    UIE: Again, a dedicated rtc_timer, set for periodic mode, is used to emulate UIE interrupts. Again, only one per device.

    PIE: Since PIE mode interrupts fire faster then the RTC's clock read granularity, we emulate PIE mode interrupts using a hrtimer. Again, one per device.

    With this patch, the rtctest.c application in Documentation/rtc.txt passes fine on x86 hardware. However, there may very well still be bugs, so greatly I'd appreciate any feedback or testing!

    Signed-off-by: John Stultz <john.stultz@linaro.org>
    LKML Reference: <1290136329-18291-4-git-send-email-john.stultz@linaro.org>
    Acked-by: Alessandro Zummo <a.zummo@towertech.it>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    CC: Alessandro Zummo <a.zummo@towertech.it>
    CC: Thomas Gleixner <tglx@linutronix.de>
    CC: Richard Cochran <richardcochran@gmail.com>
----------------------------------

Reverting the commit cures the issue. I've also tested kernel 3.0.0-rc1, the problem is still there.

As one virtual machine is running fine and another one isn't, this might be a bug in the "old" CentOS 5.6 kvm host system?

Relevant kernel config:

[linux-2.6]# grep RTC .config
CONFIG_HPET_EMULATE_RTC=y
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_DEBUG is not set
# RTC interfaces
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# Platform RTC drivers
CONFIG_RTC_DRV_CMOS=m

[root@hwclock ~]# grep -i rtc /var/log/dmesg
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram

Best regards,
Thomas
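For anyone who wants to check a guest without going through hwclock, the two calls from the strace output above can be repeated with a short script. This is only a minimal sketch and not hwclock's actual code: the function name `wait_for_rtc_tick` is made up for illustration, and the ioctl numbers are assumed to follow the x86 encoding of `_IO('p', 0x03)` and `_IO('p', 0x04)` from `<linux/rtc.h>` (other architectures may encode them differently).

```python
import fcntl
import os
import select

# Assumed x86 values of RTC_UIE_ON / RTC_UIE_OFF, i.e. _IO('p', 0x03) and
# _IO('p', 0x04) from <linux/rtc.h>. Other architectures may differ.
RTC_UIE_ON = (ord("p") << 8) | 0x03
RTC_UIE_OFF = (ord("p") << 8) | 0x04


def wait_for_rtc_tick(device="/dev/rtc", timeout=5.0):
    """Return True if an update interrupt arrives within `timeout` seconds.

    Hypothetical helper: enables update interrupts, then waits with select()
    for the device to become readable, as seen in the strace output.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, RTC_UIE_ON, 0)
        try:
            readable, _, _ = select.select([fd], [], [], timeout)
            return bool(readable)
        finally:
            # Always switch update interrupts off again.
            fcntl.ioctl(fd, RTC_UIE_OFF, 0)
    finally:
        os.close(fd)
```

Run as root, `wait_for_rtc_tick()` would be expected to return False after five seconds on an affected guest and True within about a second on a working one.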
Thanks for the bug report! I'm looking into it.
Thomas: Are the two virtual machines identical? Could you provide the full guest dmesg output, using the same guest kernel, from both the working and the non-working host?
Another clarification: Is the guest kernel x86_64 or i686? So far I'm unable to reproduce on actual x86_64 hardware. Trying to see if I can trigger it under kvm.
So far, I've not been able to reproduce with an x86_64 guest on an x86_64 host (with recent kvm). It definitely seems like it could be a kvm/qemu triggered issue (which makes reproducing it somewhat more difficult). Thomas: Could you also provide the "kvm -version" output?
Created attachment 62042: Dmesg output of working box
Created attachment 62052: Dmesg output of broken box
Both systems are installed from the same .iso image. Guest system kernel is i686 + PAE, the VM hardware is set to "x86_64" on both boxes.

*** Working box ***
KVM version: 0.13.0
Host system kernel: 2.6.35.13-92.fc14.x86_64
dmesg output in "dmesg.working.kvm_0.13.0"

*** Broken box ***
KVM version: 0.10.50
Host system kernel: 2.6.35.8-59.x86_64
dmesg output in "dmesg.broken.0.10.50"

Hope this helps.
My apologies, the last few weeks have been busy and I forgot to follow up here. One interesting difference between the broken and working virtual machines is that the broken one doesn't seem to have HPET functionality. I'll try to look through the code to see how the HPET may be involved.
Also, on the guest kernel, is CONFIG_RTC_INTF_DEV_UIE_EMUL enabled?
(In reply to comment #9)
> Also, on the guest kernel, is CONFIG_RTC_INTF_DEV_UIE_EMUL enabled?

This option is not set.
Ok, so I've reproduced this by using qemu-0.10.5 as you described above with Fedora 15. I suspect it's a kvm issue, where kvm didn't properly support AIE (alarm) mode interrupts from the RTC. Now that the kernel emulates periodic update interrupts (UIE mode irqs) via AIE, we're missing the irq. I'll be working on a debug patch to narrow down whether my theory is correct.
So yeah, unfortunately my theory is right. Apparently kvm/qemu did not support AIE (alarm) mode RTC interrupts in the older 0.10.5 releases, although it did seemingly support UIE mode (and likely PIE mode as well). With the RTC virtualization changes, the kernel now uses AIE mode for all RTC interrupts. Since kvm/qemu doesn't support those, the bug described here manifests.

I've bisected the fix on the qemu side down to eeb7c03c0f49a8678028a734f1d6575f36a44edc (Add rtc reset function), which showed up in 0.11. Unfortunately it depends on numerous earlier patches and likely won't be easy to back-port to 0.10.5.

So now I'm trying to weigh how valid it is to add hackish fixes to the kernel in order to support old and incomplete emulation environments.
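To make the failure mode easier to picture, here is a toy simulation of the mechanism as I understand it from this thread: events sit in a queue ordered by expiry, the hardware alarm is programmed for the earliest one, and a periodic one-second event stands in for UIE. It is purely illustrative and not kernel code. `FakeRtc`, `RtcCore` and `uie_ticks` are made-up names, and the `supports_aie` flag is an assumption that models an emulator which accepts the alarm but never raises the interrupt.

```python
import heapq


class FakeRtc:
    """Hypothetical stand-in for the (emulated) RTC hardware."""

    def __init__(self, supports_aie):
        self.supports_aie = supports_aie
        self.alarm = None  # absolute second at which the alarm should fire

    def set_alarm(self, when):
        self.alarm = when

    def poll(self, now):
        """Return True if an alarm interrupt is raised at time `now`."""
        if self.supports_aie and self.alarm is not None and now >= self.alarm:
            self.alarm = None
            return True
        return False


class RtcCore:
    """Queues events and programs the hardware alarm for the earliest one."""

    def __init__(self, rtc):
        self.rtc = rtc
        self.queue = []  # min-heap of (expiry, period, name)

    def add_timer(self, expiry, period, name):
        heapq.heappush(self.queue, (expiry, period, name))
        self.rtc.set_alarm(self.queue[0][0])

    def run(self, seconds):
        """Advance time one second at a time; return the events delivered."""
        delivered = []
        for now in range(1, seconds + 1):
            if not self.rtc.poll(now):
                continue  # no interrupt, so nothing is dequeued
            while self.queue and self.queue[0][0] <= now:
                expiry, period, name = heapq.heappop(self.queue)
                delivered.append((now, name))
                if period:
                    # Periodic events (the emulated UIE) are re-armed.
                    heapq.heappush(self.queue, (expiry + period, period, name))
            if self.queue:
                self.rtc.set_alarm(self.queue[0][0])
        return delivered


def uie_ticks(supports_aie, seconds=5):
    """Names of the events delivered when only emulated UIE is enabled."""
    core = RtcCore(FakeRtc(supports_aie))
    core.add_timer(expiry=1, period=1, name="uie")
    return [name for _, name in core.run(seconds)]
```

In this model, `uie_ticks(True)` delivers one tick per simulated second, while `uie_ticks(False)` delivers none, which matches what hwclock sees as a select() timeout.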
CC'ing Avi Kivity to give him a heads up.
Some more info on this: The affected server is running CentOS 5, so basically this will affect RHEL 5, too. Also the ancient VMware Server 1.x shows the same issue (found while testing the upgrade to 2.6.39.3 on it).
Talked with Avi Kivity, and I think the right approach here is to get the qemu/kvm AIE mode enablement from 0.11 backported to the RHEL5 version (0.10) of qemu/kvm.
RHEL5 bug filed: https://bugzilla.redhat.com/show_bug.cgi?id=725876
Thomas Jarosch: According to the RH bug, as of kvm-83-246.el5 and 2.6.38.6-26.rc1.fc15.x86_64 the issue should be fixed by an updated kvm. I'm marking this as resolved, but let me know if it continues to be an issue.