Bug 13053
Summary: | No HPET interrupts unless acpi_skip_timer_override is set. | ||
---|---|---|---|
Product: | ACPI | Reporter: | William Cattey (wdc) |
Component: | Config-Interrupts | Assignee: | Venkatesh Pallipadi (venki) |
Status: | CLOSED DOCUMENTED | ||
Severity: | normal | CC: | lenb, linux-bugs, Matt_Domsch, venki, yakui.zhao |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.28 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
acpidump output from Dell 760
dmesg output with boot options "idle=poll printk.time=1" dmesg output with kernel options: printk.time=1 acpi_skip_timer_override dmesg output with hpet=disable but NOT acpi_skip_timer_override dmesg output with hpet=disable and acpi_skip_timer_override dmesg with default but with printk.time=1 |
Description
William Cattey
2009-04-09 21:57:36 UTC
After posting this, I continued to investigate. I don't consider myself at all clueful in the ways of the kernel, but I figured I should keep working and learning. I was surprised that a find/grep over the kernel source only turned up 3 files that referenced acpi_skip_timer_override, that it was set in several places but only tested in one, in acpi_parse_int_src_ovr(); As near as I can tell, the effect of setting the flag is to print: BIOS IRQ0 pin2 override ignored. and then to NOT execute: mp_override_legacy_irq(intsrc->source_irq, intsrc->inti_flags & ACPI_MADT_POLARITY_MASK, (intsrc->inti_flags & ACPI_MADT_TRIGGER_MASK) >> 2, intsrc->global_irq); ---- Am I right in concluding that, after all these "overrides" and "legacies" are parsed out of the conversation like many double negatives, that what's really going on is that on my new hardware, I need to ENABLE a legacy IRQ that would normally no longer expected to be needed? In that case, are we indeed talking about a BIOS bug here? Will you please attach the output of acpidump? Will you please try the following boot options and see whether the issue still eixsts? a. nolapic_timer b. processor.max_cstate=1 Thanks. I'll run those tests. I predict processor.max_cstate=1 will result in a working system, because this is all about failure to receive interrupts when we're in C3 state. Is there any particular config that is or is not appropriate for the acpidump? IE will it be a valid dump if I take it on the system that has been booted with acpi_skip_timer_override set? (I'm unfamiliar with acpidump, so I'm going to guess that it's the table the BIOS sets with what should work in ACPI, and that the output will be useful no matter what interrupt structure I've set.) Created attachment 20927 [details]
acpidump output from Dell 760
Not appearing in this log file are two error messages that went to stderr:
Wrong checksum for TCPA
Wrong checksum for TCPA!
With BIOS setting, "processor.maxcstate-=1", as expected, all was fine. I'll note in passing that powertop showed the CPU spending most of its time in C state 1. With BIOS setting, "nolapic_timer", the system mis-behaved the same way as in the default setting. The boot sequence paused in initializing the Ethernet hardware until I hit the power switch to provoke a few interrupts. Thanks for the test. Will you please boot the system with the option of "idle=poll nolapic_timer" and see whether the box can be booted? If it can be booted, please attach the output of /proc/interrupts. Thanks. From the acpidump we know that there exists the timer interrupt override entry. If the hpet is disabled, the PIT time will be used and the system is fine. If the boot option of "acpi_skip_timer_override" is added, the timer interrupt override entry is ignored. In such case the hpet can work well. Maybe this is related with BIOS bug. Will you please boot the system with the boot option of "idle=poll printk.time=1" and attach the output of dmesg? Thanks. Hi, Venki Any idea about this bug? I booted with options "idle=poll nolapic_timer", and the system came up. I'm confused as to why powertop showed no C state information with those options set, and why the topmost interupts listed were from hpet2 and hpet3. I then booted with options "idle=poll printk.time=1" as per your instructions. I will attach the dmesg output from that. Created attachment 20962 [details]
dmesg output with boot options "idle=poll printk.time=1"
Oh I forgot to add: even with nolapic_timer UNSET, (as per the boot options I mentioned in comment #10 above,) the powertop output shows NO C-state information. Are we running ok because we're doing different interrupt handling, or because we never enter the C3 state? In an attempt to provide more useful information, I will reboot with options: "acpi_skip_timer_override printk.time=1" and provide that dmesg output. Created attachment 20965 [details]
dmesg output with kernel options: printk.time=1 acpi_skip_timer_override
Yes, I re-confirm that without the idle=poll option, I powertop shows us going into C States, and spending appreciable time in state C3. Here are the interesting parts from the dmesg Without acpi_skip_timer_override [ 0.452319] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 With acpi_skip_timer_override [ 0.452001] ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1 [ 0.456001] ..MP-BIOS bug: 8254 timer not connected to IO-APIC [ 0.456001] ...trying to set up timer (IRQ0) through the 8259A ... [ 0.456001] ..... (found apic 0 pin 0) ... [ 0.499609] ....... works. So, timer_override seem to be doing the right thing. Pointing us to IOAPIC pin2 and timer interrupts on IOAPIC pin 2 is working fine (as long as no C3 state). With skip timer overide, we look at wrong pin on IOAPIC (pin 9 instead of pin2) and we fail to see the timer interrupt. So, we fallback to getting timer interrupt through PIC (8259A) pin 0, which is working fine as well. So, the problem here is IRQ0 timer interrupt is broken when it comes through IOAPIC and when CPU is in C3 state. Somehow that interrupt is not reaching the CPU when low power state is involved. But, the same interrupt works when it comes through 8259A. And in both cases HPET generates the interrupt. One interesting experiment would be to disable HPET. We will use 8254 timer in that case and see what happens without and with skip_override. That way we will know whether the problem is specific to HPET or it is something related to IOAPIC. Can you please try "hpet=disable" boot option without and with acpi_skip_timer_override? Created attachment 20981 [details]
dmesg output with hpet=disable but NOT acpi_skip_timer_override
You didn't say whether or not to include printk.time=1. I presumed it would be benign but possibly usefully verbose so I included it.
/proc/cmdline reports my command line as:
BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed boot=casper initrd=/c
asper/initrd.gz hpet=disable printk.time=1 nosplash --
IMPORTANT! When I got to the part of the boot-up from CD where it unpacked the squashfs filesystem (at line 611 in the attached dmesg log), it PAUSED as if waiting for an interrupt. It resumed when I struck a key. (Other dmesg output shows the next line about udev starting around tick 50. It took longer here because it was waiting for an interrupt. It only took a little longer because I noticed the CD light quit flashing, was in a hurry, and hit a key quickly.)
Output from powertop looks like this:
Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 0.8%)
polling 0.0ms ( 0.0%)
C1 mwait 0.0ms ( 0.0%)
C2 0.0ms ( 0.0%)
C3 12.4ms (99.2%)
Wakeups-from-idle per second : 82.6 interval: 15.0s
no ACPI power usage estimate available
Top causes for wakeups:
35.3% ( 37.1) <interrupt> : extra timer interrupt
18.0% ( 19.0) <kernel IPI> : Rescheduling interrupts
13.6% ( 14.3) <interrupt> : uhci_hcd:usb4, uhci_hcd:usb7
12.7% ( 13.3) USB device 4-1 : DELL USB Laser Mouse (DELL)
4.6% ( 4.9) <kernel core> : hrtimer_start (tick_sched_timer)
3.9% ( 4.1) <kernel core> : usb_hcd_poll_rh_status (rh_timer_func)
I see in comment #6 I was asked to supply /proc/interrupts. I apologize. I forgot to do that. Here is /proc/interrupts for the attachment mentioned in commend #15: root@ubuntu:/proc# cat /proc/interrupts CPU0 CPU1 0: 28517 30050 IO-APIC-edge timer 1: 2 0 IO-APIC-edge i8042 4: 1 1 IO-APIC-edge 7: 0 0 IO-APIC-edge parport0 8: 0 0 IO-APIC-edge rtc0 9: 0 0 IO-APIC-fasteoi acpi 12: 2 2 IO-APIC-edge i8042 16: 1293 1192 IO-APIC-fasteoi uhci_hcd:usb3, HDA Intel, radeon@pci:0000:01:00.0 17: 4896 3748 IO-APIC-fasteoi uhci_hcd:usb4, uhci_hcd:usb7 18: 0 0 IO-APIC-fasteoi uhci_hcd:usb8 22: 1 2 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb5 23: 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb6 2299: 2056 1898 PCI-MSI-edge eth0 2300: 3264 2989 PCI-MSI-edge ahci NMI: 0 0 Non-maskable interrupts LOC: 17895 16127 Local timer interrupts RES: 18494 20785 Rescheduling interrupts CAL: 58 52 Function call interrupts TLB: 104 95 TLB shootdowns SPU: 0 0 Spurious interrupts ERR: 0 MIS: 0 Created attachment 20982 [details]
dmesg output with hpet=disable and acpi_skip_timer_override
/proc/cmdline says:
BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed boot=casper initrd=/casper/initrd.gz hpet=disable acpi_skip_timer_override printk.time=1 nosplash --
/proc/interrupts says:
CPU0 CPU1
0: 184 35 IO-APIC-edge timer
1: 2 0 IO-APIC-edge i8042
4: 1 1 IO-APIC-edge
7: 0 0 IO-APIC-edge parport0
8: 0 0 IO-APIC-edge rtc0
9: 0 0 IO-APIC-fasteoi acpi
12: 2 2 IO-APIC-edge i8042
16: 1232 1223 IO-APIC-fasteoi uhci_hcd:usb3, HDA Intel, radeon@pci:0000:01:00.0
17: 2067 2085 IO-APIC-fasteoi uhci_hcd:usb4, uhci_hcd:usb7
18: 0 0 IO-APIC-fasteoi uhci_hcd:usb8
22: 1 2 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb5
23: 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb6
2299: 973 953 PCI-MSI-edge eth0
2300: 2427 2437 PCI-MSI-edge ahci
NMI: 0 0 Non-maskable interrupts
LOC: 10986 11560 Local timer interrupts
RES: 14898 14205 Rescheduling interrupts
CAL: 49 63 Function call interrupts
TLB: 35 28 TLB shootdowns
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0
powertop says:
PowerTOP version 1.11 (C) 2007 Intel Corporation
Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 5.7%)
polling 0.0ms ( 0.0%)
C1 mwait 0.0ms ( 0.0%)
C2 0.2ms ( 0.8%)
C3 0.2ms (93.5%)
Wakeups-from-idle per second : 5563.4 interval: 10.0s
no ACPI power usage estimate available
Top causes for wakeups:
24.6% ( 21.1) <interrupt> : uhci_hcd:usb4, uhci_hcd:usb7
21.4% ( 18.4) USB device 4-1 : DELL USB Laser Mouse (DELL)
9.2% ( 7.9) <kernel IPI> : Rescheduling interrupts
8.5% ( 7.3) <interrupt> : eth0
8.3% ( 7.1) USB device 4-2 : Dell USB Keyboard (Dell)
7.9% ( 6.8) <kernel core> : hrtimer_start (tick_sched_timer)
Suggestion: Enable USB autosuspend by pressing the U key or adding
usbcore.autosuspend=1 to the kernel command line in the grub config
Note that the dmesg output attached in comment #17 shows the elapsed time between lines 613 and 614 (the aufs unpack to udev) to be 38 seconds not 60 seconds. That's because I didn't have to create an interrupt. Created attachment 20983 [details]
dmesg with default but with printk.time=1
Here is more detailed info on the bad state.
I kept more careful counts: 25 hits of power switch till keyboard was up.
1000 keystrokes until mouse was up.
/proc/cmdline:
BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed boot=casper initrd=/casper/initrd.gz printk.time=1 nosplash --
/proc/interrupts:
CPU0 CPU1
0: 149 2 IO-APIC-edge timer
1: 2 0 IO-APIC-edge i8042
4: 1 1 IO-APIC-edge
7: 0 0 IO-APIC-edge parport0
8: 0 1 IO-APIC-edge rtc0
9: 18 7 IO-APIC-fasteoi acpi
12: 3 1 IO-APIC-edge i8042
16: 1315 1192 IO-APIC-fasteoi uhci_hcd:usb3, HDA Intel, radeon@pci:0000:01:00.0
17: 6679 4592 IO-APIC-fasteoi uhci_hcd:usb4, uhci_hcd:usb7
18: 0 0 IO-APIC-fasteoi uhci_hcd:usb8
22: 3 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb5
23: 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb6
2294: 661 505 PCI-MSI-edge eth0
2295: 2495 2361 PCI-MSI-edge ahci
2302: 0 11137 HPET_MSI-edge hpet3
2303: 9712 0 HPET_MSI-edge hpet2
NMI: 0 0 Non-maskable interrupts
LOC: 385 362 Local timer interrupts
RES: 15896 17538 Rescheduling interrupts
CAL: 81 56 Function call interrupts
TLB: 71 107 TLB shootdowns
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0
powertop:
PowerTOP version 1.11 (C) 2007 Intel Corporation
Cn Avg residency P-states (frequencies)
C0 (cpu running) ( 1.7%)
polling 0.0ms ( 0.0%)
C1 mwait 0.0ms ( 0.0%)
C2 4.9ms ( 0.1%)
C3 9.9ms (98.3%)
Wakeups-from-idle per second : 99.3 interval: 5.0s
no ACPI power usage estimate available
Top causes for wakeups:
32.2% ( 80.8) <interrupt> : uhci_hcd:usb4, uhci_hcd:usb7
32.1% ( 80.6) USB device 4-1 : DELL USB Laser Mouse (DELL)
14.8% ( 37.0) <kernel IPI> : Rescheduling interrupts
9.8% ( 24.6) <interrupt> : hpet3
3.0% ( 7.6) <interrupt> : hpet2
1.6% ( 4.0) <interrupt> : eth0
Suggestion: Enable USB autosuspend by pressing the U key or adding
usbcore.autosuspend=1 to the kernel command line in the grub config
We're getting our clock interrupts from the mouse. :-D
OK. So, you basically need acpi_skip_timer_override both with and without hpet=disable, to boot this system normally. That means the problem is not with HPET hardware itself. But somewhere down along the IOAPIC. Following up on my comment #14, acpi_skip_timer_override is actually forcing us to use IRQ0 timer based on "Virtual Wire Mode via I/O APIC" which is basically 8259 forwarding the interrupt to IOAPIC, in place of IRQ line directly connected to IOAPIC. And the strangest thing is that this interrupt delivery is only broken with C3 state. So, this looks like a BIOS/hardware issue at this point rather than any bug in kernel. Not sure whats the best resolution we can get to, other than complaining about this to BIOS folks. We can surely add a dmi detection for this platform and force the use of "Virtual wire mode" for this platform. May be that is a good enough fix for now. But, we should probably watch out for this problem on other platforms... Adding Dell team who may be able to assist if it is diagnosed as a BIOS bug. Can you try one more experiment. Can you try latest .29.x kernel from kernel.org and see whether you see this waiting for the interrupt issue wuthout any special boot parameter. Venkatesh, I'm off site for a long weekend, and I need to learn how to create a bootable USB drive with a kernel that isn't tied to a distro, so there will be a few days delay in complying with your request, but yes, I'll do that test. It took RATHER longer than I hoped, but after wiping Ubuntu, installing Fedora 11 beta (because Fedora has the ability to run with a stock kernel in addition to the distro-patched kernel. Ubuntu may too, but I didn't see the feature advertised or what the procedure was to try and use a stock kernel.), and building a stock kernel from scratch, I confirm that the problem STILL occurs on a vanilla .29.1 kernel. Same symptom: You need to generate clock interrupts by hand unless you set acpi_skip_timer_override Good news! Dell has released an updated BIOS that remedies the problem! Rev A03 for the Dell Optiplex 760 can be found at: http://ftp.us.dell.com/bios/O760-A03.EXE I have updated my test system to this revision, and confirmed that HPET interrupts now correctly reach the Linux kernel with C states enabled. This bug can now be CLOSED. Thats good news. Closing this bug. |