Most recent kernel where this bug did *NOT* occur: 2.6.16, briefly
Distribution: Gentoo
Hardware Environment: ASUS P5NSLI, NVidia MCP51 BIOS
Software Environment:
Linux Merckx 2.6.20-gentoo-r3 #1 SMP PREEMPT Tue Mar 20 05:42:58 CDT 2007 i686 Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz GenuineIntel GNU/Linux

Problem Description:
The ACPI BIOS seems to have a broken connection to the timer.

dmesg:
...
Total of 2 processors activated (7469.77 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... works.
checking TSC synchronization across 2 CPUs: passed.

Which *seems* to work, BUT:

jesnow@Merckx ~ $ cat /proc/interrupts
           CPU0       CPU1
  0:    3155472          0    XT-PIC-XT        timer
  1:       1794          0   IO-APIC-edge      i8042
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          4          0   IO-APIC-edge      i8042
 14:         50          0   IO-APIC-edge      ide0
 16:      57713          0   IO-APIC-fasteoi   eth0
 17:     120591          0   IO-APIC-fasteoi   libata, ohci_hcd:usb2
 18:          0          0   IO-APIC-fasteoi   libata
 19:       1557          0   IO-APIC-fasteoi   ehci_hcd:usb1
 20:       6708          0   IO-APIC-fasteoi   HDA Intel
 21:     473644          0   IO-APIC-fasteoi   nvidia
NMI:          0          0
LOC:    3089598    3089888
ERR:          1
MIS:          0

CPU1 does not process interrupts, which leaves me essentially with a uniprocessor system, especially since I/O-intensive activity often blocks the whole machine while one whole core sits idle. I also have random lockups, but cannot prove they are related.

Back in kernel 2.6.16, for a brief while, the command line param acpi_skip_timer_override worked, both CPUs processed interrupts, and everything was fine. I am not sure when that changed.

Steps taken since:
Flashed BIOS to newest version (1001): no effect
Used kernel params:
- pci=bios: no effect
- acpi_skip_timer_override: no effect
- no_timer_check: freeze after 'io scheduler cfq registered'
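
The symptom reported above (every numbered IRQ row showing 0 for CPU1) can be checked mechanically. The following is only a minimal sketch: the helper names per_cpu_totals and idle_cpus are made up for illustration, and it assumes the usual /proc/interrupts layout of a CPU header row followed by "label: count count ... description" rows.

```python
def per_cpu_totals(text):
    """Sum device interrupt counts per CPU from /proc/interrupts-style text.

    Only rows with a numeric IRQ label are counted; summary rows such as
    NMI, LOC, ERR and MIS are skipped.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        if not sep or not label.strip().isdigit():
            continue                     # not a numbered IRQ row
        fields = rest.split()
        for cpu, count in zip(cpus, fields[:len(cpus)]):
            if count.isdigit():
                totals[cpu] += int(count)
    return totals


def idle_cpus(text):
    """Return the CPUs that have handled no numbered IRQs at all."""
    return [cpu for cpu, n in per_cpu_totals(text).items() if n == 0]
```

On a live system it could be called as idle_cpus(open("/proc/interrupts").read()); a non-empty result shortly after boot is not conclusive by itself, but after millions of timer ticks, as in the report, it points to all device interrupts landing on one CPU.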
CPU1 won't handle interrupts unless irqbalance is run. See http://www.irqbalance.org
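
For reference, irqbalance works by adjusting per-IRQ CPU affinity, and the same thing can be tried by hand through /proc/irq/<n>/smp_affinity, which takes a hexadecimal CPU bitmask. This is a rough sketch only: the function names and the proc_root parameter are illustrative, writing the file needs root, and whether the kernel honours the mask on this board is not known.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string for a list of CPU indices.

    Bit N set means CPU N may service the interrupt, e.g. [1] -> "2".
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Write the mask to <proc_root>/irq/<irq>/smp_affinity (needs root)."""
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as fh:
        fh.write(affinity_mask(cpus))
    return path
```

For example, set_irq_affinity(16, [1]) would ask the kernel to route IRQ 16 (eth0 in the report) to CPU1. The write can fail for interrupts that cannot be moved, which may well include the timer on IRQ 0 here since it is delivered through the XT-PIC.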
I had understood that irqbalance was no longer needed, and in fact during the brief period that everything worked, I did not have it, but OK, actually the result is interesting:

Merckx jesnow # equery list irqbalance
[ Searching for package 'irqbalance' in all categories among: ]
 * installed packages
[I--] [  ] sys-apps/irqbalance-0.55 (0)
Merckx jesnow # /etc/init.d/irqbalance start
 * irqbalance: your machine lacks different physical processors; not enabling

***
Yo, that's interesting: They're both there:

Merckx jesnow # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
stepping        : 6
cpu MHz         : 1867.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3736.14
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
stepping        : 6
cpu MHz         : 1867.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3733.63
clflush size    : 64

***
Curiouser and curiouser.
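
One possible reading of the irqbalance message is that it counts physical packages rather than cores: both entries in the cpuinfo output above carry "physical id : 0", so there are two logical processors but only one package. The sketch below illustrates that distinction only; it is an assumption about what the message means and has not been checked against the irqbalance 0.55 init script, and cpu_topology is a made-up helper name.

```python
def cpu_topology(text):
    """Return (logical_cpus, physical_packages) from /proc/cpuinfo-style text.

    Logical CPUs are counted by "processor" entries; packages are the
    distinct values seen in "physical id" entries.
    """
    logical = 0
    packages = set()
    for ln in text.splitlines():
        key, sep, value = ln.partition(":")
        if not sep:
            continue                    # blank line or no key/value pair
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value)
    return logical, len(packages)
```

For the output shown in this report the result would be two logical CPUs in one package, which would match a check that refuses to start on single-socket machines even when they are dual-core.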
> * irqbalance: your machine lacks different physical processors; not enabling

Whelp, that is an interesting policy that we should alert the irqbalance folks to. You're not the first to be surprised by it.

> acpi_skip_timer_override

Well, yes, there has been some mucking about with the nvidia quirk lately. Please attach the complete output from dmesg -s64000 from the old working kernel, and also from the new kernel.

The workaround now depends on whether the machine has an HPET. Also, acpi_use_timer_override was added.
Created attachment 10913
dmesg from my current kernel
dmesg from my current kernel attached. Unfortunately, I can't reproduce the "good" behavior I remember with the earlier kernels. I used acpi_skip_timer_override=1 to no avail.
*** This bug has been marked as a duplicate of bug 7884 ***