Distribution: Slackware 9.0
Hardware Environment: Tyan Tiger S2723GNN mainboard, 1 GB RAM, two Xeon 2.66 GHz CPUs with Hyperthreading, BIOS version 1.02 (newest)
Software Environment: Unmodified monolithic 2.5.70 Linux kernel, booted with LILO 22.5
Problem Description: During ACPI initialization I get the following output 100 times:
ACPI: Subsystem revision 20030522
irq 9: nobody cared!
Call Trace:
 [<c010b4a2>] handle_IRQ_event+0x87/0xf7
 [<c010b758>] do_IRQ+0xbe/0x17b
 [<c0109b78>] common_interrupt+0x18/0x20
 [<c010be45>] setup_irq+0xc3/0xff
 [<c022bb36>] acpi_irq+0x0/0x16
 [<c022bb36>] acpi_irq+0x0/0x16
 [<c010b8be>] request_irq+0xa9/0xde
 [<c022bb86>] acpi_os_install_interrupt_handler+0x3a/0x59
 [<c022bb36>] acpi_irq+0x0/0x16
 [<c022bb36>] acpi_irq+0x0/0x16
 [<c022fb49>] acpi_ev_install_sci_handler+0x1a/0x1e
 [<c022fb0c>] acpi_ev_sci_xrupt_handler+0x0/0x18
 [<c022f58b>] acpi_ev_handler_initialize+0x6/0x6e
 [<c024056e>] acpi_enable_subsystem+0x2b/0x58
 [<c03e964c>] acpi_bus_init+0x7a/0x115
 [<c03e973d>] acpi_init+0x56/0xa6
 [<c03d685a>] do_initcalls+0x28/0x94
 [<c0130660>] init_workqueues+0xf/0x26
 [<c01050c9>] init+0x5a/0x1d1
 [<c010506f>] init+0x0/0x1d1
 [<c0107075>] kernel_thread_helper+0x5/0xb
handlers:
[<c022bb36>] (acpi_irq+0x0/0x16)
Additionally, the number of ACPI interrupts is excessive: nearly 4.5 million ACPI interrupts per minute. The machine does not have this problem when running 2.4.21-rc7, where the ACPI interrupt is configured in XT-PIC mode, whereas 2.5.70 configures it to operate in IO-APIC-level mode. robert.moore@intel.com says it may be due to a missing interrupt source override for the ACPI SCI interrupt. The ACPI backport in 2.4.21-ac seems to have a similar problem (see Bug #370).
Steps to reproduce: Run 2.5.70 on the aforementioned board. Alternatively, send me patches to try or ask me for additional information.
Make sure you have the latest BIOS version and the latest ACPI. If you still have the problem, please attach dmesg as well as the output of acpidmp.
Created attachment 724 [details] acpidmp output Used Linux 2.4.22-rc3 (should be latest ACPI) Used latest Tyan BIOS
Created attachment 725 [details] dmesg output Used Linux 2.4.22-rc3 (should be latest ACPI) Used latest Tyan BIOS
Problem still persists. IRQ is triggered excessively:
           CPU0       CPU1       CPU2       CPU3
  0:       2125          0          0      44845    IO-APIC-edge  timer
  1:          2          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
  9:    1664680    1697269   36288240          0   IO-APIC-level  acpi
 14:       1228         19        745          0    IO-APIC-edge  ide0
 17:        121      28567          0          0   IO-APIC-level  eth3
 19:      91028          0          0          0   IO-APIC-level  eth4
 24:          7          0          0        231   IO-APIC-level  eth2
 48:        153          0       2325          0   IO-APIC-level  eth0
NMI:      46849      46849      46849      46849
LOC:      46826      46825      46824      46823
ERR:          0
MIS:          0
Comment on attachment 724 [details] acpidmp output
> http://hell.wh8.tu-dresden.de/acpi/acpidmp
NMI: 46849 46849 46849 46849
Isn't this abnormal?
It's not abnormal. The kernel is running with NMI watchdog. See kernel command line in dmesg output. When booting without NMI watchdog there are no NMIs.
Created attachment 765 [details] patch for collecting IO-APIC config information Would you please apply this patch and send the dmesg output? I want to verify whether the IO-APIC is configured as expected. Thanks a lot.
Created attachment 766 [details] dmesg output with collected IO-APIC config information Patch was originally for 2.5, I've ported it to 2.4.22, since that's what the machine is currently running. Results should be the same.
Created attachment 777 [details] what could happen ? I want to see what could happen, if irq for SCI gets changed.
Created attachment 778 [details] dmesg output with changed SCI interrupt Changing SCI interrupt to 11 fixes the problem.
Created attachment 779 [details] Output of /proc/interrupts with changed SCI interrupt Changing ACPI interrupt to 11 no longer causes excessive interrupts as /proc/interrupts shows.
Created attachment 819 [details] I hope this patch can resolve your problem!
The patch does not fix the problem. With the patch applied the ACPI interrupt is still IRQ9 and is being triggered excessively. The only patch that helped so far was the one that configured the ACPI interrupt to IRQ11 the hard way.
Created attachment 820 [details] To verify if default setting can work Would you please give it a try (with the previous patch applied), and post dmesg and /proc/interrupts? Thanks a lot.
Created attachment 821 [details] dmesg output of 2.4.23-pre2 + patches (attachment id #819 + #820). The combination of both patches fixes the problem again, now with ACPI interrupt at IRQ9.
Created attachment 822 [details] /proc/interrupts of 2.4.23-pre2 + patches (attachment id #819 + #820). The combination of both patches fixes the problem again, now with ACPI interrupt at IRQ9. However, there are now only 2 CPUs. There should be 4, because the machine has Hyperthreading.
Re: HT
Total of 2 processors activated (10643.04 BogoMIPS).
cpu_sibling_map[0] = 1
cpu_sibling_map[1] = 0
This dmesg output shows that the two siblings inside a package came up, but neither of the siblings inside the other package came up. It is unclear why or how the code changes in this report could have an effect on HT. BTW, acpismp=force should not be needed in 2.4.23. Let me know if it changes the behaviour, because I removed it in 2.4.22...
Re: IRQ9 storm
Stopping the storm by masking or mis-programming IRQ9 is a clue, but the real question is what conditions are necessary for you to correctly receive and handle ACPI interrupts. Is it possible for you to go back to your base 2.4.22 or 2.4.23 source tree and try only these three combinations, one at a time?
arch/i386/kernel/mpparse.c:
void __init mp_config_ioapic_for_sci(int irq)
...
old: io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 1); // Active low, level triggered
new: io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 1); // Active high, level triggered
or
new: io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 0); // Active low, edge triggered
or
new: io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 0); // Active high, edge triggered
and check if
1. the IRQ9 storm stopped
2. the other interrupts in the system work
3. you can provoke and handle an ACPI interrupt, e.g. the power button does something
(and I guess also verify that you have all 4 logical processors)
Re: Only one physical package with two siblings came up.
-- This is unrelated to any of the patches posted here and is caused by changes introduced somewhere in the 2.4.23-pre patches. I have gone back to plain 2.4.22 for testing this issue.
Re: IRQ9 storm (with 2.4.22 there are always 4 logical CPUs)
old: io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 1); // Active low, level triggered
-- causes IRQ9 storm, other ints work, unclear whether acpi int works due to storm
new: io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 1); // Active high, level triggered
-- no IRQ9 storm, other ints work, acpi int doesn't work
new: io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 0); // Active low, edge triggered
-- no IRQ9 storm, other ints work, acpi int works (pwr button)
new: io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 0); // Active high, edge triggered
-- no IRQ9 storm, other ints work, acpi int works (pwr button)
I can also supply dmesg output for all 4 test cases with 2.4.22, but they only differ in the output of the following line:
IOAPIC[0]: Set PCI routing entry (8-9 -> 0x71 -> IRQ 9 Mode:X Active:Y)
where X and Y form the four combinations mentioned above.
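For reference, the four results above can be written down as a small table. This is only a sketch: the struct, its field names and the helper functions are made up for illustration and are not kernel code. The polarity and trigger labels follow the comments in the test lines above, and the storm case records "acpi int works" as false because it could not be determined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one test run; not a kernel data structure. */
struct sci_trial {
    int active_low;     /* 4th argument as labelled in the thread: 1 = active low */
    int level;          /* 5th argument as labelled in the thread: 1 = level triggered */
    bool irq9_storm;    /* IRQ9 triggered excessively */
    bool other_irqs_ok; /* other interrupts still work */
    bool sci_works;     /* power button event seen; false when unknown */
};

/* The four combinations reported for 2.4.22 on this board. */
static const struct sci_trial sci_trials[] = {
    { 1, 1, true,  true, false }, /* active low,  level: storm */
    { 0, 1, false, true, false }, /* active high, level: no SCI delivered */
    { 1, 0, false, true, true  }, /* active low,  edge:  works */
    { 0, 0, false, true, true  }, /* active high, edge:  works */
};

/* A setting is usable when there is no storm and all interrupts work. */
static bool sci_trial_usable(const struct sci_trial *t)
{
    return !t->irq9_storm && t->other_irqs_ok && t->sci_works;
}

/* Count usable settings for one trigger mode (level = 1, edge = 0). */
static size_t count_usable_trials(int level)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof(sci_trials) / sizeof(sci_trials[0]); i++)
        if (sci_trials[i].level == level && sci_trial_usable(&sci_trials[i]))
            n++;
    return n;
}
```

Read this way, both edge-triggered settings are usable and neither level-triggered setting is.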
Thanks for testing the combinations -- I think the solution is now clear. The SCI hardware on this board is sending a pulse low->high->low, so either of the edge triggers works, but it isn't high long enough to trigger the level-high interrupt.
It seems the fix for this bug is that if the SCI has no over-ride, we should leave it alone instead of forcing it to level/low. In this case it would have remained at edge/high as programmed by the BIOS and shown in the 1st IO-APIC dmesg output above.
Will send "official" patch shortly...
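A rough sketch of the rule proposed in this comment, for readers following along. The types and the function name are hypothetical and this is not the patch itself; it only illustrates "use the over-ride if there is one, otherwise keep what the BIOS programmed".

```c
#include <stdbool.h>

/* Hypothetical types for illustration; these are not kernel definitions. */
enum sci_polarity { SCI_ACTIVE_HIGH, SCI_ACTIVE_LOW };
enum sci_trigger  { SCI_EDGE, SCI_LEVEL };

struct sci_mode {
    enum sci_polarity polarity;
    enum sci_trigger  trigger;
};

/*
 * Pick the mode to program for the SCI:
 * - if an interrupt source over-ride exists, use what it says;
 * - otherwise leave the BIOS setting alone instead of forcing level/low.
 */
struct sci_mode choose_sci_mode(bool has_override,
                                struct sci_mode override_mode,
                                struct sci_mode bios_mode)
{
    return has_override ? override_mode : bios_mode;
}
```

On this board there is no over-ride, so under this rule the SCI would stay at the BIOS setting of edge/high.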
Created attachment 862 [details] please try this proposed patch against 2.4.22
Patch works as expected.
Is this merged back yet? can we close it?
Fix has been merged in 2.4 and 2.6.
The fix broke other machines. Does WinXP work on this box?
The machine is a permanently running Linux router. Running WinXP on it is not an option.
In fixing bug 1622 we got clarification on the ACPI spec: if no interrupt source over-ride is present, the OS should program the SCI as level/low. 2.4.26 and 2.6.5 now do this. If our previous experiments on this board are valid, then the board is out of spec and that fix will break it. The boot-time parameters
acpi_sci=edge
acpi_sci=high
are now available in that event. Let me know how it goes.
thanks,
-Len
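To illustrate how the two parameters combine with the new default, here is a userspace sketch. It is not the kernel's option parser: the struct and function names are invented, and it handles only the two values mentioned in this comment. It assumes the default is level/low and that each parameter flips one property.

```c
#include <string.h>

/* Hypothetical SCI setting; zero-initialised means level/low (the default). */
struct sci_setting {
    int edge;        /* 1 = edge triggered, 0 = level triggered */
    int active_high; /* 1 = active high,    0 = active low */
};

/*
 * Apply one command-line word to the setting.
 * Returns 1 if the word was a recognised acpi_sci= option, 0 otherwise.
 * Only "edge" and "high" are handled in this sketch.
 */
int apply_acpi_sci_option(struct sci_setting *s, const char *arg)
{
    static const char prefix[] = "acpi_sci=";
    size_t n = sizeof(prefix) - 1;
    const char *value;

    if (strncmp(arg, prefix, n) != 0)
        return 0;
    value = arg + n;
    if (strcmp(value, "edge") == 0) {
        s->edge = 1;
        return 1;
    }
    if (strcmp(value, "high") == 0) {
        s->active_high = 1;
        return 1;
    }
    return 0;
}
```

Passing both parameters would give edge/high, the setting this board was left at by the BIOS in the earlier experiments.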