|Summary:||ACPI SCI interrupt storm on Tyan Tiger MB in APIC mode|
|Product:||ACPI||Reporter:||Udo A. Steinberg (us15)|
|Component:||Config-Interrupts||Assignee:||Len Brown (lenb)|
|Kernel Version:||2.5.70 (earlier versions as well)||Tree:||Mainline|
Description Udo A. Steinberg 2003-06-04 10:40:41 UTC
Distribution: Slackware 9.0
Hardware Environment: Tyan Tiger S2723GNN mainboard, 1 GB RAM, two Xeon 2.66 GHz CPUs with Hyperthreading, BIOS version 1.02 (newest)
Software Environment: Unmodified monolithic 2.5.70 Linux kernel, booted with Lilo 22.5
Problem Description: During ACPI initialization I get the following output 100 times:

ACPI: Subsystem revision 20030522
irq 9: nobody cared!
Call Trace:
 [<c010b4a2>] handle_IRQ_event+0x87/0xf7
 [<c010b758>] do_IRQ+0xbe/0x17b
 [<c0109b78>] common_interrupt+0x18/0x20
 [<c010be45>] setup_irq+0xc3/0xff
 [<c022bb36>] acpi_irq+0x0/0x16
 [<c022bb36>] acpi_irq+0x0/0x16
 [<c010b8be>] request_irq+0xa9/0xde
 [<c022bb86>] acpi_os_install_interrupt_handler+0x3a/0x59
 [<c022bb36>] acpi_irq+0x0/0x16
 [<c022bb36>] acpi_irq+0x0/0x16
 [<c022fb49>] acpi_ev_install_sci_handler+0x1a/0x1e
 [<c022fb0c>] acpi_ev_sci_xrupt_handler+0x0/0x18
 [<c022f58b>] acpi_ev_handler_initialize+0x6/0x6e
 [<c024056e>] acpi_enable_subsystem+0x2b/0x58
 [<c03e964c>] acpi_bus_init+0x7a/0x115
 [<c03e973d>] acpi_init+0x56/0xa6
 [<c03d685a>] do_initcalls+0x28/0x94
 [<c0130660>] init_workqueues+0xf/0x26
 [<c01050c9>] init+0x5a/0x1d1
 [<c010506f>] init+0x0/0x1d1
 [<c0107075>] kernel_thread_helper+0x5/0xb
handlers:
[<c022bb36>] (acpi_irq+0x0/0x16)

Additionally, the number of ACPI interrupts is excessive: nearly 4.5 million ACPI interrupts per minute. The machine does not have this problem when running 2.4.21-rc7, where the ACPI interrupt is configured in XT-PIC mode, whereas 2.5.70 configures it to operate in IO-APIC-level mode. firstname.lastname@example.org says it may be due to a missing interrupt source override for the ACPI SCI interrupt. The ACPI backport in 2.4.21-ac seems to have a similar problem (see Bug #370).

Steps to reproduce: Run 2.5.70 on the aforementioned board. Alternatively, send me patches to try or ask me for additional information.
Comment 1 Luming Yu 2003-08-24 21:02:05 UTC
Make sure you have the latest BIOS version and the latest ACPI. If you still have the problem, please attach dmesg as well as the output of acpidmp.
Comment 2 Udo A. Steinberg 2003-08-25 02:48:30 UTC
Created attachment 724: acpidmp output
Used Linux 2.4.22-rc3 (should be latest ACPI)
Used latest Tyan BIOS
Comment 3 Udo A. Steinberg 2003-08-25 02:49:57 UTC
Created attachment 725: dmesg output
Used Linux 2.4.22-rc3 (should be latest ACPI)
Used latest Tyan BIOS

Problem still persists. IRQ is triggered excessively:

            CPU0        CPU1        CPU2        CPU3
  0:        2125           0           0       44845   IO-APIC-edge   timer
  1:           2           0           0           0   IO-APIC-edge   keyboard
  2:           0           0           0           0   XT-PIC         cascade
  8:           1           0           0           0   IO-APIC-edge   rtc
  9:     1664680     1697269    36288240           0   IO-APIC-level  acpi
 14:        1228          19         745           0   IO-APIC-edge   ide0
 17:         121       28567           0           0   IO-APIC-level  eth3
 19:       91028           0           0           0   IO-APIC-level  eth4
 24:           7           0           0         231   IO-APIC-level  eth2
 48:         153           0        2325           0   IO-APIC-level  eth0
NMI:       46849       46849       46849       46849
LOC:       46826       46825       46824       46823
ERR:           0
MIS:           0
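Editorial note: the storm is easiest to see by totalling the per-CPU counters on each /proc/interrupts line. The following is a minimal sketch of that arithmetic, not part of any patch in this report. The function name irq_line_total is hypothetical, and the parsing assumes the layout shown above: a label, a colon, then whitespace-separated decimal counters followed by the controller name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line, e.g.
 *   "  9: 1664680 1697269 36288240 0 IO-APIC-level acpi"
 * Returns the total, or -1 if the line has no "label:" prefix.
 * Parsing stops at the first token that is not a pure decimal
 * number (normally the interrupt controller name).
 */
long long irq_line_total(const char *line)
{
    assert(line != NULL);

    const char *p = strchr(line, ':');
    if (p == NULL)
        return -1;
    p++; /* skip the colon */

    long long total = 0;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;

        char *end;
        long long n = strtoll(p, &end, 10);
        /* A token such as "8259A" starts with a digit but is not a count. */
        if (*end != '\0' && !isspace((unsigned char)*end))
            break;

        total += n;
        p = end;
    }
    return total;
}
```

Applied to the listing above, the IRQ 9 line totals 39650189 interrupts, while the timer line totals 46970 over the same uptime.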
Comment 4 Udo A. Steinberg 2003-08-25 03:03:14 UTC
Comment on attachment 724 (acpidmp output)
http://hell.wh8.tu-dresden.de/acpi/acpidmp
Comment 5 Shaohua 2003-08-29 03:02:01 UTC
NMI: 46849 46849 46849 46849
Isn't this abnormal?
Comment 6 Udo A. Steinberg 2003-08-29 04:03:14 UTC
It's not abnormal. The kernel is running with NMI watchdog. See kernel command line in dmesg output. When booting without NMI watchdog there are no NMIs.
Comment 7 Luming Yu 2003-08-29 06:08:28 UTC
Created attachment 765: patch for collecting IO-APIC config information
Would you please apply this patch and post the dmesg output? I want to verify whether the IO-APIC is configured as expected. Thanks a lot.
Comment 8 Udo A. Steinberg 2003-08-29 09:03:02 UTC
Created attachment 766: dmesg output with collected IO-APIC config information
The patch was originally for 2.5. I've ported it to 2.4.22, since that's what the machine is currently running. Results should be the same.
Comment 9 Luming Yu 2003-08-31 20:45:10 UTC
Created attachment 777: what could happen?
I want to see what happens if the IRQ for the SCI gets changed.
Comment 10 Udo A. Steinberg 2003-09-01 03:41:37 UTC
Created attachment 778: dmesg output with changed SCI interrupt
Changing SCI interrupt to 11 fixes the problem.
Comment 11 Udo A. Steinberg 2003-09-01 03:43:42 UTC
Created attachment 779: Output of /proc/interrupts with changed SCI interrupt
With the ACPI interrupt changed to 11, there are no longer excessive interrupts, as /proc/interrupts shows.
Comment 12 Luming Yu 2003-09-05 06:29:12 UTC
Created attachment 819: I hope this patch can resolve your problem!
Comment 13 Udo A. Steinberg 2003-09-05 07:07:49 UTC
The patch does not fix the problem. With the patch applied the ACPI interrupt is still IRQ9 and is being triggered excessively. The only patch that helped so far was the one that configured the ACPI interrupt to IRQ11 the hard way.
Comment 14 Luming Yu 2003-09-05 09:34:55 UTC
Created attachment 820: To verify if default setting can work
Would you please give it a try (with the previous patch applied), and post dmesg and /proc/interrupts? Thanks a lot.
Comment 15 Udo A. Steinberg 2003-09-05 10:03:48 UTC
Created attachment 821: dmesg output of 2.4.23-pre2 + patches (attachment id #819 + #820).
The combination of both patches fixes the problem again, now with ACPI interrupt at IRQ9.
Comment 16 Udo A. Steinberg 2003-09-05 10:06:56 UTC
Created attachment 822: /proc/interrupts of 2.4.23-pre2 + patches (attachment id #819 + #820).
The combination of both patches fixes the problem again, now with ACPI interrupt at IRQ9. However, there are now only 2 CPUs. There should be 4, because the machine has Hyperthreading.
Comment 17 Len Brown 2003-09-06 20:50:07 UTC
Re: HT

Total of 2 processors activated (10643.04 BogoMIPS).
cpu_sibling_map = 1
cpu_sibling_map = 0

This dmesg output shows that the two siblings inside one package came up, but neither of the siblings inside the other package came up. It is unclear why or how the code changes in this report could have an effect on HT. BTW acpismp=force should not be needed in 2.4.23 -- let me know if it changes the behaviour b/c I removed it in 2.4.22...

Re: IRQ9 storm

Stopping the storm by masking or mis-programming IRQ9 is a clue, but the real question is what conditions are necessary for you to correctly receive and handle acpi interrupts. Is it possible for you to go back to your base 2.4.22 or 2.4.23 source tree and try only these three combinations, one at a time?

arch/i386/kernel/mpparse.c:
void __init mp_config_ioapic_for_sci(int irq)
...
old:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 1); // Active low, level triggered
new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 1); // Active high, level triggered
or new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 0); // Active low, edge triggered
or new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 0); // Active high, edge triggered

and check if
1. the IRQ9 storm stopped
2. the other interrupts in the system work
3. you can provoke and handle an acpi interrupt, e.g. power button does something
(and I guess also verify that you have all 4 logical processors)
Comment 18 Udo A. Steinberg 2003-09-10 18:35:44 UTC
Re: Only one physical package with two siblings came up.
-- This is unrelated to any of the patches posted here and is caused by changes introduced somewhere in the 2.4.23-pre patches. I have gone back to plain 2.4.22 for testing this issue.

Re: IRQ9 storm (with 2.4.22 there are always 4 logical CPUs)

old:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 1); // Active low, level triggered
-- causes IRQ9 storm, other ints work, unclear whether acpi int works due to storm

new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 1); // Active high, level triggered
-- no IRQ9 storm, other ints work, acpi int doesn't work

new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 0); // Active low, edge triggered
-- no IRQ9 storm, other ints work, acpi int works (pwr button)

new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 0); // Active high, edge triggered
-- no IRQ9 storm, other ints work, acpi int works (pwr button)

I can also supply dmesg output for all 4 testcases with 2.4.22, but they only differ in the output of the following line:

IOAPIC: Set PCI routing entry (8-9 -> 0x71 -> IRQ 9 Mode:X Active:Y)

where X and Y form the four combinations mentioned above.
Comment 19 Len Brown 2003-09-10 20:09:30 UTC
Thanks for testing the combinations -- I think the solution is now clear.

The SCI hardware on this board is sending a pulse low->high->low, so either of the edge triggers works, but it isn't high long enough to trigger the level-high interrupt.

It seems the fix for this bug is that if the SCI has no over-ride, we should leave it alone instead of forcing it to level/low. In this case it would have remained at edge/high as programmed by the BIOS and shown in the 1st IO-APIC dmesg output above.

Will send "official" patch shortly...
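Editorial note: the pulse explanation in comment 19 can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of real IO-APIC behaviour: the line is sampled at discrete steps, a level-triggered input fires on every sample where the line has been at its active level for at least min_hold consecutive samples (min_hold is a made-up parameter standing in for "not high long enough"), and an edge-triggered input fires once per matching transition. The names count_edges and count_level are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count low->high (rising != 0) or high->low (rising == 0) transitions. */
int count_edges(const int *samples, size_t n, int rising)
{
    assert(samples != NULL);

    int count = 0;
    for (size_t i = 1; i < n; i++) {
        int prev = samples[i - 1] != 0;
        int cur = samples[i] != 0;
        if (rising ? (!prev && cur) : (prev && !cur))
            count++;
    }
    return count;
}

/*
 * Toy level-triggered input: count every sample at which the line has
 * been at the active level for at least min_hold consecutive samples.
 * A line that idles at the active level therefore keeps firing.
 */
int count_level(const int *samples, size_t n, int active_high, size_t min_hold)
{
    assert(samples != NULL);

    int count = 0;
    size_t run = 0; /* length of the current run at the active level */
    for (size_t i = 0; i < n; i++) {
        int active = (samples[i] != 0) == (active_high != 0);
        run = active ? run + 1 : 0;
        if (active && run >= min_hold)
            count++;
    }
    return count;
}
```

For a line that idles low and pulses high for a single sample, such as {0,0,0,1,0,0,0,0} with min_hold set to 2, the level/low count grows with the idle time (the storm), the level/high count stays at zero, and each edge mode fires exactly once. That matches the pattern of the four test results in comment 18.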
Comment 20 Len Brown 2003-09-10 20:49:43 UTC
Created attachment 862: please try this proposed patch against 2.4.22
Comment 21 Udo A. Steinberg 2003-09-10 21:12:56 UTC
Patch works as expected.
Comment 22 Martin J. Bligh 2003-11-26 16:20:29 UTC
Is this merged back yet? Can we close it?
Comment 23 Udo A. Steinberg 2003-11-26 16:33:42 UTC
Fix has been merged in 2.4 and 2.6.
Comment 24 Shaohua 2003-12-01 03:51:24 UTC
The fix broke other machines. Does WinXP work on this box?
Comment 25 Udo A. Steinberg 2003-12-01 10:16:27 UTC
The machine is a permanently running Linux router. Running WinXP on it is not an option.
Comment 26 Len Brown 2004-04-12 21:04:54 UTC
In fixing bug 1622 we got clarification on the ACPI spec: if no interrupt source over-ride is present, the OS should program the SCI as level/low. 2.4.26 and 2.6.5 now do this.

If our previous experiments on this board are valid, then the board is out of spec and that fix will break it. The boot-time parameters acpi_sci=edge and acpi_sci=high are now available in that event.

Let me know how it goes.
thanks, -Len
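Editorial note: the policy described in comment 26 can be summarised as a small decision function. This is only a sketch. The types and the name choose_sci_mode are hypothetical and do not come from the kernel source, and the precedence shown (boot parameters applied last, on top of any override) is an assumption that the thread does not confirm.

```c
#include <assert.h>
#include <stddef.h>

enum sci_trigger  { SCI_LEVEL, SCI_EDGE };
enum sci_polarity { SCI_ACTIVE_LOW, SCI_ACTIVE_HIGH };

struct sci_mode {
    enum sci_trigger  trigger;
    enum sci_polarity polarity;
};

/*
 * override:   interrupt source override for the SCI, or NULL if the
 *             firmware supplied none
 * force_edge: non-zero if acpi_sci=edge was given at boot
 * force_high: non-zero if acpi_sci=high was given at boot
 */
struct sci_mode choose_sci_mode(const struct sci_mode *override,
                                int force_edge, int force_high)
{
    /* Default when no override is present: level/low (comment 26). */
    struct sci_mode mode = { SCI_LEVEL, SCI_ACTIVE_LOW };

    if (override != NULL) {
        assert(override->trigger == SCI_LEVEL || override->trigger == SCI_EDGE);
        mode = *override;
    }

    /* Assumption: boot parameters win over everything else. */
    if (force_edge)
        mode.trigger = SCI_EDGE;
    if (force_high)
        mode.polarity = SCI_ACTIVE_HIGH;

    return mode;
}
```

On a board like the one in this report, which has no override and misbehaves with level/low, passing both parameters would select edge/high in this model.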