Bug 774
Summary: | ACPI SCI interrupt storm on Tyan Tiger MB in APIC mode | ||
---|---|---|---|
Product: | ACPI | Reporter: | Udo A. Steinberg (us15) |
Component: | Config-Interrupts | Assignee: | Len Brown (lenb) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | acpi-bugzilla |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.5.70 (earlier versions as well) | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
acpidmp output
dmesg output
patch for collecting IO-APIC config infomation
dmesg output with collected IO-APIC config information
what could happen ?
dmesg output with changed SCI interrupt
Output of /proc/interrupts with changed SCI interrupt
I hope this patch can resolve your problem!
To verify if default setting can work
dmesg output of 2.4.23-pre2 + patches (attachment id #819 + #820).
/proc/interrupts of 2.4.23-pre2 + patches (attachment id #819 + #820).
please try this proposed patch against 2.4.22 |
Description
Udo A. Steinberg
2003-06-04 10:40:41 UTC
Make sure you have the latest BIOS version and the latest ACPI. If you still have the problem, please attach dmesg as well as the output of acpidmp.

Created attachment 724 [details]
acpidmp output
Used Linux 2.4.22-rc3 (should be latest ACPI)
Used latest Tyan BIOS
Created attachment 725 [details]
dmesg output
Used Linux 2.4.22-rc3 (should be latest ACPI)
Used latest Tyan BIOS
The problem still persists: IRQ 9 (acpi) is triggered excessively.
           CPU0       CPU1       CPU2       CPU3
  0:       2125          0          0      44845    IO-APIC-edge   timer
  1:          2          0          0          0    IO-APIC-edge   keyboard
  2:          0          0          0          0          XT-PIC   cascade
  8:          1          0          0          0    IO-APIC-edge   rtc
  9:    1664680    1697269   36288240          0   IO-APIC-level   acpi
 14:       1228         19        745          0    IO-APIC-edge   ide0
 17:        121      28567          0          0   IO-APIC-level   eth3
 19:      91028          0          0          0   IO-APIC-level   eth4
 24:          7          0          0        231   IO-APIC-level   eth2
 48:        153          0       2325          0   IO-APIC-level   eth0
NMI:      46849      46849      46849      46849
LOC:      46826      46825      46824      46823
ERR:          0
MIS:          0
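To put a number on "excessively", the per-CPU counters on the IRQ 9 line can be summed and compared with the timer line. The following is only a minimal C sketch for doing that on a single /proc/interrupts-style line; the helper name `sum_irq_line` is made up for illustration and is not part of any kernel or ACPI tool.

```c
#include <stdlib.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts-style line, e.g.
 *   "  9: 1664680 1697269 36288240 0 IO-APIC-level acpi"
 *
 * ncpus   - number of CPU columns expected on the line
 * irq_out - receives the IRQ number when the line starts with "<number>:"
 *
 * Returns the summed count, or 0 when the line does not start with a
 * numeric IRQ label (header line, "NMI:", "LOC:", ...).
 * Hypothetical helper for illustration only.
 */
unsigned long long sum_irq_line(const char *line, int ncpus, int *irq_out)
{
    char *end;
    long irq = strtol(line, &end, 10);   /* skips leading blanks */

    if (end == line || *end != ':')
        return 0;                        /* not a numbered IRQ line */
    *irq_out = (int)irq;

    const char *p = end + 1;
    unsigned long long total = 0;

    for (int i = 0; i < ncpus; i++) {
        unsigned long long n = strtoull(p, &end, 10);
        if (end == p)
            break;                       /* fewer counters than expected */
        total += n;
        p = end;
    }
    return total;
}
```

With the numbers posted above, the IRQ 9 line sums to 39650189 while the timer line sums to 46970, so the ACPI interrupt fired roughly 840 times as often as the timer.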
Comment on attachment 724 [details]
acpidmp output

> http://hell.wh8.tu-dresden.de/acpi/acpidmp

NMI: 46849 46849 46849 46849
Is this abnormal?

It's not abnormal. The kernel is running with the NMI watchdog; see the kernel command line in the dmesg output. When booting without the NMI watchdog there are no NMIs.

Created attachment 765 [details]
patch for collecting IO-APIC config infomation
Would you please apply this patch and post the resulting dmesg output?
I want to verify whether IO-APIC is configured as expected.
Thanks a lot.
Created attachment 766 [details]
dmesg output with collected IO-APIC config information
Patch was originally for 2.5, I've ported it to 2.4.22, since that's what the
machine is currently running. Results should be the same.
Created attachment 777 [details]
what could happen ?
I want to see what happens if the IRQ for the SCI is changed.
Created attachment 778 [details]
dmesg output with changed SCI interrupt
Changing SCI interrupt to 11 fixes the problem.
Created attachment 779 [details]
Output of /proc/interrupts with changed SCI interrupt
Changing ACPI interrupt to 11 no longer causes excessive interrupts as
/proc/interrupts shows.
Created attachment 819 [details]
I hope this patch can resolve your problem!
The patch does not fix the problem. With the patch applied, the ACPI interrupt is still IRQ9 and is still being triggered excessively. The only patch that has helped so far is the one that forced the ACPI interrupt to IRQ11 the hard way.

Created attachment 820 [details]
To verify if default setting can work
Would you please give it a try (with the previous patch applied) and post the
dmesg and /proc/interrupts output? Thanks a lot.
Created attachment 821 [details]
dmesg output of 2.4.23-pre2 + patches (attachment id #819 + #820).
The combination of both patches fixes the problem again, now with ACPI
interrupt at IRQ9.
Created attachment 822 [details]
/proc/interrupts of 2.4.23-pre2 + patches (attachment id #819 + #820).
The combination of both patches fixes the problem again, now with ACPI
interrupt at IRQ9.
However, there are now only 2 CPUs. There should be 4, because the machine has
Hyperthreading.
Re: HT

Total of 2 processors activated (10643.04 BogoMIPS).
cpu_sibling_map[0] = 1
cpu_sibling_map[1] = 0

This dmesg output shows that the two siblings inside a package came up, but neither of the siblings inside the other package came up. It is unclear why or how the code changes in this report could have an effect on HT. BTW, acpismp=force should not be needed in 2.4.23; let me know if it changes the behaviour, because I removed it in 2.4.22...

Re: IRQ9 storm

Stopping the storm by masking or mis-programming IRQ9 is a clue, but the real question is what conditions are necessary for you to correctly receive and handle ACPI interrupts. Is it possible for you to go back to your base 2.4.22 or 2.4.23 source tree and try only these three combinations, one at a time?

arch/i386/kernel/mpparse.c:
void __init mp_config_ioapic_for_sci(int irq)
...
old:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 1); // Active low, level triggered
new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 1); // Active high, level triggered
or new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 0); // Active low, edge triggered
or new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 0); // Active high, edge triggered

and check if
1. the IRQ9 storm stopped
2. the other interrupts in the system work
3. you can provoke and handle an ACPI interrupt, e.g. the power button does something
(and I guess also verify that you have all 4 logical processors)

Re: Only one physical package with two siblings came up.

This is unrelated to any of the patches posted here and is caused by changes introduced somewhere in the 2.4.23-pre patches. I have gone back to plain 2.4.22 for testing this issue.
Re: IRQ9 storm (with 2.4.22 there are always 4 logical CPUs)

old:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 1); // Active low, level triggered
-- causes IRQ9 storm, other ints work, unclear whether acpi int works due to storm
new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 1); // Active high, level triggered
-- no IRQ9 storm, other ints work, acpi int doesn't work
new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 0); // Active low, edge triggered
-- no IRQ9 storm, other ints work, acpi int works (pwr button)
new:
io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 0); // Active high, edge triggered
-- no IRQ9 storm, other ints work, acpi int works (pwr button)

I can also supply dmesg output for all 4 test cases with 2.4.22, but they only differ in the output of the following line:
IOAPIC[0]: Set PCI routing entry (8-9 -> 0x71 -> IRQ 9 Mode:X Active:Y)
where X and Y form the four combinations mentioned above.

Thanks for testing the combinations -- I think the solution is now clear. The SCI hardware on this board is sending a low->high->low pulse, so either of the edge triggers works, but the line isn't high long enough to trigger the level-high interrupt.

It seems the fix for this bug is that, if the SCI has no override, we should leave it alone instead of forcing it to level/low. In this case it would have remained at edge/high as programmed by the BIOS and shown in the first IO-APIC dmesg output above.

Will send an "official" patch shortly...

Created attachment 862 [details]
please try this proposed patch against 2.4.22
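The policy described in the comment above (leave the SCI as the BIOS programmed it when there is no override, rather than forcing level/low) can be sketched roughly as follows. This is not the code from attachment 862; the types and the function `choose_sci_mode` are hypothetical names used only to illustrate the decision, and honoring an override when one exists is an assumption.

```c
#include <stdbool.h>

/* Illustrative types; not the kernel's own definitions. */
enum sci_trigger  { SCI_EDGE, SCI_LEVEL };
enum sci_polarity { SCI_ACTIVE_HIGH, SCI_ACTIVE_LOW };

struct sci_mode {
    enum sci_trigger  trigger;
    enum sci_polarity polarity;
};

/*
 * Pick the trigger/polarity to program for the SCI.
 *
 * has_override  - true if the firmware tables carry an interrupt source
 *                 override for the SCI
 * override_mode - mode requested by that override (ignored otherwise)
 * bios_mode     - mode the BIOS left in the IO-APIC entry
 *
 * Assumption: an override, when present, is honored as-is. Without one,
 * the BIOS setting is kept instead of being forced to level/low.
 */
struct sci_mode choose_sci_mode(bool has_override,
                                struct sci_mode override_mode,
                                struct sci_mode bios_mode)
{
    if (has_override)
        return override_mode;
    return bios_mode;   /* old behaviour: always level / active low */
}
```

On this board there is no override and the BIOS leaves the entry at edge/high, so under this policy the SCI would stay at edge/high, which is one of the two combinations that tested as working.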
Patch works as expected.

Is this merged back yet? Can we close it?

Fix has been merged in 2.4 and 2.6.

The fix broke other machines. Does WinXP work on the box?

The machine is a permanently running Linux router. Running WinXP on it is not an option.

In fixing bug 1622 we got clarification on the ACPI spec that, if no interrupt source override is present, the OS should program the SCI as level/low. 2.4.26 and 2.6.5 now do this. If our previous experiments on this board are valid, then the board is out of spec and that fix will break it. The boot-time parameters
acpi_sci=edge
acpi_sci=high
are now available in that event. Let me know how it goes.

thanks,
-Len
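For reference, the final behaviour described above (level/low by default when there is no override, with acpi_sci=edge and acpi_sci=high available as escape hatches) could be modelled like this. It is a sketch under assumptions, not the kernel's actual option parsing: `sci_default` and `apply_acpi_sci_param` are hypothetical names, and only the two values mentioned in this report are handled.

```c
#include <string.h>

/* Illustrative types; not the kernel's own definitions. */
enum sci_trigger  { SCI_EDGE, SCI_LEVEL };
enum sci_polarity { SCI_ACTIVE_HIGH, SCI_ACTIVE_LOW };

struct sci_mode {
    enum sci_trigger  trigger;
    enum sci_polarity polarity;
};

/* Default when no interrupt source override is present: level/low. */
struct sci_mode sci_default(void)
{
    struct sci_mode m;

    m.trigger  = SCI_LEVEL;
    m.polarity = SCI_ACTIVE_LOW;
    return m;
}

/*
 * Apply one boot argument of the form "acpi_sci=<value>" to *m.
 * Only "edge" and "high" (the values named in this report) are handled.
 * Returns 1 if the argument was recognized and applied, 0 otherwise.
 */
int apply_acpi_sci_param(struct sci_mode *m, const char *arg)
{
    static const char prefix[] = "acpi_sci=";
    size_t n = sizeof(prefix) - 1;

    if (strncmp(arg, prefix, n) != 0)
        return 0;                        /* some other boot argument */

    const char *val = arg + n;

    if (strcmp(val, "edge") == 0) {
        m->trigger = SCI_EDGE;
        return 1;
    }
    if (strcmp(val, "high") == 0) {
        m->polarity = SCI_ACTIVE_HIGH;
        return 1;
    }
    return 0;                            /* value not covered by this sketch */
}
```

Going by the earlier test results, either edge setting worked on this board, so acpi_sci=edge alone would be the first thing to try.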