Bug 774 - ACPI SCI interrupt storm on Tyan Tiger MB in APIC mode
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Interrupts
Hardware: i386 Linux
Importance: P2 normal
Assignee: Len Brown
Reported: 2003-06-04 10:40 UTC by Udo A. Steinberg
Modified: 2004-04-12 21:04 UTC

Kernel Version: 2.5.70 (earlier versions as well)


Attachments
acpidmp output (152 bytes, text/html)
2003-08-25 02:48 UTC, Udo A. Steinberg
dmesg output (150 bytes, text/html)
2003-08-25 02:49 UTC, Udo A. Steinberg
patch for collecting IO-APIC config information (452 bytes, patch)
2003-08-29 06:08 UTC, Luming Yu
dmesg output with collected IO-APIC config information (164 bytes, text/html)
2003-08-29 09:03 UTC, Udo A. Steinberg
what could happen ? (592 bytes, patch)
2003-08-31 20:45 UTC, Luming Yu
dmesg output with changed SCI interrupt (162 bytes, text/html)
2003-09-01 03:41 UTC, Udo A. Steinberg
Output of /proc/interrupts with changed SCI interrupt (172 bytes, text/html)
2003-09-01 03:43 UTC, Udo A. Steinberg
I hope this patch can resolve your problem! (446 bytes, patch)
2003-09-05 06:29 UTC, Luming Yu
To verify if default setting can work (425 bytes, patch)
2003-09-05 09:34 UTC, Luming Yu
dmesg output of 2.4.23-pre2 + patches (attachment id #819 + #820). (174 bytes, text/html)
2003-09-05 10:03 UTC, Udo A. Steinberg
/proc/interrupts of 2.4.23-pre2 + patches (attachment id #819 + #820). (184 bytes, text/html)
2003-09-05 10:06 UTC, Udo A. Steinberg
please try this proposed patch against 2.4.22 (1.98 KB, patch)
2003-09-10 20:49 UTC, Len Brown

Description Udo A. Steinberg 2003-06-04 10:40:41 UTC
Distribution: Slackware 9.0

Hardware Environment: Tyan Tiger S2723GNN mainboard, 1 GB RAM, two Xeon 2.66 GHz
CPUs with Hyper-Threading, BIOS version 1.02 (newest)

Software Environment: Unmodified monolithic 2.5.70 Linux kernel, booted with
Lilo 22.5

Problem Description:
During ACPI initialization I get the following output 100 times:

ACPI: Subsystem revision 20030522
 irq 9: nobody cared!
 Call Trace:
  [<c010b4a2>] handle_IRQ_event+0x87/0xf7
  [<c010b758>] do_IRQ+0xbe/0x17b
  [<c0109b78>] common_interrupt+0x18/0x20
  [<c010be45>] setup_irq+0xc3/0xff
  [<c022bb36>] acpi_irq+0x0/0x16
  [<c022bb36>] acpi_irq+0x0/0x16
  [<c010b8be>] request_irq+0xa9/0xde
  [<c022bb86>] acpi_os_install_interrupt_handler+0x3a/0x59
  [<c022bb36>] acpi_irq+0x0/0x16
  [<c022bb36>] acpi_irq+0x0/0x16
  [<c022fb49>] acpi_ev_install_sci_handler+0x1a/0x1e
  [<c022fb0c>] acpi_ev_sci_xrupt_handler+0x0/0x18
  [<c022f58b>] acpi_ev_handler_initialize+0x6/0x6e
  [<c024056e>] acpi_enable_subsystem+0x2b/0x58
  [<c03e964c>] acpi_bus_init+0x7a/0x115
  [<c03e973d>] acpi_init+0x56/0xa6
  [<c03d685a>] do_initcalls+0x28/0x94
  [<c0130660>] init_workqueues+0xf/0x26
  [<c01050c9>] init+0x5a/0x1d1
  [<c010506f>] init+0x0/0x1d1
  [<c0107075>] kernel_thread_helper+0x5/0xb

 handlers:
 [<c022bb36>] (acpi_irq+0x0/0x16)

Additionally the number of ACPI interrupts is excessive, nearly 4.5 million ACPI
interrupts per minute. The machine does not have this problem when running
2.4.21-rc7, where the ACPI interrupt is configured in XT-PIC mode, whereas
2.5.70 configures it to operate in IO-APIC-level mode.

robert.moore@intel.com says it may be due to a missing interrupt source
override for the ACPI SCI interrupt.
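
For readers unfamiliar with the term: an interrupt source override is an entry in the ACPI MADT that tells the OS which polarity and trigger mode an ISA IRQ such as the SCI uses. The following is a minimal C sketch of how the flags field of such an entry can be decoded. The struct and function names are hypothetical and simplified, not the kernel's own definitions; only the bit layout (polarity in bits 1:0, trigger mode in bits 3:2) is taken from the ACPI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag encodings used by the MADT Interrupt Source Override entry. */
enum inti_polarity { POL_BUS_DEFAULT = 0, POL_ACTIVE_HIGH = 1, POL_ACTIVE_LOW = 3 };
enum inti_trigger  { TRIG_BUS_DEFAULT = 0, TRIG_EDGE = 1, TRIG_LEVEL = 3 };

/* Hypothetical, simplified view of one override entry (not the kernel's struct). */
struct sci_override {
    uint8_t  source_irq; /* ISA IRQ number, e.g. 9 for the SCI on this board */
    uint32_t gsi;        /* global system interrupt the IRQ is routed to */
    uint16_t flags;      /* bits 1:0 = polarity, bits 3:2 = trigger mode */
};

/* Extract the polarity field (bits 1:0). */
int override_polarity(const struct sci_override *o)
{
    assert(o != NULL);
    return o->flags & 0x3;
}

/* Extract the trigger mode field (bits 3:2). */
int override_trigger(const struct sci_override *o)
{
    assert(o != NULL);
    return (o->flags >> 2) & 0x3;
}
```

When no such entry exists for the SCI, the OS has nothing to decode and must pick a default itself, which is the situation discussed in the rest of this report.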

The ACPI backport in 2.4.21-ac seems to have a similar problem (see Bug #370)

Steps to reproduce:
Run 2.5.70 on the aforementioned board. Alternatively, send me patches to try or
ask me for additional information.
Comment 1 Luming Yu 2003-08-24 21:02:05 UTC
Make sure you have the latest BIOS version and the latest ACPI code.
If you still have the problem, please attach dmesg as well as the output of acpidmp.
Comment 2 Udo A. Steinberg 2003-08-25 02:48:30 UTC
Created attachment 724 [details]
acpidmp output

Used Linux 2.4.22-rc3 (should be latest ACPI)
Used latest Tyan BIOS
Comment 3 Udo A. Steinberg 2003-08-25 02:49:57 UTC
Created attachment 725 [details]
dmesg output

Used Linux 2.4.22-rc3 (should be latest ACPI)
Used latest Tyan BIOS

The problem still persists. IRQ 9 is triggered excessively:

           CPU0       CPU1       CPU2       CPU3
  0:       2125          0          0      44845    IO-APIC-edge  timer
  1:          2          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
  9:    1664680    1697269   36288240          0   IO-APIC-level  acpi
 14:       1228         19        745          0    IO-APIC-edge  ide0
 17:        121      28567          0          0   IO-APIC-level  eth3
 19:      91028          0          0          0   IO-APIC-level  eth4
 24:          7          0          0        231   IO-APIC-level  eth2
 48:        153          0       2325          0   IO-APIC-level  eth0
NMI:      46849      46849      46849      46849
LOC:      46826      46825      46824      46823
ERR:          0
MIS:          0
Comment 4 Udo A. Steinberg 2003-08-25 03:03:14 UTC
Comment on attachment 724 [details]
acpidmp output

The output is available at http://hell.wh8.tu-dresden.de/acpi/acpidmp
Comment 5 Shaohua 2003-08-29 03:02:01 UTC
NMI:	  46849      46849	46849	   46849 
Isn't this abnormal?
Comment 6 Udo A. Steinberg 2003-08-29 04:03:14 UTC
It's not abnormal. The kernel is running with NMI watchdog. See kernel command
line in dmesg output. When booting without NMI watchdog there are no NMIs.
Comment 7 Luming Yu 2003-08-29 06:08:28 UTC
Created attachment 765 [details]
patch for collecting IO-APIC config information

Would you please apply this patch and send the dmesg output?
I want to verify whether the IO-APIC is configured as expected.

Thanks a lot.
Comment 8 Udo A. Steinberg 2003-08-29 09:03:02 UTC
Created attachment 766 [details]
dmesg output with collected IO-APIC config information

Patch was originally for 2.5, I've ported it to 2.4.22, since that's what the
machine is currently running. Results should be the same.
Comment 9 Luming Yu 2003-08-31 20:45:10 UTC
Created attachment 777 [details]
what could happen ?

I want to see what happens if the IRQ for the SCI is changed.
Comment 10 Udo A. Steinberg 2003-09-01 03:41:37 UTC
Created attachment 778 [details]
dmesg output with changed SCI interrupt

Changing SCI interrupt to 11 fixes the problem.
Comment 11 Udo A. Steinberg 2003-09-01 03:43:42 UTC
Created attachment 779 [details]
Output of /proc/interrupts with changed SCI interrupt

Changing ACPI interrupt to 11 no longer causes excessive interrupts as
/proc/interrupts shows.
Comment 12 Luming Yu 2003-09-05 06:29:12 UTC
Created attachment 819 [details]
I hope this patch can resolve your problem!
Comment 13 Udo A. Steinberg 2003-09-05 07:07:49 UTC
The patch does not fix the problem. With the patch applied the ACPI interrupt is
still IRQ9 and is being triggered excessively. The only patch that helped so far
was the one that configured the ACPI interrupt to IRQ11 the hard way.
Comment 14 Luming Yu 2003-09-05 09:34:55 UTC
Created attachment 820 [details]
To verify if default setting can work

Would you please give it a try (with the previous patch applied) and post dmesg
and /proc/interrupts? Thanks a lot.
Comment 15 Udo A. Steinberg 2003-09-05 10:03:48 UTC
Created attachment 821 [details]
dmesg output of 2.4.23-pre2 + patches (attachment id #819 + #820).

The combination of both patches fixes the problem again, now with ACPI
interrupt at IRQ9.
Comment 16 Udo A. Steinberg 2003-09-05 10:06:56 UTC
Created attachment 822 [details]
/proc/interrupts of 2.4.23-pre2 + patches (attachment id #819 + #820).

The combination of both patches fixes the problem again, now with ACPI
interrupt at IRQ9.

However, there are now only 2 CPUs. There should be 4, because the machine has
Hyperthreading.
Comment 17 Len Brown 2003-09-06 20:50:07 UTC
Re: HT 
Total of 2 processors activated (10643.04 BogoMIPS). 
cpu_sibling_map[0] = 1 
cpu_sibling_map[1] = 0 
 
This dmesg output shows that the two siblings inside one package came up, but neither of the
siblings inside the other package did. It is unclear why or how the code changes in this report
could have an effect on HT. BTW, acpismp=force should not be needed in 2.4.23 -- let me know
if it changes the behaviour, because I removed it in 2.4.22...
 
Re: IRQ9 storm 
Stopping the storm by masking or mis-programming IRQ9 is a clue, but the real question is what
conditions are necessary for you to correctly receive and handle ACPI interrupts.
 
Is it possible for you to go back to your base 2.4.22 or 2.4.23 source tree and try only these three 
combinations one at a time?: 
 
arch/i386/kernel/mpparse.c:  
void __init mp_config_ioapic_for_sci(int irq)  
...  
old:    io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 1); // Active low, level triggered  
new:    io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 1); // Active high, level triggered  
or  
new:    io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 0); // Active low, edge triggered  
or  
new:    io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 0); // Active high, edge triggered  
 
and check if 
1. the IRQ9 storm stopped 
2. the other interrupts in the system work 
3. you can provoke and handle an acpi interrupt, eg. power button does something 
(and I guess also verify that you have all 4 logical processors) 
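
As background for the four combinations above, here is a minimal C sketch of how polarity and trigger mode appear in the low 32 bits of an IO-APIC redirection table entry. The helper name and its parameters are hypothetical, not kernel code, and the sketch deliberately names the two flags explicitly rather than mirroring the argument order of io_apic_set_pci_routing(), which is not verified here. The bit positions (13 for polarity, 15 for trigger mode, 16 for the mask) follow the IO-APIC register layout.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_POLARITY_LOW  (1u << 13) /* 0 = active high, 1 = active low */
#define RTE_TRIGGER_LEVEL (1u << 15) /* 0 = edge triggered, 1 = level triggered */
#define RTE_MASKED        (1u << 16) /* 1 = interrupt masked */

/*
 * Build the low word of an unmasked, fixed-delivery redirection entry.
 * Hypothetical helper for illustration only.
 */
uint32_t rte_low_word(uint8_t vector, int active_low, int level)
{
    uint32_t rte = vector;

    assert(vector >= 0x10); /* vectors below 0x10 are not valid here */

    if (active_low)
        rte |= RTE_POLARITY_LOW;
    if (level)
        rte |= RTE_TRIGGER_LEVEL;
    return rte;
}
```

Under this layout, the four combinations differ only in those two bits, so everything else about the routing entry stays the same between test runs.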
 
 
Comment 18 Udo A. Steinberg 2003-09-10 18:35:44 UTC
Re: Only one physical package with two siblings came up. -- This is unrelated to
any of the patches posted here and is caused by changes introduced somewhere in
the 2.4.23-pre patches. I have gone back to plain 2.4.22 for testing this issue.
 
Re: IRQ9 storm  (with 2.4.22 there are always 4 logical CPUs)

old:    io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 1); // Active low, level triggered
        -- causes IRQ9 storm, other ints work, unclear whether acpi int works due to storm

new:    io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 1); // Active high, level triggered
        -- no IRQ9 storm, other ints work, acpi int doesn't work

new:    io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 1, 0); // Active low, edge triggered
        -- no IRQ9 storm, other ints work, acpi int works (pwr button)

new:    io_apic_set_pci_routing(ioapic, ioapic_pin, irq, 0, 0); // Active high, edge triggered
        -- no IRQ9 storm, other ints work, acpi int works (pwr button)

I can also supply dmesg output for all 4 testcases with 2.4.22, but they only
differ in the output of the following line:

IOAPIC[0]: Set PCI routing entry (8-9 -> 0x71 -> IRQ 9 Mode:X Active:Y)

where X and Y form the four combinations mentioned above.
Comment 19 Len Brown 2003-09-10 20:09:30 UTC
Thanks for testing the combinations -- I think the solution is now clear.

The SCI hardware on this board is sending a low->high->low pulse, so either
of the edge triggers works, but the line isn't high long enough to trigger the
level-high interrupt.

It seems the fix for this bug is that if the SCI has no override, we should
leave it alone instead of forcing it to level/low. In this case it would have
remained at edge/high, as programmed by the BIOS and shown in the first IO-APIC
dmesg output above.

Will send the "official" patch shortly...
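
The reasoning in this comment can be illustrated with a toy model in C. This is only a sketch under stated assumptions, not a description of the real hardware timing: the signal is a sampled array that idles low with one short high pulse, and min_hold is a hypothetical parameter for how many consecutive samples the line must sit at the active level before a level-triggered interrupt registers. All names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum trigger_mode { TRIGGER_EDGE, TRIGGER_LEVEL };

/*
 * Toy model: count how often a pin with the given trigger mode and active
 * level would raise an interrupt for a sampled signal (values 0 or 1).
 */
int count_interrupts(const int *line, size_t n, enum trigger_mode mode,
                     int active_level, int min_hold)
{
    int count = 0;
    int run = 0; /* consecutive samples at the active level so far */
    size_t i;

    assert(line != NULL && n > 0 && min_hold > 0);

    for (i = 0; i < n; i++) {
        int active = (line[i] == active_level);

        if (mode == TRIGGER_EDGE) {
            /* Fire once per transition into the active level. */
            if (active && i > 0 && line[i - 1] != active_level)
                count++;
        } else {
            /* Fire on every sample while the level is held long enough:
             * the handler returns, the line is still asserted, and the
             * interrupt is raised again. */
            run = active ? run + 1 : 0;
            if (run >= min_hold)
                count++;
        }
    }
    return count;
}
```

With a 20-sample line that is low except for a single high sample, and min_hold set to 2, this model fires repeatedly for level/low (the storm), never for level/high, and exactly once for either edge setting, which matches the pattern of the four test results reported in comment 18.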

Comment 20 Len Brown 2003-09-10 20:49:43 UTC
Created attachment 862 [details]
please try this proposed patch against 2.4.22
Comment 21 Udo A. Steinberg 2003-09-10 21:12:56 UTC
Patch works as expected.
Comment 22 Martin J. Bligh 2003-11-26 16:20:29 UTC
Is this merged back yet? can we close it?
Comment 23 Udo A. Steinberg 2003-11-26 16:33:42 UTC
Fix has been merged in 2.4 and 2.6. 
Comment 24 Shaohua 2003-12-01 03:51:24 UTC
The fix broke other machines. Does WinXP work on this box?
Comment 25 Udo A. Steinberg 2003-12-01 10:16:27 UTC
The machine is a permanently running Linux router. Running WinXP on it is not an
option.
Comment 26 Len Brown 2004-04-12 21:04:54 UTC
In fixing bug 1622 we got clarification on the ACPI spec: if
no interrupt source override is present, the OS should
program the SCI as level/low. 2.4.26 and 2.6.5 now do this.
 
If our previous experiments on this board are valid, then 
the board is out of spec and that fix will break it. 
Boot-time parameters 
acpi_sci=edge 
acpi_sci=high 
are now available in that event.  Let me know how it goes. 
 
thanks, 
-Len 
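
To make the effect of the two boot-time parameters concrete, here is a minimal C sketch of how such option values could be applied on top of the level/low default described above. It is an assumption-laden illustration, not the kernel's actual parsing code: the struct, the function name and the return convention are hypothetical, and only the two values named in this comment are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical SCI configuration; the spec default is level/low. */
struct sci_setting {
    int level;      /* 1 = level triggered, 0 = edge triggered */
    int active_low; /* 1 = active low, 0 = active high */
};

/*
 * Apply one acpi_sci= value to the setting.
 * Returns 0 on success, -1 if the value is not recognised.
 */
int apply_sci_option(struct sci_setting *s, const char *value)
{
    assert(s != NULL && value != NULL);

    if (strcmp(value, "edge") == 0)
        s->level = 0;       /* acpi_sci=edge: override the trigger mode */
    else if (strcmp(value, "high") == 0)
        s->active_low = 0;  /* acpi_sci=high: override the polarity */
    else
        return -1;
    return 0;
}
```

In this sketch each option changes only one property, so on a board like this one a single acpi_sci=edge would leave the polarity at its default.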
 
