Bug 2574 - IRQ21 flood on nForce2 any time IRQ21 is active
Summary: IRQ21 flood on nForce2 any time IRQ21 is active
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Interrupts
Hardware: i386 Linux
Importance: P2 normal
Assignee: Len Brown
Duplicates: 2227 2570
Reported: 2004-04-22 20:09 UTC by Noel Maddy
Modified: 2004-11-03 17:36 UTC (History)
Kernel Version: 2.6.5

Attachments
acpidmp output (83.19 KB, text/plain)
2004-04-22 20:10 UTC, Noel Maddy
dmidecode output (10.76 KB, text/plain)
2004-04-22 20:11 UTC, Noel Maddy
dmesg with pci=noacpi (11.85 KB, text/plain)
2004-04-22 20:12 UTC, Noel Maddy
dmesg with acpi enabled (16.17 KB, text/plain)
2004-04-22 20:12 UTC, Noel Maddy
interrupt behavior with both ehci_hcd and snd-intel_8x0 (4.76 KB, text/plain)
2004-04-22 20:13 UTC, Noel Maddy
interrupt behavior with only ehci_hcd loaded (4.64 KB, text/plain)
2004-04-22 20:14 UTC, Noel Maddy
interrupt behavior with only snd-intel_8x0 loaded (4.70 KB, text/plain)
2004-04-22 20:14 UTC, Noel Maddy
behavior with neither ehci_hcd nor snd-intel_8x0 loaded (4.50 KB, text/plain)
2004-04-22 20:15 UTC, Noel Maddy
lspci -v output (5.22 KB, text/plain)
2004-04-22 20:15 UTC, Noel Maddy
dmesg with acpi_irq_isa=21 (16.17 KB, text/plain)
2004-04-23 00:37 UTC, Noel Maddy
behavior with acpi_irq_isa=21 (4.44 KB, text/plain)
2004-04-23 00:38 UTC, Noel Maddy
interrupts with IOAPIC/ACPI (571 bytes, text/plain)
2004-05-07 13:07 UTC, Julien Ducourthial
interrupts on XP (1.01 KB, text/plain)
2004-05-07 13:27 UTC, Julien Ducourthial
dmesg from 2.6.6-mm2 (22.43 KB, text/plain)
2004-05-15 08:46 UTC, Noel Maddy
hack disabling AP3C PCI interrupt link (443 bytes, patch)
2004-05-21 05:07 UTC, Noel Maddy
dmesg with AP3C hack (22.76 KB, text/plain)
2004-05-21 05:08 UTC, Noel Maddy
lspci -vv (2.6.6-mm4 without hack) (7.36 KB, text/plain)
2004-05-23 21:31 UTC, Noel Maddy
lspci -vv (2.6.6-mm4 without hack) (8.78 KB, text/plain)
2004-05-23 21:35 UTC, Noel Maddy
Bjorn's PRT patch vs 2.6.7 (32.88 KB, patch)
2004-05-23 23:03 UTC, Len Brown
/proc/interrupts with bjorn's patch (509 bytes, text/plain)
2004-05-24 06:27 UTC, Noel Maddy
dmesg on 2.6.6-mm4 with bjorn's patch (24.17 KB, text/plain)
2004-05-24 06:28 UTC, Noel Maddy
Bjorn's PRT patch vs 2.6.7 (updated) (34.50 KB, patch)
2004-05-24 22:35 UTC, Len Brown

Description Noel Maddy 2004-04-22 20:09:17 UTC
Distribution: Debian unstable
Hardware Environment: Chaintech 7NIF2, XP 1800+
Software Environment:
Problem Description:

When IRQ 21 is in use, it gives > 100k interrupts/sec.

I'm using 2.6.5 with Ross's c1 patch and Len's acpi_skip_timer_override patch.

Both ehci_hcd and intel_8x0 use IRQ 21. I have them both compiled as modules.
When either or both modules are inserted, I get between 100k and 200k interrupts
per second on IRQ 21.

When either or both are compiled into the kernel, I get the same interrupt flood.

Steps to reproduce:

Boot into ACPI kernel with ehci_hcd or intel_8x0 active
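
The rate quoted above can be estimated by sampling /proc/interrupts twice and dividing the difference in counts by the interval. A minimal Python sketch, assuming the usual layout of that file (IRQ label, one count column per CPU, controller type, handler names); the function names are illustrative and do not come from any existing tool:

```python
import time


def parse_interrupts(text, ncpus=1):
    """Map IRQ label -> total count across CPUs from /proc/interrupts text."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the "CPU0 ..." header line has no colon
        per_cpu = rest.split()[:ncpus]
        if len(per_cpu) == ncpus and all(f.isdigit() for f in per_cpu):
            counts[label.strip()] = sum(int(f) for f in per_cpu)
    return counts


def irq_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both samples."""
    return {irq: (after[irq] - before[irq]) / seconds
            for irq in after if irq in before}


def sample_rates(seconds=1.0, path="/proc/interrupts", ncpus=1):
    """Take two snapshots `seconds` apart and return per-IRQ rates."""
    with open(path) as f:
        first = parse_interrupts(f.read(), ncpus)
    time.sleep(seconds)
    with open(path) as f:
        second = parse_interrupts(f.read(), ncpus)
    return irq_rates(first, second, seconds)
```

On a flooding system, sample_rates() would be expected to report a six-figure value for the affected IRQ and ordinary rates for the others.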
Comment 1 Noel Maddy 2004-04-22 20:10:43 UTC
Created attachment 2655 [details]
acpidmp output
Comment 2 Noel Maddy 2004-04-22 20:11:31 UTC
Created attachment 2656 [details]
dmidecode output
Comment 3 Noel Maddy 2004-04-22 20:12:04 UTC
Created attachment 2657 [details]
dmesg with pci=noacpi
Comment 4 Noel Maddy 2004-04-22 20:12:29 UTC
Created attachment 2658 [details]
dmesg with acpi enabled
Comment 5 Noel Maddy 2004-04-22 20:13:33 UTC
Created attachment 2659 [details]
interrupt behavior with both ehci_hcd and snd-intel_8x0
Comment 6 Noel Maddy 2004-04-22 20:14:12 UTC
Created attachment 2660 [details]
interrupt behavior with only ehci_hcd loaded
Comment 7 Noel Maddy 2004-04-22 20:14:45 UTC
Created attachment 2661 [details]
interrupt behavior with only snd-intel_8x0 loaded
Comment 8 Noel Maddy 2004-04-22 20:15:19 UTC
Created attachment 2662 [details]
behavior with neither ehci_hcd nor snd-intel_8x0 loaded
Comment 9 Noel Maddy 2004-04-22 20:15:45 UTC
Created attachment 2663 [details]
lspci -v output
Comment 10 Len Brown 2004-04-22 22:25:08 UTC
*** Bug 2570 has been marked as a duplicate of this bug. ***
Comment 11 Len Brown 2004-04-22 22:46:45 UTC
Has IRQ21 ever worked on this board with any version of ACPI+IOAPIC  
enabled Linux?  Does Windows give the same interrupt mapping 
and do the devices work? 
  
Comparison to pci=noacpi doesn't tell us much because lack of MPS  
in the BIOS causes that configuration to run in PIC mode.  
  
IOAPIC programming looks fine.  
  
Unfortunately, both ehci_irq() and snd_intel8x0_interrupt()  
are hard coded to return IRQ_HANDLED.  So if the interrupts  
are all spurious (as we suspect), the IRQ will not get shut down  
b/c the drivers are claiming them no matter what.  We may 
need to instrument these drivers to confirm that they're actually 
not seeing any interrupts for their hardware. 
  
random stabs  
1. any difference if you get rid of the I2C stuff in your kernel?  
2. any difference if you boot with  "acpi_irq_isa=21" 
    this may simply move the problem to a different IRQ. 
3. If the interrupts are due to another device which is not 
    registering a driver on this IRQ (current prime suspect) 
    it would be interesting if you could go through BIOS SETUP 
    and disable as many on-board devices as possible to see 
    if the symptom goes away. 
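
Why a hard-coded IRQ_HANDLED hides the flood can be illustrated with a toy model of spurious-interrupt accounting: a line is only shut down ("nobody cared") when nearly every interrupt in a window goes unclaimed by all handlers sharing it. The sketch below is Python, not kernel code; the class name, the window size, and the threshold are assumptions chosen for illustration.

```python
IRQ_NONE, IRQ_HANDLED = 0, 1


class IrqLine:
    """Toy model of per-IRQ spurious-interrupt accounting (not kernel code)."""

    def __init__(self, window=100_000, threshold=99_900):
        self.window = window        # interrupts per evaluation window (assumed)
        self.threshold = threshold  # unhandled count that trips the disable (assumed)
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def deliver(self, handlers):
        """Run every handler sharing the line; note whether any claimed it."""
        if self.disabled:
            return
        results = [handler() for handler in handlers]
        if IRQ_HANDLED not in results:
            self.unhandled += 1
        self.count += 1
        if self.count >= self.window:
            if self.unhandled > self.threshold:
                self.disabled = True  # the "nobody cared" case
            self.count = 0
            self.unhandled = 0


def always_handled():
    # Models a driver that claims every interrupt, whether or not
    # its own device actually raised one.
    return IRQ_HANDLED


def honest_idle_device():
    # Models a driver that checks its hardware and finds nothing pending.
    return IRQ_NONE
```

In this model, a flood delivered to a line with always_handled attached never disables it, while the same flood with only honest_idle_device attached trips the disable at the end of the first window.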
 
Comment 12 Noel Maddy 2004-04-23 00:35:50 UTC
No, IRQ21 has never worked properly with ACPI/IOAPIC on this board under Linux.
I haven't run Windows on it at all. I may have an old Win98SE that I could
install if that would help.

Random results:
1. Behavior is unchanged when i2c stuff is removed.
3. Behavior is unchanged when disabling built-ins,
   EXCEPT if both AC97 and USB are disabled, then nothing is on IRQ21, so
   the problem doesn't show up

- Behavior unchanged when SB Live! and bttv tuner are removed
- Behavior unchanged when Radeon 9200SE replaces on-board IGP

2. acpi_irq_isa=21 changes things. Late in the boot, I get an "irq 20: nobody
cared!", and then it's disabled. No interrupt flooding in this situation.

ohci_hcd 0000:00:02.1: irq 22, pci mem dea25000
ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
irq 20: nobody cared!
Call Trace:
 [<c010b54b>] __report_bad_irq+0x2b/0x90
 [<c010b644>] note_interrupt+0x64/0xa0
 [<c010b8ff>] do_IRQ+0x12f/0x140
 [<c0109cdc>] common_interrupt+0x18/0x20

handlers:
[<dea6dc10>] (usb_hcd_irq+0x0/0x70 [usbcore])
Disabling IRQ #20
usb 1-2: new full speed USB device using address 2
ehci_hcd 0000:00:02.2: nVidia Corporation nForce2 USB Controller

(full dmesg, /proc/interrupts, etc., as attachments)
Comment 13 Noel Maddy 2004-04-23 00:37:14 UTC
Created attachment 2664 [details]
dmesg with acpi_irq_isa=21
Comment 14 Noel Maddy 2004-04-23 00:38:09 UTC
Created attachment 2665 [details]
behavior with acpi_irq_isa=21
Comment 15 Len Brown 2004-04-23 11:16:42 UTC
*** Bug 2227 has been marked as a duplicate of this bug. ***
Comment 16 Julien Ducourthial 2004-05-07 13:07:27 UTC
Created attachment 2813 [details]
interrupts with IOAPIC/ACPI
Comment 17 Julien Ducourthial 2004-05-07 13:09:41 UTC
Comment on attachment 2813 [details]
interrupts with IOAPIC/ACPI

Same problem here with MSI K7N2Delta-L. 
I've tried with 2.6.3, 2.6.4, 2.6.5 and also 2.6.5 from Fedora Core 2.
Comment 18 Julien Ducourthial 2004-05-07 13:27:31 UTC
Created attachment 2814 [details]
interrupts on XP

Here is the interrupt situation with windows XP.
ehci seems to be on interrupt 22. IRQ21 is used by ohci...
Comment 19 Marc Ballarin 2004-05-08 02:54:05 UTC
Epox 8RDA3+ suffers from the same problem. In Windows, IRQ assignment is 
different. IRQ 21 is used by the onboard soundcard, but the problem seems to 
exist as well, although in a milder form. The interrupt rate is ~2550/second 
on an idle system (~140/s on a similar system with an SiS chipset).
 
Comment 20 Len Brown 2004-05-14 22:08:39 UTC
I'd like to see complete dmesg from 2.6.6-mm2 
or 2.6.6 + patch in bug 2665 
since it includes some PCI Link fixes and has extra debug output. 
 
thanks, 
-Len 
 
Comment 21 Noel Maddy 2004-05-15 08:46:50 UTC
Created attachment 2872 [details]
dmesg from 2.6.6-mm2

dmesg from 2.6.6-mm2 on same 7NIF2 system (with bk-input patch backed out)

It's flooding on IRQ20 instead of IRQ21 now.
Comment 22 Len Brown 2004-05-17 14:05:05 UTC
storm moved to IRQ20 from 21 along with EHCI? 
This is consistent with the acpi_irq_isa=21 experiment, 
where both EHCI and the flood moved to IRQ20. 
 
This is looking more device specific than ACPI related at this point. 
 
Are there any USB devices plugged into the system? 
Any difference if you physically un-plug them? 
 
This doesn't directly address the case where EHCI is not loaded
but sound is.  Perhaps in that case the USB hardware is
still generating interrupts and the sound driver is the victim?
 
I think it is time to enable some debugging code on the USB side. 
For starters, it shouldn't return IRQ_HANDLED when its 
hardware didn't actually get an interrupt. 
Comment 23 Noel Maddy 2004-05-18 09:24:19 UTC
I did have an Atmel at76c503a-based 802.11b adapter plugged in. Removing it
makes no difference, however.

Seems like the flood will happen whenever any driver is hooked up to the interrupt.

I'm not very clued about PCI interrupt routing/ACPI, but I did notice something.
AP3C is listed as disabled in the first list, but then enabled and assigned to
IRQ 20. In earlier kernels, the same three interrupt links were assigned to IRQ
21. Is it possible that enabling AP3C is related to the interrupt flood?

>grep -i 'irq.*20' 2.6.6-mm2.dmesg

ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22) *0
ACPI: PCI Interrupt Link [APCG] (IRQs 20 21 22) *0
ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22) *0
ACPI: PCI Interrupt Link [APCI] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [APCJ] (IRQs 20 21 22) *0
ACPI: PCI Interrupt Link [APCK] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22) *0
ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [AP3C] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [APCL] enabled at IRQ 20
00:00:02[C] -> 2-20 -> IRQ 20 level high
ACPI: PCI Interrupt Link [APCJ] enabled at IRQ 20
ACPI: PCI Interrupt Link [AP3C] enabled at IRQ 20
IRQ20 -> 0:20
ehci_hcd 0000:00:02.2: irq 20, pci mem de90c000
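
The pattern noticed above (a link reported as disabled at enumeration and then enabled later) can be pulled out of a dmesg capture mechanically. A minimal Python sketch, assuming the message formats shown in the excerpt above; the function name is made up for illustration, and other kernel versions may word these lines differently:

```python
import re

# Patterns follow the 2.6.6-mm2 dmesg lines quoted above.
LISTED = re.compile(r"ACPI: PCI Interrupt Link \[(\w+)\] \(IRQs ([^)]*)\)(.*)")
ENABLED = re.compile(r"ACPI: PCI Interrupt Link \[(\w+)\] enabled at IRQ (\d+)")


def links_enabled_after_disabled(dmesg_text):
    """Return {link: irq} for links listed as 'disabled' but enabled later."""
    initially_disabled = set()
    suspicious = {}
    for line in dmesg_text.splitlines():
        m = ENABLED.search(line)
        if m:
            link, irq = m.group(1), int(m.group(2))
            if link in initially_disabled:
                suspicious[link] = irq
            continue
        m = LISTED.search(line)
        if m and "disabled" in m.group(3):
            initially_disabled.add(m.group(1))
    return suspicious
```

Applied to the grep output above, this singles out AP3C at IRQ 20; APCL and APCJ are also enabled, but neither was listed as disabled.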
Comment 24 David Brownell 2004-05-20 12:52:30 UTC
Hmm, I'm not sure why this is assigned to me; looks like an ACPI issue with 
IRQ routing on certain NF2 boards.  It's clearly not an NF2-always issue, 
or an EHCI-only issue. 
 
For example, here's one Shuttle NF2 MB: 
 
           CPU0 
  0:  172106932          XT-PIC  timer 
  1:     136176    IO-APIC-edge  i8042 
  4:     532472    IO-APIC-edge  serial 
  9:          0   IO-APIC-level  acpi 
 14:     504035    IO-APIC-edge  ide0 
 18:     467827   IO-APIC-level  net2280 
 20:          2   IO-APIC-level  ehci_hcd 
 21:      66840   IO-APIC-level  ohci_hcd 
 22:    8836112   IO-APIC-level  ohci_hcd, eth0 
NMI:       9765 
LOC:  172106898 
ERR:          0 
MIS:          0 
 
That's using "forcedeth", which seems wasteful of IRQ22, but there's no 
hint of a flood on IRQ21 (which has a mouse).  And "net2280" is used 
on this system mostly as a network link, in an IRQ-per-packet mode; 
it's generally faster than 100baseT full duplex links. 
 
 
For the record, the reasoning behind returning IRQ_HANDLED is that 
there are a number of cases where the controller can schedule an IRQ 
for "later", by which time the driver may already have handled the 
relevant event.  That's not limited to the cases in which the watchdog 
timer (needed mostly for some flakiness on VIA hardware) fires. 
 
However I may be able to record enough state to identify a few of the 
cases where no such events are possible, like with a completely idle 
controller, and report those cases with IRQ_NONE. 
 
 
Noel, what happens when you rmmod both OHCI and EHCI, then 
reload them (EHCI first then OHCI)?   I've had to do that with a 
recent non-NF2 motherboard, when the BIOS did strange init. 
Is there any USB "legacy" (keyboard/mouse) support enabled in 
the BIOS?  If so, try disabling it to see if this still happens. 
 
Comment 25 Noel Maddy 2004-05-21 04:58:11 UTC
David,

Removing EHCI/OHCI and then reloading them in either order: no change
Booting without EHCI/OHCI and then loading EHCI-first: no change

I did have legacy USB enabled in the BIOS. Turned it off: no change
Comment 26 Noel Maddy 2004-05-21 05:07:52 UTC
Created attachment 2933 [details]
hack disabling AP3C PCI interrupt link

Going on my earlier suspicion, I threw this hack in to make sure that the AP3C
PCI Interrupt Link was not enabled.

It works! Or, at least, the flood is stopped. Sound works properly. I haven't
tested a USB 2.0 device yet, though. Will do that this weekend.
Comment 27 Noel Maddy 2004-05-21 05:08:39 UTC
Created attachment 2934 [details]
dmesg with AP3C hack
Comment 28 Noel Maddy 2004-05-23 13:44:31 UTC
Success! With the (admittedly ugly) hack forcing ACPI to skip AP3C, there is no
interrupt flood, and both EHCI and on-chip sound work properly.

I used an FA120 USB-ethernet adapter (usbnet driver).

In order to make sure that it was working with EHCI, I removed ohci_hcd, and was
still able to connect through the FA120. In /proc/interrupts, IRQ 20 (assigned
to EHCI and sound) was incrementing with the network traffic.

It seems obvious that the AP3C interrupt link is the source of the interrupt
floods. So far, I haven't been able to find anything that doesn't work when it's
forced off.

Remaining questions:

1) What is AP3C connected to? Does disabling it break anything? What else should
I test?

2) Why is AP3C enabled on my motherboard? Is it related to BIOS or ACPI
problems, or is it a hardware issue?

3) Could AP3C have the opposite polarity to the other (EHCI and sound)
interrupts that are linked with it?
Comment 29 Len Brown 2004-05-23 21:06:27 UTC
Hmm, looks like ACPI/BIOS interrupt routing bug after all. 
 
> 1) What is AP3C connected to? Does disabling it break anything? What else should
> I test?

Please attach the output from acpidmp, available in /usr/sbin/, or in pmtools:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

Please also attach the output from lspci -vv on this system before the hack.
Together they'll tell us what devices are connected to what interrupt lines.

> 2) Why is AP3C enabled on my motherboard? Is it related to BIOS or ACPI
> problems, or is it a hardware issue?

Links are enabled when PCI devices refer to them.
Somewhere in your system there is now a device without an interrupt:

pci_link-0620 [73] acpi_pci_link_get_irq : Invalid link context

If you didn't see any change in /proc/interrupts before and after the AP3C
hack, then it may be that the device doesn't have a driver loaded.

Or, maybe we're enabling links when perhaps it isn't necessary,
and on this system maybe that unmasks a platform bug?

> 3) Could AP3C have the opposite polarity to the other (EHCI and sound)
> interrupts that are linked with it?

No.  PCI Interrupt Links all have PCI interrupt flags, by definition.
 
Comment 30 Len Brown 2004-05-23 21:24:05 UTC
Ah, I see acpidmp for Noel's system above, and it shows AP3C 
has 5 customers: 
 
Device (PCI0)/_PRT/APIC: 
Package (0x04) { 0x000CFFFF, 0x00, \_SB.PCI0.AP3C, 0x00 }, 
 
Device (HUB1)/_PRT/APIC 
Package (0x04) { 0x0001FFFF, 0x00, \_SB.PCI0.AP3C, 0x00 }, 
Package (0x04) { 0x0001FFFF, 0x01, \_SB.PCI0.AP3C, 0x00 }, 
Package (0x04) { 0x0001FFFF, 0x02, \_SB.PCI0.AP3C, 0x00 }, 
Package (0x04) { 0x0001FFFF, 0x03, \_SB.PCI0.AP3C, 0x00 } }) 
 
Device C, Pin A 
Device 1, Pins A,B,C,D 
lspci -vv will show the pins, but lspci -v shows 2 sub-functions on device 1, 
and does not show any on device C: 
 
0000:00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a3) 
0000:00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2) 
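
The "Device C, Pin A" reading above follows from the _PRT entry layout: the first element carries the PCI device number in its high word, with 0xFFFF in the low word meaning all functions, and the second element is the pin index (0 = INTA through 3 = INTD). A small Python sketch of that decoding; the function name is hypothetical:

```python
def decode_prt_entry(address, pin):
    """Decode the Address and Pin fields of an ACPI _PRT package.

    Returns (device, function, pin_letter). function is None when the low
    word is 0xFFFF, which _PRT uses to mean "all functions".
    """
    device = (address >> 16) & 0xFFFF
    low = address & 0xFFFF
    function = None if low == 0xFFFF else low
    if not 0 <= pin <= 3:
        raise ValueError("PCI interrupt pin index must be 0..3")
    return device, function, "ABCD"[pin]
```

For the entries quoted above, 0x000CFFFF with pin 0x00 decodes to device 12 (0x0C), pin A, and 0x0001FFFF with pins 0x00 to 0x03 decodes to device 1, pins A to D.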
 
Comment 31 Noel Maddy 2004-05-23 21:31:09 UTC
Created attachment 2955 [details]
lspci -vv (2.6.6-mm4 without hack)

Here's the lspci -vv on vanilla 2.6.6-mm4. I see you found the acpidmp.
Comment 32 Noel Maddy 2004-05-23 21:35:18 UTC
Created attachment 2956 [details]
lspci -vv (2.6.6-mm4 without hack)

Doh! Paper bag, please! Forgot to run as root.
Comment 33 Len Brown 2004-05-23 22:20:07 UTC
0000:00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2) 
        Interrupt: pin A routed to IRQ 23 
 
SMBus is the only device of the 5 potential customers that shows in lspci -vv
and has an interrupt.
 
Comment 34 Len Brown 2004-05-23 23:03:08 UTC
Created attachment 2957 [details]
Bjorn's PRT patch vs 2.6.7

Please test this 2.6.7 patch (should apply to your 2.6.6-mm tree)
from Bjorn Helgaas.  Before this patch, mp_parse_prt() enables
all PCI links.  After this patch, the IRQs are enabled as demanded
by the PCI devices.  Please attach the resulting dmesg and /proc/interrupts.
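
The behavioural change described above can be illustrated with a toy model: enabling every link named in the routing table, versus enabling a link only when a device that uses it asks for its IRQ. This is a Python sketch of the idea, not the patch itself; the link names, table entries, and function names below are all hypothetical.

```python
def enable_all_links(prt):
    """Old behaviour as described above: enable every link the table mentions."""
    return {link for (_device, _pin, link) in prt}


def enable_on_demand(prt, requests):
    """New behaviour: enable a link only when a (device, pin) pair requests its IRQ."""
    routing = {(device, pin): link for (device, pin, link) in prt}
    return {routing[req] for req in requests if req in routing}


# Hypothetical routing table: (PCI device number, pin letter, link name).
EXAMPLE_PRT = [
    (0x02, "C", "LNKA"),  # device whose driver requests its IRQ
    (0x06, "A", "LNKB"),  # another device with a driver bound
    (0x0C, "A", "LNKX"),  # table entry for a slot with no driver bound
]

# Only devices with bound drivers request their interrupts.
EXAMPLE_REQUESTS = [(0x02, "C"), (0x06, "A")]
```

In this model the eager approach enables LNKX even though nothing requests it, while the on-demand approach leaves it untouched, which is the kind of difference that would keep an unused link from being switched on.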
Comment 35 Noel Maddy 2004-05-24 06:27:12 UTC
Created attachment 2962 [details]
/proc/interrupts with bjorn's patch

Looks good. There's no interrupt flood. Everything seems to work properly. I
also noticed that this fixes anomalous ACPI THRM readings in dmesg. Previously,
it was giving weird readings (-121 C, 8 C, 2 C, ...). Now the temperatures are
reasonable.
Comment 36 Noel Maddy 2004-05-24 06:28:08 UTC
Created attachment 2963 [details]
dmesg on 2.6.6-mm4 with bjorn's patch
Comment 37 Len Brown 2004-05-24 22:35:55 UTC
Created attachment 2967 [details]
Bjorn's PRT patch vs 2.6.7 (updated)

minor cleanups to previous version of this patch.
i'm checking this version into acpi-test tree.
Comment 38 Noel Maddy 2004-05-25 13:04:16 UTC
Tested with updated version of Bjorn's PRT patch. Everything works great.

No changes in dmesg from Bjorn's previous patch, except for formatting
(global_irq_base -> gsi_base, GSI in decimal instead of hex).

Kudos!
Comment 39 Len Brown 2004-11-03 17:36:57 UTC
shipped in 2.6.8.1
did not back-port to 2.4.
closing
