Distribution: Red Hat 9
Pentium 4 2.4
Soyo Dragon Lite - SiS 645 chipset
gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
module-init-tools version 0.9.14
The ACPI interrupt is disabled after a while. It is mysteriously linked to
another interrupt, in this case one of the USB controllers. Previously it was
linked to the NVidia interrupt. Let me be clear: the ACPI controller has always
been alone on interrupt 9 in /proc/interrupts. However, I can cause the number
of interrupts to climb by using the device that it is linked to. When the
NVidia card was linked, anything that caused events except moving the mouse
caused the number to climb. Now, moving my mouse or using my USB drives
causes the number to climb. When it hits 100002, I get a Call Trace and
the kernel disables IRQ #9. The kernel appears to continue uninterrupted, with
no further problems.
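For anyone who wants to watch the count climb, here is a minimal sketch that parses /proc/interrupts text into per-IRQ totals. The function name parse_interrupts and the SAMPLE text are made up for illustration (the sample numbers are not taken from the attachments), and the layout assumed is the usual one: a CPU header row followed by "label: counts controller handlers".

```python
SAMPLE = """\
           CPU0
  0:     150000          XT-PIC  timer
  9:       3418    IO-APIC-edge  acpi
 20:       3412   IO-APIC-level  ohci_hcd
NMI:          0
"""

def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue                      # not an "IRQ: ..." row
        fields = rest.split()
        # Leading numeric fields are the per-CPU counts (at most one per CPU).
        i = 0
        while i < ncpu and i < len(fields) and fields[i].isdigit():
            i += 1
        total = sum(int(f) for f in fields[:i])
        table[label.strip()] = (total, " ".join(fields[i:]))
    return table

if __name__ == "__main__":
    # On a live system, read the real file instead of SAMPLE:
    #   with open("/proc/interrupts") as fh: text = fh.read()
    count, desc = parse_interrupts(SAMPLE)["9"]
    print("IRQ 9:", count, desc)
```

Running it before and after moving the mouse and comparing the IRQ 9 totals should show whether the count follows the linked device.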
Steps to reproduce:
Boot with full ACPI support. Move the mouse, or use a USB drive. It seems to be
similar to report #905 (http://bugme.osdl.org/show_bug.cgi?id=905); however, in
this case nothing is shown sharing the ACPI interrupt.
Created attachment 1041 [details]
2.6.0-test7 dmesg boot log.
Created attachment 1042 [details]
dmesg after error
Created attachment 1043 [details]
lspci -vv output
I just noticed that prior to the crash I have a bunch of
connect-debounce messages, and then afterwards it re-detects and creates my
Created attachment 1044 [details]
Problem in remission - added 'noapic' to boot cmdline
I booted up with noapic. Prior to 2.6.0-test1 this would still have problems;
however, so far I have no USB error messages, and ACPI is working fine! I
don't know what this means entirely, but I am thankful that I don't have an HT
CPU about now =).
Created attachment 1045 [details]
dmesg - with 'noapic'
Here is the dmesg, if anyone needs to review the changes.
Please try the patch in bug 1240 first to see if it's a USB bug. If it still
doesn't work, please attach /proc/interrupts and the acpidmp output.
The patch from bug 1240 does not apply at all against 2.6.0-test7 - should I be
dropping to a different revision?
The 2nd dmesg attachment is a continuation of the 1st, yes?
By remission, do you mean that "noapic" makes it work,
but that the original APIC-mode problem still persists?
/proc/interrupts from PIC mode shows IRQ9 shared: acpi, ohci-hcd
Can you attach the /proc/interrupts from the APIC mode failure?
It would be interesting to see if ACPI and USB still share an IRQ, because
the BIOS specifies that in APIC mode IRQ9 should be Edge Triggered Active High:
ACPI: INT_SRC_OVR (bus irq[0x9] global_irq[0x9] polarity[0x0] trigger[0x0])
which _isn't_ conducive to sharing, particularly with an active low PCI interrupt...
It would also be good to know if this problem persists in 2.6.0-test8,
and interesting to know if ACPI events, such as pressing the power button,
result in acpi interrupts being received.
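To make the INT_SRC_OVR line above easier to read, here is a rough sketch that decodes the polarity and trigger fields. The flag table is my reading of the MPS INTI flags in the ACPI spec's MADT Interrupt Source Override entry, so please check it against the spec. The function name decode_override is made up, and the regex only assumes the dmesg format quoted above.

```python
import re

# MPS INTI flag encodings as I read them in the ACPI spec; verify before relying on them.
POLARITY = {0: "bus default", 1: "active high", 2: "reserved", 3: "active low"}
TRIGGER = {0: "bus default", 1: "edge", 2: "reserved", 3: "level"}

# An override's source bus is ISA, whose default is edge-triggered, active high.
ISA_POLARITY, ISA_TRIGGER = "active high", "edge"

OVERRIDE_RE = re.compile(
    r"INT_SRC_OVR \(bus irq\[0x([0-9a-fA-F]+)\] global_irq\[0x([0-9a-fA-F]+)\] "
    r"polarity\[0x([0-9a-fA-F]+)\] trigger\[0x([0-9a-fA-F]+)\]\)"
)

def decode_override(line):
    """Decode one 'ACPI: INT_SRC_OVR (...)' dmesg line, or return None if it does not match."""
    m = OVERRIDE_RE.search(line)
    if m is None:
        return None
    bus_irq, global_irq, pol, trig = (int(g, 16) for g in m.groups())
    polarity = POLARITY.get(pol, "unknown")
    trigger = TRIGGER.get(trig, "unknown")
    return {
        "bus_irq": bus_irq,
        "global_irq": global_irq,
        # "bus default" resolves to the ISA defaults for an ISA source IRQ.
        "polarity": ISA_POLARITY if polarity == "bus default" else polarity,
        "trigger": ISA_TRIGGER if trigger == "bus default" else trigger,
    }

if __name__ == "__main__":
    line = "ACPI: INT_SRC_OVR (bus irq[0x9] global_irq[0x9] polarity[0x0] trigger[0x0])"
    print(decode_override(line))
```

With polarity[0x0] and trigger[0x0] this resolves to active high and edge, which matches the reading given above.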
Created attachment 1154 [details]
Interrupts from the Working Condition
The only change that I made:
- cmdline: ro root=/dev/hda7 pci=noacpi
I added pci=noacpi
Created attachment 1155 [details]
Interrupts After rebooting and getting started.
This is after starting the system, X, Firebird and Evolution.
Created attachment 1156 [details]
Interrupts After rebooting and then moving the mouse.
This is after everything was running. Then I moved the mouse a bunch, then I
re-polled the interrupts.
NOTE: The "Interrupts After" posts are with the following cmdline:
Created attachment 1157 [details]
dmesg with - borked Interrupts.
Created attachment 1158 [details]
dmesg - with pci=noacpi
This is with pci=noacpi, which seems to work great. I've run since the release
of 2.6.0-test8 without incident.
Please boot with acpi=off and attach dmesg and /proc/interrupts
Please boot with acpi=off noapic, and attach dmesg and /proc/interrupts.
The MPS/IOAPIC mode results we got above via pci=noacpi look pretty
much like the PIC IRQ case with the APIC turned on -- and don't match
the ACPI/IOAPIC mapping at all.
The above should get ACPI completely out of the way (should be the same
as !CONFIG_ACPI, and !CONFIG_ACPI && !CONFIG_X86_IO_APIC)
Also, please attach the output of acpidmp so we can take a look at your _PRT
and the output of dmidecode so we can identify your board and BIOS version.
I'd like to see your MPS tables too, but off-hand I don't see a utility to dump them;
maybe hwinfo from SuSE?
Len, will do. This is my home system, so I have to get home, then spend time
with fam.... but I will endeavour to get the requested info tonight.
Created attachment 1188 [details]
print ioapic patch
The fact that acpi on IRQ9 seems to have (always) exactly 6 more interrupts
than USB up on IRQ20, even when a boatload of interrupts are added,
surely can't be a coincidence.
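The constant-offset observation can be checked mechanically across several /proc/interrupts snapshots. This is only a sketch: constant_offset is a made-up helper, and the sample pairs are illustrative numbers, not values from the attachments.

```python
def constant_offset(samples):
    """samples: list of (count_a, count_b) pairs taken at different times.

    Return the offset if count_a - count_b is identical in every sample,
    otherwise None. An empty list also returns None.
    """
    diffs = {a - b for a, b in samples}
    return diffs.pop() if len(diffs) == 1 else None

if __name__ == "__main__":
    # (acpi on IRQ9, USB on IRQ20) from three hypothetical snapshots
    snapshots = [(202, 196), (1506, 1500), (3418, 3412)]
    print(constant_offset(snapshots))
```

If the offset stays fixed while both counts grow by thousands, the two lines are most likely firing on the same events.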
Please apply this patch to dump out the IOAPIC _after_ it gets programmed by
and attach the dmesg output; we need to look at the vectors...
Created attachment 1415 [details]
dmesg - as much as it saves, with patch.
dmesg w/ patch
Created attachment 1416 [details]
Created attachment 1417 [details]
Created attachment 1418 [details]
/proc/interrupts - after hitting power button 3 times.
Created attachment 1419 [details]
cmdline used for boot
I also have the following files that I saved at the time:
Please let me know if you want them.
Created attachment 1420 [details]
dmesg with cmdline: acpi=off noapic
Created attachment 1421 [details]
/proc/interrupts with cmdline: acpi=off noapic
Created attachment 1422 [details]
dmesg with cmdline: acpi=off
Created attachment 1423 [details]
/proc/interrupts with cmdline: acpi=off
> ACPI: INT_SRC_OVR (bus irq[0x9] global_irq[0x9] polarity[0x0] trigger[0x0])
Please try the two patches in bug #1351, attach the resulting dmesg and /proc/interrupts,
and report whether ACPI events and USB interrupts work. This should address the
polarity and trigger issue on IRQ9. I don't know if it will also address the mysterious
USB vs ACPI tying -- though this is a root cause and that may have been a decoy.
Created attachment 1581 [details]
2.6.0-test11 + patch from 1351 - ACPI/USB STILL BORKED
I did this this morning, and right after boot the ACPI and USB counts are still
1:1 (each at 196 interrupts). I did not test the power button,
etc. I had to go to work.
Created attachment 1582 [details]
INTERRUPTS - 2.6.0-test11 + patch from 1351- ACPI/USB STILL BORKED
Looks like the 1st patch did its thing:
- 9: 3418 IO-APIC-edge acpi
+ 9: 261 IO-APIC-level acpi
But still this:
20: 261 IO-APIC-level ohci_hcd
I assume that IRQ9 still gets disabled when you wiggle the mouse enough?
How about a couple of ACPI button presses in there -- do they register?
The 2nd patch is a link in the text to the print_IO_APIC fix.
Created attachment 2021 [details]
2.6.2 - Behaviour changed, still broken.
2.6.2 is still broken. Now the eth0 is sharing an interrupt with AGP. This
happens w/ or w/o the NVidia driver. ACPI no longer generates false
interrupts. I get ACPI errors when loading the ohci-hcd module.
Created attachment 2022 [details]
dmesg output from 2.6.2 w/ pci=noacpi
Note: This is the boot when I used pci=noacpi. Another attachment follows with
no such line. w/ Full ACPI, I get no Ethernet, and w/ pci=noacpi I have
Created attachment 2023 [details]
dmesg output w/ fullacpi
Created attachment 2024 [details]
Interrupts w/ fullacpi
Here are the interrupts - note: no IRQs are apparently being delivered to the
NIC, or it's not answering. If I send pings I just get tx errors.
So as of 2.6.2 the ACPI SCI is no longer "tied" to USB?
Can you clarify exactly what is not working at this point?
Is the "nobody cared!" message gone, or does it come back someplace else?
Unclear why you attached the /proc/cpuinfo -- did I miss something?
Note that in 2.6.2, you have the boot parameter "acpi_irq_nobalance"
to tell ACPI not to move interrupts around. It might be interesting to
compare /proc/interrupts with and without booting with this flag,
and also to see if the error moves.
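A rough way to do that comparison: parse both /proc/interrupts dumps into a device-to-IRQ map and list the devices whose IRQ differs. The helper names and both sample dumps below are hypothetical, and the parser assumes the usual "IRQ: counts controller handler[, handler]" layout.

```python
def device_irqs(text):
    """Map each handler name in /proc/interrupts text to its numeric IRQ."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    mapping = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        fields = rest.split()
        # Need a numeric IRQ, one count per CPU, a controller name, and a handler.
        if not sep or not label.strip().isdigit() or len(fields) < ncpu + 2:
            continue
        names = " ".join(fields[ncpu + 1:])   # e.g. "acpi, ohci-hcd"
        for dev in names.split(","):
            mapping[dev.strip()] = int(label)
    return mapping

def moved_devices(text_a, text_b):
    """Return {device: (irq_in_a, irq_in_b)} for devices whose IRQ differs."""
    a, b = device_irqs(text_a), device_irqs(text_b)
    return {d: (a[d], b[d]) for d in sorted(a.keys() & b.keys()) if a[d] != b[d]}

# Hypothetical dumps, for illustration only.
WITH_FLAG = """\
           CPU0
  9:        261   IO-APIC-level  acpi
 18:       5000   IO-APIC-level  eth0
 20:        261   IO-APIC-level  ohci_hcd
"""
WITHOUT_FLAG = """\
           CPU0
  9:        261   IO-APIC-level  acpi
 16:       5000   IO-APIC-level  eth0, nvidia
 20:        261   IO-APIC-level  ohci_hcd
"""

if __name__ == "__main__":
    print(moved_devices(WITH_FLAG, WITHOUT_FLAG))
```

Devices present in only one dump are ignored here; only devices seen in both boots are compared.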
Created attachment 2058 [details]
test patch for ACPI interrupt over-ride
while your failure symptom no longer seems to involve the ACPI SCI, please
apply this patch to fix a known problem with the ACPI SCI -- as it is possible
that it is a side-effect that is troubling your system.
Wahoo!! All the remaining problems are problems with devfs!!!
I migrated to 100% udev, and all the problems are gone. Interrupt routing was
not the issue with 2.6.3. #&^#$&^%^@# devfs was. On a completely different box I
ran into exactly the same problems, but I was about to reinstall anyway; after the
reinstall all the problems were gone. I quickly discovered that the only difference
was devfs, so I migrated my Gentoo box to udev, and boom. Problems gone.