Distribution: Red Hat 9
Hardware Environment: Pentium 4 2.4, Soyo Dragon Lite - SiS 645 chipset
Software Environment: gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5), module-init-tools version 0.9.14
Problem Description: The ACPI interrupt is disabled after a while. It is mysteriously linked to another interrupt; in this case, one of the USB controllers. Previously it was linked to the NVidia interrupt. Let me be clear: the ACPI controller has always been alone on interrupt 9 in /proc/interrupts. However, I can cause the number of interrupts to climb by using the device that it is linked to. When the NVidia card was linked, anything that caused events except moving the mouse caused the number to climb. Now, moving my mouse or writing from my USB drives causes the number to climb. When it hits 100002, I get a Call Trace and the kernel disables IRQ #9. The kernel appears to continue uninterrupted, with no further problems.
Steps to reproduce: Boot with full ACPI support. Move the mouse, or use a USB drive.
It seems to be similar to report # 905: http://bugme.osdl.org/show_bug.cgi?id=905; however, in this case nothing is shown to share the ACPI interrupt.
Created attachment 1041 [details] 2.6.0-test7 dmesg boot log.
Created attachment 1042 [details] dmesg after error
Created attachment 1043 [details] lspci -vv output
I just noticed that prior to the crash I have a bunch of connect-debounce messages, and then afterwards it re-detects and creates my multi-card reader.
Created attachment 1044 [details] Problem in remissions - added 'noapic' to boot cmdline I booted up with noapic. Prior to 2.6.0-test1 this would still have problems; so far, however, I have no USB error messages, and ACPI is working fine! I don't know what this means entirely, but I am thankful that I don't have an HT CPU about now =).
Created attachment 1045 [details] dmesg - with 'noapic' Here is the dmesg, if anyone needs to review the changes.
Please try the patch in bug 1240 first to see if it's a USB bug. If it still doesn't work, please attach /proc/interrupts and the acpidmp output.
The patch from bug 1240 does not apply at all against 2.6.0-test7 - should I be dropping to a different revision? js
The 2nd dmesg attachment is a continuation of the 1st, yes? By remission, do you mean that "noapic" makes it work, but that the original APIC-mode problem still persists?
/proc/interrupts from PIC mode shows IRQ9 shared: acpi, ohci-hcd
Can you attach the /proc/interrupts from the APIC mode failure? It would be interesting to see if ACPI and USB still share an IRQ, because the BIOS specifies that in APIC mode IRQ9 should be Edge Triggered Active High:
ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x0] trigger[0x0])
which _isn't_ conducive to sharing, particularly with an active low PCI interrupt...
It would also be good to know if this problem persists in 2.6.0-test8, and interesting to know if ACPI events, such as pressing the power button, result in acpi interrupts being received.
thanks, -Len
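As a reading aid for the INT_SRC_OVR line quoted above, here is a minimal sketch of how the polarity and trigger fields can be decoded. It assumes the MADT Interrupt Source Override encoding (0 = conforms to the bus specification, 1 = active high / edge, 3 = active low / level, 2 = reserved) and the ISA defaults of edge-triggered, active high. The function and table names are made up for illustration and are not taken from the kernel source.

```python
# Hypothetical helper: decode the polarity/trigger values that the kernel
# prints for an ACPI INT_SRC_OVR entry. Encoding assumed from the MADT
# Interrupt Source Override flags; value 2 is reserved in both fields.
POLARITY = {0: None, 1: "active high", 3: "active low"}
TRIGGER = {0: None, 1: "edge", 3: "level"}

# Assumed defaults for the ISA bus (bus[0] in the log line).
ISA_DEFAULT = ("active high", "edge")


def decode_override(polarity, trigger, bus_default=ISA_DEFAULT):
    """Return (polarity, trigger) as strings, falling back to the bus default."""
    if polarity not in POLARITY or trigger not in TRIGGER:
        raise ValueError("reserved or unknown flag value")
    pol = POLARITY[polarity] or bus_default[0]
    trig = TRIGGER[trigger] or bus_default[1]
    return pol, trig


if __name__ == "__main__":
    # polarity[0x0] trigger[0x0], as in the INT_SRC_OVR line above
    print(decode_override(0x0, 0x0))
```

Under these assumptions, polarity[0x0] trigger[0x0] resolves to the ISA defaults, which matches the "Edge Triggered Active High" reading given in the comment.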
Created attachment 1154 [details] Interrupts from the Working Condition The only change that I made: - cmdline: ro root=/dev/hda7 pci=noacpi I added pci=noacpi
Created attachment 1155 [details] Interrupts After rebooting and getting starts. This is after starting the system, X, Firebird and Evolution.
Created attachment 1156 [details] Interrupts After rebooting and then moving the mouse. This is after everything was running. Then I moved the mouse a bunch, then I re-polled the interrupts.
NOTE: The "Interrupts After" posts are with the following cmdline: ro root=/dev/hda7
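The before/after comparison of /proc/interrupts described in the two "Interrupts After" attachments can be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, and the sample snapshots are made-up numbers, not the contents of the attachments.

```python
def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row with CPU names has no colon
        fields = rest.split()
        counts = []
        # Leading numeric columns are the per-CPU counts.
        while fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        if not counts:
            continue
        table[label.strip()] = (sum(counts), " ".join(fields))
    return table


def climbing_irqs(before, after):
    """Return {irq: increase} for every IRQ whose count grew between snapshots."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    return {
        irq: new[irq][0] - old[irq][0]
        for irq in new
        if irq in old and new[irq][0] > old[irq][0]
    }


# Made-up sample snapshots, for illustration only.
BEFORE = """\
           CPU0
  9:        206    IO-APIC-edge  acpi
 20:        200   IO-APIC-level  ohci_hcd
"""
AFTER = """\
           CPU0
  9:       5206    IO-APIC-edge  acpi
 20:       5200   IO-APIC-level  ohci_hcd
"""

if __name__ == "__main__":
    for irq, delta in climbing_irqs(BEFORE, AFTER).items():
        print(f"IRQ {irq}: +{delta}")
```

On a live system the two snapshots would come from reading /proc/interrupts before and after moving the mouse, instead of the inline strings used here.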
Created attachment 1157 [details] dmesg with - borked Interrupts.
Created attachment 1158 [details] dmeg - with pci=noacpi This is with pci=noacpi, this seems to work great. I've run since the release of 2.6.0-test8 without incident.
Please boot with acpi=off and attach dmesg and /proc/interrupts.
Please boot with acpi=off noapic, and attach dmesg and /proc/interrupts.
The MPS/IOAPIC mode results we got above via pci=noacpi look pretty much like the PIC IRQ case with the APIC turned on -- and don't match the ACPI/IOAPIC mapping at all. The above should get ACPI completely out of the way (should be the same as !CONFIG_ACPI, and !CONFIG_ACPI && !CONFIG_X86_IO_APIC).
Also, please attach the output of acpidmp so we can take a look at your _PRT, and the output of dmidecode so we can identify your board and BIOS version. I'd like to see your MPS tables too, but off-hand I don't see a utility to dump them; maybe hwinfo from Suse?
Len, will do. This is my home system, so I have to get home, then spend time with fam.... but I will endeavour to get the requested info tonight.
Created attachment 1188 [details] print ioapic patch The fact that acpi on IRQ9 seems to have (always) exactly 6 more interrupts than USB up on IRQ20, even when a boat load of interrupts are added to IRQ20, surely can't be a coincidence. Please apply this patch to dump out the IOAPIC _after_ it gets programmed by ACPI and attach the dmesg output; need to look at the vectors... thanks, -Len
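The "always exactly 6 more" observation can be checked mechanically across several /proc/interrupts readings. The sketch below is only an illustration: the function name is hypothetical and the sample counts are invented, not taken from the attachments.

```python
def constant_offset(samples):
    """Given (irq9_count, irq20_count) pairs, return the common difference.

    Returns None if the difference is not the same in every sample, or if
    no samples were supplied.
    """
    offsets = {a - b for a, b in samples}
    if len(offsets) == 1:
        return offsets.pop()
    return None


if __name__ == "__main__":
    # Invented readings: (acpi on IRQ9, USB on IRQ20) at three points in time.
    readings = [(206, 200), (5206, 5200), (90006, 90000)]
    print(constant_offset(readings))
```

A non-None result across many readings means the two counters move in lockstep, which is the pattern described in the comment above.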
Created attachment 1415 [details] dmesg - as much as it saves, with patch. dmesg w/ patch
Created attachment 1416 [details] acpidmp output
Created attachment 1417 [details] /proc/interrupts
Created attachment 1418 [details] /proc/interrupts - after hitting power button 3 times.
Created attachment 1419 [details] cmdline used for boot
I also have the following files that I saved at the time: acpidmp-2.6.0-test9-with-patch.FACP-acpitbl acpidmp-2.6.0-test9-with-patch.acpidisasm acpidmp-2.6.0-test9-with-patch.System.map.gz Please let me know if you want them.
Created attachment 1420 [details] dmesg with cmdline: acpi=off noapic
Created attachment 1421 [details] /proc/interrupts with cmdline: acpi=off noapic
Created attachment 1422 [details] dmesg with cmdline: acpi=off
Created attachment 1423 [details] /proc/interrupts with cmdline: acpi=off
> ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x0] > trigger[0x0]) Please try the two patches in bug #1351, attach the resulting dmesg and /proc/interrupts and report on if ACPI events and USB interrupts work. This should address the polarity and trigger issue on IRQ9. I don't know if it will also address the mysterious USB vs ACPI tying -- though this is a root cause and that may have been a decoy. thanks, -Len
Created attachment 1581 [details] 2.6.0-test11 + patch from 1351 - ACPI/USB STILL BORKED Len, I did this this morning, and right after boot ACPI and USB are still 1-1 the same (each at 196 in interrupt count). I did not test the power button, etc.; I had to go to work.
Created attachment 1582 [details] INTERRUPTS - 2.6.0-test11 + patch from 1351- ACPI/USB STILL BORKED
Looks like the 1st patch did its thing:
- 9: 3418 IO-APIC-edge acpi
+ 9: 261 IO-APIC-level acpi
But still this:
20: 261 IO-APIC-level ohci_hcd
I assume that irq9 still gets disabled when you wiggle the mouse enough? How about a couple of ACPI button presses in there -- do they register?
The 2nd patch is a link in the text to the print_IO_APIC fix.
Created attachment 2021 [details] 2.6.2 - Behaviour changed, still broken. 2.6.2 is still broken. Now the eth0 is sharing an interrupt with AGP. This happens w/ or w/o the NVidia driver. ACPI no longer generates false interrupts. I get ACPI errors when loading the ohci-hcd module.
Created attachment 2022 [details] dmesg output from 2.6.2 w/ pci=noacpi Note: This is the boot when I used pci=noacpi. Another attachment follows with no such line. w/ Full ACPI, I get no Ethernet, and w/ pci=noacpi I have identical behaviour.
Created attachment 2023 [details] dmesg output w/ fullacpi
Created attachment 2024 [details] Interrupts w/ fullacpi Here are the interrupts - note: no IRQs are apparently being delivered to the NIC, or it's not answering. If I send pings I just get tx errors.
So as of 2.6.2 the ACPI SCI is no longer "tied" to USB? Can you clarify exactly what is not working at this point? Is the "nobody cared!" message gone, or does it come back someplace else? Unclear why you attached the /proc/cpuinfo -- did I miss something? Note that in 2.6.2, you have the boot parameter "acpi_irq_nobalance" to tell ACPI not to move interrupts around. It might be interesting to compare /proc/interrupts with and without booting with this flag, and also to see if the error moves.
Created attachment 2058 [details] test patch for ACPI interrupt over-ride While your failure symptom no longer seems to involve the ACPI SCI, please apply this patch to fix a known problem with the ACPI SCI, as it is possible that it is a side-effect that is troubling your system.
Wahoo!! All the remaining problems are problems with devfs!!! I migrated to 100% udev, and all the problems are gone. Interrupt routing was not the issue with 2.6.3. #&^#$&^%^@# devfs was. On a completely different box I ran into exactly the same problems, but I was about to reinstall; after the re-install all the problems were gone. I quickly discovered that the only difference was DevFS, so I migrated my Gentoo box to udev, and boom. Problems gone.