Most recent kernel where this bug did not occur: ???
Distribution: Debian Sarge, self-compiled kernel
Hardware Environment:

0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia] (rev 05)
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
    Latency: 0
    Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M]
    Capabilities: <available only to root>

0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP] (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
    Latency: 0
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    Memory behind bridge: f4100000-f57fffff
    Prefetchable memory behind bridge: 14000000-140fffff
    BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-

0000:00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 22)
    Subsystem: VIA Technologies, Inc. VT82C686/A PCI to ISA Bridge
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0

0000:00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 10) (prog-if 8a [Master SecP PriP])
    Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 64
    Region 4: I/O ports at 1460 [size=16]
    Capabilities: <available only to root>

0000:00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 10) (prog-if 00 [UHCI])
    Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 64, Cache Line Size: 0x08 (32 bytes)
    Interrupt: pin D routed to IRQ 11
    Region 4: I/O ports at 1440 [size=32]
    Capabilities: <available only to root>

0000:00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Interrupt: pin ? routed to IRQ 10
    Capabilities: <available only to root>

0000:00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 20)
    Subsystem: Compaq Computer Corporation Soundmax integrated digital audio
    Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Interrupt: pin C routed to IRQ 9
    Region 0: I/O ports at 1000 [size=256]
    Region 1: I/O ports at 1474 [size=4]
    Region 2: I/O ports at 1470 [size=4]
    Capabilities: <available only to root>

0000:00:09.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 09)
    Subsystem: Intel Corp. EtherExpress PRO/100 P Mobile Combo Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 66 (2000ns min, 14000ns max), Cache Line Size: 0x08 (32 bytes)
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at f4020000 (32-bit, non-prefetchable) [size=4K]
    Region 1: I/O ports at 1400 [size=64]
    Region 2: Memory at f4000000 (32-bit, non-prefetchable) [size=128K]
    Capabilities: <available only to root>

0000:00:09.1 Serial controller: Lucent Microelectronics LT WinModem (prog-if 00 [8250])
    Subsystem: Intel Corp.: Unknown device 2201
    Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Interrupt: pin A routed to IRQ 11
    Region 0: I/O ports at 1478 [size=8]
    Region 1: Memory at f4021000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: <available only to root>

0000:00:0a.0 CardBus bridge: Texas Instruments PCI1410 PC card Cardbus Controller (rev 01)
    Subsystem: Compaq Computer Corporation: Unknown device b103
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 168, Cache Line Size: 0x08 (32 bytes)
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at 14100000 (32-bit, non-prefetchable) [size=4K]
    Bus: primary=00, secondary=02, subordinate=05, sec-latency=176
    Memory window 0: 10000000-11fff000 (prefetchable)
    Memory window 1: 12000000-13fff000
    I/O window 0: 00001800-000018ff
    I/O window 1: 00001c00-00001cff
    BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt+ PostWrite+
    16-bit legacy interface ports at 0001

0000:01:00.0 VGA compatible controller: Trident Microsystems CyberBlade i1 (rev 6a) (prog-if 00 [VGA])
    Subsystem: Compaq Computer Corporation CyberBlade i1 AGP
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B+
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 64
    Interrupt: pin A routed to IRQ 9
    Region 0: Memory at f5000000 (32-bit, non-prefetchable) [size=8M]
    Region 1: Memory at f4100000 (32-bit, non-prefetchable) [size=128K]
    Region 2: Memory at f4800000 (32-bit, non-prefetchable) [size=8M]
    Expansion ROM at 14000000 [disabled] [size=64K]
    Capabilities: <available only to root>

Software Environment:

Problem Description:
The e100 driver triggers the IRQ debugging code ("irq 11: nobody cared ... Disabling IRQ #11"). I won't post detailed logs, as I've done some investigation already and I know when this happens.

The machine is a Compaq Armada 110 notebook equipped with a combo Ethernet/modem MiniPCI card; that is 0000:00:09.[0-1] in the above listing. IRQ #11 is normally shared among quite a few devices:
 * VIA VT82xxxxx USB Controller (see 00:07.2 above)
 * Texas Instruments PCI1410 CardBus controller (see 00:0a.0)
 * Ethernet Pro 100 controller (see 00:09.0)
 * Lucent WinModem (see 00:09.1)

I won't disassemble my notebook to physically remove the other devices, and I have no idea how to remap their IRQs, but even if all the other devices are disabled (with pci_disable_device()) and only the Ethernet controller is enabled, the bug still occurs.

Steps to reproduce:
Load the e100 driver and bring the interface up and down a few times. The important thing is that e100_hw_reset() gets called while the IRQ is routed; the bug occurs even if request_irq() is called after e100_hw_reset(). This means that the bug is only triggered in e100_up() or e100_down(), not, for example, in e100_probe().

So, what happens exactly? When e100_hw_reset() issues the selective reset (or software reset), the (buggy?) EtherExpress generates an interrupt, but stat_ack remains zero, so e100_intr() returns IRQ_NONE.
However, the interrupt handler is soon invoked again, because the IRQ was not handled. This vicious circle is broken only when the interrupt is masked again in e100_disable_irq(), but very often 100000 such unhandled interrupts have already occurred by then, and note_interrupt() has disabled the IRQ. I am not 100% sure that the interrupt is indeed generated by the Ethernet device. It might also be some kind of strange interaction with the Lucent WinModem. Any idea how to check that? I have some experience hacking the kernel and I am willing to perform some tests, but I don't have the time to read the spec, so I'd really appreciate some help from the guy who wrote the e100 driver or anybody who knows how that d**n Ethernet controller works.
Is it possible to disable any of the other PCI devices in the BIOS? I guess not. Have you tried disabling ACPI? Can you confirm that the problem is intermittent, i.e. that the e100 only fails on every second or third open?
No, the spurious interrupts occur every time the device is reset in e100_hw_reset(). However, because there are normally enough handled IRQs between two calls to e100_hw_reset(), the IRQ debugging code is not triggered. I could probably compile a version of the module that counts the spurious interrupts, but I don't have a hardware debugging board to tell you exactly what happens on the PCI bus, so that might not help much anyway. You're right, the other devices cannot be disabled in the BIOS. I haven't tried booting with the "noacpi" option yet. Stay tuned...
OK, so booting with "pci=noacpi" results in:
1. the bug going away (wow!);
2. performance going down (well, OK, broken HW won't operate at the best possible speed);
3. one change in the detected devices that might be relevant:

with ACPI:
Jan 2 07:57:51 localhost kernel: ACPI: PCI Interrupt 0000:00:09.1[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Jan 2 07:57:51 localhost kernel: 0000:00:09.1: ttyS2 at I/O 0x1478 (irq = 11) is a 16450

without ACPI:
Jan 2 14:43:48 localhost kernel: PCI: Found IRQ 11 for device 0000:00:09.1
Jan 2 14:43:48 localhost kernel: PCI: Sharing IRQ 11 with 0000:00:09.0

Note that 0000:00:09.1 is the winmodem in my config. So, is this the end of the story, or may I help improve the performance by further hacking?
I should paste the actual IRQ routing here.

with ACPI:
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs *9 11 12)
ACPI: PCI Interrupt Link [LNKB] (IRQs 9 *11 12)
ACPI: PCI Interrupt Link [LNKC] (IRQs *9 11 12)
ACPI: PCI Interrupt Link [LNKD] (IRQs 9 *11 12)
ACPI: Embedded Controller [EC0] (gpe 1)
ACPI: Power Resource [PLPC] (on)
PCI: Using ACPI for IRQ routing
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKD] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:07.2[D] -> Link [LNKD] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 9
PCI: setting IRQ 9 as level-triggered
ACPI: PCI Interrupt 0000:00:07.5[C] -> Link [LNKC] -> GSI 9 (level, low) -> IRQ 9
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:09.1[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11

without ACPI:
PCI: Using IRQ router VIA [1106/0686] at 0000:00:07.0
PCI: IRQ 0 for device 0000:00:0a.0 doesn't match PIRQ mask - try pci=usepirqmask
PCI: Found IRQ 11 for device 0000:00:0a.0
PCI: Sharing IRQ 11 with 0000:00:07.2
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
PCI: Found IRQ 11 for device 0000:00:0a.0
PCI: Sharing IRQ 11 with 0000:00:07.2
PCI: Found IRQ 11 for device 0000:00:07.2
PCI: Sharing IRQ 11 with 0000:00:0a.0
PCI: Found IRQ 9 for device 0000:00:07.5
PCI: Found IRQ 11 for device 0000:00:09.0
PCI: Sharing IRQ 11 with 0000:00:09.1
PCI: Found IRQ 11 for device 0000:00:09.1
PCI: Sharing IRQ 11 with 0000:00:09.0

This means that the CardBus controller shares LNKD at IRQ 11 with the USB controller, and the Ethernet card shares LNKB at IRQ 11 with the Lucent winmodem, while the sound card is on its own LNKC at IRQ 9.
That said, we may conclude that the combo MiniPCI card does not like Linux ACPI IRQ routing, and that winmodems are evil anyway. :) However, it would be nice if you could find out why the default routing works better. Dropping severity to low.
Reassigning to acpi_config-interrupts@kernel-bugs.osdl.org
I am sorry for my earlier comment stating that disabling ACPI solves the problem. This is NOT TRUE, and the problem probably has nothing to do with ACPI, because the bug is back today. :( I tried booting with "acpi=noirq", then "pci=noacpi", and then even with "acpi=off", and the spurious interrupts occurred in all three cases. However, the bug seems to be in the serial driver, not in e100, because the bug occurs IFF (if and only if) the serial driver detects the winmodem as a 16450. That is, the spurious interrupts occur iff syslog contains this line:

localhost kernel: 0000:00:09.1: ttyS2 at I/O 0x1478 (irq = 11) is a 16450

This happens on some boots. On other boots there is no such line (even though the device seems to be detected, because it is assigned an interrupt, as can be seen in the log), and then resetting the Ethernet controller does not generate any extra interrupts. I guess the correct solution would be to make serial8250_pci ignore the winmodem somehow.
However, I suspect doing so would inconvenience other users. Are you saying that we sometimes detect this modem as a 16450 and other times as a 16550 or something? In any case, I've no idea about this bug - I don't think I can progress it in any way. Sorry.
Hi, Russell! No, I don't mean the chip gets detected incorrectly. It either gets detected as a 16450, or it doesn't get detected at all. However, I don't want the serial driver to work with this hardware (it is a winmodem); I only want the driver not to touch my winmodem in any way whatsoever. The driver can't handle the winmodem anyway, but something goes wrong during initialization, and later the winmodem generates bogus interrupts when the Ethernet controller (on the same MiniPCI board) is used. You see, there is no stable driver for the winmodem, so I don't want to use one, but the kernel then considers the serial driver the best driver for that device, since it claims to be able to handle any PCI serial device; see the pci_device_id table serial_pci_tbl in 8250_pci.c (at the end of the file):

    { PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_COMMUNICATION_SERIAL << 8, 0xffff00, pbn_default },
Hmm, okay. But I still don't know what to do, because solving your problem could well create a lot of problems for other people. I think this is a case where we can't satisfy everyone, so we can only maintain the status quo.
Hi! I'm pleased that you're still interested. :) Well, I don't think the default behavior should change. I'd be completely happy with a kernel option telling the driver to skip this particular device and not try to do anything with it. I can't disable the driver completely, because there are other PCI chips in my computer that I do want to use. What I'm missing is a way to disable one particular serial device. Does that seem plausible?
It seems like a bug that the Lucent WinModem claims to use the 8250 programming interface, when it seems that it's not 100% compatible. Maybe we could work around this with a PCI quirk that recognizes the specific Lucent device and changes the programming interface so 8250_pci won't claim it. Can you include the output of "lspci -xx" so we can see exactly what IDs the WinModem uses?
Petr, can I trouble you to attach the output of "lspci -xx", so we can see exactly what IDs the Lucent WinModem uses? There's also a superficially similar issue, http://bugzilla.kernel.org/show_bug.cgi?id=5918, which also involves e100 and an "irq 11: nobody cared" message. There's a patch to fix that issue in 2.6.17-rc1-mm3. If you can try that, I'd be interested in seeing the complete dmesg log. It involves a boot-time quirk for the e100 device.
Created attachment 7966: output of lspci -xx

Hi Bjorn, sorry for the delay, but as it happens, I've sort of forgotten about this bug. Anyway, here's what you asked for.
I've just applied the patch, installed a new kernel and rebooted. Now, I can't tell if the quirk really helps, because I didn't manage to get a boot with "ttyS2" detected. However, that could mean either that the bug is gone, or that it might pop up later. Anyway, for the time being, I am closing the bug and if I encounter it again, I might re-open it. Thanks to everyone!
Created attachment 7967: change lucent winmodem programming model to "other"

> Now, I can't tell if the quirk really helps, because I didn't manage to get a
> boot with "ttyS2" detected.

If the e100 quirk did anything, you would see a "PCI: Firmware left XXX e100 interrupts enabled, disabling" note in your dmesg. If you don't see that message, the e100 quirk did nothing. If it IS doing something, I'd like to know, because Jeff Garzik isn't sure the problem is common enough to be worth the trouble of a quirk.

If the e100 quirk IS disabling e100 interrupts on your box, that may be enough to make the serial driver stop misdetecting the winmodem as a 16x50 device. In that case, the problem would disappear.

It still seems wrong to me that the winmodem claims to be 8250-compatible, though. So if you see the problem again, try this patch, which should make the serial driver leave the winmodem alone.
Hi, I'm afraid the problem is back again today. I applied the patch from bug 5918. It DID do something ("PCI: Firmware left XXX e100 interrupts enabled, disabling", see the attached output of dmesg), but obviously it is not enough. *sigh*
Created attachment 7971: output of dmesg

This is the failed dmesg (without the WinModem patch).
> I applied the patch from bug 5918. It DID do something ("PCI: Firmware
> left XXX e100 interrupts enabled, disabling", see the attached output
> of dmesg), but obviously it is not enough. *sigh*

OK, good, thanks for that report. So it's useful to know that (1) the e100 quirk is applicable to more than just the single machine I knew about previously, and (2) that's not enough to prevent the serial driver from claiming the WinModem.

Can you also try the WinModem quirk I posted (attachment 7967)? I don't know how Russell feels about that approach, but it should prevent the serial driver from claiming the WinModem.
Created attachment 8013: output of dmesg with the Winmodem workaround applied

I applied the patch you provided (attachment 7967) and switched the machine on and off a few times. Again, I cannot be 100% sure, since the original bug occurred only occasionally (i.e. on some power-ons), and now that the automatic detection is overridden, I cannot even tell whether a given power-on would have been affected, but the bug seems to have gone away. However, I suspect that my machine obeys Murphy's Law, so please wait at least a few days to see whether the bug pops up eventually. ;)
Thanks for trying the patch. I'm confused about one thing, though: your latest dmesg doesn't show any ttyS devices. You should have ttyS0 (0x3f8) and ttyS1 (0x2f8). Did you forget to load the serial driver, by chance?
Almost two months old and no activity. What's the status? Can this bug be closed?
I'm sorry for the delay. I dropped out of university, then did not have access to the Internet for a month, started a new job, and well, being a father not much time remained. Actually, the Winmodem workaround did not help, but I don't mind if you close the bug. I'm using "noirqdebug" right now, and if I find a clean solution, I'll post it to LKML.
> ... being a father not much time remained.

I know the feeling. Congratulations!

> Actually, the Winmodem workaround did not help, but I don't mind if you close
> the bug. I'm using "noirqdebug" right now, and if I find a clean solution, I'll
> post it to LKML.

Interesting. Any chance you can post a dmesg log from the attempt with the WinModem quirk? I know you posted one earlier (comment 19), but I suspect the serial driver wasn't loaded there.
Ping! I was cleaning out some old patches and came across the quirk to prevent winmodems from being claimed by the serial driver. That still seems reasonable to me, and I still think it should fix the problem you're seeing. Comment 6 says you see the IRQ problem if and only if the serial driver claims the winmodem and prints this:

  0000:00:09.1: ttyS2 at I/O 0x1478 (irq = 11) is a 16450

The winmodem quirk should prevent that from ever happening, so I'd like to see a dmesg log that exhibits the problem when the winmodem quirk is applied.
Ping. Petr, your last update says the Winmodem quirk didn't solve the problem, but we never had a chance to figure out exactly *why* it didn't work. Would you be able to post a dmesg log (with the Winmodem quirk applied) that shows the problem?
I'm sorry for the late reply (I've had some trouble with email delivery), but it can be left as rejected, as I can no longer see the bug. We may assume that it was fixed as a side effect of another change (I'm currently running 2.6.20.3). Thank you for your time, anyway!