Bug 5810
Summary: | e100 IRQ problem | ||
---|---|---|---|
Product: | Drivers | Reporter: | Petr Tesarik (kernel) |
Component: | Serial | Assignee: | Russell King (rmk) |
Status: | REJECTED WILL_NOT_FIX | ||
Severity: | normal | CC: | akpm, bjorn.helgaas, jbrandeb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.14 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments:
output of lspci -xx
change lucent winmodem programming model to "other"
output of dmesg
output of dmesg with the Winmodem workaround applied
Description
Petr Tesarik
2006-01-02 03:21:08 UTC
Is it possible to disable any of the other PCI devices in the BIOS? I guess not..

Have you tried disabling acpi?

Can you confirm that the problem is intermittent? That the e100 only fails on every second or third open?

No, the spurious interrupts do occur every time the device is reset in e100_hw_reset(). However, because normally there are some (sufficiently numerous) handled IRQs between two calls to e100_hw_reset(), it does not trigger the IRQ debugging code. I could probably compile a version of the module that would count the spurious interrupts, but I don't have a hardware debugging board to tell you exactly what happens on the PCI bus, so it might just as well be unneeded.

You're right, the other devices cannot be disabled in BIOS. I haven't tried booting with the "noacpi" option yet. Stay tuned...

OK, so booting with "pci=noacpi" results in:

1. the bug going away. (wow!)
2. performance going down. (Well, OK, broken HW won't operate at the best possible speed)
3. one change in the detected devices that might be relevant:

with ACPI:
Jan 2 07:57:51 localhost kernel: ACPI: PCI Interrupt 0000:00:09.1[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Jan 2 07:57:51 localhost kernel: 0000:00:09.1: ttyS2 at I/O 0x1478 (irq = 11) is a 16450

without ACPI:
Jan 2 14:43:48 localhost kernel: PCI: Found IRQ 11 for device 0000:00:09.1
Jan 2 14:43:48 localhost kernel: PCI: Sharing IRQ 11 with 0000:00:09.0

Note that 0000:00:09.1 is the winmodem in my config.

So, is this the end of the story, or may I help improve the performance by further hacking?

I should paste the actual IRQ routing here.

with ACPI:
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs *9 11 12)
ACPI: PCI Interrupt Link [LNKB] (IRQs 9 *11 12)
ACPI: PCI Interrupt Link [LNKC] (IRQs *9 11 12)
ACPI: PCI Interrupt Link [LNKD] (IRQs 9 *11 12)
ACPI: Embedded Controller [EC0] (gpe 1)
ACPI: Power Resource [PLPC] (on)
PCI: Using ACPI for IRQ routing
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKD] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:07.2[D] -> Link [LNKD] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 9
PCI: setting IRQ 9 as level-triggered
ACPI: PCI Interrupt 0000:00:07.5[C] -> Link [LNKC] -> GSI 9 (level, low) -> IRQ 9
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:09.1[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11

without ACPI:
PCI: Using IRQ router VIA [1106/0686] at 0000:00:07.0
PCI: IRQ 0 for device 0000:00:0a.0 doesn't match PIRQ mask - try pci=usepirqmask
PCI: Found IRQ 11 for device 0000:00:0a.0
PCI: Sharing IRQ 11 with 0000:00:07.2
PCI: Enabling device 0000:00:0a.0 (0000 -> 0003)
PCI: Found IRQ 11 for device 0000:00:0a.0
PCI: Sharing IRQ 11 with 0000:00:07.2
PCI: Found IRQ 11 for device 0000:00:07.2
PCI: Sharing IRQ 11 with 0000:00:0a.0
PCI: Found IRQ 9 for device 0000:00:07.5
PCI: Found IRQ 11 for device 0000:00:09.0
PCI: Sharing IRQ 11 with 0000:00:09.1
PCI: Found IRQ 11 for device 0000:00:09.1
PCI: Sharing IRQ 11 with 0000:00:09.0

This means that the Cardbus controller shares LNKD at IRQ 11 with the USB controller and the Ethernet card shares LNKB at IRQ 11 with the Lucent winmodem, while the sound card is on its own LNKC at IRQ 9.
That said, we may conclude that the combo MiniPCI does not like Linux ACPI IRQ routing, and that winmodems are evil, anyway. :) However, it would be nice if you discovered why the default routing works better.

Dropping severity to low. Reassigning to acpi_config-interrupts@kernel-bugs.osdl.org

I am sorry for the comment where I stated that disabling ACPI solves the problem. This is NOT TRUE, and it probably has nothing to do with ACPI, because the bug is back today. :( I tried booting with "acpi=noirq", then "pci=noacpi" and then even with "acpi=off", and the spurious interrupt occurred in all three cases.

However, the bug seems to be in the serial driver, not in e100, because the bug occurs IFF (if and only if) the serial driver detects the winmodem as a 16450. That is, the spurious interrupts occur iff syslog contains this line:

localhost kernel: 0000:00:09.1: ttyS2 at I/O 0x1478 (irq = 11) is a 16450

This happens on some boots. On other boots, there is no such line (even though the device seems to be detected because it is assigned an interrupt, as can be seen in the log), and then resetting the Ethernet controller does not generate any extra interrupts.

I guess the correct solution would be to ignore the winmodem in serial8250_pci somehow. However, I suspect doing so would inconvenience other users.

Are you saying that we sometimes detect this modem as a 16450 and other times as a 16550 or something? In any case, I've no idea about this bug - I don't think I can progress it in any way. Sorry.

Hi, Russell! No, I don't mean the chip gets detected incorrectly. It either gets detected as 16550A, or it doesn't get detected at all. However, I don't want the serial driver to work with this hardware (it is a winmodem), I only want the driver not to touch my winmodem in any way whatsoever.
The driver can't handle the winmodem anyway, but during initialization something goes wrong and later the winmodem generates some bogus interrupts when the Ethernet controller (on the same MiniPCI board) is used. You see, there is no stable driver for the winmodem, so I don't want to use one, but the serial driver is then considered by the kernel to be the best driver for that device, since it claims to be able to handle any PCI serial device; see pci_device_id serial_pci_tbl in 8250_pci.c (at the end of the file):

    { PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
      PCI_CLASS_COMMUNICATION_SERIAL << 8, 0xffff00, pbn_default },

Hmm, okay. But I still don't know what to do because solving your problem could well create a lot of problems for other people. I think this is a case where we can't satisfy everyone, so we can only maintain the current status quo.

Hi! Pleased that you're still interested. :) Well, I don't think the default behavior should change. I'd be completely happy with a kernel option telling the driver to "skip this very device and not to try to do anything with it". I can't disable the driver completely because there are other PCI chips in my computer which I do want to use. I miss the possibility to disable one particular serial device. Does that seem plausible?

It seems like a bug that the Lucent WinModem claims to use the 8250 programming interface, when it seems that it's not 100% compatible. Maybe we could work around this with a PCI quirk that recognizes the specific Lucent device and changes the programming interface so 8250_pci won't claim it. Can you include the output of "lspci -xx" so we can see exactly what IDs the WinModem uses?

Petr, can I trouble you to attach the output of "lspci -xx", so we can see exactly what IDs the Lucent WinModem uses?

There's also a superficially similar issue, http://bugzilla.kernel.org/show_bug.cgi?id=5918 which also involves e100 and an "irq 11: nobody cared" message.
There's a patch to fix that issue in 2.6.17-rc1-mm3. If you can try that, I'd be interested in seeing the complete dmesg log. It involves a boot-time quirk for the e100 device.

Created attachment 7966
output of lspci -xx
Hi Bjorn,
sorry for the delay, but as it happens, I've sort of forgotten about this bug.
Anyway, here's what you asked for.

I've just applied the patch, installed a new kernel and rebooted. Now, I can't tell if the quirk really helps, because I didn't manage to get a boot with "ttyS2" detected. However, that could mean either that the bug is gone, or that it might pop up later. Anyway, for the time being, I am closing the bug and if I encounter it again, I might re-open it. Thanks to everyone!

Created attachment 7967
change lucent winmodem programming model to "other"

> Now, I can't tell if the quirk really helps, because I didn't manage to get a
> boot with "ttyS2" detected.

If the e100 quirk did anything, you would see a "PCI: Firmware left XXX e100 interrupts enabled, disabling" note in your dmesg. If you don't see that message, the e100 quirk did nothing. If it IS doing something, I'd like to know that, because Jeff Garzik isn't sure the problem is common enough to be worth the trouble of a quirk.

If the e100 quirk IS disabling e100 interrupts on your box, it could be that that is enough to make the serial driver stop misdetecting the winmodem as a 16x50 device. In that case, the problem would disappear.

It still seems wrong to me that the winmodem claims to be 8250-compatible, though. So if you see the problem again, try this patch, which should make the serial driver leave the winmodem alone.

Hi, I'm afraid the problem is back again today. I applied the patch from bug 5918. It DID do something ("PCI: Firmware left XXX e100 interrupts enabled, disabling", see the attached output of dmesg), but obviously it is not enough. *sigh*

Created attachment 7971
output of dmesg
This is the failed dmesg (without the patch).
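
Editorial sketch: the catch-all entry quoted from serial_pci_tbl and the proposed "programming model to other" quirk both hinge on how a PCI ID table entry is compared with a device's class code. The userspace C below mirrors that comparison in reduced form. It is an illustration under stated assumptions, not the code from attachment 7967: the struct names, `id_matches()` and `quirk_class_to_other()` are hypothetical, the subvendor/subdevice fields and the `pbn_default` driver data are left out, and it is only an assumption that the quirk rewrites the class to "other communication" (0x0780).

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the kernel's PCI headers. */
#define PCI_ANY_ID                     0xffffffffu
#define PCI_CLASS_COMMUNICATION_SERIAL 0x0700u
#define PCI_CLASS_COMMUNICATION_OTHER  0x0780u

/* Reduced, hypothetical stand-in for struct pci_device_id. */
struct id_entry {
    uint32_t vendor, device;
    uint32_t class_code, class_mask;
};

/* Reduced, hypothetical stand-in for the fields of a PCI device. */
struct dev_info {
    uint32_t vendor, device;
    uint32_t class_code; /* 24 bits: base class, subclass, prog-if */
};

/* An entry matches when vendor and device are wildcards or equal, and
 * the class codes agree on every bit selected by class_mask. */
static bool id_matches(const struct id_entry *id, const struct dev_info *dev)
{
    return (id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) &&
           (id->device == PCI_ANY_ID || id->device == dev->device) &&
           ((id->class_code ^ dev->class_code) & id->class_mask) == 0;
}

/* The catch-all entry from the thread: any vendor, any device, class
 * "serial controller"; mask 0xffff00 ignores the prog-if byte. */
static const struct id_entry serial_catch_all = {
    PCI_ANY_ID, PCI_ANY_ID,
    PCI_CLASS_COMMUNICATION_SERIAL << 8, 0xffff00u,
};

/* Hypothetical quirk: report the device as "other communication
 * controller" so that the catch-all entry no longer matches it. */
static void quirk_class_to_other(struct dev_info *dev)
{
    dev->class_code = PCI_CLASS_COMMUNICATION_OTHER << 8;
}
```

With a device whose class code is 0x0700xx (the vendor and device IDs do not matter for this entry), `id_matches(&serial_catch_all, &dev)` is true; after `quirk_class_to_other(&dev)` it is false, which is the effect the workaround is after. A real quirk would be keyed to the specific Lucent vendor and device IDs so that genuine 16x50 ports are left alone.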

> I applied the patch from bug 5918. It DID do something ("PCI: Firmware
> left XXX e100 interrupts enabled, disabling", see the attached output
> of dmesg), but obviously it is not enough. *sigh*

OK, good, thanks for that report. So it's useful to know that (1) the e100 quirk is applicable to more than just the single machine I knew about previously, and (2) that's not enough to prevent the serial driver from claiming the WinModem.

Can you also try the WinModem quirk I posted (attachment 7967)? I don't know how Russell feels about that approach, but it should prevent the serial driver from claiming the WinModem.

Created attachment 8013
output of dmesg with the Winmodem workaround applied

I applied the patch you provided (attachment 7967) and switched the machine on and off a few times. Again, I cannot be 100% sure, since the original bug was occurring only occasionally (i.e. on some power-ons) and now that the automatic detection is overridden, I cannot even tell whether this is such a power-on, but the bug seems to have gone away. However, I have a suspicion that my machine obeys Murphy's Law, so please wait at least a few days to see if the bug pops up eventually. ;)

Thanks for trying the patch. I'm confused about one thing, though: your latest dmesg doesn't show any ttyS devices. You should have ttyS0 (0x3f8) and ttyS1 (0x2f8). Did you forget to load the serial driver, by chance?

Almost two months old and no activity. What's the status? Can this bug be closed?

I'm sorry for the delay. I dropped out of university, then did not have access to the Internet for a month, started a new job, and well, being a father not much time remained. Actually, the Winmodem workaround did not help, but I don't mind if you close the bug. I'm using "noirqdebug" right now, and if I find a clean solution, I'll post it to LKML.

> ... being a father not much time remained.

I know the feeling. Congratulations!

> Actually, the Winmodem workaround did not help, but I don't mind if you close
> the bug. I'm using "noirqdebug" right now, and if I find a clean solution, I'll
> post it to LKML.

Interesting. Any chance you can post a dmesg log from the attempt with the WinModem quirk? I know you posted one earlier (comment 19) but I suspect the serial driver wasn't loaded there.

Ping! I was cleaning out some old patches and came across the quirk to prevent winmodems from being claimed by the serial driver. That still seems reasonable to me, and I still think it should fix the problem you're seeing. Comment 6 says you see the IRQ problem if and only if the serial driver claims the winmodem and prints this:

0000:00:09.1: ttyS2 at I/O 0x1478 (irq = 11) is a 16450

The winmodem quirk should prevent that from ever happening, so I'd like to see a dmesg log that exhibits the problem when the winmodem quirk is applied.

Ping. Petr, your last update says the Winmodem quirk didn't solve the problem, but we never had a chance to figure out exactly *why* it didn't work. Would you be able to post a dmesg log (with the Winmodem quirk applied) that shows the problem?

I'm sorry for the late reply (I've had some trouble with email delivery), but it can be left as rejected, as I can no longer see the bug. We may assume that it was fixed as a side effect of another change (I'm currently running 2.6.20.3). Thank you for your time, anyway!
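
Editorial sketch: the thread relies on the kernel's IRQ debugging code (the source of "irq 11: nobody cared", and what "noirqdebug" switches off) without spelling out how it decides that an interrupt line is stuck. The userspace C below models that accounting as a simplified approximation of the 2.6-era logic. The window of 100000 interrupts and the limit of 99900 unhandled ones are stated here as assumptions about kernels of that period, and `struct irq_stats` and `note_irq()` are hypothetical names, not kernel API.

```c
#include <stdbool.h>

/* Simplified handler verdicts: did any handler on the shared line
 * claim the interrupt? */
enum irq_result { IRQ_RESULT_NONE = 0, IRQ_RESULT_HANDLED = 1 };

/* Assumed thresholds (see the note above): judge the line once per
 * WINDOW interrupts, and give up if more than UNHANDLED_LIMIT of
 * them were claimed by nobody. */
#define WINDOW          100000u
#define UNHANDLED_LIMIT 99900u

struct irq_stats {
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many nobody claimed */
    bool disabled;               /* set once the line is judged stuck */
};

/* Record one interrupt on the line and, at the end of each window,
 * decide whether the line should be disabled. */
static void note_irq(struct irq_stats *s, enum irq_result r)
{
    if (r != IRQ_RESULT_HANDLED)
        s->irqs_unhandled++;

    s->irq_count++;
    if (s->irq_count < WINDOW)
        return;

    /* End of window: evaluate, then start counting afresh. */
    s->irq_count = 0;
    if (s->irqs_unhandled > UNHANDLED_LIMIT)
        s->disabled = true;
    s->irqs_unhandled = 0;
}
```

Under this model, a line on which every interrupt in a window goes unclaimed ends up disabled, while a line where even one interrupt in a hundred is handled does not. That is consistent with the observation earlier in the thread that sufficiently numerous handled IRQs between two calls to e100_hw_reset() keep the debugging code from triggering.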