Bug 7022
| Summary: | e1000 - IRQ - Nobody cared | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Goudal Francois (francois) |
| Component: | Network | Assignee: | Jeff Garzik (jgarzik) |
| Status: | REJECTED INVALID | | |
| Severity: | normal | CC: | auke-jan.h.kok, marcin.slusarz |
| Priority: | P2 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.15.3 | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |
| Attachments: | Kernel config file, proposed workaround | | |
Description
Goudal Francois
2006-08-18 00:56:47 UTC
I think you should:
- attach .config
- show output of lspci
- try newer kernel

Created attachment 8851
Kernel config file

Here is the output of lspci:

00:00.0 Host bridge: Intel Corp.: Unknown device 3580 (rev 02)
00:00.1 System peripheral: Intel Corp.: Unknown device 3584 (rev 02)
00:00.3 System peripheral: Intel Corp.: Unknown device 3585 (rev 02)
00:02.0 VGA compatible controller: Intel Corp.: Unknown device 3582 (rev 02)
00:1d.0 USB Controller: Intel Corp.: Unknown device 24c2 (rev 03)
00:1d.1 USB Controller: Intel Corp.: Unknown device 24c4 (rev 03)
00:1d.7 USB Controller: Intel Corp.: Unknown device 24cd (rev 03)
00:1e.0 PCI bridge: Intel Corp. 82820 820 (Camino 2) Chipset PCI (-M) (rev 83)
00:1f.0 ISA bridge: Intel Corp.: Unknown device 24cc (rev 03)
00:1f.1 IDE interface: Intel Corp.: Unknown device 24ca (rev 03)
00:1f.3 SMBus: Intel Corp.: Unknown device 24c3 (rev 03)
00:1f.5 Multimedia audio controller: Intel Corp.: Unknown device 24c5 (rev 03)
01:04.0 Ethernet controller: Intel Corp. 82559ER (rev 14)
01:05.0 Ethernet controller: Intel Corp.: Unknown device 1076 (rev 05)

I will try to install a newer kernel to run some tests. If you already have an idea, please tell me about it. I will give you some news once the new kernel is installed.

Finally, I built a 2.6.17.10 kernel, using the same .config but running make menuconfig to check for obsolete config flags. The problem is exactly the same: I get an IRQ, apparently from the Intel gigabit card, but nobody cares about it. Before this message appears, the card works fine; afterwards it no longer works because the kernel has disabled the IRQ line. Thanks for your help. Regards.

What happens if you enable "Use Rx Polling (NAPI)" (CONFIG_E1000_NAPI) below "Intel(R) PRO/1000 Gigabit Ethernet support"?

I just tried with NAPI enabled but the problem is the same: it works fine with irqpoll, but without it the IRQ gets disabled.

I just found something really surprising that may be helpful. When I run without irqpoll, the "Nobody cared" message appears exactly when the interrupt counter on IRQ line 9 reaches 100000. I noticed this by looking at /proc/interrupts. Everything works fine while the count is lower than 100000, but at the exact moment it reaches 100000 the kernel disables the IRQ line, and the counter then stays at 100000 indefinitely because the kernel has masked the IRQ. I see this with both kernels (the one without NAPI and the one with NAPI), but without NAPI the 100000 count seems to be reached much faster. By the way, I'm quite surprised by this interrupt count: 100000 looks like a lot compared with the other peripherals over the same period. Furthermore, the card isn't used much. I'm just pinging it from another computer, and I get the problem even if nothing is plugged into the card. Does this look familiar to you? Thanks.

It looks like the NIC is generating some fake interrupts, and when unhandled_irqs reaches 100000 the kernel disables this IRQ line.
Why "fake interrupts"? Because the documentation for the "Interrupt Cause Read Register" says: "This register contains all interrupt conditions for the Ethernet controller. Each time an interrupt causing event occurs, the corresponding interrupt bit is set in the register. (..)" but this register is sometimes(?) empty at the beginning of e1000_intr. I don't have any idea why...
PS: note that I'm not a kernel developer.

Well, thanks for this information. I will have a look at the kernel source code to try to find a way to stop unhandled interrupts from being counted, so that the kernel does not disable the IRQ line, even if it's ugly. Even that would be better than irqpoll. Since you're not a kernel developer, do you have an idea of whom I could ask for help? Thanks a lot for your help!

I just found these two lines in the e1000_intr function in e1000_main.c:

    if(unlikely(!icr))
        return IRQ_NONE;  /* Not our interrupt */

where icr is the register you were talking about.
I assume that when the if condition is true, the driver reports to the kernel that nobody cared about the IRQ, and that if this happens 100000 times the kernel disables the IRQ line. So I'll try to comment out these two lines and build a "hacked" kernel ^^. I don't have access to the hardware right now, so I will test this tomorrow. But I assume this test should never be true: the hardware seems to be a little aggressive on its IRQ line, and that's bad, because the CPU keeps jumping into the interrupt handler to do ... nothing... I'll give you some news when my tests are done. Thanks!

Created attachment 8865
proposed workaround
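
The counting behaviour discussed in the comments above can be sketched as a small user-space model. This is not the kernel's code: the names `irq_model`, `model_note_interrupt` and `model_run` are hypothetical, the window of 100000 comes from the observation in this report, and the 99900 limit is an assumption based on how the kernel's spurious-interrupt check is usually described. Check kernel/irq/spurious.c for the version you run.

```c
#include <stdbool.h>

/* Window size taken from the report; the limit is an assumption. */
#define IRQ_WINDOW      100000u
#define UNHANDLED_LIMIT  99900u

/* Hypothetical per-line bookkeeping, loosely modelled on the kernel's. */
struct irq_model {
    unsigned int count;     /* interrupts seen in the current window */
    unsigned int unhandled; /* interrupts in the window nobody claimed */
    bool disabled;          /* true once the line has been shut off */
};

/* Record one interrupt. Returns true if the line is disabled afterwards. */
bool model_note_interrupt(struct irq_model *m, bool handled)
{
    if (m->disabled)
        return true;
    if (!handled)
        m->unhandled++;
    m->count++;
    if (m->count < IRQ_WINDOW)
        return false;

    /* Window complete: too many unclaimed interrupts means "stuck". */
    if (m->unhandled > UNHANDLED_LIMIT)
        m->disabled = true;
    m->count = 0;
    m->unhandled = 0;
    return m->disabled;
}

/*
 * Feed n interrupts into the model. Every good_every-th interrupt is
 * reported as handled; good_every == 0 means none are ever handled.
 */
bool model_run(struct irq_model *m, unsigned int n, unsigned int good_every)
{
    for (unsigned int i = 1; i <= n; i++)
        model_note_interrupt(m, good_every != 0 && i % good_every == 0);
    return m->disabled;
}
```

Under these assumptions, a line whose interrupts are never claimed is disabled exactly on the 100000th interrupt, which matches the /proc/interrupts observation. A few "good" interrupts per window are not enough to keep the line alive if they are too rare.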

The patch you proposed didn't work well, because the number of IRQs with no bit set in the register is really high, so sometimes 100000 interrupts without a bit set are raised without a single "good" interrupt. It just gives the kernel a better chance of keeping the IRQ line enabled, but it does not solve the problem. I have made another patch that solves the problem in my specific case: I replaced return IRQ_NONE with return IRQ_HANDLED when the register value is 0. In my case it works fine now because the Ethernet card is alone on its IRQ line, but I assume that if another peripheral were on the same IRQ line it wouldn't work, because the card driver would probably catch some interrupts meant for the other peripheral's driver, causing malfunctions on that hardware. Now, for me, the problem is solved, but this solution isn't reliable, so I would like a kernel developer to tell me what he thinks about all this before considering this bug SOLVED. I'll keep watching this page in case someone else wants more details about the problem I had.

After some extensive investigation, the problem is not due to the Ethernet card, sorry. It definitely seems that this motherboard model behaves strangely on IRQ line 9, and the default BIOS configuration puts only the Ethernet card on this interrupt line... Apparently it is possible to make the BIOS not assign IRQ9 to any peripheral, which solves the problem, but this motherboard is definitely crap...
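
The trade-off in the IRQ_HANDLED workaround can be illustrated with a minimal sketch of a shared interrupt line. Everything here is a hypothetical model (`model_dev`, `strict_handler`, `claiming_handler`, `model_dispatch`), not driver code. It assumes that the kernel calls every handler registered on a shared line and combines their return values, which is how the generic IRQ code is usually described. Under that assumption, a handler that always claims the interrupt does not stop other handlers from running, but it does hide interrupts that nobody actually serviced from the spurious-interrupt check.

```c
#include <stddef.h>

typedef enum { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 } model_irqreturn_t;

typedef model_irqreturn_t (*model_handler_t)(void *dev);

/* Hypothetical device: 'cause' plays the role of the ICR register. */
struct model_dev {
    unsigned int cause;    /* pending interrupt cause bits, 0 if none */
    unsigned int serviced; /* number of interrupts actually serviced */
};

/* Like the original check: an empty cause register means "not ours". */
model_irqreturn_t strict_handler(void *dev)
{
    struct model_dev *d = dev;
    if (d->cause == 0)
        return MODEL_IRQ_NONE;
    d->cause = 0;
    d->serviced++;
    return MODEL_IRQ_HANDLED;
}

/* Like the workaround: claims the interrupt even with nothing pending. */
model_irqreturn_t claiming_handler(void *dev)
{
    struct model_dev *d = dev;
    if (d->cause != 0) {
        d->cause = 0;
        d->serviced++;
    }
    return MODEL_IRQ_HANDLED;
}

struct model_action {
    model_handler_t handler;
    void *dev;
};

/* Call every handler on the line; the line counts as handled if any claims it. */
model_irqreturn_t model_dispatch(const struct model_action *actions, size_t n)
{
    model_irqreturn_t ret = MODEL_IRQ_NONE;
    for (size_t i = 0; i < n; i++)
        if (actions[i].handler(actions[i].dev) == MODEL_IRQ_HANDLED)
            ret = MODEL_IRQ_HANDLED;
    return ret;
}
```

With two strict handlers and nothing pending, the dispatch result is "unhandled", so a stuck line would eventually be noticed. With the claiming handler on the line, the same spurious interrupt is reported as handled, which is why the workaround keeps IRQ 9 enabled but is not a reliable general fix.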