Bug 8455
Summary: | panic with e1000 driver on HP Integrity servers | ||
---|---|---|---|
Product: | Drivers | Reporter: | Doug Chapman (doug.chapman) |
Component: | Network | Assignee: | Jeff Garzik (jgarzik) |
Status: | CLOSED CODE_FIX | ||
Severity: | blocking | CC: | auke-jan.h.kok, jbrandeb, michal.k.k.piotrowski, shawn.starr |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.21-git | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
stack trace from panic on HP Integrity ia64 server
.config from kernel build
lspci -vv output
ethtool -e eth0 output
dmesg output |
Description
Doug Chapman
2007-05-08 12:23:48 UTC
Created attachment 11437 [details]
stack trace from panic on HP Integrity ia64 server
Created attachment 11438 [details]
.config from kernel build
If I compile without CONFIG_E1000_NAPI the problem goes away. This should help narrow down the defect considerably.

Auke, this one appears to be a post-2.6.21 regression.

I need some more information: `lspci -vvv`, `dmesg`, `ethtool -e ethX`

Created attachment 11458 [details]
lspci -vv output
Taken from original 2.6.21 kernel before defect was introduced.
Created attachment 11459 [details]
ethtool -e eth0 output
Created attachment 11460 [details]
dmesg output
FYI, I tried the same model card in an HP dl380 (x86_64) and I did _not_ see the panic. It appears this panic occurs only on ia64.

Status: we are still unable to reproduce. We are also having lots of machine issues, which is affecting our ability to reproduce. Work continues.

Here is a little more info if it helps. The panic happens at include/linux/netdevice.h:923 in netif_rx_complete:
918 static inline void netif_rx_complete(struct net_device *dev)
919 {
920         unsigned long flags;
921
922         local_irq_save(flags);
923         BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));
This appears to have been called from e1000_main.c:3962 in e1000_clean:
3956         /* If no Tx and not enough Rx work done, exit the polling mode */
3957         if ((!tx_cleaned && (work_done == 0)) ||
3958            !netif_running(poll_dev)) {
3959 quit_polling:
3960                 if (likely(adapter->itr_setting & 3))
3961                         e1000_set_itr(adapter);
3962                 netif_rx_complete(poll_dev);
3963                 e1000_irq_enable(adapter);
3964                 return 0;
Please let me know if there is any more info I can provide. I can reproduce this quite easily but I don't have the background to really debug it.

Tentative patch below, please test this patch:
---
Herbert Xu wrote:
"netif_poll_enable can only be called if you've previously called netif_poll_disable. Otherwise a poll might already be in action and you may get a crash like this."
Removing the call to netif_poll_enable in e1000_open should fix this issue; the only other call to netif_poll_enable is in e1000_up(), which is only reached after a device reset or resume.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 drivers/net/e1000/e1000_main.c | 4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 49be393..cbc7feb 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1431,10 +1431,6 @@ e1000_open(struct net_device *netdev)
 	/* From here on the code is the same as e1000_up() */
 	clear_bit(__E1000_DOWN, &adapter->flags);
 
-#ifdef CONFIG_E1000_NAPI
-	netif_poll_enable(netdev);
-#endif
-
 	e1000_irq_enable(adapter);
 
 	/* fire a link status change interrupt to start the watchdog */

I have tested the patch and it does fix the panic.

I believe I hit this also on my e1000 that came with this T42 laptop. Panic on link up. Noticed in 2.6.22-rc1.

Yes, I can also confirm the patch below fixes this issue.
Linux segfault 2.6.22-rc2 #1 Tue May 22 03:34:35 EDT 2007 i686 GNU/Linux

I have tested the latest git tree which includes this patch and it resolves the issue.
thanks,
- Doug
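
For anyone reading this later, here is a rough userspace sketch of the interleaving Herbert Xu describes. It is not kernel code: every `model_` name is hypothetical, and the single flag stands in for `__LINK_STATE_RX_SCHED`. It assumes that the 2.6.21-era `netif_poll_enable()` simply clears that bit and that `netif_rx_complete()` hits `BUG_ON` when the bit is already clear. The exact timing on the affected ia64 machines is not established in this report, so treat this as one plausible ordering only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: this flag stands in for __LINK_STATE_RX_SCHED. */
enum { MODEL_RX_SCHED = 1u << 0 };

struct model_dev {
    unsigned int state;
};

/* Models scheduling a poll: claim the device by setting the flag.
 * Returns true if this call newly claimed it. */
bool model_rx_schedule(struct model_dev *dev)
{
    bool was_set = (dev->state & MODEL_RX_SCHED) != 0;
    dev->state |= MODEL_RX_SCHED;
    return !was_set;
}

/* Models netif_poll_enable() under the stated assumption:
 * it clears the flag without checking who set it. */
void model_poll_enable(struct model_dev *dev)
{
    dev->state &= ~MODEL_RX_SCHED;
}

/* Models netif_rx_complete(): returns false where the kernel's
 * BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, ...)) would fire. */
bool model_rx_complete(struct model_dev *dev)
{
    if (!(dev->state & MODEL_RX_SCHED))
        return false;
    dev->state &= ~MODEL_RX_SCHED;
    return true;
}

/* Open path with the stray enable: a poll is already scheduled when
 * the open path clears the flag it never took. */
bool model_open_with_enable(void)
{
    struct model_dev dev = { 0 };
    model_rx_schedule(&dev);        /* poll gets scheduled */
    model_poll_enable(&dev);        /* open path clears the flag */
    return model_rx_complete(&dev); /* poll finishes: flag is gone */
}

/* Open path after the patch: nothing clears the flag behind the poll. */
bool model_open_without_enable(void)
{
    struct model_dev dev = { 0 };
    model_rx_schedule(&dev);
    return model_rx_complete(&dev);
}

/* Runs both orderings and checks the outcomes of this model. */
void model_selfcheck(void)
{
    assert(!model_open_with_enable());
    assert(model_open_without_enable());
}
```

Calling `model_selfcheck()` from a test harness exercises both orderings. In this model, the unpaired enable is the only difference between the failing and the passing sequence, which matches the reasoning behind removing it from `e1000_open()`.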