Bug 6568
Summary: | forcedeth broken by commit ebf34c9b6fcd22338ef764b039b3ac55ed0e297b | ||
---|---|---|---|
Product: | Drivers | Reporter: | Alistair Strachan (alistair) |
Component: | Network | Assignee: | Ayaz Abdulla (aabdulla) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | akpm, jgarzik, manfred |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.17-rc4 | Subsystem: | |
Regression: | --- | Bisected commit-id: |
Description
Alistair Strachan
2006-05-16 15:53:01 UTC
Are you able to send a copy of the oops trace? Reply-To: s0348365@sms.ed.ac.uk On Thursday 18 May 2006 23:08, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=6568 > > akpm@osdl.org changed: > > What |Removed |Added > --------------------------------------------------------------------------- >- CC| |akpm@osdl.org > > > > ------- Additional Comments From akpm@osdl.org 2006-05-18 15:06 ------- > Are you able to send a copy of the oops trace? > > ------- You are receiving this mail because: ------- > You reported the bug, or are watching the reporter. No serial console. Camera phone shots available from http://devzero.co.uk/~alistair/oops-20060512/ We haven't heard from Ayaz yet? I'm dithering over whether to revert this for 2.6.17. If the crash only occurs a) during heavy traffic and b) during machine shutdown then I guess we can live with it. We need to decide whether this problem is worse than the problem which the buggy patch fixed. Problem is the changelog for that patch is so wholly lame that I am unable to determine that. It's my opinion that it should be reverted. It's too late in the day to be letting crash bugs through. It's also going to affect a lot of people; forcedeth is an end-user, desktop chipset. These people (myself included) actually switch off their machines at night. However, according to private mail from Ayaz, I got the feeling that it might be limited to SMP... We have tried to repro here and are having no such luck. I will continue to root cause it. However, this fix is suppose to fix other crashes, so I don't think it should be reverted but rather root caused and fixed. Alistair, can I give you patches to try out since I can not repro? Just post them and I'll try them. We can't take one step forwards, two back, for 2.6.17. For what it's worth, if I revert the spin_lock -> spin_lock_irq changes (randomly scattered in this file, are they actually correct?) the problem disappears. This might just be coincidental, but it did seem like a fairly arbitrarily change, inconsistent with similar functions in this file.. Ideas? bugme-daemon@bugzilla.kernel.org wrote: >For what it's worth, if I revert the spin_lock -> spin_lock_irq changes >(randomly scattered in this file, are they actually correct?) the problem >disappears. > >This might just be coincidental, but it did seem like a fairly arbitrarily >change, inconsistent with similar functions in this file.. > >Ideas? > > > SMP or UP system? Background of the change: With MSI-X, the driver uses three independant interrupts: One for tx, one for rx, one for the remaining irq sources. Previously, there was only one interrupt - thus spin_lock() was sufficient for functions that are only called from within the interrupt handler. With three interrupts, this is now wrong, thus the spin_lock() calls were replaced by spin_lock_irq(). -- Manfred This is an SMP machine, as indicated in the original bug report and my dmesg. It also, for whatever reason, does not use MSI/MSI-X interrupts even though dmesg indicates that certain devices are capable of it and I have CONFIG_PCI_MSI set (x86-64 kernel). bugme-daemon@bugzilla.kernel.org wrote: > > thus the spin_lock() > calls were replaced by spin_lock_irq(). In that case I'd be looking for a spin_unlock_irq() occurring when interrupts are already disabled. So if the code originally did this: local_irq_disable(); ... spin_lock(); .. spin_unlock(); ... local_irq_enable(); and we went and changed it to this: local_irq_disable(); ... spin_lock_irq(); .. spin_unlock_irq(); ... local_irq_enable(); then we've added a bug, by enabling interrupts too early. I bet changing all those spin_lock_irq()s to spin_lock_irqsave() and spin_unlcok_irqrestore() will fix it. bugme-daemon@bugzilla.kernel.org wrote: >In that case I'd be looking for a spin_unlock_irq() occurring when >interrupts are already disabled. > > > I'm checking that right now: I added BUG_ON(irqs_disable()) in front of every spin_lock_irq(). No hits so far (ifup/ifdown, simple network traffic). -- Manfred Manfred, if we think this is the problem and you can't reproduce it, please send me a diff with your changes and I'll try it here. Attached is the patch. I'm not sure if this is the problem, it could be a true race, and the _irq just causes a sufficient timing change. On my own system, I'm now running into problems, too: slab debugging is enabled, and I get crashes in nv_nic_irq(): struct net_device was already freed, that causes a crash when accessing the io section of the device (i.e. dev->base is 0x5a5a5a5a. Or was it 0x6b6b6b6b?) -- MAnfred --- 2.6/drivers/net/forcedeth.c 2006-05-20 11:49:37.000000000 +0200 +++ build-2.6/drivers/net/forcedeth.c 2006-05-20 13:20:01.000000000 +0200 @@ -1095,6 +1095,7 @@ disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector); } if (nv_alloc_rx(dev)) { + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); if (!np->in_shutdown) mod_timer(&np->oom_kick, jiffies + OOM_REFILL); @@ -1236,6 +1237,7 @@ ((skb_shinfo(skb)->frags[i].size & (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0); } + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); if ((np->next_tx - np->nic_tx + entries - 1) > TX_LIMIT_STOP) { @@ -1469,6 +1471,7 @@ } } + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); /* 1) stop tx engine */ @@ -1804,6 +1807,7 @@ if (netif_running(dev)) { spin_lock_bh(&dev->xmit_lock); + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); /* stop rx engine */ @@ -1873,6 +1877,7 @@ } addr[0] |= NVREG_MCASTADDRA_FORCE; pff |= NVREG_PFF_ALWAYS; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); nv_stop_rx(dev); writel(addr[0], base + NvRegMulticastAddrA); @@ -2160,6 +2165,7 @@ if (!(events & np->irqmask)) break; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); nv_tx_done(dev); spin_unlock_irq(&np->lock); @@ -2169,6 +2175,7 @@ dev->name, events); } if (i > max_interrupt_work) { + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); /* disable interrupts on the nic */ writel(NVREG_IRQ_TX_ALL, base + NvRegIrqMask); @@ -2209,6 +2216,7 @@ nv_rx_process(dev); if (nv_alloc_rx(dev)) { + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); if (!np->in_shutdown) mod_timer(&np->oom_kick, jiffies + OOM_REFILL); @@ -2216,6 +2224,7 @@ } if (i > max_interrupt_work) { + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); /* disable interrupts on the nic */ writel(NVREG_IRQ_RX_ALL, base + NvRegIrqMask); @@ -2255,11 +2264,13 @@ break; if (events & NVREG_IRQ_LINK) { + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); nv_link_irq(dev); spin_unlock_irq(&np->lock); } if (np->need_linktimer && time_after(jiffies, np->link_timeout)) { + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); nv_linkchange(dev); spin_unlock_irq(&np->lock); @@ -2270,6 +2281,7 @@ dev->name, events); } if (i > max_interrupt_work) { + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); /* disable interrupts on the nic */ writel(NVREG_IRQ_OTHER, base + NvRegIrqMask); @@ -2372,6 +2384,7 @@ struct fe_priv *np = netdev_priv(dev); wolinfo->supported = WAKE_MAGIC; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); if (np->wolenabled) wolinfo->wolopts = WAKE_MAGIC; @@ -2383,6 +2396,7 @@ struct fe_priv *np = netdev_priv(dev); u8 __iomem *base = get_hwbase(dev); + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); if (wolinfo->wolopts == 0) { writel(0, base + NvRegWakeUpFlags); @@ -2401,6 +2415,7 @@ struct fe_priv *np = netdev_priv(dev); int adv; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); ecmd->port = PORT_MII; if (!netif_running(dev)) { @@ -2497,6 +2512,7 @@ return -EINVAL; } + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); if (ecmd->autoneg == AUTONEG_ENABLE) { int adv, bmcr; @@ -2587,6 +2603,7 @@ int i; regs->version = FORCEDETH_REGS_VER; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); for (i = 0;i <= np->register_size/sizeof(u32); i++) rbuf[i] = readl(base + i*sizeof(u32)); @@ -2598,6 +2615,7 @@ struct fe_priv *np = netdev_priv(dev); int ret; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); if (np->autoneg) { int bmcr; @@ -2632,6 +2650,7 @@ { struct fe_priv *np = get_nvpriv(dev); + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); /* save vlan group */ @@ -2966,6 +2985,7 @@ /* ask for interrupts */ nv_enable_hw_interrupts(dev, np->irqmask); + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); writel(NVREG_MCASTADDRA_FORCE, base + NvRegMulticastAddrA); writel(0, base + NvRegMulticastAddrB); @@ -3009,6 +3029,7 @@ struct fe_priv *np = netdev_priv(dev); u8 __iomem *base; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); np->in_shutdown = 1; spin_unlock_irq(&np->lock); @@ -3018,6 +3039,7 @@ del_timer_sync(&np->nic_poll); netif_stop_queue(dev); + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); nv_stop_tx(dev); nv_stop_rx(dev); @@ -3292,11 +3314,13 @@ int id1, id2; int phyaddr = i & 0x1F; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); id1 = mii_rw(dev, phyaddr, MII_PHYSID1, MII_READ); spin_unlock_irq(&np->lock); if (id1 < 0 || id1 == 0xffff) continue; + WARN_ON(irqs_disabled()); spin_lock_irq(&np->lock); id2 = mii_rw(dev, phyaddr, MII_PHYSID2, MII_READ); spin_unlock_irq(&np->lock); I believe it is a race. With your patch applied, I no longer see the oops, so I can only assume it affects the timing in some subtle way. The change in -rc3 might not be the root cause; it could be exposing some bug we've been papering over for a long time. Unfortunately it's doing it very reliably on my machine. ;-( Spoke too soon. I still see it, but less often. Now I have to load/unload the module a couple times to get it to happen. Shutting down isn't enough any more. bugme-daemon@bugzilla.kernel.org wrote: >http://bugzilla.kernel.org/show_bug.cgi?id=6568 > > > > > >------- Additional Comments From alistair@devzero.co.uk 2006-05-20 05:23 ------- >I believe it is a race. With your patch applied, I no longer see the oops, so I >can only assume it affects the timing in some subtle way. > >The change in -rc3 might not be the root cause; it could be exposing some bug >we've been papering over for a long time. Unfortunately it's doing it very >reliably on my machine. ;-( > > > I have found it, category extra stupid: The patch moves code from nv_open() into a new nv_request_irq() function. But the code also remains within nv_open(). Thus request_irq() is called twice. During nv_close(), only one instance is freed - thus one irq handler remains active, even if the driver is unloaded. If something causes an irq, then it crashes. Could you try the attached patch? It's untested - I'm rebooting now. -- Manfred --- 2.6/drivers/net/forcedeth.c 2006-05-20 11:49:37.000000000 +0200 +++ build-2.6/drivers/net/forcedeth.c 2006-05-20 14:58:15.000000000 +0200 @@ -2891,78 +2891,6 @@ goto out_drain; } - if (np->msi_flags & NV_MSI_X_CAPABLE) { - for (i = 0; i < (np->msi_flags & NV_MSI_X_VECTORS_MASK); i++) { - np->msi_x_entry[i].entry = i; - } - if ((ret = pci_enable_msix(np->pci_dev, np->msi_x_entry, (np->msi_flags & NV_MSI_X_VECTORS_MASK))) == 0) { - np->msi_flags |= NV_MSI_X_ENABLED; - if (optimization_mode == NV_OPTIMIZATION_MODE_THROUGHPUT) { - /* Request irq for rx handling */ - if (request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector, &nv_nic_irq_rx, SA_SHIRQ, dev->name, dev) != 0) { - printk(KERN_INFO "forcedeth: request_irq failed for rx %d\n", ret); - pci_disable_msix(np->pci_dev); - np->msi_flags &= ~NV_MSI_X_ENABLED; - goto out_drain; - } - /* Request irq for tx handling */ - if (request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector, &nv_nic_irq_tx, SA_SHIRQ, dev->name, dev) != 0) { - printk(KERN_INFO "forcedeth: request_irq failed for tx %d\n", ret); - pci_disable_msix(np->pci_dev); - np->msi_flags &= ~NV_MSI_X_ENABLED; - goto out_drain; - } - /* Request irq for link and timer handling */ - if (request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_OTHER].vector, &nv_nic_irq_other, SA_SHIRQ, dev->name, dev) != 0) { - printk(KERN_INFO "forcedeth: request_irq failed for link %d\n", ret); - pci_disable_msix(np->pci_dev); - np->msi_flags &= ~NV_MSI_X_ENABLED; - goto out_drain; - } - - /* map interrupts to their respective vector */ - writel(0, base + NvRegMSIXMap0); - writel(0, base + NvRegMSIXMap1); - set_msix_vector_map(dev, NV_MSI_X_VECTOR_RX, NVREG_IRQ_RX_ALL); - set_msix_vector_map(dev, NV_MSI_X_VECTOR_TX, NVREG_IRQ_TX_ALL); - set_msix_vector_map(dev, NV_MSI_X_VECTOR_OTHER, NVREG_IRQ_OTHER); - } else { - /* Request irq for all interrupts */ - if (request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector, &nv_nic_irq, SA_SHIRQ, dev->name, dev) != 0) { - printk(KERN_INFO "forcedeth: request_irq failed %d\n", ret); - pci_disable_msix(np->pci_dev); - np->msi_flags &= ~NV_MSI_X_ENABLED; - goto out_drain; - } - - /* map interrupts to vector 0 */ - writel(0, base + NvRegMSIXMap0); - writel(0, base + NvRegMSIXMap1); - } - } - } - if (ret != 0 && np->msi_flags & NV_MSI_CAPABLE) { - if ((ret = pci_enable_msi(np->pci_dev)) == 0) { - np->msi_flags |= NV_MSI_ENABLED; - if (request_irq(np->pci_dev->irq, &nv_nic_irq, SA_SHIRQ, dev->name, dev) != 0) { - printk(KERN_INFO "forcedeth: request_irq failed %d\n", ret); - pci_disable_msi(np->pci_dev); - np->msi_flags &= ~NV_MSI_ENABLED; - goto out_drain; - } - - /* map interrupts to vector 0 */ - writel(0, base + NvRegMSIMap0); - writel(0, base + NvRegMSIMap1); - /* enable msi vector 0 */ - writel(NVREG_MSI_VECTOR_0_ENABLED, base + NvRegMSIIrqMask); - } - } - if (ret != 0) { - if (request_irq(np->pci_dev->irq, &nv_nic_irq, SA_SHIRQ, dev->name, dev) != 0) - goto out_drain; - } - /* ask for interrupts */ nv_enable_hw_interrupts(dev, np->irqmask); Argh, I was looking for a nesting bug like this and could not find it from the changediff, but yes in context I see the problem! I've just rebooted the machine twice (successfully) with the patch; somebody should probably block Jeff's revert that just hit LKML. May I suggest better descriptions in future, so words like "refactor" make us think about why code isn't being deleted. Good catch Manfred! Thanks Alistair for your help. On a side note, I am trying to grasp why an interrupt would happen after nv_close and cause the "unfree'd" irq line to get called. We disable hardware interrupts so there is no reason for it to get called. Also, what I was originally thinking was the following happening in nv_close: netif_stop_queue(dev); spin_lock_irq(&np->lock); // This will not prevent interrupt on 2nd CPU nv_stop_tx(dev); nv_stop_rx(dev); nv_txrx_reset(dev); // We clear out the rings (thereby making all entries host owned). If interrupt comes right now, then ISR handler could think it needs to complete ring entries that are not owned by hardware and touch skbuff[] array with NULL pointer. /* disable interrupts on the nic or we will lock up */ base = get_hwbase(dev); nv_disable_hw_interrupts(dev, np->irqmask); pci_push(base); dprintk(KERN_INFO "%s: Irqmask is zero again\n", dev->name); spin_unlock_irq(&np->lock); nv_free_irq(dev); ... Instead we should do the following to be safe: netif_stop_queue(dev); nv_disable_hw_interrupts(dev, np->irqmask); pci_push(base); nv_disable_irq(dev); nv_stop_tx(dev); nv_stop_rx(dev); nv_txrx_reset(dev); nv_free_irq(dev); .... |