Bug 18492
Summary: | kernel softirq warning on boot | ||
---|---|---|---|
Product: | Drivers | Reporter: | Michal Suchanek (hramrach) |
Component: | Network | Assignee: | drivers_network (drivers_network) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | akpm, florian, maciej.rutecki, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.36-rc4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 16444 |
Description
Michal Suchanek
2010-09-14 15:19:25 UTC
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 14 Sep 2010 15:19:30 GMT bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=18492 > > Summary: kernel softirq warning on boot > Product: Drivers > Version: 2.5 > Kernel Version: 2.6.36-rc4 Looks like a post-2.6.35 regression. I'd be looking at the rcu_read_unlock_bh() in netpoll_rx(), added by "netpoll: Fix RCU usage". > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Network > AssignedTo: drivers_network@kernel-bugs.osdl.org > ReportedBy: hramrach@centrum.cz > Regression: No > > > In the 2.6.36 rc kernels I noticed a warning on every boot. > > It may be related to upgrading the cpu to dual-core rather than upgrading the > kernel. > > [ 14.332011] ------------[ cut here ]------------ > [ 14.332021] WARNING: at kernel/softirq.c:143 > _local_bh_enable_ip+0x41/0x8b() > [ 14.332024] Hardware name: System Product Name > [ 14.332026] Modules linked in: bridge stp llc reiserfs fuse coretemp > hwmon_vid radeonfb fb_d > dc loop firewire_sbp2 snd_hda_codec_realtek snd_hda_intel snd_hda_codec > snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_mid > i snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device > asus_atk0110 > snd wacom processor soundcore button psmouse lm6 > 3 snd_page_alloc led_class parport_pc i2c_i801 shpchp parport serio_raw > joydev > evdev tpm_tis pci_hotplug tpm tpm_bios rng_core > ext3 jbd mbcache usbhid hid sg sr_mod cdrom sd_mod crc_t10dif radeon > pata_acpi > ttm ata_generic drm_kms_helper ata_piix ehci_hcd > uhci_hcd drm i2c_algo_bit i2c_core libata atl1 thermal thermal_sys > firewire_ohci usbcore mii firewire_core crc_itu_t scsi_mod > nls_base [last unloaded: scsi_wait_scan] > [ 14.332101] Pid: 0, comm: swapper Not tainted 2.6.36-rc4-r600fence-amd64 > #1 > [ 14.332104] Call Trace: > [ 14.332106] <IRQ> [<ffffffff810442f8>] ? warn_slowpath_common+0x78/0x8c > [ 14.332116] [<ffffffff8104985b>] ? _local_bh_enable_ip+0x41/0x8b > [ 14.332122] [<ffffffff812515e5>] ? netpoll_rx+0x6d/0x79 > [ 14.332126] [<ffffffff81253721>] ? netif_rx+0xd/0x7b > [ 14.332130] [<ffffffff810403a6>] ? rebalance_domains+0xd1/0x13b > [ 14.332135] [<ffffffff810aaf19>] ? perf_event_task_tick+0x6a/0x186 > [ 14.332146] [<ffffffffa00c2994>] ? atl1_intr+0x8d9/0x9a5 [atl1] > [ 14.332151] [<ffffffff8108d68b>] ? handle_IRQ_event+0x4c/0x104 > [ 14.332156] [<ffffffff8100e6a6>] ? read_tsc+0x5/0x16 > [ 14.332160] [<ffffffff8108f2fb>] ? handle_edge_irq+0xd9/0x120 > [ 14.332164] [<ffffffff8100af68>] ? handle_irq+0x17/0x1f > [ 14.332168] [<ffffffff8100a62a>] ? do_IRQ+0x54/0xb9 > [ 14.332173] [<ffffffff81305093>] ? ret_from_intr+0x0/0x11 > [ 14.332175] <EOI> [<ffffffff8100fb50>] ? mwait_idle+0x75/0x80 > [ 14.332182] [<ffffffff8100fb02>] ? mwait_idle+0x27/0x80 > [ 14.332187] [<ffffffff81007b64>] ? cpu_idle+0xaa/0x120 > [ 14.332192] [<ffffffff816bad4f>] ? start_kernel+0x3ce/0x3d9 > [ 14.332197] [<ffffffff816ba3ba>] ? x86_64_start_kernel+0xf9/0x106 > [ 14.332200] ---[ end trace 76feca083f1d81b1 ]--- > On 14 September 2010 21:07, Andrew Morton <akpm@linux-foundation.org> wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 14 Sep 2010 15:19:30 GMT > bugzilla-daemon@bugzilla.kernel.org wrote: > >> https://bugzilla.kernel.org/show_bug.cgi?id=18492 >> >> Summary: kernel softirq warning on boot >> Product: Drivers >> Version: 2.5 >> Kernel Version: 2.6.36-rc4 > > Looks like a post-2.6.35 regression. I'd be looking at the Indeed, I found a log from booting 2.6.35 kernel while trying to work around anouther issue and the warning did not show in it. > rcu_read_unlock_bh() in netpoll_rx(), added by "netpoll: Fix RCU > usage". Do you mean de85d99eb7b595f6751550184b94c1e2f74a828b? I am not sure reverting just that would work, nor that the kernel around that point is in shape for trying on a system I would want to still use. It's way back in history for a kernel which is just to be released. Thanks Michal On Tue, 14 Sep 2010 22:45:19 +0200 Michal Suchanek <hramrach@centrum.cz> wrote: > > rcu_read_unlock_bh() in netpoll_rx(), added by "netpoll: Fix RCU > > usage". > > Do you mean de85d99eb7b595f6751550184b94c1e2f74a828b? yup. > I am not sure reverting just that would work, nor that the kernel > around that point is in shape for trying on a system I would want to > still use. It's way back in history for a kernel which is just to be > released. No, we wouldn't want to revert it - it fixes stuff. It could be that we just don't need the _bh locking, if local irqs are reliably disabled. But I didn't really look. From: Andrew Morton <akpm@linux-foundation.org> Date: Tue, 14 Sep 2010 13:57:30 -0700 > On Tue, 14 Sep 2010 22:45:19 +0200 > Michal Suchanek <hramrach@centrum.cz> wrote: > >> > rcu_read_unlock_bh() in netpoll_rx(), added by "netpoll: Fix RCU >> > usage". >> >> Do you mean de85d99eb7b595f6751550184b94c1e2f74a828b? > > yup. > >> I am not sure reverting just that would work, nor that the kernel >> around that point is in shape for trying on a system I would want to >> still use. It's way back in history for a kernel which is just to be >> released. > > No, we wouldn't want to revert it - it fixes stuff. > > It could be that we just don't need the _bh locking, if local irqs are > reliably disabled. But I didn't really look. I'll take a closer look at this if Herbert doesn't beat me to it. Reply-To: herbert@gondor.hengli.com.au On Tue, Sep 14, 2010 at 02:11:03PM -0700, David Miller wrote: > > I'll take a closer look at this if Herbert doesn't beat me to it. I thought we were going to use this patch until Paul's BH warning removal made it into mainline? netpoll: Disable IRQ around RCU dereference in netpoll_rx We cannot use rcu_dereference_bh safely in netpoll_rx as we may be called with IRQs disabled. We could however simply disable IRQs as that too causes BH to be disabled and is safe in either case. Thanks to John Linville for discovering this bug and providing a patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h index 413742c..54f286b 100644 --- a/include/linux/netpoll.h +++ b/include/linux/netpoll.h @@ -63,20 +63,20 @@ static inline bool netpoll_rx(struct sk_buff *skb) unsigned long flags; bool ret = false; - rcu_read_lock_bh(); + local_irq_save(flags); npinfo = rcu_dereference_bh(skb->dev->npinfo); if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags)) goto out; - spin_lock_irqsave(&npinfo->rx_lock, flags); + spin_lock(&npinfo->rx_lock); /* check rx_flags again with the lock held */ if (npinfo->rx_flags && __netpoll_rx(skb)) ret = true; - spin_unlock_irqrestore(&npinfo->rx_lock, flags); + spin_unlock(&npinfo->rx_lock); out: - rcu_read_unlock_bh(); + local_irq_restore(flags); return ret; } Thanks, From: Herbert Xu <herbert@gondor.apana.org.au> Date: Wed, 15 Sep 2010 14:46:55 +0800 > On Tue, Sep 14, 2010 at 02:11:03PM -0700, David Miller wrote: >> >> I'll take a closer look at this if Herbert doesn't beat me to it. > > I thought we were going to use this patch until Paul's BH warning > removal made it into mainline? > > netpoll: Disable IRQ around RCU dereference in netpoll_rx I hadn't anticipated it taking this long for Paul's patch to get upstream. So I only queued this up for -stable, not Linus's tree. Paul has started the work of moving his patch upstream, so I suspect that if I queued this patch of our's up to my tree now it would land in Linus's tree about the same time :-) If something non-trivially stalls Paul's submission work, we can queue this up. Patch: "netpoll: Disable IRQ around RCU dereference in netpoll_rx" On 15 September 2010 08:46, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Tue, Sep 14, 2010 at 02:11:03PM -0700, David Miller wrote: >> >> I'll take a closer look at this if Herbert doesn't beat me to it. > > I thought we were going to use this patch until Paul's BH warning > removal made it into mainline? > > netpoll: Disable IRQ around RCU dereference in netpoll_rx > > We cannot use rcu_dereference_bh safely in netpoll_rx as we may > be called with IRQs disabled. We could however simply disable > IRQs as that too causes BH to be disabled and is safe in either > case. > > Thanks to John Linville for discovering this bug and providing > a patch. > With the patch applied I no longet get the warning. Thanks Michal From: Michal Suchanek <hramrach@centrum.cz> Date: Fri, 17 Sep 2010 20:00:42 +0200 > On 15 September 2010 08:46, Herbert Xu <herbert@gondor.apana.org.au> wrote: >> On Tue, Sep 14, 2010 at 02:11:03PM -0700, David Miller wrote: >>> >>> I'll take a closer look at this if Herbert doesn't beat me to it. >> >> I thought we were going to use this patch until Paul's BH warning >> removal made it into mainline? >> >> netpoll: Disable IRQ around RCU dereference in netpoll_rx >> >> We cannot use rcu_dereference_bh safely in netpoll_rx as we may >> be called with IRQs disabled. We could however simply disable >> IRQs as that too causes BH to be disabled and is safe in either >> case. >> >> Thanks to John Linville for discovering this bug and providing >> a patch. >> > With the patch applied I no longet get the warning. Ok I'll toss this upstream since I have no indication that Paul's RCU change will hit Linus's tree any time soon. Thanks. Patch : https://bugzilla.kernel.org/show_bug.cgi?id=18492#c5 Handled-By : Herbert Xu <herbert@gondor.apana.org.au> *** Bug 17321 has been marked as a duplicate of this bug. *** Fixed by commit f0f9deae9e7c421fa0c1c627beb8e174325e1ba7 . *** Bug 17131 has been marked as a duplicate of this bug. *** |