Bug 18492

Summary: kernel softirq warning on boot
Product: Drivers Reporter: Michal Suchanek (hramrach)
Component: NetworkAssignee: drivers_network (drivers_network)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, florian, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.36-rc4 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16444    

Description Michal Suchanek 2010-09-14 15:19:25 UTC
In the 2.6.36 rc kernels I noticed a warning on every boot.

It may be related to upgrading the cpu to dual-core rather than upgrading the kernel.

 [   14.332011] ------------[ cut here ]------------
 [   14.332021] WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x41/0x8b()
 [   14.332024] Hardware name: System Product Name
 [   14.332026] Modules linked in: bridge stp llc reiserfs fuse coretemp hwmon_vid radeonfb fb_d
dc loop firewire_sbp2 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_mid
i snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device asus_atk0110 snd wacom processor soundcore button psmouse lm6
3 snd_page_alloc led_class parport_pc i2c_i801 shpchp parport serio_raw joydev evdev tpm_tis pci_hotplug tpm tpm_bios rng_core 
ext3 jbd mbcache usbhid hid sg sr_mod cdrom sd_mod crc_t10dif radeon pata_acpi ttm ata_generic drm_kms_helper ata_piix ehci_hcd
 uhci_hcd drm i2c_algo_bit i2c_core libata atl1 thermal thermal_sys firewire_ohci usbcore mii firewire_core crc_itu_t scsi_mod 
nls_base [last unloaded: scsi_wait_scan]
 [   14.332101] Pid: 0, comm: swapper Not tainted 2.6.36-rc4-r600fence-amd64 #1
 [   14.332104] Call Trace:
 [   14.332106]  <IRQ>  [<ffffffff810442f8>] ? warn_slowpath_common+0x78/0x8c
 [   14.332116]  [<ffffffff8104985b>] ? _local_bh_enable_ip+0x41/0x8b
 [   14.332122]  [<ffffffff812515e5>] ? netpoll_rx+0x6d/0x79
 [   14.332126]  [<ffffffff81253721>] ? netif_rx+0xd/0x7b
 [   14.332130]  [<ffffffff810403a6>] ? rebalance_domains+0xd1/0x13b
 [   14.332135]  [<ffffffff810aaf19>] ? perf_event_task_tick+0x6a/0x186
 [   14.332146]  [<ffffffffa00c2994>] ? atl1_intr+0x8d9/0x9a5 [atl1]
 [   14.332151]  [<ffffffff8108d68b>] ? handle_IRQ_event+0x4c/0x104
 [   14.332156]  [<ffffffff8100e6a6>] ? read_tsc+0x5/0x16
 [   14.332160]  [<ffffffff8108f2fb>] ? handle_edge_irq+0xd9/0x120
 [   14.332164]  [<ffffffff8100af68>] ? handle_irq+0x17/0x1f
 [   14.332168]  [<ffffffff8100a62a>] ? do_IRQ+0x54/0xb9
 [   14.332173]  [<ffffffff81305093>] ? ret_from_intr+0x0/0x11
 [   14.332175]  <EOI>  [<ffffffff8100fb50>] ? mwait_idle+0x75/0x80
 [   14.332182]  [<ffffffff8100fb02>] ? mwait_idle+0x27/0x80
 [   14.332187]  [<ffffffff81007b64>] ? cpu_idle+0xaa/0x120
 [   14.332192]  [<ffffffff816bad4f>] ? start_kernel+0x3ce/0x3d9
 [   14.332197]  [<ffffffff816ba3ba>] ? x86_64_start_kernel+0xf9/0x106
 [   14.332200] ---[ end trace 76feca083f1d81b1 ]---
Comment 1 Andrew Morton 2010-09-14 19:08:08 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 14 Sep 2010 15:19:30 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=18492
> 
>            Summary: kernel softirq warning on boot
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.36-rc4

Looks like a post-2.6.35 regression.  I'd be looking at the
rcu_read_unlock_bh() in netpoll_rx(), added by "netpoll: Fix RCU
usage".


>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: hramrach@centrum.cz
>         Regression: No
> 
> 
> In the 2.6.36 rc kernels I noticed a warning on every boot.
> 
> It may be related to upgrading the cpu to dual-core rather than upgrading the
> kernel.
> 
>  [   14.332011] ------------[ cut here ]------------
>  [   14.332021] WARNING: at kernel/softirq.c:143
> _local_bh_enable_ip+0x41/0x8b()
>  [   14.332024] Hardware name: System Product Name
>  [   14.332026] Modules linked in: bridge stp llc reiserfs fuse coretemp
> hwmon_vid radeonfb fb_d
> dc loop firewire_sbp2 snd_hda_codec_realtek snd_hda_intel snd_hda_codec
> snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_mid
> i snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device
> asus_atk0110
> snd wacom processor soundcore button psmouse lm6
> 3 snd_page_alloc led_class parport_pc i2c_i801 shpchp parport serio_raw
> joydev
> evdev tpm_tis pci_hotplug tpm tpm_bios rng_core 
> ext3 jbd mbcache usbhid hid sg sr_mod cdrom sd_mod crc_t10dif radeon
> pata_acpi
> ttm ata_generic drm_kms_helper ata_piix ehci_hcd
>  uhci_hcd drm i2c_algo_bit i2c_core libata atl1 thermal thermal_sys
> firewire_ohci usbcore mii firewire_core crc_itu_t scsi_mod 
> nls_base [last unloaded: scsi_wait_scan]
>  [   14.332101] Pid: 0, comm: swapper Not tainted 2.6.36-rc4-r600fence-amd64
>  #1
>  [   14.332104] Call Trace:
>  [   14.332106]  <IRQ>  [<ffffffff810442f8>] ? warn_slowpath_common+0x78/0x8c
>  [   14.332116]  [<ffffffff8104985b>] ? _local_bh_enable_ip+0x41/0x8b
>  [   14.332122]  [<ffffffff812515e5>] ? netpoll_rx+0x6d/0x79
>  [   14.332126]  [<ffffffff81253721>] ? netif_rx+0xd/0x7b
>  [   14.332130]  [<ffffffff810403a6>] ? rebalance_domains+0xd1/0x13b
>  [   14.332135]  [<ffffffff810aaf19>] ? perf_event_task_tick+0x6a/0x186
>  [   14.332146]  [<ffffffffa00c2994>] ? atl1_intr+0x8d9/0x9a5 [atl1]
>  [   14.332151]  [<ffffffff8108d68b>] ? handle_IRQ_event+0x4c/0x104
>  [   14.332156]  [<ffffffff8100e6a6>] ? read_tsc+0x5/0x16
>  [   14.332160]  [<ffffffff8108f2fb>] ? handle_edge_irq+0xd9/0x120
>  [   14.332164]  [<ffffffff8100af68>] ? handle_irq+0x17/0x1f
>  [   14.332168]  [<ffffffff8100a62a>] ? do_IRQ+0x54/0xb9
>  [   14.332173]  [<ffffffff81305093>] ? ret_from_intr+0x0/0x11
>  [   14.332175]  <EOI>  [<ffffffff8100fb50>] ? mwait_idle+0x75/0x80
>  [   14.332182]  [<ffffffff8100fb02>] ? mwait_idle+0x27/0x80
>  [   14.332187]  [<ffffffff81007b64>] ? cpu_idle+0xaa/0x120
>  [   14.332192]  [<ffffffff816bad4f>] ? start_kernel+0x3ce/0x3d9
>  [   14.332197]  [<ffffffff816ba3ba>] ? x86_64_start_kernel+0xf9/0x106
>  [   14.332200] ---[ end trace 76feca083f1d81b1 ]---
>
Comment 2 Michal Suchanek 2010-09-14 20:46:10 UTC
On 14 September 2010 21:07, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Tue, 14 Sep 2010 15:19:30 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=18492
>>
>>            Summary: kernel softirq warning on boot
>>            Product: Drivers
>>            Version: 2.5
>>     Kernel Version: 2.6.36-rc4
>
> Looks like a post-2.6.35 regression.  I'd be looking at the

Indeed, I found a log from booting 2.6.35 kernel while trying to work
around anouther issue and the warning did not show in it.

> rcu_read_unlock_bh() in netpoll_rx(), added by "netpoll: Fix RCU
> usage".

Do you mean de85d99eb7b595f6751550184b94c1e2f74a828b?

I am not sure reverting just that would work, nor that the kernel
around that point is in shape for trying on a system I would want to
still use. It's way back in history for a kernel which is just to be
released.

Thanks

Michal
Comment 3 Andrew Morton 2010-09-14 20:58:30 UTC
On Tue, 14 Sep 2010 22:45:19 +0200
Michal Suchanek <hramrach@centrum.cz> wrote:

> > rcu_read_unlock_bh() in netpoll_rx(), added by "netpoll: Fix RCU
> > usage".
> 
> Do you mean de85d99eb7b595f6751550184b94c1e2f74a828b?

yup.

> I am not sure reverting just that would work, nor that the kernel
> around that point is in shape for trying on a system I would want to
> still use. It's way back in history for a kernel which is just to be
> released.

No, we wouldn't want to revert it - it fixes stuff.

It could be that we just don't need the _bh locking, if local irqs are
reliably disabled.  But I didn't really look.
Comment 4 David S. Miller 2010-09-14 21:11:13 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 14 Sep 2010 13:57:30 -0700

> On Tue, 14 Sep 2010 22:45:19 +0200
> Michal Suchanek <hramrach@centrum.cz> wrote:
> 
>> > rcu_read_unlock_bh() in netpoll_rx(), added by "netpoll: Fix RCU
>> > usage".
>> 
>> Do you mean de85d99eb7b595f6751550184b94c1e2f74a828b?
> 
> yup.
> 
>> I am not sure reverting just that would work, nor that the kernel
>> around that point is in shape for trying on a system I would want to
>> still use. It's way back in history for a kernel which is just to be
>> released.
> 
> No, we wouldn't want to revert it - it fixes stuff.
> 
> It could be that we just don't need the _bh locking, if local irqs are
> reliably disabled.  But I didn't really look.

I'll take a closer look at this if Herbert doesn't beat me to it.
Comment 5 Anonymous Emailer 2010-09-15 07:26:13 UTC
Reply-To: herbert@gondor.hengli.com.au

On Tue, Sep 14, 2010 at 02:11:03PM -0700, David Miller wrote:
>
> I'll take a closer look at this if Herbert doesn't beat me to it.

I thought we were going to use this patch until Paul's BH warning
removal made it into mainline?

netpoll: Disable IRQ around RCU dereference in netpoll_rx

We cannot use rcu_dereference_bh safely in netpoll_rx as we may
be called with IRQs disabled.  We could however simply disable
IRQs as that too causes BH to be disabled and is safe in either
case.

Thanks to John Linville for discovering this bug and providing
a patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 413742c..54f286b 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -63,20 +63,20 @@ static inline bool netpoll_rx(struct sk_buff *skb)
 	unsigned long flags;
 	bool ret = false;
 
-	rcu_read_lock_bh();
+	local_irq_save(flags);
 	npinfo = rcu_dereference_bh(skb->dev->npinfo);
 
 	if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
 		goto out;
 
-	spin_lock_irqsave(&npinfo->rx_lock, flags);
+	spin_lock(&npinfo->rx_lock);
 	/* check rx_flags again with the lock held */
 	if (npinfo->rx_flags && __netpoll_rx(skb))
 		ret = true;
-	spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+	spin_unlock(&npinfo->rx_lock);
 
 out:
-	rcu_read_unlock_bh();
+	local_irq_restore(flags);
 	return ret;
 }
 
Thanks,
Comment 6 David S. Miller 2010-09-16 04:41:34 UTC
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 15 Sep 2010 14:46:55 +0800

> On Tue, Sep 14, 2010 at 02:11:03PM -0700, David Miller wrote:
>>
>> I'll take a closer look at this if Herbert doesn't beat me to it.
> 
> I thought we were going to use this patch until Paul's BH warning
> removal made it into mainline?
> 
> netpoll: Disable IRQ around RCU dereference in netpoll_rx

I hadn't anticipated it taking this long for Paul's patch to
get upstream.

So I only queued this up for -stable, not Linus's tree.

Paul has started the work of moving his patch upstream, so I suspect
that if I queued this patch of our's up to my tree now it would land
in Linus's tree about the same time :-)

If something non-trivially stalls Paul's submission work, we can
queue this up.
Comment 7 Florian Mickler 2010-09-16 08:24:52 UTC
Patch: "netpoll: Disable IRQ around RCU dereference in netpoll_rx"
Comment 8 Michal Suchanek 2010-09-17 18:23:38 UTC
On 15 September 2010 08:46, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Sep 14, 2010 at 02:11:03PM -0700, David Miller wrote:
>>
>> I'll take a closer look at this if Herbert doesn't beat me to it.
>
> I thought we were going to use this patch until Paul's BH warning
> removal made it into mainline?
>
> netpoll: Disable IRQ around RCU dereference in netpoll_rx
>
> We cannot use rcu_dereference_bh safely in netpoll_rx as we may
> be called with IRQs disabled.  We could however simply disable
> IRQs as that too causes BH to be disabled and is safe in either
> case.
>
> Thanks to John Linville for discovering this bug and providing
> a patch.
>
With the patch applied I no longet get the warning.

Thanks

Michal
Comment 9 David S. Miller 2010-09-17 23:55:54 UTC
From: Michal Suchanek <hramrach@centrum.cz>
Date: Fri, 17 Sep 2010 20:00:42 +0200

> On 15 September 2010 08:46, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> On Tue, Sep 14, 2010 at 02:11:03PM -0700, David Miller wrote:
>>>
>>> I'll take a closer look at this if Herbert doesn't beat me to it.
>>
>> I thought we were going to use this patch until Paul's BH warning
>> removal made it into mainline?
>>
>> netpoll: Disable IRQ around RCU dereference in netpoll_rx
>>
>> We cannot use rcu_dereference_bh safely in netpoll_rx as we may
>> be called with IRQs disabled.  We could however simply disable
>> IRQs as that too causes BH to be disabled and is safe in either
>> case.
>>
>> Thanks to John Linville for discovering this bug and providing
>> a patch.
>>
> With the patch applied I no longet get the warning.

Ok I'll toss this upstream since I have no indication that Paul's
RCU change will hit Linus's tree any time soon.

Thanks.
Comment 10 Rafael J. Wysocki 2010-09-19 20:17:42 UTC
Patch : https://bugzilla.kernel.org/show_bug.cgi?id=18492#c5
Handled-By : Herbert Xu <herbert@gondor.apana.org.au>
Comment 11 Florian Mickler 2010-09-20 11:24:46 UTC
*** Bug 17321 has been marked as a duplicate of this bug. ***
Comment 12 Rafael J. Wysocki 2010-09-20 16:17:58 UTC
Fixed by commit f0f9deae9e7c421fa0c1c627beb8e174325e1ba7 .
Comment 13 Rafael J. Wysocki 2010-09-20 18:55:57 UTC
*** Bug 17131 has been marked as a duplicate of this bug. ***