Bug 17321

Summary: i386 WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x2f/0x7e
Product: Platform Specific/Hardware Reporter: Maciej Rutecki (maciej.rutecki)
Component: i386Assignee: platform_i386
Severity: normal CC: aelschuring, florian, maciej.rutecki, paulmck, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.36-rc2 Tree: Mainline
Regression: Yes
Bug Depends on:    
Bug Blocks: 16444    
Attachments: config
full dmesg
lspci -vv output
first bisection attempt

Description Maciej Rutecki 2010-08-29 13:57:35 UTC
Subject    : i386 WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x2f/0x7e
Submitter  : Arno Schuring <aelschuring@hotmail.com>
Date       : 2010-08-27 20:04
Message-ID : 4C781A3A.4010707@hotmail.com
References : http://marc.info/?l=linux-kernel&m=128294076822387&w=2

This entry is being used for tracking a regression from 2.6.35. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Arno Schuring 2010-08-29 16:12:10 UTC
Created attachment 28321 [details]
Comment 2 Arno Schuring 2010-08-29 16:13:53 UTC
Created attachment 28331 [details]
full dmesg
Comment 3 Arno Schuring 2010-08-29 16:15:01 UTC
Created attachment 28341 [details]
lspci -vv output
Comment 4 Arno Schuring 2010-08-30 14:14:14 UTC
still present in -rc3. I notice that the full backtrace is not listed yet (it is in the attached dmesg though), so here it is in full:

[   12.675795] ------------[ cut here ]------------
[   12.675814] WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x2f/0x7e()
[   12.675818] Hardware name:       
[   12.675821] Modules linked in: xfs exportfs w83627hf_wdt w83627hf hwmon_vid d
m_crypt dm_mod rtc_cmos 8250_pnp processor i2c_sis96x 8250 rtc_core evdev pcspkr
 button usbhid usb_storage rtc_lib serial_core hid ext4 jbd2 crc16 ohci_hcd ehci
_hcd usbcore floppy thermal sr_mod thermal_sys nls_base cdrom [last unloaded: sc
[   12.675872] Pid: 0, comm: swapper Not tainted 2.6.36-rc3+ #20
[   12.675875] Call Trace:
[   12.675891]  [<c1024fc1>] ? warn_slowpath_common+0x6a/0x7b
[   12.675896]  [<c102975a>] ? _local_bh_enable_ip+0x2f/0x7e
[   12.675902]  [<c1024fdf>] ? warn_slowpath_null+0xd/0x10
[   12.675907]  [<c102975a>] ? _local_bh_enable_ip+0x2f/0x7e
[   12.675914]  [<c125c832>] ? netpoll_rx+0x71/0x77
[   12.675919]  [<c125c846>] ? netif_rx+0xe/0xf7
[   12.675928]  [<c122085b>] ? sis900_interrupt+0x1c7/0x4ab
[   12.675936]  [<c10580e3>] ? handle_IRQ_event+0x54/0x102
[   12.675943]  [<c1059981>] ? handle_fasteoi_irq+0x8c/0xdf
[   12.675948]  [<c10598f5>] ? handle_fasteoi_irq+0x0/0xdf
[   12.675951]  <IRQ>  [<c1003a3c>] ? do_IRQ+0x2b/0x7f
[   12.675961]  [<c1002ca9>] ? common_interrupt+0x29/0x30
[   12.675971]  [<c1007eba>] ? default_idle+0x39/0x4c
[   12.675975]  [<c1001b00>] ? cpu_idle+0x35/0xad
[   12.675983]  [<c14346c0>] ? start_kernel+0x274/0x277
[   12.675987] ---[ end trace c042fd3b7ebacdb1 ]---
Comment 5 Florian Mickler 2010-09-06 06:24:49 UTC
Added Paul on the cc list.
Comment 6 Paul E. McKenney 2010-09-06 15:26:36 UTC
The patch at http://lkml.org/lkml/2010/9/2/292 should address this.
Comment 7 Arno Schuring 2010-09-08 07:01:13 UTC
>Comment #6 From  Paul E. McKenney   2010-09-06 15:26:36   -------
>The patch at http://lkml.org/lkml/2010/9/2/292 should address this.
Sadly, it doesn't:

aschuring@neminis:~/work/linux-GIT$ grep -A5 _bh_enable_ip kernel/softirq.c
static inline void _local_bh_enable_ip(unsigned long ip)
aschuring@neminis:~/work/linux-GIT$ ls -l kernel/softirq.c
-rw-r--r-- 1 aschuring aschuring 21733 Sep  7 11:23 kernel/softirq.c
[    0.000000] Linux version 2.6.36-rc3+ (aschuring@neminis) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #21 PREEMPT Tue Sep 7 11:29:32 CEST 2010
[   12.880400] WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x28/0x77()

The backtrace is identical, except for the addresses in _local_bh_enable_ip. Will do a make clean and retest.
Comment 8 Arno Schuring 2010-09-08 10:22:43 UTC
No change after a clean build:
[   11.811728] WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x28/0x77()

Closer inspection of the backtrace shows we're in a code path from handle_IRQ_event, so it shouldn't be too surprising that the patch has no effect?
Comment 9 Arno Schuring 2010-09-17 22:37:54 UTC
Created attachment 30382 [details]
first bisection attempt

de85d99eb7b595f6751550184b94c1e2f74a828b is the first bad commit
commit de85d99eb7b595f6751550184b94c1e2f74a828b
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Jun 10 16:12:44 2010 +0000

    netpoll: Fix RCU usage
    The use of RCU in netpoll is incorrect in a number of places:
    1) The initial setting is lacking a write barrier.
    2) The synchronize_rcu is in the wrong place.
    3) Read barriers are missing.
    4) Some places are even missing rcu_read_lock.
    5) npinfo is zeroed after freeing.
    This patch fixes those issues.  As most users are in BH context,
    this also converts the RCU usage to the BH variant.
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
:040000 040000 27b02b9815fedf05ce1dc42cf935b984e97abba9 0f46cb6360a8a082f5695d56c48fdf40078cd6b3 Minclude                         
:040000 040000 fdf4c432f33c67668c9740dde0b8c89f2806a43b 39c5f6fb0abea040a520ff2e08e26aac4689c49c Mnet                             

I'm not entirely convinced that this bisection result is correct, although it seems plausible. My problem with the commit is this:

$ git describe de85d99eb7b595f6751550184b94c1e2f74a828b

Doesn't this mean that I should have seen this WARNING before? I have been running 35-rc4 and 35.1 before and never saw this trace. Moreover, the bisection took me as far back as 2.6.34, even though I started out with marking 2.6.35 as good. I'm attaching the full bisect log for closer inspection, I may have confused git by forcing it to re-test v2.6.35 (5th step).
Comment 10 Florian Mickler 2010-09-18 08:06:24 UTC

$ git tag --contains de85d99eb7b595f6751550184b94c1e2f74a828b

$ git describe just says on what it is _based_ upon. But it doesn't say when it was integrated into mainline. 

Please check if the patch from: 


fixes the issue for you.
Comment 11 Arno Schuring 2010-09-19 15:03:50 UTC
Thanks for the git info :)

The patch does indeed solve the problem.
Comment 12 Florian Mickler 2010-09-20 11:24:46 UTC
Ok, I'm closing this as a duplicate of that bug then.

Thanks for the testing and bisection.

*** This bug has been marked as a duplicate of bug 18492 ***