Bug 17321 - i386 WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x2f/0x7e
Summary: i386 WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x2f/0x7e
Status: CLOSED DUPLICATE of bug 18492
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386 (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: platform_i386
URL:
Keywords:
Depends on:
Blocks: 16444
  Show dependency tree
 
Reported: 2010-08-29 13:57 UTC by Maciej Rutecki
Modified: 2010-09-20 16:18 UTC (History)
5 users (show)

See Also:
Kernel Version: 2.6.36-rc2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
config (62.52 KB, application/octet-stream)
2010-08-29 16:12 UTC, Arno Schuring
Details
full dmesg (34.25 KB, application/octet-stream)
2010-08-29 16:13 UTC, Arno Schuring
Details
lspci -vv output (7.74 KB, text/plain)
2010-08-29 16:15 UTC, Arno Schuring
Details
first bisection attempt (2.79 KB, text/plain)
2010-09-17 22:37 UTC, Arno Schuring
Details

Description Maciej Rutecki 2010-08-29 13:57:35 UTC
Subject    : i386 WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x2f/0x7e
Submitter  : Arno Schuring <aelschuring@hotmail.com>
Date       : 2010-08-27 20:04
Message-ID : 4C781A3A.4010707@hotmail.com
References : http://marc.info/?l=linux-kernel&m=128294076822387&w=2

This entry is being used for tracking a regression from 2.6.35. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Arno Schuring 2010-08-29 16:12:10 UTC
Created attachment 28321 [details]
config
Comment 2 Arno Schuring 2010-08-29 16:13:53 UTC
Created attachment 28331 [details]
full dmesg
Comment 3 Arno Schuring 2010-08-29 16:15:01 UTC
Created attachment 28341 [details]
lspci -vv output
Comment 4 Arno Schuring 2010-08-30 14:14:14 UTC
still present in -rc3. I notice that the full backtrace is not listed yet (it is in the attached dmesg though), so here it is in full:

[   12.675795] ------------[ cut here ]------------
[   12.675814] WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x2f/0x7e()
[   12.675818] Hardware name:       
[   12.675821] Modules linked in: xfs exportfs w83627hf_wdt w83627hf hwmon_vid d
m_crypt dm_mod rtc_cmos 8250_pnp processor i2c_sis96x 8250 rtc_core evdev pcspkr
 button usbhid usb_storage rtc_lib serial_core hid ext4 jbd2 crc16 ohci_hcd ehci
_hcd usbcore floppy thermal sr_mod thermal_sys nls_base cdrom [last unloaded: sc
si_wait_scan]
[   12.675872] Pid: 0, comm: swapper Not tainted 2.6.36-rc3+ #20
[   12.675875] Call Trace:
[   12.675891]  [<c1024fc1>] ? warn_slowpath_common+0x6a/0x7b
[   12.675896]  [<c102975a>] ? _local_bh_enable_ip+0x2f/0x7e
[   12.675902]  [<c1024fdf>] ? warn_slowpath_null+0xd/0x10
[   12.675907]  [<c102975a>] ? _local_bh_enable_ip+0x2f/0x7e
[   12.675914]  [<c125c832>] ? netpoll_rx+0x71/0x77
[   12.675919]  [<c125c846>] ? netif_rx+0xe/0xf7
[   12.675928]  [<c122085b>] ? sis900_interrupt+0x1c7/0x4ab
[   12.675936]  [<c10580e3>] ? handle_IRQ_event+0x54/0x102
[   12.675943]  [<c1059981>] ? handle_fasteoi_irq+0x8c/0xdf
[   12.675948]  [<c10598f5>] ? handle_fasteoi_irq+0x0/0xdf
[   12.675951]  <IRQ>  [<c1003a3c>] ? do_IRQ+0x2b/0x7f
[   12.675961]  [<c1002ca9>] ? common_interrupt+0x29/0x30
[   12.675971]  [<c1007eba>] ? default_idle+0x39/0x4c
[   12.675975]  [<c1001b00>] ? cpu_idle+0x35/0xad
[   12.675983]  [<c14346c0>] ? start_kernel+0x274/0x277
[   12.675987] ---[ end trace c042fd3b7ebacdb1 ]---
Comment 5 Florian Mickler 2010-09-06 06:24:49 UTC
Added Paul on the cc list.
Comment 6 Paul E. McKenney 2010-09-06 15:26:36 UTC
The patch at http://lkml.org/lkml/2010/9/2/292 should address this.
Comment 7 Arno Schuring 2010-09-08 07:01:13 UTC
>Comment #6 From  Paul E. McKenney   2010-09-06 15:26:36   -------
>The patch at http://lkml.org/lkml/2010/9/2/292 should address this.
Sadly, it doesn't:

aschuring@neminis:~/work/linux-GIT$ grep -A5 _bh_enable_ip kernel/softirq.c
static inline void _local_bh_enable_ip(unsigned long ip)
{
	WARN_ON_ONCE(in_irq());
#ifdef CONFIG_TRACE_IRQFLAGS
	local_irq_disable();
#endif
--
[...]
aschuring@neminis:~/work/linux-GIT$ ls -l kernel/softirq.c
-rw-r--r-- 1 aschuring aschuring 21733 Sep  7 11:23 kernel/softirq.c
[    0.000000] Linux version 2.6.36-rc3+ (aschuring@neminis) (gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8) ) #21 PREEMPT Tue Sep 7 11:29:32 CEST 2010
[...]
[   12.880400] WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x28/0x77()

The backtrace is identical, except for the addresses in _local_bh_enable_ip. Will do a make clean and retest.
Comment 8 Arno Schuring 2010-09-08 10:22:43 UTC
No change after a clean build:
[   11.811728] WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x28/0x77()

Closer inspection of the backtrace shows we're in a code path from handle_IRQ_event, so it shouldn't be too surprising that the patch has no effect?
Comment 9 Arno Schuring 2010-09-17 22:37:54 UTC
Created attachment 30382 [details]
first bisection attempt

de85d99eb7b595f6751550184b94c1e2f74a828b is the first bad commit
commit de85d99eb7b595f6751550184b94c1e2f74a828b
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Jun 10 16:12:44 2010 +0000

    netpoll: Fix RCU usage
    
    The use of RCU in netpoll is incorrect in a number of places:
    
    1) The initial setting is lacking a write barrier.
    2) The synchronize_rcu is in the wrong place.
    3) Read barriers are missing.
    4) Some places are even missing rcu_read_lock.
    5) npinfo is zeroed after freeing.
    
    This patch fixes those issues.  As most users are in BH context,
    this also converts the RCU usage to the BH variant.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
                                
:040000 040000 27b02b9815fedf05ce1dc42cf935b984e97abba9 0f46cb6360a8a082f5695d56c48fdf40078cd6b3 Minclude                         
:040000 040000 fdf4c432f33c67668c9740dde0b8c89f2806a43b 39c5f6fb0abea040a520ff2e08e26aac4689c49c Mnet                             


I'm not entirely convinced that this bisection result is correct, although it seems plausible. My problem with the commit is this:

$ git describe de85d99eb7b595f6751550184b94c1e2f74a828b
v2.6.35-rc1-454-gde85d99        

Doesn't this mean that I should have seen this WARNING before? I have been running 35-rc4 and 35.1 before and never saw this trace. Moreover, the bisection took me as far back as 2.6.34, even though I started out with marking 2.6.35 as good. I'm attaching the full bisect log for closer inspection, I may have confused git by forcing it to re-test v2.6.35 (5th step).
Comment 10 Florian Mickler 2010-09-18 08:06:24 UTC
try 

$ git tag --contains de85d99eb7b595f6751550184b94c1e2f74a828b

$ git describe just says on what it is _based_ upon. But it doesn't say when it was integrated into mainline. 


Please check if the patch from: 

https://bugzilla.kernel.org/show_bug.cgi?id=18492

fixes the issue for you.
Comment 11 Arno Schuring 2010-09-19 15:03:50 UTC
Thanks for the git info :)

The patch does indeed solve the problem.
Comment 12 Florian Mickler 2010-09-20 11:24:46 UTC
Ok, I'm closing this as a duplicate of that bug then.

Thanks for the testing and bisection.

*** This bug has been marked as a duplicate of bug 18492 ***

Note You need to log in before you can comment on or make changes to this bug.