Latest working kernel version: 2.6.24.2 Earliest failing kernel version: 2.6.24-rc3 (not checked before) Distribution: Bluewhite64 Hardware Environment: Athlon X2 4200 Software Environment: samba 3.0, 2.6.25-rc3 kernel + HR + tickless + kernel SMP debugging Problem Description: The Samba smbd daemon triggers regularly the following BUG with 2.6.25-rc3: BUG: using smp_processor_id() in preemptible [00000000] code: nmbd/3167 caller is __nf_conntrack_find+0x119/0x150 Pid: 3167, comm: nmbd Not tainted 2.6.25-rc3 #1 Call Trace: [<ffffffff8038f3f4>] debug_smp_processor_id+0xc4/0xd0 [<ffffffff80555d79>] __nf_conntrack_find+0x119/0x150 [<ffffffff80555dc9>] nf_conntrack_find_get+0x19/0x80 [<ffffffff80556914>] nf_conntrack_in+0x1a4/0x5a0 [<ffffffff8020bd33>] ? restore_args+0x0/0x30 [<ffffffff8059d596>] ipv4_conntrack_local+0x66/0x70 [<ffffffff80554362>] nf_iterate+0x62/0xa0 [<ffffffff80567050>] ? dst_output+0x0/0x10 [<ffffffff80554406>] nf_hook_slow+0x66/0xe0 [<ffffffff80567050>] ? dst_output+0x0/0x10 [<ffffffff80568825>] __ip_local_out+0xa5/0xb0 [<ffffffff80568841>] ip_local_out+0x11/0x30 [<ffffffff80568ac1>] ip_push_pending_frames+0x261/0x3e0 [<ffffffff80587153>] udp_push_pending_frames+0x233/0x3d0 [<ffffffff8058860f>] udp_sendmsg+0x30f/0x710 [<ffffffff802328b0>] ? default_wake_function+0x0/0x10 [<ffffffff8058f895>] inet_sendmsg+0x45/0x80 [<ffffffff80531fcf>] sock_sendmsg+0xdf/0x110 [<ffffffff80251270>] ? autoremove_wake_function+0x0/0x40 [<ffffffff802374c7>] ? hrtick_resched+0x77/0x90 [<ffffffff8025e2b5>] ? trace_hardirqs_on+0xd5/0x160 [<ffffffff80531735>] ? sockfd_lookup_light+0x45/0x80 [<ffffffff805323da>] sys_sendto+0xea/0x120 [<ffffffff80626bcb>] ? _spin_unlock_irq+0x2b/0x60 [<ffffffff8025e2b5>] ? trace_hardirqs_on+0xd5/0x160 [<ffffffff80626bd6>] ? _spin_unlock_irq+0x36/0x60 [<ffffffff8020b6db>] system_call_after_swapgs+0x7b/0x80 Steps to reproduce: Start smbd with the forementionned kernel instrumented for SMP and kernel debugging and hr + tickless enabled.
Reply-To: akpm@linux-foundation.org On Mon, 25 Feb 2008 10:44:08 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10097 > > Summary: SMP BUG in __nf_conntrack_find > Product: Networking > Version: 2.5 > KernelVersion: 2.6.25-rc3 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Netfilter/Iptables > AssignedTo: networking_netfilter-iptables@kernel-bugs.osdl.org > ReportedBy: casteyde.christian@free.fr > > > Latest working kernel version: 2.6.24.2 > Earliest failing kernel version: 2.6.24-rc3 (not checked before) > Distribution: Bluewhite64 > Hardware Environment: Athlon X2 4200 > > Software Environment: > samba 3.0, 2.6.25-rc3 kernel + HR + tickless + kernel SMP debugging > > Problem Description: > The Samba smbd daemon triggers regularly the following BUG with 2.6.25-rc3: > > BUG: using smp_processor_id() in preemptible [00000000] code: nmbd/3167 > caller is __nf_conntrack_find+0x119/0x150 > Pid: 3167, comm: nmbd Not tainted 2.6.25-rc3 #1 > > Call Trace: > [<ffffffff8038f3f4>] debug_smp_processor_id+0xc4/0xd0 > [<ffffffff80555d79>] __nf_conntrack_find+0x119/0x150 > [<ffffffff80555dc9>] nf_conntrack_find_get+0x19/0x80 > [<ffffffff80556914>] nf_conntrack_in+0x1a4/0x5a0 > [<ffffffff8020bd33>] ? restore_args+0x0/0x30 > [<ffffffff8059d596>] ipv4_conntrack_local+0x66/0x70 > [<ffffffff80554362>] nf_iterate+0x62/0xa0 > [<ffffffff80567050>] ? dst_output+0x0/0x10 > [<ffffffff80554406>] nf_hook_slow+0x66/0xe0 > [<ffffffff80567050>] ? dst_output+0x0/0x10 > [<ffffffff80568825>] __ip_local_out+0xa5/0xb0 > [<ffffffff80568841>] ip_local_out+0x11/0x30 > [<ffffffff80568ac1>] ip_push_pending_frames+0x261/0x3e0 > [<ffffffff80587153>] udp_push_pending_frames+0x233/0x3d0 > [<ffffffff8058860f>] udp_sendmsg+0x30f/0x710 > [<ffffffff802328b0>] ? default_wake_function+0x0/0x10 > [<ffffffff8058f895>] inet_sendmsg+0x45/0x80 > [<ffffffff80531fcf>] sock_sendmsg+0xdf/0x110 > [<ffffffff80251270>] ? autoremove_wake_function+0x0/0x40 > [<ffffffff802374c7>] ? hrtick_resched+0x77/0x90 > [<ffffffff8025e2b5>] ? trace_hardirqs_on+0xd5/0x160 > [<ffffffff80531735>] ? sockfd_lookup_light+0x45/0x80 > [<ffffffff805323da>] sys_sendto+0xea/0x120 > [<ffffffff80626bcb>] ? _spin_unlock_irq+0x2b/0x60 > [<ffffffff8025e2b5>] ? trace_hardirqs_on+0xd5/0x160 > [<ffffffff80626bd6>] ? _spin_unlock_irq+0x36/0x60 > [<ffffffff8020b6db>] system_call_after_swapgs+0x7b/0x80 > > Steps to reproduce: > Start smbd with the forementionned kernel instrumented for SMP and kernel > debugging and hr + tickless enabled. > Presumably this is in NF_CT_STAT_INC(). I wonder what caused it to start happening. Guys, this probably means that the developers who tested this change aren't enabling the debug options which all kernel developers _should_ be enabling when "testing" their code! Documentation/SubmitChecklist has a handy list. Should NF_CT_STAT_INC() be using local_inc()?
bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10097 > > > > > > ------- Comment #1 from anonymous@kernel-bugs.osdl.org 2008-02-25 11:12 > ------- > Reply-To: akpm@linux-foundation.org > > On Mon, 25 Feb 2008 10:44:08 -0800 (PST) bugme-daemon@bugzilla.kernel.org > wrote: > >> http://bugzilla.kernel.org/show_bug.cgi?id=10097 >> >> Summary: SMP BUG in __nf_conntrack_find >> Product: Networking >> Version: 2.5 >> KernelVersion: 2.6.25-rc3 >> Platform: All >> OS/Version: Linux >> Tree: Mainline >> Status: NEW >> Severity: normal >> Priority: P1 >> Component: Netfilter/Iptables >> AssignedTo: networking_netfilter-iptables@kernel-bugs.osdl.org >> ReportedBy: casteyde.christian@free.fr >> >> >> Latest working kernel version: 2.6.24.2 >> Earliest failing kernel version: 2.6.24-rc3 (not checked before) >> Distribution: Bluewhite64 >> Hardware Environment: Athlon X2 4200 >> >> Software Environment: >> samba 3.0, 2.6.25-rc3 kernel + HR + tickless + kernel SMP debugging >> >> Problem Description: >> The Samba smbd daemon triggers regularly the following BUG with 2.6.25-rc3: >> >> BUG: using smp_processor_id() in preemptible [00000000] code: nmbd/3167 >> caller is __nf_conntrack_find+0x119/0x150 >> Pid: 3167, comm: nmbd Not tainted 2.6.25-rc3 #1 >> >> Call Trace: >> [<ffffffff8038f3f4>] debug_smp_processor_id+0xc4/0xd0 >> [<ffffffff80555d79>] __nf_conntrack_find+0x119/0x150 >> [<ffffffff80555dc9>] nf_conntrack_find_get+0x19/0x80 >> [<ffffffff80556914>] nf_conntrack_in+0x1a4/0x5a0 >> [<ffffffff8020bd33>] ? restore_args+0x0/0x30 >> [<ffffffff8059d596>] ipv4_conntrack_local+0x66/0x70 >> [<ffffffff80554362>] nf_iterate+0x62/0xa0 >> [<ffffffff80567050>] ? dst_output+0x0/0x10 >> [<ffffffff80554406>] nf_hook_slow+0x66/0xe0 >> [<ffffffff80567050>] ? dst_output+0x0/0x10 >> [<ffffffff80568825>] __ip_local_out+0xa5/0xb0 >> [<ffffffff80568841>] ip_local_out+0x11/0x30 >> [<ffffffff80568ac1>] ip_push_pending_frames+0x261/0x3e0 >> [<ffffffff80587153>] udp_push_pending_frames+0x233/0x3d0 >> [<ffffffff8058860f>] udp_sendmsg+0x30f/0x710 >> [<ffffffff802328b0>] ? default_wake_function+0x0/0x10 >> [<ffffffff8058f895>] inet_sendmsg+0x45/0x80 >> [<ffffffff80531fcf>] sock_sendmsg+0xdf/0x110 >> [<ffffffff80251270>] ? autoremove_wake_function+0x0/0x40 >> [<ffffffff802374c7>] ? hrtick_resched+0x77/0x90 >> [<ffffffff8025e2b5>] ? trace_hardirqs_on+0xd5/0x160 >> [<ffffffff80531735>] ? sockfd_lookup_light+0x45/0x80 >> [<ffffffff805323da>] sys_sendto+0xea/0x120 >> [<ffffffff80626bcb>] ? _spin_unlock_irq+0x2b/0x60 >> [<ffffffff8025e2b5>] ? trace_hardirqs_on+0xd5/0x160 >> [<ffffffff80626bd6>] ? _spin_unlock_irq+0x36/0x60 >> [<ffffffff8020b6db>] system_call_after_swapgs+0x7b/0x80 >> >> Steps to reproduce: >> Start smbd with the forementionned kernel instrumented for SMP and kernel >> debugging and hr + tickless enabled. >> > > Presumably this is in NF_CT_STAT_INC(). I wonder what caused it to start > happening. > > Guys, this probably means that the developers who tested this change aren't > enabling the debug options which all kernel developers _should_ be enabling > when "testing" their code! Documentation/SubmitChecklist has a handy list. I guess the reporter is using preemptible RCU, which wasn't upstream when I tested this. > Should NF_CT_STAT_INC() be using local_inc()? No, it needs to disable BHs to avoid corrupting the counters. I have the fix queued for this, will send upstream soon.
Yes, I've also activated Premptible RCU (I was willing to test since I'm quite lucky finding problems). I usually run -rc3 and following kernels with all debug options activated (except kobject because it floods my logs) to test them, and I indeed did it for 2.6.24 and hadn't seen this problem. So it the NF code has not changed, premptible RCU is a good guess. I can check without it if you want - just tell me. Btw, the problem occurs also with other processes (khttp, dnsmasq, etc), not just Samba (quite logical).
Created attachment 14987 [details] fix smp_processor_id warning This is the patch I have queued, please try whether it fixes the problem for you.
The patch seems to be OK. I didn't manage to reproduce with it. For the record, I also tried without the patch and with non premptible RCU, the problem doesn't appear either. Thanks
Thanks for testing this (and for updating the bugreport :))
fixed by commit 4e29e9ec7e0707d3925f5dcc29af0d3f04e49833