Bug 10875

Summary: Oops in nf_nat_setup_info
Product: Networking
Component: Netfilter/Iptables
Status: CLOSED CODE_FIX
Severity: high
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24.7, 2.6.26-rc5
Subsystem:
Regression: Yes
Bisected commit-id:
Reporter: Krzysztof Oledzki (ole)
Assignee: networking_netfilter-iptables (networking_netfilter-iptables)
CC: bunk, ole
Bug Depends on:
Bug Blocks: 9853
Attachments:
  The oops
  Additional Oops
  nf_nat_core.o from 2.6.26-rc5, compiled with gcc-4.2.3

Description
Krzysztof Oledzki
2008-06-06 10:45:25 UTC
Created attachment 16414 [details]
The oops
Created attachment 16415 [details]
Additional Oops
Reply-To: ole@ans.pl

Hello,

Today I have been playing with the conntrackd utility and noticed it is very easy to trigger a kernel oops just by running:

conntrackd -d
conntrackd -n
conntrackd -c
conntrackd -c

The oops is here:
http://bugzilla.kernel.org/attachment.cgi?id=16414

I was trying to fix it with:

--- nf_nat_core.c 2008-06-06 19:55:25.000000000 +0200
+++ nf_nat_core.c 2008-05-07 01:22:34.000000000 +0200
@@ -153,7 +153,7 @@
 	read_lock_bh(&nf_nat_lock);
 	hlist_for_each_entry(nat, n, &bysource[h], bysource) {
 		ct = nat->ct;
-		if (ct && same_src(ct, tuple)) {
+		if (same_src(ct, tuple)) {
 			/* Copy source part from reply tuple. */
 			nf_ct_invert_tuplepr(result,
 				       &ct->tuplehash[IP_CT_DIR_REPLY].tuple);

However, I am not able to find how nat->ct may become NULL here, and unfortunately this patch does not help much: with the above fix I get a different oops:
http://bugzilla.kernel.org/attachment.cgi?id=16415

(gdb) l *nf_nat_setup_info+0x223
0x783e30de is in nf_nat_setup_info (net/ipv4/netfilter/nf_nat_core.c:154).
149             struct nf_conn_nat *nat;
150             struct nf_conn *ct;
151             struct hlist_node *n;
152
153             read_lock_bh(&nf_nat_lock);
154             hlist_for_each_entry(nat, n, &bysource[h], bysource) {   <- here
155                     ct = nat->ct;
156                     if (ct && same_src(ct, tuple)) {
157                             /* Copy source part from reply tuple. */
158                             nf_ct_invert_tuplepr(result,

All accesses to bysource seem to be protected by the _bh lock calls, so I have no idea where to dig next. :(

Any ideas?

Best regards,
Krzysztof Oledzki

Reply-To: ole@ans.pl

On Fri, 6 Jun 2008, Krzysztof Oledzki wrote:

> Hello,
>
> Today I have been playing with the conntrackd utility and noticed it is very
> easy to trigger a kernel oops just by:
>
> conntrackd -d
> conntrackd -n
> conntrackd -c
> conntrackd -c

OK, quite often it is enough to run only the first "conntrackd -c" to crash the kernel.

Best regards,
Krzysztof Oledzki

Krzysztof Oledzki wrote:
>
> On Fri, 6 Jun 2008, Krzysztof Oledzki wrote:
>
>> Hello,
>>
>> Today I have been playing with the conntrackd utility and noticed it
>> is very easy to trigger a kernel oops just by:
>>
>> conntrackd -d
>> conntrackd -n
>> conntrackd -c
>> conntrackd -c
>
> OK, quite often it is enough to run only the first "conntrackd -c" to
> crash the kernel.

Is that with or without your accounting patch?

Patrick McHardy wrote:
> Krzysztof Oledzki wrote:
>>
>> On Fri, 6 Jun 2008, Krzysztof Oledzki wrote:
>>
>>> Hello,
>>>
>>> Today I have been playing with the conntrackd utility and noticed it
>>> is very easy to trigger a kernel oops just by:
>>>
>>> conntrackd -d
>>> conntrackd -n
>>> conntrackd -c
>>> conntrackd -c
>>
>> OK, quite often it is enough to run only the first "conntrackd -c" to
>> crash the kernel.
>
> Is that with or without your accounting patch?

In case it's not, does that kernel include commit 86577c661?

Reply-To: ole@ans.pl

On Sat, 7 Jun 2008, Patrick McHardy wrote:

> Krzysztof Oledzki wrote:
>>
>> On Fri, 6 Jun 2008, Krzysztof Oledzki wrote:
>>
>>> Hello,
>>>
>>> Today I have been playing with the conntrackd utility and noticed it is
>>> very easy to trigger a kernel oops just by:
>>>
>>> conntrackd -d
>>> conntrackd -n
>>> conntrackd -c
>>> conntrackd -c
>>
>> OK, quite often it is enough to run only the first "conntrackd -c" to crash
>> the kernel.
>
> Is that with or without your accounting patch?

Without.

Best regards,
Krzysztof Oledzki

Reply-To: ole@ans.pl

On Sat, 7 Jun 2008, Patrick McHardy wrote:

> Patrick McHardy wrote:
>> Krzysztof Oledzki wrote:
>>>
>>> On Fri, 6 Jun 2008, Krzysztof Oledzki wrote:
>>>
>>>> Hello,
>>>>
>>>> Today I have been playing with the conntrackd utility and noticed it is
>>>> very easy to trigger a kernel oops just by:
>>>>
>>>> conntrackd -d
>>>> conntrackd -n
>>>> conntrackd -c
>>>> conntrackd -c
>>>
>>> OK, quite often it is enough to run only the first "conntrackd -c" to
>>> crash the kernel.
>>
>> Is that with or without your accounting patch?
>
> In case it's not, does that kernel include commit 86577c661?

No, it does not, but unfortunately this fix does not solve the crash. However, before you spend too much time on this I'll check 2.6.25.6 and 2.6.26-rc to make sure this problem has not been solved already.

Best regards,
Krzysztof Oledzki

Krzysztof Oledzki wrote:
> On Sat, 7 Jun 2008, Patrick McHardy wrote:
>> In case it's not, does that kernel include commit 86577c661?
>
> No, it does not but unfortunately this fix does not solve the crash.
> However, before you spend too much time on this I'll check 2.6.25.6 and
> 2.6.26-rc to make sure this problem has not been solved already.
Thanks, please let me know how it turns out.

Reply-To: ole@ans.pl

On Sat, 7 Jun 2008, Patrick McHardy wrote:

> Krzysztof Oledzki wrote:
>> On Sat, 7 Jun 2008, Patrick McHardy wrote:
>>> In case it's not, does that kernel include commit 86577c661?
>>
>> No, it does not but unfortunately this fix does not solve the crash.
>> However, before you spend too much time on this I'll check 2.6.25.6 and
>> 2.6.26-rc to make sure this problem has not been solved already.
>
> Thanks, please let me know how it turns out.

Clean 2.6.26-rc5 kernel, no additional patches at all.

BUG: unable to handle kernel NULL pointer dereference at 00000032
IP: [<c03d930e>] nf_nat_setup_info+0x219/0x57f
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP

Pid: 1414, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d930e>] EFLAGS: 00010282 CPU: 1
EIP is at nf_nat_setup_info+0x219/0x57f
EAX: c05bd47c EBX: f754bcc4 ECX: 0000000c EDX: 00000000
ESI: 0000019e EDI: f1c49bb4 EBP: f1c49bc8 ESP: f1c49b78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 1414, ti=f1c48000 task=f7d31030 task.ti=f1c48000)
Stack: 00000000 f1c49c2c f322b7fc 00000008 0005caaa f1c49bac 0005caaa c0138e70
       0552215c 00000000 117334e0 00005102 c2012108 3aaf4108 0000002c 00000000
       c0139bdc 3aa00780 f50b3474 f1c49c04 00000008 c038e728 0000000a f50b3474
Call Trace:
[<c0138e70>] clockevents_program_event+0xca/0xd9
[<c0139bdc>] tick_program_event+0x30/0x4f
[<c038e728>] nla_parse+0x5c/0xb0
[<c039801f>] ctnetlink_change_status+0x190/0x1c6
[<c03982f0>] ctnetlink_new_conntrack+0x189/0x61f
[<c0108346>] read_tsc+0x6/0x22
[<c01367c4>] getnstimeofday+0x32/0xad
[<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
[<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
[<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
[<c038d2ce>] netlink_rcv_skb+0x2d/0x71
[<c0390205>] nfnetlink_rcv+0x19/0x24
[<c038d0f5>] netlink_unicast+0x1b3/0x216
[<c038d892>] netlink_sendmsg+0x237/0x244
[<c035cf39>] sock_sendmsg+0xb8/0xd1
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c035d7a0>] sys_sendto+0xfc/0x127
[<c014f90e>] __pagevec_lru_add_active+0x99/0xa4
[<c0152909>] __inc_zone_state+0x10/0x61
[<c045c854>] _spin_unlock+0xc/0x1f
[<c015421a>] do_wp_page+0x3e7/0x440
[<c035e02d>] sys_socketcall+0x106/0x196
[<c0103946>] syscall_call+0x7/0xb
=======================
Code: e8 25 4e d4 ff 89 e0 25 00 e0 ff ff f6 40 08 04 74 48 e8 2f 1c 08 00 eb 41 8b 1b 85 db 74 1d 8b 03 0f 18 00 90 8b 53 18 8d 4a 0c <8a> 41 26 3a 84 24 8a 00 00 00 75 e2 e9 09 ff ff ff b8 01 00 00
EIP: [<c03d930e>] nf_nat_setup_info+0x219/0x57f SS:ESP 0068:f1c49b78
---[ end trace 5de3919242e64ed5 ]---
note: conntrackd[1414] exited with preempt_count 1
BUG: scheduling while atomic: conntrackd/1414/0x10000002
Pid: 1414, comm: conntrackd Tainted: G D 2.6.26-rc5 #1
[<c045a96f>] schedule+0x9b/0x60b
[<c015ba07>] free_pages_and_swap_cache+0x6a/0x7e
[<c011ed44>] __cond_resched+0xf/0x27
[<c045b00e>] _cond_resched+0x21/0x2a
[<c0154845>] unmap_vmas+0x47e/0x551
[<c01573c7>] exit_mmap+0x70/0xf8
[<c012162a>] mmput+0x1c/0x7e
[<c0125b4c>] do_exit+0x1dc/0x572
[<c0104ccb>] die+0x11f/0x124
[<c01143bf>] do_page_fault+0x4ae/0x567
[<c01367c4>] getnstimeofday+0x32/0xad
[<c0113f11>] do_page_fault+0x0/0x567
[<c045cb3a>] error_code+0x72/0x78
[<c03d930e>] nf_nat_setup_info+0x219/0x57f
[<c0138e70>] clockevents_program_event+0xca/0xd9
[<c0139bdc>] tick_program_event+0x30/0x4f
[<c038e728>] nla_parse+0x5c/0xb0
[<c039801f>] ctnetlink_change_status+0x190/0x1c6
[<c03982f0>] ctnetlink_new_conntrack+0x189/0x61f
[<c0108346>] read_tsc+0x6/0x22
[<c01367c4>] getnstimeofday+0x32/0xad
[<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
[<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
[<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
[<c038d2ce>] netlink_rcv_skb+0x2d/0x71
[<c0390205>] nfnetlink_rcv+0x19/0x24
[<c038d0f5>] netlink_unicast+0x1b3/0x216
[<c038d892>] netlink_sendmsg+0x237/0x244
[<c035cf39>] sock_sendmsg+0xb8/0xd1
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c035d7a0>] sys_sendto+0xfc/0x127
[<c014f90e>] __pagevec_lru_add_active+0x99/0xa4
[<c0152909>] __inc_zone_state+0x10/0x61
[<c045c854>] _spin_unlock+0xc/0x1f
[<c015421a>] do_wp_page+0x3e7/0x440
[<c035e02d>] sys_socketcall+0x106/0x196
[<c0103946>] syscall_call+0x7/0xb
=======================

Best regards,
Krzysztof Oledzki

Krzysztof Oledzki wrote:
>
>
> On Sat, 7 Jun 2008, Patrick McHardy wrote:
>
>> Krzysztof Oledzki wrote:
>>> On Sat, 7 Jun 2008, Patrick McHardy wrote:
>>>> In case it's not, does that kernel include commit 86577c661?
>>>
>>> No, it does not but unfortunately this fix does not solve the crash.
>>> However, before you spend too much time on this I'll check 2.6.25.6
>>> and 2.6.26-rc to make sure this problem has not been solved already.
>>
>> Thanks, please let me know how it turns out.
>
> Clean 2.6.26-rc5 kernel, no additional patches at all.
>
> BUG: unable to handle kernel NULL pointer dereference at 00000032
> IP: [<c03d930e>] nf_nat_setup_info+0x219/0x57f
> *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP
How heavily is that system loaded (conntrack-wise)?

Reply-To: ole@ans.pl

On Sat, 7 Jun 2008, Patrick McHardy wrote:

> Krzysztof Oledzki wrote:
>>
>> On Sat, 7 Jun 2008, Patrick McHardy wrote:
>>
>>> Krzysztof Oledzki wrote:
>>>> On Sat, 7 Jun 2008, Patrick McHardy wrote:
>>>>> In case it's not, does that kernel include commit 86577c661?
>>>>
>>>> No, it does not but unfortunately this fix does not solve the crash.
>>>> However, before you spend too much time on this I'll check 2.6.25.6 and
>>>> 2.6.26-rc to make sure this problem has not been solved already.
>>>
>>> Thanks, please let me know how it turns out.
>>
>> Clean 2.6.26-rc5 kernel, no additional patches at all.
>>
>> BUG: unable to handle kernel NULL pointer dereference at 00000032
>> IP: [<c03d930e>] nf_nat_setup_info+0x219/0x57f
>> *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP
>
> How heavily is that system loaded (conntrack-wise)?

It is nearly 100% idle (backup). However, I'm able to trigger this bug when I try to synchronize connections from an active firewall (it is still running a heavily patched 2.6.22 kernel, btw).

Best regards,
Krzysztof Oledzki

Latest working kernel version: 2.6.22.19 (2.6.23 not tested yet)

Could you attach the nf_nat_core.o file of your 2.6.26-rc5 kernel please?

Created attachment 16433 [details]
nf_nat_core.o from 2.6.26-rc5, compiled with gcc-4.2.3

Done, please let me know if you need another version, for example compiled with CONFIG_DEBUG_INFO or with a different gcc version. BTW: This problem also exists with gcc-4.1.2.

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10875
>
> ------- Comment #16 from olel@ans.pl 2008-06-08 10:28 -------
> Done, please let me know if you need another version, for example compiled
> with
> CONFIG_DEBUG_INFO or with a different gcc version. BTW: This problem also
> exists with gcc-4.1.2.

Thanks, I just wanted to verify that it is actually the nat->ct pointer that is bogus. It crashes in same_src() when comparing the protonum at offset 38.

This patch (on top of the previous ones) might help.

diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
index 0457859..c22dde5 100644
--- a/net/ipv4/netfilter/nf_nat_core.c
+++ b/net/ipv4/netfilter/nf_nat_core.c
@@ -556,7 +555,6 @@ static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
 
 	spin_lock_bh(&nf_nat_lock);
 	hlist_del_rcu(&nat->bysource);
-	nat->ct = NULL;
 	spin_unlock_bh(&nf_nat_lock);
 }

Yep, like I said I already tried something similar:

-		if (ct && same_src(ct, tuple)) {
+		if (same_src(ct, tuple)) {

but without luck: http://bugzilla.kernel.org/attachment.cgi?id=16415

However, it was without the recent hlist_replace_rcu/call_rcu fixes. I'll retest your fix ASAP.

Yes, they might help. The second crash with your attempted fix (seems to be reversed in the diff) looks like random data, which could be caused by the potential use-after-free the RCU fixes should cure.
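
The remark above that the crash happens "when comparing the protonum at offset 38" can be cross-checked against the first 2.6.26-rc5 oops. A minimal sketch of that arithmetic follows. The two struct types are hypothetical stand-ins (not the kernel's definitions), padded only so that the two offsets match the displacement bytes in the oops "Code:" line (8d 4a 0c, then the faulting <8a> 41 26) and the register dump (EDX: 00000000, ECX: 0000000c).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mock layout, NOT the real kernel structures.
 * Only the two offsets matter; they are taken from the oops:
 *   8d 4a 0c  ->  ECX = EDX + 0x0c   (address of the tuple inside ct)
 *   8a 41 26  ->  byte load from ECX + 0x26 (the protonum, offset 38)
 */
struct mock_tuple {
    uint8_t pad[38];      /* everything in front of the protocol number */
    uint8_t protonum;     /* offset 38 == 0x26 */
    uint8_t dir;
};

struct mock_conn {
    uint8_t pad[12];          /* everything in front of the tuple */
    struct mock_tuple tuple;  /* offset 12 == 0x0c */
};

/* Address the CPU reads when the protonum of the conntrack at `ct` is compared. */
static uint32_t protonum_address(uint32_t ct)
{
    /* Sanity-check that the mock really has the offsets claimed above. */
    assert(offsetof(struct mock_conn, tuple) == 0x0c);
    assert(offsetof(struct mock_tuple, protonum) == 38);

    uint32_t tuple = ct + (uint32_t)offsetof(struct mock_conn, tuple);
    return tuple + (uint32_t)offsetof(struct mock_tuple, protonum);
}
```

With ct == 0 (the EDX value in the oops) the function yields 0x0c + 0x26 = 0x32, which is the address in "unable to handle kernel NULL pointer dereference at 00000032". That is consistent with nat->ct itself being NULL, rather than nat pointing at garbage, in this particular crash.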

(In reply to comment #19)
> Yes, they might help,

It did not help:

BUG: unable to handle kernel NULL pointer dereference at 00000032
IP: [<c03d9307>] nf_nat_setup_info+0x219/0x582
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP

Pid: 2793, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d9307>] EFLAGS: 00010286 CPU: 1
EIP is at nf_nat_setup_info+0x219/0x582

For totally unknown reason nat->ct became NULL.

> the second crash with your attempted fix (seems to be
> reversed in the diff)

Indeed, sorry.

> looks like random data, which could be caused by the
> potential use-after-free the RCU fixes should cure.

I'll retry the test with my trivial workaround applied to check if the second bug still exists.

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10875
>
> ------- Comment #20 from olel@ans.pl 2008-06-08 12:23 -------
> (In reply to comment #19)
>> Yes, they might help,
>
> It did not help:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000032
> IP: [<c03d9307>] nf_nat_setup_info+0x219/0x582
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT SMP
>
> Pid: 2793, comm: conntrackd Not tainted (2.6.26-rc5 #1)
> EIP: 0060:[<c03d9307>] EFLAGS: 00010286 CPU: 1
> EIP is at nf_nat_setup_info+0x219/0x582
>
> For totally unknown reason nat->ct became NULL.

Do you have any helpers loaded? Might be worth testing without helpers since that could cause ct_extend reallocations.

(In reply to comment #20)
> (In reply to comment #19)
> > Yes, they might help,
>
> It did not help:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000032
> IP: [<c03d9307>] nf_nat_setup_info+0x219/0x582
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT SMP
>
> Pid: 2793, comm: conntrackd Not tainted (2.6.26-rc5 #1)
> EIP: 0060:[<c03d9307>] EFLAGS: 00010286 CPU: 1
> EIP is at nf_nat_setup_info+0x219/0x582
>
> For totally unknown reason nat->ct became NULL.
>
> > the second crash with your attempted fix (seems to be
> > reversed in the diff)
>
> Indeed, sorry.
>
> > looks like random data, which could be caused by the
> > potential use-after-free the RCU fixes should cure.
>
> I'll retry the test with my trivial workaround applied to check if the second
> bug still exists.

:(

IP: [<c03d930f>] nf_nat_setup_info+0x221/0x58a
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP

Pid: 2672, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d930f>] EFLAGS: 00010206 CPU: 0
EIP is at nf_nat_setup_info+0x221/0x58a
EAX: 000f6478 EBX: 000f6478 ECX: 00000001 EDX: 00000000
ESI: 00000198 EDI: f5a47bb4 EBP: f5a47bc8 ESP: f5a47b78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 2672, ti=f5a46000 task=f7d7dbf0 task.ti=f5a46000)
Stack: 00000000 f5a47c2c e73a6724 00000008 00000002 f5a47e58 00001000 c016f0f8
       f5a47f9c f5a47f4c f5a47be4 0000000d f5a47e5c f5a47e60 f5a47e64 f5a47e50
       f5a47e54 f5a47e58 e5e27a74 f5a47c04 00000008 c038e728 0000000a e5e27a74
Call Trace:
[<c016f0f8>] do_select+0x3c3/0x3d6
[<c038e728>] nla_parse+0x5c/0xb0
[<c039801f>] ctnetlink_change_status+0x190/0x1c6
[<c03982f0>] ctnetlink_new_conntrack+0x189/0x61f
[<c01273d0>] local_bh_enable+0x6a/0x7f
[<c0392221>] __nf_conntrack_find+0xe9/0xf3
[<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
[<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
[<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
[<c038d2ce>] netlink_rcv_skb+0x2d/0x71
[<c0390205>] nfnetlink_rcv+0x19/0x24
[<c038d0f5>] netlink_unicast+0x1b3/0x216
[<c038d892>] netlink_sendmsg+0x237/0x244
[<c035cf39>] sock_sendmsg+0xb8/0xd1
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c016f337>] core_sys_select+0x22c/0x28f
[<c035d7a0>] sys_sendto+0xfc/0x127
[<c0164424>] do_sync_write+0xbe/0x105
[<c0166d5d>] cp_new_stat64+0xf9/0x10b
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c011b5f8>] hrtick_start_fair+0xe7/0x128
[<c035e02d>] sys_socketcall+0x106/0x196
[<c0103946>] syscall_call+0x7/0xb
=======================
Code: 1c 08 00 85 f6 74 1e b8 01 00 00 00 e8 18 4e d4 ff 89 e0 25 00 e0 ff ff f6 40 08 04 74 3e e8 22 1c 08 00 eb 37 8b 1b 85 db 74 13 <8b> 03 0f 18 00 90 8b 53 18 85 d2 0f 85 01 ff ff ff eb e7 b8 01
EIP: [<c03d930f>] nf_nat_setup_info+0x221/0x58a SS:ESP 0068:f5a47b78
---[ end trace 72f9ff0e20f69bdb ]---

(In reply to comment #21)
> bugme-daemon@bugzilla.kernel.org wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10875
> >
> > ------- Comment #20 from olel@ans.pl 2008-06-08 12:23 -------
> > (In reply to comment #19)
> >> Yes, they might help,
> >
> > It did not help:
> >
> > BUG: unable to handle kernel NULL pointer dereference at 00000032
> > IP: [<c03d9307>] nf_nat_setup_info+0x219/0x582
> > *pde = 00000000
> > Oops: 0000 [#1] PREEMPT SMP
> >
> > Pid: 2793, comm: conntrackd Not tainted (2.6.26-rc5 #1)
> > EIP: 0060:[<c03d9307>] EFLAGS: 00010286 CPU: 1
> > EIP is at nf_nat_setup_info+0x219/0x582
> >
> > For totally unknown reason nat->ct became NULL.
>
> Do you have any helpers loaded?

All. ;)

> Might be worth testing without
> helpers since that could cause ct_extend reallocations.

OK.

BUG: unable to handle kernel paging request at 000f6478
> IP: [<c03d930f>] nf_nat_setup_info+0x221/0x58a
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT SMP
>
> Pid: 2672, comm: conntrackd Not tainted (2.6.26-rc5 #1)
> EIP: 0060:[<c03d930f>] EFLAGS: 00010206 CPU: 0
> EIP is at nf_nat_setup_info+0x221/0x58a
> EAX: 000f6478 EBX: 000f6478 ECX: 00000001 EDX: 00000000
> ESI: 00000198 EDI: f5a47bb4 EBP: f5a47bc8 ESP: f5a47b78
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process conntrackd (pid: 2672, ti=f5a46000 task=f7d7dbf0 task.ti=f5a46000)
> Stack: 00000000 f5a47c2c e73a6724 00000008 00000002 f5a47e58 00001000
> c016f0f8
> f5a47f9c f5a47f4c f5a47be4 0000000d f5a47e5c f5a47e60 f5a47e64
> f5a47e50
> f5a47e54 f5a47e58 e5e27a74 f5a47c04 00000008 c038e728 0000000a
> e5e27a74
> Call Trace:
> [<c016f0f8>] do_select+0x3c3/0x3d6
> [<c038e728>] nla_parse+0x5c/0xb0
> [<c039801f>] ctnetlink_change_status+0x190/0x1c6
> [<c03982f0>] ctnetlink_new_conntrack+0x189/0x61f
> [<c01273d0>] local_bh_enable+0x6a/0x7f
> [<c0392221>] __nf_conntrack_find+0xe9/0xf3
> [<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
> [<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
> [<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
> [<c038d2ce>] netlink_rcv_skb+0x2d/0x71
> [<c0390205>] nfnetlink_rcv+0x19/0x24
> [<c038d0f5>] netlink_unicast+0x1b3/0x216
> [<c038d892>] netlink_sendmsg+0x237/0x244
> [<c035cf39>] sock_sendmsg+0xb8/0xd1
> [<c013223c>] autoremove_wake_function+0x0/0x2b
> [<c013223c>] autoremove_wake_function+0x0/0x2b
> [<c016f337>] core_sys_select+0x22c/0x28f
> [<c035d7a0>] sys_sendto+0xfc/0x127
> [<c0164424>] do_sync_write+0xbe/0x105
> [<c0166d5d>] cp_new_stat64+0xf9/0x10b
> [<c013223c>] autoremove_wake_function+0x0/0x2b
> [<c011b5f8>] hrtick_start_fair+0xe7/0x128
> [<c035e02d>] sys_socketcall+0x106/0x196
> [<c0103946>] syscall_call+0x7/0xb
> =======================
> Code: 1c 08 00 85 f6 74 1e b8 01 00 00 00 e8 18 4e d4 ff 89 e0 25 00 e0 ff ff
> f6 40 08 04 74 3e e8 22 1c 08 00 eb 37 8b 1b 85 db 74 13 <8b> 03 0f 18 00 90
> 8b
> 53 18 85 d2 0f 85 01 ff ff ff eb e7 b8 01
> EIP: [<c03d930f>] nf_nat_setup_info+0x221/0x58a SS:ESP 0068:f5a47b78
> ---[ end trace 72f9ff0e20f69bdb ]---
>

This last one is from testing without helpers?

One more question - what kind of NAT is it trying to set up (SNAT/DNAT)?

OK, it's obviously SNAT. But is there also a DNAT mapping?

Reply-To: ole@ans.pl

On Sun, 8 Jun 2008, bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10875
>
> ------- Comment #25 from kaber@trash.net 2008-06-08 12:55 -------
> This last one is from testing without helpers?

Not yet. I noticed that my previous Oops/bug was incomplete - the first line was missing.

Best regards,
Krzysztof Oledzki

(In reply to comment #27)
> OK, it's obviously SNAT. But is there also a DNAT mapping?

There is currently no NAT at all on this host, nor any iptables rules:

# iptables-save
# Generated by iptables-save v1.4.0 on Sun Jun 8 22:20:37 2008
*raw
:PREROUTING ACCEPT [127822:97238543]
:OUTPUT ACCEPT [28623:2680302]
COMMIT
# Completed on Sun Jun 8 22:20:37 2008
# Generated by iptables-save v1.4.0 on Sun Jun 8 22:20:37 2008
*nat
:PREROUTING ACCEPT [8967:836433]
:POSTROUTING ACCEPT [17:1111]
:OUTPUT ACCEPT [17:1111]
COMMIT
# Completed on Sun Jun 8 22:20:37 2008
# Generated by iptables-save v1.4.0 on Sun Jun 8 22:20:37 2008
*mangle
:PREROUTING ACCEPT [127822:97238543]
:INPUT ACCEPT [118846:96421093]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [28623:2680302]
:POSTROUTING ACCEPT [28623:2680302]
COMMIT
# Completed on Sun Jun 8 22:20:37 2008
# Generated by iptables-save v1.4.0 on Sun Jun 8 22:20:37 2008
*filter
:INPUT ACCEPT [118846:96421093]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [28623:2680302]
COMMIT
# Completed on Sun Jun 8 22:20:37 2008

All connections come from a master firewall I'm trying to synchronize with. And yes, on that host I have both SNAT and DNAT rules.

No helpers (I hope):

# CONFIG_NF_CONNTRACK_AMANDA is not set
# CONFIG_NF_CONNTRACK_FTP is not set
# CONFIG_NF_CONNTRACK_H323 is not set
# CONFIG_NF_CONNTRACK_IRC is not set
# CONFIG_NF_CONNTRACK_NETBIOS_NS is not set
# CONFIG_NF_CONNTRACK_PPTP is not set
# CONFIG_NF_CONNTRACK_SANE is not set
# CONFIG_NF_CONNTRACK_SIP is not set
# CONFIG_NF_CONNTRACK_TFTP is not set
# CONFIG_NF_NAT_FTP is not set
# CONFIG_NF_NAT_IRC is not set
# CONFIG_NF_NAT_TFTP is not set
# CONFIG_NF_NAT_AMANDA is not set
# CONFIG_NF_NAT_PPTP is not set
# CONFIG_NF_NAT_H323 is not set
# CONFIG_NF_NAT_SIP is not set

# conntrackd -d ; sleep 30 ; conntrack -s|grep "current active connections"
current active connections: 47557

# conntrackd -c

BUG: unable to handle kernel paging request at 00120fbd
IP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP

Pid: 2795, comm: conntrackd Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c03d394b>] EFLAGS: 00010206 CPU: 1
EIP is at nf_nat_setup_info+0x221/0x58a
EAX: 00120fbd EBX: 00120fbd ECX: 00000001 EDX: 00000000
ESI: 0000019e EDI: e853bbb4 EBP: e853bbc8 ESP: e853bb78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process conntrackd (pid: 2795, ti=e853a000 task=f7de10f0 task.ti=e853a000)
Stack: 00000000 e853bc2c e85672ec 00000008 c0561084 63c1db4a 00000000 00000000
       00000000 0002e109 61d2b1c3 00000000 00000000 00000000 01114e22 61d2b1c3
       00000000 00000000 f7444674 e853bc04 00000008 c038e728 0000000a f7444674
Call Trace:
[<c038e728>] nla_parse+0x5c/0xb0
[<c0397c1b>] ctnetlink_change_status+0x190/0x1c6
[<c0397eec>] ctnetlink_new_conntrack+0x189/0x61f
[<c0119aee>] update_curr+0x3d/0x52
[<c03902d1>] nfnetlink_rcv_msg+0xc1/0xd8
[<c0390228>] nfnetlink_rcv_msg+0x18/0xd8
[<c0390210>] nfnetlink_rcv_msg+0x0/0xd8
[<c038d2ce>] netlink_rcv_skb+0x2d/0x71
[<c0390205>] nfnetlink_rcv+0x19/0x24
[<c038d0f5>] netlink_unicast+0x1b3/0x216
[<c038d892>] netlink_sendmsg+0x237/0x244
[<c035cf39>] sock_sendmsg+0xb8/0xd1
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c013223c>] autoremove_wake_function+0x0/0x2b
[<c035d7a0>] sys_sendto+0xfc/0x127
[<c0152909>] __inc_zone_state+0x10/0x61
[<c0455174>] _spin_unlock+0xc/0x1f
[<c015421a>] do_wp_page+0x3e7/0x440
[<c035e02d>] sys_socketcall+0x106/0x196
[<c0103946>] syscall_call+0x7/0xb
=======================
Code: ff 07 00 85 f6 74 1e b8 01 00 00 00 e8 dc a7 d4 ff 89 e0 25 00 e0 ff ff f6 40 08 04 74 3e e8 06 ff 07 00 eb 37 8b 1b 85 db 74 13 <8b> 03 0f 18 00 90 8b 53 18 85 d2 0f 85 01 ff ff ff eb e7 b8 01
EIP: [<c03d394b>] nf_nat_setup_info+0x221/0x58a SS:ESP 0068:e853bb78
---[ end trace d485143ad4696dc9 ]---

It is 2.6.26-rc5 + the "hlist_replace_rcu, call_rcu" fixes, the "- nat->ct = NULL" fix and my workaround to check "ct". Still no luck. :(

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10875

This one might help (again on top of all others) - if conntrack creation in ctnetlink fails, the entry is freed without calling the extension destructor.

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index c4b1799..662c1cc 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -196,8 +196,6 @@ destroy_conntrack(struct nf_conntrack *nfct)
 	if (l4proto && l4proto->destroy)
 		l4proto->destroy(ct);
 
-	nf_ct_ext_destroy(ct);
-
 	rcu_read_unlock();
 
 	spin_lock_bh(&nf_conntrack_lock);
@@ -520,6 +518,7 @@ static void nf_conntrack_free_rcu(struct rcu_head *head)
 
 void nf_conntrack_free(struct nf_conn *ct)
 {
+	nf_ct_ext_destroy(ct);
 	call_rcu(&ct->rcu, nf_conntrack_free_rcu);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_free);

(In reply to comment #31)
> bugme-daemon@bugzilla.kernel.org wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10875
>
> This one might help (again on top of all others) - if conntrack
> creation in ctnetlink fails, the entry is freed without calling
> the extension destructor.

Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Works like a charm, thank you.
Tested both 2.6.26-rc5 with the 4 patches above and 2.6.24.7 with 86577c661bc01d5c4e477d74567df4470d6c5138 and 019f692ea719a2da17606511d2648b8cc1762268. No more Oopses/BUGs/crashes.

So we need all 4 patches, for both 2.6.26 and 2.6.25-stable, right?

Thanks a lot!

I'll go over the patches tomorrow and check which ones are really needed, but it's probably all of them, yes. I hope we can avoid the addition of a rcu_head to struct nf_ct_ext somehow.

fixed by commit ceeff7541e5a4ba8e8d97ffbae32b3f283cb7a3f
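
The last patch in this thread moves the nf_ct_ext_destroy() call from destroy_conntrack() into nf_conntrack_free(), with the stated reason that a failed creation in ctnetlink freed the entry without running the extension destructor. The following is a rough userspace sketch of that pattern only, not kernel code: every name in it (mock_conn, mock_ext_setup, mock_free_unpatched and so on) is hypothetical, and "freeing" is modelled by clearing a flag in a static pool so that the stale list entry can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace mock. None of these names are kernel APIs. */
#define POOL_SIZE 4

struct mock_conn {
    struct mock_conn *next; /* link on the shared lookup list */
    bool in_use;            /* false once the entry has been "freed" */
    bool on_list;           /* set when the mock extension is set up */
};

static struct mock_conn pool[POOL_SIZE];
static struct mock_conn *lookup_head;

static struct mock_conn *mock_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].on_list = false;
            pool[i].next = NULL;
            return &pool[i];
        }
    }
    return NULL;
}

/* Extension setup: publish the entry on the shared lookup list. */
static void mock_ext_setup(struct mock_conn *c)
{
    c->next = lookup_head;
    lookup_head = c;
    c->on_list = true;
}

/* Extension destructor: unlink the entry from the lookup list. */
static void mock_ext_destroy(struct mock_conn *c)
{
    if (!c->on_list)
        return;
    for (struct mock_conn **pp = &lookup_head; *pp; pp = &(*pp)->next) {
        if (*pp == c) {
            *pp = c->next;
            break;
        }
    }
    c->on_list = false;
}

/* Pre-patch shape: the free routine does not run the destructor. */
static void mock_free_unpatched(struct mock_conn *c)
{
    c->in_use = false;
}

/* Post-patch shape: the free routine always runs the destructor first. */
static void mock_free_patched(struct mock_conn *c)
{
    mock_ext_destroy(c);
    c->in_use = false;
}

/* Count list entries that still point at freed objects. */
static size_t stale_entries(void)
{
    size_t n = 0;
    for (struct mock_conn *c = lookup_head; c; c = c->next)
        if (!c->in_use)
            n++;
    return n;
}

/* Create an entry, then fail and free it directly, as on an error path. */
static size_t create_then_fail(void (*free_fn)(struct mock_conn *))
{
    lookup_head = NULL;                 /* start from a clean state */
    for (size_t i = 0; i < POOL_SIZE; i++)
        pool[i].in_use = false;

    struct mock_conn *c = mock_alloc();
    assert(c != NULL);                  /* pool was just emptied */
    mock_ext_setup(c);
    free_fn(c);                         /* error path: free without destroy */
    return stale_entries();
}
```

In this mock, create_then_fail(mock_free_unpatched) leaves one freed entry on the lookup list, which is the kind of dangling pointer a later list walk would follow, while create_then_fail(mock_free_patched) leaves none. Putting the destructor call inside the free routine means every caller gets the unlink, including error paths that never reach the normal destroy function.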