[10542399.515396] BUG: unable to handle kernel NULL pointer dereference at 000000000000003e
[10542399.523469] IP: [<ffffffffa1491a4b>] find_appropriate_src+0xdb/0x1a0 [nf_nat]
[10542399.530843] PGD 17f55ec067 PUD 17fba37067 PMD 0
[10542399.535727] Oops: 0000 [#1] SMP
[10542399.539220] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
[10542399.547355] CPU 8
[10542399.647544] Supported: Yes, External
[10542399.651361] Pid: 0, comm: swapper Tainted: P NX 2.6.32.43-0.4-default #1 Thurley
[10542399.659755] RIP: 0010:[<ffffffffa1491a4b>] [<ffffffffa1491a4b>] find_appropriate_src+0xdb/0x1a0 [nf_nat]
[10542399.669552] RSP: 0018:ffff88002c3039f0 EFLAGS: 00010286
[10542399.675095] RAX: 0000000000000000 RBX: ffff8817814beb90 RCX: 0000000024852261
[10542399.682454] RDX: 0000000000000000 RSI: 00000000327c4d71 RDI: ffffffff81cd4dc0
[10542399.689812] RBP: ffff88002c303ad0 R08: 0000000000000011 R09: 0000000000000002
[10542399.697170] R10: 0000000000004000 R11: ffffffffa14726e0 R12: ffff88002c303aa0
[10542399.704529] R13: ffff88002c303b40 R14: ffff88002c303b4c R15: ffff88002c303b4e
[10542399.711888] FS: 0000000000000000(0000) GS:ffff88002c300000(0000) knlGS:0000000000000000
[10542399.720199] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[10542399.726175] CR2: 000000000000003e CR3: 00000017f67f1000 CR4: 00000000000006e0
[10542399.733534] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10542399.740893] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[10542399.748254] Process swapper (pid: 0, threadinfo ffff881810db2000, task ffff881810db0080)
[10542399.756560] Stack:
[10542399.758821] 00000000ffffffff ffff88002c303aa0 ffff88002c303ad0 ffff88002c303b40
[10542399.766301] <0> 0000000000000000 ffff8817f7d639e8 0000000000000100 ffffffffa1491beb
[10542399.774237] <0> ffff88002c303ad0 ffff8817f7d639e8 ffff88002c303b40 ffff88002c303aa0
[10542399.782365] Call Trace:
[10542399.785085] [<ffffffffa1491beb>] get_unique_tuple+0xdb/0x240 [nf_nat]
[10542399.791847] [<ffffffffa1491de9>] nf_nat_setup_info+0x99/0x350 [nf_nat]
[10542399.798697] [<ffffffffa149e162>] alloc_null_binding+0x52/0x90 [iptable_nat]
[10542399.805977] [<ffffffffa149e519>] nf_nat_fn+0x1e9/0x280 [iptable_nat]
[10542399.812654] [<ffffffff81318d18>] nf_iterate+0x68/0xa0
[10542399.818031] [<ffffffff81318db2>] nf_hook_slow+0x62/0xf0
[10542399.823582] [<ffffffff813214a1>] ip_local_deliver+0x51/0x80
[10542399.829477] [<ffffffff81320a59>] ip_rcv_finish+0x1b9/0x440
[10542399.835288] [<ffffffff812f5f89>] netif_receive_skb+0x599/0x6a0
[10542399.841454] [<ffffffffa0ea4837>] ixgbe_clean_rx_irq+0x3d7/0xe50 [ixgbe]
[10542399.848397] [<ffffffffa0ea53e4>] ixgbe_clean_rxtx_many+0x134/0x270 [ixgbe]
[10542399.855595] [<ffffffff812f6863>] net_rx_action+0xe3/0x1a0
[10542399.861318] [<ffffffff810533ef>] __do_softirq+0xbf/0x170
[10542399.866956] [<ffffffff810040bc>] call_softirq+0x1c/0x30
[10542399.872506] [<ffffffff81005cfd>] do_softirq+0x4d/0x80
[10542399.877883] [<ffffffff81053275>] irq_exit+0x85/0x90
[10542399.883087] [<ffffffff8100525e>] do_IRQ+0x6e/0xe0
[10542399.888120] [<ffffffff81003913>] ret_from_intr+0x0/0xa
[10542399.893582] [<ffffffff8100ae42>] mwait_idle+0x62/0x70
[10542399.898957] [<ffffffff8100204a>] cpu_idle+0x5a/0xb0
[10542399.904159] Code: 00 00 00 4d 8d 7d 0e 4d 8d 75 0c 48 89 c3 eb 14 48 8b 03 48 85 c0 0f 84 84 00 00 00 44 0f b6 45 26 48 89 c3 48 8b 53 20 48 8b 03 <44> 38 42 3e 0f 18 08 75 dc 8b 42 18 3b 45 00 75 d4 0f b7 42 28

From the vmcore, we found that:

1. The oops occurred at the statement 't->dst.protonum == tuple->dst.protonum' in the inline function same_src.
2. The first parameter of same_src, "ct", is NULL. Its value came from 'ct = nat->ct'.
3. Reading the contents of 'nat' shows that all of its members are zero.
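The register dump is consistent with finding 2 above: RDX is 0 and CR2 is 0x3e, and the marked bytes `<44> 38 42 3e` in the Code line appear to decode to a one-byte compare against `0x3e(%rdx)`. The sketch below only illustrates that arithmetic. `struct mock_conn` and `faulting_address` are hypothetical names, and the layout is an assumption chosen so that the member sits at 0x3e; it is not the real `struct nf_conn`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout (assumption): only the 0x3e offset is taken from
 * the oops; everything in front of it is opaque padding.
 */
struct mock_conn {
    unsigned char other_fields[0x3e]; /* stands in for the preceding members */
    unsigned char protonum;           /* the one-byte field being compared */
};

/* Address the CPU would touch when reading protonum through ct_base. */
static uintptr_t faulting_address(uintptr_t ct_base)
{
    return ct_base + offsetof(struct mock_conn, protonum);
}
```

With a NULL base, `faulting_address(0)` evaluates to 0x3e, which matches the address reported in the BUG line and in CR2.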
static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
{
        struct nf_conn_nat *nat = nf_ct_ext_find(ct, NF_CT_EXT_NAT);

        if (nat == NULL || nat->ct == NULL)
                return;

        NF_CT_ASSERT(nat->ct->status & IPS_NAT_DONE_MASK);

        spin_lock_bh(&nf_nat_lock);
        hlist_del_rcu(&nat->bysource);
        spin_unlock_bh(&nf_nat_lock);
}

void nf_conntrack_free(struct nf_conn *ct)
{
        struct net *net = nf_ct_net(ct);

        nf_ct_ext_destroy(ct);  // For NAT, this calls nf_nat_cleanup_conntrack
        atomic_dec(&net->ct.count);
        // Frees the NAT extension memory with kfree. Is it possible that
        // the extension is still in use on an RCU read side?
        nf_ct_ext_free(ct);
        kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
}
This bug has been reported in netfilter bugzilla here: https://bugzilla.netfilter.org/show_bug.cgi?id=714 Could you please add yourself to cc list there and add any additional information which might be useful?
Please also include output of "iptables -t nat -nvL" so we can see what NAT rules are being applied.
(In reply to Phil from comment #2)
> Please also include output of "iptables -t nat -nvL" so we can see what NAT
> rules are being applied.

Sorry, the issue occurred in a customer's environment, so we cannot get that information now.
Created attachment 107451
simulate the operation on nat_bysource list
(In reply to Phil from comment #1)
> This bug has been reported in netfilter bugzilla here:
> https://bugzilla.netfilter.org/show_bug.cgi?id=714 Could you please add
> yourself to cc list there and add any additional information which might be
> useful?

We have written a kernel module to simulate the operation on nat_bysource list; refer to the attachment for details. If we comment out the statement "synchronize_rcu()" in function thread_delete, an oops occurs due to a NULL pointer dereference.

Is it safe to call synchronize_rcu at the end of function nf_nat_cleanup_conntrack? Or can we replace rcu_read_lock with spin_lock_bh(&nf_nat_lock)?
(In reply to leezhao from comment #5)
> Is it safe to synchronize_rcu at the end of function
> nf_nat_cleanup_conntrack? or can we replace rcu_read_lock with
> spin_lock_bh(&nf_nat_lock)?

To be specific, the second question is whether we can replace rcu_read_lock with spin_lock_bh(&nf_nat_lock) in function find_appropriate_src.
(In reply to leezhao from comment #3)
> (In reply to Phil from comment #2)
> > Please also include output of "iptables -t nat -nvL" so we can see what NAT
> > rules are being applied.
>
> Sorry, the issue occurred in a customer's environment, so we cannot get that
> information now.

We have just found a test environment. The output of "iptables -t nat -nvL" is:

JINLUB017_01:~ # iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 22M packets, 2590M bytes)
 pkts bytes target  prot opt in        out       source       destination
    0     0 DNAT    udp  --  pubeth9   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth9   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth4   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth4   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth3   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth3   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth2   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth2   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth10  *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth10  *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth1   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth1   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  *         *         0.0.0.0/0    172.18.53.1   tcp dpt:80 to:172.18.53.1:8080

Chain POSTROUTING (policy ACCEPT 88090 packets, 6081K bytes)
 pkts bytes target  prot opt in        out       source       destination
    0     0 SNAT    tcp  --  *         priveth0  0.0.0.0/0    172.17.136.2  tcp dpt:4045 to:172.17.136.153
    0     0 SNAT    tcp  --  *         *         172.18.53.1  0.0.0.0/0     tcp spt:8080 to:172.18.53.1:80

Chain OUTPUT (policy ACCEPT 88090 packets, 6081K bytes)
 pkts bytes target  prot opt in        out       source       destination

JINLUB017_01:~ # ifconfig
priveth0  Link encap:Ethernet  HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.141  Bcast:172.18.53.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2285804 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5161093 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:288897684 (275.5 Mb)  TX bytes:5094949791 (4858.9 Mb)

priveth0: Link encap:Ethernet  HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.1  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

priveth0: Link encap:Ethernet  HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.2  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
(In reply to leezhao from comment #5)
> We have written a kernel module to simulate the operation on nat_bysource
> list; refer to the attachment for details. If we comment out the statement
> "synchronize_rcu()" in function thread_delete, an oops occurs due to a NULL
> pointer dereference.

This is not surprising, given you have a memset after the synchronize_rcu. There does not appear to be a memset done in the netfilter code at delete time, so why are you zeroing out the structure in your test?

> Is it safe to call synchronize_rcu at the end of function
> nf_nat_cleanup_conntrack?

It is "safe", but I can't find why it should be required. Are you able to reproduce the issue at will? If so, does adding the call help?

> can we replace rcu_read_lock with spin_lock_bh(&nf_nat_lock)?

This would completely defeat the purpose of using RCU.
This issue is fixed in commit c13a84a8 (netfilter: nf_conntrack: use RCU safe kfree for conntrack extensions).

https://git.kernel.org/cgit/linux/kernel/git/pablo/nf.git/commit/?id=c13a84a830a208fb3443628773c8ca0557773cc7

It should appear in 3.12 and be backported to -stable kernels.
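For readers following the thread, the idea behind an "RCU safe" free can be sketched in userspace: memory that has been unlinked is not released until no reader can still hold a pointer to it. The code below is a single-threaded toy model under stated assumptions, not the kernel implementation and not the contents of the commit. All names (`mock_ext`, `mock_domain`, `mock_read_lock`, `mock_deferred_free`, and so on) are hypothetical, and it counts readers directly, whereas real RCU tracks grace periods and only waits for readers that started before the removal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a conntrack extension (not the kernel struct). */
struct mock_ext {
    int ct_id;                  /* plays the role of nat->ct */
    struct mock_ext *next_free; /* link in the deferred-free queue */
};

/* Toy reclamation domain: counts readers instead of tracking grace periods. */
struct mock_domain {
    int active_readers;         /* readers inside a read-side section */
    struct mock_ext *pending;   /* unlinked objects waiting to be freed */
    int freed;                  /* objects actually released so far */
};

/* Release every queued object; only called when no reader is active. */
static void reclaim_pending(struct mock_domain *d)
{
    while (d->pending != NULL) {
        struct mock_ext *e = d->pending;
        d->pending = e->next_free;
        free(e);
        d->freed++;
    }
}

static void mock_read_lock(struct mock_domain *d)
{
    d->active_readers++;
}

static void mock_read_unlock(struct mock_domain *d)
{
    assert(d->active_readers > 0);
    if (--d->active_readers == 0)
        reclaim_pending(d);     /* last reader left: safe to free now */
}

/* Queue e for release; free immediately only if no reader is active. */
static void mock_deferred_free(struct mock_domain *d, struct mock_ext *e)
{
    e->next_free = d->pending;
    d->pending = e;
    if (d->active_readers == 0)
        reclaim_pending(d);
}

static struct mock_ext *mock_ext_alloc(int ct_id)
{
    struct mock_ext *e = malloc(sizeof(*e));
    if (e != NULL) {
        e->ct_id = ct_id;
        e->next_free = NULL;
    }
    return e;
}
```

In this model, a reader that called `mock_read_lock` before `mock_deferred_free` can keep dereferencing the object until it calls `mock_read_unlock`. An immediate `free` at the same point would leave that reader with a dangling pointer, which is the kind of window the original report asks about for `nf_ct_ext_free`.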