Bug 60853 - OOPS at find_appropriate_src+0xdb/0x1a0 [nf_nat]
Summary: OOPS at find_appropriate_src+0xdb/0x1a0 [nf_nat]
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-05 03:02 UTC by leezhao
Modified: 2013-11-13 15:21 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.32.43-0.4-default
Subsystem:
Regression: No
Bisected commit-id:


Attachments
simulate the operation on nat_bysource list (2.48 KB, text/plain)
2013-09-06 02:46 UTC, leezhao

Description leezhao 2013-09-05 03:02:05 UTC
[10542399.515396] BUG: unable to handle kernel NULL pointer dereference at 000000000000003e
[10542399.523469] IP: [<ffffffffa1491a4b>] find_appropriate_src+0xdb/0x1a0 [nf_nat]
[10542399.530843] PGD 17f55ec067 PUD 17fba37067 PMD 0
[10542399.535727] Oops: 0000 [#1] SMP
[10542399.539220] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
[10542399.547355] CPU 8
[10542399.647544] Supported: Yes, External
[10542399.651361] Pid: 0, comm: swapper Tainted: P          NX 2.6.32.43-0.4-default #1 Thurley
[10542399.659755] RIP: 0010:[<ffffffffa1491a4b>]  [<ffffffffa1491a4b>] find_appropriate_src+0xdb/0x1a0 [nf_nat]
[10542399.669552] RSP: 0018:ffff88002c3039f0  EFLAGS: 00010286
[10542399.675095] RAX: 0000000000000000 RBX: ffff8817814beb90 RCX: 0000000024852261
[10542399.682454] RDX: 0000000000000000 RSI: 00000000327c4d71 RDI: ffffffff81cd4dc0
[10542399.689812] RBP: ffff88002c303ad0 R08: 0000000000000011 R09: 0000000000000002
[10542399.697170] R10: 0000000000004000 R11: ffffffffa14726e0 R12: ffff88002c303aa0
[10542399.704529] R13: ffff88002c303b40 R14: ffff88002c303b4c R15: ffff88002c303b4e
[10542399.711888] FS:  0000000000000000(0000) GS:ffff88002c300000(0000) knlGS:0000000000000000
[10542399.720199] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[10542399.726175] CR2: 000000000000003e CR3: 00000017f67f1000 CR4: 00000000000006e0
[10542399.733534] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10542399.740893] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[10542399.748254] Process swapper (pid: 0, threadinfo ffff881810db2000, task ffff881810db0080)
[10542399.756560] Stack:
[10542399.758821]  00000000ffffffff ffff88002c303aa0 ffff88002c303ad0 ffff88002c303b40
[10542399.766301] <0> 0000000000000000 ffff8817f7d639e8 0000000000000100 ffffffffa1491beb
[10542399.774237] <0> ffff88002c303ad0 ffff8817f7d639e8 ffff88002c303b40 ffff88002c303aa0
[10542399.782365] Call Trace:
[10542399.785085]  [<ffffffffa1491beb>] get_unique_tuple+0xdb/0x240 [nf_nat]
[10542399.791847]  [<ffffffffa1491de9>] nf_nat_setup_info+0x99/0x350 [nf_nat]
[10542399.798697]  [<ffffffffa149e162>] alloc_null_binding+0x52/0x90 [iptable_nat]
[10542399.805977]  [<ffffffffa149e519>] nf_nat_fn+0x1e9/0x280 [iptable_nat]
[10542399.812654]  [<ffffffff81318d18>] nf_iterate+0x68/0xa0
[10542399.818031]  [<ffffffff81318db2>] nf_hook_slow+0x62/0xf0
[10542399.823582]  [<ffffffff813214a1>] ip_local_deliver+0x51/0x80
[10542399.829477]  [<ffffffff81320a59>] ip_rcv_finish+0x1b9/0x440
[10542399.835288]  [<ffffffff812f5f89>] netif_receive_skb+0x599/0x6a0
[10542399.841454]  [<ffffffffa0ea4837>] ixgbe_clean_rx_irq+0x3d7/0xe50 [ixgbe]
[10542399.848397]  [<ffffffffa0ea53e4>] ixgbe_clean_rxtx_many+0x134/0x270 [ixgbe]
[10542399.855595]  [<ffffffff812f6863>] net_rx_action+0xe3/0x1a0
[10542399.861318]  [<ffffffff810533ef>] __do_softirq+0xbf/0x170
[10542399.866956]  [<ffffffff810040bc>] call_softirq+0x1c/0x30
[10542399.872506]  [<ffffffff81005cfd>] do_softirq+0x4d/0x80
[10542399.877883]  [<ffffffff81053275>] irq_exit+0x85/0x90
[10542399.883087]  [<ffffffff8100525e>] do_IRQ+0x6e/0xe0
[10542399.888120]  [<ffffffff81003913>] ret_from_intr+0x0/0xa
[10542399.893582]  [<ffffffff8100ae42>] mwait_idle+0x62/0x70
[10542399.898957]  [<ffffffff8100204a>] cpu_idle+0x5a/0xb0
[10542399.904159] Code: 00 00 00 4d 8d 7d 0e 4d 8d 75 0c 48 89 c3 eb 14 48 8b 03 48 85 c0 0f 84 84 00 00 00 44 0f b6 45 26 48 89 c3 48 8b 53 20 48 8b 03 <44> 38 42 3e 0f 18 08 75 dc 8b 42 18 3b 45 00 75 d4 0f b7 42 28

From the vmcore, we found that:

1 The oops occurred at the comparison 't->dst.protonum == tuple->dst.protonum' in the inline function same_src.

2 The first parameter of same_src, 'ct', is NULL. Its value came from 'ct = nat->ct'.

3 Reading the contents of 'nat' shows that all of its members are zero.
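
The register dump appears consistent with point 2. As a rough sketch (the helper names `effective_address` and `oops_fault_address` are made up for illustration), the faulting bytes `44 38 42 3e` in the Code: line decode to a byte compare of r8b against memory at RDX + 0x3e. With RDX = 0 in the dump, the computed address matches the CR2 value:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: effective address for a ModRM operand of the form
 * [base register + 8-bit displacement], i.e. mod == 01. */
uint64_t effective_address(uint8_t modrm, int8_t disp8, uint64_t base_reg)
{
	assert((modrm >> 6) == 0x1);	/* mod == 01: [reg + disp8] */
	return base_reg + (uint64_t)(int64_t)disp8;
}

/* Values copied from the oops: Code bytes <44> 38 42 3e, RDX = 0. */
uint64_t oops_fault_address(void)
{
	const uint8_t modrm = 0x42;	/* mod=01, reg=000 (r8b via REX.R), rm=010 (rdx) */
	const int8_t disp8 = 0x3e;	/* displacement byte after ModRM */
	const uint64_t rdx = 0x0;	/* RDX: 0000000000000000 */

	return effective_address(modrm, disp8, rdx);	/* 0x3e, same as CR2 */
}
```

Calling `oops_fault_address()` yields 0x3e, so the base pointer held in RDX was NULL at the time of the compare.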
  
 
static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
{
	struct nf_conn_nat *nat = nf_ct_ext_find(ct, NF_CT_EXT_NAT);

	if (nat == NULL || nat->ct == NULL)
		return;

	NF_CT_ASSERT(nat->ct->status & IPS_NAT_DONE_MASK);

	spin_lock_bh(&nf_nat_lock);
	hlist_del_rcu(&nat->bysource); 
	spin_unlock_bh(&nf_nat_lock);
}

void nf_conntrack_free(struct nf_conn *ct)
{
	struct net *net = nf_ct_net(ct);

	nf_ct_ext_destroy(ct);	/* for NAT, this calls nf_nat_cleanup_conntrack() */
	atomic_dec(&net->ct.count);
	/* Frees the NAT extension memory with kfree(). Is it possible that the
	 * extension is still in use by an RCU read-side critical section? */
	nf_ct_ext_free(ct);
	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
}
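
The question raised in the code comment above can be illustrated with a deliberately simplified, single-threaded userspace model. This is not kernel code: `toy_nat`, `toy_unlink`, `toy_free_now` and the hand-written interleaving are assumptions made for illustration, and "free and reuse" is modelled by zeroing the object, which mirrors the all-zero 'nat' seen in the vmcore.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_ct { int protonum; };

struct toy_nat {
	struct toy_nat *next;	/* stands in for the bysource list linkage */
	struct toy_ct *ct;
};

/* Model "memory freed and reused" by zeroing the object. */
void toy_free_now(struct toy_nat *nat)
{
	memset(nat, 0, sizeof(*nat));
}

/* Unlink victim from the list; like hlist_del_rcu(), leave victim->next alone. */
void toy_unlink(struct toy_nat **head, struct toy_nat *victim)
{
	for (struct toy_nat **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			return;
		}
	}
}

/* Returns 1 if the reader ends up looking at a NULL ct, 0 otherwise. */
int reader_sees_null_ct(int free_before_reader_done)
{
	struct toy_ct ct = { .protonum = 6 };
	struct toy_nat nat = { .next = NULL, .ct = &ct };
	struct toy_nat *head = &nat;

	/* Reader: already fetched the entry pointer while walking the list. */
	struct toy_nat *cursor = head;
	assert(cursor != NULL);

	/* Writer: unlink the entry, then (in the buggy case) free it at once. */
	toy_unlink(&head, &nat);
	if (free_before_reader_done)
		toy_free_now(&nat);

	/* Reader resumes and loads cursor->ct, as 'ct = nat->ct' does. */
	int saw_null = (cursor->ct == NULL);

	/* Safe case: the free happens only after the reader is done. */
	if (!free_before_reader_done)
		toy_free_now(&nat);

	return saw_null;
}
```

In this model `reader_sees_null_ct(1)` returns 1 and `reader_sees_null_ct(0)` returns 0: unlinking alone does not hurt a reader that already holds the pointer, but releasing the memory before that reader finishes does.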
Comment 1 Phil 2013-09-05 15:26:13 UTC
This bug has been reported in netfilter bugzilla here:

https://bugzilla.netfilter.org/show_bug.cgi?id=714

Could you please add yourself to cc list there and add any additional information which might be useful?
Comment 2 Phil 2013-09-06 00:14:32 UTC
Please also include output of "iptables -t nat -nvL" so we can see what NAT rules are being applied.
Comment 3 leezhao 2013-09-06 01:04:06 UTC
(In reply to Phil from comment #2)
> Please also include output of "iptables -t nat -nvL" so we can see what NAT
> rules are being applied.

Sorry, the issue occurred in a customer's environment, and we cannot get that information now.
Comment 4 leezhao 2013-09-06 02:46:59 UTC
Created attachment 107451
simulate the operation on nat_bysource list
Comment 5 leezhao 2013-09-06 02:48:31 UTC
(In reply to Phil from comment #1)
> This bug has been reported in netfilter bugzilla here:
> https://bugzilla.netfilter.org/show_bug.cgi?id=714
>
> Could you please add yourself to cc list there and add any additional
> information which might be useful?

We wrote a kernel module to simulate the operations on the nat_bysource list; see the attachment for details. If we comment out the "synchronize_rcu()" call in the function thread_delete, an oops occurs due to a NULL pointer dereference.

Is it safe to call synchronize_rcu() at the end of nf_nat_cleanup_conntrack()?

or

Can we replace rcu_read_lock with spin_lock_bh(&nf_nat_lock)?
Comment 6 leezhao 2013-09-06 02:53:18 UTC
(In reply to leezhao from comment #3)
> (In reply to Phil from comment #2)
> > Please also include output of "iptables -t nat -nvL" so we can see what
> > NAT rules are being applied.
>
> Sorry, the issue also occurred on customer's environment, can not get the
> information now.

Just found a test environment, the output of "iptables -t nat -nvL" is:

(In reply to leezhao from comment #5)
> (In reply to Phil from comment #1)
> > This bug has been reported in netfilter bugzilla here:
> > https://bugzilla.netfilter.org/show_bug.cgi?id=714
> >
> > Could you please add yourself to cc list there and add any additional
> > information which might be useful?
>
> We wrote a kernel module to simulate the operation on nat_bysource
> list, refer to attachment for detail. If we comment the statement
> "synchronize_rcu()" in function thread_delete, OOPS occurred for NULL
> pointer exception.
>
> Is it safe to synchronize_rcu at the end of function
> nf_nat_cleanup_conntrack?
>
> or
>
> can we replace rcu_read_lock with spin_lock_bh(&nf_nat_lock) in function
> find_appropriate_src ?
Comment 7 leezhao 2013-09-06 03:00:18 UTC
(In reply to leezhao from comment #3)
> (In reply to Phil from comment #2)
> > Please also include output of "iptables -t nat -nvL" so we can see what
> > NAT rules are being applied.
>
> Sorry, the issue also occurred on customer's environment, can not get the
> information now.

We just found a test environment. The output of "iptables -t nat -nvL" is:

JINLUB017_01:~ # iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 22M packets, 2590M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       udp  --  pubeth9 *       0.0.0.0/0            0.0.0.0/0           udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       tcp  --  pubeth9 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       udp  --  pubeth4 *       0.0.0.0/0            0.0.0.0/0           udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       tcp  --  pubeth4 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       udp  --  pubeth3 *       0.0.0.0/0            0.0.0.0/0           udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       tcp  --  pubeth3 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       udp  --  pubeth2 *       0.0.0.0/0            0.0.0.0/0           udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       tcp  --  pubeth2 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       udp  --  pubeth10 *       0.0.0.0/0            0.0.0.0/0           udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       tcp  --  pubeth10 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       udp  --  pubeth1 *       0.0.0.0/0            0.0.0.0/0           udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       tcp  --  pubeth1 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            172.18.53.1         tcp dpt:80 to:172.18.53.1:8080

Chain POSTROUTING (policy ACCEPT 88090 packets, 6081K bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 SNAT       tcp  --  *      priveth0  0.0.0.0/0            172.17.136.2        tcp dpt:4045 to:172.17.136.153
    0     0 SNAT       tcp  --  *      *       172.18.53.1          0.0.0.0/0           tcp spt:8080 to:172.18.53.1:80

Chain OUTPUT (policy ACCEPT 88090 packets, 6081K bytes)
 pkts bytes target     prot opt in     out     source               destination
JINLUB017_01:~ # ifconfig
priveth0  Link encap:Ethernet  HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.141  Bcast:172.18.53.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2285804 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5161093 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:288897684 (275.5 Mb)  TX bytes:5094949791 (4858.9 Mb)

priveth0: Link encap:Ethernet  HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.1  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

priveth0: Link encap:Ethernet  HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.2  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Comment 8 Phil 2013-09-11 00:27:28 UTC
(In reply to leezhao from comment #5)
> We wrote a kernel module to simulate the operation on nat_bysource
> list, refer to attachment for detail. If we comment the statement
> "synchronize_rcu()" in function thread_delete, OOPS occurred for NULL pointer
> exception.

This is not surprising, given you have a memset after the synchronize_rcu.  There does not appear to be a memset done in the netfilter code at delete time, so why are you zeroing out the structure in your test?

> Is it safe to synchronize_rcu at the end of function
> nf_nat_cleanup_conntrack?

It is "safe" but I can't find why it should be required.  Are you able to reproduce the issue at will?  If so, does adding the call help?

> can we replace rcu_read_lock with spin_lock_bh(&nf_nat_lock)?

This would completely defeat the purpose of using RCU.
Comment 9 Phil 2013-09-15 15:26:51 UTC
This issue is fixed in commit c13a84a8 (netfilter: nf_conntrack: use RCU safe kfree for conntrack extensions).  

https://git.kernel.org/cgit/linux/kernel/git/pablo/nf.git/commit/?id=c13a84a830a208fb3443628773c8ca0557773cc7

Should appear in 3.12 and be backported to -stable kernels.
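
For readers unfamiliar with the idea behind an "RCU safe kfree", here is a minimal userspace sketch of deferring a free until no reader can still hold the pointer. It is only an analogy: every `toy_*` name is hypothetical, and real RCU does not track readers with a simple counter as this model does.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_PENDING 16

static void *toy_pending[TOY_MAX_PENDING];	/* objects waiting to be freed */
static int toy_npending;
static int toy_readers;				/* readers currently active */
static int toy_freed;				/* number of objects released */

/* Release everything queued so far. */
void toy_flush(void)
{
	for (int i = 0; i < toy_npending; i++) {
		free(toy_pending[i]);
		toy_freed++;
	}
	toy_npending = 0;
}

void toy_read_lock(void)
{
	toy_readers++;
}

void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	if (--toy_readers == 0)
		toy_flush();	/* last reader gone: the "grace period" is over */
}

/* Queue obj instead of freeing it at once; free only when no reader is active. */
void toy_free_deferred(void *obj)
{
	assert(toy_npending < TOY_MAX_PENDING);
	toy_pending[toy_npending++] = obj;
	if (toy_readers == 0)
		toy_flush();
}

/* Returns 0 if the reader could still use the object after the writer's free. */
int toy_demo(void)
{
	int before = toy_freed;
	int *ext = malloc(sizeof(*ext));

	if (!ext)
		return -1;
	*ext = 42;

	toy_read_lock();
	int *seen = ext;		/* reader picked up the pointer */
	toy_free_deferred(ext);		/* writer "frees" while the reader is active */
	int value = *seen;		/* still valid: the free was deferred */
	int freed_during = toy_freed - before;
	toy_read_unlock();		/* the object is released here */

	return (value == 42 && freed_during == 0 && toy_freed - before == 1) ? 0 : 1;
}
```

`toy_demo()` returns 0 when the object stayed intact for the whole read-side section and was released exactly once afterwards.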
