Bug 60853
Summary: | OOPS at find_appropriate_src+0xdb/0x1a0 [nf_nat] | | |
---|---|---|---|
Product: | Networking | Reporter: | leezhao (lizhao09) |
Component: | IPV4 | Assignee: | Stephen Hemminger (stephen) |
Status: | CLOSED CODE_FIX | | |
Severity: | normal | CC: | alan, kernel, lizhao09 |
Priority: | P1 | | |
Hardware: | x86-64 | | |
OS: | Linux | | |
Kernel Version: | 2.6.32.43-0.4-default | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | simulate the operation on nat_bysource list | | |
Description
leezhao
2013-09-05 03:02:05 UTC
This bug has been reported in netfilter bugzilla here: https://bugzilla.netfilter.org/show_bug.cgi?id=714

Could you please add yourself to cc list there and add any additional information which might be useful?

Please also include output of "iptables -t nat -nvL" so we can see what NAT rules are being applied.

(In reply to Phil from comment #2)
> Please also include output of "iptables -t nat -nvL" so we can see what NAT
> rules are being applied.

Sorry, the issue also occurred in the customer's environment; we cannot get the information now.

Created attachment 107451 [details]
simulate the operation on nat_bysource list

(In reply to Phil from comment #1)
> This bug has been reported in netfilter bugzilla here:
> https://bugzilla.netfilter.org/show_bug.cgi?id=714 Could you please add
> yourself to cc list there and add any additional information which might be
> useful?

We have written a kernel module to simulate the operation on the nat_bysource list; refer to the attachment for details. If we comment out the statement "synchronize_rcu()" in function thread_delete, an OOPS occurs with a NULL pointer exception.

Is it safe to call synchronize_rcu at the end of function nf_nat_cleanup_conntrack? Or can we replace rcu_read_lock with spin_lock_bh(&nf_nat_lock)?

(In reply to leezhao from comment #5)
> (In reply to Phil from comment #1)
> > This bug has been reported in netfilter bugzilla here:
> > https://bugzilla.netfilter.org/show_bug.cgi?id=714 Could you please add
> > yourself to cc list there and add any additional information which might be
> > useful?
>
> We have written a kernel module to simulate the operation on the nat_bysource
> list; refer to the attachment for details. If we comment out the statement
> "synchronize_rcu()" in function thread_delete, an OOPS occurs with a NULL
> pointer exception.
>
> Is it safe to call synchronize_rcu at the end of function
> nf_nat_cleanup_conntrack? Or can we replace rcu_read_lock with
> spin_lock_bh(&nf_nat_lock) in function find_appropriate_src?

(In reply to leezhao from comment #3)
> (In reply to Phil from comment #2)
> > Please also include output of "iptables -t nat -nvL" so we can see what NAT
> > rules are being applied.
>
> Sorry, the issue also occurred in the customer's environment; we cannot get
> the information now.

Just found a test environment, the output of "iptables -t nat -nvL" is:

JINLUB017_01:~ # iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 22M packets, 2590M bytes)
 pkts bytes target  prot opt in        out       source       destination
    0     0 DNAT    udp  --  pubeth9   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth9   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth4   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth4   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth3   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth3   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth2   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth2   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth10  *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth10  *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    udp  --  pubeth1   *         0.0.0.0/0    0.0.0.0/0     udp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  pubeth1   *         0.0.0.0/0    0.0.0.0/0     tcp dpt:4045 to:172.17.136.2:4045
    0     0 DNAT    tcp  --  *         *         0.0.0.0/0    172.18.53.1   tcp dpt:80 to:172.18.53.1:8080

Chain POSTROUTING (policy ACCEPT 88090 packets, 6081K bytes)
 pkts bytes target  prot opt in        out       source       destination
    0     0 SNAT    tcp  --  *         priveth0  0.0.0.0/0    172.17.136.2  tcp dpt:4045 to:172.17.136.153
    0     0 SNAT    tcp  --  *         *         172.18.53.1  0.0.0.0/0     tcp spt:8080 to:172.18.53.1:80

Chain OUTPUT (policy ACCEPT 88090 packets, 6081K bytes)
 pkts bytes target  prot opt in        out       source       destination

JINLUB017_01:~ # ifconfig
priveth0  Link encap:Ethernet HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.141 Bcast:172.18.53.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:2285804 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5161093 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:288897684 (275.5 Mb) TX bytes:5094949791 (4858.9 Mb)

priveth0: Link encap:Ethernet HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.1 Bcast:0.0.0.0 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

priveth0: Link encap:Ethernet HWaddr 28:6E:D4:43:52:7D
          inet addr:172.18.53.2 Bcast:0.0.0.0 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

(In reply to leezhao from comment #5)
> We have written a kernel module to simulate the operation on the nat_bysource
> list; refer to the attachment for details. If we comment out the statement
> "synchronize_rcu()" in function thread_delete, an OOPS occurs with a NULL
> pointer exception.

This is not surprising, given you have a memset after the synchronize_rcu. There does not appear to be a memset done in the netfilter code at delete time, so why are you zeroing out the structure in your test?

> Is it safe to call synchronize_rcu at the end of function
> nf_nat_cleanup_conntrack?

It is "safe" but I can't find why it should be required. Are you able to reproduce the issue at will? If so, does adding the call help?

> can we replace rcu_read_lock with spin_lock_bh(&nf_nat_lock)?

This would completely defeat the purpose of using RCU.

This issue is fixed in commit c13a84a8 (netfilter: nf_conntrack: use RCU safe kfree for conntrack extensions).
https://git.kernel.org/cgit/linux/kernel/git/pablo/nf.git/commit/?id=c13a84a830a208fb3443628773c8ca0557773cc7
Should appear in 3.12 and be backported to -stable kernels.
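
For readers following the RCU discussion above, here is a rough single-threaded userspace sketch of the hazard being debated: an entry is unlinked and its memory reused while a reader is still standing on it, versus reclamation deferred until the reader has left. This is a toy model only. It is not the code from commit c13a84a8, not the attached test module, and not how the kernel implements RCU. All names (struct node, struct toy_rcu, toy_read_lock, free_deferred, nodes_seen_by_reader) are hypothetical, and the memset merely stands in for memory being reused after a free.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an entry on an RCU-protected list (hypothetical type). */
struct node {
    int key;
    struct node *next;          /* link followed by readers */
    struct node *pending_next;  /* link used only by the deferred-free queue */
};

/* Toy model of RCU state; NOT the kernel implementation. */
struct toy_rcu {
    int readers;            /* number of active read-side sections */
    struct node *pending;   /* unlinked nodes waiting for readers to finish */
};

/* Model memory reuse by zeroing the node, as a later allocation might. */
static void reclaim(struct node *n)
{
    memset(n, 0, sizeof(*n));
}

static void toy_read_lock(struct toy_rcu *r)
{
    r->readers++;
}

/* When the last reader leaves, queued nodes are finally reclaimed. */
static void toy_read_unlock(struct toy_rcu *r)
{
    if (--r->readers > 0)
        return;
    while (r->pending) {
        struct node *n = r->pending;
        r->pending = n->pending_next;
        reclaim(n);
    }
}

/* Unlink victim but leave victim->next intact for readers already on it. */
static void unlink_node(struct node **head, struct node *victim)
{
    struct node **pp = head;

    while (*pp && *pp != victim)
        pp = &(*pp)->next;
    if (*pp)
        *pp = victim->next;
}

/* Deferred free: reclaim at once only if no reader can still hold the node. */
static void free_deferred(struct toy_rcu *r, struct node *n)
{
    if (r->readers == 0) {
        reclaim(n);
        return;
    }
    n->pending_next = r->pending;
    r->pending = n;
}

/*
 * Replay one fixed interleaving: the reader stops on the middle node, the
 * writer deletes that node, then the reader resumes. Returns how many nodes
 * the reader visited in total (3 means the traversal was not disturbed).
 */
int nodes_seen_by_reader(int deferred)
{
    struct node c = { 3, NULL, NULL };
    struct node b = { 2, &c, NULL };
    struct node a = { 1, &b, NULL };
    struct node *head = &a;
    struct toy_rcu rcu = { 0, NULL };
    struct node *pos;
    int seen = 0;

    toy_read_lock(&rcu);
    pos = head;        /* reader visits a */
    seen++;
    pos = pos->next;   /* reader is now on b */
    seen++;

    /* Writer runs "concurrently" while the reader still holds b. */
    unlink_node(&head, &b);
    if (deferred)
        free_deferred(&rcu, &b);
    else
        reclaim(&b);   /* immediate reuse: b.next is wiped */

    /* Reader resumes from b and follows whatever b.next now contains. */
    for (pos = pos->next; pos; pos = pos->next)
        seen++;
    toy_read_unlock(&rcu);

    return seen;
}
```

In this model, calling nodes_seen_by_reader(0) lets the writer wipe the node under the reader, so the reader loses the rest of the list; calling nodes_seen_by_reader(1) keeps the node intact until the reader has finished. In a real kernel the reused memory could hold arbitrary data rather than zeroes, which is where a crash in a list walker could come from.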