Bug 70471

Summary: xfrm policy node will double unlink.
Product: Networking
Reporter: Xianpeng Zhao (673321875)
Component: Other
Assignee: Stephen Hemminger (stephen)
Status: RESOLVED CODE_FIX
Severity: normal
CC: adobriyan, alan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36
Subsystem:
Regression: No
Bisected commit-id:
Attachments: My patch, but don't work.

Description Xianpeng Zhao 2014-02-13 05:20:38 UTC
Created attachment 125861
My patch, but don't work.

Hi community,

    I found a case that may cause an xfrm policy node to be unlinked from the list twice.

    I am not an XFRM expert; this is based only on reading the code.
    The following scenario may unlink a node twice:
In thread context, the node has been removed from the list, but the xfrm policy expire timer has not yet been deleted. At this point a timer interrupt arrives, and run_timer_softirq runs xfrm_policy_timer to remove the expired policy node. Because this policy node has already been removed from the list, this second removal unlinks the node twice.
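
To make the sequence concrete, here is a minimal user-space sketch of the double unlink. It is a simplified model, not kernel code: `struct hnode`, `hnode_del()`, `hnode_unhashed()` and `policy_unlink()` are hypothetical stand-ins for the kernel's hlist node, `hlist_del()`, `hlist_unhashed()` and `__xfrm_policy_unlink()`, and the poison constants are the values assumed for x86_64. The model reports the second unlink instead of dereferencing the poisoned pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space model of an hlist node; hypothetical, not the kernel header. */
struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

/* Poison values assumed for x86_64 (0xdead000000000000 plus an offset). */
#define POISON1 ((struct hnode *)(uintptr_t)0xdead000000100100ULL)
#define POISON2 ((struct hnode **)(uintptr_t)0xdead000000200200ULL)

/* Insert n at the front of the list whose head pointer is *head. */
static void hnode_add_head(struct hnode *n, struct hnode **head)
{
	n->next = *head;
	if (*head)
		(*head)->pprev = &n->next;
	*head = n;
	n->pprev = head;
}

/* Same test as the guard in the unlink path: unlinked only if pprev is NULL. */
static int hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

/* Unlink n, then poison both pointers (pprev ends up non-NULL). */
static void hnode_del(struct hnode *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = POISON1;
	n->pprev = POISON2;
}

/*
 * Model of the unlink path.
 * Returns 1 if the node was unlinked, 0 if the guard rejected it,
 * and -1 if the guard let an already-unlinked (poisoned) node through,
 * which is where real code would write through the poison pointer.
 */
static int policy_unlink(struct hnode *n)
{
	if (hnode_unhashed(n))
		return 0;
	if (n->pprev == POISON2)
		return -1;
	hnode_del(n);
	return 1;
}
```

Calling `policy_unlink()` once from the "thread" path and once more from the "timer" path on the same node shows that the guard does not catch the second call, because the poisoned `pprev` is not NULL.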

    My attempted fix is below, but I found that it does not fix the problem. Could anyone help me with this case?
    
my patch:
--- net/xfrm/xfrm_policy.c_old	2014-02-10 10:18:28.421504317 +0800
+++ net/xfrm/xfrm_policy.c	2014-02-10 10:19:01.661503334 +0800
@@ -330,7 +330,6 @@ static void xfrm_queue_purge(struct sk_b
 
 static void xfrm_policy_kill(struct xfrm_policy *policy)
 {
-	policy->walk.dead = 1;
 
 	atomic_inc(&policy->genid);
 
@@ -1156,6 +1155,7 @@ static struct xfrm_policy *__xfrm_policy
 	if (hlist_unhashed(&pol->bydst))
 		return NULL;
 
+	pol->walk.dead = 1;
 	hlist_del(&pol->bydst);
 	hlist_del(&pol->byidx);
 	list_del(&pol->walk.all);



LOG:
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.1/0000:02:00.0/net/eth1/queues/rx-0/rps_flow_cnt
CPU 2
Modules linked in: sch_tbf iptable_nat ipt_MASQUERADE ipt_REDIRECT ipt_NETMAP xt_limit xt_state xt_u32 ipt_ULOG xt_dscp xt_iprange nf_conntrack_netlink nfnetlink_log nfnetlink_queue nfnetlink nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ipt_REJECT xt_TCPMSS xt_CLASSIFY xt_DSCP ipt_LOG iptable_raw iptable_mangle iptable_filter ip_tables ah_fe_hook fe ath_rate_atheros ath_dfs ath_hal asf adf eth_drv ah_board0 kmpi ah_systop

Pid: 0, comm: kworker/0:1 Not tainted 2.6.36 #1 MAHOBAY/Aerohive HiveManager Appliance
RIP: 0010:[<ffffffff8338cacc>]  [<ffffffff8338cacc>] __xfrm_policy_unlink+0x1e/0x8f
RSP: 0018:ffff880001d03de0  EFLAGS: 00010286
RAX: dead000000100100 RBX: ffff880216eb1400 RCX: ffff880217520c28
RDX: dead000000200200 RSI: 0000000000000001 RDI: ffff880216eb1400
RBP: 0000000000000001 R08: ffff88021e4cd090 R09: 0000000000000010
R10: 0000000000000000 R11: ffff880001d03e50 R12: fffffffffffffffd
R13: 0000000000001fd1 R14: ffff88021e4d5fd8 R15: ffff88021e4d5fd8
FS:  0000000000000000(0000) GS:ffff880001d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006280b4 CR3: 000000021cef5000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 0, threadinfo ffff88021e4d4000, task ffff88021e4bb800)
Stack:
 0000000000000020 ffff880216eb1400 0000000000000001 ffffffff8338d872
<0> 0000000000001fd1 0000000000000001 ffff880216eb1400 ffffffff8338e368
<0> ffffffffffffff4e ffff88021e4cc000 0000000000000100 ffff880001d03e90
Call Trace:
 <IRQ>
 [<ffffffff8338d872>] ? xfrm_policy_delete+0x24/0x4b
 [<ffffffff8338e368>] ? xfrm_policy_timer+0x16b/0x1a4
 [<ffffffff8338e1fd>] ? xfrm_policy_timer+0x0/0x1a4
 [<ffffffff83044903>] ? run_timer_softirq+0x204/0x2e3
 [<ffffffff8303e362>] ? __do_softirq+0x107/0x219
 [<ffffffff830037cc>] ? call_softirq+0x1c/0x28
 [<ffffffff83004bc9>] ? do_softirq+0x31/0x63
 [<ffffffff830176f2>] ? smp_apic_timer_interrupt+0x87/0x94
 [<ffffffff83003293>] ? apic_timer_interrupt+0x13/0x20
 <EOI>
 [<ffffffff83008cb4>] ? mwait_idle+0xbd/0xc4
 [<ffffffff83001c1d>] ? cpu_idle+0x51/0xbc
Code: 5d 41 5c 41 5d 41 5e 44 89 f8 41 5f c3 55 89 f5 53 48 89 fb 48 83 ec 08 48 8b 57 08 48 85 d2 75 04 31 db eb 70 48 8b 07 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 8b 43 10 48 8b 53 18 49 b9 00 01
RIP  [<ffffffff8338cacc>] __xfrm_policy_unlink+0x1e/0x8f
 RSP <ffff880001d03de0>
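
One way to read the register dump above: RAX and RDX match the list poison values that an hlist unlink writes into a removed node on x86_64, and RDX is not a canonical address, which would explain a general protection fault rather than a page fault. The helper names below are hypothetical and the constants assume the x86_64 poison offset of 0xdead000000000000; this is a sketch for checking the values, not kernel code.

```c
#include <stdint.h>

/* Assumed x86_64 values: poison offset plus the generic list poison constants. */
#define POISON_DELTA 0xdead000000000000ULL
#define POISON1      (POISON_DELTA + 0x00100100ULL) /* written to ->next  */
#define POISON2      (POISON_DELTA + 0x00200200ULL) /* written to ->pprev */

/* Hypothetical helper: name the poison value a register holds, if any. */
static const char *classify_pointer(uint64_t v)
{
	if (v == POISON1)
		return "LIST_POISON1";
	if (v == POISON2)
		return "LIST_POISON2";
	return "other";
}

/* With 48-bit virtual addresses, bits 63..47 must be all zeros or all ones. */
static int is_canonical(uint64_t v)
{
	uint64_t top = v >> 47;

	return top == 0 || top == 0x1ffff;
}
```

Feeding in RAX (dead000000100100) and RDX (dead000000200200) from the log classifies them as the `next` and `pprev` poison values, while RBX (ffff880216eb1400), the policy pointer, is an ordinary canonical kernel address.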
Comment 1 Alan 2014-02-17 12:07:26 UTC
If you have not already done so, can you post this to netdev@vger.kernel.org for discussion?

Thanks
Comment 2 Xianpeng Zhao 2014-02-18 02:44:44 UTC
(In reply to Alan from comment #1)
> If you have not already done so, can you post this to netdev@vger.kernel.org
> for discussion?
> 
> Thanks

Could you confirm the mail address? My attempt to send to it failed.
Comment 3 Xianpeng Zhao 2014-02-18 04:51:02 UTC
OK now, it was sent successfully.
Comment 4 Alexey Dobriyan 2014-03-09 12:51:48 UTC
fixed in 3a9016f97fdc8bfbb26ff36ba8f3dc9162eb691b
xfrm: Fix unlink race when policies are deleted.
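
For readers following along, a common way to close this kind of race is to reinitialize the node when it is unlinked, as the kernel's `hlist_del_init()` does, so that a later `hlist_unhashed()` check sees it as unlinked. The sketch below is a user-space model with hypothetical names (`struct hnode`, `hnode_del_init()`, `policy_unlink()`); it illustrates the technique and is not copied from the commit.

```c
#include <stddef.h>

/* User-space model of an hlist node; hypothetical, not the kernel header. */
struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

/* Insert n at the front of the list whose head pointer is *head. */
static void hnode_add_head(struct hnode *n, struct hnode **head)
{
	n->next = *head;
	if (*head)
		(*head)->pprev = &n->next;
	*head = n;
	n->pprev = head;
}

/* A node counts as unlinked only if pprev is NULL. */
static int hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

/* Unlink n and reset it to the unlinked state instead of poisoning it. */
static void hnode_del_init(struct hnode *n)
{
	if (hnode_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Model of the unlink path: 1 if the node was unlinked, 0 if already gone. */
static int policy_unlink(struct hnode *n)
{
	if (hnode_unhashed(n))
		return 0;
	hnode_del_init(n);
	return 1;
}
```

With this variant, a second `policy_unlink()` call on the same node returns 0 and touches nothing, so it no longer matters whether the thread path or the timer path gets there first.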