Bug 11058 - DEADLOOP in kernel network module
Summary: DEADLOOP in kernel network module
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Netfilter/Iptables
Hardware: All Linux
Importance: P1 high
Assignee: networking_netfilter-iptables@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-08 20:13 UTC by hegang
Modified: 2008-07-10 23:33 UTC

See Also:
Kernel Version: 2.6.16.1
Subsystem:
Regression: ---
Bisected commit-id:


Description hegang 2008-07-08 20:13:20 UTC
Latest working kernel version:2.6.16.1
Earliest failing kernel version:2.6.16.1
Distribution:FC3
Hardware Environment:
Software Environment:

Problem Description:

We have seen a deadloop in softirq context and found the cause by capturing and analysing a core dump of the kernel. I don't know where to submit a patch, so I am reporting it here; I hope you can apply it to a future version of the Linux kernel.

The Cause:

Both ctnetlink_del_conntrack() and tcp_packet() run
the code below:

    if (del_timer(&ct->timeout))                 /* deactivate the timer */
        ct->timeout.function((unsigned long)ct); /* remove conntrack from the conntrack table */

The ctnetlink_del_conntrack() context is:
................
	if (cda[CTA_ID-1]) {
		u_int32_t id = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_ID-1]));
		if (ct->id != id) {
			ip_conntrack_put(ct);
			return -ENOENT;
		}
	}	
*	if (del_timer(&ct->timeout))
*		ct->timeout.function((unsigned long)ct);

	ip_conntrack_put(ct);
...........

The tcp_packet() context is:
...........
	case TCP_CONNTRACK_SYN_SENT:
		if (old_state < TCP_CONNTRACK_TIME_WAIT)
			break;
		if ((conntrack->proto.tcp.seen[dir].flags &
		         IP_CT_TCP_FLAG_CLOSE_INIT)
		    || after(ntohl(th->seq),
		    	     conntrack->proto.tcp.seen[dir].td_end)) {	
		    	/* Attempt to reopen a closed connection.
		    	* Delete this connection and look up again. */
		    	write_unlock_bh(&tcp_lock);
*		    	if (del_timer(&conntrack->timeout))
*		    		conntrack->timeout.function((unsigned long)
*		    					    conntrack);
		    	return -NF_REPEAT;
......

How does the DEADLOOP happen?

    (1) In ctnetlink_del_conntrack() (running in system call context), del_timer() is called and succeeds, so execution proceeds to timeout.function.
    (2) Before timeout.function finishes executing (meaning the conntrack has not yet been removed), an interrupt occurs and a SYN packet belonging to the same conntrack arrives. The CPU enters the IRQ handler and eventually runs tcp_packet().
    (3) In tcp_packet(), del_timer() fails because the timer was already deleted. The timeout.function call in tcp_packet() is skipped, -NF_REPEAT is returned, and the SYN packet is passed back through the hook again.
    (4) Neither side gets the chance to run timeout.function, so the conntrack stays in the table and the deadloop occurs: the SYN packet is passed back again and again.
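
The four steps above can be mimicked in a small user-space C sketch. Nothing here is kernel code: sim_del_timer(), sim_death_by_timeout(), sim_tcp_packet_old() and both structs are hypothetical stand-ins, and the repeat cap exists only so that the simulation terminates (the real loop has no such cap).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not kernel APIs. */
enum sim_verdict { SIM_ACCEPT, SIM_REPEAT };

struct sim_timer {
    bool pending;
};

struct sim_conntrack {
    struct sim_timer timeout;
    bool in_table;              /* still reachable by a table lookup */
};

/* Mimics del_timer(): returns 1 only if the timer was still pending. */
static int sim_del_timer(struct sim_timer *t)
{
    if (!t->pending)
        return 0;
    t->pending = false;
    return 1;
}

/* Mimics timeout.function: unlinks the entry from the table. */
static void sim_death_by_timeout(struct sim_conntrack *ct)
{
    ct->in_table = false;
}

/* Mimics the "reopen" branch of tcp_packet() before any fix:
 * it asks for a repeat whether or not it removed the entry. */
static enum sim_verdict sim_tcp_packet_old(struct sim_conntrack *ct)
{
    if (sim_del_timer(&ct->timeout))
        sim_death_by_timeout(ct);
    return SIM_REPEAT;
}

/* Returns how many times the SYN was repeated, capped at max_repeats. */
int sim_interrupted_delete(int max_repeats)
{
    struct sim_conntrack ct = { .timeout = { .pending = true },
                                .in_table = true };
    int repeats = 0;

    /* Step 1: process context wins del_timer() ... */
    int owner = sim_del_timer(&ct.timeout);

    /* Step 2: ... and is interrupted before timeout.function runs.
     * Steps 3-4: packet processing keeps finding the old entry. */
    while (ct.in_table && repeats < max_repeats) {
        if (sim_tcp_packet_old(&ct) == SIM_REPEAT)
            repeats++;
    }

    /* Only reached because of the cap: process context resumes. */
    if (owner)
        sim_death_by_timeout(&ct);
    return repeats;
}
```

Calling sim_interrupted_delete(1000) always exhausts the cap, which corresponds to the SYN being passed back again and again while the interrupted deleter never gets to finish.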

A possible fix is to disable softirqs while removing the conntrack:
+++    local_bh_disable();
       if (del_timer(&ct->timeout))                 /* deactivate the timer */
           ct->timeout.function((unsigned long)ct); /* remove conntrack from the conntrack table */
+++    local_bh_enable();
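
Why disabling softirqs around the removal would help can be sketched in user space as follows. This is only a model under stated assumptions: struct sim_state and the sim_* functions are hypothetical, the bh_disabled flag merely imitates the effect of local_bh_disable() (softirq work raised in the meantime is deferred until local_bh_enable()), and max_repeats is an artificial cap.

```c
#include <stdbool.h>

/* Hypothetical model state; none of this is a kernel structure. */
struct sim_state {
    bool timer_pending;
    bool in_table;
    bool bh_disabled;       /* imitates local_bh_disable() being active */
    bool softirq_raised;    /* a packet is waiting for softirq processing */
    int  repeats;
};

/* Packet processing with the old tcp_packet() behaviour: the packet is
 * repeated for as long as the old entry is still found in the table. */
static void sim_run_softirq(struct sim_state *s, int max_repeats)
{
    while (s->in_table && s->repeats < max_repeats) {
        if (s->timer_pending) {         /* del_timer() succeeded */
            s->timer_pending = false;
            s->in_table = false;        /* timeout.function */
        }
        s->repeats++;
    }
    s->softirq_raised = false;
}

/* A SYN arrives: run the softirq now unless bottom halves are disabled. */
static void sim_packet_interrupt(struct sim_state *s, int max_repeats)
{
    s->softirq_raised = true;
    if (!s->bh_disabled)
        sim_run_softirq(s, max_repeats);
}

/* Deletes the entry from "process context" while a SYN arrives midway.
 * Returns the number of repeats the packet went through. */
int sim_delete(bool use_bh_disable, int max_repeats)
{
    struct sim_state s = { .timer_pending = true, .in_table = true };

    if (use_bh_disable)
        s.bh_disabled = true;           /* local_bh_disable() */

    bool owner = s.timer_pending;       /* del_timer() */
    s.timer_pending = false;

    sim_packet_interrupt(&s, max_repeats);  /* SYN arrives mid-delete */

    if (owner)
        s.in_table = false;             /* timeout.function */

    if (use_bh_disable) {
        s.bh_disabled = false;          /* local_bh_enable() */
        if (s.softirq_raised)
            sim_run_softirq(&s, max_repeats);
    }
    return s.repeats;
}
```

In this model sim_delete(false, 100) hits the cap, while sim_delete(true, 100) returns 0 because the deferred packet only runs after the entry is gone.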

Thanks, I hope this is helpful.
My email: hemao77@gmail.com

It is hard to reproduce, but it really does happen on our Linux box.
Comment 1 Anonymous Emailer 2008-07-08 21:05:32 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue,  8 Jul 2008 20:13:20 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11058
> 
> [...]

Thanks.

Please submit patches via email as described in
Documentation/SubmittingPatches.  The file ./MAINTAINERS can be used to
determine which individuals and mailing lists the patch should be sent to.

But that's for next time - this patch is small enough for the netfilter
developers to be able to type in again ;)
Comment 2 Patrick McHardy 2008-07-09 05:45:55 UTC
Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Tue,  8 Jul 2008 20:13:20 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=11058
>>
>>            Summary: DEADLOOP in kernel network module
>> ...
> 
> Thanks.
> 
> Please submit patches via email as described in
> Documentation/SubmittingPatches.  The file ./MAINTAINERS can be used to
> determine which individuals and mailing lists the patch should be sent to.
> 
> But that's for next time - this patch is small enough for the netfilter
> developers to be able to type in again ;)

Good catch, thanks. Basically all del_timer()/timeout.function calls
in conntrack can happen in process context, so we'd have to disable
BHs every time we do this. I think this fix should also work. The
only spot where we return NF_REPEAT is in TCP conntrack, so we can
simply make sure we only do this if we actually managed to kill the
connection.

Jozsef, what do you think?

[patch for review against net-next, I'll backport it later unless
there are objections]

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index d5d76ec..15ce3fb 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -223,23 +223,23 @@ static inline void nf_ct_refresh(struct nf_conn *ct,
 	__nf_ct_refresh_acct(ct, 0, skb, extra_jiffies, 0);
 }
 
-extern void __nf_ct_kill_acct(struct nf_conn *ct,
-				enum ip_conntrack_info ctinfo,
-				const struct sk_buff *skb,
-				int do_acct);
+extern int __nf_ct_kill_acct(struct nf_conn *ct,
+			     enum ip_conntrack_info ctinfo,
+			     const struct sk_buff *skb,
+			     int do_acct);
 
 /* kill conntrack and do accounting */
-static inline void nf_ct_kill_acct(struct nf_conn *ct,
+static inline int nf_ct_kill_acct(struct nf_conn *ct,
 				enum ip_conntrack_info ctinfo,
 				const struct sk_buff *skb)
 {
-	__nf_ct_kill_acct(ct, ctinfo, skb, 1);
+	return __nf_ct_kill_acct(ct, ctinfo, skb, 1);
 }
 
 /* kill conntrack without accounting */
-static inline void nf_ct_kill(struct nf_conn *ct)
+static inline int nf_ct_kill(struct nf_conn *ct)
 {
-	__nf_ct_kill_acct(ct, 0, NULL, 0);
+	return __nf_ct_kill_acct(ct, 0, NULL, 0);
 }
 
 /* These are for NAT.  Icky. */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 212a088..aad9585 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -848,10 +848,10 @@ acct:
 }
 EXPORT_SYMBOL_GPL(__nf_ct_refresh_acct);
 
-void __nf_ct_kill_acct(struct nf_conn *ct,
-		enum ip_conntrack_info ctinfo,
-		const struct sk_buff *skb,
-		int do_acct)
+int __nf_ct_kill_acct(struct nf_conn *ct,
+		      enum ip_conntrack_info ctinfo,
+		      const struct sk_buff *skb,
+		      int do_acct)
 {
 #ifdef CONFIG_NF_CT_ACCT
 	if (do_acct) {
@@ -862,8 +862,11 @@ void __nf_ct_kill_acct(struct nf_conn *ct,
 		spin_unlock_bh(&nf_conntrack_lock);
 	}
 #endif
-	if (del_timer(&ct->timeout))
+	if (del_timer(&ct->timeout)) {
 		ct->timeout.function((unsigned long)ct);
+		return 1;
+	}
+	return 0;
 }
 EXPORT_SYMBOL_GPL(__nf_ct_kill_acct);
 
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 740acd6..41032d2 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -844,7 +844,11 @@ static int tcp_packet(struct nf_conn *ct,
 			/* Attempt to reopen a closed/aborted connection.
 			 * Delete this connection and look up again. */
 			write_unlock_bh(&tcp_lock);
-			nf_ct_kill(ct);
+			/* The connection might be killed in parallel in process
+			 * context. Don't repeat lookup so it gets a chance to
+			 * complete. */
+			if (!nf_ct_kill(ct))
+				return -NF_DROP;
 			return -NF_REPEAT;
 		}
 		/* Fall through */
Comment 3 Jozsef Kadlecsik 2008-07-09 06:27:12 UTC
On Wed, 9 Jul 2008, Patrick McHardy wrote:

> Andrew Morton wrote:
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > On Tue,  8 Jul 2008 20:13:20 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> > wrote:
> > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=11058
> > > 
> > >            Summary: DEADLOOP in kernel network module
[...]
> > > How the DEADLOOP happened?
> > > 
> > >     (1)in ctnetlink_del_conntrack()(runs in system call context): the
> > > del_timer
> > > is called and then goes to timeout.function.
> > >     (2)before timeout.function finish excution(means the conntrack not
> > > removed),an interrupt happens and a SYN packet of  the same conntrack
> > > comes.CPU goes to irq handle and enventually runs tcp_packet().
> > >     (3)in tcp_packet() ,del_timer() will fail because the timer was
> > > already deleted. the timeout.function in tcp_packet will not run,
> > > -NF_REPEAT is returned, the SYN packet will be passed back again.
> > >     (4)Neither side has the chance to run timeout.function,the
> > > conntrack remains there,deadloop happen,the SYN packet will be passed back
> > > again and again.
> > > 
> > > The fix maybe,add lock the softirq when doing conntrack removing:
> > > +++    local_bh_disable();
> > >       if (del_timer(&ct->timeout))                /*deactive the timer*/
> > >          ct->timeout.function((unsigned long) ct);/*remove conntrack from
> > > conntrack table*/
> > > +++    local_bh_enable();
> > > 
> > > Thanks, may this be helpful.
> > > My email: hemao77@gmail.com
> > > 
> > > It is hard to reproduce , but it really happen on our linux box.
> > > 
> > 
> > Thanks.
> > 
> > Please submit patches via email as described in
> > Documentation/SubmittingPatches.  The file ./MAINTAINERS can be used to
> > determine which individuals and mailing lists the patch should be sent to.
> > 
> > But that's for next time - this patch is small enough for the netfilter
> > developers to be able to type in again ;)
> 
> Good catch, thanks. Basically all del_timer()/timeout.function calls
> in conntrack can happen in process context, so we'd have to disable
> BHs every time we do this. I think this fix should also work. The
> only spot where we return NF_REPEAT is in TCP conntrack, so we can
> simply make sure we only do this if we actually managed to kill the
> connection.
> 
> Jozsef, what do you think?

I agree with you completely - and nice catch, indeed! Your proposed patch 
looks just fine.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
Comment 4 Patrick McHardy 2008-07-09 08:01:12 UTC
Jozsef Kadlecsik wrote:
> On Wed, 9 Jul 2008, Patrick McHardy wrote:
> 
>> Good catch, thanks. Basically all del_timer()/timeout.function calls
>> in conntrack can happen in process context, so we'd have to disable
>> BHs every time we do this. I think this fix should also work. The
>> only spot where we return NF_REPEAT is in TCP conntrack, so we can
>> simply make sure we only do this if we actually managed to kill the
>> connection.
>>
>> Jozsef, what do you think?
> 
> I agree with you completely - and nice catch, indeed! Your proposed patch 
> looks just fine.

Thanks, I'll send a backport for 2.6.26 to Dave tonight.
Comment 5 Patrick McHardy 2008-07-09 09:33:34 UTC
Patrick McHardy wrote:
> Jozsef Kadlecsik wrote:
>> On Wed, 9 Jul 2008, Patrick McHardy wrote:
>>
>>> Good catch, thanks. Basically all del_timer()/timeout.function calls
>>> in conntrack can happen in process context, so we'd have to disable
>>> BHs every time we do this. I think this fix should also work. The
>>> only spot where we return NF_REPEAT is in TCP conntrack, so we can
>>> simply make sure we only do this if we actually managed to kill the
>>> connection.
>>>
>>> Jozsef, what do you think?
>>
>> I agree with you completely - and nice catch, indeed! Your proposed 
>> patch looks just fine.
> 
> Thanks, I'll send a backport for 2.6.26 to Dave tonight.

OK, this is the patch I'll send upstream.

commit baa04a1fb3dbef550ed1dc5acd15e21e7dde3b85
Author: Patrick McHardy <kaber@trash.net>
Date:   Wed Jul 9 18:32:29 2008 +0200

    netfilter: nf_conntrack_tcp: fix endless loop
    
    When a conntrack entry is destroyed in process context and destruction
    is interrupted by packet processing and the packet is an attempt to
    reopen a closed connection, TCP conntrack tries to kill the old entry
    itself and returns NF_REPEAT to pass the packet through the hook
    again. This may lead to an endless loop: TCP conntrack repeatedly
    finds the old entry, but can not kill it itself since destruction
    is already in progress, but destruction in process context can not
    complete since TCP conntrack is keeping the CPU busy.
    
    Drop the packet in TCP conntrack if we can't kill the connection
    ourselves to avoid this.
    
    Reported by: hemao77@gmail.com [ Kernel bugzilla #11058 ]
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 271cd01..dd28fb2 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -844,9 +844,15 @@ static int tcp_packet(struct nf_conn *ct,
 			/* Attempt to reopen a closed/aborted connection.
 			 * Delete this connection and look up again. */
 			write_unlock_bh(&tcp_lock);
-			if (del_timer(&ct->timeout))
+			/* Only repeat if we can actually remove the timer.
+			 * Destruction may already be in progress in process
+			 * context and we must give it a chance to terminate.
+			 */
+			if (del_timer(&ct->timeout)) {
 				ct->timeout.function((unsigned long)ct);
-			return -NF_REPEAT;
+				return -NF_REPEAT;
+			}
+			return -NF_DROP;
 		}
 		/* Fall through */
 	case TCP_CONNTRACK_IGNORE:
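
The decision made by the patch above can be restated as a tiny user-space sketch. The names struct sim_ct, sim_tcp_reopen_fixed() and the SIM_* verdicts are hypothetical and only mirror the shape of the change: repeat the lookup only when this context actually removed the entry, otherwise drop the packet so that the interrupted deleter can finish.

```c
#include <stdbool.h>

/* Hypothetical verdicts standing in for -NF_DROP and -NF_REPEAT. */
enum sim_verdict { SIM_DROP, SIM_REPEAT };

/* Hypothetical, simplified view of a conntrack entry. */
struct sim_ct {
    bool timer_pending;     /* false once somebody else won del_timer() */
    bool in_table;
};

/* Mirrors the patched "reopen" branch of tcp_packet(). */
enum sim_verdict sim_tcp_reopen_fixed(struct sim_ct *ct)
{
    if (ct->timer_pending) {            /* del_timer() succeeded */
        ct->timer_pending = false;
        ct->in_table = false;           /* timeout.function */
        return SIM_REPEAT;              /* entry is gone, look up again */
    }
    /* Destruction is already in progress elsewhere: do not spin. */
    return SIM_DROP;
}
```

An entry with a pending timer is removed and yields SIM_REPEAT; an entry whose timer was already deleted is left alone and yields SIM_DROP, which is what breaks the loop.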
Comment 6 Adrian Bunk 2008-07-10 23:33:16 UTC
Fixed by commit 6b69fe0c73c0f5a8dacf8f889db3cc9adee53649
