Latest working kernel version: 2.6.16.1
Earliest failing kernel version: 2.6.16.1
Distribution: FC3
Hardware Environment:
Software Environment:

Problem Description:

We have seen an endless loop ("deadloop") in softirq context and found the
cause by creating and analysing a core dump of the kernel. I don't know where
to submit a patch, so I am reporting it here and hope a fix can be applied to
a future version of the Linux kernel.

The Cause:

Both ctnetlink_del_conntrack() and tcp_packet() run the code below:

        if (del_timer(&ct->timeout))                      /* deactivate the timer */
                ct->timeout.function((unsigned long) ct); /* remove conntrack from conntrack table */

The ctnetlink_del_conntrack() context is:

        ................
        if (cda[CTA_ID-1]) {
                u_int32_t id = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_ID-1]));
                if (ct->id != id) {
                        ip_conntrack_put(ct);
                        return -ENOENT;
                }
        }
*       if (del_timer(&ct->timeout))
*               ct->timeout.function((unsigned long)ct);

        ip_conntrack_put(ct);
        ...........

The tcp_packet() context is:

        ...........
        case TCP_CONNTRACK_SYN_SENT:
                if (old_state < TCP_CONNTRACK_TIME_WAIT)
                        break;
                if ((conntrack->proto.tcp.seen[dir].flags & IP_CT_TCP_FLAG_CLOSE_INIT)
                    || after(ntohl(th->seq),
                             conntrack->proto.tcp.seen[dir].td_end)) {
                        /* Attempt to reopen a closed connection.
                         * Delete this connection and look up again. */
                        write_unlock_bh(&tcp_lock);
*                       if (del_timer(&conntrack->timeout))
*                               conntrack->timeout.function((unsigned long)
*                                                           conntrack);
                        return -NF_REPEAT;
        ......

How does the endless loop happen?

(1) In ctnetlink_del_conntrack() (runs in system call context), del_timer()
    is called and succeeds, and execution proceeds to timeout.function.
(2) Before timeout.function finishes executing (meaning the conntrack has not
    been removed yet), an interrupt happens and a SYN packet for the same
    conntrack arrives. The CPU goes to the IRQ handler and eventually runs
    tcp_packet().
(3) In tcp_packet(), del_timer() fails because the timer was already deleted.
    The timeout.function in tcp_packet() does not run, -NF_REPEAT is
    returned, and the SYN packet is passed back through the hook again.
(4) Neither side gets the chance to run timeout.function, so the conntrack
    remains in place and the endless loop occurs: the SYN packet is passed
    back again and again.

A possible fix is to disable softirqs while removing the conntrack:

+++     local_bh_disable();
        if (del_timer(&ct->timeout))                      /* deactivate the timer */
                ct->timeout.function((unsigned long) ct); /* remove conntrack from conntrack table */
+++     local_bh_enable();

Thanks, I hope this is helpful.
My email: hemao77@gmail.com

It is hard to reproduce, but it really does happen on our Linux box.
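Steps (1)-(4) above can be replayed in a small userspace model. This is only a sketch under stated assumptions: struct model_conn, model_del_timer(), model_timeout_fn() and the hook loop are hypothetical stand-ins, not kernel code, and the real hook has no pass limit (the cap exists only so the model terminates).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
enum verdict { VERDICT_DROP, VERDICT_ACCEPT, VERDICT_REPEAT };

struct model_conn {
	bool timer_pending;	/* models a pending ct->timeout */
	bool in_table;		/* entry is still found by lookups */
};

/* Like del_timer(): returns 1 only for the caller that deactivates the timer. */
static int model_del_timer(struct model_conn *ct)
{
	if (!ct->timer_pending)
		return 0;
	ct->timer_pending = false;
	return 1;
}

/* Models ct->timeout.function(): removes the entry from the table. */
static void model_timeout_fn(struct model_conn *ct)
{
	ct->in_table = false;
}

/* Reopen path as quoted in the report: always asks for a repeat. */
static enum verdict model_tcp_packet_old(struct model_conn *ct)
{
	if (model_del_timer(ct))
		model_timeout_fn(ct);
	return VERDICT_REPEAT;
}

/* Passes one SYN through the hook; returns the number of passes used. */
static int model_hook_old(struct model_conn *ct, int max_passes)
{
	int passes = 0;

	while (passes < max_passes) {
		passes++;
		if (!ct->in_table)
			break;	/* lookup misses: a fresh entry would be created */
		if (model_tcp_packet_old(ct) != VERDICT_REPEAT)
			break;
	}
	return passes;
}

/* Process context wins del_timer() and is interrupted before the callback. */
static int model_race_old(int max_passes)
{
	struct model_conn ct = { .timer_pending = true, .in_table = true };
	int won = model_del_timer(&ct);		/* step (1) */
	int passes;

	assert(won == 1);
	(void)won;
	passes = model_hook_old(&ct, max_passes);	/* steps (2)-(4) */
	model_timeout_fn(&ct);	/* reachable only because the model caps the loop */
	return passes;
}

/* Same packet without a concurrent delete: the entry is killed, then missed. */
static int model_no_race_old(int max_passes)
{
	struct model_conn ct = { .timer_pending = true, .in_table = true };

	return model_hook_old(&ct, max_passes);
}
```

In this model model_no_race_old() finishes in two passes, while model_race_old() uses every pass it is given, which corresponds to the loop never ending in the report.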
Reply-To: akpm@linux-foundation.org

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 8 Jul 2008 20:13:20 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11058
>
>            Summary: DEADLOOP in kernel network module
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.16.1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Netfilter/Iptables
>         AssignedTo: networking_netfilter-iptables@kernel-bugs.osdl.org
>         ReportedBy: hemao77@gmail.com
>
> [...]

Thanks.

Please submit patches via email as described in
Documentation/SubmittingPatches.  The file ./MAINTAINERS can be used to
determine which individuals and mailing lists the patch should be sent to.

But that's for next time - this patch is small enough for the netfilter
developers to be able to type in again ;)
Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Tue, 8 Jul 2008 20:13:20 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=11058
>>
>>            Summary: DEADLOOP in kernel network module
>> [...]
>
> Thanks.
>
> [...]

Good catch, thanks. Basically all del_timer()/timeout.function calls
in conntrack can happen in process context, so we'd have to disable
BHs every time we do this. I think this fix should also work. The
only spot where we return NF_REPEAT is in TCP conntrack, so we can
simply make sure we only do this if we actually managed to kill the
connection.

Jozsef, what do you think?
[patch for review against net-next, I'll backport it later unless there are objections]

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index d5d76ec..15ce3fb 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -223,23 +223,23 @@ static inline void nf_ct_refresh(struct nf_conn *ct,
 	__nf_ct_refresh_acct(ct, 0, skb, extra_jiffies, 0);
 }
 
-extern void __nf_ct_kill_acct(struct nf_conn *ct,
-			      enum ip_conntrack_info ctinfo,
-			      const struct sk_buff *skb,
-			      int do_acct);
+extern int __nf_ct_kill_acct(struct nf_conn *ct,
+			     enum ip_conntrack_info ctinfo,
+			     const struct sk_buff *skb,
+			     int do_acct);
 
 /* kill conntrack and do accounting */
-static inline void nf_ct_kill_acct(struct nf_conn *ct,
+static inline int nf_ct_kill_acct(struct nf_conn *ct,
 				   enum ip_conntrack_info ctinfo,
 				   const struct sk_buff *skb)
 {
-	__nf_ct_kill_acct(ct, ctinfo, skb, 1);
+	return __nf_ct_kill_acct(ct, ctinfo, skb, 1);
 }
 
 /* kill conntrack without accounting */
-static inline void nf_ct_kill(struct nf_conn *ct)
+static inline int nf_ct_kill(struct nf_conn *ct)
 {
-	__nf_ct_kill_acct(ct, 0, NULL, 0);
+	return __nf_ct_kill_acct(ct, 0, NULL, 0);
 }
 
 /* These are for NAT.  Icky. */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 212a088..aad9585 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -848,10 +848,10 @@ acct:
 }
 EXPORT_SYMBOL_GPL(__nf_ct_refresh_acct);
 
-void __nf_ct_kill_acct(struct nf_conn *ct,
-		       enum ip_conntrack_info ctinfo,
-		       const struct sk_buff *skb,
-		       int do_acct)
+int __nf_ct_kill_acct(struct nf_conn *ct,
+		      enum ip_conntrack_info ctinfo,
+		      const struct sk_buff *skb,
+		      int do_acct)
 {
 #ifdef CONFIG_NF_CT_ACCT
 	if (do_acct) {
@@ -862,8 +862,11 @@ void __nf_ct_kill_acct(struct nf_conn *ct,
 		spin_unlock_bh(&nf_conntrack_lock);
 	}
 #endif
-	if (del_timer(&ct->timeout))
+	if (del_timer(&ct->timeout)) {
 		ct->timeout.function((unsigned long)ct);
+		return 1;
+	}
+	return 0;
 }
 EXPORT_SYMBOL_GPL(__nf_ct_kill_acct);
 
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 740acd6..41032d2 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -844,7 +844,11 @@ static int tcp_packet(struct nf_conn *ct,
 			/* Attempt to reopen a closed/aborted connection.
 			 * Delete this connection and look up again. */
 			write_unlock_bh(&tcp_lock);
-			nf_ct_kill(ct);
+			/* The connection might be killed in parallel in process
+			 * context. Don't repeat lookup so it gets a chance to
+			 * complete. */
+			if (!nf_ct_kill(ct))
+				return -NF_DROP;
 			return -NF_REPEAT;
 		}
 		/* Fall through */
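The effect of making the kill helper report whether it actually removed the entry can be sketched in userspace as well. The names below (model_kill(), model_tcp_packet_fixed(), model_hook_fixed(), VERDICT_NEW) are assumptions made for illustration and only mirror the shape of the patch above; they are not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
enum verdict { VERDICT_DROP, VERDICT_REPEAT, VERDICT_NEW };

struct model_conn {
	bool timer_pending;	/* models a pending ct->timeout */
	bool in_table;		/* entry is still found by lookups */
};

/* Shape of the patched kill helper: 1 if this caller removed the entry. */
static int model_kill(struct model_conn *ct)
{
	if (!ct->timer_pending)
		return 0;	/* someone else already owns the destruction */
	ct->timer_pending = false;
	ct->in_table = false;
	return 1;
}

/* Patched reopen path: repeat only after a successful kill. */
static enum verdict model_tcp_packet_fixed(struct model_conn *ct)
{
	if (!model_kill(ct))
		return VERDICT_DROP;
	return VERDICT_REPEAT;
}

/* Passes one SYN through the hook; stores the number of passes used. */
static enum verdict model_hook_fixed(struct model_conn *ct, int max_passes,
				     int *passes)
{
	enum verdict v = VERDICT_REPEAT;

	*passes = 0;
	while (v == VERDICT_REPEAT && *passes < max_passes) {
		(*passes)++;
		/* A lookup miss stands for "create a new entry". */
		v = ct->in_table ? model_tcp_packet_fixed(ct) : VERDICT_NEW;
	}
	return v;
}
```

With an entry whose timer was already deleted by process context (timer_pending false, in_table true), the model drops the packet after a single pass, so the interrupted deletion can resume. A retransmitted SYN would then miss the old entry and be tracked as a new connection.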
On Wed, 9 Jul 2008, Patrick McHardy wrote:

> Andrew Morton wrote:
> > On Tue, 8 Jul 2008 20:13:20 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> > wrote:
> >
> > > http://bugzilla.kernel.org/show_bug.cgi?id=11058
> > >
> > > Summary: DEADLOOP in kernel network module
> [...]
> Good catch, thanks. Basically all del_timer()/timeout.function calls
> in conntrack can happen in process context, so we'd have to disable
> BHs every time we do this. I think this fix should also work. The
> only spot where we return NF_REPEAT is in TCP conntrack, so we can
> simply make sure we only do this if we actually managed to kill the
> connection.
>
> Jozsef, what do you think?

I agree with you completely - and nice catch, indeed! Your proposed patch
looks just fine.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
Jozsef Kadlecsik wrote:
> On Wed, 9 Jul 2008, Patrick McHardy wrote:
>
>> Good catch, thanks. Basically all del_timer()/timeout.function calls
>> in conntrack can happen in process context, so we'd have to disable
>> BHs every time we do this. I think this fix should also work. The
>> only spot where we return NF_REPEAT is in TCP conntrack, so we can
>> simply make sure we only do this if we actually managed to kill the
>> connection.
>>
>> Jozsef, what do you think?
>
> I agree with you completely - and nice catch, indeed! Your proposed patch
> looks just fine.

Thanks, I'll send a backport for 2.6.26 to Dave tonight.
Patrick McHardy wrote:
> Jozsef Kadlecsik wrote:
>> [...]
>> I agree with you completely - and nice catch, indeed! Your proposed
>> patch looks just fine.
>
> Thanks, I'll send a backport for 2.6.26 to Dave tonight.

OK, this is the patch I'll send upstream.

commit baa04a1fb3dbef550ed1dc5acd15e21e7dde3b85
Author: Patrick McHardy <kaber@trash.net>
Date:   Wed Jul 9 18:32:29 2008 +0200

    netfilter: nf_conntrack_tcp: fix endless loop

    When a conntrack entry is destroyed in process context and destruction
    is interrupted by packet processing and the packet is an attempt to
    reopen a closed connection, TCP conntrack tries to kill the old entry
    itself and returns NF_REPEAT to pass the packet through the hook
    again. This may lead to an endless loop: TCP conntrack repeatedly
    finds the old entry, but can not kill it itself since destruction
    is already in progress, but destruction in process context can not
    complete since TCP conntrack is keeping the CPU busy.

    Drop the packet in TCP conntrack if we can't kill the connection
    ourselves to avoid this.

    Reported by: hemao77@gmail.com [ Kernel bugzilla #11058 ]
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 271cd01..dd28fb2 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -844,9 +844,15 @@ static int tcp_packet(struct nf_conn *ct,
 			/* Attempt to reopen a closed/aborted connection.
 			 * Delete this connection and look up again. */
 			write_unlock_bh(&tcp_lock);
-			if (del_timer(&ct->timeout))
+			/* Only repeat if we can actually remove the timer.
+			 * Destruction may already be in progress in process
+			 * context and we must give it a chance to terminate.
+			 */
+			if (del_timer(&ct->timeout)) {
 				ct->timeout.function((unsigned long)ct);
-			return -NF_REPEAT;
+				return -NF_REPEAT;
+			}
+			return -NF_DROP;
 		}
 		/* Fall through */
 	case TCP_CONNTRACK_IGNORE:
Fixed by commit 6b69fe0c73c0f5a8dacf8f889db3cc9adee53649.