unregister_netdevice hangs after sending SCTP traffic in a net namespace

Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 2f9165cb5217 - nvme-pci: add quirks for Lexar 256GB SSD

Reproducer:

ip netns add N
ip -n N link set lo up
ip -n N link add type veth
ip -n N addr add 2001:db8:ffff:21::1/64 dev veth0
ip -n N addr add 2001:db8:ffff:21::2/64 dev veth1
ip -n N link set veth0 up
ip -n N link set veth1 up
sleep 1
ip netns exec N sctp_test -H 2001:db8:ffff:21::2 -P 9999 -l &
sleep 1
ip netns exec N timeout 5 sctp_test -H 2001:db8:ffff:21::1 -P 6013 -h 2001:db8:ffff:21::2 -p 9999 -s -c 1 -x 1 -X 1
echo $?
ip netns del N

Then wait a few seconds and check dmesg:

[  422.889040] unregister_netdevice: waiting for veth0 to become free. Usage count = 1
[  433.039034] unregister_netdevice: waiting for veth0 to become free. Usage count = 1
[  443.158895] unregister_netdevice: waiting for veth0 to become free. Usage count = 1
...
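To check for the symptom without reading dmesg by hand, here is a minimal sketch. The helper name count_unregister_waits is hypothetical (it is not part of any existing tool); it simply counts lines on stdin that match the message shown above.

```shell
#!/bin/sh
# Hypothetical helper: count "unregister_netdevice: waiting for ..." lines
# in log text read from stdin. A non-zero count suggests the leak reproduced.
count_unregister_waits() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the function's status at 0.
    grep -c 'unregister_netdevice: waiting for .* to become free' || true
}
```

After running the reproducer, something like "dmesg | count_unregister_waits" should print 0 on a kernel without the problem, assuming the kernel log is readable by the current user.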
This is actually an SCTP bug: the pernet ctrlsock can hold the dst_entry and does not release it until sctp_ctrlsock_ops->exit() is called. That is too late, because default_device_ops->exit() runs before it and hangs there waiting for the reference to be released.

The fix is to keep the ctrlsock from holding the dst_entry in sctp_packet_transmit():

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 6614c9fdc51e..a6aa17df09ef 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -584,13 +584,6 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 		goto out;
 	}
 
-	rcu_read_lock();
-	if (__sk_dst_get(sk) != tp->dst) {
-		dst_hold(tp->dst);
-		sk_setup_caps(sk, tp->dst);
-	}
-	rcu_read_unlock();
-
 	/* pack up chunks */
 	pkt_count = sctp_packet_pack(packet, head, gso, gfp);
 	if (!pkt_count) {
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 3fd06a27105d..5cb1aa5f067b 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1135,6 +1135,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 
 static void sctp_outq_flush_transports(struct sctp_flush_ctx *ctx)
 {
+	struct sock *sk = ctx->asoc->base.sk;
 	struct list_head *ltransport;
 	struct sctp_packet *packet;
 	struct sctp_transport *t;
@@ -1144,6 +1145,12 @@ static void sctp_outq_flush_transports(struct sctp_flush_ctx *ctx)
 		t = list_entry(ltransport, struct sctp_transport, send_ready);
 		packet = &t->packet;
 		if (!sctp_packet_empty(packet)) {
+			rcu_read_lock();
+			if (t->dst && __sk_dst_get(sk) != t->dst) {
+				dst_hold(t->dst);
+				sk_setup_caps(sk, t->dst);
+			}
+			rcu_read_unlock();
 			error = sctp_packet_transmit(packet, ctx->gfp);
 			if (error < 0)
 				ctx->q->asoc->base.sk->sk_err = -error;

Moving sk_setup_caps() out of sctp_packet_transmit() also saves some rounds of that check when packets are being sent on only one transport at a time.

I will post it upstream soon. Thanks.
posted: https://lore.kernel.org/netdev/9db6df3e544dd6ec6e4ec5091b0a750ac08d6e1b.1616125961.git.lucien.xin@gmail.com/
Glad to see it resolved so quickly