Bug 88111 - Race condition in net_tx_action?
Summary: Race condition in net_tx_action?
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-12 18:52 UTC by Angelo Rizzi
Modified: 2014-11-24 11:37 UTC (History)
1 user (show)

See Also:
Kernel Version: all
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Angelo Rizzi 2014-11-12 18:52:01 UTC
Hi all,

I have a question about a strange situation i've faced on my linux-based embedded system:

Using 2 network device (transmitting asynchronously), i found a kind of "leak" in sk_buff alloc/free that drives my test program, after some days of continuous transmission, to be unable to write on the xmitting socket ("poll()" function using POLLOUT request always returning 0).

After a lot of test, i've found the reason for such behaviour in the net_tx_action() function (net/core/dev.c):

Let me explain what i've found:

The following code is used in order to get the current list of sk_buff to free:

static void net_tx_action(struct softirq_action *h)
{
        struct softnet_data *sd = &__get_cpu_var(softnet_data);
 
        if (sd->completion_queue) {
                 struct sk_buff *clist;
 
                 local_irq_disable();
                 clist = sd->completion_queue;
                 sd->completion_queue = NULL;
                 local_irq_enable();
 
Transmitting asynchronously on all the network devices available i've noticed the following behaviour:
a) The instruction "if (sd->completion_queue) {" saves on a CPU register the pointer value (register contents is used for the comparison)
b) The interupt is disabled (using "local_irq_disable")
c) when the content of "clist" is updated, the register is used, instead of re-read the "completion_queue" variable.

So, when a low-level tx interrupt arrives after the latching of "completion_queue", but before "local_irq_disable", the value stored in "clist" reflect the situation before low-level tx interrupt, resulting in a sk_buff leak

I've changed the declaration of "sd" as follows:

        volatile struct softnet_data *sd = &__get_cpu_var(softnet_data);

and everything is now ok.

Is that correct?

Thanks,
Angelo
Comment 1 Daniel Borkmann 2014-11-24 11:37:10 UTC
Hi Angelo, could you send a patch along with your report for discussion to netdev@vger.kernel.org ? See Documentation/SubmittingPatches for some information on how to format your patch. Thanks!

Note You need to log in before you can comment on or make changes to this bug.