Bug 74851

Summary: Kernel 3.14.x causes NAT clients to have very slow uploads (< 5kb/s) or timeouts
Product: Networking
Reporter: Conrad Kostecki (ck+kernelbugzilla)
Component: IPV4
Assignee: Stephen Hemminger (stephen)
Status: RESOLVED CODE_FIX
Severity: normal
CC: alan, andjjh, bugzilla.kernel.org, bvanassche
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.14.1
Subsystem:
Regression: No
Bisected commit-id:

Description Conrad Kostecki 2014-04-25 18:28:55 UTC
Hi!
I recently upgraded the kernel on my Gentoo router box from 3.13.x to 3.14.1. The kernel boots fine, but clients behind NAT can no longer upload at full speed. An upload starts fast (~400kb/s) for about 0.5 s and then immediately drops to about 2-5kb/s. Sometimes there are also timeouts after a few more seconds. Only uploads seem to be affected; downloads always run at full speed. Uploading directly from the router box itself always works fine with the new kernel!

I've tried gentoo-sources and vanilla-sources; both gave me the same result. I don't know how to debug this problem. Which information do you need?

Conrad

-----

Gentoo Linux
Kernel 3.15.6 / 3.14.1
iptables v1.4.21
4x Intel 82574L (Soekris net6501-70 platform)
Comment 1 Bart Van Assche 2014-04-27 17:00:53 UTC
Can you provide more information about the network path between the system running kernel 3.14.1 and the remote system? I only see these slow uploads if there is a VPN connection between the local and the remote system. With kernel 3.13.11 uploads happen at full speed, but with both kernels 3.14.1 and 3.14.2 uploads over VPN are abnormally slow. With kernel 3.14.2 and without VPN between the local and the remote system:
$ for ((k=10;k<=20;k++)); do i=$((1<<k)); echo ==== $i; time dd if=/dev/zero bs=1 count=$i 2>/dev/null | ssh -q ...@... dd of=/dev/null bs=1; done |& grep -E '^=|kB/s$'
==== 1024
1024 bytes (1.0 kB) copied, 0.161837 s, 6.3 kB/s
==== 2048
2048 bytes (2.0 kB) copied, 0.164881 s, 12.4 kB/s
==== 4096
4096 bytes (4.1 kB) copied, 0.162671 s, 25.2 kB/s
==== 8192
8192 bytes (8.2 kB) copied, 0.169641 s, 48.3 kB/s
==== 16384
16384 bytes (16 kB) copied, 0.341838 s, 47.9 kB/s
==== 32768
32768 bytes (33 kB) copied, 0.360242 s, 91.0 kB/s
==== 65536
65536 bytes (66 kB) copied, 0.52907 s, 124 kB/s
==== 131072
131072 bytes (131 kB) copied, 1.07712 s, 122 kB/s
==== 262144
262144 bytes (262 kB) copied, 1.33459 s, 196 kB/s
==== 524288
524288 bytes (524 kB) copied, 2.06827 s, 253 kB/s
==== 1048576
1048576 bytes (1.0 MB) copied, 3.5538 s, 295 kB/s

With kernel 3.14.2 and with VPN between the local and the remote system:
$ sudo sh -c "echo 0 > /proc/sys/net/ipv4/tcp_autocorking"
$ for ((k=10;k<=20;k++)); do i=$((1<<k)); echo ==== $i; time dd if=/dev/zero bs=1 count=$i 2>/dev/null | ssh -q ...@... dd of=/dev/null bs=1; done |& grep -E '^=|kB/s$'
==== 1024
1024 bytes (1.0 kB) copied, 0.139332 s, 7.3 kB/s
==== 2048
2048 bytes (2.0 kB) copied, 0.666552 s, 3.1 kB/s
==== 4096
4096 bytes (4.1 kB) copied, 0.818207 s, 5.0 kB/s
==== 8192
8192 bytes (8.2 kB) copied, 1.73257 s, 4.7 kB/s
==== 16384
16384 bytes (16 kB) copied, 13.3277 s, 1.2 kB/s
==== 32768
[ hangs ]
$ sudo sh -c "echo 1 > /proc/sys/net/ipv4/tcp_autocorking"
$ for ((k=10;k<=20;k++)); do i=$((1<<k)); echo ==== $i; time dd if=/dev/zero bs=1 count=$i 2>/dev/null | ssh -q ...@... dd of=/dev/null bs=1; done |& grep -E '^=|kB/s$'
==== 1024
1024 bytes (1.0 kB) copied, 0.132996 s, 7.7 kB/s
==== 2048
2048 bytes (2.0 kB) copied, 0.66048 s, 3.1 kB/s
==== 4096
4096 bytes (4.1 kB) copied, 0.809722 s, 5.1 kB/s
==== 8192
8192 bytes (8.2 kB) copied, 1.72476 s, 4.7 kB/s
==== 16384
16384 bytes (16 kB) copied, 13.0109 s, 1.3 kB/s
==== 32768
[ hangs ]

NAT was disabled during these tests:
asus:~ # iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
asus:~ # iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
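
The kB/s figures in the dd output above are bytes divided by elapsed seconds, with 1 kB = 1000 bytes. To compare runs across kernels, the rate can be recomputed from saved dd summary lines. The following is a minimal sketch; the function name dd_rate_kbs is hypothetical, and it assumes the GNU dd summary format shown in this comment.

```shell
# Hypothetical helper: read dd summary lines on stdin and print the
# throughput in kB/s (1 kB = 1000 bytes) for each "... copied, ..." line.
dd_rate_kbs() {
    # Field 1 is the byte count; the elapsed seconds sit four fields
    # from the end in lines such as:
    #   16384 bytes (16 kB) copied, 13.3277 s, 1.2 kB/s
    LC_ALL=C awk '/copied/ { printf "%.1f\n", $1 / $(NF-3) / 1000 }'
}

# Example: recompute the rate for one line taken from the VPN run above.
printf '%s\n' '16384 bytes (16 kB) copied, 13.3277 s, 1.2 kB/s' | dd_rate_kbs
```

Piping the saved output of each test loop through this helper gives one number per transfer size, which makes the 3.13 versus 3.14 difference easy to tabulate.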
Comment 2 Conrad Kostecki 2014-04-27 17:14:46 UTC
Hi!

(In reply to Bart Van Assche from comment #1)
> Can you provide more information about the network path between the system
> running kernel 3.14.1 and the remote system ? I only see these slow uploads
> if there is a VPN connection between the local and the remote system. With
> kernel 3.13.11 uploads happens at full speed but with both kernels 3.14.1
> and 3.14.2 uploads over VPN are abnormally slow. With kernel 3.14.2 and
> without VPN between the local and the remote system:

I am not using any VPN. I am just on my own LAN, connected through a switch directly to my router. Uploads were tested against my own root server at Hetzner.

Conrad
Comment 3 Bart Van Assche 2014-04-27 20:01:17 UTC
Can you try to cherry-pick commit 1e785f48d29a09b6cf96db7b49b6320dada332e1
("net: Start with correct mac_len in skb_network_protocol" - see also http://www.spinics.net/lists/netdev/msg279712.html) ?
Comment 4 Conrad Kostecki 2014-04-28 18:16:47 UTC
That's it! After applying commit 1e785f48d29a09b6cf96db7b49b6320dada332e1 to my kernel 3.14.1, all clients can upload fast again. Problem is gone :)

Will this be included in a future 3.14.x release?
Comment 5 Bart Van Assche 2014-04-29 13:39:02 UTC
In another message in the same thread I noticed that this patch has already been queued for a future 3.14.x kernel (see also http://www.spinics.net/lists/netdev/msg279751.html).
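
Whether a particular stable release already carries the fix can be checked in a local clone of the stable tree. The sketch below assumes that stable backports quote the upstream commit ID in their commit message; the function name find_backport and the revision range in the example are hypothetical and need to be adapted to the tree at hand.

```shell
# Hypothetical helper: list commits in a revision range whose commit
# message mentions the given upstream commit ID.
find_backport() {
    sha="$1"
    range="$2"
    git log --oneline --grep="$sha" "$range"
}

# Example (run inside a clone of the stable tree; the range is an assumption):
# find_backport 1e785f48d29a09b6cf96db7b49b6320dada332e1 v3.14.1..origin/linux-3.14.y
```

An empty result means that no commit in the range mentions the upstream ID, in which case the patch still has to be cherry-picked by hand as in comment 3.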
Comment 6 Bart Van Assche 2014-05-11 09:05:49 UTC
Duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=73891.
Comment 7 James Anderson 2014-05-21 13:00:46 UTC
I had the same problem on kernel 3.14.3 and applied this commit, which appears to fix it. However, after some time, some clients behind the IP forwarding host lose upload connectivity (throughput stalls at 0KB/s). Other clients seem unaffected. Rebooting the client machines does not help.

The following sequence, executed on the host that is handling the ip forwarding, immediately fixes the problem: 

root:~# echo 0 > /proc/sys/net/ipv4/ip_forward
root:~# echo 1 > /proc/sys/net/ipv4/ip_forward

It seems as if this commit only partially fixes the problem or that there is a different and unrelated problem with the ip forwarding code. At the moment, I'm unable to find a way of consistently reproducing the issue: client lockups appear random.
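
The off/on sequence above can be wrapped in a small function so that it is easy to rerun whenever a client stalls. This is a minimal sketch of the workaround only, not a fix; the function name toggle_forwarding is hypothetical, and the optional path argument exists so the helper can be tried against an ordinary file first.

```shell
# Hypothetical helper: switch a 0/1 sysctl file off and on again, then
# print its final value. Defaults to the IPv4 forwarding switch.
toggle_forwarding() {
    ctl="${1:-/proc/sys/net/ipv4/ip_forward}"
    echo 0 > "$ctl" || return 1
    echo 1 > "$ctl" || return 1
    cat "$ctl"
}

# Usage (as root on the forwarding host):
# toggle_forwarding
```

Noting the time of each invocation alongside the affected client addresses may help narrow down what the stalled clients have in common.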