Yes, this seems strange, but it appears to be true.

My network scheme is quite simple:

(host1) <--- 10gbe ---> (bridge host) <--- 10gbe ---> (host2)

host1 & host2 are actually VMware ESXi hypervisors, but I think that's irrelevant in this case.

The network adapters are Intel 82599 10-gigabit cards on all hosts.

On the bridge host, I've created a VLAN on each interface and then bridged them:

# vconfig add eth0 102
# vconfig add eth1 102
# brctl addbr br0
# brctl addif br0 eth0.102
# brctl addif br0 eth1.102
# ip link set br0 mtu 9000 up
...etc...

At this point the bridge seems to be working: I can ping between host1 & host2, even with jumbo frames and without fragmentation.

BUT when I try to use iperf & friends to measure raw TCP speed between hosts 1/2, I get something weird like 7-10 MEGABITS per second, or even an iperf hang until Ctrl+C.

If I attach an IP address to the bridge and measure between the hosts and the bridge, it works flawlessly, delivering 9.8 Gbit/s in both directions.

While trying to find a solution, when I ran out of options, I set net.ipv4.ip_forward to 1, and, SURPRISE, the bridge started to work like a charm, at almost 10-gig speed.

What makes it stranger is that in my kernel I've turned off all routing code, iptables and other stuff, as this host serves primarily as an iSCSI target.

I have little knowledge of the kernel's deep internals, but I always thought that bridging & routing are on different levels of operation and couldn't affect each other (ebtables is an exception, but I don't have it :) ).

Maybe I'm interpreting the results wrong, but I've ruled out everything else.

Currently I can't use this setup as a test ground; I'll try to replicate the scheme in a virtual environment to see whether other kernels are affected as well.

Glad to hear any ideas on this.
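P.S. For completeness, the workaround boils down to this single sysctl, with everything else left exactly as in the commands above:

# sysctl -w net.ipv4.ip_forward=1

and as a sanity check that both VLAN ports are really attached to the bridge (the output should list eth0.102 and eth1.102 under br0):

# brctl show br0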
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Fri, 3 Jun 2011 19:21:20 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=36602
>
>            Summary: Bridge fails to work normally without
>                     net.ipv4.ip_forward=1
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.38.7
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: igor@novg.net
>         Regression: No
>
> [...]
On Fri, 2011-06-03 at 12:36 -0700, Andrew Morton wrote:
[...]
> > At the bridge, on each interface i've created a vlan, and then bridged them:
> > # vconfig add eth0 102
> > # vconfig add eth1 102
> > # brctl addbr br0
> > # brctl addif br0 eth0.102
> > # brctl addif br0 eth1.102
> > # ip link set br0 mtu 9000 up
> > ...etc...
> >
> > At this point, the bridge seems to be working, i can ping between host1 &
> > host2, even with jumbo frames without fragmentation.
> >
> > BUT when i am trying to use iperf & friends to measure raw tcp speed
> > between hosts 1/2, i'm getting something weird like 7-10 MEGABITS per
> > second, or even an iperf hang until ctrl+c.

This sounds like a symptom of doing LRO on a bridged device.  Normally we
turn off LRO for bridge members automatically, but we haven't been doing
that when the bridge members are VLAN devices.

> > If i attach an ip address to the bridge, and measure between hosts and
> > the bridge, it works flawlessly, rendering 9.8Gbit/s in both directions.
> >
> > While trying to find a solution, when i ran out of options, i've set
> > net.ipv4.ip_forward to 1, and, SURPRISE, the bridge started to work like
> > a charm, at almost 10gig speed.
[...]

Right, that should force LRO off for all devices with IPv4 set up.

This should be fixed by:

commit f11970e383acd6f505f492f1bc07fb1a4d884829
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Tue May 24 08:31:09 2011 +0000

    net: make dev_disable_lro use physical device if passed a vlan dev (v2)

which is in 3.0-rc1.

Ben.
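P.S. If you want to confirm it's LRO without waiting for 3.0, something like the following should work as a manual workaround, assuming the driver exposes the LRO flag through ethtool (interface names taken from your report; the option label in the -k output can vary with ethtool version). Check whether LRO is still on for the physical devices:

# ethtool -k eth0 | grep -i large-receive
# ethtool -k eth1 | grep -i large-receive

and if it is, turn it off on both before re-running iperf:

# ethtool -K eth0 lro off
# ethtool -K eth1 lro off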
Hmm... By LRO, do you mean the in-kernel software LRO (CONFIG_INET_LRO) or the NIC driver's LRO?

Because I've read about all the bad things that can happen when LRO and bridging/routing are used together, I built the ixgbe driver without LRO at all:

make CFLAGS_EXTRA="-DIXGBE_NO_LRO -DIXGBE_NO_LLI" KSP="/usr/src/linux" install

So I thought I would be safe from that problem...
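In case it helps, this is roughly how I'm checking which LRO I actually have (kernel source path as in the make command above; ethtool -i just confirms which ixgbe build is loaded):

# grep CONFIG_INET_LRO /usr/src/linux/.config
# ethtool -i eth0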