Bug 25062 - Bonding packet deduplication doesn't work properly anymore
Status: RESOLVED OBSOLETE
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 high
Assigned To: Arnaldo Carvalho de Melo
Depends on:
Blocks:
Reported: 2010-12-17 11:45 UTC by Kevin Lapagna
Modified: 2012-08-14 14:18 UTC (History)

See Also:
Kernel Version: > 2.6.33
Tree: Mainline
Regression: No


Description Kevin Lapagna 2010-12-17 11:45:10 UTC
Here's the setup:

switch: ordinary Cisco switch
eth0: NIC with kernel module tg3
eth1: NIC with kernel module e1000e
bond0: bond with slaves eth0,eth1 in mode 1 (or 5)
bond0.100: vlan device created with vconfig
bridge100: bridge created with brctl
tap1: tap device created with tunctl
vguest: qemu-kvm vguest with emulated e1000 NIC

 
|¯¯¯¯¯¯¯¯|-- eth0 \                                               |¯¯¯¯¯¯¯¯|
| switch |          -- bond0 -- bond0.100 -- bridge100 -- tap1 -- | vguest |
|________|-- eth1 /                                               |________|

When the vguest emits an Ethernet broadcast (a DHCP request), it is forwarded all the way up to the switch through eth0. The switch floods the broadcast to its other ports, including the one connected to eth1. The packet then travels all the way back to bridge100. As a result, the last thing bridge100 has learned about the vguest's MAC address is that it sits behind bond0.100 (instead of tap1). When a DHCP server responds to the request, the reply reaches bridge100, whose MAC address table is now wrong, so the packet is dropped and never reaches tap1, and therefore never reaches the vguest.
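
To make the sequence above easier to follow, here is a minimal Python sketch of a learning bridge with the two ports from this setup. It is a toy model and not kernel code: the class and function names, the two MAC addresses and the drop_on_inactive_slave flag are all made up for illustration. The flag stands in for the bond discarding the copy of the broadcast that comes back in on the inactive slave, which is the behavior I would expect from the setup.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningBridge:
    """Toy learning bridge: a MAC -> port table and nothing else."""
    ports: tuple
    fdb: dict = field(default_factory=dict)

    def receive(self, src, dst, in_port):
        """Learn src on in_port and return the ports the frame is sent to."""
        self.fdb[src] = in_port  # the most recent sighting wins
        if dst == BROADCAST or dst not in self.fdb:
            # Flood to every port except the one the frame arrived on.
            return [p for p in self.ports if p != in_port]
        out_port = self.fdb[dst]
        # A bridge never sends a frame back out of its ingress port.
        return [] if out_port == in_port else [out_port]


def dhcp_exchange(drop_on_inactive_slave):
    """Return the ports that receive the DHCP reply addressed to the guest."""
    bridge = LearningBridge(ports=("tap1", "bond0.100"))
    guest = "52:54:00:00:00:01"   # hypothetical vguest MAC
    server = "00:16:3e:00:00:02"  # hypothetical DHCP server MAC

    # 1. The guest broadcasts its request; the bridge learns guest -> tap1.
    bridge.receive(guest, BROADCAST, "tap1")

    # 2. The switch sends the broadcast back via eth1. If the bond does not
    #    drop it, the bridge sees the guest's MAC arrive on bond0.100.
    if not drop_on_inactive_slave:
        bridge.receive(guest, BROADCAST, "bond0.100")

    # 3. The server's unicast reply arrives through bond0.100.
    return bridge.receive(server, guest, "bond0.100")


if __name__ == "__main__":
    print("duplicate dropped:", dhcp_exchange(True))
    print("duplicate passed: ", dhcp_exchange(False))
```

In this model the reply is delivered to tap1 only when the returning copy of the broadcast is dropped before it reaches the bridge. When it is let through, the bridge believes the guest is behind bond0.100, the same port the reply arrived on, and discards the reply.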

I have seen this wrong behavior with kernels 2.6.37-rc5 (Debian package), 2.6.36.2 and 2.6.35.9 (self-compiled, vanilla). The setup worked with kernels <= 2.6.33.7. I have not tried 2.6.34.

I assume the setup above is a common way to separate virtual guests at the network level, so this could become a major issue for many people when they upgrade their kernels.
Comment 1 Andrew Morton 2011-01-04 21:40:37 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Fri, 17 Dec 2010 11:45:18 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=25062
> 
>            Summary: Bonding packet deduplication doesn't work properly
>                     anymore
>            Product: Networking
>            Version: 2.5
>     Kernel Version: > 2.6.33
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: kevin.lapagna@bigtag.ch
>         Regression: No
> 
> 
> Here's the setup:
> 
> switch: ordinary cisco switch
> eth0: NIC with kernel module tg3
> eth1: NIC with kernel module e1000e
> bond0: bond with slaves eth0,eth1 in mode 1 (or 5)
> bond0.100: vlan device created with vconfig
> bridge100: bridge created with brctl
> tap1: tap device created with tunctl
> vguest: qemu-kvm vguest with emulated e1000 NIC
> 
> 
> |¯¯¯¯¯¯¯¯|-- eth0 \                                               |¯¯¯¯¯¯¯¯|
> | switch |          -- bond0 -- bond0.100 -- bridge100 -- tap1 -- | vguest |
> |________|-- eth1 /                                               |________|
> 
> When the vguest emits an ethernet broadcast (DHCP-request), it's forwarded all
> the way up to the switch, through eth0. The switch forwards the broadcast -
> also to eth1. The packet travels then all the way back to bridge100. So the
> last status known for bridge100, regarding the MAC address of the vguest is,
> that it is behind bond0.100 (instead of tap1). If a DHCP-server responds to the
> request, the packet travels to bridge100, which has now a faulty
> MAC-address-table and the packet will be rejected and never reaches tap1 and
> therefore not the vguest.
> 
> I witnessed this wrong behavior in kernel 2.6.37-rc5 (debian package), 2.6.36.2
> and 2.6.35.9 (self compiled -  vanilla). The setup has worked with kernels <=
> 2.6.33.7. I've never tried 2.6.34.
> 
> I assume the setup above is a common way for the separation of virtual guests
> on a network level. So this could become a major issue for a lot of people when
> upgrading their kernels.
>
Comment 2 Jay Vosburgh 2011-01-07 02:47:42 UTC
Andrew Morton <akpm@linux-foundation.org> wrote:

>On Fri, 17 Dec 2010 11:45:18 GMT
>bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=25062
>> 
>>            Summary: Bonding packet deduplication doesn't work properly
>>                     anymore
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: > 2.6.33
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: high
>>           Priority: P1
>>          Component: Other
>>         AssignedTo: acme@ghostprotocols.net
>>         ReportedBy: kevin.lapagna@bigtag.ch
>>         Regression: No
>> 
>> 
>> Here's the setup:
>> 
>> switch: ordinary cisco switch
>> eth0: NIC with kernel module tg3
>> eth1: NIC with kernel module e1000e
>> bond0: bond with slaves eth0,eth1 in mode 1 (or 5)
>> bond0.100: vlan device created with vconfig
>> bridge100: bridge created with brctl
>> tap1: tap device created with tunctl
>> vguest: qemu-kvm vguest with emulated e1000 NIC
>> 
>> 
>> |¯¯¯¯¯¯¯¯|-- eth0 \                                               |¯¯¯¯¯¯¯¯|
>> | switch |          -- bond0 -- bond0.100 -- bridge100 -- tap1 -- | vguest |
>> |________|-- eth1 /                                               |________|
>> 
>> When the vguest emits an ethernet broadcast (DHCP-request), it's forwarded all
>> the way up to the switch, through eth0. The switch forwards the broadcast -
>> also to eth1. The packet travels then all the way back to bridge100. So the
>> last status known for bridge100, regarding the MAC address of the vguest is,
>> that it is behind bond0.100 (instead of tap1). If a DHCP-server responds to the
>> request, the packet travels to bridge100, which has now a faulty
>> MAC-address-table and the packet will be rejected and never reaches tap1 and
>> therefore not the vguest.
>> 
>> I witnessed this wrong behavior in kernel 2.6.37-rc5 (debian package), 2.6.36.2
>> and 2.6.35.9 (self compiled -  vanilla). The setup has worked with kernels <=
>> 2.6.33.7. I've never tried 2.6.34.
>> 
>> I assume the setup above is a common way for the separation of virtual guests
>> on a network level. So this could become a major issue for a lot of people when
>> upgrading their kernels.

	Just a note that I have reproduced what I believe is the same
problem (I didn't use tap, and assigned an IP to the bridge).  I used
arping to generate ethernet broadcasts.  I see the problem on 2.6.36.2,
but not on today's net-next-2.6.

	I'll see if I can dig up the root cause tomorrow.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
