Bug 31722 - bridge not routing packets via source bridgeport
Summary: bridge not routing packets via source bridgeport
Status: RESOLVED OBSOLETE
Alias: None
Product: Networking
Classification: Unclassified
Component: Netfilter/Iptables
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: networking_netfilter-iptables@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-23 10:04 UTC by Sebastian J. Bronner
Modified: 2012-08-20 15:14 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.35 and up
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
log of packet trace to 1.1.1.2 (3.17 KB, text/plain)
2011-03-23 10:06 UTC, Sebastian J. Bronner
log of packet trace to 1.1.1.3 (2.23 KB, text/plain)
2011-03-23 10:07 UTC, Sebastian J. Bronner
iptables nat configuration (645 bytes, text/plain)
2011-03-23 10:07 UTC, Sebastian J. Bronner

Description Sebastian J. Bronner 2011-03-23 10:04:48 UTC
We recently upgraded from 2.6.32.25 to 2.6.35.24 and discovered that our
virtual machines can no longer access their own external IP addresses.
Testing revealed that 2.6.34 was the last version not to have the
problem. 2.6.36 still had it. But on to the details.

Our setup:

We use KVM to virtualise our guests. The physical machines (nodes) act
as One-to-One NAT routers to the virtual machines. The virtual machines
are connected via virtio interfaces in a bridge.

Since the virtual machines only know about their RFC-1918 addresses, any
request they make to their NATed global addresses requires a trip
through the node's netfilter to perform the needed SNAT and DNAT operations.

Take the following setup:

   {internet}
       |
     (eth0)       <- 1.1.1.254, proxy_arp=1
       |
     [node]       <- ip_forward=1, routes*, nat**
       |
    (virbr1)      <- 10.0.0.1
    /      \
(vnet0)     |
   |     (vnet1)
(veth0)     |     <- 10.0.0.2
   |     (veth0)  <- 10.0.0.3
 [vm1]      |
          [vm2]

* The static routes on the node for the vms mentioned above are as follows:
# ip r
1.1.1.2 dev virbr1 scope link
1.1.1.3 dev virbr1 scope link

** The NAT rules are set up as follows (in reality, they're a bit more
complicated - but this suffices to illustrate the problem at hand):
# iptables-save -t nat
-A PREROUTING -d 1.1.1.2 -j DNAT --to-destination 10.0.0.2
-A PREROUTING -d 1.1.1.3 -j DNAT --to-destination 10.0.0.3
-A POSTROUTING -s 10.0.0.2 -j SNAT --to-source 1.1.1.2
-A POSTROUTING -s 10.0.0.3 -j SNAT --to-source 1.1.1.3

This means that 1.1.1.2 maps to 10.0.0.2 (vm1) and
                1.1.1.3 maps to 10.0.0.3 (vm2).
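For reference, a minimal sketch of how a setup like the one above could be configured from scratch (this is my reconstruction for illustration, assuming virbr1 already exists with the VMs attached; our real configuration is more involved):

```shell
# Forwarding and proxy ARP on the node, as in the diagram above
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.conf.eth0.proxy_arp=1

# Host routes for the VMs' external addresses, pointing at the bridge
ip route add 1.1.1.2 dev virbr1 scope link
ip route add 1.1.1.3 dev virbr1 scope link

# One-to-one NAT between the global and RFC-1918 addresses
iptables -t nat -A PREROUTING  -d 1.1.1.2 -j DNAT --to-destination 10.0.0.2
iptables -t nat -A PREROUTING  -d 1.1.1.3 -j DNAT --to-destination 10.0.0.3
iptables -t nat -A POSTROUTING -s 10.0.0.2 -j SNAT --to-source 1.1.1.2
iptables -t nat -A POSTROUTING -s 10.0.0.3 -j SNAT --to-source 1.1.1.3
```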

Assuming ssh is running on both vms, running 'nc -v 1.1.1.3 22' from vm1
gets me ssh's introductory message.

Assuming no service is listening on port 23, running 'nc -v 1.1.1.3 23'
from vm1 gets me 'Connection refused'.

That's all fine and exactly as it should be. The vms are accessible from
the internet as well, and can access the internet.

If, however, I run 'nc -v 1.1.1.2 22' from vm1 (or any other port, for
that matter), I get a timeout!

Running tcpdump on all the involved interfaces showed me that the
packets successfully traverse veth0 and vnet0 and appear to get lost
upon reaching virbr1.

So, then I decided to set up a packet trace with iptables:
[on the node]
# modprobe ipt_LOG
# iptables -t raw -A PREROUTING -p tcp --dport 4577 -j TRACE
# tail -f /var/log/messages | grep TRACE
[on vm1]
# nc -v 1.1.1.2 4577

The results were very interesting, if somewhat dumbfounding. They are
attached for easier perusal. The gist of it is that the packet in
question disappears without a trace after going through the DNAT rule in
the PREROUTING chain of the NAT table. This can be seen happening three
times in vm1-to-1.1.1.2.txt, at three- and six-second intervals (retries).

For comparison, I have also included a trace of a successful packet
traversal that ends in a 'Connection refused'. It is in vm1-to-1.1.1.3.txt.

As a last note, I should add that the problem isn't related to the IP
address. I eliminated that by putting two RFC-1918 IPs on vm1 and
mapping two IPs to it, then running nc on one IP, while the other one
was being used as the source IP.
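That elimination test looked roughly like the following (the .4 addresses here are illustrative placeholders, not the ones actually used):

```shell
# On vm1: add a second RFC-1918 address to the same interface
ip addr add 10.0.0.4/24 dev veth0

# On the node: map a second external IP to that address
ip route add 1.1.1.4 dev virbr1 scope link
iptables -t nat -A PREROUTING  -d 1.1.1.4 -j DNAT --to-destination 10.0.0.4
iptables -t nat -A POSTROUTING -s 10.0.0.4 -j SNAT --to-source 1.1.1.4

# On vm1: connect to one external IP while sourcing from the other;
# this still times out, so the failure is not tied to the IP address
nc -v -s 10.0.0.2 1.1.1.4 22
```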

The problem appears to be that packets can't be routed out the same
bridge port that they arrived on.
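If that hypothesis is right, one knob that might be relevant (I have not verified this against our setup; it's a guess based on the bridge port "hairpin mode" attribute that appeared around 2.6.33, which permits forwarding a frame back out its ingress port) would be:

```shell
# Check whether hairpin mode is enabled on the VM's bridge port
# (standard sysfs bridge layout; vnet0 as in the diagram above)
cat /sys/class/net/virbr1/brif/vnet0/hairpin_mode

# Enable it for that port
echo 1 > /sys/class/net/virbr1/brif/vnet0/hairpin_mode
```

Note that even with hairpin mode on, the trace shows the packet vanishing after the DNAT rule, so this may only be part of the picture.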

I hope this all makes sense and that you can reproduce the problem. One
virtual machine will suffice to see the problem at work.

Feel free to contact me if you need more information or have suggestions
for me.

Cheers,
Sebastian Bronner

P.S.: The IP addresses are faked. I used vim to replace all instances of
the real IPs with the fake ones used in this bug report consistently.
Comment 1 Sebastian J. Bronner 2011-03-23 10:06:03 UTC
Created attachment 51712 [details]
log of packet trace to 1.1.1.2
Comment 2 Sebastian J. Bronner 2011-03-23 10:07:18 UTC
Created attachment 51722 [details]
log of packet trace to 1.1.1.3
Comment 3 Sebastian J. Bronner 2011-03-23 10:07:58 UTC
Created attachment 51732 [details]
iptables nat configuration
Comment 4 Sebastian J. Bronner 2011-03-23 10:10:31 UTC
This problem report was originally sent to netdev@vger.kernel.org. The discussion was minimal, but can be seen at

http://www.spinics.net/lists/netdev/msg151468.html
