Bug 16216

Summary: wrong source addr of UDP packets when using policy routing
Product: Networking            Reporter: borg
Component: IPV4                Assignee: Stephen Hemminger (stephen)
Status: RESOLVED OBSOLETE
Severity: normal               CC: akpm, alan, ambrop7
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24.7       Subsystem:
Regression: No                 Bisected commit-id:

Description borg 2010-06-15 15:14:39 UTC
When policy routing is used, UDP packets have the wrong source address.
The source address is probably taken from a lookup of the routing table (main)
for the given destination, instead of being set just after POSTROUTING by
looking up the cache.

This is what it looks like in a simple netcat test
(tcpdump is run on aa.aa.47.90):
16:38:02.053053 IP aa.aa.47.67.32826 > aa.aa.47.90.660: UDP, length 8
16:38:05.660394 IP bb.bbb.241.62.660 > aa.aa.47.67.32826: UDP, length 8

aa.aa.47.90 has a specific setup with 3 routing tables: main, 10 and 20,
and all of them have a default gateway. bb.bbb.241.62 is the address of the
outgoing interface of the default route in the main table.
If a packet comes from a specific interface, its source address is stored
in an ipset, and when a packet is about to leave the box it is marked in
mangle OUTPUT by matching that ipset:

### mangle PREROUTING ###
fw="iptables -t mangle -A PREROUTING"
$fw -i vlan0.13 -j SET --add-set gw10 src
$fw -i lan2 -j SET --add-set gw20 src

### mangle OUTPUT ###
fw="iptables -t mangle -A OUTPUT"
$fw -m set --set gw10 dst -j MARK --set-mark 10
$fw -m set --set gw10 dst -j ACCEPT
$fw -m set --set gw20 dst -j MARK --set-mark 20
$fw -m set --set gw20 dst -j ACCEPT

% ip rule show
32764:  from all fwmark 0x14 lookup 20
32765:  from all fwmark 0xa lookup 10

The problem was noticed with UDP packets (OpenVPN connections do not work).
Other connectionless protocols might be affected too.
TCP (a connection-oriented protocol) works just fine.
Comment 1 Andrew Morton 2010-06-15 19:02:56 UTC
2.6.24 is an awfully old kernel.  Are you able to determine whether the problem is present in more recent code?

Thanks.
Comment 2 borg 2010-06-16 10:57:09 UTC
Hello.

Confirmed on Linux 2.6.34

[borg@vmware] cat /tmp/tcpdump.log
12:57:54.072055 IP 10.10.0.20.1111 > 10.0.0.1.5000: UDP, length 8
12:57:56.332161 IP 169.254.0.4.5000 > 10.10.0.20.1111: UDP, length 8
[borg@vmware] ip rule show
0:      from all lookup local
32764:  from all fwmark 0x2 lookup 2
32765:  from all fwmark 0x1 lookup 1
32766:  from all lookup main
32767:  from all lookup default
[borg@vmware] ip route show table main
10.0.0.0/24 dev eth1  proto kernel  scope link  src 10.0.0.1
10.0.1.0/24 dev eth2  proto kernel  scope link  src 10.0.1.1
169.254.0.0/24 dev eth0  proto kernel  scope link  src 169.254.0.4
default via 169.254.0.1 dev eth0
[borg@vmware] ip route show table 1
default via 10.0.0.2 dev eth1
[root@vmware] ipset -L
Name: t1
Type: iphash
References: 3
Header: hashsize: 1024 probes: 8 resize: 50
Members:
10.10.0.20

The test was done using netcat:
vmware: nc -u -l -p 5000
client: nc -u 10.0.0.1 5000
The second line of tcpdump.log is the answer from vmware to the client.

Regards,
Borg
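A minimal reproduction sketch for the 2.6.34 test in comment 2, reconstructed from the `ip rule`, `ip route` and `ipset -L` output above. The mangle rules for this test were not posted, so the two `iptables` lines are an assumption modelled on the original report (clients seen on `eth1` are added to set `t1`, and local packets sent to them get mark 1). Only the fwmark 1 / table 1 path is covered, because the contents of table 2 were not shown. The function name `setup_repro` and the `RUN` variable are hypothetical, and the commands use current ipset syntax (`create ... hash:ip`, `--match-set`) rather than the older `iphash`/`--set` forms seen in the thread.

```shell
#!/bin/sh
# Hypothetical helper: recreate the mark-based policy routing from comment 2.
# Addresses and interfaces come from the posted `ip` output; the ipset and
# iptables rules are an ASSUMPTION (they were not posted for this test).
# Set RUN=echo for a dry run that only prints the commands.
setup_repro() {
    run=${RUN:-}
    # Routing table 1: default route via the gateway on eth1.
    $run ip route add default via 10.0.0.2 dev eth1 table 1
    # Packets carrying fwmark 1 consult table 1.
    $run ip rule add fwmark 1 lookup 1
    # Set of client addresses seen arriving on eth1.
    $run ipset create t1 hash:ip
    $run iptables -t mangle -A PREROUTING -i eth1 -j SET --add-set t1 src
    # Mark locally generated packets addressed to those clients.
    $run iptables -t mangle -A OUTPUT -m set --match-set t1 dst -j MARK --set-mark 1
}
```

Running `RUN=echo setup_repro` only prints the commands; running `setup_repro` as root applies them, after which the netcat test from comment 2 can be repeated.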
Comment 3 Andrew Morton 2010-06-16 16:34:22 UTC
On Tue, 15 Jun 2010 15:14:43 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16216
> 
>            Summary: wrong source addr of UDP packets when using policy
>                     routing
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.24.7

The reporter has confirmed that this issue persists in 2.6.34.

>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: borg@uu3.net
>         Regression: No
Comment 4 Patrick McHardy 2010-06-16 17:23:06 UTC
Andrew Morton wrote:
>> % ip rule show
>> 32764:  from all fwmark 0x14 lookup 20
>> 32765:  from all fwmark 0xa lookup 10

This is known behaviour: fwmarks don't work for source address selection,
since before the source address is chosen, you don't even have a packet
which could be marked.
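The ordering problem can be checked on a live system by asking the kernel which source address it would choose for a destination, with and without a mark. The sketch below wraps `ip route get`, whose `mark` option is available in current iproute2; the helper name `route_src` is hypothetical. With the setup from comment 2, `route_src 10.10.0.20` would be expected to report the eth0 address (169.254.0.4, the source seen in the tcpdump) and `route_src 10.10.0.20 1` the eth1 address (10.0.0.1). The socket's own route lookup, which picks the source address, only uses the socket mark (normally 0), so it gets the first answer; the mark added later in mangle OUTPUT can still make the kernel reroute the packet, but the source address already in it is kept.

```shell
#!/bin/sh
# Hypothetical helper: print the source address the kernel would choose
# for DST, optionally under firewall mark MARK, using `ip route get`.
# Usage: route_src DST [MARK]
route_src() {
    dst=$1
    mark=${2:-}
    if [ -n "$mark" ]; then
        out=$(ip route get "$dst" mark "$mark") || return 1
    else
        out=$(ip route get "$dst") || return 1
    fi
    # The answer looks like "DST via GW dev IFACE src ADDR ..."; print ADDR.
    printf '%s\n' "$out" | awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}
```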
Comment 5 Eric Dumazet 2010-06-16 17:29:10 UTC
On Wednesday, 16 June 2010 at 18:46 +0200, Patrick McHardy wrote:

> This is known behaviour, fwmarks don't work for source address selection
> since before the source address is chosen, you don't even have a packet
> which could be marked.

We now have sk->sk_mark routing (socket based), so we might change
sk->sk_mark with an appropriate iptables target when a packet is
received... not very clean, but worth mentioning...

commit 914a9ab386a288d0f22252fc268ecbc048cdcbd5
Author: Atis Elsts <atis@mikrotik.com>
Date:   Thu Oct 1 15:16:49 2009 -0700

    net: Use sk_mark for routing lookup in more places
    
    This patch against v2.6.31 adds support for route lookup using sk_mark in some
    more places. The benefits from this patch are the following.
    First, SO_MARK option now has effect on UDP sockets too.
    Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing
    lookup correctly if TCP sockets with SO_MARK were used.
    
    Signed-off-by: Atis Elsts <atis@mikrotik.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Comment 6 Eric Dumazet 2010-06-16 17:34:38 UTC
Comment 7 Patrick McHardy 2010-06-16 17:43:56 UTC
Eric Dumazet wrote:
> On Wednesday, 16 June 2010 at 18:46 +0200, Patrick McHardy wrote:
>
>> This is known behaviour, fwmarks don't work for source address selection
>> since before the source address is chosen, you don't even have a packet
>> which could be marked.
>
> We now have sk->sk_mark routing (socket based), so we might change
> sk->sk_mark with an appropriate iptables target when a packet is
> received... not very clean, but worth mentioning...

That would still be too late. The proper way would be to have the
application set the socket mark.
Comment 8 borg 2010-06-16 20:25:10 UTC
(In reply to comment #4)
> Andrew Morton wrote:
> This is known behaviour, fwmarks don't work for source address selection
> since before the source address is chosen, you don't even have a packet
> which could be marked.

What do you mean? In my setup the fwmarks are set based on the DST address, not the SRC.
The ipset stores the SRC address of incoming packets, and then I mark packets
leaving the box for those DST addresses.
Comment 9 Andrew Morton 2010-06-16 20:50:04 UTC
(In reply to comment #8)
> (In reply to comment #4)
> > Andrew Morton wrote:
> > This is known behaviour, fwmarks don't work for source address selection
> > since before the source address is chosen, you don't even have a packet
> > which could be marked.
>
> What do you mean? In my setup the fwmarks are set based on the DST address, not the SRC.
> The ipset stores the SRC address of incoming packets, and then I mark packets
> leaving the box for those DST addresses.

Patrick won't have seen your question.  Please don't update this bug via the bugzilla interface.  Please use emailed reply-to-all in the email thread.
Comment 10 borg 2010-06-18 10:57:56 UTC
Okay. Did you people come to any conclusions?
Is there a patch I can test?

I tried to find 914a9ab386a288d0f22252fc268ecbc048cdcbd5
in a few stable trees but was unable to.

---------- Original message ----------

From: Patrick McHardy <kaber@trash.net>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, netdev@vger.kernel.org,
    bugzilla-daemon@bugzilla.kernel.org, bugme-daemon@bugzilla.kernel.org,
    borg@uu3.net
Subject: Re: [Bugme-new] [Bug 16216] New: wrong source addr of UDP packets when
    using policy routing
Date: Wed, 16 Jun 2010 19:43:16 +0200
Message-ID: <4C190D34.8080100@trash.net>

Comment 11 Patrick McHardy 2010-06-22 06:10:26 UTC
Unknown wrote:
> Okay. Did you people come to any conclusions?
> Is there a patch I can test?

As I said, it's known and expected behaviour and there's nothing
netfilter can do about it. You could patch your application to use
the SO_MARK socket option to set the socket mark.
Comment 12 borg 2010-06-22 09:22:05 UTC
Hmm. This is not an option.

Okay, thanks for the info. Seems it's time for some hack & slash ;)

---------- Original message ----------

From: Patrick McHardy <kaber@trash.net>
To: Unknown <borg@uu3.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
    Andrew Morton <akpm@linux-foundation.org>, netdev@vger.kernel.org,
    bugzilla-daemon@bugzilla.kernel.org, bugme-daemon@bugzilla.kernel.org
Subject: Re: [Bugme-new] [Bug 16216] New: wrong source addr of UDP packets when
    using policy routing
Date: Tue, 22 Jun 2010 08:09:44 +0200
Message-ID: <4C2053A8.4040400@trash.net>

Comment 13 Ambroz Bizjak 2011-05-09 19:15:56 UTC
It is possible to work around this by NAT-ing the source address after the packet is routed. Something like this:

iptables -t nat -A POSTROUTING -m mark --mark 10 -j SNAT --to-source <address-of-outgoing-interface>

where mark 10 is the mark being set in the mangle OUTPUT table (not PREROUTING, or you end up NATing everything forwarded).

It may also be necessary to turn off the reverse path filter on the outgoing interface, or replies may be dropped.
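The workaround above can be applied per mark with a small helper. This is a sketch under assumptions: the function name `snat_for_mark`, the `RUN` variable and the example arguments are hypothetical, and the rule simply repeats the SNAT line above for a given mark and address. It also turns off reverse-path filtering on the outgoing interface, as suggested above; since the kernel uses the maximum of `conf/all/rp_filter` and the per-interface value, `conf/all/rp_filter` has to allow this too. The sysctl key is written with `/` separators so that interface names containing dots, such as `vlan0.13`, are not split.

```shell
#!/bin/sh
# Hypothetical helper: after the mark-based reroute, rewrite the source
# address of locally generated packets carrying MARK.
# Usage: snat_for_mark MARK IFACE ADDR   (ADDR = IFACE's own address)
# Set RUN=echo to print the commands instead of running them.
snat_for_mark() {
    mark=$1 iface=$2 addr=$3
    run=${RUN:-}
    # nat POSTROUTING runs after mangle OUTPUT, so the mark is already set here.
    $run iptables -t nat -A POSTROUTING -m mark --mark "$mark" \
        -j SNAT --to-source "$addr"
    # Turn off reverse-path filtering on IFACE, or replies may be dropped.
    # The kernel uses max(conf/all, conf/IFACE), so conf/all must allow it too.
    # '/' separators keep dotted VLAN names like vlan0.13 intact (procps sysctl).
    $run sysctl -w "net/ipv4/conf/$iface/rp_filter=0"
}
```

For the original setup this would be called once per mark, e.g. `snat_for_mark 10 vlan0.13 <address-of-vlan0.13>` and `snat_for_mark 20 lan2 <address-of-lan2>`, assuming table 10 routes out of vlan0.13 and table 20 out of lan2.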
Comment 14 Ambroz Bizjak 2011-07-08 00:05:08 UTC
What if the kernel, in order to select the source address, were to create a fake packet with an empty source address and simulate the processing of that packet (by passing it through netfilter and the routing table) to see which interface the packet would go through?

The packet wouldn't have to correspond exactly to the real first packet; it would just need the relevant attributes (e.g. protocol, ports, destination address) to match. It would make policy routing behave much more like people expect it to.

The packet would have to be processed a little differently from a real one; for example, connection tracking would have to be changed not to actually remember the connection, while still making connmarks work.
Comment 15 borg 2011-07-08 13:32:28 UTC
Well, that would more or less be a hack, then.
I think the best approach would be to add ipset support
to ip rule. With it, you could do true src/dst lookups
without hacking the kernel too much.