When policy routing is used, UDP packets have the wrong source address. The source address is probably taken from a lookup of the given destination in the main routing table, instead of being set just after POSTROUTING from the routing cache.

This is how it looks in a simple netcat test (tcpdump is run on aa.aa.47.90):

16:38:02.053053 IP aa.aa.47.67.32826 > aa.aa.47.90.660: UDP, length 8
16:38:05.660394 IP bb.bbb.241.62.660 > aa.aa.47.67.32826: UDP, length 8

aa.aa.47.90 has a specific setup with 3 routing tables (main, 10 and 20), each of which has a default gateway. bb.bbb.241.62 is the address of the outgoing interface of the default route in the main table.

When a packet arrives on a specific interface, its source address is stored in an ipset. When a packet is about to be sent out of the box, it is marked in mangle OUTPUT by matching its destination against that ipset:

### mangle PREROUTING ###
fw="iptables -t mangle -A PREROUTING"
$fw -i vlan0.13 -j SET --add-set gw10 src
$fw -i lan2 -j SET --add-set gw20 src

### mangle OUTPUT ###
fw="iptables -t mangle -A OUTPUT"
$fw -m set --set gw10 dst -j MARK --set-mark 10
$fw -m set --set gw10 dst -j ACCEPT
$fw -m set --set gw20 dst -j MARK --set-mark 20
$fw -m set --set gw20 dst -j ACCEPT

% ip rule show
32764: from all fwmark 0x14 lookup 20
32765: from all fwmark 0xa lookup 10

The problem was noticed with UDP packets (OpenVPN connections are not working). Other connectionless protocols might be affected too. TCP, being connection-oriented, works just fine.
2.6.24 is an awfully old kernel. Are you able to determine whether the problem is present in more recent code? Thanks.
Hello.

Confirmed on Linux 2.6.34.

[borg@vmware] cat /tmp/tcpdump.log
12:57:54.072055 IP 10.10.0.20.1111 > 10.0.0.1.5000: UDP, length 8
12:57:56.332161 IP 169.254.0.4.5000 > 10.10.0.20.1111: UDP, length 8

[borg@vmware] ip rule show
0: from all lookup local
32764: from all fwmark 0x2 lookup 2
32765: from all fwmark 0x1 lookup 1
32766: from all lookup main
32767: from all lookup default

[borg@vmware] ip route show table main
10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.1
10.0.1.0/24 dev eth2 proto kernel scope link src 10.0.1.1
169.254.0.0/24 dev eth0 proto kernel scope link src 169.254.0.4
default via 169.254.0.1 dev eth0

[borg@vmware] ip route show table 1
default via 10.0.0.2 dev eth1

[root@vmware] ipset -L
Name: t1
Type: iphash
References: 3
Header: hashsize: 1024 probes: 8 resize: 50
Members:
10.10.0.20

The test was done using netcat.
vmware: nc -u -l -p 5000
client: nc -u 10.0.0.1 5000

The second line of tcpdump.log is the answer from vmware to the client.

Regards,
Borg
On Tue, 15 Jun 2010 15:14:43 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16216
>
> Summary: wrong source addr of UDP packets when using policy routing
> Product: Networking
> Version: 2.5
> Kernel Version: 2.6.24.7

The reporter has confirmed that this issue persists in 2.6.34.

> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: IPV4
> AssignedTo: shemminger@linux-foundation.org
> ReportedBy: borg@uu3.net
> Regression: No
Andrew Morton wrote:
> The reporter has confirmed that this issue persists in 2.6.34.
>
>> When policy routing is used, UDP packets have wrong source address.
>> Source addr is probably taken from looking up routing table (main) to given
>> destination instead of being set just after POSTROUTING, looking up cache.
>>
>> % ip rule show
>> 32764: from all fwmark 0x14 lookup 20
>> 32765: from all fwmark 0xa lookup 10

This is known behaviour: fwmarks don't work for source address selection, because before the source address is chosen there isn't even a packet that could be marked.
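The ordering described above can be observed from userspace. The following is only a sketch, not something taken from this thread: on Linux, connect() on a UDP socket performs the route lookup and fixes the local address without sending anything. The function name `chosen_source` and the loopback destination are assumptions made to keep the example self-contained.

```python
import socket


def chosen_source(dst_ip: str, dst_port: int = 9) -> str:
    """Return the source address the kernel selects for dst_ip.

    connect() on a UDP socket sends no packet. It only triggers the route
    lookup and source address selection, so at this point there is nothing
    yet that a mangle OUTPUT rule could have marked.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((dst_ip, dst_port))
        # getsockname() reports the local address chosen by the lookup.
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Loopback keeps the demo independent of the local network setup.
    print(chosen_source("127.0.0.1"))
```

Running it against a destination that is reachable only through table 10 or 20 should show the address picked by the unmarked lookup, which is the symptom reported in this bug.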
On Wednesday, 16 June 2010 at 18:46 +0200, Patrick McHardy wrote:
> This is known behaviour, fwmarks don't work for source address selection
> since before the source address is chosen, you don't even have a packet
> which could be marked.

We now have sk->sk_mark routing (socket based), so we might change sk->sk_mark with an appropriate iptables target when a packet is received... not very clean, but worth mentioning...

commit 914a9ab386a288d0f22252fc268ecbc048cdcbd5
Author: Atis Elsts <atis@mikrotik.com>
Date: Thu Oct 1 15:16:49 2009 -0700

    net: Use sk_mark for routing lookup in more places

    This patch against v2.6.31 adds support for route lookup using sk_mark in some more places. The benefits from this patch are the following.
    First, SO_MARK option now has effect on UDP sockets too.
    Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing lookup correctly if TCP sockets with SO_MARK were used.

    Signed-off-by: Atis Elsts <atis@mikrotik.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Eric Dumazet wrote:
> On Wednesday, 16 June 2010 at 18:46 +0200, Patrick McHardy wrote:
>
>> This is known behaviour, fwmarks don't work for source address selection
>> since before the source address is chosen, you don't even have a packet
>> which could be marked.
>
> We now have sk->sk_mark routing (socket based), so we might change
> sk->sk_mark with an appropriate iptables target when a packet is
> received... not very clean, but worth mentioning...

That would still be too late. The proper way would be to have the application set the socket mark.
(In reply to comment #4)
> Andrew Morton wrote:
> This is known behaviour, fwmarks don't work for source address selection
> since before the source address is chosen, you don't even have a packet
> which could be marked.

What do you mean? In my setup the fwmark is set based on the DST address, not SRC. The ipset stores the SRC address of each incoming packet, and then I mark packets going out of the box to that DST address.
(In reply to comment #8)
> (In reply to comment #4)
> > Andrew Morton wrote:
> > This is known behaviour, fwmarks don't work for source address selection
> > since before the source address is chosen, you don't even have a packet
> > which could be marked.
>
> What do you mean? In my setup fwmark are done on DST address, not SRC.
> ipset stores the SRC addr of incoming packet and then I mark packets
> outgoing from box to specified DST addr.

Patrick won't have seen your question. Please don't update this bug via the bugzilla interface. Please use emailed reply-to-all in the email thread.
Okay. Did you come to any conclusion?
Is there a patch I can test?
I tried to find 914a9ab386a288d0f22252fc268ecbc048cdcbd5 in a few stable trees but was unable to.

---------- Original message ----------
From: Patrick McHardy <kaber@trash.net>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, netdev@vger.kernel.org, bugzilla-daemon@bugzilla.kernel.org, bugme-daemon@bugzilla.kernel.org, borg@uu3.net
Subject: Re: [Bugme-new] [Bug 16216] New: wrong source addr of UDP packets when using policy routing
Date: Wed, 16 Jun 2010 19:43:16 +0200
Message-ID: <4C190D34.8080100@trash.net>

Eric Dumazet wrote:
> On Wednesday, 16 June 2010 at 18:46 +0200, Patrick McHardy wrote:
>
> > This is known behaviour, fwmarks don't work for source address selection
> > since before the source address is chosen, you don't even have a packet
> > which could be marked.
>
> We now have sk->sk_mark routing (socket based), so we might change
> sk->sk_mark with an appropriate iptables target when a packet is
> received... not very clean, but worth mentioning...

That would still be too late. The proper way would be to have the application set the socket mark.
Unknown wrote:
> Okay. Did you come to any conclusion?
> Is there a patch I can test?

As I said, it's known and expected behaviour, and there's nothing netfilter can do about it. You could patch your application to use the SO_MARK socket option to set the socket mark.
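As a rough illustration of that suggestion, a UDP application could set the mark on its socket before sending anything, so the route lookup (and with it the source address selection) already sees the mark. This is a minimal sketch under stated assumptions, not a patch for any particular program: the helper names are hypothetical, the fallback value 36 for SO_MARK only holds on common Linux architectures, and setting the option requires CAP_NET_ADMIN.

```python
import socket

# SO_MARK is Linux-specific. 36 is its value on common architectures and is
# used here only as a fallback if this Python build does not export it.
SO_MARK = getattr(socket, "SO_MARK", 36)


def set_socket_mark(sock, mark: int) -> None:
    """Attach a routing mark to sock (hypothetical helper, needs CAP_NET_ADMIN)."""
    sock.setsockopt(socket.SOL_SOCKET, SO_MARK, mark)


def open_marked_udp_socket(mark: int, port: int) -> socket.socket:
    """Create a UDP socket whose route lookups use the given mark."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Mark first, so every later lookup for this socket carries it.
        set_socket_mark(sock, mark)
        sock.bind(("0.0.0.0", port))
    except OSError:
        sock.close()
        raise
    return sock
```

With the rules shown in the original report, `open_marked_udp_socket(10, 5000)` would make lookups for that socket match `fwmark 0xa lookup 10`. The mark is per socket, so a server that answers clients behind both gateways would presumably need one socket per mark.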
Hmm. This is not an option.
Okay, thanks for the info. Seems it's time for some hack & slash ;)

---------- Original message ----------
From: Patrick McHardy <kaber@trash.net>
To: Unknown <borg@uu3.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, Andrew Morton <akpm@linux-foundation.org>, netdev@vger.kernel.org, bugzilla-daemon@bugzilla.kernel.org, bugme-daemon@bugzilla.kernel.org
Subject: Re: [Bugme-new] [Bug 16216] New: wrong source addr of UDP packets when using policy routing
Date: Tue, 22 Jun 2010 08:09:44 +0200
Message-ID: <4C2053A8.4040400@trash.net>

Unknown wrote:
> Okay. Did you come to any conclusion?
> Is there a patch I can test?

As I said, it's known and expected behaviour, and there's nothing netfilter can do about it. You could patch your application to use the SO_MARK socket option to set the socket mark.
It is possible to work around this by NAT-ing the source address after the packet is routed. Something like this:

iptables -t nat -A POSTROUTING -m mark --mark 10 -j SNAT --to-source <address-of-outgoing-interface>

where mark 10 is the mark set in the OUTPUT chain of the mangle table (not PREROUTING, or you end up NATing everything forwarded). It may also be necessary to turn off the reverse path filter on the outgoing interface, or replies may be dropped.
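With several marked tables (10 and 20 in the original report), one such rule is needed per mark. The following sketch only builds the command lines and does not run them. The function name `snat_rule` and the addresses are placeholders from the documentation ranges, not values from this thread.

```python
from typing import Dict, List


def snat_rule(mark: int, source_ip: str) -> List[str]:
    """Build the argv of one SNAT rule for packets carrying the given mark."""
    return [
        "iptables", "-t", "nat", "-A", "POSTROUTING",
        "-m", "mark", "--mark", str(mark),
        "-j", "SNAT", "--to-source", source_ip,
    ]


# Hypothetical mapping: fwmark -> address of the matching outgoing interface.
UPLINKS: Dict[int, str] = {10: "192.0.2.1", 20: "198.51.100.1"}

if __name__ == "__main__":
    for mark, address in sorted(UPLINKS.items()):
        # Printed for review; pass the list to subprocess.run() to apply it.
        print(" ".join(snat_rule(mark, address)))
```

Keeping the mark-to-address mapping in one place makes it easier to keep the SNAT rules in step with the `ip rule` entries.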
What if the kernel, in order to select the source address, were to create a fake packet with an empty source address and simulate processing of that packet (by passing it through netfilter and the routing table) to see which interface the packet would go out through? The packet wouldn't have to correspond exactly to the real first packet, just match it in various attributes (e.g. protocol, ports, destination address). It would make policy routing behave much more like people expect it to. The packet would have to be processed a little differently from a real one; for example, connection tracking would have to be changed to not actually remember the connection, but still make connmarks work.
Well, that would be more or less a hack. I think the best solution would be to add ipset support to ip rule. With that, you could do a true src/dst lookup without hacking the kernel too much.