Bug 8519 - NAT prerouting over tun interface broken
Summary: NAT prerouting over tun interface broken
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Netfilter/Iptables
Hardware: i386 Linux
Importance: P2 normal
Assignee: Herbert Xu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-05-21 13:07 UTC by Frans Pop
Modified: 2007-10-03 21:01 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.21.1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
lsmod on the host (3.32 KB, text/plain)
2007-05-22 03:50 UTC, Frans Pop
Output of cat /proc/net/ip_conntrack on the host (3.33 KB, text/plain)
2007-05-22 03:52 UTC, Frans Pop
Wireshark capture file (2.74 KB, text/plain)
2007-05-22 08:44 UTC, Frans Pop
Wireshark capture file for 2.6.20 (2.16 KB, application/octet-stream)
2007-05-22 09:17 UTC, Frans Pop

Description Frans Pop 2007-05-21 13:07:04 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.20.7
Distribution: Debian unstable
Hardware Environment: EM64T (Pentium D) running amd64 kernel
Software Environment: Debian unstable

Problem Description:
I have the hercules s/390 emulator running on an EM64T host, both running 
Debian unstable. I use a tun interface, a second IP address on eth0 and 
iptables/nat so the emulator has its own address on my local network.

With 2.6.21.1 on the host, networking between the emulator and the host system 
is fine (I can ssh from the host into the emulator without problems), but 
communication from the emulator with other boxes is broken. Other boxes also 
don't see the emulator if I ping its external address.

If I ping another box on my LAN from the emulator while running wireshark on 
the host, I can see that:
- the echo request gets sent OK
- the other box replies OK
- the host receives the echo reply
- but the tun interface never gets it.

If I boot the host with 2.6.20 everything works fine again.

Here is how the setup looks:
        |---------------- host system --------------------|
                                           |-- emulator --|
            eth0              tun              ctc0
 LAN  <---> 10.19.66.21   
 LAN  <---> 10.19.66.92 <---> 10.19.92.2 <---> 10.19.92.1
                         nat              P2P

The only active iptables rules are:
iptables -t nat -A PREROUTING -d 10.19.66.92 \
         -j DNAT --to-destination 10.19.92.1
iptables -t nat -A POSTROUTING -s 10.19.92.1 \
         -j SNAT --to-source 10.19.66.92
Comment 1 Patrick McHardy 2007-05-21 15:31:40 UTC
Please post the output of lsmod and cat /proc/net/ip_conntrack after
sending a ping.

Comment 2 Frans Pop 2007-05-22 03:50:32 UTC
Created attachment 11565 [details]
lsmod on the host
Comment 3 Frans Pop 2007-05-22 03:52:50 UTC
Created attachment 11566 [details]
Output of cat /proc/net/ip_conntrack on the host
Comment 4 Frans Pop 2007-05-22 03:58:07 UTC
Both files are after trying to ping 10.19.66.1 from the emulator.

10.19.66.1 and 10.19.66.2 are the DNS servers for my local LAN.
Comment 5 Patrick McHardy 2007-05-22 06:50:21 UTC
icmp     1 12 src=10.19.66.11 dst=10.19.66.255 type=8 code=0 id=58132
packets=1 bytes=28 [UNREPLIED] src=10.19.66.255 dst=10.19.66.11 type=0
code=0 id=58132 packets=0 bytes=0 mark=0 secmark=0 use=1

icmp     1 12 src=10.19.66.11 dst=10.19.66.0 type=8 code=0 id=58132
packets=1 bytes=28 [UNREPLIED] src=10.19.66.0 dst=10.19.66.11 type=0
code=0 id=58132 packets=0 bytes=0 mark=0 secmark=0 use=1

The first one shows the broadcast address as destination, the second
one the network address. Are these really the addresses you pinged?

Comment 6 Frans Pop 2007-05-22 08:44:37 UTC
Created attachment 11567 [details]
Wireshark capture file

No, as I said in my later comment (#4), I pinged 10.19.66.1.
10.19.66.11 is my laptop; it has nothing to do with the ping.

I've tried again with another host (one that has no special roles in the LAN)
and left a ping running for a long time inside the emulator. Nothing at all
shows up in ip_conntrack related to that ping during all that time.

Generally, while the ping is running, ip_conntrack shows only the
established ssh session between the emulator and the host and nothing else:

tcp	 6 431563 ESTABLISHED src=10.19.92.2 dst=10.19.92.1 sport=34678
dport=22 packets=71 bytes=7036 src=10.19.92.1 dst=10.19.92.2 sport=22
dport=34678 packets=54 bytes=7072 [ASSURED] mark=0 secmark=0 use=1

So it looks like my timing was bad for the one I sent earlier :-/

Attached a wireshark capture file for the ping to that other host (10.19.66.19)
while listening on all interfaces.
Comment 7 Frans Pop 2007-05-22 09:17:53 UTC
Created attachment 11568 [details]
Wireshark capture file for 2.6.20

I have just done the same ping with 2.6.20, and with that there is also _no_
listing in ip_conntrack for a connection with 10.19.66.19, even though it
*does* work.

The attached wireshark capture file clearly shows the packets being forwarded
to 10.19.92.1 and the subsequent ssh traffic updating the display for the
output of the ping command.
Comment 8 Patrick McHardy 2007-05-23 02:57:10 UTC
The connection tracking entry for pings is destroyed as soon as a reply is seen,
so at least the reply seems to be properly associated with the connection. Can
you please add logging rules to check how far the packet makes it, something
like this:

for i in PREROUTING INPUT FORWARD OUTPUT POSTROUTING; do
    iptables -t mangle -I $i -p icmp -j LOG --log-prefix "$i "
done
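When the trace is done, the same loop with -D instead of -I should remove the rules again, since iptables -D accepts the same rule specification that was inserted. A minimal sketch for a POSIX shell; the function name and the IPT variable (which allows a dry run with IPT=echo) are hypothetical additions, not part of the original suggestion:

```shell
#!/bin/sh
# Remove the mangle-table LOG rules inserted for debugging.
# -D needs the same match and target options that were used with -I,
# including the exact --log-prefix string.
# IPT (hypothetical) defaults to iptables; set IPT=echo for a dry run.
remove_icmp_log_rules() {
    ipt=${IPT:-iptables}
    for i in PREROUTING INPUT FORWARD OUTPUT POSTROUTING; do
        $ipt -t mangle -D "$i" -p icmp -j LOG --log-prefix "$i "
    done
}
```

Run as root without IPT set to actually delete the five rules.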

Thanks.
Comment 9 Frans Pop 2007-05-23 03:55:15 UTC
Here's what I get after a 'ping -c 1 10.19.66.19':

PREROUTING IN=tun0 OUT= MAC= SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1872 SEQ=1
FORWARD IN=tun0 OUT=eth0 SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1872 SEQ=1
POSTROUTING IN= OUT=eth0 SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1872 SEQ=1
PREROUTING IN=eth0 OUT= MAC=00:16:76:04:ff:09:00:10:83:cf:15:a5:08:00 
SRC=10.19.66.19 DST=10.19.66.92 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=19817 
PROTO=ICMP TYPE=0 CODE=0 ID=1872 SEQ=1
Comment 10 Patrick McHardy 2007-05-23 05:06:49 UTC
This looks all OK, but I still can't figure out what's wrong. Could
you install the conntrack tool (included in debian unstable,
apt-get install conntrack) and post the ctnetlink events caused
by the ping (conntrack -E)? Your kernel needs both
CONFIG_NF_CONNTRACK_EVENTS and CONFIG_NF_CT_NETLINK for this.
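Whether a running kernel has both options can be checked against its build configuration. A sketch, assuming the config is available as a plain file such as Debian's /boot/config-$(uname -r); that path and the function name are assumptions:

```shell
#!/bin/sh
# Check that a kernel config file enables both options needed for
# "conntrack -E": CONFIG_NF_CONNTRACK_EVENTS and CONFIG_NF_CT_NETLINK.
# An option counts as enabled if it is built in (=y) or modular (=m).
has_ctnetlink_events() {
    config=$1
    for opt in CONFIG_NF_CONNTRACK_EVENTS CONFIG_NF_CT_NETLINK; do
        if ! grep -Eq "^${opt}=(y|m)$" "$config"; then
            echo "missing: $opt" >&2
            return 1
        fi
    done
    return 0
}
```

For example: has_ctnetlink_events /boot/config-$(uname -r) && conntrack -E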

Thanks.

Comment 11 Frans Pop 2007-05-23 06:02:01 UTC
Isn't there a final FORWARD missing?

Here's the output of conntrack -E after letting ping run for a bit:
    [NEW] icmp     1 30 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
[UNREPLIED] src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041
    [NEW] icmp     1 30 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
[UNREPLIED] src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041
 [UPDATE] icmp     1 28 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041
[DESTROY] icmp     1 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
packets=1 bytes=84 src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041 
packets=0 bytes=0
 [UPDATE] icmp     1 30 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041
[DESTROY] icmp     1 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
packets=1 bytes=84 src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041 
packets=0 bytes=0
    [NEW] icmp     1 30 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
[UNREPLIED] src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041
 [UPDATE] icmp     1 30 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041
[DESTROY] icmp     1 src=10.19.92.1 dst=10.19.66.19 type=8 code=0 id=2041 
packets=1 bytes=84 src=10.19.66.19 dst=10.19.66.92 type=0 code=0 id=2041 
packets=0 bytes=0

BTW, thanks for your efforts on this and for your quick responses.
Comment 12 Frans Pop 2007-05-23 06:55:24 UTC
For comparison, here is what I get with 2.6.20 with the iptables logging.
Note the additional FORWARD and POSTROUTING lines.

PREROUTING IN=tun0 OUT= MAC= SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
IN=tun0 OUT= MAC= SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 PREC=0x00 
TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
IN=tun0 OUT= MAC= SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 PREC=0x00 
TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
FORWARD IN=tun0 OUT=eth0 SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
POSTROUTING IN= OUT=eth0 SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
IN= OUT=eth0 SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 PREC=0x00 TTL=63 
ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
PREROUTING IN=eth0 OUT= MAC=00:16:76:04:ff:09:00:10:83:cf:15:a5:08:00 
SRC=10.19.66.19 DST=10.19.66.92 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=46515 
PROTO=ICMP TYPE=0 CODE=0 ID=1642 SEQ=1
FORWARD IN=eth0 OUT=tun0 SRC=10.19.66.19 DST=10.19.92.1 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=46515 PROTO=ICMP TYPE=0 CODE=0 ID=1642 SEQ=1
POSTROUTING IN= OUT=tun0 SRC=10.19.66.19 DST=10.19.92.1 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=46515 PROTO=ICMP TYPE=0 CODE=0 ID=1642 SEQ=1

I also checked the output of conntrack -E with 2.6.20, but that looked 
identical.
Comment 13 Frans Pop 2007-05-23 07:01:48 UTC
Oops, I also had some '-t nat -j LOG' entries. Here is a clean output for 
2.6.20 without those:

PREROUTING IN=tun0 OUT= MAC= SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
FORWARD IN=tun0 OUT=eth0 SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
POSTROUTING IN= OUT=eth0 SRC=10.19.92.1 DST=10.19.66.19 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=1642 SEQ=1
PREROUTING IN=eth0 OUT= MAC=00:16:76:04:ff:09:00:10:83:cf:15:a5:08:00 
SRC=10.19.66.19 DST=10.19.66.92 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=46515 
PROTO=ICMP TYPE=0 CODE=0 ID=1642 SEQ=1
FORWARD IN=eth0 OUT=tun0 SRC=10.19.66.19 DST=10.19.92.1 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=46515 PROTO=ICMP TYPE=0 CODE=0 ID=1642 SEQ=1
POSTROUTING IN= OUT=tun0 SRC=10.19.66.19 DST=10.19.92.1 LEN=84 TOS=0x00 
PREC=0x00 TTL=63 ID=46515 PROTO=ICMP TYPE=0 CODE=0 ID=1642 SEQ=1
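Comparing the two traces by eye is easier once each LOG line is reduced to its chain and interfaces. A sketch, assuming one LOG entry per line as in the kernel log (the report wraps them) and a grep that supports -o, as GNU and BSD grep do; hook_trace is a hypothetical helper name:

```shell
#!/bin/sh
# Reduce LOG output to the chain prefix plus the IN=/OUT= interfaces,
# so the path of each packet can be compared between kernels.
# Lines without one of the five chain prefixes (e.g. the unprefixed
# nat LOG entries) are ignored.  Reads the named files, or stdin.
hook_trace() {
    grep -oE '(PREROUTING|INPUT|FORWARD|OUTPUT|POSTROUTING) IN=[^ ]* OUT=[^ ]*' "$@"
}
```

For example, run dmesg | hook_trace > trace.2.6.20 on one kernel and the equivalent on the other, then diff the two files. In the 2.6.21 trace from comment 9 the echo reply stops after "PREROUTING IN=eth0 OUT=", while 2.6.20 continues with "FORWARD IN=eth0 OUT=tun0" and "POSTROUTING IN= OUT=tun0".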
Comment 14 Frans Pop 2007-05-27 11:41:21 UTC
As it looked to me like we had got stuck on this issue, I've done a git bisect
and traced the regression to this commit:
commit 8030f54499925d073a88c09f30d5d844fb1b3190
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Feb 22 01:53:47 2007 +0900

    [IPV4] devinet: Register inetdev earlier.

I have verified that this commit is the culprit by building a kernel from 
2.6.21 with only that commit reverted. With that kernel installed on the host 
I could again ping normally from within the emulator.
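The bisect and the revert check described above can be sketched as follows. This assumes a clone of the mainline tree with the v2.6.20 and v2.6.21 tags; the function names, the branch name revert-test and the GIT dry-run variable are hypothetical:

```shell
#!/bin/sh
# Sketch of the bisect-then-revert workflow inside a kernel git tree.
# GIT (hypothetical) defaults to git; GIT=echo prints the commands instead.
start_bisect() {
    g=${GIT:-git}
    # bad revision first, then the last known-good one
    $g bisect start v2.6.21 v2.6.20
    # then build, boot and test each candidate and mark it with
    # "git bisect good" or "git bisect bad" until git names the culprit
}

# Confirm a suspect by reverting only that commit on top of the release.
verify_by_revert() {
    g=${GIT:-git}
    $g checkout -b revert-test v2.6.21 &&
        $g revert --no-edit "$1"
}
```

For example, verify_by_revert 8030f54499925d073a88c09f30d5d844fb1b3190, then build and boot the result as usual.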

Up to you people to figure out *how* this seemingly innocent patch causes the 
regression :-)
Note that the commit is related to 45ba9dd2007da23da5ac21179451c3c9fee30a96, 
which does the same for IPv6.
Comment 15 Patrick McHardy 2007-05-27 12:03:15 UTC
Thanks a lot and sorry for the delay! This should make it a lot easier
to figure out what's wrong. I'll look into it.

Comment 16 Patrick McHardy 2007-05-27 12:14:17 UTC
I can not see any side-effects of this change that could be responsible.

Herbert, would you mind having a look?
Comment 17 Herbert Xu 2007-05-28 01:21:55 UTC
Hi:

I suggest that you have a look at the rp_filter setting on eth0.  If it's
enabled then try disabling it.
Comment 18 Herbert Xu 2007-05-28 01:22:36 UTC
Oh, and if that doesn't help, then please take a capture of all the content of
/proc/sys/net/ipv4/conf/eth0 with and without the patch to see if they're different.
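One way to take such a capture is to print each setting as its path followed by its value, run it once per kernel and diff the results. A sketch; the dump_sysctls name and the exact output layout are assumptions (the layout is modelled on the diff in comment 19):

```shell
#!/bin/sh
# Print every readable file under a sysctl directory as "path:" followed
# by its value, in sorted order, so two snapshots can be compared with
# diff -u.  Files that cannot be read (write-only entries) print no value.
dump_sysctls() {
    find "$1" -type f | sort | while read -r f; do
        if [ -r "$f" ]; then
            echo "$f:"
            cat "$f" 2>/dev/null
        fi
    done
}
```

For example, dump_sysctls /proc/sys/net/ipv4 > ipv4.good on the working kernel, dump_sysctls /proc/sys/net/ipv4 > ipv4.bad on the affected one, then diff -u ipv4.good ipv4.bad.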

Thanks.
Comment 19 Frans Pop 2007-05-28 01:59:21 UTC
> I suggest that you have a look at the rp_filter setting on eth0.
$ cat /proc/sys/net/ipv4/conf/eth0/rp_filter
0

The second suggestion shows the problem:
--- ipv4.good   2007-05-28 10:43:32.000000000 +0200
+++ ipv4.bad    2007-05-28 10:40:44.000000000 +0200
 /proc/sys/net/ipv4/conf/eth0/forwarding:
-1
+0
 /proc/sys/net/ipv4/conf/eth0/rp_filter:
-1
+0

Now, how does this patch cause the "forwarding" setting to not be set?

My /etc/sysctl.conf contains:
<snip>
# Uncomment the next line to enable Spoof protection (reverse-path filter)
net.ipv4.conf.default.rp_filter=1

# Uncomment the next line to enable TCP/IP SYN cookies
net.ipv4.tcp_syncookies=1

# Uncomment the next line to enable packet forwarding for IPv4
net.ipv4.conf.default.forwarding=1
</snip>

This file is read during /etc/rcS.d/S30procps.sh.

(/me feels kind of stupid for not checking these values earlier, but as the 
configuration had always worked fine...)
Comment 20 Herbert Xu 2007-05-28 03:46:33 UTC
Changing the value in default only affects interfaces which are registered
afterwards.  Previously they affected interfaces which are brought up afterwards.
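A quick way to spot interfaces affected by this is to compare each interface's value with conf/default. A sketch, assuming the standard /proc/sys/net/ipv4/conf layout; the function name and the ROOT override (useful for testing against a copy) are hypothetical:

```shell
#!/bin/sh
# List interfaces whose value for a per-interface IPv4 setting differs
# from conf/default, e.g. interfaces registered before sysctl.conf was
# applied.  ROOT (hypothetical) defaults to /proc/sys/net/ipv4/conf.
diverging_ifaces() {
    key=$1
    root=${ROOT:-/proc/sys/net/ipv4/conf}
    want=$(cat "$root/default/$key")
    for dir in "$root"/*/; do
        iface=$(basename "$dir")
        # "all" and "default" are not real interfaces
        case $iface in all|default) continue ;; esac
        have=$(cat "$dir$key")
        if [ "$have" != "$want" ]; then
            echo "$iface $have (default $want)"
        fi
    done
}
```

On the host in this report, diverging_ifaces forwarding would be expected to list eth0, since it was registered before sysctl.conf was applied.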

I'll talk to others to see if we could come up with a way to minimise this sort
of pain.
Comment 21 Frans Pop 2007-05-28 04:18:33 UTC
> Changing the value in default only affects interfaces which are registered
> afterwards.  Previously they affected interfaces which are brought up
> afterwards.

To me this seems like a significant change in behavior that is going to affect,
and probably break, quite a few systems. I don't know about other 
distributions, but at least in Debian this has been the standard way of 
setting such values for all interfaces for a long time.

I wonder if the change was intentional or an unexpected result of this commit? 
At least the changelog offers no indication that this was intended.

I hope you will find a way to resolve this. If the new behavior remains, this 
will require at least careful documentation.
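On kernels with this behavior, one possible workaround is to write the setting to every interface directory that already exists, in addition to conf/default, from a script run after the interfaces are registered. Interfaces registered later, such as tun0 when the emulator starts, still pick up conf/default. For forwarding specifically, net.ipv4.ip_forward=1 (or net.ipv4.conf.all.forwarding=1) is applied to every interface when written. A sketch, assuming the standard /proc/sys/net/ipv4/conf layout; the function name and the ROOT override are hypothetical:

```shell
#!/bin/sh
# Write a value into conf/all, conf/default and every interface that
# already exists, so the result does not depend on whether an interface
# was registered before or after this runs.
# ROOT (hypothetical) defaults to /proc/sys/net/ipv4/conf.
set_for_all_ifaces() {
    key=$1 value=$2
    root=${ROOT:-/proc/sys/net/ipv4/conf}
    for dir in "$root"/*/; do
        # skip directories that do not have this setting
        if [ -f "$dir$key" ]; then
            echo "$value" > "$dir$key"
        fi
    done
}
```

For example, set_for_all_ifaces rp_filter 1, run as root.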
Comment 22 Herbert Xu 2007-10-03 21:01:27 UTC
It should be fixed in the current kernel.
