Bug 16184

Summary: Container, X86-64, i386, iptables rule
Product: Networking Reporter: Jean-Marc Pigeon (jmp)
Component: Netfilter/Iptables Assignee: networking_netfilter-iptables (networking_netfilter-iptables)
Status: CLOSED CODE_FIX    
Severity: blocking CC: cebbert, florian, kaber, maciej.rutecki, rjw
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 2.6.35-rc1,2,3 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16055    

Description Jean-Marc Pigeon 2010-06-12 04:17:43 UTC
IS WORKING with 2.6.34, problem still present in 2.6.35-rc2

The container network is down because of iptables, but ONLY if the container
is i386 (x86_64 arch containers work with no trouble (strange?)).

host is 2.6.35-rc1   (fc13)
container devices are veth (with a bridge at host level).

All i386 arch containers boot correctly, but the network is
unable to route packets (ping gets no response).

The problem is related to iptables activation:
adding a SINGLE RULE (within the container's iptables)
causes the trouble.

here the iptables example

*filter
:INPUT          ACCEPT  [0:0]
:FORWARD        ACCEPT  [0:0]
:OUTPUT         ACCEPT  [0:0]
:std               -    [0:0]
#-----------------------------------------------------------
#standard rules
#-A FORWARD -j std
#troubled Sequence !
-A INPUT -j std         #no packet reach if using that sequence
-A std -j ACCEPT        #
#correct No trouble 
-A INPUT -j ACCEPT      #packet are accepted.
#-----------------------------------------------------------
COMMIT


I didn't try a host with an i386 arch 2.6.35-rc1 kernel, only an x86_64
host with a mix of different distributions (fc10 -> fc12) and archs as containers.
Comment 1 Jean-Marc Pigeon 2010-06-12 12:31:28 UTC
Bug is still present in 2.6.35-rc3;
the exact same workbench works with 2.6.34.
Comment 2 Patrick McHardy 2010-06-14 12:02:24 UTC
Please try adding a "-j TRACE" rule to see where the packets disappear.
Comment 3 Jean-Marc Pigeon 2010-06-16 02:16:28 UTC
Here are the data (just ONE ping)

Jun 15 22:02:11 Sorel kernel: [ 1886.459895] TRACE: mangle:PREROUTING:policy:1 IN=eth0 OUT= MAC=00:26:b9:67:a7:e1:00:19:d1:a8:be:cd:08:00 SRC=X.Y.Z.T DST=192.168.31.34 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=40727 SEQ=1
Jun 15 22:02:11 Sorel kernel: [ 1886.459923] TRACE: nat:PREROUTING:policy:1 IN=eth0 OUT= MAC=00:26:b9:67:a7:e1:00:19:d1:a8:be:cd:08:00 SRC=X.Y.Z.T DST=192.168.31.34 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=40727 SEQ=1
Jun 15 22:02:11 Sorel kernel: [ 1886.459953] TRACE: mangle:FORWARD:policy:1 IN=eth0 OUT=br0 SRC=X.Y.Z.T DST=192.168.31.34 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=40727 SEQ=1
Jun 15 22:02:11 Sorel kernel: [ 1886.459968] TRACE: filter:FORWARD:rule:1 IN=eth0 OUT=br0 SRC=X.Y.Z.T DST=192.168.31.34 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=40727 SEQ=1
Jun 15 22:02:11 Sorel kernel: [ 1886.459987] TRACE: filter:std:rule:7 IN=eth0 OUT=br0 SRC=X.Y.Z.T DST=192.168.31.34 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=40727 SEQ=1
Jun 15 22:02:11 Sorel kernel: [ 1886.460004] TRACE: mangle:POSTROUTING:policy:1 IN= OUT=br0 SRC=X.Y.Z.T DST=192.168.31.34 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=40727 SEQ=1
Jun 15 22:02:11 Sorel kernel: [ 1886.460020] TRACE: nat:POSTROUTING:policy:1 IN= OUT=br0 SRC=X.Y.Z.T DST=192.168.31.34 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=40727 SEQ=1




Keep in mind, we are working in container mode. The command line
iptables -t raw -p icmp -s X.Y.Z.T/32  -A PREROUTING -j TRACE
was applied on the container side (within /etc/rc.d/rc.local; X.Y.Z.T
is a public IP server)
but it seems to apply to the host side too...

Here is the data for an x86-64 container on the same host (ping working)

Jun 15 22:10:23 Sorel kernel: [ 2378.271703] TRACE: raw:PREROUTING:policy:2 IN=eth0 OUT= MAC=00:26:b9:67:a7:e1:00:19:d1:a8:be:cd:08:00 SRC=X.Y.Z.T DST=192.168.31.35 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=16152 SEQ=1  
Jun 15 22:10:23 Sorel kernel: [ 2378.271738] TRACE: mangle:PREROUTING:policy:1 IN=eth0 OUT= MAC=00:26:b9:67:a7:e1:00:19:d1:a8:be:cd:08:00 SRC=X.Y.Z.T DST=192.168.31.35 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=16152 SEQ=1  
Jun 15 22:10:23 Sorel kernel: [ 2378.271766] TRACE: nat:PREROUTING:policy:1 IN=eth0 OUT= MAC=00:26:b9:67:a7:e1:00:19:d1:a8:be:cd:08:00 SRC=X.Y.Z.T DST=192.168.31.35 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=16152 SEQ=1  
Jun 15 22:10:23 Sorel kernel: [ 2378.271796] TRACE: mangle:FORWARD:policy:1 IN=eth0 OUT=br0 SRC=X.Y.Z.T DST=192.168.31.35 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=16152 SEQ=1  
Jun 15 22:10:23 Sorel kernel: [ 2378.271811] TRACE: filter:FORWARD:rule:1 IN=eth0 OUT=br0 SRC=X.Y.Z.T DST=192.168.31.35 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=16152 SEQ=1  
Jun 15 22:10:23 Sorel kernel: [ 2378.271829] TRACE: filter:std:rule:7 IN=eth0 OUT=br0 SRC=X.Y.Z.T DST=192.168.31.35 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=16152 SEQ=1  
Jun 15 22:10:23 Sorel kernel: [ 2378.271846] TRACE: mangle:POSTROUTING:policy:1 IN= OUT=br0 SRC=X.Y.Z.T DST=192.168.31.35 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=16152 SEQ=1  
Jun 15 22:10:23 Sorel kernel: [ 2378.271861] TRACE: nat:POSTROUTING:policy:1 IN= OUT=br0 SRC=X.Y.Z.T DST=192.168.31.35 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=16152 SEQ=1


I see no noticeable difference... once the packet goes out on br0 we are in the dark.
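
For anyone comparing TRACE output like the two dumps above, here is a minimal Python sketch (not part of the original report; the function names `trace_path` and `missing_hops` are hypothetical) that reduces each log to its table:chain hops and lists the hops present in one trace but not in the other. The sample lines are shortened copies of the logs above with the packet fields trimmed. On the pasted data the only hop it reports is raw:PREROUTING, which is absent from the i386 paste; that may simply be a copy/paste omission rather than a real difference.

```python
import re

# Matches the "TRACE: table:chain:type:rulenum" prefix seen in the kernel log.
TRACE_RE = re.compile(r"TRACE: (\w+):(\w+):(\w+):(\d+)")


def trace_path(log_lines):
    """Return the ordered list of (table, chain) hops found in the log lines."""
    hops = []
    for line in log_lines:
        match = TRACE_RE.search(line)
        if match:
            hops.append((match.group(1), match.group(2)))
    return hops


def missing_hops(path_a, path_b):
    """Return hops that appear in path_b but not in path_a, in path_b order."""
    seen = set(path_a)
    return [hop for hop in path_b if hop not in seen]


# Shortened copies of the i386 trace (192.168.31.34), packet fields trimmed.
i386_log = [
    "TRACE: mangle:PREROUTING:policy:1 IN=eth0 OUT=",
    "TRACE: nat:PREROUTING:policy:1 IN=eth0 OUT=",
    "TRACE: mangle:FORWARD:policy:1 IN=eth0 OUT=br0",
    "TRACE: filter:FORWARD:rule:1 IN=eth0 OUT=br0",
    "TRACE: filter:std:rule:7 IN=eth0 OUT=br0",
    "TRACE: mangle:POSTROUTING:policy:1 IN= OUT=br0",
    "TRACE: nat:POSTROUTING:policy:1 IN= OUT=br0",
]

# Shortened copies of the x86-64 trace (192.168.31.35), packet fields trimmed.
x86_64_log = ["TRACE: raw:PREROUTING:policy:2 IN=eth0 OUT="] + i386_log

print(missing_hops(trace_path(i386_log), trace_path(x86_64_log)))
```

Both traces end at nat:POSTROUTING on br0, so this comparison only covers the host side; it says nothing about what happens inside the container.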
Comment 4 Patrick McHardy 2010-06-16 02:22:31 UTC
It might be related to bridge netfilter changes. Do you have CONFIG_BRIDGE_NETFILTER enabled? If so, please try:

echo 0 >/proc/sys/net/bridge/bridge-nf-call-iptables

If that doesn't help, try disabling the config option.

Do you see the packet in tcpdump in br0?
Comment 5 Jean-Marc Pigeon 2010-06-16 02:31:26 UTC
Yes,
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_DEBUG=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

Now,
cat /proc/sys/net/bridge/bridge-nf-call-iptables
0

Same x86-64 OK, i386 not
Comment 6 Jean-Marc Pigeon 2010-06-16 02:38:02 UTC
tcpdump on br0 shows (host side)
to i386
22:36:25.729891 IP X.Y.Z.T > 192.168.31.34: ICMP echo request, id 60696, seq 1, length 64

to x86-64
22:37:15.964035 IP X.Y.Z.T > 192.168.31.35: ICMP echo request, id 61720, seq 1, length 64
22:37:15.964077 IP 192.168.31.35 > X.Y.Z.T: ICMP echo reply, id 61720, seq 1, length 64
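
A small Python sketch (an illustration only, not something used in this report; `unanswered_requests` is a hypothetical helper) of how the capture above can be checked mechanically: it pairs each ICMP echo request with a reply carrying the same id and seq from the pinged host, and lists the requests that got no answer. The regular expression assumes the default one-line tcpdump format shown above.

```python
import re

# Matches the one-line ICMP echo format shown in the capture above.
ICMP_RE = re.compile(
    r"IP (\S+) > (\S+): ICMP echo (request|reply), id (\d+), seq (\d+)"
)


def unanswered_requests(lines):
    """Return (target, id, seq) for every echo request without a matching reply."""
    requests = []
    replies = set()
    for line in lines:
        match = ICMP_RE.search(line)
        if not match:
            continue
        src, dst, kind, ident, seq = match.groups()
        if kind == "request":
            # The destination of the request is the host expected to answer.
            requests.append((dst, int(ident), int(seq)))
        else:
            # The source of the reply is the host that answered.
            replies.add((src, int(ident), int(seq)))
    return [req for req in requests if req not in replies]


# The three lines captured on br0 above.
capture = [
    "22:36:25.729891 IP X.Y.Z.T > 192.168.31.34: ICMP echo request, id 60696, seq 1, length 64",
    "22:37:15.964035 IP X.Y.Z.T > 192.168.31.35: ICMP echo request, id 61720, seq 1, length 64",
    "22:37:15.964077 IP 192.168.31.35 > X.Y.Z.T: ICMP echo reply, id 61720, seq 1, length 64",
]

print(unanswered_requests(capture))
```

With the three lines above, only the request to the i386 container (192.168.31.34) is left unanswered.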
Comment 7 Patrick McHardy 2010-06-16 02:39:59 UTC
It might be corrupting the MAC header. What does tcpdump show using '-e'? Please also post the output of tcpdump on the underlying ethernet device.
Comment 8 Jean-Marc Pigeon 2010-06-16 02:55:47 UTC
NO packet received on the container side.

tcpdump -e on the host side.


22:52:19.100993 Out ea:13:7d:b6:4c:71 ethertype IPv4 (0x0800), length 100: X.Y.Z.T > 192.168.31.34: ICMP echo request, id 24345, seq 1, length 64
22:52:19.100998 Out ea:13:7d:b6:4c:71 ethertype IPv4 (0x0800), length 100: X.Y.Z.T > 192.168.31.34: ICMP echo request, id 24345, seq 1, length 64



22:52:35.005870  In 00:19:d1:a8:be:cd ethertype IPv4 (0x0800), length 100: X.Y.Z.T > 192.168.31.34: ICMP echo request, id 24857, seq 1, length 64
22:52:35.005891 Out ea:13:7d:b6:4c:71 ethertype IPv4 (0x0800), length 100: X.Y.Z.T > 192.168.31.34: ICMP echo request, id 24857, seq 1, length 64
22:52:35.005897 Out ea:13:7d:b6:4c:71 ethertype IPv4 (0x0800), length 100: X.Y.Z.T > 192.168.31.34: ICMP echo request, id 24857, seq 1, length 64


I do not know if this is meaningful:
for the first ping after the container boots I get two trace lines;

for the second ping, three trace lines.

The ping used was ping -c1 192.168.31.34
Comment 9 Jean-Marc Pigeon 2010-06-16 03:00:51 UTC
New kernel with config

CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_DEBUG=y
# CONFIG_NETFILTER_ADVANCED is not set

kernel=2.6.35-rc3-NONET-trace-1+

Same trouble.
Comment 10 Patrick McHardy 2010-06-16 03:01:55 UTC
I'm only seeing a single MAC address in the traces. Please attach a binary dump (-s0 -w dump) from both -i eth0 and -i br0 on the host side.
Comment 11 Patrick McHardy 2010-06-16 03:02:40 UTC
OK, now we've overlapped :) So it's not bridge netfilter related. I'm currently out of ideas, let me think about it.
Comment 12 Jean-Marc Pigeon 2010-06-16 03:10:39 UTC
command used was
/usr/sbin/tcpdump -n -i any -e icmp and host X.Y.Z.T

so all interfaces are "dumped" for icmp packets coming from X.Y.Z.T
Comment 13 Rafael J. Wysocki 2010-07-08 23:06:39 UTC
Handled-By : Patrick McHardy <kaber@trash.net>
Comment 14 Jean-Marc Pigeon 2010-07-10 20:10:45 UTC
Problem still present with 2.6.35-rc4

I have redone the test.
exact same container used with both 2.6.34 and 2.6.35-rc4


Please Note:  (hopefully this is meaningful)

there is a bug in 2.6.34 concerning containers and /sys

on 2.6.34 /sys is NOT namespace-aware, so within a container
cd /sys/class/net shows:

lrwxrwxrwx 1 root root 0 Jul 10 16:00 br0 -> ../../devices/virtual/net/br0
lrwxrwxrwx 1 root root 0 Jul 10 16:00 eth0 -> ../../devices/pci0000:00/0000:00:1c.5/0000:04:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Jul 10 16:00 lo -> ../../devices/virtual/net/lo
lrwxrwxrwx 1 root root 0 Jul 10 16:00 sit0 -> ../../devices/virtual/net/sit0
lrwxrwxrwx 1 root root 0 Jul 10 16:00 To_2595 -> ../../devices/virtual/net/To_2595
lrwxrwxrwx 1 root root 0 Jul 10 16:00 To_2662 -> ../../devices/virtual/net/To_2662
lrwxrwxrwx 1 root root 0 Jul 10 16:00 To_2749 -> ../../devices/virtual/net/To_2749

This means that within a container you can see the host network devices.

This bug is fixed starting with 2.6.35;
now /sys/class/net displays:
lrwxrwxrwx 1 root root 0 Jul 10 16:05 eth0 -> ../../devices/virtual/net/eth0
lrwxrwxrwx 1 root root 0 Jul 10 16:05 lo -> ../../devices/virtual/net/lo
lrwxrwxrwx 1 root root 0 Jul 10 16:06 sit0 -> ../../devices/virtual/net/sit0

this is perfect...

I wonder if the iptables problem could be related to the /sys improvement; if so,
why only in i386 mode?

I have done some test with a container using an RH8.0 template and the
Comment 15 Jean-Marc Pigeon 2010-07-10 20:15:06 UTC
Problem STILL PRESENT with 2.6.35-rc4

I have redone the test.
exact same containers used with both 2.6.34 and 2.6.35-rc4


Please Note:  (hopefully this is meaningful)

there is a bug in 2.6.34 concerning containers and /sys

on 2.6.34 /sys is NOT namespace-aware, so within a container
cd /sys/class/net shows:

lrwxrwxrwx 1 root root 0 Jul 10 16:00 br0 -> ../../devices/virtual/net/br0
lrwxrwxrwx 1 root root 0 Jul 10 16:00 eth0 -> ../../devices/pci0000:00/0000:00:1c.5/0000:04:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Jul 10 16:00 lo -> ../../devices/virtual/net/lo
lrwxrwxrwx 1 root root 0 Jul 10 16:00 sit0 -> ../../devices/virtual/net/sit0
lrwxrwxrwx 1 root root 0 Jul 10 16:00 To_2595 -> ../../devices/virtual/net/To_2595
lrwxrwxrwx 1 root root 0 Jul 10 16:00 To_2662 -> ../../devices/virtual/net/To_2662
lrwxrwxrwx 1 root root 0 Jul 10 16:00 To_2749 -> ../../devices/virtual/net/To_2749

This means that within a container you can see "host network devices".

This bug is fixed starting with 2.6.35;
now /sys/class/net displays only the container's own network devices.
lrwxrwxrwx 1 root root 0 Jul 10 16:05 eth0 -> ../../devices/virtual/net/eth0
lrwxrwxrwx 1 root root 0 Jul 10 16:05 lo -> ../../devices/virtual/net/lo
lrwxrwxrwx 1 root root 0 Jul 10 16:06 sit0 -> ../../devices/virtual/net/sit0

this is perfect...

I wonder if the iptables problem could be related to the /sys improvement; if so,
why only in i386 mode?

I have done some tests with a container using an (old) RH8.0 template and the
problem is the same.
Comment 16 Patrick McHardy 2010-07-23 12:20:45 UTC
Sorry for the delay. I'll try to reproduce locally.
Comment 17 Florian Mickler 2010-09-07 06:10:20 UTC
Does this issue still exist in current mainline kernels?
Comment 18 Chuck Ebbert 2010-09-14 22:06:09 UTC
I believe the fix is in 2.6.36-rc3:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=cca77b7c81876d819a5806f408b3c29b5b61a815

It's also in the queue for 2.6.35-stable.
Comment 19 Jean-Marc Pigeon 2010-10-02 23:47:11 UTC
I confirm the bug is fixed in 2.6.36-rc6.
Thanks
Comment 20 Florian Mickler 2010-10-03 10:08:05 UTC
Thank you for the confirmation!