Bug 17912 (bogusip) - "bridge-nf-call-iptables" causes bogus IP packets in connection with bridged VLAN (802.1Q)
Status: RESOLVED OBSOLETE
Alias: bogusip
Product: Networking
Classification: Unclassified
Component: Netfilter/Iptables
Hardware: All Linux
Importance: P1 high
Assignee: networking_netfilter-iptables@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-06 11:24 UTC by Stephan Bärwolf
Modified: 2012-11-05 14:25 UTC
CC List: 3 users

See Also:
Kernel Version: at least 2.6.33.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Stephan Bärwolf 2010-09-06 11:24:18 UTC
In at least the following test case, netfilter seems to produce bogus IP packets when handling tagged Ethernet frames (in REORDER_HDR style).
The bogus packets have the wrong size and content, and they only occur in the receiving direction (from the untagged network's point of view).

A normal Ethernet frame tagged with VLAN ID 1999 (carrying IP) looks like this:
XXXXXXXXXXXXYYYYYYYYYYYY810007cf0800...
The bogus ones look like this:
XXXXXXXXXXXXYYYYYYYYYYYY08000000000007cf0800...

Here XX..XX and YY..YY are the destination and source MAC addresses.
In the bogus packets, the VLAN EtherType (0x8100) is overwritten with 0x0800, and 4 zero bytes are inserted, followed by the rest of the frame (VLAN TCI 0x07cf, inner EtherType 0x0800, ...).
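The following small C sketch makes the two byte layouts above concrete. It classifies the header bytes that follow the two MAC addresses. It assumes exactly the patterns quoted in this report, and it also confirms that TCI 0x07cf carries VLAN ID 1999.

The MAC addresses in the test frames are made-up placeholders. The helpers `be16_at`, `hex_to_bytes` and `classify` are hypothetical illustration code, not kernel functions. The `ETH_P_*` macros are local copies of the values from `<linux/if_ether.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ETH_ALEN    6
#define ETH_P_IP    0x0800 /* same value as in <linux/if_ether.h> */
#define ETH_P_8021Q 0x8100 /* same value as in <linux/if_ether.h> */

enum frame_kind { FRAME_VLAN_OK, FRAME_BOGUS, FRAME_OTHER };

/* Read a big-endian (network order) 16-bit field at byte offset off. */
static uint16_t be16_at(const uint8_t *f, size_t off)
{
    return (uint16_t)((f[off] << 8) | f[off + 1]);
}

/* Convert a hex string such as "810007cf" into bytes.
 * Returns the number of bytes written, or -1 on malformed input. */
static int hex_to_bytes(const char *hex, uint8_t *out, size_t cap)
{
    size_t n = strlen(hex);
    if (n % 2 != 0 || n / 2 > cap)
        return -1;
    for (size_t i = 0; i < n / 2; i++) {
        unsigned byte;
        if (sscanf(hex + 2 * i, "%2x", &byte) != 1)
            return -1;
        out[i] = (uint8_t)byte;
    }
    return (int)(n / 2);
}

/* Classify the bytes after dst MAC + src MAC; on a match, *vid receives
 * the 12-bit VLAN ID taken from the TCI. */
static enum frame_kind classify(const uint8_t *f, size_t len, unsigned *vid)
{
    size_t off = 2 * ETH_ALEN; /* TPID/EtherType follows the two MACs */
    assert(f != NULL && vid != NULL);
    if (len < off + 4)
        return FRAME_OTHER;
    /* Normal 802.1Q: 0x8100, TCI, inner EtherType. */
    if (be16_at(f, off) == ETH_P_8021Q && len >= off + 6) {
        *vid = be16_at(f, off + 2) & 0x0fff;
        return FRAME_VLAN_OK;
    }
    /* Bogus pattern from the report: 0x0800, 4 zero bytes, TCI, inner type. */
    if (be16_at(f, off) == ETH_P_IP && len >= off + 10 &&
        be16_at(f, off + 2) == 0 && be16_at(f, off + 4) == 0) {
        *vid = be16_at(f, off + 6) & 0x0fff;
        return FRAME_BOGUS;
    }
    return FRAME_OTHER;
}
```

Run on the two dumps with placeholder MACs, the 18-byte normal header classifies as `FRAME_VLAN_OK` and the 22-byte bogus header as `FRAME_BOGUS`. Both yield VLAN ID 1999.

The bogus check relies on one fact about IPv4: a genuine header can never begin with a zero byte, because its version nibble is 4. So 0x0800 followed by zeros cannot be a valid untagged IPv4 frame.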

The test scenario consists of 2 bridges, 3 physical NICs, 1 TUN/TAP device and one VLAN (ID 1999):

    1) br0 consists of eth0, eth2 (yes: eth-two), and tap0

    2) vlan1999 is a VLAN interface on top of br0

    3) br1 consists of eth1 and vlan1999

If eth1 now sends a normal, untagged IP packet (to or behind tap0), you can capture the bogus packet described above on tap0.
If tap0 sends a packet tagged with VLAN ID 1999 (to or behind eth1), the captured packets (on eth1 as well as on tap0) are fine.
Traffic in both directions to and from "br1" works fine, too.

If you disable "/proc/sys/net/bridge/bridge-nf-call-iptables" (default: enabled) or ENABLE "/proc/sys/net/bridge/bridge-nf-filter-vlan-tagged" (default: disabled), no more bogus packets are produced.
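For scripted checks of the two switches above, a small C helper can read the procfs files named in this report. `read_bool_sysctl` and `config_may_corrupt` are hypothetical helpers. The second one only encodes the reporter's observation (bogus packets only with call-iptables enabled and filter-vlan-tagged disabled); it is not a verified kernel rule.

```c
#include <assert.h>
#include <stdio.h>

/* procfs paths taken from the report */
#define NF_CALL_IPTABLES "/proc/sys/net/bridge/bridge-nf-call-iptables"
#define NF_FILTER_VLAN   "/proc/sys/net/bridge/bridge-nf-filter-vlan-tagged"

/* Read an integer sysctl value from a procfs-style file.
 * Returns the value, or -1 if the file is missing or unparsable. */
static int read_bool_sysctl(const char *path)
{
    FILE *f;
    int v;
    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Encodes the reported trigger condition: iptables is called for bridged
 * traffic AND VLAN-tagged frames are not filtered. Returns 1 or 0. */
static int config_may_corrupt(int call_iptables, int filter_vlan_tagged)
{
    return call_iptables == 1 && filter_vlan_tagged == 0;
}
```

With bridge netfilter support loaded and the defaults described above, `config_may_corrupt(read_bool_sysctl(NF_CALL_IPTABLES), read_bool_sysctl(NF_FILTER_VLAN))` would return 1. Writing `0` to the first file (for example with `sysctl -w net.bridge.bridge-nf-call-iptables=0`) corresponds to the first workaround.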


I have already debugged the situation a little, and I suspect the skb is pushed one time too often in "br_netfilter.c". In this case the extra push is 4 bytes (because of the VLAN header) instead of 0. Maybe skb->protocol is sometimes incorrect as well (ETH_P_IP instead of ETH_P_8021Q).
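The suspected mechanism, one push too many, can be illustrated with a hypothetical userspace model of the skb data-pointer arithmetic. `struct model_skb`, `model_init`, `model_push` and `model_pull` are illustrative stand-ins, not the kernel's `sk_buff`, `skb_push()` and `skb_pull()`.

The model only shows that an unbalanced push of VLAN_HLEN makes the frame 4 bytes longer. That matches the size difference between the two hex dumps above. It does not model where the extra bytes end up inside the frame.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VLAN_HLEN 4 /* size of an 802.1Q tag, as in the kernel's if_vlan.h */

/* Hypothetical model: a buffer with headroom plus a data offset and length.
 * NOT the kernel's sk_buff; only the push/pull bookkeeping is modelled. */
struct model_skb {
    unsigned char buf[128];
    size_t data; /* offset of the first valid byte */
    size_t len;  /* number of valid bytes */
};

static void model_init(struct model_skb *s, size_t headroom,
                       const unsigned char *frame, size_t len)
{
    assert(headroom + len <= sizeof s->buf);
    memset(s->buf, 0, sizeof s->buf);
    memcpy(s->buf + headroom, frame, len);
    s->data = headroom;
    s->len = len;
}

/* Drop n bytes from the front (e.g. strip a header). */
static void model_pull(struct model_skb *s, size_t n)
{
    assert(n <= s->len);
    s->data += n;
    s->len -= n;
}

/* Reclaim n bytes of headroom at the front (e.g. restore a header). */
static void model_push(struct model_skb *s, size_t n)
{
    assert(n <= s->data);
    s->data -= n;
    s->len += n;
}
```

A balanced `model_pull(&s, VLAN_HLEN)` followed by `model_push(&s, VLAN_HLEN)` leaves the length unchanged. One additional `model_push(&s, VLAN_HLEN)` grows an 18-byte header to 22 bytes, the same growth seen in the bogus frames.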

My stack debugging:
===================
In "br_nf_pre_routing", the packet is OK before "NF_HOOK(PF_INET,..." and corrupted after it.
"NF_HOOK(PF_INET,..." may call "br_nf_pre_routing_finish", where the packet is still OK before "NF_HOOK_THRESH(PF_BRIDGE,...)" is called.
"NF_HOOK_THRESH(PF_BRIDGE,...)" may call "br_nf_forward_finish".
There, too, the packet is still OK before its "NF_HOOK_THRESH(PF_BRIDGE,...)".
Through this last "NF_HOOK_THRESH(PF_BRIDGE,...)", "br_nf_dev_queue_xmit" is called, and this is where I was unable to debug any further...

In "br_nf_dev_queue_xmit", the call "return br_dev_queue_push_xmit(skb);" seems to destroy the packet (it is OK before the call and corrupted after it).


Maybe we could work together - please don't hesitate to write me.
Thanks.


Stephan Bärwolf, stephan.baerwolf@tu-ilmenau.de
Comment 1 Patrick McHardy 2010-09-16 09:25:42 UTC
(In reply to comment #0)
> The testszenario consits of 2 bridges, 3 real phys, 1 TUNTAP and one vlan
> (id=1999):
> 
>     1) br0 consists of eth0, eth2 (yes: eth-two), and tap0
> 
>     2) vlan1999 is backended by br0
> 
>     3) br1 consists of eth1 and vlan1999
> 
> If now eth1 sends an normal, untagged IP-packet (to/behind tap0), then you
> can catch the bogus packet in the described manner on tap0.

How are the two bridges connected? Are you routing locally between the bridges and/or are you using NAT?
Comment 2 Stephan Bärwolf 2010-09-16 10:44:49 UTC
Hi Mr. McHardy,

on the machine with the bridges, IP forwarding is disabled.
NAT (NAPT) and IP filtering are disabled, too.

The bridges br0 and br1 only interact passively, because vlan1999 (a member of br1) sits on top of br0. The two bridge interfaces are in different IP subnets.

I haven't yet found an older kernel version without this problem.

Stephan Bärwolf, stephan.baerwolf@tu-ilmenau.de
Comment 3 Alan 2012-11-05 14:25:19 UTC
Closing as obsolete. If this is still seen on modern kernels (3.2+) please
update/reopen and also report this to netdev@vger.kernel.org
