Bug 12327 - Intermittent TCP issues with => 2.6.27
Summary: Intermittent TCP issues with => 2.6.27
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-29 18:52 UTC by Dean Holland
Modified: 2012-05-24 13:43 UTC

See Also:
Kernel Version: 2.6.27
Subsystem:
Regression: Yes
Bisected commit-id:


Description Dean Holland 2008-12-29 18:52:39 UTC
Latest working kernel version: 2.6.26.8
Earliest failing kernel version: 2.6.27
Distribution: Ubuntu
Hardware Environment: amd64, KVM
Software Environment:
Problem Description:

As reported in LP #296767 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/296767) I am experiencing intermittent TCP issues over a PPP ADSL2+ connection with the only change being an upgrade to 2.6.27.

A number of websites, ping, traceroute work correctly but I simply can't connect to several including:

- store.apple.com
- youtube.com
- ANZ internet banking (anz.com.au)
- MSN messenger

I have also tried compiling a generic 2.6.28-rc4 kernel and this still suffers from the same issue; however, if I reboot into the previous Ubuntu kernel (2.6.24) or a vanilla 2.6.26 kernel, the issue disappears.

Steps to reproduce:

1. Use a KVM guest as a gateway to a PPP internet connection
2. Boot with kernel <= 2.6.26
3. Observe functioning networking
4. Boot into 2.6.27+
5. Observe broken networking
Comment 1 Anonymous Emailer 2008-12-29 21:41:09 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 29 Dec 2008 18:52:40 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12327
> 
>            Summary: Intermittent TCP issues with => 2.6.27
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.27
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: speedster@haveacry.com
> 
> 
> Latest working kernel version: 2.6.26.8
> Earliest failing kernel version: 2.6.27
> Distribution: Ubuntu
> Hardware Environment: amd64, KVM
> Software Environment:
> Problem Description:
> 
> As reported in LP #296767
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/296767) I am
> experiencing
> intermittent TCP issues over a PPP ADSL2+ connection with the only change
> being
> an upgrade to 2.6.27.
> 
> A number of websites, ping, traceroute work correctly but I simply can't
> connect to several including:
> 
> - store.apple.com
> - youtube.com
> - ANZ internet banking (anz.com.au)
> - MSN messenger
> 
> I have also tried compiling a generic 2.6.28-rc4 kernel and this still
> suffers
> from the same issue, however if I reboot into the previous Ubuntu kernel
> (2.6.24) or a vanilla 2.6.26 kernel the issue disappears.
> 
> Steps to reproduce:
> 
> 1. Use a KVM guest as a gateway to a PPP internet connection
> 2. Boot with kernel <= 2.6.26
> 3. Observe functioning networking
> 4. Boot into 2.6.27+
> 5. Observe broken networking
Comment 2 Ilpo Järvinen 2008-12-31 12:32:37 UTC
On Mon, 29 Dec 2008, Andrew Morton wrote:

> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Mon, 29 Dec 2008 18:52:40 -0800 (PST) bugme-daemon@bugzilla.kernel.org
> wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=12327
> > 
> >            Summary: Intermittent TCP issues with => 2.6.27
> >            Product: Networking
> >            Version: 2.5
> >      KernelVersion: 2.6.27
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: IPV4
> >         AssignedTo: shemminger@linux-foundation.org
> >         ReportedBy: speedster@haveacry.com
> > 
> > 
> > Latest working kernel version: 2.6.26.8
> > Earliest failing kernel version: 2.6.27
> > Distribution: Ubuntu
> > Hardware Environment: amd64, KVM
> > Software Environment:
> > Problem Description:
> > 
> > As reported in LP #296767
> > (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/296767) I am experiencing
> > intermittent TCP issues over a PPP ADSL2+ connection with the only change being
> > an upgrade to 2.6.27.
> > 
> > A number of websites, ping, traceroute work correctly but I simply can't
> > connect to several including:
> > 
> > - store.apple.com
> > - youtube.com
> > - ANZ internet banking (anz.com.au)
> > - MSN messenger
> >
> > I have also tried compiling a generic 2.6.28-rc4 kernel and this still suffers
> > from the same issue, however if I reboot into the previous Ubuntu kernel
> > (2.6.24) or a vanilla 2.6.26 kernel the issue disappears.
> > 
> > Steps to reproduce:
> > 
> > 1. Use a KVM guest as a gateway to a PPP internet connection
> > 2. Boot with kernel <= 2.6.26
> > 3. Observe functioning networking
> > 4. Boot into 2.6.27+
> > 5. Observe broken networking

Can you please describe the full topology (which is connected to where and 
using what, and locations of nats, tun/taps, netfilter things, etc.)... 
There's some contradiction between the ubuntu report description and what 
you're giving here. 

Based on your dumps I find it unlikely that the problem would be in the 
end host tcp but I'll verify the packets field by field still to be 
absolutely sure. I'd guess that either the sent packet or reply gets 
lost somewhere since it never arrives with 2.6.27/2.6.28-rcx.
Comment 3 Dean Holland 2008-12-31 15:22:38 UTC
Ilpo J
Comment 4 Herbert Xu 2009-01-02 00:39:03 UTC
On Wed, Dec 31, 2008 at 11:22:23PM +0000, Speedster wrote:
>
> The gateway machine (whinge) runs as a KVM guest, and shares a physical  
> host with three other guests (one Windows, two Linux). Below are the  
> outputs of bridge topology and VLAN tagging on the physical host.

What driver are you using on whinge?

When you took the packet dumps, was it on whinge or the physical
host? If it was taken on whinge, can you try to take the same
dumps on the outgoing physical interface on the host?

Thanks,
Comment 5 Dean Holland 2009-01-05 03:20:31 UTC
Herbert Xu wrote:
> On Wed, Dec 31, 2008 at 11:22:23PM +0000, Speedster wrote:
>> The gateway machine (whinge) runs as a KVM guest, and shares a physical  
>> host with three other guests (one Windows, two Linux). Below are the  
>> outputs of bridge topology and VLAN tagging on the physical host.
> 
> What driver are you using on whinge?
> 
> When you took the packet dumps, was it on whinge or the physical
> host? If it was taken on whinge, can you try to take the same
> dumps on the outgoing physical interface on the host?
> 
> Thanks,

I have tried both virtio and the default (8139too). The original dumps 
were taken on whinge, I have attached two new dumps showing the 
functional (2.6.24) and non-functional (2.6.27) states from the host.

Cheers
Dean
Comment 6 Ilpo Järvinen 2009-01-06 11:11:05 UTC
On Mon, 5 Jan 2009, Speedster wrote:

> Herbert Xu wrote:
> > On Wed, Dec 31, 2008 at 11:22:23PM +0000, Speedster wrote:
> > > The gateway machine (whinge) runs as a KVM guest, and shares a physical
> > > host with three other guests (one Windows, two Linux). Below are the
> > > outputs of bridge topology and VLAN tagging on the physical host.
> > 
> > What driver are you using on whinge?
> > 
> > When you took the packet dumps, was it on whinge or the physical
> > host? If it was taken on whinge, can you try to take the same
> > dumps on the outgoing physical interface on the host?
> > 
> > Thanks,
> 
> I have tried both virtio and the default (8139too).

And what is the actual eth hw on the host?

> The original dumps were taken on whinge, I have attached two new dumps 
> showing the functional (2.6.24) and non-functional (2.6.27) states from 
> the host.

At least to me those didn't reveal anything new :-(.
Comment 7 Dean Holland 2009-01-06 13:20:40 UTC
Ilpo J
Comment 8 Herbert Xu 2009-01-06 20:17:56 UTC
On Mon, Jan 05, 2009 at 08:19:46PM +0900, Speedster wrote:
>
> I have tried both virtio and the default (8139too). The original dumps  
> were taken on whinge, I have attached two new dumps showing the  
> functional (2.6.24) and non-functional (2.6.27) states from the host.

The MAC on the PPPoE packets appears bogus:

22:07:58.344027 00:16:3e:24:f7:a1 > 00:18:18:54:19:54

Can you also attach a dump for connections that do work under
2.6.27? That should tell us whether this MAC is genuine.

If it is bogus, can you dump on the other interfaces to see
who is corrupting it?

Cheers,
Comment 9 Dean Holland 2009-01-07 05:50:33 UTC
Herbert Xu wrote:
> Can you also attach a dump for connections that do work under
> 2.6.27? That should tell us whether this MAC is genuine.

Done. I also realised my wireshark display filter was dst_host rather 
than simply host - it looks as though replies _do_ come back but for 
some reason are ignored? My apologies.

The 2.6.27 dump contains a connection to a local file mirror 
(203.21.20.200) and an attempted connection to store.apple.com 
(17.149.156.10).

> If it is bogus, can you dump on the other interfaces to see
> who is corrupting it?
> 

I believe the differing MAC addresses on the PPP remote endpoint are due 
to my ISP having multiple layer 2 termination routers for 
load-balancing/redundancy and not a symptom of the issue.

Cheers,
Dean
Comment 10 Herbert Xu 2009-01-07 19:08:14 UTC
On Wed, Jan 07, 2009 at 10:49:55PM +0900, Speedster wrote:
>
> Done. I also realised my wireshark display filter was dst_host rather  
> than simply host - it looks as though replies _do_ come back but for  
> some reason are ignored? My apologies.

Great I think we're getting closer.  With your latest dumps it'd
appear that there is an odd difference between the replies that
get through and the ones that don't.  The ones that're somehow
dropped have 2 bytes of transport padding at the end.  I suspect
there's something buggy in your system that's dropping it because
of this.

Can you please take a look at /proc/net/snmp on the host and the
guest to see if IP InDiscards is non-zero?
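
A minimal sketch of that check, assuming the usual /proc/net/snmp layout in which each protocol has a line of counter names followed by a line of values. The helper names and the sample numbers are made up for illustration; on a real system the text would come from /proc/net/snmp itself.

```python
from pathlib import Path


def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    tables = {}
    # Rows come in pairs: counter names, then values, sharing a "Proto:" prefix.
    for names, values in zip(rows[0::2], rows[1::2]):
        if names[0] != values[0]:
            raise ValueError(f"mismatched rows: {names[0]} vs {values[0]}")
        proto = names[0].rstrip(":")
        tables[proto] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return tables


def ip_in_discards(text):
    """Return the Ip InDiscards counter from /proc/net/snmp-style text."""
    return parse_snmp(text)["Ip"]["InDiscards"]


# Made-up sample with only a few columns, used when /proc is unavailable.
SAMPLE = """\
Ip: Forwarding DefaultTTL InReceives InHdrErrors InDiscards InDelivers
Ip: 1 64 1200 0 0 1180
Icmp: InMsgs InErrors
Icmp: 12 0
"""

if __name__ == "__main__":
    path = Path("/proc/net/snmp")
    text = path.read_text() if path.exists() else SAMPLE
    print("Ip InDiscards:", ip_in_discards(text))
```

Running it once on the host and once in the guest, before and after a failed connection attempt, would show whether the counter moves.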

Also now that we know the problem is definitely in the host/guest
please take another set of dumps on the interfaces leading to and
within the guest to see exactly which path of the system is dropping
the reply.

Thanks,
Comment 11 Ilpo Järvinen 2009-01-08 05:14:03 UTC
On Thu, 8 Jan 2009, Herbert Xu wrote:

> On Wed, Jan 07, 2009 at 10:49:55PM +0900, Speedster wrote:
> >
> > Done. I also realised my wireshark display filter was dst_host rather  
> > than simply host - it looks as though replies _do_ come back but for  
> > some reason are ignored? My apologies.
> 
> Great I think we're getting closer.  With your latest dumps it'd
> appear that there is an odd difference between the replies that
> get through and the ones that don't.  The ones that're somehow
> dropped have 2 bytes of transport padding at the end.  I suspect
> there's something buggy in your system that's dropping it because
> of this.
> 
> Can you please take a look at /proc/net/snmp on the host and the
> guest to see if IP InDiscards is non-zero?
> 
> Also now that we know the problem is definitely in the host/guest
> please take another set of dumps on the interfaces leading to and
> within the guest to see exactly which path of the system is dropping
> the reply.

Another possibility that comes to mind is that on the TCP side the seqnos 
get messed up because skb->len has that padding counted in (end_seq 
depends on skb->len), and something doesn't like encountering such 
a synack.
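
To make the arithmetic behind this hypothesis concrete, here is a small sketch. It assumes end_seq is computed as seq + syn + fin + (skb->len - doff * 4), and the sequence number and header sizes are made-up values. It only illustrates the suspicion; it does not show that the padding actually reaches TCP.

```python
def end_seq(seq, syn, fin, skb_len, doff):
    """Sequence number just past this segment, modulo 2**32.

    skb_len is the length seen at the TCP layer (header plus payload) and
    doff is the TCP data offset in 32-bit words.
    """
    payload_len = skb_len - doff * 4
    return (seq + syn + fin + payload_len) % (1 << 32)


# A SYN-ACK with a 24-byte TCP header (doff = 6) and no payload.
clean = end_seq(seq=1000, syn=1, fin=0, skb_len=24, doff=6)
# The same SYN-ACK if two trailing pad bytes were still counted in skb_len.
padded = end_seq(seq=1000, syn=1, fin=0, skb_len=26, doff=6)

print("end_seq without padding:", clean)
print("end_seq with 2 pad bytes:", padded)
```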
Comment 12 Dean Holland 2009-01-08 07:05:24 UTC
Herbert Xu wrote:
> 
> Can you please take a look at /proc/net/snmp on the host and the
> guest to see if IP InDiscards is non-zero?

Both are 0

> Also now that we know the problem is definitely in the host/guest
> please take another set of dumps on the interfaces leading to and
> within the guest to see exactly which path of the system is dropping
> the reply.

Attached (all are the exact same attempted connection), and reveal some 
interesting information.

The path the inbound traffic should take is
1. vlan50 (host)
2. tap interface vnet3 (host) / eth0 (guest)
3. ppp0 (guest)

It looks as though when it is sent out the tap interface the payload 
length is incorrect in the PPPoE section of the frame. When it arrives 
via vlan50 it appears fine. Or at least that's what wireshark highlights 
for me :)

Dean
Comment 13 Anonymous Emailer 2009-01-08 08:38:32 UTC
Reply-To: shemminger@vyatta.com

On Fri, 09 Jan 2009 00:04:42 +0900
Speedster <speedster@haveacry.com> wrote:

> Herbert Xu wrote:
> > 
> > Can you please take a look at /proc/net/snmp on the host and the
> > guest to see if IP InDiscards is non-zero?
> 
> Both are 0
> 
> > Also now that we know the problem is definitely in the host/guest
> > please take another set of dumps on the interfaces leading to and
> > within the guest to see exactly which path of the system is dropping
> > the reply.
> 
> Attached (all are the exact same attempted connection), and reveal some 
> interesting information.
> 
> The path the inbound traffic should take is
> 1. vlan50 (host)
> 2. tap interface vnet3 (host) / eth0 (guest)
> 3. ppp0 (guest)
> 
> It looks as though when it is sent out the tap interface the payload 
> length is incorrect in the PPPoE section of the frame. When it arrives 
> via vlan50 it appears fine. Or at least that's what wireshark highlights 
> for me :)
> 
> Dean

Maybe there is an issue that GRO receive isn't handling padding
properly?
Comment 14 Ilpo Järvinen 2009-01-08 11:40:09 UTC
On Thu, 8 Jan 2009, Stephen Hemminger wrote:

> On Fri, 09 Jan 2009 00:04:42 +0900
> Speedster <speedster@haveacry.com> wrote:
> 
> > Herbert Xu wrote:
> > > 
> > > Can you please take a look at /proc/net/snmp on the host and the
> > > guest to see if IP InDiscards is non-zero?
> > 
> > Both are 0
> > 
> > > Also now that we know the problem is definitely in the host/guest
> > > please take another set of dumps on the interfaces leading to and
> > > within the guest to see exactly which path of the system is dropping
> > > the reply.
> > 
> > Attached (all are the exact same attempted connection), and reveal some 
> > interesting information.
> > 
> > The path the inbound traffic should take is
> > 1. vlan50 (host)
> > 2. tap interface vnet3 (host) / eth0 (guest)
> > 3. ppp0 (guest)
> > 
> > It looks as though when it is sent out the tap interface the payload 
> > length is incorrect in the PPPoE section of the frame. When it arrives 
> > via vlan50 it appears fine. Or at least that's what wireshark highlights 
> > for me :)
> 
> Maybe there is an issue that GRO receive isn't handling padding
> properly?

Hmm, is GRO supposed to have something to do with 2.6.27? Or are you 
talking about something other than Herbert's recent GRO stuff?
Comment 15 Anonymous Emailer 2009-01-08 11:55:30 UTC
Reply-To: shemminger@vyatta.com

On Thu, 8 Jan 2009 21:39:30 +0200 (EET)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> On Thu, 8 Jan 2009, Stephen Hemminger wrote:
> 
> > On Fri, 09 Jan 2009 00:04:42 +0900
> > Speedster <speedster@haveacry.com> wrote:
> > 
> > > Herbert Xu wrote:
> > > > 
> > > > Can you please take a look at /proc/net/snmp on the host and the
> > > > guest to see if IP InDiscards is non-zero?
> > > 
> > > Both are 0
> > > 
> > > > Also now that we know the problem is definitely in the host/guest
> > > > please take another set of dumps on the interfaces leading to and
> > > > within the guest to see exactly which path of the system is dropping
> > > > the reply.
> > > 
> > > Attached (all are the exact same attempted connection), and reveal some 
> > > interesting information.
> > > 
> > > The path the inbound traffic should take is
> > > 1. vlan50 (host)
> > > 2. tap interface vnet3 (host) / eth0 (guest)
> > > 3. ppp0 (guest)
> > > 
> > > It looks as though when it is sent out the tap interface the payload 
> > > length is incorrect in the PPPoE section of the frame. When it arrives 
> > > via vlan50 it appears fine. Or at least that's what wireshark highlights 
> > > for me :)
> > 
> > Maybe there is an issue that GRO receive isn't handling padding
> > properly?
> 
> Hmm, is gro supposed to have something to do with 2.6.27??? Or are you 
> talking something else than Herbert's recent GRO stuff?

I'm trying to find a common thread for why splice and TCP are having
issues in some cases.
Comment 16 Herbert Xu 2009-01-08 13:54:57 UTC
On Thu, Jan 08, 2009 at 08:37:53AM -0800, Stephen Hemminger wrote:
>
> Maybe there is an issue that GRO receive isn't handling padding
> properly?

You are one release early Stephen :) Also GRO can't be enabled
without ethtool yet.

Cheers,
Comment 17 Anonymous Emailer 2009-01-08 15:58:45 UTC
Reply-To: john.dykstra1@gmail.com

On Fri, 2009-01-09 at 00:04 +0900, Speedster wrote:

> Attached (all are the exact same attempted connection), and reveal some
> interesting information.
>
> The path the inbound traffic should take is
> 1. vlan50 (host)
> 2. tap interface vnet3 (host) / eth0 (guest)
> 3. ppp0 (guest)
>
> It looks as though when it is sent out the tap interface the payload
> length is incorrect in the PPPoE section of the frame. When it arrives
> via vlan50 it appears fine. Or at least that's what wireshark highlights
> for me :)

The length field in the PPPoE header doesn't change.

The packet arrives from the ISP with its PPPoE header length field showing
two extra bytes past the IP payload.  The bridge or something close to it
trims those two bytes from the sk_buff, leaving a packet whose PPPoE header
is wrong, but whose upper headers make sense.  The PPPoE driver in the
virtual machine then drops the packet.
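
A rough sketch of that sequence, under stated assumptions: the frame below is a PPPoE session header followed by the PPP protocol field and a minimal IPv4 packet, the addresses and session ID are invented, and `pppoe_length_ok` is a hypothetical stand-in for whatever length check the PPPoE receive path performs (the exact check has not been identified here).

```python
import struct

PPPOE_HDR = struct.Struct("!BBHH")  # ver/type, code, session id, payload length
PPP_IPV4 = 0x0021                   # PPP protocol number for IPv4


def build_ipv4_packet(payload):
    """Minimal IPv4 packet: 20-byte header (checksum left at 0) plus payload."""
    total_len = 20 + len(payload)
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0, 64, 6, 0,
                         bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
    return header + payload


def build_pppoe_frame(ip_packet, padding=b""):
    """PPPoE session frame whose length field covers PPP proto + IP + padding."""
    body = struct.pack("!H", PPP_IPV4) + ip_packet + padding
    return PPPOE_HDR.pack(0x11, 0x00, 0x0001, len(body)) + body


def trim_to_ip_length(frame):
    """Cut the frame at the IP total length, leaving the PPPoE header as is."""
    ip_offset = PPPOE_HDR.size + 2
    (ip_total_len,) = struct.unpack_from("!H", frame, ip_offset + 2)
    return frame[:ip_offset + ip_total_len]


def pppoe_length_ok(frame):
    """Hypothetical receiver check: declared PPPoE length must fit the frame."""
    declared = PPPOE_HDR.unpack_from(frame)[3]
    return len(frame) - PPPOE_HDR.size >= declared


ip = build_ipv4_packet(b"x" * 10)
from_isp = build_pppoe_frame(ip, padding=b"\x00\x00")  # two extra bytes
after_bridge = trim_to_ip_length(from_isp)

print("frame from ISP passes length check:", pppoe_length_ok(from_isp))
print("trimmed frame passes length check: ", pppoe_length_ok(after_bridge))
```

A frame without the two padding bytes is unaffected by the trim, which would fit the observation that only some sites fail.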

I haven't unscrambled where the drop occurs in the PPPoE driver, but that's
presumably what changed in 2.6.27.  It seems to me that the drop is proper.
What's wrong is trimming the sk_buff to match the IP header while ignoring
the L2 header, and that's apparently been that way for a while (if I
understand what the reporter is running where).

Or have I screwed up my first posting to netdev?

  --  John
Comment 18 Herbert Xu 2009-01-08 19:14:33 UTC
On Thu, Jan 08, 2009 at 05:58:11PM -0600, John Dykstra wrote:
> 
> I haven't unscrambled where the drop occurs in the PPPoE driver, but that's
> presumably what changed in 2.6.27.  It seems to me that the drop is proper.
> What's wrong is trimming the sk_buff to match the IP header while ignoring
> the L2 header, and that's apparently been that way for a while (if I
> understand what the reporter is running where).
> 
> Or have I screwed up my first posting to netdev?

You've hit the nail on the head :)

The bridge netfilter is just one huge pile of crap that should
be deleted.

The least we should do is delete the VLAN/PPPOE parts of it because
it's simply bogus.  What if two VLANs/PPPOE sessions use the same
IP address pairs? They'll be treated as a single flow which is just
insane.
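
As an illustration of the flow-collision concern, here is a toy sketch. The `Packet` tuple, the key functions, and the addresses are all hypothetical; it is not how connection tracking is implemented, only a model of what happens when the L2 tag is left out of the flow key.

```python
from collections import namedtuple

# l2_tag stands for a VLAN ID or PPPoE session ID (hypothetical model).
Packet = namedtuple("Packet", "l2_tag src dst sport dport proto")


def ip_only_key(pkt):
    """Flow key that ignores the L2 tag."""
    return (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)


def l2_aware_key(pkt):
    """Flow key that keeps VLANs / PPPoE sessions apart."""
    return (pkt.l2_tag,) + ip_only_key(pkt)


def count_flows(packets, key):
    """Number of distinct flows under the given key function."""
    return len({key(p) for p in packets})


# Two unrelated networks on different VLANs that happen to reuse the
# same private addresses and ports.
packets = [
    Packet(10, "192.168.0.2", "192.168.0.1", 40000, 80, "tcp"),
    Packet(20, "192.168.0.2", "192.168.0.1", 40000, 80, "tcp"),
]

print("flows seen when the tag is ignored:", count_flows(packets, ip_only_key))
print("flows seen when the tag is kept:   ", count_flows(packets, l2_aware_key))
```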

Cheers,
Comment 19 Herbert Xu 2009-01-09 03:56:05 UTC
Hi:

It turns out that even though we have sysctls that are supposed
to control pppoe/vlan processing, they don't actually work.

This patch should make them work.

bridge: Fix handling of non-IP packets in FORWARD/POST_ROUTING

Currently the bridge FORWARD/POST_ROUTING chains treat all
non-IPv4 packets as IPv6.  This patch fixes that by returning
NF_ACCEPT on non-IP packets instead, just as is done in PRE_ROUTING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index a65e43a..9a1cd75 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -686,8 +686,11 @@ static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb,
 	if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb) ||
 	    IS_PPPOE_IP(skb))
 		pf = PF_INET;
-	else
+	else if (skb->protocol == htons(ETH_P_IPV6) || IS_VLAN_IPV6(skb) ||
+		 IS_PPPOE_IPV6(skb))
 		pf = PF_INET6;
+	else
+		return NF_ACCEPT;
 
 	nf_bridge_pull_encap_header(skb);
 
@@ -828,8 +831,11 @@ static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff *skb,
 	if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb) ||
 	    IS_PPPOE_IP(skb))
 		pf = PF_INET;
-	else
+	else if (skb->protocol == htons(ETH_P_IPV6) || IS_VLAN_IPV6(skb) ||
+		 IS_PPPOE_IPV6(skb))
 		pf = PF_INET6;
+	else
+		return NF_ACCEPT;
 
 #ifdef CONFIG_NETFILTER_DEBUG
 	if (skb->dst == NULL) {

Cheers,
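
The effect of the new branch can be modelled in a few lines. This is a simplified sketch, not the kernel code: it assumes the IS_PPPOE_IP / IS_PPPOE_IPV6 tests only match when the pppoe sysctl is enabled, it leaves out the VLAN variants, and the function names are invented.

```python
ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
ETH_P_ARP = 0x0806
ETH_P_PPP_SES = 0x8864
PPP_IP = 0x0021
PPP_IPV6 = 0x0057


def is_pppoe(ethertype, ppp_proto, wanted, filter_pppoe_tagged):
    """Model of the PPPoE tests: they only match when the sysctl is on."""
    return (ethertype == ETH_P_PPP_SES and ppp_proto == wanted
            and filter_pppoe_tagged)


def classify_before(ethertype, ppp_proto, filter_pppoe_tagged):
    """Old behaviour: anything that is not IPv4 is handled as IPv6."""
    if ethertype == ETH_P_IP or is_pppoe(ethertype, ppp_proto, PPP_IP,
                                         filter_pppoe_tagged):
        return "PF_INET"
    return "PF_INET6"


def classify_after(ethertype, ppp_proto, filter_pppoe_tagged):
    """Patched behaviour: non-IP packets are accepted untouched."""
    if ethertype == ETH_P_IP or is_pppoe(ethertype, ppp_proto, PPP_IP,
                                         filter_pppoe_tagged):
        return "PF_INET"
    if ethertype == ETH_P_IPV6 or is_pppoe(ethertype, ppp_proto, PPP_IPV6,
                                           filter_pppoe_tagged):
        return "PF_INET6"
    return "NF_ACCEPT"


# A PPPoE session frame carrying IPv4, with the pppoe sysctl turned off.
print("before:", classify_before(ETH_P_PPP_SES, PPP_IP, False))
print("after: ", classify_after(ETH_P_PPP_SES, PPP_IP, False))
```

In this model, turning the sysctl off before the patch sends PPPoE frames down the IPv6 path instead of leaving them alone, which is why the knob did not behave as an off switch.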
Comment 20 Herbert Xu 2009-01-09 04:05:14 UTC
On Fri, Jan 09, 2009 at 10:55:15PM +1100, Herbert Xu wrote:
> 
> It turns out that even though we have sysctls that are supposed
> to control pppoe/vlan processing, they don't actually work.
> 
> This patch should make them work.

With that we can actually turn them off.

bridge: Disable PPPOE/VLAN processing by default

The PPPOE/VLAN processing code in the bridge netfilter is broken
by design.  The VLAN tag and the PPPOE session ID are an integral
part of the packet flow information, yet they're completely
ignored by the bridge netfilter.  This is potentially a security
hole as it treats all VLANs and PPPOE sessions as the same.

What's more, it's actually broken for PPPOE as the bridge netfilter
tries to trim the packets to the IP length without adjusting the
PPPOE header (and adjusting the PPPOE header isn't much better
since the PPPOE peer may require the padding to be present).

Therefore we should disable this by default.

It does mean that people relying on this feature may lose networking
depending on how their bridge netfilter rules are configured.
However, IMHO the problems this code causes are serious enough to
warrant this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 9a1cd75..cf754ac 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -58,11 +58,11 @@ static struct ctl_table_header *brnf_sysctl_header;
 static int brnf_call_iptables __read_mostly = 1;
 static int brnf_call_ip6tables __read_mostly = 1;
 static int brnf_call_arptables __read_mostly = 1;
-static int brnf_filter_vlan_tagged __read_mostly = 1;
-static int brnf_filter_pppoe_tagged __read_mostly = 1;
+static int brnf_filter_vlan_tagged __read_mostly = 0;
+static int brnf_filter_pppoe_tagged __read_mostly = 0;
 #else
-#define brnf_filter_vlan_tagged 1
-#define brnf_filter_pppoe_tagged 1
+#define brnf_filter_vlan_tagged 0
+#define brnf_filter_pppoe_tagged 0
 #endif
 
 static inline __be16 vlan_proto(const struct sk_buff *skb)

Cheers,
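
For anyone wanting to see what these defaults are on a running system, a small sketch follows. It assumes the two variables are exposed under /proc/sys/net/bridge as bridge-nf-filter-vlan-tagged and bridge-nf-filter-pppoe-tagged; those paths are not quoted in this thread, and they only exist when bridge netfilter with sysctl support is available. The function name is made up.

```python
from pathlib import Path

# Assumed sysctl file names for the two variables touched by the patch.
KNOBS = ("bridge-nf-filter-vlan-tagged", "bridge-nf-filter-pppoe-tagged")


def read_bridge_knobs(base="/proc/sys/net/bridge"):
    """Return {knob: int value, or None if the file does not exist}."""
    result = {}
    for name in KNOBS:
        path = Path(base) / name
        result[name] = int(path.read_text().strip()) if path.exists() else None
    return result


if __name__ == "__main__":
    for name, value in read_bridge_knobs().items():
        print(name, "=", "unavailable" if value is None else value)
```

A value of 1 on a bridge that carries PPPoE session traffic would correspond to the configuration that triggered this bug.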
Comment 21 Patrick McHardy 2009-01-11 21:26:46 UTC
Herbert Xu wrote:
> On Thu, Jan 08, 2009 at 05:58:11PM -0600, John Dykstra wrote:
>> I haven't unscrambled where the drop occurs in the PPPoE driver, but that's
>> presumably what changed in 2.6.27.  It seems to me that the drop is proper.
>> What's wrong is trimming the sk_buff to match the IP header while ignoring
>> the L2 header, and that's apparently been that way for a while (if I
>> understand what the reporter is running where).
>>
>> Or have I screwed up my first posting to netdev?
> 
> You've hit the nail on the head :)
> 
> The bridge netfilter is just one huge pile of crap that should
> be deleted.

Fully agreed. So far unfortunately nobody had the stomach to
get rid of this piece of junk.
Comment 22 Patrick McHardy 2009-01-11 21:28:30 UTC
Herbert Xu wrote:
> It turns out that even though we have sysctls that are supposed
> to control pppoe/vlan processing, they don't actually work.

Which reminds me - I think we should change them to default to off,
they regularly break things for unsuspecting users. I'm not sure
though what the best way to warn people for a while would be,
feature-removal-schedule doesn't seem appropriate.

> This patch should make them work.
> 
> bridge: Fix handling of non-IP packets in FORWARD/POST_ROUTING
> 
> Currently the bridge FORWARD/POST_ROUTING chains treat all
> non-IPv4 packets as IPv6.  This patch fixes that by returning
> NF_ACCEPT on non-IP packets instead, just as is done in PRE_ROUTING.

Applied, thanks.
Comment 23 Patrick McHardy 2009-01-11 21:30:54 UTC
Herbert Xu wrote:
> bridge: Disable PPPOE/VLAN processing by default
> 
> The PPPOE/VLAN processing code in the bridge netfilter is broken
> by design.  The VLAN tag and the PPPOE session ID are an integral
> part of the packet flow information, yet they're completely
> ignored by the bridge netfilter.  This is potentially a security
> hole as it treats all VLANs and PPPOE sessions as the same.
> 
> What's more, it's actually broken for PPPOE as the bridge netfilter
> tries to trim the packets to the IP length without adjusting the
> PPPOE header (and adjusting the PPPOE header isn't much better
> since the PPPOE peer may require the padding to be present).
> 
> Therefore we should disable this by default.
> 
> It does mean that people relying on this feature may lose networking
> depending on how their bridge netfilter rules are configured.
> However, IMHO the problems this code causes are serious enough to
> warrant this.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

A good reason to disable this crap :)

Applied, thanks.
Comment 24 Dean Holland 2009-01-13 02:51:20 UTC
Herbert Xu wrote:
> On Fri, Jan 09, 2009 at 10:55:15PM +1100, Herbert Xu wrote:
>> It turns out that even though we have sysctls that are supposed
>> to control pppoe/vlan processing, they don't actually work.
>>
>> This patch should make them work.
> 

<snip>

> Cheers,

I can confirm this resolves the issue. I have compiled a 2.6.28 kernel 
with Herbert's patches and I can now use the packaged 2.6.27-9 kernel in 
the Ubuntu guest.

Cheers
Dean
Comment 25 Dean Holland 2009-03-06 02:40:34 UTC
Have these commits made it into a kernel release yet? I haven't seen 
them in any of the Changelogs and am keen to get back to a packaged 
distribution kernel :)

Sorry if I have missed it, but if not is there any indication of when 
it should make it in?

Cheers
Dean
