Latest working kernel version: 2.6.26.8
Earliest failing kernel version: 2.6.27
Distribution: Ubuntu
Hardware Environment: amd64, KVM
Software Environment:
Problem Description:

As reported in LP #296767
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/296767) I am
experiencing intermittent TCP issues over a PPP ADSL2+ connection with the
only change being an upgrade to 2.6.27.

A number of websites, ping, traceroute work correctly but I simply can't
connect to several including:

- store.apple.com
- youtube.com
- ANZ internet banking (anz.com.au)
- MSN messenger

I have also tried compiling a generic 2.6.28-rc4 kernel and this still
suffers from the same issue, however if I reboot into the previous Ubuntu
kernel (2.6.24) or a vanilla 2.6.26 kernel the issue disappears.

Steps to reproduce:

1. Use a KVM guest as a gateway to a PPP internet connection
2. Boot with kernel <= 2.6.26
3. Observe functioning networking
4. Boot into 2.6.27+
5. Observe broken networking
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 29 Dec 2008 18:52:40 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12327
>
> Summary: Intermittent TCP issues with => 2.6.27
> Product: Networking
> Version: 2.5
> KernelVersion: 2.6.27
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: IPV4
> AssignedTo: shemminger@linux-foundation.org
> ReportedBy: speedster@haveacry.com
>
>
> Latest working kernel version: 2.6.26.8
> Earliest failing kernel version: 2.6.27
> Distribution: Ubuntu
> Hardware Environment: amd64, KVM
> Software Environment:
> Problem Description:
>
> As reported in LP #296767
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/296767) I am
> experiencing intermittent TCP issues over a PPP ADSL2+ connection with the
> only change being an upgrade to 2.6.27.
>
> A number of websites, ping, traceroute work correctly but I simply can't
> connect to several including:
>
> - store.apple.com
> - youtube.com
> - ANZ internet banking (anz.com.au)
> - MSN messenger
>
> I have also tried compiling a generic 2.6.28-rc4 kernel and this still
> suffers from the same issue, however if I reboot into the previous Ubuntu
> kernel (2.6.24) or a vanilla 2.6.26 kernel the issue disappears.
>
> Steps to reproduce:
>
> 1. Use a KVM guest as a gateway to a PPP internet connection
> 2. Boot with kernel <= 2.6.26
> 3. Observe functioning networking
> 4. Boot into 2.6.27+
> 5. Observe broken networking
On Mon, 29 Dec 2008, Andrew Morton wrote:

>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 29 Dec 2008 18:52:40 -0800 (PST) bugme-daemon@bugzilla.kernel.org
> wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=12327
> >
> > Summary: Intermittent TCP issues with => 2.6.27
> > Product: Networking
> > Version: 2.5
> > KernelVersion: 2.6.27
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: IPV4
> > AssignedTo: shemminger@linux-foundation.org
> > ReportedBy: speedster@haveacry.com
> >
> >
> > Latest working kernel version: 2.6.26.8
> > Earliest failing kernel version: 2.6.27
> > Distribution: Ubuntu
> > Hardware Environment: amd64, KVM
> > Software Environment:
> > Problem Description:
> >
> > As reported in LP #296767
> > (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/296767) I am
> > experiencing intermittent TCP issues over a PPP ADSL2+ connection with
> > the only change being an upgrade to 2.6.27.
> >
> > A number of websites, ping, traceroute work correctly but I simply can't
> > connect to several including:
> >
> > - store.apple.com
> > - youtube.com
> > - ANZ internet banking (anz.com.au)
> > - MSN messenger
> >
> > I have also tried compiling a generic 2.6.28-rc4 kernel and this still
> > suffers from the same issue, however if I reboot into the previous
> > Ubuntu kernel (2.6.24) or a vanilla 2.6.26 kernel the issue disappears.
> >
> > Steps to reproduce:
> >
> > 1. Use a KVM guest as a gateway to a PPP internet connection
> > 2. Boot with kernel <= 2.6.26
> > 3. Observe functioning networking
> > 4. Boot into 2.6.27+
> > 5. Observe broken networking

Can you please describe the full topology (which is connected to where and
using what, and locations of nats, tun/taps, netfilter things, etc.)...
There's some contradiction between the Ubuntu report description and what
you're giving here.
Based on your dumps I find it unlikely that the problem would be in the end host tcp but I'll verify the packets field by field still to be absolutely sure. I'd guess that either the sent packet or reply gets lost somewhere since it never arrives with 2.6.27/2.6.28-rcx.
Ilpo J
On Wed, Dec 31, 2008 at 11:22:23PM +0000, Speedster wrote:
>
> The gateway machine (whinge) runs as a KVM guest, and shares a physical
> host with three other guests (one Windows, two Linux). Below are the
> outputs of bridge topology and VLAN tagging on the physical host.

What driver are you using on whinge?

When you took the packet dumps, was it on whinge or the physical
host? If it was taken on whinge, can you try to take the same
dumps on the outgoing physical interface on the host?

Thanks,
Herbert Xu wrote:
> On Wed, Dec 31, 2008 at 11:22:23PM +0000, Speedster wrote:
>> The gateway machine (whinge) runs as a KVM guest, and shares a physical
>> host with three other guests (one Windows, two Linux). Below are the
>> outputs of bridge topology and VLAN tagging on the physical host.
>
> What driver are you using on whinge?
>
> When you took the packet dumps, was it on whinge or the physical
> host? If it was taken on whinge, can you try to take the same
> dumps on the outgoing physical interface on the host?
>
> Thanks,

I have tried both virtio and the default (8139too). The original dumps
were taken on whinge, I have attached two new dumps showing the
functional (2.6.24) and non-functional (2.6.27) states from the host.

Cheers
Dean
On Mon, 5 Jan 2009, Speedster wrote:

> Herbert Xu wrote:
> > On Wed, Dec 31, 2008 at 11:22:23PM +0000, Speedster wrote:
> > > The gateway machine (whinge) runs as a KVM guest, and shares a physical
> > > host with three other guests (one Windows, two Linux). Below are the
> > > outputs of bridge topology and VLAN tagging on the physical host.
> >
> > What driver are you using on whinge?
> >
> > When you took the packet dumps, was it on whinge or the physical
> > host? If it was taken on whinge, can you try to take the same
> > dumps on the outgoing physical interface on the host?
> >
> > Thanks,
>
> I have tried both virtio and the default (8139too).

And what is the actual eth hw on the host?

> The original dumps were taken on whinge, I have attached two new dumps
> showing the functional (2.6.24) and non-functional (2.6.27) states from
> the host.

At least to me those didn't reveal anything new :-(.
On Mon, Jan 05, 2009 at 08:19:46PM +0900, Speedster wrote:
>
> I have tried both virtio and the default (8139too). The original dumps
> were taken on whinge, I have attached two new dumps showing the
> functional (2.6.24) and non-functional (2.6.27) states from the host.

The MAC on the PPPoE packets appears bogus:

22:07:58.344027 00:16:3e:24:f7:a1 > 00:18:18:54:19:54

Can you also attach a dump for connections that do work under
2.6.27? That should tell us whether this MAC is genuine.

If it is bogus, can you dump on the other interfaces to see
who is corrupting it?

Cheers,
Herbert Xu wrote:
> Can you also attach a dump for connections that do work under
> 2.6.27? That should tell us whether this MAC is genuine.

Done. I also realised my wireshark display filter was dst_host rather
than simply host - it looks as though replies _do_ come back but for
some reason are ignored? My apologies.

The 2.6.27 dump contains a connection to a local file mirror
(203.21.20.200) and an attempted connection to store.apple.com
(17.149.156.10).

> If it is bogus, can you dump on the other interfaces to see
> who is corrupting it?
>

I believe the differing MAC addresses on the PPP remote endpoint are due
to my ISP having multiple layer 2 termination routers for
load-balancing/redundancy and not a symptom of the issue.

Cheers,
Dean
On Wed, Jan 07, 2009 at 10:49:55PM +0900, Speedster wrote:
>
> Done. I also realised my wireshark display filter was dst_host rather
> than simply host - it looks as though replies _do_ come back but for
> some reason are ignored? My apologies.

Great I think we're getting closer. With your latest dumps it'd
appear that there is an odd difference between the replies that
get through and the ones that don't. The ones that're somehow
dropped have 2 bytes of transport padding at the end. I suspect
there's something buggy in your system that's dropping it because
of this.

Can you please take a look at /proc/net/snmp on the host and the
guest to see if IP InDiscards is non-zero?

Also now that we know the problem is definitely in the host/guest
please take another set of dumps on the interfaces leading to and
within the guest to see exactly which path of the system is dropping
the reply.

Thanks,
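For anyone who would rather script the counter check than read the file by eye, here is a minimal sketch in C. It assumes the usual /proc/net/snmp layout, where each protocol has a line of field names followed by a line of values with the same prefix. The helper names `snmp_lookup` and `token_at` are made up for this illustration and are not part of any kernel or libc API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the start of the next line, or NULL at the end. */
static const char *next_line(const char *p)
{
	const char *nl = strchr(p, '\n');
	return nl ? nl + 1 : NULL;
}

/* Copy the idx-th space-separated token of one line into buf. */
static int token_at(const char *line, int idx, char *buf, size_t buflen)
{
	const char *p = line;

	for (int i = 0; ; i++) {
		while (*p == ' ')
			p++;
		if (*p == '\0' || *p == '\n')
			return -1;		/* ran out of tokens */
		size_t n = strcspn(p, " \n");
		if (i == idx) {
			if (n >= buflen)
				return -1;
			memcpy(buf, p, n);
			buf[n] = '\0';
			return 0;
		}
		p += n;
	}
}

/*
 * Hypothetical helper: find `field` for `proto` (e.g. "Ip", "InDiscards")
 * in snmp-style text. Returns 0 and stores the value, or -1 if not found.
 */
int snmp_lookup(const char *text, const char *proto, const char *field,
		unsigned long long *out)
{
	char prefix[32], tok[64];
	const char *hdr = NULL;
	int n = snprintf(prefix, sizeof(prefix), "%s:", proto);

	if (n < 0 || (size_t)n >= sizeof(prefix))
		return -1;

	for (const char *line = text; line; line = next_line(line)) {
		if (strncmp(line, prefix, (size_t)n) != 0)
			continue;
		if (!hdr) {			/* first match: field names */
			hdr = line;
			continue;
		}
		/* second match: values, in the same column order */
		for (int i = 1; token_at(hdr, i, tok, sizeof(tok)) == 0; i++) {
			if (strcmp(tok, field) != 0)
				continue;
			if (token_at(line, i, tok, sizeof(tok)) != 0)
				return -1;
			*out = strtoull(tok, NULL, 10);
			return 0;
		}
		return -1;
	}
	return -1;
}
```

To use it, read /proc/net/snmp into a NUL-terminated buffer on both the host and the guest and call `snmp_lookup(buf, "Ip", "InDiscards", &value)`; a non-zero value would suggest the IP layer is discarding packets.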
On Thu, 8 Jan 2009, Herbert Xu wrote:

> On Wed, Jan 07, 2009 at 10:49:55PM +0900, Speedster wrote:
> >
> > Done. I also realised my wireshark display filter was dst_host rather
> > than simply host - it looks as though replies _do_ come back but for
> > some reason are ignored? My apologies.
>
> Great I think we're getting closer. With your latest dumps it'd
> appear that there is an odd difference between the replies that
> get through and the ones that don't. The ones that're somehow
> dropped have 2 bytes of transport padding at the end. I suspect
> there's something buggy in your system that's dropping it because
> of this.
>
> Can you please take a look at /proc/net/snmp on the host and the
> guest to see if IP InDiscards is non-zero?
>
> Also now that we know the problem is definitely in the host/guest
> please take another set of dumps on the interfaces leading to and
> within the guest to see exactly which path of the system is dropping
> the reply.

Another possibility that comes to mind is that on the TCP side the seqnos
get messed up because skb->len has that padding counted in (end_seq
depends on skb->len), and something doesn't like encountering such a
synack.
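To make the end_seq concern concrete, here is a small standalone sketch of the arithmetic. It is modeled on how the receive path derives end_seq from the segment length, but `tcp_end_seq` is a hypothetical user-space helper with invented parameter names, not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: sequence number just past this segment.
 * SYN and FIN each consume one sequence number; the payload length is
 * whatever remains of the buffer after the TCP header (doff 32-bit words).
 * Unsigned arithmetic wraps modulo 2^32, as sequence numbers do.
 */
uint32_t tcp_end_seq(uint32_t seq, int syn, int fin,
		     uint32_t skb_len, uint32_t doff_words)
{
	return seq + (uint32_t)syn + (uint32_t)fin
		   + skb_len - doff_words * 4;
}
```

For a SYN-ACK with a 40-byte TCP header and no payload, a buffer length of 40 gives `seq + 1`. If two bytes of trailing padding were left counted in the length, the same call would give `seq + 3`, which is the kind of mismatch being speculated about here.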
Herbert Xu wrote:
>
> Can you please take a look at /proc/net/snmp on the host and the
> guest to see if IP InDiscards is non-zero?

Both are 0

> Also now that we know the problem is definitely in the host/guest
> please take another set of dumps on the interfaces leading to and
> within the guest to see exactly which path of the system is dropping
> the reply.

Attached (all are the exact same attempted connection), and reveal some
interesting information.

The path the inbound traffic should take is
1. vlan50 (host)
2. tap interface vnet3 (host) / eth0 (guest)
3. ppp0 (guest)

It looks as though when it is sent out the tap interface the payload
length is incorrect in the PPPoE section of the frame. When it arrives
via vlan50 it appears fine. Or at least that's what wireshark highlights
for me :)

Dean
Reply-To: shemminger@vyatta.com

On Fri, 09 Jan 2009 00:04:42 +0900
Speedster <speedster@haveacry.com> wrote:

> Herbert Xu wrote:
> >
> > Can you please take a look at /proc/net/snmp on the host and the
> > guest to see if IP InDiscards is non-zero?
>
> Both are 0
>
> > Also now that we know the problem is definitely in the host/guest
> > please take another set of dumps on the interfaces leading to and
> > within the guest to see exactly which path of the system is dropping
> > the reply.
>
> Attached (all are the exact same attempted connection), and reveal some
> interesting information.
>
> The path the inbound traffic should take is
> 1. vlan50 (host)
> 2. tap interface vnet3 (host) / eth0 (guest)
> 3. ppp0 (guest)
>
> It looks as though when it is sent out the tap interface the payload
> length is incorrect in the PPPoE section of the frame. When it arrives
> via vlan50 it appears fine. Or at least that's what wireshark highlights
> for me :)
>
> Dean

Maybe there is an issue that GRO receive isn't handling padding
properly?
On Thu, 8 Jan 2009, Stephen Hemminger wrote:

> On Fri, 09 Jan 2009 00:04:42 +0900
> Speedster <speedster@haveacry.com> wrote:
>
> > Herbert Xu wrote:
> > >
> > > Can you please take a look at /proc/net/snmp on the host and the
> > > guest to see if IP InDiscards is non-zero?
> >
> > Both are 0
> >
> > > Also now that we know the problem is definitely in the host/guest
> > > please take another set of dumps on the interfaces leading to and
> > > within the guest to see exactly which path of the system is dropping
> > > the reply.
> >
> > Attached (all are the exact same attempted connection), and reveal some
> > interesting information.
> >
> > The path the inbound traffic should take is
> > 1. vlan50 (host)
> > 2. tap interface vnet3 (host) / eth0 (guest)
> > 3. ppp0 (guest)
> >
> > It looks as though when it is sent out the tap interface the payload
> > length is incorrect in the PPPoE section of the frame. When it arrives
> > via vlan50 it appears fine. Or at least that's what wireshark highlights
> > for me :)
>
> Maybe there is an issue that GRO receive isn't handling padding
> properly?

Hmm, is gro supposed to have something to do with 2.6.27??? Or are you
talking something else than Herbert's recent GRO stuff?
Reply-To: shemminger@vyatta.com

On Thu, 8 Jan 2009 21:39:30 +0200 (EET)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> On Thu, 8 Jan 2009, Stephen Hemminger wrote:
>
> > On Fri, 09 Jan 2009 00:04:42 +0900
> > Speedster <speedster@haveacry.com> wrote:
> >
> > > Herbert Xu wrote:
> > > >
> > > > Can you please take a look at /proc/net/snmp on the host and the
> > > > guest to see if IP InDiscards is non-zero?
> > >
> > > Both are 0
> > >
> > > > Also now that we know the problem is definitely in the host/guest
> > > > please take another set of dumps on the interfaces leading to and
> > > > within the guest to see exactly which path of the system is dropping
> > > > the reply.
> > >
> > > Attached (all are the exact same attempted connection), and reveal some
> > > interesting information.
> > >
> > > The path the inbound traffic should take is
> > > 1. vlan50 (host)
> > > 2. tap interface vnet3 (host) / eth0 (guest)
> > > 3. ppp0 (guest)
> > >
> > > It looks as though when it is sent out the tap interface the payload
> > > length is incorrect in the PPPoE section of the frame. When it arrives
> > > via vlan50 it appears fine. Or at least that's what wireshark highlights
> > > for me :)
> >
> > Maybe there is an issue that GRO receive isn't handling padding
> > properly?
>
> Hmm, is gro supposed to have something to do with 2.6.27??? Or are you
> talking something else than Herbert's recent GRO stuff?

Trying to find a common thread of why splice and TCP is having issues in
some cases.
On Thu, Jan 08, 2009 at 08:37:53AM -0800, Stephen Hemminger wrote:
>
> Maybe there is an issue that GRO receive isn't handling padding
> properly?

You are one release early Stephen :)

Also GRO can't be enabled without ethtool yet.

Cheers,
Reply-To: john.dykstra1@gmail.com

On Fri, 2009-01-09 at 00:04 +0900, Speedster wrote:

> Attached (all are the exact same attempted connection), and reveal some
> interesting information.
>
> The path the inbound traffic should take is
> 1. vlan50 (host)
> 2. tap interface vnet3 (host) / eth0 (guest)
> 3. ppp0 (guest)
>
> It looks as though when it is sent out the tap interface the payload
> length is incorrect in the PPPoE section of the frame. When it arrives
> via vlan50 it appears fine. Or at least that's what wireshark highlights
> for me :)

The length field in the PPPoE header doesn't change.

The packet arrives from the ISP with its PPPoE header length field showing
two extra bytes past the IP payload. The bridge or something close to it
trims those two bytes from the sk_buff, leaving a packet whose PPPoE header
is wrong, but whose upper headers make sense. The PPPoE driver in the
virtual machine then drops the packet.

I haven't unscrambled where the drop occurs in the PPPoE driver, but that's
presumably what changed in 2.6.27. It seems to me that the drop is proper.
What's wrong is trimming the sk_buff to match the IP header while ignoring
the L2 header, and that's apparently been that way for a while (if I
understand what the reporter is running where).

Or have I screwed up my first posting to netdev?

-- John
On Thu, Jan 08, 2009 at 05:58:11PM -0600, John Dykstra wrote:
>
> I haven't unscrambled where the drop occurs in the PPPoE driver, but that's
> presumably what changed in 2.6.27. It seems to me that the drop is proper.
> What's wrong is trimming the sk_buff to match the IP header while ignoring
> the L2 header, and that's apparently been that way for a while (if I
> understand what the reporter is running where).
>
> Or have I screwed up my first posting to netdev?

You've hit the nail on the head :)

The bridge netfilter is just one huge pile of crap that should
be deleted.

The least we should do is delete the VLAN/PPPOE parts of it
because it's simply bogus. What if two VLANs/PPPOE sessions
use the same IP address pairs? They'll be treated as a single
flow which is just insane.

Cheers,
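The mismatch diagnosed above (a PPPoE length field that still counts two bytes the bridge has already trimmed) can be checked on a captured frame with a few lines of C. This is only a sketch under stated assumptions: it takes the standard 6-byte PPPoE header with a big-endian length field in the last two bytes, expects the buffer to start at the PPPoE header (Ethernet header already stripped), and the function name `pppoe_payload_shortfall` is invented for illustration. It does not claim to reproduce what the guest's PPPoE driver actually checks.

```c
#include <stddef.h>
#include <stdint.h>

#define PPPOE_HDR_LEN 6	/* ver/type, code, session id (2), length (2) */

/*
 * Hypothetical helper: compare the PPPoE length field with the bytes
 * really present after the PPPoE header.
 *
 * On success returns 0 and stores (claimed - available) in *shortfall:
 *   0  header and buffer agree
 *   >0 header claims more payload than the buffer holds (trimmed frame)
 *   <0 buffer carries extra trailing bytes (e.g. Ethernet padding)
 * Returns -1 if the buffer is too short to hold a PPPoE header.
 */
int pppoe_payload_shortfall(const uint8_t *frame, size_t frame_len,
			    long *shortfall)
{
	if (frame_len < PPPOE_HDR_LEN)
		return -1;

	/* The length field is in network byte order. */
	uint16_t claimed = (uint16_t)((frame[4] << 8) | frame[5]);

	*shortfall = (long)claimed - (long)(frame_len - PPPOE_HDR_LEN);
	return 0;
}
```

Run against the same packet as captured on vlan50 and on the tap interface, a result of 0 on the first and 2 on the second would match the description in this thread.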
Hi:

It turns out that even though we have sysctl's that's supposed
to control pppoe/vlan processing, they don't actually work.

This patch should make them work.

bridge: Fix handling of non-IP packets in FORWARD/POST_ROUTING

Currently the bridge FORWARD/POST_ROUTING chains treats all
non-IPv4 packets as IPv6. This packet fixes that by returning
NF_ACCEPT on non-IP packets instead, just as is done in PRE_ROUTING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index a65e43a..9a1cd75 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -686,8 +686,11 @@ static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb,
 	if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb) ||
 	    IS_PPPOE_IP(skb))
 		pf = PF_INET;
-	else
+	else if (skb->protocol == htons(ETH_P_IPV6) || IS_VLAN_IPV6(skb) ||
+		 IS_PPPOE_IPV6(skb))
 		pf = PF_INET6;
+	else
+		return NF_ACCEPT;
 
 	nf_bridge_pull_encap_header(skb);
 
@@ -828,8 +831,11 @@ static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff *skb,
 	if (skb->protocol == htons(ETH_P_IP) || IS_VLAN_IP(skb) ||
 	    IS_PPPOE_IP(skb))
 		pf = PF_INET;
-	else
+	else if (skb->protocol == htons(ETH_P_IPV6) || IS_VLAN_IPV6(skb) ||
+		 IS_PPPOE_IPV6(skb))
 		pf = PF_INET6;
+	else
+		return NF_ACCEPT;
 
 #ifdef CONFIG_NETFILTER_DEBUG
 	if (skb->dst == NULL) {

Cheers,
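The behavioural change in the patch above can be seen in isolation with a simplified user-space model. This sketch looks only at the outer EtherType and leaves out the VLAN/PPPoE encapsulation checks (`IS_VLAN_IP`, `IS_PPPOE_IP` and friends), so it is an illustration of the if/else structure, not a copy of the kernel logic. The enum and function names are invented.

```c
#include <stdint.h>

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_IPV6 0x86DD

enum verdict {
	HANDLE_AS_IPV4,		/* run the IPv4 netfilter hooks */
	HANDLE_AS_IPV6,		/* run the IPv6 netfilter hooks */
	ACCEPT_UNTOUCHED	/* pass the frame through unmodified */
};

/* Old shape: anything that is not IPv4 falls into the IPv6 branch. */
enum verdict classify_before(uint16_t ethertype)
{
	if (ethertype == ETHERTYPE_IPV4)
		return HANDLE_AS_IPV4;
	return HANDLE_AS_IPV6;
}

/* New shape: only real IPv6 is treated as IPv6; the rest is accepted. */
enum verdict classify_after(uint16_t ethertype)
{
	if (ethertype == ETHERTYPE_IPV4)
		return HANDLE_AS_IPV4;
	if (ethertype == ETHERTYPE_IPV6)
		return HANDLE_AS_IPV6;
	return ACCEPT_UNTOUCHED;
}
```

With a PPPoE session frame (EtherType 0x8864) that is not being treated as encapsulated IP, `classify_before` would hand it to the IPv6 path while `classify_after` lets it through untouched, which is the difference the patch description points at.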
On Fri, Jan 09, 2009 at 10:55:15PM +1100, Herbert Xu wrote:
>
> It turns out that even though we have sysctl's that's supposed
> to control pppoe/vlan processing, they don't actually work.
>
> This patch should make them work.

With that we can actually turn them off.

bridge: Disable PPPOE/VLAN processing by default

The PPPOE/VLAN processing code in the bridge netfilter is broken
by design. The VLAN tag and the PPPOE session ID are an integral
part of the packet flow information, yet they're completely
ignored by the bridge netfilter. This is potentially a security
hole as it treats all VLANs and PPPOE sessions as the same.

What's more, it's actually broken for PPPOE as the bridge netfilter
tries to trim the packets to the IP length without adjusting the
PPPOE header (and adjusting the PPPOE header isn't much better
since the PPPOE peer may require the padding to be present).

Therefore we should disable this by default.

It does mean that people relying on this feature may lose networking
depending on how their bridge netfilter rules are configured.
However, IMHO the problems this code causes are serious enough to
warrant this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 9a1cd75..cf754ac 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -58,11 +58,11 @@ static struct ctl_table_header *brnf_sysctl_header;
 static int brnf_call_iptables __read_mostly = 1;
 static int brnf_call_ip6tables __read_mostly = 1;
 static int brnf_call_arptables __read_mostly = 1;
-static int brnf_filter_vlan_tagged __read_mostly = 1;
-static int brnf_filter_pppoe_tagged __read_mostly = 1;
+static int brnf_filter_vlan_tagged __read_mostly = 0;
+static int brnf_filter_pppoe_tagged __read_mostly = 0;
 #else
-#define brnf_filter_vlan_tagged 1
-#define brnf_filter_pppoe_tagged 1
+#define brnf_filter_vlan_tagged 0
+#define brnf_filter_pppoe_tagged 0
 #endif
 
 static inline __be16 vlan_proto(const struct sk_buff *skb)

Cheers,
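On a kernel where the sysctls do take effect, the same result as the new defaults can be had at run time by writing 0 to the corresponding files. A minimal sketch in C follows; `write_flag` is a hypothetical helper, and the paths in the usage note assume the bridge netfilter sysctls are exposed under /proc/sys/net/bridge/ on the running system.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write an integer flag followed by a newline to a
 * sysctl-style file. Returns 0 on success, -1 on any open/write error.
 */
int write_flag(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)	/* write errors may only surface on close */
		ok = 0;

	return ok ? 0 : -1;
}
```

Called as root on the host, `write_flag("/proc/sys/net/bridge/bridge-nf-filter-pppoe-tagged", 0)` and the same call for `bridge-nf-filter-vlan-tagged` would switch both kinds of processing off without rebuilding the kernel. The setting does not persist across reboots unless it is also added to the distribution's sysctl configuration.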
Herbert Xu wrote:
> On Thu, Jan 08, 2009 at 05:58:11PM -0600, John Dykstra wrote:
>> I haven't unscrambled where the drop occurs in the PPPoE driver, but that's
>> presumably what changed in 2.6.27. It seems to me that the drop is proper.
>> What's wrong is trimming the sk_buff to match the IP header while ignoring
>> the L2 header, and that's apparently been that way for a while (if I
>> understand what the reporter is running where).
>>
>> Or have I screwed up my first posting to netdev?
>
> You've hit the nail on the head :)
>
> The bridge netfilter is just one huge pile of crap that should
> be deleted.

Fully agreed. So far unfortunately nobody had the stomach to get rid
of this piece of junk.
Herbert Xu wrote:
> It turns out that even though we have sysctl's that's supposed
> to control pppoe/vlan processing, they don't actually work.

Which reminds me - I think we should change them to default to off,
they regularly break things for unsuspecting users. I'm not sure
though what the best way to warn people for a while would be,
feature-removal-schedule doesn't seem appropriate.

> This patch should make them work.
>
> bridge: Fix handling of non-IP packets in FORWARD/POST_ROUTING
>
> Currently the bridge FORWARD/POST_ROUTING chains treats all
> non-IPv4 packets as IPv6. This packet fixes that by returning
> NF_ACCEPT on non-IP packets instead, just as is done in PRE_ROUTING.

Applied, thanks.
Herbert Xu wrote:
> bridge: Disable PPPOE/VLAN processing by default
>
> The PPPOE/VLAN processing code in the bridge netfilter is broken
> by design. The VLAN tag and the PPPOE session ID are an integral
> part of the packet flow information, yet they're completely
> ignored by the bridge netfilter. This is potentially a security
> hole as it treats all VLANs and PPPOE sessions as the same.
>
> What's more, it's actually broken for PPPOE as the bridge netfilter
> tries to trim the packets to the IP length without adjusting the
> PPPOE header (and adjusting the PPPOE header isn't much better
> since the PPPOE peer may require the padding to be present).
>
> Therefore we should disable this by default.
>
> It does mean that people relying on this feature may lose networking
> depending on how their bridge netfilter rules are configured.
> However, IMHO the problems this code causes are serious enough to
> warrant this.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

A good reason to disable this crap :)

Applied, thanks.
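The "treats all VLANs and PPPOE sessions as the same" point can be illustrated with a toy flow key. The struct and both comparison functions below are invented for this sketch and do not correspond to any kernel or conntrack data structure; they only show what is lost when the L2 identifier is left out of the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy flow key: the usual 5-tuple plus an L2 identifier. */
struct flow_key {
	uint32_t saddr, daddr;	/* IPv4 addresses */
	uint16_t sport, dport;	/* transport ports */
	uint8_t  proto;		/* IP protocol number */
	uint16_t l2_id;		/* VLAN ID or PPPoE session ID */
};

/* Comparison that ignores the L2 identifier. */
bool same_flow_l3_only(const struct flow_key *a, const struct flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* Comparison that also requires the same VLAN / PPPoE session. */
bool same_flow_with_l2(const struct flow_key *a, const struct flow_key *b)
{
	return same_flow_l3_only(a, b) && a->l2_id == b->l2_id;
}
```

Two connections that use identical address and port pairs on VLAN 50 and VLAN 60 compare equal under `same_flow_l3_only` and different under `same_flow_with_l2`; the first outcome is the collapsing of separate flows that the commit message objects to.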
Herbert Xu wrote:
> On Fri, Jan 09, 2009 at 10:55:15PM +1100, Herbert Xu wrote:
>> It turns out that even though we have sysctl's that's supposed
>> to control pppoe/vlan processing, they don't actually work.
>>
>> This patch should make them work.
>
<snip>
> Cheers,

I can confirm this resolves the issue. I have compiled a 2.6.28 kernel
with Herbert's patches and I can now use the packaged 2.6.27-9 kernel in
the Ubuntu guest.

Cheers
Dean
Have these commits made it into a kernel release yet? I haven't seen them
in any of the Changelogs and am keen to get back to a packaged
distribution kernel :)

Sorry if I have missed it, but if not, is there any indication of when
they should make it in?

Cheers
Dean