Bug 11721
Description
Aldo Maggi
2008-10-08 08:08:09 UTC
There's no indication of netfilter being involved here, not a single netfilter module loaded. So this appears to be a generic networking or network driver bug (please reclassify to networking for now).
Some information that will most likely be needed:
- ip -s link list
- ip route
- ip addr
- tcpdump of the interface the traffic is going through
Thanks.

On Wed, 8 Oct 2008 08:13:19 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11721
>
> ------- Comment #1 from kaber@trash.net 2008-10-08 08:13 -------
> There's no indication of netfilter being involved here, not a single
> netfilter module loaded. So this appears to be a generic networking
> or network driver bug (please reclassify to networking for now).
>
> Some information that will most likely be needed:
>
> - ip -s link list
> - ip route
> - ip addr
> - tcpdump of the interface the traffic is going through
>
> Thanks.

please find herewith attached the following files:
bug11721_aggiornamenti containing:
- ip -s link list
- ip route
- ip addr
tcpdump_output containing the output of:
- tcpdump -i eth0 -v
tcpdump_output_during_apt-get containing the output of:
- tcpdump -i eth0 -v while apt-get update was in execution
thanks
aldo

Created attachment 18213 [details]
output of the commands you asked for except that of tcpdump
Created attachment 18214 [details]
output of tcpdump
Created attachment 18215 [details]
output of tcpdump while apt-get was in execution
I can't spot anything not working in those dumps (I'm in a hurry though and might have missed it). Are those dumps from a failing session? Thanks.

yes they are.
in fact ping google.it works, host google.it works as well, w3m google.it doesn't, neither does apt-get update.
bear with me if i repeat what i wrote in my first msg:
... if i launch iptraf in the home server i can see that udp requests to the dns servers from the machine where intrepid is installed receive answers (this is the reason why "host" and "ping" work, i think) while in the case of tcp connections the syn packet goes through but it does not receive ack.
needless to say that if i boot with 2.6.26 kernel everything works flawlessly.

Please attach .config from 2.6.27-rc.

Created attachment 18222 [details]
config-2.6.27-rc8.200810071002
And the one from 2.6.26 (ie. known good), please.

Created attachment 18232 [details]
config-2.6.26-1-686
could it be useful if i send you the tcpdump of the interface on the home server/gateway to which the pc we are testing is linked? from it, it is clear that if 2.6.26 is used on the pc, acks are received, while if 2.6.27 is used, no ack appears.
Well, please test the latest mainline (2.6.27-rc9-git2). Chances are that your problem has just been fixed.

no, i'm sorry, i've compiled the new kernel but results are still the same.
i wonder if the modules of 2.6.27 kernel used with my hw malform, somehow, the syn packets .... sorry i'm repeating what i said in my last message!

Ok, the only thing you can do in that case is to try to identify the change that causes the problem to appear using bisection. See http://desprofundis.blogspot.com/2008/06/git-bisect-instructions.html for instructions.

Could you try to turn off window scaling like here?: http://lkml.org/lkml/2008/2/7/471

ok, i did so:
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
both in my machine and in the pc i use as home_server/gateway but the problem was not solved.
anyway "1" was the value of /proc/sys/net/ipv4/tcp_window_scaling before.
and, as far as i understand from the article you cite: http://lwn.net/Articles/92727/, the problem with window_scaling was already noticed in kernel 2.6.7 while in my case everything was alright till 2.6.26.6

Sure, but tcp changes and this lkml thread was about later kernels too. Btw, could you repeat this tcpdump with 2.6.26 and 2.6.27 in working/not working case? Thanks.

i set
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
both on the pc and on the home server/gw, then i ran w3m kernel.org and tcpdump both on the pc and on the home server/gw.
i've repeated the above procedure twice, once using kernel 2.6.26.6 and another using kernel 2.6.27-rc1-git1.
see please the attached files
thanks
aldo

Created attachment 18350 [details]
tcpdump -i eth0 -v using kernel 2.6.26.6
Created attachment 18351 [details]
tcpdump -i eth2 -v on the home server/gateway
Created attachment 18352 [details]
tcpdump -i eth0 -v using kernel 2.6.27-rc1-git1
Created attachment 18353 [details]
tcpdump -i eth2 -v on the home server/gateway
while on the other pc kernel 2.6.26.6 was running
Thanks for these logs Aldo. Anyway, I'm still a bit confused, so could we clear some doubts?:
1. server/gw is 192.168.1.1, topolino, always 2.6.26, and there is no problem with www connections?
2. pc is 192.168.1.3, paperino, with 2.6.26 no problem, with 2.6.27 problems?
3. tcpdump on gw is probably done on the local network interface; if so could you try this on the internet interface (so 2 tcpdumps yet)? Btw., let us know ifconfig, route -n, and iptables rules on this box. (If you need to mask some private data let us know too.)
Could you set mtu on paperino's eth0 to 1400 before these tries?
Correct me if I get this wrong. (I assume you're changing to 2.6.27 only one box (paperino?) during these tests.)

sorry for not having been clearer! :-(
1. yes, server/gw is 192.168.1.1, topolino, always 2.6.26 (actually 2.6.26-1-686 linux-image from debian), and there is no problem with www connections.
2. pc 192.168.1.3, paperino, where i'm running the tests, has no problem with 2.6.26 (both debian lenny and ubuntu intrepid are installed on that machine). problems have arisen when ubuntu intrepid upgraded to 2.6.27 (but i do not rely very much on ubuntu so my tests are made on debian lenny). at the moment 2.6.26.6 (which i've compiled from kernel.org sources) is used with lenny, i repeat with no problem; problems arise when i use kernel 2.6.27 on it (whatever the rc is, i've tried -rc1, -rc8, -rc9)
3. see please the attached files (they are the output while on paperino w3m kernel.org was executed). yes, i'm changing to 2.6.27 only paperino (192.168.1.3).
just to let you know, this morning i've bought a new modem (a recent one) to see if the problems persist (anyway i'd need it ... sooner or later, it is an "adsl2+" :-D ).
many thanks for your attention!
aldo

Created attachment 18361 [details]
ifconfig, route -n, and iptables rules on 192.168.1.1 (topolino) home server/gw
Created attachment 18362 [details]
ifconfig, route -n, on 192.168.1.3 (paperino)
Created attachment 18363 [details]
tcpdump on the wan eth on the home server/gw
while on 192.168.1.3 (paperino) was running 2.6.26.6 and w3m kernel.org was executed
Created attachment 18364 [details]
tcpdump on the wan eth on the home server/gw
while on 192.168.1.3 (paperino) 2.6.27-rc1-git1 was running and w3m kernel.org was executed
Alas still no idea. Looks like these syn packets to kernel.org could be different, so more details are needed. If you are not fed up with this all try to repeat these last 2 tcpdumps with these parameters:
tcpdump -i eth0 -nXX -c3 'dst port 80 and tcp-syn != 0'
Additionally you could try this on paperino:
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
echo 0 > /proc/sys/net/ipv4/tcp_sack
many thanks for your assistance!
Jarek P.

absolutely i'm not fed up! and actually i thank you for your patience.
i set paperino as you wrote and added ifconfig eth0 mtu 1400 as well.
i set the same value of mtu in topolino as well, i hope i was not wrong doing so!
i attach the files you ask for, but now i can navigate with 2.6.27-rc1-git1 as well!
afterwards i've run some more tests and have found out that the culprit is tcp_sack: if, using kernel 2.6.27-rc1-git1, it is set to "1" i cannot navigate, if to "0" i can.
so i enclose, should it be of any help, also the tcpdump on eth0 of topolino when on paperino tcp_sack is set to "1".
thanks
aldo

Created attachment 18366 [details]
tcpdump -i eth0 -nXX -c3 'dst port 80 and tcp-syn != 0' on topolino
while using kernel 2.6.26.6 on paperino
Created attachment 18367 [details]
tcpdump -i eth0 -nXX -c3 'dst port 80 and tcp-syn != 0' on topolino
while using kernel 2.6.27-rc1-git1 on paperino
Created attachment 18368 [details]
tcpdump -i eth0 -nXX -c3 'dst port 80 and tcp-syn != 0' on topolino
while using kernel 2.6.27-rc1-git1 on paperino and on the same machine /proc/sys/net/ipv4/tcp_sack is set to "1"
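[Editorial sketch, not part of the original thread.] The -XX dumps above show whole link-level frames, so comparing two SYNs means first skipping the Ethernet and IP headers to reach the TCP options. The Python sketch below shows that step under stated assumptions: untagged Ethernet II, IPv4, and an initial SYN (SYN set, ACK clear), which is the check the documented pcap-filter form `tcp[tcpflags] & tcp-syn != 0` performs for the SYN bit. The helper names `syn_option_bytes` and `make_demo_syn` are hypothetical, and the demo frame is synthetic, not taken from the attachments.

```python
import struct

ETH_HEADER_LEN = 14       # assumption: untagged Ethernet II frames
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6
TCP_FLAG_SYN = 0x02
TCP_FLAG_ACK = 0x10


def syn_option_bytes(frame):
    """Return the TCP option bytes of an initial SYN, or None otherwise."""
    if len(frame) < ETH_HEADER_LEN + 20 + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip = frame[ETH_HEADER_LEN:]
    ip_header_len = (ip[0] & 0x0F) * 4       # IHL field, in bytes
    if ip[9] != IPPROTO_TCP or len(ip) < ip_header_len + 20:
        return None
    tcp = ip[ip_header_len:]
    tcp_header_len = (tcp[12] >> 4) * 4      # data offset field, in bytes
    flags = tcp[13]
    if not flags & TCP_FLAG_SYN or flags & TCP_FLAG_ACK:
        return None
    # Everything between the fixed 20-byte header and the data offset
    # is option space.
    return bytes(tcp[20:tcp_header_len])


def make_demo_syn(options):
    """Build a synthetic SYN frame (zeroed addresses and checksums)."""
    if len(options) % 4:
        raise ValueError("options must be padded to a multiple of 4 bytes")
    tcp_len = 20 + len(options)
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + tcp_len, 0, 0, 64,
                     IPPROTO_TCP, 0, bytes(4), bytes(4))
    tcp = struct.pack("!HHIIBBHHH", 40000, 80, 0, 0, (tcp_len // 4) << 4,
                      TCP_FLAG_SYN, 5840, 0, 0)
    return eth + ip + tcp + options
```

Feeding the hex bytes of one frame from a dump (converted with `bytes.fromhex`) into `syn_option_bytes` would give the option block to compare between the working and the failing kernel.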
Nice job Aldo!
I forward this message to netdev and Cc our best tcp expert.
Any new replies should be rather "Reply All" (not bugzilla only).
Thanks,
Jarek P.

On Sat, Oct 18, 2008 at 01:20:56PM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11721
> ...
> ------- Comment #30 from sentiniate@tiscali.it 2008-10-18 13:20 -------
> absolutely i'm not fed up! and actually i thank you for your patience.
>
> i set paperino as you wrote and added ifconfig eth0 mtu 1400 as well
> i set the same value of mtu in topolino as well, i hope i was not wrong doing so!
>
> i attach the files you ask for, but now i can navigate with 2.6.27-rc1-git1 as well!
>
> afterwards i've run some more tests and have found out that the culprit is
> tcp_sack, if using kernel 2.6.27-rc1-gt1 it is set to "1" i cannot navigate if
> to "0" i can.
> so i enclose, should it be of any help, also the tcpdump on eth0 of topolino
> when on paperino tcp_sack is set to "1".
>
> thanks
> aldo
>
> --
> Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.

[RESEND] I forgot to add Aldo's email before - sorry!
>> On Sat, Oct 18, 2008 at 01:20:56PM -0700, bugme-daemon@bugzilla.kernel.org wrote:
>>> http://bugzilla.kernel.org/show_bug.cgi?id=11721
...
>>> ------- Comment #30 from sentiniate@tiscali.it 2008-10-18 13:20 -------
...
>>> afterwards i've run some more tests and have found out that the culprit is
>>> tcp_sack, if using kernel 2.6.27-rc1-gt1 it is set to "1" i cannot navigate if
>>> to "0" i can.
...
Aldo, for curiosity, could you also try if this change only (instead of tcp_sack etc.) can't do similar effect?:
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
Thanks,
Jarek P.

On Sat, 18 Oct 2008, Jarek Poplawski wrote:
> [RESEND] I forgot to add Aldo's email before - sorry!
>
> On Sat, Oct 18, 2008 at 11:02:52PM +0200, Jarek Poplawski wrote:
> > Nice job Aldo!
> >
> > I forward this message to netdev and Cc our best tcp expert.
> > Any new replies should be rather "Reply All" (not bugzilla only).
> >
> > Thanks,
> > Jarek P.
> >
> > On Sat, Oct 18, 2008 at 01:20:56PM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=11721
> > >
> > ...
> > > ------- Comment #30 from sentiniate@tiscali.it 2008-10-18 13:20 -------
> > > absolutely i'm not fed up! and actually i thank you for your patience.
> > >
> > > i set paperino as you wrote and added ifconfig eth0 mtu 1400 as well
> > > i set the same value of mtu in topolino as well, i hope i was not wrong doing so!
> > >
> > > i attach the files you ask for, but now i can navigate with 2.6.27-rc1-git1 as well!
> > >
> > > afterwards i've run some more tests and have found out that the culprit is
> > > tcp_sack, if using kernel 2.6.27-rc1-gt1 it is set to "1" i cannot navigate if
> > > to "0" i can.
> > > so i enclose, should it be of any help, also the tcpdump on eth0 of topolino
> > > when on paperino tcp_sack is set to "1".
So this ended up in the tcp domain after all (I took a brief look earlier anyway and found out that there are not that many changes 2.6.26..2.6.27 -- net/ipv4/tcp*.c include/net/tcp.h)...

I compared your packet against a good one from elsewhere.. I couldn't compare your latest dumps fully because attachments 18366 and 18367 are with different TCP options (you forgot zeros to sysctls in them?)... Anyway, the only things that seemed to be different from that case from elsewhere were those extra bytes in the beginning (something below the ip protocol that gets captured by tcpdump?), which are equal in both the working and the broken case of yours, and the different ordering of the tcp options as noted by Jarek earlier. I tried to go through the fields one by one but nothing seemed to be wrong...

...Might be something crazy on the way that is too picky about tcp option ordering, which wouldn't surprise me that much... :-) Please try if the patch below makes any difference (on paperino is enough, the gw seems innocent here). If that doesn't help, can you please restore the sysctls to 1 and redo the 2.6.26.6 dump (like in attachment 18366 [details]) so that I get a fully comparable sample.
--
i.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 990a584..850a4e9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -376,6 +376,12 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
*md5_hash = NULL;
}
+ if (unlikely(opts->mss)) {
+ *ptr++ = htonl((TCPOPT_MSS << 24) |
+ (TCPOLEN_MSS << 16) |
+ opts->mss);
+ }
+
if (likely(OPTION_TS & opts->options)) {
if (unlikely(OPTION_SACK_ADVERTISE & opts->options)) {
*ptr++ = htonl((TCPOPT_SACK_PERM << 24) |
@@ -392,12 +398,6 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
*ptr++ = htonl(opts->tsecr);
}
- if (unlikely(opts->mss)) {
- *ptr++ = htonl((TCPOPT_MSS << 24) |
- (TCPOLEN_MSS << 16) |
- opts->mss);
- }
-
if (unlikely(OPTION_SACK_ADVERTISE & opts->options &&
!(OPTION_TS & opts->options))) {
*ptr++ = htonl((TCPOPT_NOP << 24) |
On Mon, Oct 20, 2008 at 12:38:51PM +0300, Ilpo J
On Mon, 20 Oct 2008, Jarek Poplawski wrote:
> On Mon, Oct 20, 2008 at 12:38:51PM +0300, Ilpo J
ok setting
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
solves the problem as well.
anyway i'm willing to do whatever test you deem necessary if this can help you solve the issue, so as soon as i can i'll apply the patch provided by Ilpo Järvinen and compile (do not worry i've been a linux user for the last 10 years and up to 4 or 5 yrs ago i used to compile even three times a week!)
thanks
aldo

On Mon, Oct 20, 2008 at 08:47:37AM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11721
...
> ------- Comment #40 from sentiniate@tiscali.it 2008-10-20 08:47 -------
> ok setting echo 0 > /proc/sys/net/ipv4/tcp_timestamps solves the problem as
> well.
>
> anyway i'm willing to do whatever test you deem necessary if this can help you
> solve the issue, so as soon as i can i'll apply the patch provided by Ilpo
> J

Ilpo,
i'm sending herewith attached the output of
tcpdump -i eth0 -nXX -c3 'dst port 80 and tcp-syn != 0'
on topolino when:
1) kernel 2.6.26.6 is running on paperino and tcp_window_scaling, tcp_timestamps, tcp_sack are set to "1"
2) kernel 2.6.26.6 is running on paperino and tcp_window_scaling, tcp_timestamps, tcp_sack are set to "0"
3) kernel 2.6.27-rc1-git1 is running on paperino and tcp_window_scaling, tcp_timestamps, tcp_sack are set to "1"
4) kernel 2.6.27-rc1-git1 is running on paperino and tcp_window_scaling, tcp_timestamps, tcp_sack are set to "0"
moreover, i read more carefully what you and Jarek wrote above, please let me know if you deem still necessary that i apply the patch you provided and compile or if you like i compile the latest kernel or whatever.
thanks
aldo

Created attachment 18381 [details]
kernel 2.6.26.6 is running on paperino and tcp_window_scaling, tcp_timestamps, tcp_sack are set to "1"
Created attachment 18382 [details]
kernel 2.6.26.6 is running on paperino and tcp_window_scaling, tcp_timestamps, tcp_sack are set to "0"
Created attachment 18383 [details]
kernel 2.6.27-rc1-git1 is running on paperino and tcp_window_scaling, tcp_timestamps, tcp_sack are set to "1"
Created attachment 18384 [details]
kernel 2.6.27-rc1-git1 is running on paperino and tcp_window_scaling, tcp_timestamps, tcp_sack are set to "0"
Ah, I forgot to add bugzilla back last time, so added it there now.
On Tue, 21 Oct 2008, Aldo Maggi wrote:
> just as matter of information, two other cases similar to mine were
> reported in ubuntu bug pages:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/272896
> https://bugs.launchpad.net/ubuntu/+bug/285430
> i originated the first one and gave Jarek's first solution:
> tcp_sack=0
> which worked for the two other users.
>
> maybe they could be contacted in order to perform further tests in a
> different environment.
It's hardly surprising that I couldn't reproduce this, the non-compliance
here is most probably in isp's device or in some end-user sold embedded
device.
Can you try another debug patch below (on 2.6.27.2 is fine). It moves
the mss to the last position but should keep timestamps in place by making
wscale the first option. It is well possible that you won't get it working
at all except with all of ts, sack and wscale set to 0 (the most likely
result). Please try with all wscale,sack,ts combinations (no need to
provide dumps, just working/not working per case)... This should
tell us with quite high certainty which option is actually
causing this (should it not be the mss-at-beginning, which is the most
likely cause); however, your finding may well be specific to your network
while other people might get slightly different results.
In order to provide maximal compatibility, I think we just restore the
previous ordering of the fields (basically the first patch you tested).
It has no additional cost, so it won't hurt any, but it's still quite
ridiculous that some devices care so little about the basic tcp spec, which
has a devastating effect on interoperability here. They should really fix
the devices instead, but knowing how little most of the isp & etc. care
(or even understand) I'm not expecting too much to happen on that
front, and those who care probably run some semi-sane stuff already
anyway... :-). ...Sadly, it's much easier and cheaper to blame the
end-user's equipment or Linux (if/once it becomes known that it's in use)
and do nothing in case one is fool enough to complain to them.
--
i.
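[Editorial sketch, not part of the original thread.] The request above to test "all wscale,sack,ts combinations" amounts to eight runs. A minimal Python sketch that enumerates them and prints the matching `echo` lines, using the /proc paths quoted earlier in this report; the function names `sysctl_test_matrix` and `shell_commands` are hypothetical, and the script only prints commands, it does not change any setting.

```python
from itertools import product

# The three sysctls toggled in this report, under /proc/sys/net/ipv4/.
SYSCTLS = ("tcp_window_scaling", "tcp_sack", "tcp_timestamps")


def sysctl_test_matrix():
    """Return one dict per on/off combination of the three sysctls."""
    return [dict(zip(SYSCTLS, values))
            for values in product((0, 1), repeat=len(SYSCTLS))]


def shell_commands(setting):
    """Render one combination as the echo commands used in the thread."""
    return [f"echo {value} > /proc/sys/net/ipv4/{name}"
            for name, value in setting.items()]


if __name__ == "__main__":
    for setting in sysctl_test_matrix():
        print("; ".join(shell_commands(setting)))
```

Each printed line would be run as root on paperino before one working/not working check (for example `w3m kernel.org`).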
--
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index de54f02..63b0a3f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -376,6 +376,13 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
*md5_hash = NULL;
}
+ if (unlikely(opts->ws)) {
+ *ptr++ = htonl((TCPOPT_NOP << 24) |
+ (TCPOPT_WINDOW << 16) |
+ (TCPOLEN_WINDOW << 8) |
+ opts->ws);
+ }
+
if (likely(OPTION_TS & opts->options)) {
if (unlikely(OPTION_SACK_ADVERTISE & opts->options)) {
*ptr++ = htonl((TCPOPT_SACK_PERM << 24) |
@@ -392,12 +399,6 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
*ptr++ = htonl(opts->tsecr);
}
- if (unlikely(opts->mss)) {
- *ptr++ = htonl((TCPOPT_MSS << 24) |
- (TCPOLEN_MSS << 16) |
- opts->mss);
- }
-
if (unlikely(OPTION_SACK_ADVERTISE & opts->options &&
!(OPTION_TS & opts->options))) {
*ptr++ = htonl((TCPOPT_NOP << 24) |
@@ -406,11 +407,10 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
TCPOLEN_SACK_PERM);
}
- if (unlikely(opts->ws)) {
- *ptr++ = htonl((TCPOPT_NOP << 24) |
- (TCPOPT_WINDOW << 16) |
- (TCPOLEN_WINDOW << 8) |
- opts->ws);
+ if (unlikely(opts->mss)) {
+ *ptr++ = htonl((TCPOPT_MSS << 24) |
+ (TCPOLEN_MSS << 16) |
+ opts->mss);
}
if (unlikely(opts->num_sack_blocks)) {
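[Editorial sketch, not part of the original thread.] To see what the two patches do to the bytes on the wire, the Python sketch below packs SYN options into 32-bit words the way the diff fragments suggest and lets the order of the blocks vary. It is an illustration, not the kernel code: the three order tuples are read off the diffs in this report, the option kinds and lengths are the standard TCP values, and the parts of `tcp_options_write` not visible in the diffs (the timestamp and SACK-permitted words) are assumptions. MD5 and SACK blocks are left out, and `build_syn_options` is a hypothetical name.

```python
import struct

TCPOPT_NOP, TCPOPT_MSS, TCPOPT_WINDOW = 1, 2, 3
TCPOPT_SACK_PERM, TCPOPT_TIMESTAMP = 4, 8
TCPOLEN_MSS, TCPOLEN_WINDOW = 4, 3
TCPOLEN_SACK_PERM, TCPOLEN_TIMESTAMP = 2, 10

# Block orders as read from the diffs in this report.
ORDER_2_6_27 = ("ts", "mss", "sack", "ws")     # 2.6.27 before any patch
ORDER_RESTORED = ("mss", "ts", "sack", "ws")   # first patch: mss first again
ORDER_DEBUG = ("ws", "ts", "sack", "mss")      # second (debug) patch


def build_syn_options(order, mss=1460, ws=7, ts=True, sack=True,
                      tsval=0, tsecr=0):
    """Return SYN option bytes with the blocks emitted in `order`."""
    blocks = {"mss": b"", "ts": b"", "sack": b"", "ws": b""}
    if mss:
        blocks["mss"] = struct.pack(
            "!I", (TCPOPT_MSS << 24) | (TCPOLEN_MSS << 16) | mss)
    if ts:
        # Assumption: SACK-permitted shares the timestamp word when both
        # are on, otherwise two NOPs pad it.
        if sack:
            head = (TCPOPT_SACK_PERM << 24) | (TCPOLEN_SACK_PERM << 16)
        else:
            head = (TCPOPT_NOP << 24) | (TCPOPT_NOP << 16)
        head |= (TCPOPT_TIMESTAMP << 8) | TCPOLEN_TIMESTAMP
        blocks["ts"] = struct.pack("!III", head, tsval, tsecr)
    elif sack:
        blocks["sack"] = struct.pack(
            "!I", (TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
            (TCPOPT_SACK_PERM << 8) | TCPOLEN_SACK_PERM)
    if ws:
        blocks["ws"] = struct.pack(
            "!I", (TCPOPT_NOP << 24) | (TCPOPT_WINDOW << 16) |
            (TCPOLEN_WINDOW << 8) | ws)
    return b"".join(blocks[name] for name in order)


if __name__ == "__main__":
    for label, order in (("2.6.27", ORDER_2_6_27),
                         ("restored", ORDER_RESTORED),
                         ("debug", ORDER_DEBUG)):
        print(f"{label:9s}", build_syn_options(order).hex(" "))
```

Under these assumptions, the unpatched 2.6.27 order puts SACK-permitted first only when both sack and timestamps are enabled, which would fit the reports that switching either sysctl off makes connections work.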
On Tue, Oct 21, 2008 at 12:36:33PM +0300, Ilpo J On Tue, 21 Oct 2008, Jarek Poplawski wrote:
> On Tue, Oct 21, 2008 at 12:36:33PM +0300, Ilpo J
On Tue, Oct 21, 2008 at 01:51:10PM +0300, Ilpo J On Tue, 21 Oct 2008, Jarek Poplawski wrote:
> On Tue, Oct 21, 2008 at 01:51:10PM +0300, Ilpo J
On Tue, Oct 21, 2008 at 03:18:57PM +0300, Ilpo J On Tue, 21 Oct 2008, Jarek Poplawski wrote:
> On Tue, Oct 21, 2008 at 03:18:57PM +0300, Ilpo J
On Tue, Oct 21, 2008 at 05:16:43PM +0300, Ilpo J

On Tue, 21 Oct 2008 12:36:33 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
[...]
> Can you try this another debug patch below (on 2.6.27.2 is fine). It
> moves the mss to the last position but should keep timestamps in
> place by making wscale as first option. It is well possible that you
> won't get it working at all except with all ts,sack and wscale set to
> 0 (the most likely result). Please try with all wscale,sack,ts
> combinations (no need to provide dumps, just working/not working per
> case)... This should tell us for quite high certaintity what is the
> actual option which is causing this (would it not be the
> mss-at-beginning which is the most likely cause), however, your
> finding may well be specific to your network while the other people
> might a bit different results.

i've compiled 2.6.27.2 source after having patched it with your today's patch.
it works! i.e. i can navigate (w3m kernel.org) and update (apt-get update)
i send you anyway the usual tcpdump as in previous msgs, should it be of any help.
do you want me to provide some more commands output?
waiting...
aldo

On Tue, Oct 21, 2008 at 08:18:57PM +0200, Aldo Maggi wrote:
...
> i've compiled 2.6.27.2 source after having patched it with your today's
> patch.
>
> it works! i.e. i can navigate (w3m kernel.org) and update (apt-get
> update)
Ilpo, I should say you're incredible! ...But, since I've promised to
myself not to disturb you anymore (again), I can't do this, sorry :-(
Jarek P.
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 21 Oct 2008 20:45:39 +0200
> On Tue, Oct 21, 2008 at 08:18:57PM +0200, Aldo Maggi wrote:
> ...
> > i've compiled 2.6.27.2 source after having patched it with your today's
> > patch.
> >
> > it works! i.e. i can navigate (w3m kernel.org) and update (apt-get
> > update)
>
> Ilpo, I should say you're incredible! ...But, since I've promised to
> myself not to disturb you anymore (again), I can't do this, sorry :-(
Indeed, excellent work Ilpo.
Ilpo, now that we know this fixes things for sure, could you submit this formally with a proper signoff?
I'll queue it up for -stable too.
Thanks!

On Tue, 21 Oct 2008, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 21 Oct 2008 20:45:39 +0200
>
> > On Tue, Oct 21, 2008 at 08:18:57PM +0200, Aldo Maggi wrote:
> > ...
> > > i've compiled 2.6.27.2 source after having patched it with your today's
> > > patch.
> > >
> > > it works! i.e. i can navigate (w3m kernel.org) and update (apt-get
> > > update)
> >
> > Ilpo, I should say you're incredible! ...But, since I've promised to
> > myself not to disturb you anymore (again), I can't do this, sorry :-(
>
> Indeed, excellent work Ilpo.
>
> Ilpo, now that we know this fixes things for sure, could you submit
> this formally with a proper signoff?
>
> I'll queue it up for -stable too.
>
> Thanks!
Sure, here below is one with a warning (it's the first patch + comment).
Olon, can you please check whether this affects your case too
(though the symptoms were not that clear in your case).
It would be nice for Aldo to check what the result will be with my second
patch (only) using sack=1,ts=0,wscale=0. I guess it does but it's a bit
unclear if nop's in front help or not (having the patch below should
anyway help also in that case as the mss option gets moved before it
anyway).
--
[PATCH] tcp: Restore ordering of TCP options for the sake of inter-operability
This is not our bug! Sadly some devices cannot cope with the change
of TCP option ordering which was a result of the recent rewrite of
the option code (not that there was some particular reason stemming
from the rewrite for the reordering) though any ordering of TCP
options is perfectly legal. Thus we restore the original ordering
to allow interoperability with/through such broken devices and add
some warning about this trap. Since the reordering just happened
without any particular reason, this change shouldn't cost us
anything.
There are already a couple of known failure reports (within close
proximity of the last release), so the problem might be more
wide-spread than a single device. And there are other reports which
may be due to the same problem though the symptoms were less obvious.
Analysis of one of the cases revealed (with very high probability)
that sack capability cannot be negotiated as the first option
(SYN never got a response).
Signed-off-by: Ilpo J
On Wed, 22 Oct 2008 13:00:01 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
[...]
> It would be nice for Aldo to check what the result will be with my
> second patch (only) using sack=1,ts=0,wscale=0. I guess it does but
> it's a bit unclear if nop's in front help or not (having the patch
> below should anyway help also in that case as the mss option gets
> moved before it anyway).

in order to avoid misunderstandings, i list below my actions:
i've used on paperino the kernel 2.6.27.2 patched with ilpo's 21.10.2008 patch (see please comment 47 in http://bugzilla.kernel.org/show_bug.cgi?id=11721, and NOT with the patch contained in comment 58, right?)
i've modified the following files:
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
echo 1 > /proc/sys/net/ipv4/tcp_sack
the results are that i CAN navigate (w3m kernel.org) and update my system (apt-get update in debian).
should it be of any help to ilpo, i attach the usual tcpdump on the wan eth of topolino (my home server/gw)
ciao!
aldo
[...]

On Wed, 22 Oct 2008, Aldo Maggi wrote:
> On Wed, 22 Oct 2008 13:00:01 +0300 (EEST)
> "Ilpo J
From: "Ilpo J
On Sat, Oct 25, 2008 at 10:15:15PM +0200, Aldo Maggi wrote:
> On Tue, 21 Oct 2008 07:49:54 +0000
> Jarek Poplawski <jarkao2@gmail.com> wrote:
>
> > On Tue, Oct 21, 2008 at 09:27:21AM +0200, Aldo Maggi wrote:
> > ...
> > > as soon as i've time i'll replace the modem and run some tests.
>
> just to let you know!
> i have changed my modem with a new one.
> the bug has disappeared with no modification of the tcp files.
> this shows that the problem was due to my old zyxel.
>
> ciao!
> aldo
>
Aldo, I think this is very useful information, so I'm forwarding it to
the people.
Thanks again,
Jarek P.
Do we know which are the affected routers/firewalls? We should name and shame them, and ensure that bug reports are filed.

Just in case you're reading this bug report and wondering what you should do with this bug...
There are some analyses circulating, claimed to be "very good", which tell you that Linux enabled timestamps in 2.6.27. Sadly that is a _false_ claim, the timestamps were _not_ enabled for 2.6.27 (or -rc1). Don't be fooled by how widespread the claims are; sadly, multiple distros seem to repeat and support the false claim in their "semi-official" documentation. Timestamps had already been enabled for a very long time before 2.6.27...
The real change that happened from 2.6.26 to 2.6.27, as described in the commit which fixes this particular bug, was a change in the _order_ of the TCP options (timestamps are tcp options). The order change was not intentional, not that it should have broken anything. The correct fix is to restore the original ordering.
...It is very easy to verify with tcpdump if 2.6.26 does send timestamps or not (hint: look into SYN packet's TCP options).

I am still seeing this issue on a PPP connection with 2.6.27. 2.6.24-21 (the previous Ubuntu kernel) and a vanilla 2.6.26 kernel from kernel.org are fine. Aldo has suggested I reopen this and add the outputs of the scenarios as per comment #42.

Created attachment 18741 [details]
Output of tcpdump scenarios; different kernels and tcp /proc settings
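[Editorial sketch, not part of the original thread.] The hint above about looking into the SYN packet's TCP options can also be followed programmatically. The Python sketch below walks a raw option block (kind, length, value; NOP and end-of-list are single bytes) and reports the options in wire order, so both "are timestamps sent?" and "which option comes first?" can be read off. The names `option_kinds` and `sends_timestamps` are hypothetical, and the sample bytes in the usage note are synthetic, not taken from the attachments.

```python
TCPOPT_EOL = 0
TCPOPT_NOP = 1
# Standard TCP option kinds that appear in this report.
OPTION_NAMES = {2: "mss", 3: "wscale", 4: "sackOK", 8: "timestamp"}


def option_kinds(options):
    """Return the option names of a raw TCP option block, in wire order."""
    kinds = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break                      # end of option list, rest is padding
        if kind == TCPOPT_NOP:
            kinds.append("nop")
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]        # length covers kind and length bytes
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        kinds.append(OPTION_NAMES.get(kind, f"kind-{kind}"))
        i += length
    return kinds


def sends_timestamps(options):
    """True if the option block carries a timestamp option."""
    return "timestamp" in option_kinds(options)
```

For a synthetic block such as `bytes([2, 4, 5, 180, 4, 2, 8, 10]) + bytes(8) + bytes([1, 3, 3, 7])`, `option_kinds` lists mss, sackOK, timestamp, nop, wscale in that order.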
> ------- Comment #66 from speedster@haveacry.com 2008-11-08 22:03 -------
> I am still seeing this issue on a PPP connection with 2.6.27. 2.6.24-21 (the
> previous Ubuntu kernel) and a vanilla 2.6.26 kernel from kernel.org are fine.
> Aldo has suggested I reopen this and add the outputs of the scenarios as per
> comment #42.
Dean, it looks like a different problem yet, so better try to open a
new bugzilla report with more description, .configs, dmesgs, ifconfigs
etc. for working and not working case. BTW, what about ping or
connecting to other sites? Could you attach more verbose, not filtered
tcpdump on an interface closest to the internet while pinging and www?
Thanks,
Jarek P.