Bug 10063 - Network problems with >2.6.23.16
Summary: Network problems with >2.6.23.16
Status: VERIFIED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 high
Assignee: Ilpo Järvinen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-02-21 18:04 UTC by Sebastian Hyrwall
Modified: 2008-05-12 18:39 UTC

See Also:
Kernel Version: 2.6.24.2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
tcpdump-output of transfer stalling (1.21 KB, text/x-log)
2008-02-21 18:21 UTC, Sebastian Hyrwall
Fix part 1 (4.21 KB, patch)
2008-05-12 05:55 UTC, Ilpo Järvinen
Fix part 2 (1.93 KB, patch)
2008-05-12 05:55 UTC, Ilpo Järvinen

Description Sebastian Hyrwall 2008-02-21 18:04:45 UTC
Latest working kernel version: 2.6.23
Earliest failing kernel version: 2.6.24
Distribution: Gentoo
Hardware Environment: 

model name      : Dual Core AMD Opteron(tm) Processor 170
stepping        : 2
cpu MHz         : 2009.144

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15) sky2

Works on 2.6.24 (as far as I can see) on:

model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 9
cpu MHz         : 2793.222
01:01.0 Ethernet controller: Intel Corporation 82547GI Gigabit Ethernet Controller

Software Environment: TCP-transfers (FTP,HTTP)
Problem Description:

First, I want to apologize that I can't do much troubleshooting on this issue, because I don't know exactly where to start.

The problem is that, since upgrading from 2.6.23 to 2.6.24, I have experienced a lot of "weird" network problems. For example, FTP transfers to certain hosts on the Internet stall after a while. With one dst-host the problem appears frequently. With another dst-host it doesn't happen as often. Sometimes the transfers don't even start.

First we did a lot of fault tracing on the transmission, but after downgrading the kernel it was apparent that the transmission wasn't the problem.

I did not think about filing this report until I saw http://kerneltrap.org/node/15550. So I guess I'm not alone.

I must also add that with some dst-hosts there is no problem at all. So I would guess that it has something to do with the MTU, wscale or something else between the hosts.

All the other dst-hosts I've tried are running 2.6.23.

I will try to test a lot of kernel versions to ease the troubleshooting.


Steps to reproduce:

Start an FTP transfer from the box mentioned above.
Comment 1 Sebastian Hyrwall 2008-02-21 18:21:49 UTC
Created attachment 14945
tcpdump-output of transfer stalling
Comment 2 Sebastian Hyrwall 2008-02-21 18:23:04 UTC
Problem still exists in 2.6.25-rc2.

Attached tcpdump-log from 2.6.25-rc2. Transfer stalls at 03:18:04.452118.

Y = Server in question running 2.6.25-rc2 atm.
X = Running 2.6.22
Comment 3 Sebastian Hyrwall 2008-02-21 18:47:53 UTC
2.6.23.5 doesn't seem to be affected. 

Also okay in 2.6.23.16. Guess it's time for the 2.6.24 rc's.
Comment 4 Sebastian Hyrwall 2008-02-21 19:11:25 UTC
Problem exists in 2.6.24-rc4; time to move down.
Comment 5 Sebastian Hyrwall 2008-02-21 19:41:51 UTC
Problem exists in 2.6.24-rc1. It's getting tricky now :)
Comment 6 Sebastian Hyrwall 2008-02-21 20:02:46 UTC
Sorry, I don't know how to proceed now. Maybe it's just a sky2 problem that was introduced between 2.6.23.16 and 2.6.24-rc1.

I am unable to compile the 2.6.23.16 sky2-driver in 2.6.24-rc1 because of functions that changed names etc. If no one has any other idea (and if someone assigns this bug that is) I can try the 2.6.23.16 sky2-driver if someone can make it 2.6.24-rc1 compatible.
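
The narrowing down done in comments 3 to 5 is effectively a manual bisection. A minimal Python sketch of the same search follows, assuming an ordered list of candidate kernels (oldest first) and a hypothetical `is_bad` callback that stands in for booting a kernel and running the FTP test. Neither the function name nor the callback comes from this report.

```python
def first_bad(versions, is_bad):
    """Return the earliest entry of versions for which is_bad() is True.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    every entry after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return versions[hi]
```

On a kernel git tree, `git bisect` between the v2.6.23 and v2.6.24-rc1 tags performs the same search over individual commits, which avoids having to backport a driver by hand.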
Comment 7 Anonymous Emailer 2008-02-24 23:09:55 UTC
Reply-To: akpm@linux-foundation.org


(please respond via emailed reply-to-all, not via the bugzilla web interface)

On Thu, 21 Feb 2008 18:04:45 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10063

There's more info (including a tcpdump) in bugzilla.

>            Summary: Network problems with 2.6.23+
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.24.2
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: zibbe@cisko.org
> 
> 
> Latest working kernel version: 2.6.23
> Earliest failing kernel version: 2.6.24
> Distribution: Gentoo
> Hardware Environment: 
> 
> model name      : Dual Core AMD Opteron(tm) Processor 170
> stepping        : 2
> cpu MHz         : 2009.144
> 
> 02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E
> Gigabit Ethernet Controller (rev 15) sky2
> 
> Working on 2.6.24 (as far as I can see) in :
> 
> model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
> stepping        : 9
> cpu MHz         : 2793.222
> 01:01.0 Ethernet controller: Intel Corporation 82547GI Gigabit Ethernet
> Controller
> 
> Software Environment: TCP-transfers (FTP,HTTP)
> Problem Description:
> 
> First I want to apologize that I'm not able to do alot of troubleshooting on
> this issue cause I don't know where to start exactly.
> 
> The problem is that I have, after upgrading to 2.6.24 from 2.6.23,
> experienced
> alot of "weird" network problems. For example, FTP-transfers to certain hosts
> on the Internet stalls after a while. I have one dst-host where this problem
> appears frequently. On another dst-host it doesn't happen as often. Sometimes
> the transfers doesn't even start. 
> 
> First we did alot of faultracing on the transmission but after downgrading
> the
> kernel it was apparent that the transmission wasn't the problem. 
> 
> I did not think about filing this report until i saw,
> http://kerneltrap.org/node/15550. So I guess I'm not alone.
> 
> I must also add that to some dst-hosts there is no problem at all. So I would
> guess that there might be something with the mtu,wscale or something between
> hosts. 
> 
> All the other dst-hosts I've tried is running 2.6.23
> 
> I will try to test alot of kernel-version to try to ease the troubleshooting. 
> 
> 
> Steps to reproduce:
> 
> Start an FTP-transfer from the box mention above.

I guess it might help if you can tell us the IP address of some of the
problematic servers.
Comment 8 Stephen Hemminger 2008-04-09 07:41:36 UTC
Are you sure this isn't a problem with a firewall that corrupts window scaling?
Does turning off window scaling
  echo 0 >/proc/sys/net/ipv4/tcp_window_scaling
fix the problem?
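
For anyone repeating this check, here is a minimal Python sketch of reading and toggling the setting. It assumes the standard sysctl location /proc/sys/net/ipv4/tcp_window_scaling. The helper names and the `path` parameter are illustrative only, and writing the real file requires root.

```python
from pathlib import Path

# Standard sysctl location for TCP window scaling on Linux.
WSCALE_PATH = "/proc/sys/net/ipv4/tcp_window_scaling"


def window_scaling_enabled(path=WSCALE_PATH):
    """Return True if the sysctl file holds a non-zero value."""
    return Path(path).read_text().strip() != "0"


def set_window_scaling(enabled, path=WSCALE_PATH):
    """Write 1 or 0 to the sysctl file (needs root on a real system)."""
    Path(path).write_text("1\n" if enabled else "0\n")
```

A test run would call `set_window_scaling(False)`, retry the stalling transfer, and then restore the previous value. The setting only affects connections opened after the change.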
Comment 9 Ilpo Järvinen 2008-05-06 07:04:47 UTC
I wonder if there's a larger fragment of that tcpdump log (still) available? The given one lacks info about the outstanding window because it starts at too late a point. I would need _all_ segments which have seqno 17302373 or above (perhaps even more than that to explain the events that happen). And I doubt that's the end of the flow either; where's the reset in that case?
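
A rough Python sketch of pulling such segments out of a text-mode tcpdump log follows. It assumes the log prints absolute sequence numbers and uses either the older `start:end(len)` notation or the newer `seq start:end` notation. The function name is hypothetical, and lines without a sequence range (pure ACKs, for instance) are skipped, so they would have to be collected separately.

```python
import re

# Matches "seq 100:200" (newer tcpdump) or "100:200(100)" (older tcpdump).
SEQ_RE = re.compile(r"seq (\d+):(\d+)|(\d+):(\d+)\(\d+\)")


def segments_from(lines, min_seq):
    """Yield log lines whose data segment starts at or above min_seq."""
    for line in lines:
        m = SEQ_RE.search(line)
        if m is None:
            continue  # no sequence range on this line
        # Exactly one alternative matched, so two groups are non-empty.
        start, _end = (int(g) for g in m.groups() if g is not None)
        if start >= min_seq:
            yield line.rstrip("\n")
```

With a saved log this could be used as `list(segments_from(open("dump.log"), 17302373))`, where `dump.log` is a placeholder file name.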
Comment 10 Ilpo Järvinen 2008-05-12 05:55:09 UTC
Created attachment 16111
Fix part 1
Comment 11 Ilpo Järvinen 2008-05-12 05:55:39 UTC
Created attachment 16112
Fix part 2
Comment 12 Ilpo Järvinen 2008-05-12 06:02:37 UTC
The attached patches should cure the behavior that was seen in the tcpdump snippet. The latter of the fixes should go to mainline soon (the first one is already in Linus' git tree), and eventually to the stable trees too.
Comment 13 Sebastian Hyrwall 2008-05-12 18:39:52 UTC
Hi

Sorry for not responding faster with the info requested above. I've also been unable to take the affected servers down to do various tests.

I can however confirm that after applying the patches given by Ilpo in the recent post, the problem seems to be gone. Or at least I haven't experienced the problem while doing file transfers for 2h.

Thanks for the great work :)
