Bug 8688 - r8169: high latency when packet fragmentation occurs (NFS)
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: Francois Romieu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-06-28 16:59 UTC by Sylvain Le Gall
Modified: 2007-10-07 06:34 UTC

See Also:
Kernel Version: 2.6.21.5
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
config-2.6.21.5-core2 (45.93 KB, text/plain)
2007-06-28 17:02 UTC, Sylvain Le Gall
dmesg-2.6.21.5-core2 (30.97 KB, text/plain)
2007-06-28 17:04 UTC, Sylvain Le Gall
ifconfig-2.6.21.5-core2 (2.45 KB, text/plain)
2007-06-28 17:05 UTC, Sylvain Le Gall
interrupts-2.6.21.5-core2 (751 bytes, text/plain)
2007-06-28 17:05 UTC, Sylvain Le Gall
lsmod-2.6.21.5-core2 (1.90 KB, text/plain)
2007-06-28 17:06 UTC, Sylvain Le Gall
lspci-2.6.21.5-core2 (1.81 KB, text/plain)
2007-06-28 17:06 UTC, Sylvain Le Gall
r1000 driver (works) (17.44 KB, application/x-gtar)
2007-07-04 15:44 UTC, Sylvain Le Gall
r1000 driver (oops) (39.12 KB, application/zip)
2007-07-04 15:45 UTC, Sylvain Le Gall
Difference between r1000 v1.05 and v1.05a_20070227 (kontron) (7.33 KB, text/x-patch)
2007-09-05 12:53 UTC, Sylvain Le Gall

Description Sylvain Le Gall 2007-06-28 16:59:53 UTC
Most recent kernel where this bug did not occur:
Distribution: Debian
Hardware Environment: MB Kontron 986LCD, Core 2 Duo T7600, 2GB DDR, plug + cable Cat6 shielded (SFTP)
Software Environment: Debian Unstable
Problem Description:

When using the r8169 module, I have to wait a random amount of time to get the full listing of a big NFS-shared directory. The delay ranges from 0.5 to 13 s. After the first call, the second is almost immediate. Looking at wireshark/tcpdump, it seems to happen every time I get a "Reassembled PDU" (the time between the first packet and this PDU determines where in the 0.5 to 13 s range the delay falls).
The situation seems to get worse with time.
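
For readers who want a rough idea of how many fragments a single large reply turns into, here is a minimal sketch. It assumes NFS over UDP on IPv4 with a 1500-byte MTU and no IP options; the report does not say which transport was used, and a "Reassembled PDU" in wireshark can also refer to TCP segment reassembly. The function name `ipv4_fragment_count` is made up for this illustration.

```python
import math

IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
UDP_HEADER = 8    # bytes

def ipv4_fragment_count(udp_payload_len, mtu=1500):
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    datagram_len = udp_payload_len + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return max(1, math.ceil(datagram_len / per_fragment))

# Hypothetical reply sizes, chosen only for illustration.
for size in (1024, 8192, 32768):
    print(size, ipv4_fragment_count(size))
```

With these assumptions, anything above 1472 bytes of UDP payload no longer fits in one packet, so a listing of a large directory would always arrive as several fragments that the receiver must reassemble.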

This is a bit annoying because the "big" directory is my home dir!

Using r1000 (the Realtek driver) 1.05 (not 1.05a, which oopses), I get a constant time of 0.005 s.

I have no speed or latency problems in other cases.

Steps to reproduce:

Computer A has a RTL8111/8168B PCI Express Gigabit Ethernet controller
Computer B is a standard computer

1. create a /dir/ with a lot of hidden files (.XXX) on A
2. export /dir/ with NFS from A to B
3. mount /dir/ && umount /dir/ && time ls /dir/ on B

It can also be useful to "touch /dir/test" which seems to trigger a new RPC call to A.
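
To collect more than one sample per mount, the timing step can be scripted. This is only a sketch: the helper name `time_listing` and the default of five runs are assumptions made for this example, not something used in the report.

```python
import os
import time

def time_listing(path, runs=5):
    """Return the wall-clock seconds taken by each of `runs` listings of path."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        os.listdir(path)  # returns dot-files too, so hidden entries are fetched
        samples.append(time.perf_counter() - start)
    return samples
```

On B, something like `time_listing("/dir/")` right after mounting gives one cold sample followed by warm ones. Because the client caches directory contents, only the first listing after a fresh mount (or after the "touch /dir/test" trick above) is expected to show the delay.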
Comment 1 Sylvain Le Gall 2007-06-28 17:02:00 UTC
Created attachment 11897 [details]
config-2.6.21.5-core2
Comment 2 Sylvain Le Gall 2007-06-28 17:04:45 UTC
Created attachment 11898 [details]
dmesg-2.6.21.5-core2
Comment 3 Sylvain Le Gall 2007-06-28 17:05:16 UTC
Created attachment 11899 [details]
ifconfig-2.6.21.5-core2
Comment 4 Sylvain Le Gall 2007-06-28 17:05:40 UTC
Created attachment 11900 [details]
interrupts-2.6.21.5-core2
Comment 5 Sylvain Le Gall 2007-06-28 17:06:00 UTC
Created attachment 11901 [details]
lsmod-2.6.21.5-core2
Comment 6 Sylvain Le Gall 2007-06-28 17:06:20 UTC
Created attachment 11902 [details]
lspci-2.6.21.5-core2
Comment 7 Francois Romieu 2007-07-04 11:52:28 UTC
Can you send your r1000 driver (versions 1.05 and 1.05a)?

Thanks in advance.

-- 
Ueimor
Comment 8 Sylvain Le Gall 2007-07-04 15:26:24 UTC
Where should I put the source? (Attach it to this bugzilla, or send it to you by email?)
Comment 9 Francois Romieu 2007-07-04 15:36:54 UTC
Subject: Re:  r8169: high latency when packet fragmentation occurs (NFS)

sylvain@le-gall.net  2007-07-04 15:26:
> Where should i put the source ? (attached to this bugzilla/send to you by
> email). 

Please add it to bugzilla.
Comment 10 Sylvain Le Gall 2007-07-04 15:44:24 UTC
Created attachment 11943 [details]
r1000 driver (works)
Comment 11 Sylvain Le Gall 2007-07-04 15:45:08 UTC
Created attachment 11944 [details]
r1000 driver (oops)
Comment 12 Oliver Hamann 2007-07-22 08:46:22 UTC
I have a similar problem with the r8169 driver:

Machine A:
 SuSE Linux 9.3 64-Bit
 Kernel 2.6.21.6 or even 2.6.20.4 (from kernel.org, unmodified)
 Mainboard: ASUS P5B with on-board Realtek PCI-E Gigabit LAN controller,
 reported by the kernel driver as:
 "eth1: RTL8168b/8111b at 0xffffc20000030000, 00:18:f3:51:dd:16, IRQ 19"

Machine B:
 Windows XP System

Machine A connects as a Samba client to Machine B.

The problem is: when I edit a large file on machine B in a text editor on
machine A (via Samba) and save it, the save always hangs for many minutes.
The same happens when editing an image file with a paint program.

The save usually stops hanging and finishes when I play with a VNC client on
machine A that is also connected to machine B (just moving the mouse across
the VNC window is enough).

Copying files via the Samba connection in a file manager or shell works
without any problems, and unlike VNC, it does not let the editors continue to
save. (yes, this is somehow unbelievable)

When using another network card and driver, everything works fine.
Comment 13 Sylvain Le Gall 2007-08-04 07:23:08 UTC
I have switched to 2.6.22.1 with the vserver patch (sorry; if really needed I can remove this patch).

The problem is still here, but I have additional data:
* the problem gets worse with time!

Yesterday I rebooted my computer to pick up the new kernel, and I thought the bug was gone (almost no problem).

Today I get:
gildor@grand:~$ time ls
bin     Desktop    download  images        news      programmation  teleir  
debian  documents  GNUstep   mail-archive  playlist  public_html    tmp

real    0m3.410s
user    0m0.000s
sys     0m0.000s

This is not good at all (the two computers are one 3Com switch apart and connected with Cat 6 cable, which should work fine).

Now, the next problem is that r1000 doesn't compile on this new kernel version.

So I really need this driver ;-) (which is far better than r1000).

Let me know if you have problems reproducing it; I can run some test cases.

Thanks and regards
Sylvain Le Gall
Comment 14 Sylvain Le Gall 2007-08-20 14:54:00 UTC
Hello,

Using the same computers and switches, but adding a new Gigabit NIC (DGE 530T/skge), gives me a stable result without the latency of the r8169:

gildor@grand:~$ time ls 
bin  debian  Desktop  documents  download  GNUstep  images  mail-archive  news  playlist  programmation  public_html  teleir  tmp  unison.log

real    0m0.045s
user    0m0.000s
sys     0m0.004s

Even though I have a way to work around this problem, I would really like to solve it. Let me know if you need more data.

Regards
Sylvain Le Gall
Comment 15 Francois Romieu 2007-08-21 06:03:14 UTC
bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> :
[...]
> Even if i have a solution to circumvent this problem, i would really like to
> solve it. Let me know if you need more data to solve it.

If you have some spare time, can you try against 2.6.23-rc3:
http://www.fr.zoreil.com/people/francois/misc/20070818-2.6.23-rc3-r8169-test.patch 
 
or (tarball sits one level higher):
 
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.23-rc3/r8169-20070818/

(do not hurry, I am far away from my usual computers until 31/08).
Comment 16 Sylvain Le Gall 2007-09-05 12:48:47 UTC
After I exchanged some mail with Kontron support (Kontron is the vendor of my motherboard), they told me that there is an alignment problem in the RX buffer. If you are interested, I have the Realtek driver and their driver, which differ in the file r1000_n.c.

It could be a solution to the problem, but the driver doesn't compile (too old).

The Kontron people told me that this problem is not specific to their motherboard and can be found with any Realtek chipset...

I attach the diff.

Please let me know if you still want me to test 20070818-2.6.23-rc3-r8169-test.patch.

Regards
Sylvain Le Gall
Comment 17 Sylvain Le Gall 2007-09-05 12:53:22 UTC
Created attachment 12716 [details]
Difference between r1000 v1.05 and v1.05a_20070227 (kontron)

Shows the difference in RX buffer alignment in the r1000 driver.
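
As a generic illustration of what "aligning a buffer" means (this is not the content of the attached diff, which is not reproduced in this thread), the sketch below rounds an address up to a power-of-two boundary. The 8-byte default and the names `align_up` and `is_aligned` are assumptions made for this example.

```python
def align_up(addr, boundary=8):
    """Round addr up to the next multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    return (addr + boundary - 1) & ~(boundary - 1)

def is_aligned(addr, boundary=8):
    """True if addr is a multiple of boundary."""
    return addr % boundary == 0

# A buffer starting at 0x1003 would be moved to 0x1008.
print(hex(align_up(0x1003)))
```

A driver does the equivalent in C when it reserves a few bytes at the start of a receive buffer so that the address handed to the hardware meets the chip's alignment requirement.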
Comment 18 Francois Romieu 2007-09-05 13:08:42 UTC
Hi Sylvain,

1. Please try 2.6.23-rc5 (without highres timer).
2. If it still does not work correctly, please try the patches at http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.23-rc5/r8169-20070903
   Each patch of the series applies on top of the previous one.

If you need to go through 2., I would expect a change of behavior as
soon as patch #0002 of the series is applied.

-- 
Ueimor
Comment 19 Adrian Bunk 2007-10-06 14:19:49 UTC
Please reopen this bug if:
- it is still present with kernel 2.6.23-rc9 and
- you have done the further requested testing.
Comment 20 Francois Romieu 2007-10-07 06:33:39 UTC
The bug is fixed by d78ae2dcc2acebb9a1048278f47f762c069db75c 
which has been merged during 2.6.23-rc and Sylvain has done
further testing.

Adrian, I'd appreciate if you sent a short notice before
rejecting any bug that I follow. Thanks.

-- 
Ueimor
