Bug 7924

Summary: same issue as closed bug 7555 with r8169 and slow transfer
Product: Drivers Reporter: Tom Van den Eynde (tom)
Component: Network Assignee: Francois Romieu (romieu)
Status: CLOSED CODE_FIX    
Severity: normal CC: bunk, nord73, Roel.Teuwen, romieu
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.19.2 Subsystem:
Regression: --- Bisected commit-id:
Attachments: .config 2.6.19.2
dmesg 2.6.19.2
ifconfig 2.6.19.2
interupts 2.6.19.2
lsmod 2.6.19.2
lspci 2.6.19.2
r8168 driver version 8.001.00 + compilation fixes
lspci -vvx from 2.6.20
.config for 2.6.21.1
PHY power-on change

Description Tom Van den Eynde 2007-02-02 07:29:17 UTC
Most recent kernel where this bug did *NOT* occur: n/a
Distribution: Debian Stable
Hardware Environment: Asus P5B-VM, Core 2 Duo, 
Software Environment: kernel 2.6.19.2 (SMP) with latest samba
Problem Description:
For a detailed description of the issue, please refer to
http://groups.google.com/group/linux.samba/browse_thread/thread/1e10d5477e92c5da/6a63ceca71472c0a?lnk=st&q=samba+running+slow&rnum=1&hl=en#6a63ceca71472c0a

and 
http://groups.google.com/group/linux.samba/browse_frm/thread/8bf6e9791ae9b3cd/0a038c363be7ba2a#0a038c363be7ba2a
as it was initially thought to be a Samba issue.
However, disabling the on-board network card and installing a simple, old
100 Mbit RealTek card worked perfectly.
Therefore, I think it is an issue with the r8169 driver.
I also tried to insmod the r1000 (v1.05) module created by Realtek, but that
made the kernel panic, so I don't know if the issue also exists in Realtek's own
module.

Kind regards,

Tom
Comment 1 Francois Romieu 2007-02-02 08:53:10 UTC
Please attach the following information to the current PR:
- complete (untruncated) dmesg output (add the kernel version in the description
of the attachment);
- /sbin/lspci -vvx
- /sbin/lsmod
- cat /proc/interrupts (add the kernel version in the description of the attachment)
- /sbin/ifconfig
- kernel .config (add the kernel version in the description of the attachment)
- version of the bios

-- 
Ueimor
Comment 2 Tom Van den Eynde 2007-02-02 10:27:47 UTC
Created attachment 10261 [details]
.config 2.6.19.2
Comment 3 Tom Van den Eynde 2007-02-02 10:28:15 UTC
Created attachment 10262 [details]
dmesg 2.6.19.2
Comment 4 Tom Van den Eynde 2007-02-02 10:28:59 UTC
Created attachment 10263 [details]
ifconfig 2.6.19.2

ifconfig with sanitized IP addresses
Comment 5 Tom Van den Eynde 2007-02-02 10:29:45 UTC
Created attachment 10264 [details]
interupts 2.6.19.2

output of cat /proc/interrupts
Comment 6 Tom Van den Eynde 2007-02-02 10:30:16 UTC
Created attachment 10265 [details]
lsmod 2.6.19.2

output of lsmod
Comment 7 Tom Van den Eynde 2007-02-02 10:31:14 UTC
Created attachment 10266 [details]
lspci 2.6.19.2

output of lspci
Comment 8 Tom Van den Eynde 2007-02-02 10:32:47 UTC
I will provide you with a bios version as soon as I can reboot the box.

Thanks in advance,

Tom
Comment 9 Tom Van den Eynde 2007-02-03 07:51:23 UTC
BIOS revision is 0307
Comment 10 Tom Van den Eynde 2007-02-05 12:15:20 UTC
I just installed the latest kernel 2.6.20 and upgraded the BIOS to the latest
0405 release. Same issue occurs
Comment 11 Francois Romieu 2007-02-05 14:20:23 UTC
Created attachment 10297 [details]
r8168 driver version 8.001.00 + compilation fixes

Can you give the attached driver a try ?

Just drop it as a replacement to the current drivers/net/r8169.c file.

-- 
Ueimor
Comment 12 Tom Van den Eynde 2007-02-09 11:02:44 UTC
Hi, I tried the driver in 2.6.20 but with no luck. The issue is still the same.

I suffered a disk drive crash, so I had to reinstall the box, and it now runs
an x86_64 2.6.20 kernel. I also tried the driver there, but the same issue remains.
Comment 13 Francois Romieu 2007-02-19 15:16:39 UTC
Can you try the latest patch attached to
http://bugzilla.kernel.org/show_bug.cgi?id=5137 ?

It should not eat babies but it may be a bit rough.

-- 
Ueimor
Comment 14 Tom Van den Eynde 2007-02-24 03:19:52 UTC
I tried the patch but it didn't solve the issue
Comment 15 Francois Romieu 2007-02-24 03:46:30 UTC
bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> :
[...]
> I tried the patch but it didn't solve the issue

Just to be sure: you tried attachments 10512 + 10515, right ?

Comment 16 Tom Van den Eynde 2007-02-24 03:52:27 UTC
Oh, I didn't see that. I compiled the kernel on the 20th, so I only used nr 10465.

So I have to apply 10512 + 10515. I will get on it this afternoon.
Comment 17 Tom Van den Eynde 2007-02-24 05:20:03 UTC
I tried with the 2 suggested patches (against 2.6.21-rc1) but no luck.
The issue is still there.
Comment 19 Tom Van den Eynde 2007-03-22 10:04:13 UTC
Hello,

I tried with 2.6.21-rc4 but the issue is still there.

Kind regards,

Tom
Comment 20 Francois Romieu 2007-03-22 13:27:22 UTC
Tom Van den Eynde  2007-03-22 10:04:
[...]
> I tried with 2.6.21-rc4 but the issue is still there.

It is not too surprising as the patches in #18 are not in.

Can you try:
http://www.fr.zoreil.com/people/francois/misc/20070316-2.6.21-rc4-r8169-test.patch

-- 
Ueimor
Comment 21 Tom Van den Eynde 2007-03-29 12:46:28 UTC
I tried the patch you suggested but the same issue still occurs.
Comment 22 Francois Romieu 2007-03-29 13:24:23 UTC
tom@vandeneynde.net:
> I tried the patch you suggested but the same issue still occurs.

Ok, thanks.

From now on, please work with the latest release candidate + the aforementioned
patch _without_ NAPI. It should still suck.

I would then welcome a pcap (tcpdump/tethereal) dump of a few seconds of
traffic for both the r8169 and the working network card. The more you use
the same sequence for both tests, the easier the comparison.

Please also send the detailed output of mii-tool, including registers, for both.
It could give a hint.

Comment 23 Tom Van den Eynde 2007-03-29 13:38:26 UTC
OK, I will take the pcaps this weekend.
Should I test with rc-5 + the patch you provided?
What do you mean with NAPI?
Comment 24 Francois Romieu 2007-03-29 13:48:21 UTC
tom@vandeneynde.net:
> OK, I will take the pcaps this weekend.

Excellent.

> Should I test with rc-5 + the patch you provided?

Or latest git at your convenience.

> What do you mean with NAPI?

Disable CONFIG_R8169_NAPI

Comment 25 Tom Van den Eynde 2007-03-31 06:41:12 UTC
Hello,

You can download the requested debug info at
http://www.vandeneynde.net/debugr8169.tar.bz2

The archive contains the following:
-rwx------ 1 tvde tvde 111M 2007-03-31 00:59 e100.cap
-rwx------ 1 tvde tvde  394 2007-03-31 00:59 e100.mii
-rwx------ 1 tvde tvde 6.9K 2007-03-31 01:01 e100.png
-rwx------ 1 tvde tvde  223 2007-03-31 01:19 kernel.txt
-rwx------ 1 tvde tvde  408 2007-03-31 00:33 r8169.mii
-rwx------ 1 tvde tvde 8.6M 2007-03-31 01:04 realtek.cap
-rwx------ 1 tvde tvde 7.8K 2007-03-31 01:05 realtek.png

The .cap files are full-snaplength pcaps taken while trying to copy a 750 MB file
over SMB. The .mii files are the mii-tool output and the .png files are screenshots
taken to show the problem as the end user sees it.

If you need more info, just let me know.

Kind regards,

Tom
Comment 26 Roel Teuwen 2007-05-10 11:10:43 UTC
I'm seeing the exact same problem on 2.6.21.1 using either of the two onboard
interfaces :

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
        Subsystem: ABIT Computer Corp. Unknown device 1073
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at de00 [size=256]
        Region 2: Memory at fdeff000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at fdd00000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] Express Endpoint IRQ 0
                Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag+
                Device: Latency L0s <1us, L1 unlimited
                Device: AtnBtn+ AtnInd+ PwrInd+
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
                Link: Latency L0s unlimited, L1 unlimited
                Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
        Capabilities: [84] Vendor Specific Information
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [12c] Virtual Channel
        Capabilities: [148] Device Serial Number <snipped>
        Capabilities: [154] Power Budgeting
00: ec 10 68 81 07 00 10 00 01 00 00 02 10 00 00 00
10: 01 de 00 00 00 00 00 00 04 f0 ef fd 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 7b 14 73 10
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
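For readers who want to cross-check the hex dump against the decoded lspci text, here is a minimal Python sketch that reads a few fields of the standard PCI type-0 configuration header from the four rows above. The function names and the choice of fields are illustrative assumptions; the field offsets are those of the generic PCI header layout, not anything specific to the r8169 driver.

```python
import struct

# First 64 bytes of PCI configuration space, copied from the hex dump above.
HEX_DUMP = """
00: ec 10 68 81 07 00 10 00 01 00 00 02 10 00 00 00
10: 01 de 00 00 00 00 00 00 04 f0 ef fd 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 7b 14 73 10
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
"""


def parse_config_space(dump):
    """Turn 'offset: b0 b1 ...' rows into a bytes object."""
    raw = bytearray()
    for line in dump.strip().splitlines():
        _, _, values = line.partition(":")
        raw.extend(int(byte, 16) for byte in values.split())
    return bytes(raw)


def decode_header(cfg):
    """Decode a few standard type-0 header fields (all little-endian)."""
    vendor, device = struct.unpack_from("<HH", cfg, 0x00)
    subsys_vendor, subsys_id = struct.unpack_from("<HH", cfg, 0x2C)
    return {
        "vendor_id": vendor,
        "device_id": device,
        "revision": cfg[0x08],
        # Class code is three bytes: base class, subclass, programming interface.
        "class_code": cfg[0x0B] << 16 | cfg[0x0A] << 8 | cfg[0x09],
        # The cache line size register counts 32-bit words, so convert to bytes.
        "cache_line_bytes": cfg[0x0C] * 4,
        "subsystem_vendor": subsys_vendor,
        "subsystem_id": subsys_id,
        "capabilities_ptr": cfg[0x34],
        "interrupt_pin": cfg[0x3D],
    }


header = decode_header(parse_config_space(HEX_DUMP))
for name, value in header.items():
    print(f"{name:18s} 0x{value:x}")
```

The decoded values line up with the text output: vendor 0x10ec and device 0x8168, revision 01, a 64-byte cache line, subsystem device 1073, the first capability at offset 0x40, and interrupt pin 1 (pin A).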
Comment 27 Francois Romieu 2007-05-10 12:02:13 UTC
Roel, can you:
- try with/without NAPI
http://www.fr.zoreil.com/people/francois/misc/20070510-2.6.21.1-r8169-bz7924.patch
- attach lspci -vvx and .config
- describe which card does work correctly or which kernel did not exhibit this
  behavior.

-- 
Ueimor
Comment 28 Roel Teuwen 2007-05-11 02:14:10 UTC
Hello Francois,

I already tried with/without NAPI, it makes no difference. I'm currently running
without NAPI.
Everything works fine with a different card on the same infrastructure.

I've compiled the driver with the patch (several hunks applied with 1 line
offset) but can only try it this evening.
Comment 29 Roel Teuwen 2007-05-11 02:21:18 UTC
Created attachment 11479 [details]
lspci -vvx from 2.6.20
Comment 30 Roel Teuwen 2007-05-11 02:22:01 UTC
Created attachment 11480 [details]
.config for 2.6.21.1
Comment 31 Roel Teuwen 2007-05-11 02:27:45 UTC
This is a new machine, no kernel worked before. Though, I should note that with
2.6.21.1 I can very occasionally manage high speeds (12-20MB/s) for a few (10)
seconds, but this isn't easily repeatable. After those few seconds, it dies off
again and gets the bad 40kb/s speeds I'm seeing usually with older kernels. When
doing scp / sftp I can manage about 1mb/s with any kernel.

As a workaround I've now attached a USB2 100 Mbit adapter, which I can max out
using samba or sftp.

I've replaced every component in the network path, with the onboard ports the
speed remains bad, any other usb/pci card is working fine.
Comment 32 Roel Teuwen 2007-05-11 09:51:51 UTC
Rebooted with the patch applied, no change.
Comment 33 Roel Teuwen 2007-05-18 01:39:27 UTC
In case I wasn't clear enough. The machine that got replaced by this one had
working gigabit ethernet with a realtek addon card. Network infrastructure is
cat5e with gigabit switches.

Is there anything else I can test / check ?
Comment 34 Francois Romieu 2007-05-20 15:38:37 UTC
bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> :
Roel.Teuwen@advalvas.be  2007-05-18 01:39:
> In case I wasn't clear enough. The machine that got replaced by this one had
> working gigabit ethernet with a realtek addon card. Network infrastructure is
> cat5e with gigabit switches.
> 
> Is there anything else I can test / check ?

Not much so far. A mii-tool -vv and the brand name of your motherboard
could help.

There are several different 8168 bugs. At least they really seem to
go along the 8168.

Comment 35 Roel Teuwen 2007-05-21 02:44:49 UTC
mii-tool -vv output for both r8168 interfaces (eth1 is not connected) :

Using SIOCGMIIPHY=0x8947
eth1: no link
  registers for MII PHY 32: 
    1000 7949 001c c912 0de1 0000 0004 2001
    0000 0300 0000 0000 1007 f880 0000 3000
    0060 4000 0000 0040 1060 0000 080d 2108
    2740 8c00 0040 4013 8409 8000 0123 0000
  product info: vendor 00:07:32, model 17 rev 2
  basic mode:   autonegotiation enabled
  basic status: no link
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
----
Using SIOCGMIIPHY=0x8947
eth2: negotiated 100baseTx-FD flow-control, link ok
  registers for MII PHY 32: 
    1000 796d 001c c912 0de1 cde1 000f 2001
    4780 0300 3800 0000 1007 f880 0000 3000
    0060 ac80 0000 6c42 1060 0000 441c 2108
    2740 8c00 0040 0106 097c 8000 0123 0000
  product info: vendor 00:07:32, model 17 rev 2
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

Curiously, "capabilities", "negotiated", etc. seem wrong.
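
One possible explanation for the mismatch, offered as an assumption rather than something established in this thread: the mii-tool summary appears to be derived from the 10/100 registers only (advertisement register 4 and link partner register 5), while gigabit negotiation lives in registers 9 and 10. The Python sketch below decodes the eth2 register dump using the standard IEEE 802.3 bit assignments (the same masks Linux exposes in its mii.h header); the function and constant names are illustrative.

```python
# Registers 0-15 for eth2, copied from the mii-tool -vv dump above.
ETH2_REGS = [int(word, 16) for word in (
    "1000 796d 001c c912 0de1 cde1 000f 2001 "
    "4780 0300 3800 0000 1007 f880 0000 3000"
).split()]

# Standard MII register numbers.
MII_BMSR = 1        # basic mode status
MII_ADVERTISE = 4   # local 10/100 advertisement
MII_LPA = 5         # link partner 10/100 ability
MII_CTRL1000 = 9    # local 1000BASE-T advertisement
MII_STAT1000 = 10   # link partner 1000BASE-T ability

BMSR_LSTATUS = 0x0004

# 10/100 modes use the same bit positions in registers 4 and 5,
# listed here from highest to lowest priority.
MODES_10_100 = [
    (0x0100, "100baseTx-FD"),
    (0x0080, "100baseTx-HD"),
    (0x0040, "10baseT-FD"),
    (0x0020, "10baseT-HD"),
]


def negotiated_mode(regs):
    """Return the highest mode both ends support, or None without link."""
    if not regs[MII_BMSR] & BMSR_LSTATUS:
        return None
    # Gigabit: local advertisement is in register 9, partner ability in 10.
    if regs[MII_CTRL1000] & 0x0200 and regs[MII_STAT1000] & 0x0800:
        return "1000baseT-FD"
    if regs[MII_CTRL1000] & 0x0100 and regs[MII_STAT1000] & 0x0400:
        return "1000baseT-HD"
    common = regs[MII_ADVERTISE] & regs[MII_LPA]
    for mask, name in MODES_10_100:
        if common & mask:
            return name
    return None


mode = negotiated_mode(ETH2_REGS)
print(mode)
```

Read this way, the register dump and the ethtool output for eth2 below (Speed: 1000Mb/s, Duplex: Full) agree with each other, and only the mii-tool summary lines are off.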

ethtool output :

Settings for eth2:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
        Link detected: yes
-----
Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: Unknown! (0)
        Duplex: Half
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
        Link detected: no

Motherboard is :

Base Board Information
        Manufacturer: http://www.abit.com.tw/
        Product Name: AB9/AB9RPO(Intel965+ICH8)
        Version: 1.x (BIOS:15)

I have the AB9Pro with the two onboard nics.
Comment 36 Francois Romieu 2007-05-27 15:30:54 UTC
Roel, can you try 2.6.22-rc3 with 
http://www.fr.zoreil.com/people/francois/misc/20070527-2.6.22-rc3-r8169.patch
and attach the output of 'ethtool -e eth1', 'ethtool -e eth2' ?

Thanks in advance.

Comment 37 Roel Teuwen 2007-05-29 10:03:14 UTC
tested -rc3 and the patch : same problems, getting 100kb/s now, and a peak of
16mb/s during one second somewhere 5 seconds after starting the transfer.

eth2 is cabled and configured, eth1 is down.

Even though eth1 is not cabled, eth1 shows "Link detected: yes".

 Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown! (0)
        Duplex: Half
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
        Link detected: yes


# ethtool -e eth1
Offset          Values
------          ------
0x0000          a4 04 a4 04 a4 04 a4 04 b0 43 b0 43 b0 43 b0 43
0x0010          a0 05 a0 05 a0 05 a0 05 ec 51 ec 51 ec 51 ec 51
0x0020          cc 41 cc 41 cc 41 cc 41 10 04 10 04 10 04 10 04
0x0030          00 80 00 80 00 80 00 80 00 40 00 40 00 40 00 40
0x0040          34 5e 34 5e 34 5e 34 5e 00 a0 00 a0 00 a0 00 a0
0x0050          14 7c 14 7c 14 7c 14 7c 08 df 08 df 08 df 08 df
0x0060          08 01 08 01 08 01 08 01 8c fc 8c fc 8c fc 8c fc
0x0070          00 40 00 40 00 40 00 40 10 0c 10 0c 10 0c 10 0c
0x0080          a0 05 a0 05 a0 05 a0 05 b0 43 b0 43 b0 43 b0 43
0x0090          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00a0          fc ff fc ff fc ff fc ff fc ff fc ff fc ff fc ff
0x00b0          fc ff fc ff fc ff fc ff 7c 00 7c 00 7c 00 7c 00
0x00c0          00 1c 00 1c 00 1c 00 1c 5c fb 5c fb 5c fb 5c fb
0x00d0          40 c0 40 c0 40 c0 40 c0 c0 07 c0 07 c0 07 c0 07
0x00e0          fc 06 fc 06 fc 06 fc 06 00 00 00 00 00 00 00 00
0x00f0          80 01 80 01 80 01 80 01 00 c0 00 c0 00 c0 00 c0

# ethtool -e eth2
Offset          Values
------          ------
0x0000          a4 04 a4 04 a4 04 a4 04 b0 43 b0 43 b0 43 b0 43
0x0010          a0 05 a0 05 a0 05 a0 05 ec 51 ec 51 ec 51 ec 51
0x0020          cc 41 cc 41 cc 41 cc 41 10 04 10 04 10 04 10 04
0x0030          00 80 00 80 00 80 00 80 00 40 00 40 00 40 00 40
0x0040          34 5e 34 5e 34 5e 34 5e 00 a4 00 a4 00 a4 00 a4
0x0050          14 7c 14 7c 14 7c 14 7c 08 df 08 df 08 df 08 df
0x0060          08 01 08 01 08 01 08 01 8c fc 8c fc 8c fc 8c fc
0x0070          00 40 00 40 00 40 00 40 10 0c 10 0c 10 0c 10 0c
0x0080          a0 05 a0 05 a0 05 a0 05 b0 43 b0 43 b0 43 b0 43
0x0090          00 00 00 00 00 00 00 00 00 04 00 04 00 04 00 04
0x00a0          fc ff fc ff fc ff fc ff fc ff fc ff fc ff fc ff
0x00b0          fc ff fc ff fc ff fc ff 7c 00 7c 00 7c 00 7c 00
0x00c0          00 1c 00 1c 00 1c 00 1c 5c fb 5c fb 5c fb 5c fb
0x00d0          40 c0 40 c0 40 c0 40 c0 c0 07 c0 07 c0 07 c0 07
0x00e0          fc 06 fc 06 fc 06 fc 06 00 00 00 00 00 00 00 00
0x00f0          80 01 80 01 80 01 80 01 00 c0 00 c0 00 c0 00 c0
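
The two dumps above are hard to compare by eye. As a reading aid, the Python sketch below parses rows in the `ethtool -e` layout and lists the byte offsets at which two dumps differ. To keep it short, only three rows per interface are included, copied from the output above: the two rows that differ between the full dumps (0x0040 and 0x0090) plus one identical row. The helper names are illustrative, and the sketch says nothing about what the differing bytes mean.

```python
# Three rows per interface, copied from the ethtool -e output above.
ETH1_ROWS = """
0x0040          34 5e 34 5e 34 5e 34 5e 00 a0 00 a0 00 a0 00 a0
0x0050          14 7c 14 7c 14 7c 14 7c 08 df 08 df 08 df 08 df
0x0090          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

ETH2_ROWS = """
0x0040          34 5e 34 5e 34 5e 34 5e 00 a4 00 a4 00 a4 00 a4
0x0050          14 7c 14 7c 14 7c 14 7c 08 df 08 df 08 df 08 df
0x0090          00 00 00 00 00 00 00 00 00 04 00 04 00 04 00 04
"""


def parse_dump(text):
    """Map byte offset -> byte value for rows like '0x0040  34 5e ...'."""
    data = {}
    for line in text.strip().splitlines():
        fields = line.split()
        base = int(fields[0], 16)
        for index, byte in enumerate(fields[1:]):
            data[base + index] = int(byte, 16)
    return data


def differing_offsets(dump_a, dump_b):
    """Return the sorted offsets present in both dumps whose bytes differ."""
    shared = dump_a.keys() & dump_b.keys()
    return sorted(off for off in shared if dump_a[off] != dump_b[off])


eth1 = parse_dump(ETH1_ROWS)
eth2 = parse_dump(ETH2_ROWS)
differing = differing_offsets(eth1, eth2)
for offset in differing:
    print(f"0x{offset:04x}: eth1={eth1[offset]:02x} eth2={eth2[offset]:02x}")
```

Pasting the full 16-row dumps into the two strings works the same way.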
Comment 38 Roel Teuwen 2007-06-23 16:33:19 UTC
Ok, it seems I have found a way to reproduce slow transfers and fast transfers reliably now. Things are starting to get strange... it appears to depend on the file that I try to transfer. When transferring an Ubuntu .iso file, things are slow; when transferring a TV recording in MPEG, everything is fast.

With a different network card, both are fast.

Hope this helps somehow... :-/
Comment 39 Francois Romieu 2007-07-31 15:22:05 UTC
Created attachment 12216 [details]
PHY power-on change

Roel, can you try the attached patch on top of 2.6.23-rc1 (or above) ?

Thanks in advance.

-- 
Ueimor
Comment 40 Roel Teuwen 2007-08-01 12:47:56 UTC
No change in the symptoms, I'm afraid. Most files I tested transferred at 100KB/s, but I successfully transferred 1 file at high speed (50MB/s). Transferring the same file a second time is slow again, though.
Comment 41 Roel Teuwen 2007-09-10 10:45:54 UTC
Francois,

Excellent news. I'm now running 2.6.23-rc5-git1 with 20070903-2.6.23-rc5-r8169-test.patch applied on top, and the transfer speed is now always around 40MB/s
I will keep monitoring the status, but it seems the issue has been solved.

Best regards,

Roel
Comment 42 Francois Romieu 2007-09-10 13:03:22 UTC
Thanks for the news Roel.

Can you narrow the fix and check if patches #0001 and #0002 are enough ?

The patch kit is located at:
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.23-rc5/r8169-20070903/

Just to gain a little background:
- was this with NAPI enabled ?
- do the 40MB/s hold in either direction ?

-- 
Ueimor
Comment 43 Roel Teuwen 2007-09-15 04:05:59 UTC
Speeds are ok in both directions.
NAPI has been disabled since the problems began.

I will perform some tests with and without NAPI, and with just 0001 and 0002 with and without NAPI as soon as I can reboot the machine.

I've not tested 2.6.23-rc5 vanilla without any of your patches, should I try that as well ?
Comment 44 Roel Teuwen 2007-09-15 04:55:25 UTC
Tests seem fine in both directions with or without NAPI with 0001 and 0002 applied.
Not tested without them.

rebooted between tests.

Best regards,

Roel