Bug 11062

Summary: r8169 works badly with 8168B NIC
Product: Drivers
Component: Network
Reporter: Andrew Kirilenko (icedank)
Assignee: Francois Romieu (romieu)
CC: bloch, casper
Status: CLOSED CODE_FIX
Severity: high
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25
Regression: ---
Attachments:
  information & logs
  Output of lspci
  output of dmesg
  r8169: the RxMissed register exists in the 8169 only (attachment 17467)
  r8169: the RxMissed register exists in the 8169 only (attachment 17582)
  dmesg

Description Andrew Kirilenko 2008-07-09 10:21:26 UTC
I'm having problems with the NICs on my new box. If I enable both NICs in the BIOS, there is no chance (I did about 20 reboots) that even one of them will work. If I enable only one, it sometimes works (4 times out of 20 reboots). Actually, it always comes up, but no traffic goes through it. I have applied http://userweb.kernel.org/~romieu/r8169/2.6.26-rc6/20080701-r8169-test.patch, which changed nothing: not a single successful boot with both NICs enabled, and 5 successful boots out of 20 with only one NIC enabled.

I'll attach all the info I have in one tarball (too much to copy-paste here). There are two directories (working and nonworking) with dmesg, ifconfig and interrupts output, plus some common info (kernel config, lshw, lspci) that is shared by both.

Also, rmmod r8169 && modprobe r8169 produces strange output in dmesg, which is attached as rmmod-modprobe-result.

If necessary, I'll provide any other info required, apply and test any patches provided, and discuss the problem via Jabber.
Comment 1 Andrew Kirilenko 2008-07-09 10:22:06 UTC
Created attachment 16772 [details]
information & logs
Comment 2 Francois Romieu 2008-07-10 14:11:12 UTC
Can you try 0003-r8169-avoid-thrashing-PCI-conf-space-above-RTL_GIGA.patch
from http://userweb.kernel.org/~romieu/r8169/2.6.26-rc9/20080710/ ?

-- 
Ueimor
Comment 3 Francois Romieu 2008-07-10 14:13:25 UTC
... and please do not load nvidia's module before testing.

-- 
Ueimor
Comment 4 Andrew Kirilenko 2008-07-10 18:04:40 UTC
It looks like the `nomsi` kernel option did the trick: two reboots and both successful. I'll do several more reboots later today (or tomorrow) with the nomsi option. After that I'll test without this option and with your patch applied. BTW, should I apply the other patches from the patchset?
Comment 5 Francois Romieu 2008-07-10 23:50:01 UTC
icedank@gmail.com  2008-07-10 18:04 :
> It looks that `nomsi` kernel option did the trick - two reboots and both
> successfull. I'll do several more reboots later on today (or tomorrow) with
> nomsi option. After this I'll do tests without this option and w/ your patch
> applied. BTW, should I apply other patches from patchset?

The single patch could be enough to make a difference.

I'll welcome both test reports though: with and without the whole series.
Comment 6 Kostik 2008-07-11 05:49:05 UTC
Same problem here. Actually, two of them:

1. I'm dual-booting WinXP and Debian Etch. When rebooting from WinXP into Linux, the NIC's LED doesn't light up and the NIC is not recognized. The only thing that helps is unplugging the power cord for more than 5 seconds.

2. In Linux the NIC sometimes becomes extremely slow, with a transfer rate of ~100 kbit/s. It may return to normal afterwards, but I haven't found any solution for this.
Comment 7 Kostik 2008-07-11 05:53:16 UTC
Created attachment 16797 [details]
Output of lspci
Comment 8 Kostik 2008-07-11 05:55:31 UTC
Created attachment 16798 [details]
output of dmesg

When problem #2 occurs, there is nothing about it in the logs.
Comment 9 Andrew Kirilenko 2008-07-11 09:29:30 UTC
10 reboots with the 0003 patch applied: the network is working perfectly (only one NIC). Will test with two NICs enabled later.

What about this fancy line from ifconfig:

RX packets:690 errors:0 dropped:2732265271 overruns:0 frame:0

Not that it's stealing my food, but it's kinda weird :)
Comment 10 Andrew Kirilenko 2008-07-11 14:41:59 UTC
The same result with the full patchset! Great job!
Comment 11 Francois Romieu 2008-08-02 08:15:19 UTC
Kostik (and others :o) ), can you try 2.6.27-rc1 and tell whether the 'nomsi' option
is still needed or useful? The suggested patch has made its way into mainline, so
it is worth testing.

And please, please, please do not use closed-source binary modules while testing.

Thanks for your help.

-- 
Ueimor
Comment 12 Adam Huffman 2008-08-03 03:11:16 UTC
It's worked without the 'nomsi' option for a couple of boots now, using Fedora kernel 2.6.27-0.211.rc1.git3.fc10.x86_64.  Sound has gone a bit weird but that's clearly a different problem...

No binary modules are loaded.
Comment 13 Adam Huffman 2008-08-04 15:42:43 UTC
Still working with that kernel, though it seems more sluggish than it should be.  For instance, when downloading new packages, an ssh session to a machine on the local network is much less responsive than it should be.  As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=11062#c9, I see odd output in ifconfig - not sure whether it's real or a cosmetic problem:

RX packets:283836 errors:0 dropped:353062358544 overruns:0 frame:0
          TX packets:160089 errors:0 dropped:0 overruns:0 carrier:0

Another odd thing is that there is a similar dropped packet figure for eth1, even though that's not connected to anything.
Comment 14 Francois Romieu 2008-08-20 14:56:08 UTC
Adam Huffman  2008-08-04 15:42:43:
[...]
> I see odd output in ifconfig - not sure whether it's real or a cosmetic
> problem:
>
> RX packets:283836 errors:0 dropped:353062358544 overruns:0 frame:0
>           TX packets:160089 errors:0 dropped:0 overruns:0 carrier:0

Can you try http://bugzilla.kernel.org/attachment.cgi?id=17345 against 2.6.27-rc ?

-- 
Ueimor
Comment 15 Adam Huffman 2008-08-25 05:37:18 UTC
Have tried applying it to -rc4, but there are problems:

cat /home/adam/Kernel/romieu-rtl-patch | patch -p1 --dry-run
patching file drivers/net/r8169.c
Hunk #1 succeeded at 186 (offset -9 lines).
Hunk #2 FAILED at 209.
Hunk #3 succeeded at 2152 (offset -134 lines).
Hunk #4 FAILED at 2276.
Hunk #5 succeeded at 3052 (offset -135 lines).
Hunk #6 succeeded at 3076 (offset -134 lines).
Hunk #7 succeeded at 3197 (offset -135 lines).
Hunk #8 succeeded at 3223 (offset -134 lines).
2 out of 8 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej
Comment 16 Francois Romieu 2008-08-26 12:39:23 UTC
Created attachment 17467 [details]
r8169: the RxMissed register exists in the 8169 only
Comment 17 Adam Huffman 2008-09-01 16:31:50 UTC
Have booted into a patched rc4 a couple of times and the dropped packet count is similar, i.e. the problem still seems to be there.
Comment 18 Andrew Zabolotny 2008-09-02 05:39:40 UTC
My five cents.

I have a Gigabyte EX38-DS4 board with two RTL8111B cards:

05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

With kernel 2.6.25.14 I had big problems with line speed detection. The cards are connected to a 100 Mbit switch (I tried a 46 m cable, a ~1.5 m cable and a ~30 cm cable), and most of the time the driver detects the line speed as 1 Gbit (checked with ethtool) and won't allow switching to any other speed (even if I modify the "advertise" option). The nomsi option did not make any difference.

After reading the discussion above, I downloaded the latest kernel (2.6.27-rc5), built just the driver from it (manually, with make -C /lib/modules/`uname -r`/build M=`pwd`/kernel-2.6.27/drivers/net modules) and replaced the stock r8169.ko. This fixed line speed autodetection, but it still doesn't allow changing the line speed manually (with ethtool). I don't care much about that, as it works for me now :) And btw, "RX packets" is 0 for me too.
Comment 19 Francois Romieu 2008-09-02 13:45:03 UTC
Created attachment 17582 [details]
r8169: the RxMissed register exists in the 8169 only

Adam, can you try this new version against 2.6.27-rc ?

Thanks for your help.

-- 
Ueimor
Comment 20 Adam Huffman 2008-09-04 14:23:07 UTC
Just tried it and the large dropped packet count is still there:

eth0      Link encap:Ethernet  HWaddr 00:1F:D0:20:BD:83  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:6419837104 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:123 Base address:0xe000 

eth1      Link encap:Ethernet  HWaddr 00:1F:D0:20:AD:92  
          inet addr:192.168.1.112  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21f:d0ff:fe20:ad92/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:86 errors:0 dropped:6420200250 overruns:0 frame:0
          TX packets:88 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:67543 (65.9 KiB)  TX bytes:9276 (9.0 KiB)
          Interrupt:122 

This is the same rc4 kernel, with the previous patch reverted and the new one applied.
Comment 21 Francois Romieu 2008-09-04 14:30:53 UTC
Can you send your dmesg?

-- 
Ueimor
Comment 22 Adam Huffman 2008-09-04 14:35:56 UTC
Created attachment 17623 [details]
dmesg
Comment 23 Francois Romieu 2008-09-09 15:28:54 UTC
Adam, can you check that the patch was correctly applied and built?

I would expect this bug to be fixed in exactly the same way as
http://bugzilla.kernel.org/show_bug.cgi?id=10180

The symptoms are the same: neither your 8168 nor Hermann's has
RxMissed, and the XID should make them behave the same wrt
the test for the availability of RxMissed.

I could be wrong, but I would really appreciate it if you checked the
patch carefully against a recent 2.6.27-rc kernel.
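To make this reasoning concrete, here is a minimal, hypothetical C sketch (not the actual r8169 code) of why the RxMissed read has to be guarded by chip type. It assumes that reading the register on a chip that lacks it returns arbitrary bits, modeled here as a fixed junk value. The names `chip_family`, `nic_stats`, `read_rx_missed_reg`, `update_rx_missed_unguarded` and `update_rx_missed` are illustrative only; the real driver identifies the chip from the XID rather than from this enum. Because every statistics update adds up to 2^24 - 1 to a running total, an unguarded read on an 8168 would inflate the "dropped" figure even with no traffic, which is consistent with the huge counts reported on both eth0 and eth1.

```c
#include <stdint.h>

/* Hypothetical chip families; the real driver derives its chip
 * version from the XID, not from an enum like this. */
enum chip_family { CHIP_8169, CHIP_8168 };

struct nic_stats {
    enum chip_family family;
    uint64_t rx_missed; /* reported to userspace, e.g. as ifconfig's "dropped" */
};

/* Hypothetical stand-in for a 32-bit MMIO read of RxMissed.
 * Assumption: on a chip without the register the read returns
 * arbitrary bits, modeled here as a fixed junk value. */
static uint32_t read_rx_missed_reg(enum chip_family family)
{
    return family == CHIP_8169 ? 3u : 0xdeadbeefu;
}

/* Pre-fix behaviour: accumulate the 24-bit counter on every chip. */
void update_rx_missed_unguarded(struct nic_stats *s)
{
    s->rx_missed += read_rx_missed_reg(s->family) & 0xffffffu;
}

/* Post-fix behaviour: only the 8169 has RxMissed, so skip the read elsewhere. */
void update_rx_missed(struct nic_stats *s)
{
    if (s->family != CHIP_8169)
        return; /* no RxMissed register: leave the counter untouched */
    s->rx_missed += read_rx_missed_reg(s->family) & 0xffffffu;
}
```

Calling `update_rx_missed()` repeatedly on an 8168 leaves the counter at zero, while the unguarded variant grows it on every call, much like the ever-increasing "dropped" value seen in ifconfig.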

-- 
Ueimor
Comment 24 Adam Huffman 2008-09-09 16:21:39 UTC
Francois, I have just applied the patch to a fresh -rc5 tree and it works: no more spurious dropped packet counts.

Thanks a lot for your time on this.

Adam