Kernel Bug Tracker – Bug 11062
r8169 works bad w/ 8168B NIC
Last modified: 2008-10-11 10:22:37 UTC
I'm having problems with the NICs on my new box. If I enable both NICs in the BIOS, there is no chance (I made about 20 reboots) that even one will work. If I enable only one, it sometimes works (4 times out of 20 reboots). Actually, it always comes up, but no traffic goes through it. I have applied http://userweb.kernel.org/~romieu/r8169/2.6.26-rc6/20080701-r8169-test.patch which changed nothing: not one successful boot w/ two NICs enabled and 5 successful boots per 20 reboots w/ only one NIC enabled.
I'll attach all possible info in one tarball (too much to copypaste here). There are two dirs (working and nonworking) with dmesg, ifconfig and interrupts and some common info (like kernel config, lshw, lspci which is common for both).
Also, rmmod r8169 && modprobe r8169 produces strange output in dmesg, which is attached as rmmod-modprobe-result.
If necessary, I'll provide any other info required, will apply & test any patches provided, and can discuss the problem via Jabber.
Created attachment 16772 [details]
information & logs
Can you try 0003-r8169-avoid-thrashing-PCI-conf-space-above-RTL_GIGA.patch
from http://userweb.kernel.org/~romieu/r8169/2.6.26-rc9/20080710/ ?
... and please do not load nvidia's module before testing.
It looks like the `nomsi` kernel option did the trick: two reboots and both successful. I'll do several more reboots later today (or tomorrow) with the nomsi option. After this I'll do tests without this option and w/ your patch applied. BTW, should I apply the other patches from the patchset?
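One way to confirm whether the option actually changed the interrupt mode is to look at the NIC's line in /proc/interrupts. Below is a minimal Python sketch of that check. The sample text, the IRQ numbers and the helper name `irq_mode` are illustrative assumptions, not taken from the attached logs; the kernel parameter list documents the option as `pci=nomsi`.

```python
# Hypothetical sample in the usual /proc/interrupts layout; the IRQ numbers
# and counts are made up for illustration.
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1
 17:       1024          0   IO-APIC-fasteoi   uhci_hcd:usb3, eth1
123:      86000          0   PCI-MSI-edge      eth0
"""

def irq_mode(proc_interrupts_text, device):
    """Return "MSI", "legacy", or None if the device has no IRQ line."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        # IRQ lines start with "<number>:"; the CPU header line does not.
        if not fields or not fields[0].endswith(":"):
            continue
        # Several devices may share one line, separated by commas.
        names = [field.strip(",") for field in fields]
        if device in names:
            return "MSI" if any("MSI" in field for field in fields) else "legacy"
    return None

# On a live system, pass open("/proc/interrupts").read() instead of the sample.
print(irq_mode(SAMPLE_INTERRUPTS, "eth0"))
print(irq_mode(SAMPLE_INTERRUPTS, "eth1"))
```

Comparing the result with and without the option on the kernel command line shows whether the driver really fell back to legacy interrupts.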
email@example.com 2008-07-10 18:04 :
> It looks that `nomsi` kernel option did the trick - two reboots and both
> successful. I'll do several more reboots later on today (or tomorrow) with
> nomsi option. After this I'll do tests without this option and w/ your patch
> applied. BTW, should I apply other patches from patchset?
The single patch could be enough to make a difference.
I'll welcome both test reports though: with and without the whole series.
Same problem. Actually 2 of them:
1. I'm dual-booting WinXP and Debian Etch. When rebooting from WinXP into Linux, the NIC's LED doesn't light and the NIC is not recognized. The only thing that helps is unplugging the power cord for >5 seconds.
2. In Linux the NIC sometimes becomes extremely slow, with a transfer rate of ~100 kbit/s. It may return to normal later, but I haven't found any solution for this.
Created attachment 16797 [details]
Output of lspci
Created attachment 16798 [details]
output of dmesg
When problem #2 occurs, there is nothing about it in the logs.
10 reboots w/ the 0003 patch applied: network is working perfectly (only one NIC). Will test w/ two NICs enabled later.
What about this fancy line from ifconfig:
RX packets:690 errors:0 dropped:2732265271 overruns:0 frame:0
Not that it's stealing my food, but it's kinda weird :)
The same result w/ full patchset! Great job!
Kostik (and others :o) ), can you try 2.6.27-rc1 and tell if the 'nomsi' option
is still needed or useful ? The suggested patch has made its way into mainline so
it is worth testing.
And please, please, please do not use closed-source binary modules while testing.
Thanks for your help.
It's worked without the 'nomsi' option for a couple of boots now, using Fedora kernel 2.6.27-0.211.rc1.git3.fc10.x86_64. Sound has gone a bit weird but that's clearly a different problem...
No binary modules are loaded.
Still working with that kernel, though it seems more sluggish than it should be. For instance, when downloading new packages, an ssh session to a machine on the local network is much less responsive than it should be. As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=11062#c9, I see odd output in ifconfig - not sure whether it's real or a cosmetic problem:
RX packets:283836 errors:0 dropped:353062358544 overruns:0 frame:0
TX packets:160089 errors:0 dropped:0 overruns:0 carrier:0
Another odd thing is that there is a similar dropped packet figure for eth1, even though that's not connected to anything.
Adam Huffman 2008-08-04 15:42:43:
> I see odd output in ifconfig - not sure whether it's real or a cosmetic problem:
> RX packets:283836 errors:0 dropped:353062358544 overruns:0 frame:0
> TX packets:160089 errors:0 dropped:0 overruns:0 carrier:0
Can you try http://bugzilla.kernel.org/attachment.cgi?id=17345 against 2.6.27-rc ?
Have tried applying it to -rc4, but there are problems:
cat /home/adam/Kernel/romieu-rtl-patch | patch -p1 --dry-run
patching file drivers/net/r8169.c
Hunk #1 succeeded at 186 (offset -9 lines).
Hunk #2 FAILED at 209.
Hunk #3 succeeded at 2152 (offset -134 lines).
Hunk #4 FAILED at 2276.
Hunk #5 succeeded at 3052 (offset -135 lines).
Hunk #6 succeeded at 3076 (offset -134 lines).
Hunk #7 succeeded at 3197 (offset -135 lines).
Hunk #8 succeeded at 3223 (offset -134 lines).
2 out of 8 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej
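For longer patches it can help to summarise which hunks were rejected before opening the .rej file. A minimal Python sketch follows, assuming the output format shown in the dry run above; the function name `summarize_hunks` is made up for illustration.

```python
import re

# Matches lines such as "Hunk #2 FAILED at 209." and
# "Hunk #1 succeeded at 186 (offset -9 lines)."
HUNK_LINE = re.compile(r"^Hunk #(\d+) (succeeded|FAILED) at (\d+)")

# Output copied from the dry run above.
DRY_RUN_OUTPUT = """\
patching file drivers/net/r8169.c
Hunk #1 succeeded at 186 (offset -9 lines).
Hunk #2 FAILED at 209.
Hunk #3 succeeded at 2152 (offset -134 lines).
Hunk #4 FAILED at 2276.
Hunk #5 succeeded at 3052 (offset -135 lines).
Hunk #6 succeeded at 3076 (offset -134 lines).
Hunk #7 succeeded at 3197 (offset -135 lines).
Hunk #8 succeeded at 3223 (offset -134 lines).
2 out of 8 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej
"""

def summarize_hunks(output):
    """Return (applied, failed) lists of hunk numbers from patch output."""
    applied, failed = [], []
    for line in output.splitlines():
        match = HUNK_LINE.match(line.strip())
        if match:
            number = int(match.group(1))
            if match.group(2) == "succeeded":
                applied.append(number)
            else:
                failed.append(number)
    return applied, failed

applied, failed = summarize_hunks(DRY_RUN_OUTPUT)
print("applied:", applied)
print("failed:", failed)
```

Here the failed hunks are #2 and #4, so the patch as posted does not apply cleanly to this tree.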
Created attachment 17467 [details]
r8169: the RxMissed register exists in the 8169 only
Have booted into a patched rc4 a couple of times and the dropped packet count is similar, i.e. the problem still seems to be there.
My five cents.
I have a Gigabyte EX38-DS4 board with two RTL8111B cards:
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
With kernel 184.108.40.206 I got big problems with line speed detection. I have the cards connected to a 100Mbit switch (tried with a 46m cable, a ~1.5m cable and a ~30cm cable) and most of the time it detects the line speed as 1 gigabit (checked with ethtool), and won't let me switch it to any other speed (even if I modify the "advertise" option). The nomsi option did not make any difference.
After reading the above discussion I downloaded the latest kernel (2.6.27-rc5) and tried just the driver from there (compiled it manually with make -C /lib/modules/`uname -r`/build M=`pwd`/kernel-2.6.27/drivers/net modules) and replaced the stock r8169.ko. This helped with line speed autodetection, but it still doesn't let me change the line speed manually (with ethtool). However, I don't care much about it as it works for me now :) And btw, "RX packets" is 0 for me too.
Created attachment 17582 [details]
r8169: the RxMissed register exists in the 8169 only
Adam, can you try this new version against 2.6.27-rc ?
Thanks for your help.
Just tried it and the large dropped packet count is still there:
eth0 Link encap:Ethernet HWaddr 00:1F:D0:20:BD:83
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:6419837104 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:123 Base address:0xe000
eth1 Link encap:Ethernet HWaddr 00:1F:D0:20:AD:92
inet addr:192.168.1.112 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::21f:d0ff:fe20:ad92/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:86 errors:0 dropped:6420200250 overruns:0 frame:0
TX packets:88 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:67543 (65.9 KiB) TX bytes:9276 (9.0 KiB)
This is the same rc4 kernel, with the previous patch reverted and the new one applied.
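To compare runs without eyeballing the output, the counters can be pulled out of the ifconfig text and checked for implausible values. This is a minimal Python sketch. The function names and the `ratio` threshold are arbitrary assumptions chosen for illustration; the sample lines are taken from the output above.

```python
import re

# Matches the net-tools counter lines, e.g.
# "RX packets:86 errors:0 dropped:6420200250 overruns:0 frame:0"
COUNTER_LINE = re.compile(r"^\s*(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)")

SAMPLE = """\
eth0 Link encap:Ethernet HWaddr 00:1F:D0:20:BD:83
RX packets:0 errors:0 dropped:6419837104 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
eth1 Link encap:Ethernet HWaddr 00:1F:D0:20:AD:92
RX packets:86 errors:0 dropped:6420200250 overruns:0 frame:0
TX packets:88 errors:0 dropped:0 overruns:0 carrier:0
"""

def parse_counters(text):
    """Return {(iface, direction): (packets, dropped)} from ifconfig-style text."""
    counters = {}
    iface = None
    for line in text.splitlines():
        # The interface name is the first word on the "Link encap:" line.
        if "Link encap:" in line:
            iface = line.split()[0]
            continue
        match = COUNTER_LINE.match(line)
        if match:
            direction, packets, _errors, dropped = match.groups()
            counters[(iface, direction)] = (int(packets), int(dropped))
    return counters

def suspicious(counters, ratio=1000):
    """List entries whose dropped count dwarfs the packet count.

    `ratio` is an arbitrary illustrative threshold, not a kernel constant.
    """
    return sorted(
        key
        for key, (packets, dropped) in counters.items()
        if dropped > ratio * max(packets, 1)
    )

for iface, direction in suspicious(parse_counters(SAMPLE)):
    print(iface, direction, "dropped count looks bogus")
```

Both RX counters are flagged, including the one for eth0, which has received no packets at all.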
Can you send your dmesg ?
Created attachment 17623 [details]
Adam, can you check that the patch was correctly applied and built ?
I would expect this bug to be fixed in exactly the same way as
The symptoms are the same, there is no RxMissed for either of your
or Hermann's 8168 and the XID should make them behave the same wrt
the test for the availability of RxMissed.
I may be wrong, but I would really appreciate it if you checked the patch
carefully against some recent 2.6.27-rc kernel.
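The idea behind the patch, as its title states it (the RxMissed register exists in the 8169 only), can be modelled in a few lines. The following Python sketch is a hypothetical model and not the driver code: the class, the family strings and the register-read callable are all assumptions made for illustration.

```python
# Hypothetical model of the guard; not taken from drivers/net/r8169.c.
CHIPS_WITH_RX_MISSED = {"8169"}  # per the patch title

class Nic:
    def __init__(self, family, read_rx_missed):
        self.family = family
        # Callable standing in for a raw register read (assumption).
        self.read_rx_missed = read_rx_missed
        self.rx_missed_errors = 0

    def update_rx_missed(self):
        """Accumulate missed packets only on chips that have the register."""
        if self.family not in CHIPS_WITH_RX_MISSED:
            # Reading here would return unrelated data and inflate the count.
            return
        self.rx_missed_errors += self.read_rx_missed()

# A chip without the register: the read would yield garbage, so it is skipped.
nic_8168 = Nic("8168", lambda: 0xDEADBEEF)
# A chip with the register: real missed-packet counts are accumulated.
nic_8169 = Nic("8169", lambda: 5)

for _ in range(3):
    nic_8168.update_rx_missed()
    nic_8169.update_rx_missed()

print(nic_8168.rx_missed_errors, nic_8169.rx_missed_errors)
```

Without the guard, every update on the 8168 would add an unrelated value to the counter, which matches the huge dropped figures reported earlier in this thread.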
Francois - have just applied the patch to a fresh -rc5 tree and it has worked - no spurious listing of dropped packets.
Thanks a lot for your time on this.