Most recent kernel where this bug did not occur: This is recently bought hardware, and I haven't found an older kernel where this bug did not occur.
Distribution: Debian unstable
Hardware Environment: nforce2 chipset, sata_sil/r8169 combo PCI card
Software Environment: wget, scp
Problem Description: The network card seems pretty stable and functional at low speeds, but as soon as I transfer data at relatively high speeds (> 10MB/sec) it locks up. CPU-intensive transfers (like scp) usually lock it up faster than wget, but given a large enough transfer (1GB) it locks up with wget too. When it locks up, it locks up hard: the keyboard lights stop working, etc.
Here are some pointers to previous discussion on the netdev mailing list:
http://marc.theaimsgroup.com/?l=linux-netdev&m=114986904805281&w=2
http://marc.theaimsgroup.com/?l=linux-netdev&m=115010829624722&w=2
http://marc.theaimsgroup.com/?l=linux-netdev&m=115065165514318&w=2
Steps to reproduce: Transfer a 1GB file with wget or scp at 100Mbit or gigabit speeds.
Created attachment 8517: current kernel .config
Created attachment 8518: full dmesg log
Created attachment 8519: full lspci output
As additional data points:
* the r1000 driver from Realtek has the same issue
* Windows 2000 and its driver are perfectly stable
I'm seeing this as well, except with a Netgear GA311 card in PCI slot 3 of an Abit VT7 motherboard. I found this bug while searching around before a kernel upgrade, looking for anything that sounded familiar.
-------------------
storage:/usr/share# uname -a
Linux storage 2.6.8-2-386 #1 Tue Aug 16 12:46:35 UTC 2005 i686 GNU/Linux
storage:/usr/share# lspci
0000:00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0258
0000:00:00.1 Host bridge: VIA Technologies, Inc.: Unknown device 1258
0000:00:00.2 Host bridge: VIA Technologies, Inc.: Unknown device 2258
0000:00:00.3 Host bridge: VIA Technologies, Inc.: Unknown device 3258
0000:00:00.4 Host bridge: VIA Technologies, Inc.: Unknown device 4258
0000:00:00.7 Host bridge: VIA Technologies, Inc.: Unknown device 7258
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
0000:00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
0000:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
0000:00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [K8T800 South]
0000:01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
storage:/usr/share# dmesg
....
r8169 Gigabit Ethernet driver 1.2 loaded
ACPI: PCI interrupt 0000:00:0a.0[A] -> GSI 18 (level, low) -> IRQ 169
eth0: Identified chip type is 'RTL8169s/8110s'.
eth0: RTL8169 at 0xe0820000, 00:14:6c:c1:b2:07, IRQ 169
eth0: Auto-negotiation Enabled.
eth0: 1000Mbps Full-duplex operation.
....
Please give the upcoming 2.6.20-rc1 a try. -- Ueimor
I see this too, with Linus' tree as of now, 2.6.20-rc2-git (29th Dec). To trigger the bug I did an "scp -pr remote:hugefiles/ .". I was expecting the crash, so I let it run for a few minutes. I then decided to browse the web while the copy was underway. As soon as the browser window had been restored (un-minimized), the system froze. I'll hook up a serial debugging cable later today and do some more testing.
The motherboard (MSI K9A Platinum) has two ports (identifying as different chips); for this test the first one below was used.
CPU:
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
# uname -a
Linux x 2.6.20-rc2-git #0 SMP Fri Dec 29 03:54:00 EET 2006 x86_64 GNU/Linux
# lspci -nn -vvv
...
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 01)
    Subsystem: Micro-Star International Co., Ltd. Unknown device [1462:280c]
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 18
    Region 0: I/O ports at a800 [size=256]
    Region 2: Memory at fe9ff000 (64-bit, non-prefetchable) [size=4K]
    Expansion ROM at fe9c0000 [disabled] [size=128K]
    Capabilities: [40] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
    Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable-
        Address: 0000000000000000 Data: 0000
    Capabilities: [60] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag+
        Device: Latency L0s <1us, L1 unlimited
        Device: AtnBtn+ AtnInd+ PwrInd+
        Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
        Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
        Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
        Link: Latency L0s unlimited, L1 unlimited
        Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
        Link: Speed 2.5Gb/s, Width x1
    Capabilities: [84] Vendor Specific Information
03:03.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8169SC Gigabit Ethernet [10ec:8167] (rev 10)
    Subsystem: Micro-Star International Co., Ltd. Unknown device [1462:280c]
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 64 (8000ns min, 16000ns max), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 21
    Region 0: I/O ports at b800 [size=256]
    Region 1: Memory at feaff400 (32-bit, non-prefetchable) [size=256]
    Expansion ROM at dfe00000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
# dmesg
....
r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:02:00.0 to 64
eth0: RTL8168b/8111b at 0xffffc20000042000, 00:16:17:9b:26:ca, IRQ 18
r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 21 (level, low) -> IRQ 21
eth1: RTL8169sc/8110sc at 0xffffc20000044400, 00:16:17:9b:26:cb, IRQ 21
...
r8169: eth0: link up
r8169: eth0: link up
...
eth0: no IPv6 routers present
Created attachment 9961: Fix a performance regression on plain 8169
Leonard, can you send your .config and full dmesg? Please add the patch above to your 2.6.20-rc2. It will (almost surely) not fix your problem, but the driver is wrong without it. -- Ueimor
Francois, good news: in further tests, the driver came through with a clean record. The machine kept freezing, but eventually I shut down X11 and ran the tests from the consoles. Then it was no problem at all to transfer more than 35 GB with two simultaneous "scp -pr" commands over a completely saturated 100 Mbps link to two other (much slower) machines, at a continuous rate of about 10.8 MB/s, if scp's numbers for the huge files are to be believed. Further evidence that the driver is OK: 1) on the console, I was able to watch TV using aatv(1) without problems, while under X11 it would crash within 2-3 seconds, most often immediately, and 2) glxgears would likewise crash very quickly. All in all, it looks like the graphics card is to blame (a brand new, previously unproven one), not the r8169 driver. I will do further tests tonight, with a crossover cable between the two ports on the motherboard. Unfortunately it's just a CAT-5 cable, so I probably won't be able to reach gigabit speeds. I will also test your latest patch then and post a final note here.
Francois, more good news: here are the results of round 2 of my tests at 100 Mbps (I don't have 1000 Mbps hardware at hand).
Test setup 1: three machines a, b and c. Machine b has two Realtek ports (b1 and b2); a and c have NICs of other makes. I set up two simultaneous nc pipes, a>b1>c and a<b2<c, each starting with "cat knoppix.iso|" and ending in "|md5sum", and compared the sums. The ISO image was 695 MiB. The test was repeated three times. Result: all OK.
Test setup 2: the same machines, but this time with four pipes, set up as a<b1, a>b1, b2<c and b2>c, all four cat'ing the same image and md5sum'ing it on the receiving end. The test was repeated three times. Result: all OK.
Both tests were done using git master as of today on machine b, with no X11 running on b (as my machine crashes then).
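The integrity check in these tests comes down to hashing the data at both ends of a pipe and comparing the digests. A minimal Python sketch of that idea, assuming a local `socket.socketpair()` as a stand-in for the nc link (so it exercises no NIC at all) and an arbitrary 64 KiB read size; in the real test the two digests come from different machines:

```python
import hashlib
import socket
import threading

CHUNK = 64 * 1024  # read size; an arbitrary choice for this sketch


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for block in chunks:
        digest.update(block)
    return digest.hexdigest()


def file_chunks(path):
    """Yield a file's contents in CHUNK-sized pieces (the `cat file` end)."""
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                return
            yield block


def socket_chunks(sock):
    """Yield data received on a socket until the peer closes it."""
    while True:
        block = sock.recv(CHUNK)
        if not block:
            return
        yield block


def roundtrip_matches(path):
    """Send `path` through a local socket pair and compare both MD5 sums."""
    sender, receiver = socket.socketpair()

    def send():
        with sender:  # closing the sender signals end-of-stream to the receiver
            for block in file_chunks(path):
                sender.sendall(block)

    thread = threading.Thread(target=send)
    thread.start()
    with receiver:
        received = md5_of_chunks(socket_chunks(receiver))
    thread.join()
    return received == md5_of_chunks(file_chunks(path))
```

Sending and receiving run concurrently, as they do in an nc pipe, so the socket buffer never has to hold the whole file.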
I'll verify and try to confirm with the above patch on rc2, compiling as I type this...
bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> :
[...]
> I'll verify and try to confirm with the above patch on rc2, compiling
> as I type this...
Even if it works outside of X, please, please take the time to attach a complete dmesg, a .config and an 'lspci -vvx'. It helps to find bug patterns.
Francois: [...] The previous message was intended for Leonard, sorry. -- Ueimor
I just got back from New Year's festivities, and had a kernel waiting for me to try out. I just did, and am happy to report that sending a 200MB and a 500MB file with scp worked flawlessly. I'll do some further testing with a clearer head, but it looks good for the time being. I'll add further comments later. Happy New Year, by the way ;)
For lspci/config/dmesg files, please see my attachments in Bug 7759, which is for the same box.
Does the current kernel fix the issue for everybody? I'd welcome a data point before publishing new stuff. -- Ueimor
Just tested 2.6.20, and I'm sad to report I still have the same issue. I don't know what changed since rc2; I really thought we had a winner then. Maybe I just got lucky that night. I triggered the bug this time by booting in single-user mode and transferring a big file using scp. (It doesn't trigger immediately; I had to try twice with files >500MB before it locked up.) I tried with CONFIG_R8169_NAPI both set and unset. I'm still going to try with CONFIG_R8169_VLAN unset, just in case (I seem to remember that both were unset in my rc2 config). I'll keep you posted.
r8169 consistently hangs under high load for me as well.
Test setup: two identical servers, running 'iperf -s' on one and 'iperf -c <IP>' on the other.
Result: no traffic gets through. ifconfig reports: RX packets:33 errors:0 dropped:20 overruns:0 frame:20.
We're running Ubuntu Edgy on an Opteron 1218 (SMP).
# uname -a
Linux dub 2.6.20.4 #1 SMP Wed Mar 28 22:42:31 CEST 2007 x86_64 GNU/Linux
Created attachment 10986: lspci -nn -vvv
Created attachment 10987: dmesg
I compiled a kernel with '#define RTL8169_DEBUG 1' in r8169.c to get more debugging information. However, I see no extra information in the logs. Feel free to instruct me how to provide additional debugging information.
bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> :
[...]
> I compiled a kernel with '#define RTL8169_DEBUG 1' in r8169.c to get more
> debugging information. However, I see no extra information in the logs. Feel
> free to instruct me how to provide additional debugging information.
Can you give the patchkit available at the following URL a try?
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.21-rc4/r8169-20070316
FYI: Patching against 2.6.21-rc4:
# for f in ../patches/r8169/*.patch; patch -p1 --input=$f
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n] y
Hunk #1 FAILED at 250.
Hunk #2 FAILED at 2518.
2 out of 2 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej
patching file drivers/net/r8169.c
patching file drivers/net/r8169.c
I will try compiling anyway.
christian@rishoj.net:
> FYI: Patching against 2.6.21-rc4:
>
> # for f in ../patches/r8169/*.patch; patch -p1 --input=$f
Please echo the name of each patch. The series should contain 13 patches.
# for f in ../patches/r8169/*.patch; do echo "Applying $f"; patch -p1 --input=$f; done
Applying ../patches/r8169/0001-r8169-fix-suspend-resume-for-down-interface.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0002-r8169-add-per-device-hw_start-handler-1-2.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0003-r8169-add-per-device-hw_start-handler-2-2.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0004-r8169-merge-with-version-6.001.00-of-Realtek-s-r8169-driver.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0005-r8169-merge-with-version-8.001.00-of-Realtek-s-r8168-driver.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0006-r8169-confusion-between-hardware-and-IP-header-alignment.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0007-r8169-small-8101-comment.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0008-r8169-remove-the-media-option.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0009-r8169-cleanup.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0010-r8169-MSI-support.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0011-r8169-add-bit-description-for-the-TxPoll-register.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0011-r8169.c-add-bit-description-for-the-TxPoll-register.patch
patching file drivers/net/r8169.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file drivers/net/r8169.c.rej
Applying ../patches/r8169/0012-r8169-align-the-IP-header-when-there-is-no-DMA-constraint.patch
patching file drivers/net/r8169.c
Applying ../patches/r8169/0013-r8169-mac-address-change-support.patch
patching file drivers/net/r8169.c
Turns out patch 0011 was there twice, though not in the series file. I suppose I ought to learn to use quilt. Ignoring one of the duplicates, the series applies.
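A doubled patch like 0011 is easy to catch before applying anything. A small Python sketch, assuming the patch files follow the `NNNN-subject.patch` naming that `git format-patch` produces; the expected count (13 here) has to be supplied by whoever publishes the series:

```python
import re
from collections import defaultdict

# git format-patch style names: four-digit sequence number, dash, subject.
PATCH_NAME = re.compile(r"^(\d{4})-.*\.patch$")


def duplicate_numbers(names):
    """Map each sequence number used by more than one file to those files."""
    by_number = defaultdict(list)
    for name in sorted(names):
        match = PATCH_NAME.match(name)
        if match:
            by_number[match.group(1)].append(name)
    return {num: files for num, files in by_number.items() if len(files) > 1}


def missing_numbers(names, expected_count):
    """Return the sequence numbers 1..expected_count that no file carries."""
    present = {int(m.group(1)) for m in map(PATCH_NAME.match, names) if m}
    return [n for n in range(1, expected_count + 1) if n not in present]
```

Run against `os.listdir("../patches/r8169")`, `duplicate_numbers` would have reported both 0011 files. quilt sidesteps the problem differently: it applies only the patches listed in its series file, not whatever a shell glob happens to match.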
Compiling now...
Perfect! After applying the patchset:
% iperf -i1 -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.198.56.22 port 5001 connected with 10.198.56.23 port 48526
[  4]  0.0- 1.0 sec   101 MBytes   847 Mbits/sec
[  4]  1.0- 2.0 sec   102 MBytes   855 Mbits/sec
[  4]  2.0- 3.0 sec   102 MBytes   855 Mbits/sec
[  4]  3.0- 4.0 sec   102 MBytes   855 Mbits/sec
[  4]  4.0- 5.0 sec   102 MBytes   855 Mbits/sec
[  4]  5.0- 6.0 sec   102 MBytes   855 Mbits/sec
[  4]  6.0- 7.0 sec   102 MBytes   855 Mbits/sec
[  4]  7.0- 8.0 sec   102 MBytes   855 Mbits/sec
[  4]  8.0- 9.0 sec   102 MBytes   855 Mbits/sec
[  4]  9.0-10.0 sec   102 MBytes   855 Mbits/sec
[  4]  0.0-10.0 sec  1019 MBytes   854 Mbits/sec
% ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:18:E7:16:04:7C
          inet addr:10.198.56.22  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::218:e7ff:fe16:47c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:7200  Metric:1
          RX packets:150238 errors:0 dropped:0 overruns:0 frame:0
          TX packets:75161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1078873520 (1.0 GiB)  TX bytes:4960790 (4.7 MiB)
          Interrupt:11 Base address:0x6c00
This is much appreciated. Any idea when this patchset will make it into the kernel?
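With `-i1`, iperf reports a rate every second, which makes a stall or hang easy to spot in a long run. A hedged Python sketch that parses interval lines in the iperf 2 format shown in this thread and flags intervals below a threshold; the 100 Mbit/s default threshold and the 1.5 s cutoff used to drop the final summary line are arbitrary assumptions:

```python
import re

# One interval line of iperf 2's report, e.g.
# "[  4]  0.0- 1.0 sec   101 MBytes   847 Mbits/sec"
INTERVAL = re.compile(
    r"^\[\s*\d+\]\s+([\d.]+)\s*-\s*([\d.]+)\s+sec\s+"
    r"[\d.]+\s+\w?Bytes\s+([\d.]+)\s+(\w?)bits/sec"
)
SCALE_TO_MBIT = {"": 1e-6, "K": 1e-3, "M": 1.0, "G": 1e3}


def intervals(report, max_len=1.5):
    """Return (start, end, Mbit/s) for each per-interval line.

    Lines spanning more than `max_len` seconds (the final summary) are skipped.
    """
    rows = []
    for line in report.splitlines():
        match = INTERVAL.match(line.strip())
        if not match:
            continue
        start, end, rate, prefix = match.groups()
        start, end = float(start), float(end)
        if end - start > max_len:
            continue
        rows.append((start, end, float(rate) * SCALE_TO_MBIT[prefix]))
    return rows


def stalls(report, threshold_mbit=100.0):
    """Intervals whose throughput fell below `threshold_mbit`."""
    return [row for row in intervals(report) if row[2] < threshold_mbit]
```

On the run above every interval is around 850 Mbit/s, so `stalls` returns an empty list; a run where no traffic gets through would instead show intervals near zero.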
Created attachment 11023: r8169 oops on AMD64 machine
I have been following this bug, thinking it might be related to my problem (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=231269). Recap: with recent kernels I couldn't even ifup my r8169 card. So I tried the above patchkit on the most recent Fedora development kernel and, voila, no panic. However, I then tried the patchkit on the most recent kernel from here, i.e. 2.6.21-rc5-git7 (using the Fedora devel config), and the symptoms are just like before: a panic as soon as the device is configured (ifup eth0). The module insmod's fine otherwise. The kernel dump is attached. I hope this is not totally unrelated.
bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> :
[...]
> So I tried the above patchkit set on the most recent fedora development kernel
> - and - voila, no panic.
Ok.
> However, I then proceeded to try the patchkit on the most recent kernel from
> here, ie 2.6.21-rc5-git7 (using the fedora devel config) and the symptoms are
> just like before. Panic as soon as the device is configured (ifup eth0). The
> module insmod's fine otherwise.
Ok.
> The kernel dump is attached.
> I hope this is not totally unrelated.
Please try the patchkit available at the following URL against the latest 2.6.21-rcX-git_of_the_day:
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.21-rc5/r8169-20070403
Success. I have tried the patch on the above-mentioned 2.6.21-rc5-git7 and also on the latest 2.6.21-rc5-git9. I have done various stress tests, including a script that repeatedly ran modprobe r8169 ; ifup eth0 ; ifdown eth0 ; rmmod r8169. No problems so far. Thanks
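The module reload stress test described here can be scripted. A minimal Python sketch, assuming it runs as root on a system that provides the Debian/Fedora-style `ifup`/`ifdown` scripts; the module and interface names come from the comment above, and the `dry_run` switch is an addition for checking the sequence without privileges:

```python
import subprocess

# The load / configure / deconfigure / unload cycle described above.
CYCLE = [
    ["modprobe", "r8169"],
    ["ifup", "eth0"],
    ["ifdown", "eth0"],
    ["rmmod", "r8169"],
]


def stress(iterations, run=subprocess.run, dry_run=False):
    """Run CYCLE `iterations` times, stopping at the first failing command.

    Returns the list of commands issued. With dry_run=True nothing is
    executed, which allows checking the sequence without root privileges.
    """
    issued = []
    for i in range(iterations):
        for cmd in CYCLE:
            issued.append(cmd)
            if dry_run:
                continue
            result = run(cmd)
            if result.returncode != 0:
                raise RuntimeError(
                    f"iteration {i}: {' '.join(cmd)} exited with {result.returncode}"
                )
    return issued
```

Stopping at the first non-zero exit status keeps the failing step obvious instead of letting later commands pile errors on top of it.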
Mourad, will you be kind enough to give 2.6.23-rc1 a try when it comes out? -- Ueimor
OK, I will.
I just tried 2.6.23-rc1, and it seemed to work... at first. I booted in single-user mode and transferred 1GB of data to another machine, twice. This succeeded; however, I noticed that it only transferred at 100Mbit speed, and a quick check with mii-tool confirmed this: capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD ... no gigabit speed available. Just to make sure the stability problems were fixed, I started using the machine like I normally do; I wanted to see if it stayed stable after a day or so of "normal" use. Sadly enough, it locked up within 5 minutes, while finishing copying a file (using Nautilus) to a (cifs) network share. As usual, it became completely unresponsive: no mouse movement, no capslock, no SysRq, no console switching, no network login. However, there was incessant hard disk activity, as if it were thrashing continuously.
So basically:
- gigabit speeds don't work
- freezes still happen, though it takes a lot longer to trigger one than it used to. I managed to transfer quite a lot of data before it locked up. When it finally did freeze, there seemed to be a lot of hard disk activity (swapping?).
1. CIFS == user-space smbd or in-kernel cifs support? It may make sense to monitor the swap/mem activity with 'vmstat 1' during the file copy.
2. Sorry for the gigabit regression :o/ Can you send the output of 'mii-tool -vv eth0' for an old kernel and for 2.6.23-rc1?
-- Ueimor
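For the `vmstat 1` suggestion, the columns that show swapping are `si` and `so` (memory swapped in from and out to disk per second). A hedged Python sketch that scans a saved `vmstat 1` log and reports samples with swap traffic; it assumes the log was captured somewhere that survives a freeze, and it locates the columns from the header line rather than hard-coding positions:

```python
def swap_samples(vmstat_text):
    """Return (sample_index, si, so) for each vmstat sample with swap traffic.

    Column positions are taken from the header line that names `si` and `so`,
    so repeated headers and extra columns in other vmstat versions are tolerated.
    """
    columns = None
    sample = 0
    hits = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = {name: pos for pos, name in enumerate(fields)}
            continue
        if columns is None or not fields or not all(f.isdigit() for f in fields):
            continue  # group header ("procs ---memory---") or unrelated text
        si = int(fields[columns["si"]])
        so = int(fields[columns["so"]])
        if si or so:
            hits.append((sample, si, so))
        sample += 1
    return hits
```

A burst of non-zero `so` values right before the log ends would support the swapping theory; an empty result would point elsewhere.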
1. In-kernel cifs support. I tried monitoring with vmstat 1 during the 2x1GB transfers, but of course it only seems to happen when I'm not monitoring... I'll see if I can get some vmstat output.
2. To be fair, I'm not sure it actually is a regression: the oldest kernel I have around is 2.6.18, and that one's even worse. I remember seeing a link at 1000Mbit when I first reported it (was it 2.6.16?), and I know the hardware is supposed to be able to do it. You can see in my old email here that I believed I was running at 1000Mbit: http://marc.theaimsgroup.com/?l=linux-netdev&m=115010829624722&w=2
However, I just noticed a discrepancy between what mii-tool and ethtool report: ethtool says the link is at 1000Mbit, mii-tool says 100Mbit. That is probably also why I thought 2.6.16 was running at 1000Mbit; maybe it never did?
To illustrate, with 2.6.22:
mii-tool:
Using SIOCGMIIPHY=0x8947
eth1: negotiated 100baseTx-FD flow-control, link ok
  registers for MII PHY 32:
    1000 796d 001c c910 0de1 cde1 000d 2001
    40bd 0300 7800 1000 1007 f880 0000 3000
    0060 acc0 0000 0000 0060 0000 ef84 0108
    2740 6789 0000 010e 0990 0000 0000 98e0
  product info: vendor 00:07:32, model 17 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
ethtool:
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000033 (51)
    Link detected: yes
For completeness' sake, here's the mii-tool output for the other kernels:
2.6.18:
Using SIOCGMIIPHY=0x8947
eth1: 10 Mbit, half duplex, link ok
  registers for MII PHY 32:
    0000 794d 001c c910 0de1 0020 0004 2001
    0000 0300 0000 1000 1007 f880 0000 3000
    0060 0c40 0000 0440 0060 0000 009a 0108
    2740 6669 0000 8000 8400 0000 0000 48b0
  product info: vendor 00:07:32, model 17 rev 0
  basic mode:   10 Mbit, half duplex
  basic status: link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 10baseT-HD
2.6.23-rc1:
Using SIOCGMIIPHY=0x8947
eth1: negotiated 100baseTx-FD flow-control, link ok
  registers for MII PHY 32:
    1000 796d 001c c910 0de1 cde1 000d 2001
    4680 0300 3800 1000 1007 f880 0000 3000
    0060 acc0 0000 0000 0060 0000 ef84 0108
    2740 6669 0000 010f 0910 0000 0000 98e0
  product info: vendor 00:07:32, model 17 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
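A likely reason the two tools disagree is that they look at different registers. Read as standard IEEE 802.3 MII registers, register 9 (1000BASE-T control) holds the local gigabit advertisement and register 10 (1000BASE-T status) the link partner's, while the mii-tool version used here appears to decode only registers 0-5, which describe 10/100 modes. A minimal Python sketch of the resolution, using bit values as named in Linux's `<linux/mii.h>` and ignoring 100BASE-T4 and flow control:

```python
# Bit masks as named in Linux <linux/mii.h>.
BMCR_ANENABLE = 0x1000   # reg 0: autonegotiation enabled
BMCR_SPEED100 = 0x2000   # reg 0: forced speed select, LSB
BMCR_SPEED1000 = 0x0040  # reg 0: forced speed select, MSB
BMCR_FULLDPLX = 0x0100   # reg 0: forced full duplex

# (mode, register group, mask), highest priority first.
# "base" = reg 4 (our advertisement) & reg 5 (link partner ability);
# "gig"  = reg 9 (1000BASE-T control) & (reg 10 (1000BASE-T status) >> 2).
PRIORITY = [
    ("1000baseT-FD", "gig", 0x0200),
    ("1000baseT-HD", "gig", 0x0100),
    ("100baseTx-FD", "base", 0x0100),
    ("100baseTx-HD", "base", 0x0080),
    ("10baseT-FD", "base", 0x0040),
    ("10baseT-HD", "base", 0x0020),
]


def parse_dump(text):
    """Turn mii-tool's 'registers for MII PHY' hex words into a list of ints."""
    return [int(word, 16) for word in text.split()]


def resolve_link(regs):
    """Return the mode both ends share, or the forced mode if autoneg is off."""
    bmcr = regs[0]
    if not bmcr & BMCR_ANENABLE:
        if bmcr & BMCR_SPEED1000:
            speed = "1000baseT"
        elif bmcr & BMCR_SPEED100:
            speed = "100baseTx"
        else:
            speed = "10baseT"
        return speed + ("-FD" if bmcr & BMCR_FULLDPLX else "-HD")
    # Register 10 reports the partner's 1000FULL/1000HALF (0x0800/0x0400)
    # two bits above where register 9 holds ours (0x0200/0x0100).
    common = {"base": regs[4] & regs[5], "gig": regs[9] & (regs[10] >> 2)}
    for mode, group, mask in PRIORITY:
        if common[group] & mask:
            return mode
    return "none"


DUMP_2_6_22 = """
1000 796d 001c c910 0de1 cde1 000d 2001
40bd 0300 7800 1000 1007 f880 0000 3000
0060 acc0 0000 0000 0060 0000 ef84 0108
2740 6789 0000 010e 0990 0000 0000 98e0
"""
print(resolve_link(parse_dump(DUMP_2_6_22)))  # prints 1000baseT-FD
```

The 2.6.23-rc1 dump gives the same answer, since register 10 there (3800) also carries the link partner's 1000FULL bit; the 2.6.18 dump has autonegotiation off and comes out as 10baseT-HD, matching mii-tool's "basic mode". This only reflects what the PHY registers say; it does not by itself explain the throughput measured in the transfers.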
Mourad, there have been several changes in the r8169 driver from 2.6.23-rc1 to 2.6.23. May I ask you to give 2.6.23 a try? Thanks in advance. -- Ueimor
Ping. -- Ueimor
Hi, sorry I didn't get back to you sooner. With 2.6.23 I can still get a complete freeze with that card. It's just a matter of sending enough data (scp'ing a couple of GB usually does it). I'm seriously starting to wonder whether this could be a hardware issue after all. Like I said in the beginning, I did test it in Windows and it seemed perfectly stable, but that could have been an (un)lucky fluke. If there's any other way I could try to figure out whether this is a hardware issue, let me know (I'd prefer not to have to install Windows again, but I'll do so if really needed). I've obviously stopped using this network card, but I don't mind plugging it back in and testing new kernel versions now and again. I also won't mind if you'd prefer to close this bug, if you're satisfied this is most likely a hardware issue.
Hi all. I think I have almost the same issue. My machine is an MSI M673 laptop. It has the following Ethernet card:
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
    Subsystem: Micro-Star International Co., Ltd. Unknown device 3fdf
    Flags: bus master, fast devsel, latency 0, IRQ 18
    I/O ports at b800 [size=256]
    Memory at f8cff000 (64-bit, non-prefetchable) [size=4K]
    Expansion ROM at f8cc0000 [disabled] [size=128K]
    Capabilities: [40] Power Management version 2
    Capabilities: [48] Vital Product Data <?>
    Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable-
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [84] Vendor Specific Information <?>
    Kernel driver in use: r8169
    Kernel modules: r8169
It's connected to a 100Mbit D-Link Ethernet switch. The card just "freezes" when the transfer rate is too high, with no messages in syslog. After this I need to reconfigure the network interface (ifdown lan0 && ifup lan0) to make it work again. My distro is Debian unstable with a self-built 2.6.24 kernel. dmesg and some other useful info about the laptop is available at http://inhex.net/dion/lj/m673/
If this is not the same issue, I will open a new bug.
Same driver, same problem:
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E PCI Express Fast Ethernet controller (rev 01)
    Subsystem: Toshiba America Info Systems Unknown device ff00
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 220
    Region 0: I/O ports at 4000 [size=256]
    Region 2: Memory at da000000 (64-bit, non-prefetchable) [size=4K]
    [virtual] Expansion ROM at d4000000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
    Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
        Address: 00000000fee0100c Data: 41e9
    Capabilities: [60] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
        Device: Latency L0s <1us, L1 unlimited
        Device: AtnBtn+ AtnInd+ PwrInd+
        Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
        Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
        Link: Latency L0s unlimited, L1 unlimited
        Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
        Link: Speed 2.5Gb/s, Width x1
    Capabilities: [84] Vendor Specific Information
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [12c] Virtual Channel
    Capabilities: [148] Device Serial Number xx-xx-xx-xx-xx-xx-xx-xx
    Capabilities: [154] Power Budgeting
That's the built-in network chip of a Toshiba Satellite A110-178. I've seen this for ages now with more or less recent kernels (Linus' git).
When I do this on the console, I get:
CPU 1: Machine Check Exception 000000000005
Bank0: b200004000000800
Bank5: b200120020080400
It seems to me that this is easier to reproduce if the receiver is slower than my machine; e.g. sending data to my Pentium I at 10 Mbit/s froze even when I limited the transfer to something like 50 kByte/s. I have NAPI enabled and thought this fixed it, but I still have these issues when I copy larger files. When I hit this using scp from the console, the transfer rate suddenly dropped and the system froze within seconds.
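The Bank lines are raw machine-check status register values. A minimal Python sketch that splits them into their architectural fields, assuming Intel's documented IA32_MCi_STATUS layout (flag bits 63 VAL down to 57 PCC, MCA error code in bits 15:0, model-specific code in bits 31:16) and assuming the value after "Machine Check Exception" is IA32_MCG_STATUS; it does not attempt to interpret the error codes themselves:

```python
# IA32_MCi_STATUS flag bits (Intel SDM, Machine-Check Architecture chapter).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # MCi_MISC holds additional information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context may be corrupt
}

# IA32_MCG_STATUS flag bits.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def set_flags(value, table):
    """Names of the bits from `table` that are set in `value`, high bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value >> bit & 1]


def decode_bank(status):
    """Split an MCi_STATUS value into its flags and two error-code fields."""
    return {
        "flags": set_flags(status, STATUS_FLAGS),
        "mca_code": status & 0xFFFF,            # bits 15:0, architectural
        "model_code": (status >> 16) & 0xFFFF,  # bits 31:16, model-specific
    }


print("MCG_STATUS:", " ".join(set_flags(0x5, MCG_FLAGS)))
for bank, status in ((0, 0xB200004000000800), (5, 0xB200120020080400)):
    info = decode_bank(status)
    print(f"Bank{bank}: {' '.join(info['flags'])} "
          f"MCA code {info['mca_code']:#06x}, model code {info['model_code']:#06x}")
```

Read this way, both banks report a valid, uncorrected error with processor context possibly corrupt (VAL, UC, EN, PCC); what the codes mean on this particular CPU is left to a tool such as mcelog.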
Is the behavior the same with:
- 2.6.27-rc7
- 2.6.27-rc6 + http://userweb.kernel.org/~romieu/r8169/2.6.27-rc6/20080913-r8169-test.patch
There are enough changes in the r8169 driver for it to deserve a try.
Please note that the 8168 (Dmitry) and the 8101 (Rolf) will not necessarily behave the same. Actually, one can expect differences whenever the XIDs displayed by the r8169 driver in the kernel log (since 2.6.23) differ.
-- Ueimor
I'm on 2.6.27-rc7-git now and have not been able to reproduce this so far.
The freeze is still there, but it looks harder to hit. Or I just got lucky.
Looks like it works for me with 2.6.27.x kernels. At least I can't reproduce it for now.
I tried Linus' tree from 2009-02-20 (that's basically 2.6.29-rc6 as far as net drivers are concerned) and still got this.
I had the same problem with an "Intel® D945GCLF2 incl. Intel® Atom 330" mainboard. It comes with an r8168b Gigabit NIC on board. Arch Linux tried to use the r8169 kernel module, but at high transfer rates the NIC froze. It did not respond to ping, nor was I able to ping other computers from that server. After several minutes it automagically worked again, but only at low transfer speeds. I tried the r8168 kernel module from the Realtek homepage. You have to fix some defines to get it to compile with the 2.6.29 kernel, but then it compiles and works. To check that it's working, I transferred about 320 GB from my desktop to the server running the Realtek r8168 module, averaging 118 MB/s (RAID 0 in the desktop, RAID 10 on the server). No freezes, no lockups.
Just like Florian, I use an Intel Essential Series D945GCLF2 board with a Realtek RTL8111/8168B NIC, and I'm experiencing the same problems with the module "/lib64/modules/2.6.29-gentoo-r2/kernel/drivers/net/r8169.ko". The NIC is currently attached to a 100 Mbit hub, and when large amounts of data are transferred inbound and outbound simultaneously, I see transmit timeouts:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0xcd/0x16f()
Hardware name:
NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Modules linked in: nfsd lockd nfs_acl sunrpc exportfs smsc47m1 smsc47m192 hwmon_vid ehci_hcd uhci_hcd i2c_i801 usbcore
Pid: 0, comm: swapper Tainted: G W 2.6.29-gentoo-r2 #3
Call Trace:
 <IRQ> [<ffffffff8103b483>] warn_slowpath+0xd3/0x10f
 [<ffffffff811763b9>] ? cpumask_next_and+0x2b/0x3c
 [<ffffffff81032005>] ? enqueue_task_fair+0x25/0x92
 [<ffffffff8102efcf>] ? enqueue_task+0x50/0x5b
 [<ffffffff8102f0cc>] ? activate_task+0x28/0x31
 [<ffffffff81035a1b>] ? try_to_wake_up+0x255/0x267
 [<ffffffff81035a3a>] ? default_wake_function+0xd/0xf
 [<ffffffff812bf4cd>] ? dev_watchdog+0x0/0x16f
 [<ffffffff8102f5f1>] ? __wake_up_common+0x46/0x75
 [<ffffffff812bf49d>] ? netif_tx_lock+0x48/0x78
 [<ffffffff812bf4cd>] ? dev_watchdog+0x0/0x16f
 [<ffffffff812bf59a>] dev_watchdog+0xcd/0x16f
 [<ffffffff81043e60>] run_timer_softirq+0x18b/0x200
 [<ffffffff81057296>] ? clockevents_program_event+0x77/0x80
 [<ffffffff810403be>] __do_softirq+0x83/0x121
 [<ffffffff8100d2bc>] call_softirq+0x1c/0x28
 [<ffffffff8100e1d4>] do_softirq+0x34/0x76
 [<ffffffff81040154>] irq_exit+0x3f/0x79
 [<ffffffff8101bd07>] smp_apic_timer_interrupt+0x93/0xac
 [<ffffffff8100ccf3>] apic_timer_interrupt+0x13/0x20
 <EOI> [<ffffffff8101219e>] ? mwait_idle+0x6e/0x73
 [<ffffffff8100b244>] ? enter_idle+0x22/0x24
 [<ffffffff8100b298>] ? cpu_idle+0x52/0x93
 [<ffffffff8131832f>] ? start_secondary+0x175/0x17a
---[ end trace f425effd8183898b ]---
r8169: eth0: link up
I downloaded the Realtek drivers from http://152.104.125.41/downloads/downloadsView.aspx?Langid=1&PNid=5&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false#2 but the source won't compile out of the box. Florian, could you please tell me what modifications you made? Thanks!
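When timeouts come and go with load, counting them in a saved kernel log helps correlate them with particular transfers. In the trace above, the timeout is followed by a fresh "link up" line, which suggests the driver reset the chip. A minimal Python sketch, assuming the two message formats exactly as they appear in this trace:

```python
import re
from collections import Counter

# Matches e.g. "NETDEV WATCHDOG: eth0 (r8169): transmit timed out"
WATCHDOG = re.compile(r"NETDEV WATCHDOG: (\S+) \((\S+)\): transmit timed out")
# Matches e.g. "r8169: eth0: link up"
LINK_UP = re.compile(r"(\S+): (\S+): link up")


def tx_timeouts(log_text):
    """Count transmit timeouts and link-up messages per (interface, driver)."""
    timeouts, link_ups = Counter(), Counter()
    for line in log_text.splitlines():
        match = WATCHDOG.search(line)
        if match:
            timeouts[(match.group(1), match.group(2))] += 1
            continue
        match = LINK_UP.search(line)
        if match:
            link_ups[(match.group(2), match.group(1))] += 1
    return timeouts, link_ups
```

Comparing the two counters over a long test run shows whether every timeout is followed by a link reset or whether the interface sometimes never comes back.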
Created attachment 21186: Patch to get r8168 Realtek module to compile cleanly with 2.6.29 kernel
OK, I added the patch. Just apply it, and it should compile with the 2.6.29 kernel. There are two warnings (unused variable and foo defined but not used) which you can ignore.
> Ok, i added the patch. Just apply it, and it should compile with 2.6.29
> kernel.
Yes, it compiles OK and the NIC works with the new module. Thank you! I'll perform some load testing to see how it behaves. For the record, I now use the following modules on my machine:
# lsmod
Module                  Size  Used by
smsc47m1               10168  0
smsc47m192             15288  0
hwmon_vid               2616  1 smsc47m192
hwmon                   2648  2 smsc47m1,smsc47m192
af_packet              14216  2
nfsd                  100520  13
lockd                  67044  1 nfsd
nfs_acl                 2936  1 nfsd
sunrpc                179112  10 nfsd,lockd,nfs_acl
exportfs                4200  1 nfsd
ehci_hcd               48400  0
uhci_hcd               31576  0
r8168                  40296  0
usbcore               154704  3 ehci_hcd,uhci_hcd
iTCO_wdt               12352  0
i2c_i801                9364  0
iTCO_vendor_support     3356  1 iTCO_wdt
bitrev                  1960  1 r8168
crc32                   3960  1 r8168
The "r8168" module works fine here. Is there a chance this module could be added to the Linux kernel sources?
This ought to be fixed in 2.6.30. Can you give it a try? -- Ueimor
I've built a kernel "2.6.30-gentoo-r1" with the r8169 module, but I haven't yet had the opportunity to do a network stress test. I hope to find the time next weekend.
I performed some tests today, and so far I have not experienced any failures with Kernel 2.6.30-gentoo-r1 and the r8169 NIC driver module. Nice work, Francois.
I do little work. Many people contribute. Thanks for your patience. -- Ueimor