Ben Hutchings suggested that I file a separate report, so here it is. More information (kernel logs, kernel oops and so on) can be found at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528362

r8169 usually hangs under heavy load, although it has also hung while I was only reading email. The error message always says "transmit timed out" and refers to dev_watchdog:

[ 1388.000145] ------------[ cut here ]------------
[ 1388.000154] WARNING: at /build/buildd-linux-2.6_2.6.29-4-i386-vcILAN/linux-2.6-2.6.29/debian/build/source_i386_none/net/sched/sch_generic.c:226 dev_watchdog+0xa8/0x13b()
[ 1388.000163] Hardware name: Satellite A110
[ 1388.000167] NETDEV WATCHDOG: eth0 (r8169): transmit timed out
[ 1388.000172] Modules linked in: i915 drm i2c_algo_bit binfmt_misc ppdev parport_pc lp parport ipv6 acpi_cpufreq cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats nls_utf8 nls_cp437 vfat fat nls_base fuse firewire_sbp2 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec arc4 snd_hwdep ecb snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy iwl3945 snd_seq_oss pcmcia snd_seq_midi snd_rawmidi snd_seq_midi_event rfkill snd_seq mac80211 snd_timer joydev snd_seq_device yenta_socket lib80211 snd rsrc_nonstatic i2c_i801 psmouse soundcore pcmcia_core rng_core i2c_core pcspkr cfg80211 evdev serio_raw snd_page_alloc container battery video button ac output ext3 jbd mbcache sg sr_mod cdrom sd_mod crc_t10dif ide_pci_generic ide_core ata_generic ata_piix sdhci_pci sdhci uhci_hcd libata mmc_core led_class firewire_ohci firewire_core crc_itu_t scsi_mod ehci_hcd r8169 usbcore mii intel_agp agpgart thermal processor fan thermal_sys dm_mirror dm_region_hash dm_log dm_mod
[ 1388.000373] Pid: 0, comm: swapper Not tainted 2.6.29-2-686 #1
[ 1388.000378] Call Trace:
[ 1388.000389] [<c0125e98>] warn_slowpath+0x80/0xb6
[ 1388.000398] [<c011fb5e>] update_rq_clock+0xe/0x1c
[ 1388.000407] [<c0120e8c>] try_to_wake_up+0x14e/0x157
[ 1388.000415] [<c011a44c>] place_entity+0x6c/0x9b
[ 1388.000423] [<c011ca00>] put_prev_task_fair+0x77/0xd2
[ 1388.000431] [<c011cf40>] enqueue_task_fair+0x19/0x51
[ 1388.000438] [<c011a849>] enqueue_task+0x52/0x5d
[ 1388.000445] [<c011a94d>] activate_task+0x1c/0x21
[ 1388.000453] [<c0120e8c>] try_to_wake_up+0x14e/0x157
[ 1388.000462] [<c028e731>] dev_watchdog+0xa8/0x13b
[ 1388.000471] [<c01162e2>] default_spin_lock_flags+0x5/0x7
[ 1388.000480] [<c02e7359>] _spin_lock_irqsave+0x25/0x2b
[ 1388.000488] [<c012da35>] lock_timer_base+0x19/0x35
[ 1388.000495] [<c012dbcb>] __mod_timer+0x96/0x9f
[ 1388.000504] [<c012d574>] run_timer_softirq+0x14a/0x1b4
[ 1388.000512] [<c028e689>] dev_watchdog+0x0/0x13b
[ 1388.000521] [<c012a368>] __do_softirq+0x8c/0x115
[ 1388.000528] [<c012a436>] do_softirq+0x45/0x53
[ 1388.000536] [<c012a55a>] irq_exit+0x35/0x62
[ 1388.000544] [<c01051a5>] do_IRQ+0x64/0x77
[ 1388.000552] [<c0103a87>] common_interrupt+0x27/0x2c
[ 1388.000589] [<f80b7081>] acpi_idle_enter_bm+0x279/0x2c9 [processor]
[ 1388.000601] [<c026ebf1>] cpuidle_idle_call+0x5d/0x90
[ 1388.000609] [<c010260a>] cpu_idle+0x5e/0x78
[ 1388.000614] ---[ end trace 8c9b2278a50f5bb7 ]---

Most of the time the network device works again after running

sudo rmmod r8169 mii ; sudo modprobe r8169 ; sudo modprobe mii

I have tested the kernel options noacpi and pci=nomsi, and the only difference is how long it takes for the problem to appear.

Thanks in advance. Regards, Victor
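For anyone collecting evidence for this report, a minimal sketch of how the watchdog message could be picked out of a saved kernel log. The function name and regular expression are my own illustration, not part of any tool; the pattern is written to accept both the "transmit timed out" wording above and the "transmit queue 0 timed out" wording that appears in later comments.

```python
import re

# Matches both message variants quoted in this report:
#   "NETDEV WATCHDOG: eth0 (r8169): transmit timed out"
#   "NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit (?:queue (?P<queue>\d+) )?timed out"
)


def find_watchdog_timeouts(log_text):
    """Return one (interface, driver, queue) tuple per watchdog timeout line.

    queue is None when the message does not name a transmit queue.
    """
    hits = []
    for line in log_text.splitlines():
        m = WATCHDOG_RE.search(line)
        if m:
            queue = int(m.group("queue")) if m.group("queue") else None
            hits.append((m.group("iface"), m.group("driver"), queue))
    return hits
```

It could be fed the output of dmesg or the contents of /var/log/kern.log, for example find_watchdog_timeouts(open("/var/log/kern.log").read()).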
Created attachment 24424 [details] After "transmit time out" network card hangs. I just started reading emails ...
Created attachment 24490 [details] file /var/log/kern.log part 1/3 This time the bug does not let me connect even over wireless. As dmesg has no information about it, I decided to attach /var/log/kern.log, which covers more than just this occurrence. It is split into 3 attachments. I hope it is useful. Thanks. Regards, Victor.
Created attachment 24491 [details] file /var/log/kern.log part 2/3 This time the bug does not let me connect even over wireless. As dmesg has no information about it, I decided to attach /var/log/kern.log, which covers more than just this occurrence. It is split into 3 attachments. I hope it is useful. Thanks. Regards, Victor.
Created attachment 24492 [details] file /var/log/kern.log part 3/3 This time the bug does not let me connect even over wireless. As dmesg has no information about it, I decided to attach /var/log/kern.log, which covers more than just this occurrence. It is split into 3 attachments. I hope it is useful. Thanks. Regards, Victor.
I'm seeing the same problem in a more extreme form. If I have the nvidia blob loaded, any heavy network load from within gnome immediately hard resets the machine (no kernel crash log, nothing: the box literally hard crashes and resets). Under 'normal' usage the box will either hard lock or reset within 30 minutes. If I open a terminal session within gnome with the nvidia blob loaded, I can copy over the network to my heart's content, even at gigabit speeds. If I remove the nvidia blob and use either the nouveau or the vesa driver, I can copy over the network to my heart's content from either a terminal session or from within gnome. I have removed the nvidia blob and am working without the enhanced graphics, as otherwise the box is effectively unusable. Below is the dmesg output for the realtek. If you need me to load the nvidia driver and capture dmesg I can do that, but it will have to wait a few days.

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:06:00.0: setting latency timer to 64
alloc irq_desc for 31 on node 0
alloc kstat_irqs on node 0
r8169 0000:06:00.0: irq 31 for MSI/MSI-X
eth0: RTL8168c/8111c at 0xffffc90000c76000, xx:xx:xx:xx:xx:xx, XID 3c4000c0 IRQ 31
Created attachment 24539 [details] Another kernel log. This time I didn't hibernate my laptop ... I think the problem takes longer to appear when hpet increases min_delta_ns to 22000 nsec. This time it only increased to 15000 nsec; maybe if it were raised to 22000 from the beginning, the bug would not occur so frequently. Thanks, Victor. [ 635.001094] CE: hpet increasing min_delta_ns to 15000 nsec
There is a similar issue with transmit timeouts on that chip, but especially when using jumbo frames (MTU>1500): http://bugzilla.kernel.org/show_bug.cgi?id=9882 For me, these timeouts don't seem to occur with MTU 1500, only with higher values. But still, perhaps these two issues are related.
Maybe, but logs at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528362 show that I've been using mtu=1500. Thanks, Victor 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000 link/ether 00:16:d4:2d:bd:0b brd ff:ff:ff:ff:ff:ff 3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 00:18:de:a8:42:a5 brd ff:ff:ff:ff:ff:ff inet 138.100.214.93/20 brd 138.100.223.255 scope global wlan0 inet6 fe80::218:deff:fea8:42a5/64 scope link valid_lft forever preferred_lft forever
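Since the MTU matters for comparing this report with the jumbo-frame one, here is a small sketch that extracts the MTU of each interface from output in the format shown above. The helper name and the regular expression are assumptions made for illustration; it only looks at the numbered interface header lines.

```python
import re

# Interface header lines in the listing look like:
#   "2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast ..."
IFACE_RE = re.compile(r"^\d+: (?P<name>[^:@\s]+)[^:]*: <[^>]*> mtu (?P<mtu>\d+)")


def mtu_by_interface(ip_output):
    """Map interface name to MTU for every header line found in ip_output."""
    mtus = {}
    for line in ip_output.splitlines():
        m = IFACE_RE.match(line.strip())
        if m:
            mtus[m.group("name")] = int(m.group("mtu"))
    return mtus
```

Any interface reporting a value above 1500 would be a candidate for the jumbo-frame variant of the problem.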
Realtek offers their own version of the network driver (called "r8168") at www.realtek.com.tw; maybe try it and see if it solves the issue for you? Of course this is not a good permanent solution, but it would be valuable to know whether the bug is specific to the kernel's driver or is present in Realtek's too.
The r8168 driver from realtek (http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=5&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false#RTL8111B/RTL8168B/RTL8111/RTL8168<br>RTL8111C/RTL8111CP/RTL8111D(L)<br>RTL8168C/RTL8111DP/RTL8111E<br>RTL8105E) has some errors in src/Makefile. I changed this file so that it compiles, which may be interesting for others. The file I used is this one:

################################################################################
#
# r8168 is the Linux device driver released for RealTek RTL8168B/8111B,
# RTL8168C/8111C, RTL8168CP/8111CP, RTL8168D/8111D, and RTL8168DP/8111DP, and
# RTL8168E/8111E Gigabit Ethernet controllers with PCI-Express interface.
#
# Copyright(c) 2009 Realtek Semiconductor Corp. All rights reserved.
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 2 of the License, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
# more details.
#
# You should have received a copy of the GNU General Public License along with
# this program; if not, see <http://www.gnu.org/licenses/>.
#
# Author:
# Realtek NIC software team <nicfae@realtek.com>
# No. 2, Innovation Road II, Hsinchu Science Park, Hsinchu 300, Taiwan
#
################################################################################

################################################################################
# This product is covered by one or more of the following patents:
# US5,307,459, US5,434,872, US5,732,094, US6,570,884, US6,115,776, and US6,327,625.
################################################################################

PWD := $(shell pwd)
KVER := $(shell uname -r)
KDIR := /lib/modules/$(KVER)/build
KMISC := /lib/modules/$(KVER)/kernel/drivers/net/
KEXT := $(shell echo $(KVER) | sed -ne 's/^2\.[567]\..*/k/p')o
KFLAG := 2$(shell echo $(KVER) | sed -ne 's/^2\.[4]\..*/4/p')x

EXTRA_CFLAGS += -DCONFIG_R8168_NAPI
#EXTRA_CFLAGS += -DCONFIG_R8168_VLAN

modules:
ifeq ($(KFLAG),24x)
	$(MAKE) -f Makefile_linux24x
else
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
	strip --strip-debug r8168.$(KEXT)
endif

clean:
	rm -rf *.o *.ko *~ core* .dep* .*.d .*.cmd *.mod.c *.a *.s .*.flags .tmp_versions Module.symvers Modules.symvers Module.markers *.order
	echo "PWD is $(PWD)"

install:
	install -m 744 -c r8168.$(KEXT) $(KMISC)

ifneq ($(KFLAG),24x)
r8168-objs := r8168_n.o
r8168-objs += r8168_asf.o
r8168-objs += rtl_eeprom.o
obj-m += r8168.o
endif#($(KFLAG),24x)
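The KEXT and KFLAG lines in the Makefile above are easy to misread, so here is a rough Python sketch of what the two sed expressions appear to compute for a given kernel release string. The function names kext and kflag are hypothetical; the logic is only my reading of the sed commands, not something taken from Realtek's documentation.

```python
import re


def kext(kver):
    # Mirrors: $(shell echo $(KVER) | sed -ne 's/^2\.[567]\..*/k/p')o
    # sed prints "k" only for 2.5.x, 2.6.x and 2.7.x releases; "o" is always appended.
    return ("k" if re.match(r"2\.[567]\.", kver) else "") + "o"


def kflag(kver):
    # Mirrors: 2$(shell echo $(KVER) | sed -ne 's/^2\.[4]\..*/4/p')x
    # sed prints "4" only for 2.4.x releases, giving "24x"; otherwise "2x".
    return "2" + ("4" if re.match(r"2\.4\.", kver) else "") + "x"


for release in ("2.4.37", "2.6.32-5-686", "3.2.0-2-amd64"):
    print(release, "KEXT =", kext(release), "KFLAG =", kflag(release))
```

Under this reading, a 2.6 kernel gets KEXT=ko and KFLAG=2x, while a 3.x release string would get KEXT=o, which suggests that this particular revision of the Makefile might need adjusting before it is used on 3.x kernels.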
Created attachment 24615 [details] New error messages. I was trying to use r8168 from realtek instead of r8169 and I saw a different message in dmesg. I hope it is useful. Regards, Victor.
r8168 does not work for me ... After building the module, I had to blacklist r8169 and update the initrd by running

sudo update-initramfs -u

Next I rebooted and saw that r8168 was not loaded automatically, so I loaded the module manually. Result: the module is loaded, but there is no interface and nothing in the logs.

vpablos@exodo4:~$ sudo modprobe r8168
vpablos@exodo4:~$ ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:1086 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1086 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:106967 (104.4 KiB) TX bytes:106967 (104.4 KiB)

wlan0     Link encap:Ethernet HWaddr 00:18:de:a8:42:a5
          inet addr:138.100.214.93 Bcast:138.100.223.255 Mask:255.255.240.0
          inet6 addr: fe80::218:deff:fea8:42a5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:5549 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3657 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2702734 (2.5 MiB) TX bytes:656086 (640.7 KiB)

vpablos@exodo4:~$ lsmod | grep r81
r8168 51047 0
vpablos@exodo4:~$ dmesg | tail -n 5
[ 99.043546] wlan0: RX AssocResp from 00:18:6e:27:ce:84 (capab=0x411 status=0 aid=1)
[ 99.043550] wlan0: associated
[ 99.045338] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[ 101.346440] padlock: VIA PadLock not detected.
[ 109.500120] wlan0: no IPv6 routers present
vpablos@exodo4:~$

Maybe it is not compatible with 2.6.32, or I am doing something wrong ...
We have a lot of Fedora 12 and 13 users hitting this bug: https://bugzilla.redhat.com/show_bug.cgi?id=538920 It would be great if someone could work on it. If any specific info is required, please let me know and I'll pass the request on to the affected Fedora users...
(In reply to comment #13) I would like to upgrade to FC13 (from 10) once this problem is solved. The problem seems to be fixed in 2.6.34-rc7. Backporting the module changes to the Fedora 13 kernel does not help: the kernel error disappears, but something strange in the MSI support causes RX overflows to occur too quickly. The network works for 5 seconds, then freezes for 10 seconds. There are no problems with 2.6.34-rc7, and possibly none with earlier release candidates either, but the module fixes arrived in 2.6.34-rc4.
> There is no problems with 2.6.34-rc7 Did you also try with jumbo frames (e.g. MTU=4000 or 7200)?
(In reply to comment #15) > > There is no problems with 2.6.34-rc7 > > Did you also try with jumbo frames (e.g. MTU=4000 or 7200)? I'll check it out on Monday (31.05)
(In reply to comment #15) > > There is no problems with 2.6.34-rc7 > > Did you also try with jumbo frames (e.g. MTU=4000 or 7200)? Yes, it works. I tried different payload sizes, with and without MSI support. All OK. I did not test with very heavy traffic, but I confirmed that the kernel bug has disappeared since the module fixes. With earlier kernels, this module caused a reconnection after five seconds of traffic. Fortunately, the 2.6.34-rc7 module keeps working under heavy traffic. The reset handler proceeds correctly, even if a reconnection is still possible under very heavy traffic.
Nope, not fixed for me in the final 2.6.34. Just set my MTU to 7154 and launched: # ping <other ip on my gigabit LAN> -s 7100 -M do -f -l 200 The card hung up shortly after: [ 297.792010] ------------[ cut here ]------------ [ 297.792022] WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x227/0x230() [ 297.792026] Hardware name: GA-MA790FX-DQ6 [ 297.792030] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out [ 297.792033] Modules linked in: nls_utf8 cifs radeon ttm drm_kms_helper drm i2c_algo_bit sco bridge stp bnep rfcomm l2cap crc16 bluetooth rfkill vboxnetadp vboxnetflt cpufreq_stats cpufreq_powersave cpufreq_userspace cpufreq_conservative kvm_amd kvm fuse ip_tables x_tables vboxdrv powernow_k8 k8temp it87 hwmon_vid loop dm_crypt saa7134_alsa tuner_simple tuner_types tda9887 snd_ice1724 snd_ice17xx_ak4xxx tda8290 snd_ac97_codec tuner ac97_bus snd_ak4xxx_adda snd_ak4114 snd_pt2258 snd_i2c snd_ak4113 snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq sg saa7134 sd_mod crc_t10dif sr_mod snd_timer ir_common snd_seq_device v4l2_common cdrom videodev ata_generic snd ahci v4l1_compat v4l2_compat_ioctl32 pata_atiixp libata edac_core wmi psmouse videobuf_dma_sg tpm_tis videobuf_core tpm scsi_mod tpm_bios soundcore serio_raw ir_core i2c_piix4 tveeprom snd_page_alloc evdev processor k10temp i2c_core edac_mce_amd button usbhid hid nfs lockd fscache nfs_acl auth_rpcgss sunrpc ohci_hcd ehci_hcd usbcore nls_base thermal fan thermal_sys dm_mirror dm_region_hash dm_log dm_mod r8169 mii [ 297.792146] Pid: 0, comm: swapper Not tainted 2.6.34-rm1-amd-slab-nomcsmt #1 [ 297.792150] Call Trace: [ 297.792153] <IRQ> [<ffffffff810547d3>] ? warn_slowpath_common+0x73/0xb0 [ 297.792165] [<ffffffff81054870>] ? warn_slowpath_fmt+0x40/0x50 [ 297.792171] [<ffffffff811e1ca1>] ? strlcpy+0x41/0x50 [ 297.792176] [<ffffffff812e87a7>] ? dev_watchdog+0x227/0x230 [ 297.792182] [<ffffffff8106e5e0>] ? 
delayed_work_timer_fn+0x0/0x40 [ 297.792187] [<ffffffff81042fac>] ? __wake_up+0x3c/0x60 [ 297.792193] [<ffffffff81063ba5>] ? run_timer_softirq+0x185/0x320 [ 297.792198] [<ffffffff8107c97b>] ? ktime_get+0x5b/0xe0 [ 297.792203] [<ffffffff8105b32f>] ? __do_softirq+0xaf/0x1d0 [ 297.792209] [<ffffffff8100ae1c>] ? call_softirq+0x1c/0x30 [ 297.792214] [<ffffffff8100d245>] ? do_softirq+0x65/0xa0 [ 297.792219] [<ffffffff8105b215>] ? irq_exit+0x85/0x90 [ 297.792224] [<ffffffff810252da>] ? smp_apic_timer_interrupt+0x6a/0xa0 [ 297.792229] [<ffffffff8100a8d3>] ? apic_timer_interrupt+0x13/0x20 [ 297.792232] <EOI> [<ffffffff8102ee32>] ? native_safe_halt+0x2/0x10 [ 297.792243] [<ffffffff810136ee>] ? default_idle+0x2e/0x80 [ 297.792248] [<ffffffff81013796>] ? c1e_idle+0x56/0x110 [ 297.792253] [<ffffffff81008d8a>] ? cpu_idle+0xaa/0x100 [ 297.792259] [<ffffffff8139df94>] ? start_secondary+0x1ee/0x1f2 [ 297.792263] ---[ end trace e627a1c632239584 ]--- [ 297.808068] r8169 0000:02:00.0: eth0: link up
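As a side note on the reproduction command above: with -M do the kernel refuses to fragment, so the -s payload has to fit inside the MTU once the headers are added. A quick sketch of that arithmetic, assuming IPv4 without options (20 bytes) and the 8-byte ICMP echo header; the function names are made up for the example.

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ping -s value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits_without_fragmentation(payload, mtu):
    """True if a ping payload of this size can be sent with -M do at this MTU."""
    return payload <= max_ping_payload(mtu)


print(max_ping_payload(7154))                  # 7126
print(fits_without_fragmentation(7100, 7154))  # True
print(fits_without_fragmentation(7100, 1500))  # False
```

So -s 7100 with an MTU of 7154 does exercise true jumbo frames on the wire, whereas the same command at MTU 1500 would be rejected locally rather than test the card.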
I have been experiencing the same problem for a long time, too (more details at http://bugs.debian.org/526983). The laptop is a Satellite A110 (as with the original poster). The card information is: [ 0.833183] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded [ 0.833219] r8169 0000:05:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18 [ 0.833278] r8169 0000:05:00.0: setting latency timer to 64 [ 0.833357] alloc irq_desc for 43 on node -1 [ 0.833360] alloc kstat_irqs on node -1 [ 0.833381] r8169 0000:05:00.0: irq 43 for MSI/MSI-X [ 0.833975] r8169 0000:05:00.0: eth0: RTL8101e at 0xffffc9000065e000, 00:16:d4:8a:aa:ed, XID 14000000 IRQ 43 [ 0.920725] alloc irq_desc for 20 on node -1 [ 0.920729] alloc kstat_irqs on node -1 The actual trace, using debian kernel 2.6.35-1~experimental.1: [ 170.832114] ------------[ cut here ]------------ [ 170.832129] WARNING: at /build/mattems-linux-2.6_2.6.35-1~experimental.1-amd64-OCge0v/linux-2.6-2.6.35/debian/build/source_amd64_none/net/sched/sch_generic.c:258 dev_watchdog+0xef/0x18c() [ 170.832136] Hardware name: Satellite A110 [ 170.832141] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out [ 170.832144] Modules linked in: sco bridge stp bnep rfcomm l2cap crc16 bluetooth cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 uinput fuse xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_DSCP xt_TCPMSS ipt_LOG ipt_REJECT iptable_mangle iptable_filter xt_multiport xt_state xt_limit xt_conntrack nf_conntrack_ftp nf_conntrack ip_tables x_tables coretemp acpi_cpufreq mperf firewire_sbp2 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss arc4 snd_pcm ecb snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq iwl3945 snd_timer snd_seq_device iwlcore i915 snd drm_kms_helper pcmcia mac80211 drm joydev soundcore cfg80211 snd_page_alloc tpm_tis yenta_socket i2c_i801 i2c_algo_bit pcmcia_rsrc tpm serio_raw rfkill tpm_bios rng_core pcmcia_core video i2c_core psmouse 
container battery output ac processor evdev button ext3 jbd mbcache dm_mod usbhid hid sg sr_mod sd_mod cdrom crc_t10dif ata_generic uhci_hcd ata_piix libata ehci_hcd sdhci_pci sdhci scsi_mod firewire_ohci usbcore mmc_core led_class firewire_core crc_itu_t r8169 thermal mii nls_base thermal_sys [last unloaded: scsi_wait_scan] [ 170.832314] Pid: 0, comm: swapper Not tainted 2.6.35-trunk-amd64 #1 [ 170.832318] Call Trace: [ 170.832322] <IRQ> [<ffffffff81044307>] ? warn_slowpath_common+0x78/0x8c [ 170.832339] [<ffffffff810443ba>] ? warn_slowpath_fmt+0x45/0x4a [ 170.832346] [<ffffffff8126492e>] ? netif_tx_lock+0x3d/0x65 [ 170.832353] [<ffffffff81264a45>] ? dev_watchdog+0xef/0x18c [ 170.832360] [<ffffffff8101e743>] ? lapic_next_event+0x18/0x1d [ 170.832368] [<ffffffff8105e173>] ? hrtimer_interrupt+0x112/0x1bc [ 170.832375] [<ffffffff810502b4>] ? run_timer_softirq+0x1cd/0x299 [ 170.832382] [<ffffffff81264956>] ? dev_watchdog+0x0/0x18c [ 170.832389] [<ffffffff81049b26>] ? __do_softirq+0xe4/0x1aa [ 170.832397] [<ffffffff8108b7df>] ? handle_IRQ_event+0x4c/0x104 [ 170.832406] [<ffffffff810098dc>] ? call_softirq+0x1c/0x30 [ 170.832412] [<ffffffff8100aeb3>] ? do_softirq+0x3f/0x79 [ 170.832417] [<ffffffff810499a6>] ? irq_exit+0x36/0x7a [ 170.832423] [<ffffffff8100a615>] ? do_IRQ+0xa3/0xb9 [ 170.832431] [<ffffffff81300bd3>] ? ret_from_intr+0x0/0x11 [ 170.832435] <EOI> [<ffffffffa03a74b7>] ? acpi_idle_enter_bm+0x264/0x29c [processor] [ 170.832461] [<ffffffffa03a74b0>] ? acpi_idle_enter_bm+0x25d/0x29c [processor] [ 170.832469] [<ffffffff81238286>] ? cpuidle_idle_call+0x8f/0xed [ 170.832476] [<ffffffff81007b5d>] ? cpu_idle+0xa3/0xdd [ 170.832484] [<ffffffff816c5d92>] ? start_kernel+0x3ef/0x3fa [ 170.832491] [<ffffffff816c53ba>] ? x86_64_start_kernel+0xf9/0x106 [ 170.832496] ---[ end trace e4f0b2b50888042c ]--- [ 170.849242] r8169 0000:05:00.0: eth0: link up In this case, the network came back immediately. 
Sometimes, it doesn't, and the solution is to suspend to RAM and to resume again. Some other times, the computer simply locks up (apparently triggered by using the network, but I never got a trace when that happened). I hope that this information is useful. Thanks in advance. Regards, Carlos
Running with kernel option pcie_aspm=off on multiple machines has bypassed the problem for me. Even with 200GB data transfers and jumbo packets, no glitches.
Adding option pcie_aspm=off does not help in this case, as the problem still happens. In any case, dmesg says: pci 0000:05:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force' whether this option is given or not. Also, this is an RTL8101e adapter. No gigabit...
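When comparing results between machines, it may help to confirm which of the boot options mentioned in this report were actually passed to the running kernel. The sketch below parses a command line string such as the contents of /proc/cmdline; the helper name is hypothetical, and returning the last occurrence of a repeated option is a simplification chosen for the example.

```python
def kernel_param(cmdline, name):
    """Look up one option on a kernel command line.

    Returns the value for "name=value", "" for a bare flag such as "quiet",
    and None when the option is absent. The last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro pcie_aspm=off pci=nomsi quiet"
print(kernel_param(example, "pcie_aspm"))  # off
print(kernel_param(example, "noacpi"))     # None
```

On a live system the string would come from open("/proc/cmdline").read().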
Hello everyone, I have a couple of laptops with Realtek, running OpenSUSE and Fedora. Is any help needed with testing?
Did the fixes in v3.2 change anything?
I use Debian Wheezy (Linux 3.2.0-2-amd64 #1 SMP Mon May 21 17:45:41 UTC 2012 x86_64 GNU/Linux) with the RTL8111/8168B PCI Express Gigabit Ethernet controller built in to the Gigabyte GA-990FXA-UDF5 motherboard. The problem happens soon after machine boot and use of the XFCE4 desktop environment for me, under both high and low net load: ============================================================= [ 2049.744023] WARNING: at /build/buildd-linux-2.6_3.2.15-1-amd64-EOdTQR/linux-2.6-3.2.15/debian/build/source_amd64_none/net/sched/sch_generic.c:255 dev_watchdog+0xe9/0x148() [ 2049.744025] Hardware name: GA-990FXA-UD5 [ 2049.744026] NETDEV WATCHDOG: eth2 (r8169): transmit queue 0 timed out [ 2049.744027] Modules linked in: sg pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) mperf cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_stats parport_pc ppdev lp parport rfcomm bnep bluetooth rfkill oss_usb(O) oss_hdaudio(O) osscore(O) binfmt_misc fuse sr_mod cdrom dm_crypt w83793 hwmon_vid loop radeon ttm sp5100_tco drm_kms_helper drm power_supply fam15h_power edac_mce_amd i2c_piix4 i2c_algo_bit k10temp button processor edac_core evdev i2c_core pcspkr mxm_wmi wmi thermal_sys ext4 crc16 jbd2 mbcache dm_mod raid1 md_mod usbhid hid sd_mod crc_t10dif ata_generic mpt2sas raid_class scsi_transport_sas ohci_hcd firewire_ohci firewire_core crc_itu_t ahci libahci libata xhci_hcd r8169 mii ehci_hcd usbcore scsi_mod usb_common [last unloaded: scsi_wait_scan] [ 2049.744067] Pid: 0, comm: swapper/7 Tainted: G O 3.2.0-2-amd64 #1 [ 2049.744069] Call Trace: [ 2049.744070] <IRQ> [<ffffffff81046811>] ? warn_slowpath_common+0x78/0x8c [ 2049.744076] [<ffffffff810468bd>] ? warn_slowpath_fmt+0x45/0x4a [ 2049.744078] [<ffffffff812a1e75>] ? netif_tx_lock+0x40/0x72 [ 2049.744081] [<ffffffff812a1fd6>] ? dev_watchdog+0xe9/0x148 [ 2049.744083] [<ffffffff81051ebc>] ? run_timer_softirq+0x19a/0x261 [ 2049.744085] [<ffffffff812a1eed>] ? 
netif_tx_unlock+0x46/0x46 [ 2049.744088] [<ffffffff810659ff>] ? timekeeping_get_ns+0xd/0x2a [ 2049.744091] [<ffffffff8104be30>] ? __do_softirq+0xb9/0x177 [ 2049.744093] [<ffffffff813504ac>] ? call_softirq+0x1c/0x30 [ 2049.744096] [<ffffffff8100f8e5>] ? do_softirq+0x3c/0x7b [ 2049.744098] [<ffffffff8104c098>] ? irq_exit+0x3c/0x9a [ 2049.744100] [<ffffffff81023fe8>] ? smp_apic_timer_interrupt+0x74/0x82 [ 2049.744103] [<ffffffff8134ed1e>] ? apic_timer_interrupt+0x6e/0x80 [ 2049.744104] <EOI> [<ffffffff81023cb0>] ? lapic_next_event+0xe/0x13 [ 2049.744107] [<ffffffff8102b2c4>] ? native_safe_halt+0x2/0x3 [ 2049.744112] [<ffffffffa0243c47>] ? acpi_safe_halt+0x21/0x39 [processor] [ 2049.744116] [<ffffffffa02440b3>] ? acpi_idle_enter_c1+0x57/0xb3 [processor] [ 2049.744121] [<ffffffff8126b8ab>] ? cpuidle_idle_call+0xec/0x179 [ 2049.744124] [<ffffffff8100d248>] ? cpu_idle+0xa5/0xf2 [ 2049.744127] [<ffffffff8133b77f>] ? start_secondary+0x1d5/0x1db [ 2049.744128] ---[ end trace 685ef6db5d21a5fd ]--- [ 2049.764691] r8169 0000:06:00.0: eth2: link up ============================================================= The last message then repeats. 'pcie_aspm=off' has no effect. As others have said, this permanently destroys network connectivity soon after boot. I have since downloaded the official realtek driver 'LINUX driver for kernel 3.x and 2.6.x and 2.4.x' off http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false and the problem no longer occurs.
For other users of my board, it seems that disabling the IOMMU in the BIOS is required for the official realtek driver to work (this made no difference to the problem with the default driver).
Created attachment 73504 [details] 8168evl hack for the 990FXA based motherboards
(In reply to comment #25) > For other users of my board, it seems that disabling the IOMMU in the BIOS is > required for the official realtek driver to work (this made no difference to > the problem with the default driver). You should be fine with the patch above and any recent kernel. -- Ueimor
Thanks - I will look into patching the default driver with this and disable the official realtek driver for testing.
Thanks Francois - I have now been running this for a day and there have been no cut outs of network activity at all :)
The patch mentioned in comment#26 does not seem to be in mainline. Did some other change in mainline fix this?
Is this problem really fixed? Linux server 3.12.6 #2 SMP Fri Dec 20 20:25:49 CET 2013 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ AuthenticAMD GNU/Linux Feb 7 02:36:36 server kernel: ------------[ cut here ]------------ Feb 7 02:36:36 server kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x227/0x230() Feb 7 02:36:36 server kernel: NETDEV WATCHDOG: enp1s10 (r8169): transmit queue 0 timed out Feb 7 02:36:36 server kernel: Modules linked in: nf_conntrack_irc nf_conntrack_ftp nf_conntrack_tftp xt_owner ipt_REJECT xt_tcpudp xt_conntrack xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_filter ip_tables x_tables ipv6 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer snd video backlight wmi Feb 7 02:36:36 server kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G A 3.12.6 #2 Feb 7 02:36:36 server kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./K10N78FullHD-hSLI.. , BIOS P2.10 09/22/2009 Feb 7 02:36:36 server kernel: 0000000000000009 ffffffff8159691a ffff8801dfd03e28 ffffffff81091171 Feb 7 02:36:36 server kernel: 0000000000000000 ffff8801dfd03e78 0000000000000001 0000000000000001 Feb 7 02:36:36 server kernel: 00000000000000f0 ffffffff810911e7 ffffffff81727940 0000000000000030 Feb 7 02:36:36 server kernel: Call Trace: Feb 7 02:36:36 server kernel: <IRQ> [<ffffffff8159691a>] ? dump_stack+0x41/0x51 Feb 7 02:36:36 server kernel: [<ffffffff81091171>] ? warn_slowpath_common+0x81/0xb0 Feb 7 02:36:36 server kernel: [<ffffffff810911e7>] ? warn_slowpath_fmt+0x47/0x50 Feb 7 02:36:36 server kernel: [<ffffffff814f9047>] ? dev_watchdog+0x227/0x230 Feb 7 02:36:36 server kernel: [<ffffffff814f8e20>] ? dev_graft_qdisc+0x90/0x90 Feb 7 02:36:36 server kernel: [<ffffffff8109b24a>] ? call_timer_fn.isra.34+0x2a/0x90 Feb 7 02:36:36 server kernel: [<ffffffff8109b40a>] ? run_timer_softirq+0x15a/0x1f0 Feb 7 02:36:36 server kernel: [<ffffffff810953de>] ? 
__do_softirq+0xce/0x190 Feb 7 02:36:36 server kernel: [<ffffffff8159cdcc>] ? call_softirq+0x1c/0x30 Feb 7 02:36:36 server kernel: [<ffffffff81037da5>] ? do_softirq+0x35/0x70 Feb 7 02:36:36 server kernel: [<ffffffff81095585>] ? irq_exit+0x45/0x50 Feb 7 02:36:36 server kernel: [<ffffffff8105de7b>] ? smp_apic_timer_interrupt+0x3b/0x50 Feb 7 02:36:36 server kernel: [<ffffffff8159c7ca>] ? apic_timer_interrupt+0x6a/0x70 Feb 7 02:36:36 server kernel: <EOI> [<ffffffff8103e5a8>] ? amd_e400_idle+0x68/0xe0 Feb 7 02:36:36 server kernel: [<ffffffff810c5509>] ? cpu_startup_entry+0xd9/0x130 Feb 7 02:36:36 server kernel: ---[ end trace cf05e20aa4169875 ]--- Feb 7 02:36:36 server kernel: r8169 0000:01:0a.0 enp1s10: link up
I haven't had any further incidents anyway.
Ih had this problem last night when i had a lot on this network device. I ran in this bug every time (over the last years) when I have lot of traffic. I use to compile a new kernel every week... Now, I found this Bug report the first time. 01:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10) Subsystem: Realtek Semiconductor Co., Ltd. RTL8169/8110 Family PCI Gigabit Ethernet NIC Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16 I/O ports at d800 [size=256] Memory at fcfffc00 (32-bit, non-prefetchable) [size=256] Expansion ROM at <ignored> [disabled] Capabilities: [dc] Power Management version 2 Kernel driver in use: r8169 enp1s10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 Settings for enp1s10: Supported ports: [ TP MII ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Supported pause frame use: No Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Advertised pause frame use: Symmetric Receive-only Advertised auto-negotiation: Yes Link partner advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Link partner advertised pause frame use: Symmetric Receive-only Link partner advertised auto-negotiation: Yes Speed: 1000Mb/s Duplex: Full Port: MII PHYAD: 0 Transceiver: internal Auto-negotiation: on Supports Wake-on: pumbg Wake-on: g Current message level: 0x00000033 (51) drv probe ifdown ifup Link detected: yes
Some mistakes, sorry: I had this problem last night when i had a lot of traffic on this network device. 00:00.0 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a2) 00:01.0 ISA bridge: NVIDIA Corporation MCP78S [GeForce 8200] LPC Bridge (rev a2) 00:01.1 SMBus: NVIDIA Corporation MCP78S [GeForce 8200] SMBus (rev a1) 00:01.2 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a1) 00:01.3 Co-processor: NVIDIA Corporation MCP78S [GeForce 8200] Co-Processor (rev a2) 00:01.4 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a1) 00:02.0 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1) 00:02.1 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1) 00:04.0 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1) 00:04.1 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1) 00:07.0 Audio device: NVIDIA Corporation MCP72XE/MCP72P/MCP78U/MCP78S High Definition Audio (rev a1) 00:08.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1) 00:09.0 SATA controller: NVIDIA Corporation MCP78S [GeForce 8200] AHCI Controller (rev a2) 00:0b.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1) 00:10.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1) 00:12.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1) 00:13.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1) 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices, Inc. 
[AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10) 02:00.0 VGA compatible controller: NVIDIA Corporation C77 [GeForce 8200] (rev a2)
Perhaps this patch helps. It seems there is a lack of one case in the function

static void rtl_init_rxcfg(struct rtl8169_private *tp)

Since I added "case RTL_GIGA_MAC_VER_36", the network device has not hung any more.

As I said, the error occurs when I have *much* traffic on the Realtek RTL8169 NIC. To produce a lot of traffic, I run several NFS streams (nfs4) in different directions, from and to this server with the Realtek NIC, in the local area network.

# cat /usr/src/realtek_r8169.patch
--- drivers/net/ethernet/realtek/r8169.orig 2014-02-08 14:17:43.258088394 +0100
+++ drivers/net/ethernet/realtek/r8169.c 2014-02-08 14:19:01.858031701 +0100
@@ -4237,6 +4237,7 @@
 	case RTL_GIGA_MAC_VER_24:
 	case RTL_GIGA_MAC_VER_34:
 	case RTL_GIGA_MAC_VER_35:
+	case RTL_GIGA_MAC_VER_36:
 		RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
 		break;
 	case RTL_GIGA_MAC_VER_40:

Perhaps somebody can test or comment on this patch.
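For readers unfamiliar with the construct, here is a minimal stand-alone sketch of what adding one case label to such a switch does. It is not the driver code: the enum, the bit positions and both function names are hypothetical, and it assumes that a MAC version without its own label ends up in a default branch that leaves RX_MULTI_EN unset. Whether that is really the path taken by version 36 in this kernel has to be checked against the actual source.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real driver has its own enum and masks. */
enum mac_version { MAC_VER_24, MAC_VER_34, MAC_VER_35, MAC_VER_36 };

/* Illustrative bit positions only, not taken from the driver. */
#define RX128_INT_EN (1u << 15)
#define RX_MULTI_EN  (1u << 14)
#define RX_DMA_BURST (7u << 8)

/* Without the extra label, MAC_VER_36 takes the (assumed) default branch. */
uint32_t rxcfg_before(enum mac_version v)
{
	switch (v) {
	case MAC_VER_24:
	case MAC_VER_34:
	case MAC_VER_35:
		return RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST;
	default:
		return RX128_INT_EN | RX_DMA_BURST;
	}
}

/* With the extra label, MAC_VER_36 shares the value of the group above it. */
uint32_t rxcfg_after(enum mac_version v)
{
	switch (v) {
	case MAC_VER_24:
	case MAC_VER_34:
	case MAC_VER_35:
	case MAC_VER_36:
		return RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST;
	default:
		return RX128_INT_EN | RX_DMA_BURST;
	}
}
```

Under these assumptions the two functions differ only for MAC_VER_36, so the change can only matter on hardware that the driver actually identifies as that version.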
Created attachment 125961 [details] write reordering and netdev watchdog debug information
(In reply to Mario Bachmann from comment #35)
> Perhaps this patch helps. It seems there is a lack of one case in the
> function
> static void rtl_init_rxcfg(struct rtl8169_private *tp)

See commit 3ced8c955e74d319f3e3997f7169c79d524dfd06 and http://marc.info/?l=linux-netdev&m=137859576524585&w=4 for some history information.

Your motherboard is not the usual 8168evl AMD IOMMU supporting one though.

Could you give https://bugzilla.kernel.org/attachment.cgi?id=125961 a try and send the debug log ?

(opening a different problem report would be welcome too)

Thanks.

--
Ueimor
I am not sure, what my motherboard should have to do with it. I use a PCI-Network-Card with Realtek-Chipset. I do _not_ use the Onboard-NIC (it is disabled in the BIOS).

The patch with the debug-output adds only debug output and changes nothing. I think I will try my little patch until the next crash.

Why open "a different problem"? What is the difference between my problem and the initial problem in this thread? The description "r8169 hangs when your transmission speed is really high." seems to be exactly my problem. Am I wrong? Please correct me.

Greetings
Mario

(In reply to Francois Romieu from comment #37)
> Your motherboard is not the usual 8168evl AMD IOMMU supporting one though.
>
> Could you give https://bugzilla.kernel.org/attachment.cgi?id=125961 a try and
> send the debug log ?
>
> (opening a different problem report would be welcome too)
>
> Thanks.
>
> --
> Ueimor
(In reply to Mario Bachmann from comment #38)
> I am not sure, what my motherboard should have to do with it.

Notwithstanding the generous mess in the thread, it ended with a bug exhibiting rather reproducible symptoms on a specific kind of motherboard and network device. Your setup shares none of those.

> I use a
> PCI-Network-Card with Realtek-Chipset. I do _not_ use the Onboard-NIC (it is
> disabled in the BIOS).

PCI or PCI express ?

> The patch with the debug-output adds only debug output and changes nothing.

Would you be kind enough to attach said debug output, up to the point where the problem happens ? If so it would be nice to include the lines related to the device detection as well.

> I think I will try my little patch until the next crash.

If it hides the bug, it won't help figuring out what the problem is. :o/

> Why open "a different problem"? What is the difference between my problem
> and the initial problem in this thread? The description "r8169 hangs when
> your transmission speed is really high." seems to be exactly my problem.
> Am I wrong? Please correct me.

It would not have been practical to file all high speed related r8169 hangs under the same PR since 2003, especially as the number of devices grew ~10x and the driver has been heavily modified in this time frame.

(please remove unused material and don't top post)

--
Ueimor
Excuse me. So I just posted in the wrong thread. Perhaps my "Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10)" is a _different_ network device... The line "Kernel driver in use: r8169" is just a coincidence.

Thank you and greetings.
The same bug?

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 01)

[ 146.788047] ------------[ cut here ]------------
[ 146.788062] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x23f/0x250()
[ 146.788065] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 146.788066] Modules linked in: dm_crypt snd_hda_codec_hdmi snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_ctxfi snd_hwdep snd_pcm_oss videobuf2_vmalloc snd_mixer_oss videobuf2_memops videobuf2_core snd_pcm v4l2_common snd_seq_dummy snd_seq_midi snd_seq_oss videodev joydev snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device nvidia(PO) snd_timer serio_raw rtc_cmos snd soundcore k10temp sp5100_tco edac_core edac_mce_amd drm i2c_piix4 shpchp mac_hid binfmt_misc xfs libcrc32c raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear raid10 raid1 uas usb_storage hid_generic usbhid hid psmouse pata_acpi atkbd pata_atiixp pata_jmicron ahci libahci r8169 mii wmi
[ 146.788106] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P W O 3.19.0-150212 #1
[ 146.788108] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790FX-DS5/GA-MA790FX-DS5, BIOS F8i 07/19/2010
[ 146.788110] ffffffff817adcad 86c0b30a1601187d ffffffff817adcad ffffffff815eabf2
[ 146.788112] ffff8801bfc83e00 ffffffff81048ac7 0000000000000000 ffff8801b58483a0
[ 146.788114] ffff8801b5848000 0000000000000002 0000000000000001 ffffffff81048b58
[ 146.788117] Call Trace:
[ 146.788119] <IRQ> [<ffffffff815eabf2>] ? dump_stack+0x47/0x67
[ 146.788126] [<ffffffff81048ac7>] ? warn_slowpath_common+0x77/0xb0
[ 146.788129] [<ffffffff81048b58>] ? warn_slowpath_fmt+0x58/0x80
[ 146.788132] [<ffffffff8106eca7>] ? vtime_gen_account_irq_exit+0x27/0x60
[ 146.788135] [<ffffffff8109b40d>] ? run_posix_cpu_timers+0x4d/0x5b0
[ 146.788140] [<ffffffff8151959f>] ? dev_watchdog+0x23f/0x250
[ 146.788143] [<ffffffff81519360>] ? dev_graft_qdisc+0x80/0x80
[ 146.788145] [<ffffffff81096445>] ? call_timer_fn.isra.32+0x15/0x80
[ 146.788148] [<ffffffff81519360>] ? dev_graft_qdisc+0x80/0x80
[ 146.788150] [<ffffffff81096700>] ? run_timer_softirq+0x1c0/0x250
[ 146.788153] [<ffffffff8104c0aa>] ? __do_softirq+0xfa/0x230
[ 146.788155] [<ffffffff8104c396>] ? irq_exit+0xe6/0x100
[ 146.788158] [<ffffffff81030729>] ? smp_apic_timer_interrupt+0x39/0x50
[ 146.788161] [<ffffffff815f196a>] ? apic_timer_interrupt+0x6a/0x70
[ 146.788162] <EOI> [<ffffffff8100c630>] ? arch_remove_reservations+0x110/0x110
[ 146.788167] [<ffffffff8100c632>] ? default_idle+0x2/0x10
[ 146.788171] [<ffffffff81078dff>] ? cpu_startup_entry+0x20f/0x2f0
[ 146.788174] [<ffffffff810a38a5>] ? tick_check_new_device+0xd5/0x100
[ 146.788176] [<ffffffff8102e840>] ? start_secondary+0x1a0/0x1e0
[ 146.788178] ---[ end trace 44a2ddf66d6ec379 ]---
I have the same error with the recently upgraded Kubuntu 15.04, which prevents connecting to the DHCP server.
(In reply to Max Kotov from comment #41)
> The same bug?
> 3.19.0-150212 #1

--> see bug #99521