Bug 14962 (VPC) - r8169 hangs when your transmission speed is really high.
Status: CLOSED CODE_FIX
Alias: VPC
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: Francois Romieu
Reported: 2009-12-30 13:44 UTC by Victor Pablos Ceruelo
Modified: 2015-06-06 20:10 UTC
CC List: 15 users

Kernel Version: 2.6.29-2.6.32
Regression: No

Attachments
After "transmit time out" network card hangs. I just started reading emails ... (121.87 KB, text/plain)
2010-01-04 09:02 UTC, Victor Pablos Ceruelo
file /var/log/kern.log part 1/3 (204.37 KB, text/plain)
2010-01-09 09:49 UTC, Victor Pablos Ceruelo
file /var/log/kern.log part 2/3 (185.21 KB, text/plain)
2010-01-09 09:49 UTC, Victor Pablos Ceruelo
file /var/log/kern.log part 3/3 (968.58 KB, text/plain)
2010-01-09 09:50 UTC, Victor Pablos Ceruelo
Another kernel log. This time I didn't hibernate my laptop ... (115.43 KB, text/plain)
2010-01-13 10:58 UTC, Victor Pablos Ceruelo
New error messages. (78.69 KB, text/plain)
2010-01-18 11:08 UTC, Victor Pablos Ceruelo
8168evl hack for the 990FXA based motherboards (482 bytes, patch)
2012-06-03 22:36 UTC, Francois Romieu
write reordering and netdev watchdog debug information (1.96 KB, patch)
2014-02-13 16:44 UTC, Francois Romieu

Description Victor Pablos Ceruelo 2009-12-30 13:44:42 UTC
Ben Hutchings suggested that I file a separate report.
Here it is.

More info (kernel logs, kernel oops and so on) can be found at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528362

r8169 usually hangs under heavy load, although it also hangs while only reading email.

The error message always says "transmit timed out" and refers to dev_watchdog:

[ 1388.000145] ------------[ cut here ]------------
[ 1388.000154] WARNING: at /build/buildd-linux-2.6_2.6.29-4-i386-vcILAN/linux-2.6-2.6.29/debian/build/source_i386_none/net/sched/sch_generic.c:226 dev_watchdog+0xa8/0x13b()
[ 1388.000163] Hardware name: Satellite A110
[ 1388.000167] NETDEV WATCHDOG: eth0 (r8169): transmit timed out
[ 1388.000172] Modules linked in: i915 drm i2c_algo_bit binfmt_misc ppdev parport_pc lp parport ipv6 acpi_cpufreq cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats nls_utf8 nls_cp437 vfat fat nls_base fuse firewire_sbp2 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec arc4 snd_hwdep ecb snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy iwl3945 snd_seq_oss pcmcia snd_seq_midi snd_rawmidi snd_seq_midi_event rfkill snd_seq mac80211 snd_timer joydev snd_seq_device yenta_socket lib80211 snd rsrc_nonstatic i2c_i801 psmouse soundcore pcmcia_core rng_core i2c_core pcspkr cfg80211 evdev serio_raw snd_page_alloc container battery video button ac output ext3 jbd mbcache sg sr_mod cdrom sd_mod crc_t10dif ide_pci_generic ide_core ata_generic ata_piix sdhci_pci sdhci uhci_hcd libata mmc_core led_class firewire_ohci firewire_core crc_itu_t scsi_mod ehci_hcd r8169 usbcore mii intel_agp agpgart thermal processor fan thermal_sys dm_mirror dm_region_hash dm_log dm_mod
[ 1388.000373] Pid: 0, comm: swapper Not tainted 2.6.29-2-686 #1
[ 1388.000378] Call Trace:
[ 1388.000389]  [<c0125e98>] warn_slowpath+0x80/0xb6
[ 1388.000398]  [<c011fb5e>] update_rq_clock+0xe/0x1c
[ 1388.000407]  [<c0120e8c>] try_to_wake_up+0x14e/0x157
[ 1388.000415]  [<c011a44c>] place_entity+0x6c/0x9b
[ 1388.000423]  [<c011ca00>] put_prev_task_fair+0x77/0xd2
[ 1388.000431]  [<c011cf40>] enqueue_task_fair+0x19/0x51
[ 1388.000438]  [<c011a849>] enqueue_task+0x52/0x5d
[ 1388.000445]  [<c011a94d>] activate_task+0x1c/0x21
[ 1388.000453]  [<c0120e8c>] try_to_wake_up+0x14e/0x157
[ 1388.000462]  [<c028e731>] dev_watchdog+0xa8/0x13b
[ 1388.000471]  [<c01162e2>] default_spin_lock_flags+0x5/0x7
[ 1388.000480]  [<c02e7359>] _spin_lock_irqsave+0x25/0x2b
[ 1388.000488]  [<c012da35>] lock_timer_base+0x19/0x35
[ 1388.000495]  [<c012dbcb>] __mod_timer+0x96/0x9f
[ 1388.000504]  [<c012d574>] run_timer_softirq+0x14a/0x1b4
[ 1388.000512]  [<c028e689>] dev_watchdog+0x0/0x13b
[ 1388.000521]  [<c012a368>] __do_softirq+0x8c/0x115
[ 1388.000528]  [<c012a436>] do_softirq+0x45/0x53
[ 1388.000536]  [<c012a55a>] irq_exit+0x35/0x62
[ 1388.000544]  [<c01051a5>] do_IRQ+0x64/0x77
[ 1388.000552]  [<c0103a87>] common_interrupt+0x27/0x2c
[ 1388.000589]  [<f80b7081>] acpi_idle_enter_bm+0x279/0x2c9 [processor]
[ 1388.000601]  [<c026ebf1>] cpuidle_idle_call+0x5d/0x90
[ 1388.000609]  [<c010260a>] cpu_idle+0x5e/0x78
[ 1388.000614] ---[ end trace 8c9b2278a50f5bb7 ]---

Most of the time the network device works again after running:
sudo rmmod r8169 mii ; sudo modprobe r8169 ; sudo modprobe mii
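
A minimal sketch of the reload sequence above as a reusable shell function. The function name `reload_r8169` and the `DRY_RUN` switch are assumptions added for illustration; only the rmmod/modprobe commands come from the report.

```shell
# Hypothetical helper (not part of the original report): reload the r8169
# driver using the same three commands quoted above.
# With DRY_RUN=1 the commands are only printed, not executed.
reload_r8169() {
    for cmd in "rmmod r8169 mii" "modprobe r8169" "modprobe mii"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Word splitting is intended: $cmd holds a command and its arguments.
            # Like the ';'-separated original, later steps run even if one fails.
            sudo $cmd
        fi
    done
}
```

Setting `DRY_RUN=1` before calling `reload_r8169` prints the three commands without touching the driver.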

I've tested the kernel options

noacpi
pci=nomsi

and the only difference is how long the problem takes to appear.
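
When comparing boot options like these, it can help to confirm that an option actually reached the running kernel. A small sketch, assuming the options are passed on the kernel command line and are therefore visible in /proc/cmdline; the function name `has_kernel_param` is hypothetical.

```shell
# Hypothetical helper: succeed if a boot parameter appears as a whole word
# on the kernel command line.
#   $1: parameter to look for, e.g. pci=nomsi
#   $2: optional command line string (defaults to the contents of /proc/cmdline)
has_kernel_param() {
    cmdline=${2:-$(cat /proc/cmdline)}
    # Pad with spaces so that only whole, space-separated words match.
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, `has_kernel_param pci=nomsi && echo "MSI disabled at boot"` checks the running system.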

Thanks in advance.
Regards, 

Victor
Comment 1 Victor Pablos Ceruelo 2010-01-04 09:02:17 UTC
Created attachment 24424
After "transmit time out" network card hangs. I just started reading emails ...
Comment 2 Victor Pablos Ceruelo 2010-01-09 09:49:04 UTC
Created attachment 24490
file /var/log/kern.log part 1/3

This time the bug does not let me connect even over wireless.
As dmesg has no information about it, I decided to attach /var/log/kern.log,
which also contains entries from before this incident.

It is split into 3 attachments.

Hope it is useful.
Thanks.

Regards, 
Victor.
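
Because the attached kern.log covers more than this one incident, a filter can help locate the relevant entries. The sketch below pulls the interface and driver name out of every NETDEV WATCHDOG line read from standard input; the function name `watchdog_events` is an assumption, and the pattern is based only on the message formats quoted in this report.

```shell
# Hypothetical helper: print "<interface> <driver>" for each
# "NETDEV WATCHDOG: ... timed out" line found on standard input.
# Matches both "transmit timed out" and "transmit queue N timed out".
watchdog_events() {
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)): transmit .*timed out.*/\1 \2/p'
}
```

A typical call would be `watchdog_events < /var/log/kern.log`.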
Comment 3 Victor Pablos Ceruelo 2010-01-09 09:49:52 UTC
Created attachment 24491
file /var/log/kern.log part 2/3

Comment 4 Victor Pablos Ceruelo 2010-01-09 09:50:51 UTC
Created attachment 24492
file /var/log/kern.log part 3/3

Comment 5 Anthony Name 2010-01-12 02:24:29 UTC
I'm seeing the same problem in a more extreme way.

With the nvidia blob loaded, any heavy network load from within GNOME immediately hard resets the machine (no kernel crash log, nothing; the box literally hard crashes and resets). Under 'normal' usage the box will either hard lock or reset within 30 minutes.

If I open a terminal session within GNOME with the nvidia blob loaded, I can copy over the network to my heart's content, even at gigabit speeds.

If I remove the nvidia blob and use either the nouveau or the vesa driver, I can copy over the network to my heart's content from either a terminal session or from within GNOME.

I have removed the nvidia blob and am working without the enhanced graphics, as otherwise I have in effect an unusable box.  Below is the dmesg output for the Realtek card. If you need me to load the nvidia driver and capture dmesg again, I can do that, but it will have to wait a few days.

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:06:00.0: setting latency timer to 64
  alloc irq_desc for 31 on node 0
  alloc kstat_irqs on node 0
r8169 0000:06:00.0: irq 31 for MSI/MSI-X
eth0: RTL8168c/8111c at 0xffffc90000c76000, xx:xx:xx:xx:xx:xx, XID 3c4000c0 IRQ 31
Comment 6 Victor Pablos Ceruelo 2010-01-13 10:58:13 UTC
Created attachment 24539
Another kernel log. This time I didn't hibernate my laptop ...

I think the problem takes longer to appear when hpet increases min_delta_ns to 22000 nsec.
This time it only increased to 15000 nsec; maybe setting it to 22000 from the beginning would make the bug occur less frequently.

Thanks, 
Victor.

[  635.001094] CE: hpet increasing min_delta_ns to 15000 nsec
Comment 7 Roman Mamedov 2010-01-17 09:00:54 UTC
There is a similar issue with transmit timeouts on that chip, but especially when using jumbo frames (MTU>1500): http://bugzilla.kernel.org/show_bug.cgi?id=9882
For me, these timeouts don't seem to occur with MTU 1500, only with higher values.
But still, perhaps these two issues are related.
Comment 8 Victor Pablos Ceruelo 2010-01-17 11:32:15 UTC
Maybe, but logs at 
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528362

show that I've been using mtu=1500.

Thanks, 

Victor

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:16:d4:2d:bd:0b brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:18:de:a8:42:a5 brd ff:ff:ff:ff:ff:ff
    inet 138.100.214.93/20 brd 138.100.223.255 scope global wlan0
    inet6 fe80::218:deff:fea8:42a5/64 scope link 
       valid_lft forever preferred_lft forever
Comment 9 Roman Mamedov 2010-01-17 11:57:43 UTC
Realtek offers their own version of the network driver (called "r8168") at www.realtek.com.tw. Maybe try it and see if it solves the issue for you? Of course this is not a good permanent solution, but it would be valuable to know whether the bug is only in the kernel's driver or is present in Realtek's too.
Comment 10 Victor Pablos Ceruelo 2010-01-18 10:59:11 UTC
r8168 from realtek
(http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=5&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false#RTL8111B/RTL8168B/RTL8111/RTL8168<br>RTL8111C/RTL8111CP/RTL8111D(L)<br>RTL8168C/RTL8111DP/RTL8111E<br>RTL8105E)

has some errors in src/Makefile.
I changed this file so that it compiles, which may be useful for others.
The file I used is this one:

################################################################################
# 
# r8168 is the Linux device driver released for RealTek RTL8168B/8111B, 
# RTL8168C/8111C, RTL8168CP/8111CP, RTL8168D/8111D, and RTL8168DP/8111DP, and
# RTL8168E/8111E Gigabit Ethernet controllers with PCI-Express interface.
# 
# Copyright(c) 2009 Realtek Semiconductor Corp. All rights reserved.
# 
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 2 of the License, or (at your option)
# any later version.
# 
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
# more details.
# 
# You should have received a copy of the GNU General Public License along with
# this program; if not, see <http://www.gnu.org/licenses/>.
# 
# Author:
# Realtek NIC software team <nicfae@realtek.com>
# No. 2, Innovation Road II, Hsinchu Science Park, Hsinchu 300, Taiwan
# 
################################################################################

################################################################################
# This product is covered by one or more of the following patents:
# US5,307,459, US5,434,872, US5,732,094, US6,570,884, US6,115,776, and US6,327,625.
################################################################################

PWD		:= $(shell pwd)
KVER		:= $(shell uname -r)
KDIR		:= /lib/modules/$(KVER)/build
KMISC		:= /lib/modules/$(KVER)/kernel/drivers/net/
KEXT		:= $(shell echo $(KVER) | sed -ne 's/^2\.[567]\..*/k/p')o
KFLAG		:= 2$(shell echo $(KVER) | sed -ne 's/^2\.[4]\..*/4/p')x

EXTRA_CFLAGS += -DCONFIG_R8168_NAPI
#EXTRA_CFLAGS += -DCONFIG_R8168_VLAN

modules:
ifeq ($(KFLAG),24x)
	$(MAKE) -f Makefile_linux24x
else
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
	strip --strip-debug r8168.$(KEXT)
endif

clean:
	rm -rf *.o *.ko *~ core* .dep* .*.d .*.cmd *.mod.c *.a *.s .*.flags .tmp_versions Module.symvers Modules.symvers Module.markers *.order
	echo "PWD is $(PWD)"


install:
	install -m 744 -c r8168.$(KEXT) $(KMISC)

ifneq ($(KFLAG),24x)
r8168-objs :=  r8168_n.o
r8168-objs +=  r8168_asf.o
r8168-objs +=  rtl_eeprom.o
obj-m += r8168.o
endif#($(KFLAG),24x)
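
The two sed expressions on the KEXT and KFLAG lines of this Makefile decide the module file extension and whether the 2.4 build path is taken. The following sketch runs the same sed scripts outside make for a given version string; the function names `kext` and `kflag` are made up for illustration.

```shell
# Hypothetical helpers mirroring the KEXT and KFLAG lines of the Makefile.
#   $1: a kernel version string such as the output of `uname -r`

# Module extension: "ko" when the version starts with 2.5., 2.6. or 2.7.,
# otherwise "o" (sed prints nothing, so only the trailing "o" remains).
kext() {
    printf '%so\n' "$(echo "$1" | sed -ne 's/^2\.[567]\..*/k/p')"
}

# Build flavour: "24x" for 2.4.* kernels, otherwise "2x".
kflag() {
    printf '2%sx\n' "$(echo "$1" | sed -ne 's/^2\.[4]\..*/4/p')"
}
```

With a 2.6.32 version string this gives `ko` and `2x`, so the kbuild branch is used and `r8168.ko` is stripped. A version string that does not start with 2.5, 2.6 or 2.7 (for example 3.2.0) would yield the extension `o`.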
Comment 11 Victor Pablos Ceruelo 2010-01-18 11:08:19 UTC
Created attachment 24615
New error messages.

I was trying to use r8168 from Realtek instead of r8169 and I saw a different message in dmesg.
Hope it is useful.

Regards, 

Victor.
Comment 12 Victor Pablos Ceruelo 2010-01-18 11:31:37 UTC
r8168 does not work for me ...

After building the module, I had to blacklist r8169 and update the initrd using
 sudo update-initramfs -u

Next I rebooted and saw that r8168 was not loaded automatically, so
I loaded the module manually.

Result: the module is loaded, but there is no interface and nothing in the logs.

vpablos@exodo4:~$ sudo modprobe r8168
vpablos@exodo4:~$ ifconfig 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1086 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1086 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:106967 (104.4 KiB)  TX bytes:106967 (104.4 KiB)

wlan0     Link encap:Ethernet  HWaddr 00:18:de:a8:42:a5  
          inet addr:138.100.214.93  Bcast:138.100.223.255  Mask:255.255.240.0
          inet6 addr: fe80::218:deff:fea8:42a5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5549 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3657 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2702734 (2.5 MiB)  TX bytes:656086 (640.7 KiB)

vpablos@exodo4:~$ lsmod | grep r81
r8168                  51047  0 
vpablos@exodo4:~$ dmesg | tail -n 5
[   99.043546] wlan0: RX AssocResp from 00:18:6e:27:ce:84 (capab=0x411 status=0 aid=1)
[   99.043550] wlan0: associated
[   99.045338] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[  101.346440] padlock: VIA PadLock not detected.
[  109.500120] wlan0: no IPv6 routers present
vpablos@exodo4:~$ 

Maybe it is not compatible with 2.6.32 or I am doing something wrong ...
Comment 13 Adam Williamson 2010-03-10 21:09:46 UTC
We have a lot of Fedora 12 and 13 users hitting this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=538920

It would be great if someone could work on it. If any specific info is required, please let me know and I'll pass the request on to the affected Fedora users...
Comment 14 vadim 2010-05-28 15:17:02 UTC
(In reply to comment #13)
I would like to upgrade to FC13 (from 10) once this problem is solved.

The problem seems to be fixed in 2.6.34-rc7.

Backporting the module changes to the Fedora 13 kernel does not help. The kernel error disappears, but something strange in the MSI support causes RX overflows to occur too quickly: the network works for 5 seconds, then freezes for 10 seconds.

There are no problems with 2.6.34-rc7, and perhaps with earlier release candidates too, but the module fixes only arrived in 2.6.34-rc4.
Comment 15 Roman Mamedov 2010-05-28 19:56:39 UTC
> There is no problems with 2.6.34-rc7

Did you also try with jumbo frames (e.g. MTU=4000 or 7200)?
Comment 16 vadim 2010-05-28 21:49:14 UTC
(In reply to comment #15)
> > There is no problems with 2.6.34-rc7
> 
> Did you also try with jumbo frames (e.g. MTU=4000 or 7200)?

I'll check it out on Monday (31.05)
Comment 17 vadim 2010-05-31 08:56:26 UTC
(In reply to comment #15)
> > There is no problems with 2.6.34-rc7
> 
> Did you also try with jumbo frames (e.g. MTU=4000 or 7200)?

Yes, it works. I tried different payload sizes, with and without MSI support. All OK.
I did not test with very heavy traffic, but I checked that the kernel bug has disappeared since the module fixes. With the earlier kernel, this module caused a reconnection after five seconds of traffic. Fortunately, the 2.6.34-rc7 module keeps working under heavy traffic. The reset handler proceeds correctly, although a reconnection is still possible under very heavy traffic.
Comment 18 Roman Mamedov 2010-06-01 13:29:01 UTC
Nope, not fixed for me in the final 2.6.34.

Just set my MTU to 7154 and launched:
# ping <other ip on my gigabit LAN> -s 7100 -M do -f -l 200

The card hung up shortly after:

[  297.792010] ------------[ cut here ]------------
[  297.792022] WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x227/0x230()
[  297.792026] Hardware name: GA-MA790FX-DQ6
[  297.792030] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[  297.792033] Modules linked in: nls_utf8 cifs radeon ttm drm_kms_helper drm i2c_algo_bit sco bridge stp bnep rfcomm l2cap crc16 bluetooth rfkill vboxnetadp vboxnetflt cpufreq_stats cpufreq_powersave cpufreq_userspace cpufreq_conservative kvm_amd kvm fuse ip_tables x_tables vboxdrv powernow_k8 k8temp it87 hwmon_vid loop dm_crypt saa7134_alsa tuner_simple tuner_types tda9887 snd_ice1724 snd_ice17xx_ak4xxx tda8290 snd_ac97_codec tuner ac97_bus snd_ak4xxx_adda snd_ak4114 snd_pt2258 snd_i2c snd_ak4113 snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq sg saa7134 sd_mod crc_t10dif sr_mod snd_timer ir_common snd_seq_device v4l2_common cdrom videodev ata_generic snd ahci v4l1_compat v4l2_compat_ioctl32 pata_atiixp libata edac_core wmi psmouse videobuf_dma_sg tpm_tis videobuf_core tpm scsi_mod tpm_bios soundcore serio_raw ir_core i2c_piix4 tveeprom snd_page_alloc evdev processor k10temp i2c_core edac_mce_amd button usbhid hid nfs lockd fscache nfs_acl auth_rpcgss sunrpc ohci_hcd ehci_hcd usbcore nls_base thermal fan thermal_sys dm_mirror dm_region_hash dm_log dm_mod r8169 mii
[  297.792146] Pid: 0, comm: swapper Not tainted 2.6.34-rm1-amd-slab-nomcsmt #1
[  297.792150] Call Trace:
[  297.792153]  <IRQ>  [<ffffffff810547d3>] ? warn_slowpath_common+0x73/0xb0
[  297.792165]  [<ffffffff81054870>] ? warn_slowpath_fmt+0x40/0x50
[  297.792171]  [<ffffffff811e1ca1>] ? strlcpy+0x41/0x50
[  297.792176]  [<ffffffff812e87a7>] ? dev_watchdog+0x227/0x230
[  297.792182]  [<ffffffff8106e5e0>] ? delayed_work_timer_fn+0x0/0x40
[  297.792187]  [<ffffffff81042fac>] ? __wake_up+0x3c/0x60
[  297.792193]  [<ffffffff81063ba5>] ? run_timer_softirq+0x185/0x320
[  297.792198]  [<ffffffff8107c97b>] ? ktime_get+0x5b/0xe0
[  297.792203]  [<ffffffff8105b32f>] ? __do_softirq+0xaf/0x1d0
[  297.792209]  [<ffffffff8100ae1c>] ? call_softirq+0x1c/0x30
[  297.792214]  [<ffffffff8100d245>] ? do_softirq+0x65/0xa0
[  297.792219]  [<ffffffff8105b215>] ? irq_exit+0x85/0x90
[  297.792224]  [<ffffffff810252da>] ? smp_apic_timer_interrupt+0x6a/0xa0
[  297.792229]  [<ffffffff8100a8d3>] ? apic_timer_interrupt+0x13/0x20
[  297.792232]  <EOI>  [<ffffffff8102ee32>] ? native_safe_halt+0x2/0x10
[  297.792243]  [<ffffffff810136ee>] ? default_idle+0x2e/0x80
[  297.792248]  [<ffffffff81013796>] ? c1e_idle+0x56/0x110
[  297.792253]  [<ffffffff81008d8a>] ? cpu_idle+0xaa/0x100
[  297.792259]  [<ffffffff8139df94>] ? start_secondary+0x1ee/0x1f2
[  297.792263] ---[ end trace e627a1c632239584 ]---
[  297.808068] r8169 0000:02:00.0: eth0: link up
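
For anyone reproducing the flood ping above: with `-M do` fragmentation is prohibited, so the ICMP payload plus headers must fit in the MTU. A sketch of the arithmetic, assuming IPv4 without options (20-byte IP header) and the 8-byte ICMP echo header; the function names are hypothetical.

```shell
# Hypothetical helpers for sizing unfragmented IPv4 pings.

# Largest `ping -s` payload that fits in an MTU ($1) without fragmentation.
max_icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Succeed if payload $1 fits within MTU $2.
icmp_payload_fits() {
    [ $(( $1 + 20 + 8 )) -le "$2" ]
}
```

With MTU 7154 the largest payload is 7126 bytes, so the `-s 7100` used above produces 7128-byte packets, which exceed the standard 1500-byte MTU and exercise the jumbo-frame path.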
Comment 19 Carlos Fonseca 2010-08-09 18:08:03 UTC
I have been experiencing the same problem for a long time, too (more details at http://bugs.debian.org/526983). The laptop is a Satellite A110 (as with the original poster).

The card information is:

[    0.833183] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    0.833219] r8169 0000:05:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    0.833278] r8169 0000:05:00.0: setting latency timer to 64
[    0.833357]   alloc irq_desc for 43 on node -1
[    0.833360]   alloc kstat_irqs on node -1
[    0.833381] r8169 0000:05:00.0: irq 43 for MSI/MSI-X
[    0.833975] r8169 0000:05:00.0: eth0: RTL8101e at 0xffffc9000065e000, 00:16:d4:8a:aa:ed, XID 14000000 IRQ 43
[    0.920725]   alloc irq_desc for 20 on node -1
[    0.920729]   alloc kstat_irqs on node -1

The actual trace, using debian kernel 2.6.35-1~experimental.1:

[  170.832114] ------------[ cut here ]------------
[  170.832129] WARNING: at /build/mattems-linux-2.6_2.6.35-1~experimental.1-amd64-OCge0v/linux-2.6-2.6.35/debian/build/source_amd64_none/net/sched/sch_generic.c:258 dev_watchdog+0xef/0x18c()
[  170.832136] Hardware name: Satellite A110
[  170.832141] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[  170.832144] Modules linked in: sco bridge stp bnep rfcomm l2cap crc16 bluetooth cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 uinput fuse xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_DSCP xt_TCPMSS ipt_LOG ipt_REJECT iptable_mangle iptable_filter xt_multiport xt_state xt_limit xt_conntrack nf_conntrack_ftp nf_conntrack ip_tables x_tables coretemp acpi_cpufreq mperf firewire_sbp2 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss arc4 snd_pcm ecb snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq iwl3945 snd_timer snd_seq_device iwlcore i915 snd drm_kms_helper pcmcia mac80211 drm joydev soundcore cfg80211 snd_page_alloc tpm_tis yenta_socket i2c_i801 i2c_algo_bit pcmcia_rsrc tpm serio_raw rfkill tpm_bios rng_core pcmcia_core video i2c_core psmouse container battery output ac processor evdev button ext3 jbd mbcache dm_mod usbhid hid sg sr_mod sd_mod cdrom crc_t10dif ata_generic uhci_hcd ata_piix libata ehci_hcd sdhci_pci sdhci scsi_mod firewire_ohci usbcore mmc_core led_class firewire_core crc_itu_t r8169 thermal mii nls_base thermal_sys [last unloaded: scsi_wait_scan]
[  170.832314] Pid: 0, comm: swapper Not tainted 2.6.35-trunk-amd64 #1
[  170.832318] Call Trace:
[  170.832322]  <IRQ>  [<ffffffff81044307>] ? warn_slowpath_common+0x78/0x8c
[  170.832339]  [<ffffffff810443ba>] ? warn_slowpath_fmt+0x45/0x4a
[  170.832346]  [<ffffffff8126492e>] ? netif_tx_lock+0x3d/0x65
[  170.832353]  [<ffffffff81264a45>] ? dev_watchdog+0xef/0x18c
[  170.832360]  [<ffffffff8101e743>] ? lapic_next_event+0x18/0x1d
[  170.832368]  [<ffffffff8105e173>] ? hrtimer_interrupt+0x112/0x1bc
[  170.832375]  [<ffffffff810502b4>] ? run_timer_softirq+0x1cd/0x299
[  170.832382]  [<ffffffff81264956>] ? dev_watchdog+0x0/0x18c
[  170.832389]  [<ffffffff81049b26>] ? __do_softirq+0xe4/0x1aa
[  170.832397]  [<ffffffff8108b7df>] ? handle_IRQ_event+0x4c/0x104
[  170.832406]  [<ffffffff810098dc>] ? call_softirq+0x1c/0x30
[  170.832412]  [<ffffffff8100aeb3>] ? do_softirq+0x3f/0x79
[  170.832417]  [<ffffffff810499a6>] ? irq_exit+0x36/0x7a
[  170.832423]  [<ffffffff8100a615>] ? do_IRQ+0xa3/0xb9
[  170.832431]  [<ffffffff81300bd3>] ? ret_from_intr+0x0/0x11
[  170.832435]  <EOI>  [<ffffffffa03a74b7>] ? acpi_idle_enter_bm+0x264/0x29c [processor]
[  170.832461]  [<ffffffffa03a74b0>] ? acpi_idle_enter_bm+0x25d/0x29c [processor]
[  170.832469]  [<ffffffff81238286>] ? cpuidle_idle_call+0x8f/0xed
[  170.832476]  [<ffffffff81007b5d>] ? cpu_idle+0xa3/0xdd
[  170.832484]  [<ffffffff816c5d92>] ? start_kernel+0x3ef/0x3fa
[  170.832491]  [<ffffffff816c53ba>] ? x86_64_start_kernel+0xf9/0x106
[  170.832496] ---[ end trace e4f0b2b50888042c ]---
[  170.849242] r8169 0000:05:00.0: eth0: link up

In this case, the network came back immediately. Sometimes, it doesn't, and the solution is to suspend to RAM and to resume again. Some other times, the computer simply locks up (apparently triggered by using the network, but I never got a trace when that happened).

I hope that this information is useful. Thanks in advance.

Regards,

Carlos
Comment 20 Mace Moneta 2010-08-09 19:11:55 UTC
Running with kernel option pcie_aspm=off on multiple machines has bypassed the problem for me.  Even with 200GB data transfers and jumbo packets, no glitches.
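
To see which ASPM policy a kernel is actually using, the pcie_aspm module parameter can be read from sysfs, where the active policy is shown in square brackets. A sketch, assuming the file /sys/module/pcie_aspm/parameters/policy is present on the running kernel (it may be absent on kernels built without ASPM support); the function name `active_aspm_policy` is hypothetical.

```shell
# Hypothetical helper: print the bracketed (active) entry from an ASPM
# policy line such as "default performance [powersave]" read on stdin.
active_aspm_policy() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}
```

Typical use would be `active_aspm_policy < /sys/module/pcie_aspm/parameters/policy`.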
Comment 21 Carlos Fonseca 2010-08-11 14:56:55 UTC
Adding option pcie_aspm=off does not help in this case, as the problem still happens. In any case, dmesg says:

pci 0000:05:00.0: disabling ASPM on pre-1.1 PCIe device.  You can enable it with 'pcie_aspm=force'

whether this option is given or not. Also, this is an RTL8101e adapter. No gigabit...
Comment 22 Vadim Plessky 2011-02-13 18:07:08 UTC
Hello everyone,

I have a couple of laptops with Realtek, running OpenSUSE and Fedora.

Any help needed with testing?
Comment 23 Jonathan Nieder 2012-02-20 19:43:56 UTC
Did the fixes in v3.2 change anything?
Comment 24 OmegaPhil 2012-06-03 12:57:25 UTC
I use Debian Wheezy (Linux 3.2.0-2-amd64 #1 SMP Mon May 21 17:45:41 UTC 2012 x86_64 GNU/Linux) with the RTL8111/8168B PCI Express Gigabit Ethernet controller built into the Gigabyte GA-990FXA-UD5 motherboard.

The problem happens soon after machine boot and use of the XFCE4 desktop environment for me, under both high and low net load:

=============================================================

[ 2049.744023] WARNING: at /build/buildd-linux-2.6_3.2.15-1-amd64-EOdTQR/linux-2.6-3.2.15/debian/build/source_amd64_none/net/sched/sch_generic.c:255 dev_watchdog+0xe9/0x148()
[ 2049.744025] Hardware name: GA-990FXA-UD5
[ 2049.744026] NETDEV WATCHDOG: eth2 (r8169): transmit queue 0 timed out
[ 2049.744027] Modules linked in: sg pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) mperf cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_stats parport_pc ppdev lp parport rfcomm bnep bluetooth rfkill oss_usb(O) oss_hdaudio(O) osscore(O) binfmt_misc fuse sr_mod cdrom dm_crypt w83793 hwmon_vid loop radeon ttm sp5100_tco drm_kms_helper drm power_supply fam15h_power edac_mce_amd i2c_piix4 i2c_algo_bit k10temp button processor edac_core evdev i2c_core pcspkr mxm_wmi wmi thermal_sys ext4 crc16 jbd2 mbcache dm_mod raid1 md_mod usbhid hid sd_mod crc_t10dif ata_generic mpt2sas raid_class scsi_transport_sas ohci_hcd firewire_ohci firewire_core crc_itu_t ahci libahci libata xhci_hcd r8169 mii ehci_hcd usbcore scsi_mod usb_common [last unloaded: scsi_wait_scan]
[ 2049.744067] Pid: 0, comm: swapper/7 Tainted: G           O 3.2.0-2-amd64 #1
[ 2049.744069] Call Trace:
[ 2049.744070]  <IRQ>  [<ffffffff81046811>] ? warn_slowpath_common+0x78/0x8c
[ 2049.744076]  [<ffffffff810468bd>] ? warn_slowpath_fmt+0x45/0x4a
[ 2049.744078]  [<ffffffff812a1e75>] ? netif_tx_lock+0x40/0x72
[ 2049.744081]  [<ffffffff812a1fd6>] ? dev_watchdog+0xe9/0x148
[ 2049.744083]  [<ffffffff81051ebc>] ? run_timer_softirq+0x19a/0x261
[ 2049.744085]  [<ffffffff812a1eed>] ? netif_tx_unlock+0x46/0x46
[ 2049.744088]  [<ffffffff810659ff>] ? timekeeping_get_ns+0xd/0x2a
[ 2049.744091]  [<ffffffff8104be30>] ? __do_softirq+0xb9/0x177
[ 2049.744093]  [<ffffffff813504ac>] ? call_softirq+0x1c/0x30
[ 2049.744096]  [<ffffffff8100f8e5>] ? do_softirq+0x3c/0x7b
[ 2049.744098]  [<ffffffff8104c098>] ? irq_exit+0x3c/0x9a
[ 2049.744100]  [<ffffffff81023fe8>] ? smp_apic_timer_interrupt+0x74/0x82
[ 2049.744103]  [<ffffffff8134ed1e>] ? apic_timer_interrupt+0x6e/0x80
[ 2049.744104]  <EOI>  [<ffffffff81023cb0>] ? lapic_next_event+0xe/0x13
[ 2049.744107]  [<ffffffff8102b2c4>] ? native_safe_halt+0x2/0x3
[ 2049.744112]  [<ffffffffa0243c47>] ? acpi_safe_halt+0x21/0x39 [processor]
[ 2049.744116]  [<ffffffffa02440b3>] ? acpi_idle_enter_c1+0x57/0xb3 [processor]
[ 2049.744121]  [<ffffffff8126b8ab>] ? cpuidle_idle_call+0xec/0x179
[ 2049.744124]  [<ffffffff8100d248>] ? cpu_idle+0xa5/0xf2
[ 2049.744127]  [<ffffffff8133b77f>] ? start_secondary+0x1d5/0x1db
[ 2049.744128] ---[ end trace 685ef6db5d21a5fd ]---
[ 2049.764691] r8169 0000:06:00.0: eth2: link up

=============================================================

The last message then repeats. 'pcie_aspm=off' has no effect. As others have said, this permanently destroys network connectivity soon after boot.

I have since downloaded the official realtek driver 'LINUX driver for kernel 3.x and 2.6.x and 2.4.x' off http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false and the problem no longer occurs.
Comment 25 OmegaPhil 2012-06-03 13:29:41 UTC
For other users of my board, it seems that disabling the IOMMU in the BIOS is required for the official realtek driver to work (this made no difference to the problem with the default driver).
Comment 26 Francois Romieu 2012-06-03 22:36:10 UTC
Created attachment 73504
8168evl hack for the 990FXA based motherboards
Comment 27 Francois Romieu 2012-06-03 22:39:02 UTC
(In reply to comment #25)
> For other users of my board, it seems that disabling the IOMMU in the BIOS is
> required for the official realtek driver to work (this made no difference to
> the problem with the default driver).

You should be fine with the patch above and any recent kernel.

-- 
Ueimor
Comment 28 OmegaPhil 2012-06-04 15:21:15 UTC
Thanks - I will look into patching the default driver with this and disable the official realtek driver for testing.
Comment 29 OmegaPhil 2012-06-07 13:15:07 UTC
Thanks Francois - I have now been running this for a day and there have been no cut outs of network activity at all :)
Comment 30 Jonathan Nieder 2012-06-19 17:32:03 UTC
The patch mentioned in comment#26 does not seem to be in mainline.  Did
some other change in mainline fix this?
Comment 31 Mario Bachmann 2014-02-07 18:43:29 UTC
Is this problem really fixed? 
Linux server 3.12.6 #2 SMP Fri Dec 20 20:25:49 CET 2013 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ AuthenticAMD GNU/Linux

Feb  7 02:36:36 server kernel: ------------[ cut here ]------------
Feb  7 02:36:36 server kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x227/0x230()
Feb  7 02:36:36 server kernel: NETDEV WATCHDOG: enp1s10 (r8169): transmit queue 0 timed out
Feb  7 02:36:36 server kernel: Modules linked in: nf_conntrack_irc nf_conntrack_ftp nf_conntrack_tftp xt_owner ipt_REJECT xt_tcpudp xt_conntrack xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_filter ip_tables x_tables ipv6 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer snd video backlight wmi
Feb  7 02:36:36 server kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G       A     3.12.6 #2
Feb  7 02:36:36 server kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./K10N78FullHD-hSLI..  , BIOS P2.10 09/22/2009
Feb  7 02:36:36 server kernel: 0000000000000009 ffffffff8159691a ffff8801dfd03e28 ffffffff81091171
Feb  7 02:36:36 server kernel: 0000000000000000 ffff8801dfd03e78 0000000000000001 0000000000000001
Feb  7 02:36:36 server kernel: 00000000000000f0 ffffffff810911e7 ffffffff81727940 0000000000000030
Feb  7 02:36:36 server kernel: Call Trace:
Feb  7 02:36:36 server kernel: <IRQ>  [<ffffffff8159691a>] ? dump_stack+0x41/0x51
Feb  7 02:36:36 server kernel: [<ffffffff81091171>] ? warn_slowpath_common+0x81/0xb0
Feb  7 02:36:36 server kernel: [<ffffffff810911e7>] ? warn_slowpath_fmt+0x47/0x50
Feb  7 02:36:36 server kernel: [<ffffffff814f9047>] ? dev_watchdog+0x227/0x230
Feb  7 02:36:36 server kernel: [<ffffffff814f8e20>] ? dev_graft_qdisc+0x90/0x90
Feb  7 02:36:36 server kernel: [<ffffffff8109b24a>] ? call_timer_fn.isra.34+0x2a/0x90
Feb  7 02:36:36 server kernel: [<ffffffff8109b40a>] ? run_timer_softirq+0x15a/0x1f0
Feb  7 02:36:36 server kernel: [<ffffffff810953de>] ? __do_softirq+0xce/0x190
Feb  7 02:36:36 server kernel: [<ffffffff8159cdcc>] ? call_softirq+0x1c/0x30
Feb  7 02:36:36 server kernel: [<ffffffff81037da5>] ? do_softirq+0x35/0x70
Feb  7 02:36:36 server kernel: [<ffffffff81095585>] ? irq_exit+0x45/0x50
Feb  7 02:36:36 server kernel: [<ffffffff8105de7b>] ? smp_apic_timer_interrupt+0x3b/0x50
Feb  7 02:36:36 server kernel: [<ffffffff8159c7ca>] ? apic_timer_interrupt+0x6a/0x70
Feb  7 02:36:36 server kernel: <EOI>  [<ffffffff8103e5a8>] ? amd_e400_idle+0x68/0xe0
Feb  7 02:36:36 server kernel: [<ffffffff810c5509>] ? cpu_startup_entry+0xd9/0x130
Feb  7 02:36:36 server kernel: ---[ end trace cf05e20aa4169875 ]---
Feb  7 02:36:36 server kernel: r8169 0000:01:0a.0 enp1s10: link up
Comment 32 OmegaPhil 2014-02-07 18:53:08 UTC
I haven't had any further incidents anyway.
Comment 33 Mario Bachmann 2014-02-07 19:30:49 UTC
Ih had this problem last night when i had a lot on this network device. I have run into this bug every time I have had a lot of traffic over the last few years. I usually compile a new kernel every week... I have only now found this bug report for the first time.

01:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8169/8110 Family PCI Gigabit Ethernet NIC
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
	I/O ports at d800 [size=256]
	Memory at fcfffc00 (32-bit, non-prefetchable) [size=256]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [dc] Power Management version 2
	Kernel driver in use: r8169

enp1s10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

Settings for enp1s10:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Half 1000baseT/Full 
	Link partner advertised pause frame use: Symmetric Receive-only
	Link partner advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: g
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: yes
Comment 34 Mario Bachmann 2014-02-07 19:32:27 UTC
Some mistakes, sorry: 
I had this problem last night when I had a lot of traffic on this network device.

00:00.0 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a2)
00:01.0 ISA bridge: NVIDIA Corporation MCP78S [GeForce 8200] LPC Bridge (rev a2)
00:01.1 SMBus: NVIDIA Corporation MCP78S [GeForce 8200] SMBus (rev a1)
00:01.2 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
00:01.3 Co-processor: NVIDIA Corporation MCP78S [GeForce 8200] Co-Processor (rev a2)
00:01.4 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
00:02.0 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1)
00:02.1 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1)
00:04.0 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1)
00:04.1 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1)
00:07.0 Audio device: NVIDIA Corporation MCP72XE/MCP72P/MCP78U/MCP78S High Definition Audio (rev a1)
00:08.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:09.0 SATA controller: NVIDIA Corporation MCP78S [GeForce 8200] AHCI Controller (rev a2)
00:0b.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
00:10.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
00:12.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
00:13.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10)
02:00.0 VGA compatible controller: NVIDIA Corporation C77 [GeForce 8200] (rev a2)
Comment 35 Mario Bachmann 2014-02-11 20:21:32 UTC
Perhaps this patch helps. It seems one case is missing in the function 
static void rtl_init_rxcfg(struct rtl8169_private *tp)

Since I added "case RTL_GIGA_MAC_VER_36", the network device has not hung any more. 

As I said, the error occurs when I have *a lot of* traffic on the Realtek RTL8169 NIC. To produce a lot of traffic, I run several NFS streams (nfs4) in both directions, from and to this server with the Realtek NIC, on the local area network. 

# cat /usr/src/realtek_r8169.patch 
--- drivers/net/ethernet/realtek/r8169.orig	2014-02-08 14:17:43.258088394 +0100
+++ drivers/net/ethernet/realtek/r8169.c	2014-02-08 14:19:01.858031701 +0100
@@ -4237,6 +4237,7 @@
 	case RTL_GIGA_MAC_VER_24:
 	case RTL_GIGA_MAC_VER_34:
 	case RTL_GIGA_MAC_VER_35:
+	case RTL_GIGA_MAC_VER_36:
 		RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
 		break;
 	case RTL_GIGA_MAC_VER_40:

Perhaps somebody can test or comment on this patch.
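
To illustrate what the one-line change does, here is a minimal standalone sketch. It is not the driver code: the enum, the flag values and the function names are hypothetical, and the default branch is an assumption made only for illustration. It just shows that a chip version with no case label of its own ends up in the default branch of the switch, and that adding the label moves it into the same group as versions 24, 34 and 35.

```c
#include <stdint.h>

/* Hypothetical subset of chip revisions; the real driver knows many more. */
enum mac_version {
	MAC_VER_24,
	MAC_VER_34,
	MAC_VER_35,
	MAC_VER_36,
	MAC_VER_40,
};

/* Illustrative flag values only, NOT the real register bit layout. */
#define RX128_INT_EN  (1u << 0)
#define RX_MULTI_EN   (1u << 1)
#define RX_DMA_BURST  (1u << 2)

/* Before the patch: MAC_VER_36 has no label, so it takes the default branch. */
uint32_t rx_config_unpatched(enum mac_version ver)
{
	switch (ver) {
	case MAC_VER_24:
	case MAC_VER_34:
	case MAC_VER_35:
		return RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST;
	default:
		/* Assumed fallback for this sketch. */
		return RX128_INT_EN | RX_DMA_BURST;
	}
}

/* After the patch: MAC_VER_36 shares the branch of versions 24, 34 and 35. */
uint32_t rx_config_patched(enum mac_version ver)
{
	switch (ver) {
	case MAC_VER_24:
	case MAC_VER_34:
	case MAC_VER_35:
	case MAC_VER_36:
		return RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST;
	default:
		/* Assumed fallback for this sketch. */
		return RX128_INT_EN | RX_DMA_BURST;
	}
}
```

In this sketch the only difference for MAC_VER_36 is whether the RX_MULTI_EN bit ends up in the returned value. Whether that bit is what actually avoids the hang on real hardware is not something the sketch can show.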
Comment 36 Francois Romieu 2014-02-13 16:44:50 UTC
Created attachment 125961 [details]
write reordering and netdev watchdog debug information
Comment 37 Francois Romieu 2014-02-13 16:50:48 UTC
(In reply to Mario Bachmann from comment #35)
> Perhaps this patch helps. It seems there is a lack of one case in the
> function 
> static void rtl_init_rxcfg(struct rtl8169_private *tp)

See commit 3ced8c955e74d319f3e3997f7169c79d524dfd06 and
http://marc.info/?l=linux-netdev&m=137859576524585&w=4
for some history information.

Your motherboard is not the usual 8168evl AMD IOMMU supporting one though.

Could you give https://bugzilla.kernel.org/attachment.cgi?id=125961 a try and
send the debug log ?

(opening a different problem report would be welcome too)

Thanks.

-- 
Ueimor
Comment 38 Mario Bachmann 2014-02-13 19:09:31 UTC
I am not sure what my motherboard has to do with it. I use a PCI network card with a Realtek chipset. I do _not_ use the onboard NIC (it is disabled in the BIOS). 

The patch with the debug-output adds only debug output and changes nothing. I think I will try my little patch until the next crash. 

Why open "a different problem"? What is the difference between my problem and the initial problem in this thread? The description "r8169 hangs when your transmission speed is really high." seems to be exactly my problem. 
Am I wrong? Please correct me. 

Greetings
Mario

(In reply to Francois Romieu from comment #37)
> Your motherboard is not the usual 8168evl AMD IOMMU supporting one though.
> 
> Could you give https://bugzilla.kernel.org/attachment.cgi?id=125961 a try and
> send the debug log ?
> 
> (opening a different problem report would be welcome too)
> 
> Thanks.
> 
> -- 
> Ueimor
Comment 39 Francois Romieu 2014-02-13 21:13:06 UTC
(In reply to Mario Bachmann from comment #38)
> I am not sure, what my motherboard should have to do with it.

Notwithstanding the generous mess in the thread, it ended with a bug
exhibiting rather reproducible symptoms on a specific kind of motherboard
and network device.

Your setup shares none of those.

> I use a
> PCI-Network-Card with Realtek-Chipset. I do _not_ use the Onboard-NIC (it is
> disabled in the BIOS).

PCI or PCI express ?

> The patch with the debug-output adds only debug output and changes nothing.

Would you be kind enough to attach said debug output, up to the point
where the problem happens ? If so it would be nice to include the lines
related to the device detection as well.

> I think I will try my little patch until the next crash. 

If it hides the bug, it won't help figuring out what the problem is. :o/

> Why opening "a different problem" ? What is the difference between my problem
> and the initial problem in this thread? The description "r8169 hangs when
> your transmission speed is really high." seems to be exactly my problem. 
> Am I wrong? Please correct me.

It would not have been practical to file all high speed related r8169 hangs
under the same PR since 2003, especially as the number of devices grew ~10x
and the driver has been heavily modified in this time frame.

(please remove unused material and don't top post)

-- 
Ueimor
Comment 40 Mario Bachmann 2014-02-13 21:46:48 UTC
Excuse me. So I just posted in the wrong thread. 

Perhaps my "Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10)" is a _different_ network device... 

The line "Kernel driver in use: r8169" is just a coincidence. 

Thank you and greetings.
Comment 41 Max Kotov 2015-02-12 21:52:49 UTC
The same bug?

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 01)

[  146.788047] ------------[ cut here ]------------
[  146.788062] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x23f/0x250()
[  146.788065] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[  146.788066] Modules linked in: dm_crypt snd_hda_codec_hdmi snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_ctxfi snd_hwdep snd_pcm_oss videobuf2_vmalloc snd_mixer_oss videobuf2_memops videobuf2_core snd_pcm v4l2_common snd_seq_dummy snd_seq_midi snd_seq_oss videodev joydev snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device nvidia(PO) snd_timer serio_raw rtc_cmos snd soundcore k10temp sp5100_tco edac_core edac_mce_amd drm i2c_piix4 shpchp mac_hid binfmt_misc xfs libcrc32c raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear raid10 raid1 uas usb_storage hid_generic usbhid hid psmouse pata_acpi atkbd pata_atiixp pata_jmicron ahci libahci r8169 mii wmi
[  146.788106] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P        W  O   3.19.0-150212 #1
[  146.788108] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790FX-DS5/GA-MA790FX-DS5, BIOS F8i 07/19/2010
[  146.788110]  ffffffff817adcad 86c0b30a1601187d ffffffff817adcad ffffffff815eabf2
[  146.788112]  ffff8801bfc83e00 ffffffff81048ac7 0000000000000000 ffff8801b58483a0
[  146.788114]  ffff8801b5848000 0000000000000002 0000000000000001 ffffffff81048b58
[  146.788117] Call Trace:
[  146.788119]  <IRQ>  [<ffffffff815eabf2>] ? dump_stack+0x47/0x67
[  146.788126]  [<ffffffff81048ac7>] ? warn_slowpath_common+0x77/0xb0
[  146.788129]  [<ffffffff81048b58>] ? warn_slowpath_fmt+0x58/0x80
[  146.788132]  [<ffffffff8106eca7>] ? vtime_gen_account_irq_exit+0x27/0x60
[  146.788135]  [<ffffffff8109b40d>] ? run_posix_cpu_timers+0x4d/0x5b0
[  146.788140]  [<ffffffff8151959f>] ? dev_watchdog+0x23f/0x250
[  146.788143]  [<ffffffff81519360>] ? dev_graft_qdisc+0x80/0x80
[  146.788145]  [<ffffffff81096445>] ? call_timer_fn.isra.32+0x15/0x80
[  146.788148]  [<ffffffff81519360>] ? dev_graft_qdisc+0x80/0x80
[  146.788150]  [<ffffffff81096700>] ? run_timer_softirq+0x1c0/0x250
[  146.788153]  [<ffffffff8104c0aa>] ? __do_softirq+0xfa/0x230
[  146.788155]  [<ffffffff8104c396>] ? irq_exit+0xe6/0x100
[  146.788158]  [<ffffffff81030729>] ? smp_apic_timer_interrupt+0x39/0x50
[  146.788161]  [<ffffffff815f196a>] ? apic_timer_interrupt+0x6a/0x70
[  146.788162]  <EOI>  [<ffffffff8100c630>] ? arch_remove_reservations+0x110/0x110
[  146.788167]  [<ffffffff8100c632>] ? default_idle+0x2/0x10
[  146.788171]  [<ffffffff81078dff>] ? cpu_startup_entry+0x20f/0x2f0
[  146.788174]  [<ffffffff810a38a5>] ? tick_check_new_device+0xd5/0x100
[  146.788176]  [<ffffffff8102e840>] ? start_secondary+0x1a0/0x1e0
[  146.788178] ---[ end trace 44a2ddf66d6ec379 ]---
Comment 42 Zs 2015-05-13 07:10:36 UTC
I have the same error with the recently upgraded Kubuntu 15.04, which prevents connecting to the DHCP server.
Comment 43 H.-Dirk Schmitt 2015-06-06 20:10:08 UTC
(In reply to Max Kotov from comment #41)
> The same bug?
> 3.19.0-150212 #1

--> see bug #99521
