Bug 9882

Summary: NETDEV WATCHDOG: transmit timed out if mtu > 2500.
Product: Drivers Reporter: Roman Timofeev (cryptoloop)
Component: NetworkAssignee: Francois Romieu (romieu)
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, dirk, rm+bko, stijn+bugs, v.plessky
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32 Subsystem:
Regression: No Bisected commit-id:
Attachments: my dmesg
dmesg w/o nvidia driver.
dmesg of 2.6.28-git4
dmesg of a kernel which exhibits the problem (2.6.32.1)

Description Roman Timofeev 2008-02-03 13:25:11 UTC
Distribution: any
Hardware Environment: gigabyte ga-p35-s3
Problem Description:
When i set mtu > 2500 in dmesg i see:

NETDEV WATCHDOG: eth1: transmit timed out
r8169: eth1: link up

Steps to reproduce:
I have 2 box: A(with RTL8111/8168B PCI Express Gigabit Ethernet controller) and B(with nvidia forcedeth).
on A: ifconfig eth1 mtu 6000
on B: ifconfig eth0 mtu 6000 ; iperf -s
on A: iperf -c _address_of_box_B_

In dmesg on A: NETDEV WATCHDOG: eth1: transmit timed out
When i use mtu<2500 all works fine.
Comment 1 Roman Timofeev 2008-02-03 13:26:27 UTC
On the same machine with freebsd 7 all works fine too.
Comment 2 Francois Romieu 2008-02-03 15:03:54 UTC
Roman, can you send a full dmesg ?

-- 
Ueimor
Comment 3 Roman Timofeev 2008-02-03 15:26:39 UTC
Created attachment 14694 [details]
my dmesg

dmesg added. something else?
Comment 4 Francois Romieu 2008-02-03 15:40:52 UTC
Can you check that the problem is still there when you do not
load the nvidia module and check if there is enough free memory
when the problem appears ?

-- 
Ueimor
Comment 5 Roman Timofeev 2008-02-03 15:52:22 UTC
Created attachment 14695 [details]
dmesg w/o nvidia driver.

oops, sorry. I'm remove nvidia driver and reproduce this problem.
Also, i'm define RTL8169_DEBUG.

localhost ~ # ethtool -S eth1
NIC statistics:
     tx_packets: 2034
     rx_packets: 1994
     tx_errors: 54
     rx_errors: 207
     rx_missed: 0
     align_errors: 747
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 1979
     broadcast: 15
     multicast: 15
     tx_aborted: 0
     tx_underrun: 54
Comment 6 Roman Timofeev 2008-04-15 07:49:07 UTC
Somebody alive?
I can provide ssh access to that machine.
Comment 7 Natalie Protasevich 2008-06-01 23:19:34 UTC
Is the problem still there with latest kernel?
Roman, maybe you can attach you /var/log/messages, since you've enabled driver debug. I'd say try setting debug verbosity to max (16), and set kernel verbosity to max (dmesg -n 7).
Comment 8 Roman Timofeev 2008-06-11 06:15:46 UTC
Bug found in latest kernels(2.6.26-rcXX too).
Also, i'm enabled debug in driver and build kernel with full debug.
In dmesg, after loading driver:
---------
[ 1193.780511] r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
[ 1193.780612] ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 16 (level, low) -> IRQ 16
[ 1193.780633] PCI: Setting latency timer of device 0000:04:00.0 to 64
[ 1193.780642] r8169: mac_version = 0x0c
[ 1193.780783] kobject: 'eth0' (f492f3d4): kobject_add_internal: parent: 'net', set: 'devices'
[ 1193.780840] kobject: 'eth0' (f492f3d4): kobject_uevent_env
[ 1193.780843] kobject: 'eth0' (f492f3d4): fill_kobj_path: path = '/class/net/eth0'
[ 1193.780846] kobject: '0000:04:00.0' (f7d0b13c): fill_kobj_path: path = '/devices/pci0000:00/0000:00:1c.4/0000:04:00.0'
[ 1193.781246] eth0: RTL8168b/8111b at 0xf8b98000, 00:1a:4d:4b:dd:be, XID 38000000 IRQ 218
[ 1193.781247] r8169: mac_version = 0x0c
[ 1193.781248] r8169: Set MAC Reg C+CR Offset 0x82h = 0x01h
[ 1193.782348] kobject: 'drivers' (f4860e00): kobject_add_internal: parent: 'r8169', set: '<NULL>'
[ 1193.782362] kobject: 'r8169' (f5680a80): kobject_uevent_env
[ 1193.782366] kobject: 'r8169' (f5680a80): fill_kobj_path: path = '/bus/pci/drivers/r8169'
--------
Nothing interest. But i see some strange effects in this tests:
hostA(192.168.0.1) - box with r8169
hostB(192.168.0.5) - box with other gigabit nic.

1)
on hostB:
 iperf(server mode)
on hostA:
 ping 192.168.0.5 + iperf(client mode).
 After traffic freeze, ping says:
 ping: sendmsg: No buffer space available
 ping: sendmsg: No buffer space available
 ping: sendmsg: No buffer space available
 ping: sendmsg: No buffer space available

2)
on hostB:
 ping 192.168.0.1 + tcpdump + iperf(server mode)
on hostA:
 tcpdump + iperf(client mode)

After traffic freeze, tcpdump on hostA says:
22:37:55.393477 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 8, length 64
22:37:55.393514 IP 192.168.0.1 > 192.168.0.5: ICMP echo reply, id 3914, seq 8, length 64
22:37:56.393466 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 9, length 64
22:37:56.393494 IP 192.168.0.1 > 192.168.0.5: ICMP echo reply, id 3914, seq 9, length 64

But, on hostB:
22:37:58.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 11, length 64
22:37:59.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 12, length 64
22:38:00.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 13, length 64
22:38:01.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 14, length 64
22:38:02.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 15, length 64
22:38:03.426413 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 16, length 64
Comment 9 Francois Romieu 2008-09-16 15:38:24 UTC
Roman, is there any change if you use one of either 2.6.27-rc6 or 2.6.27-rc6 + 
http://www.kernel.org/~romieu/r8169/2.6.27-rc6/20080913-r8169-test.patch ?

-- 
Ueimor
Comment 10 Roman Timofeev 2009-01-02 12:58:50 UTC
Also, bug present in latest 2.6.28-git4 kernel.

some new info:
after start recieving with jumbo frames on r8169 side i see errors in console with ping like: no buffer space available

Also, oops:

------------[ cut here ]------------                                                                                                                                 
WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x1f9/0x210()                                                                                                   
Hardware name: P35-S3                                                                                                                                                
NETDEV WATCHDOG: eth0 (r8169): transmit timed out                                                                                                                    
Modules linked in: af_packet binfmt_misc ppdev cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative wmi video output sbs sbshc pci_slot fan container battery microcode ac parport_pc lp parport fuse sr_mod cdrom snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul ata_generic pata_acpi evdev e100 snd_emu10k1 r8169 mii snd_ac97_codec pata_jmicron ac97_bus psmouse snd_pcm_oss i2c_core snd_mixer_oss serio_raw snd_pcm thermal snd_page_alloc snd_util_mem snd_hwdep snd_seq_dummy button snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq processor snd_timer snd_seq_device snd emu10k1_gp pcspkr pata_ali sata_uli gameport soundcore shpchp pci_hotplug iTCO_wdt iTCO_vendor_support ehci_hcd uhci_hcd intel_agp agpgart usbcore                                                                                                           
Pid: 0, comm: swapper Not tainted 2.6.28-git4 #3                                                                                                                                    
Call Trace:                                                                                                                                                                         
 [<c0128e67>] warn_slowpath+0x87/0xe0
 [<c03a11f7>] ip_rcv_finish+0xc7/0x310
 [<c03a11f7>] ip_rcv_finish+0xc7/0x310
 [<c0142a81>] getnstimeofday+0x41/0xe0
 [<c0113b30>] lapic_next_event+0x10/0x20
 [<c01460a3>] clockevents_program_event+0xa3/0x170
 [<c029bd3f>] strlcpy+0x1f/0x60
 [<c038ff09>] dev_watchdog+0x1f9/0x210
 [<c0142a81>] getnstimeofday+0x41/0xe0
 [<c0113b30>] lapic_next_event+0x10/0x20
 [<c01460a3>] clockevents_program_event+0xa3/0x170
 [<c0131f7d>] cascade+0x5d/0x80
 [<c038fd10>] dev_watchdog+0x0/0x210
 [<c0132120>] run_timer_softirq+0x100/0x1c0
 [<c01472d2>] tick_dev_program_event+0x32/0xb0
 [<c012df77>] __do_softirq+0x97/0x160
 [<c0140428>] hrtimer_interrupt+0xc8/0x1c0
 [<c012e07d>] do_softirq+0x3d/0x50
 [<c012e215>] irq_exit+0x65/0x80
 [<c01142ba>] smp_apic_timer_interrupt+0x5a/0x90
 [<c0103a00>] apic_timer_interrupt+0x28/0x30
 [<c0109dff>] mwait_idle+0x2f/0x40
 [<c01022c4>] cpu_idle+0x64/0xc0
---[ end trace 4490b620b0662d8c ]---
r8169: eth0: link up
Comment 11 Roman Timofeev 2009-01-02 13:02:49 UTC
Created attachment 19609 [details]
dmesg of 2.6.28-git4
Comment 12 Roman Timofeev 2009-01-02 14:41:51 UTC
also, no any changes with 2.6.27-rc6 + http://www.kernel.org/~romieu/r8169/2.6.27-rc6/20080913-r8169-test.patch 

I think this is tx-ring related problem.
Comment 13 Roman Mamedov 2009-09-24 13:24:53 UTC
I also have this issue with the latest 2.6.31 kernel, Realtek 8111B (on-board at Gigabyte GA-MA790FX) and MTU 4000.
Comment 14 Roman Mamedov 2010-01-17 08:49:19 UTC
Just happened in 2.6.32.1, with MTU 4082, during heavy outgoing traffic.

-------------------------------------------------------------------------

[ 7248.828017] nfs: server 192.168.0.100 not responding, still trying
[ 7251.272013]  CIFS VFS: No response to cmd 47 mid 28187
[ 7251.272022]  CIFS VFS: Write2 ret -11, wrote 0
[ 7251.780010] ------------[ cut here ]------------
[ 7251.780024] WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x273/0x280()
[ 7251.780028] Hardware name: GA-MA790FX-DQ6
[ 7251.780031] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 7251.780034] Modules linked in: nls_utf8 cifs sco bridge stp bnep rfcomm l2cap crc16 vboxnetflt vboxnetadp kvm_amd kvm fuse ip_tables x_tables fglrx(P) agpgart vboxdrv powernow_k8 k8temp it87 hwmon_vid loop dm_crypt tuner_simple tuner_types tea5767 snd_ice1724 snd_ice17xx_ak4xxx snd_ac97_codec tuner sg tvaudio sd_mod sr_mod ac97_bus crc_t10dif snd_ak4xxx_adda tda7432 cdrom ata_generic snd_ak4114 msp3400 snd_pcm_oss ide_pci_generic snd_mixer_oss bttv snd_pcm v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 snd_page_alloc ir_common snd_pt2258 snd_i2c i2c_algo_bit snd_seq_midi ahci videobuf_dma_sg videobuf_core snd_rawmidi libata btcx_risc btusb snd_seq_midi_event tveeprom snd_seq snd_timer snd_seq_device scsi_mod i2c_piix4 atiixp psmouse ide_core bluetooth snd i2c_core processor amd64_edac_mod serio_raw button edac_core evdev rfkill soundcore usbhid hid nfs lockd fscache nfs_acl auth_rpcgss sunrpc ohci_hcd ehci_hcd usbcore nls_base thermal fan thermal_sys dm_mirror dm_region_hash dm_log dm_mod r8169 mii
[ 7251.780141] Pid: 0, comm: swapper Tainted: P           2.6.32.1-rm1-slab-amd #1
[ 7251.780144] Call Trace:
[ 7251.780148]  <IRQ>  [<ffffffff812ccbf3>] ? dev_watchdog+0x273/0x280
[ 7251.780157]  [<ffffffff812ccbf3>] ? dev_watchdog+0x273/0x280
[ 7251.780165]  [<ffffffff8105d094>] ? warn_slowpath_common+0x74/0xd0
[ 7251.780171]  [<ffffffff8105d151>] ? warn_slowpath_fmt+0x51/0x60
[ 7251.780176]  [<ffffffff81045ddd>] ? enqueue_task+0x5d/0x70
[ 7251.780181]  [<ffffffff81045ef0>] ? activate_task+0x20/0x30
[ 7251.780186]  [<ffffffff81055101>] ? try_to_wake_up+0xf1/0x270
[ 7251.780192]  [<ffffffff811e1951>] ? strlcpy+0x41/0x50
[ 7251.780197]  [<ffffffff812b311b>] ? netdev_drivername+0x3b/0x40
[ 7251.780201]  [<ffffffff812ccbf3>] ? dev_watchdog+0x273/0x280
[ 7251.780205]  [<ffffffff8104c2d3>] ? __wake_up+0x43/0x70
[ 7251.780210]  [<ffffffff812cc980>] ? dev_watchdog+0x0/0x280
[ 7251.780215]  [<ffffffff8106cc32>] ? run_timer_softirq+0x192/0x310
[ 7251.780221]  [<ffffffff810641e3>] ? __do_softirq+0xf3/0x1d0
[ 7251.780226]  [<ffffffff81087816>] ? tick_dev_program_event+0x36/0xb0
[ 7251.780231]  [<ffffffff810130ec>] ? call_softirq+0x1c/0x30
[ 7251.780236]  [<ffffffff810152b5>] ? do_softirq+0x65/0xa0
[ 7251.780240]  [<ffffffff81063fb5>] ? irq_exit+0x75/0x90
[ 7251.780245]  [<ffffffff8102be1c>] ? smp_apic_timer_interrupt+0x6c/0xa0
[ 7251.780250]  [<ffffffff81012ab3>] ? apic_timer_interrupt+0x13/0x20
[ 7251.780252]  <EOI>  [<ffffffff8102b170>] ? lapic_next_event+0x0/0x20
[ 7251.780262]  [<ffffffff81034d72>] ? native_safe_halt+0x2/0x10
[ 7251.780267]  [<ffffffff8101ac15>] ? default_idle+0x45/0x80
[ 7251.780272]  [<ffffffff8101acaa>] ? c1e_idle+0x5a/0x120
[ 7251.780278]  [<ffffffff81011120>] ? cpu_idle+0xb0/0x100
[ 7251.780282] ---[ end trace b8c5922724f59f39 ]---
[ 7251.796056] r8169: eth0: link up
[ 7253.400017] nfs: server 192.168.0.100 not responding, still trying
[ 7256.176015] nfs: server 192.168.0.100 not responding, still trying
[ 7256.816013]  CIFS VFS: No response to cmd 47 mid 28188
[ 7256.816021]  CIFS VFS: Write2 ret -11, wrote 0
[ 7257.288533] nfs: server 192.168.0.100 OK
[ 7257.292026] nfs: server 192.168.0.100 OK
[ 7257.292346] nfs: server 192.168.0.100 OK
Comment 15 Roman Mamedov 2010-01-17 08:53:26 UTC
Created attachment 24599 [details]
dmesg of a kernel which exhibits the problem (2.6.32.1)
Comment 16 Stijn Tintel 2015-03-23 18:51:06 UTC
Please reopen this bug, it still happens in 3.19.2 during iperf3 testing with MTU 9000.

[357265.115971] ------------[ cut here ]------------
[357265.115988] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x231/0x240()
[357265.115992] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[357265.115995] Modules linked in: md5 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables nfsd auth_rpcgss oid_registry nfs_acl algif_skcipher af_alg openvswitch nfnetlink vhost_net tun vhost macvtap macvlan pci_stub msr cpuid saa7134_alsa tda1004x saa7134_dvb videobuf2_dvb dvb_core rc_pinnacle_color rc_hauppauge ir_kbd_i2c tda827x tda8290 ivtv_alsa tuner_simple tuner_types wm8775 tuner cx25840 saa7134 coretemp hwmon x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel iTCO_wdt iTCO_vendor_support mxm_wmi videobuf2_dma_sg videobuf2_memops ivtv videobuf2_core tveeprom cx2341x rc_core v4l2_common videodev snd_hda_codec_realtek snd_hda_codec_generic i2c_i801 i2c_algo_bit media evdev uas snd_hda_intel
[357265.116047]  snd_hda_controller snd_hda_codec xhci_pci snd_hwdep r8169 lpc_ich snd_pcm mii mfd_core snd_timer snd mei_me soundcore wmi mei processor button dm_zero dm_thin_pool dm_persistent_data libcrc32c dm_bio_prison dm_service_time dm_round_robin dm_queue_length dm_multipath dm_delay xts aesni_intel lrw glue_helper ablk_helper cryptd aes_x86_64 cbc sha256_generic iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi e1000 fuse nfs lockd grace sunrpc fscache btrfs xor raid6_pq ext2 dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod firewire_core crc_itu_t hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech ff_memless hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid xhci_hcd ohci_hcd
[357265.116101]  uhci_hcd usb_storage hid ehci_pci ehci_hcd usbcore usb_common arcmsr sr_mod cdrom sg pata_marvell [last unloaded: microcode]
[357265.116114] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.2-gentoo #1
[357265.116117] Hardware name: Gigabyte Technology Co., Ltd. P67A-UD4-B3/P67A-UD4-B3, BIOS F8 03/21/2012
[357265.116120]  ffffffff817c6e95 ffff880217403da8 ffffffff8149f7cc 0000000000000001
[357265.116124]  ffff880217403df8 ffff880217403de8 ffffffff81047410 ffffffff81272be0
[357265.116128]  0000000000000000 ffff8800e2232000 0000000000000001 0000000000000000
[357265.116132] Call Trace:
[357265.116135]  <IRQ>  [<ffffffff8149f7cc>] dump_stack+0x45/0x57
[357265.116146]  [<ffffffff81047410>] warn_slowpath_common+0x80/0xc0
[357265.116151]  [<ffffffff81272be0>] ? __rdmsr_on_cpu+0x40/0x40
[357265.116155]  [<ffffffff81047491>] warn_slowpath_fmt+0x41/0x50
[357265.116161]  [<ffffffff813e8d41>] dev_watchdog+0x231/0x240
[357265.116166]  [<ffffffff813e8b10>] ? dev_graft_qdisc+0x80/0x80
[357265.116171]  [<ffffffff81098d47>] call_timer_fn.isra.31+0x17/0x70
[357265.116176]  [<ffffffff81098f71>] run_timer_softirq+0x1d1/0x2a0
[357265.116181]  [<ffffffff810a590a>] ? clockevents_program_event+0x6a/0xf0
[357265.116186]  [<ffffffff8104a88a>] __do_softirq+0xda/0x200
[357265.116190]  [<ffffffff8104aba6>] irq_exit+0x86/0xb0
[357265.116195]  [<ffffffff8102f265>] smp_apic_timer_interrupt+0x45/0x60
[357265.116200]  [<ffffffff814a59ea>] apic_timer_interrupt+0x6a/0x70
[357265.116203]  <EOI>  [<ffffffff8138e985>] ? cpuidle_enter_state+0x55/0xc0
[357265.116211]  [<ffffffff8138e977>] ? cpuidle_enter_state+0x47/0xc0
[357265.116216]  [<ffffffff8138eaa2>] cpuidle_enter+0x12/0x20
[357265.116220]  [<ffffffff8107aa7e>] cpu_startup_entry+0x26e/0x2e0
[357265.116225]  [<ffffffff8149a3dd>] rest_init+0x6d/0x70
[357265.116230]  [<ffffffff81a9afcd>] start_kernel+0x448/0x455
[357265.116234]  [<ffffffff81a9a975>] ? set_init_arg+0x55/0x55
[357265.116239]  [<ffffffff81a9a5ad>] x86_64_start_reservations+0x2a/0x2c
[357265.116243]  [<ffffffff81a9a677>] x86_64_start_kernel+0xc8/0xcc
[357265.116246] ---[ end trace c6914737cf524707 ]---
[357265.131572] r8169 0000:08:00.0 eth0: link up
Comment 17 Alan 2015-03-23 19:37:29 UTC
Please open a new bug - its changed so much its unlikely to be connected
Comment 18 H.-Dirk Schmitt 2015-06-06 20:03:16 UTC
new bug is opened - bug #99521