Bug 9882
Summary: | NETDEV WATCHDOG: transmit timed out if mtu > 2500. | ||
---|---|---|---|
Product: | Drivers | Reporter: | Roman Timofeev (cryptoloop) |
Component: | Network | Assignee: | Francois Romieu (romieu) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, dirk, rm+bko, stijn+bugs, v.plessky |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
my dmesg
dmesg w/o nvidia driver. dmesg of 2.6.28-git4 dmesg of a kernel which exhibits the problem (2.6.32.1) |
Description
Roman Timofeev
2008-02-03 13:25:11 UTC
On the same machine with freebsd 7 all works fine too. Roman, can you send a full dmesg ? -- Ueimor Created attachment 14694 [details]
my dmesg
dmesg added. something else?
Can you check that the problem is still there when you do not load the nvidia module and check if there is enough free memory when the problem appears ? -- Ueimor Created attachment 14695 [details]
dmesg w/o nvidia driver.
oops, sorry. I'm remove nvidia driver and reproduce this problem.
Also, i'm define RTL8169_DEBUG.
localhost ~ # ethtool -S eth1
NIC statistics:
tx_packets: 2034
rx_packets: 1994
tx_errors: 54
rx_errors: 207
rx_missed: 0
align_errors: 747
tx_single_collisions: 0
tx_multi_collisions: 0
unicast: 1979
broadcast: 15
multicast: 15
tx_aborted: 0
tx_underrun: 54
Somebody alive? I can provide ssh access to that machine. Is the problem still there with latest kernel? Roman, maybe you can attach you /var/log/messages, since you've enabled driver debug. I'd say try setting debug verbosity to max (16), and set kernel verbosity to max (dmesg -n 7). Bug found in latest kernels(2.6.26-rcXX too). Also, i'm enabled debug in driver and build kernel with full debug. In dmesg, after loading driver: --------- [ 1193.780511] r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded [ 1193.780612] ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 16 (level, low) -> IRQ 16 [ 1193.780633] PCI: Setting latency timer of device 0000:04:00.0 to 64 [ 1193.780642] r8169: mac_version = 0x0c [ 1193.780783] kobject: 'eth0' (f492f3d4): kobject_add_internal: parent: 'net', set: 'devices' [ 1193.780840] kobject: 'eth0' (f492f3d4): kobject_uevent_env [ 1193.780843] kobject: 'eth0' (f492f3d4): fill_kobj_path: path = '/class/net/eth0' [ 1193.780846] kobject: '0000:04:00.0' (f7d0b13c): fill_kobj_path: path = '/devices/pci0000:00/0000:00:1c.4/0000:04:00.0' [ 1193.781246] eth0: RTL8168b/8111b at 0xf8b98000, 00:1a:4d:4b:dd:be, XID 38000000 IRQ 218 [ 1193.781247] r8169: mac_version = 0x0c [ 1193.781248] r8169: Set MAC Reg C+CR Offset 0x82h = 0x01h [ 1193.782348] kobject: 'drivers' (f4860e00): kobject_add_internal: parent: 'r8169', set: '<NULL>' [ 1193.782362] kobject: 'r8169' (f5680a80): kobject_uevent_env [ 1193.782366] kobject: 'r8169' (f5680a80): fill_kobj_path: path = '/bus/pci/drivers/r8169' -------- Nothing interest. But i see some strange effects in this tests: hostA(192.168.0.1) - box with r8169 hostB(192.168.0.5) - box with other gigabit nic. 1) on hostB: iperf(server mode) on hostA: ping 192.168.0.5 + iperf(client mode). After traffic freeze, ping says: ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available 2) on hostB: ping 192.168.0.1 + tcpdump + iperf(server mode) on hostA: tcpdump + iperf(client mode) After traffic freeze, tcpdump on hostA says: 22:37:55.393477 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 8, length 64 22:37:55.393514 IP 192.168.0.1 > 192.168.0.5: ICMP echo reply, id 3914, seq 8, length 64 22:37:56.393466 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 9, length 64 22:37:56.393494 IP 192.168.0.1 > 192.168.0.5: ICMP echo reply, id 3914, seq 9, length 64 But, on hostB: 22:37:58.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 11, length 64 22:37:59.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 12, length 64 22:38:00.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 13, length 64 22:38:01.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 14, length 64 22:38:02.426412 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 15, length 64 22:38:03.426413 IP 192.168.0.5 > 192.168.0.1: ICMP echo request, id 3914, seq 16, length 64 Roman, is there any change if you use one of either 2.6.27-rc6 or 2.6.27-rc6 + http://www.kernel.org/~romieu/r8169/2.6.27-rc6/20080913-r8169-test.patch ? -- Ueimor Also, bug present in latest 2.6.28-git4 kernel. some new info: after start recieving with jumbo frames on r8169 side i see errors in console with ping like: no buffer space available Also, oops: ------------[ cut here ]------------ WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x1f9/0x210() Hardware name: P35-S3 NETDEV WATCHDOG: eth0 (r8169): transmit timed out Modules linked in: af_packet binfmt_misc ppdev cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative wmi video output sbs sbshc pci_slot fan container battery microcode ac parport_pc lp parport fuse sr_mod cdrom snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul ata_generic pata_acpi evdev e100 snd_emu10k1 r8169 mii snd_ac97_codec pata_jmicron ac97_bus psmouse snd_pcm_oss i2c_core snd_mixer_oss serio_raw snd_pcm thermal snd_page_alloc snd_util_mem snd_hwdep snd_seq_dummy button snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq processor snd_timer snd_seq_device snd emu10k1_gp pcspkr pata_ali sata_uli gameport soundcore shpchp pci_hotplug iTCO_wdt iTCO_vendor_support ehci_hcd uhci_hcd intel_agp agpgart usbcore Pid: 0, comm: swapper Not tainted 2.6.28-git4 #3 Call Trace: [<c0128e67>] warn_slowpath+0x87/0xe0 [<c03a11f7>] ip_rcv_finish+0xc7/0x310 [<c03a11f7>] ip_rcv_finish+0xc7/0x310 [<c0142a81>] getnstimeofday+0x41/0xe0 [<c0113b30>] lapic_next_event+0x10/0x20 [<c01460a3>] clockevents_program_event+0xa3/0x170 [<c029bd3f>] strlcpy+0x1f/0x60 [<c038ff09>] dev_watchdog+0x1f9/0x210 [<c0142a81>] getnstimeofday+0x41/0xe0 [<c0113b30>] lapic_next_event+0x10/0x20 [<c01460a3>] clockevents_program_event+0xa3/0x170 [<c0131f7d>] cascade+0x5d/0x80 [<c038fd10>] dev_watchdog+0x0/0x210 [<c0132120>] run_timer_softirq+0x100/0x1c0 [<c01472d2>] tick_dev_program_event+0x32/0xb0 [<c012df77>] __do_softirq+0x97/0x160 [<c0140428>] hrtimer_interrupt+0xc8/0x1c0 [<c012e07d>] do_softirq+0x3d/0x50 [<c012e215>] irq_exit+0x65/0x80 [<c01142ba>] smp_apic_timer_interrupt+0x5a/0x90 [<c0103a00>] apic_timer_interrupt+0x28/0x30 [<c0109dff>] mwait_idle+0x2f/0x40 [<c01022c4>] cpu_idle+0x64/0xc0 ---[ end trace 4490b620b0662d8c ]--- r8169: eth0: link up Created attachment 19609 [details]
dmesg of 2.6.28-git4
also, no any changes with 2.6.27-rc6 + http://www.kernel.org/~romieu/r8169/2.6.27-rc6/20080913-r8169-test.patch I think this is tx-ring related problem. I also have this issue with the latest 2.6.31 kernel, Realtek 8111B (on-board at Gigabyte GA-MA790FX) and MTU 4000. Just happened in 2.6.32.1, with MTU 4082, during heavy outgoing traffic. ------------------------------------------------------------------------- [ 7248.828017] nfs: server 192.168.0.100 not responding, still trying [ 7251.272013] CIFS VFS: No response to cmd 47 mid 28187 [ 7251.272022] CIFS VFS: Write2 ret -11, wrote 0 [ 7251.780010] ------------[ cut here ]------------ [ 7251.780024] WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x273/0x280() [ 7251.780028] Hardware name: GA-MA790FX-DQ6 [ 7251.780031] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out [ 7251.780034] Modules linked in: nls_utf8 cifs sco bridge stp bnep rfcomm l2cap crc16 vboxnetflt vboxnetadp kvm_amd kvm fuse ip_tables x_tables fglrx(P) agpgart vboxdrv powernow_k8 k8temp it87 hwmon_vid loop dm_crypt tuner_simple tuner_types tea5767 snd_ice1724 snd_ice17xx_ak4xxx snd_ac97_codec tuner sg tvaudio sd_mod sr_mod ac97_bus crc_t10dif snd_ak4xxx_adda tda7432 cdrom ata_generic snd_ak4114 msp3400 snd_pcm_oss ide_pci_generic snd_mixer_oss bttv snd_pcm v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 snd_page_alloc ir_common snd_pt2258 snd_i2c i2c_algo_bit snd_seq_midi ahci videobuf_dma_sg videobuf_core snd_rawmidi libata btcx_risc btusb snd_seq_midi_event tveeprom snd_seq snd_timer snd_seq_device scsi_mod i2c_piix4 atiixp psmouse ide_core bluetooth snd i2c_core processor amd64_edac_mod serio_raw button edac_core evdev rfkill soundcore usbhid hid nfs lockd fscache nfs_acl auth_rpcgss sunrpc ohci_hcd ehci_hcd usbcore nls_base thermal fan thermal_sys dm_mirror dm_region_hash dm_log dm_mod r8169 mii [ 7251.780141] Pid: 0, comm: swapper Tainted: P 2.6.32.1-rm1-slab-amd #1 [ 7251.780144] Call Trace: [ 7251.780148] <IRQ> [<ffffffff812ccbf3>] ? dev_watchdog+0x273/0x280 [ 7251.780157] [<ffffffff812ccbf3>] ? dev_watchdog+0x273/0x280 [ 7251.780165] [<ffffffff8105d094>] ? warn_slowpath_common+0x74/0xd0 [ 7251.780171] [<ffffffff8105d151>] ? warn_slowpath_fmt+0x51/0x60 [ 7251.780176] [<ffffffff81045ddd>] ? enqueue_task+0x5d/0x70 [ 7251.780181] [<ffffffff81045ef0>] ? activate_task+0x20/0x30 [ 7251.780186] [<ffffffff81055101>] ? try_to_wake_up+0xf1/0x270 [ 7251.780192] [<ffffffff811e1951>] ? strlcpy+0x41/0x50 [ 7251.780197] [<ffffffff812b311b>] ? netdev_drivername+0x3b/0x40 [ 7251.780201] [<ffffffff812ccbf3>] ? dev_watchdog+0x273/0x280 [ 7251.780205] [<ffffffff8104c2d3>] ? __wake_up+0x43/0x70 [ 7251.780210] [<ffffffff812cc980>] ? dev_watchdog+0x0/0x280 [ 7251.780215] [<ffffffff8106cc32>] ? run_timer_softirq+0x192/0x310 [ 7251.780221] [<ffffffff810641e3>] ? __do_softirq+0xf3/0x1d0 [ 7251.780226] [<ffffffff81087816>] ? tick_dev_program_event+0x36/0xb0 [ 7251.780231] [<ffffffff810130ec>] ? call_softirq+0x1c/0x30 [ 7251.780236] [<ffffffff810152b5>] ? do_softirq+0x65/0xa0 [ 7251.780240] [<ffffffff81063fb5>] ? irq_exit+0x75/0x90 [ 7251.780245] [<ffffffff8102be1c>] ? smp_apic_timer_interrupt+0x6c/0xa0 [ 7251.780250] [<ffffffff81012ab3>] ? apic_timer_interrupt+0x13/0x20 [ 7251.780252] <EOI> [<ffffffff8102b170>] ? lapic_next_event+0x0/0x20 [ 7251.780262] [<ffffffff81034d72>] ? native_safe_halt+0x2/0x10 [ 7251.780267] [<ffffffff8101ac15>] ? default_idle+0x45/0x80 [ 7251.780272] [<ffffffff8101acaa>] ? c1e_idle+0x5a/0x120 [ 7251.780278] [<ffffffff81011120>] ? cpu_idle+0xb0/0x100 [ 7251.780282] ---[ end trace b8c5922724f59f39 ]--- [ 7251.796056] r8169: eth0: link up [ 7253.400017] nfs: server 192.168.0.100 not responding, still trying [ 7256.176015] nfs: server 192.168.0.100 not responding, still trying [ 7256.816013] CIFS VFS: No response to cmd 47 mid 28188 [ 7256.816021] CIFS VFS: Write2 ret -11, wrote 0 [ 7257.288533] nfs: server 192.168.0.100 OK [ 7257.292026] nfs: server 192.168.0.100 OK [ 7257.292346] nfs: server 192.168.0.100 OK Created attachment 24599 [details]
dmesg of a kernel which exhibits the problem (2.6.32.1)
Please reopen this bug, it still happens in 3.19.2 during iperf3 testing with MTU 9000. [357265.115971] ------------[ cut here ]------------ [357265.115988] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x231/0x240() [357265.115992] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out [357265.115995] Modules linked in: md5 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables nfsd auth_rpcgss oid_registry nfs_acl algif_skcipher af_alg openvswitch nfnetlink vhost_net tun vhost macvtap macvlan pci_stub msr cpuid saa7134_alsa tda1004x saa7134_dvb videobuf2_dvb dvb_core rc_pinnacle_color rc_hauppauge ir_kbd_i2c tda827x tda8290 ivtv_alsa tuner_simple tuner_types wm8775 tuner cx25840 saa7134 coretemp hwmon x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel iTCO_wdt iTCO_vendor_support mxm_wmi videobuf2_dma_sg videobuf2_memops ivtv videobuf2_core tveeprom cx2341x rc_core v4l2_common videodev snd_hda_codec_realtek snd_hda_codec_generic i2c_i801 i2c_algo_bit media evdev uas snd_hda_intel [357265.116047] snd_hda_controller snd_hda_codec xhci_pci snd_hwdep r8169 lpc_ich snd_pcm mii mfd_core snd_timer snd mei_me soundcore wmi mei processor button dm_zero dm_thin_pool dm_persistent_data libcrc32c dm_bio_prison dm_service_time dm_round_robin dm_queue_length dm_multipath dm_delay xts aesni_intel lrw glue_helper ablk_helper cryptd aes_x86_64 cbc sha256_generic iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi e1000 fuse nfs lockd grace sunrpc fscache btrfs xor raid6_pq ext2 dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod firewire_core crc_itu_t hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech ff_memless hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid xhci_hcd ohci_hcd [357265.116101] uhci_hcd usb_storage hid ehci_pci ehci_hcd usbcore usb_common arcmsr sr_mod cdrom sg pata_marvell [last unloaded: microcode] [357265.116114] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.2-gentoo #1 [357265.116117] Hardware name: Gigabyte Technology Co., Ltd. P67A-UD4-B3/P67A-UD4-B3, BIOS F8 03/21/2012 [357265.116120] ffffffff817c6e95 ffff880217403da8 ffffffff8149f7cc 0000000000000001 [357265.116124] ffff880217403df8 ffff880217403de8 ffffffff81047410 ffffffff81272be0 [357265.116128] 0000000000000000 ffff8800e2232000 0000000000000001 0000000000000000 [357265.116132] Call Trace: [357265.116135] <IRQ> [<ffffffff8149f7cc>] dump_stack+0x45/0x57 [357265.116146] [<ffffffff81047410>] warn_slowpath_common+0x80/0xc0 [357265.116151] [<ffffffff81272be0>] ? __rdmsr_on_cpu+0x40/0x40 [357265.116155] [<ffffffff81047491>] warn_slowpath_fmt+0x41/0x50 [357265.116161] [<ffffffff813e8d41>] dev_watchdog+0x231/0x240 [357265.116166] [<ffffffff813e8b10>] ? dev_graft_qdisc+0x80/0x80 [357265.116171] [<ffffffff81098d47>] call_timer_fn.isra.31+0x17/0x70 [357265.116176] [<ffffffff81098f71>] run_timer_softirq+0x1d1/0x2a0 [357265.116181] [<ffffffff810a590a>] ? clockevents_program_event+0x6a/0xf0 [357265.116186] [<ffffffff8104a88a>] __do_softirq+0xda/0x200 [357265.116190] [<ffffffff8104aba6>] irq_exit+0x86/0xb0 [357265.116195] [<ffffffff8102f265>] smp_apic_timer_interrupt+0x45/0x60 [357265.116200] [<ffffffff814a59ea>] apic_timer_interrupt+0x6a/0x70 [357265.116203] <EOI> [<ffffffff8138e985>] ? cpuidle_enter_state+0x55/0xc0 [357265.116211] [<ffffffff8138e977>] ? cpuidle_enter_state+0x47/0xc0 [357265.116216] [<ffffffff8138eaa2>] cpuidle_enter+0x12/0x20 [357265.116220] [<ffffffff8107aa7e>] cpu_startup_entry+0x26e/0x2e0 [357265.116225] [<ffffffff8149a3dd>] rest_init+0x6d/0x70 [357265.116230] [<ffffffff81a9afcd>] start_kernel+0x448/0x455 [357265.116234] [<ffffffff81a9a975>] ? set_init_arg+0x55/0x55 [357265.116239] [<ffffffff81a9a5ad>] x86_64_start_reservations+0x2a/0x2c [357265.116243] [<ffffffff81a9a677>] x86_64_start_kernel+0xc8/0xcc [357265.116246] ---[ end trace c6914737cf524707 ]--- [357265.131572] r8169 0000:08:00.0 eth0: link up Please open a new bug - its changed so much its unlikely to be connected new bug is opened - bug #99521 |