Bug 211827
Summary: | r8169: NETDEV WATCHDOG: ens2 (r8169): transmit queue 0 timed out, when UDP message size > 5076B | | |
---|---|---|---|
Product: | Drivers | Reporter: | Josef Oškera (joskera) |
Component: | Network | Assignee: | drivers_network (drivers_network) |
Status: | NEW --- | | |
Severity: | normal | CC: | hkallweit1 |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 5.10.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | RTL8168e ethtool and lspci | | |
I am sorry for the bad text formatting, but without markdown the preview shows me only a wall of text.

Seems you triggered a hw issue with this chip version. Do you face the same issue with vendor driver r8168? Please also test with r8169 and latest linux-next.

net-next on commit d310ec03a34e92a77302edb804f7d68ee4f01ba0: same issue

```
[Feb24 09:43] ------------[ cut here ]------------
[ +0.004640] NETDEV WATCHDOG: ens2 (r8169): transmit queue 0 timed out
[ +0.006477] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x246/0x250
[ +0.008271] Modules linked in: sctp ip6_udp_tunnel udp_tunnel xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt ipmi_ssif iTCO_vendor_support kvm dcdbas irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate dell_smbios mei_me i2c_i801 dell_wmi_descriptor pcspkr ioatdma acpi_ipmi intel_uncore wmi_bmof mei i2c_smbus lpc_ich intel_pch_thermal ipmi_si dca ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm r8169 megaraid_sas libata crc32c_intel tg3 bnx2 realtek i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod fuse
[ +0.084017] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G S I 5.11.0+ #2
[ +0.007389] Hardware name: Dell Inc. PowerEdge R740/0F9N89, BIOS 2.3.10 08/15/2019
[ +0.007567] RIP: 0010:dev_watchdog+0x246/0x250
[ +0.004444] Code: e8 ef 89 fd ff eb ad 4c 89 e7 c6 05 06 25 13 01 01 e8 4e 63 fa ff 89 d9 4c 89 e6 48 c7 c7 a0 61 dc 84 48 89 c2 e8 76 aa 15 00 <0f> 0b eb 8f 66 0f 1f 44 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6
[ +0.018781] RSP: 0018:ffffa61386830ed0 EFLAGS: 00010286
[ +0.005224] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
[ +0.007134] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[ +0.007132] RBP: ffff8bb0e2f403dc R08: 0000000000000000 R09: c0000000ffff7fff
[ +0.007130] R10: 0000000000000001 R11: ffffa61386830cd8 R12: ffff8bb0e2f40000
[ +0.007132] R13: 0000000000000002 R14: ffff8bb0e2f40480 R15: 0000000000000001
[ +0.007141] FS: 0000000000000000(0000) GS:ffff8bb820040000(0000) knlGS:0000000000000000
[ +0.008104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.005745] CR2: 0000558c9bfb1000 CR3: 0000000dd0810001 CR4: 00000000007706e0
[ +0.007141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ +0.007149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ +0.007150] PKRU: 55555554
[ +0.002712] Call Trace:
[ +0.002454] <IRQ>
[ +0.002027] ? pfifo_fast_enqueue+0x140/0x140
[ +0.004359] call_timer_fn+0x29/0xf0
[ +0.003588] run_timer_softirq+0x1c1/0x3d0
[ +0.004099] ? ktime_get+0x3e/0xa0
[ +0.003403] ? clockevents_program_event+0x94/0xf0
[ +0.004793] ? sched_clock+0x5/0x10
[ +0.003501] __do_softirq+0xc9/0x28e
[ +0.003588] asm_call_irq_on_stack+0xf/0x20
[ +0.004185] </IRQ>
[ +0.002108] do_softirq_own_stack+0x37/0x40
[ +0.004182] irq_exit_rcu+0xd4/0xe0
[ +0.003494] sysvec_apic_timer_interrupt+0x34/0x80
[ +0.004791] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ +0.005139] RIP: 0010:cpuidle_enter_state+0xd6/0x350
[ +0.004965] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 15 d8 9a ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 32 02 00 00 31 ff e8 0e 4f a1 ff fb 45 85 f6 <0f> 88 e0 00 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
[ +0.018744] RSP: 0018:ffffa613803f7e80 EFLAGS: 00000206
[ +0.005227] RAX: ffff8bb82006b280 RBX: 0000000000000003 RCX: 000000000000001f
[ +0.007132] RDX: 0000005d3e4e7234 RSI: 000000003351fed6 RDI: 0000000000000000
[ +0.007132] RBP: ffffc60b60040000 R08: 0000000000000002 R09: 000000000002ab00
[ +0.007131] R10: 0000e131bdc9c5de R11: ffff8bb82006a004 R12: 0000005d3e4e7234
[ +0.007133] R13: ffffffff854c3560 R14: 0000000000000003 R15: 0000000000000000
[ +0.007133] cpuidle_enter+0x29/0x40
[ +0.003579] do_idle+0x250/0x2a0
[ +0.003233] cpu_startup_entry+0x19/0x20
[ +0.003924] start_secondary+0x11b/0x160
[ +0.003926] secondary_startup_64_no_verify+0xc2/0xcb
[ +0.005054] ---[ end trace 8639964f6bc6756d ]---
```

The vendor driver (version 8.048.03) doesn't have this problem; it works normally with MTU >= 6000.

Could you please check whether the following makes a difference?
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0a20dae32..f704da3f2 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2285,14 +2285,14 @@ static void r8168dp_hw_jumbo_disable(struct rtl8169_private *tp)
 
 static void r8168e_hw_jumbo_enable(struct rtl8169_private *tp)
 {
-	RTL_W8(tp, MaxTxPacketSize, 0x3f);
+	RTL_W8(tp, MaxTxPacketSize, 0x24);
 	RTL_W8(tp, Config3, RTL_R8(tp, Config3) | Jumbo_En0);
 	RTL_W8(tp, Config4, RTL_R8(tp, Config4) | 0x01);
 }
 
 static void r8168e_hw_jumbo_disable(struct rtl8169_private *tp)
 {
-	RTL_W8(tp, MaxTxPacketSize, 0x0c);
+	RTL_W8(tp, MaxTxPacketSize, 0x3f);
 	RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Jumbo_En0);
 	RTL_W8(tp, Config4, RTL_R8(tp, Config4) & ~0x01);
 }
-- 
2.30.1

With the patch it works correctly. Note: I did not pay enough attention earlier. On 5.11.0+ it was TCP that did not work (on 5.10.0 it was UDP). With the patch, both UDP and TCP work.

Fixed with 6cf739131a15 ("r8169: fix jumbo packet handling on RTL8168e")
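For anyone comparing the three MaxTxPacketSize values in the patch, here is a small sketch that converts them to byte counts. It assumes the register counts 128-byte units, which is what the comment next to the register definition in r8169_main.c suggests; this has not been checked against a datasheet. The helper name `max_tx_bytes` and the `MAXTX_DEMO_MAIN` guard are made up for this illustration and are not part of the driver.

```c
#include <stdio.h>

/*
 * Assumption: MaxTxPacketSize is expressed in units of 128 bytes
 * (per the register comment in r8169_main.c, not a datasheet).
 */
#define MAX_TX_UNIT_BYTES 128u

/* Hypothetical helper: convert a register value to a size in bytes. */
static unsigned int max_tx_bytes(unsigned char reg)
{
	return (unsigned int)reg * MAX_TX_UNIT_BYTES;
}

/* Build with -DMAXTX_DEMO_MAIN to get a standalone program. */
#ifdef MAXTX_DEMO_MAIN
int main(void)
{
	static const struct {
		const char *what;
		unsigned char reg;
	} rows[] = {
		{ "jumbo enable, before patch",  0x3f },
		{ "jumbo enable, after patch",   0x24 },
		{ "jumbo disable, before patch", 0x0c },
		{ "jumbo disable, after patch",  0x3f },
	};
	size_t i;

	/* Print each register value next to its size under the assumed unit. */
	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-28s 0x%02x -> %u bytes\n",
		       rows[i].what, rows[i].reg, max_tx_bytes(rows[i].reg));
	return 0;
}
#endif
```

Under that assumption, 0x3f corresponds to 8064 bytes, 0x24 to 4608 bytes, and 0x0c to 1536 bytes.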
Created attachment 295343: RTL8168e ethtool and lspci

When I run the netperf UDP_STREAM test with MTU >= 6000, or with a message size larger than 5076 bytes (at MTU 9000), netperf throughput drops to 0 and a warning appears. TCP_STREAM works normally. Of RTL8168evl, RTL8168c, RTL8168b, and RTL8168e, only RTL8168e is affected.

```
Kernel: 5.10.0
NIC: r8169 0000:3b:00.0 eth0: RTL8168e/8111e, 00:e0:4c:68:03:99, XID 2c2, IRQ 41
(3b:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06))

For example with MTU 9000 this will cause the warning:
$ netperf -4 -H 192.168.3.225 -t UDP_STREAM -l 5
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.225 () port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   9.61            3      0       0.16
212992           9.61            0              0.00

[ 2052.603468] ------------[ cut here ]------------
[ 2052.608101] NETDEV WATCHDOG: ens2 (r8169): transmit queue 0 timed out
[ 2052.614557] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x246/0x250
[ 2052.622828] Modules linked in: sctp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt ghash_clmulni_intel iTCO_vendor_support rapl intel_cstate dcdbas dell_smbios ipmi_ssif dell_wmi_descriptor wmi_bmof i2c_i801 intel_uncore mei_me ioatdma pcspkr mei i2c_smbus dca acpi_ipmi lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm megaraid_sas crc32c_intel libata r8169 tg3 bnx2 realtek i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod fuse
[ 2052.702834] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G S I 5.10.0 #3
[ 2052.710138] Hardware name: Dell Inc. PowerEdge R740/0F9N89, BIOS 2.3.10 08/15/2019
[ 2052.717722] RIP: 0010:dev_watchdog+0x246/0x250
[ 2052.722166] Code: e8 3f bb fd ff eb ad 4c 89 e7 c6 05 0d 09 15 01 01 e8 ae a6 fa ff 89 d9 4c 89 e6 48 c7 c7 38 b6 9b ac 48 89 c2 e8 5c 08 15 00 <0f> 0b eb 8f 66 0f 1f 44 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6
[ 2052.740912] RSP: 0018:ffffac3586820ed0 EFLAGS: 00010286
[ 2052.746138] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
[ 2052.753269] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[ 2052.760400] RBP: ffff9a7200d043dc R08: 0000000000000000 R09: c0000000ffff7fff
[ 2052.767533] R10: 0000000000000001 R11: ffffac3586820cd8 R12: ffff9a7200d04000
[ 2052.774664] R13: 0000000000000002 R14: ffff9a7200d04480 R15: 0000000000000001
[ 2052.781798] FS: 0000000000000000(0000) GS:ffff9a78e0040000(0000) knlGS:0000000000000000
[ 2052.789882] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2052.795628] CR2: 00007fe7b3b7b000 CR3: 0000000045610001 CR4: 00000000007706e0
[ 2052.802761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2052.809890] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2052.817015] PKRU: 55555554
[ 2052.819727] Call Trace:
[ 2052.822180] <IRQ>
[ 2052.824200] ? pfifo_fast_enqueue+0x140/0x140
[ 2052.828561] call_timer_fn+0x29/0xf0
[ 2052.832137] run_timer_softirq+0x1c1/0x3d0
[ 2052.836239] ? ktime_get+0x3e/0xa0
[ 2052.839645] ? clockevents_program_event+0x94/0xf0
[ 2052.844436] __do_softirq+0xc4/0x287
[ 2052.848013] asm_call_irq_on_stack+0xf/0x20
[ 2052.852200] </IRQ>
[ 2052.854307] do_softirq_own_stack+0x37/0x40
[ 2052.858493] irq_exit_rcu+0xd2/0xe0
[ 2052.861987] sysvec_apic_timer_interrupt+0x34/0x80
[ 2052.866777] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2052.871916] RIP: 0010:cpuidle_enter_state+0xd6/0x350
[ 2052.876880] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 35 10 9c ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 32 02 00 00 31 ff e8 1e 71 a2 ff fb 45 85 f6 <0f> 88 e0 00 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
[ 2052.895623] RSP: 0018:ffffac35803e7e80 EFLAGS: 00000206
[ 2052.900851] RAX: ffff9a78e006ac80 RBX: 0000000000000003 RCX: 000000000000001f
[ 2052.907982] RDX: 000001dde8b32383 RSI: 0000000033520030 RDI: 0000000000000000
[ 2052.915113] RBP: ffff9a78e0076500 R08: 0000000000000002 R09: 000000000002a500
[ 2052.922245] R10: 0000247f8cccbf66 R11: ffff9a78e0069c44 R12: 000001dde8b32383
[ 2052.929378] R13: ffffffffad0c26e0 R14: 0000000000000003 R15: 0000000000000000
[ 2052.936512] cpuidle_enter+0x29/0x40
[ 2052.940091] do_idle+0x24b/0x2a0
[ 2052.943321] cpu_startup_entry+0x19/0x20
[ 2052.947249] start_secondary+0x10d/0x150
[ 2052.951174] secondary_startup_64_no_verify+0xc2/0xcb
[ 2052.956225] ---[ end trace f25588e080843187 ]---
```