Bug 211827 - r8169: NETDEV WATCHDOG: ens2 (r8169): transmit queue 0 timed out, when UDP message size > 5076B
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-18 12:26 UTC by Josef Oškera
Modified: 2021-02-26 20:39 UTC

See Also:
Kernel Version: 5.10.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
RTL8168e ethtool and lspci (8.08 KB, text/plain)
2021-02-18 12:26 UTC, Josef Oškera

Description Josef Oškera 2021-02-18 12:26:27 UTC
Created attachment 295343
RTL8168e ethtool and lspci

When I run the netperf UDP_STREAM test with MTU >= 6000, or with a message size larger than 5076 bytes (at MTU 9000), netperf throughput drops to 0 and the warning below appears. TCP_STREAM works normally.
Of the RTL8168evl, RTL8168c, RTL8168b and RTL8168e, only the RTL8168e is affected.


```
Kernel: 5.10.0
NIC: r8169 0000:3b:00.0 eth0: RTL8168e/8111e, 00:e0:4c:68:03:99, XID 2c2, IRQ 41
     (3b:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06))


For example with MTU 9000 this will cause the warning:

$ netperf -4 -H 192.168.3.225 -t UDP_STREAM -l 5
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.3.225 () port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   9.61            3      0       0.16
212992           9.61            0              0.00


[ 2052.603468] ------------[ cut here ]------------
[ 2052.608101] NETDEV WATCHDOG: ens2 (r8169): transmit queue 0 timed out
[ 2052.614557] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x246/0x250
[ 2052.622828] Modules linked in: sctp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt ghash_clmulni_intel iTCO_vendor_support rapl intel_cstate dcdbas dell_smbios ipmi_ssif dell_wmi_descriptor wmi_bmof i2c_i801 intel_uncore mei_me ioatdma pcspkr mei i2c_smbus dca acpi_ipmi lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm megaraid_sas crc32c_intel libata r8169 tg3 bnx2 realtek i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod fuse
[ 2052.702834] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G S        I       5.10.0 #3
[ 2052.710138] Hardware name: Dell Inc. PowerEdge R740/0F9N89, BIOS 2.3.10 08/15/2019
[ 2052.717722] RIP: 0010:dev_watchdog+0x246/0x250
[ 2052.722166] Code: e8 3f bb fd ff eb ad 4c 89 e7 c6 05 0d 09 15 01 01 e8 ae a6 fa ff 89 d9 4c 89 e6 48 c7 c7 38 b6 9b ac 48 89 c2 e8 5c 08 15 00 <0f> 0b eb 8f 66 0f 1f 44 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6
[ 2052.740912] RSP: 0018:ffffac3586820ed0 EFLAGS: 00010286
[ 2052.746138] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
[ 2052.753269] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[ 2052.760400] RBP: ffff9a7200d043dc R08: 0000000000000000 R09: c0000000ffff7fff
[ 2052.767533] R10: 0000000000000001 R11: ffffac3586820cd8 R12: ffff9a7200d04000
[ 2052.774664] R13: 0000000000000002 R14: ffff9a7200d04480 R15: 0000000000000001
[ 2052.781798] FS:  0000000000000000(0000) GS:ffff9a78e0040000(0000) knlGS:0000000000000000
[ 2052.789882] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2052.795628] CR2: 00007fe7b3b7b000 CR3: 0000000045610001 CR4: 00000000007706e0
[ 2052.802761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2052.809890] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2052.817015] PKRU: 55555554
[ 2052.819727] Call Trace:
[ 2052.822180]  <IRQ>
[ 2052.824200]  ? pfifo_fast_enqueue+0x140/0x140
[ 2052.828561]  call_timer_fn+0x29/0xf0
[ 2052.832137]  run_timer_softirq+0x1c1/0x3d0
[ 2052.836239]  ? ktime_get+0x3e/0xa0
[ 2052.839645]  ? clockevents_program_event+0x94/0xf0
[ 2052.844436]  __do_softirq+0xc4/0x287
[ 2052.848013]  asm_call_irq_on_stack+0xf/0x20
[ 2052.852200]  </IRQ>
[ 2052.854307]  do_softirq_own_stack+0x37/0x40
[ 2052.858493]  irq_exit_rcu+0xd2/0xe0
[ 2052.861987]  sysvec_apic_timer_interrupt+0x34/0x80
[ 2052.866777]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2052.871916] RIP: 0010:cpuidle_enter_state+0xd6/0x350
[ 2052.876880] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 35 10 9c ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 32 02 00 00 31 ff e8 1e 71 a2 ff fb 45 85 f6 <0f> 88 e0 00 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
[ 2052.895623] RSP: 0018:ffffac35803e7e80 EFLAGS: 00000206
[ 2052.900851] RAX: ffff9a78e006ac80 RBX: 0000000000000003 RCX: 000000000000001f
[ 2052.907982] RDX: 000001dde8b32383 RSI: 0000000033520030 RDI: 0000000000000000
[ 2052.915113] RBP: ffff9a78e0076500 R08: 0000000000000002 R09: 000000000002a500
[ 2052.922245] R10: 0000247f8cccbf66 R11: ffff9a78e0069c44 R12: 000001dde8b32383
[ 2052.929378] R13: ffffffffad0c26e0 R14: 0000000000000003 R15: 0000000000000000
[ 2052.936512]  cpuidle_enter+0x29/0x40
[ 2052.940091]  do_idle+0x24b/0x2a0
[ 2052.943321]  cpu_startup_entry+0x19/0x20
[ 2052.947249]  start_secondary+0x10d/0x150
[ 2052.951174]  secondary_startup_64_no_verify+0xc2/0xcb
[ 2052.956225] ---[ end trace f25588e080843187 ]---

```
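
The reproduction described above uses netperf. As a rough stand-in, the following minimal sketch sends a burst of fixed-size UDP datagrams so that the message size can be varied around the reported 5076-byte threshold. The helper name `send_udp_burst` and its parameters are hypothetical, and whether this triggers the watchdog on an affected NIC is an untested assumption; it only approximates the sending side of UDP_STREAM.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Largest UDP payload over IPv4: 65535 - 20 (IP header) - 8 (UDP header). */
enum { UDP_MAX_PAYLOAD = 65507 };

/*
 * Hypothetical helper (not part of netperf or the kernel): send `count`
 * zero-filled UDP datagrams of `len` bytes to dst_ip:port.
 * Returns the number of datagrams the kernel accepted in full,
 * or -1 if the address is invalid or the socket/buffer setup fails.
 */
long send_udp_burst(const char *dst_ip, unsigned short port,
		    size_t len, long count)
{
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(port) };
	long sent = 0;
	char *buf;
	int fd;

	assert(len <= UDP_MAX_PAYLOAD);

	if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	buf = calloc(1, len ? len : 1);
	if (!buf) {
		close(fd);
		return -1;
	}

	for (long i = 0; i < count; i++) {
		if (sendto(fd, buf, len, 0, (struct sockaddr *)&dst,
			   sizeof(dst)) == (ssize_t)len)
			sent++;
	}

	free(buf);
	close(fd);
	return sent;
}
```

For instance, with the interface MTU set to 9000, calling `send_udp_burst("192.168.3.225", 12345, 5077, 1000)` would send datagrams just above the reported threshold; the port number here is an arbitrary placeholder.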
Comment 1 Josef Oškera 2021-02-18 12:30:25 UTC
Sorry for the bad text formatting; without Markdown the preview showed only a wall of text.
Comment 2 Heiner Kallweit 2021-02-19 09:53:58 UTC
It seems you triggered a hardware issue with this chip version. Do you see the same issue with the vendor driver r8168? Please also test r8169 on the latest linux-next.
Comment 3 Josef Oškera 2021-02-24 16:58:07 UTC
Tested net-next at commit d310ec03a34e92a77302edb804f7d68ee4f01ba0: same issue.

```
[Feb24 09:43] ------------[ cut here ]------------
[  +0.004640] NETDEV WATCHDOG: ens2 (r8169): transmit queue 0 timed out
[  +0.006477] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x246/0x250
[  +0.008271] Modules linked in: sctp ip6_udp_tunnel udp_tunnel xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt ipmi_ssif iTCO_vendor_support kvm dcdbas irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate dell_smbios mei_me i2c_i801 dell_wmi_descriptor pcspkr ioatdma acpi_ipmi intel_uncore wmi_bmof mei i2c_smbus lpc_ich intel_pch_thermal ipmi_si dca ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm r8169 megaraid_sas libata crc32c_intel tg3 bnx2 realtek i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod fuse
[  +0.084017] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G S        I       5.11.0+ #2
[  +0.007389] Hardware name: Dell Inc. PowerEdge R740/0F9N89, BIOS 2.3.10 08/15/2019
[  +0.007567] RIP: 0010:dev_watchdog+0x246/0x250
[  +0.004444] Code: e8 ef 89 fd ff eb ad 4c 89 e7 c6 05 06 25 13 01 01 e8 4e 63 fa ff 89 d9 4c 89 e6 48 c7 c7 a0 61 dc 84 48 89 c2 e8 76 aa 15 00 <0f> 0b eb 8f 66 0f 1f 44 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6
[  +0.018781] RSP: 0018:ffffa61386830ed0 EFLAGS: 00010286
[  +0.005224] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
[  +0.007134] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[  +0.007132] RBP: ffff8bb0e2f403dc R08: 0000000000000000 R09: c0000000ffff7fff
[  +0.007130] R10: 0000000000000001 R11: ffffa61386830cd8 R12: ffff8bb0e2f40000
[  +0.007132] R13: 0000000000000002 R14: ffff8bb0e2f40480 R15: 0000000000000001
[  +0.007141] FS:  0000000000000000(0000) GS:ffff8bb820040000(0000) knlGS:0000000000000000
[  +0.008104] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.005745] CR2: 0000558c9bfb1000 CR3: 0000000dd0810001 CR4: 00000000007706e0
[  +0.007141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.007149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  +0.007150] PKRU: 55555554
[  +0.002712] Call Trace:
[  +0.002454]  <IRQ>
[  +0.002027]  ? pfifo_fast_enqueue+0x140/0x140
[  +0.004359]  call_timer_fn+0x29/0xf0
[  +0.003588]  run_timer_softirq+0x1c1/0x3d0
[  +0.004099]  ? ktime_get+0x3e/0xa0
[  +0.003403]  ? clockevents_program_event+0x94/0xf0
[  +0.004793]  ? sched_clock+0x5/0x10
[  +0.003501]  __do_softirq+0xc9/0x28e
[  +0.003588]  asm_call_irq_on_stack+0xf/0x20
[  +0.004185]  </IRQ>
[  +0.002108]  do_softirq_own_stack+0x37/0x40
[  +0.004182]  irq_exit_rcu+0xd4/0xe0
[  +0.003494]  sysvec_apic_timer_interrupt+0x34/0x80
[  +0.004791]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  +0.005139] RIP: 0010:cpuidle_enter_state+0xd6/0x350
[  +0.004965] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 15 d8 9a ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 32 02 00 00 31 ff e8 0e 4f a1 ff fb 45 85 f6 <0f> 88 e0 00 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
[  +0.018744] RSP: 0018:ffffa613803f7e80 EFLAGS: 00000206
[  +0.005227] RAX: ffff8bb82006b280 RBX: 0000000000000003 RCX: 000000000000001f
[  +0.007132] RDX: 0000005d3e4e7234 RSI: 000000003351fed6 RDI: 0000000000000000
[  +0.007132] RBP: ffffc60b60040000 R08: 0000000000000002 R09: 000000000002ab00
[  +0.007131] R10: 0000e131bdc9c5de R11: ffff8bb82006a004 R12: 0000005d3e4e7234
[  +0.007133] R13: ffffffff854c3560 R14: 0000000000000003 R15: 0000000000000000
[  +0.007133]  cpuidle_enter+0x29/0x40
[  +0.003579]  do_idle+0x250/0x2a0
[  +0.003233]  cpu_startup_entry+0x19/0x20
[  +0.003924]  start_secondary+0x11b/0x160
[  +0.003926]  secondary_startup_64_no_verify+0xc2/0xcb
[  +0.005054] ---[ end trace 8639964f6bc6756d ]---
```
Comment 4 Josef Oškera 2021-02-24 17:01:08 UTC
The vendor driver (version 8.048.03) does not have this problem; it works normally with MTU >= 6000.
Comment 5 Heiner Kallweit 2021-02-24 22:05:05 UTC
Could you please check whether the following makes a difference?


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0a20dae32..f704da3f2 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2285,14 +2285,14 @@ static void r8168dp_hw_jumbo_disable(struct rtl8169_private *tp)
 
 static void r8168e_hw_jumbo_enable(struct rtl8169_private *tp)
 {
-	RTL_W8(tp, MaxTxPacketSize, 0x3f);
+	RTL_W8(tp, MaxTxPacketSize, 0x24);
 	RTL_W8(tp, Config3, RTL_R8(tp, Config3) | Jumbo_En0);
 	RTL_W8(tp, Config4, RTL_R8(tp, Config4) | 0x01);
 }
 
 static void r8168e_hw_jumbo_disable(struct rtl8169_private *tp)
 {
-	RTL_W8(tp, MaxTxPacketSize, 0x0c);
+	RTL_W8(tp, MaxTxPacketSize, 0x3f);
 	RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Jumbo_En0);
 	RTL_W8(tp, Config4, RTL_R8(tp, Config4) & ~0x01);
 }
-- 
2.30.1
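
To make the register values in the patch above easier to compare, here is a small sketch that converts them to byte limits. It assumes that MaxTxPacketSize counts in 128-byte units; this thread does not state the unit, so treat that as an assumption. The helper names are hypothetical and not part of r8169. Under that reading, the jumbo-enabled value changes from 0x3f (8064 bytes) to 0x24 (4608 bytes), and the jumbo-disabled value from 0x0c (1536 bytes) to 0x3f (8064 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the register counts in 128-byte units (not stated in this thread). */
enum { MAX_TX_UNIT_BYTES = 128 };

/* Hypothetical helper: byte limit implied by a raw MaxTxPacketSize value. */
unsigned int max_tx_reg_to_bytes(uint8_t reg)
{
	return (unsigned int)reg * MAX_TX_UNIT_BYTES;
}

/* Hypothetical helper: smallest register value whose limit covers `bytes`. */
uint8_t max_tx_bytes_to_reg(unsigned int bytes)
{
	unsigned int units = (bytes + MAX_TX_UNIT_BYTES - 1) / MAX_TX_UNIT_BYTES;

	/* The register is written with RTL_W8, so it is 8 bits wide. */
	assert(units <= UINT8_MAX);
	return (uint8_t)units;
}
```

For example, `max_tx_reg_to_bytes(0x24)` gives 4608 under the 128-byte assumption.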
Comment 6 Josef Oškera 2021-02-25 14:28:47 UTC
With the patch it works correctly.

Note: I did not look closely enough earlier. On 5.11.0+ it was TCP that failed (on 5.10.0 it was UDP). With the patch, both UDP and TCP work.
Comment 7 Heiner Kallweit 2021-02-26 20:39:28 UTC
Fixed with 6cf739131a15 ("r8169: fix jumbo packet handling on RTL8168e")
