Bug 216884 - NETDEV WATCHDOG: enp6s0 (r8169): transmit queue 0 timed out
Summary: NETDEV WATCHDOG: enp6s0 (r8169): transmit queue 0 timed out
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: Other Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
Reported: 2023-01-04 15:43 UTC by HougeLangley
Modified: 2023-03-12 15:25 UTC

Kernel Version: 6.2.0-rc2
Tree: Mainline
Regression: No


Attachments
LoongArch 6.2.0-rc2-dmesg-full-log (77.09 KB, text/plain)
2023-01-04 15:43 UTC, HougeLangley

Description HougeLangley 2023-01-04 15:43:49 UTC
Created attachment 303523
LoongArch 6.2.0-rc2-dmesg-full-log

For the full dmesg log, see the attachment.

[  185.824079] ------------[ cut here ]------------
[  185.825695] NETDEV WATCHDOG: enp6s0 (r8169): transmit queue 0 timed out
[  185.826826] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x2d8/0x2e0
[  185.827899] Modules linked in: rfkill tun acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler fuse dm_mirror dm_region_hash dm_log efivarfs
[  185.828992] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc2 #1
[  185.830059] Hardware name: LOONGSON LOONGSON T100 T2208A/Loongson-LS2C5LE, BIOS Loongson-UDK2018-V2.0.0-prebeta9 11/01/2022
[  185.831144] $ 0   : 0000000000000000 9000000000f5bba8 90000000016cc000 900000010004bdd0
[  185.832243] $ 4   : 000000000000003b 90000000017d6410 0000000000000000 900000010004bc50
[  185.833337] $ 8   : 0000000000000040 ffffffffffffffff 0000000000000002 0000000000000001
[  185.834433] $12   : 0000000000000102 0000000000000027 0000000000000102 00000000ffffdfff
[  185.835529] $16   : 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  185.836632] $20   : 0000000000000430 ffff800012008fa8 9000000000f5b8d0 900000010d0e0488
[  185.837730] $24   : 900000010d0e03dc 900000010d0e0000 0000000000000000 90000000016de000
[  185.838836] $28   : ffff800012008fa8 ffffffffffffffff ffff800012008ff0 900000010004be90
[  185.839948] era   : 9000000000f5bba8 dev_watchdog+0x2d8/0x2e0
[  185.841041] ra    : 9000000000f5bba8 dev_watchdog+0x2d8/0x2e0
[  185.842108] CSR crmd: 000000b0	
[  185.842110] CSR prmd: 00000004	
[  185.843162] CSR euen: 00000000	
[  185.844207] CSR ecfg: 00071c1d	
[  185.845255] CSR estat: 000c0000	
[  185.847340] ExcCode : c (SubCode 0)
[  185.848377] PrId  : 0014c011 (Loongson-64bit)
[  185.849414] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc2 #1
[  185.850455] Hardware name: LOONGSON LOONGSON T100 T2208A/Loongson-LS2C5LE, BIOS Loongson-UDK2018-V2.0.0-prebeta9 11/01/2022
[  185.851518] Stack : 0000000000000000 9000000000222be8 90000000016cc000 900000010004ba70
[  185.852594]         900000010004ba70 0000000000000000 0000000000000000 900000010004ba70
[  185.853665]         0000000000000040 ffffffffffffffff 0000000000000002 0000000000000001
[  185.854735]         900000010004ba70 6fff800010e28000 9000000100d4f000 00000000ffffdfff
[  185.855807]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  185.856884]         0000000000000430 6fff800010e28000 9000000000f5b8d0 0000000000000000
[  185.857948]         0000000000000000 9000000001597c58 0000000000000000 90000000016de000
[  185.859003]         ffff800012008fa8 ffffffffffffffff ffff800012008ff0 900000010004be90
[  185.860062]         0000000000000000 9000000000222c00 00007ffff34cb210 00000000000000b0
[  185.861127]         0000000000000004 0000000000000000 0000000000071c1d 00000000000c0000
[  185.862181]         ...
[  185.863215] Call Trace:
[  185.863218] [<9000000000222c00>] show_stack+0x4c/0x15c
[  185.865240] [<90000000011491f0>] dump_stack_lvl+0x60/0x88
[  185.866249] [<900000000112fd28>] __warn+0x84/0xc8
[  185.867248] [<9000000001104724>] report_bug+0xa8/0x150
[  185.868245] [<900000000114981c>] do_bp+0x2dc/0x33c
[  185.869238] [<9000000000221600>] __arch_cpu_idle+0x20/0x24

[  185.871185] ---[ end trace 0000000000000000 ]---
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-01-05 08:05:15 UTC
Is this new in 6.2 (e.g. did 6.1 work)?
Comment 2 HougeLangley 2023-01-05 09:23:00 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1)
> Is this new in 6.2 (e.g. did 6.1 work)?

Yes, this is new in 6.2; 6.1 works.
Comment 3 Heiner Kallweit 2023-01-05 10:14:51 UTC
Then please bisect.
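
For reference, a bisection between the two kernels mentioned so far could be started roughly like this. It is only a sketch: `start_bisect` is a hypothetical helper, not a git command, and it assumes a mainline git checkout in which the tags `v6.1` and `v6.2-rc2` are available.

```shell
# Sketch only: start_bisect is a hypothetical helper, not part of git.
# Arguments: path to a kernel git checkout, known-good ref, known-bad ref.
start_bisect() {
    repo="$1"; good="$2"; bad="$3"
    # "git bisect start <bad> <good>" marks both ends and checks out a midpoint.
    git -C "$repo" bisect start "$bad" "$good"
}
```

For this report the call would be something like `start_bisect ~/linux v6.1 v6.2-rc2` (the path is a placeholder). After that, build and boot each candidate, mark it with `git bisect good` or `git bisect bad` depending on whether the timeout shows up, and finish with `git bisect reset`.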
Comment 4 Heiner Kallweit 2023-01-05 10:21:18 UTC
You can also check whether the following makes a difference under 6.2-rc.

echo 0 > /sys/class/net/enp6s0/napi_defer_hard_irqs
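
A slightly more careful version of that test prints the current values first, so they can be restored afterwards. This is a minimal sketch, assuming the interface name from this report (`enp6s0`) and a kernel that exposes `napi_defer_hard_irqs` and `gro_flush_timeout` under `/sys/class/net/<iface>/`. The helper names and the `NET_SYSFS` override are made up for illustration.

```shell
# Sketch only: helper names and the NET_SYSFS override are hypothetical.
# NET_SYSFS defaults to the real sysfs location; override it for dry runs.
NET_SYSFS="${NET_SYSFS:-/sys/class/net}"

# Print the current software IRQ coalescing settings of an interface.
show_sw_coalesce() {
    iface="$1"
    for attr in napi_defer_hard_irqs gro_flush_timeout; do
        f="$NET_SYSFS/$iface/$attr"
        [ -r "$f" ] && printf '%s=%s\n' "$attr" "$(cat "$f")"
    done
    return 0
}

# Set napi_defer_hard_irqs to 0 (needs root on a real system).
disable_defer_hard_irqs() {
    iface="$1"
    echo 0 > "$NET_SYSFS/$iface/napi_defer_hard_irqs"
}
```

Run `show_sw_coalesce enp6s0` before and after `disable_defer_hard_irqs enp6s0` to confirm the change took effect, then check whether the watchdog warning still appears.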
Comment 5 HougeLangley 2023-01-05 14:59:06 UTC
(In reply to Heiner Kallweit from comment #4)
> You can also check whether the following makes a difference under 6.2-rc.
> 
> echo 0 > /sys/class/net/enp6s0/napi_defer_hard_irqs

Thanks, I will try it tomorrow.
Comment 6 HougeLangley 2023-01-06 13:54:56 UTC
(In reply to Heiner Kallweit from comment #4)
> You can also check whether the following makes a difference under 6.2-rc.
> 
> echo 0 > /sys/class/net/enp6s0/napi_defer_hard_irqs

I have tried it. It doesn't help.
Comment 7 Heiner Kallweit 2023-01-06 17:50:34 UTC
Apart from enabling software interrupt coalescing by default, there has been no relevant change to r8169 since 6.1. You can bisect to find the offending commit.

Something that hasn't changed since 6.1 but might be related:
I saw the following in your dmesg log:
r8169 0000:06:00.0: limiting MRRS to 256
This message comes from loongson_mrrs_quirk().

However, this quirk reduces the MRRS only at device enable time; it doesn't prevent the MRRS value from being increased later.
That's what r8169 does in rtl_jumbo_config().
So the current quirk may not be sufficient.
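
One way to check whether the MRRS was raised again after the quirk ran is to read the PCIe Device Control register, where Max_Read_Request_Size occupies bits 14:12 and encodes 128 << n bytes. A rough sketch, assuming `setpci` from pciutils is installed and using the device address from the log line above; the function names are hypothetical:

```shell
# Sketch only: function names are hypothetical.
# Decode Max_Read_Request_Size (bits 14:12 of PCIe Device Control) into bytes.
# Argument: the 16-bit register value as hex digits without a 0x prefix,
# which is the form setpci prints.
mrrs_bytes() {
    devctl=$(( 0x$1 ))
    echo $(( 128 << ((devctl >> 12) & 7) ))
}

# Read Device Control (offset 0x08 in the PCI Express capability) of a device
# and print its MRRS in bytes. Needs root; nothing calls this automatically.
read_mrrs() {
    mrrs_bytes "$(setpci -s "$1" CAP_EXP+8.w)"
}
```

With the interface up, `read_mrrs 0000:06:00.0` printing more than 256 would suggest the value was increased after the quirk applied its limit. The same field appears as `MaxReadReq` in the `DevCtl` line of `lspci -vv`.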
Comment 8 HougeLangley 2023-01-07 07:30:09 UTC
(In reply to Heiner Kallweit from comment #7)
> Apart from using software interrupt coalescing per default there has been no
> relevant change to r8169 since 6.1. You can bisect to find the offending
> commit.
> 
> Something that hasn't changed since 6.1 but might be related:
> I saw the following in your dmesg log:
> r8169 0000:06:00.0: limiting MRRS to 256
> This message comes from loongson_mrrs_quirk().
> 
> However this quirk reduces the MRRS only at device enable time, it doesn't
> prevent that the device increases the MRRS value later.
> That's what r8169 does in rtl_jumbo_config().
> So the current quirk may not be sufficient.

Thanks, I will ask for the reason.
Comment 9 WANG Xuerui 2023-01-07 08:35:30 UTC
Hi, according to the LoongArch maintainer (Huacai), the issue was confirmed a long time ago (it was already present back when Loongson products were still MIPS-based), but the fix is still under review: https://lore.kernel.org/all/20230106095143.3158998-2-chenhuacai@loongson.cn/

(And it's exactly because the MRRS was increased after the device was enabled. Thanks for the analysis.)
Comment 10 HougeLangley 2023-01-07 08:37:11 UTC
(In reply to WANG Xuerui from comment #9)
> Hi, according to LoongArch maintainer (Huacai) the issue was confirmed long
> time ago (it was already present back when Loongson products were still
> MIPS), but the fix is still under review:
> https://lore.kernel.org/all/20230106095143.3158998-2-chenhuacai@loongson.cn/
> 
> (And it's exactly because the MRRS was increased after device was enabled.
> Thanks for the analysis.)

Thanks for your reply.
Comment 11 Bjorn Helgaas 2023-01-31 00:07:44 UTC
Can somebody explain why this behavior changed between v6.1 and v6.2?

I applied the MRRS restriction patch (https://lore.kernel.org/r/20230106095143.3158998-2-chenhuacai@loongson.cn), but if it's really a regression in v6.2, I can try to merge it earlier.
Comment 12 HougeLangley 2023-02-01 15:34:34 UTC
(In reply to Bjorn Helgaas from comment #11)
> Can somebody explain why this behavior changed between v6.1 and v6.2?
> 
> I applied the MRRS restriction patch
> (https://lore.kernel.org/r/20230106095143.3158998-2-chenhuacai@loongson.cn),
> but if it's really a regression in v6.2, I can try to merge it earlier.

I have tested these patches; the NIC works on 6.2-rc6.

$ uname -a
Linux Gentoo-Loongson 6.2.0-rc6 #1 SMP PREEMPT Wed Feb  1 23:20:01 CST 2023 loongarch64 GNU/Linux
Comment 13 HougeLangley 2023-02-01 15:38:09 UTC
[    8.436684] RTL8211DN Gigabit Ethernet r8169-0-600:00: attached PHY driver (mii_bus:phy_addr=r8169-0-600:00, irq=MAC)
[    8.436939] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x000157, prod_id: 0x0000, dev_id: 0x23)
[    8.719772] r8169 0000:06:00.0 enp6s0: Link is Down
[    9.479422] ipmi_si IPI0001:00: IPMI kcs interface initialized
[   11.459350] r8169 0000:06:00.0 enp6s0: Link is Up - 1Gbps/Full - flow control rx/tx
[   11.459446] IPv6: ADDRCONF(NETDEV_CHANGE): enp6s0: link becomes ready
Comment 14 WANG Xuerui 2023-02-02 04:45:11 UTC
(In reply to Bjorn Helgaas from comment #11)
> Can somebody explain why this behavior changed between v6.1 and v6.2?
> 
> I applied the MRRS restriction patch
> (https://lore.kernel.org/r/20230106095143.3158998-2-chenhuacai@loongson.cn),
> but if it's really a regression in v6.2, I can try to merge it earlier.

Sorry for not replying immediately, but IMO this is technically not a regression: the hardware erratum has always been there and Linux behavior never changed in this regard. So I personally think it's fine to wait until the next merge window, which is close enough not to bother people. And thanks for your response!
Comment 15 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-02-03 09:23:06 UTC
(In reply to WANG Xuerui from comment #14)
> Sorry for not replying immediately, but IMO this is technically not a
> regression: the hardware errata is there forever and Linux behavior never
> changed in this regard.

Well, kinda, as the reporter in comment #2 clearly states that 6.1 has been working. So I assume some other change exposed the issue, which afaics leaves us with these options:

* apply the fix soon to resolve the issue, even if it's strictly just a side-effect that the change is fixing the regression reported here  
* force the reporter to bisect, so we find the change that exposed it
* once reviewed and acked, merge the fix in the next merge window with a CC <stable.. to get it backported to 6.1.y -- which likely means that it will be backported right after 6.3-rc1 is out (which in the end might mean that the change gets less tested than merging it quickly now)
Comment 16 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-02-03 09:32:44 UTC
(In reply to HougeLangley from comment #2)

> Yes, this is 6.2, 6.1 is working

FWIW, in 
https://lore.kernel.org/all/CAAhV-H6L3V8M4igCWBH=PzuDcoH0KreWkfqHexQwB2v+2TSi=A@mail.gmail.com/
this was said:
```
Yes, this patch can fix that issue. But I don't think this is a
regression, vanila 6.1 kernel also has this problem, maybe the
reporter uses a patched 6.1 kernel.
```
Comment 17 timo 2023-02-28 00:11:15 UTC
I think I have the same issue on 6.1, with two different NICs. 

-----

First one (lan0 = Intel 82579V NIC):

[900148.768008] ------------[ cut here ]------------
[900148.768029] NETDEV WATCHDOG: lan0 (e1000e): transmit queue 0 timed out
[900148.768049] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x207/0x210
[900148.768059] Modules linked in: ppp_async ppp_generic slhc mptcp_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag ip_set_hash_net xt_set ip_set_hash_ip ip_set xt_REDIRECT xt_nat xt_TCPMSS xt_MASQUERADE xt_conntrack xt_tcpudp nft_compat nft_chain_nat nf_tables nfnetlink binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nls_ascii nls_cp437 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp vfat fat coretemp kvm_intel kvm i915 joydev snd_hda_intel irqbypass snd_intel_dspcfg snd_intel_sdw_acpi drm_buddy snd_hda_codec drm_display_helper snd_hda_core ghash_clmulni_intel cryptd snd_hwdep hid_generic ppdev cec sha512_ssse3 mei_wdt mei_hdcp evdev rc_core sha512_generic usbhid ttm snd_pcm rapl intel_cstate iTCO_wdt intel_pmc_bxt iTCO_vendor_support hid intel_uncore mei_me at24 pcspkr snd_timer drm_kms_helper watchdog snd mei i2c_algo_bit soundcore parport_pc parport button sg tcp_bbr sch_fq wireguard libchacha20poly1305
[900148.768273]  chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_nat_pptp nf_conntrack_pptp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c loop msr drivetemp fuse drm efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic ahci libahci xhci_pci libata xhci_hcd crct10dif_pclmul crct10dif_common crc32_pclmul scsi_mod crc32c_intel e1000e i2c_i801 i2c_smbus ehci_pci ehci_hcd scsi_common lpc_ich usbcore r8169 realtek mdio_devres ptp pps_core libphy usb_common fan video wmi
[900148.768427] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.0-5-amd64 #1  Debian 6.1.12-1
[900148.768433] Hardware name:  /DB75EN, BIOS ENB7510H.86A.0046.2013.0704.1354 07/04/2013
[900148.768437] RIP: 0010:dev_watchdog+0x207/0x210
[900148.768443] Code: 00 e9 40 ff ff ff 48 89 df c6 05 9d ea 5d 01 01 e8 fe 7a f9 ff 44 89 e9 48 89 de 48 c7 c7 40 3e 1a b1 48 89 c2 e8 3f c2 1a 00 <0f> 0b e9 22 ff ff ff 66 90 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f
[900148.768448] RSP: 0018:ffffc3a9400e4e80 EFLAGS: 00010286
[900148.768454] RAX: 0000000000000000 RBX: ffffa0f7e215c000 RCX: 0000000000000000
[900148.768458] RDX: 0000000000000103 RSI: ffffffffb1133216 RDI: 00000000ffffffff
[900148.768462] RBP: ffffa0f7e215c488 R08: 0000000000000000 R09: ffffc3a9400e4d08
[900148.768465] R10: 0000000000000003 R11: ffffffffb1ad43a8 R12: ffffa0f7e215c3dc
[900148.768469] R13: 0000000000000000 R14: ffffffffb06255c0 R15: ffffa0f7e215c488
[900148.768473] FS:  0000000000000000(0000) GS:ffffa0f8d6500000(0000) knlGS:0000000000000000
[900148.768477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[900148.768481] CR2: 000056024825e280 CR3: 00000001d0010003 CR4: 00000000000606e0
[900148.768486] Call Trace:
[900148.768491]  <IRQ>
[900148.768496]  ? pfifo_fast_reset+0x140/0x140
[900148.768504]  call_timer_fn+0x27/0x130
[900148.768511]  __run_timers+0x21c/0x2a0
[900148.768519]  run_timer_softirq+0x2b/0x50
[900148.768525]  __do_softirq+0xf0/0x2fe
[900148.768531]  __irq_exit_rcu+0xc7/0x130
[900148.768538]  sysvec_apic_timer_interrupt+0x9e/0xc0
[900148.768545]  </IRQ>
[900148.768548]  <TASK>
[900148.768551]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[900148.768559] RIP: 0010:cpuidle_enter_state+0xde/0x420
[900148.768566] Code: 00 00 31 ff e8 03 2e 98 ff 45 84 ff 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 25 03 00 00 31 ff e8 08 e9 9e ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[900148.768570] RSP: 0018:ffffc3a9400afe90 EFLAGS: 00000246
[900148.768575] RAX: ffffa0f8d6500000 RBX: ffffa0f8d653bd00 RCX: 0000000000000000
[900148.768579] RDX: 0000000000000001 RSI: ffffffffb1133216 RDI: ffffffffb110c815
[900148.768582] RBP: 0000000000000004 R08: 0000000000000004 R09: 000000002ac37c0f
[900148.768585] R10: 0000000000000008 R11: 0000000000008285 R12: ffffffffb1b9e6c0
[900148.768589] R13: 000332ae378fc1ed R14: 0000000000000004 R15: 0000000000000000
[900148.768596]  cpuidle_enter+0x29/0x40
[900148.768601]  do_idle+0x20c/0x2b0
[900148.768608]  cpu_startup_entry+0x19/0x20
[900148.768615]  start_secondary+0x11a/0x140
[900148.768621]  secondary_startup_64_no_verify+0xe5/0xeb
[900148.768630]  </TASK>
[900148.768633] ---[ end trace 0000000000000000 ]---
[900148.771441] e1000e 0000:00:19.0 lan0: Reset adapter unexpectedly
[900152.629952] e1000e 0000:00:19.0 lan0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

-----

Second one (lan1 = Realtek RTL8168 NIC):

[  405.046724] ------------[ cut here ]------------
[  405.046748] NETDEV WATCHDOG: lan1 (r8169): transmit queue 0 timed out
[  405.046769] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x207/0x210
[  405.046779] Modules linked in: ip_set_hash_net xt_set ip_set_hash_ip ip_set xt_REDIRECT xt_nat xt_TCPMSS xt_MASQUERADE xt_conntrack xt_tcpudp nft_compat nft_chain_nat nf_tables nfnetlink intel_rapl_msr intel_rapl_common binfmt_misc x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp nls_ascii snd_hda_codec_generic ledtrig_audio nls_cp437 kvm_intel vfat fat kvm joydev irqbypass i915 snd_hda_intel snd_intel_dspcfg ghash_clmulni_intel snd_intel_sdw_acpi cryptd snd_hda_codec hid_generic drm_buddy sha512_ssse3 drm_display_helper mei_hdcp snd_hda_core usbhid mei_wdt snd_hwdep ppdev cec hid rc_core evdev sha512_generic snd_pcm rapl ttm intel_cstate iTCO_wdt intel_pmc_bxt mei_me iTCO_vendor_support intel_uncore at24 snd_timer mei pcspkr watchdog drm_kms_helper snd i2c_algo_bit soundcore parport_pc parport button sg tcp_bbr sch_fq wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nf_nat_pptp
[  405.046969]  nf_conntrack_pptp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c loop msr drivetemp drm configfs fuse efi_pstore efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic ahci xhci_pci ehci_pci libahci xhci_hcd crct10dif_pclmul crct10dif_common e1000e r8169 ehci_hcd libata crc32_pclmul realtek mdio_devres usbcore scsi_mod crc32c_intel i2c_i801 i2c_smbus scsi_common lpc_ich ptp libphy pps_core usb_common fan video wmi
[  405.047090] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-5-amd64 #1  Debian 6.1.12-1
[  405.047095] Hardware name:  /DB75EN, BIOS ENB7510H.86A.0046.2013.0704.1354 07/04/2013
[  405.047099] RIP: 0010:dev_watchdog+0x207/0x210
[  405.047104] Code: 00 e9 40 ff ff ff 48 89 df c6 05 9d ea 5d 01 01 e8 fe 7a f9 ff 44 89 e9 48 89 de 48 c7 c7 40 3e 3a 96 48 89 c2 e8 3f c2 1a 00 <0f> 0b e9 22 ff ff ff 66 90 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f
[  405.047109] RSP: 0018:ffffb423c0003e80 EFLAGS: 00010286
[  405.047115] RAX: 0000000000000000 RBX: ffff98d28cb64000 RCX: 0000000000000000
[  405.047119] RDX: 0000000000000103 RSI: ffffffff96333216 RDI: 00000000ffffffff
[  405.047123] RBP: ffff98d28cb64488 R08: 0000000000000000 R09: ffffb423c0003d08
[  405.047126] R10: 0000000000000003 R11: ffffffff96cd43a8 R12: ffff98d28cb643dc
[  405.047130] R13: 0000000000000000 R14: ffffffff958255c0 R15: ffff98d28cb64488
[  405.047133] FS:  0000000000000000(0000) GS:ffff98d396400000(0000) knlGS:0000000000000000
[  405.047138] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.047141] CR2: 000055e2f7997c00 CR3: 0000000102810004 CR4: 00000000000606f0
[  405.047146] Call Trace:
[  405.047151]  <IRQ>
[  405.047156]  ? pfifo_fast_reset+0x140/0x140
[  405.047161]  call_timer_fn+0x27/0x130
[  405.047168]  __run_timers+0x21c/0x2a0
[  405.047176]  run_timer_softirq+0x2b/0x50
[  405.047181]  __do_softirq+0xf0/0x2fe
[  405.047188]  __irq_exit_rcu+0xc7/0x130
[  405.047195]  sysvec_apic_timer_interrupt+0x9e/0xc0
[  405.047202]  </IRQ>
[  405.047205]  <TASK>
[  405.047208]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[  405.047216] RIP: 0010:cpuidle_enter_state+0xde/0x420
[  405.047222] Code: 00 00 31 ff e8 03 2e 98 ff 45 84 ff 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 25 03 00 00 31 ff e8 08 e9 9e ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[  405.047226] RSP: 0018:ffffffff96c03e48 EFLAGS: 00000246
[  405.047232] RAX: ffff98d396400000 RBX: ffff98d39643bd00 RCX: 0000000000000000
[  405.047235] RDX: 0000000000000000 RSI: ffffffff96333216 RDI: ffffffff9630c815
[  405.047239] RBP: 0000000000000004 R08: 0000000000000002 R09: 000000002ac37c0f
[  405.047242] R10: 0000000000000018 R11: 0000000000008087 R12: ffffffff96d9e6c0
[  405.047245] R13: 0000005e4eaa57cd R14: 0000000000000004 R15: 0000000000000000
[  405.047252]  ? cpuidle_enter_state+0xbd/0x420
[  405.047258]  cpuidle_enter+0x29/0x40
[  405.047263]  do_idle+0x20c/0x2b0
[  405.047270]  cpu_startup_entry+0x19/0x20
[  405.047275]  rest_init+0xcb/0xd0
[  405.047281]  arch_call_rest_init+0xa/0x14
[  405.047290]  start_kernel+0x6fe/0x727
[  405.047297]  secondary_startup_64_no_verify+0xe5/0xeb
[  405.047306]  </TASK>
[  405.047308] ---[ end trace 0000000000000000 ]---
Comment 18 Heiner Kallweit 2023-02-28 05:26:57 UTC
(In reply to timo from comment #17)
> I think I have the same issue on 6.1, with two different NICs. 
> 
I don't think so. This bug report is about the Loongson architecture.
You have an old system and presumably some ASPM incompatibility.
Comment 19 Bjorn Helgaas 2023-03-10 16:59:52 UTC
Based on comment #9, I assume the problem HougeLangley reported is related to the Loongson MRRS restrictions.  There are several commits related to that:

  https://git.kernel.org/linus/1f58cca5cf2b ("PCI: Add Loongson PCI Controller support")
  https://git.kernel.org/linus/8b3517f88ff2 ("PCI: loongson: Prevent LS7A MRRS increases")
  https://git.kernel.org/linus/c768f8c5f40f ("PCI: loongson: Add more devices that need MRRS quirk")

1f58cca5cf2b appeared in v5.8.  The others were recently merged in v6.3-rc1, so as far as I know, the issue on Loongson *should* now be resolved, and we can close this unless HougeLangley says the problem still happens in v6.3-rc1 or later.

I assume the problem timo is seeing is different.  Timo, can you open a new report (either bugzilla or, if you suspect a network issue, directly on netdev@vger.kernel.org and cc: the relevant NIC maintainers)?

If you suspect a PCI issue, e.g., ASPM, you can try to verify that by booting with "pcie_aspm=off" or by disabling ASPM with the sysfs interface: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-pci?id=v6.2#n420
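
For the sysfs route, a minimal sketch could look like the following. It assumes the per-device `link/*_aspm` attributes described in that ABI document are present, which depends on the kernel configuration and on the states the link supports. The function name and the `PCI_SYSFS` override are hypothetical.

```shell
# Sketch only: function name and the PCI_SYSFS override are hypothetical.
# PCI_SYSFS defaults to the real sysfs location; override it for dry runs.
PCI_SYSFS="${PCI_SYSFS:-/sys/bus/pci/devices}"

# Disable every ASPM link state that the kernel exposes for one device.
# Argument: full PCI address, e.g. 0000:00:19.0. Needs root on a real system.
disable_aspm() {
    dev="$1"
    for f in "$PCI_SYSFS/$dev"/link/*_aspm; do
        [ -w "$f" ] || continue   # state not exposed, or the glob did not match
        echo 0 > "$f"
        printf '%s -> %s\n' "${f##*/}" "$(cat "$f")"
    done
    return 0
}
```

For the e1000e NIC in comment #17 this would be `disable_aspm 0000:00:19.0`. If the timeouts stop afterwards, that points towards ASPM rather than the network driver.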
Comment 20 timo 2023-03-12 15:25:15 UTC
Sorry for the confusion; my reply was clearly posted in the wrong thread.

In my case, the errors happened when I connected a wireless router (TP-Link Archer C9) to these NICs of my server. Initially, I thought the error was in the Linux kernel of the server. However, I have recently replaced the wireless router with a different model, and since then I haven't seen the 'transmit queue 0 timed out' errors anymore. Not sure what happened; maybe the old device was sending repeated pause frames? Anyhow, it seems to be solved.
