Bug 198931

Summary: Network connection on r8152 stops with "Tx status -71"
Product: Drivers Reporter: Jean-Louis Dupond (jean-louis)
Component: Network Assignee: drivers_network (drivers_network)
Status: NEW ---    
Severity: normal CC: aelschuring, anarcat, bugkernel, danny, dion, doubeon1, elia.f.geretto, francesco.giudici, hi, hijacker, intelligence.dance, jcollins, jean-louis, jwrdegoede, kernel, koema, konstantin.sobolev, linux, main.haarp, mango, marctraider, michiel, mliska, olebowle, ongun.kanat+kernelbugzilla, p.s.vanderheide, pdecat, peter.hahn, peter_hayman, prashanthk0539, richard, ries.infotec+kernel, Rob.Tetour, russianneuromancer, smihael, stian.skjelstad, sundman, ted437, timur.kristof, tomasmark79, truls, vasvir2, vpsink, weberkai, ydewid
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.16-rc2 (drm-tip) Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg of Linux 4.17.0
dmesg of Linux 4.19rc8
dmesg of Linux 5.4.0
lsusb output mjanssens wd15 xps9360
WARNING: at net/sched/sch_generic.c:467 dev_watchdog+0x24d/0x260
dmesg of Linux 6.8.0-rc7

Description Jean-Louis Dupond 2018-02-25 13:03:56 UTC
Hi All!

I have the following setup:
Precision 5520
Dell WD15 Docking

The WD15 dock has a network port driven by r8152.

Once in a while (about once a week) the connection dies, and the kernel prints the following message:
kernel: [ 6164.073282] r8152 4-1.2:1.0 enxa44cc890f4c8: Tx status -71

Simply replugging the dock, or disabling and re-enabling the network interface, fixes the problem.

The question is, why does this happen? :)

Feel free to ask for additional information!
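
Note for readers triaging similar logs: the number after "Tx status" / "Rx status" appears to be the negative Linux errno reported for the USB transfer, so -71 would correspond to EPROTO and -2 to ENOENT. The following is a minimal sketch for pulling such lines out of dmesg text, assuming the message format shown in this report. The function name and the small errno table are illustrative only and are not part of the r8152 driver.

```python
import re

# Linux errno numbers seen in this report (assumption: Linux numbering).
LINUX_ERRNO_NAMES = {2: "ENOENT", 71: "EPROTO"}

# Matches e.g. "r8152 4-1.2:1.0 enxa44cc890f4c8: Tx status -71"
STATUS_RE = re.compile(
    r"r8152 \S+ (?P<iface>\S+): (?P<dir>Tx|Rx) status (?P<code>-?\d+)"
)


def parse_r8152_status(line):
    """Return (interface, direction, errno name) for a status line, else None."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    code = abs(int(m.group("code")))
    name = LINUX_ERRNO_NAMES.get(code, f"errno {code}")
    return m.group("iface"), m.group("dir"), name


if __name__ == "__main__":
    sample = "kernel: [ 6164.073282] r8152 4-1.2:1.0 enxa44cc890f4c8: Tx status -71"
    print(parse_r8152_status(sample))
```

Lines such as "Tx timeout" carry no status code, so the helper returns None for them.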
Comment 1 RussianNeuroMancer 2018-10-19 13:23:48 UTC
Seems like I have the same issue on a Dell Latitude 7285 and an HP EliteBook Folio G1 with a Belkin USB-C Express Dock 3.1 HD F4U093:

[ 1090.235874] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 1090.235879] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 1090.235886] pcieport 0000:00:1c.0:   device [8086:9d10] error status/mask=00003000/00002000
[ 1090.235889] pcieport 0000:00:1c.0:    [12] Timeout               
[ 1589.760804] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx status -2
[ 1594.048998] ------------[ cut here ]------------
[ 1594.049003] NETDEV WATCHDOG: enx58ef68a8892b (r8152): transmit queue 0 timed out
[ 1594.049040] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:461 dev_watchdog+0x221/0x230
[ 1594.049042] Modules linked in: [...]
[ 1594.049294] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.19.0-041900rc8-generic #201810150631
[ 1594.049296] Hardware name: Dell Inc. Latitude 7285/0VVWNX, BIOS 1.2.0 07/09/2018
[ 1594.049303] RIP: 0010:dev_watchdog+0x221/0x230
[ 1594.049307] Code: 00 49 63 4e e0 eb 92 4c 89 ef c6 05 26 ff f5 00 01 e8 c3 b6 fc ff 89 d9 4c 89 ee 48 c7 c7 08 2d 7b 82 48 89 c2 e8 61 26 7b ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
[ 1594.049310] RSP: 0018:ffffc2438192bd70 EFLAGS: 00010282
[ 1594.049314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 1594.049317] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffffa0341e216420
[ 1594.049320] RBP: ffffc2438192bda0 R08: 0000000000000001 R09: 0000000000000511
[ 1594.049322] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000001
[ 1594.049325] R13: ffffa033fac37000 R14: ffffa033fac374c0 R15: ffffa034166cf680
[ 1594.049329] FS:  0000000000000000(0000) GS:ffffa0341e200000(0000) knlGS:0000000000000000
[ 1594.049332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1594.049335] CR2: 00007f26eec71020 CR3: 000000038a20a005 CR4: 00000000003606f0
[ 1594.049340] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1594.049342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1594.049344] Call Trace:
[ 1594.049356]  ? pfifo_fast_change_tx_queue_len+0x2e0/0x2e0
[ 1594.049363]  call_timer_fn+0x30/0x130
[ 1594.049371]  run_timer_softirq+0x3ea/0x420
[ 1594.049376]  ? __switch_to_asm+0x34/0x70
[ 1594.049381]  ? __switch_to+0xad/0x500
[ 1594.049385]  ? __switch_to_asm+0x40/0x70
[ 1594.049388]  ? __switch_to_asm+0x34/0x70
[ 1594.049392]  ? __switch_to_asm+0x40/0x70
[ 1594.049397]  __do_softirq+0xdc/0x2b5
[ 1594.049403]  run_ksoftirqd+0x2b/0x40
[ 1594.049410]  smpboot_thread_fn+0xd0/0x170
[ 1594.049416]  kthread+0x120/0x140
[ 1594.049421]  ? sort_range+0x30/0x30
[ 1594.049426]  ? kthread_bind+0x40/0x40
[ 1594.049431]  ret_from_fork+0x35/0x40
[ 1594.049435] ---[ end trace 3fcb83dc58402212 ]---
[ 1594.049468] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1599.172616] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1604.288619] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1610.176579] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout

Full logs:
Comment 2 RussianNeuroMancer 2018-10-19 13:24:37 UTC
Created attachment 279099
dmesg of Linux 4.17.0
Comment 3 RussianNeuroMancer 2018-10-19 13:25:08 UTC
Created attachment 279101
dmesg of Linux 4.19rc8
Comment 4 RussianNeuroMancer 2018-11-28 11:43:09 UTC
Jean-Louis, can you please verify whether the issue is still reproducible for you on Linux 4.20rc4? For me, at least with one dock (Belkin USB-C Express Dock 3.1 HD F4U093) and one device (HP Elite x2 1013 G3), this issue is no longer reproducible. I will verify other laptops with Linux 4.20 later.
Comment 5 Jean-Louis Dupond 2018-12-04 12:14:35 UTC
I haven't seen this in the last few months. I'm running Ubuntu 18.10 with 4.18.0-11-generic.
Comment 6 Konstantin Sobolev 2019-12-04 23:22:05 UTC
I have a very similar setup: a Dell Precision 7540 with a WD19DC dock that has an RTL8153 adapter. It crashes periodically with similar symptoms. My current kernel is 5.4.1.

[76658.437411] ------------[ cut here ]------------
[76658.437412] NETDEV WATCHDOG: enp57s0u2u4 (r8152): transmit queue 0 timed out
[76658.437421] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x21f/0x230
[76658.437421] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi r8152 mii tun md4 cifs dm_zero fuse raid10 raid1 raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_crypt dm_mirror dm_region_hash dm_log dm_mod dax ohci_pci ohci_hcd uhci_hcd ehci_pci ehci_hcd mousedev hid_multitouch dell_rbtn input_leds dell_laptop dell_wmi dell_smbios i2c_designware_platform atkbd rtsx_pci_sdmmc mmc_core mei_hdcp i2c_designware_core dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof intel_rapl_msr libps2 dcdbas dell_smm_hwmon btusb btrtl btbcm uvcvideo x86_pkg_temp_thermal videobuf2_vmalloc btintel intel_powerclamp videobuf2_memops coretemp videobuf2_v4l2 ucsi_acpi bluetooth processor_thermal_device videodev intel_lpss_pci typec_ucsi mei_me i2c_i801 rtsx_pci intel_soc_dts_iosf ecdh_generic intel_lpss mei mfd_core ecc videobuf2_common intel_rapl_common intel_pch_thermal typec wmi i8042 int3403_thermal int3400_thermal i2c_hid dell_smo8800
[76658.437439]  int340x_thermal_zone serio acpi_thermal_rel intel_pmc_core evdev i915
[76658.437441] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U            5.4.1-gentoo #6
[76658.437442] Hardware name: Dell Inc. Precision 7540/0CYJDT, BIOS 1.4.0 09/23/2019
[76658.437443] RIP: 0010:dev_watchdog+0x21f/0x230
[76658.437444] Code: 85 c0 75 e8 eb a8 4c 89 ef c6 05 5d 62 b3 00 01 e8 e6 c8 fc ff 44 89 e1 4c 89 ee 48 c7 c7 48 c1 49 9f 48 89 c2 e8 ea 11 8a ff <0f> 0b eb 89 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 c7 47 08 00
[76658.437444] RSP: 0018:ffffb139401b8e80 EFLAGS: 00010282
[76658.437445] RAX: 0000000000000000 RBX: ffff9f890e1a2a00 RCX: 00000000000011d4
[76658.437445] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffffa39e53ac
[76658.437446] RBP: ffff9f8914cc7440 R08: 0000000000000001 R09: 00000000000011d4
[76658.437446] R10: 0000000000028978 R11: 0000000000000001 R12: 0000000000000000
[76658.437447] R13: ffff9f8914cc7000 R14: ffff9f8914cc7440 R15: 0000000000000001
[76658.437447] FS:  0000000000000000(0000) GS:ffff9f891c080000(0000) knlGS:0000000000000000
[76658.437448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76658.437448] CR2: 00007fc9000030b8 CR3: 00000009c4384006 CR4: 00000000003606e0
[76658.437448] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76658.437449] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76658.437449] Call Trace:
[76658.437450]  <IRQ>
[76658.437452]  ? qdisc_put_unlocked+0x30/0x30
[76658.437454]  call_timer_fn+0x26/0x120
[76658.437454]  run_timer_softirq+0x17d/0x470
[76658.437456]  ? enqueue_hrtimer+0x31/0x80
[76658.437457]  ? __hrtimer_run_queues+0x11b/0x260
[76658.437458]  __do_softirq+0xd6/0x2ba
[76658.437460]  irq_exit+0x9b/0xa0
[76658.437461]  smp_apic_timer_interrupt+0x5b/0x110
[76658.437462]  apic_timer_interrupt+0xf/0x20
[76658.437462]  </IRQ>
[76658.437464] RIP: 0010:cpuidle_enter_state+0xa8/0x400
[76658.437464] Code: c5 0f 1f 44 00 00 31 ff e8 85 fb 9b ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 2d 03 00 00 31 ff e8 7c d1 a0 ff fb 45 85 e4 <0f> 88 6c 02 00 00 4c 2b 2c 24 49 63 cc 48 8d 04 49 48 c1 e0 05 8b
[76658.437465] RSP: 0018:ffffb139400dfe70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[76658.437465] RAX: ffff9f891c0a7bc0 RBX: ffffffff9f6a1ce0 RCX: 000045b86eed58ba
[76658.437466] RDX: 000045b86efc9af4 RSI: 000045b86eed58ba RDI: 0000000000000000
[76658.437466] RBP: ffffd1393fab4a10 R08: 000045b86eed58d6 R09: 00000000000001bf
[76658.437466] R10: ffff9f891c0a6c20 R11: ffff9f891c0a6c00 R12: 0000000000000002
[76658.437467] R13: 000045b86eed58d6 R14: 0000000000000002 R15: ffff9f8915f5c740
[76658.437468]  cpuidle_enter+0x24/0x40
[76658.437470]  do_idle+0x1bf/0x230
[76658.437471]  cpu_startup_entry+0x14/0x20
[76658.437472]  start_secondary+0x131/0x160
[76658.437473]  secondary_startup_64+0xa4/0xb0
[76658.437474] ---[ end trace 907e490a0cd3c160 ]---
[76658.437476] r8152 4-2.4:1.0 enp57s0u2u4: Tx timeout
[76659.788078] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=83035 end=83036) time 243 us, min 1431, max 1439, scanline start 1421, end 1443
[76660.958672] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2
[76660.958758] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2
[76660.958848] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2
[76660.958940] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2
Comment 7 Peter 2020-01-27 13:20:07 UTC
I have a similar setup and a similar problem:
Setup: Lenovo Thinkpad t480, Think Pad USB-C Dock 40A90090EU [1], Ubuntu 16.04, Kernel  4.15.0-74-generic #83~16.04.1-Ubuntu

Network connection is periodically crashing.
Dmesg shows `r8152 4-1.1:1.0 enxe04f43991e1c: Rx status -71` in that case. 
I noticed that this seems to depend on how heavily the network connection is used. E.g. if I compile a lot using icecream to distribute compilation jobs, it seems to be a lot less stable. 

Using `rmmod r8152 && modprobe r8152` fixes the problem temporarily. 


[1] https://support.lenovo.com/de/de/accessories/acc100348
Comment 8 RussianNeuroMancer 2020-01-27 14:00:29 UTC
@Peter check Comment 4
Re-test on newer kernel (you can take it from mainline PPA).
Comment 9 Timur Kristóf 2020-03-02 12:08:11 UTC
This still happens to me on 5.5.6-201.fc31.x86_64. My dmesg is full of these messages:

[12696.189484] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12702.333456] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12707.965422] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12713.085385] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12718.205360] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12724.349321] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12729.981295] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12735.101256] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12740.221235] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12746.365199] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12751.997171] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12757.117155] r8152 6-1:1.0 enp10s0u1: Tx timeout
Comment 10 RussianNeuroMancer 2020-03-03 08:06:00 UTC
Timur, are you using the same docking station as Jean-Louis or a different one?
Comment 11 Timur Kristóf 2020-03-03 12:47:07 UTC
RussianNeuroMancer, I use a Dell XPS 13 9370 with a Lenovo ThinkPad branded Thunderbolt 3 dock. The model number is DBB9003L1. (The dock is not mine, I'm just borrowing it from a colleague for a week.) I think these docks mostly use the same hardware under the hood. I think I've also seen a Fedora bug report about the same issue with the Dell TB16 here: https://bugzilla.redhat.com/show_bug.cgi?id=1460789
Comment 12 RussianNeuroMancer 2020-03-03 13:01:35 UTC
I see. By the way, since my Comment 4 I was able to reproduce this issue again. This time with Linux 5.4 on Dell Venue 8 Pro 5855 and Dell WD15 Dock.
Comment 13 RussianNeuroMancer 2020-03-03 14:13:28 UTC
Created attachment 287779
dmesg of Linux 5.4.0
Comment 14 BniceJada 2020-03-05 14:03:27 UTC
Same problem here. Dell Latitude 7480 (BIOS 1.16.1) with WD15 dock (Port Controller on v1.1.8). I am using 5.5.7-zen1-1-zen (but the same problem also occurred with the standard Arch kernel).

It did not occur with 5.4.2.arch1-1 but it definitely occurred with 5.4.5.arch1-1 (I was on holiday in between and the trouble started afterwards).

--
Mar 05 13:42:34 hostname kernel: ------------[ cut here ]------------
Mar 05 13:42:34 hostname kernel: NETDEV WATCHDOG: enp59s0u1u2 (r8152): transmit queue 0 timed out
Mar 05 13:42:34 hostname kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x268/0x270
Mar 05 13:42:34 hostname kernel: Modules linked in: md4 nls_utf8 cifs dns_resolver fscache libdes rfcomm ip6t_REJECT nf_reject_ipv6 xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_na>
Mar 05 13:42:34 hostname kernel:  coretemp snd_hda_codec_generic ledtrig_audio kvm_intel snd_pcm_dmaengine snd_hda_intel dell_wmi_descriptor dcdbas snd_intel_dspcfg dell_smm_hwmon snd_hda_codec kvm cfg80211 snd_hda_core snd_hwdep snd_pcm e1000e fuse irqbypass i>
Mar 05 13:42:34 hostname kernel:  libps2 aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_hcd rtsx_pci i8042 serio i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_agp intel_gtt agpgart btrfs blake2b_generic libcr>
Mar 05 13:42:34 hostname kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.7-zen1-1-zen #1
Mar 05 13:42:34 hostname kernel: Hardware name: Dell Inc. Latitude 7480/00F6D3, BIOS 1.16.1 10/03/2019
Mar 05 13:42:34 hostname kernel: RIP: 0010:dev_watchdog+0x268/0x270
Mar 05 13:42:34 hostname kernel: Code: 47 9c 69 ff eb 8a 4c 89 f7 c6 05 dc 05 db 00 01 e8 0d fa f9 ff 44 89 e9 4c 89 f6 48 c7 c7 d0 2a 5a 8e 48 89 c2 e8 0f 92 73 ff <0f> 0b e9 68 ff ff ff 90 0f 1f 44 00 00 48 c7 47 08 00 00 00 00 48
Mar 05 13:42:34 hostname kernel: RSP: 0018:ffffb39300164e60 EFLAGS: 00010286
Mar 05 13:42:34 hostname kernel: RAX: 0000000000000000 RBX: ffff8cdc200b2000 RCX: 0000000000000000
Mar 05 13:42:34 hostname kernel: RDX: 0000000000000103 RSI: 00000000000000f6 RDI: 00000000ffffffff
Mar 05 13:42:34 hostname kernel: RBP: ffff8cdc0e5bf45c R08: 0000000000000515 R09: 0000000000000003
Mar 05 13:42:34 hostname kernel: R10: 0000000000000001 R11: 0000000000003c00 R12: ffff8cdc0e5bf480
Mar 05 13:42:34 hostname kernel: R13: 0000000000000000 R14: ffff8cdc0e5bf000 R15: ffff8cdc200b2080
Mar 05 13:42:34 hostname kernel: FS:  0000000000000000(0000) GS:ffff8cdc26500000(0000) knlGS:0000000000000000
Mar 05 13:42:34 hostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 05 13:42:34 hostname kernel: CR2: 00007fe15a0d3000 CR3: 000000019f20a001 CR4: 00000000003606e0
Mar 05 13:42:34 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 05 13:42:34 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 05 13:42:34 hostname kernel: Call Trace:
Mar 05 13:42:34 hostname kernel:  <IRQ>
Mar 05 13:42:34 hostname kernel:  ? qdisc_put_unlocked+0x30/0x30
Mar 05 13:42:34 hostname kernel:  ? qdisc_put_unlocked+0x30/0x30
Mar 05 13:42:34 hostname kernel:  call_timer_fn+0x2d/0x150
Mar 05 13:42:34 hostname kernel:  ? qdisc_put_unlocked+0x30/0x30
Mar 05 13:42:34 hostname kernel:  run_timer_softirq+0xaec/0xce0
Mar 05 13:42:34 hostname kernel:  __do_softirq+0x111/0x374
Mar 05 13:42:34 hostname kernel:  ? hrtimer_interrupt+0x235/0x3e0
Mar 05 13:42:34 hostname kernel:  irq_exit+0xc9/0x120
Mar 05 13:42:34 hostname kernel:  smp_apic_timer_interrupt+0xa6/0x1a0
Mar 05 13:42:34 hostname kernel:  apic_timer_interrupt+0xf/0x20
Mar 05 13:42:34 hostname kernel:  </IRQ>
Mar 05 13:42:34 hostname kernel: RIP: 0010:cpuidle_enter_state+0xc9/0x850
Mar 05 13:42:34 hostname kernel: Code: e8 8c b0 85 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 00 06 00 00 31 ff e8 3e 09 8d ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 1f 04 00 00 49 63 d4 4c 2b 6c 24 10 48 8d 04 52 48
Mar 05 13:42:34 hostname kernel: RSP: 0018:ffffb393000dbe50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Mar 05 13:42:34 hostname kernel: RAX: ffff8cdc26500000 RBX: ffff8cdc26537800 RCX: 000000000000001f
Mar 05 13:42:34 hostname kernel: RDX: 0000000000000000 RSI: 000000002f32988b RDI: 0000000000000000
Mar 05 13:42:34 hostname kernel: RBP: ffffffff8e8bea60 R08: 00000a3f27ed44df R09: 00000a3f251f7ba7
Mar 05 13:42:34 hostname kernel: R10: 0000000000000007 R11: 0000000000000007 R12: 0000000000000008
Mar 05 13:42:34 hostname kernel: R13: 00000a3f27ed44df R14: 0000000000000008 R15: ffff8cdc22a98000
Mar 05 13:42:34 hostname kernel:  cpuidle_enter+0x29/0x40
Mar 05 13:42:34 hostname kernel:  do_idle+0x20c/0x2c0
Mar 05 13:42:34 hostname kernel:  cpu_startup_entry+0x19/0x20
Mar 05 13:42:34 hostname kernel:  start_secondary+0x1c6/0x220
Mar 05 13:42:34 hostname kernel:  secondary_startup_64+0xb6/0xc0
Mar 05 13:42:34 hostname kernel: ---[ end trace 358d3d81e0691439 ]---
Mar 05 13:42:34 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Mar 05 13:42:40 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Mar 05 13:42:46 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Mar 05 13:42:51 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Mar 05 13:42:56 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Comment 15 Jamin W. Collins 2020-03-08 20:43:20 UTC
I've been encountering this problem with every relatively recent (4.9+) kernel, and possibly older ones as well.

System: Lenovo W530

USB adapter: 
Cable Matters 3 Port USB 3.0 Hub with Ethernet (USB Hub with Ethernet, Gigabit Ethernet USB Hub ) Supporting 10/100/1000 Mbps Ethernet Network in Black
https://smile.amazon.com/gp/product/B01J6583NK/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

I've encountered the problem with Arch's main linux kernels and their LTS builds.

The interface seems to have trouble once it is put under any sort of load (30% or more utilization) on the host system. Removing and reloading the module can sometimes temporarily improve things, but (from what I've seen) the issue always returns within a few minutes to an hour.
Comment 17 RussianNeuroMancer 2020-03-09 18:26:12 UTC
So if it's the same issue, there are at least several workarounds: 

"usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)
Install tlp and change USB_BLACKLIST option in /etc/default/tlp to "0bda:8153" (from second askubuntu link)
Patch /drivers/usb/core/quirks.c with the following line (mentioned in the tlp bug report)
{ USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

Unfortunately, this week I don't have access to a Dell WD15 docking station. Can anyone else try at least the first or second workaround?
Comment 18 BniceJada 2020-03-12 05:47:10 UTC
(In reply to RussianNeuroMancer from comment #17)
> So if it's same issue there is at least several workarounds: 
> 
> "usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)
> Install tlp and change USB_BLACKLIST option in /etc/default/tlp to
> "0bda:8153" (from second askubuntu link)
> Patch /drivers/usb/core/quirks.c with following line (mentioned in tlp
> bugreport)
> { USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

I can confirm that adding "0bda:8153" to USB_BLACKLIST in my tlp.conf seems to work fine for me. Prior to this change I lost the network connection each night, and now the connection has stayed up for the last two nights (three days).
Comment 19 Peter 2020-03-13 10:20:51 UTC
(In reply to RussianNeuroMancer from comment #17)

> "usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)

I can confirm that adding "usbcore.quirks=0bda:8153:k" to kernel boot options worked for me.
Comment 20 Hans de Goede 2020-03-13 11:04:25 UTC
So reading through this bug report, the solution, or at least a workaround would seem to be to add USB_QUIRK_NO_LPM entries for the troublesome rtl8152 / rtl8153 based ethernet adapters to drivers/usb/core/quirks.c. There actually already is at least one line in there for a dock with a r8153 nic:

        /* Microsoft Surface Dock Ethernet (RTL8153 GigE) */
        { USB_DEVICE(0x045e, 0x07c6), .driver_info = USB_QUIRK_NO_LPM },

There is mention of several docks here; but upon checking various logs, they all seem to use the generic realtek usb-id for the RTL8153 GigE NIC.

So it seems that the solution is adding the following lines to: drivers/usb/core/quirks.c :

        /* Generic RTL8153 GigE adapters */
        { USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

I will submit a patch upstream for this.
Comment 21 Peter 2020-03-13 11:37:49 UTC
Unfortunately I was too enthusiastic about this. 
I wrote my comment after 1 day of working and 1 night of downloading a huge amount of data without problems. But after that, using the icecream distributed compiler daemon crashed my connection again. 

So it seems to be better but not solved for me.
Comment 22 Hans de Goede 2020-03-13 12:04:16 UTC
(In reply to Peter from comment #21)
> Unfortunately I was to enthusiastic about this. 
> I wrote my comment after 1 day of working and 1 night of downloading huge
> amount of big data without problems. But after that using icecream
> distributed compiler daemon again crashed my connection. 
> 
> So it seams to be better but not solved for me.

I'm sorry to hear that the issue is not 100% resolved. Still, I've found enough other bug reports where people are having success with this option when used with an RTL8153 device that I believe it is worthwhile to submit a patch for this upstream, see e.g.:
https://bugzilla.redhat.com/show_bug.cgi?id=1713657
Comment 23 RussianNeuroMancer 2020-03-13 13:29:52 UTC
> https://bugzilla.redhat.com/show_bug.cgi?id=1713657

I wonder why blacklist in tlp didn't help him, but usbcore.quirks does.
Comment 24 RussianNeuroMancer 2020-03-13 14:21:22 UTC
> But after that using icecream distributed compiler daemon again crashed my
> connection. 

> So it seams to be better but not solved for me.

Try this:

1. remove lines 737-737 here https://github.com/torvalds/linux/blob/0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733

2. remove lines 6900 and 6901 here https://github.com/torvalds/linux/blob/0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900

Back in the Linux 4.18/4.19 days that allowed me to work around a similar issue on an HP Elite x2 1013 G3 with a Belkin USB-C Express Dock 3.1 HD.
Comment 25 Hans de Goede 2020-03-13 14:29:54 UTC
(In reply to RussianNeuroMancer from comment #24)
> Try this:
> 
> 1. remove lines 737-737 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733
> 
> 2. remove lines 6900 and 6901 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900
> 
> Back in Linux 4.18/4.19 days that allowed me to workaround similar issue on
> HP Elite x2 1013 G3 and Belkin USB-C Express Dock 3.1 HD.

Hmm, so in essence that swaps the driver which is specifically made for the RTL8153 with the generic USB ethernet class driver.

Although it might be interesting to try that, there are known issues with it.

E.g. with a Lenovo Thunderbolt 3 gen 2 dock, when the laptop is turned off while connected to the dock, most of the dock is turned off, but the ethernet card still has power (for wake-on-LAN, I guess). When using the cdc_ether driver, the RTL8153 NIC will then start spamming the network as fast as it can after the laptop has been turned off, which in my case made my entire (wired) home network unusable (*).

So I actually sent a patch upstream doing the opposite: adding the Lenovo-specific USB IDs for the RTL8153 to the blacklist in cdc_ether and to the white/device-id list in r8152.c, which stopped the dock from jamming my wired network after the laptop was turned off.

*) I'm using a cheap unmanaged switch; a better switch may have kept the network at least somewhat usable.
Comment 26 Marcus Sundman 2020-04-30 02:11:35 UTC
(In reply to RussianNeuroMancer from comment #24)
> > But after that using icecream distributed compiler daemon again crashed my
> > connection. 
> 
> > So it seams to be better but not solved for me.
> 
> Try this:
> 
> 1. remove lines 737-737 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733
> 
> 2. remove lines 6900 and 6901 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900
> 
> Back in Linux 4.18/4.19 days that allowed me to workaround similar issue on
> HP Elite x2 1013 G3 and Belkin USB-C Express Dock 3.1 HD.

My 0bda:8153 also stops working with the cdc_ether driver (without it saying anything in syslog).
Blacklisting 0bda:8153 in TLP didn't work.
Adding 0bda:8153 quirks kernel parameter didn't work.
Using the newest r8152.53.56-2.12.0 driver from realtek didn't work.
Comment 27 Michiel Janssens 2020-05-01 08:38:23 UTC
Similar issues here with a Dell dock WD15, connected from Dell XPS 13 9360.
For several months the wired connection from the dock (0bda:8153) has been dying without the network stack being notified. A reboot after this waits endlessly for services to stop. Sometimes the GNOME GUI locks up shortly after I log back in to the system and am presented with the issue. I have to do REISUB to get the system working again.
The issue doesn't appear while I'm working on the system; it mostly happens when the system is left running by itself for a while. I haven't found a way to actually trigger it.

At the moment I'm running openSUSE Tumbleweed with kernel 5.6.6, issue is still happening.
I tried quirks, but no result.
For several days I have now been testing with usbcore.autosuspend=-1 and have left the system running for longer periods. The issue hasn't happened so far.

Side note:
Commit 75d7676ead19b1fbb5e0ee934c9ccddcb666b68c doesn't seem to have fixed the message "Tx status -71" from the original bug reporter. (Tx timeout, in my case)
That still happens once in a while.
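
For anyone repeating the usbcore.autosuspend=-1 test, it may be worth confirming that the parameter actually took effect after boot. The sketch below assumes the value is exposed at /sys/module/usbcore/parameters/autosuspend and that a negative value means autosuspend is disabled by default; the helper names are made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the usbcore autosuspend delay (in seconds).
AUTOSUSPEND_PARAM = Path("/sys/module/usbcore/parameters/autosuspend")


def autosuspend_disabled_from_text(text):
    """Interpret the parameter file contents: negative delay means disabled."""
    return int(text.strip()) < 0


def autosuspend_disabled(param_path=AUTOSUSPEND_PARAM):
    """Read the parameter file and report whether autosuspend is disabled."""
    return autosuspend_disabled_from_text(Path(param_path).read_text())


if __name__ == "__main__":
    if AUTOSUSPEND_PARAM.exists():
        print("autosuspend disabled:", autosuspend_disabled())
    else:
        print("usbcore parameter not found (not Linux, or usbcore not available)")
```

Individual devices can still carry their own power settings, so this only checks the global default.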
Comment 28 Marcus Sundman 2020-05-01 21:42:27 UTC
The usbcore.autosuspend=-1 kernel parameter doesn't resolve it for me.
Also, I can trigger the problem in seconds, simply by reading at gigabit speeds.
Comment 29 Hans de Goede 2020-05-04 13:05:08 UTC
At least for those seeing issues with Dell's WD15 dock, I think that trying something similar to this quirk might help:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b63e48fb50e1ca71db301ca9082befa6f16c55c4

To try this, first do:

lsusb -t

To find the Bus and Dev number of any USB hub(s) inside the dock.

Then do:

lsusb

And lookup the same Bus and Dev number to get the vendor- and product-id used for the hub, e.g. 0bda:0487

Then try booting with this added to your kernel commandline:

usbcore.quirks=0bda:0487:k

Replace 0bda:0487 with the <vend>:<prod> ids for your hub (from the lsusb output). If you want to try this on more than one USB device, you can specify the NO_LPM quirk for multiple USB devices like this:

usbcore.quirks=0bda:0487:k,0bda:0488:k

Please give this a try and see if that helps. Also note that the same thing can be used to set the NO_LPM quirk on the USB ethernet-chip itself if it has a different USB-id which is not yet in the kernel's quirks list.
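
The steps above can be sketched as a small script: parse plain lsusb output into vendor:product ids, then assemble the usbcore.quirks string with the "k" (NO_LPM) flag. This is only an illustration under the assumption that lsusb prints lines in the usual "Bus NNN Device NNN: ID vvvv:pppp name" form; the function names are hypothetical, and choosing which ids belong to the dock's hubs still has to be done by hand from lsusb -t.

```python
import re

# Matches e.g. "Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. ..."
LSUSB_RE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def parse_lsusb(text):
    """Return a list of dicts (bus, dev, id, name) for each lsusb line."""
    devices = []
    for line in text.splitlines():
        m = LSUSB_RE.match(line.strip())
        if m:
            bus, dev, vendor, product, name = m.groups()
            devices.append({
                "bus": int(bus),
                "dev": int(dev),
                "id": f"{vendor.lower()}:{product.lower()}",
                "name": name,
            })
    return devices


def build_quirks_param(ids, flag="k"):
    """Build a usbcore.quirks= kernel parameter; flag 'k' requests NO_LPM."""
    return "usbcore.quirks=" + ",".join(f"{usb_id}:{flag}" for usb_id in ids)


if __name__ == "__main__":
    sample = ("Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. "
              "RTL8153 Gigabit Ethernet Adapter")
    print(parse_lsusb(sample))
    print(build_quirks_param(["0bda:0487", "0bda:0488"]))
```

The resulting string still needs to be added to the kernel command line through the bootloader configuration, which differs per distribution.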
Comment 30 Marcus Sundman 2020-05-04 22:41:26 UTC
(In reply to Hans de Goede from comment #29)
> Replacing the 0bda:0487 with the <vend>:<prod> ids for your hub (from the
> lsusb output). If you want to try this on more then one USB device, you can
> specify the NO_LPM quirk for multiple USB devices like this:
> 
> usbcore.quirks=0bda:0487:k,0bda:0488:k

This didn't work.

I have 3 devices:
Bus 003 Device 009: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 003 Device 008: ID 0bda:0411 Realtek Semiconductor Corp. 4-Port USB 3.0 Hub
Bus 002 Device 006: ID 0bda:5411 Realtek Semiconductor Corp. 4-Port USB 2.0 Hub

I added these kernel params:
usbcore.quirks=0bda:8153:k,0bda:5411:k,0bda:0411:k usbcore.autosuspend=-1

Still fails with either
Rx status -71
or
Tx status -71
after reading 50 MB/s over the network for a minute or few.

> Please give this a try and see if that helps. Also note that the same thing
> can be used to set the NO_LPM quirk on the USB ethernet-chip itself if it
> has a different USB-id which is not yet in the kernel's quirks list.

I'm not sure how to do that. As far as I can tell my ethernet chip is at 0bda:8153 (which in my case is at usb@3:1.4, which maps to device 9, which maps to 0bda:8153).
Comment 31 Hans de Goede 2020-05-05 13:11:56 UTC
@Marcus Sundman, right you have already set the flag for your ethernet usb controller by adding the 0bda:8153:k part to the quirks. So it seems that at least for you setting the NO_LPM flag does not help.

Does your dock have updateable firmware? If so, you may want to try updating it. The first generation Thunderbolt docks from all vendors were notoriously buggy and they all need the latest firmware to work at least somewhat reliably. Getting the latest firmware is also strongly advised for people using Windows, since there really were quite a few issues with these devices which are fixed with firmware updates.

Yes, I wrote somewhat reliable, the best fix for thunderbolt dock issues often is getting a second generation or newer dock :(
Comment 32 Michiel Janssens 2020-05-05 14:23:03 UTC
(In reply to Hans de Goede from comment #29)
Thanks for posting this instruction!
I already had seen the commit for the WD19, but it wasn't clear how I should investigate that on my system.
The WD15 dock adds 2 USB buses, each with a Microchip USB hub.
I removed the usbcore.autosuspend=-1 parameter and will test for several days with usbcore.quirks=0424:5537:k, which is the hub which has the 0bda:8153 as child.
I will add attachments with my lsusb output.
Comment 33 Michiel Janssens 2020-05-05 14:27:36 UTC
Created attachment 288917
lsusb output mjanssens wd15 xps9360
Comment 34 Michiel Janssens 2020-05-06 10:41:55 UTC
It didn't take days to get results.

Just the hub where 0bda:8153 is child
usbcore.quirks=0424:5537:k
result: 0bda:8153 dies after a while, without log entry, needed REISUB to reboot

Both hubs which are added by connecting WD15
usbcore.quirks=0424:5537:k,0424:2137:k
result: 0bda:8153 dies after a while, without log entry, needed REISUB to reboot

So I'm back to using usbcore.autosuspend=-1.
Please advise if I missed something I could test (or used an incorrect device ID).
Comment 35 Hans de Goede 2020-05-06 10:43:36 UTC
(In reply to Michiel Janssens from comment #34)
> It didn't take days to get results.
> 
> Just the hub where 0bda:8153 is child
> usbcore.quirks=0424:5537:k
> result: 0bda:8153 dies after a while, without log entry, needed REISUB to
> reboot
> 
> Both hubs which are added by connecting WD15
> usbcore.quirks=0424:5537:k,0424:2137:k
> result: 0bda:8153 dies after a while, without log entry, needed REISUB to
> reboot
> 
> So I'm back to using usbcore.autosuspend=-1.
> Please advise if I missed something (or incorrect dev id) I could test.

There have been a lot of firmware updates for the wd15, do you have these all applied?
Comment 36 Michiel Janssens 2020-05-06 11:32:54 UTC
(In reply to Hans de Goede from comment #35)

> There have been a lot of firmware updates for the wd15, do you have these
> all applied?

Good catch, I'm on 1.0.4 according to fwupdmgr. The latest is 1.0.6 on the Dell site.
Unfortunately the WD15 appears not (yet) to be fully supported via fwupdmgr, so Windows is the only option, sigh. I will try to update, test again and report.
The BIOS is current, by the way.
Comment 37 Marcus Sundman 2020-05-07 00:11:04 UTC
(In reply to Hans de Goede from comment #31)
> @Marcus Sundman, right you have already set the flag for your ethernet usb
> controller by adding the 0bda:8153:k part to the quirks. So it seems that at
> least for you setting the NO_LPM flag does not help.
I also tried without usbcore.autosuspend=-1 but that also didn't help.

> Does your dock have updateable firmware? If so you may want to try to update
> the firmware.
It's a LogiLink UA0173A, and it doesn't seem to have any firmware available (only newer drivers, which I already tried).
Comment 38 RussianNeuroMancer 2020-05-07 03:49:50 UTC
Just for the record, I was able to reproduce this issue even on a NanoPi-M1 (Allwinner H3) with Linux 5.4.32 attached to the Belkin USB-C Express Dock 3.1 HD F4U093 (I did this for convenience, just to quickly get a working keyboard and mouse without moving their cables from the dock to the board). The quirk has been included in 5.4 since 5.4.28, so it was already applied. Unfortunately, I didn't expect this issue to be reproducible with the NanoPi-M1 board, so I didn't save the lsusb -t output before/after this happened.
Comment 39 Michiel Janssens 2020-05-07 10:30:09 UTC
(In reply to Michiel Janssens from comment #36)

I ran several updaters from Dell under Windows. My WD15 firmware components (4 of them) were already current; apparently the main version is some sort of wrapper. So no updates to the BIOS or WD15 firmware are possible.
At the moment I run kernel 5.6.8, so I ran all tests again:
- with or without usbcore.quirks=0424:5537:k,0424:2137:k the NIC dies after a while
- with usbcore.autosuspend=-1 the NIC remains alive
Comment 40 Marcus Sundman 2020-06-13 02:01:35 UTC
Still the same problem with Realtek's new driver, r8152.53.56-2.13.0, on ubuntu's 5.4.0-37-generic with usbcore.autosuspend=-1.

It fails with 'Tx status -71' or 'Rx status -71':
> net_ratelimit: 22 callbacks suppressed
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> ...

But sometimes that quickly turns into this:
> xhci_hcd 0000:03:00.0: WARN: TRB error for slot 3 ep 3 on endpoint
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -84
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> ...

I've also tried adjusting the nic's Rx ring size from 100 to 20 or 2000, but still the same crash seconds after starting a gigabit speed download.
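
The negative numbers in these messages are Linux errno values reported as the URB status. A small Python sketch that decodes the ones seen in this thread; the table uses the generic Linux numbering, and the USB-specific remarks paraphrase the kernel's USB error-code documentation rather than anything taken from the r8152 source:

```python
import re

# Generic Linux errno numbering; other operating systems use different values.
LINUX_ERRNO = {
    2: ("ENOENT", "URB was unlinked (cancelled) before it completed"),
    22: ("EINVAL", "invalid argument or state when submitting the URB"),
    71: ("EPROTO", "low-level protocol error on the bus"),
    84: ("EILSEQ", "CRC mismatch or no response from the device"),
    108: ("ESHUTDOWN", "device or host controller has been shut down"),
}

STATUS_RE = re.compile(r"(Tx status|Rx status|failed tx_urb) -(\d+)")

def decode(line):
    """Return a readable form of an r8152 status line, or None if not one."""
    m = STATUS_RE.search(line)
    if not m:
        return None
    name, meaning = LINUX_ERRNO.get(int(m.group(2)), ("unknown", "not in table"))
    return f"{m.group(1)}: {name} ({meaning})"

for line in (
    "r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71",
    "r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -84",
    "r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22",
):
    print(decode(line))
```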
Comment 41 Nikolay Kichukov 2020-06-16 15:23:32 UTC
GNU/Gentoo, 64bit here, kernel 5.7.2, same problem on Lenovo 40AS USB-C dock:

None of the suggested "workarounds" helped, here is the lsusb tree:

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
        |__ Port 1: Dev 4, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
        |__ Port 3: Dev 3, If 0, Class=Hub, Driver=hub/4p, 10000M

And the patch applied to the kernel (the IDs differ):
cat /etc/portage/patches/sys-kernel/gentoo-sources-5.7.2/lenovo-usbc-dock-rtl-ethernet-quirk.patch 
--- a/drivers/usb/core/quirks.c	2020-06-01 01:49:15.000000000 +0200
+++ b/drivers/usb/core/quirks.c	2020-06-15 12:01:39.028377907 +0200
@@ -384,6 +384,11 @@
 	/* Generic RTL8153 based ethernet adapters */
 	{ USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },
 
+	/* Lenovo USB-C Ethernet RTL8153 based ethernet adapters */
+       { USB_DEVICE(0x1d6b, 0x0003), .driver_info = USB_QUIRK_NO_LPM },
+	{ USB_DEVICE(0x17ef, 0xa391), .driver_info = USB_QUIRK_NO_LPM },
+	{ USB_DEVICE(0x17ef, 0xa387), .driver_info = USB_QUIRK_NO_LPM },
+
 	/* Action Semiconductor flash disk */
 	{ USB_DEVICE(0x10d6, 0x2200), .driver_info =
 			USB_QUIRK_STRING_FETCH_255 },

and booting with:
usbcore.quirks=17ef:a387:k,17ef:a391:k,1d6b:0003:k

or 

usbcore.autosuspend=-1

does not help.

The same problem happens on laptops running Windows when connected to these Lenovo docks.
Comment 42 Peter Ries 2020-07-30 09:08:57 UTC
Hi, I'm glad I found this (old) bug. It affects me as well and is still current.

I'm running Kubuntu 20.04 with Mainline Kernel 5.7.9 and tried 5.8.0-rc7. My brand new Thinkpad T14 (AMD Version, 32GB RAM) is connected to a Thinkpad USB-C dock Gen 2. Laptop and Dock run latest firmware.

**Testcase** is copying a large video file (~ 2GB) from Laptop to NAS.
**Error** lots of "r8152 5-1.1:1.0 enx482ae36d721f: Tx status -71" in dmesg log after a few seconds. Connection lost. Need to use either WiFi or reconnect dock.

To find the culprit:
Copying via WiFi connection or the laptop's LAN port (r8169) works.
Using another dock (Dell docking station) works.
Connecting a Dell Windows laptop to the Lenovo dock works.
The hardware must be OK then!

So I suppose the cause lies in the r8152 driver.

As I can reproduce this "on demand" I could provide more information/logs if you tell me what's needed and how to do it ;)

BR
Peter
Comment 43 Peter Ries 2020-08-07 06:24:43 UTC
update to my previous comment...

Kubuntu's Network Manager obviously set up the USB Network Adapter with "Link Negotiation: ignore" for whatever reason.

I changed it to "Auto" and now it is more stable - even solid. I just pumped a 20G VM Diskimage to NAS and no error occurred. I'm optimistic that this setting solved my problem.

Time will tell ...
Comment 44 Peter Ries 2020-08-08 17:20:53 UTC
(In reply to Peter Ries from comment #43)
> update to my previous comment...
> 
> Kubuntu's Network Manager obviously set up the USB Network Adapter with
> "Link Negotiation: ignore" for whatever reason.
> 
> I changed it to "Auto" and now it is more stable - even solid. I just pumped
> a 20G VM Diskimage to NAS and no error occurred. I'm optimistic that this
> setting solved my problem.
> 
> Time will tell ...

update: no problems anymore. Stable for 1.5 days of constant usage :)
Comment 45 smihael 2020-08-17 18:25:56 UTC
The same problem occurs with Anker Ethernet Adapter combined with USB hub (https://www.anker.com/products/variant/aluminum-3port-usb-30-and-ethernet-hub/A7514041) on 4.15.0-38-generic kernel in KDE neon (based on Ubuntu 18.04).

dmesg outputs "r8152 4-3.3:1.0 eth0: Tx status -71" multiple times.

lsusb -t output

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 3: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M

lsusb output

Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. 
Bus 004 Device 002: ID 2109:0812 VIA Labs, Inc. VL812 Hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

None of the workarounds suggested above that involve changing boot parameters helped (usbcore.autosuspend=-1; usbcore.quirks=2109:0812:k,0bda:8153:k). Actually, with usbcore.quirks I couldn't even boot, as the process hung at the "Switching to clocksource tsc" message.

The adapter works flawlessly with 4.14.x kernel.

Interestingly, the adapter works fine when certain devices (e.g. a wireless mouse's receiver) are connected to the USB hub.
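
Several comments in this thread need the IDs of every hub between the root hub and the r8152 device (for example to build a usbcore.quirks entry). The two outputs above can be combined mechanically. The following Python sketch assumes the 4-spaces-per-level indentation of lsusb -t as pasted here; the function names are illustrative only:

```python
import re

ROOT_RE = re.compile(r"^/:\s+Bus (\d+)\.Port \d+: Dev (\d+),")
NODE_RE = re.compile(r"^(\s*)\|__ Port \d+: Dev (\d+), If \d+, .*Driver=([^,]*),")
LSUSB_RE = re.compile(r"^Bus (\d+) Device (\d+): ID ([0-9a-f]{4}:[0-9a-f]{4})")

def ids_by_bus_dev(lsusb_text):
    """Map (bus, device) numbers to 'vendor:product' from plain lsusb output."""
    ids = {}
    for line in lsusb_text.splitlines():
        m = LSUSB_RE.match(line.strip())
        if m:
            ids[(int(m.group(1)), int(m.group(2)))] = m.group(3)
    return ids

def paths_to_driver(tree_text, driver="r8152"):
    """Return, per matching device, the (bus, dev) chain from root hub down."""
    paths, bus, stack = [], None, []
    for line in tree_text.splitlines():
        root = ROOT_RE.match(line)
        if root:
            bus, stack = int(root.group(1)), [int(root.group(2))]
            continue
        node = NODE_RE.match(line)
        if node and bus is not None:
            depth = len(node.group(1)) // 4  # 1 = directly below the root hub
            del stack[depth:]
            stack.append(int(node.group(2)))
            if node.group(3) == driver:
                paths.append([(bus, dev) for dev in stack])
    return paths

TREE = "\n".join([
    "/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M",
    "    |__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M",
    "        |__ Port 3: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M",
])
LSUSB = "\n".join([
    "Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. ",
    "Bus 004 Device 002: ID 2109:0812 VIA Labs, Inc. VL812 Hub",
    "Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub",
])

ids = ids_by_bus_dev(LSUSB)
for path in paths_to_driver(TREE):
    print(" -> ".join(ids.get(node, "?") for node in path))
# prints: 1d6b:0003 -> 2109:0812 -> 0bda:8153
```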
Comment 46 Weber K. 2020-09-27 05:07:54 UTC
Hi!

I have the same error, but only on a USB 3.0 port.

My hub doesn't support LPM, so I think I have a different problem (usbcore.quirks=0bda:8153:k).

I've tried usbcore.quirks=0bda:8153:j and I got no -71 error.

HTH

Weber Kai
Comment 47 Weber K. 2020-09-27 08:09:35 UTC
Sorry fellows, please ignore my previous comment.
After 40 minutes the network stopped.
Comment 48 Weber K. 2020-09-28 02:23:47 UTC
Hi fellows,

I think read_bulk_callback should handle the EPROTO error code...
But I don't know exactly how... With EPROTO added next to ESHUTDOWN, the driver becomes more stable...

Thanks
HTH
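
To make the suggestion above concrete, here is a toy Python model of a receive-completion handler that can optionally treat EPROTO the same way as ESHUTDOWN. It is only one possible reading of the idea: the action labels and the rx_action name are hypothetical, and none of this is copied from the r8152 source.

```python
# Linux errno values (generic numbering).
ENOENT, EPROTO, ESHUTDOWN = 2, 71, 108

def rx_action(status, eproto_like_shutdown=False):
    """Return a label for what a toy RX completion handler would do."""
    if status == 0:
        return "process data and resubmit"
    if status == -ESHUTDOWN or (eproto_like_shutdown and status == -EPROTO):
        return "detach interface"
    if status == -ENOENT:
        return "URB unlinked, do nothing"
    return "log status and resubmit"

# Compare the handling of -71 with and without the proposed change.
for flag in (False, True):
    print(flag, rx_action(-EPROTO, eproto_like_shutdown=flag))
```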
Comment 49 Pekka Järveläinen 2020-10-16 13:02:22 UTC
Hi, before my crash I have many
[ 4478.945334] perf: interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 4602.649511] CPU2: Core temperature above threshold, cpu clock throttled (total events 
[ 4626.376508] CPU5: Core temperature/speed normal
[ 4681.081753] perf: interrupt took too long (3202 > 3131), lowering kernel.perf_event_max_sample_rate to 62000
[ 4895.218338] perf: interrupt took too long (4065 > 4002), lowering kernel.perf_event_max_sample_rate to 49000

Can they be part of the problem?

I have had three crashes, all during Zoom meetings, which means high traffic with video going in both directions.

Pekka
Comment 50 Emtee 2020-11-11 15:39:23 UTC
I have the same issue with three

Bus 002 Device 004: ID 13b1:0041 Linksys Gigabit Ethernet Adapter
Bus 002 Device 003: ID 13b1:0041 Linksys Gigabit Ethernet Adapter
Bus 002 Device 002: ID 13b1:0041 Linksys Gigabit Ethernet Adapter

devices on platform:

Linux debian 4.19.155-redundant #1 SMP PREEMPT Mon Nov 9 01:54:50 CET 2020 x86_64 GNU/Linux
debian
    description: Mini PC
    product: NUC7CJYS
    vendor: Intel(R) Client Systems
    version: J67993-403
    serial: G6JY936009MK
    width: 64 bits
    capabilities: smbios-3.1.1 dmi-3.1.1 smp vsyscall32
    configuration: boot=normal chassis=mini family=JY uuid=8A72A51B-79C4-85CA-66A9-1C697A088052
  *-core
       description: Motherboard
       product: NUC7JYB
       vendor: Intel Corporation
       physical id: 0
       version: J67970-402
       serial: GEJY93500752
       slot: Default string
     *-firmware
          description: BIOS
          vendor: Intel Corp.
          physical id: 0
          version: JYGLKCPX.86A.0057.2020.1020.1637

Latest firmware.

It looks to be a USB enumeration issue here. After rebooting the device, the problem always shows up after a while; after physically turning the unit off and on, the issue no longer appears and it can run stable forever.

I basically made a habit of shutting down entirely when I need to reboot.
Comment 51 Yorick de Wid 2020-11-27 21:50:43 UTC
Dell Inc. XPS 15 9570/0HWTMH, BIOS 1.17.1 07/09/2020

Kernel: 5.8.0-7630-generic
Chipset: RTL8153b-2 (version 9)


After a full day of debugging the r8152 driver on a USB-C dock, it does look like the RTL 8152 chipset is unable to keep pace with the transmission queue (tx_queue). The driver keeps sending URB blocks towards the chipset, but at some point the chipset no longer fires status interrupts. This stalls the write_bulk_callback, which in turn sets off the netdev timeout watchdog. The timeout handler then tries to reset the USB device, but the RTL chipset is no longer responding. Power cycling the USB port is the only way to reset the chip. The behavior is quite deterministic. When sending bulk packets over the 1000TX interface (Gigabit, that is) it doesn't take long before the watchdog reports queue congestion. If I simulate a lockup by deliberately slowing down the tasklet, the timeout and TX status -2/-71 are triggered almost immediately.

The netdev timeout resetting the USB device seems a bit silly. There is no lock held when the USB device is reset, and the timeout handler is invoked every 5 * HZ jiffies (about every 5 seconds). A better solution would be to power cycle the USB port.

Because this bug has been away for a while and now reappears, it could be a firmware issue. The most recent firmware was updated a little more than a year ago.

Y.
Comment 52 Emtee 2020-11-28 01:14:05 UTC
All my issues with the USB Adapters have disappeared after I used these kernel parameters;

usbcore.old_scheme_first=1 usbcore.use_both_schemes=0 usbcore.autosuspend=-1 pci_aspm=off

Not sure which of them, or which combination, actually does it.

Been running for a week 24/7 with them now.
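
When testing a set of parameters like this, it helps to confirm that they actually reached the kernel command line. A rough Python sketch follows; the helper names are illustrative. The parameter list is copied exactly as posted above; note that the kernel documents its ASPM switch as pcie_aspm, so the spelling of the last entry may be worth double-checking.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params

def not_applied(cmdline, wanted):
    """Return the wanted parameters that are missing or have another value."""
    active = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if active.get(k) != v}

# Copied as posted in the comment above.
WANTED = {
    "usbcore.old_scheme_first": "1",
    "usbcore.use_both_schemes": "0",
    "usbcore.autosuspend": "-1",
    "pci_aspm": "off",
}

# On a running system the string can be read from /proc/cmdline instead.
sample = "root=/dev/sda1 ro quiet usbcore.autosuspend=-1 usbcore.old_scheme_first=1"
print(not_applied(sample, WANTED))
```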
Comment 53 Peter Ries 2020-12-13 07:20:04 UTC
Strangely, this issue reappeared some days ago. It was gone after I set link negotiation in Network Manager to "auto" (Kubuntu set it to "ignore" for some reason), and it worked flawlessly for around three months.

A week ago, while sending a 4 GB file to my NAS, the error reappeared, but it was gone after I set manual negotiation in Network Manager to 1 Gbit/full duplex. It then worked again for some days, but yesterday, with a lot of traffic in my LAN, a network sync immediately "killed" the USB-C dock network interface.

I didn't change anything besides regular apt updates and installing the latest mainline kernel from the 5.9.x branch, so I don't know what happened.

Maybe I'll give autosuspend a try...

Yorick de Wid's analysis https://bugzilla.kernel.org/show_bug.cgi?id=198931#c51 seems quite promising to be the root cause.
Comment 54 Peter Ries 2020-12-13 08:12:04 UTC
usbcore.autosuspend=-1 

didn't help with my Lenovo USB-C dock gen 2...
Comment 55 Yorick de Wid 2020-12-13 11:09:57 UTC
Peter Ries, can you specify which firmware version is being loaded? The driver writes the version to dmesg on load. For example: RTL8153b-2 (version 9).

It's hard to track exact firmware versions. If it happens to be a firmware issue we might be able to downgrade. I haven't tried an older version yet.

SHA1 checksum on 5.8.0-7630-generic (which should be the latest):

3008299c2fee3f5a5e3b2d8e16919d230204542c  rtl8105e-1.fw
c6fcde458093a4ef60b534feacc9dd564098ff9b  rtl8106e-1.fw
221d833a22040e4014bf34b31481712180b77594  rtl8106e-2.fw
9d390948663bf6885d86586588c428186d5dff7e  rtl8107e-1.fw
a065c863146d8216d8cc84a6b754968613848b32  rtl8107e-2.fw
5da573149e80587668e1d4bcbcbced184e51ac03  rtl8125a-3.fw
21c7c428112bd9e24713192302513a95ba41ed5e  rtl8125b-1.fw
a588787b9ebeec9cbfdbd46612a63f53ad5b1d62  rtl8125b-2.fw
3d87c04720c4b4709e4673707c4c104e28be1c1b  rtl8153a-2.fw
e467098b1cbb04022805cd777eb66585022524a6  rtl8153a-3.fw
cce086e885091c348bf521924f306f240f8dcc08  rtl8153a-4.fw
2b268656c6cd7d03dc47ca8eaec2f31ca668c53c  rtl8153b-2.fw
9872f469227555937d4063b1420a0ff23790da59  rtl8168d-1.fw
0ab15a6c812fafc38dd896972fa5fcb46cca1068  rtl8168d-2.fw
61fdc2ba78caf36a6551554f089d1c964159d247  rtl8168e-1.fw
60e16292fd4eb90138a3e2061305030b4993de79  rtl8168e-2.fw
6c3721e8e5d19f62b3da13519e2496ffabb3ffb4  rtl8168e-3.fw
c7c01066ddfc0215ad8977af5a3cd654b6f7ed10  rtl8168f-1.fw
24bf10a38bcb1b4652f71653e41d1a444f303c3f  rtl8168f-2.fw
bf3495d9233f3abaceab194e732bdb9c350a68a5  rtl8168fp-3.fw
12ed6246c8c4d6344d4840acc11de30d7c3ff1ec  rtl8168g-1.fw
0ead82c11625a677600a589cc4590722ea2f6de7  rtl8168g-2.fw
36e09340d99f9290fd9cc62c48c11b8112558b8a  rtl8168g-3.fw
c012f50c24ef64dcc48615d584709f8094df6af7  rtl8168h-1.fw
439686559c1fa53820c5e740b71566ee874171ba  rtl8168h-2.fw
c4fda34e80b4124377a7554636327fd03e697ee2  rtl8402-1.fw
38a89b6f1b57795a470675184733aff335cb41ae  rtl8411-1.fw
57c4e659337aacc88a52e9940bc0246a0f02e47b  rtl8411-2.fw
Comment 56 Peter Ries 2020-12-13 15:59:59 UTC
Hi Yorick,

I found this line:

[   16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully


Is it what you're looking for? Otherwise let me know how to find out.

Thanks for investigating!
Comment 57 Peter Ries 2020-12-13 17:15:29 UTC
and this one, too:

[   16.883982] r8152 5-1.1:1.0 eth0: v1.11.11
Comment 58 Yorick de Wid 2020-12-13 19:38:19 UTC
> I found this line:
> 
> [   16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully

That's the one. The v9 chipset is also the version I'm running. It's too early to conclude, but it is consistent with the presumption that the rtl8153b-2 firmware contains a bug.
Comment 59 Patrick Decat 2020-12-16 08:36:48 UTC
Hi, I had this crash with my Dell XPS 9350 and a Dell USB 3.0 Ethernet adapter for the first time yesterday:

# lsusb | grep -i RTL8153
Bus 002 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

# journalctl -b -1 | grep "rtl8153"
Dec 15 15:36:40 patrickxps kernel: r8152 2-2:1.0: load rtl8153a-3 v2 02/07/20 successfully

# uname -a
Linux patrickxps 5.9.14-zen1-1-zen #1 ZEN SMP PREEMPT Sat, 12 Dec 2020 14:36:44 +0000 x86_64 GNU/Linux


Dec 15 18:18:50 patrickxps kernel: ------------[ cut here ]------------
Dec 15 18:18:50 patrickxps kernel: NETDEV WATCHDOG: enp0s20f0u2 (r8152): transmit queue 0 timed out
Dec 15 18:18:50 patrickxps kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x26b/0x280
Dec 15 18:18:50 patrickxps kernel: Modules linked in: rfcomm ccm cmac snd_hda_codec_hdmi algif_hash algif_skcipher af_alg bnep snd_hda_codec_realtek snd_hda_codec_generic cdc_ether usb>
Dec 15 18:18:50 patrickxps kernel:  intel_uncore psmouse soundcore rc_core processor_thermal_device i2c_i801 input_leds pcspkr tpm_tis i2c_smbus mei_me rfkill intel_gtt intel_rapl_comm>
Dec 15 18:18:50 patrickxps kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: P          IOE     5.9.14-zen1-1-zen #1
Dec 15 18:18:50 patrickxps kernel: Hardware name: Dell Inc. XPS 13 9350/0PWNCR, BIOS 1.13.0 02/10/2020
Dec 15 18:18:50 patrickxps kernel: RIP: 0010:dev_watchdog+0x26b/0x280
Dec 15 18:18:50 patrickxps kernel: Code: fa 1e 64 ff eb 87 4c 89 f7 c6 05 de c3 f7 00 01 e8 8a 02 fa ff 44 89 e9 4c 89 f6 48 c7 c7 b8 c2 df 86 48 89 c2 e8 6e 31 18 00 <0f> 0b e9 65 ff >
Dec 15 18:18:50 patrickxps kernel: RSP: 0018:ffffa234c019ceb0 EFLAGS: 00010282
Dec 15 18:18:50 patrickxps kernel: RAX: 0000000000000000 RBX: ffff89334d358400 RCX: 0000000000000000
Dec 15 18:18:50 patrickxps kernel: RDX: 0000000000000103 RSI: 0000000000000027 RDI: 00000000ffffffff
Dec 15 18:18:50 patrickxps kernel: RBP: ffff8933422da3dc R08: 0000000000000452 R09: 0000000000000004
Dec 15 18:18:50 patrickxps kernel: R10: 0000000000000001 R11: 0000000000007434 R12: ffff8933422da480
Dec 15 18:18:50 patrickxps kernel: R13: 0000000000000000 R14: ffff8933422da000 R15: ffff89334d358480
Dec 15 18:18:50 patrickxps kernel: FS:  0000000000000000(0000) GS:ffff893376d80000(0000) knlGS:0000000000000000
Dec 15 18:18:50 patrickxps kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 15 18:18:50 patrickxps kernel: CR2: 00007f7d5ef1b4c0 CR3: 0000000211fcc004 CR4: 00000000003706e0
Dec 15 18:18:50 patrickxps kernel: Call Trace:
Dec 15 18:18:50 patrickxps kernel:  <IRQ>
Dec 15 18:18:50 patrickxps kernel:  ? qdisc_put_unlocked+0x30/0x30
Dec 15 18:18:50 patrickxps kernel:  ? qdisc_put_unlocked+0x30/0x30
Dec 15 18:18:50 patrickxps kernel:  call_timer_fn+0x2d/0x150
Dec 15 18:18:50 patrickxps kernel:  run_timer_softirq+0x8e7/0xb50
Dec 15 18:18:50 patrickxps kernel:  __do_softirq+0xff/0x340
Dec 15 18:18:50 patrickxps kernel:  asm_call_irq_on_stack+0x12/0x20
Dec 15 18:18:50 patrickxps kernel:  </IRQ>
Dec 15 18:18:50 patrickxps kernel:  do_softirq_own_stack+0x5d/0x80
Dec 15 18:18:50 patrickxps kernel:  irq_exit_rcu+0xd2/0x120
Dec 15 18:18:50 patrickxps kernel:  sysvec_apic_timer_interrupt+0x47/0xe0
Dec 15 18:18:50 patrickxps kernel:  asm_sysvec_apic_timer_interrupt+0x12/0x20
Dec 15 18:18:50 patrickxps kernel: RIP: 0010:cpuidle_enter_state+0xdf/0x7f0
Dec 15 18:18:50 patrickxps kernel: Code: e8 16 87 7f ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 8c 05 00 00 31 ff e8 48 e3 86 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 >
Dec 15 18:18:50 patrickxps kernel: RSP: 0018:ffffa234c00dfea0 EFLAGS: 00000246
Dec 15 18:18:50 patrickxps kernel: RAX: ffff893376d80000 RBX: ffff893376db6800 RCX: 000000000000001f
Dec 15 18:18:50 patrickxps kernel: RDX: 0000000000000000 RSI: ffffffff86d4f968 RDI: ffffffff86d5a1f9
Dec 15 18:18:50 patrickxps kernel: RBP: ffffffff872cef60 R08: 000008dd96ed4562 R09: 0000000000000008
Dec 15 18:18:50 patrickxps kernel: R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000002
Dec 15 18:18:50 patrickxps kernel: R13: ffffffff872cf048 R14: 0000000000000002 R15: 000008dd96ed4562
Dec 15 18:18:50 patrickxps kernel:  cpuidle_enter+0x29/0x40
Dec 15 18:18:50 patrickxps kernel:  do_idle+0x1ed/0x280
Dec 15 18:18:50 patrickxps kernel:  cpu_startup_entry+0x19/0x20
Dec 15 18:18:50 patrickxps kernel:  secondary_startup_64+0xb6/0xc0
Dec 15 18:18:50 patrickxps kernel: ---[ end trace 32ac432b0caddcb1 ]---
Dec 15 18:18:50 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:18:56 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:19:02 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:19:07 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:19:12 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:19:19 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout

FWIW, I used to have another way more frequent crash with r8152 and another model of adapter before, see https://bugzilla.kernel.org/show_bug.cgi?id=200977#c19
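
The journal excerpt above shows the "Tx timeout" message repeating, and the spacing between repeats gives a rough idea of how often the watchdog fires. A few lines of Python can extract it. This is only a sketch: it assumes all lines are from the same day and uses the journalctl line format shown above.

```python
import re

TIMEOUT_RE = re.compile(r"(\d\d):(\d\d):(\d\d) \S+ kernel: .*Tx timeout")

def timeout_intervals(journal_lines):
    """Seconds between consecutive 'Tx timeout' lines (same-day timestamps)."""
    stamps = []
    for line in journal_lines:
        m = TIMEOUT_RE.search(line)
        if m:
            hours, minutes, seconds = (int(x) for x in m.groups())
            stamps.append(hours * 3600 + minutes * 60 + seconds)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

LOG = [
    "Dec 15 18:18:50 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout",
    "Dec 15 18:18:56 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout",
    "Dec 15 18:19:02 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout",
    "Dec 15 18:19:07 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout",
    "Dec 15 18:19:12 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout",
    "Dec 15 18:19:19 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout",
]
print(timeout_intervals(LOG))
# prints: [6, 6, 5, 5, 7]
```

For these six lines the gaps are 5 to 7 seconds, i.e. the timeout keeps firing at a steady pace without the link recovering.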
Comment 60 Yorick de Wid 2020-12-16 09:01:58 UTC
Patrick Decat, what's the checksum of rtl8153a-3.fw?

sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw
Comment 61 Patrick Decat 2020-12-16 15:01:33 UTC
(In reply to Yorick de Wid from comment #60)
> Patrick Decat whats the checksum of rtl8153a-3.fw?
> 
> sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw

Here you go:

sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw
e467098b1cbb04022805cd777eb66585022524a6  /usr/lib/firmware/rtl_nic/rtl8153a-3.fw
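
For comparing a local firmware file against the checksums posted in comment 55 without eyeballing 40 hex digits, a short Python sketch can help. The table only copies the rtl8153 entries from that comment; the helper names are illustrative, and the firmware path is the one used in this thread and may differ between distributions.

```python
import hashlib

# rtl8153 entries copied from the SHA1 list in comment 55.
LISTED_SHA1 = {
    "rtl8153a-2.fw": "3d87c04720c4b4709e4673707c4c104e28be1c1b",
    "rtl8153a-3.fw": "e467098b1cbb04022805cd777eb66585022524a6",
    "rtl8153a-4.fw": "cce086e885091c348bf521924f306f240f8dcc08",
    "rtl8153b-2.fw": "2b268656c6cd7d03dc47ca8eaec2f31ca668c53c",
}

def file_sha1(path, chunk_size=65536):
    """Return the hex SHA1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def matches_listing(name, sha1_hex):
    """True if the checksum equals the one listed for this firmware name."""
    return LISTED_SHA1.get(name) == sha1_hex.lower()

# The value reported in comment 61 matches the list:
print(matches_listing("rtl8153a-3.fw", "e467098b1cbb04022805cd777eb66585022524a6"))
# On a real system (path may differ):
# matches_listing("rtl8153a-3.fw", file_sha1("/usr/lib/firmware/rtl_nic/rtl8153a-3.fw"))
```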
Comment 62 Peter H 2021-02-22 18:45:33 UTC
Hi all, glad to see I'm not the only one. I also have the Lenovo USB-C Dock gen 2, and am connecting to it from a Surface Pro 7 (running Arch with the surface-linux kernel 5.10.16), and am similarly loading the "rtl8153b-2 v1 10/23/19" firmware. 


An interesting thing that I learned recently is that my ethernet works perfectly if I boot without an external screen plugged into the dock. As soon as I plug one in, the "Tx status -71" messages return and the ethernet dies. I don't know if that helps at all, maybe someone else has found this too? I can also confirm the ethernet and external display work fine together running Windows.
Comment 63 Danny O'Brien 2021-02-23 04:37:17 UTC
Just as an FYI -- I upgraded to the latest realtek firmware (in Debian's sid/unstable distribution, version 20210208-1 <https://packages.debian.org/sid/firmware-realtek>) -- this fixed the problem for me. sha1sum is cce086e885091c348bf521924f306f240f8dcc08 , in /usr/lib/firmware/rtl_nic/rtl8153a-4.fw
Comment 64 Danny O'Brien 2021-02-23 19:47:05 UTC
I spoke too soon, the problem persists!
Comment 65 Yorick de Wid 2021-02-23 22:41:27 UTC
(In reply to Danny O'Brien from comment #63)
> Just as an FYI -- I upgraded to the latest realtek firmware (in Debian's
> sid/unstable distribution, version 20210208-1
> <https://packages.debian.org/sid/firmware-realtek>) -- this fixed the
> problem for me. sha1sum is cce086e885091c348bf521924f306f240f8dcc08 , in
> /usr/lib/firmware/rtl_nic/rtl8153a-4.fw

The firmware hasn't been updated for over a year, see official kernel repo. 

Because the lockup is likely caused by a data race in the firmware, it's to be expected that a higher interrupt count (additional peripherals) will trigger the issue sooner.

Just to give a little update, Realtek is currently testing against hardware with known issues.
Comment 66 Tim 2021-03-18 11:14:43 UTC
I am suffering from this problem too, using a TP-Link UE330, which has the Realtek 8153 and is a USB-A 3.0 device with a hub and gigabit ethernet. I have had no issues with the hub (it even keeps working after the ethernet stops working), but the gigabit ethernet just stops working completely after some time. Sometimes it can go for a couple of days; one time it stopped working after 10 minutes.

I am running Linux kernel 5.4.103 and modinfo says it has v1.10.11 of the r8152 driver. I am going to update to the v2.14.0 that is on the Realtek webpage (https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-usb-3-0-software), but if, as some of you say here, the issue is a firmware issue, I am not that hopeful.

How can I check which firmware my device is loading? From reading this bug page I have seen there are different versions of the firmware at /usr/lib/firmware/rtl_nic . In my case for rtl8153x-x.fw there is a-2, a-3, a-4 and b-2.

@Yorick de Wid, good to know Realtek is aware of the problem. What can we realistically expect from them? The latest drivers, which I am guessing also include the firmware, are from 19/10/2020 and this chipset has been out for years. You'd assume they have had time to iron everything out.
Comment 67 Yorick de Wid 2021-03-19 09:23:36 UTC
dmesg should log the requested firmware version by the driver.
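
As a rough illustration, the relevant lines can be picked out of dmesg or journalctl output with a short Python script. The two message formats below are the ones pasted elsewhere in this thread; other driver versions may word them differently, and the function name is made up for this sketch.

```python
import re

LOADED_RE = re.compile(r"r8152 (\S+): load (\S+) (v\d+) (\S+) successfully")
FAILED_RE = re.compile(r"r8152 (\S+): unable to load firmware patch (\S+) \((-?\d+)\)")

def firmware_events(log_lines):
    """Collect r8152 firmware load successes and failures from kernel log lines."""
    events = []
    for line in log_lines:
        m = LOADED_RE.search(line)
        if m:
            events.append({"device": m.group(1), "firmware": m.group(2),
                           "version": m.group(3), "date": m.group(4),
                           "loaded": True})
            continue
        m = FAILED_RE.search(line)
        if m:
            events.append({"device": m.group(1), "firmware": m.group(2),
                           "error": int(m.group(3)), "loaded": False})
    return events

SAMPLE = [
    "[   16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully",
    "[    3.683675] r8152 2-1.3:1.0: unable to load firmware patch rtl_nic/rtl8153a-4.fw (-2)",
    "[    4.574454] r8152 1-1.4:1.0: load rtl8153a-4 v2 02/07/20 successfully",
]
for event in firmware_events(SAMPLE):
    print(event)
```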

Hayes pushed a few patches upstream last month. These changes include power flow regulation of the driver and URB speeds. Those are driver level changes and preliminary tests show they are working on recent kernels. I've been running iperf for a few days and nothing has broken down, nor did I see any timing issues.

I can't speak to all problems here, but problems with some combinations of hardware and this chipset may be resolved by these patches. I'd expect Ubuntu will backport these drivers to LTS as well.
Comment 68 Nikolay Kichukov 2021-03-19 10:27:52 UTC
Hi Yorick de Wid,

Can you refer me to those 'few patches' from last month? I just want to verify if the kernel I run has them, 5.11.7 here, and perhaps give this 'hardware' its last chance this time. I have been trying to make it work for too long now, without any success.

Thanks.
Comment 70 Nikolay Kichukov 2021-03-19 12:04:49 UTC
Thanks, both patches are not yet in the latest stable kernel: 5.11.7, so I may backport them and give it a try. Thanks for sharing!
Comment 71 Tim 2021-03-20 06:45:50 UTC
@Yorick de Wid

From my ignorance on how the kernel and drivers work, I can see those two commits are for the r8152 driver only, it does not touch any other part of the code. Could we just add those two commits to the r8152 driver and compile it as a module instead of recompiling the whole netdev subsystem? (If so, it also would be nice if Realtek publishes it even if it is as beta driver in their webpage).

I have updated the r8152 driver to the v2.14.0 version from the Realtek website. The ethernet still randomly stops working. I will try to see if I can apply the patches to the v2.14.0 driver. If it compiles without errors I will test running with the patched drivers. Is there a better way of doing this? Maybe there are some more changes in the kernel branch of r8152 and I should get the code of the driver from there? Any advice and/or instructions would be welcome.

I really do not understand how Realtek releases a product like this. I have this gigabit ethernet device connected to a router with only fast ethernet (100Mb/s), so it is running at no more than 10% of its speed and it is still hanging daily. This is not some race condition that happens when some specific or weird heavy load goes through the device; this is happening with normal, very low load traffic. How was this not caught during development or testing? Did they even try the device they are selling? Now I understand why everywhere you read people badmouth Realtek products and recommend getting Intel NICs.
Comment 72 Yorick de Wid 2021-03-20 11:49:15 UTC
Rarely ever does anyone compile the kernel from a subsystem. Just compile the r8152 module like any other KBuild module.

There are many variants of the RTL815x chipset, some made by Realtek, some not. The chipset family is employed in a large variety of products, and minor PCB design flaws like trace length can cause timing issues which are notoriously hard to diagnose. Even though this *does* feel like a firmware/driver issue, it's impossible to account for all the ways this microcontroller is being used.
Comment 73 Tim 2021-03-20 15:01:02 UTC
@Yorick de Wid

I managed to compile and install the driver from the Realtek webpage, but I have no idea how to download only the r8152 part out of the kernel. Could you give some link or instructions so the ones who want to give it a try like me can? Thanks.
Comment 74 Tim 2021-03-31 04:51:19 UTC
I have finally managed to compile the kernel of my system adding the r8152 upstream patches that were missing. The device, tp-link UE330, is still hanging.

I have compiled 5.11.8. This already included this recommended patch https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=7a0ae61acde2cebd69665837170405eced86a6c7 , and all the patches made to r8152 in 2020 as far as I can see. The only r8152 patches that are missing are the two from 2021, again as far as I can see, these two: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a08c0d309d8c078d22717d815cf9853f6f2c07bd, https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=80fd850b31f09263ad175b2f640d5c5c6f76ed41 (the last one is the other one recommended in #69).

So with 5.11.8 compiled with the two r8152 patches from 2021, the device is still hanging. No progress.
Comment 75 Daniel Squires 2021-04-11 10:11:44 UTC
I have a load of USB3 Ethernet dongles using this driver. Reported by lsusb as follows:

Bus 002 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

I experience this problem (connection drops and dmesg shows repeated "Tx status -71" messages) on various systems: a Dell laptop with Ubuntu 18.04 and 20.04, Gentoo and a Raspberry Pi. On all systems the problem either only happens when plugged into USB3 or is far worse when plugged into USB3. Obviously USB2 does not allow the full bandwidth to be achieved, so it is not a good solution.

Running iperf will generally cause the problem within a very short time frame. I am using 3 of them with a Raspberry Pi 4 and have been having this problem, so I started to investigate. I noticed it was failing to load the firmware file (rtl8153a-4.fw) and, I guess, falling back on a default internal firmware. The fw file turns out to be missing in the Raspbian linux-firmware package. Having manually copied it from Ubuntu 20.04, it now loads successfully for 2 of the connected adapters but fails on the 3rd:



pi@router:~ $ lsusb | grep 8153
Bus 002 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 001 Device 005: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 001 Device 007: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

pi@router:~ $ dmesg | grep r8152
[    2.104683] usbcore: registered new interface driver r8152
[    3.683627] r8152 2-1.3:1.0: Direct firmware load for rtl_nic/rtl8153a-4.fw failed with error -2
[    3.683675] r8152 2-1.3:1.0: unable to load firmware patch rtl_nic/rtl8153a-4.fw (-2)
[    3.724435] r8152 2-1.3:1.0 eth0: v1.11.11
[    4.574454] r8152 1-1.4:1.0: load rtl8153a-4 v2 02/07/20 successfully
[    4.614895] r8152 1-1.4:1.0 eth1: v1.11.11
[    4.903594] r8152 1-1.3.4:1.0: load rtl8153a-4 v2 02/07/20 successfully
[    4.945192] r8152 1-1.3.4:1.0 eth2: v1.11.11
[    8.767705] r8152 1-1.3.4:1.0 wan: renamed from eth2
[    8.806489] r8152 1-1.4:1.0 voip: renamed from eth1
[    8.865195] r8152 2-1.3:1.0 house: renamed from eth0
[   16.461782] r8152 2-1.3:1.0 house: carrier on
[   16.545744] r8152 2-1.3:1.0 house: Promiscuous mode enabled
[   17.079827] r8152 1-1.3.4:1.0 wan: carrier on
[   28.923646] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled
[ 4664.551927] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled
[ 4664.583404] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled
[ 4664.660467] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled


Any ideas why it fails to load for one of the three above?
Comment 76 Daniel Squires 2021-04-11 10:53:30 UTC
Seems like fw load fails when plugged into a USB3 port, but succeeds when plugged into a USB2 one!
Comment 77 Vladyslav Shtabovenko 2021-04-22 13:40:02 UTC
Since I'm also suffering from these issues (Dell WD15 + Thinkpad T14 AMD), I'm wondering if someone managed to unfreeze the machine in the situation described by Michiel Janssens:

> Similar issues here with a Dell dock WD15, connected from Dell XPS 13 9360.
> Since several months the wired connection from the dock 0bda:8153 dies, but
> the network stack isn't notified. A reboot after this waits endlessly on
> services to stop. Sometimes Gnome gui locks up shortly after logging back
> in the system and being presented with the issue. I have to do REISUB to
> get the system working again.




I mean, once the connection is lost, the system will start hanging on all operations that are somehow related to networking, so that even sudo won't work (su is possible, though). Restarting/killing NetworkManager or dhclient doesn't succeed either. So essentially you have your graphical session (Gnome 3 in my case) where the mouse cursor still moves but nothing else reacts. Trying to "save" the system from a virtual console ultimately fails because everything network-related hangs. At the end of the day REISUB is the only thing you can do, meaning that you will lose all your unsaved work.




Perhaps someone has found a trick how one could avoid REISUB in such a situation? It's really annoying since you cannot reliably predict when such a freeze would happen.
Comment 78 Tim 2021-04-26 10:53:17 UTC
So now that I have been using the kernel with the latest patches for a month, I want to report some experiences that might be useful for users and maybe even for the developers to finally fix this issue. Again, my device is the TP-Link U330.

Right now, the device seems to hang only after a reboot. Every time I reboot, the device hangs several times in a very short time (up to 11 times in 5 minutes on one occasion) until it stops hanging, and then it can work without hanging for weeks (I think 3 weeks is as far as I tested it until I had to reboot again).

For users: if you install ifupdown2 you can recover the device without rebooting, disconnecting, or even disrupting the network. Once you have ifupdown2 installed on your Linux system and the device is down (you can check with the command 'ip addr'), use these two commands to recover the device:

- ifup enxxxxx
[enxxxxx is the name of the device in your system]
- ifreload -a
[-a is probably not needed but it is very quick and does not disrupt the network so I did not look further]

This will get your device working again.

I created a script that does this and tried to have it executed every time the device goes down. For some unknown reason, the system does not run the script when the device goes down because of the bug, although the script does get triggered when I take the device down manually or when rebooting. So I have resorted to running the script every 30 seconds: it checks whether the device is up or down and, if it is down, brings it up again with those two commands. It is not pretty, but it at least makes my system workable until a proper solution is released by Realtek. If anyone is interested in the script I can share it.
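The script itself was not posted in this thread, so the following is only a minimal sketch of the polling approach described above, not the original. It assumes that the failed interface reports "down" in /sys/class/net/<iface>/operstate, that ifupdown2 provides the ifup and ifreload commands on the PATH, and that it runs as root. The function names are hypothetical.

```python
import subprocess
import time


def needs_recovery(operstate_text):
    """Return True when the kernel reports the interface as down."""
    return operstate_text.strip() == "down"


def recovery_commands(iface):
    """The two ifupdown2 commands quoted above, as argument lists."""
    return [["ifup", iface], ["ifreload", "-a"]]


def watch(iface, interval=30):
    """Poll the interface state forever; re-run the commands when it is down."""
    path = "/sys/class/net/%s/operstate" % iface
    while True:
        try:
            with open(path) as f:
                state = f.read()
        except OSError:
            # Assumption: an interface that has vanished is treated as down.
            state = "down"
        if needs_recovery(state):
            for cmd in recovery_commands(iface):
                # check=False: a failed attempt is simply retried next round.
                subprocess.run(cmd, check=False)
        time.sleep(interval)
```

Calling watch("enxxxxx") with the real interface name, for example from a systemd service, would reproduce the 30-second check. Polling is used here only because, as noted above, the down event triggered by the bug does not seem to reach the usual hooks.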
Comment 79 Tim 2021-04-26 10:56:27 UTC
I forgot one thing, the above procedure to recover the device has only been tested with the two 2021 kernel patches. I have not tested it without them so I do not know if it would work or not without the patches.
Comment 80 Vladyslav Shtabovenko 2021-04-26 12:26:06 UTC
Could you please specify which patches do you mean? I'm currently on 5.11.16-100.fc32.x86_64, so I guess that they are not included yet.

Another question, what does ifupdown2 do that "simpler" tools cannot do
when it comes to recovering a frozen machine? Once I've even tried rmmod
on the r8152 module but even that command froze.
Comment 81 Michiel Janssens 2021-04-26 12:37:21 UTC
Hi Vladyslav Shtabovenko, have you tried running usbcore.autosuspend=-1?
I still have this in my boot parameters, running kernel 5.11.15.
Since adding this parameter I haven't had any lockups or network freezes.
Comment 82 Tim 2021-04-27 06:15:18 UTC
@Vladyslav Shtabovenko I specified the patches I included to compile the kernel in comment #74.

I needed to install ifupdown2 for other reasons in my system so I have only tried with ifupdown2. But ifupdown2 claims to be able to put up, down and reload the interfaces with less disruption to the network system than previous software, so I suspect it is needed, but can not be sure.
Comment 83 Vladyslav Shtabovenko 2021-05-09 20:42:40 UTC
Many thanks. In my case the total freezes occur not so often (perhaps once in 3-5 weeks), so I didn't try any radical measures yet. The point is that T14 AMD already has many issues with the power management and I'm sort of reluctant to worsen the battery life even further by disabling usb autosuspend. I'm currently testing TLP with the option

USB_BLACKLIST="0bda:8153 0bda:4014"

in /etc/tlp.conf. Not sure if it helps or not, but I haven't had any "fatal" hangs so far. Perhaps it even solves the issue for me. Should there be another freeze as described in my first posting, I will comment on that here.
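One way to check whether such a setting actually reaches the adapter is to read the device's power/control attribute in sysfs, which contains "on" when runtime autosuspend is disabled and "auto" when it is allowed. The sketch below only illustrates that check. The function name and the sysfs_root parameter (there for testing) are hypothetical, and it is an assumption that a device excluded by TLP ends up reporting "on".

```python
import os


def usb_power_control(vendor, product, sysfs_root="/sys"):
    """Map USB device names (e.g. '2-1') to their power/control value
    for every device matching the given vendor:product ID."""
    base = os.path.join(sysfs_root, "bus", "usb", "devices")
    result = {}
    for name in sorted(os.listdir(base)):
        dev = os.path.join(base, name)
        try:
            with open(os.path.join(dev, "idVendor")) as f:
                vid = f.read().strip()
            with open(os.path.join(dev, "idProduct")) as f:
                pid = f.read().strip()
            with open(os.path.join(dev, "power", "control")) as f:
                control = f.read().strip()
        except OSError:
            # Interface entries such as '2-1:1.0' have no idVendor file.
            continue
        if (vid, pid) == (vendor, product):
            result[name] = control
    return result
```

For the adapter discussed here, usb_power_control("0bda", "8153") would be the call. A value of "auto" would suggest that autosuspend is still enabled for it.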
Comment 84 Tomáš Mark 2021-06-21 18:09:35 UTC
The same here:

dmesg

Jun 21 17:15:34 raspiwall kernel: [25199.657912] r8152 2-1:1.0 wan0: skb_to_sgvec fail -90
Jun 21 17:15:43 raspiwall kernel: [25208.468109] ------------[ cut here ]------------
Jun 21 17:15:43 raspiwall kernel: [25208.468150] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:468 dev_watchdog+0x308/0x30c
Jun 21 17:15:43 raspiwall kernel: [25208.468168] NETDEV WATCHDOG: wan0 (r8152): transmit queue 0 timed out
Jun 21 17:15:43 raspiwall kernel: [25208.468182] Modules linked in: bnep hci_uart btbcm bluetooth ecdh_generic ecc cdc_ether r8152(O) nft_chain_nat xt_MASQUERADE xt_nat nf_nat nf_log_ipv4 nf_log_common nft_limit nft_counter ipt_REJECT nf_reject_ipv4 xt_multiport xt_tcpudp xt_LOG xt_limit xt_recent xt_addrtype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink hid_logitech_hidpp joydev sr_mod cdrom sg vc4 hid_logitech_dj cec brcmfmac v3d gpu_sched brcmutil drm_kms_helper sha256_generic drm cfg80211 drm_panel_orientation_quirks rfkill snd_soc_core raspberrypi_hwmon bcm2835_codec(C) v4l2_mem2mem snd_compress bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) snd_bcm2835(C) videobuf2_dma_contig snd_pcm_dmaengine videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc snd_pcm vc_sm_cma(C) snd_timer snd syscopyarea sysfillrect rpivid_mem sysimgblt fb_sys_fops backlight uio_pdrv_genirq nvmem_rmem uio ip_tables x_tables ipv6
Jun 21 17:15:43 raspiwall kernel: [25208.469358] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C O      5.10.42-v7l-lutein79+ #1
Jun 21 17:15:43 raspiwall kernel: [25208.469368] Hardware name: BCM2711
Jun 21 17:15:43 raspiwall kernel: [25208.469378] Backtrace:
Jun 21 17:15:43 raspiwall kernel: [25208.469413] [<c0b6fae8>] (dump_backtrace) from [<c0b6fe78>] (show_stack+0x20/0x24)
Jun 21 17:15:43 raspiwall kernel: [25208.469429]  r7:ffffffff r6:00000000 r5:60000113 r4:c12e6b3c
Jun 21 17:15:43 raspiwall kernel: [25208.469448] [<c0b6fe58>] (show_stack) from [<c0b74210>] (dump_stack+0xcc/0xf8)
Jun 21 17:15:43 raspiwall kernel: [25208.469469] [<c0b74144>] (dump_stack) from [<c0220bcc>] (__warn+0xfc/0x114)
Jun 21 17:15:43 raspiwall kernel: [25208.469485]  r10:c133dfb8 r9:00000009 r8:c0a52894 r7:000001d4 r6:00000009 r5:c0a52894
Jun 21 17:15:43 raspiwall kernel: [25208.469496]  r4:c0eab8ac r3:c1205094
Jun 21 17:15:43 raspiwall kernel: [25208.469513] [<c0220ad0>] (__warn) from [<c0b7061c>] (warn_slowpath_fmt+0xa4/0xd8)
Jun 21 17:15:43 raspiwall kernel: [25208.469526]  r7:000001d4 r6:c0eab8ac r5:c1205048 r4:c0eab870
Jun 21 17:15:43 raspiwall kernel: [25208.469545] [<c0b7057c>] (warn_slowpath_fmt) from [<c0a52894>] (dev_watchdog+0x308/0x30c)
Jun 21 17:15:43 raspiwall kernel: [25208.469560]  r9:eff0b540 r8:c371b000 r7:c1203d00 r6:c354c200 r5:c371b2a8 r4:00000000
Jun 21 17:15:43 raspiwall kernel: [25208.469580] [<c0a5258c>] (dev_watchdog) from [<c02ab178>] (call_timer_fn+0x40/0x1bc)
Jun 21 17:15:43 raspiwall kernel: [25208.469594]  r8:c1201d9c r7:002601b8 r6:c0a5258c r5:00000100 r4:c371b2a8
Jun 21 17:15:43 raspiwall kernel: [25208.469612] [<c02ab138>] (call_timer_fn) from [<c02ac798>] (run_timer_softirq+0x5b0/0x698)
Jun 21 17:15:43 raspiwall kernel: [25208.469625]  r8:c1201d9c r7:00000000 r6:c371b2a8 r5:002601b8 r4:00000000
Jun 21 17:15:43 raspiwall kernel: [25208.469643] [<c02ac1e8>] (run_timer_softirq) from [<c0201508>] (__do_softirq+0x198/0x49c)
Jun 21 17:15:43 raspiwall kernel: [25208.469658]  r10:00000082 r9:ffffe000 r8:c1810800 r7:00000100 r6:00000001 r5:00000002
Jun 21 17:15:43 raspiwall kernel: [25208.469667]  r4:c1203084
Jun 21 17:15:43 raspiwall kernel: [25208.469685] [<c0201370>] (__do_softirq) from [<c0227494>] (irq_exit+0xd0/0xf8)
Jun 21 17:15:43 raspiwall kernel: [25208.469699]  r10:c0e21a80 r9:c1200000 r8:c1810800 r7:00000001 r6:00000000 r5:00000000
Jun 21 17:15:43 raspiwall kernel: [25208.469709]  r4:ffffe000
Jun 21 17:15:43 raspiwall kernel: [25208.469729] [<c02273c4>] (irq_exit) from [<c0287990>] (__handle_domain_irq+0x70/0xc4)
Jun 21 17:15:43 raspiwall kernel: [25208.469739]  r5:00000000 r4:c1094d50
Jun 21 17:15:43 raspiwall kernel: [25208.469756] [<c0287920>] (__handle_domain_irq) from [<c020135c>] (gic_handle_irq+0x90/0xa4)
Jun 21 17:15:43 raspiwall kernel: [25208.469770]  r9:c1200000 r8:c1094d5c r7:c1201ec8 r6:f081400c r5:f0814000 r4:c1205b7c
Jun 21 17:15:43 raspiwall kernel: [25208.469785] [<c02012cc>] (gic_handle_irq) from [<c0200abc>] (__irq_svc+0x5c/0x7c)
Jun 21 17:15:43 raspiwall kernel: [25208.469796] Exception stack(0xc1201ec8 to 0xc1201f10)
Jun 21 17:15:43 raspiwall kernel: [25208.469810] 1ec0:                   00000000 054cad98 eff13304 c021ac20 ffffe000 c120509c
Jun 21 17:15:43 raspiwall kernel: [25208.469824] 1ee0: c12050e4 00000001 00000001 c133d12f c0e21a80 c1201f24 c1201f28 c1201f18
Jun 21 17:15:43 raspiwall kernel: [25208.469835] 1f00: c02088c0 c02088c4 60000013 ffffffff
Jun 21 17:15:43 raspiwall kernel: [25208.469849]  r9:c1200000 r8:00000001 r7:c1201efc r6:ffffffff r5:60000013 r4:c02088c4
Jun 21 17:15:43 raspiwall kernel: [25208.469871] [<c020887c>] (arch_cpu_idle) from [<c0b7fa38>] (default_idle_call+0x4c/0x118)
Jun 21 17:15:43 raspiwall kernel: [25208.469890] [<c0b7f9ec>] (default_idle_call) from [<c02587ac>] (do_idle+0x118/0x168)
Jun 21 17:15:43 raspiwall kernel: [25208.469907] [<c0258694>] (do_idle) from [<c0258ad0>] (cpu_startup_entry+0x28/0x30)
Jun 21 17:15:43 raspiwall kernel: [25208.469921]  r10:00000197 r9:c1053a60 r8:ffffffff r7:c1053a60 r6:c1205040 r5:c1205048
Jun 21 17:15:43 raspiwall kernel: [25208.469932]  r4:000000d9 r3:c108a294
Jun 21 17:15:43 raspiwall kernel: [25208.469947] [<c0258aa8>] (cpu_startup_entry) from [<c0b78a10>] (rest_init+0xbc/0xc4)
Jun 21 17:15:43 raspiwall kernel: [25208.469968] [<c0b78954>] (rest_init) from [<c1000ab4>] (arch_call_rest_init+0x18/0x1c)
Jun 21 17:15:43 raspiwall kernel: [25208.469978]  r5:c1205048 r4:c1356068
Jun 21 17:15:43 raspiwall kernel: [25208.469997] [<c1000a9c>] (arch_call_rest_init) from [<c1001098>] (start_kernel+0x568/0x59c)
Jun 21 17:15:43 raspiwall kernel: [25208.470014] [<c1000b30>] (start_kernel) from [<00000000>] (0x0)
Jun 21 17:15:43 raspiwall kernel: [25208.470026] ---[ end trace bc6a810ce98742d4 ]---
Jun 21 17:15:43 raspiwall kernel: [25208.470048] r8152 2-1:1.0 wan0: Tx timeout
Jun 21 17:15:43 raspiwall kernel: [25208.978475] r8152 2-1:1.0 wan0: get_registers -110
Jun 21 17:15:44 raspiwall kernel: [25209.488528] r8152 2-1:1.0 wan0: set_registers -110
Jun 21 17:15:44 raspiwall kernel: [25209.998469] r8152 2-1:1.0 wan0: get_registers -110
Jun 21 17:15:44 raspiwall kernel: [25209.998750] r8152 2-1:1.0 wan0: get_registers -71
Jun 21 17:15:44 raspiwall kernel: [25209.999059] r8152 2-1:1.0 wan0: set_registers -71
Jun 21 17:15:44 raspiwall kernel: [25209.999549] r8152 2-1:1.0 wan0: Tx status -2
Jun 21 17:15:44 raspiwall kernel: [25209.999835] r8152 2-1:1.0 wan0: Tx status -2
Jun 21 17:15:44 raspiwall kernel: [25210.000218] r8152 2-1:1.0 wan0: Tx status -2
Jun 21 17:15:44 raspiwall kernel: [25210.000528] r8152 2-1:1.0 wan0: Tx status -2
Jun 21 17:15:44 raspiwall kernel: [25210.000832] r8152 2-1:1.0 wan0: get_registers -71


Solved with command

ethtool -K wan0 tx off
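For readers comparing logs: the negative numbers in the r8152 messages above are Linux errno values. The sketch below extracts them from such lines and names them. The table is hard-coded from the generic Linux errno numbering and covers only the codes that appear in this report; the function name and the regular expression are hypothetical and only match the "r8152 <port> <iface>: <message> <code>" shape seen here.

```python
import re

# Generic Linux errno numbers seen in this bug report.
ERRNO_NAMES = {
    2: "ENOENT",      # for a URB: it was unlinked/cancelled
    62: "ETIME",      # USB: no response within the bus turn-around time
    71: "EPROTO",     # USB: low-level protocol error on the bus
    90: "EMSGSIZE",   # message too long
    110: "ETIMEDOUT", # USB control transfer timed out
}

_LINE = re.compile(
    r"r8152 \S+ (?P<iface>\S+): (?P<what>.+?) (?P<code>-\d+)\s*$"
)


def decode(line):
    """Return (interface, message, errno name) or None if no code is found."""
    match = _LINE.search(line)
    if match is None:
        return None
    number = -int(match.group("code"))
    name = ERRNO_NAMES.get(number, "errno %d" % number)
    return match.group("iface"), match.group("what"), name
```

Read this way, the sequence in the log is a transmit timeout, followed by control transfers timing out (-110) and then protocol errors (-71), which points at the USB link rather than at the network layer.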
Comment 85 Tomáš Mark 2021-06-21 18:11:47 UTC
I-TEC USB Adapter

Linux raspiwall.debianium.com 5.10.42-v7l-lutein79+ #1 SMP Sun Jun 13 16:17:03 CEST 2021 armv7l GNU/Linux

tomas@raspiwall:~ $ lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 3: Dev 3, If 0, Class=Mass Storage, Driver=uas, 480M
        |__ Port 4: Dev 4, If 1, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 4: Dev 4, If 2, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 4: Dev 4, If 0, Class=Human Interface Device, Driver=usbhid, 12M


modinfo r8152
filename:       /lib/modules/5.10.42-v7l-lutein79+/kernel/drivers/net/usb/r8152.ko
version:        v2.15.0 (2021/04/15)
license:        GPL
description:    Realtek RTL8152/RTL8153 Based USB Ethernet Adapters
author:         Realtek nic sw <nic_swsd@realtek.com>
srcversion:     643C9AE76696D1629871EA2
Comment 86 Weber K. 2021-06-27 05:14:25 UTC
Hi!
There is a new version at https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-usb-3-0-software .
It didn't work on my USB 3.0 ports with kernel 4.19.128.
Thanks.
Comment 87 Weber K. 2021-06-27 05:26:42 UTC
The reason for failure is explained here https://www.spinics.net/lists/linux-usb/msg173690.html

Here is the first message of the thread https://www.spinics.net/lists/linux-usb/msg173675.html

thanks.
Comment 88 Weber K. 2021-11-05 06:04:26 UTC
New patch sent... https://www.spinics.net/lists/linux-usb/msg217590.html
Comment 89 Arno Schuring 2021-11-14 18:54:31 UTC
FWIW, I see the same symptoms on an r8152 10/100 adapter which doesn't seem to require any firmware:

[  +0.152909] usb 1-2: New USB device found, idVendor=0bda, idProduct=8152, bcdDevice=20.00
[  +0.008284] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  +0.007236] usb 1-2: Product: USB 10/100 LAN
[  +0.004998] usb 1-2: Manufacturer: Realtek
[  +0.004195] usb 1-2: SerialNumber: 00116B686258
[  +0.181834] r8152 1-2:1.0: skip request firmware


/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 480M
    ID 1d6b:0002 Linux Foundation 2.0 root hub
    |__ Port 2: Dev 11, If 0, Class=Vendor Specific Class, Driver=r8152, 480M
        ID 0bda:8152 Realtek Semiconductor Corp. RTL8152 Fast Ethernet Adapter

filename:       /lib/modules/5.10.0-9-amd64/kernel/drivers/net/usb/r8152.ko
version:        v1.11.11
vermagic:       5.10.0-9-amd64 SMP mod_unload modversions 


I can see various repeating "r8152 1-2:1.0 enx00116b686258: Rx status -71" messages in dmesg. They occur roughly once every few minutes but seem benign, in that they don't seem to affect connectivity (they may have been affecting throughput, but not enough to disconnect video calls I've been making during the same time).

However, today I also saw this happen (I'll attach the full backtrace):
[  +0.004720] NETDEV WATCHDOG: enx00116b686258 (r8152): transmit queue 0 timed out
[  +0.007515] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x24d/0x260

...which resulted in numerous:
[  +0.004711] r8152 1-2:1.0 enx00116b686258: Tx timeout
[  +2.095629] r8152 1-2:1.0 enx00116b686258: Tx status -2

...and eventually (duplicate messages removed):
[  +0.239953] usb 1-2: reset high-speed USB device number 3 using xhci_hcd
[  +1.919956] usb 1-2: device descriptor read/64, error -110
[  +2.815976] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
[  +0.215896] usb 1-2: device not accepting address 3, error -62
[  +0.006011] r8152 1-2:1.0 enx00116b686258: Get ether addr fail

...after which I needed to unplug the device to get it working again. A port power cycle from software might have been enough, but it's hard to look up the correct incantation for that when your outside interface is failing.
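As a starting point for such an incantation, one commonly used software-only approach is to unbind the device from the generic USB driver and bind it again through sysfs. This is a logical re-enumeration, not a true port power cycle, and whether it recovers an adapter in this failure state is untested here. The sketch below assumes root privileges; the function name and the sysfs_root and delay parameters are hypothetical.

```python
import os
import re
import time

# Bus-port identifiers look like '1-2' or '1-1.3.4'.
_BUS_ID = re.compile(r"^\d+-\d+(\.\d+)*$")


def rebind_usb_device(bus_id, sysfs_root="/sys", delay=1.0):
    """Unbind a USB device from the 'usb' driver, wait, then bind it again."""
    if not _BUS_ID.match(bus_id):
        raise ValueError("unexpected USB bus id: %r" % bus_id)
    driver = os.path.join(sysfs_root, "bus", "usb", "drivers", "usb")
    for action in ("unbind", "bind"):
        with open(os.path.join(driver, action), "w") as f:
            f.write(bus_id)
        if action == "unbind":
            # Give the kernel time to tear the device down.
            time.sleep(delay)
```

For the adapter in the log above the call would be rebind_usb_device("1-2"). Keeping it in a small script beforehand avoids having to search for it while the network is down.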
Comment 90 Arno Schuring 2021-11-14 18:57:24 UTC
Created attachment 299567 [details]
WARNING: at net/sched/sch_generic.c:467 dev_watchdog+0x24d/0x260
Comment 91 Ole Ernst 2021-11-27 10:28:44 UTC
Using 5.14.15 (including https://www.spinics.net/lists/linux-usb/msg217590.html) I still got status -71. Afterwards, the network connection is down shortly, but recovers quickly. 

Adding the following quirk resolved the issue for my Lenovo Powered USB-C Travel Hub: "usbcore.quirks=17ef:721e:k" connected to a ThinkBook 14s.

I submitted a related patch upstream: https://lore.kernel.org/netdev/20211127090546.52072-1-olebowle@gmx.com/
Comment 92 Vassilis Virvilis 2022-01-31 22:20:28 UTC
Hi,

I have a TP-LINK U330 (usb 3 hub with ethernet RJ45) with the RTL8153 chipset attached to a generic laptop.

I am seeing this behavior: status -71 messages (possibly others too), then disconnects and sometimes freezes.

Specifying 
  options usbcore quirks=0bda:8153:k,0bda:5411:k,0bda:0411:k
  options usbcore autosuspend=-1

helps considerably. Not sure if it fixes the problem completely. 
No crashes, and the status -71 messages were far fewer, possibly one or two.
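To confirm that these module options actually took effect, the current values can be read back from /sys/module/usbcore/parameters/quirks and /sys/module/usbcore/parameters/autosuspend. The sketch below reads a parameter and splits the quirks string into its vendor:product:flags entries. The function names and the sysfs_root parameter are hypothetical. As far as I know the file only reflects quirks passed as a parameter, not the ones compiled into the kernel's built-in table, and the "k" flag stands for USB_QUIRK_NO_LPM (no Link Power Management).

```python
def parse_quirks(text):
    """Split a usbcore quirks string into (vendor, product, flags) tuples."""
    entries = []
    for item in text.strip().split(","):
        if not item:
            continue
        vendor, product, flags = item.split(":")
        entries.append((vendor.lower(), product.lower(), flags))
    return entries


def read_usbcore_parameter(name, sysfs_root="/sys"):
    """Return the current value of a usbcore module parameter as a string."""
    path = "%s/module/usbcore/parameters/%s" % (sysfs_root, name)
    with open(path) as f:
        return f.read().strip()
```

With the options above loaded, parse_quirks(read_usbcore_parameter("quirks")) should list the three 0bda entries, and read_usbcore_parameter("autosuspend") should return "-1".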

lspci:
00:15.0 USB controller: Intel Corporation Celeron/Pentium Silver Processor USB 3.0 xHCI Controller (rev 03)

Attaching the usb dongle to my desktop works perfectly for several hours without a problem so I believe this is not a problem of the dongle or the driver.

The laptop has a Realtek wireless/bluetooth card for which I also had to specify 
  options rtw88_pci disable_aspm=1
for it to work without many problems.

I tend to believe the issue is with the usb controller or the UEFI/BIOS/ACPI tables.

Unfortunately I cannot locate an update for that specific instance of this AMI BIOS.

How can I debug it further? Is it possible to fix it by modifying the ACPI tables? Any pointers?
Comment 93 Emtee 2022-01-31 22:30:38 UTC
At this point I just recommend disabling USB auto suspend/USB LPM, ASPM etc.

Works fine on Debian Bullseye with custom compiled kernel 5.10.90 (ASPM disabled in bios, and usb autosuspend on -1), but kernel parameter should work as well.

And in case of my Intel NUC system, I cannot do a reboot because USB/Realtek combo goes haywire. Always a cold boot (Full shutdown) required.
Comment 94 Vassilis Virvilis 2022-02-02 08:03:36 UTC
Thanks @Emtee for comment #93

I was thinking about ACPI tables and I overlooked the basics, the BIOS setup. Not my brightest moment.

Anyway, it turns out that the AMI BIOS (UEFI) in my laptop has a setting that reads something like this:

Enable/Disable blah-blah ASPM

If Enabled Vista will handle the ASPM for the device(s?).
If Disable BIOS itself will handle the ASPM.

So I disabled it and so far it looks like it is working. I will test for some days and let you know.

Right now I have only this setting in BIOS. I have commented all module parameter workarounds.
  #options usbcore quirks=0bda:8153:k,0bda:5411:k,0bda:0411:k
  #options usbcore autosuspend=-1

I have debian unstable kernel 5.15.03 and I believe the quirk 0bda:8153:k is already compiled in.

If this works I will also disable the rtw88 aspm quirk.
Comment 95 Vassilis Virvilis 2022-02-04 11:38:45 UTC
Spoke too soon.

After some days (without heavy testing) it started acting up with status -71 during a TC meeting.

This time (or maybe I only noticed it now), the status -71 was followed by a USB disconnect event and then a USB reconnect, firmware reload, etc.

It's a real shame that I cannot pinpoint this. I wonder if this bug happens with other BIOSes or only with AMI ones?

I just want to stress that without setting the ASPM disable BIOS option the failures are almost instant.

If anybody has an idea on how to debug this further, or on which logs to enable, please feel free to speak up.

Thanks
Comment 96 anarcat 2022-03-21 20:01:45 UTC
For what it's worth, I was having regular hangs with those USB controllers in the past, for many different Linux kernel releases. The network interface would just freeze and hang: traffic wouldn't go through, and even rebooting the box wouldn't work, as userspace would hang during the shutdown sequence.



Since the Debian 11 (bullseye) upgrade, things improved slightly as I started seeing the error message described in this ticket (Tx status -71). But it would be similarly broken, and would occur after a variable number of hours idling.

I upgraded to the bookworm kernel recently (5.16) and this problem completely disappeared, so I consider the patch in https://github.com/torvalds/linux/commit/baf33d7a7564 fixed this problem.

So thanks everyone, this is really great to see this finally fixed.
Comment 97 Anthony Rabbito 2022-05-22 15:07:45 UTC
For what it's worth I'm still seeing this on 5.18.0-0.rc7.220519.f993aed406ea.56.vanilla.1.fc36.x86_64
Comment 98 RussianNeuroMancer 2022-07-31 12:00:53 UTC
Please correct me if I misunderstand something, but from the following two links it looks like a USB controller issue rather than a network adapter driver issue:

https://armbian.atlassian.net/browse/AR-1172

https://github.com/armbian/build/pull/3763/commits/d52b67ffeeebc89f49159accb953f1ecf9352e74
Comment 99 Mørke 2022-10-27 18:36:08 UTC
I'm seeing this issue (though I'm getting a different Tx status) on a Raspberry Pi 4B:

some dmesg output:
[Oct27 13:13] r8152 2-2:1.0 eth1: Tx timeout
[  +0.003510] r8152 2-2:1.0 eth1: Tx status -2
[  +0.000297] r8152 2-2:1.0 eth1: Tx status -2
[  +2.180790] usb 2-2: reset SuperSpeed USB device number 4 using xhci_hcd

$ lsusb
Bus 002 Device 004: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

$ uname -a
Linux raspberrypi 5.15.61-v8+ #1579 SMP PREEMPT Fri Aug 26 11:16:44 BST 2022 aarch64 GNU/Linux

However, this problem is quite weird since I have two dongles (both are exactly the same): 
On my desktop computer, I'm yet to experience any issues with the dongle. I can push it to the limit for hours (using iperf) and nothing bad happens. No resets or any weird stuff happens. This computer runs Linux 6.0.x, though.
OTOH, the dongle connected to the Raspberry can't go past 100Mbit/s without _resetting_ itself after some minutes (sometimes even seconds). I tried adding the quirk option to the boot parameters to no avail. 

This makes me think, maybe there's an architectural difference in the driver? Perhaps it's the USB controller screwing things up? 

I'm 99.99% sure it's not the dongle being faulty/buggy, because I can swap the dongles and I don't see this issue on my desk computer.
Comment 100 Rahil Bhimjiani 2023-04-07 20:21:24 UTC
I'm having this issue as well despite being on the latest kernel. I am unable to recognize any pattern in the random hangs that throw "Tx status -71". It can stream YT 4k60 video smoothly but then randomly crashes when trying to ssh. Literally unusable; thank god Amazon offers refunds.


Adapter: TP-Link UE330
Kernel 6.2.9-300.fc38.x86_64
Fedora 38
[    5.953304] r8152 3-2.4:1.0: load rtl8153a-4 v2 02/07/20 successfully
Comment 101 Adam Gradzki 2023-10-12 14:54:46 UTC
I believe I found a way to handle this problem:


https://bbs.archlinux.org/viewtopic.php?pid=2125855#p2125855


r8152-dkms uses an updated Realtek driver https://github.com/wget/realtek-r8152-linux/


The usbcore quirks are absolutely essential! I am not sure if a subset of them is necessary; the ones I chose seem to fix the problem completely: bjkm
Comment 102 Stian Skjelstad 2023-10-23 08:44:34 UTC
I have been experiencing the same problem for some time too:

Currently with Ubuntu build kernel: 5.15.0-87-generic

The USB device seems to reconnect on its own, but it is then in a non-functional state and traffic does not pass.


[2667308.625276] ------------[ cut here ]------------
[2667308.625317] NETDEV WATCHDOG: enx1c1adff98bf8 (r8152): transmit queue 0 timed out
[2667308.625406] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x277/0x280
[2667308.625431] Modules linked in: cpuid tls bluetooth ecdh_generic ecc vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nft_counter xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi ledtrig_audio snd_hda_intel kvm_amd ccp snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm radeon snd_hda_core snd_hwdep crct10dif_pclmul ghash_clmulni_intel snd_pcm aesni_intel joydev input_leds drm_ttm_helper ttm drm_kms_helper cec rc_core snd_timer snd i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore crypto_simd fam15h_power cryptd ppdev k10temp mac_hid serio_raw parport_pc wmi_bmof sch_fq_codel lp parport nf_nat_pptp nf_conntrack_pptp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ramoops reed_solomon pstore_blk pstore_zone efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0
[2667308.626027]  multipath linear raid1 hid_microsoft ff_memless hid_generic cdc_ether pata_acpi usbnet usbhid r8169 psmouse hid r8152 crc32_pclmul ahci mii i2c_piix4 pata_atiixp libahci realtek wmi
[2667308.626166] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G           OE     5.15.0-84-generic #93-Ubuntu
[2667308.626178] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.6 10/26/2012
[2667308.626186] RIP: 0010:dev_watchdog+0x277/0x280
[2667308.626200] Code: eb 97 48 8b 5d d0 c6 05 6b e2 67 01 01 48 89 df e8 2e 5f f9 ff 44 89 e1 48 89 de 48 c7 c7 78 ee ad 8d 48 89 c2 e8 91 d6 19 00 <0f> 0b eb 80 e9 db 68 23 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56
[2667308.626210] RSP: 0018:ffffac3940234e70 EFLAGS: 00010282
[2667308.626225] RAX: 0000000000000000 RBX: ffff9cbdc0a8c000 RCX: 0000000000000000
[2667308.626234] RDX: ffff9cbed5d2cb40 RSI: ffff9cbed5d20580 RDI: 0000000000000300
[2667308.626242] RBP: ffffac3940234ea8 R08: 0000000000000003 R09: fffffffffffd22d8
[2667308.626250] R10: 0000000074756f20 R11: 0000000074756f20 R12: 0000000000000000
[2667308.626258] R13: ffff9cbdd036ce80 R14: 0000000000000001 R15: ffff9cbdc0a8c4c0
[2667308.626268] FS:  0000000000000000(0000) GS:ffff9cbed5d00000(0000) knlGS:0000000000000000
[2667308.626278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2667308.626287] CR2: 000056130da71686 CR3: 000000010689c000 CR4: 00000000000406e0
[2667308.626296] Call Trace:
[2667308.626303]  <IRQ>
[2667308.626313]  ? show_trace_log_lvl+0x1d6/0x2ea
[2667308.626328]  ? show_trace_log_lvl+0x1d6/0x2ea
[2667308.626342]  ? call_timer_fn+0x2c/0x120
[2667308.626359]  ? show_regs.part.0+0x23/0x29
[2667308.626372]  ? show_regs.cold+0x8/0xd
[2667308.626384]  ? dev_watchdog+0x277/0x280
[2667308.626396]  ? __warn+0x8c/0x100
[2667308.626407]  ? dev_watchdog+0x277/0x280
[2667308.626420]  ? report_bug+0xa4/0xd0
[2667308.626431]  ? arch_irq_work_raise+0x3a/0x50
[2667308.626444]  ? handle_bug+0x39/0x90
[2667308.626457]  ? exc_invalid_op+0x19/0x70
[2667308.626469]  ? asm_exc_invalid_op+0x1b/0x20
[2667308.626482]  ? dev_watchdog+0x277/0x280
[2667308.626493]  ? pfifo_fast_enqueue+0x160/0x160
[2667308.626505]  call_timer_fn+0x2c/0x120
[2667308.626518]  __run_timers.part.0+0x1e3/0x270
[2667308.626529]  ? ktime_get+0x46/0xc0
[2667308.626544]  ? native_x2apic_icr_read+0x20/0x20
[2667308.626556]  ? lapic_next_event+0x20/0x30
[2667308.626568]  ? clockevents_program_event+0xad/0x130
[2667308.626583]  run_timer_softirq+0x2a/0x60
[2667308.626595]  __do_softirq+0xd9/0x2e7
[2667308.626607]  irq_exit_rcu+0x94/0xc0
[2667308.626620]  sysvec_apic_timer_interrupt+0x80/0x90
[2667308.626633]  </IRQ>
[2667308.626639]  <TASK>
[2667308.626646]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[2667308.626657] RIP: 0010:cpuidle_enter_state+0xd9/0x620
[2667308.626671] Code: 3d 54 7a 18 73 e8 67 77 67 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 a8 84 67 ff 80 7d d0 00 0f 85 61 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6d 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e7 03 00 00
[2667308.626680] RSP: 0018:ffffac39400c3e28 EFLAGS: 00000246
[2667308.626694] RAX: ffff9cbed5d31480 RBX: ffff9cbdc09c9000 RCX: 0000000000000000
[2667308.626703] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[2667308.626711] RBP: ffffac39400c3e78 R08: 000979e72f1d6d1e R09: 0000000000000000
[2667308.626719] R10: 0000000000000001 R11: 071c71c71c71c71c R12: ffffffff8e4e9120
[2667308.626726] R13: 0000000000000002 R14: 0000000000000002 R15: 000979e72f1d6d1e
[2667308.626738]  ? cpuidle_enter_state+0xc8/0x620
[2667308.626751]  ? tick_nohz_stop_tick+0x16a/0x1d0
[2667308.626764]  cpuidle_enter+0x2e/0x50
[2667308.626776]  cpuidle_idle_call+0x142/0x1e0
[2667308.626789]  do_idle+0x83/0xf0
[2667308.626800]  cpu_startup_entry+0x20/0x30
[2667308.626811]  start_secondary+0x12a/0x180
[2667308.626823]  secondary_startup_64_no_verify+0xc2/0xcb
[2667308.626839]  </TASK>
[2667308.626846] ---[ end trace 095eb2337a23b35c ]---
[2667308.626858] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667311.361232] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2
[2667311.386436] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2
[2667311.410686] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2
[2667311.435339] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2
[2667314.513162] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667319.633077] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667325.520952] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667330.384862] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667335.504701] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667340.624647] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667346.512537] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667351.632450] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667357.520323] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667362.384221] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667367.504126] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667372.628027] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667378.511924] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667383.631810] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667389.519693] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667394.383594] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667399.503496] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667404.623394] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667410.511284] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667415.631198] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667421.519064] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667426.382970] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667431.502869] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667436.622767] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667442.510659] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667447.630559] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667453.518451] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667458.382348] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667463.502260] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667468.622145] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667474.510027] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667479.629925] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667485.517810] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667490.381715] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667495.501611] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667500.621520] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667506.509360] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667511.629308] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667517.517193] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667522.381091] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667527.501007] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667532.624888] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667538.508784] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667543.628643] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667549.520557] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667554.380459] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667559.500359] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667564.620259] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667570.508151] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667575.628063] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667581.515931] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667586.379839] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667591.499741] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667596.619634] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667602.507519] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667607.627439] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667613.515304] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667618.379207] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667623.499107] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667628.619006] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667634.506894] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667639.626799] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667645.514680] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667650.378579] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667655.498482] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667660.618379] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667666.506272] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667671.626169] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667677.514051] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667682.377960] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667687.497859] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667692.617753] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667698.505641] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667703.625502] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667709.513391] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667714.377330] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667719.497233] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667724.617130] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667730.505018] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667735.624917] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667741.512797] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667746.380705] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667751.496568] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667756.616504] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667762.504389] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667767.624291] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667773.512174] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667778.376076] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667783.495978] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667788.615883] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667794.503763] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667799.623675] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667805.511548] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667810.375416] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667815.495353] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667820.615257] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667826.503148] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667831.623054] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667837.510922] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667842.374831] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667847.494732] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667852.614632] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667858.506517] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667861.214565] r8152-cfgselector 1-4: reset high-speed USB device number 3 using ehci-pci
[2667863.622378] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667866.590335] r8152-cfgselector 1-4: device descriptor read/64, error -110
[2667869.510264] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667874.374205] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667879.494099] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667882.206079] r8152-cfgselector 1-4: device descriptor read/64, error -110
[2667882.442065] r8152-cfgselector 1-4: reset high-speed USB device number 3 using ehci-pci
[2667884.613963] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667887.837962] r8152-cfgselector 1-4: device descriptor read/64, error -110
[2667890.501887] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667895.621790] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667901.509634] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667903.461589] r8152-cfgselector 1-4: device descriptor read/64, error -110
[2667903.697656] r8152-cfgselector 1-4: reset high-speed USB device number 3 using ehci-pci
[2667906.373574] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667911.493475] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667914.545390] r8152-cfgselector 1-4: device not accepting address 3, error -110
[2667914.673433] r8152-cfgselector 1-4: reset high-speed USB device number 3 using ehci-pci
[2667916.613326] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667922.501263] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667925.297229] r8152-cfgselector 1-4: device not accepting address 3, error -110
[2667925.297410] r8152 1-4:1.0 enx1c1adff98bf8: Get ether addr fail
[2667925.299958] r8152-cfgselector 1-4: USB disconnect, device number 3
[2667925.489232] usb 1-4: new high-speed USB device number 11 using ehci-pci
[2667930.845121] usb 1-4: device descriptor read/64, error -110
[2667946.460774] usb 1-4: device descriptor read/64, error -110
[2667946.696818] usb 1-4: new high-speed USB device number 12 using ehci-pci
[2667952.092711] usb 1-4: device descriptor read/64, error -110
[2667967.704405] usb 1-4: device descriptor read/64, error -110
[2667967.812426] usb usb1-port4: attempt power cycle
[2667968.260383] usb 1-4: new high-speed USB device number 13 using ehci-pci
[2667979.052183] usb 1-4: device not accepting address 13, error -110
[2667979.180183] usb 1-4: new high-speed USB device number 14 using ehci-pci
[2667989.803973] usb 1-4: device not accepting address 14, error -110
[2667989.804112] usb usb1-port4: unable to enumerate USB device
[2667990.187980] usb 4-1: new full-speed USB device number 2 using ohci-pci
[2667990.379048] usb 4-1: not running at top speed; connect to a high speed hub
[2667990.395043] usb 4-1: New USB device found, idVendor=045e, idProduct=0927, bcdDevice=31.00
[2667990.395055] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[2667990.395061] usb 4-1: Product: Ethernet Adapter
[2667990.395065] usb 4-1: Manufacturer: Microsoft
[2667990.395068] usb 4-1: SerialNumber: 001000905
[2667990.596023] r8152-cfgselector 4-1: reset full-speed USB device number 2 using ohci-pci
[2667991.067052] r8152 4-1:1.0: load rtl8153b-2 v1 10/23/19 successfully
[2667991.222391] r8152 4-1:1.0 eth2: v1.12.13
[2667991.352538] r8152 4-1:1.0 enx1c1adff98bf8: renamed from eth2
[2667993.712957] IPv6: ADDRCONF(NETDEV_CHANGE): enx1c1adff98bf8: link becomes ready
[2667993.720866] r8152 4-1:1.0 enx1c1adff98bf8: carrier on
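For anyone comparing logs like the one above, here is a minimal sketch that counts the negative status codes in a dmesg excerpt and names them. The function name `summarize` and the small errno table are illustrative assumptions, not part of any tool. The table only holds the Linux values that appear in this report. As far as I understand the kernel's USB error-code documentation, -EPROTO usually points at a bus-level protocol problem, -ENOENT at an unlinked URB, and -ETIMEDOUT at a transfer that did not complete in time.

```python
import re
from collections import Counter

# Linux errno values for the codes that show up in this report.
LINUX_ERRNO = {2: "ENOENT", 71: "EPROTO", 110: "ETIMEDOUT"}

# Matches e.g. "Tx status -2" or "device descriptor read/64, error -110".
ERR_RE = re.compile(r"(?:Tx status|error) -(\d+)")


def summarize(dmesg_text):
    """Count negative status codes in dmesg text, keyed by errno name."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = ERR_RE.search(line)
        if m:
            code = int(m.group(1))
            counts[LINUX_ERRNO.get(code, "errno %d" % code)] += 1
    return counts


# A few lines copied from the log above.
SAMPLE = """\
[2667308.626858] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout
[2667311.361232] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2
[2667311.386436] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2
[2667866.590335] r8152-cfgselector 1-4: device descriptor read/64, error -110
[2667914.545390] r8152-cfgselector 1-4: device not accepting address 3, error -110
"""

if __name__ == "__main__":
    for name, n in sorted(summarize(SAMPLE).items()):
        print(name, n)
```

Feeding it the full output of `dmesg` (for example via a file) gives a quick view of whether a machine is seeing -71, -2 or -110, which seem to correspond to different failure patterns in this thread.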
Comment 103 Stian Skjelstad 2023-10-23 08:46:51 UTC
Bonus information: no dock, directly attached to USB on mainboard

$ lsusb
Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 002: ID 045e:0927 Microsoft Corp. RTL8153B GigE [Surface Ethernet Adapter]
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
$ lsusb -t
/:  Bus 07.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/2p, 12M
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M
    |__ Port 4: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 480M
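Since several reports in this bug differ mainly in which controller or hub sits above the adapter, a small sketch like the following can print that chain from `lsusb -t` output. The helper name `paths_to_driver` and the regular expression are my own assumptions, written against the output format shown above; other usbutils versions may format the tree differently.

```python
import re

# One node of the `lsusb -t` tree: either a root hub ("/:  Bus NN.Port N: ...")
# or a child ("|__ Port N: ..."). Indentation encodes the depth.
NODE_RE = re.compile(
    r"^(?P<indent>\s*)(?:/:\s+Bus \d+\.|\|__ )Port \d+: Dev \d+,"
    r".*?Driver=(?P<driver>[^,]*), (?P<speed>\S+)\s*$"
)


def paths_to_driver(lsusb_t_output, driver="r8152"):
    """Return, for each device bound to `driver`, the list of
    "driver (speed)" labels from the root hub down to that device."""
    stack, found = [], []
    for line in lsusb_t_output.splitlines():
        m = NODE_RE.match(line)
        if not m:
            continue
        depth = len(m.group("indent"))
        # Drop siblings and deeper nodes left over from earlier branches.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        stack.append((depth, "%s (%s)" % (m.group("driver"), m.group("speed"))))
        if m.group("driver") == driver:
            found.append([label for _, label in stack])
    return found


# Lines copied from the `lsusb -t` output above.
SAMPLE = """\
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M
    |__ Port 4: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 480M
"""

if __name__ == "__main__":
    for path in paths_to_driver(SAMPLE):
        print(" -> ".join(path))
```

On this machine the chain is just the EHCI root hub and the adapter, which matches the "no dock, directly attached" description.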
Comment 104 Prashanth K 2023-11-14 08:39:05 UTC
I was also facing the same issue that @Stian saw; turning off SG (scatter-gather) with ethtool resolved it. On checking further, the DWC3 xHC controller (which I was apparently using) has a limitation in its internal TRB cache size for chained TRBs. It was fixed by the following patch:

https://lore.kernel.org/all/20201208092912.1773650-3-mathias.nyman@linux.intel.com/

Ultimately I had to enable XHCI_SG_TRB_CACHE_SIZE_QUIRK in XHCI :)
Comment 105 Stian Skjelstad 2023-11-14 12:06:09 UTC
I recently updated to Ubuntu kernel 6.2.0-36-generic

[ 7827.109137] ------------[ cut here ]------------
[ 7827.109157] NETDEV WATCHDOG: enx1c1adff98bf8 (r8152): transmit queue 0 timed out
[ 7827.109255] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x21f/0x230
[ 7827.109281] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) xt_conntrack nft_chain_nat xt_MASQUERADE xt_tcpudp nft_compat nf_tables nfnetlink amdgpu binfmt_misc iommu_v2 drm_buddy gpu_sched radeon drm_ttm_helper ttm drm_display_helper cec input_leds joydev edac_mce_amd rc_core kvm_amd ccp snd_hda_codec_realtek drm_kms_helper kvm i2c_algo_bit snd_hda_codec_generic syscopyarea sysfillrect sysimgblt video irqbypass crct10dif_pclmul snd_hda_codec_hdmi ledtrig_audio polyval_clmulni polyval_generic snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ghash_clmulni_intel snd_hda_codec snd_hda_core sha512_ssse3 snd_hwdep aesni_intel snd_pcm snd_timer snd ppdev soundcore crypto_simd cryptd k10temp fam15h_power mac_hid serio_raw wmi_bmof parport_pc sch_fq_codel lp parport msr nf_nat_pptp nf_conntrack_pptp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
[ 7827.109916]  libcrc32c raid0 multipath linear raid1 hid_microsoft ff_memless cdc_ether pata_acpi hid_generic usbnet usbhid r8152 psmouse mii hid crc32_pclmul i2c_piix4 pata_atiixp r8169 ahci libahci realtek wmi
[ 7827.110050] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G           OE      6.2.0-36-generic #37~22.04.1-Ubuntu
[ 7827.110060] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.6 10/26/2012
[ 7827.110068] RIP: 0010:dev_watchdog+0x21f/0x230
[ 7827.110080] Code: 00 e9 31 ff ff ff 4c 89 e7 c6 05 d9 5f 78 01 01 e8 e6 ff f7 ff 44 89 f1 4c 89 e6 48 c7 c7 b8 8c c4 85 48 89 c2 e8 81 c3 2b ff <0f> 0b e9 22 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[ 7827.110089] RSP: 0018:ffffbf178023ce70 EFLAGS: 00010246
[ 7827.110103] RAX: 0000000000000000 RBX: ffffa0ec12ae34c8 RCX: 0000000000000000
[ 7827.110110] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 7827.110117] RBP: ffffbf178023ce98 R08: 0000000000000000 R09: 0000000000000000
[ 7827.110124] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0ec12ae3000
[ 7827.110131] R13: ffffa0ec12ae341c R14: 0000000000000000 R15: 0000000000000000
[ 7827.110138] FS:  0000000000000000(0000) GS:ffffa0ed15d00000(0000) knlGS:0000000000000000
[ 7827.110146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7827.110153] CR2: 00007ff492a92d30 CR3: 0000000102ff0000 CR4: 00000000000406e0
[ 7827.110162] Call Trace:
[ 7827.110169]  <IRQ>
[ 7827.110177]  ? show_regs+0x72/0x90
[ 7827.110189]  ? dev_watchdog+0x21f/0x230
[ 7827.110198]  ? __warn+0x8d/0x160
[ 7827.110211]  ? dev_watchdog+0x21f/0x230
[ 7827.110222]  ? report_bug+0x1bb/0x1d0
[ 7827.110233]  ? irq_work_queue+0x32/0x80
[ 7827.110244]  ? handle_bug+0x46/0x90
[ 7827.110256]  ? exc_invalid_op+0x19/0x80
[ 7827.110266]  ? asm_exc_invalid_op+0x1b/0x20
[ 7827.110280]  ? dev_watchdog+0x21f/0x230
[ 7827.110290]  ? __pfx_dev_watchdog+0x10/0x10
[ 7827.110299]  call_timer_fn+0x2c/0x160
[ 7827.110312]  ? __pfx_dev_watchdog+0x10/0x10
[ 7827.110322]  __run_timers.part.0+0x1fb/0x2b0
[ 7827.110335]  run_timer_softirq+0x2a/0x60
[ 7827.110348]  __do_softirq+0xdd/0x330
[ 7827.110359]  ? hrtimer_interrupt+0x12b/0x250
[ 7827.110373]  __irq_exit_rcu+0xa2/0xd0
[ 7827.110383]  irq_exit_rcu+0xe/0x20
[ 7827.110392]  sysvec_apic_timer_interrupt+0x96/0xb0
[ 7827.110403]  </IRQ>
[ 7827.110409]  <TASK>
[ 7827.110415]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 7827.110427] RIP: 0010:cpuidle_enter_state+0xde/0x6f0
[ 7827.110440] Code: 4f 11 7b e8 94 1a 45 ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 92 f8 43 ff 80 7d d0 00 0f 85 e8 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 0f 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c4 04 00 00
[ 7827.110448] RSP: 0018:ffffbf17800cbe28 EFLAGS: 00000246
[ 7827.110460] RAX: 0000000000000000 RBX: ffffa0ec0243e000 RCX: 0000000000000000
[ 7827.110467] RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000000000000
[ 7827.110472] RBP: ffffbf17800cbe78 R08: 0000000000000000 R09: 0000000000000000
[ 7827.110478] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff866d6140
[ 7827.110491] R13: 0000000000000002 R14: 0000000000000002 R15: 0000071e640eaa93
[ 7827.110503]  ? cpuidle_enter_state+0xce/0x6f0
[ 7827.110516]  ? tick_nohz_stop_tick+0x17a/0x210
[ 7827.110527]  cpuidle_enter+0x2e/0x50
[ 7827.110539]  cpuidle_idle_call+0x14f/0x1e0
[ 7827.110554]  do_idle+0x82/0x110
[ 7827.110565]  cpu_startup_entry+0x20/0x30
[ 7827.110576]  start_secondary+0x138/0x170
[ 7827.110589]  secondary_startup_64_no_verify+0xe5/0xeb
[ 7827.110604]  </TASK>
[ 7827.110609] ---[ end trace 0000000000000000 ]---


It seems to happen with both EHCI and OHCI ports, and the entire USB subsystem is broken until I reboot.

$ lspci -nn|grep USB
00:12.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller [1002:4397]
00:12.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0 USB OHCI1 Controller [1002:4398]
00:12.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396]
00:13.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller [1002:4397]
00:13.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0 USB OHCI1 Controller [1002:4398]
00:13.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396]
00:14.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller [1002:4399]


I will try disabling SG with ethtool, as suggested by @Prashanth, and see if the problem goes away. I am doing this by putting the following into /etc/network/interfaces:

auto enx1c1adff98bf8
iface enx1c1adff98bf8 inet static
        address 192.168.255.1
        netmask 255.255.255.0
        up sleep 5; ethtool -K enx1c1adff98bf8 sg off
Comment 106 Stian Skjelstad 2023-11-29 10:31:50 UTC
15 days uptime so far with:
  ethtool -K enx1c1adff98bf8 sg off

So it is a viable workaround.
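To confirm that the setting actually stuck (for example after a replug or reboot), the output of `ethtool -k <iface>` can be checked. Below is a minimal sketch of such a check; `feature_states` is a hypothetical helper, and the sample text is illustrative of the format I expect from `ethtool -k`, not output captured from my machine.

```python
def feature_states(ethtool_k_output):
    """Map feature name -> True (on) / False (off) from `ethtool -k` text."""
    states = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # header line ("Features for ...:") or blank line
        words = value.split()
        if words[0] in ("on", "off"):
            states[name] = words[0] == "on"
    return states


# Illustrative sample only (assumed format, not taken from this report).
SAMPLE = """\
Features for enx1c1adff98bf8:
rx-checksumming: on
scatter-gather: off
\ttx-scatter-gather: off
\ttx-scatter-gather-fraglist: off [fixed]
"""

if __name__ == "__main__":
    states = feature_states(SAMPLE)
    print("scatter-gather enabled:", states.get("scatter-gather"))
```

In practice the text would come from running `ethtool -k enx1c1adff98bf8` and passing its output to the function.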
Comment 107 Prashanth K 2024-01-11 09:56:07 UTC
Thanks for the confirmation. I have sent a fix (for DWC3 controllers) upstream; let's see if we can get it accepted: https://lore.kernel.org/all/20231212112521.3774610-1-quic_prashk@quicinc.com/
Comment 108 Robin Tetour 2024-02-25 19:06:43 UTC
I am experiencing this issue on the latest Fedora, kernel 6.7.5. I noticed that the patch got merged. With it I was running flawlessly, with no dropped connections, but after a few hours it crashed with Tx status -71. I also noticed that my hub is 2109:2813 (3 entries in lsusb) and 2109:0813 (also 3 entries in lsusb), which is different from the others here, yet I am experiencing an identical issue.
Comment 109 ellePdesk 2024-03-01 16:12:17 UTC
I have 2 usb R8153 ethernet adapters, both use usbID 0bda:8153.
Both stop functioning within a minute with the logs flooding with "Tx status -71".
Using 
  ethtool -K [iface] sg off 
had no effect on this, still fails within minutes.
Currently running Ubuntu 23.11 with kernel 6.5.0-21-generic.
Comment 110 Ansgar Hegerfeld 2024-03-04 16:24:36 UTC
I think I can reproduce the "Tx status -71" error using a fresh 6.8.0-rc7-1-mainline kernel (Arch Linux here) and two Samsung ViewFinity S80TB displays (integrated RTL8153 controller), which are connected using Thunderbolt to a Framework 13 AMD. "ethtool -K [iface] sg off" has no effect for me, too.

I can run a network download speed test (i.e. https://www.speedtest.net/run ) without problems, but after the download is finished and the upload part starts, my displays seem to reconnect 1-2x and then they disconnect completely. dmesg attached, it can be reproduced with a chance of 95%. Sometimes I need a second speed test to crash it. No TLP active here, the usbcore.quirks didn't work for me.

lsusb shows me this: "Bus 008 Device 013: ID 0bda:8153 Realtek Semiconductor Corp. RTL" and "Bus 008 Device 016: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter"
Comment 111 Ansgar Hegerfeld 2024-03-04 16:25:12 UTC
Created attachment 305959 [details]
dmesg of Linux 6.8.0-rc7
Comment 112 Robin Tetour 2024-03-04 22:29:22 UTC
(In reply to Ansgar Hegerfeld from comment #110)
> I can run a network download speed test (i.e. https://www.speedtest.net/run
> ) without problems, but after the download is finished and the upload part
> starts, my displays seem to reconnect 1-2x and then they disconnect
> completely.

Pretty much the same behavior here. Before the patch this happened 100% of the time for me; now it happens at random. I have not managed to make it crash during a speed test yet, but I have only tested once since the patch. Will try again soon.
Comment 113 ellePdesk 2024-03-06 11:59:50 UTC
I've done some testing today and have some interesting findings.
First: thanks to Ansgar Hegerfeld, speedtest upload is a very reliable trigger for the error.

When connecting my separate dongle to the USB-A port, or via a converter to the USB-C ports on my laptop, the behavior is not triggered.

When connecting the same dongle to my Thinkpad 40AY docking station, the behaviour can be reliably triggered.

The difference, as far as I can tell, is the addition of the docking station's internal USB 3.1 hub(s): usbid 17ef:30ab for the internal r8153, and 17ef:30ab and 17ef:30ad for the external one.

My workaround for now is to connect the external usb-ethernet adapter to a usb 2 port of the hub and accept the lower speed.

output of 'lsusb -v -d  17ef:30ab':

Bus 004 Device 002: ID 17ef:30ab Lenovo USB3.1 Hub             
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.20
  bDeviceClass            9 Hub
  bDeviceSubClass         0 
  bDeviceProtocol         3 
  bMaxPacketSize0         9
  idVendor           0x17ef Lenovo
  idProduct          0x30ab 
  bcdDevice           51.34
  iManufacturer           1 VIA Labs, Inc.         
  iProduct                2 USB3.1 Hub             
  iSerial                 3 000000001
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x001f
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes           19
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Feedback
        wMaxPacketSize     0x0002  1x 2 bytes
        bInterval               8
        bMaxBurst               0
Binary Object Store Descriptor:
  bLength                 5
  bDescriptorType        15
  wTotalLength       0x0049
  bNumDeviceCaps          5
  USB 2.0 Extension Device Capability:
    bLength                 7
    bDescriptorType        16
    bDevCapabilityType      2
    bmAttributes   0x00000006
      BESL Link Power Management (LPM) Supported
  SuperSpeed USB Device Capability:
    bLength                10
    bDescriptorType        16
    bDevCapabilityType      3
    bmAttributes         0x00
    wSpeedsSupported   0x000e
      Device can operate at Full Speed (12Mbps)
      Device can operate at High Speed (480Mbps)
      Device can operate at SuperSpeed (5Gbps)
    bFunctionalitySupport   1
      Lowest fully-functional device speed is Full Speed (12Mbps)
    bU1DevExitLat           4 micro seconds
    bU2DevExitLat         231 micro seconds
  Container ID Device Capability:
    bLength                20
    bDescriptorType        16
    bDevCapabilityType      4
    bReserved               0
    ContainerID             {5e048157-f6af-4075-b308-2b3de1bdadf5}
  SuperSpeedPlus USB Device Capability:
    bLength                28
    bDescriptorType        16
    bDevCapabilityType     10
    bmAttributes         0x00000023
      Sublink Speed Attribute count 4
      Sublink Speed ID count 2
    wFunctionalitySupport   0x1100
      Min functional Speed Attribute ID: 0
      Min functional RX lanes: 1
      Min functional TX lanes: 1
    bmSublinkSpeedAttr[0]   0x00050030
      Speed Attribute ID: 0 5Gb/s Symmetric RX SuperSpeed
    bmSublinkSpeedAttr[1]   0x000500b0
      Speed Attribute ID: 0 5Gb/s Symmetric TX SuperSpeed
    bmSublinkSpeedAttr[2]   0x000a4031
      Speed Attribute ID: 1 10Gb/s Symmetric RX SuperSpeedPlus
    bmSublinkSpeedAttr[3]   0x000a40b1
      Speed Attribute ID: 1 10Gb/s Symmetric TX SuperSpeedPlus
  ** UNRECOGNIZED:  03 10 0b
Hub Descriptor:
  bLength              12
  bDescriptorType      42
  nNbrPorts             4
  wHubCharacteristic 0x0009
    Per-port power switching
    Per-port overcurrent protection
  bPwrOn2PwrGood      175 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  bHubDecLat          0.4 micro seconds
  wHubDelay          2292 nano seconds
  DeviceRemovable    0x0a
 Hub Port Status:
   Port 1: 0000.0203 5Gbps power U0 enable connect
     Ext Status: 0000.0000
       RX Speed Attribute ID: 0 Lanes: 1
       TX Speed Attribute ID: 0 Lanes: 1
   Port 2: 0000.02a0 5Gbps power Rx.Detect
   Port 3: 0000.0263 5Gbps power suspend enable connect
     Ext Status: 0000.0011
       RX Speed Attribute ID: 1 Lanes: 1
       TX Speed Attribute ID: 1 Lanes: 1
   Port 4: 0000.02a0 5Gbps power Rx.Detect
Device Status:     0x0001
  Self Powered