Bug 198931 - Network connection on r8152 stops with "Tx status -71"
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
 
Reported: 2018-02-25 13:03 UTC by Jean-Louis Dupond
Modified: 2021-05-09 20:42 UTC
Kernel Version: 4.16-rc2 (drm-tip)
Tree: Mainline
Regression: No


Attachments
dmesg of Linux 4.17.0 (94.57 KB, text/plain)
2018-10-19 13:24 UTC, RussianNeuroMancer
dmesg of Linux 4.19rc8 (110.52 KB, text/plain)
2018-10-19 13:25 UTC, RussianNeuroMancer
dmesg of Linux 5.4.0 (174.20 KB, text/plain)
2020-03-03 14:13 UTC, RussianNeuroMancer
lsusb output mjanssens wd15 xps9360 (3.98 KB, text/plain)
2020-05-05 14:27 UTC, Michiel Janssens

Description Jean-Louis Dupond 2018-02-25 13:03:56 UTC
Hi All!

I have the following setup:
Precision 5520
Dell WD15 Docking

The WD15 dock has a network port driven by r8152.

Once in a while (about once a week), the connection dies and the kernel prints the following message:
kernel: [ 6164.073282] r8152 4-1.2:1.0 enxa44cc890f4c8: Tx status -71

Simply replugging the dock, or disabling and re-enabling the network interface, fixes the problem.

The question is: why does this happen? :)

Feel free to ask for additional information!
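
For reference, the number after "Tx status" is a negative Linux errno value reported for the USB transfer. The following is a minimal Python sketch for decoding such lines. The function name `decode_status` and the short descriptions are illustrative assumptions (the descriptions loosely paraphrase the kernel's USB error-code notes), and only the two status values that appear in this report (-71 here, -2 in later comments) are covered.

```python
import re

# Negative Linux errno values seen in this report. The one-line descriptions
# are paraphrased assumptions based on the kernel's USB error-code notes.
USB_STATUS = {
    -71: ("EPROTO", "protocol error on the USB link, e.g. no response from the device"),
    -2: ("ENOENT", "the transfer request was cancelled before it completed"),
}

# Matches lines such as "r8152 4-1.2:1.0 enxa44cc890f4c8: Tx status -71".
STATUS_RE = re.compile(
    r"r8152 (?P<usb>\S+) (?P<iface>\S+): (?P<direction>[TR]x) status (?P<status>-?\d+)"
)


def decode_status(line):
    """Return a dict describing an r8152 status line, or None if it does not match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status = int(match.group("status"))
    name, meaning = USB_STATUS.get(status, ("unknown", "not covered by this table"))
    return {
        "usb_path": match.group("usb"),
        "interface": match.group("iface"),
        "direction": match.group("direction"),
        "status": status,
        "errno": name,
        "meaning": meaning,
    }


print(decode_status("kernel: [ 6164.073282] r8152 4-1.2:1.0 enxa44cc890f4c8: Tx status -71"))
```

Lines without a "status" field, such as the "Tx timeout" messages quoted later in this thread, are deliberately not matched and yield `None`.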
Comment 1 RussianNeuroMancer 2018-10-19 13:23:48 UTC
It seems I have the same issue on a Dell Latitude 7285 and an HP EliteBook Folio G1 with a Belkin USB-C Express Dock 3.1 HD F4U093:

[ 1090.235874] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 1090.235879] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 1090.235886] pcieport 0000:00:1c.0:   device [8086:9d10] error status/mask=00003000/00002000
[ 1090.235889] pcieport 0000:00:1c.0:    [12] Timeout               
[ 1589.760804] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx status -2
[ 1594.048998] ------------[ cut here ]------------
[ 1594.049003] NETDEV WATCHDOG: enx58ef68a8892b (r8152): transmit queue 0 timed out
[ 1594.049040] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:461 dev_watchdog+0x221/0x230
[ 1594.049042] Modules linked in: [...]
[ 1594.049294] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.19.0-041900rc8-generic #201810150631
[ 1594.049296] Hardware name: Dell Inc. Latitude 7285/0VVWNX, BIOS 1.2.0 07/09/2018
[ 1594.049303] RIP: 0010:dev_watchdog+0x221/0x230
[ 1594.049307] Code: 00 49 63 4e e0 eb 92 4c 89 ef c6 05 26 ff f5 00 01 e8 c3 b6 fc ff 89 d9 4c 89 ee 48 c7 c7 08 2d 7b 82 48 89 c2 e8 61 26 7b ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
[ 1594.049310] RSP: 0018:ffffc2438192bd70 EFLAGS: 00010282
[ 1594.049314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 1594.049317] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffffa0341e216420
[ 1594.049320] RBP: ffffc2438192bda0 R08: 0000000000000001 R09: 0000000000000511
[ 1594.049322] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000001
[ 1594.049325] R13: ffffa033fac37000 R14: ffffa033fac374c0 R15: ffffa034166cf680
[ 1594.049329] FS:  0000000000000000(0000) GS:ffffa0341e200000(0000) knlGS:0000000000000000
[ 1594.049332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1594.049335] CR2: 00007f26eec71020 CR3: 000000038a20a005 CR4: 00000000003606f0
[ 1594.049340] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1594.049342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1594.049344] Call Trace:
[ 1594.049356]  ? pfifo_fast_change_tx_queue_len+0x2e0/0x2e0
[ 1594.049363]  call_timer_fn+0x30/0x130
[ 1594.049371]  run_timer_softirq+0x3ea/0x420
[ 1594.049376]  ? __switch_to_asm+0x34/0x70
[ 1594.049381]  ? __switch_to+0xad/0x500
[ 1594.049385]  ? __switch_to_asm+0x40/0x70
[ 1594.049388]  ? __switch_to_asm+0x34/0x70
[ 1594.049392]  ? __switch_to_asm+0x40/0x70
[ 1594.049397]  __do_softirq+0xdc/0x2b5
[ 1594.049403]  run_ksoftirqd+0x2b/0x40
[ 1594.049410]  smpboot_thread_fn+0xd0/0x170
[ 1594.049416]  kthread+0x120/0x140
[ 1594.049421]  ? sort_range+0x30/0x30
[ 1594.049426]  ? kthread_bind+0x40/0x40
[ 1594.049431]  ret_from_fork+0x35/0x40
[ 1594.049435] ---[ end trace 3fcb83dc58402212 ]---
[ 1594.049468] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1599.172616] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1604.288619] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1610.176579] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout

Full logs:
Comment 2 RussianNeuroMancer 2018-10-19 13:24:37 UTC
Created attachment 279099 [details]
dmesg of Linux 4.17.0
Comment 3 RussianNeuroMancer 2018-10-19 13:25:08 UTC
Created attachment 279101 [details]
dmesg of Linux 4.19rc8
Comment 4 RussianNeuroMancer 2018-11-28 11:43:09 UTC
Jean-Louis, can you please verify whether the issue is still reproducible for you on Linux 4.20rc4? For me, at least with one dock (Belkin USB-C Express Dock 3.1 HD F4U093) and one device (HP Elite x2 1013 G3), this issue is no longer reproducible. I will verify other laptops with Linux 4.20 later.
Comment 5 Jean-Louis Dupond 2018-12-04 12:14:35 UTC
I haven't seen this in the last few months. I am running Ubuntu 18.10 with 4.18.0-11-generic.
Comment 6 Konstantin Sobolev 2019-12-04 23:22:05 UTC
I have a very similar setup: a Dell Precision 7540 with a WD19DC dock that has an RTL8153 adapter. It crashes periodically with similar symptoms. My current kernel is 5.4.1.

[76658.437411] ------------[ cut here ]------------
[76658.437412] NETDEV WATCHDOG: enp57s0u2u4 (r8152): transmit queue 0 timed out
[76658.437421] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x21f/0x230
[76658.437421] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi r8152 mii tun md4 cifs dm_zero fuse raid10 raid1 raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_crypt dm_mirror dm_region_hash dm_log dm_mod dax ohci_pci ohci_hcd uhci_hcd ehci_pci ehci_hcd mousedev hid_multitouch dell_rbtn input_leds dell_laptop dell_wmi dell_smbios i2c_designware_platform atkbd rtsx_pci_sdmmc mmc_core mei_hdcp i2c_designware_core dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof intel_rapl_msr libps2 dcdbas dell_smm_hwmon btusb btrtl btbcm uvcvideo x86_pkg_temp_thermal videobuf2_vmalloc btintel intel_powerclamp videobuf2_memops coretemp videobuf2_v4l2 ucsi_acpi bluetooth processor_thermal_device videodev intel_lpss_pci typec_ucsi mei_me i2c_i801 rtsx_pci intel_soc_dts_iosf ecdh_generic intel_lpss mei mfd_core ecc videobuf2_common intel_rapl_common intel_pch_thermal typec wmi i8042 int3403_thermal int3400_thermal i2c_hid dell_smo8800
[76658.437439]  int340x_thermal_zone serio acpi_thermal_rel intel_pmc_core evdev i915
[76658.437441] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U            5.4.1-gentoo #6
[76658.437442] Hardware name: Dell Inc. Precision 7540/0CYJDT, BIOS 1.4.0 09/23/2019
[76658.437443] RIP: 0010:dev_watchdog+0x21f/0x230
[76658.437444] Code: 85 c0 75 e8 eb a8 4c 89 ef c6 05 5d 62 b3 00 01 e8 e6 c8 fc ff 44 89 e1 4c 89 ee 48 c7 c7 48 c1 49 9f 48 89 c2 e8 ea 11 8a ff <0f> 0b eb 89 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 c7 47 08 00
[76658.437444] RSP: 0018:ffffb139401b8e80 EFLAGS: 00010282
[76658.437445] RAX: 0000000000000000 RBX: ffff9f890e1a2a00 RCX: 00000000000011d4
[76658.437445] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffffa39e53ac
[76658.437446] RBP: ffff9f8914cc7440 R08: 0000000000000001 R09: 00000000000011d4
[76658.437446] R10: 0000000000028978 R11: 0000000000000001 R12: 0000000000000000
[76658.437447] R13: ffff9f8914cc7000 R14: ffff9f8914cc7440 R15: 0000000000000001
[76658.437447] FS:  0000000000000000(0000) GS:ffff9f891c080000(0000) knlGS:0000000000000000
[76658.437448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76658.437448] CR2: 00007fc9000030b8 CR3: 00000009c4384006 CR4: 00000000003606e0
[76658.437448] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76658.437449] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76658.437449] Call Trace:
[76658.437450]  <IRQ>
[76658.437452]  ? qdisc_put_unlocked+0x30/0x30
[76658.437454]  call_timer_fn+0x26/0x120
[76658.437454]  run_timer_softirq+0x17d/0x470
[76658.437456]  ? enqueue_hrtimer+0x31/0x80
[76658.437457]  ? __hrtimer_run_queues+0x11b/0x260
[76658.437458]  __do_softirq+0xd6/0x2ba
[76658.437460]  irq_exit+0x9b/0xa0
[76658.437461]  smp_apic_timer_interrupt+0x5b/0x110
[76658.437462]  apic_timer_interrupt+0xf/0x20
[76658.437462]  </IRQ>
[76658.437464] RIP: 0010:cpuidle_enter_state+0xa8/0x400
[76658.437464] Code: c5 0f 1f 44 00 00 31 ff e8 85 fb 9b ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 2d 03 00 00 31 ff e8 7c d1 a0 ff fb 45 85 e4 <0f> 88 6c 02 00 00 4c 2b 2c 24 49 63 cc 48 8d 04 49 48 c1 e0 05 8b
[76658.437465] RSP: 0018:ffffb139400dfe70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[76658.437465] RAX: ffff9f891c0a7bc0 RBX: ffffffff9f6a1ce0 RCX: 000045b86eed58ba
[76658.437466] RDX: 000045b86efc9af4 RSI: 000045b86eed58ba RDI: 0000000000000000
[76658.437466] RBP: ffffd1393fab4a10 R08: 000045b86eed58d6 R09: 00000000000001bf
[76658.437466] R10: ffff9f891c0a6c20 R11: ffff9f891c0a6c00 R12: 0000000000000002
[76658.437467] R13: 000045b86eed58d6 R14: 0000000000000002 R15: ffff9f8915f5c740
[76658.437468]  cpuidle_enter+0x24/0x40
[76658.437470]  do_idle+0x1bf/0x230
[76658.437471]  cpu_startup_entry+0x14/0x20
[76658.437472]  start_secondary+0x131/0x160
[76658.437473]  secondary_startup_64+0xa4/0xb0
[76658.437474] ---[ end trace 907e490a0cd3c160 ]---
[76658.437476] r8152 4-2.4:1.0 enp57s0u2u4: Tx timeout
[76659.788078] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=83035 end=83036) time 243 us, min 1431, max 1439, scanline start 1421, end 1443
[76660.958672] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2
[76660.958758] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2
[76660.958848] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2
[76660.958940] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2
Comment 7 Peter 2020-01-27 13:20:07 UTC
I have a similar setup and a similar problem:
Setup: Lenovo ThinkPad T480, ThinkPad USB-C Dock 40A90090EU [1], Ubuntu 16.04, Kernel 4.15.0-74-generic #83~16.04.1-Ubuntu

The network connection crashes periodically.
In that case, dmesg shows `r8152 4-1.1:1.0 enxe04f43991e1c: Rx status -71`.
I noticed that this seems to depend on how heavily the network connection is used. For example, if I compile a lot using icecream to distribute compilation jobs, it seems to be a lot less stable.

Using `rmmod r8152 && modprobe r8152` fixes the problem temporarily. 


[1] https://support.lenovo.com/de/de/accessories/acc100348
Comment 8 RussianNeuroMancer 2020-01-27 14:00:29 UTC
@Peter, see Comment 4.
Please re-test on a newer kernel (you can get one from the mainline PPA).
Comment 9 Timur Kristóf 2020-03-02 12:08:11 UTC
This still happens to me on 5.5.6-201.fc31.x86_64. My dmesg is full of these messages:

[12696.189484] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12702.333456] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12707.965422] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12713.085385] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12718.205360] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12724.349321] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12729.981295] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12735.101256] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12740.221235] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12746.365199] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12751.997171] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12757.117155] r8152 6-1:1.0 enp10s0u1: Tx timeout
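
The timestamps in a log like this can be turned into intervals to see how regularly the timeout repeats. The following is a minimal Python sketch, assuming dmesg lines with the usual `[seconds.microseconds]` prefix; the function name `timeout_intervals` is illustrative, and the sample uses the first four lines quoted above.

```python
import re

# Captures the dmesg timestamp of every r8152 "Tx timeout" line.
TIMEOUT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+r8152 .*: Tx timeout")


def timeout_intervals(dmesg_text):
    """Return the gaps (in seconds, rounded to ms) between consecutive Tx timeouts."""
    stamps = [float(m.group(1)) for m in TIMEOUT_RE.finditer(dmesg_text)]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


sample = "\n".join([
    "[12696.189484] r8152 6-1:1.0 enp10s0u1: Tx timeout",
    "[12702.333456] r8152 6-1:1.0 enp10s0u1: Tx timeout",
    "[12707.965422] r8152 6-1:1.0 enp10s0u1: Tx timeout",
    "[12713.085385] r8152 6-1:1.0 enp10s0u1: Tx timeout",
])
print(timeout_intervals(sample))
```

For the log above, the gaps come out at roughly 5 to 6 seconds each, which suggests that the transmit queue never recovers between watchdog checks rather than failing intermittently.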
Comment 10 RussianNeuroMancer 2020-03-03 08:06:00 UTC
Timur, are you using the same docking station as Jean-Louis, or a different one?
Comment 11 Timur Kristóf 2020-03-03 12:47:07 UTC
RussianNeuroMancer, I use a Dell XPS 13 9370 with a Lenovo ThinkPad-branded Thunderbolt 3 dock. The model number is DBB9003L1. (The dock is not mine; I'm just borrowing it from a colleague for a week.) I think these docks mostly use the same hardware under the hood. I think I've also seen a Fedora bug report about the same issue with the Dell TB16 here: https://bugzilla.redhat.com/show_bug.cgi?id=1460789
Comment 12 RussianNeuroMancer 2020-03-03 13:01:35 UTC
I see. By the way, since my Comment 4 I was able to reproduce this issue again. This time with Linux 5.4 on Dell Venue 8 Pro 5855 and Dell WD15 Dock.
Comment 13 RussianNeuroMancer 2020-03-03 14:13:28 UTC
Created attachment 287779 [details]
dmesg of Linux 5.4.0
Comment 14 BniceJada 2020-03-05 14:03:27 UTC
Same problem here. Dell Latitude 7480 (BIOS 1.16.1) with WD15 dock (Port Controller on v1.1.8). I am using 5.5.7-zen1-1-zen (but the same problem also occurred with the standard Arch kernel).

It has not occurred with 5.4.2.arch1-1, but it definitely occurred with 5.4.5.arch1-1 (I was on holiday in between, and the trouble started after that).

--
Mar 05 13:42:34 hostname kernel: ------------[ cut here ]------------
Mar 05 13:42:34 hostname kernel: NETDEV WATCHDOG: enp59s0u1u2 (r8152): transmit queue 0 timed out
Mar 05 13:42:34 hostname kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x268/0x270
Mar 05 13:42:34 hostname kernel: Modules linked in: md4 nls_utf8 cifs dns_resolver fscache libdes rfcomm ip6t_REJECT nf_reject_ipv6 xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_na>
Mar 05 13:42:34 hostname kernel:  coretemp snd_hda_codec_generic ledtrig_audio kvm_intel snd_pcm_dmaengine snd_hda_intel dell_wmi_descriptor dcdbas snd_intel_dspcfg dell_smm_hwmon snd_hda_codec kvm cfg80211 snd_hda_core snd_hwdep snd_pcm e1000e fuse irqbypass i>
Mar 05 13:42:34 hostname kernel:  libps2 aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_hcd rtsx_pci i8042 serio i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_agp intel_gtt agpgart btrfs blake2b_generic libcr>
Mar 05 13:42:34 hostname kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.7-zen1-1-zen #1
Mar 05 13:42:34 hostname kernel: Hardware name: Dell Inc. Latitude 7480/00F6D3, BIOS 1.16.1 10/03/2019
Mar 05 13:42:34 hostname kernel: RIP: 0010:dev_watchdog+0x268/0x270
Mar 05 13:42:34 hostname kernel: Code: 47 9c 69 ff eb 8a 4c 89 f7 c6 05 dc 05 db 00 01 e8 0d fa f9 ff 44 89 e9 4c 89 f6 48 c7 c7 d0 2a 5a 8e 48 89 c2 e8 0f 92 73 ff <0f> 0b e9 68 ff ff ff 90 0f 1f 44 00 00 48 c7 47 08 00 00 00 00 48
Mar 05 13:42:34 hostname kernel: RSP: 0018:ffffb39300164e60 EFLAGS: 00010286
Mar 05 13:42:34 hostname kernel: RAX: 0000000000000000 RBX: ffff8cdc200b2000 RCX: 0000000000000000
Mar 05 13:42:34 hostname kernel: RDX: 0000000000000103 RSI: 00000000000000f6 RDI: 00000000ffffffff
Mar 05 13:42:34 hostname kernel: RBP: ffff8cdc0e5bf45c R08: 0000000000000515 R09: 0000000000000003
Mar 05 13:42:34 hostname kernel: R10: 0000000000000001 R11: 0000000000003c00 R12: ffff8cdc0e5bf480
Mar 05 13:42:34 hostname kernel: R13: 0000000000000000 R14: ffff8cdc0e5bf000 R15: ffff8cdc200b2080
Mar 05 13:42:34 hostname kernel: FS:  0000000000000000(0000) GS:ffff8cdc26500000(0000) knlGS:0000000000000000
Mar 05 13:42:34 hostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 05 13:42:34 hostname kernel: CR2: 00007fe15a0d3000 CR3: 000000019f20a001 CR4: 00000000003606e0
Mar 05 13:42:34 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 05 13:42:34 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 05 13:42:34 hostname kernel: Call Trace:
Mar 05 13:42:34 hostname kernel:  <IRQ>
Mar 05 13:42:34 hostname kernel:  ? qdisc_put_unlocked+0x30/0x30
Mar 05 13:42:34 hostname kernel:  ? qdisc_put_unlocked+0x30/0x30
Mar 05 13:42:34 hostname kernel:  call_timer_fn+0x2d/0x150
Mar 05 13:42:34 hostname kernel:  ? qdisc_put_unlocked+0x30/0x30
Mar 05 13:42:34 hostname kernel:  run_timer_softirq+0xaec/0xce0
Mar 05 13:42:34 hostname kernel:  __do_softirq+0x111/0x374
Mar 05 13:42:34 hostname kernel:  ? hrtimer_interrupt+0x235/0x3e0
Mar 05 13:42:34 hostname kernel:  irq_exit+0xc9/0x120
Mar 05 13:42:34 hostname kernel:  smp_apic_timer_interrupt+0xa6/0x1a0
Mar 05 13:42:34 hostname kernel:  apic_timer_interrupt+0xf/0x20
Mar 05 13:42:34 hostname kernel:  </IRQ>
Mar 05 13:42:34 hostname kernel: RIP: 0010:cpuidle_enter_state+0xc9/0x850
Mar 05 13:42:34 hostname kernel: Code: e8 8c b0 85 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 00 06 00 00 31 ff e8 3e 09 8d ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 1f 04 00 00 49 63 d4 4c 2b 6c 24 10 48 8d 04 52 48
Mar 05 13:42:34 hostname kernel: RSP: 0018:ffffb393000dbe50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Mar 05 13:42:34 hostname kernel: RAX: ffff8cdc26500000 RBX: ffff8cdc26537800 RCX: 000000000000001f
Mar 05 13:42:34 hostname kernel: RDX: 0000000000000000 RSI: 000000002f32988b RDI: 0000000000000000
Mar 05 13:42:34 hostname kernel: RBP: ffffffff8e8bea60 R08: 00000a3f27ed44df R09: 00000a3f251f7ba7
Mar 05 13:42:34 hostname kernel: R10: 0000000000000007 R11: 0000000000000007 R12: 0000000000000008
Mar 05 13:42:34 hostname kernel: R13: 00000a3f27ed44df R14: 0000000000000008 R15: ffff8cdc22a98000
Mar 05 13:42:34 hostname kernel:  cpuidle_enter+0x29/0x40
Mar 05 13:42:34 hostname kernel:  do_idle+0x20c/0x2c0
Mar 05 13:42:34 hostname kernel:  cpu_startup_entry+0x19/0x20
Mar 05 13:42:34 hostname kernel:  start_secondary+0x1c6/0x220
Mar 05 13:42:34 hostname kernel:  secondary_startup_64+0xb6/0xc0
Mar 05 13:42:34 hostname kernel: ---[ end trace 358d3d81e0691439 ]---
Mar 05 13:42:34 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Mar 05 13:42:40 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Mar 05 13:42:46 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Mar 05 13:42:51 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Mar 05 13:42:56 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout
Comment 15 Jamin W. Collins 2020-03-08 20:43:20 UTC
I've been encountering this problem with every relatively recent (4.9+) kernel, and possibly older ones as well.

System: Lenovo W530

USB adapter: 
Cable Matters 3 Port USB 3.0 Hub with Ethernet (USB Hub with Ethernet, Gigabit Ethernet USB Hub ) Supporting 10/100/1000 Mbps Ethernet Network in Black
https://smile.amazon.com/gp/product/B01J6583NK/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

I've encountered the problem with Arch's main linux kernels and their LTS builds.

The interface seems to have trouble once it is put under any sort of load (30% or more utilization) on the host system. Removing and reloading the module can sometimes temporarily improve things, but (from what I've seen) the issue always returns within a few minutes to an hour.
Comment 17 RussianNeuroMancer 2020-03-09 18:26:12 UTC
So if it's the same issue, there are at least several workarounds:

"usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)
Install tlp and change the USB_BLACKLIST option in /etc/default/tlp to "0bda:8153" (from second askubuntu link)
Patch drivers/usb/core/quirks.c with the following line (mentioned in tlp bugreport)
{ USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

Unfortunately, I don't have access to a Dell WD15 docking station this week. Can anyone else try at least the first or second workaround?
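
When testing the boot-option workaround, it helps to confirm that the option actually reached the running kernel. The following is a minimal Python sketch that parses a kernel command line (as found in /proc/cmdline) for `usbcore.quirks` entries. The function names and the sample command line are hypothetical; the entry layout `vendor:product:flags` follows the option string quoted above, where the `k` flag corresponds to the NO_LPM quirk.

```python
def parse_usb_quirks(cmdline):
    """Map (vendor, product) to the flag letters given via usbcore.quirks=..."""
    quirks = {}
    for token in cmdline.split():
        if not token.startswith("usbcore.quirks="):
            continue
        for entry in token.split("=", 1)[1].split(","):
            parts = entry.split(":")
            if len(parts) != 3:
                continue  # skip malformed entries instead of failing
            vendor, product, flags = parts
            quirks[(vendor.lower(), product.lower())] = flags
    return quirks


def has_no_lpm(cmdline, vendor="0bda", product="8153"):
    """True if the given device has the 'k' (NO_LPM) flag on the command line."""
    return "k" in parse_usb_quirks(cmdline).get((vendor, product), "")


# Hypothetical command line for illustration; on a real system read /proc/cmdline.
example = "root=/dev/sda1 quiet usbcore.quirks=0bda:8153:k"
print(has_no_lpm(example))
```

On a live system the same check would be `has_no_lpm(open("/proc/cmdline").read())`.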
Comment 18 BniceJada 2020-03-12 05:47:10 UTC
(In reply to RussianNeuroMancer from comment #17)
> So if it's same issue there is at least several workarounds: 
> 
> "usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)
> Install tlp and change USB_BLACKLIST option in /etc/default/tlp to
> "0bda:8153" (from second askubuntu link)
> Patch /drivers/usb/core/quirks.c with following line (mentioned in tlp
> bugreport)
> { USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

I can confirm that adding "0bda:8153" to USB_BLACKLIST in my tlp.conf seems to work fine for me. Prior to this change I lost the network connection each night, and now the connection has stayed up for the last two nights (three days).
Comment 19 Peter 2020-03-13 10:20:51 UTC
(In reply to RussianNeuroMancer from comment #17)

> "usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)

I can confirm that adding "usbcore.quirks=0bda:8153:k" to kernel boot options worked for me.
Comment 20 Hans de Goede 2020-03-13 11:04:25 UTC
So, reading through this bug report, the solution, or at least a workaround, would seem to be to add USB_QUIRK_NO_LPM entries for the troublesome RTL8152 / RTL8153 based Ethernet adapters to drivers/usb/core/quirks.c. There is actually already at least one line in there for a dock with an RTL8153 NIC:

        /* Microsoft Surface Dock Ethernet (RTL8153 GigE) */
        { USB_DEVICE(0x045e, 0x07c6), .driver_info = USB_QUIRK_NO_LPM },

Several docks are mentioned here, but upon checking the various logs, they all seem to use the generic Realtek USB ID for the RTL8153 GigE NIC.

So it seems that the solution is to add the following lines to drivers/usb/core/quirks.c:

        /* Generic RTL8153 GigE adapters */
        { USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

I will submit a patch upstream for this.
Comment 21 Peter 2020-03-13 11:37:49 UTC
Unfortunately I was too enthusiastic about this.
I wrote my comment after 1 day of working and 1 night of downloading a huge amount of data without problems. But after that, using the icecream distributed compiler daemon crashed my connection again.

So it seems to be better, but not solved for me.
Comment 22 Hans de Goede 2020-03-13 12:04:16 UTC
(In reply to Peter from comment #21)
> Unfortunately I was to enthusiastic about this. 
> I wrote my comment after 1 day of working and 1 night of downloading huge
> amount of big data without problems. But after that using icecream
> distributed compiler daemon again crashed my connection. 
> 
> So it seams to be better but not solved for me.

I'm sorry to hear that the issue is not 100% resolved. Still, I've found enough other bug reports where people are having success with this option when used with an RTL8153 device that I believe it is worthwhile to submit a patch for this upstream; see e.g.:
https://bugzilla.redhat.com/show_bug.cgi?id=1713657
Comment 23 RussianNeuroMancer 2020-03-13 13:29:52 UTC
> https://bugzilla.redhat.com/show_bug.cgi?id=1713657

I wonder why the blacklist in tlp didn't help him, but usbcore.quirks did.
Comment 24 RussianNeuroMancer 2020-03-13 14:21:22 UTC
> But after that using icecream distributed compiler daemon again crashed my
> connection. 

> So it seams to be better but not solved for me.

Try this:

1. remove lines 737-737 here https://github.com/torvalds/linux/blob/0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733

2. remove lines 6900 and 6901 here https://github.com/torvalds/linux/blob/0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900

Back in the Linux 4.18/4.19 days, that allowed me to work around a similar issue on an HP Elite x2 1013 G3 and Belkin USB-C Express Dock 3.1 HD.
Comment 25 Hans de Goede 2020-03-13 14:29:54 UTC
(In reply to RussianNeuroMancer from comment #24)
> Try this:
> 
> 1. remove lines 737-737 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733
> 
> 2. remove lines 6900 and 6901 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900
> 
> Back in Linux 4.18/4.19 days that allowed me to workaround similar issue on
> HP Elite x2 1013 G3 and Belkin USB-C Express Dock 3.1 HD.

Hmm, so in essence that swaps the driver which is specifically made for the RTL8153 with the generic USB Ethernet class driver.

Although it might be interesting to try that, there are known issues with it.

E.g. with a Lenovo Thunderbolt 3 gen 2 dock, when the laptop is turned off while connected to the dock, most of the dock is turned off, but the Ethernet card still has power (for wake-on-LAN, I guess). When using the cdc_ether driver, the RTL8153 NIC will then start spamming the network as fast as it can after the laptop has been turned off, which in my case made my entire (wired) home network unusable (*).

So I actually sent a patch upstream doing the opposite: adding the Lenovo-specific USB IDs for the RTL8153 to the blacklist in cdc_ether and to the whitelist/device-ID list in r8152.c, which solved the dock jamming my wired network after the laptop turned off.

*) I'm using a cheap unmanaged switch; a better switch may have kept the network at least somewhat usable.
Comment 26 Marcus Sundman 2020-04-30 02:11:35 UTC
(In reply to RussianNeuroMancer from comment #24)
> > But after that using icecream distributed compiler daemon again crashed my
> > connection. 
> 
> > So it seams to be better but not solved for me.
> 
> Try this:
> 
> 1. remove lines 737-737 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733
> 
> 2. remove lines 6900 and 6901 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900
> 
> Back in Linux 4.18/4.19 days that allowed me to workaround similar issue on
> HP Elite x2 1013 G3 and Belkin USB-C Express Dock 3.1 HD.

My 0bda:8153 also stops working with the cdc_ether driver (without it saying anything in syslog).
Blacklisting 0bda:8153 in TLP didn't work.
Adding the 0bda:8153 quirks kernel parameter didn't work.
Using the newest r8152.53.56-2.12.0 driver from Realtek didn't work.
Comment 27 Michiel Janssens 2020-05-01 08:38:23 UTC
Similar issues here with a Dell WD15 dock, connected to a Dell XPS 13 9360.
For several months now, the wired connection from the dock (0bda:8153) has been dying without the network stack being notified. A reboot after this waits endlessly for services to stop. Sometimes the GNOME GUI locks up shortly after I log back in to the system and am presented with the issue. I have to use REISUB to get the system working again.
The issue doesn't appear while I'm working on the system; it mostly happens when the system is left running by itself for a while. I haven't found a way to actually trigger it.

At the moment I'm running openSUSE Tumbleweed with kernel 5.6.6, and the issue is still happening.
I tried the quirks, but with no result.
For several days now I have been testing with usbcore.autosuspend=-1 and have left the system running for longer periods. The issue hasn't happened so far.
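
To confirm that the autosuspend setting is in effect, the current value can be read back from sysfs. The following is a minimal Python sketch, assuming the usbcore module parameter is exposed at /sys/module/usbcore/parameters/autosuspend and that a negative value means autosuspend is disabled; the function names are illustrative.

```python
from pathlib import Path

# Assumed sysfs location of the usbcore autosuspend delay (in seconds).
AUTOSUSPEND_PARAM = Path("/sys/module/usbcore/parameters/autosuspend")


def read_autosuspend(path=AUTOSUSPEND_PARAM):
    """Return the autosuspend delay stored in the given file as an integer."""
    return int(Path(path).read_text().strip())


def autosuspend_disabled(value):
    """A negative delay (e.g. -1) means USB autosuspend is turned off."""
    return value < 0


if AUTOSUSPEND_PARAM.exists():
    delay = read_autosuspend()
    print(f"autosuspend={delay}, disabled={autosuspend_disabled(delay)}")
```

Note that this parameter only sets the default for newly detected devices, so a device that was already plugged in may need to be replugged after the value is changed at runtime.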

Side note:
Commit 75d7676ead19b1fbb5e0ee934c9ccddcb666b68c doesn't seem to have fixed the message "Tx status -71" from the original bug reporter. (Tx timeout, in my case)
That still happens once in a while.
Comment 28 Marcus Sundman 2020-05-01 21:42:27 UTC
The usbcore.autosuspend=-1 kernel parameter doesn't resolve it for me.
Also, I can trigger the problem in seconds, simply by reading at gigabit speeds.
Comment 29 Hans de Goede 2020-05-04 13:05:08 UTC
At least for those seeing issues with Dell's WD15 dock, I think that trying something similar to this quirk might help:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b63e48fb50e1ca71db301ca9082befa6f16c55c4

To try this, first do:

lsusb -t

To find the Bus and Dev number of any USB hub(s) inside the dock.

Then do:

lsusb

And look up the same Bus and Dev number to get the vendor- and product-id used for the hub, e.g. 0bda:0487

Then try booting with this added to your kernel commandline:

usbcore.quirks=0bda:0487:k

Replacing the 0bda:0487 with the <vend>:<prod> ids for your hub (from the lsusb output). If you want to try this on more than one USB device, you can specify the NO_LPM quirk for multiple USB devices like this:

usbcore.quirks=0bda:0487:k,0bda:0488:k

Please give this a try and see if that helps. Also note that the same thing can be used to set the NO_LPM quirk on the USB ethernet-chip itself if it has a different USB-id which is not yet in the kernel's quirks list.
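
The steps above can be sketched in Python: given `lsusb` output and the Bus/Dev numbers picked from `lsusb -t`, build the matching `usbcore.quirks=` string. This is a minimal sketch, not an official tool; the function name `quirks_parameter` is illustrative, and the sample lines are the lsusb lines quoted elsewhere in this thread.

```python
import re

# Matches lsusb lines such as "Bus 003 Device 008: ID 0bda:0411 ...".
LSUSB_RE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})",
    re.MULTILINE,
)


def quirks_parameter(lsusb_text, selected, flag="k"):
    """Build a usbcore.quirks=... string for the selected (bus, device) pairs.

    `selected` is a set of (bus, device) integer pairs chosen by hand from
    `lsusb -t`; `flag` defaults to "k", the NO_LPM quirk described above.
    """
    entries = []
    for bus, dev, vendor, product in LSUSB_RE.findall(lsusb_text):
        if (int(bus), int(dev)) in selected:
            entries.append(f"{vendor.lower()}:{product.lower()}:{flag}")
    if not entries:
        return ""
    return "usbcore.quirks=" + ",".join(entries)


lsusb_sample = "\n".join([
    "Bus 003 Device 009: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter",
    "Bus 003 Device 008: ID 0bda:0411 Realtek Semiconductor Corp. 4-Port USB 3.0 Hub",
    "Bus 002 Device 006: ID 0bda:5411 Realtek Semiconductor Corp. 4-Port USB 2.0 Hub",
])
# Select only the two hubs, identified by their Bus and Dev numbers.
print(quirks_parameter(lsusb_sample, {(3, 8), (2, 6)}))
```

Selecting devices by Bus/Dev number rather than by name keeps the choice tied to what `lsusb -t` shows for the dock, since several unrelated devices can share a vendor.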
Comment 30 Marcus Sundman 2020-05-04 22:41:26 UTC
(In reply to Hans de Goede from comment #29)
> Replacing the 0bda:0487 with the <vend>:<prod> ids for your hub (from the
> lsusb output). If you want to try this on more then one USB device, you can
> specify the NO_LPM quirk for multiple USB devices like this:
> 
> usbcore.quirks=0bda:0487:k,0bda:0488:k

This didn't work.

I have 3 devices:
Bus 003 Device 009: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 003 Device 008: ID 0bda:0411 Realtek Semiconductor Corp. 4-Port USB 3.0 Hub
Bus 002 Device 006: ID 0bda:5411 Realtek Semiconductor Corp. 4-Port USB 2.0 Hub

I added these kernel params:
usbcore.quirks=0bda:8153:k,0bda:5411:k,0bda:0411:k usbcore.autosuspend=-1

Still fails with either
Rx status -71
or
Tx status -71
after reading 50 MB/s over the network for a minute or a few.

> Please give this a try and see if that helps. Also note that the same thing
> can be used to set the NO_LPM quirk on the USB ethernet-chip itself if it
> has a different USB-id which is not yet in the kernel's quirks list.

I'm not sure how to do that. As far as I can tell my ethernet chip is at 0bda:8153 (which in my case is at usb@3:1.4, which maps to device 9, which maps to 0bda:8153).
Comment 31 Hans de Goede 2020-05-05 13:11:56 UTC
@Marcus Sundman, right you have already set the flag for your ethernet usb controller by adding the 0bda:8153:k part to the quirks. So it seems that at least for you setting the NO_LPM flag does not help.

Does your dock have updateable firmware? If so, you may want to try updating the firmware. The first-generation Thunderbolt docks from all vendors were notoriously buggy, and they all need the latest firmware to work at least somewhat reliably. Getting the latest firmware is also strongly advised for people using Windows, since there really were quite a few issues with these devices which are fixed with firmware updates.

Yes, I wrote somewhat reliable, the best fix for thunderbolt dock issues often is getting a second generation or newer dock :(
Comment 32 Michiel Janssens 2020-05-05 14:23:03 UTC
(In reply to Hans de Goede from comment #29)
Thanks for posting this instruction!
I already had seen the commit for the WD19, but it wasn't clear how I should investigate that on my system.
The WD15 dock adds 2 USB buses, each with a Microchip USB hub.
I removed the usbcore.autosuspend=-1 parameter and will test for several days with usbcore.quirks=0424:5537:k, which is the hub which has the 0bda:8153 as child.
I will add attachments with my lsusb output.
Comment 33 Michiel Janssens 2020-05-05 14:27:36 UTC
Created attachment 288917 [details]
lsusb output mjanssens wd15 xps9360
Comment 34 Michiel Janssens 2020-05-06 10:41:55 UTC
It didn't take days to get results.

Just the hub where 0bda:8153 is child
usbcore.quirks=0424:5537:k
result: 0bda:8153 dies after a while, without log entry, needed REISUB to reboot

Both hubs which are added by connecting WD15
usbcore.quirks=0424:5537:k,0424:2137:k
result: 0bda:8153 dies after a while, without log entry, needed REISUB to reboot

So I'm back to using usbcore.autosuspend=-1.
Please advise if I missed something (or incorrect dev id) I could test.
Comment 35 Hans de Goede 2020-05-06 10:43:36 UTC
(In reply to Michiel Janssens from comment #34)
> It didn't take days to get results.
> 
> Just the hub where 0bda:8153 is child
> usbcore.quirks=0424:5537:k
> result: 0bda:8153 dies after a while, without log entry, needed REISUB to
> reboot
> 
> Both hubs which are added by connecting WD15
> usbcore.quirks=0424:5537:k,0424:2137:k
> result: 0bda:8153 dies after a while, without log entry, needed REISUB to
> reboot
> 
> So I'm back to using usbcore.autosuspend=-1.
> Please advise if I missed something (or incorrect dev id) I could test.

There have been a lot of firmware updates for the wd15, do you have these all applied?
Comment 36 Michiel Janssens 2020-05-06 11:32:54 UTC
(In reply to Hans de Goede from comment #35)

> There have been a lot of firmware updates for the wd15, do you have these
> all applied?

Good catch, I'm on 1.0.4 according to fwupdmgr. The latest is 1.0.6 on the Dell site.
Unfortunately the WD15 does not appear to be fully supported via fwupdmgr (yet), so Windows is the only option, sigh. I will try to update, test again and report.
The BIOS is current, by the way.
Comment 37 Marcus Sundman 2020-05-07 00:11:04 UTC
(In reply to Hans de Goede from comment #31)
> @Marcus Sundman, right you have already set the flag for your ethernet usb
> controller by adding the 0bda:8153:k part to the quirks. So it seems that at
> least for you setting the NO_LPM flag does not help.
I also tried without usbcore.autosuspend=-1 but that also didn't help.

> Does your dock have updateable firmware? If so you may want to try to update
> the firmware.
It's a LogiLink UA0173A, and it doesn't seem to have any firmware available (only newer drivers, which I already tried).
Comment 38 RussianNeuroMancer 2020-05-07 03:49:50 UTC
Just for the record, I was able to reproduce this issue even on a NanoPi-M1 (Allwinner H3) running Linux 5.4.32, attached to the Belkin USB-C Express Dock 3.1 HD F4U093 (I did this for convenience, just to quickly get a working keyboard and mouse without moving their cables from the dock to the board). The quirk has been included in 5.4 since 5.4.28, so it was already applied. Unfortunately, I didn't expect this issue to be reproducible with the NanoPi-M1 board, so I didn't save the lsusb -t output before/after this happened.
Comment 39 Michiel Janssens 2020-05-07 10:30:09 UTC
(In reply to Michiel Janssens from comment #36)

I ran several updaters from Dell under Windows. My WD15 firmware components (4 of them) were already current; apparently the main version is some sort of wrapper. So no updates to the BIOS or WD15 firmware are possible.
At the moment I run kernel 5.6.8, so I ran all tests again:
- with or without usbcore.quirks=0424:5537:k,0424:2137:k the nic dies after a while
- with usbcore.autosuspend=-1 the nic remains alive
Comment 40 Marcus Sundman 2020-06-13 02:01:35 UTC
Still the same problem with Realtek's new driver, r8152.53.56-2.13.0, on ubuntu's 5.4.0-37-generic with usbcore.autosuspend=-1.

It fails with 'Tx status -71' or 'Rx status -71':
> net_ratelimit: 22 callbacks suppressed
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> ...

But sometimes that quickly turns into this:
> xhci_hcd 0000:03:00.0: WARN: TRB error for slot 3 ep 3 on endpoint
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -84
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> ...

I've also tried adjusting the nic's Rx ring size from 100 to 20 or 2000, but still the same crash seconds after starting a gigabit speed download.
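
As a reading aid for the numbers in the logs above, here is a small sketch that tallies r8152 error lines and names the status codes. The errno names are the standard Linux values, hard-coded here so the result does not depend on the platform running the script; `summarize` and the regular expression are assumptions made for this example and only cover the message shapes quoted in this thread.

```python
import re
from collections import Counter

# Linux errno values for the codes that show up in this thread.
ERRNO_NAMES = {
    2: "ENOENT",
    22: "EINVAL",
    71: "EPROTO",
    84: "EILSEQ",
}

# Matches e.g. "r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71"
# and "r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22".
STATUS_RE = re.compile(
    r"r8152 \S+ (?P<iface>\S+): "
    r"(?P<what>[TR]x status|failed tx_urb) -(?P<code>\d+)"
)


def summarize(log_text):
    """Count r8152 error lines per (interface, message, errno name)."""
    counts = Counter()
    for m in STATUS_RE.finditer(log_text):
        code = int(m.group("code"))
        name = ERRNO_NAMES.get(code, f"errno {code}")
        counts[(m.group("iface"), m.group("what"), name)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for key, n in summarize(sys.stdin.read()).items():
        print(n, *key)
```

So "Tx status -71" is EPROTO (a protocol error on the USB link), -84 is EILSEQ and -22 is EINVAL.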
Comment 41 Nikolay Kichukov 2020-06-16 15:23:32 UTC
GNU/Gentoo, 64bit here, kernel 5.7.2, same problem on Lenovo 40AS USB-C dock:

None of the suggested "workarounds" helped, here is the lsusb tree:

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
        |__ Port 1: Dev 4, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
        |__ Port 3: Dev 3, If 0, Class=Hub, Driver=hub/4p, 10000M

And the patch applied to the kernel(the ids differ):
cat /etc/portage/patches/sys-kernel/gentoo-sources-5.7.2/lenovo-usbc-dock-rtl-ethernet-quirk.patch 
--- a/drivers/usb/core/quirks.c	2020-06-01 01:49:15.000000000 +0200
+++ b/drivers/usb/core/quirks.c	2020-06-15 12:01:39.028377907 +0200
@@ -384,6 +384,11 @@
 	/* Generic RTL8153 based ethernet adapters */
 	{ USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },
 
+	/* Lenovo USB-C Ethernet RTL8153 based ethernet adapters */
+       { USB_DEVICE(0x1d6b, 0x0003), .driver_info = USB_QUIRK_NO_LPM },
+	{ USB_DEVICE(0x17ef, 0xa391), .driver_info = USB_QUIRK_NO_LPM },
+	{ USB_DEVICE(0x17ef, 0xa387), .driver_info = USB_QUIRK_NO_LPM },
+
 	/* Action Semiconductor flash disk */
 	{ USB_DEVICE(0x10d6, 0x2200), .driver_info =
 			USB_QUIRK_STRING_FETCH_255 },

and booting with:
usbcore.quirks=17ef:a387:k,17ef:a391:k,1d6b:0003:k

or 

usbcore.autosuspend=-1

does not help.

The same problem happens on laptops running Windows when they are connected to these Lenovo docks.
Comment 42 Peter Ries 2020-07-30 09:08:57 UTC
Hi, I'm glad I found this (old) bug. It affects me as well and is still current.

I'm running Kubuntu 20.04 with Mainline Kernel 5.7.9 and tried 5.8.0-rc7. My brand new Thinkpad T14 (AMD Version, 32GB RAM) is connected to a Thinkpad USB-C dock Gen 2. Laptop and Dock run latest firmware.

**Testcase** is copying a large video file (~ 2GB) from Laptop to NAS.
**Error** lots of "r8152 5-1.1:1.0 enx482ae36d721f: Tx status -71" in dmesg log after a few seconds. Connection lost. Need to use either WiFi or reconnect dock.

To find the culprit:
Copying via the WiFi connection or the laptop's LAN port (r8169) works.
Using another dock (DELL docking station) works.
Connecting a DELL Windows laptop to the Lenovo dock works.
The hardware must be OK then!

So I suppose the cause lies in r8152 driver.

As I can reproduce this "on demand" I could provide more information/logs if you tell me what's needed and how to do it ;)

BR
Peter
Comment 43 Peter Ries 2020-08-07 06:24:43 UTC
update to my previous comment...

Kubuntu's Network Manager obviously set up the USB Network Adapter with "Link Negotiation: ignore" for whatever reason.

I changed it to "Auto" and now it is more stable - even solid. I just pumped a 20G VM Diskimage to NAS and no error occurred. I'm optimistic that this setting solved my problem.

Time will tell ...
Comment 44 Peter Ries 2020-08-08 17:20:53 UTC
(In reply to Peter Ries from comment #43)
> update to my previous comment...
> 
> Kubuntu's Network Manager obviously set up the USB Network Adapter with
> "Link Negotiation: ignore" for whatever reason.
> 
> I changed it to "Auto" and now it is more stable - even solid. I just pumped
> a 20G VM Diskimage to NAS and no error occurred. I'm optimistic that this
> setting solved my problem.
> 
> Time will tell ...

update: no problems anymore. Stable for 1.5 days of constant usage :)
Comment 45 smihael 2020-08-17 18:25:56 UTC
The same problem occurs with Anker Ethernet Adapter combined with USB hub (https://www.anker.com/products/variant/aluminum-3port-usb-30-and-ethernet-hub/A7514041) on 4.15.0-38-generic kernel in KDE neon (based on Ubuntu 18.04).

dmesg outputs "r8152 4-3.3:1.0 eth0: Tx status -71" multiple times.

lsusb -t output

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 3: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M

lsusb output

Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. 
Bus 004 Device 002: ID 2109:0812 VIA Labs, Inc. VL812 Hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

None of the workarounds suggested above that involve changing boot parameters helped (usbcore.autosuspend=-1; usbcore.quirks=2109:0812:k,0bda:8153:k). Actually, with usbcore.quirks I couldn't even boot, as the process hung at the "Switching to clocksource tsc" message.

The adapter works flawlessly with 4.14.x kernel.

Interestingly, the adapter works fine when certain devices (e.g. wireless mouse's receiver) are connected in the USB hub.
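
Since several comments hinge on knowing which hub sits upstream of the adapter, here is a rough sketch that reconstructs the path from the root hub to each r8152 device out of `lsusb -t` text. It assumes the layout shown above (four columns of indentation per level before `|__`); other usbutils versions may print the tree differently. `chains_to_driver` is a name made up for this example. Device numbers still have to be matched to vendor:product IDs by hand using the plain lsusb output.

```python
import re

LINE_RE = re.compile(
    r"Port (?P<port>\d+): Dev (?P<dev>\d+),.*?Class=(?P<cls>[^,]+), "
    r"Driver=(?P<driver>[^,/]*)"
)
BUS_RE = re.compile(r"Bus (?P<bus>\d+)")


def chains_to_driver(lsusb_t_output, driver="r8152"):
    """Return one list per matching device: the nodes from the root hub
    down to the device bound to `driver`, in top-down order."""
    stack = []   # nodes on the current path; list index == tree depth
    found = []
    bus = None
    for line in lsusb_t_output.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        if line.lstrip().startswith("/:"):
            depth = 0
            bm = BUS_RE.search(line)
            bus = int(bm.group("bus")) if bm else None
        elif "|__" in line:
            # Assumed layout: 4 columns of indentation per tree level.
            depth = line.index("|__") // 4
        else:
            continue
        node = {
            "bus": bus,
            "port": int(m.group("port")),
            "dev": int(m.group("dev")),
            "class": m.group("cls"),
            "driver": m.group("driver"),
        }
        del stack[depth:]      # drop siblings and deeper nodes
        stack.append(node)
        if node["driver"] == driver:
            found.append(list(stack))
    return found


if __name__ == "__main__":
    import sys

    # Usage: lsusb -t | python3 this_script.py
    for chain in chains_to_driver(sys.stdin.read()):
        print(" -> ".join(f"Dev {n['dev']} ({n['driver']})" for n in chain))
```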
Comment 46 Weber K. 2020-09-27 05:07:54 UTC
Hi!

I have the same error, but only on a USB 3.0 port.

My hub doesn't support LPM, so I think I have a different problem (usbcore.quirks=0bda:8153:k).

I've tried usbcore.quirks=0bda:8153:j and I've got no -71 error.

HTH

Weber Kai
Comment 47 Weber K. 2020-09-27 08:09:35 UTC
Sorry fellows, please ignore my previous comment.
After 40 minutes the network stopped.
Comment 48 Weber K. 2020-09-28 02:23:47 UTC
Hi fellows,

I think read_bulk_callback should handle the EPROTO error code (that is the -71 in the log), but I don't know exactly how.
When I add EPROTO next to the ESHUTDOWN case, the driver becomes more stable.

Thanks
HTH
Comment 49 Pekka Järveläinen 2020-10-16 13:02:22 UTC
Hi, before my crash I see many messages like these:
[ 4478.945334] perf: interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 4602.649511] CPU2: Core temperature above threshold, cpu clock throttled (total events 
[ 4626.376508] CPU5: Core temperature/speed normal
[ 4681.081753] perf: interrupt took too long (3202 > 3131), lowering kernel.perf_event_max_sample_rate to 62000
[ 4895.218338] perf: interrupt took too long (4065 > 4002), lowering kernel.perf_event_max_sample_rate to 49000

Can they be part of the problem?

I have had three crashes, all during Zoom meetings, which means high traffic with video going in both directions.

Pekka
Comment 50 Emtee 2020-11-11 15:39:23 UTC
I have the same issue with three

Bus 002 Device 004: ID 13b1:0041 Linksys Gigabit Ethernet Adapter
Bus 002 Device 003: ID 13b1:0041 Linksys Gigabit Ethernet Adapter
Bus 002 Device 002: ID 13b1:0041 Linksys Gigabit Ethernet Adapter

devices on platform:

Linux debian 4.19.155-redundant #1 SMP PREEMPT Mon Nov 9 01:54:50 CET 2020 x86_64 GNU/Linux
debian
    description: Mini PC
    product: NUC7CJYS
    vendor: Intel(R) Client Systems
    version: J67993-403
    serial: G6JY936009MK
    width: 64 bits
    capabilities: smbios-3.1.1 dmi-3.1.1 smp vsyscall32
    configuration: boot=normal chassis=mini family=JY uuid=8A72A51B-79C4-85CA-66A9-1C697A088052
  *-core
       description: Motherboard
       product: NUC7JYB
       vendor: Intel Corporation
       physical id: 0
       version: J67970-402
       serial: GEJY93500752
       slot: Default string
     *-firmware
          description: BIOS
          vendor: Intel Corp.
          physical id: 0
          version: JYGLKCPX.86A.0057.2020.1020.1637

Latest firmware.

It looks like a USB enumeration issue here. After a reboot, the problem always shows up after a while; after physically turning the unit off and on, the issue no longer appears and it can run stable forever.

I basically made a habit of shutting down entirely when I need to reboot.
Comment 51 Yorick de Wid 2020-11-27 21:50:43 UTC
Dell Inc. XPS 15 9570/0HWTMH, BIOS 1.17.1 07/09/2020

Kernel: 5.8.0-7630-generic
Chipset: RTL8153b-2 (version 9)


After a full day of debugging the r8152 driver on a USB-C dock, it does look like the RTL8152 chipset is unable to keep pace with the transmission queue (tx_queue). The driver keeps sending URBs to the chipset, but at some point the chipset no longer fires status interrupts. This stalls write_bulk_callback, which in turn sets off the netdev timeout watchdog. The timeout handler then tries to reset the USB device, but the RTL chipset is no longer responding. Power cycling the USB port is the only way to reset the chip. The behavior is quite deterministic: when sending bulk packets over the 1000TX interface (Gigabit, that is), it doesn't take long before the watchdog reports queue congestion. If I simulate a lockup by deliberately slowing down the tasklet, the timeout and Tx status -2/-71 are triggered almost immediately.

The netdev timeout resetting the USB device seems a bit silly. There is no lock held when the USB device is reset, and the timeout handler is invoked every (5 * 1000Hz). A better solution would be to power cycle the USB port.

Because this bug has been away for a while and now reappears, it could be a firmware issue. The most recent firmware was updated a little more than a year ago.

Y.
Comment 52 Emtee 2020-11-28 01:14:05 UTC
All my issues with the USB Adapters have disappeared after I used these kernel parameters;

usbcore.old_scheme_first=1 usbcore.use_both_schemes=0 usbcore.autosuspend=-1 pci_aspm=off

Not sure which of them, or which combination, actually does it.

Been running for a week 24/7 with them now.
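
To narrow down which of these parameters matters, it helps to confirm what the running kernel was actually booted with. The following is a minimal sketch that reads a kernel command line and reports the workaround parameters mentioned in this thread. `parse_cmdline` and `active_workarounds` are hypothetical names; the parser does not handle quoted values. The list contains `pci_aspm` as written above and also `pcie_aspm`, which is the spelling the kernel documents for the ASPM switch.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {parameter: value}.

    Parameters without '=' map to None. Quoted values containing
    spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Parameter names taken from the comments in this thread.
WORKAROUNDS = (
    "usbcore.old_scheme_first",
    "usbcore.use_both_schemes",
    "usbcore.autosuspend",
    "usbcore.quirks",
    "pci_aspm",
    "pcie_aspm",
)


def active_workarounds(cmdline):
    """Return the subset of WORKAROUNDS present on the command line."""
    params = parse_cmdline(cmdline)
    return {name: params[name] for name in WORKAROUNDS if name in params}


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        for name, value in active_workarounds(f.read()).items():
            print(f"{name}={value}")
```

Removing one parameter per boot and re-running the check is a simple way to bisect the combination.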
Comment 53 Peter Ries 2020-12-13 07:20:04 UTC
Strangely, this issue reappeared some days ago. It was gone after I set link negotiation in Network Manager to "auto" (Kubuntu set it to ignore for some reason) and worked flawlessly for around three months.

A week ago, while sending a 4 GB file to my NAS, the error reappeared, but it was gone after I set manual negotiation in Network Manager to 1 Gbit/full duplex. It then worked again for some days, but yesterday, with a lot of traffic in my LAN, a network sync immediately "killed" the USB-C dock network interface.

I didn't change anything beside regular apt updates and installing the latest mainline kernel 5.9.x branch - don't know what happened.

May be I'll give autosuspend a try...

Yorick de Wid's analysis https://bugzilla.kernel.org/show_bug.cgi?id=198931#c51 seems quite promising to be the root cause.
Comment 54 Peter Ries 2020-12-13 08:12:04 UTC
usbcore.autosuspend=-1 

didn't help with my Lenovo USB-C dock gen 2...
Comment 55 Yorick de Wid 2020-12-13 11:09:57 UTC
Peter Ries, can you specify which firmware version is being loaded? The driver writes the version to dmesg on load. For example: RTL8153b-2 (version 9).

It's hard to track exact firmware versions. If it happens to be a firmware issue, we might be able to downgrade. I haven't tried an older version yet.

SHA1 checksum on 5.8.0-7630-generic (which should be the latest):

3008299c2fee3f5a5e3b2d8e16919d230204542c  rtl8105e-1.fw
c6fcde458093a4ef60b534feacc9dd564098ff9b  rtl8106e-1.fw
221d833a22040e4014bf34b31481712180b77594  rtl8106e-2.fw
9d390948663bf6885d86586588c428186d5dff7e  rtl8107e-1.fw
a065c863146d8216d8cc84a6b754968613848b32  rtl8107e-2.fw
5da573149e80587668e1d4bcbcbced184e51ac03  rtl8125a-3.fw
21c7c428112bd9e24713192302513a95ba41ed5e  rtl8125b-1.fw
a588787b9ebeec9cbfdbd46612a63f53ad5b1d62  rtl8125b-2.fw
3d87c04720c4b4709e4673707c4c104e28be1c1b  rtl8153a-2.fw
e467098b1cbb04022805cd777eb66585022524a6  rtl8153a-3.fw
cce086e885091c348bf521924f306f240f8dcc08  rtl8153a-4.fw
2b268656c6cd7d03dc47ca8eaec2f31ca668c53c  rtl8153b-2.fw
9872f469227555937d4063b1420a0ff23790da59  rtl8168d-1.fw
0ab15a6c812fafc38dd896972fa5fcb46cca1068  rtl8168d-2.fw
61fdc2ba78caf36a6551554f089d1c964159d247  rtl8168e-1.fw
60e16292fd4eb90138a3e2061305030b4993de79  rtl8168e-2.fw
6c3721e8e5d19f62b3da13519e2496ffabb3ffb4  rtl8168e-3.fw
c7c01066ddfc0215ad8977af5a3cd654b6f7ed10  rtl8168f-1.fw
24bf10a38bcb1b4652f71653e41d1a444f303c3f  rtl8168f-2.fw
bf3495d9233f3abaceab194e732bdb9c350a68a5  rtl8168fp-3.fw
12ed6246c8c4d6344d4840acc11de30d7c3ff1ec  rtl8168g-1.fw
0ead82c11625a677600a589cc4590722ea2f6de7  rtl8168g-2.fw
36e09340d99f9290fd9cc62c48c11b8112558b8a  rtl8168g-3.fw
c012f50c24ef64dcc48615d584709f8094df6af7  rtl8168h-1.fw
439686559c1fa53820c5e740b71566ee874171ba  rtl8168h-2.fw
c4fda34e80b4124377a7554636327fd03e697ee2  rtl8402-1.fw
38a89b6f1b57795a470675184733aff335cb41ae  rtl8411-1.fw
57c4e659337aacc88a52e9940bc0246a0f02e47b  rtl8411-2.fw
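
To compare a local installation against the list above without eyeballing hashes, something like the following sketch could be used. The expected digests are copied from the rtl8153 entries above; `sha1_of` and `compare` are names invented for this example, and the directory is the one used elsewhere in this thread. Distributions that ship compressed firmware files (for example `.fw.xz`) would show up as "missing" here.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Digests copied from the list above (5.8.0-7630-generic).
EXPECTED = {
    "rtl8153a-2.fw": "3d87c04720c4b4709e4673707c4c104e28be1c1b",
    "rtl8153a-3.fw": "e467098b1cbb04022805cd777eb66585022524a6",
    "rtl8153a-4.fw": "cce086e885091c348bf521924f306f240f8dcc08",
    "rtl8153b-2.fw": "2b268656c6cd7d03dc47ca8eaec2f31ca668c53c",
}


def compare(firmware_dir, expected=EXPECTED):
    """Return {file name: 'match' | 'differs' | 'missing'}."""
    result = {}
    for name, digest in expected.items():
        path = Path(firmware_dir) / name
        if not path.is_file():
            result[name] = "missing"
        elif sha1_of(path) == digest:
            result[name] = "match"
        else:
            result[name] = "differs"
    return result


if __name__ == "__main__":
    for name, state in compare("/usr/lib/firmware/rtl_nic").items():
        print(f"{name}: {state}")
```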
Comment 56 Peter Ries 2020-12-13 15:59:59 UTC
Hi Yorick,

I found this line:

[   16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully


Is it what you're looking for? Otherwise let me know how to find out.

Thanks for investigating!
Comment 57 Peter Ries 2020-12-13 17:15:29 UTC
and this one, too:

[   16.883982] r8152 5-1.1:1.0 eth0: v1.11.11
Comment 58 Yorick de Wid 2020-12-13 19:38:19 UTC
> I found this line:
> 
> [   16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully

That's the one. The v9 chipset is also the version I'm running. It's too early to conclude, but it is consistent with the presumption that the rtl8153b-2 firmware contains a bug.
Comment 59 Patrick Decat 2020-12-16 08:36:48 UTC
Hi, I had this crash with my Dell XPS 9350 and a Dell USB 3.0 Ethernet adapter for the first time yesterday:

# lsusb | grep -i RTL8153
Bus 002 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

# journalctl -b -1 | grep "rtl8153"
Dec 15 15:36:40 patrickxps kernel: r8152 2-2:1.0: load rtl8153a-3 v2 02/07/20 successfully

# uname -a
Linux patrickxps 5.9.14-zen1-1-zen #1 ZEN SMP PREEMPT Sat, 12 Dec 2020 14:36:44 +0000 x86_64 GNU/Linux


Dec 15 18:18:50 patrickxps kernel: ------------[ cut here ]------------
Dec 15 18:18:50 patrickxps kernel: NETDEV WATCHDOG: enp0s20f0u2 (r8152): transmit queue 0 timed out
Dec 15 18:18:50 patrickxps kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x26b/0x280
Dec 15 18:18:50 patrickxps kernel: Modules linked in: rfcomm ccm cmac snd_hda_codec_hdmi algif_hash algif_skcipher af_alg bnep snd_hda_codec_realtek snd_hda_codec_generic cdc_ether usb>
Dec 15 18:18:50 patrickxps kernel:  intel_uncore psmouse soundcore rc_core processor_thermal_device i2c_i801 input_leds pcspkr tpm_tis i2c_smbus mei_me rfkill intel_gtt intel_rapl_comm>
Dec 15 18:18:50 patrickxps kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: P          IOE     5.9.14-zen1-1-zen #1
Dec 15 18:18:50 patrickxps kernel: Hardware name: Dell Inc. XPS 13 9350/0PWNCR, BIOS 1.13.0 02/10/2020
Dec 15 18:18:50 patrickxps kernel: RIP: 0010:dev_watchdog+0x26b/0x280
Dec 15 18:18:50 patrickxps kernel: Code: fa 1e 64 ff eb 87 4c 89 f7 c6 05 de c3 f7 00 01 e8 8a 02 fa ff 44 89 e9 4c 89 f6 48 c7 c7 b8 c2 df 86 48 89 c2 e8 6e 31 18 00 <0f> 0b e9 65 ff >
Dec 15 18:18:50 patrickxps kernel: RSP: 0018:ffffa234c019ceb0 EFLAGS: 00010282
Dec 15 18:18:50 patrickxps kernel: RAX: 0000000000000000 RBX: ffff89334d358400 RCX: 0000000000000000
Dec 15 18:18:50 patrickxps kernel: RDX: 0000000000000103 RSI: 0000000000000027 RDI: 00000000ffffffff
Dec 15 18:18:50 patrickxps kernel: RBP: ffff8933422da3dc R08: 0000000000000452 R09: 0000000000000004
Dec 15 18:18:50 patrickxps kernel: R10: 0000000000000001 R11: 0000000000007434 R12: ffff8933422da480
Dec 15 18:18:50 patrickxps kernel: R13: 0000000000000000 R14: ffff8933422da000 R15: ffff89334d358480
Dec 15 18:18:50 patrickxps kernel: FS:  0000000000000000(0000) GS:ffff893376d80000(0000) knlGS:0000000000000000
Dec 15 18:18:50 patrickxps kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 15 18:18:50 patrickxps kernel: CR2: 00007f7d5ef1b4c0 CR3: 0000000211fcc004 CR4: 00000000003706e0
Dec 15 18:18:50 patrickxps kernel: Call Trace:
Dec 15 18:18:50 patrickxps kernel:  <IRQ>
Dec 15 18:18:50 patrickxps kernel:  ? qdisc_put_unlocked+0x30/0x30
Dec 15 18:18:50 patrickxps kernel:  ? qdisc_put_unlocked+0x30/0x30
Dec 15 18:18:50 patrickxps kernel:  call_timer_fn+0x2d/0x150
Dec 15 18:18:50 patrickxps kernel:  run_timer_softirq+0x8e7/0xb50
Dec 15 18:18:50 patrickxps kernel:  __do_softirq+0xff/0x340
Dec 15 18:18:50 patrickxps kernel:  asm_call_irq_on_stack+0x12/0x20
Dec 15 18:18:50 patrickxps kernel:  </IRQ>
Dec 15 18:18:50 patrickxps kernel:  do_softirq_own_stack+0x5d/0x80
Dec 15 18:18:50 patrickxps kernel:  irq_exit_rcu+0xd2/0x120
Dec 15 18:18:50 patrickxps kernel:  sysvec_apic_timer_interrupt+0x47/0xe0
Dec 15 18:18:50 patrickxps kernel:  asm_sysvec_apic_timer_interrupt+0x12/0x20
Dec 15 18:18:50 patrickxps kernel: RIP: 0010:cpuidle_enter_state+0xdf/0x7f0
Dec 15 18:18:50 patrickxps kernel: Code: e8 16 87 7f ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 8c 05 00 00 31 ff e8 48 e3 86 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 >
Dec 15 18:18:50 patrickxps kernel: RSP: 0018:ffffa234c00dfea0 EFLAGS: 00000246
Dec 15 18:18:50 patrickxps kernel: RAX: ffff893376d80000 RBX: ffff893376db6800 RCX: 000000000000001f
Dec 15 18:18:50 patrickxps kernel: RDX: 0000000000000000 RSI: ffffffff86d4f968 RDI: ffffffff86d5a1f9
Dec 15 18:18:50 patrickxps kernel: RBP: ffffffff872cef60 R08: 000008dd96ed4562 R09: 0000000000000008
Dec 15 18:18:50 patrickxps kernel: R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000002
Dec 15 18:18:50 patrickxps kernel: R13: ffffffff872cf048 R14: 0000000000000002 R15: 000008dd96ed4562
Dec 15 18:18:50 patrickxps kernel:  cpuidle_enter+0x29/0x40
Dec 15 18:18:50 patrickxps kernel:  do_idle+0x1ed/0x280
Dec 15 18:18:50 patrickxps kernel:  cpu_startup_entry+0x19/0x20
Dec 15 18:18:50 patrickxps kernel:  secondary_startup_64+0xb6/0xc0
Dec 15 18:18:50 patrickxps kernel: ---[ end trace 32ac432b0caddcb1 ]---
Dec 15 18:18:50 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:18:56 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:19:02 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:19:07 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:19:12 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout
Dec 15 18:19:19 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout

FWIW, I used to have a different, much more frequent crash with r8152 and another adapter model, see https://bugzilla.kernel.org/show_bug.cgi?id=200977#c19
Comment 60 Yorick de Wid 2020-12-16 09:01:58 UTC
Patrick Decat, what's the checksum of rtl8153a-3.fw?

sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw
Comment 61 Patrick Decat 2020-12-16 15:01:33 UTC
(In reply to Yorick de Wid from comment #60)
> Patrick Decat whats the checksum of rtl8153a-3.fw?
> 
> sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw

Here you go:

sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw
e467098b1cbb04022805cd777eb66585022524a6  /usr/lib/firmware/rtl_nic/rtl8153a-3.fw
Comment 62 Peter H 2021-02-22 18:45:33 UTC
Hi all, glad to see I'm not the only one. I also have the Lenovo USB-C Dock gen 2, and am connecting to it from a Surface Pro 7 (running Arch with the surface-linux kernel 5.10.16), and am similarly loading the "rtl8153b-2 v1 10/23/19" firmware. 


An interesting thing that I learned recently is that my ethernet works perfectly if I boot without an external screen plugged into the dock. As soon as I plug one in, the "Tx status -71" messages return and the ethernet dies. I don't know if that helps at all, maybe someone else has found this too? I can also confirm the ethernet and external display work fine together running Windows.
Comment 63 Danny O'Brien 2021-02-23 04:37:17 UTC
Just as an FYI -- I upgraded to the latest realtek firmware (in Debian's sid/unstable distribution, version 20210208-1 <https://packages.debian.org/sid/firmware-realtek>) -- this fixed the problem for me. sha1sum is cce086e885091c348bf521924f306f240f8dcc08 , in /usr/lib/firmware/rtl_nic/rtl8153a-4.fw
Comment 64 Danny O'Brien 2021-02-23 19:47:05 UTC
I spoke too soon, the problem persists!
Comment 65 Yorick de Wid 2021-02-23 22:41:27 UTC
(In reply to Danny O'Brien from comment #63)
> Just as an FYI -- I upgraded to the latest realtek firmware (in Debian's
> sid/unstable distribution, version 20210208-1
> <https://packages.debian.org/sid/firmware-realtek>) -- this fixed the
> problem for me. sha1sum is cce086e885091c348bf521924f306f240f8dcc08 , in
> /usr/lib/firmware/rtl_nic/rtl8153a-4.fw

The firmware hasn't been updated for over a year, see official kernel repo. 

Because the lockup is likely caused by a data race in the firmware, it's to be expected that a higher interrupt count (additional peripherals) will trigger the issue sooner.

Just to give a little update, Realtek is currently testing against hardware with known issues.
Comment 66 Tim 2021-03-18 11:14:43 UTC
I am suffering from this problem too, using a TP-Link UE330, which has the Realtek 8153 and is a USB-A 3.0 device with a hub and gigabit ethernet. I have had no issues with the hub (it even keeps working after the ethernet stops working), but the gigabit ethernet just stops working completely after some time. Sometimes it can go for a couple of days; one time it stopped working after 10 minutes.

I am running Linux kernel 5.4.103 and modinfo says it has v1.10.11 of the r8152 driver. I am going to update to the v2.14.0 that is on the Realtek webpage (https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-usb-3-0-software), but if, as some of you say here, this is a firmware issue, I am not that hopeful.

How can I check which firmware my device is loading? From reading this bug page I have seen there are different versions of the firmware at /usr/lib/firmware/rtl_nic . In my case for rtl8153x-x.fw there is a-2, a-3, a-4 and b-2.

@Yorick de Wid, good to know Realtek is aware of the problem. What can we realistically expect from them? The latest driver, which I am guessing also includes the firmware, is from 19/10/2020, and this chipset has been out for years. You'd assume they have had time to iron everything out.
Comment 67 Yorick de Wid 2021-03-19 09:23:36 UTC
dmesg should log the requested firmware version by the driver.

Hayes pushed a few patches upstream last month. These changes include power flow regulation of the driver and URB speeds. Those are driver level changes and preliminary tests show they are working on recent kernels. I've been running iperf for a few days and nothing has broken down nor did I see any timing issues. 

I can't speak to all problems here, but issues caused by particular combinations of hardware and this chipset may be resolved by these patches. I'd expect Ubuntu will backport these driver changes to LTS as well.
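
For the dmesg check mentioned at the top of this comment, here is a small sketch that pulls the firmware load results out of the log. The two message shapes are the ones quoted elsewhere in this thread ("load ... successfully" and "unable to load firmware patch ..."); other kernel versions may word them differently, and `firmware_events` is a name made up for this example.

```python
import re

# e.g. "r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully"
LOAD_OK_RE = re.compile(
    r"r8152 (?P<port>\S+): load (?P<fw>\S+) (?P<ver>v\d+) "
    r"(?P<date>\d{2}/\d{2}/\d{2}) successfully"
)
# e.g. "r8152 2-1.3:1.0: unable to load firmware patch rtl_nic/rtl8153a-4.fw (-2)"
LOAD_FAIL_RE = re.compile(
    r"r8152 (?P<port>\S+): unable to load firmware patch "
    r"(?P<file>\S+) \((?P<err>-?\d+)\)"
)


def firmware_events(dmesg_text):
    """Return a list of dicts describing r8152 firmware load results."""
    events = []
    for line in dmesg_text.splitlines():
        m = LOAD_OK_RE.search(line)
        if m:
            events.append({"result": "loaded", **m.groupdict()})
            continue
        m = LOAD_FAIL_RE.search(line)
        if m:
            info = m.groupdict()
            info["err"] = int(info["err"])
            events.append({"result": "failed", **info})
    return events


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for event in firmware_events(sys.stdin.read()):
        print(event)
```

A "failed" entry with error -2 (ENOENT) means the firmware file was not found on disk.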
Comment 68 Nikolay Kichukov 2021-03-19 10:27:52 UTC
Hi Yorick de Wid,

Can you refer me to those 'few patches' from last month? I just want to verify if the kernel I run has them, 5.11.7 here, and perhaps give this 'hardware' its last chance this time. I have been trying to make it work for too long now, without any success.

Thanks.
Comment 70 Nikolay Kichukov 2021-03-19 12:04:49 UTC
Thanks, both patches are not yet in the latest stable kernel: 5.11.7, so I may backport them and give it a try. Thanks for sharing!
Comment 71 Tim 2021-03-20 06:45:50 UTC
@Yorick de Wid

From my limited understanding of how the kernel and drivers work, I can see those two commits touch only the r8152 driver and no other part of the code. Could we just add those two commits to the r8152 driver and compile it as a module instead of recompiling the whole netdev subsystem? (If so, it would also be nice if Realtek published it on their webpage, even if only as a beta driver.)

I have updated the r8152 driver to the v2.14.0 version from the Realtek website. The ethernet still randomly stops working. I will try to see if I can apply the patches to the v2.14.0 driver. If it compiles without errors, I will test running with the patched driver. Is there a better way of doing this? Maybe there are some more changes in the kernel branch of r8152 and I should get the driver code from there? Any advice and/or instructions would be welcome.

I really do not understand how Realtek releases a product like this. I have this gigabit ethernet device connected to a router with only fast ethernet (100Mb/s), so it is running at less than 10% of its speed and it is still hanging daily. This is not some race condition that happens when some specific or weird heavy load goes through the device; this is happening with normal, very low traffic. How was this not caught during development or testing? Did they even try the device they are selling? Now I understand why everywhere you read, people badmouth Realtek products and recommend getting Intel NICs.
Comment 72 Yorick de Wid 2021-03-20 11:49:15 UTC
Rarely ever does anyone compile the kernel from a subsystem. Just compile the r8152 module like any other KBuild module.

There are many variants of the RTL815x chipset, some made by Realtek, some not. The chipset family is employed in a large variety of products, and minor PCB design flaws like trace length can cause timing issues which are notoriously hard to diagnose. Even though this *does* feel like a firmware/driver issue, it's impossible to account for all the ways this microcontroller is being used.
Comment 73 Tim 2021-03-20 15:01:02 UTC
@Yorick de Wid

I managed to compile and install the driver from the Realtek webpage, but I have no idea how to download only the r8152 part out of the kernel. Could you give some link or instructions so the ones who want to give it a try like me can? Thanks.
Comment 74 Tim 2021-03-31 04:51:19 UTC
I have finally managed to compile the kernel of my system adding the r8152 upstream patches that were missing. The device, tp-link UE330, is still hanging.

I have compiled 5.11.8. This already included this recommended patch https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=7a0ae61acde2cebd69665837170405eced86a6c7 , and all the patches made to r8152 in 2020 as far as I can see. The only r8152 patches that are missing are the two from 2021, again as far as I can see, these two: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a08c0d309d8c078d22717d815cf9853f6f2c07bd, https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=80fd850b31f09263ad175b2f640d5c5c6f76ed41 (the last one is the other one recommended at #69).

So with 5.11.8 compiled with the two r8152 patches from 2021, the device is still hanging. No progress.
Comment 75 Daniel Squires 2021-04-11 10:11:44 UTC
I have a load of USB3 Ethernet dongles using this driver. Reported by lsusb as follows:

Bus 002 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

I experience this problem (connection drops and dmesg shows repeated "Tx status -71" messages) on various systems: a Dell laptop with Ubuntu 18.04 and 20.04, Gentoo and a Raspberry Pi. On all systems the problem either only happens when plugged into USB3 or is far worse when plugged into USB3. Obviously USB2 does not allow the full bandwidth to be achieved, so it is not a good solution.

Running iperf will generally trigger the problem within a very short time frame. I am using 3 of these dongles with a Raspberry Pi 4, have been having this problem, and started to investigate. I noticed the driver was failing to load the firmware file (rtl8153a-4.fw) and, I guess, falling back on a default internal firmware. The fw file turns out to be missing from the Raspbian linux-firmware package. Having manually copied it from Ubuntu 20.04, it now loads successfully for 2 of the connected adapters but fails on the 3rd:

pi@router:~ $ lsusb | grep 8153
Bus 002 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 001 Device 005: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 001 Device 007: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

pi@router:~ $ dmesg | grep r8152
[    2.104683] usbcore: registered new interface driver r8152
[    3.683627] r8152 2-1.3:1.0: Direct firmware load for rtl_nic/rtl8153a-4.fw failed with error -2
[    3.683675] r8152 2-1.3:1.0: unable to load firmware patch rtl_nic/rtl8153a-4.fw (-2)
[    3.724435] r8152 2-1.3:1.0 eth0: v1.11.11
[    4.574454] r8152 1-1.4:1.0: load rtl8153a-4 v2 02/07/20 successfully
[    4.614895] r8152 1-1.4:1.0 eth1: v1.11.11
[    4.903594] r8152 1-1.3.4:1.0: load rtl8153a-4 v2 02/07/20 successfully
[    4.945192] r8152 1-1.3.4:1.0 eth2: v1.11.11
[    8.767705] r8152 1-1.3.4:1.0 wan: renamed from eth2
[    8.806489] r8152 1-1.4:1.0 voip: renamed from eth1
[    8.865195] r8152 2-1.3:1.0 house: renamed from eth0
[   16.461782] r8152 2-1.3:1.0 house: carrier on
[   16.545744] r8152 2-1.3:1.0 house: Promiscuous mode enabled
[   17.079827] r8152 1-1.3.4:1.0 wan: carrier on
[   28.923646] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled
[ 4664.551927] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled
[ 4664.583404] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled
[ 4664.660467] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled


Any ideas why it fails to load for one of the three above?
Comment 76 Daniel Squires 2021-04-11 10:53:30 UTC
Seems like fw load fails when plugged into a USB3 port, but succeeds when plugged into a USB2 one!
Comment 77 Vladyslav Shtabovenko 2021-04-22 13:40:02 UTC
Since I'm also suffering from these issues (Dell WD15 + Thinkpad T14 AMD), I'm wondering if someone managed to unfreeze the machine in the situation described by Michiel Janssens:




> Similar issues here with a Dell dock WD15, connected from Dell XPS 13 9360.
> Since several months the wired connection from the dock 0bda:8153 dies, but
> the network stack isn't notified. A reboot after this waits endlessly on
> services to stop. Sometimes Gnome gui locks up shortly after logging back
> in the system and being presented with the issue. I have to do REISUB to
> get the system working again.




I mean, once the connection is lost, the system starts hanging on every operation that is somehow related to networking, so that even sudo won't work (su is possible, though). Restarting or killing NetworkManager or dhclient doesn't succeed either. So essentially you have your graphical session (Gnome 3 in my case) where the mouse cursor still moves but nothing else reacts. Trying to "save" the system from a virtual console ultimately fails because everything network-related hangs. At the end of the day REISUB is the only thing you can do, meaning that you will lose all your unsaved work.




Perhaps someone has found a trick to avoid REISUB in such a situation? It's really annoying since you cannot reliably predict when such a freeze will happen.
Comment 78 Tim 2021-04-26 10:53:17 UTC
Now that I have been using the kernel with the latest patches for a month, I want to report some experiences that might be useful for users, and maybe even for the developers, to finally fix this issue. Again, my device is the TP-Link U330.

Right now, the device seems to hang only after a reboot. Every time I reboot, the device hangs several times in a very short time (once up to 11 times in 5 minutes) until it stops hanging, and then it can work without hanging for weeks (I think 3 weeks is the longest I tested before I had to reboot again).

For users: if you install ifupdown2, you can recover the device without rebooting, disconnecting, or even disrupting the network. Once you have ifupdown2 installed on your Linux system and the device is down (you can check with the command 'ip addr'), use these two commands to recover the device:

- ifup enxxxxx
[enxxxxx is the name of the device in your system]
- ifreload -a
[-a is probably not needed, but it is very quick and does not disrupt the network, so I did not look further]

This will get your device working again.

I created a script that does this and tried to have it executed every time the device goes down. The script gets triggered when I take the device down manually or when rebooting, but for some unknown reason it does not get triggered when the device goes down because of the bug. So I have resorted to running the script every 30 seconds: it checks whether the device is up or down and, if it is down, brings it up again with those two commands. It is not pretty, but it at least makes my system workable until a proper solution is released by Realtek. If anyone is interested in the script, I can share it.
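
For readers who want to try the polling approach, here is a rough sketch. It is not the script mentioned above: the function names, the SYSFS_NET variable and the logger tag are made up for illustration, and it assumes that the hang shows up as "down" in /sys/class/net/<iface>/operstate, which is the state that 'ip addr' reports. ifup and ifreload -a are the two ifupdown2 commands given above.

```shell
#!/bin/sh
# Hypothetical watchdog sketch, not the script referred to in the comment.
# SYSFS_NET can be overridden for testing; the default is the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the operational state of an interface ("up", "down", ...),
# or "missing" if the interface does not exist.
iface_state() {
    if [ -r "$SYSFS_NET/$1/operstate" ]; then
        cat "$SYSFS_NET/$1/operstate"
    else
        echo missing
    fi
}

# Bring the interface back with the two ifupdown2 commands from the comment.
recover_iface() {
    ifup "$1" && ifreload -a
}

# Poll every <interval> seconds (default 30) and recover when the state is "down".
watch_iface() {
    iface="$1"
    interval="${2:-30}"
    while :; do
        if [ "$(iface_state "$iface")" = "down" ]; then
            logger -t r8152-watchdog "$iface is down, running ifup and ifreload"
            recover_iface "$iface"
        fi
        sleep "$interval"
    done
}
```

It would be started as root with something like `watch_iface enxxxxx 30`, where enxxxxx is the name of the device on your system. A systemd timer or cron job calling iface_state and recover_iface once per run would work as well and avoids the endless loop.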
Comment 79 Tim 2021-04-26 10:56:27 UTC
I forgot one thing: the above procedure to recover the device has only been tested with the two 2021 kernel patches. I have not tested it without them, so I do not know whether it would work without the patches.
Comment 80 Vladyslav Shtabovenko 2021-04-26 12:26:06 UTC
Could you please specify which patches you mean? I'm currently on 5.11.16-100.fc32.x86_64, so I guess that they are not included yet.

Another question, what does ifupdown2 do that "simpler" tools cannot do
when it comes to recovering a frozen machine? Once I've even tried rmmod
on the r8152 module but even that command froze.
Comment 81 Michiel Janssens 2021-04-26 12:37:21 UTC
Hi Vladyslav Shtabovenko, have you tried booting with usbcore.autosuspend=-1?
I still have this in my boot parameters, running kernel 5.11.15.
Since adding this parameter I haven't had any lockups or network freezes.
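
To confirm that the boot parameter actually took effect, the running value can be read back from sysfs. A small sketch follows; the function names and the USBCORE_PARAMS variable are made up here, while /sys/module/usbcore/parameters/autosuspend is the standard location of this usbcore parameter. A negative value means autosuspend is disabled by default for devices enumerated after it was set.

```shell
#!/bin/sh
# Hypothetical helpers. USBCORE_PARAMS can be overridden for testing;
# the default is the real sysfs location of the usbcore parameters.
USBCORE_PARAMS="${USBCORE_PARAMS:-/sys/module/usbcore/parameters}"

# Print the default USB autosuspend delay in seconds.
autosuspend_delay() {
    cat "$USBCORE_PARAMS/autosuspend"
}

# Succeed (exit status 0) if autosuspend is disabled, i.e. the delay is negative.
autosuspend_disabled() {
    [ "$(autosuspend_delay)" -lt 0 ]
}
```

For example: `autosuspend_disabled && echo "USB autosuspend is off"`.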
Comment 82 Tim 2021-04-27 06:15:18 UTC
@Vladyslav Shtabovenko I specified the patches I included to compile the kernel in comment #74.

I needed to install ifupdown2 on my system for other reasons, so I have only tried with ifupdown2. But ifupdown2 claims to be able to bring interfaces up and down and reload them with less disruption to the network than previous software, so I suspect it is needed, but I cannot be sure.
Comment 83 Vladyslav Shtabovenko 2021-05-09 20:42:40 UTC
Many thanks. In my case the total freezes don't occur that often (perhaps once in 3-5 weeks), so I haven't tried any radical measures yet. The point is that the T14 AMD already has many issues with power management, and I'm somewhat reluctant to worsen the battery life even further by disabling USB autosuspend globally. I'm currently testing TLP with the option

USB_BLACKLIST="0bda:8153 0bda:4014"

in /etc/tlp.conf. Not sure if it helps or not, but I haven't had any "fatal" hangs so far. Perhaps it even solves the issue for me. Should there be another freeze as described in my first posting, I will comment on that here.
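
Whether such a per-device exclusion is in effect can be checked in sysfs: a device that is kept out of autosuspend has "on" in its power/control file, while "auto" means it may be suspended. The sketch below only reads these files. The function name and the USB_DEVICES variable are made up for illustration; idVendor, idProduct and power/control are the standard sysfs attributes of a USB device.

```shell
#!/bin/sh
# Hypothetical helper. USB_DEVICES can be overridden for testing;
# the default is the real sysfs directory listing USB devices.
USB_DEVICES="${USB_DEVICES:-/sys/bus/usb/devices}"

# Print "<usb-path> <power/control value>" for every USB device whose
# vendor:product ID matches the argument, e.g. 0bda:8153.
usb_power_control() {
    want_vendor="${1%%:*}"
    want_product="${1##*:}"
    for dev in "$USB_DEVICES"/*; do
        # Interface entries such as 1-1.4:1.0 have no ID files; skip them.
        [ -r "$dev/idVendor" ] && [ -r "$dev/idProduct" ] || continue
        [ "$(cat "$dev/idVendor")" = "$want_vendor" ] || continue
        [ "$(cat "$dev/idProduct")" = "$want_product" ] || continue
        echo "${dev##*/} $(cat "$dev/power/control")"
    done
}
```

Running `usb_power_control 0bda:8153` prints one line per matching adapter. Writing "on" to the same power/control file as root disables autosuspend for that one device until it is replugged, which avoids turning it off for every USB device.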
