Bug 198931
Summary: | Network connection on r8152 stops with "Tx status -71" | ||
---|---|---|---|
Product: | Drivers | Reporter: | Jean-Louis Dupond (jean-louis) |
Component: | Network | Assignee: | drivers_network (drivers_network) |
Status: | NEW --- | ||
Severity: | normal | CC: | aelschuring, anarcat, bugkernel, bugzilla.kernel.org, buo.ren.lin, danny, dion, doubeon1, elia.f.geretto, francesco.giudici, hi, hijacker, intelligence.dance, jcollins, jean-louis, jwrdegoede, kernel, koema, konstantin.sobolev, linux, main.haarp, mango, marctraider, michiel, mliska, numanair, olebowle, ongun.kanat+kernelbugzilla, p.s.vanderheide, pdecat, peter.hahn, peter_hayman, prashanthk0539, richard, ries.infotec+kernel, Rob.Tetour, russianneuromancer, smihael, stian.skjelstad, sundman, ted437, timur.kristof, tomas, truls, vasvir2, vpsink, weberkai, ydewid |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.16-rc2 (drm-tip) | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg of Linux 4.17.0
dmesg of Linux 4.19rc8
dmesg of Linux 5.4.0
lsusb output mjanssens wd15 xps9360
WARNING: at net/sched/sch_generic.c:467 dev_watchdog+0x24d/0x260
dmesg of Linux 6.8.0-rc7
Kernel log of the 6.8.0-39-generic Ubuntu kernel |
Description
Jean-Louis Dupond
2018-02-25 13:03:56 UTC
Seems like I have the same issue on Dell Latitude 7285 and HP EliteBook Folio G1 with Belkin USB-C Express Dock 3.1 HD F4U093:

[ 1090.235874] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 1090.235879] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 1090.235886] pcieport 0000:00:1c.0: device [8086:9d10] error status/mask=00003000/00002000
[ 1090.235889] pcieport 0000:00:1c.0: [12] Timeout
[ 1589.760804] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx status -2
[ 1594.048998] ------------[ cut here ]------------
[ 1594.049003] NETDEV WATCHDOG: enx58ef68a8892b (r8152): transmit queue 0 timed out
[ 1594.049040] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:461 dev_watchdog+0x221/0x230
[ 1594.049042] Modules linked in: [...]
[ 1594.049294] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.19.0-041900rc8-generic #201810150631
[ 1594.049296] Hardware name: Dell Inc. Latitude 7285/0VVWNX, BIOS 1.2.0 07/09/2018
[ 1594.049303] RIP: 0010:dev_watchdog+0x221/0x230
[ 1594.049307] Code: 00 49 63 4e e0 eb 92 4c 89 ef c6 05 26 ff f5 00 01 e8 c3 b6 fc ff 89 d9 4c 89 ee 48 c7 c7 08 2d 7b 82 48 89 c2 e8 61 26 7b ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
[ 1594.049310] RSP: 0018:ffffc2438192bd70 EFLAGS: 00010282
[ 1594.049314] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 1594.049317] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffffa0341e216420
[ 1594.049320] RBP: ffffc2438192bda0 R08: 0000000000000001 R09: 0000000000000511
[ 1594.049322] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000001
[ 1594.049325] R13: ffffa033fac37000 R14: ffffa033fac374c0 R15: ffffa034166cf680
[ 1594.049329] FS: 0000000000000000(0000) GS:ffffa0341e200000(0000) knlGS:0000000000000000
[ 1594.049332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1594.049335] CR2: 00007f26eec71020 CR3: 000000038a20a005 CR4: 00000000003606f0
[ 1594.049340] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1594.049342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1594.049344] Call Trace:
[ 1594.049356] ? pfifo_fast_change_tx_queue_len+0x2e0/0x2e0
[ 1594.049363] call_timer_fn+0x30/0x130
[ 1594.049371] run_timer_softirq+0x3ea/0x420
[ 1594.049376] ? __switch_to_asm+0x34/0x70
[ 1594.049381] ? __switch_to+0xad/0x500
[ 1594.049385] ? __switch_to_asm+0x40/0x70
[ 1594.049388] ? __switch_to_asm+0x34/0x70
[ 1594.049392] ? __switch_to_asm+0x40/0x70
[ 1594.049397] __do_softirq+0xdc/0x2b5
[ 1594.049403] run_ksoftirqd+0x2b/0x40
[ 1594.049410] smpboot_thread_fn+0xd0/0x170
[ 1594.049416] kthread+0x120/0x140
[ 1594.049421] ? sort_range+0x30/0x30
[ 1594.049426] ? kthread_bind+0x40/0x40
[ 1594.049431] ret_from_fork+0x35/0x40
[ 1594.049435] ---[ end trace 3fcb83dc58402212 ]---
[ 1594.049468] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1599.172616] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1604.288619] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout
[ 1610.176579] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx timeout

Full logs:
Created attachment 279099 [details]
dmesg of Linux 4.17.0
Created attachment 279101 [details]
dmesg of Linux 4.19rc8
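The numbers in the r8152 "Tx status" / "Rx status" messages are negative Linux errno values. As a rough aid for reading logs like the ones attached here, the sketch below pulls the interface, direction and code out of such a line and looks the code up in a small hand-written table. The table, the regular expression and the function name `decode_status` are illustrative assumptions, not part of the driver; the descriptions are paraphrased and only cover codes that show up in this report.

```python
import re

# Linux errno numbers that appear in the r8152 messages quoted in this report.
# Hand-written for illustration; descriptions are paraphrased, not driver code.
USB_ERRNO = {
    2: ("ENOENT", "URB was unlinked (cancelled) before it completed"),
    22: ("EINVAL", "invalid argument when submitting the URB"),
    71: ("EPROTO", "protocol error on the bus (e.g. no response from the device)"),
    84: ("EILSEQ", "CRC mismatch or no response from the device"),
}

# Matches e.g. "r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx status -2"
STATUS_RE = re.compile(r"r8152 \S+ (\S+): (Tx|Rx) status -(\d+)")


def decode_status(line):
    """Return (interface, direction, errno_name, meaning), or None if no match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    iface, direction, number = match.groups()
    name, meaning = USB_ERRNO.get(
        int(number), ("UNKNOWN", "not in the table above")
    )
    return iface, direction, name, meaning


if __name__ == "__main__":
    sample = "[ 1589.760804] r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx status -2"
    print(decode_status(sample))
```

Lines such as "Tx timeout" carry no status code, so the function returns None for them.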
Jean-Louis, can you please verify if issue is still reproducible for you on Linux 4.20rc4? For me, at least with one dock (Belkin USB-C Express Dock 3.1 HD F4U093) and one device (HP Elite x2 1013 G3) this issue is no longer reproducible. I will verify other laptops with Linux 4.20 later. I haven't seen this the last months. Running Ubuntu 18.10 with 4.18.0-11-generic I have a very similar setup: Dell Precision 7540 with WD19DC dock that has RTL8153 adapter. It crashes periodically with similar symptoms, my current kernel is 5.4.1 [76658.437411] ------------[ cut here ]------------ [76658.437412] NETDEV WATCHDOG: enp57s0u2u4 (r8152): transmit queue 0 timed out [76658.437421] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x21f/0x230 [76658.437421] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi r8152 mii tun md4 cifs dm_zero fuse raid10 raid1 raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_crypt dm_mirror dm_region_hash dm_log dm_mod dax ohci_pci ohci_hcd uhci_hcd ehci_pci ehci_hcd mousedev hid_multitouch dell_rbtn input_leds dell_laptop dell_wmi dell_smbios i2c_designware_platform atkbd rtsx_pci_sdmmc mmc_core mei_hdcp i2c_designware_core dell_wmi_descriptor intel_wmi_thunderbolt wmi_bmof intel_rapl_msr libps2 dcdbas dell_smm_hwmon btusb btrtl btbcm uvcvideo x86_pkg_temp_thermal videobuf2_vmalloc btintel intel_powerclamp videobuf2_memops coretemp videobuf2_v4l2 ucsi_acpi bluetooth processor_thermal_device videodev intel_lpss_pci typec_ucsi mei_me i2c_i801 rtsx_pci intel_soc_dts_iosf ecdh_generic intel_lpss mei mfd_core ecc videobuf2_common intel_rapl_common intel_pch_thermal typec wmi i8042 int3403_thermal int3400_thermal i2c_hid dell_smo8800 [76658.437439] int340x_thermal_zone serio acpi_thermal_rel intel_pmc_core evdev i915 [76658.437441] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G U 5.4.1-gentoo #6 [76658.437442] Hardware name: Dell Inc. 
Precision 7540/0CYJDT, BIOS 1.4.0 09/23/2019 [76658.437443] RIP: 0010:dev_watchdog+0x21f/0x230 [76658.437444] Code: 85 c0 75 e8 eb a8 4c 89 ef c6 05 5d 62 b3 00 01 e8 e6 c8 fc ff 44 89 e1 4c 89 ee 48 c7 c7 48 c1 49 9f 48 89 c2 e8 ea 11 8a ff <0f> 0b eb 89 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 c7 47 08 00 [76658.437444] RSP: 0018:ffffb139401b8e80 EFLAGS: 00010282 [76658.437445] RAX: 0000000000000000 RBX: ffff9f890e1a2a00 RCX: 00000000000011d4 [76658.437445] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffffa39e53ac [76658.437446] RBP: ffff9f8914cc7440 R08: 0000000000000001 R09: 00000000000011d4 [76658.437446] R10: 0000000000028978 R11: 0000000000000001 R12: 0000000000000000 [76658.437447] R13: ffff9f8914cc7000 R14: ffff9f8914cc7440 R15: 0000000000000001 [76658.437447] FS: 0000000000000000(0000) GS:ffff9f891c080000(0000) knlGS:0000000000000000 [76658.437448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [76658.437448] CR2: 00007fc9000030b8 CR3: 00000009c4384006 CR4: 00000000003606e0 [76658.437448] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [76658.437449] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [76658.437449] Call Trace: [76658.437450] <IRQ> [76658.437452] ? qdisc_put_unlocked+0x30/0x30 [76658.437454] call_timer_fn+0x26/0x120 [76658.437454] run_timer_softirq+0x17d/0x470 [76658.437456] ? enqueue_hrtimer+0x31/0x80 [76658.437457] ? 
__hrtimer_run_queues+0x11b/0x260 [76658.437458] __do_softirq+0xd6/0x2ba [76658.437460] irq_exit+0x9b/0xa0 [76658.437461] smp_apic_timer_interrupt+0x5b/0x110 [76658.437462] apic_timer_interrupt+0xf/0x20 [76658.437462] </IRQ> [76658.437464] RIP: 0010:cpuidle_enter_state+0xa8/0x400 [76658.437464] Code: c5 0f 1f 44 00 00 31 ff e8 85 fb 9b ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 2d 03 00 00 31 ff e8 7c d1 a0 ff fb 45 85 e4 <0f> 88 6c 02 00 00 4c 2b 2c 24 49 63 cc 48 8d 04 49 48 c1 e0 05 8b [76658.437465] RSP: 0018:ffffb139400dfe70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13 [76658.437465] RAX: ffff9f891c0a7bc0 RBX: ffffffff9f6a1ce0 RCX: 000045b86eed58ba [76658.437466] RDX: 000045b86efc9af4 RSI: 000045b86eed58ba RDI: 0000000000000000 [76658.437466] RBP: ffffd1393fab4a10 R08: 000045b86eed58d6 R09: 00000000000001bf [76658.437466] R10: ffff9f891c0a6c20 R11: ffff9f891c0a6c00 R12: 0000000000000002 [76658.437467] R13: 000045b86eed58d6 R14: 0000000000000002 R15: ffff9f8915f5c740 [76658.437468] cpuidle_enter+0x24/0x40 [76658.437470] do_idle+0x1bf/0x230 [76658.437471] cpu_startup_entry+0x14/0x20 [76658.437472] start_secondary+0x131/0x160 [76658.437473] secondary_startup_64+0xa4/0xb0 [76658.437474] ---[ end trace 907e490a0cd3c160 ]--- [76658.437476] r8152 4-2.4:1.0 enp57s0u2u4: Tx timeout [76659.788078] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=83035 end=83036) time 243 us, min 1431, max 1439, scanline start 1421, end 1443 [76660.958672] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2 [76660.958758] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2 [76660.958848] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2 [76660.958940] r8152 4-2.4:1.0 enp57s0u2u4: Tx status -2 I have a similiar setup and similiar problem: Setup: Lenovo Thinkpad t480, Think Pad USB-C Dock 40A90090EU [1], Ubuntu 16.04, Kernel 4.15.0-74-generic #83~16.04.1-Ubuntu Network connection is periodically crashing. Dmesg shows `r8152 4-1.1:1.0 enxe04f43991e1c: Rx status -71` in that case. 
I noticed that this seems to depend on how heavily the network connection is used. E.g. if I compile a lot using icecream to distribute compilation jobs, it seems to be a lot less stable. Using `rmmod r8152 && modprobe r8152` fixes the problem temporarily.

[1] https://support.lenovo.com/de/de/accessories/acc100348

@Peter, check Comment 4. Re-test on a newer kernel (you can take it from the mainline PPA).

This still happens to me on 5.5.6-201.fc31.x86_64. My dmesg is full of these messages:

[12696.189484] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12702.333456] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12707.965422] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12713.085385] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12718.205360] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12724.349321] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12729.981295] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12735.101256] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12740.221235] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12746.365199] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12751.997171] r8152 6-1:1.0 enp10s0u1: Tx timeout
[12757.117155] r8152 6-1:1.0 enp10s0u1: Tx timeout

Timur, are you using the same docking station as Jean-Louis, or some other?

RussianNeuroMancer, I use a Dell XPS 13 9370 with a Lenovo ThinkPad branded Thunderbolt 3 dock. The model number is DBB9003L1. (The dock is not mine, I'm just borrowing it from a colleague for a week.) I think these docks mostly use the same hardware under the hood. I think I've also seen a Fedora bug report about the same issue with the Dell TB16 here: https://bugzilla.redhat.com/show_bug.cgi?id=1460789

I see. By the way, since my Comment 4 I was able to reproduce this issue again. This time with Linux 5.4 on Dell Venue 8 Pro 5855 and Dell WD15 Dock.

Created attachment 287779 [details]
dmesg of Linux 5.4.0
Same problem here. Dell Latitude 7480 (BIOS 1.16.1) with WD15 dock (Port Controller on v1.1.8). I am using 5.5.7-zen1-1-zen (but the same problem also occured with the standard arch kernel). It has not occured with 5.4.2.arch1-1 but it for sure occured with 5.4.5.arch1-1 (I had holidays inbetween and the troubles started after them). -- Mar 05 13:42:34 hostname kernel: ------------[ cut here ]------------ Mar 05 13:42:34 hostname kernel: NETDEV WATCHDOG: enp59s0u1u2 (r8152): transmit queue 0 timed out Mar 05 13:42:34 hostname kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x268/0x270 Mar 05 13:42:34 hostname kernel: Modules linked in: md4 nls_utf8 cifs dns_resolver fscache libdes rfcomm ip6t_REJECT nf_reject_ipv6 xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_na> Mar 05 13:42:34 hostname kernel: coretemp snd_hda_codec_generic ledtrig_audio kvm_intel snd_pcm_dmaengine snd_hda_intel dell_wmi_descriptor dcdbas snd_intel_dspcfg dell_smm_hwmon snd_hda_codec kvm cfg80211 snd_hda_core snd_hwdep snd_pcm e1000e fuse irqbypass i> Mar 05 13:42:34 hostname kernel: libps2 aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_hcd rtsx_pci i8042 serio i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_agp intel_gtt agpgart btrfs blake2b_generic libcr> Mar 05 13:42:34 hostname kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.7-zen1-1-zen #1 Mar 05 13:42:34 hostname kernel: Hardware name: Dell Inc. 
Latitude 7480/00F6D3, BIOS 1.16.1 10/03/2019 Mar 05 13:42:34 hostname kernel: RIP: 0010:dev_watchdog+0x268/0x270 Mar 05 13:42:34 hostname kernel: Code: 47 9c 69 ff eb 8a 4c 89 f7 c6 05 dc 05 db 00 01 e8 0d fa f9 ff 44 89 e9 4c 89 f6 48 c7 c7 d0 2a 5a 8e 48 89 c2 e8 0f 92 73 ff <0f> 0b e9 68 ff ff ff 90 0f 1f 44 00 00 48 c7 47 08 00 00 00 00 48 Mar 05 13:42:34 hostname kernel: RSP: 0018:ffffb39300164e60 EFLAGS: 00010286 Mar 05 13:42:34 hostname kernel: RAX: 0000000000000000 RBX: ffff8cdc200b2000 RCX: 0000000000000000 Mar 05 13:42:34 hostname kernel: RDX: 0000000000000103 RSI: 00000000000000f6 RDI: 00000000ffffffff Mar 05 13:42:34 hostname kernel: RBP: ffff8cdc0e5bf45c R08: 0000000000000515 R09: 0000000000000003 Mar 05 13:42:34 hostname kernel: R10: 0000000000000001 R11: 0000000000003c00 R12: ffff8cdc0e5bf480 Mar 05 13:42:34 hostname kernel: R13: 0000000000000000 R14: ffff8cdc0e5bf000 R15: ffff8cdc200b2080 Mar 05 13:42:34 hostname kernel: FS: 0000000000000000(0000) GS:ffff8cdc26500000(0000) knlGS:0000000000000000 Mar 05 13:42:34 hostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 05 13:42:34 hostname kernel: CR2: 00007fe15a0d3000 CR3: 000000019f20a001 CR4: 00000000003606e0 Mar 05 13:42:34 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Mar 05 13:42:34 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Mar 05 13:42:34 hostname kernel: Call Trace: Mar 05 13:42:34 hostname kernel: <IRQ> Mar 05 13:42:34 hostname kernel: ? qdisc_put_unlocked+0x30/0x30 Mar 05 13:42:34 hostname kernel: ? qdisc_put_unlocked+0x30/0x30 Mar 05 13:42:34 hostname kernel: call_timer_fn+0x2d/0x150 Mar 05 13:42:34 hostname kernel: ? qdisc_put_unlocked+0x30/0x30 Mar 05 13:42:34 hostname kernel: run_timer_softirq+0xaec/0xce0 Mar 05 13:42:34 hostname kernel: __do_softirq+0x111/0x374 Mar 05 13:42:34 hostname kernel: ? 
hrtimer_interrupt+0x235/0x3e0 Mar 05 13:42:34 hostname kernel: irq_exit+0xc9/0x120 Mar 05 13:42:34 hostname kernel: smp_apic_timer_interrupt+0xa6/0x1a0 Mar 05 13:42:34 hostname kernel: apic_timer_interrupt+0xf/0x20 Mar 05 13:42:34 hostname kernel: </IRQ> Mar 05 13:42:34 hostname kernel: RIP: 0010:cpuidle_enter_state+0xc9/0x850 Mar 05 13:42:34 hostname kernel: Code: e8 8c b0 85 ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 00 06 00 00 31 ff e8 3e 09 8d ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 1f 04 00 00 49 63 d4 4c 2b 6c 24 10 48 8d 04 52 48 Mar 05 13:42:34 hostname kernel: RSP: 0018:ffffb393000dbe50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 Mar 05 13:42:34 hostname kernel: RAX: ffff8cdc26500000 RBX: ffff8cdc26537800 RCX: 000000000000001f Mar 05 13:42:34 hostname kernel: RDX: 0000000000000000 RSI: 000000002f32988b RDI: 0000000000000000 Mar 05 13:42:34 hostname kernel: RBP: ffffffff8e8bea60 R08: 00000a3f27ed44df R09: 00000a3f251f7ba7 Mar 05 13:42:34 hostname kernel: R10: 0000000000000007 R11: 0000000000000007 R12: 0000000000000008 Mar 05 13:42:34 hostname kernel: R13: 00000a3f27ed44df R14: 0000000000000008 R15: ffff8cdc22a98000 Mar 05 13:42:34 hostname kernel: cpuidle_enter+0x29/0x40 Mar 05 13:42:34 hostname kernel: do_idle+0x20c/0x2c0 Mar 05 13:42:34 hostname kernel: cpu_startup_entry+0x19/0x20 Mar 05 13:42:34 hostname kernel: start_secondary+0x1c6/0x220 Mar 05 13:42:34 hostname kernel: secondary_startup_64+0xb6/0xc0 Mar 05 13:42:34 hostname kernel: ---[ end trace 358d3d81e0691439 ]--- Mar 05 13:42:34 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout Mar 05 13:42:40 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout Mar 05 13:42:46 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout Mar 05 13:42:51 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout Mar 05 13:42:56 hostname kernel: r8152 4-1.2:1.0 enp59s0u1u2: Tx timeout I've been encountering this problem with every relatively recent (4.9+) kernel, and possibly older ones as 
well.

System: Lenovo W530
USB adapter: Cable Matters 3 Port USB 3.0 Hub with Ethernet (USB Hub with Ethernet, Gigabit Ethernet USB Hub ) Supporting 10/100/1000 Mbps Ethernet Network in Black
https://smile.amazon.com/gp/product/B01J6583NK/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter

I've encountered the problem with Arch's main linux kernels and their LTS builds. The interface seems to have trouble once it is put under any sort of load (30% or more utilization) on the host system. Removing and reloading the module can sometimes temporarily improve things, but (from what I've seen) the issue always returns within a few minutes to an hour.

This is also a rather widespread problem:
https://askubuntu.com/questions/1081128/usb-3-0-ethernet-adapter-not-working-ubuntu-18-04
https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1742922
https://unix.stackexchange.com/questions/362014/usb-3-0-ethernet-dongle-periodically-disconnects-from-network
https://bugzilla.kernel.org/show_bug.cgi?id=200977
https://askubuntu.com/questions/1044127/usb-ethernet-adapter-realtek-r8153-keeps-disconnecting
https://forum.odroid.com/viewtopic.php?t=26121

So if it's the same issue, there are at least several workarounds:

"usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)
Install tlp and change USB_BLACKLIST option in /etc/default/tlp to "0bda:8153" (from second askubuntu link)
Patch /drivers/usb/core/quirks.c with the following line (mentioned in tlp bugreport)
{ USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

Unfortunately, this week I don't have access to a Dell WD15 docking station. Can anyone else try at least the first or second workaround?
(In reply to RussianNeuroMancer from comment #17)
> So if it's same issue there is at least several workarounds:
>
> "usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)
> Install tlp and change USB_BLACKLIST option in /etc/default/tlp to
> "0bda:8153" (from second askubuntu link)
> Patch /drivers/usb/core/quirks.c with following line (mentioned in tlp
> bugreport)
> { USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

I can confirm that setting USB_BLACKLIST to "0bda:8153" in my tlp.conf seems to work fine for me. Prior to this change I lost the network connection each night, and now the connection has stayed up for the last two nights (three days).

(In reply to RussianNeuroMancer from comment #17)
> "usbcore.quirks=0bda:8153:k" kernel boot option (from first askubuntu link)

I can confirm that adding "usbcore.quirks=0bda:8153:k" to kernel boot options worked for me.

So reading through this bug report, the solution, or at least a workaround, would seem to be to add USB_QUIRK_NO_LPM entries for the troublesome rtl8152 / rtl8153 based ethernet adapters to drivers/usb/core/quirks.c. There actually already is at least one line in there for a dock with a r8153 nic:

/* Microsoft Surface Dock Ethernet (RTL8153 GigE) */
{ USB_DEVICE(0x045e, 0x07c6), .driver_info = USB_QUIRK_NO_LPM },

There is mention of several docks here; but upon checking various logs, they all seem to use the generic realtek usb-id for the RTL8153 GigE NIC. So it seems that the solution is adding the following lines to drivers/usb/core/quirks.c:

/* Generic RTL8153 GigE adapters */
{ USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

I will submit a patch upstream for this.

Unfortunately I was too enthusiastic about this. I wrote my comment after 1 day of working and 1 night of downloading a huge amount of data without problems. But after that, using the icecream distributed compiler daemon crashed my connection again.
So it seems to be better but not solved for me.

(In reply to Peter from comment #21)
> Unfortunately I was to enthusiastic about this.
> I wrote my comment after 1 day of working and 1 night of downloading huge
> amount of big data without problems. But after that using icecream
> distributed compiler daemon again crashed my connection.
>
> So it seams to be better but not solved for me.

I'm sorry to hear that the issue is not 100% resolved. Still, I've found enough other bug reports where people are having success with this option when used with an RTL8153 device that I believe it is worthwhile to submit a patch for this upstream, see e.g.: https://bugzilla.redhat.com/show_bug.cgi?id=1713657

> https://bugzilla.redhat.com/show_bug.cgi?id=1713657
I wonder why the blacklist in tlp didn't help him, but usbcore.quirks does.
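The boot option discussed in the comments above takes one or more vendor:product IDs, each followed by a flag letter, with `k` being the letter used in this thread for USB_QUIRK_NO_LPM. A minimal sketch of assembling and sanity-checking such a parameter string is shown below; the function name `build_quirks_param` and the validation rule (four hex digits, colon, four hex digits, as printed by lsusb) are illustrative assumptions, and the output still has to be added to the kernel command line by hand.

```python
import re

# A USB ID as printed by lsusb: four hex digits, a colon, four hex digits.
USB_ID_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$")


def build_quirks_param(usb_ids, flags="k"):
    """Build a usbcore.quirks= boot parameter for the given vendor:product IDs.

    flags defaults to "k", the letter used in this thread for USB_QUIRK_NO_LPM.
    """
    entries = []
    for usb_id in usb_ids:
        if not USB_ID_RE.match(usb_id):
            raise ValueError(f"not a vendor:product ID: {usb_id!r}")
        entries.append(f"{usb_id.lower()}:{flags}")
    if not entries:
        raise ValueError("at least one USB ID is required")
    return "usbcore.quirks=" + ",".join(entries)


if __name__ == "__main__":
    # The generic Realtek RTL8153 ID mentioned throughout this report.
    print(build_quirks_param(["0bda:8153"]))
```

Rejecting malformed IDs up front is a deliberate choice here: a typo in the ID would otherwise produce a parameter that simply never matches the device.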
> But after that using icecream distributed compiler daemon again crashed my
> connection.
> So it seams to be better but not solved for me.

Try this:

1. remove lines 737-737 here https://github.com/torvalds/linux/blob/0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733
2. remove lines 6900 and 6901 here https://github.com/torvalds/linux/blob/0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900

Back in the Linux 4.18/4.19 days that allowed me to work around a similar issue on HP Elite x2 1013 G3 and Belkin USB-C Express Dock 3.1 HD.

(In reply to RussianNeuroMancer from comment #24)
> Try this:
>
> 1. remove lines 737-737 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733
>
> 2. remove lines 6900 and 6901 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900
>
> Back in Linux 4.18/4.19 days that allowed me to workaround similar issue on
> HP Elite x2 1013 G3 and Belkin USB-C Express Dock 3.1 HD.

Hmm, so in essence that swaps the driver which is specifically made for the RTL8153 with the generic USB ethernet class driver. Although it might be interesting to try that, there are known issues with it. E.g. with a Lenovo thunderbolt 3 gen 2 dock, when the laptop is turned off while connected to the dock, most of the dock is turned off, but the ethernet card still has power (for wake on lan I guess), and when using the cdc_ether driver the RTL8153 NIC will start spamming the network as fast as it can after the laptop has been turned off, which in my case made my entire (wired) home network unusable (*).

So I actually sent a patch upstream doing the opposite: adding the Lenovo specific USB-ids for the RTL8153 to the blacklist in cdc_ether and to the white/device-id list in r8152.c, which solved the dock jamming my wired network after the laptop turned off.
*) I'm using a cheap unmanaged switch; a better switch may have kept the network at least somewhat usable.

(In reply to RussianNeuroMancer from comment #24)
> > But after that using icecream distributed compiler daemon again crashed my
> > connection.
>
> > So it seams to be better but not solved for me.
>
> Try this:
>
> 1. remove lines 737-737 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/cdc_ether.c#L733
>
> 2. remove lines 6900 and 6901 here
> https://github.com/torvalds/linux/blob/
> 0d81a3f29c0afb18ba2b1275dcccf21e0dd4da38/drivers/net/usb/r8152.c#L6900
>
> Back in Linux 4.18/4.19 days that allowed me to workaround similar issue on
> HP Elite x2 1013 G3 and Belkin USB-C Express Dock 3.1 HD.

My 0bda:8153 also stops working with the cdc_ether driver (without it saying anything in syslog).
Blacklisting 0bda:8153 in TLP didn't work.
Adding 0bda:8153 quirks kernel parameter didn't work.
Using the newest r8152.53.56-2.12.0 driver from realtek didn't work.

Similar issues here with a Dell dock WD15, connected from Dell XPS 13 9360. For several months the wired connection from the dock (0bda:8153) has been dying without the network stack being notified. A reboot after this waits endlessly on services to stop. Sometimes the Gnome GUI locks up shortly after logging back in to the system and being presented with the issue. I have to do REISUB to get the system working again. The issue doesn't appear while working on the system, mostly when leaving it running by itself for a while. I haven't found a way to actually trigger it. At the moment I'm running openSUSE Tumbleweed with kernel 5.6.6, and the issue is still happening. I tried quirks, but no result. For several days now I've been testing with usbcore.autosuspend=-1 and have left the system running for longer periods. The issue hasn't happened so far.
Side note: Commit 75d7676ead19b1fbb5e0ee934c9ccddcb666b68c doesn't seem to have fixed the message "Tx status -71" from the original bug reporter. (Tx timeout, in my case) That still happens once in a while.

The usbcore.autosuspend=-1 kernel parameter doesn't resolve it for me. Also, I can trigger the problem in seconds, simply by reading at gigabit speeds.

At least for those seeing issues with Dell's WD15 dock I think that trying something similar to this quirk might help: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b63e48fb50e1ca71db301ca9082befa6f16c55c4

To try this, first do:

lsusb -t

To find the Bus and Dev number of any USB hub(s) inside the dock. Then do:

lsusb

And look up the same Bus and Dev number to get the vendor- and product-id used for the hub, e.g. 0bda:0487

Then try booting with this added to your kernel commandline:

usbcore.quirks=0bda:0487:k

Replacing the 0bda:0487 with the <vend>:<prod> ids for your hub (from the lsusb output). If you want to try this on more than one USB device, you can specify the NO_LPM quirk for multiple USB devices like this:

usbcore.quirks=0bda:0487:k,0bda:0488:k

Please give this a try and see if that helps. Also note that the same thing can be used to set the NO_LPM quirk on the USB ethernet-chip itself if it has a different USB-id which is not yet in the kernel's quirks list.

(In reply to Hans de Goede from comment #29)
> Replacing the 0bda:0487 with the <vend>:<prod> ids for your hub (from the
> lsusb output). If you want to try this on more then one USB device, you can
> specify the NO_LPM quirk for multiple USB devices like this:
>
> usbcore.quirks=0bda:0487:k,0bda:0488:k

This didn't work. I have 3 devices:

Bus 003 Device 009: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 003 Device 008: ID 0bda:0411 Realtek Semiconductor Corp. 4-Port USB 3.0 Hub
Bus 002 Device 006: ID 0bda:5411 Realtek Semiconductor Corp. 4-Port USB 2.0 Hub

I added these kernel params:

usbcore.quirks=0bda:8153:k,0bda:5411:k,0bda:0411:k usbcore.autosuspend=-1

Still fails with either Rx status -71 or Tx status -71 after reading 50 MB/s over the network for a minute or few.

> Please give this a try and see if that helps. Also note that the same thing
> can be used to set the NO_LPM quirk on the USB ethernet-chip itself if it
> has a different USB-id which is not yet in the kernel's quirks list.

I'm not sure how to do that. As far as I can tell my ethernet chip is at 0bda:8153 (which in my case is at usb@3:1.4, which maps to device 9, which maps to 0bda:8153).

@Marcus Sundman, right, you have already set the flag for your ethernet usb controller by adding the 0bda:8153:k part to the quirks. So it seems that at least for you setting the NO_LPM flag does not help.

Does your dock have updateable firmware? If so you may want to try to update the firmware. The first generation thunderbolt docks from all vendors were notoriously buggy and they all need the latest firmware to work at least somewhat reliably. Getting the latest firmware is also strongly advised for people using Windows since there really were quite a few issues with these devices which are fixed with fw updates. Yes, I wrote somewhat reliable, the best fix for thunderbolt dock issues often is getting a second generation or newer dock :(

(In reply to Hans de Goede from comment #29)

Thanks for posting this instruction! I already had seen the commit for the WD19, but it wasn't clear how I should investigate that on my system. The WD15 dock adds 2 usb busses, both with a Microchip USB hub. I removed the usbcore.autosuspend=-1 parameter and will test for several days with usbcore.quirks=0424:5537:k, which is the hub which has the 0bda:8153 as child. I will add attachments with my lsusb output.

Created attachment 288917 [details]
lsusb output mjanssens wd15 xps9360
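The lookup described earlier in the thread (take the Bus and Dev numbers from `lsusb -t`, then find the matching line in plain `lsusb` to get the vendor:product ID) can be sketched in a few lines. This is only an illustration under assumptions: the helper names `parse_lsusb` and `find_id` are made up here, the regular expression assumes the usual one-line-per-device `lsusb` format, and the sample text is copied from the device list posted in an earlier comment rather than read from a live system.

```python
import re

# One line of plain `lsusb` output, e.g.
# "Bus 003 Device 009: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 ..."
LSUSB_RE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-f]{4}):([0-9a-f]{4})\s*(.*)$"
)

SAMPLE = """\
Bus 003 Device 009: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 003 Device 008: ID 0bda:0411 Realtek Semiconductor Corp. 4-Port USB 3.0 Hub
Bus 002 Device 006: ID 0bda:5411 Realtek Semiconductor Corp. 4-Port USB 2.0 Hub
"""


def parse_lsusb(text):
    """Turn `lsusb` output into a list of dicts; unrecognised lines are skipped."""
    devices = []
    for line in text.splitlines():
        match = LSUSB_RE.match(line.strip())
        if match:
            bus, dev, vendor, product, name = match.groups()
            devices.append(
                {"bus": int(bus), "device": int(dev),
                 "id": f"{vendor}:{product}", "name": name}
            )
    return devices


def find_id(devices, bus, device):
    """Return the vendor:product ID at the given bus/device number, or None."""
    for entry in devices:
        if entry["bus"] == bus and entry["device"] == device:
            return entry["id"]
    return None


if __name__ == "__main__":
    # Bus 3, Dev 8 is the USB 3.0 hub in the sample above.
    print(find_id(parse_lsusb(SAMPLE), bus=3, device=8))
```

On a real system the text would come from running `lsusb` and the bus/device pair from the hub line in `lsusb -t`; the resulting ID is what goes into the usbcore.quirks parameter.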
It didn't take days to get results.

Just the hub where 0bda:8153 is child
usbcore.quirks=0424:5537:k
result: 0bda:8153 dies after a while, without log entry, needed REISUB to reboot

Both hubs which are added by connecting WD15
usbcore.quirks=0424:5537:k,0424:2137:k
result: 0bda:8153 dies after a while, without log entry, needed REISUB to reboot

So I'm back to using usbcore.autosuspend=-1.
Please advise if I missed something (or an incorrect dev id) I could test.

(In reply to Michiel Janssens from comment #34)
> It didn't take days to get results.
>
> Just the hub where 0bda:8153 is child
> usbcore.quirks=0424:5537:k
> result: 0bda:8153 dies after a while, without log entry, needed REISUB to
> reboot
>
> Both hubs which are added by connecting WD15
> usbcore.quirks=0424:5537:k,0424:2137:k
> result: 0bda:8153 dies after a while, without log entry, needed REISUB to
> reboot
>
> So I'm back to using usbcore.autosuspend=-1.
> Please advise if I missed something (or incorrect dev id) I could test.

There have been a lot of firmware updates for the wd15, do you have these all applied?

(In reply to Hans de Goede from comment #35)
> There have been a lot of firmware updates for the wd15, do you have these
> all applied?

Good catch, I'm on 1.0.4 according to fwupdmgr. Latest is 1.0.6 on the Dell site. Unfortunately the wd15 appears not (yet) to be fully supported via fwupdmgr so Windows is the only option, sigh. I will try to update, test again and report. Bios is current by the way.

(In reply to Hans de Goede from comment #31)
> @Marcus Sundman, right you have already set the flag for your ethernet usb
> controller by adding the 0bda:8153:k part to the quirks. So it seems that at
> least for you setting the NO_LPM flag does not help.

I also tried without usbcore.autosuspend=-1 but that also didn't help.

> Does your dock have updateable firmware? If so you may want to try to update
> the firmware.
It's a LogiLink UA0173A, and it doesn't seem to have any firmware available (only newer drivers, which I already tried).

Just for the record, I was able to reproduce this issue even on NanoPi-M1 (Allwinner H3) with Linux 5.4.32 attached to Belkin USB-C Express Dock 3.1 HD F4U093 (did this for convenience, just to quickly get working keyboard and mouse without reattaching keyboard and mouse cables from dock to board). The quirk has been included in 5.4 since 5.4.28, so it was already applied. Unfortunately, I didn't expect this issue to be reproducible with the NanoPi-M1 board, so I didn't save lsusb -t before/after this happened.

(In reply to Michiel Janssens from comment #36)

I ran several updaters from Dell under Windows. My WD15 firmware components (4 of them) were already current, apparently the main version is some sort of wrapper. So no updates to Bios or WD15 firmware are possible. At the moment I run kernel 5.6.8, so I ran all tests again:
- with or without usbcore.quirks=0424:5537:k,0424:2137:k the nic dies after a while
- with usbcore.autosuspend=-1 the nic remains alive

Still the same problem with Realtek's new driver, r8152.53.56-2.13.0, on ubuntu's 5.4.0-37-generic with usbcore.autosuspend=-1. It fails with 'Tx status -71' or 'Rx status -71':

> net_ratelimit: 22 callbacks suppressed
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -71
> ...

But sometimes that quickly turns into this:

> xhci_hcd 0000:03:00.0: WARN: TRB error for slot 3 ep 3 on endpoint
> r8152 3-2.4:1.0 enx00e04d6aeb98: Tx status -84
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> xhci_hcd 0000:03:00.0: WARN waiting for error on ep to be cleared
> r8152 3-2.4:1.0 enx00e04d6aeb98: failed tx_urb -22
> ...
I've also tried adjusting the nic's Rx ring size from 100 to 20 or 2000, but still the same crash seconds after starting a gigabit speed download. GNU/Gentoo, 64bit here, kernel 5.7.2, same problem on Lenovo 40AS USB-C dock: None of the suggested "workarounds" helped, here is the lsusb tree: /: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M |__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M |__ Port 1: Dev 4, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M |__ Port 3: Dev 3, If 0, Class=Hub, Driver=hub/4p, 10000M And the patch applied to the kernel(the ids differ): cat /etc/portage/patches/sys-kernel/gentoo-sources-5.7.2/lenovo-usbc-dock-rtl-ethernet-quirk.patch --- a/drivers/usb/core/quirks.c 2020-06-01 01:49:15.000000000 +0200 +++ b/drivers/usb/core/quirks.c 2020-06-15 12:01:39.028377907 +0200 @@ -384,6 +384,11 @@ /* Generic RTL8153 based ethernet adapters */ { USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM }, + /* Lenovo USB-C Ethernet RTL8153 based ethernet adapters */ + { USB_DEVICE(0x1d6b, 0x0003), .driver_info = USB_QUIRK_NO_LPM }, + { USB_DEVICE(0x17ef, 0xa391), .driver_info = USB_QUIRK_NO_LPM }, + { USB_DEVICE(0x17ef, 0xa387), .driver_info = USB_QUIRK_NO_LPM }, + /* Action Semiconductor flash disk */ { USB_DEVICE(0x10d6, 0x2200), .driver_info = USB_QUIRK_STRING_FETCH_255 }, and booting with: usbcore.quirks=17ef:a387:k,17ef:a391:k,1d6b:0003:k or usbcore.autosuspend=-1 does not help. Same problem happens on laptops connected to this lenovo docks running windows OSes. Hi, I'm glad I found this (old) bug. It affects me as well and still is up to date. I'm running Kubuntu 20.04 with Mainline Kernel 5.7.9 and tried 5.8.0-rc7. My brand new Thinkpad T14 (AMD Version, 32GB RAM) is connected to a Thinkpad USB-C dock Gen 2. Laptop and Dock run latest firmware. **Testcase** is copying a large video file (~ 2GB) from Laptop to NAS. 
**Error** lots of "r8152 5-1.1:1.0 enx482ae36d721f: Tx status -71" in the dmesg log after a few seconds. Connection lost. I need to either use WiFi or reconnect the dock.

To find the culprit:
Copying via the WiFi connection or the laptop's LAN port (r8169) works.
Using another dock (DELL docking station) works.
Connecting a DELL Windows laptop to the Lenovo dock works.
The hardware must be OK then! So I suppose the cause lies in the r8152 driver.
As I can reproduce this "on demand", I could provide more information/logs if you tell me what's needed and how to do it ;)
BR Peter

Update to my previous comment...
Kubuntu's Network Manager obviously set up the USB Network Adapter with "Link Negotiation: ignore" for whatever reason.
I changed it to "Auto" and now it is more stable - even solid. I just pumped a 20G VM disk image to the NAS and no error occurred. I'm optimistic that this setting solved my problem.
Time will tell ...

(In reply to Peter Ries from comment #43)
> update to my previous comment...
>
> Kubuntu's Network Manager obviously set up the USB Network Adapter with
> "Link Negotiation: ignore" for whatever reason.
>
> I changed it to "Auto" and now it is more stable - even solid. I just pumped
> a 20G VM Diskimage to NAS and no error occurred. I'm optimistic that this
> setting solved my problem.
>
> Time will tell ...
Update: no problems anymore. Stable for 1.5 days of constant usage :)

The same problem occurs with an Anker Ethernet Adapter combined with a USB hub (https://www.anker.com/products/variant/aluminum-3port-usb-30-and-ethernet-hub/A7514041) on the 4.15.0-38-generic kernel in KDE neon (based on Ubuntu 18.04). dmesg outputs "r8152 4-3.3:1.0 eth0: Tx status -71" multiple times.
lsusb -t output
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 3: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
lsusb output
Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp.
Bus 004 Device 002: ID 2109:0812 VIA Labs, Inc. VL812 Hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
None of the workarounds suggested above that involve changing boot parameters helped (usbcore.autosuspend=-1; usbcore.quirks=2109:0812:k,0bda:8153:k). Actually, with usbcore.quirks I couldn't even boot, as the process hung at "Switching to clocksource tsc". The adapter works flawlessly with the 4.14.x kernel. Interestingly, the adapter works fine when certain devices (e.g. a wireless mouse's receiver) are connected to the USB hub.

Hi! I have the same error, but only on a USB 3.0 port. My hub doesn't support LPM, so I think I have a different problem (usbcore.quirks=0bda:8153:k). I've tried usbcore.quirks=0bda:8153:j and I've got no -71 error. HTH Weber Kai

Sorry fellows, please ignore my previous comment. After 40 minutes the network stopped.

Hi fellows, I think read_bulk_callback should handle the EPROTO error code... But I don't know exactly how... By adding EPROTO next to ESHUTDOWN, the driver becomes more stable... Thanks HTH

Hi, before my crash I have many
[ 4478.945334] perf: interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 4602.649511] CPU2: Core temperature above threshold, cpu clock throttled (total events
[ 4626.376508] CPU5: Core temperature/speed normal
[ 4681.081753] perf: interrupt took too long (3202 > 3131), lowering kernel.perf_event_max_sample_rate to 62000
[ 4895.218338] perf: interrupt took too long (4065 > 4002), lowering kernel.perf_event_max_sample_rate to 49000
Can they be part of the problem? I have had three crashes, all during Zoom meetings, which means high traffic with video going both ways.
Pekka

I have the same issue with three
Bus 002 Device 004: ID 13b1:0041 Linksys Gigabit Ethernet Adapter
Bus 002 Device 003: ID 13b1:0041 Linksys Gigabit Ethernet Adapter
Bus 002 Device 002: ID 13b1:0041 Linksys Gigabit Ethernet Adapter
devices on platform:
Linux debian 4.19.155-redundant #1 SMP PREEMPT Mon Nov 9 01:54:50 CET 2020 x86_64 GNU/Linux
debian
    description: Mini PC
    product: NUC7CJYS
    vendor: Intel(R) Client Systems
    version: J67993-403
    serial: G6JY936009MK
    width: 64 bits
    capabilities: smbios-3.1.1 dmi-3.1.1 smp vsyscall32
    configuration: boot=normal chassis=mini family=JY uuid=8A72A51B-79C4-85CA-66A9-1C697A088052
  *-core
       description: Motherboard
       product: NUC7JYB
       vendor: Intel Corporation
       physical id: 0
       version: J67970-402
       serial: GEJY93500752
       slot: Default string
     *-firmware
          description: BIOS
          vendor: Intel Corp.
          physical id: 0
          version: JYGLKCPX.86A.0057.2020.1020.1637
Latest firmware. It looks to be a USB enumeration issue here. When rebooting the device, the problem always shows up after a while; when physically turning the unit off and on, the issue no longer appears and it can run stable forever. I basically made a habit of shutting down entirely when I need to reboot.

Dell Inc. XPS 15 9570/0HWTMH, BIOS 1.17.1 07/09/2020
Kernel: 5.8.0-7630-generic
Chipset: RTL8153b-2 (version 9)
After a full day of debugging the r8152 driver on a USB-C dock, it does look like the RTL 8152 chipset is unable to keep pace with the transmission queue (tx_queue). The driver keeps sending URB blocks towards the chipset, but at some point the chipset will no longer fire status interrupts. This stalls the write_bulk_callback, which in turn sets off the netdev timeout watchdog. The timeout then tries to reset the USB device, but the RTL chipset is no longer responding. Power cycling the USB port is the only way to reset the chip. The behavior is quite deterministic.
When sending bulk packets over the 1000TX interface (Gigabit, that is), it doesn't take long before the watchdog reports queue congestion. If I simulate a lockup by deliberately slowing down the tasklet, the timeout and TX status -2/-71 are triggered almost immediately. The netdev timeout resetting the USB device seems a bit silly. There is no lock held when the USB device is reset, and the timeout handler is invoked every (5 * 1000Hz). A better solution would be to power cycle the USB port. Because this bug has been away for a while and now reappears, it could be a firmware issue. The most recent firmware was updated a little more than a year ago. Y.

All my issues with the USB adapters have disappeared after I used these kernel parameters:
usbcore.old_scheme_first=1 usbcore.use_both_schemes=0 usbcore.autosuspend=-1 pci_aspm=off
Not sure which of them, or which combination, actually does it. I have been running 24/7 with them for a week now.

Strangely, this issue reappeared some days ago. It was gone after I set link negotiation in Network Manager to "auto" (Kubuntu set it to ignore for some reason) and worked flawlessly for around three months. A week ago, while sending a 4 GB file to my NAS, the error reappeared, but it was gone after I set manual negotiation in Network Manager to 1 Gbit/full duplex. It then worked again for some days, but yesterday, with a lot of traffic in my LAN, a network sync immediately "killed" the USB-C dock network interface. I didn't change anything besides regular apt updates and installing the latest mainline kernel from the 5.9.x branch - don't know what happened. Maybe I'll give autosuspend a try... Yorick de Wid's analysis https://bugzilla.kernel.org/show_bug.cgi?id=198931#c51 seems quite promising as the root cause.

usbcore.autosuspend=-1 didn't help with my Lenovo USB-C dock gen 2...

Peter Ries, can you specify which firmware version is being loaded? The driver writes the version to dmesg on load. For example: RTL8153b-2 (version 9). It's hard to track exact firmware versions.
If it happens to be a firmware issue we might be able to downgrade. Haven't tried an older version as yet. SHA1 checksum on 5.8.0-7630-generic (which should be the latest): 3008299c2fee3f5a5e3b2d8e16919d230204542c rtl8105e-1.fw c6fcde458093a4ef60b534feacc9dd564098ff9b rtl8106e-1.fw 221d833a22040e4014bf34b31481712180b77594 rtl8106e-2.fw 9d390948663bf6885d86586588c428186d5dff7e rtl8107e-1.fw a065c863146d8216d8cc84a6b754968613848b32 rtl8107e-2.fw 5da573149e80587668e1d4bcbcbced184e51ac03 rtl8125a-3.fw 21c7c428112bd9e24713192302513a95ba41ed5e rtl8125b-1.fw a588787b9ebeec9cbfdbd46612a63f53ad5b1d62 rtl8125b-2.fw 3d87c04720c4b4709e4673707c4c104e28be1c1b rtl8153a-2.fw e467098b1cbb04022805cd777eb66585022524a6 rtl8153a-3.fw cce086e885091c348bf521924f306f240f8dcc08 rtl8153a-4.fw 2b268656c6cd7d03dc47ca8eaec2f31ca668c53c rtl8153b-2.fw 9872f469227555937d4063b1420a0ff23790da59 rtl8168d-1.fw 0ab15a6c812fafc38dd896972fa5fcb46cca1068 rtl8168d-2.fw 61fdc2ba78caf36a6551554f089d1c964159d247 rtl8168e-1.fw 60e16292fd4eb90138a3e2061305030b4993de79 rtl8168e-2.fw 6c3721e8e5d19f62b3da13519e2496ffabb3ffb4 rtl8168e-3.fw c7c01066ddfc0215ad8977af5a3cd654b6f7ed10 rtl8168f-1.fw 24bf10a38bcb1b4652f71653e41d1a444f303c3f rtl8168f-2.fw bf3495d9233f3abaceab194e732bdb9c350a68a5 rtl8168fp-3.fw 12ed6246c8c4d6344d4840acc11de30d7c3ff1ec rtl8168g-1.fw 0ead82c11625a677600a589cc4590722ea2f6de7 rtl8168g-2.fw 36e09340d99f9290fd9cc62c48c11b8112558b8a rtl8168g-3.fw c012f50c24ef64dcc48615d584709f8094df6af7 rtl8168h-1.fw 439686559c1fa53820c5e740b71566ee874171ba rtl8168h-2.fw c4fda34e80b4124377a7554636327fd03e697ee2 rtl8402-1.fw 38a89b6f1b57795a470675184733aff335cb41ae rtl8411-1.fw 57c4e659337aacc88a52e9940bc0246a0f02e47b rtl8411-2.fw Hi Yorick, I found this line: [ 16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully Is it what you're looking for? Otherwise let me know how to find out. Thanks for investigating! and this one, too: [ 16.883982] r8152 5-1.1:1.0 eth0: v1.11.11 > I found this line:
>
> [ 16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully
That's the one. The v9 chipset is also the version I'm running. It's too early to conclude, but it is consistent with the presumption that the rtl8153b-2 firmware contains a bug.
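For anyone who wants to report the same detail, here is a minimal Python sketch that pulls the firmware name, version and date out of kernel log text. It only assumes the "load ... successfully" message format quoted in this thread; the names `FW_LOAD_RE` and `parse_r8152_firmware` are made up for illustration and are not part of the driver or any tool.

```python
import re

# Matches lines such as (format taken from the log lines quoted in this thread):
#   [   16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully
FW_LOAD_RE = re.compile(
    r"r8152 (?P<usb_path>\S+): load (?P<firmware>\S+) (?P<version>\S+) "
    r"(?P<date>\S+) successfully"
)


def parse_r8152_firmware(log_text):
    """Return one dict per 'load ... successfully' line found in log_text."""
    return [m.groupdict() for m in FW_LOAD_RE.finditer(log_text)]


# Sample input copied from the comment above; the second line must not match.
SAMPLE = (
    "[ 16.848492] r8152 5-1.1:1.0: load rtl8153b-2 v1 10/23/19 successfully\n"
    "[ 16.883982] r8152 5-1.1:1.0 eth0: v1.11.11\n"
)

for entry in parse_r8152_firmware(SAMPLE):
    print(entry["usb_path"], entry["firmware"], entry["version"], entry["date"])
```

To use it on a real system, pass the text of `dmesg` or `journalctl -k` to `parse_r8152_firmware` instead of `SAMPLE`.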
Hi, I had this crash with my Dell XPS 9350 and a Dell USB 3.0 Ethernet adapter for the first time yesterday: # lsusb | grep -i RTL8153 Bus 002 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter # journalctl -b -1 | grep "rtl8153" Dec 15 15:36:40 patrickxps kernel: r8152 2-2:1.0: load rtl8153a-3 v2 02/07/20 successfully # uname -a Linux patrickxps 5.9.14-zen1-1-zen #1 ZEN SMP PREEMPT Sat, 12 Dec 2020 14:36:44 +0000 x86_64 GNU/Linux Dec 15 18:18:50 patrickxps kernel: ------------[ cut here ]------------ Dec 15 18:18:50 patrickxps kernel: NETDEV WATCHDOG: enp0s20f0u2 (r8152): transmit queue 0 timed out Dec 15 18:18:50 patrickxps kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x26b/0x280 Dec 15 18:18:50 patrickxps kernel: Modules linked in: rfcomm ccm cmac snd_hda_codec_hdmi algif_hash algif_skcipher af_alg bnep snd_hda_codec_realtek snd_hda_codec_generic cdc_ether usb> Dec 15 18:18:50 patrickxps kernel: intel_uncore psmouse soundcore rc_core processor_thermal_device i2c_i801 input_leds pcspkr tpm_tis i2c_smbus mei_me rfkill intel_gtt intel_rapl_comm> Dec 15 18:18:50 patrickxps kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: P IOE 5.9.14-zen1-1-zen #1 Dec 15 18:18:50 patrickxps kernel: Hardware name: Dell Inc. 
XPS 13 9350/0PWNCR, BIOS 1.13.0 02/10/2020 Dec 15 18:18:50 patrickxps kernel: RIP: 0010:dev_watchdog+0x26b/0x280 Dec 15 18:18:50 patrickxps kernel: Code: fa 1e 64 ff eb 87 4c 89 f7 c6 05 de c3 f7 00 01 e8 8a 02 fa ff 44 89 e9 4c 89 f6 48 c7 c7 b8 c2 df 86 48 89 c2 e8 6e 31 18 00 <0f> 0b e9 65 ff > Dec 15 18:18:50 patrickxps kernel: RSP: 0018:ffffa234c019ceb0 EFLAGS: 00010282 Dec 15 18:18:50 patrickxps kernel: RAX: 0000000000000000 RBX: ffff89334d358400 RCX: 0000000000000000 Dec 15 18:18:50 patrickxps kernel: RDX: 0000000000000103 RSI: 0000000000000027 RDI: 00000000ffffffff Dec 15 18:18:50 patrickxps kernel: RBP: ffff8933422da3dc R08: 0000000000000452 R09: 0000000000000004 Dec 15 18:18:50 patrickxps kernel: R10: 0000000000000001 R11: 0000000000007434 R12: ffff8933422da480 Dec 15 18:18:50 patrickxps kernel: R13: 0000000000000000 R14: ffff8933422da000 R15: ffff89334d358480 Dec 15 18:18:50 patrickxps kernel: FS: 0000000000000000(0000) GS:ffff893376d80000(0000) knlGS:0000000000000000 Dec 15 18:18:50 patrickxps kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 15 18:18:50 patrickxps kernel: CR2: 00007f7d5ef1b4c0 CR3: 0000000211fcc004 CR4: 00000000003706e0 Dec 15 18:18:50 patrickxps kernel: Call Trace: Dec 15 18:18:50 patrickxps kernel: <IRQ> Dec 15 18:18:50 patrickxps kernel: ? qdisc_put_unlocked+0x30/0x30 Dec 15 18:18:50 patrickxps kernel: ? 
qdisc_put_unlocked+0x30/0x30 Dec 15 18:18:50 patrickxps kernel: call_timer_fn+0x2d/0x150 Dec 15 18:18:50 patrickxps kernel: run_timer_softirq+0x8e7/0xb50 Dec 15 18:18:50 patrickxps kernel: __do_softirq+0xff/0x340 Dec 15 18:18:50 patrickxps kernel: asm_call_irq_on_stack+0x12/0x20 Dec 15 18:18:50 patrickxps kernel: </IRQ> Dec 15 18:18:50 patrickxps kernel: do_softirq_own_stack+0x5d/0x80 Dec 15 18:18:50 patrickxps kernel: irq_exit_rcu+0xd2/0x120 Dec 15 18:18:50 patrickxps kernel: sysvec_apic_timer_interrupt+0x47/0xe0 Dec 15 18:18:50 patrickxps kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20 Dec 15 18:18:50 patrickxps kernel: RIP: 0010:cpuidle_enter_state+0xdf/0x7f0 Dec 15 18:18:50 patrickxps kernel: Code: e8 16 87 7f ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 8c 05 00 00 31 ff e8 48 e3 86 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 > Dec 15 18:18:50 patrickxps kernel: RSP: 0018:ffffa234c00dfea0 EFLAGS: 00000246 Dec 15 18:18:50 patrickxps kernel: RAX: ffff893376d80000 RBX: ffff893376db6800 RCX: 000000000000001f Dec 15 18:18:50 patrickxps kernel: RDX: 0000000000000000 RSI: ffffffff86d4f968 RDI: ffffffff86d5a1f9 Dec 15 18:18:50 patrickxps kernel: RBP: ffffffff872cef60 R08: 000008dd96ed4562 R09: 0000000000000008 Dec 15 18:18:50 patrickxps kernel: R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000002 Dec 15 18:18:50 patrickxps kernel: R13: ffffffff872cf048 R14: 0000000000000002 R15: 000008dd96ed4562 Dec 15 18:18:50 patrickxps kernel: cpuidle_enter+0x29/0x40 Dec 15 18:18:50 patrickxps kernel: do_idle+0x1ed/0x280 Dec 15 18:18:50 patrickxps kernel: cpu_startup_entry+0x19/0x20 Dec 15 18:18:50 patrickxps kernel: secondary_startup_64+0xb6/0xc0 Dec 15 18:18:50 patrickxps kernel: ---[ end trace 32ac432b0caddcb1 ]--- Dec 15 18:18:50 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout Dec 15 18:18:56 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout Dec 15 18:19:02 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout Dec 15 18:19:07 
patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout Dec 15 18:19:12 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout Dec 15 18:19:19 patrickxps kernel: r8152 2-2:1.0 enp0s20f0u2: Tx timeout FWIW, I used to have another way more frequent crash with r8152 and another model of adapter before, see https://bugzilla.kernel.org/show_bug.cgi?id=200977#c19 Patrick Decat whats the checksum of rtl8153a-3.fw? sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw (In reply to Yorick de Wid from comment #60) > Patrick Decat whats the checksum of rtl8153a-3.fw? > > sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw Here you go: sha1sum /usr/lib/firmware/rtl_nic/rtl8153a-3.fw e467098b1cbb04022805cd777eb66585022524a6 /usr/lib/firmware/rtl_nic/rtl8153a-3.fw Hi all, glad to see I'm not the only one. I also have the Lenovo USB-C Dock gen 2, and am connecting to it from a Surface Pro 7 (running Arch with the surface-linux kernel 5.10.16), and am similarly loading the "rtl8153b-2 v1 10/23/19" firmware. An interesting thing that I learned recently is that my ethernet works perfectly if I boot without an external screen plugged into the dock. As soon as I plug one in, the "Tx status -71" messages return and the ethernet dies. I don't know if that helps at all, maybe someone else has found this too? I can also confirm the ethernet and external display work fine together running Windows. Just as an FYI -- I upgraded to the latest realtek firmware (in Debian's sid/unstable distribution, version 20210208-1 <https://packages.debian.org/sid/firmware-realtek>) -- this fixed the problem for me. sha1sum is cce086e885091c348bf521924f306f240f8dcc08 , in /usr/lib/firmware/rtl_nic/rtl8153a-4.fw I spoke too soon, the problem persists! (In reply to Danny O'Brien from comment #63) > Just as an FYI -- I upgraded to the latest realtek firmware (in Debian's > sid/unstable distribution, version 20210208-1 > <https://packages.debian.org/sid/firmware-realtek>) -- this fixed the > problem for me. 
sha1sum is cce086e885091c348bf521924f306f240f8dcc08 , in
> /usr/lib/firmware/rtl_nic/rtl8153a-4.fw
The firmware hasn't been updated for over a year; see the official kernel repo. Because the lockup is likely caused by a data race in the firmware, it's to be expected that a higher interrupt count (additional peripherals) will trigger the issue sooner.

Just to give a little update, Realtek is currently testing against hardware with known issues.

I am suffering from this problem too, using a tp-link UE330, which has the Realtek 8153 and is a USB A 3.0 device with a hub and gigabit ethernet. I have had no issues with the hub (it even keeps working after the ethernet stops working), but the gigabit ethernet just stops working completely after some time. Sometimes it can go for a couple of days; one time it stopped working after 10 minutes. I am running the Linux kernel 5.4.103 and modinfo says it has v1.10.11 of the r8152 driver. I am going to update to the v2.14.0 that is on the Realtek webpage (https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-usb-3-0-software), but if, as some of you say here, the issue is a firmware issue, I am not that hopeful. How can I check which firmware my device is loading? From reading this bug page I have seen there are different versions of the firmware at /usr/lib/firmware/rtl_nic . In my case for rtl8153x-x.fw there are a-2, a-3, a-4 and b-2.

@Yorick de Wid, good to know Realtek is aware of the problem. What can we realistically expect from them? The latest driver release, which I am guessing also includes the firmware, is from 19/10/2020, and this chipset has been out for years. You'd assume they have had time to iron everything out.

dmesg should log the firmware version requested by the driver. Hayes pushed a few patches upstream last month. These changes include power flow regulation of the driver and URB speeds.
Those are driver-level changes, and preliminary tests show they are working on recent kernels. I've been running iperf for a few days and nothing has broken down, nor did I see any timing issues. I can't speak to all problems here, but problems with some combinations of hardware and this chipset may be resolved by these patches. I'd expect Ubuntu will backport these drivers to LTS as well.

Hi Yorick de Wid, can you refer me to those 'few patches' from last month? I just want to verify whether the kernel I run has them (5.11.7 here) and perhaps give this 'hardware' its last chance. I have been trying to make it work for too long now, without any success. Thanks.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=7a0ae61acde2cebd69665837170405eced86a6c7
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=80fd850b31f09263ad175b2f640d5c5c6f76ed41
Build the master netdev subsystem; that should work.

Thanks, both patches are not yet in the latest stable kernel, 5.11.7, so I may backport them and give it a try. Thanks for sharing!

@Yorick de Wid In my ignorance of how the kernel and drivers work, I can see those two commits are for the r8152 driver only; they do not touch any other part of the code. Could we just add those two commits to the r8152 driver and compile it as a module instead of recompiling the whole netdev subsystem? (If so, it would also be nice if Realtek published it on their webpage, even if only as a beta driver.)

I have updated the r8152 driver to the v2.14.0 version from the Realtek website. The ethernet still randomly stops working. I will try to see if I can apply the patches to the v2.14.0 driver. If it compiles without errors I will test running with the patched driver. Is there a better way of doing this? Maybe there are some more changes in the kernel branch of r8152 and I should get the code of the driver from there? Any advice and/or instructions would be welcome.
I really do not understand how Realtek releases a product like this. I have this gigabit ethernet device connected to a router with only fast ethernet (100Mb/s), so it is running at 10% of its rated speed at most, and it is still hanging daily. This is not some race condition that happens when some specific or weird heavy load goes through the device; this is happening with normal, very low load traffic. How was this not caught during development or testing? Did they even try the device they are selling? Now I understand why everywhere you read, people badmouth Realtek products and recommend getting Intel NICs.

Rarely ever does anyone compile the kernel from a subsystem. Just compile the r8152 module like any other KBuild module. There are many variants of the RTL815x chipset, some made by Realtek, some not. The chipset family is employed in a large variety of products, and minor PCB design flaws like trace length can cause timing issues which are notoriously hard to diagnose. Even though this *does* feel like a firmware/driver issue, it's impossible to account for all the ways this microcontroller is being used.

@Yorick de Wid I managed to compile and install the driver from the Realtek webpage, but I have no idea how to download only the r8152 part of the kernel. Could you give a link or instructions so that those who want to give it a try, like me, can? Thanks.

I have finally managed to compile the kernel of my system with the missing upstream r8152 patches added. The device, a tp-link UE330, is still hanging. I have compiled 5.11.8. This already included this recommended patch https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=7a0ae61acde2cebd69665837170405eced86a6c7 , and all the patches made to r8152 in 2020 as far as I can see.
The only r8152 patches that are missing are the two from 2021, again as far as I can see, these two: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a08c0d309d8c078d22717d815cf9853f6f2c07bd, https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=80fd850b31f09263ad175b2f640d5c5c6f76ed41 (the last one is the other one recommended at #69). So with 5.11.8 compiled with the two r8152 patches from 2021, the device is still hanging. No progress.

I have a load of USB3 Ethernet dongles using this driver, reported by lsusb as follows:
Bus 002 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
I experience this problem (connection drops and dmesg shows repeated "Tx status -71" messages) on various systems: a Dell laptop with Ubuntu 18.04 and 20.04, Gentoo, and a Raspberry Pi. On all systems the problem either only happens when plugged into USB3 or is far worse when plugged into USB3. Obviously USB2 does not allow the full bandwidth to be achieved, so it is not a good solution. Running iperf will generally cause the problem within a very short time frame.
I am using 3 of them with a Raspberry Pi 4, have been having this problem, and started to investigate. I noticed it was failing to load the firmware file (rtl8153a-4.fw) and, I guess, falling back on a default internal firmware. The fw file turns out to be missing from the Raspbian linux-firmware package. Having manually copied it from Ubuntu 20.04, it now loads successfully for 2 of the connected adapters but fails on the 3rd:
pi@router:~ $ lsusb | grep 8153
Bus 002 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 001 Device 005: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 001 Device 007: ID 0bda:8153 Realtek Semiconductor Corp.
RTL8153 Gigabit Ethernet Adapter pi@router:~ $ dmesg | grep r8152 [ 2.104683] usbcore: registered new interface driver r8152 [ 3.683627] r8152 2-1.3:1.0: Direct firmware load for rtl_nic/rtl8153a-4.fw failed with error -2 [ 3.683675] r8152 2-1.3:1.0: unable to load firmware patch rtl_nic/rtl8153a-4.fw (-2) [ 3.724435] r8152 2-1.3:1.0 eth0: v1.11.11 [ 4.574454] r8152 1-1.4:1.0: load rtl8153a-4 v2 02/07/20 successfully [ 4.614895] r8152 1-1.4:1.0 eth1: v1.11.11 [ 4.903594] r8152 1-1.3.4:1.0: load rtl8153a-4 v2 02/07/20 successfully [ 4.945192] r8152 1-1.3.4:1.0 eth2: v1.11.11 [ 8.767705] r8152 1-1.3.4:1.0 wan: renamed from eth2 [ 8.806489] r8152 1-1.4:1.0 voip: renamed from eth1 [ 8.865195] r8152 2-1.3:1.0 house: renamed from eth0 [ 16.461782] r8152 2-1.3:1.0 house: carrier on [ 16.545744] r8152 2-1.3:1.0 house: Promiscuous mode enabled [ 17.079827] r8152 1-1.3.4:1.0 wan: carrier on [ 28.923646] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled [ 4664.551927] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled [ 4664.583404] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled [ 4664.660467] r8152 1-1.3.4:1.0 wan: Promiscuous mode enabled Any ideas why it fails to load for one of the three above? Seems like fw load fails when plugged into a USB3 port, but succeeds when plugged into a USB2 one! Since I'm also suffering from these issues (Dell WD15 + Thinkpad T14 AMD), I'm wondering if someone managed to unfreeze the machine in the situation described by
Michiel Janssens
> Similar issues here with a Dell dock WD15, connected from Dell XPS 13 9360.
> Since several months the wired connection from the dock 0bda:8153 dies, but
> the network stack isn't notified. A reboot after this waits endlessly on
> services to stop. Sometimes Gnome gui locks up shortly after logging back
> in the system and being presented with the issue. I have to do REISUB to
> get the system working again.
I mean, once the connection is lost, the system will start hanging on all operations that are somehow related to networking, so that even sudo won't work (su is possible, though). Restarting/killing NetworkManager or dhclient doesn't succeed either. So essentially you have your graphical session (Gnome 3 in my case) where the mouse cursor still moves but nothing else reacts. Trying to "save" the system from a virtual console ultimately fails b/c everything network-related hangs. At the end of the day REISUB is the only thing you can do, meaning that you will lose all your unsaved work.
Perhaps someone has found a trick to avoid REISUB in such a situation? It's really annoying, since you cannot reliably predict when such a freeze will happen.
So now that I have been using the kernel with the latest patches for a month, I want to report some experiences that might be useful for users and maybe even help the developers finally fix this issue. Again, my device is the TP-Link UE330.

Right now, the device seems to hang only after a reboot. Every time I reboot, the device hangs several times in a very short time (once up to 11 times in 5 minutes), until it stops hanging and can then work for weeks (I think 3 weeks is the longest I tested before I had to reboot again).

For users: if you install ifupdown2 you can recover the device without rebooting, disconnecting it, or even disrupting the network. Once you have ifupdown2 installed in your Linux system and the device is down (you can check with the command 'ip addr'), use these two commands to recover the device:
- ifup enxxxxx [enxxxxx is the name of the device in your system]
- ifreload -a [-a is probably not needed but it is very quick and does not disrupt the network so I did not look further]
This will get your device working again.

I created a script that does this and tried to have it executed every time the device goes down. For some reason, when the device goes down because of the bug, the system does not run the script. The script gets triggered when I bring the device down manually or when rebooting, but not when it goes down because of the bug. So I have resorted to running the script every 30 seconds: it checks whether the device is up or down and, if it is down, brings it up again with those two commands. It is not pretty, but it at least makes my system workable until a proper solution is released by Realtek. If anyone is interested in the script I can share it.

I forgot one thing: the above procedure to recover the device has only been tested with the two 2021 kernel patches. I have not tested it without them, so I do not know whether it would work without the patches.
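The commenter's script was not posted, so the following is only a rough Python sketch of the polling workaround described above, not the original. It assumes that the hang shows up as "down" in /sys/class/net/<iface>/operstate (the comment only mentions checking with 'ip addr'), and the function names `link_state`, `recovery_commands` and `watch` are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate string for iface, or None if unreadable."""
    try:
        return (Path(sysfs_root) / iface / "operstate").read_text().strip()
    except OSError:
        return None


def recovery_commands(iface):
    """The two ifupdown2 commands from the comment above, as argument lists."""
    return [["ifup", iface], ["ifreload", "-a"]]


def watch(iface, interval=30, run=subprocess.run, sleep=time.sleep,
          rounds=None, sysfs_root="/sys/class/net"):
    """Poll iface every `interval` seconds; run the recovery commands when it is down.

    Assumption: a hung adapter is reported as "down" in operstate.
    `rounds=None` means loop forever; a number limits the polls (handy for testing).
    """
    done = 0
    while rounds is None or done < rounds:
        if link_state(iface, sysfs_root) == "down":
            for cmd in recovery_commands(iface):
                run(cmd, check=False)  # keep polling even if a command fails
        sleep(interval)
        done += 1
```

Calling `watch("enxxxxx")` as root (with the real interface name) would then poll every 30 seconds, as in the description. Whether this recovers an adapter without the two 2021 patches is untested, as noted above.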
Could you please specify which patches you mean? I'm currently on 5.11.16-100.fc32.x86_64, so I guess that they are not included yet. Another question: what does ifupdown2 do that "simpler" tools cannot do when it comes to recovering a frozen machine? Once I even tried rmmod on the r8152 module, but even that command froze.

Hi Vladyslav Shtabovenko, have you tried running with usbcore.autosuspend=-1? I still have this in my boot parameters, running kernel 5.11.15. Since adding this parameter I haven't had any lockups or network freezes.

@Vladyslav Shtabovenko I specified the patches I included when compiling the kernel in comment #74. I needed to install ifupdown2 for other reasons on my system, so I have only tried with ifupdown2. But ifupdown2 claims to be able to bring interfaces up and down and reload them with less disruption to the network than previous software, so I suspect it is needed, but I cannot be sure.

Many thanks. In my case the total freezes do not occur very often (perhaps once in 3-5 weeks), so I haven't tried any radical measures yet. The point is that the T14 AMD already has many issues with power management, and I'm sort of reluctant to worsen the battery life even further by disabling USB autosuspend. I'm currently testing TLP with the option USB_BLACKLIST="0bda:8153 0bda:4014" in /etc/tlp.conf. Not sure if it helps or not, but I haven't had any "fatal" hangs so far. Perhaps it even solves the issue for me. Should there be another freeze as described in my first posting, I will comment on that here.
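To see whether a per-device setting such as the TLP blacklist above actually took effect, one can read the device's power/control file in sysfs ("on" means autosuspend is disabled for that device, "auto" means the kernel may suspend it). The Python sketch below is only an illustration under that assumption; the function name `usb_power_control` is made up, and 0bda:8153 is simply the ID quoted throughout this thread.

```python
from pathlib import Path


def usb_power_control(vendor_id, product_id, sysfs_root="/sys/bus/usb/devices"):
    """Map each matching USB device directory name to its power/control value."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        try:
            vid = (dev / "idVendor").read_text().strip()
            pid = (dev / "idProduct").read_text().strip()
        except OSError:
            continue  # interface nodes such as 2-1:1.0 carry no ID files
        if (vid, pid) == (vendor_id, product_id):
            try:
                result[dev.name] = (dev / "power" / "control").read_text().strip()
            except OSError:
                result[dev.name] = None  # matched, but the setting is unreadable
    return result


if __name__ == "__main__":
    # IDs are lowercase hex strings, as shown by lsusb.
    print(usb_power_control("0bda", "8153"))
```

This only reports the current setting; it does not change anything and does not show whether autosuspend is the cause of a given freeze.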
The same here: dmesg Jun 21 17:15:34 raspiwall kernel: [25199.657912] r8152 2-1:1.0 wan0: skb_to_sgvec fail -90 Jun 21 17:15:43 raspiwall kernel: [25208.468109] ------------[ cut here ]------------ Jun 21 17:15:43 raspiwall kernel: [25208.468150] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:468 dev_watchdog+0x308/0x30c Jun 21 17:15:43 raspiwall kernel: [25208.468168] NETDEV WATCHDOG: wan0 (r8152): transmit queue 0 timed out Jun 21 17:15:43 raspiwall kernel: [25208.468182] Modules linked in: bnep hci_uart btbcm bluetooth ecdh_generic ecc cdc_ether r8152(O) nft_chain_nat xt_MASQUERADE xt_nat nf_nat nf_log_ipv4 nf_log_common nft_limit nft_counter ipt_REJECT nf_reject_ipv4 xt_multiport xt_tcpudp xt_LOG xt_limit xt_recent xt_addrtype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink hid_logitech_hidpp joydev sr_mod cdrom sg vc4 hid_logitech_dj cec brcmfmac v3d gpu_sched brcmutil drm_kms_helper sha256_generic drm cfg80211 drm_panel_orientation_quirks rfkill snd_soc_core raspberrypi_hwmon bcm2835_codec(C) v4l2_mem2mem snd_compress bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) snd_bcm2835(C) videobuf2_dma_contig snd_pcm_dmaengine videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc snd_pcm vc_sm_cma(C) snd_timer snd syscopyarea sysfillrect rpivid_mem sysimgblt fb_sys_fops backlight uio_pdrv_genirq nvmem_rmem uio ip_tables x_tables ipv6 Jun 21 17:15:43 raspiwall kernel: [25208.469358] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O 5.10.42-v7l-lutein79+ #1 Jun 21 17:15:43 raspiwall kernel: [25208.469368] Hardware name: BCM2711 Jun 21 17:15:43 raspiwall kernel: [25208.469378] Backtrace: Jun 21 17:15:43 raspiwall kernel: [25208.469413] [<c0b6fae8>] (dump_backtrace) from [<c0b6fe78>] (show_stack+0x20/0x24) Jun 21 17:15:43 raspiwall kernel: [25208.469429] r7:ffffffff r6:00000000 r5:60000113 r4:c12e6b3c Jun 21 17:15:43 raspiwall kernel: [25208.469448] [<c0b6fe58>] (show_stack) from [<c0b74210>] 
(dump_stack+0xcc/0xf8) Jun 21 17:15:43 raspiwall kernel: [25208.469469] [<c0b74144>] (dump_stack) from [<c0220bcc>] (__warn+0xfc/0x114) Jun 21 17:15:43 raspiwall kernel: [25208.469485] r10:c133dfb8 r9:00000009 r8:c0a52894 r7:000001d4 r6:00000009 r5:c0a52894 Jun 21 17:15:43 raspiwall kernel: [25208.469496] r4:c0eab8ac r3:c1205094 Jun 21 17:15:43 raspiwall kernel: [25208.469513] [<c0220ad0>] (__warn) from [<c0b7061c>] (warn_slowpath_fmt+0xa4/0xd8) Jun 21 17:15:43 raspiwall kernel: [25208.469526] r7:000001d4 r6:c0eab8ac r5:c1205048 r4:c0eab870 Jun 21 17:15:43 raspiwall kernel: [25208.469545] [<c0b7057c>] (warn_slowpath_fmt) from [<c0a52894>] (dev_watchdog+0x308/0x30c) Jun 21 17:15:43 raspiwall kernel: [25208.469560] r9:eff0b540 r8:c371b000 r7:c1203d00 r6:c354c200 r5:c371b2a8 r4:00000000 Jun 21 17:15:43 raspiwall kernel: [25208.469580] [<c0a5258c>] (dev_watchdog) from [<c02ab178>] (call_timer_fn+0x40/0x1bc) Jun 21 17:15:43 raspiwall kernel: [25208.469594] r8:c1201d9c r7:002601b8 r6:c0a5258c r5:00000100 r4:c371b2a8 Jun 21 17:15:43 raspiwall kernel: [25208.469612] [<c02ab138>] (call_timer_fn) from [<c02ac798>] (run_timer_softirq+0x5b0/0x698) Jun 21 17:15:43 raspiwall kernel: [25208.469625] r8:c1201d9c r7:00000000 r6:c371b2a8 r5:002601b8 r4:00000000 Jun 21 17:15:43 raspiwall kernel: [25208.469643] [<c02ac1e8>] (run_timer_softirq) from [<c0201508>] (__do_softirq+0x198/0x49c) Jun 21 17:15:43 raspiwall kernel: [25208.469658] r10:00000082 r9:ffffe000 r8:c1810800 r7:00000100 r6:00000001 r5:00000002 Jun 21 17:15:43 raspiwall kernel: [25208.469667] r4:c1203084 Jun 21 17:15:43 raspiwall kernel: [25208.469685] [<c0201370>] (__do_softirq) from [<c0227494>] (irq_exit+0xd0/0xf8) Jun 21 17:15:43 raspiwall kernel: [25208.469699] r10:c0e21a80 r9:c1200000 r8:c1810800 r7:00000001 r6:00000000 r5:00000000 Jun 21 17:15:43 raspiwall kernel: [25208.469709] r4:ffffe000 Jun 21 17:15:43 raspiwall kernel: [25208.469729] [<c02273c4>] (irq_exit) from [<c0287990>] (__handle_domain_irq+0x70/0xc4) Jun 
21 17:15:43 raspiwall kernel: [25208.469739] r5:00000000 r4:c1094d50 Jun 21 17:15:43 raspiwall kernel: [25208.469756] [<c0287920>] (__handle_domain_irq) from [<c020135c>] (gic_handle_irq+0x90/0xa4) Jun 21 17:15:43 raspiwall kernel: [25208.469770] r9:c1200000 r8:c1094d5c r7:c1201ec8 r6:f081400c r5:f0814000 r4:c1205b7c Jun 21 17:15:43 raspiwall kernel: [25208.469785] [<c02012cc>] (gic_handle_irq) from [<c0200abc>] (__irq_svc+0x5c/0x7c) Jun 21 17:15:43 raspiwall kernel: [25208.469796] Exception stack(0xc1201ec8 to 0xc1201f10) Jun 21 17:15:43 raspiwall kernel: [25208.469810] 1ec0: 00000000 054cad98 eff13304 c021ac20 ffffe000 c120509c Jun 21 17:15:43 raspiwall kernel: [25208.469824] 1ee0: c12050e4 00000001 00000001 c133d12f c0e21a80 c1201f24 c1201f28 c1201f18 Jun 21 17:15:43 raspiwall kernel: [25208.469835] 1f00: c02088c0 c02088c4 60000013 ffffffff Jun 21 17:15:43 raspiwall kernel: [25208.469849] r9:c1200000 r8:00000001 r7:c1201efc r6:ffffffff r5:60000013 r4:c02088c4 Jun 21 17:15:43 raspiwall kernel: [25208.469871] [<c020887c>] (arch_cpu_idle) from [<c0b7fa38>] (default_idle_call+0x4c/0x118) Jun 21 17:15:43 raspiwall kernel: [25208.469890] [<c0b7f9ec>] (default_idle_call) from [<c02587ac>] (do_idle+0x118/0x168) Jun 21 17:15:43 raspiwall kernel: [25208.469907] [<c0258694>] (do_idle) from [<c0258ad0>] (cpu_startup_entry+0x28/0x30) Jun 21 17:15:43 raspiwall kernel: [25208.469921] r10:00000197 r9:c1053a60 r8:ffffffff r7:c1053a60 r6:c1205040 r5:c1205048 Jun 21 17:15:43 raspiwall kernel: [25208.469932] r4:000000d9 r3:c108a294 Jun 21 17:15:43 raspiwall kernel: [25208.469947] [<c0258aa8>] (cpu_startup_entry) from [<c0b78a10>] (rest_init+0xbc/0xc4) Jun 21 17:15:43 raspiwall kernel: [25208.469968] [<c0b78954>] (rest_init) from [<c1000ab4>] (arch_call_rest_init+0x18/0x1c) Jun 21 17:15:43 raspiwall kernel: [25208.469978] r5:c1205048 r4:c1356068 Jun 21 17:15:43 raspiwall kernel: [25208.469997] [<c1000a9c>] (arch_call_rest_init) from [<c1001098>] (start_kernel+0x568/0x59c) Jun 21 
17:15:43 raspiwall kernel: [25208.470014] [<c1000b30>] (start_kernel) from [<00000000>] (0x0) Jun 21 17:15:43 raspiwall kernel: [25208.470026] ---[ end trace bc6a810ce98742d4 ]--- Jun 21 17:15:43 raspiwall kernel: [25208.470048] r8152 2-1:1.0 wan0: Tx timeout Jun 21 17:15:43 raspiwall kernel: [25208.978475] r8152 2-1:1.0 wan0: get_registers -110 Jun 21 17:15:44 raspiwall kernel: [25209.488528] r8152 2-1:1.0 wan0: set_registers -110 Jun 21 17:15:44 raspiwall kernel: [25209.998469] r8152 2-1:1.0 wan0: get_registers -110 Jun 21 17:15:44 raspiwall kernel: [25209.998750] r8152 2-1:1.0 wan0: get_registers -71 Jun 21 17:15:44 raspiwall kernel: [25209.999059] r8152 2-1:1.0 wan0: set_registers -71 Jun 21 17:15:44 raspiwall kernel: [25209.999549] r8152 2-1:1.0 wan0: Tx status -2 Jun 21 17:15:44 raspiwall kernel: [25209.999835] r8152 2-1:1.0 wan0: Tx status -2 Jun 21 17:15:44 raspiwall kernel: [25210.000218] r8152 2-1:1.0 wan0: Tx status -2 Jun 21 17:15:44 raspiwall kernel: [25210.000528] r8152 2-1:1.0 wan0: Tx status -2 Jun 21 17:15:44 raspiwall kernel: [25210.000832] r8152 2-1:1.0 wan0: get_registers -71

Solved with the command ethtool -K wan0 tx off

I-TEC USB Adapter Linux raspiwall.debianium.com 5.10.42-v7l-lutein79+ #1 SMP Sun Jun 13 16:17:03 CEST 2021 armv7l GNU/Linux tomas@raspiwall:~ $ lsusb -t /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M |__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M |__ Port 3: Dev 3, If 0, Class=Mass Storage, Driver=uas, 480M |__ Port 4: Dev 4, If 1, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 4: Dev 4, If 2, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 4: Dev 4, If 0, Class=Human Interface Device, Driver=usbhid, 12M modinfo r8152 filename: /lib/modules/5.10.42-v7l-lutein79+/kernel/drivers/net/usb/r8152.ko version: v2.15.0 (2021/04/15) license: GPL
description: Realtek RTL8152/RTL8153 Based USB Ethernet Adapters author: Realtek nic sw <nic_swsd@realtek.com> srcversion: 643C9AE76696D1629871EA2

Hi! There is a new version at https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-usb-3-0-software . It didn't work on my USB 3.0 ports with kernel 4.19.128. Thanks.

The reason for the failure is explained here: https://www.spinics.net/lists/linux-usb/msg173690.html Here is the first message of the thread: https://www.spinics.net/lists/linux-usb/msg173675.html Thanks.

New patch sent... https://www.spinics.net/lists/linux-usb/msg217590.html

FWIW, I see the same symptoms on an r8152 10/100 adapter which doesn't seem to require any firmware: [ +0.152909] usb 1-2: New USB device found, idVendor=0bda, idProduct=8152, bcdDevice=20.00 [ +0.008284] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ +0.007236] usb 1-2: Product: USB 10/100 LAN [ +0.004998] usb 1-2: Manufacturer: Realtek [ +0.004195] usb 1-2: SerialNumber: 00116B686258 [ +0.181834] r8152 1-2:1.0: skip request firmware /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 480M ID 1d6b:0002 Linux Foundation 2.0 root hub |__ Port 2: Dev 11, If 0, Class=Vendor Specific Class, Driver=r8152, 480M ID 0bda:8152 Realtek Semiconductor Corp. RTL8152 Fast Ethernet Adapter filename: /lib/modules/5.10.0-9-amd64/kernel/drivers/net/usb/r8152.ko version: v1.11.11 vermagic: 5.10.0-9-amd64 SMP mod_unload modversions I can see various repeating "r8152 1-2:1.0 enx00116b686258: Rx status -71" messages in dmesg. They occur roughly once every few minutes but they seem benign, in that they don't seem to affect connectivity (they may have been affecting throughput, but not enough to disconnect video calls I've been making during the same time).
However, today I also saw this happen (I'll attach the full backtrace): [ +0.004720] NETDEV WATCHDOG: enx00116b686258 (r8152): transmit queue 0 timed out [ +0.007515] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x24d/0x260 ...which resulted in numerous: [ +0.004711] r8152 1-2:1.0 enx00116b686258: Tx timeout [ +2.095629] r8152 1-2:1.0 enx00116b686258: Tx status -2 ...and eventually (duplicate messages removed): [ +0.239953] usb 1-2: reset high-speed USB device number 3 using xhci_hcd [ +1.919956] usb 1-2: device descriptor read/64, error -110 [ +2.815976] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command [ +0.215896] usb 1-2: device not accepting address 3, error -62 [ +0.006011] r8152 1-2:1.0 enx00116b686258: Get ether addr fail ...after which I needed to unplug the device to get it working again. A port power cycle from software might have been enough, but it's hard to look up the correct incantation for that when your outside interface is failing.

Created attachment 299567 [details]
WARNING: at net/sched/sch_generic.c:467 dev_watchdog+0x24d/0x260
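The numeric codes in the r8152 messages quoted in this thread (for example "Tx status -71" or "get_registers -110") are negated Linux errno values. Below is a small sketch for decoding them from log lines. The table is hard-coded from the generic Linux errno numbering and covers only the codes that appear in this report; the function name decode_r8152_line and the regular expression are assumptions made for illustration, not part of the driver.

```python
import re
from typing import Optional, Tuple

# Generic Linux errno numbers for the codes seen in this report.
LINUX_ERRNO = {
    2: "ENOENT",
    62: "ETIME",
    71: "EPROTO",
    90: "EMSGSIZE",
    110: "ETIMEDOUT",
}

# Matches e.g. "r8152 2-1:1.0 wan0: Tx status -71"
# or "r8152 2-1:1.0 wan0: get_registers -110".
_PATTERN = re.compile(
    r"r8152 \S+ \S+: "
    r"(?P<what>Tx status|Rx status|get_registers|set_registers|skb_to_sgvec fail)"
    r" -(?P<code>\d+)"
)


def decode_r8152_line(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (message kind, errno number, errno name) or None if no match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return match.group("what"), code, LINUX_ERRNO.get(code, "UNKNOWN")
```

As far as I read the kernel's USB error-code documentation, EPROTO as a transfer status points at a low-level bus problem (bit-stuffing error or no response in time), while ENOENT means the URB was unlinked, which fits the "Tx status -2" bursts that follow each "Tx timeout".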
Using 5.14.15 (including https://www.spinics.net/lists/linux-usb/msg217590.html) I still got status -71. Afterwards, the network connection goes down briefly, but recovers quickly.

Adding the following quirk resolved the issue for my Lenovo Powered USB-C Travel Hub connected to a ThinkBook 14s: "usbcore.quirks=17ef:721e:k". I submitted a related patch upstream: https://lore.kernel.org/netdev/20211127090546.52072-1-olebowle@gmx.com/

Hi, I have a TP-LINK U330 (USB 3 hub with RJ45 Ethernet) with the RTL8153 chipset attached to a generic laptop. I am seeing this behavior: status -71 messages (possibly others too), then disconnects and maybe freezes. Specifying

options usbcore quirks=0bda:8153:k,0bda:5411:k,0bda:0411:k
options usbcore autosuspend=-1

helps considerably. I'm not sure if it fixes the problem completely. There was no crash, and the status -71 messages were far fewer, possibly one or two.

lspci: 00:15.0 USB controller: Intel Corporation Celeron/Pentium Silver Processor USB 3.0 xHCI Controller (rev 03)

Attaching the USB dongle to my desktop works perfectly for several hours without a problem, so I believe this is not a problem with the dongle or the driver. The laptop has a Realtek wireless/Bluetooth card for which I also had to specify options rtw88_pci disable_aspm=1 for it to work without many problems. I tend to believe the issue is with the USB controller or the UEFI/BIOS/ACPI tables. Unfortunately I cannot locate an update for that specific instance of this AMI BIOS. How can I debug it further? Is it possible to fix it by modifying the ACPI tables? Any pointers?

At this point I just recommend disabling USB autosuspend/USB LPM, ASPM etc. It works fine on Debian Bullseye with a custom-compiled kernel 5.10.90 (ASPM disabled in the BIOS, and USB autosuspend set to -1), but the kernel parameter should work as well. And in the case of my Intel NUC system, I cannot do a reboot because the USB/Realtek combo goes haywire. A cold boot (full shutdown) is always required.
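The quirk strings quoted in the comments above follow the usbcore.quirks format: comma-separated VendorID:ProductID:Flags entries, where the k flag selects the "no LPM" quirk in the kernel's parameter documentation. Here is a sketch that assembles such a string and the matching modprobe.d line. The helper names build_usbcore_quirks and modprobe_line are hypothetical, and the IDs in the usage note are simply the ones reported in this thread, not a recommendation for other hardware.

```python
import re
from typing import Iterable, Tuple

_HEX4 = re.compile(r"^[0-9a-fA-F]{4}$")


def build_usbcore_quirks(entries: Iterable[Tuple[str, str, str]]) -> str:
    """Join (vendor, product, flags) triples into a usbcore.quirks value."""
    parts = []
    for vendor, product, flags in entries:
        if not (_HEX4.match(vendor) and _HEX4.match(product)):
            raise ValueError(f"expected 4 hex digits, got {vendor}:{product}")
        # Flags are one or more lowercase letters, e.g. "k" or "bjkm".
        if not (flags.isalpha() and flags.islower()):
            raise ValueError(f"unexpected flags: {flags!r}")
        parts.append(f"{vendor.lower()}:{product.lower()}:{flags}")
    return ",".join(parts)


def modprobe_line(quirks: str) -> str:
    """Format the value as a line for a file under /etc/modprobe.d/."""
    return f"options usbcore quirks={quirks}"
```

Calling build_usbcore_quirks([("0bda", "8153", "k"), ("0bda", "5411", "k"), ("0bda", "0411", "k")]) reproduces the value used above; the same string can be passed on the kernel command line as usbcore.quirks=... instead.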
Thanks @Emtee for comment #93. I was thinking about ACPI tables and I overlooked the basics, the BIOS setup. Not my brightest moment. Anyway, it turns out that the AMI BIOS (UEFI) in my laptop has a setting that reads something like this: "Enable/Disable blah-blah ASPM. If Enabled Vista will handle the ASPM for the device(s?). If Disable BIOS itself will handle the ASPM." So I disabled it and so far it looks like it is working. I will test for some days and let you know. Right now I have only this setting in the BIOS. I have commented out all module parameter workarounds:

#options usbcore quirks=0bda:8153:k,0bda:5411:k,0bda:0411:k
#options usbcore autosuspend=-1

I have the Debian unstable kernel 5.15.03 and I believe the quirk 0bda:8153:k is already compiled in. If this works I will also disable the rtw88 ASPM quirk.

I spoke too soon. After some days (without heavy testing) it started acting up with status -71 during a TC meeting. This time (or maybe I only noticed it now), after a status -71 there was a USB disconnect event, followed by the USB reconnect, firmware reload etc. It's a real shame not to be able to pinpoint this. I wonder if this bug happens with other BIOSes or only with AMI ones? I just want to stress that without setting the ASPM disable option in the BIOS the failures are almost instant. If anybody has an idea on how to debug this further or what logs to enable, please feel free to speak. Thanks

For what it's worth, I was having regular hangs with those USB controllers in the past, for many different Linux kernel releases. The network interface would just freeze and hang: traffic wouldn't go through, and even rebooting the box wouldn't work, as userspace would hang during the shutdown sequence. Since the Debian 11 (bullseye) upgrade, things improved slightly as I started seeing the error message described in this ticket (Tx status -71). But it would be similarly broken, and would occur after a variable number of hours idling.
I upgraded to the bookworm kernel recently (5.16) and this problem completely disappeared, so I consider that the patch in https://github.com/torvalds/linux/commit/baf33d7a7564 fixed this problem. So thanks everyone, it is really great to see this finally fixed.

For what it's worth I'm still seeing this on 5.18.0-0.rc7.220519.f993aed406ea.56.vanilla.1.fc36.x86_64

Please correct me if I misunderstand something, but from the following two links it looks like a USB controller issue rather than a network adapter driver issue: https://armbian.atlassian.net/browse/AR-1172 https://github.com/armbian/build/pull/3763/commits/d52b67ffeeebc89f49159accb953f1ecf9352e74

I'm seeing this issue (though I'm getting a different Tx status) on a Raspberry Pi 4B. Some dmesg output: [Oct27 13:13] r8152 2-2:1.0 eth1: Tx timeout [ +0.003510] r8152 2-2:1.0 eth1: Tx status -2 [ +0.000297] r8152 2-2:1.0 eth1: Tx status -2 [ +2.180790] usb 2-2: reset SuperSpeed USB device number 4 using xhci_hcd $ lsusb Bus 002 Device 004: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter $ uname -a Linux raspberrypi 5.15.61-v8+ #1579 SMP PREEMPT Fri Aug 26 11:16:44 BST 2022 aarch64 GNU/Linux However, this problem is quite weird since I have two dongles (both are exactly the same). On my desktop computer, I have yet to experience any issues with the dongle. I can push it to the limit for hours (using iperf) and nothing bad happens: no resets or any other weird behavior. This computer runs Linux 6.0.x, though. OTOH, the dongle connected to the Raspberry Pi can't go past 100Mbit/s without _resetting_ itself after some minutes (sometimes even seconds). I tried adding the quirk option to the boot parameters, to no avail. This makes me think: maybe there's an architectural difference in the driver? Perhaps it's the USB controller screwing things up? I'm 99.99% sure it's not the dongle being faulty/buggy, because I can swap the dongles and I don't see this issue on my desktop computer.
I'm having this issue as well despite being on the latest kernel. I'm unable to recognize any pattern in the random hangs that throw "Tx status -71". It can stream YT 4k60 video smoothly but then randomly crashes when trying to ssh. Literally unusable; thank god Amazon offers refunds. Adapter: TP-Link UE330 Kernel 6.2.9-300.fc38.x86_64 Fedora 38 [ 5.953304] r8152 3-2.4:1.0: load rtl8153a-4 v2 02/07/20 successfully

I believe I found a way to handle this problem: https://bbs.archlinux.org/viewtopic.php?pid=2125855#p2125855 r8152-dkms uses an updated Realtek driver: https://github.com/wget/realtek-r8152-linux/ The usbcore quirks are absolutely essential! I am not sure whether only a subset of them is necessary; the ones I chose seem to fix the problem completely: bjkm

I have been experiencing the same problem for some time too, currently with the Ubuntu build kernel 5.15.0-87-generic. The USB device seems to reconnect on its own, but it is then in a non-functional state and traffic does not pass. [2667308.625276] ------------[ cut here ]------------ [2667308.625317] NETDEV WATCHDOG: enx1c1adff98bf8 (r8152): transmit queue 0 timed out [2667308.625406] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x277/0x280 [2667308.625431] Modules linked in: cpuid tls bluetooth ecdh_generic ecc vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nft_counter xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi ledtrig_audio snd_hda_intel kvm_amd ccp snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm radeon snd_hda_core snd_hwdep crct10dif_pclmul ghash_clmulni_intel snd_pcm aesni_intel joydev input_leds drm_ttm_helper ttm drm_kms_helper cec rc_core snd_timer snd i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore crypto_simd fam15h_power cryptd ppdev k10temp mac_hid serio_raw parport_pc wmi_bmof sch_fq_codel lp parport nf_nat_pptp nf_conntrack_pptp nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ramoops reed_solomon pstore_blk pstore_zone efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 [2667308.626027] multipath linear raid1 hid_microsoft ff_memless hid_generic cdc_ether pata_acpi usbnet usbhid r8169 psmouse hid r8152 crc32_pclmul ahci mii i2c_piix4 pata_atiixp libahci realtek wmi [2667308.626166] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G OE 5.15.0-84-generic #93-Ubuntu [2667308.626178] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.6 10/26/2012 [2667308.626186] RIP: 0010:dev_watchdog+0x277/0x280 [2667308.626200] Code: eb 97 48 8b 5d d0 c6 05 6b e2 67 01 01 48 89 df e8 2e 5f f9 ff 44 89 e1 48 89 de 48 c7 c7 78 ee ad 8d 48 89 c2 e8 91 d6 19 00 <0f> 0b eb 80 e9 db 68 23 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 [2667308.626210] RSP: 0018:ffffac3940234e70 EFLAGS: 00010282 [2667308.626225] RAX: 0000000000000000 RBX: ffff9cbdc0a8c000 RCX: 0000000000000000 [2667308.626234] RDX: ffff9cbed5d2cb40 RSI: ffff9cbed5d20580 RDI: 0000000000000300 [2667308.626242] RBP: ffffac3940234ea8 R08: 0000000000000003 R09: fffffffffffd22d8 [2667308.626250] R10: 0000000074756f20 R11: 0000000074756f20 R12: 0000000000000000 [2667308.626258] R13: ffff9cbdd036ce80 R14: 0000000000000001 R15: ffff9cbdc0a8c4c0 [2667308.626268] FS: 0000000000000000(0000) GS:ffff9cbed5d00000(0000) knlGS:0000000000000000 [2667308.626278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [2667308.626287] CR2: 000056130da71686 CR3: 000000010689c000 CR4: 00000000000406e0 [2667308.626296] Call Trace: [2667308.626303] <IRQ> [2667308.626313] ? show_trace_log_lvl+0x1d6/0x2ea [2667308.626328] ? show_trace_log_lvl+0x1d6/0x2ea [2667308.626342] ? call_timer_fn+0x2c/0x120 [2667308.626359] ? show_regs.part.0+0x23/0x29 [2667308.626372] ? show_regs.cold+0x8/0xd [2667308.626384] ? dev_watchdog+0x277/0x280 [2667308.626396] ? 
__warn+0x8c/0x100 [2667308.626407] ? dev_watchdog+0x277/0x280 [2667308.626420] ? report_bug+0xa4/0xd0 [2667308.626431] ? arch_irq_work_raise+0x3a/0x50 [2667308.626444] ? handle_bug+0x39/0x90 [2667308.626457] ? exc_invalid_op+0x19/0x70 [2667308.626469] ? asm_exc_invalid_op+0x1b/0x20 [2667308.626482] ? dev_watchdog+0x277/0x280 [2667308.626493] ? pfifo_fast_enqueue+0x160/0x160 [2667308.626505] call_timer_fn+0x2c/0x120 [2667308.626518] __run_timers.part.0+0x1e3/0x270 [2667308.626529] ? ktime_get+0x46/0xc0 [2667308.626544] ? native_x2apic_icr_read+0x20/0x20 [2667308.626556] ? lapic_next_event+0x20/0x30 [2667308.626568] ? clockevents_program_event+0xad/0x130 [2667308.626583] run_timer_softirq+0x2a/0x60 [2667308.626595] __do_softirq+0xd9/0x2e7 [2667308.626607] irq_exit_rcu+0x94/0xc0 [2667308.626620] sysvec_apic_timer_interrupt+0x80/0x90 [2667308.626633] </IRQ> [2667308.626639] <TASK> [2667308.626646] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [2667308.626657] RIP: 0010:cpuidle_enter_state+0xd9/0x620 [2667308.626671] Code: 3d 54 7a 18 73 e8 67 77 67 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 a8 84 67 ff 80 7d d0 00 0f 85 61 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6d 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e7 03 00 00 [2667308.626680] RSP: 0018:ffffac39400c3e28 EFLAGS: 00000246 [2667308.626694] RAX: ffff9cbed5d31480 RBX: ffff9cbdc09c9000 RCX: 0000000000000000 [2667308.626703] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 [2667308.626711] RBP: ffffac39400c3e78 R08: 000979e72f1d6d1e R09: 0000000000000000 [2667308.626719] R10: 0000000000000001 R11: 071c71c71c71c71c R12: ffffffff8e4e9120 [2667308.626726] R13: 0000000000000002 R14: 0000000000000002 R15: 000979e72f1d6d1e [2667308.626738] ? cpuidle_enter_state+0xc8/0x620 [2667308.626751] ? 
tick_nohz_stop_tick+0x16a/0x1d0 [2667308.626764] cpuidle_enter+0x2e/0x50 [2667308.626776] cpuidle_idle_call+0x142/0x1e0 [2667308.626789] do_idle+0x83/0xf0 [2667308.626800] cpu_startup_entry+0x20/0x30 [2667308.626811] start_secondary+0x12a/0x180 [2667308.626823] secondary_startup_64_no_verify+0xc2/0xcb [2667308.626839] </TASK> [2667308.626846] ---[ end trace 095eb2337a23b35c ]--- [2667308.626858] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667311.361232] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2 [2667311.386436] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2 [2667311.410686] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2 [2667311.435339] r8152 1-4:1.0 enx1c1adff98bf8: Tx status -2 [2667314.513162] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667319.633077] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667325.520952] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667330.384862] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667335.504701] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667340.624647] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667346.512537] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667351.632450] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667357.520323] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667362.384221] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667367.504126] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667372.628027] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667378.511924] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667383.631810] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667389.519693] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667394.383594] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667399.503496] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667404.623394] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667410.511284] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667415.631198] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667421.519064] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667426.382970] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667431.502869] 
r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667436.622767] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667442.510659] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667447.630559] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667453.518451] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667458.382348] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667463.502260] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667468.622145] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667474.510027] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667479.629925] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667485.517810] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667490.381715] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667495.501611] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667500.621520] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667506.509360] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667511.629308] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667517.517193] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667522.381091] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667527.501007] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667532.624888] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667538.508784] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667543.628643] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667549.520557] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667554.380459] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667559.500359] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667564.620259] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667570.508151] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667575.628063] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667581.515931] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667586.379839] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667591.499741] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667596.619634] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667602.507519] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667607.627439] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout 
[2667613.515304] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667618.379207] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667623.499107] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667628.619006] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667634.506894] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667639.626799] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667645.514680] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667650.378579] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667655.498482] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667660.618379] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667666.506272] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667671.626169] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667677.514051] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667682.377960] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667687.497859] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667692.617753] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667698.505641] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667703.625502] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667709.513391] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667714.377330] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667719.497233] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667724.617130] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667730.505018] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667735.624917] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667741.512797] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667746.380705] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667751.496568] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667756.616504] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667762.504389] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667767.624291] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667773.512174] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667778.376076] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667783.495978] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667788.615883] r8152 1-4:1.0 enx1c1adff98bf8: Tx 
timeout [2667794.503763] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667799.623675] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667805.511548] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667810.375416] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667815.495353] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667820.615257] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667826.503148] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667831.623054] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667837.510922] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667842.374831] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667847.494732] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667852.614632] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667858.506517] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667861.214565] r8152-cfgselector 1-4: reset high-speed USB device number 3 using ehci-pci [2667863.622378] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667866.590335] r8152-cfgselector 1-4: device descriptor read/64, error -110 [2667869.510264] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667874.374205] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667879.494099] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667882.206079] r8152-cfgselector 1-4: device descriptor read/64, error -110 [2667882.442065] r8152-cfgselector 1-4: reset high-speed USB device number 3 using ehci-pci [2667884.613963] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667887.837962] r8152-cfgselector 1-4: device descriptor read/64, error -110 [2667890.501887] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667895.621790] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667901.509634] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667903.461589] r8152-cfgselector 1-4: device descriptor read/64, error -110 [2667903.697656] r8152-cfgselector 1-4: reset high-speed USB device number 3 using ehci-pci [2667906.373574] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667911.493475] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667914.545390] r8152-cfgselector 1-4: device 
not accepting address 3, error -110 [2667914.673433] r8152-cfgselector 1-4: reset high-speed USB device number 3 using ehci-pci [2667916.613326] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667922.501263] r8152 1-4:1.0 enx1c1adff98bf8: Tx timeout [2667925.297229] r8152-cfgselector 1-4: device not accepting address 3, error -110 [2667925.297410] r8152 1-4:1.0 enx1c1adff98bf8: Get ether addr fail [2667925.299958] r8152-cfgselector 1-4: USB disconnect, device number 3 [2667925.489232] usb 1-4: new high-speed USB device number 11 using ehci-pci [2667930.845121] usb 1-4: device descriptor read/64, error -110 [2667946.460774] usb 1-4: device descriptor read/64, error -110 [2667946.696818] usb 1-4: new high-speed USB device number 12 using ehci-pci [2667952.092711] usb 1-4: device descriptor read/64, error -110 [2667967.704405] usb 1-4: device descriptor read/64, error -110 [2667967.812426] usb usb1-port4: attempt power cycle [2667968.260383] usb 1-4: new high-speed USB device number 13 using ehci-pci [2667979.052183] usb 1-4: device not accepting address 13, error -110 [2667979.180183] usb 1-4: new high-speed USB device number 14 using ehci-pci [2667989.803973] usb 1-4: device not accepting address 14, error -110 [2667989.804112] usb usb1-port4: unable to enumerate USB device [2667990.187980] usb 4-1: new full-speed USB device number 2 using ohci-pci [2667990.379048] usb 4-1: not running at top speed; connect to a high speed hub [2667990.395043] usb 4-1: New USB device found, idVendor=045e, idProduct=0927, bcdDevice=31.00 [2667990.395055] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6 [2667990.395061] usb 4-1: Product: Ethernet Adapter [2667990.395065] usb 4-1: Manufacturer: Microsoft [2667990.395068] usb 4-1: SerialNumber: 001000905 [2667990.596023] r8152-cfgselector 4-1: reset full-speed USB device number 2 using ohci-pci [2667991.067052] r8152 4-1:1.0: load rtl8153b-2 v1 10/23/19 successfully [2667991.222391] r8152 4-1:1.0 eth2: v1.12.13 
[2667991.352538] r8152 4-1:1.0 enx1c1adff98bf8: renamed from eth2 [2667993.712957] IPv6: ADDRCONF(NETDEV_CHANGE): enx1c1adff98bf8: link becomes ready [2667993.720866] r8152 4-1:1.0 enx1c1adff98bf8: carrier on Bonus information: no dock, directly attached to USB on mainboard $ lsusb Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 001 Device 002: ID 045e:0927 Microsoft Corp. RTL8153B GigE [Surface Ethernet Adapter] Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub $ lsusb -t /: Bus 07.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/2p, 12M /: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M /: Bus 05.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M /: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M /: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M |__ Port 4: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 480M I was also facing the same issue which was seen by @Stian, turning off SG using ethtool helped to solve the issue. Upon checking further, the DWC3 xHC controller (which apparently i was using) has some limitations with its internal TRB Cache size for chained TRBs. 
And it was fixed using the following patch - https://lore.kernel.org/all/20201208092912.1773650-3-mathias.nyman@linux.intel.com/ Ultimately I had to enable XHCI_SG_TRB_CACHE_SIZE_QUIRK in XHCI :) I recently updated to Ubuntu kernel 6.2.0-36-generic [ 7827.109137] ------------[ cut here ]------------ [ 7827.109157] NETDEV WATCHDOG: enx1c1adff98bf8 (r8152): transmit queue 0 timed out [ 7827.109255] WARNING: CPU: 4 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x21f/0x230 [ 7827.109281] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) xt_conntrack nft_chain_nat xt_MASQUERADE xt_tcpudp nft_compat nf_tables nfnetlink amdgpu binfmt_misc iommu_v2 drm_buddy gpu_sched radeon drm_ttm_helper ttm drm_display_helper cec input_leds joydev edac_mce_amd rc_core kvm_amd ccp snd_hda_codec_realtek drm_kms_helper kvm i2c_algo_bit snd_hda_codec_generic syscopyarea sysfillrect sysimgblt video irqbypass crct10dif_pclmul snd_hda_codec_hdmi ledtrig_audio polyval_clmulni polyval_generic snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi ghash_clmulni_intel snd_hda_codec snd_hda_core sha512_ssse3 snd_hwdep aesni_intel snd_pcm snd_timer snd ppdev soundcore crypto_simd cryptd k10temp fam15h_power mac_hid serio_raw wmi_bmof parport_pc sch_fq_codel lp parport msr nf_nat_pptp nf_conntrack_pptp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq [ 7827.109916] libcrc32c raid0 multipath linear raid1 hid_microsoft ff_memless cdc_ether pata_acpi hid_generic usbnet usbhid r8152 psmouse mii hid crc32_pclmul i2c_piix4 pata_atiixp r8169 ahci libahci realtek wmi [ 7827.110050] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G OE 6.2.0-36-generic #37~22.04.1-Ubuntu [ 7827.110060] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/760GM-E51(MS-7596), BIOS V3.6 10/26/2012 [ 7827.110068] RIP: 0010:dev_watchdog+0x21f/0x230 [ 7827.110080] Code: 00 e9 31 ff ff ff 4c 89 e7 c6 
05 d9 5f 78 01 01 e8 e6 ff f7 ff 44 89 f1 4c 89 e6 48 c7 c7 b8 8c c4 85 48 89 c2 e8 81 c3 2b ff <0f> 0b e9 22 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 [ 7827.110089] RSP: 0018:ffffbf178023ce70 EFLAGS: 00010246 [ 7827.110103] RAX: 0000000000000000 RBX: ffffa0ec12ae34c8 RCX: 0000000000000000 [ 7827.110110] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 7827.110117] RBP: ffffbf178023ce98 R08: 0000000000000000 R09: 0000000000000000 [ 7827.110124] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0ec12ae3000 [ 7827.110131] R13: ffffa0ec12ae341c R14: 0000000000000000 R15: 0000000000000000 [ 7827.110138] FS: 0000000000000000(0000) GS:ffffa0ed15d00000(0000) knlGS:0000000000000000 [ 7827.110146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7827.110153] CR2: 00007ff492a92d30 CR3: 0000000102ff0000 CR4: 00000000000406e0 [ 7827.110162] Call Trace: [ 7827.110169] <IRQ> [ 7827.110177] ? show_regs+0x72/0x90 [ 7827.110189] ? dev_watchdog+0x21f/0x230 [ 7827.110198] ? __warn+0x8d/0x160 [ 7827.110211] ? dev_watchdog+0x21f/0x230 [ 7827.110222] ? report_bug+0x1bb/0x1d0 [ 7827.110233] ? irq_work_queue+0x32/0x80 [ 7827.110244] ? handle_bug+0x46/0x90 [ 7827.110256] ? exc_invalid_op+0x19/0x80 [ 7827.110266] ? asm_exc_invalid_op+0x1b/0x20 [ 7827.110280] ? dev_watchdog+0x21f/0x230 [ 7827.110290] ? __pfx_dev_watchdog+0x10/0x10 [ 7827.110299] call_timer_fn+0x2c/0x160 [ 7827.110312] ? __pfx_dev_watchdog+0x10/0x10 [ 7827.110322] __run_timers.part.0+0x1fb/0x2b0 [ 7827.110335] run_timer_softirq+0x2a/0x60 [ 7827.110348] __do_softirq+0xdd/0x330 [ 7827.110359] ? 
hrtimer_interrupt+0x12b/0x250 [ 7827.110373] __irq_exit_rcu+0xa2/0xd0 [ 7827.110383] irq_exit_rcu+0xe/0x20 [ 7827.110392] sysvec_apic_timer_interrupt+0x96/0xb0 [ 7827.110403] </IRQ> [ 7827.110409] <TASK> [ 7827.110415] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 7827.110427] RIP: 0010:cpuidle_enter_state+0xde/0x6f0 [ 7827.110440] Code: 4f 11 7b e8 94 1a 45 ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 92 f8 43 ff 80 7d d0 00 0f 85 e8 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 0f 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c4 04 00 00 [ 7827.110448] RSP: 0018:ffffbf17800cbe28 EFLAGS: 00000246 [ 7827.110460] RAX: 0000000000000000 RBX: ffffa0ec0243e000 RCX: 0000000000000000 [ 7827.110467] RDX: 0000000000000004 RSI: 0000000000000000 RDI: 0000000000000000 [ 7827.110472] RBP: ffffbf17800cbe78 R08: 0000000000000000 R09: 0000000000000000 [ 7827.110478] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff866d6140 [ 7827.110491] R13: 0000000000000002 R14: 0000000000000002 R15: 0000071e640eaa93 [ 7827.110503] ? cpuidle_enter_state+0xce/0x6f0 [ 7827.110516] ? tick_nohz_stop_tick+0x17a/0x210 [ 7827.110527] cpuidle_enter+0x2e/0x50 [ 7827.110539] cpuidle_idle_call+0x14f/0x1e0 [ 7827.110554] do_idle+0x82/0x110 [ 7827.110565] cpu_startup_entry+0x20/0x30 [ 7827.110576] start_secondary+0x138/0x170 [ 7827.110589] secondary_startup_64_no_verify+0xe5/0xeb [ 7827.110604] </TASK> [ 7827.110609] ---[ end trace 0000000000000000 ]--- It seems to both happen with EHCI and OHCI port, and the entire USB subsystem is broken until I do a reboot. $ lspci -nn|grep USB 00:12.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller [1002:4397] 00:12.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0 USB OHCI1 Controller [1002:4398] 00:12.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396] 00:13.0 USB controller [0c03]: Advanced Micro Devices, Inc. 
[AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller [1002:4397]
00:13.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0 USB OHCI1 Controller [1002:4398]
00:13.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396]
00:14.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller [1002:4399]

I will try to disable SG with ethtool, as suggested by @Prashanth, and see if the problem goes away. I am doing this by putting the following into /etc/network/interfaces:

auto enx1c1adff98bf8
iface enx1c1adff98bf8 inet static
    address 192.168.255.1
    netmask 255.255.255.0
    up sleep 5; ethtool -K enx1c1adff98bf8 sg off

15 days uptime so far with:

ethtool -K enx1c1adff98bf8 sg off

So it is a viable workaround.

Thanks for the confirmation. I have sent a fix (for DWC3 controllers) upstream, let's see if we can get it accepted - https://lore.kernel.org/all/20231212112521.3774610-1-quic_prashk@quicinc.com/

I am experiencing this issue on the latest Fedora, kernel 6.7.5. I have noticed the patch got merged. I was running it flawlessly, with no dropped connections, but after a few hours it crashed with Tx status -71. I have noticed that my hub is 2109:2813 (3 entries in lsusb) and 2109:0813 (also 3 entries in lsusb), which is different from the others here, yet I am experiencing an identical issue.

I have 2 USB R8153 Ethernet adapters; both use USB ID 0bda:8153. Both stop functioning within a minute, with the logs flooded with "Tx status -71". Using ethtool -K [iface] sg off had no effect on this; it still fails within minutes. Currently running ubuntu 23.11 with kernel 6.5.0-21-generic

I think I can reproduce the "Tx status -71" error using a fresh 6.8.0-rc7-1-mainline kernel (Arch Linux here) and two Samsung ViewFinity S80TB displays (integrated RTL8153 controller), which are connected via Thunderbolt to a Framework 13 AMD. "ethtool -K [iface] sg off" has no effect for me either.
I can run a network download speed test (i.e. https://www.speedtest.net/run ) without problems, but after the download is finished and the upload part starts, my displays seem to reconnect 1-2x and then they disconnect completely. dmesg is attached; the problem can be reproduced about 95% of the time. Sometimes I need a second speed test to crash it. No TLP is active here, and the usbcore.quirks didn't work for me. lsusb shows me this: "Bus 008 Device 013: ID 0bda:8153 Realtek Semiconductor Corp. RTL" and "Bus 008 Device 016: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter"

Created attachment 305959 [details]
dmesg of Linux 6.8.0-rc7
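For anyone comparing logs like the attachment above, here is a minimal sketch that counts "Tx status" errors per interface in saved dmesg text. It is not part of any tool mentioned in this report: the function name `count_tx_errors` and the regular expression are my own assumptions, based only on the message format quoted in this thread (for example "r8152 4-1.1.2:1.0 enx58ef68a8892b: Tx status -2"). The errno names are the standard Linux values, hard-coded so the result does not depend on the machine running the script.

```python
import re
from collections import Counter

# Linux errno values seen in this thread, hard-coded on purpose
# (other operating systems number these differently).
LINUX_ERRNO = {2: "ENOENT", 71: "EPROTO", 110: "ETIMEDOUT"}

# Assumed message format, taken from the logs quoted in this report:
#   r8152 <usb-path> <iface>: Tx status -<errno>
TX_STATUS = re.compile(
    r"r8152 \S+ (?P<iface>[^\s:]+): Tx status -(?P<code>\d+)"
)


def count_tx_errors(log_text):
    """Return a Counter keyed by (interface, errno name)."""
    counts = Counter()
    for match in TX_STATUS.finditer(log_text):
        code = int(match.group("code"))
        # Fall back to the raw number for codes not in the small table.
        name = LINUX_ERRNO.get(code, str(code))
        counts[(match.group("iface"), name)] += 1
    return counts
```

Called as `count_tx_errors(open("dmesg.txt").read())` on a saved log (the file name is only an example), it gives a quick way to see whether an interface is hitting -71 (EPROTO) or a different status such as -2. Lines like "Tx timeout" are deliberately not counted.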
(In reply to Ansgar Hegerfeld from comment #110)
> I can run a network download speed test (i.e. https://www.speedtest.net/run
> ) without problems, but after the download is finished and the upload part
> starts, my displays seem to reconnect 1-2x and then they disconnect
> completely.

Pretty much the same behavior over here. Before the patch this happened for me 100% of the time; now it happens at random. If it does crash during a speedtest, I have not managed to hit it, as I have tested only once since the patch. Will try again soon.

I've done some testing today, and I've come to some interesting findings. First: thanks to Ansgar Hegerfeld, speedtest upload is a very reliable trigger for the error. When connecting my separate dongle to the USB-A port, or through a converter to the USB-C ports on my laptop, the behavior is not triggered. When connecting the same dongle to my Thinkpad 40AY docking station, the behaviour can be reliably triggered. The difference, as far as I can tell, is the addition of the internal USB 3.1 hub(s) of the docking station: USB ID 17ef:30ab for the internal r8153, and 17ef:30ab and 17ef:30ad for the external one. My workaround for now is to connect the external USB Ethernet adapter to a USB 2 port of the hub and accept the lower speed.

output of 'lsusb -v -d 17ef:30ab': Bus 004 Device 002: ID 17ef:30ab Lenovo USB3.1 Hub Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 3.20 bDeviceClass 9 Hub bDeviceSubClass 0 bDeviceProtocol 3 bMaxPacketSize0 9 idVendor 0x17ef Lenovo idProduct 0x30ab bcdDevice 51.34 iManufacturer 1 VIA Labs, Inc.
iProduct 2 USB3.1 Hub iSerial 3 000000001 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 0x001f bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xe0 Self Powered Remote Wakeup MaxPower 0mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 9 Hub bInterfaceSubClass 0 bInterfaceProtocol 0 Full speed (or root) hub iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 19 Transfer Type Interrupt Synch Type None Usage Type Feedback wMaxPacketSize 0x0002 1x 2 bytes bInterval 8 bMaxBurst 0 Binary Object Store Descriptor: bLength 5 bDescriptorType 15 wTotalLength 0x0049 bNumDeviceCaps 5 USB 2.0 Extension Device Capability: bLength 7 bDescriptorType 16 bDevCapabilityType 2 bmAttributes 0x00000006 BESL Link Power Management (LPM) Supported SuperSpeed USB Device Capability: bLength 10 bDescriptorType 16 bDevCapabilityType 3 bmAttributes 0x00 wSpeedsSupported 0x000e Device can operate at Full Speed (12Mbps) Device can operate at High Speed (480Mbps) Device can operate at SuperSpeed (5Gbps) bFunctionalitySupport 1 Lowest fully-functional device speed is Full Speed (12Mbps) bU1DevExitLat 4 micro seconds bU2DevExitLat 231 micro seconds Container ID Device Capability: bLength 20 bDescriptorType 16 bDevCapabilityType 4 bReserved 0 ContainerID {5e048157-f6af-4075-b308-2b3de1bdadf5} SuperSpeedPlus USB Device Capability: bLength 28 bDescriptorType 16 bDevCapabilityType 10 bmAttributes 0x00000023 Sublink Speed Attribute count 4 Sublink Speed ID count 2 wFunctionalitySupport 0x1100 Min functional Speed Attribute ID: 0 Min functional RX lanes: 1 Min functional TX lanes: 1 bmSublinkSpeedAttr[0] 0x00050030 Speed Attribute ID: 0 5Gb/s Symmetric RX SuperSpeed bmSublinkSpeedAttr[1] 0x000500b0 Speed Attribute ID: 0 5Gb/s Symmetric TX SuperSpeed bmSublinkSpeedAttr[2] 0x000a4031 Speed Attribute ID: 1 10Gb/s Symmetric 
RX SuperSpeedPlus bmSublinkSpeedAttr[3] 0x000a40b1 Speed Attribute ID: 1 10Gb/s Symmetric TX SuperSpeedPlus ** UNRECOGNIZED: 03 10 0b Hub Descriptor: bLength 12 bDescriptorType 42 nNbrPorts 4 wHubCharacteristic 0x0009 Per-port power switching Per-port overcurrent protection bPwrOn2PwrGood 175 * 2 milli seconds bHubContrCurrent 0 milli Ampere bHubDecLat 0.4 micro seconds wHubDelay 2292 nano seconds DeviceRemovable 0x0a Hub Port Status: Port 1: 0000.0203 5Gbps power U0 enable connect Ext Status: 0000.0000 RX Speed Attribute ID: 0 Lanes: 1 TX Speed Attribute ID: 0 Lanes: 1 Port 2: 0000.02a0 5Gbps power Rx.Detect Port 3: 0000.0263 5Gbps power suspend enable connect Ext Status: 0000.0011 RX Speed Attribute ID: 1 Lanes: 1 TX Speed Attribute ID: 1 Lanes: 1 Port 4: 0000.02a0 5Gbps power Rx.Detect Device Status: 0x0001 Self Powered

I have two USB r8153 from two different brands. Only one of them exhibits this problem. I can easily reproduce it with iperf3 in bidirectional mode (--bidir). Other modes were not reliably triggering the issue.

I would like to inform you that this bug is also reproduced on an OWC USB-C TRAVEL DOCK E product, related information:

```
OS distribution: Ubuntu 24.04
CPU architecture: AMD64
System: Framework Laptop 13(AMD 7040 series)
OS kernel: Ubuntu kernel 6.8.0-39-generic
Ethernet interface identifier: idVendor=0bda, idProduct=8153, bcdDevice=31.00
Ethernet interface firmware: rtl8153b-2 v2 04/27/23
Ethernet interface negotiated speed: 100MBit/s
Dock connections:
* Type-C to the host
* PD power input
* HDMI(to a 1080p external display)
* Ethernet: 100MBit/s
```

I thought the hub was overheated until this bug report was found.

Created attachment 306712 [details]
Kernel log of the 6.8.0-39-generic Ubuntu kernel
Please disregard messages related to the rtl8156 chip, as they come from another Ethernet interface that doesn't suffer from this issue (the Framework Ethernet Expansion Card).
For my Thinkpad 40AY the issue seems to be resolved with kernel 6.8.0-45 (Ubuntu 24.04.1). Both the internal Ethernet of the dock and my external Ethernet adapter now run without issues.

Same here! I cannot reproduce this bug anymore with my speedtest.net check either. I hadn't tested it for 6 months, and I'm now running 6.11.2-arch1-1, which has worked as expected for multiple days. I could not trigger the failure even once using speedtest or iperf. I disabled all USB quirks, and it doesn't matter which power-saving setting I'm using (power-profiles-daemon). Hopefully this behavior will last longer than some of the other workarounds we saw here. Thanks to everybody who invested time here!

For me it was fine for a while, then after a specific kernel version (I am on 6.11.6 right now) it started happening again. Can someone please confirm or deny whether 0bda:8153 is still having problems on recent stable kernel versions?