Bug 202541 - xhci_hcd 0000:15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state
Summary: xhci_hcd 0000:15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot o...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: x86-64 Linux
: P1 high
Assignee: Greg Kroah-Hartman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-09 16:41 UTC by Bernhard
Modified: 2019-11-13 10:33 UTC (History)
13 users (show)

See Also:
Kernel Version: >= 4.20
Tree: Mainline
Regression: Yes


Attachments
dmesg output before reboot (80.82 KB, text/plain)
2019-03-10 11:51 UTC, Bernhard
Details
dmesg output after reboot (83.19 KB, text/plain)
2019-03-10 11:51 UTC, Bernhard
Details
rt2x00_revert_4.20_changes.patch (23.69 KB, text/plain)
2019-03-11 14:18 UTC, Stanislaw Gruszka
Details
Patch adding doorbell tracing (3.55 KB, application/mbox)
2019-10-15 14:50 UTC, Mathias Nyman
Details
Dmesg log and trace file (560.91 KB, application/zip)
2019-10-15 18:25 UTC, Bernhard
Details
Logs after flushing endpoint (833.02 KB, application/zip)
2019-10-17 19:27 UTC, Bernhard
Details
Patch handling halted endpoints at completion of stop endpoint command (8.45 KB, patch)
2019-10-29 16:17 UTC, Mathias Nyman
Details | Diff
Logs after applying the patch (833.02 KB, application/zip)
2019-10-30 03:22 UTC, Bernhard
Details
Logs after applying the patch (630.93 KB, application/zip)
2019-10-30 12:59 UTC, Bernhard
Details

Description Bernhard 2019-02-09 16:41:04 UTC
After upgrading to the 4.20 Kernel(was using 4.19 previously) my usb wifi stick doesn´t work until I reboot the system. This issue happens every time I start my pc(only when the system was shut down, it doesn´t happen after rebooting). The wifi driver in use is rt2800usb. I tried restarting the NetworkManager, but this didn´t change anything.
Comment 1 Stanislaw Gruszka 2019-03-10 10:39:39 UTC
Hmm, that's strange perhaps this is some USB host problem. Please provide dmesg of your system.
Comment 2 Bernhard 2019-03-10 11:51:19 UTC
Created attachment 281677 [details]
dmesg output before reboot
Comment 3 Bernhard 2019-03-10 11:51:46 UTC
Created attachment 281679 [details]
dmesg output after reboot
Comment 4 Stanislaw Gruszka 2019-03-10 13:59:34 UTC
We have this xhci_hcd warning on bad case:

 xhci_hcd 0000:15:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state

Not sure where it come from. But I notice you are using AMD IOMMU which we have troubles with with different drivers. 

You could try to disable iommu via kerenl boot parameter and check if that improve things. You could also try test this patch if possible:
https://bugzilla.kernel.org/attachment.cgi?id=281675

If none of that helps I will prepare some rt2800 patches to see if this not caused by some of v4.19 .. v4.20 rt2800 commits:

0240564430c0 rt2800: flush and txstatus rework for rt2800mmio
adf26a356f13 rt2x00: use different txstatus timeouts when flushing
5022efb50f62 rt2x00: do not check for txstatus timeout every time on tasklet
0b0d556e0ebb rt2800mmio: use txdone/txstatus routines from lib
5c656c71b1bf rt2800: move usb specific txdone/txstatus routines to rt2800lib
f483039cf51a rt2x00: use simple_read_from_buffer()

But I would rather suspect problem introduced in AMD IOMMU or usb/xhci drivers.
Comment 5 Bernhard 2019-03-10 22:32:28 UTC
I tried disabling iommu, and I also compiled the 4.20.15 kernel from source with that patch applied, but the wifi didn´t work in both cases either.
Comment 6 Stanislaw Gruszka 2019-03-11 14:18:25 UTC
Created attachment 281711 [details]
rt2x00_revert_4.20_changes.patch

Please test this patch and report if it makes problem gone or not.
Comment 7 Bernhard 2019-03-11 14:51:47 UTC
The problem is still there after applying that patch.
Comment 8 Stanislaw Gruszka 2019-03-11 15:22:06 UTC
You need to report this bug usb maintainers. I'm changing the topic and component, but USB bugs should be reported directly to mailing list.
Comment 9 Stanislaw Gruszka 2019-03-11 15:22:50 UTC
Please send bug report to  linux-usb@vger.kernel.org
Comment 10 Michael 2019-04-05 19:46:31 UTC
I can confirm this issue. Also I can confirm that other USB devices are effected, too (mostly if plugged into an USB3 port).
For example:
ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

dmesg doesn't show IOMMU warnings, so I assume it is a problem introduced in usb/xhci driver.
Comment 11 Bernhard 2019-04-06 02:00:56 UTC
(In reply to Michael from comment #10)
> I can confirm this issue. Also I can confirm that other USB devices are
> effected, too (mostly if plugged into an USB3 port).
> For example:
> ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
> WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
> 
> dmesg doesn't show IOMMU warnings, so I assume it is a problem introduced in
> usb/xhci driver.

I think this affects only a specific hardware configuration(I've tried using my wifi stick on a different machine and it worked without problems).
Which hardware are you using? Maybe there are some parts we have in common.

My hardware configuration:
CPU: AMD Ryzen 3 2200G, Motherboard: MSI B350 PC MATE
GPU: AMD Radeon RX 580 8GB
Comment 12 Michael 2019-04-06 07:40:43 UTC
@ Bernhard
The parts we have in common : AMD RYZEN

AMD RYZEN 1700 MSI X370 KRAIT, MSI AERO GTX1080Ti, 5.0.6-arch1-1-ARCH (system was also affected by IOMMU issue - but that is fixed)

Affected USB WiFi devices (tested):
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter (ALFA AWUS036NH - rt2800usb)
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter (ipTime/ zioncom  - rt2800usb)
ID 7392:7710 Edimax Technology Co., Ltd (mt7601u)
ID 7392:a812 Edimax Technology Co., Ltd (Edimax EW-7811USC - rtl88xxau)
ID 148f:761a Ralink Technology, Corp. MT7610U ("Archer T2U" 2.4G+5G WLAN Adapter  - mt76x0)
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U] 
ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)
I'm sure there are more.

After he has fixed some driver / IOMMU issues, Stanislaw has found out, that it possibly could be a xhci/driver issue. I share his opinion.

You can read more about the issues here:
https://github.com/ZerBea/hcxdumptool/issues/42
and the fixed IOMMU issue here:
https://bugzilla.kernel.org/show_bug.cgi?id=202241
Comment 13 Stanislaw Gruszka 2019-04-06 08:47:54 UTC
FTR: I think those two commits could help:

commit 6cbcf596934c8e16d6288c7cc62dfb7ad8eadf15
Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date:   Fri Mar 22 17:50:15 2019 +0200

    xhci: Fix port resume done detection for SS ports with LPM enabled

commit d92f2c59cc2cbca6bfb2cc54882b58ba76b15fd4
Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date:   Fri Mar 22 17:50:17 2019 +0200

    xhci: Don't let USB3 ports stuck in polling state prevent suspend

Also I'm not sure if if issue was reported to proper maintainer. If not and problem is not already fixed on latest upstream, either bisection will be needed to precede with this bug or fill properly informative bug report to proper maintainer.
Comment 14 Michael 2019-04-06 09:27:40 UTC
@ Stanislaw, thanks for additional information.

@ Bernhard, ‎have you already sent this bug report to linux-usb mailing list?

can we change affected kernel version from 4.20 to >= 4.20, because 5.0.6 is affected, too?
Comment 15 Bernhard 2019-04-06 10:15:17 UTC
Yes, I already sent this to the mailing list, but I got no response unfortunately.

I've changed the affected kernel version btw.
Comment 16 Michael 2019-04-06 10:28:05 UTC
@ Bernhard, thanks for your answer. So there is no need for me to report this issue, too.
Comment 17 Bernhard 2019-04-06 11:31:23 UTC
I just tried the two patches Stanislaw mentioned, but the problem is still there.
Comment 18 Michael 2019-04-06 11:57:00 UTC
Tried them, too, some days ago, but the didn't solve the issue.
Just downloaded 5.1rc3, but I don't expect a working driver (usb/host), inside.
Comment 19 Michael 2019-04-06 22:36:40 UTC
Tested an ASUS X555U (Intel i5-6200 - 5.0.6-arch1-1-ARCH) and that system is affected, if the device is plugged into one of the USB3 ports. The device is working, if plugged into the USB2 port.
Comment 20 Bernhard 2019-04-07 01:15:44 UTC
I just tried replacing the xhci_ring.c file with the version from the 4.19 kernel, that solved the problem. Then I started patching the code until the problem occurs again. 
The change in the function "static int process_bulk_intr_td" is causing the problem, it's part of this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/usb/host/xhci-ring.c?id=9703fc8caf36ac65dca1538b23dd137de0b53233
Comment 21 Stanislaw Gruszka 2019-04-07 08:17:17 UTC
Berna(In reply to Bernhard from comment #20)
> I just tried replacing the xhci_ring.c file with the version from the 4.19
> kernel, that solved the problem. Then I started patching the code until the
> problem occurs again. 
> The change in the function "static int process_bulk_intr_td" is causing the
> problem, it's part of this patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/
> drivers/usb/host/xhci-ring.c?id=9703fc8caf36ac65dca1538b23dd137de0b53233

Good findings, great. This seems to be part of 

commit f8f80be501aa2f10669585c3e328fad079d8cb3a
Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date:   Thu Sep 20 19:13:37 2018 +0300

    xhci: Use soft retry to recover faster from transaction errors

Just add information you found in the posted linux-usb email and CC "Mathias Nyman <mathias.nyman@linux.intel.com>" to make sure he is aware of the problem.
Comment 22 Michael 2019-04-07 09:08:51 UTC
The issue isn't fixed in 5.1rc3, so it look's like Mathias Nyman isn't aware of the problem, yet.
Comment 23 Michael 2019-05-20 08:04:25 UTC
Still present in 5.1.2
Comment 24 Michael 2019-06-08 19:24:48 UTC
This issue is really funny:
running
D 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]

on kernel
$ uname -r
5.1.7-arch1-1-ARCH

will spam the log after the know WARN
43163.034783] mt76x0u 1-10.2:1.0 wlp3s0f0u10u2: renamed from wlan0
[43163.351656] usb 1-10.2: USB disconnect, device number 6
[43163.352176] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

with tons of failed vendor requests:
[43160.683383] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3dc failed:-71
[43160.813398] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e0 failed:-71
[43160.943415] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e4 failed:-71
[43161.073440] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3e8 failed:-71
[43161.203439] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3ec failed:-71
[43161.333458] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f0 failed:-71
[43161.463468] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f4 failed:-71
[43161.593561] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3f8 failed:-71
[43161.723502] mt76x0u 1-10.2:1.0: vendor request req:06 off:c3fc failed:-71
[43161.853512] mt76x0u 1-10.2:1.0: vendor request req:06 off:108c failed:-71
....
Comment 25 Michael 2019-06-09 17:59:30 UTC
If the same device is connected to an Intel Core I5-6200 system (USB3 port), the log looks different to the AMD RYZEN system.

[  204.231872] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.231901] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.231940] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.231980] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.232020] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.232188] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.232226] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.232275] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.232304] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.232345] mt76x0u 1-1:1.0: rx urb failed: -71
[  204.233284] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[  204.233291] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
[  204.263427] mt76x0u 1-1:1.0: TX DMA did not stop
[  207.596726] mt76x0u 1-1:1.0: Warning: MAC TX did not stop!
[  209.650050] mt76x0u 1-1:1.0: Warning: MAC RX did not stop!
[  209.651133] mt76x0u 1-1:1.0: RX DMA did not stop

Also I noticed some changes in xhci-ring.c between 5.1.7 and 5.2_rc4. Maybe they'll fix the problem. I didn't tested it, yet.
Comment 26 Bernhard 2019-06-09 18:12:43 UTC
I already tried the 5.2-rc3 kernel and the problem isn't fixed yet. There were no changes in the xhci driver between rc3 and rc4, so it's very unlikely that the problem doesn't occur in the 5.2-rc4 kernel.
Comment 27 Michael 2019-06-09 18:23:33 UTC
Thanks for the information. I skipped 5.2rc1 ... rc3.

But with your information, there is no real need for me to run some more tests.

Unfortunately it looks like the issue is back ported to older kernel versions (4.19), because I got some issue reports here, too:
https://github.com/ZerBea/hcxdumptool/issues/57

and 90% of my devices doesn't work any longer.
Comment 28 Bernhard 2019-06-09 18:41:16 UTC
When did it get back ported? I'm on 4.19.48 and haven't had a problem with this version...
Comment 29 Michael 2019-06-09 19:46:43 UTC
It's just a guess, because of this post:
https://github.com/ZerBea/hcxdumptool/issues/57#issuecomment-483964293

But it looks like the device was working before that post. 
I cant test it, because I have not such a device.

I tested a TP-LINK Archer T2UH and this device is not working on 4.19.46 arm (Raspberry Pi).
Comment 30 Michael 2019-06-09 19:48:54 UTC
Yes, rt2800usb is working fine on 4.19.46.
Comment 31 Michael 2019-06-10 07:44:05 UTC
hcxdumptool running on kernel 4.19.46 arm doesn't receive packets on several different devices. In this case:
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]
INFO: cha=1, rx=0, rx(dropped)=0, tx=18, err=0, aps=0 (0 in range)

while a few other devices still working
INFO: cha=1, rx=805, rx(dropped)=0, tx=93, err=0, aps=29 (21 in range)

BTW:
I'm running/testing only devices on which driver support monitor mode and packet injection.

Very interesting on that arm kernel is that dmesg doesn't show any WARNs.
Comment 32 Michael 2019-06-20 16:07:12 UTC
Still no fix?
$ uname -r
5.1.11-arch1-1-ARCH

and most of the USB devices WiFI, BLUETOOTH,....) are still not working:
32942.700591] usb 1-10.4: new full-speed USB device number 7 using xhci_hcd
[32944.721410] usb 1-10.4: New USB device found, idVendor=0a12, idProduct=0001, bcdDevice=52.76
[32944.721412] usb 1-10.4: New USB device strings: Mfr=0, Product=2, SerialNumber=0
[32945.069015] Bluetooth: hci0: hardware error 0x37

How about kernel 5.2?
Comment 33 Michael 2019-06-21 07:32:33 UTC
Some USB card readers are also affected (connected to USB 3 port):

$ uname -r
5.1.12-arch1-1-ARCH

[ 3510.100114] usb 2-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 3510.134121] usb 2-2: New USB device found, idVendor=058f, idProduct=6387, bcdDevice= 0.02
[ 3510.134126] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3510.134128] usb 2-2: Product: Intenso Ultra Line  
[ 3510.134130] usb 2-2: Manufacturer: ALCOR          
...
[ 5129.997608] usb 1-1: reset high-speed USB device number 7 using xhci_hcd
[ 5130.218618] sd 9:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 5130.218631] sd 9:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 20 c3 c0 00 00 20 00
[ 5130.218637] print_req_error: I/O error, dev sdb, sector 2147264 flags 80700

I really wonder why that issue hasn't been fixed, yet, because many, many devices are affected.
Comment 34 Bernhard 2019-06-21 13:57:28 UTC
The list of changes for 5.2-rc6 contains this two commits:

Mathias Nyman (2):
      usb: xhci: Don't try to recover an endpoint if port is in error state.
      xhci: detect USB 3.2 capable host controllers correctly

I think this could be the fix for this issue.
Comment 35 Michael 2019-06-21 14:40:50 UTC
Great, thanks for the information. The issue is really ugly, because many USB devices are affected (hdd, card reader, bleutooth, wlan, ... - this list is long)
I'll check 5.2-rc6.
Comment 36 Bernhard 2019-06-24 00:53:47 UTC
Just tried 5.2-rc6, but unfortunately I still have the same issue.
Comment 37 Michael 2019-06-24 06:27:32 UTC
Thanks for the information. I tested 5.2-rc6, too. Even an USB 3.0 HDD isn't working.
Comment 38 Michael 2019-07-11 08:32:59 UTC
Now running mainline kernel 5.2 and the issue still exists.
Tested on this device:
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
but the same applies to many other devices, too

dmesg after plug in the device:

[75.482165] usb 1-2: new high-speed USB device number 6 using xhci_hcd
[75.639236] usb 1-2: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[75.639238] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[75.639239] usb 1-2: Product: 802.11 n WLAN
[75.639240] usb 1-2: Manufacturer: Ralink
[75.639241] usb 1-2: SerialNumber: 1.0
[75.952611] usb 1-2: reset high-speed USB device number 6 using xhci_hcd
[76.107232] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[76.120228] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 0005 detected
[76.121079] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[76.130873] usbcore: registered new interface driver rt2800usb
[76.194447] audit: type=1130 audit(1562833499.983:49): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[76.195313] rt2800usb 1-2:1.0 wlp0s20f0u2: renamed from wlan0
[76.216178] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[76.241382] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[76.544022] ieee80211 phy1: rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x0404 with error -71
[77.562305] ieee80211 phy1: rt2800_wait_csr_ready: Error - Unstable hardware
[77.562316] ieee80211 phy1: rt2800usb_set_device_state: Error - Device failed to enter state 4 (-5)
...
followed by this message on access to the interface:
[341.598563] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[341.598573] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
Comment 39 Michael 2019-07-11 08:57:33 UTC
BTW:
The tested device is an ALFA AWUS036NH and I really can't see "Unstable hardware" here.
Comment 40 Bernhard 2019-07-11 11:23:41 UTC
I don't really think the problem is caused by the WIFI stick itself, maybe the cause is the xHCI controller from the motherboard? We're both using a 300-series AM4 board(even the same brand), so we probably have the same controller.

Btw. I've already tried the git snapshot from 5.3-rc1, problem isn't fixed there either.
Comment 41 Michael 2019-07-11 11:43:46 UTC
No, I don't think it's the controller. I'm running three different systems here:
RYZEN 1700, MSI X370 KRAIT
INTEL I5-6200U, ASUS X555U (notebook)
INTEL i7-3930K, ASUS P9X79
and all of them running into the same issue. Also, not all of the testing devices are affected. Some devices are still working as expected (for example TENDA W311U+), while others failed epically (ALFA AWUSH036NH). The same applies to several bluetooth devices.
Absolutely new (and really funny) is the error message "Unstable hardware" on 5.2.
Comment 42 Michael 2019-07-11 13:46:59 UTC
And 5.2 makes things more worse. Most of my adapters are not working.

EDIMAX EW-7711UAN V2
ID 7392:7710 Edimax Technology Co., Ltd

[  228.451035] usb 1-2: new high-speed USB device number 53 using xhci_hcd
[  228.629543] usb 1-2: New USB device found, idVendor=7392, idProduct=7710, bcdDevice= 0.00
[  228.629548] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  228.629550] usb 1-2: Product: Edimax Wi-Fi
[  228.629552] usb 1-2: Manufacturer: MediaTek
[  228.629554] usb 1-2: SerialNumber: 1.0
[  228.779827] usb 1-2: reset high-speed USB device number 53 using xhci_hcd
[  229.037761] mt7601u 1-2:1.0: ASIC revision: 76010001 MAC revision: 76010500
[  229.064654] mt7601u 1-2:1.0: Firmware Version: 0.1.00 Build: 7640 Build time: 201302052146____
[  230.045089] mt7601u 1-2:1.0: EEPROM ver:0d fae:00
[  230.055724] mt7601u 1-2:1.0: EEPROM country region 01 (channels 1-13)
[  230.763955] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[  231.084339] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[  231.404311] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[  231.724294] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[  232.044298] mt7601u 1-2:1.0: Warning: mt7601u_mcu_wait_resp retrying
[  232.044301] mt7601u 1-2:1.0: Error: mt7601u_mcu_wait_resp timed out
[  232.044485] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  232.046810] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[  232.197641] mt7601u 1-2:1.0: Vendor request req:07 off:0080 failed:-71
[  232.347631] mt7601u 1-2:1.0: Vendor request req:02 off:0080 failed:-71
[  232.497630] mt7601u 1-2:1.0: Vendor request req:02 off:0080 failed:-71
[  232.497675] mt7601u: probe of 1-2:1.0 failed with error -110


LOGILINK WL0150
ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter

[  527.994013] usb 1-2: new high-speed USB device number 86 using xhci_hcd
[  528.238517] usb 1-2: New USB device found, idVendor=148f, idProduct=5370, bcdDevice= 1.01
[  528.238519] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  528.238521] usb 1-2: Product: 802.11 n WLAN
[  528.238522] usb 1-2: Manufacturer: Ralink
[  528.238523] usb 1-2: SerialNumber: 1.0
[  528.495914] usb 1-2: reset high-speed USB device number 86 using xhci_hcd
[  528.747058] ieee80211 phy81: rt2x00_set_rt: Info - RT chipset 5390, rev 0502 detected
[  529.426163] ieee80211 phy81: rt2x00_set_rf: Info - RF chipset 5370 detected
[  529.432544] ieee80211 phy81: Selected rate control algorithm 'minstrel_ht'
[  529.433131] usbcore: registered new interface driver rt2800usb
[  529.447058] audit: type=1130 audit(1562850994.757:43): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  529.447260] rt2800usb 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[  534.453471] audit: type=1131 audit(1562850999.761:44): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  560.993915] ieee80211 phy81: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[  561.000105] ieee80211 phy81: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[  599.002238] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.


Delock Bluetooth USB :
ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)

[bluetooth]# scan on
Failed to start discovery: org.bluez.Error.NotReady

[ 1641.910015] usb 1-2: new full-speed USB device number 94 using xhci_hcd
[ 1642.126667] usb 1-2: device descriptor read/64, error -71
[ 1644.002791] usb 1-2: New USB device found, idVendor=0a12, idProduct=0001, bcdDevice=52.76
[ 1644.002795] usb 1-2: New USB device strings: Mfr=0, Product=2, SerialNumber=0
[ 1644.002798] usb 1-2: Product: BT2.0
[ 1644.026094] audit: type=1130 audit(1562852109.344:79): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1644.286731] Bluetooth: hci0: hardware error 0x37


Logilink Bluetooth USB
ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)

[bluetooth]# scan on
Failed to start discovery: org.bluez.Error.NotReady

[ 1641.910015] usb 1-2: new full-speed USB device number 94 using xhci_hcd
[ 1642.126667] usb 1-2: device descriptor read/64, error -71
[ 1644.002791] usb 1-2: New USB device found, idVendor=0a12, idProduct=0001, bcdDevice=52.76
[ 1644.002795] usb 1-2: New USB device strings: Mfr=0, Product=2, SerialNumber=0
[ 1644.002798] usb 1-2: Product: BT2.0
[ 1644.026094] audit: type=1130 audit(1562852109.344:79): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1644.286731] Bluetooth: hci0: hardware error 0x37

Also funny this error message, here: "hardware error".


DL-Link DBT-122 Bluetooth USB
ID 07d1:f101 D-Link System DBT-122 Bluetooth

[bluetooth]# scan on
Failed to start discovery: org.bluez.Error.NotReady

The device list of not working devices is long....
Comment 43 Michael 2019-07-11 13:51:21 UTC
Sorry, copy and paste error of the last dmesg log. Due to several tests, dmesg log was flooded by warnings and error messages.
I'll stop the tests and will wait for next LTS kernel.
Comment 44 Michael 2019-07-12 06:25:29 UTC
BTW:
For me the issue started at this point:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/build&id=72a9c673636b779e370983fea08e40f97039b981
when the Linux kernel's default i386/x86_64 kernel configurations shiped with USB 3.0 support enabled (CONFIG_USB_XHCI_HCD).
Comment 45 Stanislaw Gruszka 2019-07-19 10:32:02 UTC
Looks like there was requested a debug tracing, what was ignored:

https://lore.kernel.org/linux-usb/520462de-9a7f-d3f3-2cce-e4be045cf8b2@linux.intel.com/
Comment 46 Bernhard 2019-07-19 11:31:44 UTC
I didn't ignore it, I sent it to Mathias Nyman only, and not to the whole mailing list("Send output to me" didn't sound like I should send it to the whole mailing list but idk). I have to admit that the first traces weren't really useful though, when I ran the commands he told me the traces started too late(because the error happens immediately after system startup, so when I run this commands after startup the important part is missing). 

Then he gave me instructions how to enable tracing at startup, which only resulted in this error: [ 0.172042] Failed to enable trace event: xhci-hcd
and the tracing file was empty afterwards. 

Just about one week ago I had another idea how I could get it working, and it actually worked. The solution was to just unplug the wifi stick at boot, then enable tracing and plug in the stick again(I don't know why I didn't try that a few months ago tbh). I've sent the two files(dmesg and tracing) to Mathias Nyman again, but this time he didn't respond(I've sent the mail on July 11th). 

Should I send the whole tracing file and dmesg log to the mailing list instead? What is the preferred way to send files that are too big for an e-mail(tracing is around 17.6MB in size)?
Comment 47 Stanislaw Gruszka 2019-07-22 07:54:20 UTC
Bernhard, thanks for the update and provide debug data to the maintainer.

I think you should ping him on mailing list and ask if anything else need to be provided or how to precede otherwise. Maybe we can we just revert the patch?

This issue is annoying and I see more users entering it (and blaming mt76x0u or rt2800usb drivers). It should not be hard to fix since is regression (commit causing it is known) and is reproducible.

Please also point that changes in process_bulk_intr_td() are main cause of the problem as stated in comment 20.
Comment 48 Michael 2019-07-29 17:12:12 UTC
This issue is blaming nearly everything (like this SAMSUNG Galaxy S3):

[34385.294067] usb 1-2: new high-speed USB device number 6 using xhci_hcd
[34385.465017] usb 1-2: New USB device found, idVendor=18d1, idProduct=4ee7, bcdDevice= 2.26
[34385.465022] usb 1-2: New USB device strings: Mfr=2, Product=3, SerialNumber=4
[34385.465025] usb 1-2: Product: GT-I9300
[34385.465028] usb 1-2: Manufacturer: samsung
...
[35074.182055] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Comment 49 Michael 2019-08-24 07:03:55 UTC
Bernhard, running my RYZEN for some days and noticed that tha xhci issue also 
affected the USB keyboard and the USB mouse:


At this time, the system is allready running for 2 days:
Aug 24 08:38:41.665376 tux1 kernel: usb 1-12: new low-speed USB device number 19 using xhci_hcd
Aug 24 08:38:42.001609 tux1 kernel: usb 1-12: New USB device found, idVendor=046a, idProduct=0011, bcdDevice= 1.00
Aug 24 08:38:42.001850 tux1 kernel: usb 1-12: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Aug 24 08:38:42.098291 tux1 kernel: hid-generic 0003:046A:0011.0003: input,hidraw0: USB HID v1.11 Keyboard [HID 046a:0011] on usb-0000:03:00.0-12/input0
Aug 24 08:38:43.631091 tux1 kernel: usb 1-12: input irq status -75 received
Aug 24 08:38:43.631384 tux1 kernel: usb usb1-port12: disabled by hub (EMI?), re-enabling...
Aug 24 08:38:43.631409 tux1 kernel: usb 1-12: USB disconnect, device number 19
Aug 24 08:38:44.025057 tux1 kernel: usb 1-12: new low-speed USB device number 20 using xhci_hcd
Aug 24 08:38:44.361600 tux1 kernel: usb 1-12: New USB device found, idVendor=046a, idProduct=0011, bcdDevice= 1.00
Aug 24 08:38:44.361839 tux1 kernel: usb 1-12: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Aug 24 08:38:44.401604 tux1 kernel: input: HID 046a:0011 as /devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-12/1-12:1.0/0003:046A:0011.0004/input/input18
Aug 24 08:38:44.458277 tux1 kernel: hid-generic 0003:046A:0011.0004: input,hidraw0: USB HID v1.11 Keyboard [HID 046a:0011] on usb-0000:03:00.0-12/input0
Aug 24 08:38:49.031776 tux1 kernel: usb 1-12: input irq status -75 received
Aug 24 08:38:49.032082 tux1 kernel: usb usb1-port12: disabled by hub (EMI?), re-enabling...
Aug 24 08:38:49.032099 tux1 kernel: usb 1-12: USB disconnect, device number 20
Aug 24 08:38:49.425365 tux1 kernel: usb 1-12: new low-speed USB device number 21 using xhci_hcd
Aug 24 08:39:04.905175 tux1 kernel: usb 1-12: device descriptor read/64, error -110
Aug 24 08:39:20.478280 tux1 kernel: usb 1-12: device descriptor read/64, error -110
Aug 24 08:39:20.774967 tux1 kernel: usb 1-12: new low-speed USB device number 22 using xhci_hcd
Aug 24 08:39:36.331757 tux1 kernel: usb 1-12: device descriptor read/64, error -110
Aug 24 08:39:51.838723 tux1 kernel: usb 1-12: device descriptor read/64, error -110
Aug 24 08:39:51.945370 tux1 kernel: usb usb1-port12: attempt power cycle
Aug 24 08:39:52.588394 tux1 kernel: usb 1-12: new low-speed USB device number 23 using xhci_hcd
Aug 24 08:39:57.415723 tux1 kernel: usb 1-12: Device not responding to setup address.
Aug 24 08:40:02.448295 tux1 kernel: usb 1-12: Device not responding to setup address.
Aug 24 08:40:02.655042 tux1 kernel: usb 1-12: device not accepting address 23, error -71
Aug 24 08:40:02.778269 tux1 kernel: usb 1-12: new low-speed USB device number 24 using xhci_hcd
Aug 24 08:40:07.604975 tux1 kernel: usb 1-12: Device not responding to setup address.
Aug 24 08:40:12.638751 tux1 kernel: usb 1-12: Device not responding to setup address.
Aug 24 08:40:12.845561 tux1 kernel: usb 1-12: device not accepting address 24, error -71
Aug 24 08:40:12.845696 tux1 kernel: usb usb1-port12: unable to enumerate USB device

At this time only hard power off will help.

I got rid of that and reverted the ugly xhci patch. Will now see if that will solve the xhci issue.
Comment 50 Bernhard 2019-08-24 11:40:45 UTC
Was one of the affected USB devices plugged in and you rebooted to get the wifi working? Or did that happen even without the device plugged in?

I've noticed once that even after if I rebooted my system to get wifi working, my external HDD didn't work after plugging it in, so I had to reboot again to get that working... 

I'm just using the LTS kernel right now, which works fine for me, but because of that bug I'm kinda limited when choosing a distribution since most distros don't offer different kernel versions and I don't really want to recompile my kernel every time.
Comment 51 Michael 2019-08-24 12:27:45 UTC
No, it happened without a warning. Keyboard LED flashed some times, according to the device descriptor errors. This was the first time I noticed something like that and only on the RYZEN machine.
We talked about that xhci issue in other (git) threads, too:
https://github.com/aircrack-ng/rtl8812au/issues/376#issuecomment-522169478

BTW:
LTS kernel (4.19) still working fine here, too. In my opinion xhci host is unstable since 4.20. I noticed that everytime, when testing/improving a driver.
Comment 52 Michael 2019-08-24 14:04:56 UTC
I noticed the same behavior. Not on an USB hdd, but on an USB ram:

This is an INTENSO USB 2 ALU LINE 64 GB USB stick:

[ 1032.600762] usb 1-11.4: new high-speed USB device number 15 using xhci_hcd
[ 1032.626487] hub 1-11:1.0: hub_ext_port_status failed (err = -71)
[ 1032.629487] usb 1-11-port4: cannot reset (err = -71)
[ 1032.632491] usb 1-11-port4: cannot reset (err = -71)
[ 1032.635486] usb 1-11-port4: cannot reset (err = -71)
[ 1032.638482] usb 1-11-port4: cannot reset (err = -71)
[ 1032.638483] usb 1-11-port4: Cannot enable. Maybe the USB cable is bad?

The stick is ok plugged in on another port:
[ 1465.770379] usb 1-11.4: USB disconnect, device number 23
[ 1708.302214] usb 1-2: new high-speed USB device number 24 using xhci_hcd
[ 1708.471933] usb 1-2: New USB device found, idVendor=058f, idProduct=6387, bcdDevice= 1.ff
[ 1708.471938] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1708.471940] usb 1-2: Product: Intenso Alu Line
[ 1708.471943] usb 1-2: Manufacturer: 6989
[ 1708.471945] usb 1-2: SerialNumber: 21F84CE8
[ 1708.479111] usb-storage 1-2:1.0: USB Mass Storage device detected

re-plugged in on 1-11-port4:
[ 1776.661289] usb 1-11.4: new high-speed USB device number 25 using xhci_hcd
[ 1776.810678] usb 1-11.4: New USB device found, idVendor=058f, idProduct=6387, bcdDevice= 1.ff
[ 1776.810684] usb 1-11.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1776.810687] usb 1-11.4: Product: Intenso Alu Line
[ 1776.810691] usb 1-11.4: Manufacturer: 6989
[ 1776.810694] usb 1-11.4: SerialNumber: 21F84CE8
[ 1776.817710] usb-storage 1-11.4:1.0: USB Mass Storage device detected


That leads me to the assumption that the xhci host is unstable, at least in combination with my controller:


[    1.325164] xhci_hcd 0000:03:00.0: hcc params 0x0200ef81 hci version 0x110 quirks 0x0000000008000410
[    1.325319] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.02
[    1.325321] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.325322] usb usb1: Product: xHCI Host Controller
[    1.325323] usb usb1: Manufacturer: Linux 5.2.9-arch1-1-ARCH xhci-hcd
[    1.325323] usb usb1: SerialNumber: 0000:03:00.0
[    1.325428] hub 1-0:1.0: USB hub found
[    1.325443] hub 1-0:1.0: 14 ports detected
[    1.325922] xhci_hcd 0000:03:00.0: xHCI Host Controller
[    1.325925] xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 2
[    1.325927] xhci_hcd 0000:03:00.0: Host supports USB 3.1 Enhanced SuperSpeed
[    1.325958] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    1.325974] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.02
[    1.325976] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.325977] usb usb2: Product: xHCI Host Controller
[    1.325978] usb usb2: Manufacturer: Linux 5.2.9-arch1-1-ARCH xhci-hcd
[    1.325979] usb usb2: SerialNumber: 0000:03:00.0
[    1.326046] hub 2-0:1.0: USB hub found
[    1.326057] hub 2-0:1.0: 8 ports detected
[    1.326289] usb: port power management may be unreliable
[    1.326451] xhci_hcd 0000:25:00.0: xHCI Host Controller
[    1.326454] xhci_hcd 0000:25:00.0: new USB bus registered, assigned bus number 3
[    1.381302] xhci_hcd 0000:25:00.0: hcc params 0x0200ef81 hci version 0x110 quirks 0x0000000000000010
[    1.381657] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.02
[    1.381658] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.381658] usb usb3: Product: xHCI Host Controller
[    1.381659] usb usb3: Manufacturer: Linux 5.2.9-arch1-1-ARCH xhci-hcd
[    1.381660] usb usb3: SerialNumber: 0000:25:00.0
[    1.381737] hub 3-0:1.0: USB hub found
[    1.381744] hub 3-0:1.0: 2 ports detected
[    1.381830] xhci_hcd 0000:25:00.0: xHCI Host Controller
[    1.381832] xhci_hcd 0000:25:00.0: new USB bus registered, assigned bus number 4
[    1.381834] xhci_hcd 0000:25:00.0: Host supports USB 3.0 SuperSpeed
[    1.381863] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM.
[    1.381878] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.02
[    1.381880] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.381880] usb usb4: Product: xHCI Host Controller
[    1.381881] usb usb4: Manufacturer: Linux 5.2.9-arch1-1-ARCH xhci-hcd
[    1.381882] usb usb4: SerialNumber: 0000:25:00.0
[    1.381941] hub 4-0:1.0: USB hub found
[    1.381947] hub 4-0:1.0: 2 ports detected
[    1.382075] xhci_hcd 0000:27:00.3: xHCI Host Controller
[    1.382077] xhci_hcd 0000:27:00.3: new USB bus registered, assigned bus number 5
[    1.382170] xhci_hcd 0000:27:00.3: hcc params 0x0270f665 hci version 0x100 quirks 0x0000000000000410
[    1.382268] usb usb5: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.02
[    1.382269] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.382270] usb usb5: Product: xHCI Host Controller
[    1.382271] usb usb5: Manufacturer: Linux 5.2.9-arch1-1-ARCH xhci-hcd
[    1.382272] usb usb5: SerialNumber: 0000:27:00.3
[    1.382335] hub 5-0:1.0: USB hub found
[    1.382341] hub 5-0:1.0: 4 ports detected
[    1.382498] xhci_hcd 0000:27:00.3: xHCI Host Controller
[    1.382500] xhci_hcd 0000:27:00.3: new USB bus registered, assigned bus number 6
[    1.382501] xhci_hcd 0000:27:00.3: Host supports USB 3.0 SuperSpeed
[    1.382512] usb usb6: We don't know the algorithms for LPM for this host, disabling LPM.
[    1.382527] usb usb6: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.02
[    1.382528] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.382529] usb usb6: Product: xHCI Host Controller
[    1.382530] usb usb6: Manufacturer: Linux 5.2.9-arch1-1-ARCH xhci-hcd
[    1.382531] usb usb6: SerialNumber: 0000:27:00.3
Comment 53 Michael 2019-09-07 10:19:48 UTC
tested another USB controller (at this time 3.1) and the results are even worse than on USB 3.0:
USB controller: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller (rev 02)
and
TENDA W311U+
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
This device is one of the few that work on an USB 3.0 controller 
Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) 

but it failed epically on USB 3.1:
[ 1213.285622] rt2800usb 5-3.1.2:1.0 wlp39s0f3u3u1u2: renamed from wlan0
[ 1218.918384] ieee80211 phy6: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 1218.918427] ieee80211 phy6: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[ 1219.222282] device wlp39s0f3u3u1u2 entered promiscuous mode
[ 1220.797413] rt2800usb_tx_sta_fifo_read_completed: 186 callbacks suppressed
[ 1220.797417] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797452] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797531] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797611] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797692] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797772] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797851] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.797931] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.798011] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.798091] ieee80211 phy6: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
[ 1220.814661] xhci_hcd 0000:27:00.3: WARN Cannot submit Set TR Deq Ptr
[ 1220.814663] xhci_hcd 0000:27:00.3: A Set TR Deq Ptr command is pending.
[ 1221.378769] ieee80211 phy6: rt2x00queue_flush_queue: Warning - Queue 0 failed to flush
[ 1221.409201] device wlp39s0f3u3u1u2 left promiscuous mode


I really hope it will be fixed until we reach next LTS-KERNEL.
Comment 54 Michael 2019-09-07 10:40:39 UTC
and more and more devices are affected:
https://github.com/dresden-elektronik/deconz-rest-plugin/issues/1640
Comment 55 Michael 2019-09-08 09:09:09 UTC
@  Stanislaw Gruszka
We once talked about a rt2800usb issue (rt2800usb stops receiving) here:
https://bugzilla.kernel.org/show_bug.cgi?id=202243#c19

Now, I'm not sure, if it is related to this xhci issue or not, because I get it sometimes on kernel 4.19, too.

After doing setsockopt PACKET_MR_PROMISC:
https://github.com/ZerBea/hcxdumptool/blob/master/hcxdumptool.c#L5513

dmesg will show this warning (in this case running an USB 2.0 controller):
[ 1687.106514] device wlp3s0f0u2 entered promiscuous mode
[ 1687.106551] audit: type=1700 audit(1567932110.523:46): dev=wlp3s0f0u2 prom=256 old_prom=0 auid=1000 uid=0 gid=0 ses=2
[ 1718.525815] ieee80211 phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush
[ 1718.558846] device wlp3s0f0u2 left promiscuous mode
[ 1718.558888] audit: type=1700 audit(1567932141.974:47): dev=wlp3s0f0u2 prom=0 old_prom=256 auid=1000 uid=0 gid=0 ses=2

The adapter stops working until it is plugged out and plugged in again:
[ 1722.950110] usb 1-2: USB disconnect, device number 5

If you think it is not related to this issue, I can open a new rt2800usb issue.
Comment 56 Koen 2019-09-22 12:02:02 UTC
Hi,

a google search led me here. I'm getting the same error on my Lenovo Thinkpad X220 running Kubuntu 19.04. Everytime I plug in an USB memory stick or a SD card I get the following messages in dmesg:

[ 9649.078958] xhci_hcd 0000:05:00.0: WARN Cannot submit Set TR Deq Ptr
[ 9649.078966] xhci_hcd 0000:05:00.0: A Set TR Deq Ptr command is pending.

Linux koen-ThinkPad-X220 5.0.0-29-generic #31-Ubuntu SMP Thu Sep 12 13:05:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Koen
Comment 57 Doug Gale 2019-09-23 22:24:32 UTC
Got this issue on 5.0.0-29-generic, host hardware is Ryzen 2700X on B350 chipset (Asus Prime B350-Plus).

USB Device is Samsung Galaxy A5, Model SM-A520W, Android 8.0

[57460.411327] usb 1-4.1.4: USB disconnect, device number 10
[57460.411566] xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[57460.685963] usb 1-4.1.4: new high-speed USB device number 11 using xhci_hcd
[57460.830379] usb 1-4.1.4: New USB device found, idVendor=04e8, idProduct=6860, bcdDevice= 4.00
[57460.830382] usb 1-4.1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[57460.830383] usb 1-4.1.4: Product: SAMSUNG_Android
[57460.830385] usb 1-4.1.4: Manufacturer: SAMSUNG
[57460.830386] usb 1-4.1.4: SerialNumber: **withheld**

doug@doug-dt:~$ sudo lspci -s 2:0.0 -vvvvvv
[sudo] password for doug: 
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset USB 3.1 xHCI Controller (rev 02) (prog-if 30 [XHCI])
	Subsystem: ASMedia Technology Inc. 300 Series Chipset USB 3.1 xHCI Controller

I have this kernel parameter to prevent other USB issues: usbcore.autosuspend=-1

Linux doug-dt 5.0.0-29-generic #31~18.04.1-Ubuntu SMP Thu Sep 12 18:29:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Comment 58 Michael 2019-09-24 06:30:55 UTC
This issue still exists on
$ uname -r
5.3.1-arch1-1-ARCH

Sep 24 08:14:00.374050 tux1 kernel: device wlp3s0f0u2 entered promiscuous mode
Sep 24 08:14:39.757848 tux1 kernel: xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Sep 24 08:14:39.758158 tux1 kernel: xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Sep 24 08:14:39.770950 tux1 kernel: mt7601u 1-2:1.0: Warning: TX DMA did not stop!

xhci host is running completely instable after receiving the first warning: 
WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state
Ignoring this warning, the whole system freezes. At this time only a "hard" power off will help.

BTW:
Shouldn't we increase importance (next kernel will be LTS - and this issue will reach the major distributions).
Comment 59 Bernhard 2019-09-24 14:04:54 UTC
I have noticed that I don't get that error("WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state") anymore, even though I still have the same USB issues(maybe something in the rt2800usb driver changed, idk). I've even tried applying all the patches in the "for-usb-linus" branch from Mathias Nyman's git repo - but I still have the same issue. 

Maybe more people should send a message to the usb kernel mailing list(linux-usb@vger.kernel.org)? I didn't get a response the last time but maybe they will address this issue if they see that more users are affected by this regression. 

BTW @Michael:
There is a commit in the for-usb-linus branch that could fix the system freezes you've encountered: https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/commit/?h=for-usb-linus&id=750ed908bbb57153c75b79c50135e7cc94feb4a5
Comment 60 Michael 2019-09-24 14:40:26 UTC
@Bernhard.
Thanks. I'll check it. Also thanks for setting prio to high.
Until the system freezes, I receive the funniest warnings from the xhci system: bad cable, bad device, firmware not loaded,... 

"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" depend also on the device:
Running a 
148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
I got no "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"

Running
148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter
I got the warning.

Both of them using the rt2800usb driver.

That and the different warnings let me assume, the xhci host is running completely instable, especially when hcxdumptool doing high workload.
Comment 61 Michael 2019-09-26 11:31:27 UTC
It seems that the commit is working - no freeze, up to now.
Comment 62 Michael 2019-09-27 17:40:26 UTC
Nop, doesn't work as expected. No freezes, but:

[ 2914.285601] ieee80211 phy77: Atheros AR9271 Rev:1
[ 2914.286229] ath9k_htc 1-3:1.0 wlp0s20f0u3: renamed from wlan0
[ 2914.389748] usb 1-3: USB disconnect, device number 83
[ 2914.749819] ath: phy77: Failed to wakeup in 500us
[ 2914.760221] ath: phy77: Failed to wakeup in 500us
[ 2914.770309] ath: phy77: Failed to wakeup in 500us
[ 2914.780411] ath: phy77: Failed to wakeup in 500us
[ 2915.283332] usb 1-3: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[ 2915.531824] usb 1-3: ath9k_htc: Firmware - ath9k_htc/htc_9271-1.4.0.fw download failed
[ 2915.532206] usb 1-3: ath9k_htc: USB layer deinitialized
[ 2928.339410] ------------[ cut here ]------------
[ 2928.339505] WARNING: CPU: 1 PID: 704 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[ 2928.339506] Modules linked in: ath9k_htc ath9k_common ath9k_hw ath nfnetlink_queue nfnetlink_log nfnetlink ccm uas usb_storage rt2800usb rt2x00usb rt2800lib rt2x00lib fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo snd_soc_skl videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_soc_hdac_hda snd_hda_codec_hdmi videobuf2_common snd_hda_ext_core videodev snd_soc_skl_ipc snd_hda_codec_realtek rtsx_usb_ms memstick mc snd_soc_sst_ipc x86_pkg_temp_thermal snd_soc_sst_dsp r8169 intel_powerclamp snd_soc_acpi_intel_match snd_soc_acpi coretemp snd_soc_core kvm_intel snd_hda_codec_generic ledtrig_audio realtek snd_compress rtl8821ae ac97_bus kvm libphy irqbypass snd_pcm_dmaengine ipmi_devintf btcoexist ipmi_msghandler crct10dif_pclmul crc32_pclmul i915 rtl_pci rtlwifi mac80211 ghash_clmulni_intel joydev cfg80211 mousedev aesni_intel mei_hdcp libarc4 iTCO_wdt aes_x86_64 i2c_hid crypto_simd snd_hda_intel i2c_algo_bit cryptd asus_nb_wmi iTCO_vendor_support
.....

At this time xhci crashed TP-LINK TL722WN v1.
And that device worked, before...
xhci is still running completely unstable and the delivered warnings are unpredictable.
Comment 63 Vinicius 2019-10-01 02:47:09 UTC
Just providing some information that could be helpful to debug the issue. It is also affecting me.

Kernel version:
# uname -a
Linux arch 5.3.1-arch1-1-ARCH #1 SMP PREEMPT Sat Sep 21 11:33:49 UTC 2019 x86_64 GNU/Linux

Hardware specs: AMD Ryzen 5 2400G

The issue happens when I plug in an Alfa AWUS036NH (148f:3070 Ralink Technology, Corp. RT2870/RT3070) - It uses the module rt2800usb

Below you can find my dmesg output when I plug in the Alfa device:

---
[ 1130.410091] usb 1-10: new high-speed USB device number 5 using xhci_hcd
[ 1130.653103] usb 1-10: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[ 1130.653108] usb 1-10: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1130.653111] usb 1-10: Product: 802.11 n WLAN
[ 1130.653113] usb 1-10: Manufacturer: Ralink
[ 1130.653114] usb 1-10: SerialNumber: 1.0
[ 1130.864470] usb 1-10: reset high-speed USB device number 5 using xhci_hcd
[ 1131.110058] ieee80211 phy1: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[ 1131.788103] ieee80211 phy1: rt2x00_set_rf: Info - RF chipset 0005 detected
[ 1131.794331] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[ 1131.833798] rt2800usb 1-10:1.0 wlp1s0f0u10: renamed from wlan0
[ 1131.834234] audit: type=1130 audit(1569896348.109:56): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 1131.867763] ieee80211 phy1: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 1131.867797] ieee80211 phy1: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[ 1136.117228] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 1136.840084] audit: type=1131 audit(1569896353.117:57): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
---


I don't know if this is useful, but I do have another USB WiFi that uses another module but doesn't trigger the issue when I plug in:
lsusb output: 2357:010c TP-Link TL-WN722N v2

Below is the dmesg output when I plug in the TP-LINK:

---
[ 1697.619576] usb 1-7: new high-speed USB device number 9 using xhci_hcd
[ 1697.846601] usb 1-7: New USB device found, idVendor=2357, idProduct=010c, bcdDevice= 0.00
[ 1697.846603] usb 1-7: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1697.846605] usb 1-7: Product: 802.11n NIC
[ 1697.846606] usb 1-7: Manufacturer: Realtek
[ 1697.846607] usb 1-7: SerialNumber: 00E04C0001
[ 1697.858603] Chip Version Info: CHIP_8188E_Normal_Chip_TSMC_D_CUT_1T1R_RomVer(0)
[ 1698.262531] r8188eu 1-7:1.0 wlp1s0f0u7: renamed from wlan0
[ 1711.847379] MAC Address = c0:25:e9:1f:5c:3c
[ 1712.075372] R8188EU: indicate disassoc
---

Additionally, I see the warning when I plug in a Samsung Galaxy S5 device, but the warning appears only when I select certain "USB modes" in Android. Below you can see the dmesg log for each one of the USB modes:

--- dmesg log for "No data transfer" USB mode ---
[ 2523.666729] usb 1-7: USB disconnect, device number 32
[ 2524.157919] usb 1-7: new high-speed USB device number 33 using xhci_hcd
[ 2524.402980] usb 1-7: New USB device found, idVendor=18d1, idProduct=4ee7, bcdDevice= 2.32
[ 2524.402985] usb 1-7: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2524.402987] usb 1-7: Product: SAMSUNG_Android
[ 2524.402989] usb 1-7: Manufacturer: SAMSUNG
[ 2524.402991] usb 1-7: SerialNumber: 19a6e4d5


--- dmesg log for "PTP" USB mode ---
[ 2529.713679] usb 1-7: USB disconnect, device number 33
[ 2529.714018] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 2530.207346] usb 1-7: new high-speed USB device number 34 using xhci_hcd
[ 2530.452843] usb 1-7: New USB device found, idVendor=04e8, idProduct=6866, bcdDevice= 2.32
[ 2530.452847] usb 1-7: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2530.452850] usb 1-7: Product: SAMSUNG_Android
[ 2530.452852] usb 1-7: Manufacturer: SAMSUNG
[ 2530.452854] usb 1-7: SerialNumber: 19a6e4d5


--- dmesg log for "USB Tethering" USB mode ---
[ 2533.697810] usb 1-7: USB disconnect, device number 34
[ 2533.698156] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 2534.183649] usb 1-7: new high-speed USB device number 35 using xhci_hcd
[ 2534.430728] usb 1-7: New USB device found, idVendor=04e8, idProduct=6864, bcdDevice= 2.32
[ 2534.430730] usb 1-7: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2534.430732] usb 1-7: Product: SAMSUNG_Android
[ 2534.430733] usb 1-7: Manufacturer: SAMSUNG
[ 2534.430734] usb 1-7: SerialNumber: 19a6e4d5
[ 2534.464706] rndis_host 1-7:1.0: rndis media connect
[ 2534.510921] rndis_host 1-7:1.0 usb0: register 'rndis_host' at usb-0000:01:00.0-7, RNDIS device, 6e:36:5a:07:cd:06
[ 2534.552158] rndis_host 1-7:1.0 enp1s0f0u7: renamed from usb0
[ 2534.641284] audit: type=1130 audit(1569897751.028:73): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'


--- dmesg log for "File Transfer" USB mode ---
[ 2536.996343] usb 1-7: USB disconnect, device number 35
[ 2536.996703] rndis_host 1-7:1.0 enp1s0f0u7: unregister 'rndis_host' usb-0000:01:00.0-7, RNDIS device
[ 2536.997131] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 2537.014124] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[ 2537.500061] usb 1-7: new high-speed USB device number 36 using xhci_hcd
[ 2537.754684] usb 1-7: New USB device found, idVendor=04e8, idProduct=6860, bcdDevice= 4.00
[ 2537.754688] usb 1-7: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2537.754690] usb 1-7: Product: SAMSUNG_Android
[ 2537.754691] usb 1-7: Manufacturer: SAMSUNG
[ 2537.754692] usb 1-7: SerialNumber: 19a6e4d5



I hope these logs can help you to debug the issue.
Comment 64 Bernhard 2019-10-01 12:55:04 UTC
@Vinicius
Which motherboard do you have? 

Maybe the issue is related to 300-series motherboards...
Comment 65 Vinicius 2019-10-02 00:36:58 UTC
My motherboard is a Biostar B350GT3.
Comment 66 Bernhard 2019-10-03 12:14:08 UTC
I've sent another mail to the kernel usb mailing list, this time I got a response. I sent them kernel debugging logs/traces from xhci, unfortunately I have one of the devices where the error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state." doesn't get shown anymore, which makes it harder to find the cause for the problem.

@Michael
Could you do the following steps, upload the dmesg log and trace file somewhere and post the link to the files here(or send them directly to the mailing list yourself, if you prefer that)? When using one of the devices where the error gets shown obviously.


1. start the PC with an affected kernel, but without the affected device plugged in, then run the following commands as root
2. mount -t debugfs none /sys/kernel/debug
3. echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
4. echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
5. echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
6. echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
7. Plug in the affected device
8. Send output of dmesg and the /sys/kernel/debug/tracing/trace file(upload them somewhere, especially the trace file will be big)

Thanks in advance
Comment 67 Michael 2019-10-03 15:08:46 UTC
Here it goes:
https://www.sendspace.com/file/413hlj

ALFA AWUS036NH connected to USB 3.x port running stress test using hcxdumptool.
Comment 68 Michael 2019-10-05 12:07:06 UTC
If the error occurred once, xhci will be unusable for all other devices:

[20480.414467] usb 1-2: new full-speed USB device number 6 using xhci_hcd
[20480.717690] usb 1-2: New USB device found, idVendor=1546, idProduct=01a7, bcdDevice= 1.00
[20480.717695] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[20480.717698] usb 1-2: Product: u-blox 7 - GPS/GNSS Receiver
[20480.717700] usb 1-2: Manufacturer: u-blox AG - www.u-blox.com
[20480.726485] cdc_acm 1-2:1.0: ttyACM0: USB ACM device
[20480.760327] audit: type=1130 audit(1570274963.323:75): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=gpsdctl@ttyACM0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20486.732259] usb 1-2: USB disconnect, device number 6
[20486.746846] audit: type=1131 audit(1570274969.310:76): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=gpsdctl@ttyACM0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20487.027593] usb 1-2: new full-speed USB device number 7 using xhci_hcd
[20487.244298] usb 1-2: device descriptor read/64, error -71
[20487.540954] usb 1-2: device descriptor read/64, error -71
[20487.837571] usb 1-2: new full-speed USB device number 8 using xhci_hcd
[20487.991378] usb 1-2: device descriptor read/64, error -71
[20488.287616] usb 1-2: device descriptor read/64, error -71
[20488.394301] usb usb1-port2: attempt power cycle
[20489.037910] usb 1-2: new full-speed USB device number 9 using xhci_hcd
[20489.065424] usb 1-2: Device not responding to setup address.
[20489.271605] usb 1-2: Device not responding to setup address.
[20489.477900] usb 1-2: device not accepting address 9, error -71
Comment 69 Bernhard 2019-10-13 01:45:31 UTC
Good News!
After reading a bit in the xhci spec sheet I've figured out what the problem is. I've already created a patch and sent it to the mailing list, so it will hopefully be fixed in 5.4.

If you want to see or try the patch, you can find it here: https://marc.info/?l=linux-usb&m=157092844415047
Comment 70 Bernhard 2019-10-14 10:16:58 UTC
Nevermind, I've misunderstood something in the xhci spec sheet, apparently the xhci slot id isn't the same as the "TT Hub slot id".
Comment 71 Mathias Nyman 2019-10-15 14:50:55 UTC
Created attachment 285501 [details]
Patch adding doorbell tracing

Patch that adds even more tracing, this will show if xhci driver
correctly rings endpoint doorbell to start endpoint after soft retry
Comment 72 Bernhard 2019-10-15 18:25:33 UTC
Created attachment 285505 [details]
Dmesg log and trace file

Not sure how useful the logs from my device are, because the error "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" only gets shown after unplugging the device, but it looks like the error messages are mostly the same.

There are still some differences compared to Michaels device though:
Dmesg from him(the one I've also sent to the mailing list):
[   96.789306] xhci_hcd 0000:03:00.0: Resetting device with slot ID 4
[   96.789313] xhci_hcd 0000:03:00.0: // Ding dong!
[   96.791053] xhci_hcd 0000:03:00.0: Completed reset device command.
[   96.791111] xhci_hcd 0000:03:00.0: Successful reset device command.

compared to mine:
[   91.777887] xhci_hcd 0000:15:00.0: Resetting device with slot ID 4
[   91.777892] xhci_hcd 0000:15:00.0: // Ding dong!
[   91.777940] xhci_hcd 0000:15:00.0: Completed reset device command.
[   91.777950] xhci_hcd 0000:15:00.0: Can't reset device (slot ID 4) in default state
[   91.777951] xhci_hcd 0000:15:00.0: Not freeing device rings.
[   91.777956] xhci_hcd 0000:15:00.0: // Ding dong!
Comment 73 Michael 2019-10-16 09:12:30 UTC
Some times the warning doesn't appear. Instead the the driver crashed:
$ dmidecode
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: X555UB

$ cat /proc/cpuinfo
model name	: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz

device connected to USB 3:

[10799.155340] usb 1-2: reset high-speed USB device number 12 using xhci_hcd
[10799.310446] ieee80211 phy5: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[10799.364982] ieee80211 phy5: rt2x00_set_rf: Info - RF chipset 0005 detected
[10799.365842] ieee80211 phy5: Selected rate control algorithm 'minstrel_ht'
[10799.412456] rt2800usb 1-2:1.0 wlp0s20f0u2: renamed from wlan0
[10799.432236] ieee80211 phy5: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[10799.432263] ieee80211 phy5: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[10799.728051] ieee80211 phy5: rt2x00usb_vendor_request: Error - Vendor Request 0x06 failed for offset 0x0404 with error -71
[10800.745185] ieee80211 phy5: rt2800_wait_csr_ready: Error - Unstable hardware
[10800.745197] ieee80211 phy5: rt2800usb_set_device_state: Error - Device failed to enter state 4 (-5)
...
[11237.887923] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[11237.887929] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.

xhci is unstable - not the hardware.


The same device, connected to the same notebook, but to a USB 2 port:
[11243.042957] usb 1-3: reset high-speed USB device number 13 using xhci_hcd
[11243.197261] ieee80211 phy6: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[11243.251969] ieee80211 phy6: rt2x00_set_rf: Info - RF chipset 0005 detected
[11243.253036] ieee80211 phy6: Selected rate control algorithm 'minstrel_ht'
[11243.272919] rt2800usb 1-3:1.0 wlp0s20f0u3: renamed from wlan0
[11243.293056] ieee80211 phy6: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[11243.293082] ieee80211 phy6: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
Comment 74 Bernhard 2019-10-16 10:58:18 UTC
@Michael 
Could you apply the patch from Mathias(comment 71) to the kernel, enable xhci tracing(steps in comment 66), and upload the dmesg and trace file?
The patch adds more tracing which will make it easier to find the exact issue.
Comment 75 Mathias Nyman 2019-10-17 15:01:06 UTC
@Bernhard

Logs with added tracing show that driver does ring the endpoint doorbell, so
host controller should start processing the pending requests. Endpoint is
in stopped state as it should after endpoint reset, before we ring the doorbell.

So this part looks like hardware isn't doing its part.

when class driver starts cancelling transfer requests after some timeout time, we can see that the endpoint is in halted state. Host controller didn't issue any
event when endpoint turned into halted state. so driver is unaware of this state.

There is also a bug in the driver how the error is handled later. After the timeout, when class driver starts cancelling transfers, and xhci driver tries to stop the endpoint to cancel tranfers, it sohuld react to the context state error,
and check endpoint state, and handle the halted endpoint p
Driver should react to this, it should detect and handle the halted endpoint before attempting to set a new dequeue pointer. Now it just bluntly tries to set
a new dequeue pointer, and fails.

Details:
* We get a transaction error event, for transfer request (TRB) at 0xf61a0000

96.985254: xhci_handle_event: EVENT: TRB 00000000f61a0000 status 'USB Transaction Error' len 3860 slot 4 ep 3 type 'Transfer Event' flags e:C
96.985262: xhci_handle_transfer: BULK: Buffer 00000000ff32b04c length 3860 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:C

* We issue a Reset endpoint command to resolve the halted endpoint
 (move endpoint from halted to stopped state)

96.985264: xhci_queue_trb: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
96.985265: xhci_inc_enq: CMD 0000000090dd7572: enq 0x00000000fff7e550(0x00000000fff7e000) deq 0x00000000fff7e540(0x00000000fff7e000) segs 1 
96.985266: xhci_ring_host_doorbell: Ring doorbell for Command Ring 0 
96.985268: xhci_inc_deq: EVENT 000000005715d3fc: enq 0x00000000fff7c000(0x00000000fff7c000) deq 0x00000000fff7c4a0(0x00000000fff7c000) segs 1 

* Reset endpoint command successfully, endpoint state is now "stopped"

96.985395: xhci_handle_event: EVENT: TRB 00000000fff7e540 status 'Success' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:C
96.985396: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 3 flags C
96.985397: xhci_handle_cmd_reset_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst 0 maxp 512 deq 00000000f61a0001 avg \
trb len 0

* We ring the doorbell, xHC hardware should start processing events on ring,

96.985402: xhci_ring_ep_doorbell: Ring doorbell for Slot 4 ep1in 

* but nothing happends, this endpoint i silent until class driver starts cancelling Transfers ~25 seconds later

122.813121: xhci_urb_dequeue: ep1in-bulk: urb 00000000790ce3f7 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200
122.813134: xhci_dbg_cancel_urb: Cancel URB 00000000790ce3f7, dev 4, ep 0x81, starting at offset 0xf61a07f0

* stop the endpoint to cancel the pending transfers
 
122.813137: xhci_queue_trb: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
122.813137: xhci_inc_enq: CMD 0000000090dd7572: enq 0x00000000fff7e560(0x00000000fff7e000) deq 0x00000000fff7e550(0x00000000fff7e000) segs 1 
122.813138: xhci_ring_host_doorbell: Ring doorbell for Command Ring 0 
122.813141: xhci_urb_dequeue: ep1in-bulk: urb 000000009d35e579 pipe 3221259648 slot 4 length 0/3860 sgs 0/0 stream 0 flags 00010200

* Stop endpoit fails with "Context State Error"
 Endpoint is halted, not stopped, but driver doesn't handle this    

122.813605: xhci_handle_event: EVENT: TRB 00000000fff7e550 status 'Context State Error' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:c
122.813606: xhci_handle_command: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
122.813606: xhci_handle_cmd_stop_ep: State halted mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst 0 maxp 512 deq 00000000f61a0001

* Driver tries to queue new dequeue pointer anyway

122.813605: xhci_handle_event: EVENT: TRB 00000000fff7e550 status 'Context State Error' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:c
122.813606: xhci_handle_command: CMD: Stop Ring Command: slot 4 sp 0 ep 3 flags C
122.813606: xhci_handle_cmd_stop_ep: State halted mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst 0 maxp 512 deq 00000000f61a0001

* setting new dequeue pointer fails, endpoint is still halted

122.813820: xhci_handle_event: EVENT: TRB 00000000fff7e560 status 'Context State Error' len 0 slot 4 ep 0 type 'Command Completion Event' flags e:c
122.813820: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 00000000f61a0011 stream 0 slot 4 ep 3 flags C
122.813821: xhci_handle_cmd_set_deq: RS 00000 high-speed Ctx Entries 12 MEL 0 us Port# 8/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 4 State configured
122.813821: xhci_handle_cmd_set_deq_ep: State halted mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN burst 0 maxp 512 deq 00000000f61a0001
Comment 76 Mathias Nyman 2019-10-17 15:10:39 UTC
you could try to flush the endpoint ringing PCI write, and see if it helps
starting the endpint, but I don't have high hopes for this, a PCI write should
be flushed anyway, especially in 25 seconds.

maybe also add trace to re-read the endpoint state after flushing pci write:
(untested)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e74518e7de6a..20e209b64551 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -408,6 +408,7 @@ void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
        trace_xhci_ring_ep_doorbell(slot_id, DB_VALUE(ep_index, stream_id));
 
        writel(DB_VALUE(ep_index, stream_id), db_addr);
+       readl(db_addr);
        /* The CPU has better things to do at this point than wait for a
         * write-posting flush.  It'll get there soon enough.
         */
@@ -1176,6 +1177,8 @@ static void xhci_handle_cmd_reset_ep(struct xhci_hcd *xhci, int slot_id,
        /* if this was a soft reset, then restart */
        if ((le32_to_cpu(trb->generic.field[3])) & TRB_TSP)
                ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+
+       trace_xhci_handle_cmd_reset_ep(ep_ctx);
 }
Comment 77 Bernhard 2019-10-17 19:27:14 UTC
Created attachment 285527 [details]
Logs after flushing endpoint

I've applied the patch, but it seems like the endpoint doesn't get started even after flushing the endpoint.
Comment 78 Michael 2019-10-18 06:37:57 UTC
@Bernhard, can't do further going tests at the moment, because I'm on vacation until November.
Comment 79 Mathias Nyman 2019-10-29 16:17:25 UTC
Created attachment 285709 [details]
Patch handling halted endpoints at completion of stop endpoint command

Patch to handle a context state error at stop endpoint completion
where a endpoint TRB processing had a error/stall, and hardware halted the
endpoint just before completing normal stop endpoint command.

This won't fix the initial issue about endpoint not restarting after
soft retry, but it should resolve the flood of "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" messages

Code is completely untested as I can't trigger this codepath manually.
It requires hardware halting a endpoint just before completing a stop
endpoint command
Comment 80 Bernhard 2019-10-30 03:22:06 UTC
Created attachment 285713 [details]
Logs after applying the patch

After applying the patch the "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" messages are indeed gone, and the issue is (as expected) still there.
Comment 81 Mathias Nyman 2019-10-30 09:28:39 UTC
(In reply to Bernhard from comment #80)
> Created attachment 285713 [details]
> Logs after applying the patch

Did you by mistake attach some old logs?
Comment 82 Bernhard 2019-10-30 12:59:58 UTC
Created attachment 285717 [details]
Logs after applying the patch

Yes, looks like I've uploaded the zip file from the wrong folder. The new file should be the right one.
Comment 83 Michael 2019-10-30 18:42:43 UTC
The "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" doesn't flood the log file. The message appear only if the device is disconnected (after xhci died):

Connected the device:
[42407.193511] usb 1-2: ath9k_htc: USB layer deinitialized
[42410.956671] usb 1-2: new high-speed USB device number 9 using xhci_hcd
[42411.214091] usb 1-2: New USB device found, idVendor=0cf3, idProduct=9271, bcdDevice= 1.08
[42411.214095] usb 1-2: New USB device strings: Mfr=16, Product=32, SerialNumber=48
[42411.214098] usb 1-2: Product: USB2.0 WLAN
[42411.214100] usb 1-2: Manufacturer: ATHEROS
[42411.214102] usb 1-2: SerialNumber: 12345
[42411.232116] usb 1-2: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[42412.308181] usb 1-2: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[42412.558320] ath9k_htc 1-2:1.0: ath9k_htc: HTC initialized with 33 credits
[42412.784721] ath9k_htc 1-2:1.0: ath9k_htc: FW Version: 1.4
[42412.784724] ath9k_htc 1-2:1.0: FW RMW support: On
[42412.784726] ath: EEPROM regdomain: 0x809c
[42412.784727] ath: EEPROM indicates we should expect a country code
[42412.784728] ath: doing EEPROM country->regdmn map search
[42412.784729] ath: country maps to regdmn code: 0x52
[42412.784730] ath: Country alpha2 being used: CN
[42412.784731] ath: Regpair used: 0x52
[42412.788460] ieee80211 phy2: Atheros AR9271 Rev:1
[42412.791852] ath9k_htc 1-2:1.0 wlp3s0f0u2: renamed from wlan0

and everything is looking fine.

after running the device for a few minutes
[42445.806367] device wlp3s0f0u2 entered promiscuous mode

we receive the first indication that xhci died
[42911.706734] ath: phy2: Unable to set channel

and the device stops working. There are absolutely no other error messages, shwon by dmesg or the running application (in this case hcxdumptool).

Now we disconnect the device and got the final warning:
[43082.759737] usb 1-2: USB disconnect, device number 9
[43082.760434] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[43082.760607] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[43082.764275] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
[43082.784722] device wlp3s0f0u2 left promiscuous mode

At this point xhci is dead. No other device connected to the same port is working.
Comment 84 Mathias Nyman 2019-10-31 09:02:56 UTC
(In reply to Michael from comment #83)
> The "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
> doesn't flood the log file. The message appear only if the device is
> disconnected (after xhci died):
> 

Could you take full logs and traces of this:

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
< Trigger the issue >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace

In Bernhards case there were issues both with hardware not starting the
ring after soft retry, and software not handling context state error when stopping an endpoint. Second issue can be fixed in driver.
Comment 85 Michael 2019-10-31 09:35:59 UTC
I try to trigger it. That isn't so easy, because different devices showing different behavior and the occurrence of the issue is totally random. Sometimes it happens immediately after connecting the device and sometimes it happens after a while or heavy stressing the device.

BTW:
mount -t debugfs none /sys/kernel/debug
is done by default here.
Comment 86 Michael 2019-10-31 09:55:57 UTC
I'm doing several runs, using different devices. So we have the chance to compare them against each other.
Comment 87 Michael 2019-10-31 10:16:24 UTC
Here go.
https://www.sendspace.com/file/8ybhnk

Unfortunately it looks like this stress test was to heavy for dmesg's ringbuffer.
Comment 88 Michael 2019-11-01 13:17:38 UTC
After several tests, I assume that this warning:
"rt2x00queue_flush_queue: Warning - Queue 14 failed to flush"
is also related to the xhci issue. I don't think that the issue is related to powermanagement (https://bugzilla.kernel.org/show_bug.cgi?id=61621), because pwrmgt is disabled, here.


affected: rt2800usb
[ 7384.825764] usb 1-2: new high-speed USB device number 8 using xhci_hcd
[ 7385.069208] usb 1-2: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[ 7385.069211] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 7385.069214] usb 1-2: Product: 802.11 n WLAN
[ 7385.069216] usb 1-2: Manufacturer: Ralink
[ 7385.069217] usb 1-2: SerialNumber: 1.0
[ 7385.280539] usb 1-2: reset high-speed USB device number 8 using xhci_hcd
[ 7385.526260] ieee80211 phy3: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[ 7386.204480] ieee80211 phy3: rt2x00_set_rf: Info - RF chipset 0005 detected
[ 7386.210679] ieee80211 phy3: Selected rate control algorithm 'minstrel_ht'
[ 7386.227147] rt2800usb 1-2:1.0 wlp3s0f0u2: renamed from wlan0
[ 7386.227812] audit: type=1130 audit(1572610437.724:150): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 7387.737404] ieee80211 phy3: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 7387.737440] ieee80211 phy3: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
Comment 89 Michael 2019-11-02 07:55:40 UTC
The bad thing on this issue is that it isn't detectable by an application, while the device is plugged in. The device doesn't start or stops working without any warning. The application says every thing is fine and dmesg showing absolutely no warning.
Only when the device is plugged out, we get a bunch of warnings, depending on the device (tested on INTEL and AMD systems):
"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
"rt2x00queue_flush_queue: Warning - Queue 14 failed to flush"
"rx urb failed: -71"
"A Set TR Deq Ptr command is pending."
and more (bad cable, hardware error, ....).


BTW:
I'm running kernel 4.19.80 in parallel and every thing is fine here. This issue appeared for the first time on 4.20.
Comment 90 Mathias Nyman 2019-11-04 09:36:02 UTC
Seems that it was a known issue that xHCI on AMD platforms can fail to restart an endpoint if it wasn't running when the stop command was issued. This also applies to Berhards case where the endpoint stop command raced with an error halting the endpoint.
See patch:

commit 28a2369f7d72ece55089f33e7d7b9c1223673cc3
Author: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Date:   Thu Jul 20 14:48:28 2017 +0300

    usb: xhci: Issue stop EP command only when the EP state is running
    
    on AMD platforms with SNPS 3.1 USB controller if stop endpoint command is
    issued the controller does not respond, when the EP is not in running
    state. HW completes the command execution and reports
    "Context State Error" completion code. This is as per the spec. However
    HW on receiving the second command additionally marks EP to Flow control
    state in HW which is RTL bug. This bug causes the HW not to respond
    to any further doorbells that are rung by the driver. This makes the EP
    to not functional anymore and causes gross functional failures.
    
    As a workaround, not to hit this problem, it's better to check the EP state
    and issue a stop EP command only when the EP is in running state.
    
    As a sidenote, even with this patch there is still a possibility of
    triggering the RTL bug if the context state races with the stop endpoint
    command as described in xHCI spec 4.6.9
    
    [code simplification and reworded sidenote in commit message -Mathias]
    Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
    Signed-off-by: Nehal Shah <Nehal-bakulchandra.Shah@amd.com>
    Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Does anybody have a link to that errata?
Comment 91 Michael 2019-11-04 09:42:59 UTC
I can't confirm that, because this issue happens on all platforms if the device is connected to an USB 3 port:
RYZEN 1700, MSI X370 KRAIT
INTEL I5-6200U, ASUS X555U (notebook)
INTEL i7-3930K, ASUS P9X79
Comment 92 Michael 2019-11-04 09:52:22 UTC
The only systems which are running without this issue are my Raspberry Pi's:
$ uname -r
4.19.80-2-ARCH
Comment 93 Michael 2019-11-05 10:56:05 UTC
Generating a lot of traffic on the socket, causes xhci to die very early.

Here it happened on an AMD RYZEN system, running hcxdumptool:
[ 8316.184018] device wlp3s0f0u2 entered promiscuous mode
[ 8372.392206] ath: phy0: Unable to remove monitor interface at idx: 0
[ 8374.525500] ath: phy0: Unable to remove station entry for monitor mode
[ 8381.692889] usb 1-2: USB disconnect, device number 5
[ 8381.693576] xhci_hcd 0000:03:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or 

and here on an INTEL notebook running NetworkManager:
[  166.174157] usb 1-1: new high-speed USB device number 8 using xhci_hcd
[  166.330703] usb 1-1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[  166.330713] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  166.330719] usb 1-1: Product: WiFi
[  166.330725] usb 1-1: Manufacturer: MediaTek
[  166.330729] usb 1-1: SerialNumber: 1.0
[  166.458249] usb 1-1: reset high-speed USB device number 8 using xhci_hcd
[  166.607874] usb 1-1: ASIC revision: 76100002 MAC revision: 76502000
[  167.669762] usb 1-1: EEPROM ver:02 fae:01
[  203.846465] mt76u_complete_rx: 13 callbacks suppressed
[  203.846479] usb 1-1: rx urb failed: -71
[  203.846552] usb 1-1: rx urb failed: -71
[  203.846614] usb 1-1: rx urb failed: -71
[  203.846667] usb 1-1: rx urb failed: -71
[  203.846712] usb 1-1: rx urb failed: -71
[  203.846799] usb 1-1: rx urb failed: -71
[  203.846874] usb 1-1: rx urb failed: -71
[  203.846924] usb 1-1: rx urb failed: -71
[  203.846998] usb 1-1: rx urb failed: -71
[  203.847069] usb 1-1: rx urb failed: -71
[  203.848249] usb 1-1: USB disconnect, device number 8
[  203.850032] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[  203.850040] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
Comment 94 Michael 2019-11-06 13:02:11 UTC
Running really heavy traffic, first xhci caused the driver to crash, than the whole system crashed:
System: ASUS X555UB (INTEL)

[ 1564.588784] mt7601u 1-2:1.0: Error: TSSI upper saturation
[ 1614.221860] ------------[ cut here ]------------
[ 1614.221923] WARNING: CPU: 1 PID: 0 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[ 1614.221924] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink uas usb_storage ccm mt7601u hid_generic usbhid fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev snd_soc_skl x86_pkg_temp_thermal intel_powerclamp snd_soc_hdac_hda coretemp mc kvm_intel snd_hda_ext_core rtl8821ae snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi kvm btcoexist snd_soc_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev ledtrig_audio snd_compress mousedev ac97_bus snd_pcm_dmaengine irqbypass rtsx_usb_ms rtl_pci r8169 memstick rtlwifi i915 btusb mac80211 btrtl ipmi_devintf realtek ipmi_msghandler libphy i2c_algo_bit cfg80211 btbcm crct10dif_pclmul drm_kms_helper crc32_pclmul btintel ghash_clmulni_intel snd_hda_intel drm bluetooth snd_hda_codec libarc4 aesni_intel asus_nb_wmi snd_hda_core asus_wmi intel_gtt aes_x86_64
[ 1614.221947]  intel_rapl_msr agpgart ecdh_generic crypto_simd sparse_keymap i2c_hid cryptd rfkill iTCO_wdt mei_hdcp hid snd_hwdep glue_helper syscopyarea ecc sysfillrect iTCO_vendor_support sysimgblt fb_sys_fops snd_pcm pcspkr intel_cstate intel_uncore mxm_wmi intel_rapl_perf input_leds elan_i2c tpm_crb snd_timer tpm_tis snd tpm_tis_core tpm int3403_thermal soundcore intel_xhci_usb_role_switch evdev i2c_i801 roles processor_thermal_device mei_me mei rng_core idma64 mac_hid intel_lpss_pci intel_lpss intel_rapl_common int340x_thermal_zone intel_soc_dts_iosf intel_pch_thermal int3400_thermal acpi_thermal_rel asus_wireless wmi ac battery sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_usb_sdmmc mmc_core rtsx_usb sr_mod cdrom sd_mod serio_raw atkbd ahci libps2 libahci libata xhci_pci crc32c_intel scsi_mod xhci_hcd i8042 serio
[ 1614.221975] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P        W  OE     5.3.8-arch1-1 #1
[ 1614.221976] Hardware name: ASUSTeK COMPUTER INC. X555UB/X555UB, BIOS X555UB.301 02/20/2017
[ 1614.221993] RIP: 0010:ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[ 1614.221994] Code: 38 48 81 c1 70 04 00 00 48 81 c6 38 01 00 00 e8 0a 40 a1 d1 b8 01 00 00 00 e9 26 4b fb ff 48 c7 c7 60 7b c1 c0 e8 b7 53 4f d1 <0f> 0b 48 89 ef e8 7f 28 b4 d1 e9 d1 5b fb ff 48 c7 c7 60 7b c1 c0
[ 1614.221995] RSP: 0018:ffffa50840120e10 EFLAGS: 00010246
[ 1614.221996] RAX: 0000000000000024 RBX: ffff92206bc407a0 RCX: 0000000000000000
[ 1614.221997] RDX: 0000000000000000 RSI: ffff92207ba97708 RDI: 00000000ffffffff
[ 1614.221998] RBP: ffff922034510400 R08: 0000000000001137 R09: 0000000000000001
[ 1614.221998] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 1614.221999] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000000
[ 1614.222000] FS:  0000000000000000(0000) GS:ffff92207ba80000(0000) knlGS:0000000000000000
[ 1614.222000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1614.222001] CR2: 00007f87cc12e000 CR3: 00000001d180a001 CR4: 00000000003606e0
[ 1614.222002] Call Trace:
[ 1614.222005]  <IRQ>
[ 1614.222010]  ? __mod_zone_page_state+0x66/0xa0
[ 1614.222014]  ? kmem_cache_free_bulk+0x2e1/0x450
[ 1614.222025]  ieee80211_tasklet_handler+0xbc/0xd0 [mac80211]
[ 1614.222028]  tasklet_action_common.isra.0+0x4a/0xb0
[ 1614.222031]  __do_softirq+0x114/0x332
[ 1614.222033]  irq_exit+0xd4/0xf0
[ 1614.222034]  do_IRQ+0xbb/0x130
[ 1614.222037]  common_interrupt+0xf/0xf
[ 1614.222038]  </IRQ>
[ 1614.222041] RIP: 0010:cpuidle_enter_state+0xc4/0x480
[ 1614.222042] Code: e8 41 d6 9b ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 93 03 00 00 31 ff e8 a3 6b a2 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 be 02 00 00 49 63 cc 4c 2b 6c 24 10 48 8d 04 49 48
[ 1614.222042] RSP: 0018:ffffa508400cbe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[ 1614.222044] RAX: ffff92207ba80000 RBX: ffffffff932bcd60 RCX: 000000000000001f
[ 1614.222044] RDX: 0000000000000000 RSI: 00000000355556ca RDI: 0000000000000000
[ 1614.222045] RBP: ffff92207bab4200 R08: 00000177d71e2c19 R09: 0000000000001373
[ 1614.222046] R10: ffff92207baa9784 R11: ffff92207baa9764 R12: 0000000000000004
[ 1614.222046] R13: 00000177d71e2c19 R14: 0000000000000004 R15: ffff92207a148000
[ 1614.222049]  ? cpuidle_enter_state+0x9f/0x480
[ 1614.222051]  cpuidle_enter+0x29/0x40
[ 1614.222053]  do_idle+0x1de/0x260
[ 1614.222055]  cpu_startup_entry+0x19/0x20
[ 1614.222058]  start_secondary+0x185/0x1d0
[ 1614.222060]  secondary_startup_64+0xa4/0xb0
[ 1614.222062] ---[ end trace dee13b04de319021 ]---
[ 1614.222429] ------------[ cut here ]------------
Comment 95 Mathias Nyman 2019-11-08 14:43:07 UTC
Michael, I've been looking at the traces and can't find anything xhci related in your logs that could cause this. xhci isn't dying, crashig or causing other drivers to crash in the above logs either. It doesn't seem related to Bernhards case. 

Have you tried bisecting what patch causes the problems between 4.19 and 4.20 kernels?

The "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is related to unplugging of the device. In short, while unplugging the device we get a transaction error for each running endpoint before the hub thread notices the disconnect, so xhci driver tries to recover the endpoint before everything is tore down and returned for the device. It's should be harmless at this stage.

There are several disconnect events initiated by device, or then actual physical
disconnect, could be related to firmware loading?

Traces also show many bulk-in urbs being queued but none completed until cancel at disconnect. so we are waiting 49 seconds to get data from the device before disconnect.

URB b2383f4 TRB is queued from ep4in, waiting for data from device:

  13714.468994: xhci_urb_enqueue: ep4in-bulk: urb 000000000b2383f4 pipe 3221360512 slot 14 length 0/4096 sgs 1/1 stream 0 flags 00040200
  13714.468996: xhci_queue_trb: BULK: Buffer 00000000ff5df000 length 4096 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:I:e:c
  13714.468996: xhci_inc_enq: BULK 0000000096dfdec9: enq 0x00000000feaec010(0x00000000feaec000) deq 0x00000000feaec000(0x00000000feaec000) segs 2 stream 0 free_trbs 508 bounce 512\

49 seconds later transaction error on ep4in on disconnect:

   13763.472759: xhci_handle_event: EVENT: TRB 00000000feaec000 status 'USB Transaction Error' len 4096 slot 14 ep 9 type 'Transfer Event' flags e:c
   ...   
   13763.472787: xhci_handle_event: EVENT: TRB 000000000a000000 status 'Success' len 0 slot 0 ep 0 type 'Port Status Change Event' flags e:c
   13763.472792: xhci_handle_port_status: port-1: Powered Not-connected Disabled Link:RxDetect PortSpeed:0 Change: CSC Wake: 

After this urb b2383f4 is canceled and given back:

  13763.474221: xhci_urb_dequeue: ep4in-bulk: urb 000000000b2383f4 pipe 3221360512 slot 14 length 0/4096 sgs 1/1 stream 0 flags 00040200
  13763.474225: xhci_dbg_cancel_urb: Cancel URB 000000000b2383f4, dev 2, ep 0x84, starting at offset 0xfeaec000
   ...
   13763.474673: xhci_urb_giveback: ep4in-bulk: urb 000000000b2383f4 pipe 3221360512 slot 14 length 0/4096 sgs 1/1 stream 0 flags 00040200
Comment 96 Michael 2019-11-08 15:24:58 UTC
Mathias, it is really hard to find the cause of that issue. dmesg is showing nothing until something crashed. I'm not able to detect the cause:
https://bugzilla.kernel.org/show_bug.cgi?id=202541#c89
At this point, I know:
- the driver stops working (independent of the driver - rt2800usb as well as mt76) 
- no warning, no error message)
- the system became instable (AMD as well as INTEL)
- kernel 4.20 up to 5.3

It is very unlikely that the driver caused this, because it doesn't happen on USB2 and it happens on different drivers and different systems.

I can try to bisect to identify the patch, but that will take a while.
Comment 97 Michael 2019-11-08 15:47:33 UTC
"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" only appeared when something went wrong.
If everything's fine and I plug out the device, this warning is not shown.

Here are the results from another device
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter
running on an INTEL system.

dmesg output if everything is ok:
[14492.749187] usb 1-1: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[14492.749197] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[14492.749203] usb 1-1: Product: 802.11 n WLAN
[14492.749208] usb 1-1: Manufacturer: Ralink
[14492.749213] usb 1-1: SerialNumber: 1.0
[14492.881097] usb 1-1: reset high-speed USB device number 20 using xhci_hcd
[14493.035766] ieee80211 phy11: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[14493.090480] ieee80211 phy11: rt2x00_set_rf: Info - RF chipset 0005 detected
[14493.091489] ieee80211 phy11: Selected rate control algorithm 'minstrel_ht'
[14493.113656] rt2800usb 1-1:1.0 wlp0s20f0u1: renamed from wlan0
[14493.116525] audit: type=1130 audit(1573227592.687:137): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[14493.141430] ieee80211 phy11: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[14493.141456] ieee80211 phy11: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[14498.126056] audit: type=1131 audit(1573227597.697:138): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[14506.300174] usb 1-1: USB disconnect, device number 20
[14506.463603] audit: type=1130 audit(1573227606.037:139): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'


demsg if the device stops working and something went wrong:
[14565.489976] usb 1-1: new high-speed USB device number 21 using xhci_hcd
[14565.648114] usb 1-1: New USB device found, idVendor=148f, idProduct=3070, bcdDevice= 1.01
[14565.648124] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[14565.648130] usb 1-1: Product: 802.11 n WLAN
[14565.648135] usb 1-1: Manufacturer: Ralink
[14565.648140] usb 1-1: SerialNumber: 1.0
[14565.773934] usb 1-1: reset high-speed USB device number 21 using xhci_hcd
[14565.927986] ieee80211 phy12: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
[14565.982385] ieee80211 phy12: rt2x00_set_rf: Info - RF chipset 0005 detected
[14565.983295] ieee80211 phy12: Selected rate control algorithm 'minstrel_ht'
[14566.002249] rt2800usb 1-1:1.0 wlp0s20f0u1: renamed from wlan0
[14566.004829] audit: type=1130 audit(1573227665.577:141): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[14566.018308] ieee80211 phy12: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[14566.018335] ieee80211 phy12: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
[14571.010706] audit: type=1131 audit(1573227670.584:142): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[14576.147069] ------------[ cut here ]------------
[14576.147136] WARNING: CPU: 1 PID: 0 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[14576.147137] Modules linked in: rt2800usb rt2x00usb rt2800lib rt2x00lib ath9k_htc ath9k_common ath9k_hw ath mt76x0u mt76x0_common mt76x02_usb mt76_usb mt76x02_lib mt76 ccm fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo videobuf2_vmalloc videobuf2_memops snd_soc_skl videobuf2_v4l2 snd_soc_hdac_hda videobuf2_common snd_hda_ext_core videodev snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp mc snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_hdmi snd_hda_codec_realtek rtsx_usb_ms rtl8821ae snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_codec_generic memstick ledtrig_audio r8169 x86_pkg_temp_thermal intel_powerclamp coretemp btcoexist rtl_pci joydev kvm_intel rtlwifi realtek mousedev i915 libphy btusb btrtl btbcm btintel kvm mac80211 ipmi_devintf bluetooth ipmi_msghandler i2c_algo_bit irqbypass snd_hda_intel drm_kms_helper snd_hda_codec drm cfg80211 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core asus_nb_wmi asus_wmi
[14576.147158]  aesni_intel intel_gtt libarc4 snd_hwdep intel_rapl_msr agpgart snd_pcm aes_x86_64 crypto_simd mei_hdcp syscopyarea cryptd sparse_keymap i2c_hid ecdh_generic sysfillrect glue_helper iTCO_wdt sysimgblt rfkill iTCO_vendor_support fb_sys_fops intel_cstate intel_uncore ecc intel_rapl_perf mxm_wmi snd_timer snd intel_xhci_usb_role_switch input_leds pcspkr soundcore roles elan_i2c tpm_crb evdev processor_thermal_device tpm_tis intel_rapl_common tpm_tis_core mei_me int3403_thermal tpm int340x_thermal_zone mac_hid i2c_i801 mei rng_core intel_soc_dts_iosf intel_pch_thermal idma64 intel_lpss_pci intel_lpss int3400_thermal acpi_thermal_rel wmi asus_wireless ac battery sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sr_mod cdrom sd_mod ahci serio_raw libahci atkbd libps2 libata xhci_pci crc32c_intel scsi_mod xhci_hcd i8042 serio
[14576.147187] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P        W  OE     5.3.8-arch1-1 #1
[14576.147187] Hardware name: ASUSTeK COMPUTER INC. X555UB/X555UB, BIOS X555UB.301 02/20/2017
[14576.147203] RIP: 0010:ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[14576.147205] Code: 38 48 81 c1 70 04 00 00 48 81 c6 38 01 00 00 e8 0a 40 8e f1 b8 01 00 00 00 e9 26 4b fb ff 48 c7 c7 60 7b 94 c0 e8 b7 53 3c f1 <0f> 0b 48 89 ef e8 7f 28 a1 f1 e9 d1 5b fb ff 48 c7 c7 60 7b 94 c0
[14576.147206] RSP: 0018:ffffab7a00120e10 EFLAGS: 00010246
[14576.147207] RAX: 0000000000000024 RBX: ffff9656387207a0 RCX: 0000000000000000
[14576.147207] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 00000000ffffffff
[14576.147208] RBP: ffff965615251400 R08: 0000000000000d40 R09: 0000000000000001
[14576.147209] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[14576.147209] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000000
[14576.147210] FS:  0000000000000000(0000) GS:ffff96563ba80000(0000) knlGS:0000000000000000
[14576.147211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14576.147211] CR2: 00005642ee566100 CR3: 000000019b00a001 CR4: 00000000003606e0
[14576.147212] Call Trace:
[14576.147214]  <IRQ>
[14576.147219]  ? __mod_zone_page_state+0x66/0xa0
[14576.147222]  ? _raw_spin_unlock_irqrestore+0x20/0x40
[14576.147224]  ? kmem_cache_free_bulk+0x2e1/0x450
[14576.147234]  ieee80211_tasklet_handler+0xbc/0xd0 [mac80211]
[14576.147237]  tasklet_action_common.isra.0+0x4a/0xb0
[14576.147239]  __do_softirq+0x114/0x332
[14576.147241]  irq_exit+0xd4/0xf0
[14576.147242]  do_IRQ+0xbb/0x130
[14576.147245]  common_interrupt+0xf/0xf
[14576.147246]  </IRQ>
[14576.147248] RIP: 0010:cpuidle_enter_state+0xc4/0x480
[14576.147249] Code: e8 41 d6 9b ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 93 03 00 00 31 ff e8 a3 6b a2 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 be 02 00 00 49 63 cc 4c 2b 6c 24 10 48 8d 04 49 48
[14576.147250] RSP: 0018:ffffab7a000cbe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[14576.147251] RAX: ffff96563ba80000 RBX: ffffffffb2ebcd60 RCX: 000000000000001f
[14576.147252] RDX: 0000000000000000 RSI: 00000000355556ca RDI: 0000000000000000
[14576.147252] RBP: ffff96563bab4200 R08: 00000d41c610ca50 R09: 0000000000000000
[14576.147253] R10: ffff96563baa9784 R11: ffff96563baa9764 R12: 0000000000000008
[14576.147254] R13: 00000d41c610ca50 R14: 0000000000000008 R15: ffff96563a149ec0
[14576.147256]  ? cpuidle_enter_state+0x9f/0x480
[14576.147258]  cpuidle_enter+0x29/0x40
[14576.147260]  do_idle+0x1de/0x260
[14576.147262]  cpu_startup_entry+0x19/0x20
[14576.147265]  start_secondary+0x185/0x1d0
[14576.147267]  secondary_startup_64+0xa4/0xb0
[14576.147269] ---[ end trace 6c01ee4ff02c47b6 ]---
[14576.147743] ------------[ cut here ]------------
[14576.147775] WARNING: CPU: 1 PID: 272 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[14576.147775] Modules linked in: rt2800usb rt2x00usb rt2800lib rt2x00lib ath9k_htc ath9k_common ath9k_hw ath mt76x0u mt76x0_common mt76x02_usb mt76_usb mt76x02_lib mt76 ccm fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo videobuf2_vmalloc videobuf2_memops snd_soc_skl videobuf2_v4l2 snd_soc_hdac_hda videobuf2_common snd_hda_ext_core videodev snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp mc snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_hdmi snd_hda_codec_realtek rtsx_usb_ms rtl8821ae snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_codec_generic memstick ledtrig_audio r8169 x86_pkg_temp_thermal intel_powerclamp coretemp btcoexist rtl_pci joydev kvm_intel rtlwifi realtek mousedev i915 libphy btusb btrtl btbcm btintel kvm mac80211 ipmi_devintf bluetooth ipmi_msghandler i2c_algo_bit irqbypass snd_hda_intel drm_kms_helper snd_hda_codec drm cfg80211 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core asus_nb_wmi asus_wmi
[14576.147791]  aesni_intel intel_gtt libarc4 snd_hwdep intel_rapl_msr agpgart snd_pcm aes_x86_64 crypto_simd mei_hdcp syscopyarea cryptd sparse_keymap i2c_hid ecdh_generic sysfillrect glue_helper iTCO_wdt sysimgblt rfkill iTCO_vendor_support fb_sys_fops intel_cstate intel_uncore ecc intel_rapl_perf mxm_wmi snd_timer snd intel_xhci_usb_role_switch input_leds pcspkr soundcore roles elan_i2c tpm_crb evdev processor_thermal_device tpm_tis intel_rapl_common tpm_tis_core mei_me int3403_thermal tpm int340x_thermal_zone mac_hid i2c_i801 mei rng_core intel_soc_dts_iosf intel_pch_thermal idma64 intel_lpss_pci intel_lpss int3400_thermal acpi_thermal_rel wmi asus_wireless ac battery sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sr_mod cdrom sd_mod ahci serio_raw libahci atkbd libps2 libata xhci_pci crc32c_intel scsi_mod xhci_hcd i8042 serio
[14576.147809] CPU: 1 PID: 272 Comm: systemd-journal Tainted: P        W  OE     5.3.8-arch1-1 #1
[14576.147809] Hardware name: ASUSTeK COMPUTER INC. X555UB/X555UB, BIOS X555UB.301 02/20/2017
[14576.147823] RIP: 0010:ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[14576.147824] Code: 38 48 81 c1 70 04 00 00 48 81 c6 38 01 00 00 e8 0a 40 8e f1 b8 01 00 00 00 e9 26 4b fb ff 48 c7 c7 60 7b 94 c0 e8 b7 53 3c f1 <0f> 0b 48 89 ef e8 7f 28 a1 f1 e9 d1 5b fb ff 48 c7 c7 60 7b 94 c0
[14576.147825] RSP: 0000:ffffab7a00120e10 EFLAGS: 00010246
[14576.147826] RAX: 0000000000000024 RBX: ffff9656387207a0 RCX: 0000000000000000
[14576.147827] RDX: 0000000000000000 RSI: ffff96563ba97708 RDI: 00000000ffffffff
[14576.147827] RBP: ffff965615251300 R08: 0000000000000d6c R09: 0000000000000001
[14576.147828] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[14576.147828] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000000
[14576.147829] FS:  00007f875aea1840(0000) GS:ffff96563ba80000(0000) knlGS:0000000000000000
[14576.147830] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14576.147830] CR2: 00007f875a305a58 CR3: 0000000276ed8003 CR4: 00000000003606e0
[14576.147831] Call Trace:
[14576.147832]  <IRQ>
[14576.147834]  ? __mod_zone_page_state+0x66/0xa0
[14576.147837]  ? kmem_cache_free_bulk+0x2e1/0x450
[14576.147846]  ieee80211_tasklet_handler+0xbc/0xd0 [mac80211]
[14576.147848]  tasklet_action_common.isra.0+0x4a/0xb0
[14576.147850]  __do_softirq+0x114/0x332
[14576.147851]  irq_exit+0xd4/0xf0
[14576.147853]  do_IRQ+0xbb/0x130
[14576.147855]  common_interrupt+0xf/0xf
[14576.147855]  </IRQ>
[14576.147856] RIP: 0033:0x7f875c37bc23
[14576.147858] Code: 89 df ff 15 df 0b 24 00 85 c0 78 53 4c 8b 54 24 38 49 8b 42 38 48 85 c0 0f 85 f9 00 00 00 49 89 6a 28 48 83 c0 01 49 83 c7 01 <49> 89 42 38 4d 39 fc 75 aa 48 83 7c 24 08 00 74 0d 48 8b 44 24 28
[14576.147858] RSP: 002b:00007ffe369e9c20 EFLAGS: 00010202 ORIG_RAX: ffffffffffffffda
[14576.147859] RAX: 00000000000001ef RBX: 0000556fa6ff9890 RCX: 0000000000000000
[14576.147860] RDX: 0000000000000006 RSI: 00007f875c4b43fc RDI: 00007f87585160f0
[14576.147860] RBP: 000000000150d740 R08: 0000000000001608 R09: 0000556fa6ff8350
[14576.147861] R10: 00007f875a305a20 R11: 00000000000038f0 R12: 0000000000000009
[14576.147862] R13: 00007f8759bae740 R14: 00007ffe369e9c58 R15: 0000000000000007
[14576.147863] ---[ end trace 6c01ee4ff02c47b7 ]---
[14576.147981] ------------[ cut here ]------------
[14576.148004] WARNING: CPU: 1 PID: 272 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[14576.148004] Modules linked in: rt2800usb rt2x00usb rt2800lib rt2x00lib ath9k_htc ath9k_common ath9k_hw ath mt76x0u mt76x0_common mt76x02_usb mt76_usb mt76x02_lib mt76 ccm fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo videobuf2_vmalloc videobuf2_memops snd_soc_skl videobuf2_v4l2 snd_soc_hdac_hda videobuf2_common snd_hda_ext_core videodev snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp mc snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_hdmi snd_hda_codec_realtek rtsx_usb_ms rtl8821ae snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_codec_generic memstick ledtrig_audio r8169 x86_pkg_temp_thermal intel_powerclamp coretemp btcoexist rtl_pci joydev kvm_intel rtlwifi realtek mousedev i915 libphy btusb btrtl btbcm btintel kvm mac80211 ipmi_devintf bluetooth ipmi_msghandler i2c_algo_bit irqbypass snd_hda_intel drm_kms_helper snd_hda_codec drm cfg80211 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core asus_nb_wmi asus_wmi
[14576.148018]  aesni_intel intel_gtt libarc4 snd_hwdep intel_rapl_msr agpgart snd_pcm aes_x86_64 crypto_simd mei_hdcp syscopyarea cryptd sparse_keymap i2c_hid ecdh_generic sysfillrect glue_helper iTCO_wdt sysimgblt rfkill iTCO_vendor_support fb_sys_fops intel_cstate intel_uncore ecc intel_rapl_perf mxm_wmi snd_timer snd intel_xhci_usb_role_switch input_leds pcspkr soundcore roles elan_i2c tpm_crb evdev processor_thermal_device tpm_tis intel_rapl_common tpm_tis_core mei_me int3403_thermal tpm int340x_thermal_zone mac_hid i2c_i801 mei rng_core intel_soc_dts_iosf intel_pch_thermal idma64 intel_lpss_pci intel_lpss int3400_thermal acpi_thermal_rel wmi asus_wireless ac battery sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sr_mod cdrom sd_mod ahci serio_raw libahci atkbd libps2 libata xhci_pci crc32c_intel scsi_mod xhci_hcd i8042 serio
[14576.148034] CPU: 1 PID: 272 Comm: systemd-journal Tainted: P        W  OE     5.3.8-arch1-1 #1
[14576.148034] Hardware name: ASUSTeK COMPUTER INC. X555UB/X555UB, BIOS X555UB.301 02/20/2017
[14576.148047] RIP: 0010:ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[14576.148049] Code: 38 48 81 c1 70 04 00 00 48 81 c6 38 01 00 00 e8 0a 40 8e f1 b8 01 00 00 00 e9 26 4b fb ff 48 c7 c7 60 7b 94 c0 e8 b7 53 3c f1 <0f> 0b 48 89 ef e8 7f 28 a1 f1 e9 d1 5b fb ff 48 c7 c7 60 7b 94 c0
[14576.148049] RSP: 0018:ffffab7a00120e10 EFLAGS: 00010246
[14576.148050] RAX: 0000000000000024 RBX: ffff9656387207a0 RCX: 0000000000000000
[14576.148051] RDX: 0000000000000000 RSI: 0000000000000086 RDI: 00000000ffffffff
[14576.148051] RBP: ffff9655c18b8f00 R08: 0000000000000d91 R09: 0000000000000001
[14576.148052] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[14576.148052] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000000
[14576.148053] FS:  00007f875aea1840(0000) GS:ffff96563ba80000(0000) knlGS:0000000000000000
[14576.148054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14576.148055] CR2: 00007f875a045e10 CR3: 0000000276ed8003 CR4: 00000000003606e0
[14576.148055] Call Trace:
[14576.148056]  <IRQ>
[14576.148058]  ? __mod_zone_page_state+0x66/0xa0
[14576.148060]  ? kmem_cache_free_bulk+0x2e1/0x450
[14576.148070]  ieee80211_tasklet_handler+0xbc/0xd0 [mac80211]
[14576.148071]  tasklet_action_common.isra.0+0x4a/0xb0
[14576.148073]  __do_softirq+0x114/0x332
[14576.148074]  irq_exit+0xd4/0xf0
[14576.148075]  do_IRQ+0xbb/0x130
[14576.148077]  common_interrupt+0xf/0xf
[14576.148078]  </IRQ>
[14576.148080] RIP: 0010:devkmsg_poll+0x6f/0xa0
[14576.148081] Code: 80 00 49 8b 04 24 45 31 e4 48 3b 05 6b a0 7d 01 72 1e 48 c7 c7 0c 06 4b b3 e8 8d ab 80 00 e8 e8 3d 00 00 fb 66 0f 1f 44 00 00 <44> 89 e0 41 5c c3 48 3b 05 54 a0 7d 01 45 19 e4 41 83 e4 0a 41 83
[14576.148082] RSP: 0018:ffffab7a002a7d40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[14576.148083] RAX: 0000000080000000 RBX: ffff965636f9e870 RCX: 0000000000000000
[14576.148083] RDX: 0000000000000001 RSI: ffffab7a002a7d88 RDI: 00000000ffffffff
[14576.148084] RBP: ffffab7a002a7d88 R08: 0000000000000000 R09: 0000000000000000
[14576.148084] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000041
[14576.148085] R13: ffffab7a002a7de0 R14: ffffab7a002a7dc8 R15: ffffab7a002a7e78
[14576.148088]  ep_item_poll.isra.0+0x3f/0xb0
[14576.148090]  ep_send_events_proc+0xeb/0x1f0
[14576.148092]  ? balance_dirty_pages_ratelimited+0x102/0x4c0
[14576.148093]  ? ep_read_events_proc+0xe0/0xe0
[14576.148095]  ep_scan_ready_list.constprop.0+0xa3/0x1f0
[14576.148097]  ep_poll+0x118/0x4b0
[14576.148099]  do_epoll_wait+0xb0/0xd0
[14576.148100]  __x64_sys_epoll_wait+0x1a/0x20
[14576.148102]  do_syscall_64+0x5f/0x1c0
[14576.148104]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[14576.148105] RIP: 0033:0x7f875c6bf60e
[14576.148106] Code: 10 89 7c 24 0c 89 4c 24 1c e8 2e 65 f8 ff 44 8b 54 24 1c 8b 54 24 18 41 89 c0 48 8b 74 24 10 8b 7c 24 0c b8 e8 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2c 44 89 c7 89 44 24 0c e8 5e 65 f8 ff 8b 44
[14576.148107] RSP: 002b:00007ffe369ec500 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[14576.148108] RAX: ffffffffffffffda RBX: 0000556fa6ff3d90 RCX: 00007f875c6bf60e
[14576.148108] RDX: 0000000000000030 RSI: 00007ffe369ec530 RDI: 0000000000000008
[14576.148109] RBP: 00007ffe369ec870 R08: 0000000000000000 R09: 0000000000000030
[14576.148109] R10: 00000000ffffffff R11: 0000000000000293 R12: 00007ffe369ec530
[14576.148110] R13: ffffffffffffffff R14: 0000000000000000 R15: 0000000000000001
[14576.148111] ---[ end trace 6c01ee4ff02c47b8 ]---
[14577.298535] usb 1-1: USB disconnect, device number 21
[14577.299850] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
[14577.299857] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
[14577.425685] audit: type=1130 audit(1573227676.997:143): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'


If we assume the driver is ok (we can do this, because it happens on different drivers) the cause must be located within xhci (USB host).
Comment 98 Bernhard 2019-11-08 16:09:18 UTC
(In reply to Michael from comment #96)
> I can try to bisect to identify the patch, but that will take a while.

Tbh I would try reverting the commit that caused the problem for me first, just to make sure you're not spending multiple hours bisecting this issue and then find out that you're affected by the same commit.
Comment 99 Michael 2019-11-08 16:40:52 UTC
Bernhard, that will be great. I'm not at home and my ASUS notebook is really too slow to perform a bisect.
Comment 100 Michael 2019-11-09 10:57:02 UTC
@Bernhard, @Mathias
I'm not sure anymore if the issue is related to xhci, because of the lates WARNINGs and traces.
I tested a PCIe card
Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
and running into similar issues:
12506.901197] wlp3s0: deauthenticating from 00:24:d4:9e:e8:c4 by local choice (Reason: 3=DEAUTH_LEAVING)
[12506.902535] ------------[ cut here ]------------
[12506.902589] WARNING: CPU: 1 PID: 15941 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[12506.902590] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink mt7601u ccm fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc rtsx_usb_ms snd_soc_sst_dsp rtl8821ae snd_soc_acpi_intel_match x86_pkg_temp_thermal memstick snd_soc_acpi intel_powerclamp btcoexist coretemp r8169 kvm_intel snd_soc_core rtl_pci rtlwifi snd_hda_codec_hdmi snd_compress ac97_bus kvm mac80211 snd_hda_codec_realtek ipmi_devintf ipmi_msghandler snd_pcm_dmaengine snd_hda_codec_generic irqbypass ledtrig_audio cfg80211 realtek libphy joydev mousedev iTCO_wdt i915 iTCO_vendor_support crct10dif_pclmul crc32_pclmul libarc4 ghash_clmulni_intel btusb btrtl aesni_intel btbcm btintel bluetooth aes_x86_64 snd_hda_intel snd_hda_codec crypto_simd cryptd i2c_algo_bit i2c_hid glue_helper drm_kms_helper asus_nb_wmi intel_rapl_msr
[12506.902624]  asus_wmi drm sparse_keymap mei_hdcp snd_hda_core intel_cstate mxm_wmi intel_uncore intel_rapl_perf intel_gtt agpgart ecdh_generic snd_hwdep pcspkr rfkill syscopyarea snd_pcm sysfillrect ecc sysimgblt fb_sys_fops tpm_crb input_leds snd_timer elan_i2c tpm_tis tpm_tis_core snd int3403_thermal tpm i2c_i801 evdev rng_core soundcore processor_thermal_device intel_rapl_common mac_hid idma64 intel_xhci_usb_role_switch int340x_thermal_zone roles intel_soc_dts_iosf mei_me int3400_thermal mei acpi_thermal_rel intel_pch_thermal intel_lpss_pci intel_lpss asus_wireless wmi battery ac sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sr_mod cdrom sd_mod serio_raw atkbd libps2 ahci libahci libata xhci_pci crc32c_intel i8042 xhci_hcd scsi_mod serio
[12506.902660] CPU: 1 PID: 15941 Comm: Netlink Monitor Tainted: P        W  OE     5.3.8-arch1-1 #1
[12506.902661] Hardware name: ASUSTeK COMPUTER INC. X555UB/X555UB, BIOS X555UB.301 02/20/2017
[12506.902684] RIP: 0010:ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[12506.902687] Code: 38 48 81 c1 70 04 00 00 48 81 c6 38 01 00 00 e8 0a 10 77 dc b8 01 00 00 00 e9 26 4b fb ff 48 c7 c7 60 ab eb c0 e8 b7 23 25 dc <0f> 0b 48 89 ef e8 7f f8 89 dc e9 d1 5b fb ff 48 c7 c7 60 ab eb c0
[12506.902688] RSP: 0000:ffffb624c0120e10 EFLAGS: 00010246
[12506.902690] RAX: 0000000000000024 RBX: ffff8ee22cae07a0 RCX: 0000000000000000
[12506.902691] RDX: 0000000000000000 RSI: ffff8ee23ba97708 RDI: 00000000ffffffff
[12506.902692] RBP: ffff8ee1ab8b8400 R08: 00000000000014eb R09: 0000000000000001
[12506.902692] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[12506.902694] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000000
[12506.902695] FS:  00007f27b8866700(0000) GS:ffff8ee23ba80000(0000) knlGS:0000000000000000
[12506.902696] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12506.902697] CR2: 00005599e0bfe008 CR3: 00000002417b6002 CR4: 00000000003606e0
[12506.902699] Call Trace:
[12506.902702]  <IRQ>
[12506.902707]  ? __mod_zone_page_state+0x66/0xa0
[12506.902712]  ? kmem_cache_free_bulk+0x2e1/0x450
[12506.902730]  ieee80211_tasklet_handler+0xbc/0xd0 [mac80211]
[12506.902734]  tasklet_action_common.isra.0+0x4a/0xb0
[12506.902737]  __do_softirq+0x114/0x332
[12506.902740]  irq_exit+0xd4/0xf0
[12506.902742]  do_IRQ+0xbb/0x130
[12506.902746]  common_interrupt+0xf/0xf
[12506.902747]  </IRQ>
[12506.902749] RIP: 0033:0x55b7f9be923d
[12506.902751] Code: 0f 84 9e 00 00 00 48 8b 10 48 83 e2 fe 48 09 ca 48 89 56 08 48 89 30 48 8b 4e 08 48 83 e1 fe 74 09 48 8b 51 08 f6 c2 01 75 06 <48> 89 4e 08 59 c3 48 83 e2 fe 48 89 51 08 48 8b 4e 08 48 89 ca 48
[12506.902752] RSP: 002b:00007f27b8864a10 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[12506.902754] RAX: 00007f27a6800770 RBX: 00007f27a6201628 RCX: 0000000000000000
[12506.902755] RDX: 0000000000000001 RSI: 00007f27a6201628 RDI: 0000000000000000
[12506.902755] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f27c3f0a3b6
[12506.902756] R10: 0000000000000000 R11: 0000000000000000 R12: 00007f27c8e00110
[12506.902757] R13: 00007f27a6800f80 R14: 00000000ffffffff R15: 00007f27a5d01100
[12506.902760] ---[ end trace 934096102c096064 ]---
[12506.990611] audit: type=1131 audit(1573296338.199:239): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=wpa_supplicant comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[12507.259354] ------------[ cut here ]------------
[12507.259405] WARNING: CPU: 1 PID: 16380 at net/mac80211/rx.c:804 ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[12507.259406] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink mt7601u ccm fuse nls_iso8859_1 nls_cp437 vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc rtsx_usb_ms snd_soc_sst_dsp rtl8821ae snd_soc_acpi_intel_match x86_pkg_temp_thermal memstick snd_soc_acpi intel_powerclamp btcoexist coretemp r8169 kvm_intel snd_soc_core rtl_pci rtlwifi snd_hda_codec_hdmi snd_compress ac97_bus kvm mac80211 snd_hda_codec_realtek ipmi_devintf ipmi_msghandler snd_pcm_dmaengine snd_hda_codec_generic irqbypass ledtrig_audio cfg80211 realtek libphy joydev mousedev iTCO_wdt i915 iTCO_vendor_support crct10dif_pclmul crc32_pclmul libarc4 ghash_clmulni_intel btusb btrtl aesni_intel btbcm btintel bluetooth aes_x86_64 snd_hda_intel snd_hda_codec crypto_simd cryptd i2c_algo_bit i2c_hid glue_helper drm_kms_helper asus_nb_wmi intel_rapl_msr
[12507.259436]  asus_wmi drm sparse_keymap mei_hdcp snd_hda_core intel_cstate mxm_wmi intel_uncore intel_rapl_perf intel_gtt agpgart ecdh_generic snd_hwdep pcspkr rfkill syscopyarea snd_pcm sysfillrect ecc sysimgblt fb_sys_fops tpm_crb input_leds snd_timer elan_i2c tpm_tis tpm_tis_core snd int3403_thermal tpm i2c_i801 evdev rng_core soundcore processor_thermal_device intel_rapl_common mac_hid idma64 intel_xhci_usb_role_switch int340x_thermal_zone roles intel_soc_dts_iosf mei_me int3400_thermal mei acpi_thermal_rel intel_pch_thermal intel_lpss_pci intel_lpss asus_wireless wmi battery ac sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sr_mod cdrom sd_mod serio_raw atkbd libps2 ahci libahci libata xhci_pci crc32c_intel i8042 xhci_hcd scsi_mod serio
[12507.259466] CPU: 1 PID: 16380 Comm: sudo Tainted: P        W  OE     5.3.8-arch1-1 #1
[12507.259466] Hardware name: ASUSTeK COMPUTER INC. X555UB/X555UB, BIOS X555UB.301 02/20/2017
[12507.259483] RIP: 0010:ieee80211_rx_napi.cold+0xc/0x67 [mac80211]
[12507.259485] Code: 38 48 81 c1 70 04 00 00 48 81 c6 38 01 00 00 e8 0a 10 77 dc b8 01 00 00 00 e9 26 4b fb ff 48 c7 c7 60 ab eb c0 e8 b7 23 25 dc <0f> 0b 48 89 ef e8 7f f8 89 dc e9 d1 5b fb ff 48 c7 c7 60 ab eb c0
[12507.259485] RSP: 0000:ffffb624c0120e10 EFLAGS: 00010246
[12507.259486] RAX: 0000000000000024 RBX: ffff8ee22cae07a0 RCX: 0000000000000000
[12507.259487] RDX: 0000000000000000 RSI: ffff8ee23ba97708 RDI: 00000000ffffffff
[12507.259488] RBP: ffff8ee1ab8b8b00 R08: 0000000000001511 R09: 0000000000000001
[12507.259488] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[12507.259489] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000000
[12507.259490] FS:  00007fe70c78b740(0000) GS:ffff8ee23ba80000(0000) knlGS:0000000000000000
[12507.259491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12507.259491] CR2: 00007fe70ca9c1b0 CR3: 00000001f9502001 CR4: 00000000003606e0
[12507.259492] Call Trace:
[12507.259494]  <IRQ>
[12507.259499]  ? __mod_zone_page_state+0x66/0xa0
[12507.259502]  ? kmem_cache_free_bulk+0x2e1/0x450
[12507.259512]  ieee80211_tasklet_handler+0xbc/0xd0 [mac80211]
[12507.259515]  tasklet_action_common.isra.0+0x4a/0xb0
[12507.259517]  __do_softirq+0x114/0x332
[12507.259519]  irq_exit+0xd4/0xf0
[12507.259521]  do_IRQ+0xbb/0x130
[12507.259524]  common_interrupt+0xf/0xf
[12507.259524]  </IRQ>
[12507.259527] RIP: 0010:__do_page_fault+0x12c/0x4f0
[12507.259528] Code: a9 ff ff ff 7f 0f 85 cf 01 00 00 48 85 db 0f 84 c6 01 00 00 41 f6 84 24 88 00 00 00 03 0f 84 4a 01 00 00 fb 66 0f 1f 44 00 00 <41> bf 54 00 00 00 0f 1f 44 00 00 44 89 f8 48 89 e9 83 c8 01 83 e1
[12507.259529] RSP: 0000:ffffb624c4a2fec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffda
[12507.259530] RAX: 0000000080000000 RBX: ffff8ee21a601980 RCX: 0000000000000000
[12507.259530] RDX: 00007fe70ca9c1b0 RSI: 0000000000000000 RDI: ffffb624c4a2ff58
[12507.259531] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000000
[12507.259531] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb624c4a2ff58
[12507.259532] R13: 00007fe70ca9c1b0 R14: 00007fe70ca9c000 R15: 0000000000000000
[12507.259535]  do_page_fault+0x31/0x130
[12507.259536]  page_fault+0x3e/0x50
[12507.259538] RIP: 0033:0x7fe70c9a190d
[12507.259539] Code: 64 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 8b 05 b7 a8 0f 00 85 c0 0f 8e af 02 00 00 83 c0 01 83 f8 64 0f 8f 53 01 00 00 <89> 05 9d a8 0f 00 80 3d 16 87 0f 00 00 0f 84 a3 02 00 00 48 8b 05
[12507.259540] RSP: 002b:00007ffc8325d380 EFLAGS: 00010287
[12507.259540] RAX: 000000000000000a RBX: 000055b4c6e1fc80 RCX: 0000000000000400
[12507.259541] RDX: 000055b4c6e1fcb0 RSI: 000055b4c6e1fc80 RDI: 000055b4c6e15740
[12507.259541] RBP: 0000000000000400 R08: 00007ffc8325d418 R09: 0000000000000041
[12507.259542] R10: 0000000000000000 R11: 00007fe70ca96a40 R12: 000055b4c6e1fc80
[12507.259543] R13: 000055b4c6e15740 R14: 000055b4c6e1fcb0 R15: 0000000000000000
[12507.259544] ---[ end trace 934096102c096065 ]---
[12507.907345] IPv6: ADDRCONF(NETDEV_CHANGE): wlp3s0: link becomes ready
[12516.922094] audit: type=1131 audit(1573296348.129:240): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

And we have a simliar bug report regarding an ATHEROS driver, here:
https://bugzilla.kernel.org/show_bug.cgi?id=198701

So if it is really a driver issue and not related to xhci, we have this issue on rt2800usb, mt76 and ath9k_htc.
Comment 101 Michael 2019-11-13 10:33:24 UTC
Here is a new log (dmesg and trace):
https://www.sendspace.com/file/hy2puw

Device: ALFA AWUS036NH
ID 148f:3070 Ralink Technology, Corp. RT2870/RT3070 Wireless Adapter

The device is connected and entered promiscuous mode
[76538.089897] xhci_hcd 0000:03:00.0: Waiting for status stage event
[76541.048223] xhci_hcd 0000:03:00.0: Transfer error for slot 23 ep 2 on endpoint
[76541.048233] xhci_hcd 0000:03:00.0: // Ding dong!
[76541.048356] xhci_hcd 0000:03:00.0: Ignoring reset ep completion code of 1
[76542.194353] device wlp3s0f0u2 entered promiscuous mode
...
we do not receive data via AF_PACKET socket.
...
[76542.194385] audit: type=1700 audit(1573639400.432:141): dev=wlp3s0f0u2 prom=256 old_prom=0 auid=1000 uid=0 gid=0 ses=2
[76554.680919] xhci_hcd 0000:03:00.0: Cancel URB 00000000e8c9ee79, dev 2, ep 0x81, starting at offset 0xff05d000
[76554.680929] xhci_hcd 0000:03:00.0: // Ding dong!

I can't find anything that caused it, except of the transfer error at 76541.048223.

If we connect the device to an USB2 port, everything is fine:
https://www.sendspace.com/file/azoa4a
we receive data via AF_PACKET socket.
The device is working as expected.

Note You need to log in before you can comment on or make changes to this bug.