Bug 208327 - xhci_hcd issue transfer event - usb3.0
Summary: xhci_hcd issue transfer event - usb3.0
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: x86-64 Linux
: P1 high
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-26 11:38 UTC by sander44
Modified: 2020-07-22 15:17 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.14.170
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg_usb3.0 (198.57 KB, text/plain)
2020-07-15 15:09 UTC, sander44
Details
trace_usb3.0.zip (4.78 MB, application/zip)
2020-07-15 15:11 UTC, sander44
Details
usb-connect-trbs.txt (28.00 KB, text/plain)
2020-07-22 14:47 UTC, sander44
Details
usb-disconnect-trbs.txt (27.92 KB, text/plain)
2020-07-22 14:48 UTC, sander44
Details

Description sander44 2020-06-26 11:38:05 UTC
Hi Kernel Team,

With compile vanilla kernel 4.14.170, observe this issue with connect/disconnect usb device (mouse and keyboard).


usb xhci_hcd issue, USB3.0

[  146.924776] usb 1-2: USB disconnect, device number 2
[  146.926153] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  146.926163] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000ffee2350 trb-start 00000000ffee2360 trb-end 00000000ffee2360 seg-start 00000000ffee2000 seg-end 00000000ffee2ff0
[  146.927151] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  146.927160] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000ffee2360 trb-start 00000000ffee2370 trb-end 00000000ffee2370 seg-start 00000000ffee2000 seg-end 00000000ffee2ff0
[  146.928153] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  146.928163] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000ffee2370 trb-start 00000000ffee2380 trb-end 00000000ffee2380 seg-start 00000000ffee2000 seg-end 00000000ffee2ff0
[  146.929150] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  146.929159] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000ffee2380 trb-start 00000000ffee2390 trb-end 00000000ffee2390 seg-start 00000000ffee2000 seg-end 00000000ffee2ff0
[  146.930150] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  146.930158] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000ffee2390 trb-start 00000000ffee23a0 trb-end 00000000ffee23a0 seg-start 00000000ffee2000 seg-end 00000000ffee2ff0
Comment 1 Mathias Nyman 2020-06-26 12:36:31 UTC
Can this issue be reproduced with 5.7 kernel or later?

If yes, then logs and traces with a more recent kernel would help:

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
< trigger the issue >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace
Comment 2 sander44 2020-07-15 15:09:27 UTC
Created attachment 290293 [details]
dmesg_usb3.0
Comment 3 sander44 2020-07-15 15:11:34 UTC
Created attachment 290295 [details]
trace_usb3.0.zip
Comment 4 sander44 2020-07-15 15:16:41 UTC
I also notice this on: 5.7.8
5.7.8-vanilla #1 SMP PREEMPT Tue Jul 14 17:04:14 EEST 2020 x86_64 x86_64 x86_64 GNU/Linux
Comment 5 Mathias Nyman 2020-07-22 13:19:30 UTC
Thanks

Logs show we get double (duplicate) events for some ISOC transfers.
Driver expects only one event per TRB and warns when we get a second event for the same TRB

A wild guess is that this is somehow related to the soft retry of transaction errors seen on another endpoint on the same device.
Driver tries to recover from the transaction error with a soft retry
three times. the duplicate events seen on the other endpoint occur close to the
soft retries, almost like this ISOC enpoint is being retried as well.

driver will ignore the duplicate events after warning about it, so there shouldn't be any functional impact, but this isn't how it's supposed to
behave.
Comment 6 Mathias Nyman 2020-07-22 13:39:31 UTC
Could you please dump the event ring ringht after seeing the 
"Transfer event TRB DMA ptr not part of current TD.." message. 
Just to verify hardware actually writes duplicate events on the ring buffer.

This can be done by:
mount -t debugfs none /sys/kernel/debug
cat /sys/kernel/debug/usb/xhci/0000:00:14.0/event-ring/trbs
Comment 7 Mathias Nyman 2020-07-22 13:45:52 UTC
To check if this is related to the soft retry of the transaction errors
we could prevent soft retry with the following change:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 2c255d0620b0..964cacc1092e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2302,6 +2302,7 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td,
                remaining       = 0;
                break;
        case COMP_USB_TRANSACTION_ERROR:
+               break;
                if ((ep_ring->err_count++ > MAX_SOFT_RETRY) ||
                    le32_to_cpu(slot_ctx->tt_info) & TT_SLOT)
                        break;


Does that help?
After this chagen you should still see the Transactio errors, but not the
"Transfer event TRB DMA ptr not part of current TD.." message.
Comment 8 sander44 2020-07-22 14:47:36 UTC
Created attachment 290449 [details]
usb-connect-trbs.txt
Comment 9 sander44 2020-07-22 14:48:16 UTC
Created attachment 290451 [details]
usb-disconnect-trbs.txt
Comment 10 sander44 2020-07-22 15:17:55 UTC
I tried with this change:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 2c255d0620b0..964cacc1092e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2302,6 +2302,7 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td,
                remaining       = 0;
                break;
        case COMP_USB_TRANSACTION_ERROR:
+               break;
                if ((ep_ring->err_count++ > MAX_SOFT_RETRY) ||
                    le32_to_cpu(slot_ctx->tt_info) & TT_SLOT)
                        break;

And it doesn't help.
Error messages still appear.

[  245.472838] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  245.473676] xhci_hcd 0000:00:14.0: Looking for event-dma 000000035ae0cea0 trb-start 000000035ae0ceb0 trb-end 000000035ae0ceb0 seg-start 000000035ae0c000 seg-end 000000035ae0cff0
[  245.473828] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  245.474575] xhci_hcd 0000:00:14.0: Looking for event-dma 000000035ae0ceb0 trb-start 000000035ae0cec0 trb-end 000000035ae0cec0 seg-start 000000035ae0c000 seg-end 000000035ae0cff0
[  245.474829] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  245.475710] xhci_hcd 0000:00:14.0: Looking for event-dma 000000035ae0cec0 trb-start 000000035ae0ced0 trb-end 000000035ae0ced0 seg-start 000000035ae0c000 seg-end 000000035ae0cff0
[  245.475828] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
[  245.476706] xhci_hcd 0000:00:14.0: Looking for event-dma 000000035ae0ced0 trb-start 000000035ae0cee0 trb-end 000000035ae0cee0 seg-start 000000035ae0c000 seg-end 000000035ae0cff0

Note You need to log in before you can comment on or make changes to this bug.