Bug 219113 - I/O-errors freezing the system [sd 0:0:0:0: [sda] tag#11 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD OUT]
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: Intel Linux
Importance: P3 normal
Assignee: Default virtual assignee for Drivers/USB
Reported: 2024-07-31 11:18 UTC by Ilari Jääskeläinen
Modified: 2024-08-02 09:01 UTC

Kernel Version: 6.1.102 gcc (Debian 12.2.0-14)
Regression: No


Attachments
dmesg (72.60 KB, text/plain)
2024-07-31 11:18 UTC, Ilari Jääskeläinen

Description Ilari Jääskeläinen 2024-07-31 11:18:29 UTC
Created attachment 306644 [details]
dmesg
Comment 1 Mathias Nyman 2024-07-31 13:57:11 UTC
Looks like it's related to xhci streams usage.

from dmesg:
[4884.745577] xhci_hcd 0000:00:14.0: ERROR Unknown event condition 10 for slot 2 ep 7 , HC probably busted

"event condition 10" would be invalid stream type error.
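
To check whether the same error recurs, and on which slot and endpoint, the relevant fields can be pulled out of the log. This is only a sketch: the function name parse_xhci_event_error is made up for illustration, and the pattern assumes the message format shown in the dmesg line above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the
# completion code, slot and endpoint of each
# "ERROR Unknown event condition" message from xhci_hcd.
parse_xhci_event_error() {
    sed -nE 's/.*ERROR Unknown event condition ([0-9]+) for slot ([0-9]+) ep ([0-9]+).*/code=\1 slot=\2 ep=\3/p'
}

# Example usage (may need root if the kernel log is restricted):
# dmesg | parse_xhci_event_error
```

For the line quoted above this prints code=10 slot=2 ep=7.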
Comment 2 Artem S. Tashkinov 2024-07-31 14:07:00 UTC
This kernel release is not really supported.

Could you try mainline 6.10.2?
Comment 3 Artem S. Tashkinov 2024-07-31 14:15:43 UTC
Please check whether the following helps:

1. Upgrading BIOS to the latest version
2. Using a [different] power adapter
3. Some hints from here: https://github.com/openzfs/zfs/discussions/11741

Add to kernel boot arguments: iommu=pt amd_iommu=on
Comment 4 Michał Pecio 2024-07-31 21:19:33 UTC
Disabling UAS should be enough if you need this disk to start working right now, albeit not with maximum performance.

A quick glance at xHCI spec 4.12.2.1 suggests that your USB host controller believes that the USB device or the kernel has given it a wrong stream index. This could be a bug in the device, in the kernel xhci driver, or in the host controller itself.

Do you know any kernel version which doesn't have this problem?

Upgrading host controller or device firmware may help in case it's a hardware bug already patched by the respective vendor.
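
For reference, one common way to disable UAS for a single device is the usb-storage quirks option with the "u" flag, which makes the kernel fall back to plain usb-storage for that device. A minimal sketch follows; the helper name uas_quirk_line and the 0bc2:231a vendor:product ID are placeholders, not taken from this report (the real ID comes from lsusb).

```shell
# Hypothetical helper: print the modprobe.d line that tells usb-storage
# to ignore UAS ("u" quirk flag) for one vendor:product ID.
uas_quirk_line() {
    vid="$1"
    pid="$2"
    printf 'options usb-storage quirks=%s:%s:u\n' "$vid" "$pid"
}

# Example usage (the IDs and the file name are placeholders):
# uas_quirk_line 0bc2 231a | sudo tee /etc/modprobe.d/disable-uas.conf
```

The same setting can be passed on the kernel command line as usb-storage.quirks=VID:PID:u. If usb-storage is loaded from the initramfs, the initramfs probably needs to be regenerated before the modprobe.d file takes effect.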
Comment 5 Ilari Jääskeläinen 2024-08-01 04:13:03 UTC
Comment on attachment 306644 [details]
dmesg

> Could you try mainline 6.10.2?

No Sir, not at all. Debian GNU/Linux supports only 6.1.x and the new kernels are also buggy. I don't want to play hangman.

> Add to kernel boot arguments: iommu=pt amd_iommu=on

But I am on an Intel Atom platform.

> Do you know any kernel version which doesn't have this problem?

No Sir, this is a new platform for me.

> Disabling UAS should be enough if you need this disk to start working right now, albeit not with maximum performance.

Sounds like an idea to me. The hangs themselves cause a terrible loss in performance, so I think it could actually increase performance and stability. How do I disable UAS?

Besides, there are no new BIOS versions for this particular Intel Compute Stick (Cherry Trail).
Comment 6 Mathias Nyman 2024-08-01 11:05:20 UTC
6.1.102 kernel has a recently added patch that touches this area:

Commit 5ceac4402f5d xhci: Handle TD clearing for multiple streams case

After that patch, the xhci driver may issue 'Set TR Deq' commands for each stream. Before the patch we only queued one command.

If there is a flaw in one of the stream contexts, or in the command itself, then we could see that 'Invalid Stream type error'.

Can I ask you to take logs of this issue with xhci dynamic debug enabled?
Also enabling some specific xhci tracing could help:

Steps:

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo xhci-hcd:xhci_handle_cmd_set_deq_ep >> /sys/kernel/debug/tracing/set_event
echo xhci-hcd:xhci_handle_cmd_set_deq >> /sys/kernel/debug/tracing/set_event
echo xhci-hcd:xhci_handle_command >> /sys/kernel/debug/tracing/set_event
echo 1 > /sys/kernel/debug/tracing/tracing_on
< Reproduce issue >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace
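
The last two steps can be scripted so that both files are saved in one go after reproducing the issue. This is a sketch only: the function name collect_xhci_logs and the output file names are made up for illustration, and the default tracing path is the one used in the steps above.

```shell
# Hypothetical helper: save the trace buffer and the kernel log into
# one directory so they can be attached to the bug report.
#   $1 = output directory
#   $2 = tracing directory (defaults to the debugfs path used above)
collect_xhci_logs() {
    out_dir="$1"
    tracing_dir="${2:-/sys/kernel/debug/tracing}"

    mkdir -p "$out_dir" || return 1

    # Read the trace with cat rather than copying the pseudo-file.
    cat "$tracing_dir/trace" > "$out_dir/xhci-trace.txt" || return 1

    # dmesg may be restricted to root; warn instead of failing.
    dmesg > "$out_dir/dmesg.txt" 2>/dev/null \
        || echo "warning: could not read dmesg (try as root)" >&2

    return 0
}

# Example usage, run as root after reproducing the issue:
# collect_xhci_logs /tmp/xhci-bug-219113
```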
Comment 7 Ilari Jääskeläinen 2024-08-01 11:29:31 UTC
I am sure it gave those messages before. Ah, now I remember: I do have other kernel versions installed, let me check.

6.1.99 through 6.1.102: I think they all gave me the same error message.
Comment 8 Mathias Nyman 2024-08-01 12:26:37 UTC
Looks like it was added to 6.1.95 stable
Commit Id for the 6.1 stable backport is 633f72cb6124ecda97b641fbc119340bd88d51

git describe --contains 633f72cb6124ecda97b641fbc119340bd88d51
v6.1.95~123
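
Since git describe places the backport just before the v6.1.95 tag, any 6.1.y release from 6.1.95 onward should contain it. A rough sketch of that check is below. The function name has_stream_td_backport is hypothetical, it only covers the 6.1.y stable series, and it relies on sort -V being available. The upstream stable version has to be passed in by hand, because on some distributions uname -r reports a package ABI name rather than the upstream version.

```shell
# Hypothetical helper: succeed if the given 6.1.y version is 6.1.95 or
# later, i.e. should include the stable backport mentioned above.
# Other branches (mainline, other stable series) are not covered.
has_stream_td_backport() {
    ver="${1%%-*}"              # drop any suffix such as "-amd64"
    case "$ver" in
        6.1.*) ;;               # only the 6.1.y series is handled
        *) return 1 ;;
    esac
    # Version-sort the two values; the threshold must sort first (or equal).
    lowest=$(printf '%s\n%s\n' "6.1.95" "$ver" | sort -V | head -n 1)
    [ "$lowest" = "6.1.95" ]
}

# Example usage:
# has_stream_td_backport 6.1.102 && echo "contains the backport"
```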
