[Summary]:
During device enumeration, while processing the Address Device Command, the xHCI driver (xhci-hcd) leaves the Average TRB Length (avg_trb_len) field for Control Endpoint 0 (EP0) set to 0 in the Input Context.

According to the xHCI 1.2 Specification (Section 6.2.3.1, p.454), the Average TRB Length must be greater than 0, and software shall set it to 8 for Control Endpoints.

Some xHCI hardware vendors may validate the Input Context at Address Device time and reject contexts with invalid values, potentially causing device enumeration issues.

While xhci_endpoint_init() later sets avg_trb_len correctly, setting it earlier in xhci_setup_addressable_virt_dev() would improve compliance and robustness.

====================================

[Description]:
Observed in kernel 6.15-rc2 (self-built vanilla, no taint).
Using KGDB during Address Device Command handling, the Input Context was dumped, showing that the EP0 avg_trb_len field remained 0.

Stack trace during capture:
queue_trb -> queue_command -> xhci_queue_address_device -> xhci_setup_device -> xhci_address_device

Memory dump of Input Context (kgdb):

(logical Input Context memory)
>>> x/96bx 0x11BF40000
0x11bf40000: Cannot access memory at address 0x11bf40000

(physical Input Context memory)
>>> p/x page_offset_base
$1 = 0xffff888000000000
>>> x/96bx 0xFFFF88811BF40000

(Input Control Context)
0xffff88811bf40000: 0x00 0x00 0x00 0x00 0x03 0x00 0x00 0x00
0xffff88811bf40008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff88811bf40010: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff88811bf40018: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

(Slot Context)
0xffff88811bf40020: 0x00 0x00 0x40 0x08 0x00 0x00 0x01 0x00
0xffff88811bf40028: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff88811bf40030: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff88811bf40038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

(EP Context 0, Control EP)
0xffff88811bf40040: 0x00 0x00 0x00 0x00 0x26 0x00 0x00 0x02
0xffff88811bf40048: 0x01 0x10 0xf4 0x1b 0x01 0x00 0x00 0x00
0xffff88811bf40050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffff88811bf40058: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Decoded EP Context 0 fields:
EP State = 0
CErr = 3 ("Software should set CErr to ‘3’ for normal operations. The values of ‘1’ or ‘2’ should be avoided during normal operation because they will reduce transfer reliability. The value of ‘0’ is typically only used for test or debug.")
EP Type = 4 (Control Bidirectional)
Max Packet Size = 512
DCS = 1
TR Dequeue Pointer = 0x11BF41000
**** Average TRB Length = 0 ****

SPEC xHCI_1_2_201905:
(p.453) "This field represents the average Length of the TRBs executed by this endpoint. The value of this field shall be greater than ‘0’"
(p.454) "Note: Software shall set Average TRB Length to ‘8’ for control endpoints."
(p.454, 6.2.3.1 Address Device Command Usage) "The Input Endpoint 0 Context is considered “valid” ...... if: ... 6) all other fields are within the valid range of values"

---
Tested environment:
- Platform: QEMU Standard PC (Q35 + ICH9)
- Host Controller: QEMU XHCI Host Controller
- Device: QEMU USB Hard Drive (SuperSpeed 5Gbps)
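For anyone who wants to reproduce the decoded values from the raw bytes, here is a minimal userspace sketch. It is not driver code: the struct and function names (`ep_ctx_fields`, `decode_ep_ctx`, `ep0_dump`, and so on) are made up for illustration, and the bit positions are my reading of the Endpoint Context layout in xHCI 1.2 (EP State in DW0[2:0], CErr in DW1[2:1], EP Type in DW1[5:3], Max Packet Size in DW1[31:16], DCS in bit 0 of the TR Dequeue Pointer, Average TRB Length in DW4[15:0]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded view of the Endpoint Context fields in the report. */
struct ep_ctx_fields {
    unsigned ep_state;
    unsigned cerr;
    unsigned ep_type;
    unsigned max_packet_size;
    unsigned dcs;
    uint64_t tr_dequeue;
    unsigned avg_trb_len;
};

/* The 32 bytes dumped at 0xffff88811bf40040 (EP Context 0). */
const uint8_t ep0_dump[32] = {
    0x00, 0x00, 0x00, 0x00, 0x26, 0x00, 0x00, 0x02,
    0x01, 0x10, 0xf4, 0x1b, 0x01, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Read one little-endian 32-bit word from a byte buffer. */
uint32_t le32_at(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Split a raw Endpoint Context into the fields listed in the report. */
struct ep_ctx_fields decode_ep_ctx(const uint8_t *raw)
{
    struct ep_ctx_fields f;
    uint32_t dw0, dw1, dw4;
    uint64_t deq;

    assert(raw != NULL);
    dw0 = le32_at(raw + 0);
    dw1 = le32_at(raw + 4);
    deq = (uint64_t)le32_at(raw + 8) | ((uint64_t)le32_at(raw + 12) << 32);
    dw4 = le32_at(raw + 16);

    f.ep_state        = dw0 & 0x7;
    f.cerr            = (dw1 >> 1) & 0x3;
    f.ep_type         = (dw1 >> 3) & 0x7;
    f.max_packet_size = (dw1 >> 16) & 0xffff;
    f.dcs             = (unsigned)(deq & 0x1);
    f.tr_dequeue      = deq & ~(uint64_t)0xf;   /* low 4 bits are DCS + reserved */
    f.avg_trb_len     = dw4 & 0xffff;
    return f;
}

/* The rule quoted from p.453: the field shall be greater than 0. */
int avg_trb_len_is_valid(const struct ep_ctx_fields *f)
{
    return f->avg_trb_len > 0;
}

/* Print the fields in the same order as the report. */
void print_ep_ctx(const struct ep_ctx_fields *f)
{
    printf("EP State = %u\nCErr = %u\nEP Type = %u\nMax Packet Size = %u\n",
           f->ep_state, f->cerr, f->ep_type, f->max_packet_size);
    printf("DCS = %u\nTR Dequeue Pointer = 0x%llX\nAverage TRB Length = %u\n",
           f->dcs, (unsigned long long)f->tr_dequeue, f->avg_trb_len);
}
```

Calling `decode_ep_ctx(ep0_dump)` and passing the result to `print_ep_ctx()` should give the same values as listed above, and `avg_trb_len_is_valid()` returns 0 for this dump.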
Just checking in to see if there's any update needed from my side. Happy to provide more info or run tests.
This is up for interpretation; the spec is ambiguous.

xHCI 1.2 Section 6.2.3.1 "Address Device Command Usage" does not mention Average TRB Length at all.

But Section 4.8.2 "Endpoint Context Initialization" states that:

"All fields of an Input Endpoint Context data structure (including the Reserved fields) shall be initialized to ‘0’ with the following exceptions:

4.8.2.1 Default Control Endpoint 0
- Max Packet Size
- CErr
- TR Dequeue Pointer
- Dequeue Cycle State (DCS)"

According to that, the Average TRB Length should be initialized to 0.

I don't object to setting the Average TRB Length earlier, especially if it solves device enumeration issues for some xHCI vendor. We do need to make sure it doesn't break enumeration for other vendors.

Can you submit a patch to the linux-usb mailing list for this?
(In reply to Chen-Tzu-Chieh from comment #0)
> Some xHCI hardware vendors may validate the Input Context at Address Device
> time and reject contexts with invalid values, potentially causing device
> enumeration issues.

A scarier (and more likely?) possibility is HCs failing to validate this field and yet assuming that it's non-zero, then dividing by zero or doing some other stupid thing and crashing and burning. Bonus if it only happens once in a blue moon.

But as Mathias found, the spec is self-contradictory, so it works both ways...

> While xhci_endpoint_init() later sets avg_trb_len correctly,

Are you sure? ;)

This function is only called from add_endpoint(), which doesn't seem to ever be called on EP 0. But non-default control endpoints would be set to 8 indeed.
Hi Mathias & Michał,

Thanks for your response.

I’ve already submitted a patch to fix this situation (by adding the line `ep0_ctx->tx_info |= cpu_to_le32(EP_AVG_TRB_LENGTH(8));` in `xhci_setup_addressable_virt_dev`).

Link: https://lore.kernel.org/linux-usb/JH0PR06MB7294E46B393F1CA5FE0EE4F78396A@JH0PR06MB7294.apcprd06.prod.outlook.com/T/#u

> This function is only called from add_endpoint(), which doesn't seem to ever
> be called on EP 0. But non-default control endpoints would be set to 8
> indeed.

Yes, I misunderstood that function, and thanks for the explanation.

`xhci_endpoint_init` sets `avg_trb_len` only for the USB device's other endpoints. The Default Control Endpoint is handled by `xhci_setup_addressable_virt_dev`, which initializes the Input Context (ref: xHCI 1.2, Ch. 6.2.5 Input Context), including EP Context 0, before it is passed to the xHC hardware. That is why EP0's Average TRB Length has to be set there.
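To show what the one-line change does to the context, here is a small userspace model. It is only a sketch, not the kernel function: `mock_ep_ctx`, `mock_setup_ep0` and the `MOCK_*` helpers are hypothetical names, the struct just mirrors the first fields of an Endpoint Context, and the `cpu_to_le32()` conversion is left out, so the model assumes a little-endian host. `EP_AVG_TRB_LENGTH()` is defined here with the mask I understand the kernel header to use (bits 15:0 of `tx_info`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average TRB Length lives in bits 15:0 of tx_info (DW4). */
#define EP_AVG_TRB_LENGTH(p)   ((uint32_t)(p) & 0xffff)

/* Hypothetical helpers for the DW1 fields used by EP0. */
#define MOCK_CERR(p)           (((uint32_t)(p) & 0x3) << 1)
#define MOCK_EP_TYPE(p)        (((uint32_t)(p) & 0x7) << 3)
#define MOCK_MAX_PACKET(p)     (((uint32_t)(p) & 0xffff) << 16)
#define MOCK_CTRL_EP           4   /* EP Type 4 = Control Bidirectional */

/* Hypothetical stand-in for the first five dwords of an Endpoint Context. */
struct mock_ep_ctx {
    uint32_t ep_info;    /* DW0 */
    uint32_t ep_info2;   /* DW1 */
    uint64_t deq;        /* DW2-3: TR Dequeue Pointer | DCS */
    uint32_t tx_info;    /* DW4 */
};

/*
 * Model of the EP0 part of the Input Context setup. The context is
 * assumed to be zero-initialized by the caller, as Section 4.8.2 requires.
 */
void mock_setup_ep0(struct mock_ep_ctx *ep0_ctx, unsigned max_packet,
                    uint64_t ring_dma)
{
    assert(ep0_ctx != NULL);

    /* Fields that 4.8.2.1 lists as exceptions to "initialize to 0". */
    ep0_ctx->ep_info2 = MOCK_CERR(3) | MOCK_EP_TYPE(MOCK_CTRL_EP) |
                        MOCK_MAX_PACKET(max_packet);
    ep0_ctx->deq = ring_dma | 1;   /* bit 0 = Dequeue Cycle State */

    /* The proposed addition: Average TRB Length = 8 for control endpoints. */
    ep0_ctx->tx_info |= EP_AVG_TRB_LENGTH(8);
}
```

With the values from the dump (`max_packet` 512, ring at 0x11BF41000) the model produces the same DW1 and dequeue pointer as the report, and DW4 becomes 0x00000008 instead of 0.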