Bug 219824
| Summary: | [6.13 regression] USB controller just died | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Artem S. Tashkinov (aros) |
| Component: | USB | Assignee: | Default virtual assignee for Drivers/USB (drivers_usb) |
| Status: | NEW --- | | |
| Severity: | blocking | CC: | mathias.nyman, michal.pecio |
| Priority: | P3 | | |
| Hardware: | AMD | | |
| OS: | Linux | | |
| Kernel Version: | 6.13.4 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Description
Artem S. Tashkinov
2025-02-26 22:23:50 UTC
I'm utterly confused as to why the kernel decided to "xHCI host not responding to stop endpoint command". I didn't do anything at the time. Wasn't even using the mouse.

Something funky is going on with 6.13.

This was reported in the SUSE bug tracker earlier: https://bugzilla.suse.com/show_bug.cgi?id=1236992

I don't see it being reported here, so the issue is not new. Yet I see no patches queued for 6.13.5.

The SUSE issue is seemingly unrelated, please dismiss.

This just happened again:

[161470.836493] PM: resume devices took 0.547 seconds
[161470.836720] OOM killer enabled.
[161470.836721] Restarting tasks ... done.
[161470.839715] random: crng reseeded on system resumption
[161470.845090] PM: suspend exit
[161471.322491] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: Firmware: 400a4 vendor: 0x2 v0.43.1, 2 algorithms
[161471.324469] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: cirrus/cs35l41-dsp1-spk-prot-103c8b72.bin: v0.43.1
[161471.324480] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: spk-prot: D:\Amp Tuning\HP\840\0930\103C8B45_220930.bin
[161471.403951] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Calibration applied: R0=10446
[161471.407392] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Calibration applied: R0=10526
[161471.432157] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Firmware Loaded - Type: spk-prot, Gain: 17
[161471.433916] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Firmware Loaded - Type: spk-prot, Gain: 17
[161471.523827] hp_wmi: Unknown event_id - 131073 - 0x0
[162644.637587] xhci_hcd 0000:c3:00.3: xHCI host not responding to stop endpoint command
[162644.651068] xhci_hcd 0000:c3:00.3: xHCI host controller not responding, assume dead
[162644.651076] xhci_hcd 0000:c3:00.3: HC died; cleaning up
[162644.651099] xhci_hcd 0000:c3:00.3: Timeout while waiting for stop endpoint command
[162644.651102] usb 1-2: USB disconnect, device number 4
[162644.678374] usb 1-3: USB disconnect, device number 2
[162644.678748] usb 1-4: USB disconnect, device number 3
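As a rough check on the timing in the log above, the gap between the resume and the first xHCI failure can be read straight from the dmesg timestamps. The helper below is only a sketch: the function name `resume_to_failure_gap` is made up for illustration, and it assumes the default `[seconds.microseconds]` timestamp prefix shown in the log.

```shell
# Hypothetical helper (not part of any tool): read a dmesg log on stdin and
# print the seconds between the most recent "PM: suspend exit" line and the
# first "xHCI host not responding to stop endpoint command" line after it.
resume_to_failure_gap() {
    awk '
        # Strip the leading "[" (plus padding) and everything from "]" on,
        # leaving only the numeric timestamp.
        function stamp(line,    t) {
            t = line
            sub(/^\[ */, "", t)
            sub(/\].*$/, "", t)
            return t + 0
        }
        /PM: suspend exit/ { resume = stamp($0); seen = 1 }
        /xHCI host not responding to stop endpoint command/ && seen {
            printf "%.1f\n", stamp($0) - resume
            exit
        }
    '
}
```

Run as `dmesg | resume_to_failure_gap`. For the log above this prints `1173.8`, so the controller was declared dead roughly 20 minutes after the resume completed.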
Shortly after resume all the USB ports are disabled. I'm reverting back to Linux 6.11. I cannot use my device like this.

6.13 has a lot of changes related to endpoint stopping:

e21ebe51af68 xhci: Turn NEC specific quirk for handling Stop Endpoint errors generic
474538b8dd1c usb: xhci: Avoid queuing redundant Stop Endpoint commands
484c3bab2d5d usb: xhci: Fix TD invalidation under pending Set TR Dequeue
42b758137601 usb: xhci: Limit Stop Endpoint retries

Endpoints are stopped in order to cancel transfers, before suspend, and to soft reset an endpoint after clearing a halt.

I understand that bisecting an issue like this that triggers rarely isn't an option, but can I ask you to try running 6.13 with xhci dynamic debug enabled.

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control

and send dmesg after issue is triggered. It could reveal a bit more what's going on

(In reply to Mathias Nyman from comment #5)
> I understand that bisecting an issue like this that triggers rarely isn't an
> option, but can I ask you to try running 6.13 with xhci dynamic debug
> enabled.

Will do as soon as possible. Thanks a lot!

Which exact versions were you running successfully and for how long?

These patches listed by Mathias are instant first suspects, but they were all backported to v6.12.7 in December. Most of them also to v6.11.11 in early December and later in January to some LTS series.

Any chance that hibernation is indeed a (delayed) trigger and you weren't doing it as often in the past?

Did you come across similar reports from stable kernel branches in this year?

> Which exact versions were you running successfully and for how long?

Kernel 6.12.14 that I was running earlier didn't have this issue. Used software suspend/resume multiple times successfully.

> Any chance that hibernation is indeed a (delayed) trigger and you weren't
> doing it as often in the past?

Not using hibernation, just software suspend. I've not changed anything software-wise except installing a new kernel on this laptop.

> Did you come across similar reports from stable kernel branches in this year?

I've Googled a couple of times already for this exact error message and nothing turned up.
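Regarding the dynamic debug request earlier in the thread: before waiting for the failure to recur, it may be worth confirming that the `echo` commands actually took effect. A minimal sketch follows, assuming the documented layout of the dynamic debug `control` file (`filename:lineno [module]function flags format`); the function name `enabled_callsites` is hypothetical.

```shell
# Hypothetical helper: count the debug callsites of one module that have the
# "p" (print) flag set. Expects the contents of
# /sys/kernel/debug/dynamic_debug/control on stdin and the module name as $1.
enabled_callsites() {
    awk -v mod="[$1]" '
        # Field 2 looks like "[module]function", field 3 holds the flags,
        # e.g. "=p" when printing is enabled or "=_" when nothing is set.
        index($2, mod) == 1 && $3 ~ /^=.*p/ { n++ }
        END { print n + 0 }
    '
}
```

For example, `enabled_callsites xhci_hcd </sys/kernel/debug/dynamic_debug/control` (as root) should print a non-zero count once `module xhci_hcd =p` has been written; a result of `0` suggests the setting was not applied or was lost across a reboot.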