Created attachment 288159 [details] Output of lsusb -v Hi. I have an external USB audio device (Bus 003 Device 005: ID 0d8c:000c C-Media Electronics, Inc. Audio Adapter) / USB hub (Bus 003 Device 006: ID 058f:6254 Alcor Micro Corp. USB Hub). While the USB hub side of it is still working, the audio device fails from kernel version 5.2.0-rc3 onwards. The error is this: speaker-test -c2 -D hw:1,0 speaker-test 1.2.2 Playback device is hw:1,0 Stream parameters are 48000Hz, S16_LE, 2 channels Using 16 octaves of pink noise Rate set to 48000Hz (requested 48000Hz) Buffer size range from 96 to 262144 Period size range from 48 to 131072 Using max buffer size 262144 Periods = 4 Unable to set hw params for playback: Input/output error Setting of hwparams failed: Input/output error While dmesg shows: [ 1287.634587] usb 1-2.4: 1:1: usb_set_interface failed (-110) I bisected it and got to the following commit as the culprit: # first bad commit: [ef513be0a9057cc6baf5d29566aaaefa214ba344] usb: xhci: Add Clear_TT_Buffer Last working one was: good: [4998f1efd1904dd21697aeeead270e3eb97691dd] usb: Add devaddr in struct usb_device Current head gives me the same error messages as the ones above, so the bug is still there. I first added my report to a very similar bug (https://bugzilla.kernel.org/show_bug.cgi?id=203419) but the kernel version at which the problem started for me does not fit the original report there, which is why I am opening this as a new report.
Created attachment 288161 [details] dmesg output with usb debug enabled As per Mathias Nyman's comment over at the other bug report I'm also attaching the requested debug logs from this: mount -t debugfs none /sys/kernel/debug echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable < connect the logitec device > Send output of dmesg Send content of /sys/kernel/debug/tracing/trace
Created attachment 288163 [details] Content of /sys/kernel/debug/tracing/trace
There is also another bug which was bisected to the same commit under https://bugzilla.kernel.org/show_bug.cgi?id=206897 But that one is not related to USB audio devices. Also there has been another bug related to this commit from last year, which was discussed and fixed here: https://www.spinics.net/lists/linux-usb/msg182970.html
Thanks, traces show its related to Clearing TT buffer after a STALL on endpoint 0. The first stall looks like a protocol stall, not a function stall, meaning that endpoint isn't really halted, just that the device does not support the request in the control transfer. Anyway, xhci starts clearing what it assumes is a halted endpoint, including clearing the hub TT buffer. Specs are a bit unclear if TT should be cleared in this case, or at least I couldn't find it. I'll create a tespatch that doesn't clear TT buffer on protocol stalls, and add excessive tracing and debugging to it.
Created attachment 288293 [details] testpatch that doesn't clear TT buffer after protocol STALL Does this help? Could you share similar logs and traces with this applied? Thanks
On Thu, 9 Apr 2020 bugzilla-daemon@bugzilla.kernel.org wrote: > --- Comment #4 from Mathias Nyman (mathias.nyman@linux.intel.com) --- > Thanks, traces show its related to Clearing TT buffer after a STALL on > endpoint 0. > > The first stall looks like a protocol stall, not a function stall, meaning > that > endpoint isn't really halted, just that the device does not support the > request in the control transfer. > > Anyway, xhci starts clearing what it assumes is a halted endpoint, > including clearing the hub TT buffer. > > Specs are a bit unclear if TT should be cleared in this case, > or at least I couldn't find it. TTs should be cleared when an error occurs in the protocol. STALL isn't an error; a real error would be a timeout or CRC mismatch or URB cancellation, things like that -- things that might leave the TT in a busy state (because it hasn't sent its final status back to the host) when it ought to be idle. Sending a STALL isn't a protocol error and it does clear the TT status. This is discussed (not as explicitly as one might want) in sections 11.17.3 and 11.17.5 of the USB-2.0 specification. Alan Stern
Created attachment 288331 [details] Fix typo in attachment 288293 [details] bug 206897 is resolved with the patch in attachment 288293 [details], there is a typo in the patch. This patch just fixes the typo. Apply in order 0001-xhci-testpatch-Don-t-clear-TT-buffer-on-ep0-protocol.patch 0002-Fix-a-typo-in-prevoius-patch.patch
> TTs should be cleared when an error occurs in the protocol. STALL > isn't an error; a real error would be a timeout or CRC mismatch or URB > cancellation, things like that -- things that might leave the TT in a > busy state (because it hasn't sent its final status back to the host) > when it ought to be idle. Sending a STALL isn't a protocol error and > it does clear the TT status. > > This is discussed (not as explicitly as one might want) in sections > 11.17.3 and 11.17.5 of the USB-2.0 specification. > Thanks for the clarification, makes sense.
With both patches applied the sound system is working. Thanks a lot! I will attach the requested logs with the patch applied as well.
Created attachment 288447 [details] Dmesg output with both patches applied
Created attachment 288449 [details] Content of /sys/kernel/debug/tracing/trace with both patches applied Gzipped because the file was too big to attach here.
Created attachment 288523 [details] Full dmesg output with both patches applied I just realized that the last dmesg log was truncated for some reason. This one is complete.
So I'm not entirely sure if this is related to my previous problem but I'm now encountering a different kind of USB problems. USB equipment is still the same as described in the first message. The new problem leads to all USB devices getting disconnected, USB audio as well as mouse. The USB keyboard continues working because I guess it is working via PS2 internally? The problem has been occurring pretty much since the previous issue was fixed but only sporadically. Today it occurred for the first time in ~4 weeks and sometimes it happened several times in a day. The last messages in the dmesg log are: [ 7075.685892] xhci_hcd 0000:2a:00.3: xHCI host not responding to stop endpoint command. [ 7075.685895] xhci_hcd 0000:2a:00.3: USBSTS: [ 7075.701768] xhci_hcd 0000:2a:00.3: xHCI host controller not responding, assume dead [ 7075.701775] xhci_hcd 0000:2a:00.3: HC died; cleaning up [ 7075.701808] usb 3-1: USB disconnect, device number 2 [ 7075.766108] usb 3-2: USB disconnect, device number 3 [ 7075.766110] usb 3-2.4: USB disconnect, device number 4
Created attachment 292705 [details] DMESG log from todays lockup.
Created attachment 292707 [details] Older dmesg log with debug information added Log created as per the previous instructions (message 2). I just enabled the settings on system start in the hopes of capturing the hang so the files are quite big.
Created attachment 292709 [details] Content of /sys/kernel/debug/tracing/trace Created as per previous comment. Uploaded at:
Created attachment 292711 [details] Content of /sys/kernel/debug/tracing/trace Unpacked file is ~200mb. Ran for the entirety of the systems uptime to be able to catch the bug.