Accessing the audio capture device (Em28xxAudio) of a USB TV adapter (Hauppauge HVR-900) causes a complete USB freeze (see attached log "xhci.log"). This device is working at least until kernel 3.2 with the same hardware (Intel DH87RL/Haswell, BIOS RLH8710H.86A.0320.2013.0606.1802). Switching from a USB3 to a USB2 port doesn't change anything; if xhci is available, it is used and the bus freezes. If ehci is forced by disabling xhci completely, the capture device is working again. See also https://bugzilla.kernel.org/show_bug.cgi?id=62911 for additional informations related to V4L
Created attachment 114781 [details] lspci -vvvv
Created attachment 114791 [details] lsusb -vvvv
Created attachment 114801 [details] extract of dying xhci host
Please send this to the linux-usb@vger.kernel.org mailing list.
Update: kernel 3.2 didn't cause this issue as XHCI was experimental until 3.7 and therefore simply not in the tested configuration (ubuntu 12.04.1 stock). The bug is 100% reproducible and occurs only on audio access (video is working even with xhci). Bios update to recent RLH8710H.86A.0323.2013.1204.1726 didn't help.
Please test with 3.13, with CONFIG_USB_DEBUG=y. Run this command as root to enable xHCI debugging: echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control Then start capturing dmesg, trigger the bug, and send me the resulting dmesg. I need to see a lot more of what happened to the xHCI driver before the host died to figure out what went wrong.
Created attachment 124701 [details] dmesg with enabled xHCI debugging Attached USB debug log was created with 3.13.1, hope that's OK and contains the informations you need.
Yes, that's what I was looking for. Nothing immediately wrong jumps out at me, unfortunately. I might need you to apply some debugging patches later, once I form some hypothesis as to what's going wrong.
I'd like to ask if there's been any progress on this bug? I have essentially the same kind of issue see https://bbs.archlinux.org/viewtopic.php?id=190000 with a shutdown of usb caused by something in xhci. I'm willing to assist in getting more information if that's required. My kernel is 3.17.3-1-ARCH #1 SMP PREEMPT Fri Nov 14 23:13:48 CET 2014 x86_64 GNU/Linux. Is it worth me trying the same kind of build as in comment 6?
While dealing with halted endpoints in xhci I noticed that there could be issues with how we stop endpoints and cancel URBs as well. If I make an additional debug patch could you apply it and run it with the xhci debugging enabled as in comment 6?
I am willing to try this, but when I looked at the drivers/usb/host files I do not see the debug flag mentioned in comment 6 (CONFIG_USB_DEBUG) nor is it in the .config prepared by the Arch pkgbuild. There are other flags with debug and usb in their names eg CONFIG_DVB_USB_DEBUG, but I would need advice on which debugs to turn on.
Ah, yes, the CONFIG_USB_DEBUG was for older kernels. With later kernels you only need to check that dynamic debug is enabled: CONFIG_DYNAMIC_DEBUG=y check if debugfs is mounted, if not then run: mount -t debugfs none /sys/kernel/debug And to enable xhci debugging do: echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control many distos have the dynamic debug enabled and debugfs mounted automatically so you only need to echo to the dymanic_debug/control file.
OK I have checked and it seems the stock arch linux kernel has CONFIG_DYNAMIC_DEBUG set so I don't need a recompile. I will carry out the comment 6 type process when I get home. I'll post the dmesg output here tonight.
Created attachment 158441 [details] debug journalctl output of second vlc starting up this the journalctl output of the boot when I start a second pctv 290e device with vlc after doing sudo su echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control my kernel Linux minikat 3.17.3-1-ARCH #1 SMP PREEMPT Fri Nov 14 23:13:48 CET 2014 x86_64 GNU/Linux
Created attachment 158471 [details] simpler journal output for 2x(vlc+pctv290e) I ran this script immediately after a fresh boot $ cat bin/debugger.sh logger "=========================== turning on xhci_hcd debug" sudo sh -c "echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control" logger "=========================== starting VLC on adapter 0" vlc --dvb-adapter=0 ~/.config/channels.xspf & sleep 30 logger "=========================== starting VLC on adapter 1" vlc --dvb-adapter=1 ~/.config/channels.xspf & sleep 60 sudo systemctl poweroff the system was working after the usb turnoff and my sudo doesn't need a password so the system did power off OK. attached is the journalctl -b -1 --all output
Thanks, logs show that the host is asked to cancel a transfer, so it tries to stop the ring and remove the transfer block from the ring. Stopping the ring never responses, and we timout assuming the host is dead. This might be related to bug# 75521 I got a testbranch that both fixes some related issues, and adds more debugging for this case. If you can try it out its available at: git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git for-usb-next-test Remember to switch to the for-usb-next-test branch. It includes 8 patches that also apply cleanly on top of 3.18-rc6, I can send just patches instead if you prefer it that way. I'm hoping these patches would show one of the following messages when it fails: "Cancelled TD not on stopped ring" "Cancel URB NOT on current ring" "Error unhandled cancelled TD's after dev reset" "Error unhandled TD's after dev reset"
Created attachment 158871 [details] journalctl output with patched 3.18 kernel Same process as for comment 15, but with your patched kernel. I built using the Arch Linux 3.17.4 PKGBUILD after modding to use the source from the patched kernel and changing the version numbers. During the boot I see the systemd load modules failing (probably because it has a direct reference to 3.17), but I don't normally need to load any modules specially so I guess no harm done. I don't think I see any of your expected messages though.
Created attachment 158891 [details] debug patch for command ring status Thanks. looks like its not related to bug# 75521 after all. we never even get to run the command that should stop the endpoint ring. The whole command ring is not running. Could you add this attached debug patch (its also added to the for-usb-next-test branch) and show me the output of it failing
Created attachment 159001 [details] journalctl output with patch from 20141127 I think this one has some new debugs coming out.
Another sample for comparison. Created with for-usb-next-test branch mentioned in comment 18
Created attachment 159031 [details] Output of branch "for-usb-next-test"
(In reply to dezifit from comment #21) > Created attachment 159031 [details] > Output of branch "for-usb-next-test" I think something went wrong when you cloned that branch, It shows output messages like "Endpoint 0x81 not halted, refusing to reset." which shouldn't be possible in the "for-usb-next-test" branch, one of the patches removes that line.
(In reply to Robin Becker from comment #19) > Created attachment 159001 [details] > journalctl output with patch from 20141127 > > I think this one has some new debugs coming out. Thanks, it shows the command ring is running, and that we queued two "stop endpoint" commands on the command ring, but xhci doesn't comlete the commands even if the ring is running. Its probably not a very common situation to have two reset endpoint commands for different usb slots queued at the same time, maybe that triggers something. I can't think of anything else to try than adding code that prevents queuing two stop endpoint command on the actual hw ring at the same time, putting the other command on a sw ring until first command completes. but it might take a while before I get to write a hackpatch like that
I'll try this out when you have it ready. I looked at the code and there is a comment in xhci.c suggesting that only one stop command should be in flight, but I'm not clever enough to know if that applies here ie it might be 1 per ep or 1 per cancellation etc etc. Anyway thanks for your efforts.
Not sure whether this helps in resolution of this bug, but I asked for input on linux media and apparently others have also suffered from usb3/xhci problems with usb dvb. The following was suggested by Olli Salonen <olli.salonen a-t iki.fi> https://github.com/OpenELEC/OpenELEC.tv/commit/b636927dec20652ff020e54ed7838a2e9be51e03 which advises reverting commit 47f467ac740ebf0475a5176ddb1741acba6aad4 When I apply the above to Arch linux-18.2, my problem disappears and I can run 2 x pctv-290e + vlc / tzap & usb 3.
There are indications that this patch remedies the issue: http://www.spinics.net/lists/linux-usb/msg122678.html
I'm at work right now, but will give this patch a spin this evening. I will try it with Arch 3.19.3-3 x86_64 without the patch I currently use see 25 above.
OK I can confirm that the spinics patch in http://www.spinics.net/lists/linux-usb/msg122678.html also fixes the problem for me.
Did test the patch (see comment 26) with 3.12.40 and can confirm that XHCI_AVOID_BEI prevents xhci from freezing, so the initial issue is solved for me. I'm now seeing a bunch of new messages, don't know if they are related. Examples: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD xhci_hcd 0000:00:14.0: Signal while waiting for configure endpoint command
The spinics patch appears to be in Arch linux-4.0.1-1 and the bug has gone away for me.
I am using kernel 4.4.0-127-lowlatency (Ubuntu 14.04) and I have this bug. When we run an UVC camera, the host controller dies randomly after some point. The symptoms are exactly as described by other users above. I would like to know which kernels this bug affects and where a patch has been applied. May I know which kernel has the spinics patch ? Hardware: Asus X99-E-10G WS Motherboard Intel Xeon E5-2687W v4 3.0 GHz