Created attachment 126621
dmesg showing bug, without "module xhci_hcd =p"

Corsair Voyager GT 3.0 32GB on an Intel Z87. I'm using it mostly to store source code and as a ccache directory, i.e. lots of small reads and writes. Currently on 3.14-rc3 (self-compiled on Arch), i.e. with the other xhci revert fixes already applied. This has been happening since 3.14 development started; I can't remember anything like it ever occurring on 3.13.

The error always follows the same sequence, with apparently varying addresses, though I cannot (or don't know how to) trigger it manually:

1. usb 4-2: reset SuperSpeed USB device number 3 using xhci_hcd
2. xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880428207900
3. usb-storage: Error in queuecommand_lck: us->srb = ffff8804283d0d80

Then the task that issued the I/O hangs, and afterwards the entire system, at the latest when I try to shut down or restart. It also often happens that if I restart with the stick still plugged in, it either doesn't show up after boot or immediately triggers some other filesystem-related bug (as seen in the attachment).

I've tried ext4, f2fs and btrfs: the first always seems able to recover at some point, the second sometimes (at least if one manages to avoid any corrupted files from then on), the third never.

The attachment "dmesg", which actually shows the bug, was unfortunately taken without "module xhci_hcd =p", which I didn't know about at the time. The other two were taken with it after two or three restarts, but apparently fsck.f2fs had managed to repair the partition by then, and no follow-up bug was triggered.

I'll try to catch the bug with the debug flag set. But as I said, since it seems to be random, I don't know when I'll manage to do that.
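Because the failure can't be triggered on demand, it may help to scan saved dmesg output for the three-message sequence listed above instead of watching the log by hand. The following is a minimal Python sketch. The function name find_hang_sequence and the regular expressions are hypothetical, and they match only the message texts quoted in this report:

```python
import re

# Loose patterns for the three messages quoted above; device numbers and
# addresses vary between occurrences, so they are matched generically.
RESET = re.compile(r"reset SuperSpeed USB device number \d+ using xhci_hcd")
DROP_EP = re.compile(r"xhci_drop_endpoint called with disabled ep [0-9a-f]+")
QUEUECMD = re.compile(r"usb-storage: Error in queuecommand_lck: us->srb = [0-9a-f]+")


def find_hang_sequence(log_text):
    """Return a list of (reset, drop_ep, queuecmd) line indices, one tuple per
    occurrence of reset -> drop_endpoint -> queuecommand_lck, in that order."""
    hits = []
    reset_idx = drop_idx = None
    for i, line in enumerate(log_text.splitlines()):
        if RESET.search(line):
            # Every reset starts a new candidate sequence.
            reset_idx, drop_idx = i, None
        elif DROP_EP.search(line) and reset_idx is not None:
            # Remember only the first drop_endpoint message after the reset.
            if drop_idx is None:
                drop_idx = i
        elif QUEUECMD.search(line) and drop_idx is not None:
            hits.append((reset_idx, drop_idx, i))
            reset_idx = drop_idx = None
    return hits
```

For example, the text of saved dmesg output could be read into a string and passed in. A reset followed only by drop_endpoint messages, with no queuecommand_lck error, is deliberately not reported.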
Created attachment 126631
dmesg after restart, before plugging stick in
Created attachment 126641
dmesg after restart, after plugging stick in
On Tue, Feb 18, 2014 at 04:10:32PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=70781
>
> Bug ID: 70781
> Summary: [xhci_hcd] reset SuperSpeed, xhci_drop_endpoint called
> with disabled ep, Error in queuecommand_lck: task
> blocked

Please send this to the linux-usb@vger.kernel.org mailing list.
Created attachment 126751
dmesg of bug occurring while debug flag set
Created attachment 126761
dmesg after reboot with debug flag set
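The "debug flag" in these attachments is the dynamic debug query "module xhci_hcd =p" from the original report. It turns on the driver's dev_dbg-based debug messages. Below is a minimal Python sketch of setting it at runtime. It assumes the kernel was built with CONFIG_DYNAMIC_DEBUG, that debugfs is mounted at /sys/kernel/debug, and that the script runs as root. The helper name enable_module_debug is hypothetical. A query written this way does not survive a reboot, so it has to be reapplied after each restart.

```python
from pathlib import Path

# Assumed location of the dynamic debug control file with debugfs mounted
# at /sys/kernel/debug.
DDEBUG_CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def enable_module_debug(module, control=DDEBUG_CONTROL):
    """Write a dynamic debug query that sets the flags of every debug call
    site in `module` to exactly 'p' (print). Returns the query written."""
    query = f"module {module} =p\n"
    with open(control, "w") as f:
        f.write(query)
    return query
```

On a real system this would be called as enable_module_debug("xhci_hcd"). The control parameter exists only so the write can be pointed at another file, for example when testing.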
Andreas, this may be related to a known regression. Please try these two patches and see if they fix your issue:
http://marc.info/?l=linux-kernel&m=139420429908418&w=2
http://marc.info/?l=linux-kernel&m=139420432208425&w=2
Sorry for not having replied to the mailing list; I'm busy at the moment, and the bug being sporadic makes it hard to debug anyhow. I'll apply the patches and see if it still occurs.
Hi list / bugzilla,

just want you to know that a similar or identical bug also occurs on Linux 3.13.5-1-ARCH / 3.13.6-1-ARCH. I know it's not a vanilla kernel, but I don't know of any USB 3.0-related patches added by Arch Linux.

Kernel log:
[ 4578.359232] usb 4-2: Disable of device-initiated U1 failed.
[ 4580.077667] usb 4-2: reset SuperSpeed USB device number 2 using xhci_hcd
[ 4580.092832] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep --address--
[ 4580.092837] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep --address--

Affected device:
Bus 004 Device 003: ID 05e3:0732 Genesys Logic, Inc.
labeled as something like "Transcend SD card reader USB 3.0", with a 32GB SD card

Let me know if I can do anything to verify the bug (or whether it is an independent one).
Unfortunately it is still happening. Using linux-git (i.e. both patches applied):

[ 3022.774515] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 3022.787928] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880422309e80
[ 3022.787930] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880422309ec0
100x
[ 3053.680116] usb-storage: Error in queuecommand_lck: us->srb = ffff880414422d80
I encounter the same (or a similar) problem with an Intel DQ77MK motherboard (based on the Q77 chipset) and an Oyen Digital MiniPro 2TB USB 3.0 external disk drive (which uses an ASMedia 2105 USB-SATA bridge). The symptoms are the same on all kernels I tried: 3.11, 3.13.6, and 3.14.0-rc6.

Either I get these messages when I/O activity starts on the disk:

[ 414.912907] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 414.929082] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800360f2200
[ 414.929088] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800360f2240

or I see this right after connecting the drive:

[ 136.463022] usb 4-1: USB disconnect, device number 2
[ 136.463740] sd 6:0:0:0: [sdc] Synchronizing SCSI cache
[ 136.463798] sd 6:0:0:0: [sdc]
[ 136.463800] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

In the latter case, if I issue sync, the process hangs:

[ 601.302650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 601.303368] sync D ffff88041dc144c0 0 2065 2014 0x00000000
[ 601.303371] ffff88040600bd80 0000000000000082 ffff88040607e380 ffff88040600bfd8
[ 601.303373] 00000000000144c0 00000000000144c0 ffff88040607e380 ffff88040600be90
[ 601.303375] ffff88040600be98 7fffffffffffffff ffff88040607e380 ffffffff811ef9d0
[ 601.303377] Call Trace:
[ 601.303382] [<ffffffff811ef9d0>] ? vfs_fsync+0x40/0x40
[ 601.303385] [<ffffffff81731949>] schedule+0x29/0x70
[ 601.303387] [<ffffffff81730bf9>] schedule_timeout+0x219/0x2b0
[ 601.303390] [<ffffffff811518cb>] ? filemap_fdatawait_range+0x13b/0x190
[ 601.303393] [<ffffffff81081c8a>] ? __queue_delayed_work+0xaa/0x1a0
[ 601.303395] [<ffffffff810822dd>] ? try_to_grab_pending+0x11d/0x160
[ 601.303396] [<ffffffff811ef9d0>] ? vfs_fsync+0x40/0x40
[ 601.303398] [<ffffffff81732476>] wait_for_completion+0xa6/0x160
[ 601.303400] [<ffffffff8109c2d0>] ? wake_up_state+0x20/0x20
[ 601.303403] [<ffffffff811e85b5>] sync_inodes_sb+0xa5/0x1c0
[ 601.303404] [<ffffffff811ef9d0>] ? vfs_fsync+0x40/0x40
[ 601.303406] [<ffffffff811ef9e9>] sync_inodes_one_sb+0x19/0x20
[ 601.303408] [<ffffffff811c3d42>] iterate_supers+0xb2/0x110
[ 601.303410] [<ffffffff811efc55>] sys_sync+0x35/0x90
[ 601.303413] [<ffffffff8173e6ed>] system_call_fastpath+0x1a/0x1f

The problem seems to occur only when I connect the drive to a USB 3.0 port; if I connect it to a USB 2.0 port, it does not happen. The drive runs smoothly on Windows, which is installed on the same machine.

I attach dmesg, lspci, lspci -n, lsusb -v, and uname outputs to this comment. Let me know if any further information or actions on my side are needed.
Created attachment 129191
dmesg with xhci_drop_endpoint
Created attachment 129201
dmesg with DID_NO_CONNECT and hung sync
Created attachment 129211
lspci
Created attachment 129221
lspci -n
Created attachment 129231
lsusb -v
I finally managed to capture a usbmon trace of the bug. It happened during a longer compilation; the trace was started about a minute into it.
Created attachment 129591
dmesg
Created attachment 129601
usbmon trace
Created attachment 130041
dmesg with debug patch v2 for mailing list
Created attachment 130661
testpatch to disable usb3 on intel hosts

Could you try this patch first to check whether this is Link PM related? This patch disables LPM for Intel hosts, but still sends the LPM-related control transfers.
Created attachment 130671
patch to disable all lpm related control transfers

If the previous patch doesn't work, please try this one. It prevents all LPM-related control transfers on all hosts. If the issue still exists after this one, then it's probably not USB3 LPM related.
(In reply to Mathias Nyman from comment #20)
> Created attachment 130661
> testpatch to disable usb3 on intel hosts
>
> Could you try this patch first to check if this is Link PM related.
> This patch sets LPM disabled for intel hosts, but still sends LPM related
> control transfers.

Applying this patch to 3.13.7 seems to keep the bug from being triggered for me. I get the following output in dmesg:

[ 1.410752] usb 4-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 1.424830] usb 4-2: Parent hub missing LPM exit latency info. Power management will be impacted.
Sorry for the noise. It looks like my original problem was indeed a different one, fixed for 3.13.7 by "[PATCH 1/2] Revert xhci 1.0: Limit arbitrarily-aligned scatter gather." and "[PATCH 2/2] Revert USBNET: ax88179_178a: enable tso if usb host supports sg dma". That problem may have triggered the same error as the one the original author sees.

I rebuilt the original 3.13.6 (broken for me) with your patch and haven't gotten any xhci-related log entries so far. It looks like I can't reproduce the problem anymore.
Created attachment 131571
dmesg, yet another crash
Hello,

I also tried the patch "patch to disable all lpm related control transfers" from Mathias Nyman, but without success: I still get "xHCI xhci_drop_endpoint called with disabled ep" messages, ... (kernel 3.14)

USB3 controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)

I have problems with my USB 3.0 devices: external HDDs (HGST Touro Mobile Pro (Simpletech chip), Fantec case with WDC, ASMedia chip) and a SanDisk card reader.

** My configuration:
Home PC: ASUS P7P55D-E PRO with Intel P55 chipset and PLX PEX PCI bridges attached to the NEC controller => not working

For my office PC I bought a PCIe card with the same NEC controller (exactly the same revision reported by lspci), and all the USB3 devices work fine there => no errors!?

Furthermore, my Lexar 64GB USB 3.0 stick works fine on *both* controllers!

It is very strange that all devices work as expected on the PCIe card's controller, although both controllers have the same revision (firmware?). I can only explain that by different firmware revisions or by the different connection/interface. (A firmware update of the mainboard's onboard chip is either impossible or too risky.)

PLX chip with NEC: not working; PCIe card with NEC: working

** Chipset P55 with PLX PEX PCI bridges vs. Sandy Bridge mainboard with PCIe card (PCIe 2.0 x1, 5 Gbps):
Here are the differences between the lspci outputs:

PCIe type             | CacheLineSize | Flags: AuxCurrent | PME     | DevCtl   | MaxReadReq | AuxPwr  | CommClk
----------------------+---------------+-------------------+---------+----------+------------+---------+---------
PLX chip, NEC         | 32            | 375mA             | D3cold+ | NoSnoop+ | 512 bytes  | AuxPwr+ | CommClk-
PCIe card NEC (works) | 64            | 0mA               | D3cold- | NoSnoop- | 128 bytes  | AuxPwr- | CommClk+

lsusb differences:
PLX chip, NEC: bmAttributes 0x02 (no LTM support)
PCIe card: bmAttributes 0x00, Latency Tolerance Messages (LTM) Supported

Are there any new patches available for testing?
Thank you for your support,
Bernhard

Errors:
[18853.074081] xhci_hcd 0000:08:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8803e2119c80
[18853.074088] xhci_hcd 0000:08:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8803e2119cc0
[18890.189088] xhci_hcd 0000:08:00.0: xHCI host not responding to stop endpoint command.
[18890.189096] xhci_hcd 0000:08:00.0: Assuming host is dying, halting host.
[18890.232182] xhci_hcd 0000:08:00.0: Host not halted after 16000 microseconds.
[18890.232193] xhci_hcd 0000:08:00.0: Non-responsive xHCI host is not halting.
[18890.232197] xhci_hcd 0000:08:00.0: Completing active URBs anyway.
[18890.232209] xhci_hcd 0000:08:00.0: HC died; cleaning up

or

[ 8427.730368] usb 4-1: reset SuperSpeed USB device number 5 using xhci_hcd
[ 8427.748097] usb 4-1: Parent hub missing LPM exit latency info. Power management will be impacted.
[ 8427.749549] xhci_hcd 0000:08:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880407ca7480
[ 8427.749555] xhci_hcd 0000:08:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880407ca74c0
[ 8427.870515] usb 4-1: reset SuperSpeed USB device number 5 using xhci_hcd
[ 8427.888132] usb 4-1: Parent hub missing LPM exit latency info. Power management will be impacted.
[ 8427.889549] xhci_hcd 0000:08:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880407ca7480
[ 8427.889555] xhci_hcd 0000:08:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880407ca74c0
Are there any new patches to test, or any news regarding the "xhci_drop_endpoint" issue?
@Bernhard There is at least a workaround: use 3.12.32, which is the latest working longterm kernel according to a very simple bisection I've done. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1371233 for details (and yes, you seriously have to download my post and open it in a text editor to read it...)
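The bisection across kernel releases mentioned above amounts to a binary search. Below is a minimal Python sketch. It assumes the versions are listed in release order, that the first one is known good and the last one is known bad, and that the bug stays present once it has been introduced. first_bad and the is_good callback are hypothetical names, not part of any tool. is_good stands for "boot this kernel and exercise the device until you are satisfied it works". git bisect applies the same idea to individual commits.

```python
def first_bad(versions, is_good):
    """Binary search for the first bad entry in an ordered list.

    Invariant: versions[lo] is good and versions[hi] is bad, so each test
    halves the range that still contains the first bad version.
    """
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]
```

With n versions this needs only about log2(n) test boots. For a sporadic bug like this one, though, a single "good" run is weak evidence, so each candidate would need a long test run.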