Kernel Bug Tracker – Bug 51031
USB attached SCSI loops the scsi stack
Last modified: 2013-11-19 18:37:54 UTC
Created attachment 87331 [details]
syslog kernel output
Connecting my Dell 1TB portable external hard drive (USB 3.0, model PDA1000B) to my USB 3.0 computer causes serious problems in the SCSI stack. Attached is the log from connecting and disconnecting the device.
The uas/scsi stack seems unable to get out of an infinite loop. The system is not even able to halt cleanly (I don't know why, but it stops after remounting / read-only on Gentoo).
Oh, there is a known workaround: blacklisting the uas module.
Created attachment 87341 [details]
lsusb -vvv for the device
This is the output of lsusb -vvv -d 413c:9013
I'm going to test 3.7-rc6 to see if it works
3.7-rc6 is broken too :(
Enrico: as a quick work-around, can you blacklist the uas driver and just use the usb-storage driver?
As root, add a file to /etc/modprobe.d/ called blacklist-uas.conf. Add the line:
blacklist uas
Then reboot your computer.
That should allow you to use your USB 3.0 hard drive without the UAS protocol, under the standard Bulk-only-Transport (BOT) protocol.
The uas driver is pretty beta, and I'm not sure when I'll have time to track down this bug.
Hi Sarah, thanks for the reply. As I said in comment #1, yes, I can just blacklist uas and it works.
It is OK if uas is beta, but I would suggest making it not autoload and tagging it as EXPERIMENTAL in the kernel config, all the more so if you don't have time to fix these kinds of bugs quickly. So please at least make it harmless for the common user, and do that quickly. I want to be clear: I didn't load uas, it autoloaded.
The EXPERIMENTAL tag is being removed from the kernel kconfig system. I did inform the major Linux distribution kernel teams (Suse, Ubuntu, Red Hat) that they should blacklist the uas driver. Sorry to cause you trouble.
No problem. I didn't know that it is being removed. Well, then I suggest adding a warning to the config docs. No need to cause pain to users of non-major Linux distributions like me :). We exist and we want the same QA as others :). I use Gentoo Linux and I roll my own kernel config. There is no mention in the help text of this unstable status.
Also, "If you don't know whether you have a UAS device, it is safe to say 'Y' or 'M' here and the kernel will use the right driver." suggests to the user that it is totally safe to use it.
Thank you. Cheers
On Tue, Nov 27, 2012 at 09:10:44PM +0000, firstname.lastname@example.org wrote:
> The EXPERIMENTAL tag is being removed from the kernel kconfig system. I did
> inform the major Linux distribution kernel teams (Suse, Ubuntu, Red Hat) that
> they should blacklist the uas driver. Sorry to cause you trouble.
If the driver is that broken, please make it depend on CONFIG_BROKEN
then; having it cause these kinds of problems isn't OK.
Looks like there are multiple bugs. Quoting the logs:
Nov 26 17:16:23 ivythink kernel: [ 99.626818] xhci_hcd 0000:00:14.0: ERROR Transfer event for disabled endpoint or incorrect stream ring
Nov 26 17:16:23 ivythink kernel: [ 99.626828] xhci_hcd 0000:00:14.0: @00000000c40027d0 c40a4020 00000000 01000000 03038000
This looks like an xhci streams bug to me (where uas is just the trigger because it actually uses USB 3 streams).
Nov 26 17:16:53 ivythink kernel: [ 129.684045] xhci_hcd 0000:00:14.0: WARN: Slot ID 3, ep index 6 has stream IDs 1 to 32 allocated, but stream ID 9999 is requested.
Nov 26 17:16:53 ivythink kernel: [ 129.684051] scsi host7: sense urb submission failure
This is UAS submitting an urb for stream 9999 whereas there are only 32. That one should have been fixed in the meantime. Can you attach a log from kernel 3.7-rc6 please? I expect those messages are gone then.
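For readers following along, the range check that the xhci warning implies can be sketched roughly as below. This is only an illustration, not the actual xhci or uas code: the function name stream_id_is_valid and its parameters are made up here, and it assumes valid stream IDs run from 1 to the number of allocated streams, as the "1 to 32" in the log suggests.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from the kernel sources.
 * Assumes valid stream IDs are 1..num_streams inclusive,
 * matching the "stream IDs 1 to 32 allocated" message.
 */
bool stream_id_is_valid(unsigned int stream_id, unsigned int num_streams)
{
    return stream_id >= 1 && stream_id <= num_streams;
}
```

With 32 streams allocated, a request for stream 9999 fails this check, which is the situation the warning reports.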
Nov 26 17:17:21 ivythink kernel: [ 157.842581] scsi host7: sense urb submission failure
Nov 26 17:17:21 ivythink kernel: [ 157.847976] scsi host7: sense urb submission failure
[ repeating forever ]
UAS retries submitting urbs over and over. That clearly must be improved. For starters we should stop doing that once the request has been canceled. Having a limit for the number of retries is probably a good idea too. I'll go cook up a patch for this one.
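A rough sketch of the two ideas above (stop once the request is cancelled, and cap the number of retries) might look like the following. It is not the patch itself: struct request, submit_fn, submit_with_limit, always_fail and the limit of 3 are all hypothetical names and values chosen for illustration.

```c
#include <stdbool.h>

#define MAX_SUBMIT_RETRIES 3   /* hypothetical limit, not taken from uas.c */

/* Hypothetical stand-in for a pending command. */
struct request {
    bool cancelled;   /* set once the upper layer has aborted the request */
    int  attempts;    /* number of submission attempts made so far */
};

/* Callback standing in for the real submission call; returns 0 on success. */
typedef int (*submit_fn)(struct request *req);

/*
 * Try to submit req, giving up as soon as it has been cancelled or
 * the retry budget is spent. Returns 0 on success, -1 otherwise.
 */
int submit_with_limit(struct request *req, submit_fn submit)
{
    while (!req->cancelled && req->attempts < MAX_SUBMIT_RETRIES) {
        req->attempts++;
        if (submit(req) == 0)
            return 0;
    }
    return -1;
}

/* Example callback that always fails, like the submissions in the log. */
int always_fail(struct request *req)
{
    (void)req;
    return -1;
}
```

Called with always_fail, the loop stops after MAX_SUBMIT_RETRIES attempts instead of repeating forever, and a request that is already cancelled is never submitted at all.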
Created attachment 87761 [details]
syslog kernel 3.7-rc6 output
Here is the log for the 3.7-rc6 kernel. You are right, it is different for the hci stuff.
I also tried 3.7-rc7 and the kernel panicked, so I can't save the syslog. I took a picture of the console; I hope it will be readable. Otherwise just let me know how I can capture the dump (no serial console on a laptop). I guess I can mimic what CentOS does, but if you know a quick way, please share :).
Created attachment 87771 [details]
3.7-rc7 console panic
The panic doesn't happen immediately when I insert the disk, but a few seconds later, likely around "ABORT TASK timed out". It is very hard to tell, because it happens so quickly.
Created attachment 87781 [details]
syslog kernel 3.7-rc7 output
Found a way to save the log: log in from the tty instead of X11. Ignore the sysrq emergency sync in the log; that was me. I was worried it was going to panic, so I was pressing the sync sysrq to save the log.
With this kernel the disk is not powered down; I can still see the LED on. The device /dev/sdb is created, but it is invalid and not usable.
But be aware the panic still happens as soon as I log in to X11 again, so there is something really weird going on here. FTR I have an Intel HD 4000, so no binary crap here.
As expected the stream id errors are gone, good.
Error handling is still broken.
Pushed patches improving that to
With these I expect you can plug the device in and out without the kernel crashing and without the endless stream of urb submission failures. It will probably not work though because xhci needs to be fixed too.
The panic looks unrelated on a quick glance.
Does this only happen with the uas module loaded?
The panic is definitely related. I never had a panic until I inserted the USB disk and updated to kernel 3.7-rc7. Without uas, nothing is wrong with any kernel version; the disk and the system work as expected.
I'll try that git tree as soon as I can find a bit of time. Thank you.
The server's connection seems to be very slow right now, so I'm just copying uas.c into my 3.7-rc7 tree. There is a warning from the compiler:
drivers/usb/storage/uas.c: In function ‘uas_eh_abort_handler’:
drivers/usb/storage/uas.c:715:6: warning: ‘ret’ may be used uninitialized in this function
and, from what I understand, it is perfectly right :)
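For anyone unfamiliar with this class of warning, the general pattern looks like the sketch below. This is not the code from uas.c: abort_handler and its arguments are hypothetical, and the return values are arbitrary. The point is only that a variable assigned in some branches but not all needs an initial value.

```c
/*
 * Hypothetical handler, for illustration only. Without the "= 0"
 * initializer, the path where neither branch runs would return an
 * indeterminate value, which is what the compiler warns about.
 */
int abort_handler(int have_cmd, int have_data)
{
    int ret = 0;   /* default result when there is nothing to abort */

    if (have_cmd)
        ret = -1;  /* arbitrary illustrative value */
    if (have_data)
        ret = -2;  /* arbitrary illustrative value */

    return ret;
}
```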
Created attachment 87821 [details]
syslog kernel 3.7-rc7 with updated uas.c output
It is definitely improved. It doesn't work, as you said, of course, but it is a start.
And I also understood why the panic seems unrelated: what the picture shows is just the last part of a very long output.
Now it panics when I disconnect the drive: it drops to the console, showing a lot of backtraces from various kernel components. So what is shown in the picture is just some other kernel subsystem failing because something else failed first. Also, if you wonder about the TAINTED flag, it is likely vmware, and I can easily remove it if needed.
I wonder if this can be related to another bug I never reported: I have another USB3 HDD, not supporting SCSI, so a "normal" one, let's say. If I connect it and then quickly disconnect it before the kernel initialization has finished (so before KDE pops up the notice that there is a new removable drive), the kernel panics.
git tree updated, new patch series sent to linux-usb.
When testing please use the git tree or pick the debug patch from the git tree and apply it on top of the patch series.
When testing in qemu I've seen usb_kill_anchored_urbs() hanging forever on xhci, causing more hiccups down the road. That could explain the problems you are seeing with the other HDD. Do those go away when connecting the disk to an ehci controller?
Created attachment 87941 [details]
syslog kernel 3.7-rc7 uas branch v2 output
OK, this time I cloned your repo. It still panics as soon as I type the password in kdm. If I keep kdm (and so X11) running but stay logged out, just using the laptop from the tty, it doesn't panic (so nothing changed on this side).
Created attachment 87951 [details]
syslog kernel 3.7-rc7 uas branch v2 output using ehci
This is using ehci. It doesn't panic but still doesn't work, ending up in urb submission failures.
(In reply to comment #18)
> Created an attachment (id=87941) [details]
> syslog kernel 3.7-rc7 uas branch v2 output
> Ok this time I cloned you repo. Still panics as soon as I type the password in
> kdm. If I keep kdm (and so X11) running but logged out, just using the laptop
> from the tty, it doesn't panic (so nothing changed from this side).
xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
sd 6:0:0:0: [sdb] Sector size 0 reported, assuming 512.
sd 6:0:0:0: [sdb] 1 512-byte logical blocks: (512 B/512 B)
sd 6:0:0:0: [sdb] 0-byte physical blocks
That pretty much looks like the data from some mode sense scsi command wasn't written to the correct place. I guess xhci corrupts memory with streams active.
Maybe we should set CONFIG_XHCI=n instead of CONFIG_UAS=n
(In reply to comment #19)
> Created an attachment (id=87951) [details]
> syslog kernel 3.7-rc7 uas branch v2 output using ehci
> This is using ehci. Doesn't panic but still doesn't work, ending up in urb
> submission failure
OK. I meant whether you can crash the kernel with the non-uas disk (as described in the last paragraph of comment 16) on ehci. But trying the uas disk on ehci is indeed useful too.
The log doesn't look like this is the updated git tree. Can you double-check? The branch is rebased, maybe it wasn't picked up correctly because of that.
You are right, I'm sorry: I forgot to switch branches with the checkout command. I apologize, I just have so much stuff to do eheheh. I will test the non-UAS disk when I have time. For now I will test the UAS one, because it is the one we use at work (we have a lot of them and it is important for our users that it gets fixed).
I'm going to retest with the right branch!
Created attachment 88271 [details]
syslog kernel 3.7-rc7 uas branch v2 output
OK, I hope this is the right one. I did the checkout of the uas branch this time. Now it seems to handle it a little better, but you can still see an infinite (very slow) loop in the check, the one starting with uas_eh_abort_handler.
After that, when I disconnect, I get some stack traces, then the "sense urb submission failure" messages begin and the machine becomes unresponsive to user commands (but still not dead or frozen). I can also see more stack traces in the syslog between those submission failures.
A patch referencing this bug report has been merged in Linux v3.8-rc1:
Author: Greg KH <email@example.com>
Date: Wed Nov 28 10:19:16 2012 -0800
USB: mark uas driver as BROKEN
I noticed that some changes have been made to the uas driver in linux-next since it was marked BROKEN - https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/log/drivers/usb/storage?qt=grep&q=uas - and this patch removes the dependency on BROKEN: https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/usb/storage/Kconfig?id=25e11ec4fe5271c4895265ecbb69531e6b0c0dd5
Is it already worth retesting whether this bug still exists, or would it be premature to assume some of the changes could have fixed it?
Last time I tried, uas didn't work (for USB 3.0 devices), not due to bugs in uas itself but because xhci streams support was broken.
The xhci issue causing the uas driver to not work is fixed for me by this *RFC*, testing, *no guarantees* patch from Gerd: http://article.gmane.org/gmane.linux.usb.general/93473
For those following this bug I've a branch with an extensive set of xhci-streams / uas fixes on top of usb-next here: http://git.linuxtv.org/hgoede/gspca.git/shortlog/refs/heads/usb-next-for-sarah
This is scheduled to go into 3.14.