Bug 51031

Summary: USB attached SCSI loops the scsi stack
Product: Drivers Reporter: Enrico Tagliavini (enrico.tagliavini)
Component: USB Assignee: Greg Kroah-Hartman (greg)
Status: CLOSED CODE_FIX    
Severity: normal CC: alan, dewhite, erland, florian, jwrdegoede, kraxel, lvml, sarah, stern, sven.koehler
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.6.6 3.7-rc6 3.7-rc7 Subsystem:
Regression: No Bisected commit-id:
Attachments: syslog kernel output
lsusb -vvv for the device
syslog kernel 3.7-rc6 output
3.7-rc7 console panic
syslog kernel 3.7-rc7 output
syslog kernel 3.7-rc7 with updated uas.c output
syslog kernel 3.7-rc7 uas branch v2 output
syslog kernel 3.7-rc7 uas branch v2 output using ehci
syslog kernel 3.7-rc7 uas branch v2 output

Description Enrico Tagliavini 2012-11-26 17:51:08 UTC
Created attachment 87331 [details]
syslog kernel output

Connecting my Dell 1TB portable external hard drive USB 3.0 PDA1000B to my USB 3.0 computer causes serious problems to the SCSI stack. Attached is the log when connecting and disconnecting the device.

The uas/scsi stack seems unable to get out of an infinite loop. The system is not even able to halt cleanly (I don't know why, but it stops after remounting / read-only on Gentoo).
Comment 1 Enrico Tagliavini 2012-11-26 17:51:39 UTC
Oh there is a known workaround, blacklisting the uas module.
Comment 2 Enrico Tagliavini 2012-11-26 17:55:04 UTC
Created attachment 87341 [details]
lsusb -vvv for the device

This is the output of lsusb -vvv -d 413c:9013

I'm going to test 3.7-rc6 to see if it works
Comment 3 Enrico Tagliavini 2012-11-26 18:15:34 UTC
3.7-rc6 is broken too :(
Comment 4 Sarah Sharp 2012-11-27 19:17:31 UTC
Enrico: as a quick work-around, can you blacklist the uas driver and just use the usb-storage driver?

As root, add a file to /etc/modprobe.d/ called blacklist-uas.conf.  Add the line:

blacklist uas

Then reboot your computer.

That should allow you to use your USB 3.0 hard drive without the UAS protocol, under the standard Bulk-only-Transport (BOT) protocol.

The uas driver is pretty beta, and I'm not sure when I'll have time to track down this bug.
Comment 5 Enrico Tagliavini 2012-11-27 20:44:03 UTC
Hi Sarah, thanks for the reply. As I said in comment #1 yes I can just blacklist uas and it works.

It is OK if uas is beta, but I would suggest making it not autoload and tagging it as EXPERIMENTAL in the kernel config, all the more so if you don't have time to fix this kind of bug quickly. So please at least make it harmless for the common user, and do that quickly. I want to be clear: I didn't load uas, it autoloaded.
Comment 6 Sarah Sharp 2012-11-27 21:10:44 UTC
The EXPERIMENTAL tag is being removed from the kernel kconfig system.  I did inform the major Linux distribution kernel teams (Suse, Ubuntu, Red Hat) that they should blacklist the uas driver.  Sorry to cause you trouble.
Comment 7 Enrico Tagliavini 2012-11-27 21:17:22 UTC
No problem. I didn't know that it is being removed. Well, then I suggest adding a warning in the config docs. No need to cause pain to users of non-major Linux distributions like me :). We exist and we want the same QA as others :). I use Gentoo Linux and I roll my own config for the kernel. There is no mention in the help text of this unstable status.

Also "If you don't know whether you have a UAS device, it is safe to
say 'Y' or 'M' here and the kernel will use the right driver." suggests to the user that it is totally safe to use it.

Thank you. Cheers
Comment 8 Greg Kroah-Hartman 2012-11-28 00:41:23 UTC
On Tue, Nov 27, 2012 at 09:10:44PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> The EXPERIMENTAL tag is being removed from the kernel kconfig system.  I did
> inform the major Linux distribution kernel teams (Suse, Ubuntu, Red Hat) that
> they should blacklist the uas driver.  Sorry to cause you trouble.

If the driver is that broken, please make it depend on CONFIG_BROKEN
then, having it cause these kinds of problems isn't ok.
Comment 9 Gerd Hoffmann 2012-11-28 21:21:31 UTC
Looks like there are multiple bugs.  Quoting the logs:

Nov 26 17:16:23 ivythink kernel: [   99.626818] xhci_hcd 0000:00:14.0: ERROR Transfer event for disabled endpoint or incorrect stream ring
Nov 26 17:16:23 ivythink kernel: [   99.626828] xhci_hcd 0000:00:14.0: @00000000c40027d0 c40a4020 00000000 01000000 03038000

This looks like a xhci streams bug to me (where uas is just the trigger because it actually uses usb3 streams).

Nov 26 17:16:53 ivythink kernel: [  129.684045] xhci_hcd 0000:00:14.0: WARN: Slot ID 3, ep index 6 has stream IDs 1 to 32 allocated, but stream ID 9999 is requested.
Nov 26 17:16:53 ivythink kernel: [  129.684051] scsi host7: sense urb submission failure

This is UAS submitting an urb for stream 9999 whereas only 32 are allocated.  That one should have been fixed in the meantime.  Can you attach a log from kernel 3.7-rc6 please?  I expect those messages to be gone then.
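
The warning quoted above amounts to a range check on the requested stream ID. A minimal sketch of such a check follows, assuming 1-based stream IDs as the log message implies; the function names are hypothetical and this is not the xhci_hcd code:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not taken from xhci_hcd: with num_streams streams
 * allocated, only IDs in the range 1..num_streams are acceptable. */
static bool stream_id_valid(unsigned int stream_id, unsigned int num_streams)
{
    return stream_id >= 1 && stream_id <= num_streams;
}

/* Returns 0 when the request may be submitted, -1 when it is rejected. */
static int check_stream_request(unsigned int stream_id, unsigned int num_streams)
{
    if (!stream_id_valid(stream_id, num_streams)) {
        fprintf(stderr,
                "stream IDs 1 to %u allocated, but stream ID %u is requested\n",
                num_streams, stream_id);
        return -1;
    }
    return 0;
}
```

With 32 streams allocated, check_stream_request(9999, 32) is rejected while check_stream_request(32, 32) is accepted.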

Nov 26 17:17:21 ivythink kernel: [  157.842581] scsi host7: sense urb submission failure
Nov 26 17:17:21 ivythink kernel: [  157.847976] scsi host7: sense urb submission failure
[ repeating forever ]

UAS retries submitting urbs over and over.  That clearly must be improved.  For starters we should stop doing that once the request has been canceled.  Having a limit for the number of retries is probably a good idea too.  I'll go cook up a patch for this one.
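
The two ideas above (stop once the request has been canceled, and cap the number of retries) can be sketched in plain userspace C. This is only an illustration under stated assumptions: the struct, the retry limit of 3, and the stub submit function are all hypothetical and do not reflect the actual uas.c patch.

```c
#include <stdbool.h>

#define MAX_SUBMIT_RETRIES 3    /* assumed limit, chosen for illustration */

struct request {
    bool cancelled;             /* set once the command has been aborted */
    int  attempts;              /* submissions tried so far */
};

/* Submit callback: returns 0 on success, non-zero on failure. */
typedef int (*submit_fn)(struct request *req);

/* Hypothetical stub standing in for a submission that never succeeds. */
static int submit_always_fails(struct request *req)
{
    (void)req;
    return 1;
}

/* Returns 0 on success, -1 if the request was found cancelled before an
 * attempt, -2 if the retry limit was reached without success. */
static int submit_with_retry(struct request *req, submit_fn submit)
{
    while (req->attempts < MAX_SUBMIT_RETRIES) {
        if (req->cancelled)
            return -1;          /* never resubmit a cancelled request */
        req->attempts++;
        if (submit(req) == 0)
            return 0;
    }
    return -2;                  /* give up instead of looping forever */
}
```

With the failing stub, a fresh request gives up after three attempts, and a request that is already cancelled is never submitted at all.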
Comment 10 Enrico Tagliavini 2012-11-29 09:56:07 UTC
Created attachment 87761 [details]
syslog kernel 3.7-rc6 output

Here is the log for the 3.7-rc6 kernel. You are right, it is different for the hci stuff.
I also tried with 3.7-rc7 and the kernel got a PANIC, so I can't save the syslog. I took a picture of the console, I hope it will be readable. Otherwise just let me know how I can capture the dump (no serial console on a laptop). I guess I can mimic what CentOS does, but if you know a quick way, please share :).
Comment 11 Enrico Tagliavini 2012-11-29 10:04:17 UTC
Created attachment 87771 [details]
3.7-rc7 console panic 

The panic doesn't happen immediately when I insert the disc, but a few seconds later, likely around "ABORT TASK timed out", but it is very hard to tell, it happens so quickly....
Comment 12 Enrico Tagliavini 2012-11-29 10:44:23 UTC
Created attachment 87781 [details]
syslog kernel 3.7-rc7 output

Found a way to save the log: log in from the tty instead of X11. Ignore the sysrq for emergency sync, that was me. I was worried it was going to panic, so I was pushing the sync sysrq to save the log.

With this kernel the disc is not powered down, I can still see the led on. The device /dev/sdb is created but it is invalid and not usable.

But be aware the PANIC still happens as soon as I log in to X11 again, so there is something really weird going on here. FTR I have an Intel HD 4000, so no binary crap here.
Comment 13 Gerd Hoffmann 2012-11-29 13:03:37 UTC
As expected the stream id errors are gone, good.

Error handling is still broken.
Pushed patches improving that to
  git://git.kraxel.org/linux uas

With these I expect you can plug the device in and out without the kernel crashing and without the endless stream of urb submission failures.  It will probably not work though because xhci needs to be fixed too.

The panic looks unrelated on a quick glance.
Does this only happen with the uas module loaded?
Comment 14 Enrico Tagliavini 2012-11-29 15:18:49 UTC
The panic is definitely related. I never had a panic until I inserted the USB disc after updating to kernel 3.7-rc7. Without uas nothing goes wrong with any kernel version; the disk and the system work as expected.

I'll try that git tree as soon as I can find a bit of time. Thank you.
Comment 15 Enrico Tagliavini 2012-11-29 17:23:20 UTC
The server's connection seems to be very slow right now, so I'm just copying uas.c into my 3.7-rc7 tree. There is a warning from the compiler:

drivers/usb/storage/uas.c: In function ‘uas_eh_abort_handler’:
drivers/usb/storage/uas.c:715:6: warning: ‘ret’ may be used uninitialized in this function

and, from what I understand, it is perfectly right :)
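
For readers unfamiliar with this class of warning, the general pattern is a local variable that is assigned on only some paths before being returned. The sketch below is a generic illustration with hypothetical names and values; it is not the code from uas.c.

```c
#include <stdbool.h>

/* Hypothetical result codes, for illustration only. */
#define EH_SUCCESS 0
#define EH_FAILED  1

/*
 * Pattern that draws "may be used uninitialized":
 *
 *     int ret;
 *     if (have_pending)
 *         ret = abort_ok ? EH_SUCCESS : EH_FAILED;
 *     return ret;        // garbage when have_pending is false
 *
 * One common fix is to give 'ret' a defined default up front.
 */
static int abort_handler_fixed(bool have_pending, bool abort_ok)
{
    int ret = EH_FAILED;    /* defined value on every path */

    if (have_pending)
        ret = abort_ok ? EH_SUCCESS : EH_FAILED;
    return ret;
}
```

Which default is appropriate depends on what the caller expects when there is nothing to abort; EH_FAILED here is an arbitrary choice for the sketch.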
Comment 16 Enrico Tagliavini 2012-11-29 17:53:30 UTC
Created attachment 87821 [details]
syslog kernel 3.7-rc7 with updated uas.c output

It is definitely improved. As you said, it doesn't work yet of course, but it is a start.

And I also understood why the panic seems unrelated: What the picture shows is just the last part of a very long output.

First, it now panics when I disconnect the drive: it drops to the console, showing a lot of backtraces from various kernel components. So what is shown in the picture is just some other kernel subsystem failing because something else failed first. Also, if you wonder about the TAINTED flag, it is likely vmware, and I can easily remove it if needed.

I wonder if this can be related to another bug I never reported: I have another USB3 HDD, not supporting SCSI, so a "normal" one let's say. If I connect it and then quickly disconnect it before the kernel initialization has finished (so before KDE pops up the notice that there is a new removable drive), the kernel panics.
Comment 17 Gerd Hoffmann 2012-11-30 11:05:46 UTC
git tree updated, new patch series sent to linux-usb.

When testing please use the git tree or pick the debug patch from the git tree and apply it on top of the patch series.

When testing in qemu I've seen usb_kill_anchored_urbs() hanging forever on xhci, causing more hiccups down the road.  That could explain the problems you are seeing with the other HDD.  Do those go away when connecting the disk to an ehci controller?
Comment 18 Enrico Tagliavini 2012-11-30 14:24:52 UTC
Created attachment 87941 [details]
syslog kernel 3.7-rc7 uas branch v2 output

Ok, this time I cloned your repo. It still panics as soon as I type the password in kdm. If I keep kdm (and so X11) running but logged out, just using the laptop from the tty, it doesn't panic (so nothing changed on this side).
Comment 19 Enrico Tagliavini 2012-11-30 14:26:49 UTC
Created attachment 87951 [details]
syslog kernel 3.7-rc7 uas branch v2 output using ehci

This is using ehci. Doesn't panic but still doesn't work, ending up in urb submission failure
Comment 20 Gerd Hoffmann 2012-12-03 14:16:15 UTC
(In reply to comment #18)
> Created an attachment (id=87941) [details]
> syslog kernel 3.7-rc7 uas branch v2 output
> 
> Ok this time I cloned your repo. Still panics as soon as I type the password
> in kdm. If I keep kdm (and so X11) running but logged out, just using the
> laptop from the tty, it doesn't panic (so nothing changed from this side).

xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
sd 6:0:0:0: [sdb] Sector size 0 reported, assuming 512.
sd 6:0:0:0: [sdb] 1 512-byte logical blocks: (512 B/512 B)
sd 6:0:0:0: [sdb] 0-byte physical blocks

That pretty much looks like the data from some mode sense scsi command wasn't written to the correct place.  I guess xhci corrupts memory with streams active.  
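
One way to see why a response buffer that was never filled in would produce exactly these log lines: assuming the capacity data uses the 8-byte READ CAPACITY (10) layout (big-endian last LBA followed by big-endian block length), an all-zero buffer decodes to last LBA 0, i.e. one block, with a block length of 0. The sketch below uses hypothetical names and is not the sd driver code; the fallback to 512 simply mirrors the log message.

```c
#include <stdint.h>

struct capacity {
    uint64_t blocks;        /* number of logical blocks */
    uint32_t sector_size;   /* bytes per logical block */
};

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode an 8-byte READ CAPACITY (10) style response. */
static struct capacity decode_capacity10(const uint8_t buf[8])
{
    struct capacity c;

    c.blocks = (uint64_t)be32(buf) + 1;   /* last LBA + 1 */
    c.sector_size = be32(buf + 4);
    if (c.sector_size == 0)
        c.sector_size = 512;              /* "Sector size 0 reported, assuming 512" */
    return c;
}
```

Feeding eight zero bytes into decode_capacity10() yields one 512-byte block, which matches the "1 512-byte logical blocks" line above.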

Maybe we should set CONFIG_XHCI=n instead of CONFIG_UAS=n
Comment 21 Gerd Hoffmann 2012-12-03 14:21:50 UTC
(In reply to comment #19)
> Created an attachment (id=87951) [details]
> syslog kernel 3.7-rc7 uas branch v2 output using ehci
> 
> This is using ehci. Doesn't panic but still doesn't work, ending up in urb
> submission failure

Ok.  I meant whether you can crash the kernel with the non-uas disk (as described in the last paragraph of comment 16) on ehci.  But trying the uas disk on EHCI is indeed useful too.

The log doesn't look like this is the updated git tree.  Can you double-check?  The branch is rebased, maybe it wasn't picked up correctly because of that.
Comment 22 Enrico Tagliavini 2012-12-03 14:30:27 UTC
You are right, I'm sorry, I forgot to switch branches with the checkout command. I apologize, I just have so much stuff to do eheheh. I will test the non-UAS disk when I have time. For now I will test the UAS one because it is the one we use at work (we have a lot of them and it is important that it gets fixed for our users).

I'm going to retest with the right branch!
Comment 23 Enrico Tagliavini 2012-12-03 15:24:29 UTC
Created attachment 88271 [details]
syslog kernel 3.7-rc7 uas branch v2 output

Ok, I hope this is the right one. I did the checkout of uas this time. Now it seems to handle it a little better, but you can still see an infinite (very slow) loop in the check, the one starting with uas_eh_abort_handler.

After that, when I disconnect, I get some stack traces, then the "sense urb submission failure" messages begin and the machine becomes unresponsive to user commands (but still not dead or frozen). I can also see more stack traces in the syslog between those submission failures.
Comment 24 Florian Mickler 2012-12-22 09:30:14 UTC
A patch referencing this bug report has been merged in Linux v3.8-rc1:

commit fb37ef98015f864d22be223a0e0d93547cd1d4ef
Author: Greg KH <gregkh@linuxfoundation.org>
Date:   Wed Nov 28 10:19:16 2012 -0800

    USB: mark uas driver as BROKEN
Comment 25 Lutz Vieweg 2013-07-22 22:58:27 UTC
I noticed that some changes have been done to the uas driver in linux-next since it was marked BROKEN - https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/log/drivers/usb/storage?qt=grep&q=uas - and this patch removes the dependency on BROKEN: https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/usb/storage/Kconfig?id=25e11ec4fe5271c4895265ecbb69531e6b0c0dd5

Is it already worth retrying if this bug still exists, or would it be premature to assume some of the changes could have fixed it?
Comment 26 Gerd Hoffmann 2013-07-24 08:21:07 UTC
Last time I tried, uas didn't work (for USB 3.0 devices), not due to bugs in uas itself but due to xhci streams support being broken.
Comment 27 Hans de Goede 2013-08-30 12:37:17 UTC
The xhci issue causing the uas driver to not work is fixed for me by this *RFC*, testing, *no guarantees* patch from Gerd: http://article.gmane.org/gmane.linux.usb.general/93473
Comment 28 Hans de Goede 2013-11-19 18:37:54 UTC
For those following this bug I've a branch with an extensive set of xhci-streams / uas fixes on top of usb-next here: http://git.linuxtv.org/hgoede/gspca.git/shortlog/refs/heads/usb-next-for-sarah

This is scheduled to go into 3.14.