Bug 79511

Summary: Enabling UAS makes Seagate Expansion Desk unavailable (with workaround)
Product: Drivers
Reporter: Alexandre Oliva (oliva)
Component: USB
Assignee: Greg Kroah-Hartman (greg)
Status: RESOLVED CODE_FIX
Severity: normal
CC: jwrdegoede, tsester.univ, wolput
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.15.3
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: lspci output

Description Alexandre Oliva 2014-07-05 00:01:26 UTC
I have a Seagate Expansion Desk 4TB disk (0bc2:3320) connected to a server on a USB3 xPCI card with a VL80x chipset (VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller (rev 03) (prog-if 30 [XHCI])).  Freed-ora (and Fedora) 3.15.* kernels that enable UAS won't recognize the disk (see below), whereas it works fine with 3.14.* and earlier, and it seems to work fine on another server when connected to a USB3 card with the same chipset, but from a different brand.  On the other server, it “works”, at least to some extent (though another similar 3TB disk (0bc2:3312), that belongs to this other server, triggers xhci oopses after a while; I'll save that for another bug).  I've managed to restore both disks to the same functioning level they enjoyed with 3.14.* by adding usb-storage.quirks=0bc2:3320:u,0bc2:3312:u to the boot command line (or options usb-storage quirks=0bc2:3312:u,0bc2:3320:u to /etc/modprobe.d/whatever.conf, and then rebuilding initrd).
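The workaround above uses one `VID:PID:u` entry per disk, joined by commas. As a minimal sketch (the helper names are hypothetical and not part of any kernel tooling), the two forms quoted in the report could be generated like this:

```python
import re

# Hypothetical helpers for illustration only: they format the usb-storage
# quirks string quoted in the report from (vendor, product) ID pairs.
_HEX4 = re.compile(r"[0-9a-fA-F]{4}")


def build_quirks(devices, flags="u"):
    """Return 'VID:PID:flags,...' for the given (vid, pid) pairs."""
    entries = []
    for vid, pid in devices:
        if not (_HEX4.fullmatch(vid) and _HEX4.fullmatch(pid)):
            raise ValueError(f"expected 4 hex digits, got {vid}:{pid}")
        entries.append(f"{vid.lower()}:{pid.lower()}:{flags}")
    return ",".join(entries)


def kernel_cmdline_arg(devices):
    """Form used on the boot command line."""
    return "usb-storage.quirks=" + build_quirks(devices)


def modprobe_line(devices):
    """Form used in a file under /etc/modprobe.d/."""
    return "options usb-storage quirks=" + build_quirks(devices)


if __name__ == "__main__":
    disks = [("0bc2", "3320"), ("0bc2", "3312")]
    print(kernel_cmdline_arg(disks))
    print(modprobe_line(disks))
```

The `u` flag is the one used in the report to keep these disks on usb-storage instead of uas.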

[without UAS]:

Both disks work pretty much the same with the quirk above on 3.15.3, on 3.14.*, and on a 3.15.3 built with USB_UAS not set in .config.  Early on boot, the kernel prints:

usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
usb 4-1: Parent hub missing LPM exit latency info.  Power management will be impacted.
[...disk usb id and textual vendor, model and serial number msgs omitted...]
usbcore: registered new interface driver usb-storage

and then, a while later (usually while mounting the boot-time-mounted filesystems on it) it prints:

usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
usb 4-1: Parent hub missing LPM exit latency info.  Power management will be impacted.
xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880223ce1580
xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880223ce15c0


[with UAS]:

When UAS is enabled, the initial (and final) message, printed early on boot on the original server, is:

usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
[...disk usb id and textual vendor, model and serial number msgs omitted...]
usbcore: registered new interface driver usb-storage
xhci_hcd 0000:02:00.0: ERROR: unexpected command completion code 0x11.
uas: probe of 4-1:1.0 failed with error -12
usbcore: registered new interface driver uas

and then the disk is not recognized.  (It would be nice if it could fall back to usb-storage in this case, but that would be a separate RFE :-)

I also tried to connect the disk to a USB2 port.  This made the disk be recognized, but it wouldn't work very reliably: the machine would freeze, and I believe it had to do with uas, but I couldn't catch a backtrace.

I had collected some dynamic debug messages of usb drivers at usb bus (software-initiated) resets, but although I recall seeing them in /var/log/messages, I can't find them there any more, and I can't go back and test that again right now.

So, please let me know whether you need any further info to try and make the disk be detected with UAS (and then we'll have to track down why it oopses under load), assuming we won't just mark these usb ids as not reliable for uas use.
Comment 1 Greg Kroah-Hartman 2014-07-05 16:33:01 UTC
On Sat, Jul 05, 2014 at 12:01:26AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=79511
> 
>             Bug ID: 79511
>            Summary: Enabling UAS makes Seagate Expansion Desk unavailable
>                     (with workaround)

Please email the linux-usb@vger.kernel.org mailing list about this.
Comment 2 Hans de Goede 2014-07-06 11:49:32 UTC
(In reply to Alexandre Oliva from comment #0)

<snip>

> When UAS is enabled, the initial (and final) message, printed early on boot
> on the original server, is:
> 
> usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
> [...disk usb id and textual vendor, model and serial number msgs omitted...]
> usbcore: registered new interface driver usb-storage
> xhci_hcd 0000:02:00.0: ERROR: unexpected command completion code 0x11.
> uas: probe of 4-1:1.0 failed with error -12
> usbcore: registered new interface driver uas

Completion code 0x11 is COMP_EINVAL, Parameter Error - Context parameter is invalid.
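For readers decoding similar log lines, here is a small sketch that maps the two numbers in the quoted log (completion code 0x11 and probe error -12) to names. The table is only a hand-picked subset of xHCI completion codes, and the function names are made up for this example:

```python
import errno

# Subset of xHCI completion codes (decimal values); deliberately incomplete.
XHCI_COMPLETION_CODES = {
    1: "Success",
    5: "TRB Error",
    7: "Resource Error",
    10: "Invalid Stream Type Error",
    17: "Parameter Error",
    19: "Context State Error",
}


def describe_completion_code(code):
    """Name an xHCI completion code, if it is in the subset above."""
    return XHCI_COMPLETION_CODES.get(code, "unknown (not in this subset)")


def describe_errno(ret):
    """Map a negative kernel-style return value such as -12 to its errno name."""
    return errno.errorcode.get(-ret, "unknown")


if __name__ == "__main__":
    print(hex(0x11), describe_completion_code(0x11))
    print(-12, describe_errno(-12))
```

So the "failed with error -12" line reports ENOMEM from the uas probe, which follows the controller rejecting the command with a Parameter Error.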

So this seems to be an issue in communicating with the controller, rather than something specific to the usb disk enclosure in use.

We can probably best fix this by marking via controllers as not having bulk streams (which should cause an automatic fallback to usb-storage), until we've found a workaround.

I'll prepare a patch for disabling bulk-streams on via chipsets (for now) tomorrow (Monday).

3 questions:

1) Can you please provide the output of lspci on both affected machines ?

2)  Have you tried upgrading the firmware of the xhci controller in these servers (this likely requires running windows) ?

3) Can you try the usb disk enclosure in question on a machine with another xhci controller (nec or intel) please? E.G. an ivy bridge or haswell equipped laptop ?
Comment 3 Hans de Goede 2014-07-08 09:14:50 UTC
Ping?

I really need lspci output on the 2 affected machines to be able to add a quirk to disable bulk-streams on via xhci controllers.
Comment 4 Alexandre Oliva 2014-07-09 22:30:17 UTC
On Jul  8, 2014, bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=79511

> --- Comment #3 from Hans de Goede <jwrdegoede@fedoraproject.org> ---
> Ping?

It seemed to be that bugzilla.kernel.org was down; I haven't been able
to connect to it since your earlier response.  Since you clearly could,
I'm guessing it's rejecting connections from whatever TOR exit node
that's being assigned to me.  Oh well...  Let's see if by email it gets
through.  Worst case, it won't make bugzilla, but at least it will get
to you.

> I really need lspci output on the 2 affected machines to be able to
> add a quirk to disable bulk-streams on via xhci controllers.

On the host in which UAS fails to configure streams and the disk didn't
work at all:

02:00.0 USB controller: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller (rev 03) (prog-if 30 [XHCI])
        Subsystem: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller
        Flags: bus master, fast devsel, latency 0, IRQ 42
        Memory at fdeff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [c4] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: xhci_hcd

On the host in which UAS brings disks up, and they work to some extent:

03:00.0 USB controller: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller (rev 03) (prog-if 30 [XHCI])
        Subsystem: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller
        Flags: bus master, fast devsel, latency 0, IRQ 43
        Memory at dffff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [c4] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: xhci_hcd


I don't see any difference other than the IRQ number (IRQ 43 is used by
a network controller on the first host).

The MoBos have the same chipset, but they're of different brands.

The first one withstood a lightning strike years ago; that fried its
internal network interface; that's why it has a PCIe network card.  The
PCIe USB3 controllers were put in long after that.

I've considered switching the two cards, to see whether the problem
moves with them.  You think it might make sense?

There might also be something wrong with the power cable that supplies
5V (and 12V) to the USB3 card on this machine; the connection of
pre-SATA cables is always uncertain, and this card (unlike the one on the
other machine) worked fine without the additional power.  I only tried
to connect it once this problem arose, just in case.  I can't tell
whether it might make a difference that could be related with this bug
report.
Comment 5 Hans de Goede 2014-07-11 09:42:33 UTC
Alexandre,

Sending the mail to bugzilla does seem to work :)

Can you please do a lspci -nn so that numerical ids get included ?

Once I've the lspci -nn output, I'll try to prepare a patch to disable streams on via xhci controllers sometime next week (as a work around for now).

It would be good to confirm that this is a problem specific to the via controllers though, can you try to attach one of the disks to a machine with kernel >= 3.15 and a nec or intel xhci controller ?

I would also like to know what chipset the disk enclosures are using, can you copy and paste the relevant lines from dmesg when you plug in the disk? Sometimes that gives a clue. Or maybe there is a firmware update available from the manufacturers site, that often gives a clue about the chipset used too.

Thanks,

Hans

p.s.

One thing which can sometimes help with xhci / uas issues is updating the firmware in the xhci controller and/or the uas device. Since these are add-on cards you could try to slap them in a windows machine to update the firmware, although I've been unable to find any specific firmware updates for these, but maybe the windows drivers do this under the hood? I know that you are a Free Software advocate, and it pains me to ask you this, but it might help ...
Comment 6 Alexandre Oliva 2014-07-11 15:39:27 UTC
03:00.0 USB controller [0c03]: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller [1106:3432] (rev 03)

02:00.0 USB controller [0c03]: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller [1106:3432] (rev 03)

Thanks for looking into this.  Unfortunately, I don't have access to any
other machine with xhci controllers.


I thought I'd posted the dmesg message before, but I can't check that
right now.  Here they are.  First, the disk that didn't work at all with
uas on the server it was connected to:

usb 4-4: new SuperSpeed USB device number 2 using xhci_hcd
usb 4-4: New USB device found, idVendor=0bc2, idProduct=3320
usb 4-4: New USB device strings: Mfr=2, Product=3, SerialNumber=1
usb 4-4: Product: Expansion Desk
usb 4-4: Manufacturer: Seagate
usb 4-4: SerialNumber: [snipped]

And here, the other disk, that worked for a while before uas oopses:

usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
usb 4-1: New USB device found, idVendor=0bc2, idProduct=3312
usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 4-1: Product: Expansion Desk
usb 4-1: Manufacturer: Seagate 
usb 4-1: SerialNumber: [snipped]
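When collecting IDs like these from several machines, the vendor and product numbers can be pulled out of dmesg text mechanically. A minimal sketch, assuming the "New USB device found" line format shown above (the function name is made up for this example):

```python
import re

# Matches the "New USB device found" lines as they appear in this report.
ID_LINE = re.compile(
    r"usb (?P<port>[\d.\-]+): New USB device found, "
    r"idVendor=(?P<vid>[0-9a-f]{4}), idProduct=(?P<pid>[0-9a-f]{4})"
)


def usb_ids(dmesg_text):
    """Return {port: 'vid:pid'} for every 'New USB device found' line."""
    return {
        m["port"]: f"{m['vid']}:{m['pid']}"
        for m in ID_LINE.finditer(dmesg_text)
    }


SAMPLE = """\
usb 4-4: new SuperSpeed USB device number 2 using xhci_hcd
usb 4-4: New USB device found, idVendor=0bc2, idProduct=3320
usb 4-1: New USB device found, idVendor=0bc2, idProduct=3312
"""

if __name__ == "__main__":
    print(usb_ids(SAMPLE))
```

Note that these IDs identify the Seagate product, not the bridge chip inside the enclosure.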


> cards you could try to slap them in a windows machine to update the firmware,

I don't have any windows machine, nor any license of windows.
Comment 7 Hans de Goede 2014-07-12 10:52:21 UTC
Thanks for the info, unfortunately this does not tell me which chipset the disk enclosure(s) in question are using. I don't want to just go and blacklist via controllers, when this just as well might be an enclosure specific problem. Can you try another uas capable enclosure on these servers ? and/or open the enclosure and see what is written on the bridge chip?

Note I've ordered a via xhci pci-e addon card myself, so alternatively we can just wait 3 weeks until that arrives and then I can test with my own uas enclosures.
Comment 8 Alexandre Oliva 2014-07-12 14:04:05 UTC
'fraid these are the only USB3 enclosures I own, and I don't think I can
open them without voiding the warranty.  I tried to find any info about
the chipset with lsusb and hdparm, to no avail.  I don't suppose my
asking Seagate customer-support services would get anywhere, so I guess
we're gonna have to wait those 3 weeks, or 3 months till the warranties
on my disks expire :-(

Thanks!
Comment 9 Alexandre Oliva 2014-08-05 03:59:25 UTC
I needed more RAM on my boxes, and the old MoBos couldn't deal with it, so I replaced them with ones that have USB3 built-in.  They identify as VIA Technologies Ltd, Device 3483.

The good news is that the 3TB enclosure has worked flawlessly with uas on one of the boxes.

The bad news is that the box to which the 4TB enclosure is connected has corrupted a number of filesystems since the upgrade, and it displays a few uas errors/resets early on, but aside from filesystem corruption all over (not limited to this external disk), it appears to work.  The obvious suspect for corruption would be one of the new memory modules, but memtest86+ is not detecting any problems, so I'm now wondering if there's any possibility that such uas errors as the ones below might place the uas driver in a state that might cause it to scribble over DMA areas at inappropriate times or somesuch.  For now, as part of hardware-bisecting the differences so as to pinpoint the culprit, I've disabled uas and taken out the new memory modules, and the system appears to be running fine.  I'm undecided between putting a memory module back in or enabling uas again as the next step.  Such comforting words as “these error messages couldn't possibly get uas into such a memory-corrupting state” would go a long way toward making me confident to take that step next and verify the statement ;-)

These are the messages I get early in the boot, on the machine running the 4TB disk:

[   78.962551] sd 6:0:0:0: [sdb] uas_eh_abort_handler ffff8800b285dc80 tag 0, inflight: CMD IN
[   81.967024] scsi host6: uas_eh_task_mgmt: ABORT TASK timed out
[   88.951366] sd 6:0:0:0: [sdb] uas_eh_abort_handler ffff8800b1098180 tag 1, inflight: CMD IN
[   88.951381] scsi host6: uas_eh_task_mgmt: ABORT TASK: error already running a task
[   88.951393] sd 6:0:0:0: uas_eh_device_reset_handler
[   88.951399] scsi host6: uas_eh_task_mgmt: LOGICAL UNIT RESET: error already running a task
[   88.951407] scsi host6: uas_eh_bus_reset_handler start
[   88.951416] sd 6:0:0:0: [sdb] uas_eh_bus_reset_handler ffff8804265f3500 tag 2, inflight: CMD IN
[   88.951591] sd 6:0:0:0: [sdb] uas_cmd_cmplt ffff8800b285dc80 tag 0, inflight: CMD IN abort
[   88.951606] sd 6:0:0:0: [sdb] cmd cmplt err -2
[   88.951820] sd 6:0:0:0: [sdb] uas_cmd_cmplt ffff8804265f3500 tag 2, inflight: CMD IN abort
[   88.951827] sd 6:0:0:0: [sdb] cmd cmplt err -2
[   88.952027] sd 6:0:0:0: [sdb] uas_cmd_cmplt ffff8800b1098180 tag 1, inflight: CMD IN abort
[   88.952037] sd 6:0:0:0: [sdb] cmd cmplt err -2
[   88.952238] usb 9-2: stat urb: killed, stream 31
[   88.952458] usb 9-2: stat urb: killed, stream 4
[   88.952669] usb 9-2: stat urb: killed, stream 3
[   88.952867] usb 9-2: stat urb: killed, stream 2
[   88.953067] sd 6:0:0:0: [sdb] uas_data_cmplt ffff8804265f3500 tag 2, inflight: CMD abort
[   88.953074] sd 6:0:0:0: [sdb] data cmplt err -2 stream 4
[   88.953261] sd 6:0:0:0: [sdb] uas_data_cmplt ffff8800b1098180 tag 1, inflight: CMD abort
[   88.953267] sd 6:0:0:0: [sdb] data cmplt err -2 stream 3
[   88.953340] sd 6:0:0:0: [sdb] uas_data_cmplt ffff8800b285dc80 tag 0, inflight: CMD abort
[   88.953345] sd 6:0:0:0: [sdb] data cmplt err -2 stream 2
[   88.953367] sd 6:0:0:0: [sdb] uas_zap_dead ffff8800b285dc80 tag 0, inflight: CMD abort
[   88.953379] sd 6:0:0:0: [sdb] abort completed
[   88.953386] sd 6:0:0:0: [sdb] uas_zap_dead ffff8800b1098180 tag 1, inflight: CMD abort
[   88.953392] sd 6:0:0:0: [sdb] abort completed
[   88.953399] sd 6:0:0:0: [sdb] uas_zap_dead ffff8804265f3500 tag 2, inflight: CMD abort
[   88.953403] sd 6:0:0:0: [sdb] abort completed
[   89.057365] usb 9-2: reset SuperSpeed USB device number 2 using xhci_hcd
[   89.068052] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719800
[   89.068063] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719848
[   89.068070] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719890
[   89.068075] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8804267198d8
[   89.069427] scsi host6: uas_eh_bus_reset_handler success
[  110.016728] sd 6:0:0:0: [sdb] uas_eh_abort_handler ffff8800b285c900 tag 7, inflight: CMD IN
[  113.021190] scsi host6: uas_eh_task_mgmt: ABORT TASK timed out
[  120.005575] sd 6:0:0:0: [sdb] uas_eh_abort_handler ffff880416583c80 tag 0, inflight: CMD IN
[  120.005609] scsi host6: uas_eh_task_mgmt: ABORT TASK: error already running a task
[  120.005635] sd 6:0:0:0: uas_eh_device_reset_handler
[  120.005651] scsi host6: uas_eh_task_mgmt: LOGICAL UNIT RESET: error already running a task
[  120.005674] scsi host6: uas_eh_bus_reset_handler start
[  120.005871] sd 6:0:0:0: [sdb] uas_cmd_cmplt ffff8800b285c900 tag 7, inflight: CMD IN abort
[  120.006011] sd 6:0:0:0: [sdb] cmd cmplt err -2
[  120.006263] sd 6:0:0:0: [sdb] uas_cmd_cmplt ffff880416583c80 tag 0, inflight: CMD IN abort
[  120.006401] sd 6:0:0:0: [sdb] cmd cmplt err -2
[  120.006687] usb 9-2: stat urb: killed, stream 31
[  120.006960] usb 9-2: stat urb: killed, stream 2
[  120.007220] usb 9-2: stat urb: killed, stream 9
[  120.007463] sd 6:0:0:0: [sdb] uas_data_cmplt ffff880416583c80 tag 0, inflight: CMD abort
[  120.007599] sd 6:0:0:0: [sdb] data cmplt err -2 stream 2
[  120.007865] sd 6:0:0:0: [sdb] uas_data_cmplt ffff8800b285c900 tag 7, inflight: CMD abort
[  120.008001] sd 6:0:0:0: [sdb] data cmplt err -2 stream 9
[  120.008142] sd 6:0:0:0: [sdb] uas_zap_dead ffff8800b285c900 tag 7, inflight: CMD abort
[  120.008167] sd 6:0:0:0: [sdb] abort completed
[  120.008184] sd 6:0:0:0: [sdb] uas_zap_dead ffff880416583c80 tag 0, inflight: CMD abort
[  120.008203] sd 6:0:0:0: [sdb] abort completed
[  120.111507] usb 9-2: reset SuperSpeed USB device number 2 using xhci_hcd
[  120.122202] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719800
[  120.122239] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719848
[  120.122262] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719890
[  120.122284] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8804267198d8
[  120.123471] scsi host6: uas_eh_bus_reset_handler success
[  141.102963] sd 6:0:0:0: [sdb] uas_eh_abort_handler ffff880426514c00 tag 6, inflight: CMD IN
[  144.107406] scsi host6: uas_eh_task_mgmt: ABORT TASK timed out
[  151.059747] sd 6:0:0:0: [sdb] uas_eh_abort_handler ffff8800b1099080 tag 0, inflight: CMD OUT
[  151.059785] scsi host6: uas_eh_task_mgmt: ABORT TASK: error already running a task
[  151.059811] sd 6:0:0:0: uas_eh_device_reset_handler
[  151.059826] scsi host6: uas_eh_task_mgmt: LOGICAL UNIT RESET: error already running a task
[  151.059850] scsi host6: uas_eh_bus_reset_handler start
[  151.059869] sd 6:0:0:0: [sdb] uas_eh_bus_reset_handler ffff8804265f2600 tag 1, inflight: CMD IN
[  151.060074] sd 6:0:0:0: [sdb] uas_cmd_cmplt ffff880426514c00 tag 6, inflight: CMD IN abort
[  151.060214] sd 6:0:0:0: [sdb] cmd cmplt err -2
[  151.060463] sd 6:0:0:0: [sdb] uas_cmd_cmplt ffff8804265f2600 tag 1, inflight: CMD IN abort
[  151.060612] sd 6:0:0:0: [sdb] cmd cmplt err -2
[  151.060900] sd 6:0:0:0: [sdb] uas_cmd_cmplt ffff8800b1099080 tag 0, inflight: CMD OUT abort
[  151.061041] sd 6:0:0:0: [sdb] cmd cmplt err -2
[  151.065342] usb 9-2: stat urb: killed, stream 31
[  151.069666] usb 9-2: stat urb: killed, stream 3
[  151.073832] usb 9-2: stat urb: killed, stream 2
[  151.077932] usb 9-2: stat urb: killed, stream 8
[  151.081925] sd 6:0:0:0: [sdb] uas_data_cmplt ffff8804265f2600 tag 1, inflight: CMD abort
[  151.085855] sd 6:0:0:0: [sdb] data cmplt err -2 stream 3
[  151.089925] sd 6:0:0:0: [sdb] uas_data_cmplt ffff8800b1099080 tag 0, inflight: CMD abort
[  151.093907] sd 6:0:0:0: [sdb] data cmplt err -2 stream 2
[  151.097934] sd 6:0:0:0: [sdb] uas_data_cmplt ffff880426514c00 tag 6, inflight: CMD abort
[  151.101758] sd 6:0:0:0: [sdb] data cmplt err -2 stream 8
[  151.105582] sd 6:0:0:0: [sdb] uas_zap_dead ffff880426514c00 tag 6, inflight: CMD abort
[  151.106264] sd 6:0:0:0: [sdb] abort completed
[  151.106908] sd 6:0:0:0: [sdb] uas_zap_dead ffff8800b1099080 tag 0, inflight: CMD abort
[  151.107560] sd 6:0:0:0: [sdb] abort completed
[  151.108208] sd 6:0:0:0: [sdb] uas_zap_dead ffff8804265f2600 tag 1, inflight: CMD abort
[  151.108853] sd 6:0:0:0: [sdb] abort completed
[  151.212752] usb 9-2: reset SuperSpeed USB device number 2 using xhci_hcd
[  151.224447] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719800
[  151.225136] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719848
[  151.225803] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880426719890
[  151.226474] xhci_hcd 0000:02:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8804267198d8
[  151.228331] scsi host6: uas_eh_bus_reset_handler success
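The log above contains three abort/reset cycles, each ending in a successful bus reset. A rough sketch for measuring how far apart those cycles are, assuming the bracketed dmesg timestamp format shown above (the function names are made up for this example, and the sample is a short excerpt of the log):

```python
import re

# "[  89.069427] message" style lines, as printed by dmesg with timestamps.
STAMPED = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def bus_reset_times(log_text):
    """Timestamps (seconds) of successful uas bus resets in a dmesg excerpt."""
    times = []
    for line in log_text.splitlines():
        m = STAMPED.match(line)
        if m and "uas_eh_bus_reset_handler success" in m["msg"]:
            times.append(float(m["ts"]))
    return times


def intervals(times):
    """Seconds between consecutive events, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


SAMPLE = """\
[   89.069427] scsi host6: uas_eh_bus_reset_handler success
[  110.016728] sd 6:0:0:0: [sdb] uas_eh_abort_handler ffff8800b285c900 tag 7, inflight: CMD IN
[  120.123471] scsi host6: uas_eh_bus_reset_handler success
[  151.228331] scsi host6: uas_eh_bus_reset_handler success
"""

if __name__ == "__main__":
    print(intervals(bus_reset_times(SAMPLE)))
```

On this excerpt the resets come roughly 31 seconds apart, which matches the pattern visible in the full log: an abort about 21 seconds after the previous reset, then a second abort and the reset about 10 seconds later.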

Timing appeared to be correlated with initiating X11 (which made me think it might be some DMA setup race), but I've disabled X11 altogether and it still printed the messages at these times, so I'm confident that's not it, although the uas bus resets on a disk containing part of the root fs do slow down the initiation of X11, networking, and so on.

Assuming it's not uas that's causing memory and disk corruption, I suspect the above might be just a temporary glitch, similar to the reset messages I got (and still get occasionally) in the original description, and that uas will work flawlessly afterwards, but the zap/abort/reset messages above (in this comment), besides being scary (more so in the context of observable but possibly unrelated disk corruption), would be nice to avoid (to get rid of the ~1min delay in bootup).  Any suggestions?  (I'm running kernel-libre-3.15.7-200.fc20.gnu.x86_64 now)
Comment 10 Hans de Goede 2014-08-05 08:09:01 UTC
Hi,

Ok so it seems that the motherboards have a newer via chipset which does not exhibit the earlier problem. I hope the one I ordered does show the problem then ...

About your new problem, I don't see anything in the logs indicating an xhci / uas problem here. But it is very well possible for the xhci controller to create memory corruption due to bad dma, and since the old via controllers seem to have issues with streams, I would not rule this out.

So disabling uas certainly is worth trying.

Are the 2 new motherboards identical ? Can we get lspci -nn output on both boards please ?

Regards,

Hans
Comment 11 Alexandre Oliva 2014-08-06 23:21:23 UTC
On Aug  5, 2014, bugzilla-daemon@bugzilla.kernel.org wrote:

> Ok so it seems that the motherboards have a newer via chipset which does not
> exhibit the earlier problem. I hope the one I ordered does show the problem
> then ...

If it doesn't, send me PVT email; the two USB3 cards are now idle.

> About your new problem, I don't see anything in the logs indicating an xhci /
> uas problem here.

Indeed.  I went ahead and reenabled uas, and no corruption ensued.

Installing the memory modules on another server, OTOH, caused the other
server to start corrupting disk.

Apologies for the noise.

> Are the 2 new motherboards identical ? Can we get lspci -nn output on both
> boards please ?

Yeah, both are identical, except for variations in the “Memory at”
address:

02:00.0 USB controller [0c03]: VIA Technologies, Inc. Device [1106:3483] (rev 01) (prog-if 30 [XHCI])
        Subsystem: Gigabyte Technology Co., Ltd Device [1458:5007]
        Flags: bus master, fast devsel, latency 0, IRQ 72
        Memory at fe200000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable+ Count=1/4 Maskable- 64bit+
        Capabilities: [c4] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: xhci_hcd

Thanks,
Comment 12 Hans de Goede 2014-08-25 10:06:13 UTC
So the via pci-e usb-3 card I ordered has finally arrived, and it has the same pci-ids as the newer via controller on Alexandre's new motherboards.

We're currently finding the best way to get Alexandre's old card from Brazil to the Netherlands. In the meantime I'm going to write a patch to disable streams on the pci-id of the older via cards which so far have shown to have trouble with streams. Once I've such a card in my hands I will try to see if I can get streams to work somehow and then we can drop the disabling of streams on these cards.
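The idea of disabling streams per pci-id can be illustrated outside the kernel. This is only a sketch of the matching logic, not the kernel patch itself: it assumes the deny list holds the single ID reported in this thread for the older cards, [1106:3432], and the names are made up for this example.

```python
import re

# Assumption for illustration: only the older card's ID from this thread is
# listed; the newer on-board controller [1106:3483] is treated as working.
BROKEN_STREAMS_IDS = {("1106", "3432")}

# "[vendor:device]" as printed by "lspci -nn"; class codes such as "[0c03]"
# have no colon and therefore do not match.
PCI_ID = re.compile(r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]")


def streams_disabled(lspci_nn_line):
    """True if an 'lspci -nn' line names a controller on the deny list."""
    m = PCI_ID.search(lspci_nn_line)
    return bool(m) and (m["vendor"], m["device"]) in BROKEN_STREAMS_IDS


if __name__ == "__main__":
    old = ("03:00.0 USB controller [0c03]: VIA Technologies, Inc. "
           "VL80x xHCI USB 3.0 Controller [1106:3432] (rev 03)")
    new = ("02:00.0 USB controller [0c03]: VIA Technologies, Inc. "
           "Device [1106:3483] (rev 01) (prog-if 30 [XHCI])")
    print(streams_disabled(old), streams_disabled(new))
```

Keying on the exact device ID rather than the vendor keeps streams, and thus uas, available on the newer VIA controller that worked in comment 9.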
Comment 13 Jos van Wolput 2014-08-29 11:55:58 UTC
I am using an Acer TravelMate 4530 AMD Athlon(tm) X2 Dual-Core QL-64 and a
Seagate Expansion 500GB disk (0bc2:2312).

After upgrading my linux kernel to 3.16 (Linux debian 3.16-1.dmz.1-liquorix-amd64, USB-UAS enabled) I have a similar issue with my Seagate expansion hard drive.
While mounting it or writing to it, my whole system sometimes freezes and needs a hard reset.
Comment 14 Jos van Wolput 2014-08-29 12:00:31 UTC
Created attachment 148691 [details]
lspci output

lspci attachment to my previous message
Comment 15 Hans de Goede 2014-08-29 12:01:50 UTC
(In reply to Jos van Wolput from comment #13)
> I am using an Acer TravelMate 4530 AMD Athlon(tm) X2 Dual-Core QL-64 and a
> Seagate Expansion 500GB disk (0bc2:2312).
> 
> After upgrading my linux kernel to 3.16 (Linux debian
> 3.16-1.dmz.1-liquorix-amd64, USB-UAS enabled) I have a similar issue with my
> Seagate expansion hard drive.
> While mounting it or writing to it, my whole system sometimes freezes and
> needs a hard reset.

Hmm, looking at your lspci output, your system does not have usb3 ports, only usb2, correct ?
Comment 16 Hans de Goede 2014-08-29 12:02:54 UTC
(In reply to Hans de Goede from comment #15)
> Hmm, looking at your lspci output, your system does not have usb3 ports,
> only usb2, correct ?

If I'm right about that, can you then please try building a kernel with this patch, and see if that helps ?  :
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/usb/storage/uas.c?id=e2875c33787ebda21aeecc1a9d3ff52b3aa413ec

Thanks,

Hans
Comment 17 Jos van Wolput 2014-08-29 12:13:28 UTC
(In reply to Hans de Goede from comment #15)
> Hmm, looking at your lspci output, your system does not have usb3 ports,
> only usb2, correct ?

That's correct, my laptop has usb2 and my usb disk has usb3.
Comment 18 Jos van Wolput 2014-08-29 12:33:20 UTC
(In reply to Jos van Wolput from comment #17)
> 
> If I'm right about that, can you then please try building a kernel with this
> patch, and see of that helps ?  :
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> drivers/usb/storage/uas.c?id=e2875c33787ebda21aeecc1a9d3ff52b3aa413ec
> 

I'm sorry, I never build my kernels, always using linux-images from Debian or Liquorix. The last time I did was more than 10 years ago and it was a lot of work!
I prefer to wait until the patched kernel is released.

To get rid of the UAS issue I added a file /etc/modprobe.d/ignore_uas.conf
containing 'options usb-storage quirks=0x0bc2:0x2312:u' as a workaround.

Regards,
Jos v.W.
Comment 19 Jos van Wolput 2014-09-09 03:56:52 UTC
(In reply to Hans de Goede from comment #16)

I understand this issue has been fixed in kernel 3.16-2.
After removing ignore_uas.conf and installing linux-image-3.16-2.dmz.1-liquorix-amd64 writing to the disk is now OK though automount is slow, about 35 seconds.

Thanks,
Jos v.W.
Comment 20 Hans de Goede 2014-09-10 14:20:55 UTC
A patch has been merged upstream disabling streams, and thus uas, on the affected via pci controllers.

Jos, your problem seems to be a different one, please respond to the mail requesting testing I've just sent.
Comment 21 Hans de Goede 2014-09-12 12:03:32 UTC
A quick update on the problems Jos is having with his Seagate Expansion Desk, which really are related to the Seagate Expansion Desk, and not to a via xhci controller as was the case for Alexandre Oliva, the original reporter of this bug.

I've sent Jos an updated uas.c with more robust error handling, and he has tested his Seagate Expansion Desk with uas using it. It turns out that at least the 0bc2:2312 version of the Seagate Expansion Desk hangs itself on ATA pass-through commands.

I'll send a patch adding a new US_FL_NO_ATA_1X flag for this + a quirk for his model upstream soon.
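To make the idea concrete, here is a hedged sketch of what such a flag implies: ATA pass-through commands are refused for flagged devices before they reach the enclosure. It is an illustration in Python, not the kernel code; the device table and function name are hypothetical, and only the two SCSI opcodes are standard values.

```python
# SCSI operation codes for the ATA PASS-THROUGH commands.
ATA_12 = 0xA1  # ATA PASS-THROUGH (12)
ATA_16 = 0x85  # ATA PASS-THROUGH (16)

# Hypothetical per-device flag table keyed by (vendor, product); the kernel
# keeps this information in its own quirk tables, not in a set like this.
NO_ATA_1X_DEVICES = {("0bc2", "2312")}


def should_reject(cdb, usb_id):
    """Return True if the command should be failed instead of sent on."""
    if not cdb:
        raise ValueError("empty CDB")
    # The first CDB byte is the SCSI operation code.
    return usb_id in NO_ATA_1X_DEVICES and cdb[0] in (ATA_12, ATA_16)


if __name__ == "__main__":
    ata16 = bytes([ATA_16] + [0] * 15)
    read10 = bytes([0x28] + [0] * 9)
    print(should_reject(ata16, ("0bc2", "2312")))
    print(should_reject(read10, ("0bc2", "2312")))
```

Ordinary reads and writes are unaffected; only tools that issue ATA pass-through commands, for example to query SMART data, would see the rejection.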
Comment 22 Spyros Papanastasiou 2014-12-13 14:20:18 UTC
Hi, I have a similar problem. I have an old Seagate external hdd, using the 3.14.26-1-lts kernel in Arch Linux:

Δεκ 13 14:26:26 Arch kernel: EXT4-fs (sdc2): mounted filesystem with ordered data mode. Opts: (null)
Δεκ 13 14:26:32 Arch kernel: end_request: critical target error, dev sdc, sector 193507368
Δεκ 13 14:26:32 Arch kernel: Aborting journal on device sdc2-8.
Δεκ 13 14:26:34 Arch kernel: EXT4-fs error (device sdc2): ext4_journal_check_start:56: Detected aborted journal
Δεκ 13 14:26:35 Arch kernel: EXT4-fs (sdc2): Remounting filesystem read-only
Δεκ 13 14:26:48 Arch kernel: EXT4-fs error (device sdc2): ext4_put_super:789: Couldn't clean up the journal
Δεκ 13 14:26:48 Arch kernel: end_request: critical target error, dev sdc, sector 0

Is this related?

lspci -vv [ http://pastebin.com/AivNrErQ ]
journalctl -fk [ http://pastebin.com/NE2fcMij ]
Comment 23 Hans de Goede 2014-12-14 09:51:11 UTC
(In reply to Spyros Papanastasiou from comment #22)
> is this related?? 

No, this is not uas related. The problem seems to be that the disk is simply broken and needs to be replaced.

Regards,

Hans
Comment 24 Spyros Papanastasiou 2014-12-14 13:10:16 UTC
(In reply to Spyros Papanastasiou from comment #22)
> Hi, i have a similar problem. I have an old seagate external hdd using
> 3.14.26-1-lts kernel in Archlinux:: 
> 
> Δεκ 13 14:26:26 Arch kernel: EXT4-fs (sdc2): mounted filesystem with ordered
> data mode. Opts: (null)
> Δεκ 13 14:26:32 Arch kernel: end_request: critical target error, dev sdc,
> sector 193507368
> Δεκ 13 14:26:32 Arch kernel: Aborting journal on device sdc2-8.
> Δεκ 13 14:26:34 Arch kernel: EXT4-fs error (device sdc2):
> ext4_journal_check_start:56: Detected aborted journal
> Δεκ 13 14:26:35 Arch kernel: EXT4-fs (sdc2): Remounting filesystem read-only
> Δεκ 13 14:26:48 Arch kernel: EXT4-fs error (device sdc2):
> ext4_put_super:789: Couldn't clean up the journal
> Δεκ 13 14:26:48 Arch kernel: end_request: critical target error, dev sdc,
> sector 0
> 
> is this related?? 
> 
> lspci -vv [ http://pastebin.com/AivNrErQ ]
> journalctl -fk [ http://pastebin.com/NE2fcMij ]

I also tried kernel 3.12.35 with no luck. I might try writing through Windows.
Comment 25 Spyros Papanastasiou 2015-01-05 14:17:20 UTC
Hi, sorry to bother you again. I tried creating a new partition with ntfs and copying files from Windows, with success. I also tried formatting to ext2 and copying, with success.

Then I formatted to ext4 and lots of problems occurred: http://pastebin.com/i79yz6H7

What should I do to fix this?
Comment 26 Spyros Papanastasiou 2015-01-05 14:23:28 UTC
I forgot to mention that I extracted the hdd, connected it directly to the PC, and wrote to it with btrfs and ext4 with no problem.
Comment 27 Hans de Goede 2015-01-05 14:38:18 UTC
(In reply to Spyros Papanastasiou from comment #25)
> hi sorry to bother you again. I tryed creating new partition with ntfs and
> coping files from Windows with success. I also tried formating to ext2 and
> coping with success.
> 
> then i formated to ext4 and lots of problems occur:
> http://pastebin.com/i79yz6H7
> 
> what should i do to fix this?

As said before, you seem to have broken hardware. The disk working when directly plugged into the PC indicates that the disk is likely fine and that the problem instead is the enclosure, or the combination of the enclosure and the disk. Likely the enclosure is not providing enough power to the disk, causing all sorts of errors under heavy io.