Bug 41752 - [PATCH]USB 3.0 port does not recognize external disk
Status: REOPENED
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P1 normal
Assignee: XHCI bugs virtual user
URL: http://download.opensuse.org/reposito...
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-25 19:37 UTC by Harald Brennich
Modified: 2016-02-17 08:44 UTC (History)

See Also:
Kernel Version: 3.1.0-rc3-1-vanilla
Tree: Mainline
Regression: No


Attachments
dmesg when trying to connect the disk (22.78 KB, text/plain)
2011-08-25 19:37 UTC, Harald Brennich
The original dmesg unzipped (245.61 KB, text/plain)
2011-09-03 11:25 UTC, Harald Brennich
Patch to try warm reset if USB3 port reports a disconnect & inactive link state. (1.99 KB, patch)
2011-09-06 16:59 UTC, Sarah Sharp
/var/log/messages when the patch is applied (gzipped) (90.39 KB, application/x-gzip)
2011-09-06 18:39 UTC, Harald Brennich
Make USB 3.0 work on L755D notebook (1.60 KB, application/octet-stream)
2013-12-02 11:53 UTC, Harald Brennich
0002-USB-When-hot-reset-for-USB3-fails-try-warm-reset.patch (3.20 KB, application/octet-stream)
2013-12-02 21:23 UTC, Harald Brennich
Make_USB3.0_port_work_with_L755D_notebook.patch (1.60 KB, application/octet-stream)
2013-12-02 21:23 UTC, Harald Brennich
revert of beabe20445c60322719d8f58e9eb9dd4660c1b3e for 3.17.1 (7.37 KB, patch)
2014-10-30 20:17 UTC, help

Description Harald Brennich 2011-08-25 19:37:07 UTC
Created attachment 70272 [details]
dmesg when trying to connect the disk

Hardware: Toshiba Satellite L755
USB:
08:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
Peripherals: Logilink USB 3.0 to SATA HDD Adapter with Western Digital WD2500JS
When the external disk is connected to a USB 2.0 port, it is detected and can be mounted. When it is connected to the USB 3.0 port, the drive starts spinning and there are entries in dmesg, but the external disk cannot be mounted.
Comment 1 Harald Brennich 2011-09-02 13:04:43 UTC
After adding some trace statements to the source (linux-3.1.0-rc3-1-vanilla/drivers/usb/core/hub.c), I found the reason for the failure in the function hub_port_reset. Either the Logilink USB 3.0 to SATA HDD Adapter or the NEC Corporation uPD720200 USB 3.0 Host Controller responds to the reset request by changing the port status/port change pair from 0x203/0x1 to 0x2c0/0x51. The function hub_port_wait_reset responds to this with the error code -ENOTCONN.
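To make the two status/change pairs above easier to read, here is a minimal decoding sketch. It is not kernel code: the bit values are assumptions written down from memory of the kernel's USB hub header (ch11.h) for USB 3.0 hub ports, and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values for a USB 3.0 hub port status word (from memory of ch11.h). */
#define PORT_STAT_CONNECTION    0x0001
#define PORT_STAT_ENABLE        0x0002
#define PORT_STAT_RESET         0x0010
#define PORT_STAT_LINK_STATE    0x01e0  /* bits 5..8: link state */
#define SS_PORT_STAT_POWER      0x0200  /* bit 9: port power on USB 3.0 hubs */
#define SS_PORT_LS_U0           0x0000
#define SS_PORT_LS_SS_INACTIVE  0x00c0

/* Assumed bit values for the port change word. */
#define PORT_STAT_C_CONNECTION  0x0001
#define PORT_STAT_C_RESET       0x0010
#define PORT_STAT_C_LINK_STATE  0x0040

/* Hypothetical helper type: a decoded view of the status word. */
struct port_view {
	bool connected;
	bool enabled;
	bool in_reset;
	bool powered;
	uint16_t link_state;	/* still masked in place, comparable to SS_PORT_LS_* */
};

static struct port_view decode_status(uint16_t status)
{
	struct port_view v = {
		.connected  = (status & PORT_STAT_CONNECTION) != 0,
		.enabled    = (status & PORT_STAT_ENABLE) != 0,
		.in_reset   = (status & PORT_STAT_RESET) != 0,
		.powered    = (status & SS_PORT_STAT_POWER) != 0,
		.link_state = (uint16_t)(status & PORT_STAT_LINK_STATE),
	};
	return v;
}

/* Checks the values quoted in this comment against the assumed layout. */
static void check_values_from_report(void)
{
	struct port_view before = decode_status(0x203);
	struct port_view after  = decode_status(0x2c0);

	/* 0x203: connected, enabled, powered, link in U0 */
	assert(before.connected && before.enabled && before.powered);
	assert(before.link_state == SS_PORT_LS_U0);

	/* 0x2c0: not connected, not enabled, powered, link inactive */
	assert(!after.connected && !after.enabled && after.powered);
	assert(!after.in_reset);
	assert(after.link_state == SS_PORT_LS_SS_INACTIVE);

	/* 0x51: connection change + reset change + link state change */
	assert(0x51 == (PORT_STAT_C_CONNECTION | PORT_STAT_C_RESET |
			PORT_STAT_C_LINK_STATE));
}
```

Calling check_values_from_report() from a small main() passes silently if the assumed layout holds. Under that layout, the port goes from connected and enabled in U0 to disconnected with an inactive link, with the connection-change bit set.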
By modifying function hub_port_wait_reset from
		/* Device went away? */
		if (!(portstatus & USB_PORT_STAT_CONNECTION))
			return -ENOTCONN;

		/* bomb out completely if the connection bounced */
		if ((portchange & USB_PORT_STAT_C_CONNECTION))
			return -ENOTCONN;
to
		if (hub->hdev->descriptor.bcdUSB != 0x300) {
			/* Device went away? */
			if (!(portstatus & USB_PORT_STAT_CONNECTION))
				return -ENOTCONN;

			/* bomb out completely if the connection bounced */
			if ((portchange & USB_PORT_STAT_C_CONNECTION))
				return -ENOTCONN;
		} else {
			if (!(portstatus & USB_PORT_STAT_CONNECTION) &&
			    !(portchange & USB_PORT_STAT_C_CONNECTION))
				return -ENOTCONN;
		}
I got the disk to run. Timings measured with hdparm are as follows:
The internal disk:
 Timing buffered disk reads: 228 MB in  3.04 seconds =  75.02 MB/sec
The external disk connected via USB 3.0:
 Timing buffered disk reads: 172 MB in  3.02 seconds =  56.93 MB/sec
The external disk connected via USB 2.0:
 Timing buffered disk reads:  94 MB in  3.01 seconds =  31.24 MB/sec

So the hack seems to work at least partially. There remain two open points:
Firstly, I do not know whether there is a firmware bug or a programming bug, though the problem probably is with the firmware.
Secondly, when the disk is disconnected from the USB 3.0 port, the entry in /dev remains. When the disk is disconnected from the USB 2.0 port, the corresponding /dev entry disappears.
Comment 2 Sarah Sharp 2011-09-02 21:51:53 UTC
That's not a feasible fix, because it won't ever allow your device to disconnect (as you noticed when the /dev entry didn't go away).
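A rough user-space model of why the hack hides disconnects, using only the two checks quoted in comment 1. It assumes the connection bit and the connection-change bit are both 0x0001 in their respective words, and the function names are hypothetical. It does not model the rest of hub.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed bit values; both are bit 0 of their respective 16-bit words. */
#define PORT_STAT_CONNECTION    0x0001
#define PORT_STAT_C_CONNECTION  0x0001

/* The two checks as they appear in the unmodified hub_port_wait_reset. */
static int original_check(uint16_t portstatus, uint16_t portchange)
{
	/* Device went away? */
	if (!(portstatus & PORT_STAT_CONNECTION))
		return -ENOTCONN;
	/* Connection bounced? */
	if (portchange & PORT_STAT_C_CONNECTION)
		return -ENOTCONN;
	return 0;
}

/* The USB 3.0 branch of the hack from comment 1. */
static int hacked_usb3_check(uint16_t portstatus, uint16_t portchange)
{
	/* Only fails when the port is disconnected AND no change was flagged. */
	if (!(portstatus & PORT_STAT_CONNECTION) &&
	    !(portchange & PORT_STAT_C_CONNECTION))
		return -ENOTCONN;
	return 0;
}

/* Replays the status/change pair reported after the reset in comment 1. */
static void replay_reported_pair(void)
{
	assert(original_check(0x2c0, 0x51) == -ENOTCONN);
	assert(hacked_usb3_check(0x2c0, 0x51) == 0);
}
```

In this model a disconnect that also raises the connection-change bit passes the hacked check, so the reset loop never reports that the device has gone.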

Can you please try updating the firmware on your NEC host controller first?  We've had some issues with devices not being able to connect (or at least not connect at high speed) that were fixed with a firmware upgrade.

I'll have a look at your dmesg output in the meantime.
Comment 3 Sarah Sharp 2011-09-02 21:54:18 UTC
Your dmesg is completely garbled (lots of unicode characters? Did you upload an executable?).  Can you re-upload it?
Comment 4 Harald Brennich 2011-09-03 11:25:59 UTC
Created attachment 71522 [details]
The original dmesg unzipped
Comment 5 Harald Brennich 2011-09-03 11:46:27 UTC
Do not consider the code modification as a fix, but as a hack. I am simply trying to find out why the USB 3.0 port does not work with Linux. With Windows 7, the disk is detected (and shown in the Windows Explorer) on connecting to the port and removed from the Windows Explorer when disconnecting from the port, so the USB 3.0 port seems to conform to USB 3.0 specs to a certain degree.
As to updating the firmware: I do not even find the controller on the NEC site. Also, Windows gives the controller another product name (I took the name from lspci). Lastly, I don't even know how to upgrade controller firmware :(. 
Finally, the hack may prevent the external disk from properly disconnecting. However, I think some trace of a disconnecting event should be found in dmesg. But there is nada - no call to hub_irq, no call of hub_events .. simply nothing.
Comment 6 Harald Brennich 2011-09-06 15:43:42 UTC
Hello Sarah,
the current implementation of function hub_port_wait_reset returns -ENOTCONN even during the ongoing reset. I propose to ignore changes of connection state during reset. So instead of 

static int hub_port_wait_reset(struct usb_hub *hub, int port1,
				struct usb_device *udev, unsigned int delay)
{
	int delay_time, ret;
	u16 portstatus;
	u16 portchange;

	for (delay_time = 0;
			delay_time < HUB_RESET_TIMEOUT;
			delay_time += delay) {
		/* wait to give the device a chance to reset */
		msleep(delay);

		/* read and decode port status */
		ret = hub_port_status(hub, port1, &portstatus, &portchange);
		if (ret < 0)
			return ret;
		/* Device went away? */
		if (!(portstatus & USB_PORT_STAT_CONNECTION))
			return -ENOTCONN;

		/* bomb out completely if the connection bounced */
		if ((portchange & USB_PORT_STAT_C_CONNECTION))
			return -ENOTCONN;

		/* if we've finished resetting, then break out of the loop */
		if (!(portstatus & USB_PORT_STAT_RESET) &&
		    (portstatus & USB_PORT_STAT_ENABLE)) {
			if (hub_is_wusb(hub))
				udev->speed = USB_SPEED_WIRELESS;
			else if (hub_is_superspeed(hub->hdev))
				udev->speed = USB_SPEED_SUPER;
			else if (portstatus & USB_PORT_STAT_HIGH_SPEED)
				udev->speed = USB_SPEED_HIGH;
			else if (portstatus & USB_PORT_STAT_LOW_SPEED)
				udev->speed = USB_SPEED_LOW;
			else
				udev->speed = USB_SPEED_FULL;
			return 0;
		}

		/* switch to the long delay after two short delay failures */
		if (delay_time >= 2 * HUB_SHORT_RESET_TIME)
			delay = HUB_LONG_RESET_TIME;

		dev_dbg (hub->intfdev,
			"port %d not reset yet, waiting %dms\n",
			port1, delay);
	}

	return -EBUSY;
}

the following implementation:

static int hub_port_wait_reset(struct usb_hub *hub, int port1,
				struct usb_device *udev, unsigned int delay)
{
	int delay_time, ret;
	u16 portstatus;
	u16 portchange;

	for (delay_time = 0;
			delay_time < HUB_RESET_TIMEOUT;
			delay_time += delay) {
		/* wait to give the device a chance to reset */
		msleep(delay);

		/* read and decode port status */
		ret = hub_port_status(hub, port1, &portstatus, &portchange);
		if (ret < 0)
			return ret;

		/* if we've finished resetting, then break out of the loop */
		if (!(portstatus & USB_PORT_STAT_RESET) &&
		    (portstatus & USB_PORT_STAT_ENABLE)) {
			/* Device went away? */
			if (!(portstatus & USB_PORT_STAT_CONNECTION))
				return -ENOTCONN;
			if (hub_is_wusb(hub))
				udev->speed = USB_SPEED_WIRELESS;
			else if (hub_is_superspeed(hub->hdev))
				udev->speed = USB_SPEED_SUPER;
			else if (portstatus & USB_PORT_STAT_HIGH_SPEED)
				udev->speed = USB_SPEED_HIGH;
			else if (portstatus & USB_PORT_STAT_LOW_SPEED)
				udev->speed = USB_SPEED_LOW;
			else
				udev->speed = USB_SPEED_FULL;
			return 0;
		}

		/* switch to the long delay after two short delay failures */
		if (delay_time >= 2 * HUB_SHORT_RESET_TIME)
			delay = HUB_LONG_RESET_TIME;

		dev_dbg (hub->intfdev,
			"port %d not reset yet, waiting %dms\n",
			port1, delay);
	}

	/* Device went away? */
	if (!(portstatus & USB_PORT_STAT_CONNECTION))
		return -ENOTCONN;
	/* bomb out completely if the connection bounced */
	if ((portchange & USB_PORT_STAT_C_CONNECTION))
		return -ENOTCONN;
	return -EBUSY;
}

This allows devices to report a disconnect during reset. Or is there something in the USB specs that prohibits this handling of feature RESET?
Comment 7 Sarah Sharp 2011-09-06 16:35:36 UTC
Here's what I think is happening:

[  681.859162] xhci_hcd 0000:08:00.0: Port Status Change Event for port 1
...
[  681.978459] xhci_hcd 0000:08:00.0: get port status, actual port 0 status  = 0x1203
# The SuperSpeed device shows up fine, port status = enabled, CCS (connected), U0 state, powered, superspeed
# slot ID is allocated

[  681.978690] xhci_hcd 0000:08:00.0: set port reset, actual port 0 status  = 0x1311

# port status = CCS (connected), disabled, port in reset, recovery, powered, superspeed
# (Not sure why the port is disabled here.)

[  681.978700] xhci_hcd 0000:08:00.0: Port Status Change Event for port 1
[  682.034421] xhci_hcd 0000:08:00.0: get port status, actual port 0 status  = 0x6202c0

# status = disconnected, disabled, no reset in progress, inactive link state, powered, unknown speed, connect status change

I think the hot port reset failed here, and we need to do a warm port reset.  This is what usually needs to be done when the port link state is in the inactive state.  The host controller is supposed to try warm port reset if hot reset fails, but it's possible your host controller does not, or it's possible your host controller is in the middle of an automatic warm port reset when we read the port status register.  In any case, I'll whip up a patch to start a warm port reset when the SuperSpeed device state is in inactive and the device is disconnected.
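For reference, the three register values in the log can be decoded with a sketch like the one below. The bit positions are assumptions taken from memory of the xHCI PORTSC register layout, and warm_reset_candidate() is a hypothetical helper that only restates the condition described above (disconnected plus inactive link state). It is not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed xHCI PORTSC bit layout (written from memory, not copied from xhci.h). */
#define PORTSC_CCS       (1u << 0)	/* current connect status */
#define PORTSC_PED       (1u << 1)	/* port enabled */
#define PORTSC_PR        (1u << 4)	/* port reset in progress */
#define PORTSC_PLS(v)    (((v) >> 5) & 0xfu)	/* port link state */
#define PORTSC_PP        (1u << 9)	/* port power */
#define PORTSC_SPEED(v)  (((v) >> 10) & 0xfu)	/* port speed ID */
#define PORTSC_CSC       (1u << 17)	/* connect status change */
#define PORTSC_PRC       (1u << 21)	/* port reset change */
#define PORTSC_PLC       (1u << 22)	/* port link state change */

#define PLS_U0           0u
#define PLS_INACTIVE     6u
#define PLS_RECOVERY     8u
#define SPEED_SUPER      4u

/* Hypothetical helper: disconnected port whose link is in the inactive state. */
static bool warm_reset_candidate(uint32_t portsc)
{
	return !(portsc & PORTSC_CCS) && PORTSC_PLS(portsc) == PLS_INACTIVE;
}

/* Replays the three values from the log against the assumed layout. */
static void check_log_values(void)
{
	/* 0x1203: connected, enabled, U0, powered, SuperSpeed */
	assert((0x1203u & PORTSC_CCS) && (0x1203u & PORTSC_PED));
	assert(PORTSC_PLS(0x1203u) == PLS_U0 && (0x1203u & PORTSC_PP));
	assert(PORTSC_SPEED(0x1203u) == SPEED_SUPER);

	/* 0x1311: connected, disabled, in reset, recovery, powered, SuperSpeed */
	assert((0x1311u & PORTSC_CCS) && !(0x1311u & PORTSC_PED));
	assert((0x1311u & PORTSC_PR) && PORTSC_PLS(0x1311u) == PLS_RECOVERY);
	assert(PORTSC_SPEED(0x1311u) == SPEED_SUPER);

	/* 0x6202c0: disconnected, disabled, no reset, inactive, powered, no speed */
	assert(!(0x6202c0u & PORTSC_CCS) && !(0x6202c0u & PORTSC_PED));
	assert(!(0x6202c0u & PORTSC_PR) && PORTSC_PLS(0x6202c0u) == PLS_INACTIVE);
	assert((0x6202c0u & PORTSC_PP) && PORTSC_SPEED(0x6202c0u) == 0u);
	assert((0x6202c0u & PORTSC_CSC) && (0x6202c0u & PORTSC_PRC) &&
	       (0x6202c0u & PORTSC_PLC));
}
```

With these assumptions, warm_reset_candidate() is true only for the last value, 0x6202c0, which is the state in which a warm reset would be attempted.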
Comment 8 Sarah Sharp 2011-09-06 16:59:12 UTC
Created attachment 71772 [details]
Patch to try warm reset if USB3 port reports a disconnect & inactive link state.

Harald, please try out this patch.
Comment 9 Harald Brennich 2011-09-06 18:39:52 UTC
Created attachment 71782 [details]
/var/log/messages when the patch is applied (gzipped)
Comment 10 Harald Brennich 2011-09-06 18:43:28 UTC
Hello Sarah,
The suggested patch does not work for my USB 3.0 port. hub_port_wait_reset no longer returns an error, but the port still doesn't work. The disk was plugged in at 20:18:26.
Maybe the problem is now:
Sep  6 20:18:27 HATOSH kernel: [   88.503250] usb 4-1: Device not responding to set address.
Sep  6 20:18:27 HATOSH kernel: [   88.706747] usb 4-1: device not accepting address 4, error -71
Comment 11 Florian Mickler 2012-01-12 21:29:38 UTC
A patch referencing this bug report has been merged in Linux v3.2-rc1:

commit 10d674a82e553cb8a1f41027bb3c3e309b3f6804
Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Date:   Wed Sep 14 14:24:52 2011 -0700

    USB: When hot reset for USB3 fails, try warm reset.
Comment 12 Harald Brennich 2012-01-21 20:59:51 UTC
My USB3.0 port with kernel 3.2.0-rc1-286-gd291ffb-1-vanilla and later now works out of the box.
Status can be set to resolved.
Comment 13 Florian Mickler 2012-07-25 19:37:52 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.5-rc7:

commit 8bea2bd37df08aaa599aa361a9f8b836ba98e554
Author: Stanislaw Ledwon <staszek.ledwon@linux.jf.intel.com>
Date:   Mon Jun 18 15:20:00 2012 +0200

    usb: Add support for root hub port status CAS
Comment 14 Florian Mickler 2012-09-19 22:14:26 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.6-rc6:

commit 71c731a296f1b08a3724bd1b514b64f1bda87a23
Author: Alexis R. Cortes <alexis.cortes@ti.com>
Date:   Fri Aug 3 14:00:27 2012 -0500

    usb: host: xhci: Fix Compliance Mode on SN65LVPE502CP Hardware
Comment 15 Florian Mickler 2012-10-15 21:20:52 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.7-rc1:

commit 457a73d346187c2cc5d599072f38676f18f130e0
Author: Vivek Gautam <gautam.vivek@samsung.com>
Date:   Sat Sep 22 18:11:19 2012 +0530

    usb: host: xhci: Fix Null pointer dereferencing with 71c731a for non-x86 systems
Comment 16 Florian Mickler 2012-12-22 09:25:25 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.8-rc1:

commit b0e4e606ff6ff26da0f60826e75577b56ba4e463
Author: Alexis R. Cortes <alexis.cortes@ti.com>
Date:   Thu Nov 8 16:59:27 2012 -0600

    usb: host: xhci: Stricter conditional for Z1 system models for Compliance Mode Patch
Comment 17 Harald Brennich 2013-12-02 11:53:08 UTC
Created attachment 117161 [details]
Make USB 3.0 work on L755D notebook

For the newer kernels supplied by openSuse, my USB 3.0 device again is not recognised. 
Tested kernel versions are
3.4.63-2.44-desktop
3.7.10-1.16-desktop
3.11.6-4-desktop
3.13.0-rc2-1-gaf91706-1-vanilla
Taking the patch
"USB: When hot reset for USB3 fails, try warm reset." dated Tue, 6 Sep 2011 09:53:01 -0700, with some modifications to account for changes in the hub.c source code, I made a patch that makes my USB 3.0 port work again.
It has been tested with openSuse 3.11.6-4-vanilla.
Please note that I do not know
- what I am really doing
- why the patch is working
- whether the code in the patch ever was part of hub.c, and if so why it was dropped
- what side effects the code may have
All that I know is that it works.
Comment 18 Sarah Sharp 2013-12-02 18:10:40 UTC
Harald:

The upstream kernel has the commit you mentioned (commit 10d674a82e55 "USB: When hot reset for USB3 fails, try warm reset.").  When you tested 3.13.0-rc2-1-gaf91706-1-vanilla, the kernel should have had that patch.  Did 3.13-rc2 fail as well?

Can you point me to a URL for the patch you modified?  And post the modified patch that fixes your system?
Comment 19 Harald Brennich 2013-12-02 21:23:33 UTC
Created attachment 117201 [details]
0002-USB-When-hot-reset-for-USB3-fails-try-warm-reset.patch

Sarah,
I think you mailed me the original patch (attachment
0002-USB-When-hot-reset....). My modification is in attachment
Make_USB3.0_port_work...
I have not tried 3.13-rc2 from kernel.org, only the version openSuse
offers (3.13.0-rc2-1-gaf91706-1-vanilla). However, I also tried a 3.12
kernel directly from kernel.org, and that did not work either. To me it looks
like your patch got lost somewhere in the later 3.4.x versions, maybe in
3.4.63.


Comment 20 Harald Brennich 2013-12-02 21:23:42 UTC
Created attachment 117211 [details]
Make_USB3.0_port_work_with_L755D_notebook.patch
Comment 21 Harald Brennich 2013-12-03 08:17:26 UTC
I have diffed drivers/usb/core/hub.c for versions 3.12 and 3.13-rc2 from kernel.org and 3.11.6-4-vanilla from openSuse and none of these have the special treatment for (!warm) in hub_port_warm_reset_required that was in the patch from Sarah.
Comment 22 help 2014-10-30 20:17:52 UTC
Created attachment 155891 [details]
revert of beabe20445c60322719d8f58e9eb9dd4660c1b3e for 3.17.1

I just posted to linux-usb mailing list about the issue which I think is the same as this one, I'm gonna copy it here:

I have 2 very similar USB3 devices that stopped working sometime after
kernel version 3.3 - they fail to enumerate unless I reload xhci_hcd
driver.

These are the devices:
http://www.agestar.com/en/Products/Docking-Station/USB3-0/974-usb30esata-to-2535-sata-hdd-docking-station.html
http://www.agestar.com/en/Products/Docking-Station/USB3-0/980-usb30esata-to-2535-sata-hdd-docking-station.html
Basically it's some eSATA->USB3 bridge JMicron chip (I'm guessing it's
the same one in both devices):
Bus 009 Device 002: ID 152d:2509 JMicron Technology Corp. / JMicron USA Technology Corp. JMS539 SuperSpeed SATA II 3.0G Bridge

This is the USB controller I am testing them with:
01:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
I even tried upgrading the firmware, with no result.

It appears in system as 2 buses (lsusb -t output):
/:  Bus 09.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/:  Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
One for SuperSpeed devices, another for slower.

So here's dmesg for when it fails:
[  195.207408] hub 8-0:1.0: unable to enumerate USB device on port 1
[  196.016556] usb 9-1: device not accepting address 2, error -22
[  196.520572] hub 9-0:1.0: unable to enumerate USB device on port 1

And here is how it should look (when it works):
[   61.686218] hub 8-0:1.0: unable to enumerate USB device on port 1
[   63.620211] usb 9-1: new SuperSpeed USB device number 2 using xhci_hcd
[   63.636918] usb 9-1: New USB device found, idVendor=152d, idProduct=2509
[   63.636932] usb 9-1: New USB device strings: Mfr=1, Product=11, SerialNumber=3
[   63.636942] usb 9-1: Product: Usb production
[   63.636950] usb 9-1: Manufacturer: JMicron
[   63.636956] usb 9-1: SerialNumber: 00A1234578EA
[   63.638549] scsi5 : usb-storage 9-1:1.0
[   64.926983] scsi 5:0:0:0: Direct-Access     Jmicron  Corp.            0000 PQ: 0 ANSI: 5
[   64.927925] sd 5:0:0:0: Attached scsi generic sg6 type 0
[   64.929912] sd 5:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16).
[   64.930353] sd 5:0:0:0: [sdg] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[   64.931031] sd 5:0:0:0: [sdg] Write Protect is off
[   64.931042] sd 5:0:0:0: [sdg] Mode Sense: 28 00 00 00
[   64.931764] sd 5:0:0:0: [sdg] No Caching mode page present
[   64.931772] sd 5:0:0:0: [sdg] Assuming drive cache: write through
[   64.932486] sd 5:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16).
[   64.933992] sd 5:0:0:0: [sdg] No Caching mode page present
[   64.933997] sd 5:0:0:0: [sdg] Assuming drive cache: write through
[   64.989015]  sdg: sdg1 sdg2 sdg3 sdg4
[   64.992112] sd 5:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16).
[   64.993885] sd 5:0:0:0: [sdg] No Caching mode page present
[   64.993898] sd 5:0:0:0: [sdg] Assuming drive cache: write through
[   64.993909] sd 5:0:0:0: [sdg] Attached SCSI disk
The first line doesn't always appear; it might depend on the kernel version, I'm
not sure.

I managed to bisect this down to this commit:
beabe20445c60322719d8f58e9eb9dd4660c1b3e
(it's from 3.4 branch, included in 3.4.36 release, upstream commit id
from commit message seems to be invalid, at least it's missing one
character).
I backported reverse of this commit to 3.17.1 (I can't run 3.4 kernel
due to different issues) and it helps with this issue. Patch attached
in case it is helpful, sorry though for whitespace mess, I used nano:)

I doubt that just reverting is an acceptable solution for the mainline
kernel, so I'm willing to test some other patches (on top of 3.17.1
would be best) or provide additional information so that this issue
can be fixed in upcoming releases.
Comment 23 Harald Brennich 2014-11-21 11:16:49 UTC
Kernel 3.16.6 still has this bug. The patch
Make_USB3.0_port_work_with_L755D_notebook.patch
still fixes this bug.
Comment 24 Harald Brennich 2016-02-17 08:44:29 UTC
Some new results for USB 3.0 and suspend/resume with openSuse 13.2:
Kernel 3.16.7-32-desktop: USB 3.0 and suspend/resume work out of the box
Kernel 3.16.7-32-vanilla: USB 3.0 doesn't work, external monitor connected via HDMI stays black after resume from suspend (see https://bugzilla.kernel.org/show_bug.cgi?id=88541)
