Bug 115961

Summary: usbcore: U3V machine vision cameras no longer work after upgrading from 3.19.0-51 to 4.2.0-34 on Ubuntu (enabling LPM)
Product: Drivers Reporter: Matthew Giassa (matthew)
Component: USBAssignee: Greg Kroah-Hartman (greg)
Status: RESOLVED CODE_FIX    
Severity: high CC: mathias.nyman, matthew
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.2.0-34, 4.5.0 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg and usbmon log files.
2016-04-07: Updated logs generated while reproducing the bug on the 4.5 kernel.
Kernel config file

Description Matthew Giassa 2016-04-06 16:43:30 UTC
Created attachment 211981 [details]
dmesg and usbmon log files.

I currently write software to support U3V machine vision cameras on Linux, primarily Ubuntu. After upgrading from Ubuntu 14.04.3 (kernel 3.19.0-51) to 14.04.4 (kernel 4.2.0-34), a number of customers have been reporting issues with the cameras "stalling".

This issue only occurs on Intel 8 Series and Intel 9 Series USB3.0 host controllers. The issues did not occur on various Fresco and Renesas controllers. After investigating the issue further, I discovered that the problem is related in one way or another to power management. The issue could be resolved by adding the following kernel parameter to my grub config:
> usbcore.autosuspend=-1

This is only a temporary measure though, as customers on embedded or low-power systems will not find disabling all USB power management acceptable. I've found some blog articles on disabling PM on a per-device basis, but that too would not be an acceptable long term solution.

I've enabled verbose xHCI logging and verbose usbcore logging as per  advice from the xHCI maintainer, Mathias Nyman, and am attaching the `dmesg' output and results of a `usbmon' trace. Any assistance is appreciated.

Thank you.
Comment 1 Matthew Giassa 2016-04-06 16:45:51 UTC
Updating CC list.

Flagging as regression since this works in 3.19 and 3.16 kernels.
Comment 2 Greg Kroah-Hartman 2016-04-06 18:44:53 UTC
On Wed, Apr 06, 2016 at 04:43:30PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=115961
> 
>             Bug ID: 115961
>            Summary: usbcore: U3V machine vision cameras not longer work
>                     after upgrading from 3.19.0-51 to 4.2.0-34 on Ubuntu

Please contact Ubuntu, we have nothing to do with their kernel releases
as we don't know what is in them, only they do.
Comment 3 Matthew Giassa 2016-04-06 19:51:46 UTC
(In reply to Greg Kroah-Hartman from comment #2)
> On Wed, Apr 06, 2016 at 04:43:30PM +0000,
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=115961
> > 
> >             Bug ID: 115961
> >            Summary: usbcore: U3V machine vision cameras not longer work
> >                     after upgrading from 3.19.0-51 to 4.2.0-34 on Ubuntu
> 
> Please contact Ubuntu, we have nothing to do with their kernel releases
> as we don't know what is in them, only they do.

Ubuntu's official kernel bugs documentation (https://wiki.ubuntu.com/Kernel/Bugs) advises users to specifically post kernel bugs here.

If I can reproduce this issue on something using a Vanilla kernel with no patches, will it gain traction here? I can repeat with a stock kernel on Slackware 14.1 easily enough.
Comment 4 Greg Kroah-Hartman 2016-04-06 20:53:44 UTC
On Wed, Apr 06, 2016 at 07:51:46PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> Ubuntu's official kernel bugs documentation
> (https://wiki.ubuntu.com/Kernel/Bugs) advises users to specifically post
> kernel
> bugs here.

That's very mean of them, to have others do their work for them :(

> If I can reproduce this issue on something using a Vanilla kernel with no
> patches, will it gain traction here? I can repeat with a stock kernel on
> Slackware 14.1 easily enough.

Yes, if you can reproduce this on the 4.5 kernel release, please let the
people on the linux-media@vger.kernel.org mailing list know about it and
they will be glad to work on it.
Comment 5 Matthew Giassa 2016-04-07 15:22:23 UTC
(In reply to Greg Kroah-Hartman from comment #4)
> On Wed, Apr 06, 2016 at 07:51:46PM +0000,
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > Ubuntu's official kernel bugs documentation
> > (https://wiki.ubuntu.com/Kernel/Bugs) advises users to specifically post
> kernel
> > bugs here.
> 
> That's very mean of them, to have others do their work for them :(
> 
> > If I can reproduce this issue on something using a Vanilla kernel with no
> > patches, will it gain traction here? I can repeat with a stock kernel on
> > Slackware 14.1 easily enough.
> 
> Yes, if you can reproduce this on the 4.5 kernel release, please let the
> people on the linux-media@vger.kernel.org mailing list know about it and
> they will be glad to work on it.

Thank you Greg. I can reproduce the issue on the 4.5 kernel release. I'll upload the logs (usbmon, lspci, lsmod, lsusb, dmesg with extra xHCI and usbcore logging, and kernel boot parameters) here, and e-mail the mailing list. Should I keep the bug open afterwards, or should I clone/move it somewhere else?

Cheers!
Comment 6 Matthew Giassa 2016-04-07 15:23:10 UTC
Created attachment 212011 [details]
2016-04-07: Updated logs generated while reproducing the bug on the 4.5 kernel.
Comment 7 Greg Kroah-Hartman 2016-04-07 15:27:26 UTC
On Thu, Apr 07, 2016 at 03:22:23PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> Thank you Greg. I can reproduce the issue on the 4.5 kernel release. I'll
> upload the logs (usbmon, lspci, lsmod, lsusb, dmesg with extra xHCI and
> usbcore
> logging, and kernel boot parameters) here, and e-mail the mailing list.
> Should
> I keep the bug open afterwards, or should I clone/move it somewhere else?

You can ask the media developers what they want to have happen with
this, I don't know if they use bugzilla or not.

thanks,

greg k-h
Comment 8 Matthew Giassa 2016-04-07 15:39:44 UTC
When using the 4.5 kernel, the issue is even more pronounced, as the "usbcore.autosuspend=-1" boot parameter no longer "fixes" the issue.
Comment 9 Matthew Giassa 2016-04-07 16:03:32 UTC
Created attachment 212021 [details]
Kernel config file
Comment 10 Matthew Giassa 2016-04-07 16:05:29 UTC
Adding kernel config file used. Just pulled the stock `config-$(uname -r)' from /boot after installing Ubuntu 14.04.4 from DVD media, pulled kernel 4.5 vanilla from kernel.org, then used `make oldconfig' to generate the config file.
Comment 11 Mathias Nyman 2016-04-11 13:23:47 UTC
LPM (Link power management) is only enabled on Intel Hosts in the xhci driver.

usbmon show there is a constant loop of enabling and disabling LPM.
Parsing the usbmon output it boils down to: 

Set SEL					0a0a0002 0002    (usb_enabme_lpm)
Set port feature U1 timeout		50us
Set device feaurte U1 Enable
Set SEL					0a0a0002 0002
Set port feature U2 timeout		~10ms
Set device feaurte U2 Enable
Set port feature U1 timeout		0us		(usb_disable_lpm)
Clear device feaurte U1 Enable
Set port feature U2 timeout		0us
Clear device feaurte U2 Enable		
-back to beginning setting SEL (system exit latency)

sometimes there's a 24 byte bulk out transfer followed by a 16 byte bulk in transfer in between the enable_lpm() and usb_disable_lpm()

How we end up in a loop, or sustain it is not yet clear.

This code is in usb core, could you add usb core debugging with:

echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
and show the dmesg output.

and skip the xhci debugging, otherwise it will be very hard to read.
Comment 12 Mathias Nyman 2016-04-13 17:12:28 UTC
enabling and disabling LPM could be caused by rapid runtime suspend and resume.

I'd start investigating if the device is runtime suspending and runtime resuming
at a fast rate. Codepath for a usb device runtime suspend looks like this:


 usb_device_pm_ops = {
        .runtime_suspend =      usb_runtime_suspend,

usb_runtime_suspend()
  usb_suspend_both()
    usb_suspend_device()
      udriver = &usb_generic_driver;  // usb_generic_driver.suspend = generic_suspend()
      udriver->suspend(udev, msg);
         generic_suspend()
           usb_port_suspend()
           usb_unlocked_disable_lpm()

Meaning a runtime suspend / runtime resume would trigger the LPM messages.

If that is the issue then adding pm_runtime_get_xxx() will increase the device usage count
before you know you are going to use a device. It prevents runtime suspend.
Then once you know the device is going to be idle, decrease the referece cound with a pm_runtime_put().

Optionally try setting a  pm_runtime_set_autosuspend_delay() value to prevent it from suspending too quickly.
Comment 13 Matthew Giassa 2016-04-30 16:03:08 UTC
Alan Stern has provided a patch for submission that resolves this issue.

-------- Forwarded Message --------
Subject: [PATCH] USB: leave LPM alone if possible when binding/unbinding interface drivers
Date: Fri, 29 Apr 2016 15:25:17 -0400 (EDT)
From: Alan Stern <stern@rowland.harvard.edu>
To: Greg KH <greg@kroah.com>, Mathias Nyman <mathias.nyman@intel.com>
CC: Matthew Giassa <matthew@giassa.net>, Linux USB Developer Mailing List <linux-usb@vger.kernel.org>

When a USB driver is bound to an interface (either through probing or
by claiming it) or is unbound from an interface, the USB core always
disables Link Power Management during the transition and then
re-enables it afterward.  The reason is because the driver might want
to prevent hub-initiated link power transitions, in which case the HCD
would have to recalculate the various LPM parameters.  This
recalculation takes place when LPM is re-enabled and the new
parameters are sent to the device and its parent hub.

However, if the driver does not want to prevent hub-initiated link
power transitions then none of this work is necessary.  The parameters
don't need to be recalculated, and LPM doesn't need to be disabled and
re-enabled.

It turns out that disabling and enabling LPM can be time-consuming,
enough so that it interferes with user programs that want to claim and
release interfaces rapidly via usbfs.  Since the usbfs kernel driver
doesn't set the disable_hub_initiated_lpm flag, we can speed things up
and get the user programs to work by leaving LPM alone whenever the
flag isn't set.

And while we're improving the way disable_hub_initiated_lpm gets used,
let's also fix its kerneldoc.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Matthew Giassa <matthew@giassa.net>
CC: Mathias Nyman <mathias.nyman@intel.com>
CC: <stable@vger.kernel.org>

---

Mathias, can you check that this is right?  If the driver's
disable_hub_initiated_lpm flag isn't set then binding or unbinding it
shouldn't require any changes to the LPM parameters.  The flag gets
used in only one place, in xhci_calculate_lpm_timeout(), and if it
isn't set then then result would be the same as if the driver weren't
bound.

Thanks.


[as1800]


 drivers/usb/core/driver.c |   40 +++++++++++++++++++++++-----------------
 include/linux/usb.h       |    2 +-
 2 files changed, 24 insertions(+), 18 deletions(-)

Index: usb-4.x/include/linux/usb.h
===================================================================
--- usb-4.x.orig/include/linux/usb.h
+++ usb-4.x/include/linux/usb.h
@@ -1069,7 +1069,7 @@ struct usbdrv_wrap {
  *    for interfaces bound to this driver.
  * @soft_unbind: if set to 1, the USB core will not kill URBs and disable
  *    endpoints before calling the driver's disconnect method.
- * @disable_hub_initiated_lpm: if set to 0, the USB core will not allow hubs
+ * @disable_hub_initiated_lpm: if set to 1, the USB core will not allow hubs
  *    to initiate lower power link state transitions when an idle timeout
  *    occurs.  Device-initiated USB 3.0 link PM will still be allowed.
  *
Index: usb-4.x/drivers/usb/core/driver.c
===================================================================
--- usb-4.x.orig/drivers/usb/core/driver.c
+++ usb-4.x/drivers/usb/core/driver.c
@@ -284,7 +284,7 @@ static int usb_probe_interface(struct de
     struct usb_device *udev = interface_to_usbdev(intf);
     const struct usb_device_id *id;
     int error = -ENODEV;
-    int lpm_disable_error;
+    int lpm_disable_error = -ENODEV;

     dev_dbg(dev, "%s\n", __func__);

@@ -336,12 +336,14 @@ static int usb_probe_interface(struct de
      * setting during probe, that should also be fine.  usb_set_interface()
      * will attempt to disable LPM, and fail if it can't disable it.
      */
-    lpm_disable_error = usb_unlocked_disable_lpm(udev);
-    if (lpm_disable_error && driver->disable_hub_initiated_lpm) {
-        dev_err(&intf->dev, "%s Failed to disable LPM for driver %s\n.",
-                __func__, driver->name);
-        error = lpm_disable_error;
-        goto err;
+    if (driver->disable_hub_initiated_lpm) {
+        lpm_disable_error = usb_unlocked_disable_lpm(udev);
+        if (lpm_disable_error) {
+            dev_err(&intf->dev, "%s Failed to disable LPM for driver %s\n.",
+                    __func__, driver->name);
+            error = lpm_disable_error;
+            goto err;
+        }
     }

     /* Carry out a deferred switch to altsetting 0 */
@@ -391,7 +393,8 @@ static int usb_unbind_interface(struct d
     struct usb_interface *intf = to_usb_interface(dev);
     struct usb_host_endpoint *ep, **eps = NULL;
     struct usb_device *udev;
-    int i, j, error, r, lpm_disable_error;
+    int i, j, error, r;
+    int lpm_disable_error = -ENODEV;

     intf->condition = USB_INTERFACE_UNBINDING;

@@ -399,12 +402,13 @@ static int usb_unbind_interface(struct d
     udev = interface_to_usbdev(intf);
     error = usb_autoresume_device(udev);

-    /* Hub-initiated LPM policy may change, so attempt to disable LPM until
+    /* If hub-initiated LPM policy may change, attempt to disable LPM until
      * the driver is unbound.  If LPM isn't disabled, that's fine because it
      * wouldn't be enabled unless all the bound interfaces supported
      * hub-initiated LPM.
      */
-    lpm_disable_error = usb_unlocked_disable_lpm(udev);
+    if (driver->disable_hub_initiated_lpm)
+        lpm_disable_error = usb_unlocked_disable_lpm(udev);

     /*
      * Terminate all URBs for this interface unless the driver
@@ -505,7 +509,7 @@ int usb_driver_claim_interface(struct us
     struct device *dev;
     struct usb_device *udev;
     int retval = 0;
-    int lpm_disable_error;
+    int lpm_disable_error = -ENODEV;

     if (!iface)
         return -ENODEV;
@@ -526,12 +530,14 @@ int usb_driver_claim_interface(struct us

     iface->condition = USB_INTERFACE_BOUND;

-    /* Disable LPM until this driver is bound. */
-    lpm_disable_error = usb_unlocked_disable_lpm(udev);
-    if (lpm_disable_error && driver->disable_hub_initiated_lpm) {
-        dev_err(&iface->dev, "%s Failed to disable LPM for driver %s\n.",
-                __func__, driver->name);
-        return -ENOMEM;
+    /* See the comment about disabling LPM in usb_probe_interface(). */
+    if (driver->disable_hub_initiated_lpm) {
+        lpm_disable_error = usb_unlocked_disable_lpm(udev);
+        if (lpm_disable_error) {
+            dev_err(&iface->dev, "%s Failed to disable LPM for driver %s\n.",
+                    __func__, driver->name);
+            return -ENOMEM;
+        }
     }

     /* Claimed interfaces are initially inactive (suspended) and