Created attachment 211981 [details] dmesg and usbmon log files. I currently write software to support U3V machine vision cameras on Linux, primarily Ubuntu. After upgrading from Ubuntu 14.04.3 (kernel 3.19.0-51) to 14.04.4 (kernel 4.2.0-34), a number of customers have been reporting issues with the cameras "stalling". This issue only occurs on Intel 8 Series and Intel 9 Series USB3.0 host controllers. The issues did not occur on various Fresco and Renesas controllers. After investigating the issue further, I discovered that the problem is related in one way or another to power management. The issue could be resolved by adding the following kernel parameter to my grub config: > usbcore.autosuspend=-1 This is only a temporary measure though, as customers on embedded or low-power systems will not find disabling all USB power management acceptable. I've found some blog articles on disabling PM on a per-device basis, but that too would not be an acceptable long term solution. I've enabled verbose xHCI logging and verbose usbcore logging as per advice from the xHCI maintainer, Mathias Nyman, and am attaching the `dmesg' output and results of a `usbmon' trace. Any assistance is appreciated. Thank you.
Updating CC list. Flagging as regression since this works in 3.19 and 3.16 kernels.
On Wed, Apr 06, 2016 at 04:43:30PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=115961 > > Bug ID: 115961 > Summary: usbcore: U3V machine vision cameras not longer work > after upgrading from 3.19.0-51 to 4.2.0-34 on Ubuntu Please contact Ubuntu, we have nothing to do with their kernel releases as we don't know what is in them, only they do.
(In reply to Greg Kroah-Hartman from comment #2) > On Wed, Apr 06, 2016 at 04:43:30PM +0000, > bugzilla-daemon@bugzilla.kernel.org wrote: > > https://bugzilla.kernel.org/show_bug.cgi?id=115961 > > > > Bug ID: 115961 > > Summary: usbcore: U3V machine vision cameras not longer work > > after upgrading from 3.19.0-51 to 4.2.0-34 on Ubuntu > > Please contact Ubuntu, we have nothing to do with their kernel releases > as we don't know what is in them, only they do. Ubuntu's official kernel bugs documentation (https://wiki.ubuntu.com/Kernel/Bugs) advises users to specifically post kernel bugs here. If I can reproduce this issue on something using a Vanilla kernel with no patches, will it gain traction here? I can repeat with a stock kernel on Slackware 14.1 easily enough.
On Wed, Apr 06, 2016 at 07:51:46PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > Ubuntu's official kernel bugs documentation > (https://wiki.ubuntu.com/Kernel/Bugs) advises users to specifically post > kernel > bugs here. That's very mean of them, to have others do their work for them :( > If I can reproduce this issue on something using a Vanilla kernel with no > patches, will it gain traction here? I can repeat with a stock kernel on > Slackware 14.1 easily enough. Yes, if you can reproduce this on the 4.5 kernel release, please let the people on the linux-media@vger.kernel.org mailing list know about it and they will be glad to work on it.
(In reply to Greg Kroah-Hartman from comment #4) > On Wed, Apr 06, 2016 at 07:51:46PM +0000, > bugzilla-daemon@bugzilla.kernel.org wrote: > > Ubuntu's official kernel bugs documentation > > (https://wiki.ubuntu.com/Kernel/Bugs) advises users to specifically post > kernel > > bugs here. > > That's very mean of them, to have others do their work for them :( > > > If I can reproduce this issue on something using a Vanilla kernel with no > > patches, will it gain traction here? I can repeat with a stock kernel on > > Slackware 14.1 easily enough. > > Yes, if you can reproduce this on the 4.5 kernel release, please let the > people on the linux-media@vger.kernel.org mailing list know about it and > they will be glad to work on it. Thank you Greg. I can reproduce the issue on the 4.5 kernel release. I'll upload the logs (usbmon, lspci, lsmod, lsusb, dmesg with extra xHCI and usbcore logging, and kernel boot parameters) here, and e-mail the mailing list. Should I keep the bug open afterwards, or should I clone/move it somewhere else? Cheers!
Created attachment 212011 [details] 2016-04-07: Updated logs generated while reproducing the bug on the 4.5 kernel.
On Thu, Apr 07, 2016 at 03:22:23PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > Thank you Greg. I can reproduce the issue on the 4.5 kernel release. I'll > upload the logs (usbmon, lspci, lsmod, lsusb, dmesg with extra xHCI and > usbcore > logging, and kernel boot parameters) here, and e-mail the mailing list. > Should > I keep the bug open afterwards, or should I clone/move it somewhere else? You can ask the media developers what they want to have happen with this, I don't know if they use bugzilla or not. thanks, greg k-h
When using the 4.5 kernel, the issue is even more pronounced, as the "usbcore.autosuspend=-1" boot parameter no longer "fixes" the issue.
Created attachment 212021 [details] Kernel config file
Adding kernel config file used. Just pulled the stock `config-$(uname -r)' from /boot after installing Ubuntu 14.04.4 from DVD media, pulled kernel 4.5 vanilla from kernel.org, then used `make oldconfig' to generate the config file.
LPM (Link power management) is only enabled on Intel Hosts in the xhci driver. usbmon show there is a constant loop of enabling and disabling LPM. Parsing the usbmon output it boils down to: Set SEL 0a0a0002 0002 (usb_enabme_lpm) Set port feature U1 timeout 50us Set device feaurte U1 Enable Set SEL 0a0a0002 0002 Set port feature U2 timeout ~10ms Set device feaurte U2 Enable Set port feature U1 timeout 0us (usb_disable_lpm) Clear device feaurte U1 Enable Set port feature U2 timeout 0us Clear device feaurte U2 Enable -back to beginning setting SEL (system exit latency) sometimes there's a 24 byte bulk out transfer followed by a 16 byte bulk in transfer in between the enable_lpm() and usb_disable_lpm() How we end up in a loop, or sustain it is not yet clear. This code is in usb core, could you add usb core debugging with: echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control and show the dmesg output. and skip the xhci debugging, otherwise it will be very hard to read.
enabling and disabling LPM could be caused by rapid runtime suspend and resume. I'd start investigating if the device is runtime suspending and runtime resuming at a fast rate. Codepath for a usb device runtime suspend looks like this: usb_device_pm_ops = { .runtime_suspend = usb_runtime_suspend, usb_runtime_suspend() usb_suspend_both() usb_suspend_device() udriver = &usb_generic_driver; // usb_generic_driver.suspend = generic_suspend() udriver->suspend(udev, msg); generic_suspend() usb_port_suspend() usb_unlocked_disable_lpm() Meaning a runtime suspend / runtime resume would trigger the LPM messages. If that is the issue then adding pm_runtime_get_xxx() will increase the device usage count before you know you are going to use a device. It prevents runtime suspend. Then once you know the device is going to be idle, decrease the referece cound with a pm_runtime_put(). Optionally try setting a pm_runtime_set_autosuspend_delay() value to prevent it from suspending too quickly.
Alan Stern has provided a patch for submission that resolves this issue. -------- Forwarded Message -------- Subject: [PATCH] USB: leave LPM alone if possible when binding/unbinding interface drivers Date: Fri, 29 Apr 2016 15:25:17 -0400 (EDT) From: Alan Stern <stern@rowland.harvard.edu> To: Greg KH <greg@kroah.com>, Mathias Nyman <mathias.nyman@intel.com> CC: Matthew Giassa <matthew@giassa.net>, Linux USB Developer Mailing List <linux-usb@vger.kernel.org> When a USB driver is bound to an interface (either through probing or by claiming it) or is unbound from an interface, the USB core always disables Link Power Management during the transition and then re-enables it afterward. The reason is because the driver might want to prevent hub-initiated link power transitions, in which case the HCD would have to recalculate the various LPM parameters. This recalculation takes place when LPM is re-enabled and the new parameters are sent to the device and its parent hub. However, if the driver does not want to prevent hub-initiated link power transitions then none of this work is necessary. The parameters don't need to be recalculated, and LPM doesn't need to be disabled and re-enabled. It turns out that disabling and enabling LPM can be time-consuming, enough so that it interferes with user programs that want to claim and release interfaces rapidly via usbfs. Since the usbfs kernel driver doesn't set the disable_hub_initiated_lpm flag, we can speed things up and get the user programs to work by leaving LPM alone whenever the flag isn't set. And while we're improving the way disable_hub_initiated_lpm gets used, let's also fix its kerneldoc. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Matthew Giassa <matthew@giassa.net> CC: Mathias Nyman <mathias.nyman@intel.com> CC: <stable@vger.kernel.org> --- Mathias, can you check that this is right? If the driver's disable_hub_initiated_lpm flag isn't set then binding or unbinding it shouldn't require any changes to the LPM parameters. The flag gets used in only one place, in xhci_calculate_lpm_timeout(), and if it isn't set then then result would be the same as if the driver weren't bound. Thanks. [as1800] drivers/usb/core/driver.c | 40 +++++++++++++++++++++++----------------- include/linux/usb.h | 2 +- 2 files changed, 24 insertions(+), 18 deletions(-) Index: usb-4.x/include/linux/usb.h =================================================================== --- usb-4.x.orig/include/linux/usb.h +++ usb-4.x/include/linux/usb.h @@ -1069,7 +1069,7 @@ struct usbdrv_wrap { * for interfaces bound to this driver. * @soft_unbind: if set to 1, the USB core will not kill URBs and disable * endpoints before calling the driver's disconnect method. - * @disable_hub_initiated_lpm: if set to 0, the USB core will not allow hubs + * @disable_hub_initiated_lpm: if set to 1, the USB core will not allow hubs * to initiate lower power link state transitions when an idle timeout * occurs. Device-initiated USB 3.0 link PM will still be allowed. * Index: usb-4.x/drivers/usb/core/driver.c =================================================================== --- usb-4.x.orig/drivers/usb/core/driver.c +++ usb-4.x/drivers/usb/core/driver.c @@ -284,7 +284,7 @@ static int usb_probe_interface(struct de struct usb_device *udev = interface_to_usbdev(intf); const struct usb_device_id *id; int error = -ENODEV; - int lpm_disable_error; + int lpm_disable_error = -ENODEV; dev_dbg(dev, "%s\n", __func__); @@ -336,12 +336,14 @@ static int usb_probe_interface(struct de * setting during probe, that should also be fine. usb_set_interface() * will attempt to disable LPM, and fail if it can't disable it. */ - lpm_disable_error = usb_unlocked_disable_lpm(udev); - if (lpm_disable_error && driver->disable_hub_initiated_lpm) { - dev_err(&intf->dev, "%s Failed to disable LPM for driver %s\n.", - __func__, driver->name); - error = lpm_disable_error; - goto err; + if (driver->disable_hub_initiated_lpm) { + lpm_disable_error = usb_unlocked_disable_lpm(udev); + if (lpm_disable_error) { + dev_err(&intf->dev, "%s Failed to disable LPM for driver %s\n.", + __func__, driver->name); + error = lpm_disable_error; + goto err; + } } /* Carry out a deferred switch to altsetting 0 */ @@ -391,7 +393,8 @@ static int usb_unbind_interface(struct d struct usb_interface *intf = to_usb_interface(dev); struct usb_host_endpoint *ep, **eps = NULL; struct usb_device *udev; - int i, j, error, r, lpm_disable_error; + int i, j, error, r; + int lpm_disable_error = -ENODEV; intf->condition = USB_INTERFACE_UNBINDING; @@ -399,12 +402,13 @@ static int usb_unbind_interface(struct d udev = interface_to_usbdev(intf); error = usb_autoresume_device(udev); - /* Hub-initiated LPM policy may change, so attempt to disable LPM until + /* If hub-initiated LPM policy may change, attempt to disable LPM until * the driver is unbound. If LPM isn't disabled, that's fine because it * wouldn't be enabled unless all the bound interfaces supported * hub-initiated LPM. */ - lpm_disable_error = usb_unlocked_disable_lpm(udev); + if (driver->disable_hub_initiated_lpm) + lpm_disable_error = usb_unlocked_disable_lpm(udev); /* * Terminate all URBs for this interface unless the driver @@ -505,7 +509,7 @@ int usb_driver_claim_interface(struct us struct device *dev; struct usb_device *udev; int retval = 0; - int lpm_disable_error; + int lpm_disable_error = -ENODEV; if (!iface) return -ENODEV; @@ -526,12 +530,14 @@ int usb_driver_claim_interface(struct us iface->condition = USB_INTERFACE_BOUND; - /* Disable LPM until this driver is bound. */ - lpm_disable_error = usb_unlocked_disable_lpm(udev); - if (lpm_disable_error && driver->disable_hub_initiated_lpm) { - dev_err(&iface->dev, "%s Failed to disable LPM for driver %s\n.", - __func__, driver->name); - return -ENOMEM; + /* See the comment about disabling LPM in usb_probe_interface(). */ + if (driver->disable_hub_initiated_lpm) { + lpm_disable_error = usb_unlocked_disable_lpm(udev); + if (lpm_disable_error) { + dev_err(&iface->dev, "%s Failed to disable LPM for driver %s\n.", + __func__, driver->name); + return -ENOMEM; + } } /* Claimed interfaces are initially inactive (suspended) and