Bug 22562 - Regression in 2.6.37-rc1 - logs spammed with "unable to enumerate USB port" - bisected to commit 3df7169e
Summary: Regression in 2.6.37-rc1 - logs spammed with "unable to enumerate USB port" -...
Status: CLOSED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Larry Finger
URL:
Keywords:
Depends on:
Blocks: 21782
  Show dependency tree
 
Reported: 2010-11-09 20:37 UTC by Maciej Rutecki
Modified: 2011-02-02 23:05 UTC (History)
5 users (show)

See Also:
Kernel Version: 2.6.37-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
USB debug output without enumerate messages (10.37 KB, text/plain)
2010-12-20 04:58 UTC, Larry Finger
Details
USB debug output with unable to enumerate messaged (228.79 KB, text/plain)
2010-12-20 05:00 UTC, Larry Finger
Details
dmesg output for USB debug without the enumerate messages (42.79 KB, text/plain)
2010-12-31 22:40 UTC, Larry Finger
Details
USB debug output in dmesg with the failure to enumerate messages (230.92 KB, text/plain)
2010-12-31 22:42 UTC, Larry Finger
Details
USB DEBUG dmesg output with no ehci-hcd (282.50 KB, text/plain)
2011-01-01 05:28 UTC, Larry Finger
Details
dmesg with USB debug - no ehci and ohci reloaded (341.00 KB, text/plain)
2011-01-01 19:15 UTC, Larry Finger
Details
Proposed patch to limit spamming of the logs (1.13 KB, patch)
2011-01-01 19:20 UTC, Larry Finger
Details | Diff
dmesg output with no ehci configured, controller reset off, ohci reloaded (337.96 KB, text/plain)
2011-01-02 04:41 UTC, Larry Finger
Details
Fix initialization order of ehci->periodic_size (1.76 KB, patch)
2011-01-05 17:12 UTC, Alan Stern
Details | Diff

Description Maciej Rutecki 2010-11-09 20:37:51 UTC
Subject    : Regression in 2.6.37-rc1 - logs spammed with "unable to enumerate USB port" - bisected to commit 3df7169e
Submitter  : Larry Finger <Larry.Finger@lwfinger.net>
Date       : 2010-11-02 22:32
Message-ID : 4CD09166.4060202@lwfinger.net
References : http://marc.info/?l=linux-kernel&m=128873713207906&w=2

This entry is being used for tracking a regression from 2.6.36. Please don't
close it until the problem is fixed in the mainline.

Caused by:

commit 3df7169e73fc1d71a39cffeacc969f6840cdf52b
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Fri Sep 10 16:37:05 2010 -0400

    OHCI: work around for nVidia shutdown problem

    Reported-by: Pali Rohár <pali.rohar@gmail.com>
    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    CC: David Brownell <david-b@pacbell.net>
    Tested-by: Pali Rohár <pali.rohar@gmail.com>
    CC: stable <stable@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

First-Bad-Commit : 3df7169e73fc1d71a39cffeacc969f6840cdf52b
Comment 1 Rafael J. Wysocki 2010-11-19 20:33:32 UTC
On Friday, November 19, 2010, Larry Finger wrote:
> On 11/18/2010 05:42 PM, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.36.  Please verify if it still should be listed and let the
> tracking team
> > know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=22562
> > Subject             : Regression in 2.6.37-rc1 - logs spammed with "unable
> to enumerate USB port" - bisected to commit 3df7169e
> > Submitter   : Larry Finger <Larry.Finger@lwfinger.net>
> > Date                : 2010-11-02 22:32 (17 days old)
> 
> This bug has not yet been fixed.
Comment 2 Dan Carpenter 2010-11-25 12:32:16 UTC
Larry, did you ping Alan and Greg about this?  They've probably forgotten about this.
Comment 3 Larry Finger 2010-11-25 15:59:56 UTC
No, it is not forgotten. Alan sent me mail about it a few days ago.

The laptop with the problem is under repair for a faulty motherboard, thus no testing ATM. As the failure was with the SATA interface, I don't think that was the cause of the messages, but will check when I get it back.
Comment 4 Rafael J. Wysocki 2010-12-03 21:29:20 UTC
On Friday, December 03, 2010, Larry Finger wrote:
> On 12/02/2010 05:41 PM, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.36.  Please verify if it still should be listed and let the
> tracking team
> > know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=22562
> > Subject             : Regression in 2.6.37-rc1 - logs spammed with "unable
> to enumerate USB port" - bisected to commit 3df7169e
> > Submitter   : Larry Finger <Larry.Finger@lwfinger.net>
> > Date                : 2010-11-02 22:32 (31 days old)
> 
> Still present in 2.6.37-rc4. The box with the problem has been repaired and I
> can now debug the problem again. Any hints would be appreciated.
Comment 5 Rafael J. Wysocki 2010-12-19 18:54:57 UTC
On Sunday, December 19, 2010, Larry Finger wrote:
> On 12/19/2010 10:34 AM, Greg KH wrote:
> > On Sun, Dec 19, 2010 at 10:07:51AM -0600, Larry Finger wrote:
> >>
> >> Yes, this regression is still present in 2.6.37-rc6. I'm trying to debug
> this
> >> with very little success.
> > 
> > To be fair, you seem to have the only machine with this problem :(
> 
> I know, which is why I feel that I need to debug it.
Comment 6 Larry Finger 2010-12-20 04:58:19 UTC
Created attachment 40962 [details]
USB debug output without enumerate messages

The enumerate messages are turned off by commenting out the statement "writel(control & OHCI_CTRL_MASK, base + OHCI_CONTROL);" in quirk_usb_handoff_ohci().
Comment 7 Larry Finger 2010-12-20 05:00:09 UTC
Created attachment 40972 [details]
USB debug output with unable to enumerate messaged

In this case, the commit 3df7169e is left intact.
Comment 8 Larry Finger 2010-12-31 22:40:40 UTC
Created attachment 42052 [details]
dmesg output for USB debug without the enumerate messages

This output results when the reset_controller in drivers/usb/host/pci-quirks.c is commented out. The failure to enumerate messages do not appear.
Comment 9 Larry Finger 2010-12-31 22:42:11 UTC
Created attachment 42062 [details]
USB debug output in dmesg with the failure to enumerate messages
Comment 10 Larry Finger 2010-12-31 22:47:21 UTC
One section of the difference between the two preceding attachments is shown below. In the case without the enumerate messages (-), hub 2-0:1.0 is reported with chg of 0. For the other case, chg is 0x0020. Is this significant? What does this bit mean, and where is it set?

@@ -720,7 +720,6 @@
  ohci_hcd 0000:00:02.0: irq 17, io mem 0xfc386000
  ehci_hcd 0000:00:02.1: GetStatus port:4 status 001803 0  ACK POWER sig=j CSC CONNECT
  hub 1-0:1.0: port 4: status 0501 change 0001
- hub 2-0:1.0: state 7 ports 5 chg 0000 evt 0000
  ohci_hcd 0000:00:02.0: OHCI controller state
  ohci_hcd 0000:00:02.0: OHCI 1.0, NO legacy support registers
  ohci_hcd 0000:00:02.0: control 0x683 RWE RWC HCFS=operational CBSR=3
@@ -743,6 +742,8 @@
  usb usb3: Product: OHCI Host Controller
  usb usb3: Manufacturer: Linux 2.6.37-rc8-wl+ ohci_hcd
  usb usb3: SerialNumber: 0000:00:02.0
+ ehci_hcd 0000:00:04.1: GetStatus port:5 status 001c03 0  ACK POWER sig=? CSC CONNECT
+ hub 2-0:1.0: port 5: status 0501 change 0001
  usb usb3: usb_probe_device
  usb usb3: configuration #1 chosen from 1 choice
  usb usb3: adding 3-0:1.0 (config #1, interface 0)
@@ -835,45 +836,219 @@
  usb 1-4: adding 1-4:1.1 (config #1, interface 1)
  drivers/usb/core/inode.c: creating file '002'
  usb: return 0 from usb_new_device
+ hub 2-0:1.0: state 7 ports 5 chg 0020 evt 0000
+ hub 2-0:1.0: port 5, status 0501, change 0000, 480 Mb/s
+ ehci_hcd 0000:00:04.1: port 5 full speed --> companion
+ ehci_hcd 0000:00:04.1: GetStatus port:5 status 003001 0  ACK POWER OWNER sig=se0 CONNECT
+ hub 2-0:1.0: port 5 not reset yet, waiting 50ms
+ ehci_hcd 0000:00:04.1: GetStatus port:5 status 001002 0  ACK POWER sig=se0 CSC
+ hub 2-0:1.0: unable to enumerate USB device on port 5
  hub 3-0:1.0: state 7 ports 5 chg 0000 evt 0000
  hub 4-0:1.0: state 7 ports 5 chg 0000 evt 0000
+ hub 2-0:1.0: state 7 ports 5 chg 0000 evt 0020
+ hub 2-0:1.0: state 7 ports 5 chg 0000 evt 0020
+ ehci_hcd 0000:00:04.1: GetStatus port:5 status 001c03 0  ACK POWER sig=? CSC CONNECT
+ hub 2-0:1.0: port 5, status 0501, change 0001, 480 Mb/s
+ hub 2-0:1.0: debounce: port 5: total 100ms stable 100ms status 0x501
Comment 11 Alan Stern 2011-01-01 03:22:10 UTC
Not only does the log in comment #8 not have any enumeration errors -- it doesn't have any full-speed enumerations at all!  Only the high-speed HP webcam.  It looks like you didn't plug in the full/low-speed test device while testing that kernel.

A better approach might be to unload ehci-hcd before plugging in the test device.  That will at least remove one layer of confusion.
Comment 12 Larry Finger 2011-01-01 05:26:55 UTC
In the kernel that generated #8, I had commented out the statement that says

       writel(control & OHCI_CTRL_MASK, base + OHCI_CONTROL);

in quirk_usb_handoff_ohci(). When I do that, I do not get any enumeration errors. That puts my system into the pre commit 3df7169e situation.

The 4 hubs reported are all internal to my computer. I need not plug any devices into any of my ports to get the enumeration failed messages. In fact, the only device on any of the hubs is that webcam that I am not using. None of my tests have ever involved any other device being plugged in.

I tried to generate an initrd without ehci-hcd and ohci-hcd, but that effort failed. I do not know where the mkinitrd script is getting the info that causes those modules to be included. To get around this, I generated a kernel with echi-hcd excluded from the configuration. I also removed the restriction on the number of enumeration messages that are logged. That log will be in the next posting.
Comment 13 Larry Finger 2011-01-01 05:28:45 UTC
Created attachment 42072 [details]
USB DEBUG dmesg output with no ehci-hcd

As stated, ehci-hcd is not in this configuration.
Comment 14 Alan Stern 2011-01-01 17:58:24 UTC
To tell the truth, I suspect you have a hardware problem.  It might be something wrong with a USB controller, or there might be a device hardwired into the port that's not behaving correctly.  On the other hand, I have no idea why skipping the OHCI controller reset would prevent the problem from showing up.

It certainly is odd that the same thing doesn't happen on any of the eight other unoccupied ports.  And nobody else has reported the same kind of symptoms; see http://fr.pastebin.ca/2031877 for an example where the existing code works correctly.

What happens if you boot a kernel with your patch applied (and let's say without ehci-hcd configured), but then unload and reload ohci-hcd after the system is running?
Comment 15 Larry Finger 2011-01-01 19:15:53 UTC
Created attachment 42102 [details]
dmesg with USB debug - no ehci and ohci reloaded

I agree that it looks like a hardware problem with my box, but as it is a laptop, it would be hard to discover where the problem might occur.

This attachment is the dmesg output with ehci-hcd not configured and ohci-hcd loaded on boot, then unloaded. Nothing changed that I can see. I limit the "unable to enumerate" messages to 9, and print a message when the maximum number is exceeded. I also dumped the reason for hitting that branch. In all cases, hub->hdev->parent is NULL. The failure is because the call to the routine specified by hcd->driver->port_handed_over returns 0 (when ehci is loaded), or the routine is not specified when ehci-hcd is not loaded (expected).

The only other line I spotted in the log is a line

hub 2-0:1.0: port 5, status 0301, change 0000, 1.5 Mb/s

This is the only port such specified.
Comment 16 Larry Finger 2011-01-01 19:20:20 UTC
Created attachment 42112 [details]
Proposed patch to limit spamming of the logs

If it turns out that the problem is generic to all boxes with the same motherboard as mine, would you entertain a patch like the one attached?

I don't mind running with a dirty kernel and could apply this patch to every kernel I build, but anyone else getting these messages might not have the luxury of making this change.
Comment 17 Alan Stern 2011-01-02 04:05:23 UTC
That's not what I meant -- I meant for you to try a kernel with your patch from comment #6 and no ehci-hcd.  You don't have to unload and reload usbcore too; ohci-hcd alone will be good enough.

I'm not sure whether limiting the "unable to enumerate" messages is a good idea.  Yes, it will help in situations like yours where there's nothing else to be done, but it won't fix anything -- and all the useless USB activity will prevent your computer from entering various low-power states the way it should.
Comment 18 Larry Finger 2011-01-02 04:41:44 UTC
Created attachment 42122 [details]
dmesg output with no ehci configured, controller reset off, ohci reloaded

Sorry, I misunderstood what patch you wanted.

With ehci-hcd not configured, the "unable to enumerate" messages were logged even though the reset was disabled. I used modprobe to remove ohci-hcd at 86 seconds. This also unloaded usbcore. When ohci-hcd was reloaded, the enumerate messages were repeated.

At 135 seconds, I used rmmod to unload ohci-hcd, and then reloaded it with modprobe. This time the messages did not occur.

If you want any other logs, let me know.
Comment 19 Alan Stern 2011-01-02 15:20:11 UTC
So even your suggested patch from comment #6 doesn't truly fix the problem.

Have you tried to see if there is anything attached to the troublesome port?  I'm not sure how you would check; maybe there's an additional USB device on that computer which should be working but isn't.  Even something as simple as a transient short circuit could cause this sort of effect.

Failing anything else, early on during the boot sequence you could unbind the 0000:00:04.0 device from ohci-hcd and 0000:00:04.1 from ehci-hcd.  That would stop all the activity and log spamming, but it would mean you couldn't use either of those buses.
Comment 20 Larry Finger 2011-01-03 00:12:50 UTC
From what I can tell, those hubs are not connected to anything. Everything that I know about is functioning, and my external ports all work when I unbind the drivers using 

echo "0000:00:04.0" > /sys/bus/pci/drivers/ohci_hcd/unbind
echo "0000:00:04.1" > /sys/bus/pci/drivers/ehci_hcd/unbind

How is the best way to get these commands executed early in the boot process, or is there some other way to get the drivers unbound? I tried to write a rules file for udev, but it is not right.
Comment 21 Alan Stern 2011-01-03 02:51:19 UTC
udev won't help; it doesn't control driver binding.

There's no way to prevent the drivers from binding to those controllers, so all you can do is unbind them afterward.  Perhaps a custom shell script can be added to your initrd -- you'd have to check with your distro to find out how.  If nothing else, you can add a couple of lines to /etc/rc.d/rc.local (or the equivalent file on your system).
Comment 22 Larry Finger 2011-01-03 03:33:13 UTC
My distro is openSUSE 11.3. The file is /etc/rc.d/boot.local. After adding the commands to that file, I get 45 of the "unable to enumerate" messages before the commands are executed.

Unfortunately, the following WARNING showed up:

ehci_hcd 0000:00:04.1: remove, state 1
usb usb2: USB disconnect, address 1
hub 2-0:1.0: hub_port_status failed (err = -19)
hub 2-0:1.0: connect-debounce failed, port 5 disabled
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:812 check_unmap+0x16c/0x7e0()
Hardware name: HP Pavilion dv2700 Notebook PC
ehci_hcd 0000:00:04.1: DMA-API: device driver frees DMA memory with different size [device address=0x00000000b392c000] [map size=4096 bytes] [
unmap size=2048 bytes]
Modules linked in: ext4 jbd2 crc16 loop dm_mod ide_cd_mod cdrom snd_hda_codec_conexant arc4 ide_pci_generic ecb b43 rng_core mac80211 cfg80211
 snd_hda_intel rfkill snd_hda_codec ssb snd_pcm snd_timer mmc_core amd74xx ide_core snd i2c_nforce2 k8temp soundcore snd_page_alloc pcmcia serio_raw hwmon jo
ydev i2c_core sg pcmcia_core forcedeth battery ac button ext3 jbd mbcache ohci_hcd sd_mod ehci_hcd usbcore fan processor ahci libahci libata scsi_mod thermal
Pid: 2738, comm: sh Not tainted 2.6.37-rc8-wl+ #425
Call Trace:
 [<ffffffff81047b8a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff8107fbe4>] ? debug_check_no_locks_freed+0x94/0x160
 [<ffffffff81047c61>] warn_slowpath_fmt+0x41/0x50
 [<ffffffff811ca54c>] check_unmap+0x16c/0x7e0
 [<ffffffff81111587>] ? kfree+0xd7/0x160
 [<ffffffff811cadd3>] debug_dma_free_coherent+0x73/0x90
 [<ffffffffa01af471>] ehci_mem_cleanup+0x131/0x1e0 [ehci_hcd]
 [<ffffffffa01af5c1>] ehci_stop+0xa1/0xd0 [ehci_hcd]
 [<ffffffffa010483d>] usb_remove_hcd+0xcd/0x120 [usbcore]
 [<ffffffffa0113bd4>] usb_hcd_pci_remove+0x44/0xc0 [usbcore]
 [<ffffffff811d36c2>] pci_device_remove+0x32/0x60
 [<ffffffff8125fd41>] __device_release_driver+0x61/0xc0
 [<ffffffff8125fdca>] device_release_driver+0x2a/0x40
 [<ffffffff8125ef2b>] driver_unbind+0x9b/0xc0
 [<ffffffff8125e797>] drv_attr_store+0x27/0x30
 [<ffffffff81175b21>] sysfs_write_file+0xe1/0x160
 [<ffffffff81117606>] vfs_write+0xc6/0x170
 [<ffffffff811178ec>] sys_write+0x4c/0x80
 [<ffffffff81002d3b>] system_call_fastpath+0x16/0x1b
---[ end trace c4dfc9893d936182 ]---
Mapped at:
 [<ffffffff811caf0b>] debug_dma_alloc_coherent+0x3b/0x90
 [<ffffffffa01afdf2>] ehci_pci_setup+0x2b2/0xab0 [ehci_hcd]
 [<ffffffffa010559d>] usb_add_hcd+0x19d/0x5e0 [usbcore]
 [<ffffffffa0113e05>] usb_hcd_pci_probe+0x1b5/0x330 [usbcore]
 [<ffffffff811d2762>] local_pci_probe+0x12/0x20

It does not matter whether I unlink the ohci-hcd or ehci-hcd device first.
Comment 23 Alan Stern 2011-01-05 17:12:31 UTC
Created attachment 42402 [details]
Fix initialization order of ehci->periodic_size

Ooh, that's bad.  This patch should fix the problem.  The value of ehci->periodic_size was getting changed after it was used for computing the size of a DMA-coherent region in ehci_mem_init().
Comment 24 Larry Finger 2011-01-06 02:57:16 UTC
Good patch. You can add a Tested-by: Larry Finger <Larry.Finger@lwfinger.net> if you wish.
Comment 25 Larry Finger 2011-01-06 02:58:50 UTC
Hardware error on Larry Finger's box. Workaround found and separate bug that the W/A exposed has a code fix.

Note You need to log in before you can comment on or make changes to this bug.