Bug 22562
Summary: | Regression in 2.6.37-rc1 - logs spammed with "unable to enumerate USB port" - bisected to commit 3df7169e | ||
---|---|---|---|
Product: | Drivers | Reporter: | Maciej Rutecki (maciej.rutecki) |
Component: | USB | Assignee: | Larry Finger (Larry.Finger) |
Status: | CLOSED INVALID | ||
Severity: | normal | CC: | error27, Larry.Finger, maciej.rutecki, rjw, stern |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.37-rc1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 21782 | ||
Attachments: |
USB debug output without enumerate messages
USB debug output with unable to enumerate messaged dmesg output for USB debug without the enumerate messages USB debug output in dmesg with the failure to enumerate messages USB DEBUG dmesg output with no ehci-hcd dmesg with USB debug - no ehci and ohci reloaded Proposed patch to limit spamming of the logs dmesg output with no ehci configured, controller reset off, ohci reloaded Fix initialization order of ehci->periodic_size |
Description
Maciej Rutecki
2010-11-09 20:37:51 UTC
On Friday, November 19, 2010, Larry Finger wrote:
> On 11/18/2010 05:42 PM, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.36. Please verify if it still should be listed and let the
> tracking team
> > know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=22562
> > Subject : Regression in 2.6.37-rc1 - logs spammed with "unable
> to enumerate USB port" - bisected to commit 3df7169e
> > Submitter : Larry Finger <Larry.Finger@lwfinger.net>
> > Date : 2010-11-02 22:32 (17 days old)
>
> This bug has not yet been fixed.
Larry, did you ping Alan and Greg about this? They've probably forgotten about this. No, it is not forgotten. Alan sent me mail about it a few days ago. The laptop with the problem is under repair for a faulty motherboard, thus no testing ATM. As the failure was with the SATA interface, I don't think that was the cause of the messages, but will check when I get it back. On Friday, December 03, 2010, Larry Finger wrote:
> On 12/02/2010 05:41 PM, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.36. Please verify if it still should be listed and let the
> tracking team
> > know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=22562
> > Subject : Regression in 2.6.37-rc1 - logs spammed with "unable
> to enumerate USB port" - bisected to commit 3df7169e
> > Submitter : Larry Finger <Larry.Finger@lwfinger.net>
> > Date : 2010-11-02 22:32 (31 days old)
>
> Still present in 2.6.37-rc4. The box with the problem has been repaired and I
> can now debug the problem again. Any hints would be appreciated.
On Sunday, December 19, 2010, Larry Finger wrote:
> On 12/19/2010 10:34 AM, Greg KH wrote:
> > On Sun, Dec 19, 2010 at 10:07:51AM -0600, Larry Finger wrote:
> >>
> >> Yes, this regression is still present in 2.6.37-rc6. I'm trying to debug
> this
> >> with very little success.
> >
> > To be fair, you seem to have the only machine with this problem :(
>
> I know, which is why I feel that I need to debug it.
Created attachment 40962 [details]
USB debug output without enumerate messages
The enumerate messages are turned off by commenting out the statement "writel(control & OHCI_CTRL_MASK, base + OHCI_CONTROL);" in quirk_usb_handoff_ohci().
Created attachment 40972 [details]
USB debug output with unable to enumerate messaged
In this case, the commit 3df7169e is left intact.
Created attachment 42052 [details]
dmesg output for USB debug without the enumerate messages
This output results when the reset_controller in drivers/usb/host/pci-quirks.c is commented out. The failure to enumerate messages do not appear.
Created attachment 42062 [details]
USB debug output in dmesg with the failure to enumerate messages
One section of the difference between the two preceding attachments is shown below. In the case without the enumerate messages (-), hub 2-0:1.0 is reported with chg of 0. For the other case, chg is 0x0020. Is this significant? What does this bit mean, and where is it set? @@ -720,7 +720,6 @@ ohci_hcd 0000:00:02.0: irq 17, io mem 0xfc386000 ehci_hcd 0000:00:02.1: GetStatus port:4 status 001803 0 ACK POWER sig=j CSC CONNECT hub 1-0:1.0: port 4: status 0501 change 0001 - hub 2-0:1.0: state 7 ports 5 chg 0000 evt 0000 ohci_hcd 0000:00:02.0: OHCI controller state ohci_hcd 0000:00:02.0: OHCI 1.0, NO legacy support registers ohci_hcd 0000:00:02.0: control 0x683 RWE RWC HCFS=operational CBSR=3 @@ -743,6 +742,8 @@ usb usb3: Product: OHCI Host Controller usb usb3: Manufacturer: Linux 2.6.37-rc8-wl+ ohci_hcd usb usb3: SerialNumber: 0000:00:02.0 + ehci_hcd 0000:00:04.1: GetStatus port:5 status 001c03 0 ACK POWER sig=? CSC CONNECT + hub 2-0:1.0: port 5: status 0501 change 0001 usb usb3: usb_probe_device usb usb3: configuration #1 chosen from 1 choice usb usb3: adding 3-0:1.0 (config #1, interface 0) @@ -835,45 +836,219 @@ usb 1-4: adding 1-4:1.1 (config #1, interface 1) drivers/usb/core/inode.c: creating file '002' usb: return 0 from usb_new_device + hub 2-0:1.0: state 7 ports 5 chg 0020 evt 0000 + hub 2-0:1.0: port 5, status 0501, change 0000, 480 Mb/s + ehci_hcd 0000:00:04.1: port 5 full speed --> companion + ehci_hcd 0000:00:04.1: GetStatus port:5 status 003001 0 ACK POWER OWNER sig=se0 CONNECT + hub 2-0:1.0: port 5 not reset yet, waiting 50ms + ehci_hcd 0000:00:04.1: GetStatus port:5 status 001002 0 ACK POWER sig=se0 CSC + hub 2-0:1.0: unable to enumerate USB device on port 5 hub 3-0:1.0: state 7 ports 5 chg 0000 evt 0000 hub 4-0:1.0: state 7 ports 5 chg 0000 evt 0000 + hub 2-0:1.0: state 7 ports 5 chg 0000 evt 0020 + hub 2-0:1.0: state 7 ports 5 chg 0000 evt 0020 + ehci_hcd 0000:00:04.1: GetStatus port:5 status 001c03 0 ACK POWER sig=? CSC CONNECT + hub 2-0:1.0: port 5, status 0501, change 0001, 480 Mb/s + hub 2-0:1.0: debounce: port 5: total 100ms stable 100ms status 0x501 Not only does the log in comment #8 not have any enumeration errors -- it doesn't have any full-speed enumerations at all! Only the high-speed HP webcam. It looks like you didn't plug in the full/low-speed test device while testing that kernel. A better approach might be to unload ehci-hcd before plugging in the test device. That will at least remove one layer of confusion. In the kernel that generated #8, I had commented out the statement that says writel(control & OHCI_CTRL_MASK, base + OHCI_CONTROL); in quirk_usb_handoff_ohci(). When I do that, I do not get any enumeration errors. That puts my system into the pre commit 3df7169e situation. The 4 hubs reported are all internal to my computer. I need not plug any devices into any of my ports to get the enumeration failed messages. In fact, the only device on any of the hubs is that webcam that I am not using. None of my tests have ever involved any other device being plugged in. I tried to generate an initrd without ehci-hcd and ohci-hcd, but that effort failed. I do not know where the mkinitrd script is getting the info that causes those modules to be included. To get around this, I generated a kernel with echi-hcd excluded from the configuration. I also removed the restriction on the number of enumeration messages that are logged. That log will be in the next posting. Created attachment 42072 [details]
USB DEBUG dmesg output with no ehci-hcd
As stated, ehci-hcd is not in this configuration.
To tell the truth, I suspect you have a hardware problem. It might be something wrong with a USB controller, or there might be a device hardwired into the port that's not behaving correctly. On the other hand, I have no idea why skipping the OHCI controller reset would prevent the problem from showing up. It certainly is odd that the same thing doesn't happen on any of the eight other unoccupied ports. And nobody else has reported the same kind of symptoms; see http://fr.pastebin.ca/2031877 for an example where the existing code works correctly. What happens if you boot a kernel with your patch applied (and let's say without ehci-hcd configured), but then unload and reload ohci-hcd after the system is running? Created attachment 42102 [details]
dmesg with USB debug - no ehci and ohci reloaded
I agree that it looks like a hardware problem with my box, but as it is a laptop, it would be hard to discover where the problem might occur.
This attachment is the dmesg output with ehci-hcd not configured and ohci-hcd loaded on boot, then unloaded. Nothing changed that I can see. I limit the "unable to enumerate" messages to 9, and print a message when the maximum number is exceeded. I also dumped the reason for hitting that branch. In all cases, hub->hdev->parent is NULL. The failure is because the call to the routine specified by hcd->driver->port_handed_over returns 0 (when ehci is loaded), or the routine is not specified when ehci-hcd is not loaded (expected).
The only other line I spotted in the log is a line
hub 2-0:1.0: port 5, status 0301, change 0000, 1.5 Mb/s
This is the only port such specified.
Created attachment 42112 [details]
Proposed patch to limit spamming of the logs
If it turns out that the problem is generic to all boxes with the same motherboard as mine, would you entertain a patch like the one attached?
I don't mind running with a dirty kernel and could apply this patch to every kernel I build, but anyone else getting these messages might not have the luxury of making this change.
That's not what I meant -- I meant for you to try a kernel with your patch from comment #6 and no ehci-hcd. You don't have to unload and reload usbcore too; ohci-hcd alone will be good enough. I'm not sure whether limiting the "unable to enumerate" messages is a good idea. Yes, it will help in situations like yours where there's nothing else to be done, but it won't fix anything -- and all the useless USB activity will prevent your computer from entering various low-power states the way it should. Created attachment 42122 [details]
dmesg output with no ehci configured, controller reset off, ohci reloaded
Sorry, I misunderstood what patch you wanted.
With ehci-hcd not configured, the "unable to enumerate" messages were logged even though the reset was disabled. I used modprobe to remove ohci-hcd at 86 seconds. This also unloaded usbcore. When ohci-hcd was reloaded, the enumerate messages were repeated.
At 135 seconds, I used rmmod to unload ohci-hcd, and then reloaded it with modprobe. This time the messages did not occur.
If you want any other logs, let me know.
So even your suggested patch from comment #6 doesn't truly fix the problem. Have you tried to see if there is anything attached to the troublesome port? I'm not sure how you would check; maybe there's an additional USB device on that computer which should be working but isn't. Even something as simple as a transient short circuit could cause this sort of effect. Failing anything else, early on during the boot sequence you could unbind the 0000:00:04.0 device from ohci-hcd and 0000:00:04.1 from ehci-hcd. That would stop all the activity and log spamming, but it would mean you couldn't use either of those buses. From what I can tell, those hubs are not connected to anything. Everything that I know about is functioning, and my external ports all work when I unbind the drivers using echo "0000:00:04.0" > /sys/bus/pci/drivers/ohci_hcd/unbind echo "0000:00:04.1" > /sys/bus/pci/drivers/ehci_hcd/unbind How is the best way to get these commands executed early in the boot process, or is there some other way to get the drivers unbound? I tried to write a rules file for udev, but it is not right. udev won't help; it doesn't control driver binding. There's no way to prevent the drivers from binding to those controllers, so all you can do is unbind them afterward. Perhaps a custom shell script can be added to your initrd -- you'd have to check with your distro to find out how. If nothing else, you can add a couple of lines to /etc/rc.d/rc.local (or the equivalent file on your system). My distro is openSUSE 11.3. The file is /etc/rc.d/boot.local. After adding the commands to that file, I get 45 of the "unable to enumerate" messages before the commands are executed. Unfortunately, the following WARNING showed up: ehci_hcd 0000:00:04.1: remove, state 1 usb usb2: USB disconnect, address 1 hub 2-0:1.0: hub_port_status failed (err = -19) hub 2-0:1.0: connect-debounce failed, port 5 disabled ------------[ cut here ]------------ WARNING: at lib/dma-debug.c:812 check_unmap+0x16c/0x7e0() Hardware name: HP Pavilion dv2700 Notebook PC ehci_hcd 0000:00:04.1: DMA-API: device driver frees DMA memory with different size [device address=0x00000000b392c000] [map size=4096 bytes] [ unmap size=2048 bytes] Modules linked in: ext4 jbd2 crc16 loop dm_mod ide_cd_mod cdrom snd_hda_codec_conexant arc4 ide_pci_generic ecb b43 rng_core mac80211 cfg80211 snd_hda_intel rfkill snd_hda_codec ssb snd_pcm snd_timer mmc_core amd74xx ide_core snd i2c_nforce2 k8temp soundcore snd_page_alloc pcmcia serio_raw hwmon jo ydev i2c_core sg pcmcia_core forcedeth battery ac button ext3 jbd mbcache ohci_hcd sd_mod ehci_hcd usbcore fan processor ahci libahci libata scsi_mod thermal Pid: 2738, comm: sh Not tainted 2.6.37-rc8-wl+ #425 Call Trace: [<ffffffff81047b8a>] warn_slowpath_common+0x7a/0xb0 [<ffffffff8107fbe4>] ? debug_check_no_locks_freed+0x94/0x160 [<ffffffff81047c61>] warn_slowpath_fmt+0x41/0x50 [<ffffffff811ca54c>] check_unmap+0x16c/0x7e0 [<ffffffff81111587>] ? kfree+0xd7/0x160 [<ffffffff811cadd3>] debug_dma_free_coherent+0x73/0x90 [<ffffffffa01af471>] ehci_mem_cleanup+0x131/0x1e0 [ehci_hcd] [<ffffffffa01af5c1>] ehci_stop+0xa1/0xd0 [ehci_hcd] [<ffffffffa010483d>] usb_remove_hcd+0xcd/0x120 [usbcore] [<ffffffffa0113bd4>] usb_hcd_pci_remove+0x44/0xc0 [usbcore] [<ffffffff811d36c2>] pci_device_remove+0x32/0x60 [<ffffffff8125fd41>] __device_release_driver+0x61/0xc0 [<ffffffff8125fdca>] device_release_driver+0x2a/0x40 [<ffffffff8125ef2b>] driver_unbind+0x9b/0xc0 [<ffffffff8125e797>] drv_attr_store+0x27/0x30 [<ffffffff81175b21>] sysfs_write_file+0xe1/0x160 [<ffffffff81117606>] vfs_write+0xc6/0x170 [<ffffffff811178ec>] sys_write+0x4c/0x80 [<ffffffff81002d3b>] system_call_fastpath+0x16/0x1b ---[ end trace c4dfc9893d936182 ]--- Mapped at: [<ffffffff811caf0b>] debug_dma_alloc_coherent+0x3b/0x90 [<ffffffffa01afdf2>] ehci_pci_setup+0x2b2/0xab0 [ehci_hcd] [<ffffffffa010559d>] usb_add_hcd+0x19d/0x5e0 [usbcore] [<ffffffffa0113e05>] usb_hcd_pci_probe+0x1b5/0x330 [usbcore] [<ffffffff811d2762>] local_pci_probe+0x12/0x20 It does not matter whether I unlink the ohci-hcd or ehci-hcd device first. Created attachment 42402 [details]
Fix initialization order of ehci->periodic_size
Ooh, that's bad. This patch should fix the problem. The value of ehci->periodic_size was getting changed after it was used for computing the size of a DMA-coherent region in ehci_mem_init().
Good patch. You can add a Tested-by: Larry Finger <Larry.Finger@lwfinger.net> if you wish. Hardware error on Larry Finger's box. Workaround found and separate bug that the W/A exposed has a code fix. |