Bug 29902
Summary: | System suspend failed after 1-2 successful resumes | ||
---|---|---|---|
Product: | Drivers | Reporter: | Alex Terekhov (a.terekhov) |
Component: | USB | Assignee: | Greg Kroah-Hartman (greg) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | florian, maciej.rutecki, rjw, stern, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.38-rc6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216 | ||
Attachments: |
Syslog
lspci -vvv dmesg dmesg with CONFIG_USB_DEBUG usbmon trace Use atomic flags instead of hcd->state dmesg with uhci_hcd debug param = 2 dmesg for the first case dmesg for the second case Add delay during UHCI startup Delay 50 microseconds dmesg (suspend-resume for all USB ports) |
Created attachment 49132 [details]
lspci -vvv
Does it only fail to suspend, or does it also crash or something? If it doesn't crash during the failing suspend, please attach full dmesg output including the failing attempt to suspend. Created attachment 49152 [details]
dmesg
No, it doesn't crash or something else, It just fails to suspend Here's the source of the problem: uhci_hcd 0000:00:1d.2: Root hub is not suspended pci_pm_suspend(): hcd_pci_suspend+0x0/0x19 [usbcore] returns -16 sdhci-pci 0000:03:00.0: PCI INT A disabled pm_op(): pci_pm_suspend+0x0/0xbe returns -16 PM: Device 0000:00:1d.2 failed to suspend async: error -16 (reassigning to USB). Is the problem reproducible after you've done # echo 0 > /sys/power/pm_async ? BTW, the SATA messages are from the error code path that's not the nicest one, so to speak, but they are harmless. Hmm.. the WARN_ON() indicates that ATA port was requested to resume without successfully completing suspend. Does PM core calls in resume for devices which haven't successfully complete suspend on suspend failure? Or is libata doing something wrong? (In reply to comment #5) > Is the problem reproducible after you've done > > # echo 0 > /sys/power/pm_async > > ? Hi, thanks, the problem didn't encounter after that. Is the USB problem reproducible? If it is, please build a kernel with CONFIG_USB_DEBUG and CONFIG_USB_MON enabled and post the dmesg log. During the same test, acquire a usbmon trace (see Documentation/usb/usbmon.txt) using the 0u file, and attach that to the bug report as well. (In reply to comment #9) > Is the USB problem reproducible? If it is, please build a kernel with After # echo 0 > /sys/power/pm_async USB problem isn't reproducible. Do you want me to get a usbmon trace without previous command? Created attachment 49242 [details]
dmesg with CONFIG_USB_DEBUG
Created attachment 49252 [details]
usbmon trace
I'm a little surprised that pm_async made any difference. Anyway, looking through your dmesg log, the problem starts with these two errors: usb usb6: usb resume usb usb6: wakeup_rh uhci_hcd 0000:00:1d.0: host controller halted, very bad! uhci_hcd 0000:00:1d.0: HC died; cleaning up usb usb7: usb resume usb usb7: wakeup_rh usb usb8: usb resume usb usb8: wakeup_rh uhci_hcd 0000:00:1d.2: host controller halted, very bad! uhci_hcd 0000:00:1d.2: HC died; cleaning up It's not clear why this happened. You'll get a little more information if you do "echo 2 >/sys/module/uhci_hcd/parameters/debug" before starting the first suspend. (And forget about the usbmon trace, it's not going to help.) But knowing why it happened won't fix the bug. I'll write up a patch in a few days. Created attachment 49662 [details]
Use atomic flags instead of hcd->state
Here's a patch that ought to fix the problem. But I'd still like to see the debugging output; your UHCI controllers should not have died like that. Or maybe they didn't die, maybe they're just slow to get started.
Created attachment 49782 [details]
dmesg with uhci_hcd debug param = 2
Created attachment 49822 [details]
dmesg for the first case
Alan, looks like USB + suspend still not working properly.
1. unplug after suspend, plug in after resume or replug after suspend to another USB-socket
2. right it dies without any plugging/unplugging
Both logs attached
Created attachment 49832 [details]
dmesg for the second case
The second log
Created attachment 49902 [details]
Add delay during UHCI startup
It looks like the patch worked perfectly. Even though the host controllers died, you were able to go ahead and suspend the system afterwards with no problem.
The next problem is why the controllers died. I can't tell exactly, but it looks like there's a race with the hardware. This new patch adds a short (10 microsecond) delay, which may give the hardware enough time to start up properly. Try it along with the other patch and see what happens. Also try changing the length of the delay; see if you can find the shortest value that will work.
Created attachment 49942 [details]
Delay 50 microseconds
Nothing changed, even when I set 50 microseconds delay.
I don't know what's causing this problem with your controllers, or how to fix it. It may be a hardware issue. Have you tried plugging the two affected devices (the Saitek Cyborg USB Stick and the Broadcom Bluetooth Device) into different ports, connected to different controllers? For some reason the 0000:00:1a.0 - 0000:00:1a.2 controllers don't seem to have this problem. Dropping from the list of recent regressions. There's one more thing you can try. Before starting a suspend, do: echo -n 0000:00:1d.0 >/sys/bus/pci/drivers/uhci_hcd/unbind echo -n 0000:00:1d.2 >/sys/bus/pci/drivers/uhci_hcd/unbind Then after the resume, do: echo -n 0000:00:1d.0 >/sys/bus/pci/drivers/uhci_hcd/bind echo -n 0000:00:1d.2 >/sys/bus/pci/drivers/uhci_hcd/bind If you get the "controller died" or similar errors during the "bind" part, then I think there's nothing more that can be done. Either way, the dmesg log will be informative. Created attachment 50632 [details]
dmesg (suspend-resume for all USB ports)
Hi, Alan
Sorry for delay, looks like you're right
I've tested all USB ports and looks like one of them works fine and two other not.
Thanks for your last suggestions, I think making rmmod uhci_hcd before suspend and modprobe uhci_hcd after resume would be the easiest way, don't you think so?
Yes, using rmmod would be the easier than doing those "echo" commands, if uhci-hcd is built as a module in your kernel. But until you try it, there's no way to know whether the problem will still occur. Your experiment does show pretty clearly that two of the USB ports don't work right. This definitely looks like a hardware bug. A commit referencing this bug has been merged for .39-rc1: commit 9b37596a2e860404503a3f2a6513db60c296bfdc Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Mar 7 11:11:52 2011 -0500 USB: move usbcore away from hcd->state It looks like this bug report can be closed. The software part has been fixed, and the rest is just a hardware problem. Thanks, Alan A patch referencing a commit referencing this bug report has been merged in v3.0-rc1: commit 69fff59de4d844f8b4c2454c3c23d32b69dcbfd7 Author: Alan Stern <stern@rowland.harvard.edu> Date: Tue May 17 17:27:12 2011 -0400 USB: remove remaining usages of hcd->state from usbcore and fix regression |
Created attachment 49122 [details] Syslog After starting my system going to suspend, then resume the system. After that while trying to suspend my system again I get some warnings from libata and that's all. Feb 26 00:10:06 immortal kernel: ------------[ cut here ]------------ Feb 26 00:10:06 immortal kernel: WARNING: at drivers/ata/libata-eh.c:3976 ata_scsi_error+0x14f/0x57d [libata]() Feb 26 00:10:06 immortal kernel: Hardware name: VPCY11M1R Feb 26 00:10:06 immortal kernel: Modules linked in: i915 drm_kms_helper drm i2c_algo_bit loop sha256_generic aes_i586 aes_generic cbc dm_crypt dm_mod arc4 ecb ath9k joydev snd_hda_codec_realtek mac80211 ath9k_common snd_hda_intel snd_hda_codec snd_hwdep snd_pcm ath9k_hw ath snd_seq uvcvideo snd_timer sony_laptop i2c_i801 cfg80211 snd_seq_device videodev psmouse battery snd ac rfkill evdev power_supply soundcore i2c_core snd_page_alloc video button processor ext3 jbd mbcache sd_mod uhci_hcd ahci libahci libata sdhci_pci sdhci scsi_mod ehci_hcd usbcore atl1c thermal mmc_core thermal_sys nls_base [last unloaded: scsi_wait_scan] Feb 26 00:10:06 immortal kernel: Pid: 184, comm: scsi_eh_0 Not tainted 2.6.38-rc6 #15 Feb 26 00:10:06 immortal kernel: Call Trace: Feb 26 00:10:06 immortal kernel: [<c10304a8>] ? warn_slowpath_common+0x6a/0x7b Feb 26 00:10:06 immortal kernel: [<f88f1875>] ? ata_scsi_error+0x14f/0x57d [libata] Feb 26 00:10:06 immortal kernel: [<c10304c6>] ? warn_slowpath_null+0xd/0x10 Feb 26 00:10:06 immortal kernel: [<f88f1875>] ? ata_scsi_error+0x14f/0x57d [libata] Feb 26 00:10:06 immortal kernel: [<f88395f2>] ? scsi_error_handler+0xaa/0x36f [scsi_mod] Feb 26 00:10:06 immortal kernel: [<c10266c9>] ? complete+0x28/0x36 Feb 26 00:10:06 immortal kernel: [<f8839548>] ? scsi_error_handler+0x0/0x36f [scsi_mod] Feb 26 00:10:06 immortal kernel: [<c1042c61>] ? kthread+0x63/0x68 Feb 26 00:10:06 immortal kernel: [<c1042bfe>] ? kthread+0x0/0x68 Feb 26 00:10:06 immortal kernel: [<c100933e>] ? kernel_thread_helper+0x6/0x10 Feb 26 00:10:06 immortal kernel: ---[ end trace 19a7d3ffb6ba8aa9 ]---