Bug 22642
Summary: | 2.6.37-rc1: Disk takes 10 seconds to resume - MacBook2,1 | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Tobias (devnull) |
Component: | Serial ATA | Assignee: | Rafael J. Wysocki (rjw) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | acpi-bugzilla, devnull, error27, florian, rjw, stern, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.37-rc1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 21782 | ||
Attachments: |
dmesg
dmesg.resume |
Description
Tobias
2010-11-10 19:33:18 UTC
Apparently, the resume of devices took too much time. Is this reproducible? I can reproduce it on my eee. How do I debug this? Please try to boot with initcall_debug in the kernel command line and see what devices take long time to resume. It's my harddrive that takes 10 seconds. [ 1.565105] scsi 0:0:0:0: Direct-Access ATA ST9160310AS 0303 PQ: 0 ANSI: 5 [ 69.523147] call 0:0:0:0+ returned 0 after 10428995 usecs I'll look into it some more, but I doubt there is anything that will speed it up. I suspect this warning may lead to a lot of bogus bug reports where we just have to tell people their hardware sucks. Well, there have not been too many of them and I wouldn't like to overlook a valid one. This particular one may indicate a driver problem as well, so I'm reassigning to Storage/SATA. It get's delayed in ata_wait_ready(). That function has a printk "link is slow to respond, please be patient" if you go over 5 seconds. I added a printk to see exactly how long it was taking. It first waits for 6 seconds and triggers the message and it waits for 4 seconds which is where my 10 second delay comes from. This doesn't look like a kernel bug to me. Tobias, you may be experiencing a different bug. I sort of hijacked your bug report. Could you boot with initcall_debug and post your dmesg? Yeah, around 10 seconds is pretty normal duration for disk spinup. Nothing wrong with Dan's case. Tobias? Created attachment 37882 [details]
dmesg
Sorry for the delay, I was on an internet-less vacation. I have not been able to reproduce this. If it was somewhat storage-related, could a slow nfsmount or a cdrom drive that is starting to fail trigger this? I have now switched to rc3 and will see what happens. Kernel command line twisting is not my strong point, but from my noobish look at the attached dmesg initcall_debug seems to work. I'll report back if this happens again. Please tell me if there is more I can do. We need a suspend resume cycle in your dmesg. :P Created attachment 37962 [details]
dmesg.resume
Doh, I blame the bugzilla breakage yesterday. Is the new dmesg of any use? [ 9287.020921] ata3.01: configured for UDMA/133 [ 9287.035898] call 2:0:1:0+ returned 0 after 2379249 usecs (that's your /dev/sda taking 2 seconds) [ 9289.840112] usb 1-4: reset high speed USB device using ehci_hcd and address 3 [ 9289.990593] call 1-4+ returned 0 after 5205424 usecs (that's your usb taking 5 seconds) The other things take an additional 2 seconds for a total of 9 seconds. [ 9294.004368] PM: resume of devices complete after 9412.372 msecs You're on the border. The warning and the stack dump only get printed if you hit 10 seconds. I doubt this is a problem that's fixable with software. Another bad thing about the stack dump is that it taints your kernel with the 'W' for warning. Well, we can drop the stack dump, but then people won't even report the problem, I guess. And about ehci_hcd taking 5 seconds to reset... theres nothing to fix there? It was 1-4 that took 5 seconds, not usb1 or 0000:00:1d.7 (either of which might reasonably be called ehci_hcd). What is the 1-4 device, anyway? Dan's timing calculations in comment #13 aren't right. The ATA and USB resumes ran in parallel, taking 5 seconds overall. As for why the USB resume took so long, I can't tell from this log. Enabling CONFIG_USB_DEBUG might help. By the way, the real bottleneck is the PNP0C0A:00 device, whatever that is: [ 9294.000258] call PNP0C0A:00+ returned 0 after 9187140 usecs PNP0C0A is ACPI battery. could this be something like https://bugzilla.kernel.org/show_bug.cgi?id=8246 ? does the kernel boot option "acpi=rsdt" help? (In reply to comment #9) > I'll report back if this happens again. Please tell me if there is more I can > do. I guess the >10s resume didn't happen anymore, closing. |