Kernel Bug Tracker – Bug 22642
2.6.37-rc1: Disk takes 10 seconds to resume - MacBook2,1
Last modified: 2011-04-17 16:16:15 UTC
New with 2.6.37-rc1 (from the ubuntu ppa):
[108572.434303] ------------[ cut here ]------------
[108572.434311] WARNING: at /home/kernel-ppa/COD/linux/kernel/power/suspend_test.c:53 suspend_test_finish+0x89/0x90()
[108572.434313] Hardware name: MacBook2,1
[108572.434315] Component: resume devices, time: 17260
[108572.434316] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc nls_utf8 isofs nls_iso8859_1 nls_cp437 vfat fat usb_storage binfmt_misc rfcomm sco bnep l2cap parport_pc ppdev arc4 snd_hda_codec_idt ath9k snd_hda_intel mac80211 snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi ath9k_common ath9k_hw snd_seq_midi_event snd_seq snd_timer ath snd_seq_device applesmc cfg80211 snd soundcore joydev led_class btusb bluetooth snd_page_alloc isight_firmware input_polldev appletouch lp mbp_nvidia_bl parport sha256_generic cryptd aes_x86_64 aes_generic hid_apple dm_crypt usbhid hid i915 drm_kms_helper drm i2c_algo_bit firewire_ohci sky2 firewire_core intel_agp intel_gtt video crc_itu_t output
[108572.434369] Pid: 15747, comm: pm-suspend Tainted: G W 2.6.37-020637rc1-generic #201011020905
[108572.434371] Call Trace:
[108572.434376] [<ffffffff810a44c9>] ? suspend_test_finish+0x89/0x90
[108572.434382] [<ffffffff810646e0>] warn_slowpath_common+0x90/0xc0
[108572.434385] [<ffffffff8106480e>] warn_slowpath_fmt+0x6e/0x70
[108572.434390] [<ffffffff81038649>] ? default_spin_lock_flags+0x9/0x10
[108572.434394] [<ffffffff812cb607>] ? kobject_put+0x27/0x60
[108572.434398] [<ffffffff81399ae7>] ? put_device+0x17/0x20
[108572.434402] [<ffffffff813a3b68>] ? dpm_complete+0x128/0x140
[108572.434406] [<ffffffff810a44c9>] suspend_test_finish+0x89/0x90
[108572.434409] [<ffffffff810a4282>] suspend_devices_and_enter+0xc2/0x100
[108572.434412] [<ffffffff810a43a1>] enter_state+0xe1/0xf0
[108572.434416] [<ffffffff810a3804>] state_store+0xb4/0xf0
[108572.434419] [<ffffffff812cb3a7>] kobj_attr_store+0x17/0x20
[108572.434423] [<ffffffff811ca1a2>] flush_write_buffer+0x62/0x90
[108572.434426] [<ffffffff811ca2e6>] sysfs_write_file+0x66/0xa0
[108572.434431] [<ffffffff8115e76c>] vfs_write+0xcc/0x190
[108572.434434] [<ffffffff8115f195>] sys_write+0x55/0x90
[108572.434437] [<ffffffff8100c132>] system_call_fastpath+0x16/0x1b
[108572.434440] ---[ end trace bdcb9cd79c701bb3 ]---
Apparently, the resume of devices took too much time.
Is this reproducible?
I can reproduce it on my eee. How do I debug this?
Please try to boot with initcall_debug in the kernel command line and see what devices take long time to resume.
It's my harddrive that takes 10 seconds.
[ 1.565105] scsi 0:0:0:0: Direct-Access ATA ST9160310AS 0303 PQ: 0 ANSI: 5
[ 69.523147] call 0:0:0:0+ returned 0 after 10428995 usecs
I'll look into it some more, but I doubt there is anything that will speed it up. I suspect this warning may lead to a lot of bogus bug reports where we just have to tell people their hardware sucks.
Well, there have not been too many of them and I wouldn't like to overlook
a valid one.
This particular one may indicate a driver problem as well, so I'm reassigning
It get's delayed in ata_wait_ready().
That function has a printk "link is slow to respond, please be patient" if you go over 5 seconds. I added a printk to see exactly how long it was taking. It first waits for 6 seconds and triggers the message and it waits for 4 seconds which is where my 10 second delay comes from.
This doesn't look like a kernel bug to me.
Tobias, you may be experiencing a different bug. I sort of hijacked your bug report. Could you boot with initcall_debug and post your dmesg?
Yeah, around 10 seconds is pretty normal duration for disk spinup. Nothing wrong with Dan's case. Tobias?
Created attachment 37882 [details]
Sorry for the delay, I was on an internet-less vacation.
I have not been able to reproduce this. If it was somewhat storage-related, could a slow nfsmount or a cdrom drive that is starting to fail trigger this?
I have now switched to rc3 and will see what happens. Kernel command line twisting is not my strong point, but from my noobish look at the attached dmesg initcall_debug seems to work.
I'll report back if this happens again. Please tell me if there is more I can do.
We need a suspend resume cycle in your dmesg. :P
Created attachment 37962 [details]
Doh, I blame the bugzilla breakage yesterday. Is the new dmesg of any use?
[ 9287.020921] ata3.01: configured for UDMA/133
[ 9287.035898] call 2:0:1:0+ returned 0 after 2379249 usecs
(that's your /dev/sda taking 2 seconds)
[ 9289.840112] usb 1-4: reset high speed USB device using ehci_hcd and address 3
[ 9289.990593] call 1-4+ returned 0 after 5205424 usecs
(that's your usb taking 5 seconds)
The other things take an additional 2 seconds for a total of 9 seconds.
[ 9294.004368] PM: resume of devices complete after 9412.372 msecs
You're on the border. The warning and the stack dump only get printed if you hit 10 seconds.
I doubt this is a problem that's fixable with software.
Another bad thing about the stack dump is that it taints your kernel with the 'W' for warning.
Well, we can drop the stack dump, but then people won't even report the
problem, I guess.
And about ehci_hcd taking 5 seconds to reset... theres nothing to fix there?
It was 1-4 that took 5 seconds, not usb1 or 0000:00:1d.7 (either of which might reasonably be called ehci_hcd). What is the 1-4 device, anyway?
Dan's timing calculations in comment #13 aren't right. The ATA and USB resumes ran in parallel, taking 5 seconds overall. As for why the USB resume took so long, I can't tell from this log. Enabling CONFIG_USB_DEBUG might help.
By the way, the real bottleneck is the PNP0C0A:00 device, whatever that is:
[ 9294.000258] call PNP0C0A:00+ returned 0 after 9187140 usecs
PNP0C0A is ACPI battery.
could this be something like https://bugzilla.kernel.org/show_bug.cgi?id=8246 ?
does the kernel boot option "acpi=rsdt" help?
(In reply to comment #9)
> I'll report back if this happens again. Please tell me if there is more I can
I guess the >10s resume didn't happen anymore, closing.