Bug 13269 - WARNING: at kernel/hrtimer.c:625 hres_timers_resume+0x3c/0x48() when resuming
Summary: WARNING: at kernel/hrtimer.c:625 hres_timers_resume+0x3c/0x48() when resuming
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq (show other bugs)
Hardware: All Linux
: P1 low
Assignee: ykzhao
URL:
Keywords:
: 13578 (view as bug list)
Depends on:
Blocks: 7216 12398
  Show dependency tree
 
Reported: 2009-05-08 08:48 UTC by cedric
Modified: 2012-01-19 02:05 UTC (History)
6 users (show)

See Also:
Kernel Version: 2.6.30-rc4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg after resuming (59.74 KB, text/plain)
2009-07-14 18:40 UTC, cedric
Details
Warn if interrupts are enabled during suspend-resume of sysdevs (2.15 KB, patch)
2009-07-21 08:54 UTC, ykzhao
Details | Diff
dmesg after resuming and after applying Rafael patch (59.87 KB, text/plain)
2009-07-21 16:30 UTC, cedric
Details
add the debug info to identify where it is enabled in cpufreq_suspend (1.65 KB, patch)
2009-07-22 02:32 UTC, ykzhao
Details | Diff
dmesg after resuming showing irq_disabled (59.78 KB, text/plain)
2009-07-22 10:06 UTC, cedric
Details
Add the suspend/resume flag to aovid smp call in course of cpufreq_suspend/resume (5.66 KB, patch)
2009-07-23 07:04 UTC, ykzhao
Details | Diff
lspci -vxx (11.83 KB, text/plain)
2009-07-23 11:13 UTC, cedric
Details
/proc/cpuinfo (457 bytes, text/plain)
2009-07-23 11:14 UTC, cedric
Details
Add the suspend/resume flag to aovid smp call in course of cpufreq_suspend/resume (3.24 KB, patch)
2009-07-27 02:35 UTC, ykzhao
Details | Diff

Description cedric 2009-05-08 08:48:27 UTC
Following bug #13243 (comment #13) : the warning is caused by :

commit 1d4a7f1c4faf53eb9e822743ec8a70b3019a26d2
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sun Jan 18 16:39:29 2009 +0100
hrtimers: fix inconsistent lock state on resume in hres_timers_resume

But i can't put Peter in cc.

The warning:

------------[ cut here ]------------
WARNING: at kernel/hrtimer.c:625 hres_timers_resume+0x3c/0x48()
Hardware name: HP Compaq nc6000 (DU358S#UUG)
hres_timers_resume() called with IRQs enabled!Modules linked in: nfs lockd
nfs_acl sunrpc adm1031 snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
acpi_cpufreq usbhid pcmcia snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm sr_mod
snd_timer ehci_hcd cdrom snd rtc_cmos uhci_hcd yenta_socketrtc_core usbcore
i2c_i801 tg3 rsrc_nonstatic snd_page_alloc pcmcia_core libphy rtc_lib battery
ac
Pid: 2943, comm: pm-suspend Not tainted 2.6.30-rc4 #1
Call Trace:
 [<c011ce4b>] warn_slowpath+0x74/0xc3
 [<c0310038>] ? rt_mutex_slowunlock+0xe8/0x13b
 [<c010728b>] ? pit_next_event+0x19/0x1e
 [<c0135d0c>] ? clockevents_program_event+0x92/0x128
 [<c012f78a>] ? ktime_get_ts+0x45/0x49
 [<c0136add>] ? tick_dev_program_event+0x33/0xb4
 [<c0136ba5>] ? tick_program_event+0x17/0x19
 [<c0136bc8>] ? tick_resume_oneshot+0x21/0x39
 [<c013036b>] ? notifier_call_chain+0x2b/0x5a
 [<c012f72b>] hres_timers_resume+0x3c/0x48
 [<c013299f>] timekeeping_resume+0xd1/0xf0
 [<c0247b6d>] __sysdev_resume+0x14/0x42
 [<c0247bd2>] sysdev_resume+0x37/0x66
 [<c013e0f7>] suspend_devices_and_enter+0x18e/0x1b5
 [<c030ee9f>] ? printk+0x18/0x21
 [<c013e261>] enter_state+0x11e/0x17b
 [<c013e33a>] state_store+0x7c/0xad
 [<c013e2be>] ? state_store+0x0/0xad
 [<c01d551b>] kobj_attr_store+0x20/0x27
 [<c019e5cc>] sysfs_write_file+0x97/0xe0
 [<c0166eaf>] vfs_write+0x8a/0x110
 [<c019e535>] ? sysfs_write_file+0x0/0xe0
 [<c0166fde>] sys_write+0x3d/0x6b
 [<c0102cc8>] sysenter_do_call+0x12/0x26
---[ end trace fbfd487192b3ff22 ]---

If more info needed, ask me.
Comment 1 ykzhao 2009-06-09 02:35:39 UTC
Hi, Credic
    Will you please try the boot option of "highres=off" and see whether the issue still exists?


Hi, Peterz
    Will you please look at this issue?
    
thanks.
Comment 2 cedric 2009-06-09 08:35:05 UTC
hello,

the problem is still there (using 2.6.30-rc8 now) even with the boot option.

cedric@enea ~ $ cat /proc/cmdline
root=/dev/sda1 resume=/dev/sda5 vga=0x316 video=vesafb:ywrap,mtrr,1024x768-16@60 splash=silent,theme:emerge-world CONSOLE=/dev/tty1 acpi_sleep=s3_bios lapic hpet=force highres=off
Comment 3 ykzhao 2009-07-13 05:17:16 UTC
*** Bug 13578 has been marked as a duplicate of this bug. ***
Comment 4 ykzhao 2009-07-13 05:23:42 UTC
Hi, Cedric
    Will you please attach the output of dmesg after resuming?
    Thanks.
Comment 5 cedric 2009-07-14 18:40:45 UTC
Created attachment 22342 [details]
dmesg after resuming

sorry for the late answer (holiday period).
Is this enough for you or do I have to increment the dmesg log size ?

PS:I had to reboot on a 2.6.30-rc8 kernel as the 2.6.31-rc serie does not resume :-( I'll open a bug for that later (trying to bisect the problem now)
Comment 6 ykzhao 2009-07-21 08:54:12 UTC
Created attachment 22427 [details]
Warn if interrupts are enabled during suspend-resume of sysdevs

Will you please try the debug patch from Rafael on the 2.6.30-rc4 kernel and attach the output of dmesg after suspend/resume?
Thanks.
Comment 7 cedric 2009-07-21 16:30:50 UTC
Created attachment 22430 [details]
dmesg after resuming and after applying Rafael patch

Patch applied and here is the result in attach.Note that now I have more than one warning. Is this expected ?
Comment 8 ykzhao 2009-07-22 02:32:39 UTC
Created attachment 22434 [details]
add the debug info to identify where it is enabled in cpufreq_suspend

Hi, Cderic
    From the log in comment #7 it seems that the irq is enabled spontaneously in course of suspend.
    >WARNING: at drivers/base/sys.c:405 sysdev_suspend+0x201/0x2a1()
    >Hardware name: HP Compaq nc6000 (DU358S#UUG)
    >Interrupts enabled after cpufreq_suspend+0x0/0x156

   Then it will cause that the irq is also enabled in course of resume.
    Will you please try the debug patch and attach the output of dmesg after suspend/resume?
   Thanks.
Comment 9 cedric 2009-07-22 10:06:27 UTC
Created attachment 22439 [details]
dmesg after resuming showing irq_disabled

Hello,

here is the result of the suspend/resume cycle with your patch.

Relevant part (?)

ACPI: Preparing to enter system sleep state S3
cpufreq driver name is acpi-cpufreq, irq disabled 1
irq_disabled 1 in 1247
irq_disabled 1 in 1266 after driver suspend
irq_disabled 0 in 1279 after driver get
irq_disabled 0 in 1301 before cpu_put
irq_disabled 0 in 1304 after cpu_put
Back to C!
Comment 10 ykzhao 2009-07-23 02:16:25 UTC
Will you please attach the output of lspci -vxx, /proc/cpuinfo?
Thanks.
Comment 11 ykzhao 2009-07-23 05:15:48 UTC
Hi, Cedric
    From the test log it seems that your  box is working on the UP kernel. And the irq is enabled while calling the function of cpufreq_driver->get.
    Will you please compile as SMP kernel and see whether it still complain that the hres_timer_resume is called with irq enabled?
   Thanks.
Comment 12 ykzhao 2009-07-23 07:04:54 UTC
Created attachment 22462 [details]
Add the suspend/resume flag to aovid smp call in course of cpufreq_suspend/resume
Comment 13 ykzhao 2009-07-23 07:06:01 UTC
Hi, Cedric
    Will you please try the debug patch and see whether the issue still exists?
    Thanks.
Comment 14 cedric 2009-07-23 11:13:40 UTC
Created attachment 22469 [details]
lspci -vxx
Comment 15 cedric 2009-07-23 11:14:16 UTC
Created attachment 22470 [details]
/proc/cpuinfo
Comment 16 cedric 2009-07-23 11:38:16 UTC
Hello,

Don't know if you still need the attachements, but posted in case of.

The patch doesn't apply on .30-rc4; do i need to test it on HEAD ?

cedric@enea ~/devel/repos/git/linux-2.6 $ git apply patch-linux3.diff
error: patch failed: arch/x86/kernel/cpu/cpufreq/powernow-k8.c:1388
error: arch/x86/kernel/cpu/cpufreq/powernow-k8.c: patch does not apply
error: patch failed: arch/x86/kernel/cpu/cpufreq/speedstep-ich.c:246
error: arch/x86/kernel/cpu/cpufreq/speedstep-ich.c: patch does not apply

i can't hand apply them on "arch/x86/kernel/cpu/cpufreq/speedstep-ich.c" and "arch/x86/kernel/cpu/cpufreq/powernow-k8.c" (too much diff between .30 and .31 and not enough skills from my part)

I'll try the SMP compilation for now.
Comment 17 ykzhao 2009-07-24 01:11:09 UTC
It will be better that you can try it on the latest kernel(2.6.31-rc3/4).
thanks.
Comment 18 cedric 2009-07-24 09:31:15 UTC
As stated in comment #5, the .31-rc serie doesn't resume here but will try it however. btw with the smp kernel, I don't have this warning anymore but another one :-)

...
ehci_hcd 0000:00:1d.7: LATE suspend
uhci_hcd 0000:00:1d.2: LATE suspend
uhci_hcd 0000:00:1d.1: LATE suspend
uhci_hcd 0000:00:1d.0: LATE suspend
pci 0000:00:01.0: LATE suspend
agpgart-intel 0000:00:00.0: LATE suspend
ACPI: Preparing to enter system sleep state S3
Disabling non-boot CPUs ...
------------[ cut here ]------------
WARNING: at kernel/smp.c:289 smp_call_function_single+0x87/0xfe()
Hardware name: HP Compaq nc6000 (DU358S#UUG)   
Modules linked in: nfs lockd nfs_acl sunrpc radeon drm adm1031 snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss acpi_cpufreq pcmcia arc4 ecb cryptomgr aead pcompress crypto_blkcipher crypto_hash usbhid crypto_algapi snd_intel8x0 snd_ac97_codec ac97_bus sr_mod snd_pcm ath5k snd_timer mac80211 cdrom yenta_socket snd led_class uhci_hcd rtc_cmos ehci_hcd rsrc_nonstatic tg3 usbcore snd_page_alloc i2c_i801 cfg80211 rtc_core libphy pcmcia_core rtc_lib ac battery
Pid: 3875, comm: pm-suspend Not tainted 2.6.30-rc4 #1
Call Trace:
 [<c033564d>] ? printk+0x18/0x1b
 [<c0124c53>] warn_slowpath+0x75/0xc4
 [<c011c929>] ? put_prev_task_fair+0xd0/0xe3
 [<c0335976>] ? __schedule+0x22a/0x756
 [<c0144d95>] smp_call_function_single+0x87/0xfe
 [<f884a507>] ? do_drv_read+0x0/0x30 [acpi_cpufreq]
 [<f884a2c2>] get_cur_val+0x99/0xd6 [acpi_cpufreq]
 [<c01411c7>] ? tick_suspend_broadcast+0x2a/0x2d
 [<f884a36c>] get_cur_freq_on_cpu+0x6d/0xb7 [acpi_cpufreq]
 [<c029afc7>] cpufreq_suspend+0x99/0x149
 [<c0140326>] ? clockevents_notify+0x21/0x65
 [<c0260376>] sysdev_suspend+0x7d/0x1d0
 [<c014e5f9>] ? stop_machine_destroy+0x3c/0x3e
 [<c01498b4>] suspend_devices_and_enter+0x17b/0x1cd
 [<c033564d>] ? printk+0x18/0x1b
 [<c0149a54>] enter_state+0x129/0x18c
 [<c0149b33>] state_store+0x7c/0xac
 [<c0149ab7>] ? state_store+0x0/0xac
 [<c01eb22b>] kobj_attr_store+0x20/0x27
 [<c01b10be>] sysfs_write_file+0x97/0xe0
 [<c017655f>] vfs_write+0x8a/0x110
 [<c01b1027>] ? sysfs_write_file+0x0/0xe0
 [<c017668e>] sys_write+0x3d/0x6b
 [<c0102dc8>] sysenter_do_call+0x12/0x26
---[ end trace 2c0c876a0e5b6ebd ]---
Back to C!
microcode: CPU0 updated from revision 0x7 to 0x7, date = 2004-11-09 
ACPI: Waking up from system sleep state S3
agpgart-intel 0000:00:00.0: EARLY resume
agpgart-intel 0000:00:00.0: restoring config space at offset 0x1 (was 0x900006, writing 0x20900006)
pci 0000:00:01.0: EARLY resume
...
Comment 19 Peter Zijlstra 2009-07-24 10:04:15 UTC
On Fri, 2009-07-24 at 09:31 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> WARNING: at kernel/smp.c:289 smp_call_function_single+0x87/0xfe()
> Hardware name: HP Compaq nc6000 (DU358S#UUG)   
> Modules linked in: nfs lockd nfs_acl sunrpc radeon drm adm1031 snd_seq
> snd_seq_device snd_pcm_oss snd_mixer_oss acpi_cpufreq pcmcia arc4 ecb
> cryptomgr
> aead pcompress crypto_blkcipher crypto_hash usbhid crypto_algapi snd_intel8x0
> snd_ac97_codec ac97_bus sr_mod snd_pcm ath5k snd_timer mac80211 cdrom
> yenta_socket snd led_class uhci_hcd rtc_cmos ehci_hcd rsrc_nonstatic tg3
> usbcore snd_page_alloc i2c_i801 cfg80211 rtc_core libphy pcmcia_core rtc_lib
> ac
> battery
> Pid: 3875, comm: pm-suspend Not tainted 2.6.30-rc4 #1
> Call Trace:
>  [<c033564d>] ? printk+0x18/0x1b
>  [<c0124c53>] warn_slowpath+0x75/0xc4
>  [<c011c929>] ? put_prev_task_fair+0xd0/0xe3
>  [<c0335976>] ? __schedule+0x22a/0x756
>  [<c0144d95>] smp_call_function_single+0x87/0xfe
>  [<f884a507>] ? do_drv_read+0x0/0x30 [acpi_cpufreq]
>  [<f884a2c2>] get_cur_val+0x99/0xd6 [acpi_cpufreq]
>  [<c01411c7>] ? tick_suspend_broadcast+0x2a/0x2d
>  [<f884a36c>] get_cur_freq_on_cpu+0x6d/0xb7 [acpi_cpufreq]
>  [<c029afc7>] cpufreq_suspend+0x99/0x149
>  [<c0140326>] ? clockevents_notify+0x21/0x65
>  [<c0260376>] sysdev_suspend+0x7d/0x1d0
>  [<c014e5f9>] ? stop_machine_destroy+0x3c/0x3e
>  [<c01498b4>] suspend_devices_and_enter+0x17b/0x1cd
>  [<c033564d>] ? printk+0x18/0x1b
>  [<c0149a54>] enter_state+0x129/0x18c
>  [<c0149b33>] state_store+0x7c/0xac
>  [<c0149ab7>] ? state_store+0x0/0xac
>  [<c01eb22b>] kobj_attr_store+0x20/0x27
>  [<c01b10be>] sysfs_write_file+0x97/0xe0
>  [<c017655f>] vfs_write+0x8a/0x110
>  [<c01b1027>] ? sysfs_write_file+0x0/0xe0
>  [<c017668e>] sys_write+0x3d/0x6b
>  [<c0102dc8>] sysenter_do_call+0x12/0x26
> ---[ end trace 2c0c876a0e5b6ebd ]---

Fun, so this acpi-cpufreq muck uses smp_call_function*() stuff from
suspend hooks, which are ran with IRQs disabled.

Now I suspect that in this case we're calling to the current cpu, why
exactly its passing full cpu-masks around to indicate a single cpu is a
mystery to me though.

Also, cpufreq_suspend() looks very suspicious for calling an SRCU
notifier chain from something that's supposed to run with IRQs disabled.

I'm not seeing why we should do that whole get_freq() compare, notify
change junk on suspend, resume should simply query the hardware when its
getting back up and fix-up the state.
Comment 20 Anonymous Emailer 2009-07-24 15:05:00 UTC
Reply-To: davej@redhat.com

On Fri, Jul 24, 2009 at 12:05:48PM +0200, Peter Zijlstra wrote:
 > On Fri, 2009-07-24 at 09:31 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
 > > WARNING: at kernel/smp.c:289 smp_call_function_single+0x87/0xfe()
 > > Hardware name: HP Compaq nc6000 (DU358S#UUG)   
 > > Modules linked in: nfs lockd nfs_acl sunrpc radeon drm adm1031 snd_seq
 > > snd_seq_device snd_pcm_oss snd_mixer_oss acpi_cpufreq pcmcia arc4 ecb cryptomgr
 > > aead pcompress crypto_blkcipher crypto_hash usbhid crypto_algapi snd_intel8x0
 > > snd_ac97_codec ac97_bus sr_mod snd_pcm ath5k snd_timer mac80211 cdrom
 > > yenta_socket snd led_class uhci_hcd rtc_cmos ehci_hcd rsrc_nonstatic tg3
 > > usbcore snd_page_alloc i2c_i801 cfg80211 rtc_core libphy pcmcia_core rtc_lib ac
 > > battery
 > > Pid: 3875, comm: pm-suspend Not tainted 2.6.30-rc4 #1
 > > Call Trace:
 > >  [<c033564d>] ? printk+0x18/0x1b
 > >  [<c0124c53>] warn_slowpath+0x75/0xc4
 > >  [<c011c929>] ? put_prev_task_fair+0xd0/0xe3
 > >  [<c0335976>] ? __schedule+0x22a/0x756
 > >  [<c0144d95>] smp_call_function_single+0x87/0xfe
 > >  [<f884a507>] ? do_drv_read+0x0/0x30 [acpi_cpufreq]
 > >  [<f884a2c2>] get_cur_val+0x99/0xd6 [acpi_cpufreq]
 > >  [<c01411c7>] ? tick_suspend_broadcast+0x2a/0x2d
 > >  [<f884a36c>] get_cur_freq_on_cpu+0x6d/0xb7 [acpi_cpufreq]
 > >  [<c029afc7>] cpufreq_suspend+0x99/0x149
 > >  [<c0140326>] ? clockevents_notify+0x21/0x65
 > >  [<c0260376>] sysdev_suspend+0x7d/0x1d0
 > >  [<c014e5f9>] ? stop_machine_destroy+0x3c/0x3e
 > >  [<c01498b4>] suspend_devices_and_enter+0x17b/0x1cd
 > >  [<c033564d>] ? printk+0x18/0x1b
 > >  [<c0149a54>] enter_state+0x129/0x18c
 > >  [<c0149b33>] state_store+0x7c/0xac
 > >  [<c0149ab7>] ? state_store+0x0/0xac
 > >  [<c01eb22b>] kobj_attr_store+0x20/0x27
 > >  [<c01b10be>] sysfs_write_file+0x97/0xe0
 > >  [<c017655f>] vfs_write+0x8a/0x110
 > >  [<c01b1027>] ? sysfs_write_file+0x0/0xe0
 > >  [<c017668e>] sys_write+0x3d/0x6b
 > >  [<c0102dc8>] sysenter_do_call+0x12/0x26
 > > ---[ end trace 2c0c876a0e5b6ebd ]---
 > 
 > Fun, so this acpi-cpufreq muck uses smp_call_function*() stuff from
 > suspend hooks, which are ran with IRQs disabled.
 > 
 > Now I suspect that in this case we're calling to the current cpu, why
 > exactly its passing full cpu-masks around to indicate a single cpu is a
 > mystery to me though.
 > 
 > Also, cpufreq_suspend() looks very suspicious for calling an SRCU
 > notifier chain from something that's supposed to run with IRQs disabled.
 > 
 > I'm not seeing why we should do that whole get_freq() compare, notify
 > change junk on suspend, resume should simply query the hardware when its
 > getting back up and fix-up the state.

This came up a week or so ago. When I investigated where that code came
from, I found it was added to work around a problem on ppc powerbooks.
x86 doesn't need to be running any of that suspend/resume code at all afaics.

	Dave
Comment 21 cedric 2009-07-25 10:06:22 UTC
Hello,

I can't still resume with .31-rc4.
What should I do next ? Is there a way to apply this patch on .30 ?
Comment 22 ykzhao 2009-07-27 02:35:05 UTC
Created attachment 22500 [details]
Add the suspend/resume flag to aovid smp call in course of cpufreq_suspend/resume
Comment 23 ykzhao 2009-07-27 02:36:36 UTC
Hi, Cedric
    Will you please try the debug patch on the 2.6.30-rc4?
    As the intel cpu is used on this box, the powernow-k8 is removed from this patch.
    Thanks.
Comment 24 cedric 2009-07-27 09:24:59 UTC
Hello,

I've tested the patch and no more WARNING on UP config.
Should I test on SMP too ?
Comment 25 ykzhao 2009-07-28 01:34:24 UTC
If there is no warning message again, it is unnecessary to test it on SMP.
thanks for your test.
Comment 26 ykzhao 2009-07-28 01:44:44 UTC
From the test it seems that there is no warning message again after applying the patch in comment #22. 
    I will try to push it. But I am sure whether it can hit the upstream kernel.

As it is related with the cpufreq  driver, it will be moved to power management/cpufreq category.

    Thanks.
Comment 27 Len Brown 2011-07-31 17:44:16 UTC
Is this still a problem in a recent kernel?
Comment 28 Zhang Rui 2012-01-18 01:48:55 UTC
It's great that kernel bugzilla is back.

Cedric,
can you please verify if the problem still exists in the latest upstream kernel?
Comment 29 cedric 2012-01-18 10:39:34 UTC
After several sleep down/wake up loops I didn't see it anymore. I'm on 3.2.0 for the moment.
Comment 30 Zhang Rui 2012-01-19 02:05:09 UTC
Okay.
I'll close this bug for now as it is not reproducible.
Please feel free to reopen it in case you can reproduce it again.

Note You need to log in before you can comment on or make changes to this bug.