Following bug #13243 (comment #13) : the warning is caused by : commit 1d4a7f1c4faf53eb9e822743ec8a70b3019a26d2 Author: Peter Zijlstra <peterz@infradead.org> Date: Sun Jan 18 16:39:29 2009 +0100 hrtimers: fix inconsistent lock state on resume in hres_timers_resume But i can't put Peter in cc. The warning: ------------[ cut here ]------------ WARNING: at kernel/hrtimer.c:625 hres_timers_resume+0x3c/0x48() Hardware name: HP Compaq nc6000 (DU358S#UUG) hres_timers_resume() called with IRQs enabled!Modules linked in: nfs lockd nfs_acl sunrpc adm1031 snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss acpi_cpufreq usbhid pcmcia snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm sr_mod snd_timer ehci_hcd cdrom snd rtc_cmos uhci_hcd yenta_socketrtc_core usbcore i2c_i801 tg3 rsrc_nonstatic snd_page_alloc pcmcia_core libphy rtc_lib battery ac Pid: 2943, comm: pm-suspend Not tainted 2.6.30-rc4 #1 Call Trace: [<c011ce4b>] warn_slowpath+0x74/0xc3 [<c0310038>] ? rt_mutex_slowunlock+0xe8/0x13b [<c010728b>] ? pit_next_event+0x19/0x1e [<c0135d0c>] ? clockevents_program_event+0x92/0x128 [<c012f78a>] ? ktime_get_ts+0x45/0x49 [<c0136add>] ? tick_dev_program_event+0x33/0xb4 [<c0136ba5>] ? tick_program_event+0x17/0x19 [<c0136bc8>] ? tick_resume_oneshot+0x21/0x39 [<c013036b>] ? notifier_call_chain+0x2b/0x5a [<c012f72b>] hres_timers_resume+0x3c/0x48 [<c013299f>] timekeeping_resume+0xd1/0xf0 [<c0247b6d>] __sysdev_resume+0x14/0x42 [<c0247bd2>] sysdev_resume+0x37/0x66 [<c013e0f7>] suspend_devices_and_enter+0x18e/0x1b5 [<c030ee9f>] ? printk+0x18/0x21 [<c013e261>] enter_state+0x11e/0x17b [<c013e33a>] state_store+0x7c/0xad [<c013e2be>] ? state_store+0x0/0xad [<c01d551b>] kobj_attr_store+0x20/0x27 [<c019e5cc>] sysfs_write_file+0x97/0xe0 [<c0166eaf>] vfs_write+0x8a/0x110 [<c019e535>] ? sysfs_write_file+0x0/0xe0 [<c0166fde>] sys_write+0x3d/0x6b [<c0102cc8>] sysenter_do_call+0x12/0x26 ---[ end trace fbfd487192b3ff22 ]--- If more info needed, ask me.
Hi, Credic Will you please try the boot option of "highres=off" and see whether the issue still exists? Hi, Peterz Will you please look at this issue? thanks.
hello, the problem is still there (using 2.6.30-rc8 now) even with the boot option. cedric@enea ~ $ cat /proc/cmdline root=/dev/sda1 resume=/dev/sda5 vga=0x316 video=vesafb:ywrap,mtrr,1024x768-16@60 splash=silent,theme:emerge-world CONSOLE=/dev/tty1 acpi_sleep=s3_bios lapic hpet=force highres=off
*** Bug 13578 has been marked as a duplicate of this bug. ***
Hi, Cedric Will you please attach the output of dmesg after resuming? Thanks.
Created attachment 22342 [details] dmesg after resuming sorry for the late answer (holiday period). Is this enough for you or do I have to increment the dmesg log size ? PS:I had to reboot on a 2.6.30-rc8 kernel as the 2.6.31-rc serie does not resume :-( I'll open a bug for that later (trying to bisect the problem now)
Created attachment 22427 [details] Warn if interrupts are enabled during suspend-resume of sysdevs Will you please try the debug patch from Rafael on the 2.6.30-rc4 kernel and attach the output of dmesg after suspend/resume? Thanks.
Created attachment 22430 [details] dmesg after resuming and after applying Rafael patch Patch applied and here is the result in attach.Note that now I have more than one warning. Is this expected ?
Created attachment 22434 [details] add the debug info to identify where it is enabled in cpufreq_suspend Hi, Cderic From the log in comment #7 it seems that the irq is enabled spontaneously in course of suspend. >WARNING: at drivers/base/sys.c:405 sysdev_suspend+0x201/0x2a1() >Hardware name: HP Compaq nc6000 (DU358S#UUG) >Interrupts enabled after cpufreq_suspend+0x0/0x156 Then it will cause that the irq is also enabled in course of resume. Will you please try the debug patch and attach the output of dmesg after suspend/resume? Thanks.
Created attachment 22439 [details] dmesg after resuming showing irq_disabled Hello, here is the result of the suspend/resume cycle with your patch. Relevant part (?) ACPI: Preparing to enter system sleep state S3 cpufreq driver name is acpi-cpufreq, irq disabled 1 irq_disabled 1 in 1247 irq_disabled 1 in 1266 after driver suspend irq_disabled 0 in 1279 after driver get irq_disabled 0 in 1301 before cpu_put irq_disabled 0 in 1304 after cpu_put Back to C!
Will you please attach the output of lspci -vxx, /proc/cpuinfo? Thanks.
Hi, Cedric From the test log it seems that your box is working on the UP kernel. And the irq is enabled while calling the function of cpufreq_driver->get. Will you please compile as SMP kernel and see whether it still complain that the hres_timer_resume is called with irq enabled? Thanks.
Created attachment 22462 [details] Add the suspend/resume flag to aovid smp call in course of cpufreq_suspend/resume
Hi, Cedric Will you please try the debug patch and see whether the issue still exists? Thanks.
Created attachment 22469 [details] lspci -vxx
Created attachment 22470 [details] /proc/cpuinfo
Hello, Don't know if you still need the attachements, but posted in case of. The patch doesn't apply on .30-rc4; do i need to test it on HEAD ? cedric@enea ~/devel/repos/git/linux-2.6 $ git apply patch-linux3.diff error: patch failed: arch/x86/kernel/cpu/cpufreq/powernow-k8.c:1388 error: arch/x86/kernel/cpu/cpufreq/powernow-k8.c: patch does not apply error: patch failed: arch/x86/kernel/cpu/cpufreq/speedstep-ich.c:246 error: arch/x86/kernel/cpu/cpufreq/speedstep-ich.c: patch does not apply i can't hand apply them on "arch/x86/kernel/cpu/cpufreq/speedstep-ich.c" and "arch/x86/kernel/cpu/cpufreq/powernow-k8.c" (too much diff between .30 and .31 and not enough skills from my part) I'll try the SMP compilation for now.
It will be better that you can try it on the latest kernel(2.6.31-rc3/4). thanks.
As stated in comment #5, the .31-rc serie doesn't resume here but will try it however. btw with the smp kernel, I don't have this warning anymore but another one :-) ... ehci_hcd 0000:00:1d.7: LATE suspend uhci_hcd 0000:00:1d.2: LATE suspend uhci_hcd 0000:00:1d.1: LATE suspend uhci_hcd 0000:00:1d.0: LATE suspend pci 0000:00:01.0: LATE suspend agpgart-intel 0000:00:00.0: LATE suspend ACPI: Preparing to enter system sleep state S3 Disabling non-boot CPUs ... ------------[ cut here ]------------ WARNING: at kernel/smp.c:289 smp_call_function_single+0x87/0xfe() Hardware name: HP Compaq nc6000 (DU358S#UUG) Modules linked in: nfs lockd nfs_acl sunrpc radeon drm adm1031 snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss acpi_cpufreq pcmcia arc4 ecb cryptomgr aead pcompress crypto_blkcipher crypto_hash usbhid crypto_algapi snd_intel8x0 snd_ac97_codec ac97_bus sr_mod snd_pcm ath5k snd_timer mac80211 cdrom yenta_socket snd led_class uhci_hcd rtc_cmos ehci_hcd rsrc_nonstatic tg3 usbcore snd_page_alloc i2c_i801 cfg80211 rtc_core libphy pcmcia_core rtc_lib ac battery Pid: 3875, comm: pm-suspend Not tainted 2.6.30-rc4 #1 Call Trace: [<c033564d>] ? printk+0x18/0x1b [<c0124c53>] warn_slowpath+0x75/0xc4 [<c011c929>] ? put_prev_task_fair+0xd0/0xe3 [<c0335976>] ? __schedule+0x22a/0x756 [<c0144d95>] smp_call_function_single+0x87/0xfe [<f884a507>] ? do_drv_read+0x0/0x30 [acpi_cpufreq] [<f884a2c2>] get_cur_val+0x99/0xd6 [acpi_cpufreq] [<c01411c7>] ? tick_suspend_broadcast+0x2a/0x2d [<f884a36c>] get_cur_freq_on_cpu+0x6d/0xb7 [acpi_cpufreq] [<c029afc7>] cpufreq_suspend+0x99/0x149 [<c0140326>] ? clockevents_notify+0x21/0x65 [<c0260376>] sysdev_suspend+0x7d/0x1d0 [<c014e5f9>] ? stop_machine_destroy+0x3c/0x3e [<c01498b4>] suspend_devices_and_enter+0x17b/0x1cd [<c033564d>] ? printk+0x18/0x1b [<c0149a54>] enter_state+0x129/0x18c [<c0149b33>] state_store+0x7c/0xac [<c0149ab7>] ? state_store+0x0/0xac [<c01eb22b>] kobj_attr_store+0x20/0x27 [<c01b10be>] sysfs_write_file+0x97/0xe0 [<c017655f>] vfs_write+0x8a/0x110 [<c01b1027>] ? sysfs_write_file+0x0/0xe0 [<c017668e>] sys_write+0x3d/0x6b [<c0102dc8>] sysenter_do_call+0x12/0x26 ---[ end trace 2c0c876a0e5b6ebd ]--- Back to C! microcode: CPU0 updated from revision 0x7 to 0x7, date = 2004-11-09 ACPI: Waking up from system sleep state S3 agpgart-intel 0000:00:00.0: EARLY resume agpgart-intel 0000:00:00.0: restoring config space at offset 0x1 (was 0x900006, writing 0x20900006) pci 0000:00:01.0: EARLY resume ...
On Fri, 2009-07-24 at 09:31 +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > WARNING: at kernel/smp.c:289 smp_call_function_single+0x87/0xfe() > Hardware name: HP Compaq nc6000 (DU358S#UUG) > Modules linked in: nfs lockd nfs_acl sunrpc radeon drm adm1031 snd_seq > snd_seq_device snd_pcm_oss snd_mixer_oss acpi_cpufreq pcmcia arc4 ecb > cryptomgr > aead pcompress crypto_blkcipher crypto_hash usbhid crypto_algapi snd_intel8x0 > snd_ac97_codec ac97_bus sr_mod snd_pcm ath5k snd_timer mac80211 cdrom > yenta_socket snd led_class uhci_hcd rtc_cmos ehci_hcd rsrc_nonstatic tg3 > usbcore snd_page_alloc i2c_i801 cfg80211 rtc_core libphy pcmcia_core rtc_lib > ac > battery > Pid: 3875, comm: pm-suspend Not tainted 2.6.30-rc4 #1 > Call Trace: > [<c033564d>] ? printk+0x18/0x1b > [<c0124c53>] warn_slowpath+0x75/0xc4 > [<c011c929>] ? put_prev_task_fair+0xd0/0xe3 > [<c0335976>] ? __schedule+0x22a/0x756 > [<c0144d95>] smp_call_function_single+0x87/0xfe > [<f884a507>] ? do_drv_read+0x0/0x30 [acpi_cpufreq] > [<f884a2c2>] get_cur_val+0x99/0xd6 [acpi_cpufreq] > [<c01411c7>] ? tick_suspend_broadcast+0x2a/0x2d > [<f884a36c>] get_cur_freq_on_cpu+0x6d/0xb7 [acpi_cpufreq] > [<c029afc7>] cpufreq_suspend+0x99/0x149 > [<c0140326>] ? clockevents_notify+0x21/0x65 > [<c0260376>] sysdev_suspend+0x7d/0x1d0 > [<c014e5f9>] ? stop_machine_destroy+0x3c/0x3e > [<c01498b4>] suspend_devices_and_enter+0x17b/0x1cd > [<c033564d>] ? printk+0x18/0x1b > [<c0149a54>] enter_state+0x129/0x18c > [<c0149b33>] state_store+0x7c/0xac > [<c0149ab7>] ? state_store+0x0/0xac > [<c01eb22b>] kobj_attr_store+0x20/0x27 > [<c01b10be>] sysfs_write_file+0x97/0xe0 > [<c017655f>] vfs_write+0x8a/0x110 > [<c01b1027>] ? sysfs_write_file+0x0/0xe0 > [<c017668e>] sys_write+0x3d/0x6b > [<c0102dc8>] sysenter_do_call+0x12/0x26 > ---[ end trace 2c0c876a0e5b6ebd ]--- Fun, so this acpi-cpufreq muck uses smp_call_function*() stuff from suspend hooks, which are ran with IRQs disabled. Now I suspect that in this case we're calling to the current cpu, why exactly its passing full cpu-masks around to indicate a single cpu is a mystery to me though. Also, cpufreq_suspend() looks very suspicious for calling an SRCU notifier chain from something that's supposed to run with IRQs disabled. I'm not seeing why we should do that whole get_freq() compare, notify change junk on suspend, resume should simply query the hardware when its getting back up and fix-up the state.
Reply-To: davej@redhat.com On Fri, Jul 24, 2009 at 12:05:48PM +0200, Peter Zijlstra wrote: > On Fri, 2009-07-24 at 09:31 +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > > WARNING: at kernel/smp.c:289 smp_call_function_single+0x87/0xfe() > > Hardware name: HP Compaq nc6000 (DU358S#UUG) > > Modules linked in: nfs lockd nfs_acl sunrpc radeon drm adm1031 snd_seq > > snd_seq_device snd_pcm_oss snd_mixer_oss acpi_cpufreq pcmcia arc4 ecb cryptomgr > > aead pcompress crypto_blkcipher crypto_hash usbhid crypto_algapi snd_intel8x0 > > snd_ac97_codec ac97_bus sr_mod snd_pcm ath5k snd_timer mac80211 cdrom > > yenta_socket snd led_class uhci_hcd rtc_cmos ehci_hcd rsrc_nonstatic tg3 > > usbcore snd_page_alloc i2c_i801 cfg80211 rtc_core libphy pcmcia_core rtc_lib ac > > battery > > Pid: 3875, comm: pm-suspend Not tainted 2.6.30-rc4 #1 > > Call Trace: > > [<c033564d>] ? printk+0x18/0x1b > > [<c0124c53>] warn_slowpath+0x75/0xc4 > > [<c011c929>] ? put_prev_task_fair+0xd0/0xe3 > > [<c0335976>] ? __schedule+0x22a/0x756 > > [<c0144d95>] smp_call_function_single+0x87/0xfe > > [<f884a507>] ? do_drv_read+0x0/0x30 [acpi_cpufreq] > > [<f884a2c2>] get_cur_val+0x99/0xd6 [acpi_cpufreq] > > [<c01411c7>] ? tick_suspend_broadcast+0x2a/0x2d > > [<f884a36c>] get_cur_freq_on_cpu+0x6d/0xb7 [acpi_cpufreq] > > [<c029afc7>] cpufreq_suspend+0x99/0x149 > > [<c0140326>] ? clockevents_notify+0x21/0x65 > > [<c0260376>] sysdev_suspend+0x7d/0x1d0 > > [<c014e5f9>] ? stop_machine_destroy+0x3c/0x3e > > [<c01498b4>] suspend_devices_and_enter+0x17b/0x1cd > > [<c033564d>] ? printk+0x18/0x1b > > [<c0149a54>] enter_state+0x129/0x18c > > [<c0149b33>] state_store+0x7c/0xac > > [<c0149ab7>] ? state_store+0x0/0xac > > [<c01eb22b>] kobj_attr_store+0x20/0x27 > > [<c01b10be>] sysfs_write_file+0x97/0xe0 > > [<c017655f>] vfs_write+0x8a/0x110 > > [<c01b1027>] ? sysfs_write_file+0x0/0xe0 > > [<c017668e>] sys_write+0x3d/0x6b > > [<c0102dc8>] sysenter_do_call+0x12/0x26 > > ---[ end trace 2c0c876a0e5b6ebd ]--- > > Fun, so this acpi-cpufreq muck uses smp_call_function*() stuff from > suspend hooks, which are ran with IRQs disabled. > > Now I suspect that in this case we're calling to the current cpu, why > exactly its passing full cpu-masks around to indicate a single cpu is a > mystery to me though. > > Also, cpufreq_suspend() looks very suspicious for calling an SRCU > notifier chain from something that's supposed to run with IRQs disabled. > > I'm not seeing why we should do that whole get_freq() compare, notify > change junk on suspend, resume should simply query the hardware when its > getting back up and fix-up the state. This came up a week or so ago. When I investigated where that code came from, I found it was added to work around a problem on ppc powerbooks. x86 doesn't need to be running any of that suspend/resume code at all afaics. Dave
Hello, I can't still resume with .31-rc4. What should I do next ? Is there a way to apply this patch on .30 ?
Created attachment 22500 [details] Add the suspend/resume flag to aovid smp call in course of cpufreq_suspend/resume
Hi, Cedric Will you please try the debug patch on the 2.6.30-rc4? As the intel cpu is used on this box, the powernow-k8 is removed from this patch. Thanks.
Hello, I've tested the patch and no more WARNING on UP config. Should I test on SMP too ?
If there is no warning message again, it is unnecessary to test it on SMP. thanks for your test.
From the test it seems that there is no warning message again after applying the patch in comment #22. I will try to push it. But I am sure whether it can hit the upstream kernel. As it is related with the cpufreq driver, it will be moved to power management/cpufreq category. Thanks.
Is this still a problem in a recent kernel?
It's great that kernel bugzilla is back. Cedric, can you please verify if the problem still exists in the latest upstream kernel?
After several sleep down/wake up loops I didn't see it anymore. I'm on 3.2.0 for the moment.
Okay. I'll close this bug for now as it is not reproducible. Please feel free to reopen it in case you can reproduce it again.