Bug 15364

Summary: Linux kernel prevents CPU cores from entering C3/C6 power savings states
Product: Process Management
Reporter: Artem S. Tashkinov (aros)
Component: Scheduler
Assignee: Ingo Molnar (mingo)
Status: RESOLVED CODE_FIX
Severity: high
CC: acpi_power-processor, akpm, arjan, arjan, lenb
Priority: P1
Hardware: All
OS: Linux
Kernel Version: All known
Subsystem:
Regression: No
Bisected commit-id:

Description Artem S. Tashkinov 2010-02-21 10:02:06 UTC
According to the i7z utility (http://code.google.com/p/i7z/), the cores of an Intel Core i5 CPU spend only about 1% of the time in the C6 state, because the Linux kernel runs tasks on all cores even when the system is idle.

From i7z output:

...

Current Freqs
True Frequency 1241.64 MHz (Intel specifies largest of below to be running Freq)
        Processor  :Actual Freq (Mult.)  C0%   Halt(C1)%  C3 %   C6 %
        Processor 1:  1241.60 (9.00x)   3.63    3.46    94.2       1
        Processor 2:  1241.64 (9.00x)   1.94    98.3       0       1

...

I firmly believe that power savings could be improved considerably if the Linux kernel task scheduler did not rotate tasks across all available cores.
Comment 1 Arjan van de Ven 2010-02-24 23:07:49 UTC
The scheduler comment is a red herring; the scheduler does not affect C-state selection at all.

Can we get dmesg output as well as powertop -d output for this machine?
We've seen this kind of thing before, either where the BIOS gives incorrect information to the OS or where Linux was a bit picky in its correctness checks.
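
As a cross-check alongside i7z and powertop, the idle states the kernel actually registered can be read from the cpuidle sysfs counters. The following is only a minimal sketch, assuming the kernel exposes /sys/devices/system/cpu/cpuN/cpuidle/stateM/name and .../time (cumulative microseconds); the function names are made up for illustration and are not part of any tool mentioned in this report.

```python
from pathlib import Path


def read_cpuidle_times(cpu=0, root="/sys/devices/system/cpu"):
    """Return {state name: cumulative microseconds} for one CPU.

    Assumes the cpuidle sysfs layout cpuN/cpuidle/stateM/{name,time}.
    """
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    times = {}
    for state_dir in sorted(base.glob("state*")):
        name = (state_dir / "name").read_text().strip()
        times[name] = int((state_dir / "time").read_text())
    return times


def residency_percent(before, after):
    """Share of *idle* time spent in each state between two snapshots.

    C0 (running) time is not counted by cpuidle, so these figures are
    percentages of idle time, not of wall-clock time.
    """
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

To use it, take one snapshot with read_cpuidle_times(0), wait a few seconds on an idle system, take a second snapshot, and pass both to residency_percent. If a deep state is missing from the result entirely, the kernel never registered it, which would point toward the BIOS/ACPI tables rather than the scheduler.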
Comment 2 Artem S. Tashkinov 2010-02-25 10:47:01 UTC
powertop
PowerTOP 1.11   (C) 2007, 2008 Intel Corporation

Collecting data for 5 seconds


Your CPU supports the following C-states : C1 C2 C3
Your BIOS reports the following C-states : C1 C2 C3
_______________________________________________________________________________

     PowerTOP version 1.11      (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 1.7%)         3.21 Ghz    98.3%
polling           0.0ms ( 0.0%)         3.21 Ghz     0.0%
C1 mwait          0.0ms ( 0.0%)         3.07 Ghz     0.0%
C2 mwait          8.4ms (92.8%)         2.81 Ghz     0.8%
C3 mwait          3.4ms ( 5.6%)         2.27 Ghz     0.8%

Wakeups-from-idle per second : 127.9    interval: 15.0s
no ACPI power usage estimate available

Top causes for wakeups:
  81.5% (181.7)     <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
   4.6% ( 10.2)         knetstats : hrtimer_start_range_ns (hrtimer_wakeup)
   2.9% (  6.4)             artsd : hrtimer_start_range_ns (hrtimer_wakeup)
   1.9% (  4.2)     <kernel core> : hrtimer_start (tick_sched_timer)
   1.3% (  3.0)            kicker : hrtimer_start_range_ns (hrtimer_wakeup)
   1.0% (  2.1)           gkrellm : hrtimer_start_range_ns (hrtimer_wakeup)
   0.9% (  2.0)     <kernel core> : clocksource_watchdog (clocksource_watchdog)
   0.6% (  1.3)      <kernel IPI> : TLB shootdowns
   0.4% (  1.0)       <interrupt> : nvidia
   0.4% (  1.0)              kwin : hrtimer_start_range_ns (hrtimer_wakeup)
   0.4% (  1.0)          kwrapper : hrtimer_start_range_ns (hrtimer_wakeup)
   0.4% (  1.0)           klipper : hrtimer_start_range_ns (hrtimer_wakeup)
   0.4% (  1.0)             artsd : hrtimer_start (it_real_fn)
   0.4% (  1.0)     <kernel core> : nv_kern_rc_timer (nv_kern_rc_timer)
   0.4% (  1.0)   vmware-usbarbit : hrtimer_start_range_ns (hrtimer_wakeup)
   0.3% (  0.7)     <kernel core> : dev_watchdog (dev_watchdog)
   0.3% (  0.6)       <interrupt> : eth1
   0.3% (  0.6)          kdesktop : hrtimer_start_range_ns (hrtimer_wakeup)
   0.2% (  0.5)     <kernel core> : e100_watchdog (e100_watchdog)
   0.2% (  0.4)      <kernel IPI> : Function call interrupts
   0.2% (  0.4)           knotify : hrtimer_start_range_ns (hrtimer_wakeup)
   0.1% (  0.2)       <interrupt> : hda_intel
   0.1% (  0.2)              kded : hrtimer_start_range_ns (hrtimer_wakeup)
   0.1% (  0.2)       bdi-default : bdi_forker_task (process_timeout)
   0.1% (  0.2)         flush-8:0 : schedule_timeout_interruptible (process_timeout)
   0.1% (  0.2)     <kernel core> : __enqueue_rt_entity (sched_rt_period_timer)
   0.1% (  0.1)              lisa : hrtimer_start_range_ns (hrtimer_wakeup)
   0.1% (  0.1)         klauncher : hrtimer_start_range_ns (hrtimer_wakeup)
   0.1% (  0.1)         kkbswitch : hrtimer_start_range_ns (hrtimer_wakeup)
   0.1% (  0.1)     <kernel core> : arm_supers_timer (sync_supers_timer_fn)
   0.0% (  0.1)      <kernel IPI> : Rescheduling interrupts
   0.0% (  0.1)         hd-audio1 : schedule_timeout_uninterruptible (process_timeout)
   0.0% (  0.1)              hald : hrtimer_start_range_ns (hrtimer_wakeup)
   0.0% (  0.1)             pppoe : queue_delayed_work (delayed_work_timer_fn)
   0.0% (  0.1)              pppd : queue_delayed_work (delayed_work_timer_fn)
Comment 3 Artem S. Tashkinov 2010-02-25 10:51:00 UTC
Oops, I forgot the -d flag.

Here's the right output:

[root@localhost ~]# powertop -d
PowerTOP 1.11   (C) 2007, 2008 Intel Corporation

Collecting data for 15 seconds


Your CPU supports the following C-states : C1 C2 C3
Your BIOS reports the following C-states : C1 C2 C3
Cn                Avg residency
C0 (cpu running)        ( 0.6%)
polling           0.0ms ( 0.0%)
C1 mwait          0.0ms ( 0.0%)
C2 mwait          8.4ms (94.6%)
C3 mwait          2.9ms ( 4.8%)
P-states (frequencies)
  3.21 Ghz    46.7%
  2.81 Ghz    13.3%
  2.67 Ghz    13.3%
  2.54 Ghz     6.7%
  1.60 Ghz    13.3%
Wakeups-from-idle per second : 130.0    interval: 15.0s
no ACPI power usage estimate available
Top causes for wakeups:
  75.2% (150.5)     <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
   5.1% ( 10.3)         knetstats : hrtimer_start_range_ns (hrtimer_wakeup)
   3.2% (  6.5)             artsd : hrtimer_start_range_ns (hrtimer_wakeup)
   2.8% (  5.5)       firefox-bin : hrtimer_start_range_ns (hrtimer_wakeup)
   2.2% (  4.3)     <kernel core> : hrtimer_start (tick_sched_timer)
   1.5% (  3.0)            kicker : hrtimer_start_range_ns (hrtimer_wakeup)
   1.0% (  2.1)           gkrellm : hrtimer_start_range_ns (hrtimer_wakeup)
   1.0% (  2.0)     <kernel core> : clocksource_watchdog (clocksource_watchdog)
   0.9% (  1.8)       <interrupt> : eth1
   0.7% (  1.5)      <kernel IPI> : TLB shootdowns
   0.6% (  1.1)             pppoe : queue_delayed_work (delayed_work_timer_fn)
   0.5% (  1.0)       <interrupt> : nvidia
   0.5% (  1.0)     <kernel core> : nv_kern_rc_timer (nv_kern_rc_timer)
   0.5% (  1.0)   vmware-usbarbit : hrtimer_start_range_ns (hrtimer_wakeup)
   0.5% (  1.0)          kwrapper : hrtimer_start_range_ns (hrtimer_wakeup)
   0.5% (  1.0)           klipper : hrtimer_start_range_ns (hrtimer_wakeup)
   0.5% (  1.0)             artsd : hrtimer_start (it_real_fn)
   0.5% (  1.0)              kwin : hrtimer_start_range_ns (hrtimer_wakeup)
   0.3% (  0.7)     <kernel core> : dev_watchdog (dev_watchdog)
   0.3% (  0.6)          kdesktop : hrtimer_start_range_ns (hrtimer_wakeup)
   0.3% (  0.5)     <kernel core> : e100_watchdog (e100_watchdog)
   0.2% (  0.4)      <kernel IPI> : Function call interrupts
   0.2% (  0.4)           knotify : hrtimer_start_range_ns (hrtimer_wakeup)
   0.1% (  0.3)     <kernel core> : __enqueue_rt_entity (sched_rt_period_timer)
   0.1% (  0.2)     <kernel core> : arm_supers_timer (sync_supers_timer_fn)
   0.1% (  0.2)              kded : hrtimer_start_range_ns (hrtimer_wakeup)
   0.1% (  0.1)              lisa : hrtimer_start_range_ns (hrtimer_wakeup)
   0.1% (  0.1)       bdi-default : bdi_forker_task (process_timeout)
   0.1% (  0.1)         flush-8:0 : schedule_timeout_interruptible (process_timeout)
   0.0% (  0.1)       <interrupt> : PS/2 keyboard/mouse/touchpad
   0.0% (  0.1)       <interrupt> : ata_piix
   0.0% (  0.1)      <kernel IPI> : Rescheduling interrupts
   0.0% (  0.1)           konsole : hrtimer_start_range_ns (hrtimer_wakeup)
   0.0% (  0.1)                 X : hrtimer_start_range_ns (hrtimer_wakeup)
   0.0% (  0.1)         flush-8:0 : add_timer (commit_timeout)
   0.0% (  0.1)     <kernel core> : cfq_completed_request (cfq_idle_slice_timer)
   0.0% (  0.1)         flush-8:0 : blk_add_timer (blk_rq_timed_out_timer)
   0.0% (  0.1)              pppd : queue_delayed_work (delayed_work_timer_fn)
   0.0% (  0.1)          events/0 : queue_delayed_work (delayed_work_timer_fn)
   0.0% (  0.1)         klauncher : hrtimer_start_range_ns (hrtimer_wakeup)
   0.0% (  0.1)         kkbswitch : hrtimer_start_range_ns (hrtimer_wakeup)

A USB device is active 100.0% of the time:
/sys/bus/usb/devices/2-1

Suggestion: Enable USB autosuspend by pressing the U key or adding
usbcore.autosuspend=1 to the kernel command line in the grub config

Suggestion: increase the VM dirty writeback time from 5.00 to 15 seconds with:
  echo 1500 > /proc/sys/vm/dirty_writeback_centisecs
This wakes the disk up less frequently for background VM activity

Suggestion: enable the power aware CPU scheduler with the following command:
  echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
or by pressing the C key.

Suggestion: Enable the CONFIG_ACPI_BATTERY kernel configuration option.
 This option is required to get power estimages from PowerTOP

Recent USB suspend statistics
Active  Device name
100.0%  USB device 2-1.6 : USB Gaming Mouse (Logitech)
100.0%  /sys/bus/usb/devices/2-1
  0.0%  /sys/bus/usb/devices/1-1
100.0%  USB device usb2 : EHCI Host Controller (Linux 2.6.33-icd ehci_hcd)
  0.0%  USB device usb1 : EHCI Host Controller (Linux 2.6.33-icd ehci_hcd)
Comment 4 Artem S. Tashkinov 2010-11-26 11:25:04 UTC
This seems to be fixed in 2.6.36.x (or possibly in earlier versions that I haven't tested), so I'm closing the bug.