Bug 118751 - Laptop CPU cores stuck at minimal 800MHz on AC
Summary: Laptop CPU cores stuck at minimal 800MHz on AC
Status: CLOSED OBSOLETE
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_pstate
Hardware: All Linux
Importance: P1 normal
Assignee: Chen Yu
 
Reported: 2016-05-23 10:23 UTC by Marcin Nowak
Modified: 2016-12-22 01:51 UTC
Kernel Version: 4.6.0-1
Tree: Mainline
Regression: No


Attachments
Kernel config (43.53 KB, application/gzip)
2016-05-31 07:11 UTC, Marcin Nowak
trace output when stuck (1.02 MB, text/plain)
2016-05-31 07:23 UTC, Marcin Nowak
trace output after cpu unlock (4.53 MB, text/plain)
2016-05-31 07:24 UTC, Marcin Nowak
stress+stuck trace (1.21 MB, text/x-log)
2016-05-31 07:41 UTC, Marcin Nowak
stress+unstuck trace (702.42 KB, text/x-log)
2016-05-31 07:42 UTC, Marcin Nowak
Turbostat started when cpu was stuck, then I reconnected AC during capture (9.50 KB, text/x-log)
2016-05-31 12:52 UTC, Marcin Nowak
Turbostat when unstuck (5.57 KB, text/x-log)
2016-05-31 12:52 UTC, Marcin Nowak
turbostat (torvalds/linux@master) results when stuck (5.61 KB, text/x-log)
2016-05-31 13:20 UTC, Marcin Nowak
turbostat (torvalds/linux@master) results when unstuck (6.39 KB, text/x-log)
2016-05-31 13:21 UTC, Marcin Nowak
Stuck: sleep+stress (1.13 MB, text/x-log)
2016-06-02 08:07 UTC, Marcin Nowak
Unstuck: sleep+stress trace (1001.42 KB, text/x-log)
2016-06-02 08:08 UTC, Marcin Nowak

Description Marcin Nowak 2016-05-23 10:23:47 UTC
When connecting AC, the CPU slows down to minimum:

$ cat /proc/cpuinfo | grep MHz
cpu MHz         : 799.945
cpu MHz         : 799.945
cpu MHz         : 799.945
cpu MHz         : 799.945

Stressing the CPUs with `stress -c 4` makes no difference: they stay stuck at the minimum.

But when I disconnect the power, the CPU immediately unlocks:

cpu MHz         : 2499.960
cpu MHz         : 2499.960
cpu MHz         : 2499.960
cpu MHz         : 2499.960

After reconnecting AC, the CPU clock drops back to 799.945 MHz within a few seconds (the stress command is still running).

Sometimes I can "force" the kernel to work properly on AC by some magical (quite fast) disconnecting and reconnecting of AC power.

This is a huge problem with 4.x kernel versions, and I have been having trouble for months. I'm pretty sure that some of the first 4.x releases were free from this issue (my hardware configuration has not changed).


I've completely removed TLP from my OS to be sure that this is a kernel problem.


$ uname -a
Linux draco 4.6.0-1-MANJARO #1 SMP PREEMPT Mon May 16 02:44:59 UTC 2016 x86_64 GNU/Linux


Part of dmidecode output:

BIOS Information
	Vendor: Dell Inc.
	Version: A05
	Release Date: 01/03/2013
	Address: 0xE0000
	Runtime Size: 128 kB
	ROM Size: 4608 kB
	Characteristics:
		PCI is supported
		PNP is supported
		BIOS is upgradeable
		BIOS shadowing is allowed
		Boot from CD is supported
		Selectable boot is supported
		EDD is supported
		Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
		Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
		5.25"/360 kB floppy services are supported (int 13h)
		5.25"/1.2 MB floppy services are supported (int 13h)
		3.5"/720 kB floppy services are supported (int 13h)
		3.5"/2.88 MB floppy services are supported (int 13h)
		Print screen service is supported (int 5h)
		8042 keyboard services are supported (int 9h)
		Serial services are supported (int 14h)
		Printer services are supported (int 17h)
		CGA/mono video services are supported (int 10h)
		ACPI is supported
		USB legacy is supported
		Smart battery is supported
		BIOS boot specification is supported
		Function key-initiated network boot is supported
		Targeted content distribution is supported
		UEFI is supported
	Firmware Revision: 1.1

Handle 0x0001, DMI type 1, 27 bytes
System Information
	Manufacturer: Dell Inc.
	Product Name: Inspiron 5521
	Version: A05
	Serial Number: [...]
	UUID: [...]
	Wake-up Type: Power Switch


$ grep "model name" /proc/cpuinfo 
model name	: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz
model name	: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz
model name	: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz
model name	: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz
Comment 1 Marcin Nowak 2016-05-29 11:48:31 UTC
Sometimes the battery isn't charging, even though upower reports otherwise:

$ upower -i /org/freedesktop/UPower/devices/battery_BAT1
  native-path:          BAT1
  vendor:               SANYO
  model:                DELL 6HY59349
  serial:               09CE
  power supply:         yes
  updated:              nie, 29 maj 2016, 13:41:00 (20 seconds ago)
  has history:          yes
  has statistics:       yes
  battery
    present:             yes
    rechargeable:        yes
    state:               charging
    warning-level:       none
    energy:              9,3684 Wh
    energy-empty:        0 Wh
    energy-full:         46,4868 Wh
    energy-full-design:  66,6 Wh
    energy-rate:         13,0425 W
    voltage:             11,316 V
    time to full:        2,8 hours
    percentage:          20%
    capacity:            69,8%
    icon-name:          'battery-low-charging-symbolic'
  History (rate):
    1464522055	13,043	charging

The percentage has been stuck at 20% for almost an hour.
The CPU is also stuck at 800MHz.
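
The mismatch between the reported charge rate and the frozen percentage can be checked with simple arithmetic. The sketch below is only an illustration: the helper names are made up here, the two input values are taken from the upower output above (which uses a decimal comma), and it assumes the charge rate stays constant for the whole hour.

```python
def parse_decimal_comma(text: str) -> float:
    """Parse a number printed with a decimal comma, e.g. '13,0425'."""
    return float(text.replace(",", "."))


def expected_gain_pct(rate_w: float, full_wh: float, hours: float) -> float:
    """Percentage points the battery should gain at a constant charge rate."""
    return 100.0 * rate_w * hours / full_wh


rate = parse_decimal_comma("13,0425")  # energy-rate in W, from upower
full = parse_decimal_comma("46,4868")  # energy-full in Wh, from upower

# If the reported rate were real, the percentage should not stay at 20%.
print(f"expected gain: {expected_gain_pct(rate, full, 1.0):.1f} points per hour")
```

At the reported 13 W the battery should gain roughly 28 percentage points in an hour, so a reading that stays at 20% suggests the "charging" state does not reflect what is actually happening.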


When I reconnect AC power a few times, the CPU unlocks and the laptop starts charging.

AC reconnection sequence:

  History (rate):
    1464522339	12,210	unknown
    1464522338	13,575	discharging
    1464522326	16,139	charging
    1464522325	11,844	discharging
    1464522323	15,207	discharging
    1464522322	12,354	discharging
    1464522314	2,808	charging
    1464522313	11,400	charging
    1464522312	11,566	charging
    1464522311	12,354	charging
    1464522309	14,096	discharging
    1464522302	15,107	charging
    1464522293	17,778	discharging
    1464522282	5,594	charging
    1464522281	23,044	discharging
    1464522274	13,942	charging

Upower BAT after unlock:

$ upower -i /org/freedesktop/UPower/devices/battery_BAT1
  native-path:          BAT1
  vendor:               SANYO
  model:                DELL 6HY59349
  serial:               09CE
  power supply:         yes
  updated:              nie, 29 maj 2016, 13:47:40 (5 seconds ago)
  has history:          yes
  has statistics:       yes
  battery
    present:             yes
    rechargeable:        yes
    state:               charging
    warning-level:       none
    energy:              10,3119 Wh
    energy-empty:        0 Wh
    energy-full:         46,4868 Wh
    energy-full-design:  66,6 Wh
    energy-rate:         23,0325 W
    voltage:             11,728 V
    time to full:        1,6 hours
    percentage:          22%
    capacity:            69,8%
    icon-name:          'battery-low-charging-symbolic'
  History (charge):
    1464522460	22,000	charging
  History (rate):
    1464522460	23,033	charging
Comment 2 Chen Yu 2016-05-31 05:59:55 UTC
Interesting...
Could you attach your kernel config?

sudo grep . /sys/devices/system/cpu/cpu1/cpufreq/*
sudo grep . /sys/devices/system/cpu/intel_pstate/

and 
# cd /sys/kernel/debug/tracing/
# echo 1 > events/power/pstate_sample/enable
# echo 1 > events/power/cpu_frequency/enable
# cat trace // also attach here
Comment 3 Chen Yu 2016-05-31 06:01:23 UTC
(In reply to Chen Yu from comment #2)
> Interesting...
> could you attach your kernel config?
> 
> sudo grep . /sys/devices/system/cpu/cpu1/cpufreq/*
> sudo grep . /sys/devices/system/cpu/intel_pstate/
> 
> and 
> # cd /sys/kernel/debug/tracing/
> # echo 1 > events/power/pstate_sample/enable
> # echo 1 > events/power/cpu_frequency/enable
> # cat trace // also attach here

sudo grep . /sys/devices/system/cpu/intel_pstate/*
and please provide the 'cat trace' output both before and after the problem is reproduced
Comment 4 Marcin Nowak 2016-05-31 07:11:27 UTC
Created attachment 218311
Kernel config
Comment 5 Marcin Nowak 2016-05-31 07:22:12 UTC
Thank you for your concern.

When stuck:

$ sudo grep . /sys/devices/system/cpu/cpu1/cpufreq/*

/sys/devices/system/cpu/cpu1/cpufreq/affected_cpus:1
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_cur_freq:799945
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq:2700000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_min_freq:800000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_transition_latency:4294967295
/sys/devices/system/cpu/cpu1/cpufreq/related_cpus:1
/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors:performance powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:799945
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:2700000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:800000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed:<unsupported>

$ sudo grep . /sys/devices/system/cpu/intel_pstate/*

/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:29
/sys/devices/system/cpu/intel_pstate/no_turbo:0
/sys/devices/system/cpu/intel_pstate/num_pstates:20
/sys/devices/system/cpu/intel_pstate/turbo_pct:50


When unstuck:

$ sudo grep . /sys/devices/system/cpu/cpu1/cpufreq/*

/sys/devices/system/cpu/cpu1/cpufreq/affected_cpus:1
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_cur_freq:864984
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq:2700000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_min_freq:800000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_transition_latency:4294967295
/sys/devices/system/cpu/cpu1/cpufreq/related_cpus:1
/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors:performance powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:1003710
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:2700000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:800000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed:<unsupported>


$ sudo grep . /sys/devices/system/cpu/intel_pstate/*

/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:29
/sys/devices/system/cpu/intel_pstate/no_turbo:0
/sys/devices/system/cpu/intel_pstate/num_pstates:20
/sys/devices/system/cpu/intel_pstate/turbo_pct:50
Comment 6 Marcin Nowak 2016-05-31 07:23:03 UTC
Created attachment 218321
trace output when stuck
Comment 7 Marcin Nowak 2016-05-31 07:24:33 UTC
Created attachment 218331
trace output after cpu unlock
Comment 8 Chen Yu 2016-05-31 07:30:43 UTC
Please run 'stress -c 4' and repeat the same test. Most of the trace data was sampled while the CPUs were idle.
Comment 9 Marcin Nowak 2016-05-31 07:34:53 UTC
Should I clear the trace somehow? It outputs a 16M file at the moment...
Comment 10 Chen Yu 2016-05-31 07:37:23 UTC
(In reply to Marcin Nowak from comment #9)
> should I clean trace somehow? it outputs 16M file for now...

Yes, please echo NULL > trace to clear it.

stuck:
stress -c 4
echo NULL > trace
cat trace > stuck.log

unstuck:
stress -c 4
echo NULL > trace
cat trace > unstuck.log
Comment 11 Marcin Nowak 2016-05-31 07:41:49 UTC
Created attachment 218341
stress+stuck trace
Comment 12 Marcin Nowak 2016-05-31 07:42:21 UTC
Created attachment 218351
stress+unstuck trace
Comment 13 Chen Yu 2016-05-31 10:08:36 UTC
Please provide the following info both when stuck and when unstuck:

rdmsr 0x774 -p 0
rdmsr 0x774 -p 1
rdmsr 0x774 -p 2
rdmsr 0x774 -p 3
rdmsr 0x19c -p 0
rdmsr 0x19c -p 1
rdmsr 0x19c -p 2
rdmsr 0x19c -p 3


use latest turbostat (cd linux/tools/power/x86/turbostat/;make)
turbostat --debug -i 10




Is it reproducible if you boot with 'intel_pstate=disable'?
Comment 14 Chen Yu 2016-05-31 10:14:22 UTC
plus:
rdmsr 0x19a -p 0
rdmsr 0x19a -p 1
rdmsr 0x19a -p 2
rdmsr 0x19a -p 3
Comment 15 Marcin Nowak 2016-05-31 12:24:38 UTC
Unstuck:

# rdmsr 0x774 -p 0
rdmsr: CPU 0 cannot read MSR 0x00000774
# rdmsr 0x774 -p 1
rdmsr: CPU 1 cannot read MSR 0x00000774
# rdmsr 0x774 -p 2
rdmsr: CPU 2 cannot read MSR 0x00000774
# rdmsr 0x774 -p 3
rdmsr: CPU 3 cannot read MSR 0x00000774
# rdmsr 0x19c -p 0
88360008
# rdmsr 0x19c -p 1
88370008
# rdmsr 0x19c -p 2
88380008
# rdmsr 0x19c -p 3
88390008
# rdmsr 0x19a -p 0
8
# rdmsr 0x19a -p 1
8
# rdmsr 0x19a -p 2
8
# rdmsr 0x19a -p 3
8

Stuck:

# rdmsr 0x774 -p 0
rdmsr: CPU 0 cannot read MSR 0x00000774
# rdmsr 0x774 -p 1
rdmsr: CPU 1 cannot read MSR 0x00000774
# rdmsr 0x774 -p 2
rdmsr: CPU 2 cannot read MSR 0x00000774
# rdmsr 0x774 -p 3
rdmsr: CPU 3 cannot read MSR 0x00000774
# rdmsr 0x19c -p 0
8833000c
# rdmsr 0x19c -p 1
8834000c
# rdmsr 0x19c -p 2
8833000c
# rdmsr 0x19c -p 3
8833000c
# rdmsr 0x19a -p 0
8
# rdmsr 0x19a -p 1
8
# rdmsr 0x19a -p 2
8
# rdmsr 0x19a -p 3
8


Give me some time to test with the latest turbostat and with intel_pstate disabled. I'll try to deliver results today.
Comment 16 Marcin Nowak 2016-05-31 12:50:55 UTC
Until I get the newest turbostat, I'll post results from v4.12 (5 Apr 2016).
Comment 17 Marcin Nowak 2016-05-31 12:52:17 UTC
Created attachment 218381
Turbostat started when cpu was stuck, then I reconnected AC during capture
Comment 18 Marcin Nowak 2016-05-31 12:52:52 UTC
Created attachment 218391
Turbostat when unstuck
Comment 19 Marcin Nowak 2016-05-31 13:20:54 UTC
Created attachment 218401
turbostat (torvalds/linux@master) results when stuck
Comment 20 Marcin Nowak 2016-05-31 13:21:16 UTC
Created attachment 218411
turbostat (torvalds/linux@master) results when unstuck
Comment 21 Marcin Nowak 2016-05-31 14:08:41 UTC
After a few tests I can't reproduce the bug when intel_pstate is disabled (cpu scales well, just the max is limited to 1801.000 MHz) 

$ sudo grep . /sys/devices/system/cpu/cpu1/cpufreq/*
/sys/devices/system/cpu/cpu1/cpufreq/affected_cpus:1
/sys/devices/system/cpu/cpu1/cpufreq/bios_limit:1801000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_cur_freq:1801000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq:1801000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_min_freq:774000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_transition_latency:10000
/sys/devices/system/cpu/cpu1/cpufreq/freqdomain_cpus:0 1 2 3
/sys/devices/system/cpu/cpu1/cpufreq/related_cpus:1
/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_frequencies:1801000 1800000 1700000 1600000 1500000 1400000 1300000 1200000 1100000 1000000 900000 800000 774000 
/sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors:ondemand performance 
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:1801000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:acpi-cpufreq
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:ondemand
/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:1801000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:774000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed:<unsupported>

I'll work without intel_pstate for the next few days to be 1000% sure (but it looks like the issue is related to intel_pstate).
Comment 22 Marcin Nowak 2016-06-01 14:39:06 UTC
Today is the second day without intel_pstate, and I still can't reproduce the issue.

Should I check/test something else?
Comment 23 Chen Yu 2016-06-02 07:39:54 UTC
acpi-cpufreq works well because it does not rely on the feedback of
aperf/mperf as intel_pstate does.

BTW, bits 10-11 of MSR 0x19c are not set, which means RAPL is not restricting the frequency.
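
The bit check on MSR 0x19c can be reproduced from the values posted in comment 15. This is a minimal sketch, not a definitive decoder: the field names and bit positions follow my reading of the Intel SDM layout of IA32_THERM_STATUS and should be treated as assumptions, and the function name is hypothetical.

```python
def decode_therm_status(value: int) -> dict:
    """Decode selected IA32_THERM_STATUS (MSR 0x19c) fields.

    Bit positions are assumed from the Intel SDM; verify them for your CPU.
    """
    return {
        "prochot_status": bool((value >> 2) & 1),       # bit 2
        "prochot_log": bool((value >> 3) & 1),          # bit 3
        "power_limit_status": bool((value >> 10) & 1),  # bit 10
        "power_limit_log": bool((value >> 11) & 1),     # bit 11
        "readout_below_tjmax": (value >> 16) & 0x7F,    # bits 22:16
        "reading_valid": bool((value >> 31) & 1),       # bit 31
    }


# CPU 0 values posted in comment 15.
stuck = decode_therm_status(0x8833000C)
unstuck = decode_therm_status(0x88360008)

for name, fields in (("stuck", stuck), ("unstuck", unstuck)):
    print(name, fields)
```

Under these assumptions, bits 10 and 11 are clear in both states, which matches the statement above. The low nibble does differ between the two readings (0xc versus 0x8), so bit 2 is set only in the stuck sample. That is a hint worth checking rather than a diagnosis.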

According to the trace log in comment #11, aperf increments very slowly compared to the unstuck case:
stress-21785 [000] d.h2 195679.724800: pstate_sample: core_busy=44 scaled=88 from=9 to=8 mperf=17959383 aperf=7981962 tsc=17959365 freq=799945 
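
The core_busy and freq fields in this sample follow from the aperf/mperf ratio. The sketch below reproduces them under stated assumptions: 8 fractional bits of fixed-point precision (my reading of the driver of that era, not confirmed in this thread), a maximum non-turbo ratio of 18 (the CPU is rated at 1.80GHz), and a 100000 kHz step per ratio. The function name is hypothetical.

```python
FRAC_BITS = 8  # assumed fixed-point precision of the driver's arithmetic


def calc_sample(aperf: int, mperf: int,
                max_ratio: int = 18, scaling_khz: int = 100_000):
    """Return (core_busy percent, effective frequency in kHz)."""
    # Percentage of aperf relative to mperf, kept in fixed point.
    core_pct_fp = (aperf * (100 << FRAC_BITS)) // mperf
    core_busy = core_pct_fp >> FRAC_BITS
    # Scale the maximum non-turbo frequency by that percentage.
    freq_khz = ((max_ratio * scaling_khz // 100) * core_pct_fp) >> FRAC_BITS
    return core_busy, freq_khz


# Values from the pstate_sample line above.
print(calc_sample(aperf=7981962, mperf=17959383))  # (44, 799945)
```

With these assumptions the result matches core_busy=44 and freq=799945 from the trace. In other words, the driver is only reporting what the counters show: aperf advances at about 44% of the mperf rate even under full load.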

could you adjust the test sequence to the following:

stuck:
cd /sys/kernel/debug/tracing/
echo NULL > trace
echo 1 > events/power/pstate_sample/enable
echo 1 > events/power/cpu_frequency/enable
sleep 5
stress -c 4
cat trace > stuck_all.log



unstuck:
cd /sys/kernel/debug/tracing/
echo NULL > trace
echo 1 > events/power/pstate_sample/enable
echo 1 > events/power/cpu_frequency/enable
sleep 5
stress -c 4
cat trace > unstuck_all.log
Comment 24 Chen Yu 2016-06-02 07:42:27 UTC
Please stop 'stress' each time before you retest with AC plugged/unplugged, since I want to see the full process of how the CPU frequency ramps up in the unstuck situation.
Comment 25 Marcin Nowak 2016-06-02 08:06:12 UTC
> acpi-cpufreq works well because it does not rely on the feedback of
> aperf/mperf as intel_pstate does.

I'm still thinking about the issue. What is strange to me is that on battery the CPU works at full speed, but when I connect AC it slows down to the minimum. Shouldn't it be the opposite? The CPU scales well but is sometimes locked to the minimum on AC.

To be clear: "stuck" means 800MHz on AC and full speed on battery. "Unstuck" means full speed on AC and still full speed after disconnecting AC. I need just one AC connect to get from "unstuck" to "stuck", but escaping from "stuck" to "unstuck" requires a few AC reconnects (sometimes fewer, sometimes more).

I'm attaching fresh logs... I've tried to do tests the way you've asked.
Comment 26 Marcin Nowak 2016-06-02 08:07:43 UTC
Created attachment 218771
Stuck: sleep+stress
Comment 27 Marcin Nowak 2016-06-02 08:08:12 UTC
Created attachment 218781
Unstuck: sleep+stress trace
Comment 28 Marcin Nowak 2016-06-02 08:25:40 UTC
I can also reproduce on Linux 3.18.34-1-MANJARO #1 SMP PREEMPT, but I can't remember which version was okay... 

Maybe the root of the problem lies elsewhere?
Comment 29 Chen Yu 2016-06-02 11:35:55 UTC
Then this is not a regression. It's weird that the aperf increment is so small. Besides thermal monitor/clock modulation, the other component that might affect aperf is HDC. What are the values of the following when stuck/unstuck:

rdmsr 0xdb0 -p 0
rdmsr 0xdb1 -p 0
rdmsr 0xdb2 -p 0
rdmsr 0x652 -p 0
rdmsr 0x653 -p 0
rdmsr 0x655 -p 0
rdmsr 0x656 -p 0


rdmsr 0xdb0 -p 1
rdmsr 0xdb1 -p 1
rdmsr 0xdb2 -p 1
rdmsr 0x652 -p 1
rdmsr 0x653 -p 1
rdmsr 0x655 -p 1
rdmsr 0x656 -p 1


rdmsr 0xdb0 -p 2
rdmsr 0xdb1 -p 2
rdmsr 0xdb2 -p 2
rdmsr 0x652 -p 2
rdmsr 0x653 -p 2
rdmsr 0x655 -p 2
rdmsr 0x656 -p 2


rdmsr 0xdb0 -p 3
rdmsr 0xdb1 -p 3
rdmsr 0xdb2 -p 3
rdmsr 0x652 -p 3
rdmsr 0x653 -p 3
rdmsr 0x655 -p 3
rdmsr 0x656 -p 3

Also + Srinivas and Rafael
Comment 30 Marcin Nowak 2016-06-02 11:46:30 UTC
For every call I've got:

rdmsr: CPU <number> cannot read MSR <hex>


(while stuck and unstuck)
Comment 31 Marcin Nowak 2016-06-02 12:54:17 UTC
I'm wondering about these failed reads. Maybe I should run the tests on a fresh install of some distro booted from a USB stick? Feel free to ask for anything, I'd like to help however I can...
Comment 32 Chen Yu 2016-06-02 14:19:24 UTC
(In reply to Marcin Nowak from comment #30)
> For every call I've got:
> 
> rdmsr: CPU <number> cannot read MSR <hex>
> 
> 
> (while stuck and unstuck)
I guess your cpu does not support HDC (Ivy Bridge), so HDC is not the cause of this problem.

I have no idea why aperf increases so slowly when AC is plugged in. @Rafael @Srinivas, do you have any clue about this? Thanks.
Comment 33 Srinivas Pandruvada 2016-06-02 18:13:39 UTC
With AC connected and with AC disconnected, please provide a dump of the following:

rdmsr 0xCE
rdmsr 0x1AD

rdmsr 0x610
rdmsr 0x614

rdmsr 0x648
rdmsr 0x649
rdmsr 0x64A
rdmsr 0x64B
rdmsr 0x64C
Comment 34 Marcin Nowak 2016-06-02 19:45:01 UTC
AC Connected / unstuck:
-----------------------

rdmsr 0xCE: 80813e0011200
rdmsr 0x1AD: 1919191b

rdmsr 0x610: 800080aa00dc8088
rdmsr 0x614: 88

rdmsr 0x648: 12
rdmsr 0x649: 80070
rdmsr 0x64A: 0
rdmsr 0x64B: 0
rdmsr 0x64C: 11


AC disconnected:
----------------

rdmsr 0xCE: 80813e0011200
rdmsr 0x1AD: 1919191b

rdmsr 0x610: 800080aa00dc8088
rdmsr 0x614: 88

rdmsr 0x648: 12
rdmsr 0x649: 80070
rdmsr 0x64A: 0
rdmsr 0x64B: 0
rdmsr 0x64C: 11


AC connected / stuck:
---------------------

rdmsr 0xCE: 80813e0011200
rdmsr 0x1AD: 1919191b

rdmsr 0x610: 800080aa00dc8088
rdmsr 0x614: 88

rdmsr 0x648: 12
rdmsr 0x649: 80070
rdmsr 0x64A: 0
rdmsr 0x64B: 0
rdmsr 0x64C: 11
Comment 35 Srinivas Pandruvada 2016-06-02 20:49:17 UTC
These values are not different for connected vs unconnected. So we are not TDP limited.
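
For reference, the ratio fields in two of these registers can be decoded from the posted values. This is a hedged sketch: the field positions are taken from my reading of the Intel SDM for MSR_PLATFORM_INFO (0xCE) and MSR_TURBO_RATIO_LIMIT (0x1AD) and are assumptions here, as is the 100 MHz bus clock. The function names are hypothetical.

```python
BCLK_MHZ = 100  # assumed bus clock


def decode_platform_info(value: int) -> dict:
    """Ratio fields of MSR 0xCE (positions assumed from the Intel SDM)."""
    return {
        "max_non_turbo_ratio": (value >> 8) & 0xFF,    # bits 15:8
        "max_efficiency_ratio": (value >> 40) & 0xFF,  # bits 47:40
    }


def decode_turbo_ratio_limit(value: int, cores: int = 4) -> list:
    """Max turbo ratio with 1..cores active cores, one byte each (MSR 0x1AD)."""
    return [(value >> (8 * i)) & 0xFF for i in range(cores)]


info = decode_platform_info(0x80813E0011200)
turbo = decode_turbo_ratio_limit(0x1919191B)

print({name: ratio * BCLK_MHZ for name, ratio in info.items()})
print([ratio * BCLK_MHZ for ratio in turbo])
```

Under these assumptions the registers describe an 800 MHz minimum, a 1800 MHz non-turbo maximum, and turbo limits of 2700 MHz (one core) and 2500 MHz (two or more cores). Those numbers agree with the cpuinfo_min_freq, cpuinfo_max_freq and ~2500 MHz readings reported earlier, and they are the same in all three states.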
Comment 37 Marcin Nowak 2016-06-17 10:38:45 UTC
I'm currently using Linux draco 4.6.2-1-MANJARO #1 SMP PREEMPT Wed Jun 8 11:00:08 UTC 2016 x86_64 GNU/Linux.

The problem still exists.

Today the issue is correlated with the inability to charge the battery, as mentioned in comment #1. Going from stuck to unstuck is also much harder (I haven't been able to "unlock" it for 15 minutes now...). Still, when working on battery, the CPU scales well.

After some reconnects it looks like the battery started charging. 

* * *

I've tried booting from ArchLinux iso installer image, i.e. using very clean and lightweight linux image, and the issue exists, too. So it looks like it is not related to my installed OS. 


* * *

If I should provide more info, please tell me what I can do.
Comment 38 Marcin Nowak 2016-06-17 10:59:50 UTC
Erratum:

The battery is not charging, or is charging very, very, very slowly. It has been at 10% for half an hour.
Comment 39 Doug Smythies 2016-06-21 15:37:10 UTC
Typically, the Dell BIOS forces a low CPU frequency when something is wrong with the power and/or it no longer recognizes the power adapter. However, such limiting should also apply when using the acpi-cpufreq CPU frequency driver.
Comment 40 Marcin Nowak 2016-06-21 15:40:44 UTC
Thank you, Doug. I'll replace the power adapter.
Comment 41 Doug Smythies 2016-06-30 19:26:00 UTC
(In reply to Marcin Nowak from comment #21)
> After a few tests I can't reproduce the bug when intel_pstate is disabled
> (cpu scales well, just the max is limited to 1801.000 MHz) 

Please note that when you are using the acpi-cpufreq CPU frequency scaling driver, the CPU frequencies listed are what was asked for and may not be what you are actually getting. Also, "1801" means somewhere in the turbo range, with no indication as to where. You have to use turbostat to know for sure. Below is an example from my computer, where max non-turbo = 3.4GHz, and a CPU burn program is running on CPU 7:

$ grep MHz /proc/cpuinfo
cpu MHz         : 1600.000
cpu MHz         : 1600.000
cpu MHz         : 1600.000
cpu MHz         : 1600.000
cpu MHz         : 1600.000
cpu MHz         : 1600.000
cpu MHz         : 1600.000
cpu MHz         : 3401.000

$ sudo turbostat sleep 10
CPUID(7): No-SGX
10.000828 sec
     CPU Avg_MHz   Busy% Bzy_MHz TSC_MHz
       -     472   12.53    3754    3411
       0       3    0.09    3631    3411
       4       1    0.02    3578    3411
       1       3    0.09    3585    3411
       5       1    0.04    3562    3411
       2       2    0.05    3573    3411
       6       1    0.03    3573    3411
       3       1    0.02    3737    3411
       7    3765   99.94    3754    3411

As a sanity check, I get my CPU burn programs to print something every so often, so that, once I have calibrated myself, I can calculate the approximate CPU frequency. Example:

doug@s15:~/c$ taskset -c 7 ./test1
Elapsed:    175.75 s.
Elapsed:    351.23 s.

Whereas using powersave mode to lock the CPU frequency at 1.6 GHz (for my processor), I get:

doug@s15:~/c$ taskset -c 7 ./test1
Elapsed:    409.56 s.

1600 MHz * 409.56 / 175.75 = 3729 MHz. Therefore the sanity check is O.K.
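
The sanity check above generalizes to any fixed-work benchmark: the elapsed time is inversely proportional to the clock frequency. A minimal sketch, with a hypothetical function name and the numbers taken from the two runs above:

```python
def estimate_mhz(reference_mhz: float, reference_seconds: float,
                 measured_seconds: float) -> float:
    """Estimate the clock frequency from the run time of a fixed workload.

    The same work takes time inversely proportional to the frequency, so
    f_measured = f_reference * t_reference / t_measured.
    """
    return reference_mhz * reference_seconds / measured_seconds


# Reference run locked at 1600 MHz took 409.56 s; the turbo run took 175.75 s.
print(round(estimate_mhz(1600, 409.56, 175.75)))  # 3729
```

The estimate of about 3729 MHz is close to the Bzy_MHz value of 3754 that turbostat reported for CPU 7.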

(In reply to Marcin Nowak from comment #40)
> Thank you, Doug. I'll replace power adapter.

I say again, both CPU scaling drivers should be affected, not just intel_pstate. So something still doesn't make sense.
Comment 42 Marcin Nowak 2016-07-19 12:58:11 UTC
I've found information suggesting that it may be related to Intel SpeedStep. The issue is reproducible when Intel SpeedStep is enabled in my BIOS. When I disable it, all CPU cores are stuck at ~800MHz (on AC, on battery, with both the powersave and performance governors).

I don't know how it should work when SpeedStep is disabled. I read somewhere that some "crappy" BIOSes will disable turbo mode. I think I have one of those. Or I have a hardware failure, as somebody suggested earlier.

In that case I'm going to buy a new laptop. Let's not waste more time on it. Thank you for your help and patience.

Marcin
Comment 43 Dugi 2016-07-25 00:13:23 UTC
I am experiencing the same issue. Some discussion has taken place on: http://askubuntu.com/questions/802170/cpu-frequency-is-always-at-minimum-even-if-cpu-isage-is-100
A workaround is known there.

Do you need any experimental results, Doug Smythies?
Comment 44 Doug Smythies 2016-07-25 03:07:42 UTC
@Dugi: O.K. thanks for coming here and adding to this bug report. I'll copy and paste my last entry from our chat over on askubuntu.com.

I do not see anything special in the pstate-frequency program, but might have missed something, so I think we should be able to use primitive commands to achieve the same result.

Using the intel_pstate CPU frequency driver, and with all CPUs under full load, such that the problem exists, what do you get for:

cat /sys/devices/system/cpu/intel_pstate/turbo_pct
cat /sys/devices/system/cpu/intel_pstate/num_pstates
cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
cat /sys/devices/system/cpu/intel_pstate/no_turbo
sudo cat /sys/kernel/debug/pstate_snb/sample_rate_ms
sudo cat /sys/kernel/debug/pstate_snb/setpoint
sudo grep . /sys/devices/system/cpu/cpufreq/policy0/*

Anything you can observe in /sys/class/power_supply

@Chen Yu: Marcin's trace data suggests that something external to the intel_pstate driver is holding the pstate at minimum. However, I haven't been able to figure out what. Dugi looked at:

cat /sys/devices/system/cpu/cpufreq/policy0/bios_limit

while using the acpi-cpufreq driver, and that wasn't it (and that is what I am used to seeing Dell use).

Dugi also looked at what pstates were being asked for and what pstates were actually being given (MSRs 0x198 and 0x199), and it didn't seem to be ramping up properly either.

I still think this is a Dell AC adapter problem, but I would like to be sure, and I would like to know how to identify it as such quickly in the future.
Comment 45 Dugi 2016-07-25 07:45:28 UTC
I was trying to check if the problem was a problem of my installation or generally a problem of this version of Ubuntu, so I booted Ubuntu from a USB stick. The problem was still in place. The battery started charging and now it's fully charged. The problem cannot be replicated now, the CPU clock works as it's supposed to.

I recall that the battery problem reappears after some time, so it's quite likely that the problem will return. I will report back then.
Comment 46 Marcin Nowak 2016-07-25 08:03:37 UTC
pstate-frequency 3.4.0 does not work for me. The CPU is still locked at 800MHz on AC. Enabling "auto" mode sets an incorrect max_perf_pct and sometimes disables turbo. No luck here.

I'm pretty sure that I have a problem with something like the charging circuit. I've noticed problems with battery charging, and I ran some checks in Dell Diagnostics recently. The battery current flow was ~0 mA, the AC adapter was recognized as 255 watt (which is invalid AFAIK), and the status was "Charging", but nothing changed. After a few touches of the AC connector, battery charging began, the AC adapter was recognized properly, and the current flow grew to ~1.5 A or something like that.
Comment 47 Dugi 2016-07-25 08:06:46 UTC
@Marcin Nowak
What helped me was the "max" mode, other modes did nothing.
Comment 48 Marcin Nowak 2016-07-25 10:07:46 UTC
@Dugi
yes, I've tried "max" mode, but without luck for me. You can check what your Dell Diagnostics shows about battery/AC. Maybe there are two independent problems.
Comment 49 Dugi 2016-07-25 10:31:28 UTC
@Marcin Nowak
It would be quite a coincidence if we both had the same two rare problems. Furthermore, the fact that my problem vanished at about the same time as my battery got charged (after two weeks) makes a coincidence even more unlikely.

I can't install Dell Diagnostics, it won't let me click on the I agree with the licence button. I've tried all browsers I have.
Comment 50 Marcin Nowak 2016-07-25 10:50:28 UTC
@Dugi, my Inspiron has Diagnostics accessible from the BIOS "Boot options" menu (hit the F12 key when booting).
Comment 51 Dugi 2016-07-25 13:10:45 UTC
Mine doesn't. The stupid UEFI gives me access to almost nothing.

I've tried another browser, this time webkit based, but the stupid website won't let me get to it. I have saved the page, edited it locally and it gave me only a Windows executable, which is useless since I broke my Windows some time ago and it's not bootable.

This is what I get when I print the contents of /sys/class/power_supply/BAT0/
File: alarm
0

File: capacity
127

File: capacity_level
Full

File: charge_full
3135000

File: charge_full_design
4000000

File: charge_now
4000000

File: current_now
1000

File: cycle_count
0

File: device
cat: device: Is a directory

File: manufacturer
SANYO

File: model_name
DELL TPMCF2C

File: power
cat: power: Is a directory

File: present
1

File: serial_number
 2519

File: status
Full

File: subsystem
cat: subsystem: Is a directory

File: technology
Li-ion

File: type
Battery

File: uevent
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=11100000
POWER_SUPPLY_VOLTAGE_NOW=12475000
POWER_SUPPLY_CURRENT_NOW=1000
POWER_SUPPLY_CHARGE_FULL_DESIGN=4000000
POWER_SUPPLY_CHARGE_FULL=3135000
POWER_SUPPLY_CHARGE_NOW=4000000
POWER_SUPPLY_CAPACITY=127
POWER_SUPPLY_CAPACITY_LEVEL=Full
POWER_SUPPLY_MODEL_NAME=DELL TPMCF2C
POWER_SUPPLY_MANUFACTURER=SANYO
POWER_SUPPLY_SERIAL_NUMBER= 2519

File: voltage_min_design
11100000

File: voltage_now
12475000

File: power/async
disabled

File: power/autosuspend_delay_ms
cat: power/autosuspend_delay_ms: Input/output error

File: power/control
auto

File: power/runtime_active_kids
0

File: power/runtime_active_time
0

File: power/runtime_enabled
disabled

File: power/runtime_status
unsupported

File: power/runtime_suspended_time
0

File: power/runtime_usage
0

This is what I get for AC power:
File: power/async
disabled

File: power/autosuspend_delay_ms
cat: power/autosuspend_delay_ms: Input/output error

File: power/control
auto

File: power/runtime_active_kids
0

File: power/runtime_active_time
0

File: power/runtime_enabled
disabled

File: power/runtime_status
unsupported

File: power/runtime_suspended_time
0

File: power/runtime_usage
0

File: power/wakeup
enabled

File: power/wakeup_abort_count
0

File: power/wakeup_active
0

File: power/wakeup_active_count
1

File: power/wakeup_count
1

File: power/wakeup_expire_count
0

File: power/wakeup_last_time_ms
990

File: power/wakeup_max_time_ms
2

File: power/wakeup_total_time_ms
2
Comment 52 Marcin Nowak 2016-07-25 16:14:40 UTC
After replacing the power adapter, CPU scaling is back to (almost) normal. On AC it scales up to 2499MHz (when turbo is on), and on battery it downscales to ~800MHz.
I have a small problem with turbo mode being disabled after disconnecting AC: it is not re-enabled after reconnecting AC, and the CPU only goes up to ~1699.945 MHz. But that may be related to TLP or something like that. Calling pstate-frequency with the plan set to "max" solves the issue. @Dugi, thanks for the pstate-frequency tip.

--

@Doug, thanks for the tip about Dell BIOSes and power issues! It looks like the BIOS (or something else) was forcing the CPU to minimum when AC was connected but the battery wasn't charging.

--

@Chen, @Srinivas - thank you for focusing on my issue and helping me with solving the problem. Also many thanks to everyone who also was involved in my issue.

--

@Dugi, try to find and run some Dell Diagnostics before digging for the problem in the kernel or other software. I wasted hundreds of hours searching for the problem in the wrong place, and probably so did the people from Intel mentioned here. I saw many posts related to Dell AC power problems, faulty BIOSes, etc. Google it first and check everything related to your laptop. It may be a hardware issue, so call Dell support if you have a warranty. Feel free to email me if you have any questions related to this issue.

Marcin.
Comment 53 Dugi 2016-07-27 22:02:44 UTC
Interesting. After slightly discharging the battery, the problem is back. I can collect some more information if requested, as long as it does not concern Dell Diagnostics to which I don't have access.

My response time will be bad in the near future.
Comment 54 Doug Smythies 2016-07-27 22:28:07 UTC
@Dugi: I (still) think your problem is your Dell AC power adapter. Is there any way you could borrow one from a friend or something, as a test?

(In reply to Dugi from comment #49)
> @Marcin Nowak
> It would be quite a coincidence that we have the same two rare problems.

It actually isn't that rare.
Comment 55 Dugi 2016-07-28 06:09:10 UTC
@Doug Smythies
I am not able to do it right now. I will report back when I get to try it, but it will not be too soon.

Still, I am not really sure how an AC adapter that doesn't seem to have any trouble feeding sufficient power to the computer could cause all of this.
Comment 56 Marcin Nowak 2016-07-28 07:40:26 UTC
@Dugi, my AC adapter was powering the laptop but not always charging the battery. It looks like there are separate circuits for powering and charging.

Currently, while stressing my CPU (stress -c 4), I'm observing a sequence where on battery it works at ~800MHz, and when connecting AC it upscales to max freq, then downscales for about a second to 800MHz and upscales again to max.
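
A sequence like the one above can be logged rather than watched by hand. The following is only a minimal sketch under stated assumptions: the helper names (`parse_cpu_mhz`, `stuck_at_minimum`), the 800 MHz floor, and the 5 MHz tolerance are hypothetical and chosen for illustration. Note that, depending on the scaling driver, the values in /proc/cpuinfo may be requested rather than delivered frequencies, so turbostat remains the tool to trust.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the per-core 'cpu MHz' values found in /proc/cpuinfo-style text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, re.MULTILINE)
    return [float(value) for value in matches]


def stuck_at_minimum(mhz_values, min_mhz=800.0, tolerance_mhz=5.0):
    """True when there is at least one core and every core is near the minimum.

    min_mhz and tolerance_mhz are illustrative assumptions, not values
    taken from any driver or from this laptop's firmware.
    """
    if not mhz_values:
        return False
    return all(abs(value - min_mhz) <= tolerance_mhz for value in mhz_values)


# Sample text in the same shape as the output quoted in the bug description.
sample = (
    "cpu MHz         : 799.945\n"
    "cpu MHz         : 799.945\n"
    "cpu MHz         : 799.945\n"
    "cpu MHz         : 799.945\n"
)

print(stuck_at_minimum(parse_cpu_mhz(sample)))
```

On a real system the text would come from reading /proc/cpuinfo once per second while `stress -c 4` is running and the AC plug is toggled.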

When using the old AC adapter the CPU was stuck at 800MHz (not always, and I was able to "switch" between stuck and unstuck, which was very misleading). That was repeatable for months. In the last few days, just after Doug's comment, the adapter stopped charging the battery completely (fortunately!) and I replaced it.

The only thing I don't understand is why, with the old AC adapter, the CPU was working at max freq after disconnecting AC (currently it works at MIN freq after disconnecting AC). Maybe this is related to some bugs in Dell's BIOS routines.
 
Marcin
Comment 57 Dugi 2016-08-05 22:36:38 UTC
Another report:

The problem is becoming even weirder. I have disabled intel_pstate in /etc/default/grub. Now, ACPI should control the CPU. I can set the CPU frequency to higher values and these higher values are reported by lscpu, but the performance is terrible. If I unplug the AC power, everything becomes much more fluent and performance-heavy tasks are completed way faster.

Meanwhile, the battery sometimes does charge, and when it charged, the problem disappeared for a while. I will try to get another AC power source when possible.

The pstate-frequency trick does not always work, which means that there is no clear way to get past this issue except for fixing the underlying hardware issue.
Comment 58 Doug Smythies 2016-08-05 22:47:41 UTC
(In reply to Dugi from comment #57)
> Another report:
> 
> The problem is becoming even weirder. I have disabled intel_pstate in
> /etc/default/grub. Now, ACPI should control the CPU. I can set the CPU
> frequency to higher values and these higher values are reported by lscpu,
> but the performance is terrible.

It is very important to know that when using the acpi-cpufreq CPU frequency scaling driver, the system reports the CPU frequencies (pstates, actually) that have been asked for, which may not be what you actually get. You need to use turbostat to know for sure. See also my comment 41 above.
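
To illustrate the gap between requested and delivered frequency, here is a minimal sketch of the usual way a delivered frequency is estimated: scale the base frequency by the ratio of the APERF and MPERF counter deltas over a sampling interval. The function name and the sample numbers are hypothetical, and reading the counters themselves (which requires MSR access) is deliberately left out.

```python
def effective_mhz(base_mhz, aperf_start, aperf_end, mperf_start, mperf_end):
    """Estimate the average delivered frequency while the core was not idle.

    APERF counts at the actual clock rate and MPERF at a fixed reference
    rate, so the ratio of their deltas scales the base frequency.
    """
    delta_aperf = aperf_end - aperf_start
    delta_mperf = mperf_end - mperf_start
    if delta_mperf <= 0:
        raise ValueError("MPERF did not advance; core idle or counter wrapped")
    return base_mhz * delta_aperf / delta_mperf


# Hypothetical counter deltas: the core was asked to run fast, but the
# counters show it was actually clocked far below its base frequency.
print(effective_mhz(1700, 0, 800, 0, 1700))
```

If the value computed this way stays near the minimum while the driver reports a high pstate, the limit is being imposed outside the scaling driver.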

> If I unplug the AC power, everything
> becomes much more fluent and performance-heavy tasks are completed way
> faster.
> 
> Meanwhile, the battery sometimes does charge, and when it charged, the
> problem disappeared for a while. I will try to get another AC power source
> when possible.

Due to the ID pin stuff, it should be the correct Dell part.

> 
> The pstate-frequency trick does not always work, which means that there is
> no clear way to get past this issue except for fixing the underlying
> hardware issue.

Agreed.

Please report back whenever you can. Eventually we will want to close this bug report, as it is invalid with respect to the intel_pstate CPU frequency scaling driver.
Comment 59 Dugi 2016-08-05 23:01:15 UTC
I have found the root cause. There was a piece of black cloth in the (black) power socket of the laptop. It was invisible until I shone a spotlight directly into it. After I removed it, charging and CPU frequency are back to normal.

However, I still don't understand why the processor frequency is minimal when the battery is not charging but AC power is present.
