Bug 176411 - cpufreq/cpuinfo_cur_freq reports <unknown>, MSI X99A SLI PLUS i7-5820K, MSI i7-5930K, MSI X99A Raider i7-5820K
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: Intel Linux
Importance: P1 normal
Assignee: linux-pm@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-05 14:53 UTC by Mike Lui
Modified: 2017-05-02 15:54 UTC
CC List: 7 users

See Also:
Kernel Version: 4.7.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
turbostat --debug output (12.11 KB, text/plain)
2016-10-05 22:57 UTC, Mike Lui
turbostat --debug --msr=0x199 (14.72 KB, text/plain)
2016-10-07 20:17 UTC, Mike Lui
perf output (1.32 MB, application/octet-stream)
2016-10-07 20:18 UTC, Mike Lui
archlinux kernel config (179.09 KB, text/plain)
2016-10-10 14:06 UTC, Mike Lui
dmesg after toggling cpu1 (4.59 KB, text/plain)
2016-10-10 14:09 UTC, Mike Lui
WARN if the time argument of the governor hook is 0 (559 bytes, patch)
2016-10-11 23:49 UTC, Rafael J. Wysocki
attachment-3164-0.html (1.61 KB, text/html)
2017-01-31 21:06 UTC, Zhang Rui

Description Mike Lui 2016-10-05 14:53:35 UTC
# cat /proc/cpuinfo | grep model

model           : 63
model name      : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
...

# cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq

<unknown><unknown><unknown><unknown><unknown><unknown><unknown><unknown><unknown><unknown><unknown><unknown>
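
The values above run together, apparently because "<unknown>" is written without a trailing newline. A minimal sketch for labelling the output per CPU follows. The function name show_cur_freq and its optional directory argument are made up for illustration, and the file is typically readable by root only:

```shell
# Print one "cpuN: value" line per CPU from cpuinfo_cur_freq.
# The optional first argument (default /sys/devices/system/cpu) exists only
# so the function can be pointed at another directory, e.g. for testing.
show_cur_freq() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for d in "$cpu_root"/cpu[0-9]*; do
        f="$d/cpufreq/cpuinfo_cur_freq"
        # Skip CPUs without a cpufreq directory or without read permission.
        [ -r "$f" ] || continue
        # Command substitution drops any trailing newline; printf adds one back.
        printf '%s: %s\n' "${d##*/}" "$(cat "$f")"
    done
}
```

Run as root with no argument, show_cur_freq would print a line such as "cpu0: <unknown>" for each CPU on an affected system.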

# cat /proc/[pid]/stat

will also show '0' for utime, stime, cutime, and cstime

Certain utilities like 'top' and 'ps' will show '0%' for %CPU for a given process, but not for a given CPU.
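
To look at the four /proc/[pid]/stat counters directly rather than through top or ps, something like the following could be used. It is only a sketch: stat_cpu_times is a hypothetical helper name, and the field positions (14 to 17) are taken from proc(5).

```shell
# Print utime, stime, cutime and cstime (fields 14-17 of /proc/PID/stat,
# in clock ticks) from a single stat line passed as the first argument.
stat_cpu_times() {
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so drop everything up to the last ") " first.
    rest="${1##*) }"
    # $rest now starts at field 3, so fields 14-17 are words 12-15.
    # Word splitting is intended here; the remaining fields are plain tokens.
    set -- $rest
    echo "utime=${12} stime=${13} cutime=${14} cstime=${15}"
}
```

For example, stat_cpu_times "$(cat /proc/$$/stat)" prints the counters for the current shell. On an affected system all four would presumably stay at 0 even for a busy process.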

I've upgraded my kernel a few times in the past several weeks and had noticed this issue before, although I'm not sure at which kernel version I first saw it. The same behavior has been reported for an i7-5930K (also a 6-core Haswell-E). The performance degradation that I began noticing around the same time may be related.
Comment 1 Rafael J. Wysocki 2016-10-05 20:39:44 UTC
What's in

/sys/devices/system/cpu/cpu*/cpufreq/scaling_driver

?
Comment 2 Rafael J. Wysocki 2016-10-05 20:41:22 UTC
And can you try 4.8, please?
Comment 3 Mike Lui 2016-10-05 22:37:45 UTC
4.8 has the same results/symptoms.

# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver

intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
Comment 4 Rafael J. Wysocki 2016-10-05 22:51:15 UTC
Apparently, intel_pstate_get() returns 0 on your system for some reason.

Please run "turbostat --debug" (say for around 30s) on it and attach the output.
Comment 5 Mike Lui 2016-10-05 22:57:49 UTC
Created attachment 240901
turbostat --debug output
Comment 6 Srinivas Pandruvada 2016-10-05 23:34:41 UTC
It looks like the scheduler util tick callback is not being invoked.
Since time keeping for process stats is also broken for you, I suspect this has something to do with the tick.

Can you run some load and attach the output of the same turbostat run with:
# turbostat --debug --msr=0x199

Do you have a special kernel config, or are you running a distro config?
Please attach the .config as well.

Kernel command line: what does cat /proc/cmdline show?
Comment 7 Srinivas Pandruvada 2016-10-05 23:42:01 UTC
Also attach output of:
# perf record -a --event=power:pstate_sample sleep 300
Comment 8 Mike Lui 2016-10-07 20:16:37 UTC
# cat /proc/cmdline
initrd=\initramfs-linux.img root=/dev/sda3 rw fbcon=rotate:3

I'm just using the vanilla 4.7 linux kernel from the archlinux [core] repo. For the 4.8 test, I used the vanilla 4.8 kernel from the archlinux [testing] repo.


Attaching the requested output after this comment
Comment 9 Mike Lui 2016-10-07 20:17:48 UTC
Created attachment 241141
turbostat --debug --msr=0x199
Comment 10 Mike Lui 2016-10-07 20:18:57 UTC
Created attachment 241151
perf output

perf record -a --event=power:pstate_sample sleep 300
Comment 11 Doug Smythies 2016-10-07 21:04:56 UTC
(In reply to Mike Lui from comment #10)
> Created attachment 241151
> perf output
> 
> perf record -a --event=power:pstate_sample sleep 300

There doesn't seem to be any actual sample data in the file. The header seems O.K., as does all the module stuff.
Comment 12 Srinivas Pandruvada 2016-10-07 21:21:34 UTC
Same as Doug.

./perf report --header
Error:
The perf.data file has no samples!
# ========
# captured on: Fri Oct  7 14:08:08 2016
# hostname : stravinsky
# os release : 4.8.0-1-ARCH
# perf version : 4.7.g523d93
# arch : x86_64
# nrcpus online : 12
# nrcpus avail : 12
# cpudesc : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
# cpuid : GenuineIntel,6,63,2
# total memory : 32844308 kB
# cmdline : /usr/bin/perf record -a --event=power:pstate_sample sleep 300 
# event : name = power:pstate_sample, , type = 2, size = 112, config = 0x184, { sample_period, sa
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, intel_bts = 6, uncore_imc_4 = 17, uncore_sbox_1 = 26, uncore_cbox_5 = 24
# HEADER_CACHE info available, use -I to display
# ========
#


No samples were collected. I guess the intel_pstate sched_util callback is not being called.

Please also attach your .config.

In addition, do these steps:
# echo -n 'file intel_pstate.c +p' > /sys/kernel/debug/dynamic_debug/control
# cd /sys/devices/system/cpu/cpu1
# echo 0 > online
# echo 1 > online
Then capture dmesg and send the output.
Comment 13 Mike Lui 2016-10-10 14:05:29 UTC
Hrmm, yea not sure why the perf.data is empty. I'm not too familiar with using perf for events outside the standard performance profiling.

Attaching .config and dmesg after executing the steps from comment #12.
Comment 14 Mike Lui 2016-10-10 14:06:09 UTC
Created attachment 241311
archlinux kernel config
Comment 15 Mike Lui 2016-10-10 14:07:15 UTC
Comment on attachment 241311
archlinux kernel config

# zcat /proc/config.gz > .config
Comment 16 Mike Lui 2016-10-10 14:09:26 UTC
Created attachment 241321
dmesg after toggling cpu1
Comment 17 Rafael J. Wysocki 2016-10-11 22:39:56 UTC
Please attach dmesg from a fresh boot too.
Comment 18 Rafael J. Wysocki 2016-10-11 23:04:32 UTC
And please boot with

dyndbg="file intel_pstate.c +p"

in the kernel command line for that.
Comment 19 Rafael J. Wysocki 2016-10-11 23:49:06 UTC
Created attachment 241581
WARN if the time argument of the governor hook is 0

Also please apply this patch and see if you see the new warning in dmesg output.
Comment 20 Srinivas Pandruvada 2016-10-12 18:03:05 UTC
I used the same kernel config with 4.8 on Arch Linux, on the same CPU model as yours, and couldn't reproduce the problem. So we need to add debug patches.
Please start with the patch in comment #19.
Comment 21 Mike Lui 2016-10-13 14:32:44 UTC
Comment #20 leads me to believe this is an issue not with my CPU, but with the rest of my hardware configuration.

I went into my BIOS (UEFI?) on my MSI X99A SLI PLUS and changed 3 settings I must have set a while back:
  - disabled on-board OC'ing
  - disabled intel c-states
  - enabled turbo-boost

After this the issues in comment #1 are no longer present.

Should this be marked as resolved, or should this still be considered a bug, for such a corner case?
Comment 22 Srinivas Pandruvada 2016-10-13 17:27:24 UTC
So your current settings are:
  - disabled on-board OC'ing
  - disabled intel c-states
  - enabled turbo-boost

What were these settings before?

I would still like to reproduce this for analysis.
Comment 23 Dominik Gronkiewicz 2016-11-26 12:32:53 UTC
I can reproduce it on my machine running Fedora 24.

model name	: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
model		: 63

Kernel 4.8.8-200.fc24.x86_64

I also noticed that machine has slowed down significantly.

If there's any help I can provide to debug this, please let me know. I use this machine for computing so CPU time is essential for me to benchmark my code.
Comment 24 Dominik Gronkiewicz 2016-11-26 13:34:01 UTC
(In reply to Dominik Gronkiewicz from comment #23)
> I can reproduce it on my machine running Fedora 24.
> 
> model name    : Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
> model         : 63
> 
> Kernel 4.8.8-200.fc24.x86_64
> 
> I also noticed that machine has slowed down significantly.
> 
> If there's any help I can provide to debug this, please let me know. I use
> this machine for computing so CPU time is essential for me to benchmark my
> code.

Sorry for posting two comments in a row. Disabling C-states in MSI BIOS indeed brings back full functionality and performance of the CPU. Still, if there's any way I can help in debugging, please let me know.
Comment 25 Srinivas Pandruvada 2016-11-28 17:06:25 UTC
Please apply the patch "WARN if the time argument of the governor hook is 0" attached to this bug.
When booting, add this to the kernel command line:
dyndbg="file intel_pstate.c +p"

and send the dmesg log.
Comment 26 Zhang Rui 2016-12-22 02:20:56 UTC
ping...
Comment 27 ish 2017-01-31 21:06:00 UTC
Thought I would chime in here and say that I'm seeing something very similar.

model		: 63
model name	: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
stepping	: 2

The motherboard is an MSI X99A Raider. I did not see an option to disable overclocking, but did disable C-states and made sure turbo boost was enabled.

On cold boot the system is fine. The cores fluctuate between 1200 and 3400 MHz as expected. But on a normal reboot they appear to be locked at 1200 MHz and never accelerate. Additionally, apps like ps and htop show all processes with a CPU usage of 0%.

I can increase the speed by doing a "cpupower frequency-set -g performance", but all processes show 0% CPU usage which is not correct for the apps I'm running.
Comment 28 Zhang Rui 2017-01-31 21:06:14 UTC
Created attachment 253741
attachment-3164-0.html

OOO till Feb 7th.
Comment 29 Dominik Gronkiewicz 2017-01-31 21:10:30 UTC
After careful examination, I have noticed a similar behavior: the problem seems to occur only after a reboot and never after a cold shutdown.
Comment 30 Len Brown 2017-04-03 23:41:42 UTC
Mike Lui : 

  - disabled on-board OC'ing
  - disabled intel c-states
  - enabled turbo-boost

    After this the issues in comment #1 are no longer present.

@ Mike Lui
    Can you verify that there is no problem when you use SETUP DEFAULTs
    (or at least return C-states to their default ENABLED setting)?
Comment 31 Len Brown 2017-04-03 23:44:21 UTC
@ Dominik Gronkiewicz 

> I also noticed that machine has slowed down significantly.

This suggests that timers are broken.
Please see if there is anything in the dmesg about this, such as your TSC being deactivated as a clock source.

> Disabling C-states in MSI BIOS
> indeed brings back full functionality and performance of the CPU.

This is important if it is true. Can you verify that re-enabling
C-states breaks your system?


> the problem seems to occur only after a reboot and never after a cold
> shutdown.

is that with C-states enabled or disabled?
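
One possible way to do the clock source check asked for above is sketched here. The helper name clocksource_report is hypothetical, the sysfs path is the usual location of the clocksource interface, and the grep pattern is a deliberately broad guess at relevant messages rather than a match for any particular kernel log line.

```shell
# Report the active and available clocksources, then filter a kernel log
# read from stdin for lines that mention the TSC or a clocksource.
# The optional first argument overrides the sysfs directory (for testing).
clocksource_report() {
    cs="${1:-/sys/devices/system/clocksource/clocksource0}"
    echo "current: $(cat "$cs/current_clocksource")"
    echo "available: $(cat "$cs/available_clocksource")"
    # grep exits non-zero when nothing matches; say so instead of failing.
    grep -i -E 'tsc|clocksource' || echo "no TSC/clocksource messages in log"
}
```

It would be run as: dmesg | clocksource_report. A "current" value other than tsc, or log lines about the TSC being marked unstable, would be worth attaching.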
Comment 32 Len Brown 2017-04-03 23:58:48 UTC
@ ish@unx.ca 

> I did not see an option to disable over clocking,
> but did disable C-states and made sure turbo boost was enabled.

Does the problem go away when C-states are disabled,
or is it still possible to see the failure with C-states disabled?

> But on a normal reboot they appear to be locked at 1200 Mhz and never
> accelerate.

Is it true that the failure is NEVER SEEN without a reboot?

@ Dominik Gronkiewicz 

> MSI BIOS

It seems all three sightings are on MSI boards. Can you identify yours?
(see /sys/class/dmi/id/ )
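
A small sketch for collecting the identifying fields from that directory. The helper name dmi_ids and the selection of fields are assumptions made for illustration; serial numbers and the UUID are left out on purpose.

```shell
# Print the board and BIOS identification fields as "name: value" lines.
# The optional first argument overrides the DMI directory (for testing).
dmi_ids() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    for name in sys_vendor product_name board_vendor board_name \
                bios_vendor bios_version bios_date; do
        f="$dmi_dir/$name"
        # Not every field exists on every system; skip the missing ones.
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$name" "$(cat "$f")"
        fi
    done
}
```

Calling dmi_ids with no argument prints whichever of these fields are present, which should be enough to tell the board model and BIOS revision apart.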
Comment 33 ish 2017-04-06 16:37:48 UTC
It appears that this has resolved itself for me. I've recently updated to kernel:

Linux desktop 4.10.8-200.fc25.x86_64 #1 SMP Fri Mar 31 13:20:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

with Fedora 25.

I did a few reboots without a power cycle with C-states disabled, which is how I last left my BIOS configured. Running htop showed CPU percentages after each boot, and i7z showed cores hitting max MHz.

I then reset my BIOS to factory defaults, which enables C-states, and did a few more reboots without a power cycle, again seeing the expected behaviour from the cores. I did not have to power cycle this time to get full speed.

I'll keep monitoring it, but is it possible a kernel update could have fixed it?

===> /sys/class/dmi/id/

./product_serial =====> Default string
./bios_vendor =====> American Megatrends Inc.
./product_version =====> 5.0
./chassis_asset_tag =====> Default string
./chassis_serial =====> Default string
./board_vendor =====> MSI
./board_asset_tag =====> To be filled by O.E.M.
./board_version =====> 5.0
./power/runtime_suspended_time =====> 0
./power/autosuspend_delay_ms =====>
./power/runtime_active_time =====> 0
./power/control =====> auto
./power/runtime_status =====> unsupported
./chassis_vendor =====> MSI
./modalias =====> dmi:bvnAmericanMegatrendsInc.:bvrP.50:bd07/19/2016:svnMSI:pnMS-7885:pvr5.0:rvnMSI:rnX99ARAIDER(MS-7885):rvr5.0:cvnMSI:ct3:cvr5.0:
./product_uuid =====> AAAAAAAA-AAAA-AAAA-AAAA-D8CB8AEDA146
./bios_version =====> P.50
./sys_vendor =====> MSI
./board_serial =====> To be filled by O.E.M.
./chassis_version =====> 5.0
./chassis_type =====> 3
./uevent =====> MODALIAS=dmi:bvnAmericanMegatrendsInc.:bvrP.50:bd07/19/2016:svnMSI:pnMS-7885:pvr5.0:rvnMSI:rnX99ARAIDER(MS-7885):rvr5.0:cvnMSI:ct3:cvr5.0:
./product_name =====> MS-7885
./bios_date =====> 07/19/2016
./board_name =====> X99A RAIDER (MS-7885)
Comment 34 Zhang Rui 2017-04-07 06:58:41 UTC
(In reply to ish from comment #33)
> It appears that this has resolved itself for me. I've recently updated to
> kernel:
> 
> Linux desktop 4.10.8-200.fc25.x86_64 #1 SMP Fri Mar 31 13:20:22 UTC 2017
> x86_64 x86_64 x86_64 GNU/Linux
> 
> with Fedora 25.
> 

Good to know.

> I did a few reboots without power cycle with c-states disabled, which is how
> I last left my bios configured.  Running htop showed CPU percentages after
> each boot, i7z showed cores hitting max mhz.
> 
> I then reset my bios to factory default, which enables c-state. And did a
> few non-power cycle reboots, again seeing expected behaviour with the cores.
> I did not have to power cycle this time to get full speed.
> 
> I'll keep monitoring it, but is it possible a kernel update could have fixed
> it?

Well, maybe. I'm not quite sure, but I do know there have been quite a lot of intel_pstate fixes in recent kernels.
Anyway, closing the bug as it cannot be reproduced. Please feel free to reopen it if you can reproduce the problem on the latest upstream kernel.
Comment 35 Mike Lui 2017-05-02 15:54:42 UTC
(In reply to Len Brown from comment #30)
> Mike Lui : 
> 
>   - disabled on-board OC'ing
>   - disabled intel c-states
>   - enabled turbo-boost
> 
>     After this the issues in comment #1 are no longer present.
> 
> @ Mike Lui
>     Can you verify that there is no problem when you use SETUP DEFAULTs?
>     or at least return C-states to their default ENABLED setting)

I got caught up with work recently and have only just reviewed the comments made since my last one.
The problem appears to be fixed for me. Unfortunately (or fortunately), the solution was a simple UEFI BIOS update. In my case this points to a firmware problem, which would explain the intermittent nature of the issue.
