Bug 35722 - busy thread on idle core i7
Summary: busy thread on idle core i7
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_idle
Hardware: All Linux
Importance: P1 normal
Assignee: Lan Tianyu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-23 22:04 UTC by Demanu
Modified: 2012-05-24 07:59 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.38
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Demanu 2011-05-23 22:04:28 UTC
There is a problem with the idle state of one of the cores on an i7-2630QM CPU. One core keeps running without halting, which increases the CPU temperature and makes the whole system very laggy. I see it in i7z-svn (the first number is the frequency, the second is the load in % without halting, and the last is the temperature):

On 2.6.32 to 2.6.37:
Core 1 [0]:       1575.49 (15.79x) 3.07    4.43       0    93.1    54      

On 2.6.38:
Core 1 [0]:       1995.00 (20.00x) 100    0       0    0    62
Comment 1 Len Brown 2011-05-31 01:26:19 UTC
Was this working normally from 2.6.32 to 2.6.37,
and did it stop working normally when you upgraded to 2.6.38?

Please run top(1) and check whether any programs are running.

Please send the output from turbostat, available here:

http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools/turbostat/
for both the success and failure cases.
Comment 2 Demanu 2011-05-31 09:46:38 UTC
Hello Len Brown. Exactly: every kernel from 2.6.38 onward has this issue. top/htop doesn't show any program using 100% of a core.

Here is the output from turbostat:

2.6.37-r4 http://paste.pocoo.org/show/398166/
2.6.38-r4 http://paste.pocoo.org/show/398168/
2.6.39 http://paste.pocoo.org/show/398170/

I saw that someone on the Arch Linux forum has a similar problem.
Comment 3 Lan Tianyu 2011-06-03 05:23:03 UTC
Please run powertop and check its output.
Comment 4 Len Brown 2011-06-07 02:29:11 UTC
It looks like 2.6.37 is okay,
and 2.6.38 and 2.6.39 have a single thread that is running at full speed.
It would be good if you could run 2.6.38, rather than -rc4, to confirm
that this is a 2.6.39 regression.

It would be interesting to see what happens when you boot with maxcpus=1:
is the only available CPU 100% busy?

Output from powertop -d
would be useful in both cases.
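The maxcpus=1 question can also be checked against the kernel's own time accounting. The Python sketch below is a minimal illustration, not an existing tool: parse_proc_stat, busy_percent and sample are hypothetical helper names. It reads /proc/stat twice and reports the non-idle share of each CPU, counting idle plus iowait as idle. Comparing that figure with turbostat's %c0 column for the same CPU may help tell a busy task apart from a CPU that the scheduler accounts as idle.

```python
import time


def parse_proc_stat(text):
    """Map each per-CPU line ("cpu0", "cpu1", ...) to (idle_ticks, total_ticks)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-CPU lines; skip the aggregate "cpu" line and
        # unrelated lines such as "intr" or "ctxt".
        if not fields or fields[0] == "cpu" or not fields[0].startswith("cpu"):
            continue
        # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
        # Guest time is already folded into user/nice, so only the first
        # eight columns are summed.
        ticks = [int(v) for v in fields[1:9]]
        idle = ticks[3] + ticks[4]  # idle + iowait
        result[fields[0]] = (idle, sum(ticks))
    return result


def busy_percent(before, after):
    """Non-idle share of each CPU's time between two parsed snapshots."""
    busy = {}
    for cpu, (idle1, total1) in after.items():
        if cpu not in before:
            continue  # CPU was not present in the first snapshot
        idle0, total0 = before[cpu]
        elapsed = total1 - total0
        if elapsed <= 0:
            continue  # no ticks accounted; avoid dividing by zero
        busy[cpu] = 100.0 * (1.0 - (idle1 - idle0) / elapsed)
    return busy


def sample(interval=2.0, path="/proc/stat"):
    """Read /proc/stat twice, interval seconds apart, and return busy % per CPU."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return busy_percent(first, second)
```

On the affected machine, print(sample()) could be run once after a normal boot and once after booting with maxcpus=1.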

Also, if there is an interrupt problem, running
watch -d -n 2 cat /proc/interrupts
will expose it pretty quickly.
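The same check can be scripted when watching a terminal diff is inconvenient. This Python sketch uses hypothetical helpers (parse_interrupts, top_deltas, sample_interrupts) that take two snapshots of /proc/interrupts and rank the per-CPU counters that grew the most. It assumes the usual layout: a header row of CPU names followed by one labelled row per interrupt source, with some rows (such as ERR) carrying a single total.

```python
import time


def parse_interrupts(text):
    """Map each /proc/interrupts row label to its list of per-CPU counters."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    table = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = []
        # Up to one counter per CPU follows the label; rows such as ERR/MIS
        # carry a single total, and the text after the counters is the
        # controller/device description.
        for tok in fields[1:1 + ncpus]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        table[fields[0][:-1]] = counts
    return table


def top_deltas(before, after, limit=5):
    """Return the (label, cpu_index, increase) entries that grew the most."""
    rows = []
    for label, counts in after.items():
        old = before.get(label, [])
        for cpu, value in enumerate(counts):
            prev = old[cpu] if cpu < len(old) else 0
            rows.append((label, cpu, value - prev))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


def sample_interrupts(interval=2.0, path="/proc/interrupts", limit=5):
    """Snapshot /proc/interrupts twice and rank the fastest-growing counters."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return top_deltas(first, second, limit)
```

A counter that climbs by thousands per second on the busy CPU, and not on the idle ones, would point toward the kind of interrupt problem described above.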
Comment 5 Len Brown 2011-08-01 15:39:38 UTC
The turbostat logs in comment #2 confirm that 2.6.37-rc4 was fine,
but that 2.6.38-rc4 and 2.6.39 have a logical processor 99% in c0,
which is preventing the use of deep idle states:

broken:

GenuineIntel 13 CPUID levels; family:model:stepping 0x6:2a:7 (6:42:7)
8 * 100 = 800 MHz max efficiency
20 * 100 = 2000 MHz TSC frequency
26 * 100 = 2600 MHz max turbo 4 active cores
26 * 100 = 2600 MHz max turbo 3 active cores
28 * 100 = 2800 MHz max turbo 2 active cores
29 * 100 = 2900 MHz max turbo 1 active cores
core CPU   %c0   GHz  TSC   %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7 
          12.52 2.00 2.00  46.91   0.00  40.57   0.00   0.00   0.00   0.00   0.00
   0   0  99.93 2.00 2.00   0.07   0.00   0.00   0.00   0.00   0.00   0.00   0.00
   0   4   0.12 2.00 2.00  99.88   0.00   0.00   0.00   0.00   0.00   0.00   0.00
   1   1   0.01 2.00 2.00   8.98   0.00  91.01   0.00   0.00   0.00   0.00   0.00
   1   5   0.06 2.00 2.00   8.93   0.00  91.01   0.00   0.00   0.00   0.00   0.00
   2   2   0.00 2.00 2.00  59.42   0.00  40.58   0.00   0.00   0.00   0.00   0.00
   2   6   0.07 2.00 2.00  59.35   0.00  40.58   0.00   0.00   0.00   0.00   0.00
   3   3   0.00 2.01 2.00  69.33   0.00  30.67   0.00   0.00   0.00   0.00   0.00
   3   7   0.00 1.99 2.00  69.33   0.00  30.67   0.00   0.00   0.00   0.00   0.00

Curiously, that busy processor is running at 2.0 GHz, the HFM (highest
non-turbo frequency) for the processor, even though higher turbo bins are available.
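The reading of the turbostat log above (one logical CPU pinned in c0 while the others idle) can be automated. This Python sketch assumes the column layout printed by the turbostat version used in this report (core, CPU, %c0, GHz, TSC, ...); newer turbostat releases name their columns differently. busy_cpus and its 90% threshold are illustrative choices, not part of turbostat.

```python
def busy_cpus(report, threshold=90.0):
    """Return (core, cpu, %c0) for logical CPUs at or above threshold %c0."""
    header = None
    hits = []
    for line in report.splitlines():
        fields = line.split()
        if fields[:2] == ["core", "CPU"]:
            header = fields  # column names: core CPU %c0 GHz TSC %c1 ...
            continue
        # Skip banner lines, the summary row (it has no core/CPU columns,
        # so it is shorter than the header) and the trailing elapsed-time line.
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        c0 = float(row["%c0"])
        if c0 >= threshold:
            hits.append((int(row["core"]), int(row["CPU"]), c0))
    return hits
```

Fed the "broken" log above, this flags only core 0 / CPU 0; the Sandy Bridge log below flags nothing, since its busiest CPU sits at 0.37% in c0.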

I'm unable to reproduce this issue on my Sandy Bridge:

[root@sandy tmp]# uname -a
Linux sandy 2.6.38.6-26.rc1.fc15.x86_64 #1 SMP Mon May 9 20:45:15 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@sandy tmp]# dmesg |grep idle
[    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
[    0.005731] using mwait in idle threads.
[    1.835687] intel_idle: MWAIT substates: 0x21120
[    1.835689] intel_idle: v0.4 model 0x2A
[    1.835690] intel_idle: lapic_timer_reliable_states 0xffffffff
[    1.859572] ACPI: acpi_idle yielding to intel_idle
[    1.961634] cpuidle: using governor ladder
[    1.962117] cpuidle: using governor menu
[root@sandy tmp]# ./turbostat -v sleep 10
GenuineIntel 13 CPUID levels; family:model:stepping 0x6:2a:7 (6:42:7)
8 * 100 = 800 MHz max efficiency
23 * 100 = 2300 MHz TSC frequency
31 * 100 = 3100 MHz max turbo 4 active cores
31 * 100 = 3100 MHz max turbo 3 active cores
33 * 100 = 3300 MHz max turbo 2 active cores
34 * 100 = 3400 MHz max turbo 1 active cores
core CPU   %c0   GHz  TSC   %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7 
           0.12 1.06 2.29   1.23   0.03   0.00  98.62   0.60   0.08   1.86  92.49
   0   0   0.37 1.44 2.29   1.07   0.12   0.00  98.44   0.60   0.08   1.86  92.49
   0   4   0.15 0.82 2.29   1.30   0.12   0.00  98.44   0.60   0.08   1.86  92.49
   1   1   0.02 0.81 2.29   0.24   0.00   0.00  99.74   0.60   0.08   1.86  92.49
   1   5   0.17 0.81 2.29   0.09   0.00   0.00  99.74   0.60   0.08   1.86  92.49
   2   2   0.19 0.80 2.29   3.42   0.00   0.00  96.39   0.60   0.08   1.86  92.48
   2   6   0.01 0.82 2.29   3.60   0.00   0.00  96.39   0.60   0.08   1.86  92.48
   3   3   0.01 0.82 2.29   0.08   0.00   0.00  99.91   0.60   0.08   1.86  92.48
   3   7   0.03 0.86 2.29   0.06   0.00   0.00  99.91   0.60   0.08   1.86  92.48
10.004346 sec

This doesn't look like an issue with the idle code to me;
it looks like something is actually running on your system,
and the idle code is simply reflecting that.


Can you reproduce this on v3.0?
Comment 6 Zhang Rui 2012-01-18 05:17:05 UTC
Ping Demanu...

Can you please verify if the problem still exists in the latest upstream
kernel?
Comment 7 Zhang Rui 2012-05-24 07:59:05 UTC
Bug closed as there has been no response from the bug reporter.
Please feel free to reopen it if the problem still exists in the latest upstream kernel.
