Bug 5165

Summary: SMP C-states on Pentium 4 with hyperthreading cause a big slow-down
Product: ACPI
Reporter: Karl Tomlinson (bugs+kernel)
Component: Power-Processor
Assignee: Venkatesh Pallipadi (venki)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, awaria, iaindb, jbb, kernel, mstamp
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.13
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: output from acpidump
acpidump --addr 0x3ffefe54 --length 0x1db
acpidump --addr 0x3ffefdf8 --length 0x5c
CPU 0 CST disassembly
CPU 1 CST disassembly
_CST debug patch
_CST scan debug output (bzip2 compressed)
kernel .config
Don't use P_LVL when there is a valid _CST
Watchout for P_LVL2_UP flag in fadt, before using C2 and beyond on SMP systems
Watchout for P_LVL2_UP flag in fadt, before using C2 and beyond on SMP systems
p_LVL2_UP flag: increament against 2.6.15-rc3-mm1
incremental patch (#3) from David Shaohua Li vs 2.6.15-rc5

Description Karl Tomlinson 2005-08-31 16:15:15 UTC
Most recent kernel where this bug did not occur: 2.6.12  
Distribution: vanilla (Gentoo) 
Hardware Environment: Dell Inspiron 9100, Intel Pentium 4 3.2GHz w/HT,
stepping 9
Software Environment: gcc version 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, 
ssp-3.3.5.20050130-1, pie-8.7.7.1) 
Problem Description: Processes take on the order of 10 times longer to run when
both CPUs are online and max_cstate is greater than 1.
  
Steps to reproduce: Boot with hyperthreading enabled 
(and see how long it takes to run init scripts). 
 
Loading the processor module with max_cstate=1
or taking one CPU offline results in processes running as expected.
 
(Unloading processor module and reloading the module with a different 
max_cstate resulted in 
 
  dswload-0304: *** Error looking up _CST in namespace: AE_ALREADY_EXISTS 
 
and a system hang such that MagicSysRq was ineffective but 
 
  echo 1 >| /sys/module/processor/parameters/max_cstate 
 
works fine.) 
 
When max_cstate >= 2, processes still report 100% cpu. 
 
% cat /proc/acpi/processor/CPU0/power /proc/acpi/processor/CPU1/power 
active state:            C1 
max_cstate:              C8 
bus master activity:     ffffffe9 
states: 
   *C1:                  type[C1] promotion[C2] demotion[--] latency[001] usage[00245832]
    C2:                  type[C2] promotion[C3] demotion[C1] latency[050] usage[00181279]
    C3:                  type[C3] promotion[--] demotion[C2] latency[050] usage[00000000]
active state:            C1 
max_cstate:              C8 
bus master activity:     0200801d 
states: 
   *C1:                  type[C1] promotion[C2] demotion[--] latency[001] usage[00115864]
    C2:                  type[C2] promotion[C3] demotion[C1] latency[050] usage[00469656]
    C3:                  type[C3] promotion[--] demotion[C2] latency[050] usage[00000000]
 
Using PREEMPT_VOLUNTARY or PREEMPT_NONE instead of PREEMPT makes the behaviour
more erratic: sometimes jobs seem to run as expected, but sometimes they take
twice as long, so the average time is similar.
 
Using HZ=250 instead of 1000 seems to make processes take even longer. 
 
There is a thin continuous sound of about 2kHz (I guess) during inactivity with 
HZ=1000.  With HZ=250 the sound is either gone or blends in with the fans. 
This is the same sound that occurs with maxcpus=1 and hyperthreading enabled
since at least Linux 2.6.11.  (With hyperthreading disabled in the bios or 
max_cstate=1 the sound is not present.) 
 
With maxcpus=1 and HT enabled: 
% cat /proc/acpi/processor/CPU0/power 
active state:            C2 
max_cstate:              C8 
bus master activity:     ffdffffd 
states: 
    C1:                  type[C1] promotion[C2] demotion[--] latency[001] usage[00055200]
   *C2:                  type[C2] promotion[C3] demotion[C1] latency[050] usage[00338121]
    C3:                  type[C3] promotion[--] demotion[C2] latency[050] usage[00000000]
 
With HT disabled: 
active state:            C2 
max_cstate:              C8 
bus master activity:     00000000 
states: 
    C1:                  type[C1] promotion[C2] demotion[--] latency[001] usage[00000010]
   *C2:                  type[C2] promotion[--] demotion[C1] latency[001] usage[00229907]
 
The other notable difference in the logs is a new message that occurs several
times:
 
acpi_bus-0212 [-12] acpi_bus_set_power : Device is not power manageable
Comment 1 Len Brown 2005-08-31 18:56:09 UTC
I don't trust the C-state latencies advertised by this BIOS.
Please verify that you're running the latest advertised BIOS revision
and then attach the output from acpidump, available in pmtools here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
Comment 2 Karl Tomlinson 2005-08-31 20:05:39 UTC
Created attachment 5838 [details]
output from acpidump

The BIOS revision I have is A06 (03/08/2005), which is the latest available
from Dell.
(http://support.ap.dell.com/apjsite/Downloads/format.aspx?releaseid=R95766)
Comment 3 Venkatesh Pallipadi 2005-09-07 19:04:03 UTC
This particular issue
(Unloading processor module and reloading the module with a different
max_cstate resulted in

  dswload-0304: *** Error looking up _CST in namespace: AE_ALREADY_EXISTS)

is a BIOS issue where it tries to load the same module twice. It should be fixed
in the recent ACPI patchset with the duplicate SSDT load fix.

But the main issue with the C2 and C3 states is still unclear to me, and I will
need some more info. Can you please provide the output of

#acpidump --addr 0x3ffefe54 --length 0x1db
and
#acpidump --addr 0x3ffefdf8 --length 0x5c

Comment 4 Karl Tomlinson 2005-09-08 02:27:13 UTC
Created attachment 5929 [details]
acpidump --addr 0x3ffefe54 --length 0x1db
Comment 5 Karl Tomlinson 2005-09-08 02:30:35 UTC
Created attachment 5930 [details]
acpidump --addr 0x3ffefdf8 --length 0x5c

Thanks for the info on dswload-0304 and for looking into the C-states.

Hope this is helpful.
Comment 6 Venkatesh Pallipadi 2005-09-08 12:06:11 UTC
There may still be more than one bug here.

Looking at the ACPI disassembly, the BIOS is doing some tricky things with CST here.
1) The CSTs for CPU 0 and CPU 1 are different. CPU 1 always has only C1.
2) CPU 0 has either only C1; C1 and C2; C1, C2 and C3; or C1, C2, C3 and C4,
depending on some configuration (things like whether HT is on or off).

The disassembly explains some things:
1) Why you are seeing a different number of states when HT is enabled/disabled.

But there are a lot of unexplained things here:
1) Why do both CPUs have 3 C-states when the BIOS has only one entry in the CPU 1 CST?
This seems like an OS bug.
2) Why is there a C2 latency of 1 when HT is disabled? Probably a BIOS bug (bad CST).
3) I am still not able to map latency 50 to any entry in the BIOS's CST. Probably
another bug somewhere.

So, when you boot with HT enabled and maxcpus=1, do you still see the slowdown?
When you boot with HT disabled in the BIOS, do you still see the slowdown?
Comment 7 Venkatesh Pallipadi 2005-09-08 12:07:27 UTC
Created attachment 5936 [details]
CPU 0 CST disassembly

Attaching the CST disassembly for reference.
Comment 8 Venkatesh Pallipadi 2005-09-08 12:08:15 UTC
Created attachment 5937 [details]
CPU 1 CST disassembly
Comment 9 Karl Tomlinson 2005-09-08 12:26:14 UTC
I am not seeing a slowdown either with HT enabled and maxcpus=1 (when the    
sound is present) or with HT disabled in the BIOS (when there is no sound).  In   
both cases max_cstate was not specified, and so was 8 according  
to /proc/acpi/processor/CPU0/power.  
  
Comment 10 Venkatesh Pallipadi 2005-09-09 10:44:21 UTC
Created attachment 5956 [details]
_CST debug patch


OK. That means the slowness is mostly coming from the second CPU trying to go
to the C2 state, while the BIOS says it can only go to C1.

When you have only one CPU enabled, either with HT disabled in the BIOS or with
maxcpus=1, everything seems OK (though the latencies advertised by the BIOS are
suspect there).

Can you please try the patch on a 2.6.13 kernel and boot with both CPUs?
That should print a lot of messages in dmesg (you will probably need to
increase the kernel log buffer size with log_buf_len to capture all of them).
Then send in the complete dmesg output.

Thanks.
Comment 11 Karl Tomlinson 2005-09-09 21:39:01 UTC
Created attachment 5963 [details]
_CST scan debug output (bzip2 compressed)

The whole dmesg output is included for completeness but processor was loaded
last so the output from this is at the end.

The kernel was vanilla 2.6.13 + acpi-20050815-2.6.13.diff + _CST debug patch.
HT was enabled and the only kernel parameter was log_buf_len, so both cpus and
all c-states were available.

(With acpi-20050815-2.6.13.diff the message

 dswload-0304: *** Error looking up _CST in namespace: AE_ALREADY_EXISTS

is no longer present, but the kernel still hangs after unloading and reloading
the processor module, although MagicSysRq could still reboot the machine.)
Comment 12 Karl Tomlinson 2005-09-09 21:40:45 UTC
Created attachment 5964 [details]
kernel .config

Just in case this is useful.
Comment 13 Karl Tomlinson 2005-09-09 21:49:35 UTC
I don't know how c-states work, but I wonder whether it is even possible to 
change the c-state on only one of a pair of sibling logical cpus. 
If there is a process running on CPU 1 (which seems to only have one c-state), 
wouldn't the process be affected by sending CPU 0 to C2? 
 
Comment 14 Venkatesh Pallipadi 2005-09-14 15:03:23 UTC
I think I have root-caused this one. This is what is happening:
- We try to get supported C-states from _CST.
- We only find 1 C-state in there.
- We fall back to the P_LVL/P_BLK way of determining the number of C-states.
- That says C2 and C3 are supported.
- And we go ahead and use these C2 and C3 states on both processors.

The only problem with all of the above is a single bit, P_LVL2_UP, in the Fixed
Feature Flags of the FADT. That bit says whether C2 is supported only on UP
systems. On this system that bit is set, and the Linux kernel today happily
ignores it.

As a result, when one processor requests C2 or C3, it looks like both of them
go idle, which hurts performance. This problem was unmasked only in 2.6.13,
because before that we used to disable C2 and C3 on all SMP systems. Now we
enable them, as some SMP systems do support C2 and C3.

I will provide a patch for this one soon (tomorrow). Once the patch is there, can
you please test it on your system and make sure that it works correctly?

Thanks.

Comment 15 Venkatesh Pallipadi 2005-09-15 12:13:20 UTC
After talking with Len, we have identified 2 bugs here:
1) Linux should not use C-states based on P_LVL when _CST is present.
2) Linux should check the P_LVL2_UP flag in the FADT when using P_LVL-based
C-states on an SMP system.
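The two rules above can be illustrated with a small C model of the C-state decision. This is a hypothetical sketch, not the code in the attached patches: the struct, field and function names are invented for illustration, and it assumes a _CST lists C1..Cn in order so that its entry count equals the deepest state. The "old" function models the 2.6.13 flow described in comment 14; the "fixed" one applies both rules.

```c
#include <stdbool.h>

/* Hypothetical per-CPU summary of what the firmware reports.
 * Assumes _CST entries are C1..Cn in order. */
struct cpu_power_info {
    int  cst_states;   /* states listed in a valid _CST, 0 if there is none */
    int  plvl_deepest; /* deepest C-state implied by the P_LVL2/P_LVL3 regs */
    bool p_lvl2_up;    /* FADT P_LVL2_UP: true means C2 also works on MP    */
};

/* Rough model of the pre-fix flow from comment 14: a _CST with fewer than
 * two states was not trusted, the P_LVL fallback was used instead, and
 * P_LVL2_UP was never consulted. */
int deepest_cstate_old(const struct cpu_power_info *p, int online_cpus)
{
    (void)online_cpus; /* the CPU count played no part in the decision */
    if (p->cst_states >= 2)
        return p->cst_states;
    return p->plvl_deepest;
}

/* The same decision with both fixes from comment 15 applied. */
int deepest_cstate_fixed(const struct cpu_power_info *p, int online_cpus)
{
    /* Bug 1: a valid _CST is authoritative, even if it only lists C1. */
    if (p->cst_states > 0)
        return p->cst_states;

    /* Bug 2: P_LVL-based C2 and deeper need P_LVL2_UP on SMP systems. */
    if (online_cpus > 1 && !p->p_lvl2_up)
        return 1;
    return p->plvl_deepest;
}
```

With the values this thread reports for CPU 1 with HT enabled (a _CST listing only C1, P_LVL registers implying C3, two online CPUs), the old model picks C3 and the fixed one picks C1, matching the expected result that only C1 is used.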

Below are the patches (1 for each bug above). Please apply both of them, 
rebuild the kernel and verify that the problem is solved here. The expected 
result is: When HT is enabled and 2 CPUs are running, only C1 should be used.

Thanks.
Comment 16 Venkatesh Pallipadi 2005-09-15 12:19:26 UTC
Created attachment 6034 [details]
Don't use P_LVL when there is a valid _CST
Comment 17 Venkatesh Pallipadi 2005-09-15 12:20:22 UTC
Created attachment 6035 [details]
Watchout for P_LVL2_UP flag in fadt, before using C2 and beyond on SMP systems
Comment 18 Karl Tomlinson 2005-09-15 18:01:16 UTC
The "Don't use P_LVL when there is a valid _CST" patch seems to work as  
intended.  
  
With HT (and both cpus) enabled only one c-state is available (with 0  
latency?), and there was no slow down.  
  
active state:            C1  
max_cstate:              C8  
bus master activity:     00000000  
states:  
   *C1:                  type[C1] promotion[--] demotion[--] latency[000] usage[00134855]
  
With HT disabled, 2 C-states were available as before.
Comment 19 Karl Tomlinson 2005-09-15 18:32:51 UTC
To test the "Watchout for P_LVL2_UP..." patch, I used the nocst=1 processor    
module parameter.    
    
The behaviour reverted to the same as without the patches:  The same 3 c-states    
were available (with HT) with the same latencies and similar usages, and the    
slow down was back.    
    
The patches (both of them) were applied to Linux 2.6.13.1.   
(acpi-20050815-2.6.13.diff was not used as some hunks were rejected on   
application.)  
  
CONFIG_HOTPLUG_CPU was enabled.   
   
With kernel parameters acpi_dbg_level=0x1f acpi_dbg_layer=0x01000002, the   
output from processor was:   
   
acpi_processor-0476 [06] acpi_processor_get_inf: Bus mastering arbitration control present
acpi_processor-0527 [06] acpi_processor_get_inf: Processor [0:0]
acpi_processor-0192 [07] acpi_processor_get_thr: pblk_address[0x000010e0] duty_offset[1] duty_width[3]
acpi_processor-0241 [07] acpi_processor_get_thr: Found 8 throttling states
acpi_processor-0098 [08] acpi_processor_get_thr: Throttling state is T0 (0% throttling applied)
acpi_processor-0560 [08] acpi_processor_get_pow: lvl2[0x000010e4] lvl3[0x000010e5]
ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
ACPI: Processor [CPU0] (supports 8 throttling states)
acpi_processor-0476 [06] acpi_processor_get_inf: Bus mastering arbitration control present
acpi_processor-0527 [06] acpi_processor_get_inf: Processor [1:1]
acpi_processor-0192 [07] acpi_processor_get_thr: pblk_address[0x000010e0] duty_offset[1] duty_width[3]
acpi_processor-0241 [07] acpi_processor_get_thr: Found 8 throttling states
acpi_processor-0098 [08] acpi_processor_get_thr: Throttling state is T0 (0% throttling applied)
acpi_processor-0560 [12] acpi_processor_get_pow: lvl2[0x000010e4] lvl3[0x000010e5]
ACPI: CPU1 (power states: C1[C1] C2[C2] C3[C3])
ACPI: Processor [CPU1] (supports 8 throttling states)
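The register addresses in this log follow the processor register block (P_BLK) layout from the ACPI specification: the 4-byte P_CNT at offset 0, the 1-byte P_LVL2 register at offset 4 and P_LVL3 at offset 5, while a duty-cycle field DUTY_WIDTH bits wide gives 2^DUTY_WIDTH throttling states. A minimal C sketch (struct and function names are hypothetical) reproduces the logged values from pblk_address 0x10e0 and duty_width 3:

```c
#include <stdint.h>

/* Hypothetical decoded view of one processor's P_BLK registers. */
struct pblk_regs {
    uint32_t p_cnt;  /* 4-byte processor control register (throttling) */
    uint32_t p_lvl2; /* 1-byte register; a read enters C2               */
    uint32_t p_lvl3; /* 1-byte register; a read enters C3               */
};

/* Derive the register addresses from the P_BLK base address. */
struct pblk_regs pblk_decode(uint32_t pblk_address)
{
    struct pblk_regs r;
    r.p_cnt  = pblk_address;     /* offset 0 */
    r.p_lvl2 = pblk_address + 4; /* offset 4 */
    r.p_lvl3 = pblk_address + 5; /* offset 5 */
    return r;
}

/* A duty-cycle field DUTY_WIDTH bits wide encodes 2^DUTY_WIDTH T-states. */
unsigned throttling_states(unsigned duty_width)
{
    return 1u << duty_width;
}
```

pblk_decode(0x10e0) yields P_LVL2 at 0x10e4 and P_LVL3 at 0x10e5, and throttling_states(3) yields 8, matching "lvl2[0x000010e4] lvl3[0x000010e5]" and "Found 8 throttling states" above.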
  
I don't know what the BIOS's P_LVL2_UP value is.
Is it contained in the acpidump output here, or are other arguments needed to
extract it?
  
Despite the _CST reporting that there is only one C-state,
it seems that C2 does have an effect, but it would need to be entered only when
both processors are idle.
Whether or not the power savings are worth the effort, I don't know.
   
Comment 20 Venkatesh Pallipadi 2005-09-16 16:30:12 UTC
Looks like I misread the FADT before.
The BIOS LVL2_UP value is '0' here.

This is what I see in FADT/FACP:
Flags: 0x32203235

LVL2_UP is bit 3.
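To check this by hand: in the ACPI specification's FADT Fixed Feature Flags, P_LVL2_UP is bit 3; a one means C2 is configured to work on UP or MP systems, and a zero means it is configured to work only on UP systems. A minimal C sketch (the macro and helper names are hypothetical) that extracts the bit from the value above:

```c
#include <stdbool.h>
#include <stdint.h>

#define FADT_P_LVL2_UP (1u << 3) /* FADT Flags bit 3 */

/* True if the FADT says C2 may also be used on multiprocessor systems. */
bool fadt_c2_mp_ok(uint32_t fadt_flags)
{
    return (fadt_flags & FADT_P_LVL2_UP) != 0;
}
```

The low byte of 0x32203235 is 0x35 (binary 0011 0101), so bit 3 is clear and fadt_c2_mp_ok(0x32203235) returns false, i.e. LVL2_UP is '0'.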

Anyway, both of these patches are required for Linux, and patch 1 solves the
problem here.

So I will send both patches to Len.
Comment 21 Len Brown 2005-09-21 23:16:19 UTC
applied patches in comment #16 and comment #17 to acpi test tree
Comment 22 Iain Buchanan 2005-10-09 17:33:09 UTC
The patches in comments 16 & 17 fix the "slow-down" on my Dell Inspiron 9100
laptop (P4 3 GHz HT).  However, it seems to screw up suspend and resume.  I'm
talking about the normal kernel suspend / resume, no extra patches or utilities.
 Should I open a new bug or add more info here?

PS. I suspend, as root, by doing:
# echo shutdown > /sys/power/disk; echo disk > /sys/power/state
Comment 23 Karl Tomlinson 2005-10-10 01:36:15 UTC
suspend2 (2.2-rc7) is working well for my I9100. 
I'm using the patch in Comment #16 which is the one that is required. 
I can't remember but I'm probably not using the patch in Comment #17 as it 
didn't do anything for the I9100. 
 
I think there may have been an issue when suspending with processor module 
loaded and trying to resume from a kernel without the module.  So build in 
processor and thermal because processor at least can't be unloaded successfully 
anyway. 
 
You'll probably want hibernate-script to unload some other problem modules 
(ndiswrapper, button, battery) and don't include any cpufreq stuff in the 
kernel config (even as modules). 
 
http://www.suspend2.net/downloads/ 
Comment 24 Len Brown 2005-10-13 18:31:55 UTC
*** Bug 5432 has been marked as a duplicate of this bug. ***
Comment 25 Iain Buchanan 2005-10-29 00:25:23 UTC
For the record, I just tried (vanilla) 2.6.14_rc4, and I still get this "slowdown".

> I'm using the patch in Comment #16 which is the one that is required.
> I can't remember but I'm probably not using the patch in Comment #17 as it
> didn't do anything for the I9100. 

I tried gentoo-2.6.13-gentoo-r5 with only the comment #16 patch, and I got kernel
oopses.

> I think there may have been an issue when suspending with processor module
> loaded and trying to resume from a kernel without the module.  So build in
> processor and thermal because processor at least can't be unloaded
> successfully anyway.

I have processor and thermal built in. I used almost exactly the same options
as with my working suspend on 2.6.12.x, so I would assume it's 2.6.13 (or the
patch) that's breaking the suspend... for me!
 
> You'll probably want hibernate-script to unload some other problem modules
> (ndiswrapper, button, battery) and don't include any cpufreq stuff in the
> kernel config (even as modules). 

I made my own script to remove some modules: all I had to do was unload
ndiswrapper and b44 (wired network), and 2.6.12 suspended OK, but not 2.6.13...

So, what now?
Comment 26 Daniel Drake 2005-10-29 06:54:23 UTC
Downstream bug report:
http://bugs.gentoo.org/110661
Comment 27 Venkatesh Pallipadi 2005-10-29 08:44:51 UTC
The slowness issue and the suspend-resume issue seem to be unrelated at this
point. I don't see how the changes for the slowness issue affect suspend-resume
in any way.

The patches for the slowness issue are in Len's ACPI test tree now, so they
should get into mm and the base soon.

For the suspend-resume issue, it would be great if you could open another bug.
Was it working fine on, say, 2.6.12? Was it working fine before this patch (even
though the whole system was slow)? How exactly does it fail? During suspend or
during resume? Does it hang, oops or panic?
Comment 28 Iain Buchanan 2005-11-13 17:48:19 UTC
Just installed vanilla-sources 2.6.15_rc1 (on gentoo) and the "slowness" is
still there.  The good news is the two patches still fix the issue.

(Also, I can now cleanly compile and load ndiswrapper, which I haven't been able
to do for all of 2.6.14 :)

Now I just have to wait for suspend2 patches for 2.6.15 and I'll be a happy
chappy again ;)
Comment 29 Shaohua 2005-11-28 11:53:27 UTC
Created attachment 6707 [details]
Watchout for P_LVL2_UP flag in fadt, before using C2 and beyond on SMP systems

Venki's patch with some typos fixed. Venki, please look at it.
Comment 30 Shaohua 2005-12-01 17:00:35 UTC
Created attachment 6742 [details]
p_LVL2_UP flag: increament against 2.6.15-rc3-mm1
Comment 31 Len Brown 2005-12-05 13:53:51 UTC
Created attachment 6774 [details]
incremental patch (#3) from David Shaohua Li vs 2.6.15-rc5

patches in comment #16 and comment #17 shipped in linux-2.6.15-rc5
the refreshed 3rd patch attached from Shaohua & Venki applied to acpi-test tree
Comment 32 marco 2005-12-07 10:39:42 UTC
Hi, I didn't understand: is it fixed in 2.6.15-rc5? I have exactly the
same problem (P4 HT), but upgrading to 2.6.15-rc5 didn't solve it. I didn't
apply any patches; should I?
Thanks 
Comment 33 marco 2005-12-07 10:59:50 UTC
OK, it works if I apply the patch from comment #31. I should have tried first
and asked afterwards. Sorry!
Marco 
Comment 34 Len Brown 2006-02-02 14:34:50 UTC
Shipped in 2.6.16-rc1-git6 -- closing.