Bug 10952 - "scaling_max_freq" constantly shifting between minimum and maximum frequency
Status: REJECTED DOCUMENTED
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: All Linux
Importance: P1 normal
Assignee: ykzhao
Reported: 2008-06-22 10:31 UTC by Viktor Kojouharov
Modified: 2008-08-28 18:30 UTC
Kernel Version: 2.6.25.7

Attachments
The ver_linux script output. I've removed the nvidia module before reproducing, just in case (1.81 KB, text/plain)
2008-06-22 10:32 UTC, Viktor Kojouharov
cpufreq.debug=7 and CONFIG_CPU_FREQ_DEBUG are set. governors are not set yet (111.70 KB, application/octet-stream)
2008-06-23 11:07 UTC, Viktor Kojouharov
more complete dump of the syslog. showing complete cycle from 2201MHz to 800 and back. Still no change from the default ondemand governor (71.37 KB, application/octet-stream)
2008-06-23 11:08 UTC, Viktor Kojouharov
cpu[01][ic]st files (930 bytes, application/octet-stream)
2008-07-01 10:23 UTC, Viktor Kojouharov
dmesg output, after switching from ondemand to performance (122.41 KB, application/octet-stream)
2008-07-01 10:31 UTC, Viktor Kojouharov
dumps of thermal_zone during various system states (330 bytes, application/x-bzip)
2008-07-03 10:32 UTC, Viktor Kojouharov

Description Viktor Kojouharov 2008-06-22 10:31:11 UTC
Latest working kernel version: None found
Earliest failing kernel version: 2.6.22 perhaps (Ubuntu gutsy did not exhibit this problem)
Distribution: Ubuntu Hardy
Hardware Environment: Intel Core 2 Duo T7500
Software Environment: ubuntu kernel 2.6.24-19 and vanilla 2.6.25.7
Problem Description: 

Regardless of the system load, the "scaling_max_freq" value will suddenly and quickly decrease, passing through all available intermediate frequencies, until it reaches the lowest available frequency. It then stays there for some time (more than 10 minutes), again regardless of the system load. During this time it is not possible to raise the CPU frequency in any way. Switching the governor from "ondemand" to "performance" has no effect (probably because "scaling_max_freq" is the limiting factor). Once this period of unpredictable length is over, the value of "scaling_max_freq" quickly rises back to its maximum, again passing through each available frequency in between.

This cycle of maximum frequency changes repeats indefinitely, at seemingly random intervals. Furthermore, it is not possible to change the "scaling_max_freq" value manually. Although echoing a new valid value does not produce any errors, a 'cat' immediately after the echo shows the old value.
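
To capture the cycle described above with timestamps, the sysfs file can be polled and only the changes logged. This is a minimal sketch, assuming the usual cpufreq sysfs location for cpu0 and that the file holds a value in kHz; the helper names `read_khz`, `transitions` and `watch` are made up for illustration.

```python
import time
from pathlib import Path

# Assumed standard cpufreq sysfs location; adjust the CPU number as needed.
SYSFS_MAX_FREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq")


def read_khz(path):
    """Return the frequency (in kHz) stored in a cpufreq sysfs file."""
    return int(Path(path).read_text().strip())


def transitions(samples):
    """Collapse (timestamp, khz) samples into the points where the value changed."""
    changes = []
    previous = None
    for stamp, khz in samples:
        if khz != previous:
            changes.append((stamp, khz))
            previous = khz
    return changes


def watch(path=SYSFS_MAX_FREQ, interval=1.0):
    """Print a line every time scaling_max_freq changes (Ctrl-C to stop)."""
    previous = None
    while True:
        khz = read_khz(path)
        if khz != previous:
            print(f"{time.strftime('%H:%M:%S')} scaling_max_freq = {khz} kHz")
            previous = khz
        time.sleep(interval)
```

Calling `watch()` in a terminal while the machine is in use should show each step of the descent and the later climb, which makes it easier to line the changes up with syslog entries.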

Steps to reproduce:

Just wait. Also seems to occur when the system is under load (ironically).

I've also posted a launchpad bug here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/242006

This bug is probably related to, but is not a duplicate of, these bugs:
http://bugzilla.kernel.org/show_bug.cgi?id=10383
http://bugzilla.kernel.org/show_bug.cgi?id=9919

Adding acpi_osi="!Windows 2006" does not remove the problem.
Comment 1 Viktor Kojouharov 2008-06-22 10:32:29 UTC
Created attachment 16578 [details]
The ver_linux script output. I've removed the nvidia module before reproducing, just in case
Comment 2 Zhang Rui 2008-06-22 19:34:51 UTC
Hmm, I think you can add a dump_stack() call in cpufreq_update_policy() to see who keeps changing scaling_max_freq.
Something like:
---
 drivers/cpufreq/cpufreq.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq.c    2008-06-17 06:35:42.000000000 +0800
+++ linux-2.6/drivers/cpufreq/cpufreq.c 2008-06-23 10:25:13.000000000 +0800
@@ -1722,6 +1722,7 @@
        if (unlikely(lock_policy_rwsem_write(cpu)))
                return -EINVAL;

+       dump_stack();
        dprintk("updating policy for CPU %u\n", cpu);
        memcpy(&policy, data, sizeof(struct cpufreq_policy));
        policy.min = data->user_policy.min;
Comment 3 ykzhao 2008-06-22 20:07:50 UTC
   Please attach the output of acpidump.
   Please also enable CONFIG_CPU_FREQ_DEBUG in the kernel configuration and boot the system with the option "cpufreq.debug=7".
   After the system has booted, please change the cpufreq governor several times and attach the output of dmesg.
Thanks.
Comment 4 Len Brown 2008-06-23 09:47:28 UTC
Does this make the changes to scaling_max_freq stop:

# echo 1 > /sys/module/processor/parameters/ignore_ppc 

(or boot with processor.ignore_ppc=1)

In addition to the acpidump requested above, please
build with CONFIG_ACPI_DEBUG=y and attach the complete
output from dmesg -s64000
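
The runtime toggle suggested above can also be scripted, which helps when comparing behaviour with and without the setting. A minimal sketch, assuming the parameter file reads back as an integer and that the script runs as root; the function names are hypothetical.

```python
from pathlib import Path

# Path taken from the echo command in this comment.
IGNORE_PPC = Path("/sys/module/processor/parameters/ignore_ppc")


def ignore_ppc_enabled(path=IGNORE_PPC):
    """Return True if the parameter file currently holds 1 (assumed integer format)."""
    return int(Path(path).read_text().strip()) == 1


def set_ignore_ppc(enabled, path=IGNORE_PPC):
    """Write 1 or 0 to the parameter file, like `echo 1 > .../ignore_ppc`."""
    Path(path).write_text("1\n" if enabled else "0\n")
```

Reading the value back with `ignore_ppc_enabled()` after `set_ignore_ppc(True)` confirms that the write took effect before the next observation run.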
Comment 5 Viktor Kojouharov 2008-06-23 11:07:55 UTC
Created attachment 16584 [details]
cpufreq.debug=7 and CONFIG_CPU_FREQ_DEBUG are set. governors are not set yet
Comment 6 Viktor Kojouharov 2008-06-23 11:08:40 UTC
Created attachment 16585 [details]
more complete dump of the syslog. showing complete cycle from 2201MHz to 800 and back. Still no change from the default ondemand governor
Comment 7 Viktor Kojouharov 2008-06-24 00:14:11 UTC
echo 1 > /sys/module/processor/parameters/ignore_ppc 
stops the scaling_max_freq jumping, though the system tends to totally freeze sometimes now
Comment 8 Viktor Kojouharov 2008-06-24 14:47:52 UTC
More info:

setting ignore_ppc to 1 does not stop the problem from occurring.

Under heavy load, the system decided to bring the cpu cores down from 2200 to 1200. I don't know whether the actual scaling_max_freq was changed, or whether just the scaling frequency was changed by the ondemand governor. Before I could see whether the scaling_max_freq was changed, the system froze :\ The core temp for both was around 60-65C, which is pretty normal for these cores in this laptop. I don't think it froze because of overheating.

Also, according to the string in 'processor_perflib.c', this option is meant for cases where the BIOS is the culprit. However, the cores were scaled correctly with 2.6.22, and no BIOS updates have been installed.
Comment 9 ykzhao 2008-06-29 20:33:01 UTC
Will you please attach the following outputs?
    acpidump --addr 0x7fe6e4f2 --length 0x286 -o cpu0ist
    acpidump --addr 0x7fe6de88 --length 0x5e5 -o cpu0cst
    acpidump --addr 0x7fe6e778 --length 0xc4  -o cpu1ist
    acpidump --addr 0x7fe6e46d --length 0x85  -o cpu1cst
    
Will you please change the cpufreq governor from ondemand to performance and see whether the problem still exists? It would be best to boot the system with the option "cpufreq.debug=7" and attach the output of dmesg.

Thanks.
         
Comment 10 Viktor Kojouharov 2008-07-01 10:23:03 UTC
Created attachment 16677 [details]
cpu[01][ic]st files
Comment 11 Viktor Kojouharov 2008-07-01 10:31:34 UTC
Created attachment 16678 [details]
dmesg output, after switching from ondemand to performance

Changing the governor to performance doesn't help. 'scaling_max_freq' is still being changed, and performance just uses that value (from what I gather).

The dmesg output includes changing the governor to performance.
Comment 12 ykzhao 2008-07-03 02:42:55 UTC
Hi, Viktor
    Thanks for the info.
    It seems that this issue is related to the BIOS. While the system is running, the BIOS frequently sends the notification event (0x80), which causes the OS to evaluate the _PPC object and obtain a new performance limit. The OS then uses the new limit to update the cpufreq policy, so scaling_max_freq changes whenever the performance limit changes. If the boot option "processor.ignore_ppc" is added, the OS does not update the cpufreq policy when the _PPC object changes, and scaling_max_freq stays normal.
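
    The effect of the _PPC limit can be modelled in a few lines. This is only an illustrative sketch, not the kernel's code: it assumes that _PPC returns an index into the _PSS performance-state table, with entry 0 being the fastest state. The table and the function name are hypothetical; 2201, 2200, 1200 and 800 MHz appear in this report, while 1600 MHz is a placeholder.

```python
# Hypothetical _PSS table in kHz, fastest state first (index 0).
# 1600000 is a placeholder; the other values are mentioned in this report.
PSS_KHZ = [2201000, 2200000, 1600000, 1200000, 800000]


def effective_max_khz(pss_khz, ppc, user_max_khz, ignore_ppc=False):
    """Model of how a _PPC limit caps scaling_max_freq.

    ppc is the value returned by _PPC: the index of the fastest state the
    platform currently allows. With ignore_ppc set, the platform limit is
    skipped and only the user's limit applies.
    """
    if ignore_ppc:
        return user_max_khz
    platform_limit = pss_khz[ppc]
    return min(user_max_khz, platform_limit)
```

    In this model a user request for 2201000 kHz while _PPC is 4 still yields 800000 kHz, which matches the observation that echoing a new value into scaling_max_freq appears to have no effect.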

    From the acpidump it seems that no cooling device is defined for when the temperature reaches certain thresholds. Only the following objects exist under the thermal zone scope: _CRT and _TMP. If the temperature returned by the _TMP object is greater than _CRT, the system will be shut down. Maybe this is related to the system freezes.
    Anyway, will you please attach the following output?
    cat /proc/acpi/thermal_zone/THM/*

    Thanks.
   
    
    
Comment 13 Viktor Kojouharov 2008-07-03 10:32:57 UTC
Created attachment 16722 [details]
dumps of thermal_zone during various system states

These are the outputs of the files in thermal_zone/THM/, taken without processor.ignore_ppc. Please let me know if I need to make another batch with that option turned on.

The starting_thermal file contains the state when the system first booted up. starting_minor_load_2200 is the state with little CPU load and the max freq set to 2200. The other two files were taken under max CPU load with scaling_max_freq set to the given frequency.
Comment 14 Zhang Rui 2008-08-28 00:24:45 UTC
1. The "scaling_max_freq" change is caused by the BIOS/hardware. IMO, it's not a Linux kernel bug, so I'm afraid we cannot change this behaviour.
2. ignore_ppc freezes the system. This is weird. Would you please try to boot with
"idle=poll" and then "echo 1 > /sys/module/processor/parameters/ignore_ppc", and see if the system still freezes?
Comment 15 Viktor Kojouharov 2008-08-28 02:38:13 UTC
Turns out the unstable system was due to faulty hardware. So I will keep using ignore_ppc in the future. I guess this bug can be closed now. Are there any side effects to ignore_ppc?
Comment 16 ykzhao 2008-08-28 18:30:47 UTC
Hi, Viktor
    Thanks for the quick response.
    Using "ignore_ppc" is harmless in theory. But you had better confirm that the temperature stays below the critical threshold under heavy load when the system is at the maximum CPU frequency (2200). If it does, the option is harmless.
    
    As the issue is related to the BIOS and can be suppressed with ignore_ppc, the bug will be rejected and marked as "Documented".
    Thanks.
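
The temperature check recommended in this comment can be done by comparing the current reading with the critical trip point. The sketch below is a rough illustration that assumes the /proc/acpi/thermal_zone/THM/temperature and trip_points files contain lines of the form "temperature: 62 C" and "critical (S5): 100 C"; the exact format may differ between kernel versions, and the function names are hypothetical.

```python
import re


def parse_celsius(text, label):
    """Return the integer temperature on the line starting with label, or None."""
    match = re.search(rf"^{re.escape(label)}.*?:\s*(\d+)\s*C\b", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def margin_to_critical(temperature_text, trip_points_text):
    """Return degrees C left before the critical trip point, or None if unparsable."""
    current = parse_celsius(temperature_text, "temperature")
    critical = parse_celsius(trip_points_text, "critical")
    if current is None or critical is None:
        return None
    return critical - current
```

Feeding it the contents of the two files while the CPU is held at 2200 MHz under load gives the remaining headroom; a margin that stays comfortably positive suggests ignore_ppc is safe on this machine.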
