Latest working kernel version: None found
Earliest failing kernel version: 2.6.22 perhaps (Ubuntu gutsy did not exhibit this problem)
Distribution: Ubuntu Hardy
Hardware Environment: Intel Core 2 Duo T7500
Software Environment: Ubuntu kernel 2.6.24-19 and vanilla 2.6.25.7

Problem Description:
Regardless of the system load, the "scaling_max_freq" value will suddenly and quickly decrease, passing through all the available intermediate frequencies, until it reaches the lowest available frequency. It then stays there for more than 10 minutes, regardless of the system load. During this time it is not possible to raise the CPU frequency in any way. Switching the governor from "ondemand" to "performance" has no effect (probably because "scaling_max_freq" is the limiting factor). After this period of unpredictable length is over, "scaling_max_freq" quickly rises back to its maximum value, again passing through each available frequency in between. This cycle repeats indefinitely, at seemingly random intervals.

Furthermore, it is not possible to change the "scaling_max_freq" value by hand. Although echoing a new valid value does not produce any errors, a 'cat' immediately after the echo shows the old value.

Steps to reproduce:
Just wait. Also seems to occur when the system is under load (ironically).

I've also posted a Launchpad bug here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/242006

This bug is probably related to, but is not a duplicate of, these bugs:
http://bugzilla.kernel.org/show_bug.cgi?id=10383
http://bugzilla.kernel.org/show_bug.cgi?id=9919

Adding acpi_osi="!Windows 2006" does not remove the problem.
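For anyone trying to confirm the same symptom, here is a minimal sketch of the write-then-read-back check described in the report. The function names and the optional sysfs-root argument are made up for this illustration (the extra argument only exists so the helpers can be pointed at a copy of the tree); the only thing taken from the report is the "scaling_max_freq" file itself.

```shell
# Hypothetical helpers, not part of any existing tool.

# Print the current scaling_max_freq (in kHz) for one CPU.
# $1 = cpu number, $2 = sysfs root (defaults to /sys)
read_max_freq() {
    cat "${2:-/sys}/devices/system/cpu/cpu$1/cpufreq/scaling_max_freq"
}

# Write a new limit, read it back, and report whether it stuck.
# $1 = cpu number, $2 = frequency in kHz, $3 = sysfs root (defaults to /sys)
# Returns 0 if the value read back equals the value written,
# 1 if it differs, 2 if the write itself failed.
set_max_freq_checked() {
    smf_file="${3:-/sys}/devices/system/cpu/cpu$1/cpufreq/scaling_max_freq"
    echo "$2" > "$smf_file" || return 2
    smf_now=$(cat "$smf_file")
    if [ "$smf_now" = "$2" ]; then
        echo "cpu$1: limit accepted ($smf_now kHz)"
    else
        echo "cpu$1: wrote $2 kHz but read back $smf_now kHz"
        return 1
    fi
}
```

Run as root, something like `set_max_freq_checked 0 2200000` should print the "read back" line on a machine showing this symptom while the limit is being held down, and the "accepted" line otherwise.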
Created attachment 16578 [details] The ver_linux script output. I've removed the nvidia module before reproducing, just in case
hmm, I think you can add a dump_stack() call in cpufreq_update_policy() to see who keeps changing scaling_max_freq. Something like:

---
 drivers/cpufreq/cpufreq.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.orig/drivers/cpufreq/cpufreq.c	2008-06-17 06:35:42.000000000 +0800
+++ linux-2.6/drivers/cpufreq/cpufreq.c	2008-06-23 10:25:13.000000000 +0800
@@ -1722,6 +1722,7 @@
 	if (unlikely(lock_policy_rwsem_write(cpu)))
 		return -EINVAL;
 
+	dump_stack();
 	dprintk("updating policy for CPU %u\n", cpu);
 	memcpy(&policy, data, sizeof(struct cpufreq_policy));
 	policy.min = data->user_policy.min;
Will you please attach the output of acpidump?
Will you please also enable CONFIG_CPU_FREQ_DEBUG in the kernel configuration and boot the system with the option "cpufreq.debug=7"? After the system has booted, please change the cpufreq governor several times and attach the output of dmesg.
Thanks.
Does this make the changes to scaling_max_freq stop:

# echo 1 > /sys/module/processor/parameters/ignore_ppc

(or boot with processor.ignore_ppc=1)

In addition to the acpidump requested above, please build with CONFIG_ACPI_DEBUG=y and attach the complete output from dmesg -s64000
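If it helps when flipping the parameter back and forth during testing, the one-liner above can be wrapped as in the sketch below. The function names and the optional sysfs-root argument are invented for this example; only the parameter path comes from the comment above.

```shell
# Hypothetical helpers around the ignore_ppc module parameter.

# Print the path of the parameter file. $1 = sysfs root (defaults to /sys)
ppc_param_file() {
    echo "${1:-/sys}/module/processor/parameters/ignore_ppc"
}

# Succeeds (exit status 0) only if the parameter currently reads 1.
ppc_is_ignored() {
    [ "$(cat "$(ppc_param_file "$1")")" = "1" ]
}

# Set the parameter. $1 = 0 or 1, $2 = sysfs root (defaults to /sys)
set_ignore_ppc() {
    case "$1" in
        0|1) ;;
        *) echo "set_ignore_ppc: value must be 0 or 1" >&2; return 2 ;;
    esac
    echo "$1" > "$(ppc_param_file "$2")"
}
```

For example, `set_ignore_ppc 1 && ppc_is_ignored && echo "_PPC limits are now ignored"` as root. A value written this way does not survive a reboot; the processor.ignore_ppc=1 boot option mentioned above is the persistent form.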
Created attachment 16584 [details] cpufreq.debug=7 and CONFIG_CPU_FREQ_DEBUG are set. governors are not set yet
Created attachment 16585 [details] more complete dump of the syslog. showing complete cycle from 2201MHz to 800 and back. Still no change from the default ondemand governor
echo 1 > /sys/module/processor/parameters/ignore_ppc stops the scaling_max_freq jumping, though the system tends to totally freeze sometimes now
More info: setting ignore_ppc to 1 does not stop the problem from occurring. Under heavy load, the system brought the CPU cores down from 2200 to 1200. I don't know whether scaling_max_freq itself was changed, or whether only the current frequency was changed by the ondemand governor. Before I could check, the system froze :\

The core temperature for both cores was around 60-65C, which is pretty normal for these cores in this laptop, so I don't think it froze because of overheating. Also, according to the string in 'processor_perflib.c', this option is meant for cases where the BIOS is the culprit. However, the cores were scaled correctly with 2.6.22, and no BIOS updates have been installed.
Will you please attach the following outputs?

acpidump --addr 0x7fe6e4f2 --length 0x286 -o cpu0ist
acpidump --addr 0x7fe6de88 --length 0x5e5 -o cpu0cst
acpidump --addr 0x7fe6e778 --length 0xc4 -o cpu1ist
acpidump --addr 0x7fe6e46d --length 0x85 -o cpu1cst

Will you please also change the cpufreq governor from ondemand to performance and see whether the problem still exists? It would be better to boot the system with the option "cpufreq.debug=7" and attach the output of dmesg.
Thanks.
Created attachment 16677 [details] cpu[01][ic]st files
Created attachment 16678 [details] dmesg output, after switching from ondemand to performance Changing the governor to performance doesn't help. 'scaling_max_freq' is still being changed, and performance just uses that value (from what I gather). The dmesg output includes changing the governor to performance.
Hi, Viktor
Thanks for the info.

It seems that this issue is related to the BIOS. While the system is running, the BIOS frequently sends the notification event (0x80), which causes the OS to evaluate the _PPC object and obtain the new performance limit. The OS then uses the new limit to update the cpufreq policy, so scaling_max_freq changes whenever the performance limit changes.

If the boot option "processor.ignore_ppc" is added, the OS won't update the cpufreq policy when the _PPC object changes, and scaling_max_freq stays normal.

From the acpidump it seems that no cooling device is defined for when the temperature reaches certain thresholds. Only the following objects exist under the thermal zone scope: _CRT, _TMP. If the temperature returned by the _TMP object is greater than _CRT, the system will be shut down. Maybe this is related to the system freezes.

Anyway, will you please attach the following output?
cat /proc/acpi/thermal_zone/THM/*
Thanks.
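To make the _PPC mechanism a bit more concrete: per the ACPI specification, _PPC returns an index into the processor's performance-state list (_PSS, ordered with the highest-performance state first), and states with a lower index than that are not available. The sketch below only illustrates that mapping. The function name is hypothetical, and the frequency list in the usage note is an assumed example, not data read from this machine's tables.

```shell
# ppc_to_max_freq PPC FREQ...
# FREQ... is the P-state list in _PSS order (highest frequency first), in kHz.
# Prints the highest frequency still allowed when _PPC returns PPC.
ppc_to_max_freq() {
    ptm_ppc=$1
    shift
    if [ "$ptm_ppc" -lt 0 ] || [ "$ptm_ppc" -ge "$#" ]; then
        echo "ppc_to_max_freq: _PPC value $ptm_ppc out of range" >&2
        return 1
    fi
    # Positional parameters are 1-based, the _PPC index is 0-based,
    # so after dropping PPC entries the allowed maximum is $1.
    shift "$ptm_ppc"
    echo "$1"
}
```

With an assumed list of 2201000 2200000 1600000 1200000 800000, `ppc_to_max_freq 0 ...` prints 2201000 (no limit) and `ppc_to_max_freq 4 ...` prints 800000. A BIOS that steps _PPC from 0 up to 4 and later back down would produce the same kind of walk through the intermediate frequencies that the reporter sees in scaling_max_freq.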
Created attachment 16722 [details] dumps of thermal_zone during various system states

These are the outputs of the files in thermal_zone/THM/, without processor.ignore_ppc. Please let me know if I need to make another batch with that option turned on. The starting_thermal file contains the state when the system first booted up. starting_minor_load_2200 is the state with little CPU load and the max freq set to 2200. The other two files were taken at max CPU load, with scaling_max_freq set to the frequency given in the file name.
1. The "scaling_max_freq" change is caused by the BIOS/hardware. IMO, it's not a Linux kernel bug, so I'm afraid we cannot change this behaviour.
2. Setting ignore_ppc freezes the system. This is weird. Would you please try to boot with "idle=poll", then run "echo 1 > /sys/module/processor/parameters/ignore_ppc", and see if the system still freezes?
Turns out the unstable system was due to faulty hardware. So I will keep using ignore_ppc in the future. I guess this bug can be closed now. Are there any side effects to ignore_ppc?
Hi, Viktor
Thanks for the quick response.

Using "ignore_ppc" is harmless in theory. But you had better confirm that the temperature stays below the critical threshold under heavy load when the system is running at the maximum CPU frequency (2200). If the temperature is still below the critical threshold, it is harmless.

As the issue is related to the BIOS and can be suppressed with ignore_ppc, the bug will be rejected and marked as "Documented".
Thanks.
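A rough sketch of that temperature check, for running while the machine is under load at the maximum frequency. It assumes the procfs thermal files have the usual layout, i.e. a "temperature:  NN C" line in the temperature file and a "critical (S5):  NN C" line in trip_points; the function names are made up for this example.

```shell
# Hypothetical helpers; the file layout described above is an assumption.

# thermal_field FILE PREFIX
# Print the number in front of the trailing "C" on the first line
# of FILE that starts with PREFIX.
thermal_field() {
    awk -v p="$2" 'index($0, p) == 1 { print $(NF-1); exit }' "$1"
}

# below_critical [DIR]
# DIR = thermal zone directory (defaults to /proc/acpi/thermal_zone/THM)
# Returns 0 if the current temperature is below the critical trip point,
# 1 if it is not, 2 if the files could not be parsed.
below_critical() {
    bc_dir=${1:-/proc/acpi/thermal_zone/THM}
    bc_tmp=$(thermal_field "$bc_dir/temperature" "temperature")
    bc_crt=$(thermal_field "$bc_dir/trip_points" "critical")
    if [ -z "$bc_tmp" ] || [ -z "$bc_crt" ]; then
        echo "below_critical: could not parse files in $bc_dir" >&2
        return 2
    fi
    echo "temperature $bc_tmp C, critical trip point $bc_crt C"
    [ "$bc_tmp" -lt "$bc_crt" ]
}
```

Calling `below_critical` every few seconds during a long stress run with ignore_ppc=1 gives a quick view of how much headroom is left before the critical trip point.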