Bug 43284
Description
Suloev Dmitry
2012-05-23 19:35:44 UTC
> Mobile AMD Sempron(tm) Processor 3800+
Please run the following command on the failing
kernel and capture the output:
grep . /sys/devices/system/cpu/cpu*/cpufreq/*
Then please run the same command on the latest
working kernel and capture the output.
This will tell us if something changed, like the
cpufreq driver binding to this system etc.
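
The comparison requested above can also be scripted. What follows is only a minimal sketch in Python, not part of the original report: the helper names `snapshot` and `diff_snapshots` are hypothetical, and the only path taken from this thread is the cpufreq glob. It reads each attribute much as `grep .` does and reports what differs between two captures.

```python
import glob


def snapshot(pattern):
    """Return {path: contents} for every readable, non-empty file matching pattern."""
    values = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                text = handle.read().strip()
        except OSError:
            # Directories and write-only or root-only attributes cannot be read; skip them.
            continue
        if text:
            values[path] = text
    return values


def diff_snapshots(working, failing):
    """Return {path: (working_value, failing_value)} for every path that differs."""
    changed = {}
    for path in sorted(set(working) | set(failing)):
        before = working.get(path)
        after = failing.get(path)
        if before != after:
            changed[path] = (before, after)
    return changed


if __name__ == "__main__":
    # Print in the same "path:value" shape that `grep .` produces.
    for path, value in snapshot("/sys/devices/system/cpu/cpu*/cpufreq/*").items():
        for line in value.splitlines():
            if line:
                print(f"{path}:{line}")
```

One capture would be taken on each kernel and saved; `diff_snapshots` then lists only the attributes that changed, with `None` standing for an attribute missing on one side (for example, if the cpufreq driver did not bind).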
The problem is in commit 9bcb8118965ab4631a65ee0726e6518f75cda6c5. After reverting this commit, all the problems go away.

Created attachment 73537 [details]
Original linux 3.4.1
Created attachment 73538 [details]
Linux 3.4.1 with commit 9bcb8118965ab4631a65ee0726e6518f75cda6c5 reverted

Dmitry, is your laptop overheating? Please monitor:
grep . /sys/class/thermal/thermal*/*

Please also attach the output from acpidump.

(In reply to comment #5)
> Dmitry,
> is your laptop over-heating?
> Please monitor:
>
> grep . /sys/class/thermal/thermal*/*
It looks like everything is fine. Maybe that's because I compiled the kernel before the test?

Created attachment 73564 [details]
output for grep -r . /sys/class/thermal/thermal*/*
Created attachment 73565 [details]
Linux 3.3.7 acpidump output
Created attachment 73566 [details]
Linux 3.4.0 acpidump output
Created attachment 73567 [details]
Linux 3.4.0+ acpidump output (with commit 9bcb8118965ab4631a65ee0726e6518f75cda6c5 reverted)

Adding Matthew.

Created attachment 74811 [details]
patch for fixing the issue on my machine

Hi. There's a thread about the same bug on the Linux kernel mailing list:
* lkml.org: http://lkml.org/lkml/2012/5/24/122 http://lkml.org/lkml/2012/9/27/478
or
* gmane.org: http://comments.gmane.org/gmane.linux.kernel/1302487 http://comments.gmane.org/gmane.linux.kernel/1303848 http://comments.gmane.org/gmane.linux.kernel/1305135
It says the same thing: commit 9bcb8118965ab4631a65ee0726e6518f75cda6c5 is "bad". That report is for the HP Compaq 6715b. My 6715s is affected, too.
Jason Vas Dias proposed a patch[1], but it seems it was not accepted due to a slight lack of quality.
Thanks beforehand.
[1]: http://thread.gmane.org/gmane.linux.kernel/1324409/focus=1324503

I meant *one* thread in the ML, but the archives are split into several entries.

Created attachment 83791 [details]
Patch to fix the issue.
For commit log:
Read temperature before trip points for HP Compaq 6715b and HP Compaq 6715s.
Originally written by Jason Vas Dias. See http://thread.gmane.org/gmane.linux.kernel/1324409/focus=1324503
----------
Additional description:
Tested for linux-3.4.9 on HP Compaq 6715s. The patch applies successfully to the git HEAD with offsets, but I haven't tried to compile it there.
Changes from Jason Vas Dias's version are:
* HP Compaq 6715s is detected, too. The original only fixes HP Compaq 6715b. (Notice the suffix "s" and "b".)
* Dropped the module_param which Jason included for this fix. Borislav Petkov and Rusty Russell said a module_param is undesirable here.
* Changed the new variable's type from bool to int, and its name from temp_b4_trip to temp_before_trip.
* Used checkpatch.pl. Now it gives neither errors nor warnings. (But the indent of the second hunk looks unnatural. Dunno why.)

> HP Compaq 6715s is detected, too. The original only fixes HP Compaq
> 6715b. (Notice the suffix "s" and "b".)
I would remove the s/b altogether. It's shorter (only one entry).
If there is another, similar model with some other suffix, it's likely affected and then fixed as well, or at least the chance that it causes harm is as good as zero...
You can add a:
Reviewed-by: Thomas Renninger <trenn@suse.de>
if you like to...

Created attachment 83951 [details]
New patch, following Documentation/SubmittingPatches
Thank you very much, Thomas.
The new patch doesn't distinguish between 6715b and 6715s. Both use the same chipset, RS690T for the northbridge and SB600 for the southbridge, according to HP's documentation[1] and (seemingly) from the above lspci output. (I don't know what a chipset is.) If it's better to write the chipset info into the code, I'll rewrite it. In arch/x86/kernel/early-quirks.c there's a fix for SB600, so this bug may matter for non-HP PCs, too.
The indent is fixed too. It looks natural now. Recompiled, and it seems to work. The patch was produced by git-format-patch, and I think I followed /usr/src/linux/Documentation/SubmittingPatches. Used checkpatch.pl.
For the record, let me be precise: the symptom of this bug, the slowdown, is separate from the CPU frequency being pinned. Even if you limit the CPU to the lowest frequency from the beginning, it still slows down a lot.
BTW, two extra proposals:
1. Style: the other two quirk handlers, static int thermal_tzp and thermal_psv, have strings broken across two lines. If you want, I can create another patch to join them into one.
2. Removal of redundant code: In 2008, Andreas Herrmann proposed[2] removing some quirks for the 6715b and other HP PCs in arch/x86/kernel/acpi/boot.c. That code became redundant because of new code in arch/x86/kernel/early-quirks.c, but it does no harm, because it simply detects the machine twice. Andreas later said "let's keep it for the next version", and it was probably forgotten. (The bug they had fixed was also reported by Jason Vas Dias.[3] Thanks indeed, Jason!) I came across it yesterday. If you want, I'll try to update and resend Andreas' code. I tested it the opposite way: my 6715s was not detected by the supposedly redundant code, so I added the 6715s there instead of removing the entire code, and it didn't change the situation. So this code must in fact be useless.
Best regards.
[1] p. 1 of: HP Compaq 6715b and 6715s Notebook PC, HP Compaq 6710b and 6710s Notebook PC http://h10032.www1.hp.com/ctg/Manual/c02834030.pdf (6710's chipsets are different from 6715's.)
[2] http://lkml.indiana.edu/hypermail/linux/kernel/0810.2/0126.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=11516

I tested the patch. It works! Thx!

(In reply to comment #8)
> Created an attachment (id=73564) [details]
> output for grep -r . /sys/class/thermal/thermal*/*
Was this captured on the broken kernel?

Created attachment 87071 [details]
workaround for _TMP and _CRT/_HOT/_PSV/_ACx dependency
Please check if this patch helps.

It works. I tested it against linux-3.4.9's original code on an HP Compaq 6715s, booting three times, and every time it was OK. Thanks!

Created attachment 87171 [details]
debug patch V2
Thanks for the testing.
As the previous patch may produce some redundant warning messages, I cooked up a new one and will push it upstream if it works for you.
Please test it. :)

This works too, under the same conditions. Tried two boots. FYI: neither of your patches emits any extra dmesg lines. 非常感謝 (Thank you very much!)

You're welcome. BTW, could you please attach the output of "grep . /sys/class/thermal/*/*" and "grep . /sys/devices/system/cpu/cpu*/cpufreq/*" both with and without this patch?

Created attachment 87231 [details]
Output of grep . /sys/devices/system/cpu/cpu*/cpufreq/*
Both versions match.
Created attachment 87241 [details]
"Bad" output of grep . /sys/class/thermal/*/*
Created attachment 87251 [details]
"Good" output of grep . /sys/class/thermal/*/*
In my daily use, cooling_device3/cur_state and trip_point_5_temp fluctuate, but other parameters (except current temperatures) don't. Obviously it's wrong.
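
Spotting which attributes fluctuate across several captures can be automated. The following is a rough Python sketch under stated assumptions, not something posted in this thread: it assumes each capture is the saved text output of `grep . /sys/class/thermal/*/*` (one "path:value" entry per line), and the function names `parse_grep_output` and `fluctuating` are hypothetical.

```python
def parse_grep_output(text):
    """Parse `grep . files...` output ("path:value" per line) into a dict."""
    values = {}
    for line in text.splitlines():
        path, sep, value = line.partition(":")
        if not sep:
            continue  # not a "path:value" line
        # Multi-line attributes show up as repeated paths; keep every line.
        values.setdefault(path, []).append(value)
    return {path: "\n".join(lines) for path, lines in values.items()}


def fluctuating(samples):
    """Return the sorted paths whose value is not identical in every sample."""
    paths = set()
    for sample in samples:
        paths.update(sample)
    return sorted(
        path
        for path in paths
        if len({sample.get(path) for sample in samples}) > 1
    )
```

Feeding it captures taken a few minutes apart on the same kernel would list attributes such as cur_state or a trip point that change over time; current temperature readings will naturally appear in that list as well and can be ignored.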

Let me be precise about the cpufreq info. With the bad kernel, the symptom, the slowdown, does not appear immediately after booting, but it does after a couple of minutes. I don't know what the trigger is. The fan seems to work correctly. The output of grep . /sys/devices/system/cpu/cpu*/cpufreq/* was correct right after booting, even with the bad kernel. But after the slowdown started, the /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq field dropped from 1800000, the correct value, to 800000, the minimum frequency. Yet the PC runs much slower than it does with cpufreq pinned to the minimum. A simple command like "ls" takes a second or so.

Problem fixed by https://patchwork.kernel.org/patch/1812481/

A patch referencing this bug report has been merged in Linux v3.8-rc1:
commit 261cba2deb7d3bebd180c35d5dbf8961f6e9afc4
Author: Zhang Rui <rui.zhang@intel.com>
Date: Tue Nov 27 20:42:11 2012 +0100
ACPI / thermal: _TMP and _CRT/_HOT/_PSV/_ACx dependency fix

Thx a lot!

I think this patch should be backported to the 3.4 series, which is "long-term", because it fixes a regression introduced in 3.4. Is there any plan to do so? And perhaps to 3.7 as well? It was skipped in 3.4.25 through 3.4.27 and in 3.7.2 through 3.7.4.

This patch shipped in 3.8-rc. It was not marked for .stable -- perhaps because the author wanted to make sure it survived in 3.8 before back-porting. It sounds like if it survives 3.8, then it should be sent to the various kernel trees back to 3.4. But as this bugzilla is for tracking upstream rather than back-ports, this bug report is closed.