Latest working kernel version: 2.6.21.7
Earliest failing kernel version:
Distribution: Gentoo
Hardware Environment: HP Pavilion DV8000t
Software Environment:
Problem Description:
I upgraded my system from 2.6.21.7 to 2.6.27, and now my laptop keeps shutting down with "ACPI – Critical Trip Point" whenever I try to use any CPU-intensive app. I don't think this is a real overheating problem, because I ran stress programs in Windows for a few hours as a test and it worked just fine. What other information should I provide to help fix this?
Will you please compile drivers/acpi/thermal into the kernel (built-in, not as a module) and add the boot option "thermal.nocrt=1"? After the system has booted, please post the output of "cat /proc/acpi/thermal/*/*". Please also attach the output of acpidump. Thanks.
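Before re-testing, it can help to confirm that "thermal.nocrt=1" actually reached the running kernel, since /proc/cmdline shows the command line the kernel was booted with. A minimal Python sketch; the helper names are hypothetical:

```python
def read_cmdline(path: str = "/proc/cmdline") -> str:
    # /proc/cmdline holds the command line of the currently running kernel
    with open(path) as f:
        return f.read().strip()


def has_boot_option(cmdline: str, option: str) -> bool:
    # Match whole whitespace-separated tokens, so "thermal.nocrt=10"
    # is not mistaken for "thermal.nocrt=1"
    return option in cmdline.split()
```

Usage: has_boot_option(read_cmdline(), "thermal.nocrt=1") should be True after booting with the option.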
Created attachment 18740: acpidump
There is no /proc/acpi/thermal/*/, so I think the correct path is:

root@gentoo ~ # cat /proc/acpi/thermal_zone/TZ0*/*
<setting not supported>
<polling disabled>
state: ok
temperature: 55 C
critical (S5): 99 C <disabled>
<setting not supported>
<polling disabled>
state: ok
temperature: 27 C
critical (S5): 105 C <disabled>
Will you please attach the output of the following command?
./acpidump --addr 0x0xBFE93E4C --length 0x0100 -o gnvs
Please also add the boot option "thermal.nocrt=1", run any CPU-intensive app, and then attach the output of "cat /proc/acpi/thermal_zone/TZ*/*". Thanks.
The coretemp module shows CPU core0 at 86 C and core1 at 82 C when I cat this:

<setting not supported>
<polling disabled>
state: critical
temperature: 100 C
critical (S5): 99 C <disabled>
<setting not supported>
<polling disabled>
state: ok
temperature: 27 C
critical (S5): 105 C <disabled>
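The numbers above show why the shutdown happens: the ACPI zone reports 100 C against a 99 C critical trip, even though coretemp reads 86 C and 82 C. A small Python sketch (function names hypothetical) that extracts these values from one zone's output in the format shown above and checks the trip condition, assumed here to be "at or above the critical value":

```python
import re


def parse_zone(text: str) -> dict:
    """Extract state, temperature and critical trip (deg C) from the
    concatenated /proc/acpi/thermal_zone/<zone>/* output of one zone."""
    info = {}
    m = re.search(r"state:\s*(\w+)", text)
    if m:
        info["state"] = m.group(1)
    m = re.search(r"temperature:\s*(-?\d+) C", text)
    if m:
        info["temperature"] = int(m.group(1))
    # "<disabled>" after the critical value appears when thermal.nocrt=1 is set
    m = re.search(r"critical \(S5\):\s*(-?\d+) C(\s*<disabled>)?", text)
    if m:
        info["critical"] = int(m.group(1))
        info["critical_disabled"] = m.group(2) is not None
    return info


def over_critical(info: dict) -> bool:
    # Assumed >= comparison; 100 C vs. 99 C is past the trip either way
    return info["temperature"] >= info["critical"]
```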
Created attachment 18753: acpidump --addr 0x0xBFE93E4C --length 0x0100 -o gnvs
(In reply to comment #6)
> Created an attachment (id=18753)
> acpidump --addr 0x0xBFE93E4C --length 0x0100 -o gnvs

Sorry, typo. Please do the test like this:
0. Boot into the 2.6.27 kernel.
1. "acpidump --addr 0xBFE93E4C --length 0x0100 -o gnvs"
2. "cat /proc/acpi/thermal_zone/*/* > temp"
3. Redo this test in the 2.6.21 kernel.
Then attach the four test results here. :)
> coretemp

Does this issue go away if the kernel is built with CONFIG_HWMON=n?
Sorry, since I deleted the 2.6.21 kernel, I tested on 2.6.25.7 because I already had the source. On kernel 2.6.25.7 the machine does not shut down. I will try to re-download 2.6.21, but I am on a dial-up connection, so it will take a while.

<setting not supported>
<polling disabled>
state: ok
temperature: 89 C
critical (S5): 99 C <disabled>
<setting not supported>
<polling disabled>
state: ok
temperature: 27 C
critical (S5): 105 C <disabled>
Created attachment 18880: gnvs_2.6.25.7 (2.6.25.7: acpidump --addr 0xBFE93E4C --length 0x0100 -o gnvs)
Len, I will do the test right now. Sorry for not posting earlier, but I could not reboot the machine. I am having lots of other problems with kernel 2.6.27; the most annoying is that after the automatic reboot my keyboard and mouse just stop working. By the way, where should I report this?
From the acpidump it seems that the temperature of the TZ01 thermal zone is obtained by evaluating the following ACPI object:

    Method (_TMP, 0, Serialized)
    {
        If (LEqual (\_SB.PCI0.LPCB.EC0.ECRY, One))
        {
            If (DTSE)
            {
                Store (DTS2, Local1)
                If (LGreaterEqual (DTS1, DTS2))
                {
                    Store (DTS1, Local1)
                }

                Multiply (Local1, 0x0A, Local0)
                Add (Local0, 0x0AAC, Local0)
                Return (Local0)
            }
        }

        Return (0x0BB8)
    }

In other words, it returns the larger of DTS1 and DTS2, converted to tenths of a kelvin, so the reading depends on the DTSE, DTS1 and DTS2 objects defined in the GNVS memory region. From the gnvs dump it seems that the initial temperature of the TZ01 thermal zone is 55 C. But why does the TZ01 thermal zone return 100 C after running the CPU-intensive application? It is very strange. Will you please do the test suggested by Len in comment #8? Thanks.
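The arithmetic in _TMP can be checked directly. The sketch below is a hypothetical Python model of the method quoted above: it treats DTS1/DTS2 as whole degrees Celsius (which is what multiplying by 10 and adding 0x0AAC = 2732 implies) and returns tenths of a kelvin, the unit ACPI uses for _TMP. The argument names simply mirror the AML objects; the real values live in the GNVS region and the EC.

```python
ZERO_C_DK = 0x0AAC    # 2732 tenths of a kelvin = 0 deg C
FALLBACK_DK = 0x0BB8  # 3000 tenths of a kelvin, returned when DTS is not used


def tz01_tmp(ecry: int, dtse: int, dts1: int, dts2: int) -> int:
    """Model of TZ01._TMP: the hotter of DTS1/DTS2 in tenths of a kelvin."""
    if ecry == 1 and dtse:           # LEqual (ECRY, One) and If (DTSE)
        local1 = dts2
        if dts1 >= dts2:             # LGreaterEqual (DTS1, DTS2)
            local1 = dts1
        return local1 * 0x0A + ZERO_C_DK
    return FALLBACK_DK


def dk_to_celsius(dk: int) -> float:
    # Tenths of a kelvin -> degrees Celsius
    return (dk - ZERO_C_DK) / 10
```

With DTS1 = 55 the method returns 3282 (55.0 C); a DTS value of 100 yields 3732, i.e. exactly the 100 C the zone reported, which is past the 99 C trip. The fallback value 3000 corresponds to 26.8 C, roughly 27 C.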
CONFIG_HWMON=n changed nothing :( It keeps shutting down. Is there any other info I can provide?
Thanks for the confirmation. It seems that the problem still exists even when the hardware monitor is disabled. From the acpidump it seems that the temperature of the TZ01 thermal zone depends on DTSE/DTS1/DTS2, and DTS1/DTS2 are not changed by the OS; it seems that they are changed by the BIOS. From the problem description, the box shuts down after running the CPU-intensive application, and we don't know what happens while that application runs. Very sorry, but it seems this bug can't be fixed by Linux ACPI. Will you please confirm whether the box shuts down if you don't run the CPU-intensive application?
It does not shut down when idle. It is strange: why does this only happen with new kernels? Something must have changed... Anyway, thanks. I will work around it by limiting the CPU speed, at least for a while.
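One way to limit the CPU speed is the cpufreq sysfs interface, where each CPU's cpufreq/scaling_max_freq file takes a maximum frequency in kHz. A minimal Python sketch, assuming a cpufreq driver is loaded and the script runs as root; the function name and the sysfs_root parameter (added so it can be pointed at a test directory) are hypothetical, and the kernel may clamp the value to the range the hardware supports:

```python
import glob
import os


def cap_max_freq(khz: int, sysfs_root: str = "/sys/devices/system/cpu") -> list:
    """Write `khz` to scaling_max_freq for every CPU exposing cpufreq.

    Returns the list of files written.
    """
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq/ or cpuidle/
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_max_freq")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(str(khz))
        written.append(path)
    return written
```

For example, cap_max_freq(1000000) would cap every core at 1 GHz until the next reboot.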
From the acpidump it seems that this issue is related to the BIOS. As this issue only happens on the new kernel, will you please use git-bisect to identify which commit caused the regression? Thanks.
ping salatiel.
Pong. I was sure I had answered this thread :) I have no idea how to use git bisect :(
http://www.lesswatts.org/projects/acpi/debug.php ("Debug: How to Isolate Linux ACPI Issues") has several links about how to use git-bisect. :) Maybe this will help. :)
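In practice a bisect run would start with something like "git bisect start v2.6.27 v2.6.25" (bad first, then good; this assumes mainline v2.6.25 behaves like the working 2.6.25.7), followed by building, booting and running the CPU-intensive load at each step, then marking the result with "git bisect good" or "git bisect bad". The idea behind it is a binary search. The sketch below is a simplified Python model on a linear list of commits; real git bisect also handles merge history, which this does not. The function name is hypothetical:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Simplified git-bisect on a linear history.

    commits[0] must be known good and commits[-1] known bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # In a real bisect this step is: build, boot, run the CPU-heavy test
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each test halves the remaining range, so about a thousand commits take only around ten test boots.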
salatiel, does the info in comment #19 help?
Hi Zhang, I am on a trip right now; I will post some info as soon as I get back.
salatiel, are you back and ready for the git-bisect? :)
Hi, I am back now. Sorry, I almost forgot about this. Since I installed Ubuntu 8.10 last week I haven't had a single "trip point" shutdown. If it is still needed, I can start bisecting next Monday, after the parties :)
Hi, salatiel, can you still reproduce this bug? I'll close this bug if it's not reproducible.
Ping Salatiel.... As there has been no response for more than one month, the bug will be rejected. If the problem still exists, please use git-bisect to identify the offending commit and attach the bisect output.