Most recent kernel where this bug did *NOT* occur: None
Distribution: Debian 4.0 - Etch (Final)
Hardware Environment: HP/Compaq nc6000 (laptop), Pentium M 1.6 GHz, Intel ICH4, ATI Radeon Mobility 9600, 512 MB RAM, newest available BIOS
Software Environment: Debian Etch, no self-compiled tools but the kernel
Problem Description:
This is gonna be quite a long but hopefully equally useful description:

Shortly after I bought the laptop and installed Linux on it, I noticed that it sometimes turned itself off while I was away and had left the computer running. It took me quite a while to figure out what was wrong, until the laptop shut down by itself while I was using it. The kernel printed a message that an ACPI critical trip point had been reached and shut the system down to prevent damage. Once I knew there was a temperature problem, I wrote a script to log the temperatures, and after it happened again I saw that the temperatures suddenly started to rise from about 40
>There are _four_ fans reported, but I only have _one_ real fan in the laptop.
It's possible that the four fans reported by ACPI stand for different speeds of the one real fan.

>Another crazy thing is that echoing the number "3" into the state files makes
>the fan slow down sometimes, but no other number seems to work...
That's right. echo "o"/"3" to the state file will turn the fan on/off.

>The next annoying thing is that the real fan can run with various speeds,
>but combining the virtual fans mostly just turns it either ON or OFF, nothing in between.
Now you know the reason. :)

> "ACPI: Unable to turn cooling device [c146aec8] 'on'"
This seems to be the real problem to me.

Please attach the result of
#cat /proc/acpi/thermal_zone/*/*
It would also help to provide the dmesg output from when the system begins to overheat. Maybe a script that keeps polling /proc/acpi/thermal_zone/xxx/temperature will help you do this.
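A minimal sketch of such a polling script follows. It is not the reporter's actual script. It assumes each thermal zone directory holds a one-line "temperature" file, and the ZONE_DIR variable and the function names log_temps and poll_temps are made up for this example.

```shell
#!/bin/sh
# Sketch: log ACPI thermal zone temperatures with a timestamp.
# ZONE_DIR and the function names are assumptions made for this example.
ZONE_DIR="${ZONE_DIR:-/proc/acpi/thermal_zone}"

# Print one line per thermal zone: "<timestamp> <zone> <temperature line>".
log_temps() {
    for f in "$ZONE_DIR"/*/temperature; do
        # Skip zones without a readable temperature file.
        [ -r "$f" ] || continue
        zone=$(basename "$(dirname "$f")")
        # Squeeze the padding spaces so the log line stays compact.
        printf '%s %s %s\n' "$(date '+%b %d %H:%M:%S')" "$zone" "$(tr -s ' ' < "$f")"
    done
}

# Poll forever; the interval is given in seconds (default 5).
poll_temps() {
    interval="${1:-5}"
    while :; do
        log_temps
        sleep "$interval"
    done
}
```

After sourcing the file, something like "poll_temps 5 >> temps.log" (the log file name is arbitrary) keeps appending readings, and the timestamps can then be matched against dmesg and /var/log/messages.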
Created attachment 11309 [details] Output of 'cat /proc/acpi/thermal_zone/*/*'
Created attachment 11310 [details] Shows the temperatures increasing without fan kicking in before auto-shut-down
>>There are _four_ fans reported, but I only have _one_ real fan in the laptop.
>It's possible that the four fans reported by ACPI stand for different speeds of
>the one real fan.
That may be possible, but it is surely malfunctioning as well, see below.

>>Another crazy thing is that echoing the number "3" into the state files makes
>>the fan slow down sometimes, but no other number seems to work...
>That's right. echo "o"/"3" to the state file will turn the fan on/off.
If you mean the letter "o" will turn it on, you're right. I tried with a "0" (Zero) but this did not work ;)

If the virtual fans represent different speeds of the real fan, there has to be something wrong as well: I have four virtual fans in /proc/acpi/fan named C20F, C210, C211 and C212. Usually only C212 is "on" and the fan blows as slowly as it can go.

echo "o" > /proc/acpi/fan/C20F ==> fan blows with full speed
echo "3" > /proc/acpi/fan/C20F ==> fan won't slow down, though its state file reports "off" again.
echo "3" > /proc/acpi/fan/C212 ==> fan stops completely
echo "o" > /proc/acpi/fan/C212 ==> fan starts to blow in slowest mode again

Why does it not return to the slow mode when turning the first virtual fan off? The state files change correctly, but the real fan is not affected by the change until all virtual fans are turned off.

>> "ACPI: Unable to turn cooling device [c146aec8] 'on'"
>This seems to be the real problem to me.
Yes, probably that's the biggest bug we're hunting here ;)

>Please attach the result of
>#cat /proc/acpi/thermal_zone/*/*
Done.

>And better provide the dmesg output when the system begins to over-heat. Maybe
>a script that keeps on polling /proc/acpi/thermal_zone/xxx/temperature will help
>you do this.
I attached the log of a script I wrote myself to record temperatures. The attached part shows the temperatures rising before the system shut itself down because it reached a critical trip point. This is what /var/log/messages recorded at the same time:

Apr 26 12:53:43 shutdownm1 kernel: ACPI: Critical trip point
Apr 26 12:53:44 shutdownm1 shutdown[27743]: shutting down for system halt

If you need anything else just tell me, and you should know that I appreciate your help a lot!

BTW: Shutting down the system with the power button after playing with the fans did not work. With the help of Google I found out that acpid must have failed - killing acpid reproduces this. At the moment I have disabled acpid completely to be sure that it does not interfere or even cause the bug, though this seems unlikely to me.
>>That's right. echo "o"/"3" to the state file will turn the fan on/off.
>If you mean the letter "o" will turn it on, you're right.
Sorry, I meant "0"/"3". But anyway, all invalid characters are treated as "0", so the result is the same. :)

>I tried with a "0" (Zero) but this did not work ;)
I don't think so. I have an HP nx5000 on hand, and the ACPI in this box is pretty much the same as yours. I did some tests and it seems to work well.

>echo "o" > /proc/acpi/fan/C20F ==> fan blows with full speed
>echo "3" > /proc/acpi/fan/C20F ==> fan won't slow down,
>though its state file reports "off" again.
Make sure the other fans are off, and try again. I think you'll get something new. :)

>Why does it not return to the slow mode when turning the first virtual fan off?
>The state files change correctly, but the real fan is not affected by the
>change until all virtual fans are turned off.
Please turn on the fans C212, C211, C210, C20F in turn, and turn them off in the opposite order. I did this test and all the fans seem to work well.

>>> "ACPI: Unable to turn cooling device [c146aec8] 'on'"
>>This seems to be the real problem to me.
>Yes, probably that's the biggest bug we're hunting here ;)
Let's dig into this some more. Please apply the debug patch I attached, run
#echo 0x1f >/sys/module/acpi/parameters/debug_level
and attach the dmesg output after "ACPI: Unable to turn cooling device .... 'on', result is .." pops up.
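The fan test suggested above (turn on C212, C211, C210, C20F in turn, then turn them off in the opposite order) could be scripted roughly as follows. This is only a sketch: it assumes each virtual fan has a "state" file under /proc/acpi/fan/<name>/ that accepts "0" for on and "3" for off, that it is run as root, and the names FAN_DIR, FANS, set_fan, reverse_fans and step_fans are invented for this example.

```shell
#!/bin/sh
# Sketch: step the virtual fans up, then back down in the opposite order.
# FAN_DIR, FANS and the function names are assumptions for this example;
# the fan names are the ones reported for the nc6000 in this thread.
FAN_DIR="${FAN_DIR:-/proc/acpi/fan}"
FANS="C212 C211 C210 C20F"   # order in which the fans are turned on

# Write a power state ("0" = on, "3" = off) to one fan's state file.
set_fan() {
    printf '%s\n' "$2" > "$FAN_DIR/$1/state"
}

# Print the fan list in reverse order, separated by single spaces.
reverse_fans() {
    rev=""
    for fan in $FANS; do
        rev="$fan${rev:+ $rev}"
    done
    printf '%s\n' "$rev"
}

# Turn the fans on one by one, then off in the opposite order,
# pausing between steps so the change in fan speed can be heard.
step_fans() {
    pause="${1:-5}"
    for fan in $FANS; do
        set_fan "$fan" 0
        sleep "$pause"
    done
    for fan in $(reverse_fans); do
        set_fan "$fan" 3
        sleep "$pause"
    done
}
```

Calling "step_fans 10" would leave ten seconds between steps. Watching dmesg at the same time shows whether the "Unable to turn cooling device" message appears at a particular step.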
Created attachment 11324 [details] debug patch: export ACPI device bus_id instead of acpi_handle
Created attachment 11325 [details] Add Thomas as this patch's author.
And please test if compiling CONFIG_MOUSE_PS2=m (and best not loading it) or compiling it out helps. :)
Seems you were right about the fan: turning the virtual fans on and off in reverse order makes the real fan go faster and slower in steps, though there is no difference between C210 and C20F. C212, C211 and C210, however, are different speeds of the fan for sure. But why does the fan _not_ slow down when C210 and C20F are "on" and C20F is turned off? This behaviour is 100% reproducible and I'm not sure if this is all good. I'm going to patch my kernel now and build it without CONFIG_MOUSE_PS2 - btw, how could this module make a difference?
ACPI: Transitioning device [C211] to D0
ACPI: Unable to turn cooling device [C211] 'on', result is -8

CONFIG_MOUSE_PS2 was still enabled, kernel was 2.6.21 (final). The problem occurred when C212 and C20F were on and C211 was switched on and off a few times. "bash: echo: write error: Exec format error" occurred as well. As expected, the fan did not kick in when producing load by compiling something. I stopped compiling when about 70
With CONFIG_MOUSE_PS2 disabled my RAID is no longer detected. Both hard drives which are part of my RAID1 array are detected properly and the partition type still reads "FD - Linux Raid Autodetect", but /dev/md0 is not created. Did we just discover another bug? What's so special about CONFIG_MOUSE_PS2 and why may it interfere with the ACPI subsystem or the RAID subsystem? I really don't see what connection may exist, but I have rebooted a few times now - the kernel never detects my RAID without CONFIG_MOUSE_PS2 and never fails with it. Is this a known issue (you told me to get rid of it, so I suppose you know at least something) or should I open yet another bug report? :P
> What's so special about CONFIG_MOUSE_PS2 and why may it interfere with the
> ACPI subsystem
Hard to explain: the EC (Embedded Controller) touches the mouse/kbd HW, and large parts of the ACPI code make use of the EC (e.g. reading out temp, battery, ...). The EC is a separate microcontroller with its own firmware, so don't ask why/where exactly the interference happens, it just happens, especially on the HPs which (as far as I can tell) seem to have broken EC firmware.

> or the RAID subsystem
This is really weird. Better compare your .config file with previous ones...

The patches that should fix most mouse interference problems on HPs can be found here: bug #7689. Latest kernels should already have them included, but you might want to double-check that yours has them too...

Can you please also attach a full dmesg output? (Have you experienced any AE_TIME errors in /var/log/messages?)
>Hard to explain: the EC (Embedded Controller) touches the mouse/kbd HW, and large parts
>of the ACPI code make use of the EC (e.g. reading out temp, battery, ...). The EC is
>a separate microcontroller with its own firmware, so don't ask why/where
>exactly the interference happens, it just happens, especially on the HPs which
>(as far as I can tell) seem to have broken EC firmware.
Crazy, but interesting. Do you think HP will fix this with a BIOS update or is this beyond the BIOS' functions? If there is no possible way to reflash this separate microcontroller's memory this is of course no option ;)

>This is really weird. Better recheck your .config file with previous ones...
I double-checked everything, the .config is identical to that of my older kernels except for CONFIG_MOUSE_PS2 - it's crazy, but my RAID won't function without a MOUSE_PS2-enabled kernel. I'm gonna ask the guys at linuxforen.de about this, a few of them know a lot as well - if they don't have a clue either I'm gonna file another bug report about this.

>The patches that should fix up most mouse interference problems on HPs can be
>found here: bug #7689. Latest kernels should already have them included, but
>you might want to double check, that yours also has...
Thanks, I'll see if any of those patches are not in my kernel. BTW, it's plain vanilla 2.6.21.

>Can you please also attach a full dmesg output (Have you experienced any
>AE_TIME errors in /var/log/messages?).
"cat /var/log/messages | grep AE_" does not show anything at all. dmesg output attached.
Created attachment 11347 [details] dmesg output of HP nc6000 running 2.6.21
Though I have not had any trouble with the laptop for a few days now, the auto-shutdown problem still exists with 2.6.21 and acpid disabled - guess acpid is not involved after all. Just wanted to let you know that.

But I can tell you at least one good thing: I identified the cause of the CONFIG_MOUSE_PS2 and RAID mystery. Without CONFIG_MOUSE_PS2, my external USB disks (of which my RAID consists) are detected only after the kernel has checked for disks that are part of a RAID array, and therefore no RAID is detected. With CONFIG_MOUSE_PS2 it's simply the other way round and everything works like a charm. For details check http://bugzilla.kernel.org/show_bug.cgi?id=8412 .
Since 2.6.22-rc1 I haven't had any more problems with the laptop, maybe the problem is fixed now? 2.6.21 still did auto-shutdowns, but 2.6.22-rc* seem to work a lot better for me. Just wanted to let you know about this.
Already fixed in the upstream kernel. Please reopen it if you still have some problems.
Thanks a lot, looks pretty fixed to me ;)
Hi, I just installed Debian 4.0 on the same machine. I have exactly the same problem but my kernel is still:

$ uname -r
2.6.18-6-686

I am new to Linux, could you please tell me how I could upgrade to 2.6.22-rc1? Thanks
I don't know what the newest kernel available for stock Debian 4.0 is, but personally I don't recommend upgrading the kernel manually to any new/inexperienced Linux user. There is, however, a tool named "synaptic" included with the Debian Linux distribution (used for package management, i.e. installing/uninstalling software provided by Debian). Look for this tool and see what the latest kernel available there is. You may want to search for "kernel" or "linux" in the available packages, then look for the newest kernel with the same suffix as yours, which would be "x.y.z-*-686". (The "*" stands for Debian-specific updates to that specific kernel version, so higher numbers are better for this field.) If you still have trouble you may want to search with Google for a how-to (there are lots) or ask in a Linux support forum. A great German support forum is linuxforen.de, just in case you're from Germany.
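The same search can also be sketched on the command line. This is only an illustration under a few assumptions: that kernel packages are named like "linux-image-x.y.z-*-686", that the installed sort supports -V (version sort, which older GNU coreutils releases lack), and the helper name newest_kernel_pkg is invented for this example.

```shell
#!/bin/sh
# Sketch: pick the newest kernel image package for a given flavour suffix.
# The function name and the package-name pattern are assumptions.

# Reads package names on stdin (one per line), keeps those ending in
# -<flavour>, and prints the one with the highest version number.
newest_kernel_pkg() {
    flavour="${1:-686}"
    grep -E "^linux-image-[0-9].*-${flavour}\$" | sort -V | tail -n 1
}
```

On a Debian system the list of names could come from apt-cache, for example: apt-cache search --names-only '^linux-image-2\.6' | awk '{print $1}' | newest_kernel_pkg 686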
Hi, I installed on my HP NC6000:

Mandriva 2008 (2.6.24.4-laptop-lmnb) --- problem fixed
Fedora 9 (2.6.25-14.fc9.i686) --- problem fixed
Debian Lenny/sid (2.6.24-1-686) --- problem not fixed

Why does the Debian kernel with a higher version still have this problem? Thanks