Latest working kernel version:
Earliest failing kernel version:
Distribution: Gentoo
Hardware Environment: Lenovo 3000 N100
Software Environment:
Problem Description: Sometimes (but not every time) after resuming from hibernate, cat /proc/acpi/thermal_zone/TZ00/temperature reports a CPU temperature of 70 C and the fan blows cold air, while coretemp reports about 40 C. Also, I always get the following message in dmesg when the thermal module loads: ACPI Exception (thermal-0471): AE_NOT_FOUND, Invalid active threshold [0] [20070126]. Reloading the thermal module doesn't help, and I don't see this bug in Windows.
Steps to reproduce:
1. Put the laptop into hibernate.
2. Resume from hibernate.
Created attachment 15213 [details] acpidump output Here's output produced by acpidump
Here's a similar problem reported for Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114312 but there the fan doesn't work after resuming from hibernate, while I have the opposite problem: the fan doesn't stop after resuming :(
Hi, Vasily. From the description it seems that there are two problems with your laptop.
a. The exception warning message:
> The ACPI Exception (thermal-0471): AE_NOT_FOUND, Invalid active threshold [0]
This is caused by the BIOS. On your laptop the _AC0 object exists, but the _AL0 object doesn't. The message is harmless; it only reports the BIOS problem.
b. The mismatch between the temperature from coretemp and the one from the proc I/F after the system is resumed from S3. Will you please confirm whether they are the same before suspend?
Thanks.
Yes, before suspend everything is OK, and the temperatures shown by coretemp and the proc I/F are almost the same. But _sometimes_ (not every time) after resuming from suspend, thermal goes haywire and always shows 70C (it is locked at 70C!), while coretemp doesn't. The CPU temperature definitely isn't 70C, because the fan starts blowing cold air. Actually I don't care what the thermal module says, but the noise from the running fan is very annoying. I've noticed that once the bug has been reproduced (thermal shows 70C), if I heat the CPU to ~63-65C, thermal unlocks from 70C and then shows the correct temperature.
This smells like a platform EC firmware or sensor issue. Is the system running the latest BIOS? Does this happen only after hibernate, or does it also happen after suspend to RAM? It would be interesting to see whether Windows behaves any better on this system.
Please attach the complete output from dmesg -s64000.
Please attach the output from dmidecode.
Please paste the output from cat /proc/cpuinfo.
It would be interesting to know whether the hwmon Digital Thermal Sensor support works on this system; have you tried CONFIG_SENSORS_CORETEMP?
Created attachment 15250 [details] output produced by dmidecode
Created attachment 15252 [details] dmesg output I don't think it will be very useful, as it doesn't contain any interesting info :(
Created attachment 15253 [details] cpuinfo
>This smells like a platform EC firmware or sensor issue.
Is there any possible workaround? :( I don't think this is a warranty case :(
>Is the system running the latest BIOS?
Yep, it's running the latest BIOS I found on the Lenovo site.
>Does this happen only after hibernate, or does it happen
>also after suspend to RAM?
Only after hibernate; I can't reproduce the bug with suspend to RAM.
>It would be interesting if we could see if Windows behaves
>any better on this system.
I _do not_ have this issue in Windows.
>It would be interesting if the hwmon Digital Thermal Sensor
>support works on this system, have you tried CONFIG_SENSORS_CORETEMP?
Yep, coretemp is supported, and it shows correct info. Btw, I don't care what thermal says, I just want my fan not to run when the CPU is not hot :) Is fan activity managed by the BIOS, or is it possible to control the fan (turn it on/off) from userspace?
Hi, vasily, sorry for the delay. From reading the acpidump you attached, I think there is no ACPI fan control on your laptop, which means that the fan can *not* be controlled via ACPI. Sorry, we cannot help you with this. But I still suggest that you set CONFIG_HWMON and run lm_sensors on your laptop to see if there is any difference.
Created attachment 16512 [details] patch: add hooks to save/restore arch specific pages during hibernation
Created attachment 16513 [details] patch: x86_64: save/restore ACPI DATA/NVS memory during hibernation
Created attachment 16514 [details] patch: x86_32: save/restore ACPI DATA/NVS memory during hibernation
vasily, please download the latest kernel source, say 2.6.26-rc6, apply this patch set and see if it helps. Please attach the dmesg after hibernation whether it works or not.
Ok, I'm testing it. Btw, now I'm using swsusp and problem occurs very rarely. I'll report in ~week whether I can reproduce bug with your patches
Oops, I mean uswsusp :)
Created attachment 16541 [details] patch: save/restore ACPI NVS during hibernation
Created attachment 16542 [details] x86_64: mark ACPI NVS regions
Created attachment 16543 [details] x86_32: mark ACPI NVS regions
there must be a simpler way to do this, say by modifying the part of hibernate that decides what part of the e820 map to save and restore.
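The suggestion above could be sketched roughly as follows: walk an e820-style memory map and pick out the regions that hibernation would need to save and restore. This is only an illustrative Python model, not the kernel implementation and not the attached patches. The `Region` tuple, `regions_to_save`, and the sample map (including its addresses) are hypothetical; only the numeric type codes follow the usual x86 e820 convention.

```python
from collections import namedtuple

# Conventional x86 e820 region type codes.
E820_RAM = 1
E820_RESERVED = 2
E820_ACPI = 3   # ACPI data (reclaimable)
E820_NVS = 4    # ACPI non-volatile storage

# Hypothetical in-memory representation of one e820 entry.
Region = namedtuple("Region", "start size type")

def regions_to_save(e820_map, wanted_types=(E820_NVS,)):
    """Return the map entries whose type should be preserved across hibernation."""
    return [region for region in e820_map if region.type in wanted_types]

# Made-up sample map, for illustration only.
SAMPLE_MAP = [
    Region(0x00000000, 0x0009F000, E820_RAM),
    Region(0x00100000, 0x3FE00000, E820_RAM),
    Region(0x3FF00000, 0x00010000, E820_ACPI),
    Region(0x3FF10000, 0x00020000, E820_NVS),
    Region(0xFEC00000, 0x00001000, E820_RESERVED),
]

for region in regions_to_save(SAMPLE_MAP):
    print("save %#x-%#x" % (region.start, region.start + region.size - 1))
```

Passing `wanted_types=(E820_ACPI, E820_NVS)` would select both ACPI region types, which is closer to what the earlier "ACPI DATA/NVS" patch titles describe.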
Created attachment 16555 [details] patch: save/restore ACPI NVS during hibernation oops, I've attached the wrong patch, please try this one
Hmmm... there's a typo:
drivers/acpi/sleep/main.c: In function ‘acpi_hibernation_notifier_cb’:
drivers/acpi/sleep/main.c:321: error: ‘PM_POST_RESOTRE’ undeclared (first use in this function)
drivers/acpi/sleep/main.c:321: error: (Each undeclared identifier is reported only once
drivers/acpi/sleep/main.c:321: error: for each function it appears in.)
make[3]: *** [drivers/acpi/sleep/main.o] Error 1
make[2]: *** [drivers/acpi/sleep] Error 2
make[1]: *** [drivers/acpi] Error 2
I'll make the obvious change and retry.
Hi, Vasily, according to http://bugzilla.kernel.org/show_bug.cgi?id=10482#c34 your problem can not be reproduced on another lenovo 3000 n100 laptop. please attach your dmidecode
Olav Morken's post begins with "My laptop is a Thinkpad Z61t", and that isn't a Lenovo 3000 N100, is it? Btw, I've already attached the dmidecode output.
Vasily, could you please try the patch attached?
any updates? vasily, please try the patches attached, or else I have to reject this bug and mark it as INSUFFICIENT_DATA. :)
Sorry for the slow response. I've tested the patch and it doesn't seem to help: I still have the same problem, but for some reason it's now considerably harder to reproduce.
Sorry that I lost track of this bug. Can you reproduce it with the latest kernel release?
Yes, it's reproducible with 2.6.27.4 on 32bit system. Now I'm testing it on 64bit
Created attachment 19023 [details] customized DSDT: debug _TMP method please recompile the kernel with CONFIG_ACPI_DEBUG=y and this customized DSDT used, boot and attach the dmesg output when the _TMP is valid & invalid.
Ok, but how to use this customized DSDT?
hah, my fault. :p please look at this page: http://www.lesswatts.org/projects/acpi/overridingDSDT.php
Created attachment 19081 [details] dmesg with overrided DSDT after several suspend/resume cycles
Tried the custom DSDT, but I can't see any _TMP messages in dmesg. Here are some options from my kernel config:
CONFIG_ACPI_DEBUG=y
# CONFIG_ACPI_DEBUG_FUNC_TRACE is not set
CONFIG_ACPI_CUSTOM_DSDT_FILE="/usr/src/linux/DSDT.hex"
CONFIG_ACPI_CUSTOM_DSDT=y
Also, the custom DSDT breaks acpi_cpufreq. modprobe acpi_cpufreq produces:
FATAL: Error inserting acpi_cpufreq (/lib/modules/2.6.27-gentoo-r4-anarsoul/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko): No such device
dmesg attached
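For anyone repeating this test, the config options quoted above can be checked mechanically. The following is a minimal Python sketch that parses kernel `.config` text and reports which of the debugging options from this comment are not enabled. The helper names `parse_kernel_config` and `missing_options` are hypothetical, and the list of required options is an assumption based only on what was requested in this thread.

```python
def parse_kernel_config(text):
    """Map option name -> value; options marked 'is not set' map to None."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')
    return options

# Assumption: these are the options needed for the customized DSDT test.
REQUIRED = ("CONFIG_ACPI_DEBUG", "CONFIG_ACPI_CUSTOM_DSDT")

def missing_options(options, required=REQUIRED):
    """Return the required options that are not set to 'y'."""
    return [name for name in required if options.get(name) != "y"]

SAMPLE = """\
CONFIG_ACPI_DEBUG=y
# CONFIG_ACPI_DEBUG_FUNC_TRACE is not set
CONFIG_ACPI_CUSTOM_DSDT_FILE="/usr/src/linux/DSDT.hex"
CONFIG_ACPI_CUSTOM_DSDT=y
"""

print(missing_options(parse_kernel_config(SAMPLE)))
```

In practice the text would come from the `.config` file in the kernel source tree rather than from an inline sample.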
1. Please boot without acpi_no_auto_ssdt.
2. Please boot with acpi.debug_level=0x03 acpi.debug_layer=0xffffffff.
3. Run "cat /proc/acpi/thermal_zone/TZ00/temperature" both before and after hibernate and attach the dmesg output.
4. Attach the dmesg output when you read a wrong temperature after hibernation.
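The boot-parameter part of the steps above can be verified before collecting logs. This is a minimal Python sketch, assuming the kernel command line is available as a string (on a running system it can be read from /proc/cmdline). The function names are hypothetical; the parameter names and values are the ones requested in this comment.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

# Values requested in this thread.
WANTED = (("acpi.debug_level", 0x03), ("acpi.debug_layer", 0xFFFFFFFF))

def check_debug_boot(cmdline):
    """Return a list of problems; an empty list means the boot matches the request."""
    params = parse_cmdline(cmdline)
    problems = []
    if "acpi_no_auto_ssdt" in params:
        problems.append("acpi_no_auto_ssdt is still set")
    for name, wanted in WANTED:
        raw = params.get(name)
        try:
            actual = int(raw, 0) if raw else None
        except ValueError:
            actual = None
        if actual != wanted:
            problems.append("%s should be %#x" % (name, wanted))
    return problems

print(check_debug_boot("root=/dev/sda1 acpi.debug_level=0x03 acpi.debug_layer=0xffffffff"))
```

On the test machine, `check_debug_boot(open("/proc/cmdline").read())` would run the same check against the live boot parameters.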
Any update? BTW: there have been a couple of hibernation fixes recently, so please try the customized DSDT with the latest kernel.
Sorry, I really haven't had time to test it (pre-exam period at university). I hope to try it before this weekend.
ping vasily. :)
Created attachment 19521 [details] dmesg with acpi_debug enabled Pong :) Here it is. After resume, the temperature shown in /proc/acpi/thermal_zone/TZ00/temperature is incorrect (57 C); the correct temperature is 43 C, as shown in /sys/class/hwmon/hwmon2/device/temp1_input (coretemp).
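The comparison described above can be automated. The sketch below is a minimal Python example that parses the two readings and flags a mismatch. It assumes the legacy procfs file has the form "temperature: NN C" and that the hwmon file holds millidegrees Celsius; the function names and the 10 C tolerance are arbitrary choices for illustration, not something taken from this thread.

```python
import re

def parse_acpi_temperature(text):
    """Parse the legacy procfs line, e.g. 'temperature:             57 C'."""
    match = re.search(r"temperature:\s*(-?\d+)\s*C", text)
    if match is None:
        raise ValueError("unexpected thermal_zone format: %r" % text)
    return float(match.group(1))

def parse_hwmon_temperature(text):
    """hwmon tempN_input values are in millidegrees Celsius."""
    return int(text.strip()) / 1000.0

def readings_disagree(acpi_c, coretemp_c, tolerance_c=10.0):
    """True if the two sensors differ by more than the (assumed) tolerance."""
    return abs(acpi_c - coretemp_c) > tolerance_c

def compare_files(acpi_path, hwmon_path):
    """Read both files and return (acpi_c, coretemp_c, disagree)."""
    with open(acpi_path) as handle:
        acpi_c = parse_acpi_temperature(handle.read())
    with open(hwmon_path) as handle:
        coretemp_c = parse_hwmon_temperature(handle.read())
    return acpi_c, coretemp_c, readings_disagree(acpi_c, coretemp_c)

# Values reported in this comment: 57 C from ACPI, 43 C from coretemp.
acpi = parse_acpi_temperature("temperature:             57 C")
core = parse_hwmon_temperature("43000\n")
print(acpi, core, readings_disagree(acpi, core))
```

On the affected laptop, `compare_files("/proc/acpi/thermal_zone/TZ00/temperature", "/sys/class/hwmon/hwmon2/device/temp1_input")` would perform the same check against the live sensors; the hwmon index may differ between boots and systems.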
Btw, suspend_to_ram after "incorrect temperature issue" makes all things work OK. It seems to me that some memory region that should be reserved is being overwritten during resume from suspend_to_disk.
The bug seems to be fixed with a customized DSDT table (I've changed the _TMP method to take only DTS1 into account). At least I haven't been able to reproduce it for ~18 days. Thanks for pointing me in the right direction ;)
I haven't been able to reproduce this bug with my customized DSDT table for ~1.5 months. So I think it's time to close it.