Bug 10223 - lenovo 3000 n100 thermal+hibernate problem
Summary: lenovo 3000 n100 thermal+hibernate problem
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 normal
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks: 7216
 
Reported: 2008-03-11 06:37 UTC by Vasily Khoruzhick
Modified: 2009-02-15 12:49 UTC
CC List: 4 users

See Also:
Kernel Version: exists in 2.6.20-2.6.24, didn't try older versions
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
acpidump output (106.91 KB, text/plain)
2008-03-11 06:38 UTC, Vasily Khoruzhick
output produced by dmidecode (10.01 KB, text/plain)
2008-03-13 11:00 UTC, Vasily Khoruzhick
dmesg output (32.58 KB, text/plain)
2008-03-13 11:05 UTC, Vasily Khoruzhick
cpuinfo (1.19 KB, text/plain)
2008-03-13 11:07 UTC, Vasily Khoruzhick
patch: add hooks to save/restore arch specific pages during hibernation (3.75 KB, patch)
2008-06-16 20:44 UTC, Zhang Rui
patch: x86_64: save/restore ACPI DATA/NVS memory during hibernation (1.48 KB, patch)
2008-06-16 20:53 UTC, Zhang Rui
patch: x86_32: save/restore ACPI DATA/NVS memory during hibernation (1.25 KB, patch)
2008-06-16 20:55 UTC, Zhang Rui
patch: save/restore ACPI NVS during hibernation (5.16 KB, patch)
2008-06-19 01:38 UTC, Zhang Rui
x86_64: mark ACPI NVS regions (1.42 KB, patch)
2008-06-19 01:42 UTC, Zhang Rui
x86_32: mark ACPI NVS regions (1.26 KB, patch)
2008-06-19 01:43 UTC, Zhang Rui
patch: save/restore ACPI NVS during hibernation (5.62 KB, application/octet-stream)
2008-06-19 18:21 UTC, Zhang Rui
customized DSDT: debug _TMP method (159.12 KB, application/octet-stream)
2008-11-25 22:55 UTC, Zhang Rui
dmesg with overridden DSDT after several suspend/resume cycles (10.40 KB, application/gzip)
2008-11-30 09:10 UTC, Vasily Khoruzhick
dmesg with acpi_debug enabled (236.32 KB, application/octet-stream)
2008-12-29 04:45 UTC, Vasily Khoruzhick

Description Vasily Khoruzhick 2008-03-11 06:37:10 UTC
Latest working kernel version:
Earliest failing kernel version:
Distribution: gentoo
Hardware Environment: lenovo 3000 n100
Software Environment:
Problem Description:

Sometimes (but not every time), after resuming from hibernation, cat /proc/acpi/thermal_zone/TZ00/temperature reports a CPU temperature of 70C and the fan blows cold air, while coretemp reports about 40C. I also always get the following message in dmesg when loading the thermal module:
ACPI Exception (thermal-0471): AE_NOT_FOUND, Invalid active threshold [0] [20070126]
Reloading the thermal module doesn't help, and I don't have this bug in Windows.

Steps to reproduce:
Put the laptop into hibernation
Resume from hibernation
Comment 1 Vasily Khoruzhick 2008-03-11 06:38:21 UTC
Created attachment 15213 [details]
acpidump output

Here's output produced by acpidump
Comment 2 Vasily Khoruzhick 2008-03-11 06:53:51 UTC
Here's a similar problem reported for Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/114312
but there the fan doesn't work after resuming from hibernation, while I have the opposite
problem: the fan doesn't stop after resuming :(
Comment 3 ykzhao 2008-03-11 20:03:59 UTC
Hi, Vasily 
   From the description it seems that there are two problems with your laptop.
a. The exception warning message:
 > The ACPI Exception (thermal-0471): AE_NOT_FOUND, Invalid active threshold [0]
   This is caused by the BIOS. Your laptop has an _AC0 object, but the corresponding _AL0 object doesn't exist.
   The message is harmless; it only reports a problem in the BIOS.
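To make point (a) concrete, here is a simplified Python model of the check that produces the warning. It assumes the thermal driver walks _AC0, _AC1, ... until one is missing and flags any index whose _ALx device list is absent. The namespace is modeled as a plain set of object names, and the limit of 10 active trip points is an assumption, so treat this as an illustration rather than the driver's actual code.

```python
# Simplified model of the active-trip-point check (an assumption, not the
# actual drivers/acpi/thermal.c code). The thermal zone's ACPI namespace is
# modeled as a set of object names such as {"_TMP", "_AC0", "_AL0"}.
MAX_ACTIVE = 10  # assumed upper bound on active trip points (_AC0.._AC9)


def check_active_trips(objects):
    """Return (valid, invalid) lists of active trip point indices."""
    valid, invalid = [], []
    for i in range(MAX_ACTIVE):
        if f"_AC{i}" not in objects:
            break  # no more active trip points defined
        if f"_AL{i}" in objects:
            valid.append(i)  # threshold plus the devices (fans) to switch on
        else:
            # This is the case behind "Invalid active threshold [i]".
            invalid.append(i)
    return valid, invalid


# This laptop's case: _AC0 is defined but _AL0 is missing.
print(check_active_trips({"_TMP", "_CRT", "_AC0"}))  # ([], [0])
```

An active trip point without a device list gives the driver a temperature threshold but nothing to switch on, which is why it is reported and then ignored.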
b. A mismatch between the temperature from coretemp and the one from the proc interface after the system is resumed from S3.
   Could you please confirm whether they are the same before suspend?
   
   Thanks.
 
Comment 4 Vasily Khoruzhick 2008-03-12 00:32:50 UTC
Yes, before suspend everything is OK, and the temperatures shown by coretemp and the proc interface are almost the same.

But _sometimes_ (not every time) after resuming from suspend, the thermal zone goes crazy and always shows 70C (it's locked at 70C!), but coretemp doesn't. The CPU temperature definitely isn't 70C, because the fan starts blowing cold air. Actually I don't care what the thermal module says, but the noise from the running fan is very annoying.

I've noticed that when the bug is reproduced (thermal shows 70C), if I heat the CPU to ~63-65C, thermal unlocks from 70C and then shows the correct temperature.
Comment 5 Len Brown 2008-03-12 19:17:19 UTC
This smells like a platform EC firmware or sensor issue.
Is the system running the latest BIOS?

Does this happen only after hibernate, or does it happen
also after suspend to RAM?

It would be interesting if we could see if Windows behaves
any better on this system.

please attach the complete output from dmesg -s64000
please attach the output from dmidecode
please paste the output from cat /proc/cpuinfo

It would be interesting to know if the hwmon Digital Thermal Sensor
support works on this system. Have you tried CONFIG_SENSORS_CORETEMP?
Comment 6 Vasily Khoruzhick 2008-03-13 11:00:57 UTC
Created attachment 15250 [details]
output produced by dmidecode
Comment 7 Vasily Khoruzhick 2008-03-13 11:05:42 UTC
Created attachment 15252 [details]
dmesg output

I don't think it will be very useful, as it doesn't contain any interesting info :(
Comment 8 Vasily Khoruzhick 2008-03-13 11:07:23 UTC
Created attachment 15253 [details]
cpuinfo
Comment 9 Vasily Khoruzhick 2008-03-13 11:19:06 UTC
>This smells like a platform EC firmware or sensor issue.
Is it possible to make some kind of workaround? :( I don't think it's a warranty case :(

>Is the system running the latest BIOS?
Yep, it's running the latest BIOS I found on the Lenovo site

>Does this happen only after hibernate, or does it happen
>also after suspend to RAM?

Only with hibernate; I can't reproduce the bug with suspend to RAM.

>It would be interesting if we could see if Windows behaves
>any better on this system.
I _do not_ have this issue in windows

>It would be interesting to know if the hwmon Digital Thermal Sensor
>support works on this system. Have you tried CONFIG_SENSORS_CORETEMP?

Yep, coretemp is supported, and it shows correct info. Btw, I don't care what thermal says, I just want my fan not to run when the CPU is not hot :)
Is fan activity managed by the BIOS, or is it possible to control the fan (turn it on/off) from userspace?
Comment 10 Zhang Rui 2008-05-05 23:58:30 UTC
Hi, vasily,
sorry for the delay.
By reading the acpidump you attached, I think there is no ACPI fan control on your laptop, which means the fan can *not* be controlled via ACPI.
Sorry, we cannot help you with this.
But I still suggest you set CONFIG_HWMON and run lm-sensors on your laptop to see if there is any difference.
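One way to double-check this reading of the acpidump is to disassemble the DSDT (for example with iasl -d) and look for a device declaring the ACPI fan hardware ID PNP0C0B. The sketch below assumes the disassembled .dsl text is already available as a string, and the helper name is hypothetical. A match only means the BIOS declares an ACPI fan device; no match is consistent with the conclusion above.

```python
import re

# ACPI fan devices use the PNP hardware ID PNP0C0B. In iasl output the _HID
# usually appears as EisaId ("PNP0C0B"), but a plain string is also allowed.
FAN_HID = re.compile(r'Name\s*\(\s*_HID\s*,\s*(?:EisaId\s*\(\s*)?"PNP0C0B"')


def count_acpi_fans(dsl_text):
    """Count _HID declarations of ACPI fan devices in disassembled ASL."""
    return len(FAN_HID.findall(dsl_text))


sample = '''
Device (FAN0)
{
    Name (_HID, EisaId ("PNP0C0B"))
}
'''
print(count_acpi_fans(sample))  # 1
```

On a real system the input would be the full contents of the disassembled DSDT (and any SSDTs) rather than this sample.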
Comment 11 Zhang Rui 2008-06-16 20:44:55 UTC
Created attachment 16512 [details]
patch: add hooks to save/restore arch specific pages during hibernation
Comment 12 Zhang Rui 2008-06-16 20:53:27 UTC
Created attachment 16513 [details]
patch: x86_64: save/restore ACPI DATA/NVS memory during hibernation
Comment 13 Zhang Rui 2008-06-16 20:55:05 UTC
Created attachment 16514 [details]
patch: x86_32: save/restore ACPI DATA/NVS memory during hibernation
Comment 14 Zhang Rui 2008-06-16 20:56:54 UTC
vasily,
please download the latest kernel source, say 2.6.26-rc6,
apply this patch set and see if it helps.
Please attach the dmesg after hibernation whether it works or not.
Comment 15 Vasily Khoruzhick 2008-06-17 01:30:27 UTC
Ok, I'm testing it. Btw, I'm now using swsusp and the problem occurs very rarely. I'll report in about a week whether I can reproduce the bug with your patches.
Comment 16 Vasily Khoruzhick 2008-06-17 01:31:53 UTC
Oops, I mean uswsusp :)
Comment 17 Zhang Rui 2008-06-19 01:38:44 UTC
Created attachment 16541 [details]
patch: save/restore ACPI NVS during hibernation
Comment 18 Zhang Rui 2008-06-19 01:42:39 UTC
Created attachment 16542 [details]
x86_64: mark ACPI NVS regions
Comment 19 Zhang Rui 2008-06-19 01:43:09 UTC
Created attachment 16543 [details]
x86_32: mark ACPI NVS regions
Comment 20 Len Brown 2008-06-19 11:35:25 UTC
there must be a simpler way to do this, say by modifying
the part of hibernate that decides what part of the e820 map
to save and restore.
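The suggestion above amounts to letting the e820 map decide which regions hibernation preserves. Below is a minimal Python sketch of that selection step. It assumes the map is given as (start, size, type) tuples and uses the standard e820 type codes (3 = ACPI data, 4 = ACPI NVS). The function name is hypothetical, and rounding each region out to whole pages is a simplification.

```python
PAGE_SIZE = 4096
E820_ACPI_DATA = 3  # ACPI reclaimable memory (tables)
E820_ACPI_NVS = 4   # ACPI non-volatile storage, used by the firmware at runtime


def regions_to_save(e820_map, kinds=(E820_ACPI_NVS,)):
    """Return page-aligned (start, end) ranges for the selected e820 types.

    e820_map is an iterable of (start, size, type) tuples; end is exclusive.
    """
    mask = ~(PAGE_SIZE - 1)
    ranges = []
    for start, size, kind in e820_map:
        if kind not in kinds:
            continue
        first = start & mask                          # round start down
        last = (start + size + PAGE_SIZE - 1) & mask  # round end up
        ranges.append((first, last))
    return ranges


# Illustrative map (made-up addresses): RAM, then a small NVS region.
e820 = [(0x0, 0x9F000, 1), (0x7F6D0C00, 0x400, E820_ACPI_NVS)]
print([(hex(a), hex(b)) for a, b in regions_to_save(e820)])
# [('0x7f6d0000', '0x7f6d1000')]
```

Passing kinds=(E820_ACPI_DATA, E820_ACPI_NVS) matches the scope of the earlier "ACPI DATA/NVS" patches, while the default covers only NVS, as in the later patch.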
Comment 21 Zhang Rui 2008-06-19 18:21:31 UTC
Created attachment 16555 [details]
patch: save/restore ACPI NVS during hibernation

oops, I've attached the wrong patch, please try this one
Comment 22 Romano Giannetti 2008-06-20 01:16:53 UTC
Hmmm... there's a typo:

drivers/acpi/sleep/main.c: In function ‘acpi_hibernation_notifier_cb’:
drivers/acpi/sleep/main.c:321: error: ‘PM_POST_RESOTRE’ undeclared (first use in this function)
drivers/acpi/sleep/main.c:321: error: (Each undeclared identifier is reported only once
drivers/acpi/sleep/main.c:321: error: for each function it appears in.)
make[3]: *** [drivers/acpi/sleep/main.o] Error 1
make[2]: *** [drivers/acpi/sleep] Error 2
make[1]: *** [drivers/acpi] Error 2

I am doing the obvious change, then retry
Comment 23 Zhang Rui 2008-07-09 19:52:57 UTC
Hi, Vasily,

according to http://bugzilla.kernel.org/show_bug.cgi?id=10482#c34
your problem cannot be reproduced on another lenovo 3000 n100 laptop.
please attach your dmidecode output.
Comment 24 Vasily Khoruzhick 2008-07-09 23:10:55 UTC
Olav Morken's post begins with "My laptop is a Thinkpad Z61t", so it isn't a lenovo 3000 n100, is it? Btw, I've already attached the dmidecode output.
Comment 25 Zhang Rui 2008-08-13 22:55:44 UTC
Vasily,
could you please try the patch attached?
Comment 26 Zhang Rui 2008-09-09 19:34:33 UTC
any updates?
vasily, please try the patches attached, or else I have to reject this bug and mark it as INSUFFICIENT_DATA. :)
Comment 27 Vasily Khoruzhick 2008-09-09 23:04:03 UTC
Sorry for the slow response.
I've tested the patch, and it doesn't seem to help: I still have the same problem, but for some reason it's now much harder to reproduce.
Comment 28 Zhang Rui 2008-10-31 00:18:01 UTC
Sorry that I lost track of this bug.
can you reproduce it with the latest kernel release?
Comment 29 Vasily Khoruzhick 2008-10-31 00:40:58 UTC
Yes, it's reproducible with 2.6.27.4 on a 32-bit system. Now I'm testing it on 64-bit.
Comment 30 Zhang Rui 2008-11-25 22:55:14 UTC
Created attachment 19023 [details]
customized DSDT: debug _TMP method

please recompile the kernel with CONFIG_ACPI_DEBUG=y and this customized DSDT,
boot, and attach the dmesg output both when _TMP returns a valid temperature and when it returns an invalid one.
Comment 31 Vasily Khoruzhick 2008-11-25 23:02:36 UTC
Ok, but how do I use this customized DSDT?
Comment 32 Zhang Rui 2008-11-25 23:19:42 UTC
hah, my fault. :p

please look at this page:
http://www.lesswatts.org/projects/acpi/overridingDSDT.php
Comment 33 Vasily Khoruzhick 2008-11-30 09:10:55 UTC
Created attachment 19081 [details]
dmesg with overridden DSDT after several suspend/resume cycles

I tried the custom DSDT, but I can't see any _TMP messages in dmesg.
Here are some options from my kernel config:
CONFIG_ACPI_DEBUG=y
# CONFIG_ACPI_DEBUG_FUNC_TRACE is not set
CONFIG_ACPI_CUSTOM_DSDT_FILE="/usr/src/linux/DSDT.hex"
CONFIG_ACPI_CUSTOM_DSDT=y

Also, the custom DSDT breaks acpi_cpufreq; modprobe acpi_cpufreq produces:

FATAL: Error inserting acpi_cpufreq (/lib/modules/2.6.27-gentoo-r4-anarsoul/kernel/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko): No such device

dmesg attached
Comment 34 Zhang Rui 2008-11-30 17:46:17 UTC
1. please boot without acpi_no_auto_ssdt,
2. please boot with acpi.debug_level=0x03 acpi.debug_layer=0xffffffff.
3. run "cat /proc/acpi/thermal_zone/TZ00/temperature" both before and after hibernate and attach the dmesg output.
4. attach the dmesg output when you read a wrong temperature after hibernation.
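Steps 1 and 2 depend on the kernel command line. A small hypothetical helper can check the contents of /proc/cmdline for the two debug parameters and for a leftover acpi_no_auto_ssdt option before the logs are collected. Comparing the values numerically is an assumption about what counts as equivalent; it accepts both 0x3 and 0x03.

```python
# Expected values for the debug parameters from step 2 (compared numerically).
REQUIRED = {"acpi.debug_level": 0x03, "acpi.debug_layer": 0xFFFFFFFF}
FORBIDDEN = ("acpi_no_auto_ssdt",)  # step 1: boot without this option


def check_cmdline(cmdline):
    """Return a list of problems with a kernel command line (empty if OK)."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    problems = []
    for key, want in REQUIRED.items():
        got = params.get(key)
        try:
            ok = got is not None and int(got, 0) == want
        except ValueError:  # value is not a valid integer literal
            ok = False
        if not ok:
            problems.append(f"{key} should be {want:#x}")
    for key in FORBIDDEN:
        if key in params:
            problems.append(f"remove {key}")
    return problems


print(check_cmdline("root=/dev/sda2 acpi_no_auto_ssdt"))
# ['acpi.debug_level should be 0x3', 'acpi.debug_layer should be 0xffffffff', 'remove acpi_no_auto_ssdt']
```

On the affected machine the argument would be the text read from /proc/cmdline after booting with the new parameters.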
Comment 35 Zhang Rui 2008-12-15 22:54:08 UTC
any update?
BTW: there have been a couple of hibernation fixes recently, so please try the customized DSDT with the latest kernel.
Comment 36 Vasily Khoruzhick 2008-12-15 23:02:56 UTC
Sorry, I really haven't had time to test it (pre-exam period at university). I hope I'll try it before this weekend.
Comment 37 Zhang Rui 2008-12-28 18:49:45 UTC
ping vasily. :)
Comment 38 Vasily Khoruzhick 2008-12-29 04:45:09 UTC
Created attachment 19521 [details]
dmesg with acpi_debug enabled

Pong :)
Here it is. After resume, the temperature shown in /proc/acpi/thermal_zone/TZ00/temperature is incorrect (57 C); the correct temperature is 43 C, as shown in /sys/class/hwmon/hwmon2/device/temp1_input (coretemp).
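The symptom can be checked from userspace by comparing the two readings. Below is a minimal sketch. It assumes the /proc/acpi/thermal_zone/TZ00/temperature file contains a line like "temperature: 57 C" and that the coretemp temp1_input file reports millidegrees Celsius, as hwmon attributes do. The 5 C tolerance is an arbitrary choice.

```python
import re


def parse_acpi_temp(text):
    """Degrees Celsius from a /proc/acpi/thermal_zone/*/temperature file,
    assumed to look like 'temperature:             57 C'."""
    m = re.search(r"temperature:\s*(-?\d+)\s*C", text)
    if m is None:
        raise ValueError(f"unexpected format: {text!r}")
    return int(m.group(1))


def parse_hwmon_temp(text):
    """Degrees Celsius from a hwmon temp*_input file (millidegrees)."""
    return int(text.strip()) / 1000


def readings_disagree(acpi_c, coretemp_c, tolerance_c=5.0):
    """True if the two sensors differ by more than tolerance_c degrees."""
    return abs(acpi_c - coretemp_c) > tolerance_c


# Values reported above: ACPI says 57 C, coretemp says 43 C.
acpi = parse_acpi_temp("temperature:             57 C\n")
core = parse_hwmon_temp("43000\n")
print(acpi, core, readings_disagree(acpi, core))  # 57 43.0 True
```

On the laptop itself the inputs would be the contents of the two files named above. The hwmon index (hwmon2 here) depends on driver load order and may differ between boots.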
Comment 39 Vasily Khoruzhick 2008-12-29 04:51:27 UTC
Btw, a suspend to RAM after the "incorrect temperature" issue makes everything work OK again. It seems to me that some memory region that should be reserved is being overwritten during resume from suspend to disk.
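This hypothesis could be tested by dumping a suspect region (for example an ACPI NVS range from the e820 map) before hibernating and again after resume, then diffing the two dumps. How to obtain the dumps is left open here. The sketch only shows the comparison step, and the helper name is hypothetical.

```python
def changed_ranges(before, after, base=0):
    """Return (start, end) address ranges where two equal-length memory
    snapshots differ; base is the physical address of the first byte and
    end is exclusive."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    ranges = []
    start = None
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b and start is None:
            start = i                      # a differing run begins
        elif a == b and start is not None:
            ranges.append((base + start, base + i))
            start = None                   # the run ended at i
    if start is not None:                  # run reaches the end of the dump
        ranges.append((base + start, base + len(before)))
    return ranges


before = bytes([0x00, 0x11, 0x22, 0x33])
after = bytes([0x00, 0xFF, 0xEE, 0x33])
print([(hex(s), hex(e)) for s, e in changed_ranges(before, after, base=0x1000)])
# [('0x1001', '0x1003')]
```

Firmware also writes to NVS legitimately at runtime, so a difference only marks a candidate for closer inspection, not proof of corruption.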
Comment 40 Vasily Khoruzhick 2009-01-17 23:43:01 UTC
The bug seems to be fixed with a customized DSDT table (I changed the _TMP method to take only DTS1 into account). At least I haven't been able to reproduce it for ~18 days. Thanks for pointing me in the right direction ;)
Comment 41 Vasily Khoruzhick 2009-02-15 12:49:13 UTC
I haven't been able to reproduce this bug with my customized DSDT table for ~1.5 months, so I think it's time to close it.
