Bug 12994 - Thinkpad R51e is overheating even when idle
Status: CLOSED INVALID
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 normal
Assignee: acpi_power-thermal
Reported: 2009-04-01 09:51 UTC by uzytkownik2@gmail.com
Modified: 2009-05-07 21:17 UTC
Kernel Version: 2.6.28,2.6.29
Regression: No

Attachments
acpidump (174.86 KB, text/plain)
2009-04-03 18:12 UTC, uzytkownik2@gmail.com

Description uzytkownik2@gmail.com 2009-04-01 09:51:14 UTC
The ThinkPad R51e overheats badly even when idle, except when it is run at 128 MHz (please see also bug #12826). AFAIR the problem is not present on Windows or on some earlier versions of Linux (I'll check both today).

The only way to prevent this is to set the fan to full-speed through the thinkpad_acpi fan control (i.e. removing the safety settings).
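For readers unfamiliar with the fan control mentioned above, here is a minimal Python sketch. It assumes the thinkpad_acpi procfs interface at /proc/acpi/ibm/fan with the module loaded with fan_control=1, and root privileges for the write. The helper names fan_command and set_fan_level are hypothetical, and the actual write is not executed by the example.

```python
# Sketch only: builds the command string that thinkpad_acpi accepts on
# /proc/acpi/ibm/fan. Helper names are illustrative, not part of any tool.
FAN_PROC_FILE = "/proc/acpi/ibm/fan"
VALID_LEVELS = {str(n) for n in range(8)} | {"auto", "disengaged", "full-speed"}


def fan_command(level):
    """Return the 'level ...' command for a fan level, or raise ValueError."""
    level = str(level)
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported fan level: {level!r}")
    return f"level {level}"


def set_fan_level(level, path=FAN_PROC_FILE):
    """Write the command to the fan interface (needs root and fan_control=1)."""
    with open(path, "w") as fan:
        fan.write(fan_command(level) + "\n")


if __name__ == "__main__":
    # Only show the command; call set_fan_level("auto") to return control
    # to the firmware after experimenting.
    print(fan_command("full-speed"))  # prints: level full-speed
```

Forcing full-speed overrides the firmware's own fan policy, so switching back to "auto" afterwards is the conservative choice.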
Comment 1 Henrique de Moraes Holschuh 2009-04-01 21:24:10 UTC
It would be nice to know what "overheating very much" means (i.e. the output of
the thermal sensors), after all, that's highly subjective.  For example, I
certainly like to keep my T43 a lot cooler than the firmware would make it if
left to its own devices, so I run it at fan level 7 any time I am going to go
hard on the CPU or GPU for more than 10 minutes.

Anyway, in my experience, any IBM thinkpad that overheats (as in exceeds the
hardware safety margins) when at full number-crunching mode typically is either
suffering a hardware problem, or operating at a _very_ hot place (above 35°C).

Note that the hardware can, and will, operate at temperatures that are quite
uncomfortable for your lap.

If it is a hardware problem, the two most common ones are: the fan is not being
able to blow enough air (due to dust), or the thermal interface between the
thermal sink assembly and the CPU and/or the GPU has cracked.  Both are easy to fix.
Comment 2 Henrique de Moraes Holschuh 2009-04-01 21:27:07 UTC
I noticed now that the report implies we did better in earlier versions, so regardless of whether that thinkpad needs some hardware loving care, we will need to find out why we are not idling it right, if the submitter can confirm that going back to an earlier Linux kernel makes his machine behave better.

I doubt I can help with fixing any such issues, btw. It is not related to thinkpad-acpi.
Comment 3 uzytkownik2@gmail.com 2009-04-01 21:49:19 UTC
(In reply to comment #1)
> It would be nice to know what "overheating very much" means (i.e. the output
> of
> the thermal sensors), after all, that's highly subjective.

According to the ACPI reading, around 89 C. About 70-80 C when idle. With the fan on full-speed (i.e. with manual control) it tends to be 70 C.

The I2C readings, however, are much lower: 44 C when idle with full-speed (a bit more in automatic mode). (Shouldn't I2C be blacklisted on ThinkPads? It is pulled in by something in the kernel, so I have it even though I am aware of the problems. The system boots so far.) Anyway, measuring the computer 'by hand' (the bottom surface and near the fan) indicates that it is hotter than it used to be a few months ago, and that at full-speed it is much cooler than in automatic mode.

> For example, I
> certainly like to keep my T43 a lot cooler than the firmware would make it if
> left to its own devices, so I run it at fan level 7 any time I am going to go
> hard on the CPU or GPU for more than 10 minutes.
> 

As far as I have observed, the reported speed has nothing in common with the actual fan speed (judging by both the noise and the air pressure). All measurements were made at the default level unless stated otherwise.

> Anyway, in my experience, any IBM thinkpad that overheats (as in exceeds the
> hardware safety margins) when at full number-crunching mode typically is
> either
> suffering a hardware problem, or operating at a _very_ hot place (above
> 35°C).
> 

It's the beginning of spring here ;) The temperature is normal, and the laptop stands near the middle of a room with a partially opened window.

> Note that the hardware can, and will, operate at temperatures that are quite
> uncomfortable for your lap.
> 

It used not to. Anyway, I tend not to use the laptop as a... laptop. When I'm using it as a mobile PC I tend to find a table (especially since tables are usually near a power source), unless it is for so short a time that it does not have time to heat up.

> If it is a hardware problem, the two most common ones are: the fan is not
> being
> able to blow enough air (due to dust),

My room tends to be a dusty place (good for the allergic and asthmatic ;) ), although that varies with what is happening to the buildings in the neighbourhood. The computer was cleaned before reporting, and that had little or no effect. The fan could move without problems.

> or the thermal interface between the
> thermal sink assembly and the CPU and/or the GPU has cracked.  Both are easy
> to
> fix.

No visible hardware problems were seen (i.e. after taking off the case).

(In reply to comment #2)
> I noticed now that the report implies we did better in earlier versions, so
> regardless of whether that thinkpad needs some hardware loving care, we will
> need to find out why we are not idling it right, if the submitter can confirm
> that going back to an earlier Linux kernel makes his machine behave better.
> 
> I doubt I can help with fixing any such issues, btw. It is not related to
> thinkpad-acpi.

I had no time to check on Windows and an earlier version today (homework). Unless something extraordinary happens, I will check tomorrow.
Comment 4 uzytkownik2@gmail.com 2009-04-01 21:50:37 UTC
(In reply to comment #3)
> (In reply to comment #1)
> > It would be nice to know what "overheating very much" means (i.e. the
> output of
> > the thermal sensors), after all, that's highly subjective.
> 
> According to the ACPI reading, around 89 C. About 70-80 C when idle. With the fan on
> full-speed (i.e. with manual control) it tends to be 70 C.
> 

I forgot: that is only when run at a lower speed. At a higher speed it ran until it turned off.
Comment 5 Henrique de Moraes Holschuh 2009-04-02 03:39:06 UTC
You won't _see_ a crack in the thermal interface by inspecting the thing with normal means.  If the thermal glue broke, there is an air gap (or more of one) between the CPU and the heatsink, you can't see that with the heatsink in place, and once you remove it, you need to replace the thermal pad/compound anyway...

Well, your R51e is overheating by any sane definition of the term, and you're hereby advised that the machine definitely must have grown some sort of defect in its cooling system, and that you should fix that ASAP as it will damage things further as time goes by.

I don't follow what you say about the fan speed readings.  They're usually correct, if the fan tachometer sensor is operating normally (and the fan has the kind of tachometer expected by the EC -- shouldn't be a problem unless you replaced the fan). If that thing is broken, the EC will drive the fan incorrectly, and that could explain bad thermal behavior.

Well, please report back when you find out if an earlier version of Linux gives you better thermal behavior.  But please make sure you're using the same cpuidle governor as you used to, etc.

As a datapoint, latest 2.6.28.y seems to be working just fine on my T43.  But I am not sure the T43 uses the same cpuidle implementation as the R51e.
Comment 6 Len Brown 2009-04-02 03:41:47 UTC
just to clarify the purpose of this bug report...

per bug #12826, the R51e's thermal problems are being used
to justify not deleting p4-clockmod from the kernel, and
this laptop overheats and shuts down if that is not used.
Comment 7 Len Brown 2009-04-02 03:45:37 UTC
Maciej, can you share the product "type" from the serial number sticker?
eg. the format will be like this example: 1844-5GU
Comment 8 uzytkownik2@gmail.com 2009-04-02 12:22:51 UTC
(In reply to comment #7)
> Maciej, can you share the product "type" from the serial number sticker?
> eg. the format will be like this example: 1844-5GU

1843-6NG
Comment 9 ykzhao 2009-04-03 07:05:51 UTC
Hi, Maciej
    Will you please attach the output of acpidump?
    It will be great if you can attach the following outputs.
    > cat /proc/acpi/thermal_zone/*/*
    thanks.
Comment 10 uzytkownik2@gmail.com 2009-04-03 18:12:28 UTC
Created attachment 20796
acpidump

(In reply to comment #9)
> Hi, Maciej
>     Will you please attach the output of acpidump?
>     It will be great if you can attach the following outputs.
>     > cat /proc/acpi/thermal_zone/*/*
>     thanks.

# cat /proc/acpi/thermal_zone/THM0/cooling_mode 
<setting not supported>
# cat /proc/acpi/thermal_zone/THM0/polling_frequency 
<polling disabled>
# cat /proc/acpi/thermal_zone/THM0/state 
state:                   ok
# cat /proc/acpi/thermal_zone/THM0/temperature 
temperature:             84 C
# cat /proc/acpi/thermal_zone/THM0/trip_points 
critical (S5):           99 C
passive:                 95 C: tc1=5 tc2=4 tsp=600 devices= CPU
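
To put these readings in context, here is a minimal Python sketch that computes how far the reported temperature is from the passive and critical trip points. It assumes the /proc/acpi/thermal_zone text format shown above; the function names parse_celsius and headroom are hypothetical and not part of any kernel tool.

```python
import re

# Text copied from the THM0 readings above.
TEMPERATURE = "temperature:             84 C"
TRIP_POINTS = """critical (S5):           99 C
passive:                 95 C: tc1=5 tc2=4 tsp=600 devices= CPU"""


def parse_celsius(text, key):
    """Return the first '<number> C' value on the line starting with key."""
    for line in text.splitlines():
        if line.strip().startswith(key):
            match = re.search(r"(\d+)\s*C\b", line)
            if match:
                return int(match.group(1))
    return None


def headroom(temperature_text, trip_text):
    """Degrees left before the passive and critical trip points are hit."""
    current = parse_celsius(temperature_text, "temperature:")
    passive = parse_celsius(trip_text, "passive:")
    critical = parse_celsius(trip_text, "critical")
    return {"to_passive": passive - current, "to_critical": critical - current}


if __name__ == "__main__":
    print(headroom(TEMPERATURE, TRIP_POINTS))
    # prints: {'to_passive': 11, 'to_critical': 15}
```

With the values above, the zone sits 11 C below the point where passive cooling (CPU throttling) starts and 15 C below the critical shutdown point.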
Comment 11 Shaohua 2009-04-08 01:39:04 UTC
Did you see anything abnormal in the system, like a high interrupt rate?

You can run 'vmstat 1' for some time and give us the output, and also the output of /proc/interrupts.
Comment 12 uzytkownik2@gmail.com 2009-04-09 13:05:50 UTC
After changing the kernel to tuxonice-sources from Gentoo (the same .config as with zen and vanilla) the problem is not so dramatic: it reaches ~90 C only under heavy load, and stays at 71 C when idle with the fan on auto level. After a night (I tend to leave a compilation running overnight, but it finishes partway through the night; the laptop is also my alarm clock) I can touch the computer keyboard.

The output below is for the processor under heavy load (i.e. evaluating (9!)! in Haskell, which caused the temperature to reach 89 C).

(In reply to comment #11)
> did you see any abnormal of the system, like high interrupt rate?
> 

I don't know what a normal interrupt rate is (I did not write down the output of powertop when things were normal).

> you can run 'vmstat 1' for some time and give us the output. And also the
> output of /proc/interrupts

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 3  0  81416 105940 311744 637636    0    0    74   115   81   19 25  9 65  1
 1  0  81416 105824 311744 637636    0    0     0     0 1117  474 96  4  0  0
 1  0  81416 105816 311744 637636    0    0     0     0 1092  484 97  3  0  0
 1  0  81416 105824 311744 637636    0    0     0     0 1114  589 94  6  0  0
 1  0  81416 105816 311744 637636    0    0     0    48 1099  461 98  2  0  0
 1  0  81416 105824 311744 637636    0    0     0     0 1095  424 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0    40 1103  664 95  5  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1090  376 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1090  500 97  3  0  0
 1  0  81416 105824 311752 637636    0    0     0    12 1110  518 96  4  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1098  475 97  3  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1101  493 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1096  573 95  5  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1102  397 96  4  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1132  538 96  4  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1097  495 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1104  541 97  3  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1098  393 97  3  0  0
 4  0  81416 105816 311752 637636    0    0     0     0 1099  595 96  4  0  0
 1  0  81416 105824 311752 637636    0    0     0    28 1103  411 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1082  469 98  2  0  0



           CPU0       
  0:   43640032   IO-APIC-edge      timer
  1:     174849   IO-APIC-edge      i8042
  7:          0   IO-APIC-edge      parport0
  8:          1   IO-APIC-edge      rtc0
  9:          6   IO-APIC-fasteoi   ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3
 12:    5170807   IO-APIC-edge      i8042
 14:    2436175   IO-APIC-edge      pata_atiixp
 15:    1334243   IO-APIC-edge      pata_atiixp
 16:        976   IO-APIC-fasteoi   eth0
 17:      55825   IO-APIC-fasteoi   ATI IXP, radeon@pci:0000:01:05.0
 20:          1   IO-APIC-fasteoi   yenta
 21:     583338   IO-APIC-fasteoi   acpi
 22:   18392268   IO-APIC-fasteoi   ath
NMI:          0   Non-maskable interrupts
LOC:  126936036   Local timer interrupts
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0
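
To summarise the 'vmstat 1' numbers, here is a minimal Python sketch that averages the interrupt ("in") and context-switch ("cs") columns. It assumes the column layout shown above and uses only the first five rows of that output as sample input; the function name mean_rates is hypothetical. The first data row is skipped because vmstat's first report gives averages since boot rather than a one-second sample.

```python
# Header and first five data rows copied from the vmstat output above.
SAMPLE = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 3  0  81416 105940 311744 637636    0    0    74   115   81   19 25  9 65  1
 1  0  81416 105824 311744 637636    0    0     0     0 1117  474 96  4  0  0
 1  0  81416 105816 311744 637636    0    0     0     0 1092  484 97  3  0  0
 1  0  81416 105824 311744 637636    0    0     0     0 1114  589 94  6  0  0
 1  0  81416 105816 311744 637636    0    0     0    48 1099  461 98  2  0  0
"""


def mean_rates(vmstat_text):
    """Return (mean interrupts/s, mean context switches/s) over the samples."""
    lines = [line.split() for line in vmstat_text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    in_idx, cs_idx = header.index("in"), header.index("cs")
    samples = rows[1:]  # drop the since-boot row
    count = len(samples)
    mean_in = sum(int(row[in_idx]) for row in samples) / count
    mean_cs = sum(int(row[cs_idx]) for row in samples) / count
    return mean_in, mean_cs


if __name__ == "__main__":
    print(mean_rates(SAMPLE))  # prints: (1105.5, 502.0)
```

This only condenses the figures; whether roughly 1100 interrupts per second under load is abnormal for this machine is not settled by the numbers alone.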
Comment 13 uzytkownik2@gmail.com 2009-04-10 15:57:02 UTC
It seems that Windows XP is now not much cooler than the suspend kernel...
Comment 14 Henrique de Moraes Holschuh 2009-04-11 06:02:30 UTC
Maciej, can you tell us of a _mainline_ (aka Linus) kernel that behaved better on your thinkpad than 2.6.29.1 does?  And give us an idea (e.g. through the temperatures measured by thinkpad-acpi on both kernels) of the difference of the thermal performance between the two kernels?

That would really help check if the newer kernels are indeed doing a poorer job of idling your R51e...
Comment 15 uzytkownik2@gmail.com 2009-04-15 10:07:08 UTC
(In reply to comment #14)
> Maciej, can you tell us of a _mainline_ (aka Linus) kernel that behaved
> better
> on your thinkpad than 2.6.29.1 does?

By mainline you mean version from git? 

> And give us an idea (e.g. through the
> temperatures measured by thinkpad-acpi on both kernels) of the difference of
> the thermal performance between the two kernels?
> 

I will. But it seems that the behaviour is somewhat random (sometimes it takes several minutes to heat up, sometimes much longer).

> That would really help check if the newer kernels are indeed doing a poorer
> job
> of idling your R51e...

Well, I am starting to believe that it may be the hardware problem you described, as Windows XP tends to behave no better. I cannot find the repair procedure described in the hardware guide. I will search the friendly web, but any links would speed up checking the hardware hypothesis.
Comment 16 Henrique de Moraes Holschuh 2009-04-15 11:25:05 UTC
Well, if XP behaves no better, we probably didn't grow a new bug, but still please do test older kernel.org releases (that's what I called 'mainline') to give us a sure datapoint.

As for the hardware guide, it tells you how to *disassemble* the thinkpad (but you better be well used to doing such things or you might break something, laptops are NOT easy to disassemble, and are even worse to assemble back).  The heat transfer interface repair procedure of a thinkpad is not going to be explained anywhere.

Really, if you are not used to doing this kind of stuff, it is better to find a skilled technician (and _DO_ get one that is used to repairing laptops!) to do it for you.  You'll also need some very high quality thermal grease, the sort an overclocker would use, but you must get one that does not react to other metals.

If you still want to try it yourself, look at various overclocking sites and learn how to do a perfect heatsink and chip surface cleaning job, how to apply thermal grease perfectly, and which thermal grease you should use.  Search also for thinkpad repair sites to see if any have a few pages about heatsink and fan repair.

Remember you're dealing with a notebook, so it is useless to get thermal greases that require the small ovens people use on desktops, you need to get one that will cure well and work well at ~55C, not one that likes to stay above 70C :-)
Comment 17 uzytkownik2@gmail.com 2009-04-15 11:46:29 UTC
(In reply to comment #16)
> Well, if XP behaves no better, we probably didn't grow a new bug, but still
> please do test older kernel.org releases (that's what I called 'mainline') to
> give us a sure datapoint.
> 
> As for the hardware guide, it tells you how to *disassemble* the thinkpad
> (but
> you better be well used to doing such things or you might break something,
> laptops are NOT easy to disassemble, and are even worse to assemble back). 

OK. I have disassembled it a few times already (to extend memory etc.) and I haven't even gained any fitting-nowhere screws ;)

> (...)

Correct me if I'm wrong. As the expected remaining working life of this specific laptop is probably around 3-6 months, that's a bit too much to invest.
I'll check other mainline kernels. If they do not help, I think that the hardware is to blame and the bug should be marked as INVALID (in that case, sorry for wasting your time: I believed it was some regression which might affect other R51e users).
Comment 18 uzytkownik2@gmail.com 2009-04-22 21:44:33 UTC
Well. The instructions on where to put the thermal grease actually were in the manual, in the section covering CPU replacement. Eventually (I had some problems with removing the fan, as it used some strange screws and I had to borrow a screwdriver) I applied new thermal grease.

Just after application nothing much changed: the computer heated to 89C instantly (with cpufreq applied). Currently under heavy load (rebuilding the system with the new gcc 4.4.0 - yes, no risk no fun ;) - however this report is from a kernel built with 4.3.2) the temperature is about 79-82C with cpufreq applied (around 1.1/1.5 GHz). I have now raised the limits, and it seems that the temperature is now 79-84C (the system still jumps to 1.1 GHz under heavy load).

As far as I understand, it is normal that new grease needs some time to reach its final level. All in all I think it was a hardware bug, given that after applying the hardware fix the computer runs 5x faster and 10C cooler after around 24h. I guess that this bug can be marked as invalid (if not, I will reopen it).
Comment 19 Len Brown 2009-05-07 21:17:09 UTC
> All in all I think it was a hardware bug, given that after applying the
> hardware fix the computer runs 5x faster and 10C cooler after around 24h.
> I guess that this bug can be marked as invalid

I'm glad your computer is feeling better --
and thanks for verifying that Windows behaves the same way as Linux.

For the record...

Maciej's ThinkPad R51e 1843-6NG

http://www-307.ibm.com/pc/support/site.wss/quickPath.do?quickPathEntry=18436NG

Celeron M 370(1.5GHz), 512MB RAM, 40GB 5400rpm HD, 15in 1024x768 LCD, ATI Radeon 200M, CDRW/DVDRW, 802.11bg wireless, Modem, 10/100 Ethernet, 6c Li-Ion batt, WinXP Home

I picked up one on ebay: ThinkPad R51e 1844-5GU

http://www-307.ibm.com/pc/support/site.wss/quickPath.do?quickPathEntry=18445GU

PM 740(1.7GHz), 512MB RAM, 40GB 4200rpm HD, 15in 1024x768 LCD, ATI Radeon 200M, CDRW/DVD, 802.11bg wireless, Modem, 10/100 Ethernet, 6c Li-Ion batt, WinXP Pro

While the two will share the same chipset, BIOS etc, they have different processors.  The Pentium M supports acpi-cpufreq with P-states down to 800MHz, while I believe that the Celeron M does not (and thus the original desire to be able to run p4-clockmod).
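
One way to check which frequency steps a given machine exposes is sketched below in Python. It assumes the standard cpufreq sysfs file scaling_available_frequencies, which reports values in kHz and is only present when a cpufreq driver that publishes a frequency table is loaded. The function names are hypothetical, and the sample string contains only the two end points mentioned above (1.7GHz and 800MHz), not real output from either laptop.

```python
def parse_khz_list(text):
    """Convert a space-separated list of kHz values to MHz, sorted ascending."""
    return sorted(int(value) // 1000 for value in text.split())


def read_available_mhz(cpu=0):
    """Read the frequency table for one CPU; return [] if it is not exposed."""
    path = (f"/sys/devices/system/cpu/cpu{cpu}"
            "/cpufreq/scaling_available_frequencies")
    try:
        with open(path) as table:
            return parse_khz_list(table.read())
    except OSError:
        return []


# Illustrative input only: the end points named in the comment above,
# with any intermediate P-states omitted.
SAMPLE = "1700000 800000"

if __name__ == "__main__":
    print(parse_khz_list(SAMPLE))  # prints: [800, 1700]
    print(read_available_mhz())    # depends on the machine and loaded driver
```

An empty list from read_available_mhz would be consistent with no frequency-scaling driver being active, though it does not by itself say why.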

In any case, I've not had any cooling problems on the R51e -- further suggesting that Linux does not have a general problem that applies to all R51e.
