My Thinkpad R51e overheats badly even when idle, except when it runs at 128 MHz (please see also bug #12826). AFAIR the problem is not present on Windows or on some earlier versions of Linux (I'll check both today). The only way to prevent it is to use the thinkpad_acpi fan control to force the fan to full speed (i.e. removing the safety settings).
It would be nice to know what "overheating very much" means (i.e. the output of the thermal sensors), after all, that's highly subjective. For example, I certainly like to keep my T43 a lot cooler than the firmware would make it if left to its own devices, so I run it at fan level 7 any time I am going to go hard on the CPU or GPU for more than 10 minutes.

Anyway, in my experience, any IBM thinkpad that overheats (as in exceeds the hardware safety margins) when at full number-crunching mode typically is either suffering a hardware problem, or operating at a _very_ hot place (above 35°C). Note that the hardware can, and will, operate at temperatures that are quite uncomfortable for your lap.

If it is a hardware problem, the two most common ones are: the fan is not being able to blow enough air (due to dust), or the thermal interface between the thermal sink assembly and the CPU and/or the GPU has cracked. Both are easy to fix.
I noticed now that the report implies we did better in earlier versions, so regardless of whether that thinkpad needs some hardware loving care, we will need to find out why we are not idling it right, if the submitter can confirm that going back to an earlier Linux kernel makes his machine behave better. I doubt I can help with fixing any such issues, btw. It is not related to thinkpad-acpi.
(In reply to comment #1)
> It would be nice to know what "overheating very much" means (i.e. the output
> of the thermal sensors), after all, that's highly subjective.

According to the ACPI readings, around 89 C. About 70-80 C when idle. With the fan at full speed (i.e. under manual control) it tends to be 70 C. The I2C readings are much lower, however: 44 C when idle with the fan at full speed (a bit more in automatic mode). (Shouldn't the I2C driver be blacklisted on Thinkpads? Something in the kernel pulls it in, so I have it loaded even though I am aware of the problems. The system boots so far.) Anyway, checking the computer by hand (the bottom surface and near the fan) indicates that it is hotter than it used to be a few months ago, and that at full speed it is much cooler than in automatic mode.

> For example, I certainly like to keep my T43 a lot cooler than the firmware
> would make it if left to its own devices, so I run it at fan level 7 any time
> I am going to go hard on the CPU or GPU for more than 10 minutes.

As far as I have observed, the reported speed has nothing in common with the actual fan speed (judging by both the noise and the air pressure). All measurements were made at the default level unless stated otherwise.

> Anyway, in my experience, any IBM thinkpad that overheats (as in exceeds the
> hardware safety margins) when at full number-crunching mode typically is
> either suffering a hardware problem, or operating at a _very_ hot place
> (above 35°C).

It's the beginning of spring here ;) The temperature is normal, and the laptop stands near the middle of the room with the window partially open.

> Note that the hardware can, and will, operate at temperatures that are quite
> uncomfortable for your lap.

It used not to. Anyway, I tend not to use the laptop as a... laptop. When I'm using it as a mobile PC I tend to find a table (especially since they are near a power source), unless it is for so short a time that it does not have time to heat up.

> If it is a hardware problem, the two most common ones are: the fan is not
> being able to blow enough air (due to dust),

My room tends to be a dusty place (good for an allergic and asthmatic person ;) ), although that varies depending on changes to the buildings in the neighbourhood. The computer was cleaned before reporting, and that had little or no effect. The fan could move without problems.

> or the thermal interface between the thermal sink assembly and the CPU and/or
> the GPU has cracked. Both are easy to fix.

No visible hardware problems were seen (i.e. after taking off the case).

(In reply to comment #2)
> I noticed now that the report implies we did better in earlier versions, so
> regardless of whether that thinkpad needs some hardware loving care, we will
> need to find out why we are not idling it right, if the submitter can confirm
> that going back to an earlier Linux kernel makes his machine behave better.
>
> I doubt I can help with fixing any such issues, btw. It is not related to
> thinkpad-acpi.

I had no time to check Windows and an earlier version today (homework). Unless something extraordinary happens, I will check tomorrow.
(In reply to comment #3)
> (In reply to comment #1)
> > It would be nice to know what "overheating very much" means (i.e. the
> > output of the thermal sensors), after all, that's highly subjective.
>
> According to ACPI reading - around 89 C. About 70-80 C in idle. With fan on
> full-speed (i.e. with manual control) it tends to be 70 C.

I forgot: that is only when it runs at a lower speed. At a higher speed it ran until it turned itself off.
You won't _see_ a crack in the thermal interface by inspecting the thing with normal means. If the thermal glue broke, there is an air gap (or more of one) between the CPU and the heatsink, you can't see that with the heatsink in place, and once you remove it, you need to replace the thermal pad/compound anyway...

Well, your R51e is overheating by any sane definition of the term, and you're hereby advised that the machine definitely must have grown some sort of defect in its cooling system, and that you should fix that ASAP as it will damage things further as time goes by.

I don't follow what you say about the fan speed readings. They're usually correct, if the fan tachometer sensor is operating normally (and the fan has the kind of tachometer expected by the EC -- shouldn't be a problem unless you replaced the fan). If that thing is broken, the EC will drive the fan incorrectly, and that could explain bad thermal behavior.

Well, please report back when you find out if an earlier version of Linux gives you better thermal behavior. But please make sure you're using the same cpuidle governor as you used to, etc.

As a datapoint, latest 2.6.28.y seems to be working just fine on my T43. But I am not sure the T43 uses the same cpuidle implementation as the R51e.
just to clarify the purpose of this bug report... per bug #12826, the R51e's thermal problems are being used to justify not deleting p4-clockmod from the kernel, and this laptop overheats and shuts down if that is not used.
Maciej, can you share the product "type" from the serial number sticker? eg. the format will be like this example: 1844-5GU
(In reply to comment #7)
> Maciej, can you share the product "type" from the serial number sticker?
> eg. the format will be like this example: 1844-5GU

1843-6NG
Hi, Maciej
Will you please attach the output of acpidump?
It will be great if you can attach the following outputs.
> cat /proc/acpi/thermal_zone/*/*
thanks.
Created attachment 20796
acpidump

(In reply to comment #9)
> Hi, Maciej
> Will you please attach the output of acpidump?
> It will be great if you can attach the following outputs.
> > cat /proc/acpi/thermal_zone/*/*
> thanks.

# cat /proc/acpi/thermal_zone/THM0/cooling_mode
<setting not supported>
# cat /proc/acpi/thermal_zone/THM0/polling_frequency
<polling disabled>
# cat /proc/acpi/thermal_zone/THM0/state
state: ok
# cat /proc/acpi/thermal_zone/THM0/temperature
temperature: 84 C
# cat /proc/acpi/thermal_zone/THM0/trip_points
critical (S5): 99 C
passive: 95 C: tc1=5 tc2=4 tsp=600 devices= CPU
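For readers following along: the THM0 output above can be turned into a headroom figure, i.e. how many degrees are left before the passive trip point (where the kernel starts throttling the CPU) and the critical one (S5, shutdown). The following is only a minimal sketch; the function names `parse_celsius` and `headroom` are made up for illustration, and it assumes the plain `label: NN C` layout shown in this report.

```python
import re


def parse_celsius(text):
    """Map the first word of each 'label ...: NN C' line to its value in degrees C."""
    readings = {}
    for line in text.splitlines():
        # e.g. 'critical (S5): 99 C' -> ('critical', 99); lines without 'NN C' are skipped
        match = re.match(r"\s*([A-Za-z]+)[^:]*:\s*(-?\d+)\s*C\b", line)
        if match:
            readings[match.group(1)] = int(match.group(2))
    return readings


def headroom(temperature_text, trip_points_text):
    """Return the degrees C left before each trip point in trip_points_text."""
    current = parse_celsius(temperature_text)["temperature"]
    trips = parse_celsius(trip_points_text)
    return {name: limit - current for name, limit in trips.items()}


# Values copied from the THM0 output in this report.
TEMPERATURE = "temperature: 84 C"
TRIP_POINTS = "critical (S5): 99 C\npassive: 95 C: tc1=5 tc2=4 tsp=600 devices= CPU"

if __name__ == "__main__":
    print(headroom(TEMPERATURE, TRIP_POINTS))  # {'critical': 15, 'passive': 11}
```

With the reported 84 C reading, the machine is 11 C below the passive trip point and 15 C below the critical one.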
Did you see anything abnormal on the system, like a high interrupt rate?
You can run 'vmstat 1' for some time and give us the output, and also the output of /proc/interrupts.
After changing the kernel to tuxonice-sources from Gentoo (with the same .config as for zen and vanilla), the problem is not so dramatic: it reaches ~90 C only under heavy load, and stays at 71 C when idle with the fan at the auto level. After a night (I tend to leave compilations running overnight, but they finish partway through the night; it is also my alarm clock) I can touch the computer's keyboard. The output below is for the processor under heavy load (i.e. evaluating (9!)! in Haskell, which caused the temperature to reach 89 C).

(In reply to comment #11)
> did you see any abnormal of the system, like high interrupt rate?

I don't know what a normal interrupt rate is (I did not write down the output of powertop when it was normal).

> you can run 'vmstat 1' for some time and give us the output. And also the
> output of /proc/interrupts

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 3  0  81416 105940 311744 637636    0    0    74   115   81   19 25  9 65  1
 1  0  81416 105824 311744 637636    0    0     0     0 1117  474 96  4  0  0
 1  0  81416 105816 311744 637636    0    0     0     0 1092  484 97  3  0  0
 1  0  81416 105824 311744 637636    0    0     0     0 1114  589 94  6  0  0
 1  0  81416 105816 311744 637636    0    0     0    48 1099  461 98  2  0  0
 1  0  81416 105824 311744 637636    0    0     0     0 1095  424 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0    40 1103  664 95  5  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1090  376 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1090  500 97  3  0  0
 1  0  81416 105824 311752 637636    0    0     0    12 1110  518 96  4  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1098  475 97  3  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1101  493 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1096  573 95  5  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1102  397 96  4  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1132  538 96  4  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1097  495 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1104  541 97  3  0  0
 1  0  81416 105824 311752 637636    0    0     0     0 1098  393 97  3  0  0
 4  0  81416 105816 311752 637636    0    0     0     0 1099  595 96  4  0  0
 1  0  81416 105824 311752 637636    0    0     0    28 1103  411 97  3  0  0
 1  0  81416 105816 311752 637636    0    0     0     0 1082  469 98  2  0  0

           CPU0
  0:   43640032   IO-APIC-edge      timer
  1:     174849   IO-APIC-edge      i8042
  7:          0   IO-APIC-edge      parport0
  8:          1   IO-APIC-edge      rtc0
  9:          6   IO-APIC-fasteoi   ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3
 12:    5170807   IO-APIC-edge      i8042
 14:    2436175   IO-APIC-edge      pata_atiixp
 15:    1334243   IO-APIC-edge      pata_atiixp
 16:        976   IO-APIC-fasteoi   eth0
 17:      55825   IO-APIC-fasteoi   ATI IXP, radeon@pci:0000:01:05.0
 20:          1   IO-APIC-fasteoi   yenta
 21:     583338   IO-APIC-fasteoi   acpi
 22:   18392268   IO-APIC-fasteoi   ath
NMI:          0   Non-maskable interrupts
LOC:  126936036   Local timer interrupts
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0
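Since the question was whether the interrupt rate is high, one way to reduce a 'vmstat 1' capture to a single number is to average its `in` column. The sketch below is only illustrative: `mean_column` is a hypothetical helper, and the sample is just the first four rows of the capture above. The first data row is skipped by default because vmstat's first report gives averages since the last reboot rather than a one-second sample.

```python
def mean_column(vmstat_text, column="in", skip_first_sample=True):
    """Average one column of 'vmstat 1' output; column names come from its header row."""
    rows = [line.split() for line in vmstat_text.strip().splitlines()]
    # The column header is the row whose first field is 'r' (run queue length).
    header_index = next(i for i, fields in enumerate(rows) if fields and fields[0] == "r")
    header = rows[header_index]
    position = header.index(column)
    samples = [
        int(fields[position])
        for fields in rows[header_index + 1:]
        if len(fields) == len(header) and all(f.isdigit() for f in fields)
    ]
    if skip_first_sample:
        samples = samples[1:]  # drop the since-boot average
    if not samples:
        raise ValueError("no vmstat samples found")
    return sum(samples) / len(samples)


# First four rows of the capture in this report.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 3  0  81416 105940 311744 637636    0    0    74   115   81   19 25  9 65  1
 1  0  81416 105824 311744 637636    0    0     0     0 1117  474 96  4  0  0
 1  0  81416 105816 311744 637636    0    0     0     0 1092  484 97  3  0  0
 1  0  81416 105824 311744 637636    0    0     0     0 1114  589 94  6  0  0
"""

if __name__ == "__main__":
    print(round(mean_column(SAMPLE), 1))        # interrupts per second
    print(round(mean_column(SAMPLE, "cs"), 1))  # context switches per second
```

Comparing this figure between an idle and a loaded capture, or between two kernels, is more telling than a single reading.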
It seems that Windows XP is not much cooler than the suspend kernel...
Maciej, can you tell us of a _mainline_ (aka Linus) kernel that behaved better on your thinkpad than 2.6.29.1 does? And give us an idea (e.g. through the temperatures measured by thinkpad-acpi on both kernels) of the difference of the thermal performance between the two kernels? That would really help check if the newer kernels are indeed doing a poorer job of idling your R51e...
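To make such a comparison concrete, the thinkpad-acpi readings taken on each kernel can be compared sensor by sensor. This is a rough sketch under stated assumptions: it assumes the readings come from /proc/acpi/ibm/thermal as a single `temperatures:` line, with -128 marking an absent sensor (as I recall from the driver documentation; worth verifying on the machine). The helper names and the sample values are made up and are not measurements from this report.

```python
def parse_thinkpad_thermal(text):
    """Return the sensor readings in degrees C; None marks a sensor reported as -128."""
    for line in text.splitlines():
        if line.startswith("temperatures:"):
            values = [int(v) for v in line.split(":", 1)[1].split()]
            return [None if v == -128 else v for v in values]
    raise ValueError("no 'temperatures:' line found")


def per_sensor_delta(old_text, new_text):
    """Return new minus old for each sensor, or None where either reading is absent."""
    old = parse_thinkpad_thermal(old_text)
    new = parse_thinkpad_thermal(new_text)
    return [
        None if o is None or n is None else n - o
        for o, n in zip(old, new)
    ]


# Hypothetical readings, for illustration only.
OLD_KERNEL = "temperatures:\t50 45 -128 40"
NEW_KERNEL = "temperatures:\t60 44 -128 47"

if __name__ == "__main__":
    print(per_sensor_delta(OLD_KERNEL, NEW_KERNEL))  # [10, -1, None, 7]
```

Readings should be taken under the same load and after the same warm-up time on both kernels, otherwise the deltas say little.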
(In reply to comment #14)
> Maciej, can you tell us of a _mainline_ (aka Linus) kernel that behaved
> better on your thinkpad than 2.6.29.1 does?

By mainline, do you mean a version from git?

> And give us an idea (e.g. through the temperatures measured by thinkpad-acpi
> on both kernels) of the difference of the thermal performance between the two
> kernels?

I will. But the behaviour seems to be somewhat random (sometimes it takes several minutes to heat up, sometimes much longer).

> That would really help check if the newer kernels are indeed doing a poorer
> job of idling your R51e...

Well, I am starting to believe that it may be the hardware problem you described, as Windows XP tends to behave no better. I cannot find the repair procedure described in the hardware guide. I will search the friendly web, but any links would speed up checking the hardware hypothesis.
Well, if XP behaves no better, we probably didn't grow a new bug, but still please do test older kernel.org releases (that's what I called 'mainline') to give us a sure datapoint.

As for the hardware guide, it tells you how to *disassemble* the thinkpad (but you better be well used to doing such things or you might break something, laptops are NOT easy to disassemble, and are even worse to assemble back).

The heat transfer interface repair procedure of a thinkpad is not going to be explained anywhere. Really, if you are not used to doing this kind of stuff, it is better to find a skilled technician (and _DO_ get one that is used to repairing laptops!) to do it for you. You'll also need some very high quality thermal grease, the sort an overclocker would use, but you must get one that does not react to other metals.

If you still want to try it yourself, look at various overclocking sites and learn how to do a perfect heatsink and chip surface cleaning job, how to apply thermal grease perfectly, and which thermal grease you should use. Search also for thinkpad repair sites to see if any have a few pages about heatsink and fan repair.

Remember you're dealing with a notebook, so it is useless to get thermal greases that require the small ovens people use on desktops, you need to get one that will cure well and work well at ~55C, not one that likes to stay above 70C :-)
(In reply to comment #16)
> Well, if XP behaves no better, we probably didn't grow a new bug, but still
> please do test older kernel.org releases (that's what I called 'mainline') to
> give us a sure datapoint.
>
> As for the hardware guide, it tells you how to *disassemble* the thinkpad
> (but you better be well used to doing such things or you might break
> something, laptops are NOT easy to disassemble, and are even worse to
> assemble back).

OK. I have disassembled it a few times already (to extend the memory etc.) and I didn't even end up with any screws that fit nowhere ;)

> (...)

Correct me if I'm wrong, but as the expected remaining working life of this specific laptop is probably around 3-6 months, that's a bit too much to invest.

I'll check other mainline kernels. If they do not help, I think that the hardware is to blame and the bug should be marked as INVALID (in that case, sorry for wasting your time; I believed it was some regression which might affect other R51e users).
Well, the instructions on where to put the thermal grease actually were in the manual, in the section covering CPU replacement. Eventually (I had some problems removing the fan, as it used some strange screws and I had to borrow a screwdriver) I applied new thermal grease.

Just after the application nothing much changed: the computer heated up to 89C instantly (with cpufreq applied). Currently, under heavy load (rebuilding the system with the new gcc 4.4.0 - yes, no risk no fun ;) - although this report is from a kernel built with 4.3.2), the temperature is about 79-82C with cpufreq applied (around 1.1/1.5 GHz). I have now raised the limits, and the temperature seems to be 79-84C (the system still drops to 1.1 GHz under heavy load). As far as I understand, it is normal that new grease needs some time to reach its final level.

All in all, I think it was a hardware bug, given that around 24h after applying the hardware fix the computer runs 5x faster and 10C cooler. I guess that this bug can be marked as invalid (if not, I will reopen it).
> All in all I think it was a hardware bug given then after applying
> hardware fix computer runs 5x faster and 10C cooler after around 24h.
> I guess that this bug can be marked as invalid

I'm glad your computer is feeling better -- and thanks for verifying that Windows behaves the same way as Linux.

For the record...

Maciej's ThinkPad R51e 1843-6NG
http://www-307.ibm.com/pc/support/site.wss/quickPath.do?quickPathEntry=18436NG
Celeron M 370(1.5GHz), 512MB RAM, 40GB 5400rpm HD, 15in 1024x768 LCD, ATI Radeon 200M, CDRW/DVDRW, 802.11bg wireless, Modem, 10/100 Ethernet, 6c Li-Ion batt, WinXP Home

I picked up one on ebay:
ThinkPad R51e 1844-5GU
http://www-307.ibm.com/pc/support/site.wss/quickPath.do?quickPathEntry=18445GU
PM 740(1.7GHz), 512MB RAM, 40GB 4200rpm HD, 15in 1024x768 LCD, ATI Radeon 200M, CDRW/DVD, 802.11bg wireless, Modem, 10/100 Ethernet, 6c Li-Ion batt, WinXP Pro

While the two share the same chipset, BIOS etc., they have different processors. The Pentium M supports acpi-cpufreq with P-states down to 800MHz, while I believe that the Celeron M does not (hence the original desire to be able to run p4-clockmod).

In any case, I've not had any cooling problems on my R51e -- further suggesting that Linux does not have a general problem that applies to all R51e.