Bug 12231
Summary: | fan not controled on a Dell Latitude E5400 | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Sébastien Hinderer (Sebastien.Hinderer) |
Component: | i386 | Assignee: | ykzhao (yakui.zhao) |
Status: | CLOSED INVALID | ||
Severity: | high | CC: | alan, exalowprofile, hidave.darkstar, jdelvare, lfx, linux-bugs, Matt_Domsch, rezwanul_kabir, samuel.thibault, Sebastien.Hinderer, trenn, ubuntu |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | any | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 56331 | ||
Attachments: |
The output of acpidump.
try the debug patch in which rsdt is used instead of XSDT
try the debug patch in which rsdt is used instead of XSDT
Kernel configuration
Output of dmesg
Output of lspci -vxxx
The complete output of detect-sensors
The output of acpidump.
Description
Sébastien Hinderer
2008-12-15 11:24:50 UTC
Created attachment 19311 [details]
The output of acpidump.
There is no ACPI fan device in the acpidump you attached, so the fan is controlled either by the BIOS or by some platform-specific driver. Maybe you can try the hwmon drivers. Cc Jean.

Hi, Sebastien
Will you please attach the output of dmesg and lspci -vxxx?
From the acpidump there seems to be another issue: the 32/64X GPE block lengths mismatch.
>32/64X bit length mismatch in Gpe0Block: 128/64
Thanks.

Created attachment 19320 [details]
try the debug patch in which rsdt is used instead of XSDT
Will you please try the debug patch on the latest kernel and see whether the Fan issue still exists?
In the debug patch the RSDT table is used instead of XSDT.
Thanks.
Created attachment 19321 [details]
try the debug patch in which rsdt is used instead of XSDT
Sorry, the wrong patch was attached previously.
Thanks.
Created attachment 19397 [details]
Kernel configuration
Created attachment 19398 [details]
Output of dmesg
Created attachment 19399 [details]
Output of lspci -vxxx
The patch did not fix the fan issue. However, it may be worth noting that the problem appears only when the laptop is plugged into AC power. When the laptop runs on battery only, the fan seems to be handled properly.

Please check your BIOS and see if there is an option like "fan always on when AC".

I can't check my BIOS right now because I am blind and have no sighted assistance available, but I think no such option is set, because when Windows XP is running the fan does not run constantly even with the AC adapter plugged in.

Sébastien, it might help to know which hardware monitoring chip is present in your system, if any. Do you have lm-sensors installed? If you don't, please install it, then attach the full output of sensors-detect.

Created attachment 19598 [details]
The complete output of detect-sensors
sensors-detect was run with a 2.6.28-rc9 kernel to which the patch previously attached to this bug report had been applied.
In addition, the i8k module could find some information:
cat /proc/i8k
1.0 A06 BWB814J 31 -22 0 -22 0 -1 -22
According to drivers/char/i8k.c:i8k_proc_show, this information is to be interpreted as follows:
1) "1.0": format of output
2) "A06": BIOS version
3) "BWB814J": BIOS machine id
4) "31": CPU temperature
5) "-22": Left fan status
6) "0": Right fan status
7) "-22": Left fan speed
8) "0": Right fan speed
9) "-1": AC power
10) "-22": Fn Key status
The reported CPU temperature (4) is consistent with the one reported by acpi. However, 5, 6, 7 and 8 look odd to me: as if the information should be reported differently, e.g. the fan status could be -22 for the two fans, and their speed could be 0. The code looks correct, though, so perhaps it's the hardware which makes the information available in a different way?
Ccing the author of the i8k module.

Regarding comment #9: I am not sure the fan is handled correctly when the laptop is on battery, so that remark should perhaps be treated with care.

Regarding the output of sensors-detect: there is an unknown SMSC Super-I/O in the machine. It may or may not include fan speed control outputs, but at this point it doesn't really matter, because we (obviously) do not have any hwmon driver for this chip. SMSC makes a lot of custom chips for PC system vendors, and in general we do not have access to specifications for these chips.
You may want to load the coretemp driver to get accurate CPU temperatures (possibly better than what ACPI reports), but that's about all lm-sensors can do for you. As far as your fan issues are concerned, the solution has to come from either ACPI or laptop-specific code such as the i8k driver. You might also try to ask Dell themselves for help; they are generally Linux-friendly.
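For anyone comparing /proc/i8k readings across machines, the ten-field layout listed in the i8k comment above can be sketched as a small parser. This is only an illustration, not part of i8kutils: the names I8K_FIELDS, parse_i8k and describe are made up here, and rendering -22 as -EINVAL is an assumption based on the Linux errno value, not something the driver documents in this thread.

```python
import errno

# Field order as listed in the comment above (attributed there to i8k_proc_show).
# The key names themselves are hypothetical, chosen for this sketch.
I8K_FIELDS = (
    "format_version", "bios_version", "machine_id", "cpu_temp",
    "left_fan_status", "right_fan_status",
    "left_fan_speed", "right_fan_speed",
    "ac_power", "fn_key_status",
)
NUMERIC_FROM = 3  # cpu_temp and everything after it are integers


def parse_i8k(line):
    """Split one /proc/i8k line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(I8K_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(I8K_FIELDS), len(values))
        )
    parsed = {}
    for index, (name, raw) in enumerate(zip(I8K_FIELDS, values)):
        parsed[name] = int(raw) if index >= NUMERIC_FROM else raw
    return parsed


def describe(value):
    """Annotate -22 as -EINVAL (assumption); leave other values untouched."""
    if value == -errno.EINVAL:
        return "%d (-EINVAL)" % value
    return str(value)


if __name__ == "__main__":
    sample = "1.0 A06 BWB814J 31 -22 0 -22 0 -1 -22"
    for name, value in parse_i8k(sample).items():
        print(name, describe(value) if isinstance(value, int) else value)
```

Only -22 is annotated on purpose: mapping every negative number to an errno name would mislabel the -1 in the AC power field, whose meaning this thread does not establish.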
I have some weird information too:
1.0 A08 BRN4Z3J 54 -22 0 27660 0 -1 -22
-22 may be -EINVAL perhaps? :)
I tried the i8kutils package, it works just fine:
i8kctl fan - 1 started the right fan (I do not physically have a left fan)
i8kctl fan - 2 made it turn faster
i8kctl fan - 0 made it stop.

Which BIOS revision do you use, Samuel? On my machine (e5400, BIOS A07), i8kctl changes get overridden in less than a second, i.e. the fan stops or slows down and then powers up again almost immediately. Additionally, i8kutils is not available for x86_64, so even if it works it's not a viable solution.

As can be seen above, my BIOS is A08. Why do you say i8kutils is not available for x86_64? My box _is_ x86_64.

Oops, my mistake. Sorry, I should have seen that you already posted your BIOS version. Where did you get version A08? The newest version I can see at http://tinyurl.com/e5400bios is A07. And you're right, i8kutils works fine on x86_64. For whatever reason, the Debian package (http://packages.debian.org/sid/i8kutils) lists only x86.

Ah, wait, my laptop is a D430. I was just pointing out that these strange -22 values are completely normal; they don't affect whether fan control works or not.

I have the same problem on ubuntu 8.10, the laptop is the same latitude e8400 with intel 8400... Could I send any information? I have an A07 bios.

Dell's support couldn't provide any useful information. As far as I could understand, the person I talked to said something like "If it works with Windows and not with Linux, it's a Linux problem". This message was not very useful... Does anybody have an idea about what could be done now?

Have you also tried the new BIOS version A08?
Release Date: 1/8/2009
Version: A08

Yes, I read only all the comments... It's kind of funny because hibernate and suspend work, all devices work well except the fan... I don't know what to do... But with "i8kctl fan - 0 made it stop", does the fan stop?
I have the same issue (dell e5400)
cat /proc/i8k
1.0 A06 FC7MJ2X 62 -22 1 -22 93450 -1 -22
'i8kctl fan l 0' makes the left fan stop, but then it starts again immediately.

Does anyone notice a difference in fan behaviour when the laptop runs on battery rather than on the AC adapter? It seems to me that the fan is used in a much more reasonable way when I use the battery... Does this give you guys an idea?

I found this bug is also present on a Dell E6400 (Intel Core 2 Duo T9600 + nvidia Quadro NVS 160M) with BIOS ver. A12 and Linux 2.6.30. I can see the same fan behaviour as described above. Please write if you need more info.

Created attachment 22763 [details]
The output of acpidump.
This is the acpidump from a Latitude E6400 with BIOS A12. The fan is still running almost all the time; I can't see any difference between AC and battery.
Has any progress been made on this front? It's 2011 now, kernel 2.6.39, and this issue is still around (BIOS A16).

In response to #32: Not aware of any progress. Bug still present. I would be interested in any suggestion of an action that could help get this solved.

This has been suggested, but I haven't been able to test it (I don't have that much knowledge). Maybe someone with better skills could try it: https://bbs.archlinux.org/viewtopic.php?pid=780692#p780692
BTW: doing the (Shift+Fn) + 1,5,3,2,4 (one at a time, in this order) and disabling BIOS thermal control will fix the issue. This has to be set every boot though, and sometimes the BIOS screen that shows up after pressing the keys will freeze the computer.

In response to #34: The mentioned suggestion has been obsoleted by commit bc1f419c76a2d6450413ce4349f4e4a07be011d5
In response to #35: which BIOS version is it? Did you try making the change in the BIOS before the system starts? Does it work, and are you able to save the change in a persistent way?

#36: BIOS is the latest (A16). The procedure is done after you have completely booted your system (I can do it right now, for instance). Right now my fan is completely silent (mode 0) using that "trick" + dellfand. These changes reset when you halt the system and you need to do it every boot, and I don't know of any way to save them. BTW: my system is currently at 54ºC.

I am not able to reproduce the shift+Fn trick; it does not enter the BIOS here.
Mariachi: my question was: if you go into the BIOS while the machine starts up, _before_ the bootloader appears, are you able to turn off fan control, too? If yes: in which menu is it?
Incidentally: are you Latitude E5400 users able to use the built-in wireless adapter? Even on WPA-protected networks?

No, I cannot change the fan control from the BIOS setup on system start.
Try reading the section that begins with "New stuff": http://en.gentoo-wiki.com/wiki/Dell_Latitude_E6x00#CPU_overheating_throttling
My wireless with the broadcom card never worked quite right, so I bought an intel 5300 and now I'm a happy camper.

Doesn't anybody have a (good) contact at Dell who could pass this bug on to someone inside Dell?

This bug has been around since day 1, I don't believe Dell cares about it :(

But the fact that they didn't react does not prove they don't care; perhaps they just don't know, or perhaps the problem was not reported to the right person so far.

They are supposed to be Linux-friendly. They could at least add the ability to turn off BIOS fan control permanently (persistent after reboot) in one of their BIOS updates...

https://lists.us.dell.com/mailman/listinfo/linux-desktops is probably a better starting point.
I would guess that the SMM (system management mode) code on the notebook is continually overriding the kernel's attempts to adjust the fan. It may be worth disabling hwmon if that occurs and seeing what happens, as one reason we've seen BIOS fan control fail is Linux hwmon reconfiguring the temperature sensors.

In response to #43: are you suggesting to post the issue on that list? If this is your suggestion, I am willing to subscribe to the list and post the issue there.
In response to #44: I have no hwmon package installed, there is no mention of hwmon in dmesg and no hwmon appears in lsmod, so I have no idea how I should disable this feature, which I suspect is not enabled at all on my system.

Adding the Dell engineering team.

Thank you Matt! Maybe now something can be done about this ;)

By the way.
In my logs I have messages like these:
CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
CPU0: Core temperature/speed normal
[Hardware Error]: Machine check events logged
CPU1: Core temperature above threshold, cpu clock throttled (total events = 26)
CPU1: Core temperature/speed normal
[Hardware Error]: Machine check events logged
So I installed mcelog and here is what it logs:
Hardware event. This is not a software error.
MCE 0
CPU 0 THERMAL EVENT TSC 1404bf3f020
TIME 1309528922 Fri Jul 1 16:02:02 2011
Processor 0 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 88020003 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15

Sébastien, does the coretemp driver report excessive temperatures too?

Jean: the coretemp driver was initially not loaded. So I loaded it a few minutes ago. I got two messages at load time:
[11320.610670] coretemp coretemp.0: TjMax is assumed as 100 C!
[11320.610730] coretemp coretemp.1: TjMax is assumed as 100 C!
Since then nothing else has appeared in dmesg, so I assume nothing else has been logged, unless it was logged in another place, but then please let me know where to look.

The purpose of the coretemp driver isn't to log anything in the kernel, but to expose temperature values to user-space through sysfs. Run "sensors" (from the lm-sensors package) to get the temperature values. For reference, see comments #13 and #16.

Output of sensors:
acpitz-virtual-0
Adapter: Virtual device
temp1: +59.5°C (crit = +102.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +59.0°C (high = +100.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +60.0°C (high = +100.0°C, crit = +100.0°C)

This temperature matches the one reported by acpi -t
The real issue is that the operating system seems to have no control over the fan.
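Since the coretemp driver exposes its readings through sysfs rather than dmesg, the values that "sensors" prints can also be read directly. The following is a minimal sketch, assuming the standard hwmon layout (/sys/class/hwmon/hwmonN/tempM_input holding millidegrees Celsius, with optional name and tempM_label files); on kernels of the era discussed here the files may live elsewhere, for example under the device directory. The function name read_hwmon_temps and its root parameter are hypothetical.

```python
import glob
import os


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip, label): degrees Celsius} for each temp*_input under root.

    Assumes the hwmon sysfs convention that temp*_input is in millidegrees.
    """
    readings = {}
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        chip = _read(os.path.join(chip_dir, "name")) or os.path.basename(chip_dir)
        for input_path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            raw = _read(input_path)
            try:
                millidegrees = int(raw)
            except (TypeError, ValueError):
                continue  # sensor missing or returned something non-numeric
            stem = input_path[: -len("_input")]
            label = _read(stem + "_label") or os.path.basename(stem)
            readings[(chip, label)] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for (chip, label), celsius in sorted(read_hwmon_temps().items()):
        print("%s %s: %.1f C" % (chip, label, celsius))
```

Polling this while watching for the "Core temperature above threshold" messages would be one way to check whether the reported temperature really approaches TjMax when throttling is logged.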
The idea of using the coretemp driver is to check the reported temperature when the kernel logs a thermal throttling event. It will tell us whether the CPU is really heating up or not. That being said, the coretemp driver doesn't yet report the thermal thresholds, so please report the output of:
# modprobe msr
# rdmsr -x -p 1 0x19b
But I completely agree that the root problem appears to be the fan not kicking in when it should.

~# rdmsr -x -p 1 0x19b
3

Hmm, looks like the threshold isn't set, which would mean that the alarm triggers at TjMax, i.e. 100°C on your CPU. I'm curious whether you'll see any temperature even close to this being reported by the coretemp driver.

Maybe I don't really understand the thermal threshold mechanism after all, sorry for the noise.

As I said, I have only just loaded coretemp when you asked for it, so it is not usually loaded, but I have never seen such a temperature for my CPU. Still, there are these messages I showed in one of my previous comments and which appear from time to time in my logs:
CPU1: Core temperature above threshold, cpu clock throttled (total events = 782)
CPU1: Core temperature/speed normal
(and the same kind of messages for CPU0)
When this happens the temperature is clearly not close to 100 degrees C, so I suspect that there are two thresholds, but I don't really know the things I am talking about here.

I'm wondering about CONFIG_INTEL_IDLE. It is disabled in my kernel. Would it help to enable it? Also, the help text in the kernel configuration for this item says:
Enable intel_idle, a cpuidle driver that includes knowledge of native Intel hardware idle features. The acpi_idle driver can be configured at the same time, in order to handle processors intel_idle does not support.
This sounds interesting to me, but I couldn't find the mentioned acpi_idle driver. Could anybody please say whether these two are relevant and where to find the acpi_idle driver?
Thanks.
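To make the rdmsr reading from the exchange above (MSR 0x19b returning 3) easier to interpret, here is a hedged sketch that splits the value into bit fields. It assumes the IA32_THERM_INTERRUPT layout as described in the Intel Software Developer's Manual (bits 0 and 1 for the high and low temperature interrupt enables, bits 14:8 and 22:16 for the two threshold values, bits 15 and 23 for their enables); the function and key names are made up for illustration.

```python
def decode_therm_interrupt(value):
    """Split an IA32_THERM_INTERRUPT (MSR 0x19b) value into named fields.

    Bit positions are an assumption taken from the Intel SDM description;
    threshold values are offsets in degrees below TjMax.
    """
    return {
        "high_temp_int_enable": bool(value & (1 << 0)),
        "low_temp_int_enable": bool(value & (1 << 1)),
        "threshold1_offset": (value >> 8) & 0x7F,
        "threshold1_int_enable": bool(value & (1 << 15)),
        "threshold2_offset": (value >> 16) & 0x7F,
        "threshold2_int_enable": bool(value & (1 << 23)),
    }


if __name__ == "__main__":
    # rdmsr -x prints hexadecimal, so the "3" reported above is 0x3.
    for field, setting in decode_therm_interrupt(0x3).items():
        print(field, setting)
```

Under that assumed layout, a value of 3 has only the high and low temperature interrupt enables set and both programmable thresholds left at zero and disabled, which would be consistent with the remark that the threshold isn't set.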
In addition, here is the output of dmesg | grep idle
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.009477] using mwait in idle threads.
[ 0.805768] cpuidle: using governor ladder
[ 0.805770] cpuidle: using governor menu
[ 9.106195] ACPI: acpi_idle registered with cpuidle
[ 9.106915] Marking TSC unstable due to TSC halts in idle
I'm just wondering whether there might be something to tweak here, in what the kernel does when the CPU is idle.
Thanks for any feedback.

Google says:
Dell Latitude E5400 Intel Core2 Duo P8400 2.26GHz
This CPU should not get controlled by intel_idle.c. It doesn't hurt to enable it, but acpi_idle (processor.ko) would still stay active -> no change.
Which C-states are supported on your machine depends on the BIOS (with the acpi_idle driver; intel_idle does not depend on the BIOS).
There is a new userspace tool, tools/power/cpupower, in the kernel since 3.1-rc1 that tells you which C-states are supported:
cpupower idle-info
powertop is also a convenient tool to find out about unnecessary power consumption.
What helped on my wife's HP, which also had constantly running fans and where I searched for quite some time on the software side: cleaning the fan. Often a vacuum cleaner is enough. On this machine it was not, and it took me quite some time to get the dozens of different screws removed to be able to access the fan and remove the dust.

In response to #60: Fan, CPU and heatsink were replaced three days ago and this did not improve the situation. The motherboard is about to be replaced, too.
The thing is that, once the fan starts to be noisy, it seems it never goes back to its non-noisy state even though the temperature goes down. So, for instance, one can still hear the fan being rather loud even after more than one hour of idle time on a system running in text mode, with only a few services started, while the temperature reported by acpi -t is 31.5 degrees Celsius.
So, the motherboard of the laptop with the noisy fan was finally replaced and that didn't seem to solve the problem either. The engineer then had the idea to use another thermal grease. So he took out the heatsink, removed the thermal grease on it (provided by the manufacturer, and which was completely dry just three days after he had put it on the heatsink), and put another thermal grease on the heatsink, from MIcrosi, he said. And that improved the situation a lot. The noise is now completely reasonable and acceptable.
So, for those users who have this problem, cleaning the fan and heatsink and changing the thermal grease may be a good thing to try first. On this device the heatsink is not difficult to access: 5 screws to unscrew and one panel to remove.
Thanks very much to all those who have helped solve this problem!

I've been away for the past year, so only now did I remember about this... I'm going to change the grease on my laptop too. It has the exact same problem: once the fan kicks in, even if the temperature goes down it never stops; the only way to fix this is to suspend or reboot.
But now I see this is marked as invalid, so I guess no one will work on this anymore? It has been way too long anyway, I'm not expecting this to be fixed...

If your laptop shows it with a modern kernel then it is probably best to open a new bug specific to that laptop. It could easily be a software problem in your case.

In response to #63: I'm fully convinced that the problem has absolutely nothing to do with anything software related. The change of grease was done in August 2011, so almost one year ago now, and the fan is still okay.
One way to see that Linux is not causing the problem is to boot Windows. One then notices that the problem persists under that OS.