Created attachment 22610 [details]
ACPI DSDT table

System: HP 2510p Core Duo notebook, x86_64 kernel; Debian stable ("Lenny")

Yesterday my notebook (HP 2510p Core Duo) shut down hard due to overheating while compiling a kernel and running vlc. Both processor cores continued to run at full speed until the system shut down. IIUC the cores should get slowed down (T-state followed by P-state) before that happens.

I was using the ondemand governor for frequency scaling.

One thing I noticed is that, although the cores are registered as cooling devices, they are not bound to any thermal zones:

/sys/class/thermal$ grep . cooling_device*/type
cooling_device0/type:Fan
cooling_device1/type:Fan
cooling_device2/type:Fan
cooling_device3/type:Fan
cooling_device4/type:Fan
cooling_device5/type:Fan
cooling_device6/type:Fan
cooling_device7/type:Processor
cooling_device8/type:Processor
cooling_device9/type:LCD

/sys/class/thermal$ ls -1d thermal_zone*/cdev[0-9]
thermal_zone0/cdev0
thermal_zone0/cdev1
thermal_zone0/cdev2
thermal_zone0/cdev3
thermal_zone1/cdev0
thermal_zone1/cdev1
thermal_zone1/cdev2
thermal_zone1/cdev3
thermal_zone1/cdev4
thermal_zone1/cdev5
thermal_zone1/cdev6
thermal_zone3/cdev0
thermal_zone3/cdev1
thermal_zone4/cdev0

As you can see, cdev7 and cdev8 are not listed. But according to Documentation/thermal/sysfs-api.txt, they should be listed "if the processor is listed in _PSL method", and AFAICT in my dsdt that is the case for zones 0, 3, 4 and 6 (through method C3B4).

AFAICT all required modules are loaded (cpufreq_ondemand is compiled in):

cpufreq_conservative 7904 0
cpufreq_userspace 3604 0
cpufreq_stats 4808 0
cpufreq_powersave 1776 0
coretemp 6720 0
acpi_cpufreq 8096 0
processor 39960 3 acpi_cpufreq
thermal 16160 0
thermal_sys 16768 4 video,processor,thermal,fan
Created attachment 22611 [details] dmesg after booting the system
Created attachment 22612 [details] Kernel configuration
Please attach the full acpidump output, and also the output of "grep . /proc/acpi/thermal_zone/*/*".
Created attachment 22676 [details]
debug patch

Please apply this debug patch and attach the dmesg output after boot.
Created attachment 22678 [details]
dmesg with debugging from patch in comment#4

/proc/acpi/thermal_zone$ grep . */*
TZ0/cooling_mode:<setting not supported>
TZ0/polling_frequency:<polling disabled>
TZ0/state:state: ok
TZ0/temperature:temperature: 60 C
TZ0/trip_points:critical (S5): 256 C
TZ0/trip_points:passive: 99 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1
TZ0/trip_points:active[0]: 88 C: devices=C3C8
TZ0/trip_points:active[1]: 82 C: devices=C3C9
TZ0/trip_points:active[2]: 68 C: devices=C3CA
TZ0/trip_points:active[3]: 50 C: devices=C3CB
TZ0/trip_points:active[4]: 40 C: devices=C3CC
TZ1/cooling_mode:<setting not supported>
TZ1/polling_frequency:<polling disabled>
TZ1/state:state: ok
TZ1/temperature:temperature: 60 C
TZ1/trip_points:critical (S5): 110 C
TZ3/cooling_mode:<setting not supported>
TZ3/polling_frequency:<polling disabled>
TZ3/state:state: ok
TZ3/temperature:temperature: 51 C
TZ3/trip_points:critical (S5): 105 C
TZ3/trip_points:passive: 95 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1
TZ4/cooling_mode:<setting not supported>
TZ4/polling_frequency:<polling disabled>
TZ4/state:state: ok
TZ4/temperature:temperature: 34 C
TZ4/trip_points:critical (S5): 110 C
TZ4/trip_points:passive: 60 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1
TZ5/cooling_mode:<setting not supported>
TZ5/polling_frequency:<polling disabled>
TZ5/state:state: ok
TZ5/temperature:temperature: 50 C
TZ5/trip_points:critical (S5): 110 C
TZ6/cooling_mode:<setting not supported>
TZ6/polling_frequency:<polling disabled>
TZ6/state:state: ok
TZ6/temperature:temperature: 25 C
TZ6/trip_points:critical (S5): 70 C
TZ6/trip_points:passive: 60 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1
TZ6/trip_points:active[0]: 54 C: devices=C3B1
TZ6/trip_points:active[1]: 48 C: devices=C3B2

Hmmm. So /proc/acpi/thermal_zone does list the CPUs while /sys/class/thermal does not.

And the trip points for the CPUs look to be so high that some critical temperature is reached and the system shut down before they get activated? Or is there still something wrong?
I've just done a little test by running empty loops on both cores while watching contents of /proc/acpi/thermal_zone. It looks as if the critical points of zones 1 and 5 (which don't have CPU trips) may get reached before the higher trip points in other zones are reached.
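The suspicion above (a zone reaching its critical trip point without ever having had a passive one) can be checked mechanically. The following is only a rough sketch, assuming the `grep . */*` output format shown earlier in this report; the function name `zones_without_passive` and the `SAMPLE` excerpt are made up for illustration.

```python
import re
from collections import defaultdict

# Excerpt of the /proc/acpi/thermal_zone listing posted in this report.
SAMPLE = """\
TZ0/trip_points:critical (S5): 256 C
TZ0/trip_points:passive: 99 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1
TZ1/trip_points:critical (S5): 110 C
TZ3/trip_points:critical (S5): 105 C
TZ3/trip_points:passive: 95 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1
TZ5/trip_points:critical (S5): 110 C
"""


def zones_without_passive(grep_output):
    """Return zones that report a critical trip point but no passive one."""
    trips = defaultdict(set)
    for line in grep_output.splitlines():
        # Matches e.g. "TZ0/trip_points:passive: 99 C: ..." -> ("TZ0", "passive")
        match = re.match(r"\s*(\w+)/trip_points:(\w+)", line)
        if match:
            zone, kind = match.groups()
            trips[zone].add(kind)
    return sorted(
        zone
        for zone, kinds in trips.items()
        if "critical" in kinds and "passive" not in kinds
    )


if __name__ == "__main__":
    print(zones_without_passive(SAMPLE))
```

On the listing from this report it singles out TZ1 and TZ5, the two zones that can only react to heat by shutting down.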
Created attachment 22682 [details]
debug patch v2

Weird, the dmesg shows that cooling_device7/8 are bound to the thermal zone successfully...

Please apply this debug patch and attach the dmesg output. Please also attach the output of "grep . /sys/class/thermal/*/*" at the same time.
Created attachment 22687 [details] dmesg from 2nd patch + /sys/class/thermal contents Thanks Rui. Here's the new info (only included relevant parts of dmesg). I wonder if the problem could be related to the fact that the fan cooling devices are discovered before the thermal zones, while the cpu cooling devices are discovered after?
Hah, I see the problem. The processors are bound to the thermal zones successfully.

thermal_zone1/cdev0
thermal_zone1/cdev1
thermal_zone1/cdev2
thermal_zone1/cdev3
thermal_zone1/cdev4
thermal_zone1/cdev5
thermal_zone1/cdev6
are not the same as
cooling_device0
cooling_device1
...
cooling_device6
They just stand for the 1st/2nd/.../6th cooling devices in the current thermal zone. You can run
cat thermal_zone1/cdev*/type
to get the real cooling device type.

Now the problem is why the processor is not throttled before the critical shutdown. I agree with your guess in comment #6: the critical trip point of one thermal zone is reached while the passive trip points of the other thermal zones are not.

Hmm, I'll generate a debug patch to get the full thermal zone status at critical shutdown, probably posted here later today. :)
Created attachment 22696 [details]
debug patch: show all the thermal zones temperature and cooling device state when overheating

Please apply this patch and put the system into an overheating state, and then attach the /var/log/messages file after the critical shutdown.
> [cdevX is not equal to cooling_deviceX]
> they just stand for the 1st/2nd/.../6th cooling devices in the current
> thermal zone.

Aargh, you are completely correct. It can also be seen by listing the symlinks:

/sys/class/thermal/thermal_zone1$ ls -l cdev[0-9]
lrwxrwxrwx 1 root root 0 2009-08-13 09:32 cdev0 -> ../cooling_device6
lrwxrwxrwx 1 root root 0 2009-08-13 09:32 cdev1 -> ../cooling_device5
lrwxrwxrwx 1 root root 0 2009-08-13 09:32 cdev2 -> ../cooling_device4
lrwxrwxrwx 1 root root 0 2009-08-13 09:32 cdev3 -> ../cooling_device3
lrwxrwxrwx 1 root root 0 2009-08-13 09:32 cdev4 -> ../cooling_device2
lrwxrwxrwx 1 root root 0 2009-08-13 09:32 cdev5 -> ../cooling_device7
lrwxrwxrwx 1 root root 0 2009-08-13 09:32 cdev6 -> ../cooling_device8

That is *extremely* confusing. And the similarity in names (cdev is an obvious abbreviation of cooling_device) does not help at all. I had even done an ls -l a few times, but still missed that they did not match.

(It's also one of the reasons I still don't like sysfs: things are too hard to find and even when you find them it often remains confusing and obfuscated.)

Still, sorry for missing that. I still somehow feel I should have seen it myself.
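To avoid reading cdevN as cooling_deviceN again, the links can be resolved programmatically. This is an illustrative sketch only: it assumes the sysfs layout visible in the `ls -l` output above (cdevN symlinks inside a thermal_zone directory, each target having a `type` file), and the helper name `cdev_bindings` is invented here.

```python
import glob
import os
import re


def cdev_bindings(zone_dir):
    """Map each cdevN link in a thermal zone directory to (target name, type)."""
    bindings = {}
    for name in sorted(os.listdir(zone_dir)):
        # Only the cdevN links themselves, not files like cdevN_trip_point.
        if not re.fullmatch(r"cdev\d+", name):
            continue
        target = os.path.realpath(os.path.join(zone_dir, name))
        try:
            with open(os.path.join(target, "type")) as type_file:
                dev_type = type_file.read().strip()
        except OSError:
            dev_type = "unknown"
        bindings[name] = (os.path.basename(target), dev_type)
    return bindings


if __name__ == "__main__":
    # Prints nothing on systems without /sys/class/thermal.
    for zone in sorted(glob.glob("/sys/class/thermal/thermal_zone*")):
        for link, (device, dev_type) in cdev_bindings(zone).items():
            print(f"{os.path.basename(zone)}/{link} -> {device} ({dev_type})")
```

Each output line shows the per-zone index, the global cooling device it points to, and that device's type, which is the information that `cat thermal_zone1/cdev*/type` and `ls -l` give separately.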
> please apply this patch and put the system into a overheating state,
> and then attach the /var/log/messages file after the critical shutdown.

There are two problems with that.

1) When the system shut down we had very high temperatures for NL (close to 30C), but currently it's only ~18C so it is doubtful I can get the system to overheat the same way. I could probably simulate hot temps by wrapping the notebook in a towel or something, but the circumstances would still be different.

2) Any messages just before a critical shutdown do *not* show up in the logs. The existing KERN_EMERG message in thermal_zone_device_update()
   Critical temperature reached (%ld C)
is nowhere to be found in the logs for my previous critical shutdown. Still, it probably makes sense to add the zone in that message.

I assume that is because the shutdown happens too fast for the syslog daemon to process the message and write it to the log files. (There is a call to emergency_sync() in orderly_poweroff(), so I assume the problem must be earlier, probably in the syslog daemon.)

I could possibly work around that by using netconsole to capture console messages on another system. That may just be fast enough to see them.

But before I try that I have two questions.

Is thermal throttling of processors effective at all?
From the code it looks as if the processor would only be throttled *one step at a time* when a trip point is reached. For my system there is only one CPU trip point per thermal zone, and that at fairly high temperatures. So if that would only result in a change from T0 (100%) to T1 (88%), the change may well be insufficient to prevent the overheating.

Is there any way to add or modify trip point without changing the BIOS?
For example, it looks as if for my system it would make sense to either have CPU throttling occur at lower temperatures in zone 1 than the current 99C, or to add CPU trip point(s) in zones 3 and 5. It would be nice if that could be done through a kernel interface.
(In reply to comment #12)
>
> But before I try that I have two questions.
>
> Is thermal throttling of processors effective at all?
> From the code it looks as if the processor would only be throttled *one
> step at a time* when a trip point is reached.

The passive trip point means that the thermal driver should poke the passive cooling devices (usually processors). It is not mapped to one of the processor's throttling states, i.e. after the passive trip point is reached, processors may enter a deeper throttling state if the temperature is still increasing.

> For my system there is only
> one CPU trip point per thermal zone,

That's okay. It's true on all platforms.

> and that at fairly high temperatures.

Hmm, for a passive trip point, a temperature 10C lower than the critical trip point is also normal.

> Is there any way to add or modify trip point without changing the BIOS?

Yes, there is.
If the thermal driver is built as a module, load it with the module parameter thermal.psv=60
If the thermal driver is built in, boot the laptop with the boot option thermal.psv=60
But note that it changes the passive trip point in all the thermal zones.

Anyway, you can give it a try. You can even set it to a lower value, say 40C, to see if the processor can be throttled correctly by the thermal driver.
> i.e. after passive trip point is reached, processors may enter deeper
> throttling state if the temperature is still increasing.

OK.

> > Is there any way to add or modify trip point without changing the
> > BIOS?
>
> Yes, there is: thermal.psv=<temp>
> But note that it changes the passive trip point in all the thermal
> zones.

Thanks. Pity that it can't be set per trip point. Hmm. Could that possibly be implemented using something like 'psv=0:85,1:80,3:90' (leaving trip points for zones that are not mentioned at their defaults)?

I've tested by setting psv=65 and then doing a kernel build. Results were interesting. In general it works nicely, but some comments below.

FYI, my possible cpu frequencies are (using ondemand governor):
1333000, 1067000, 1200000, 933000, 800000

The thermal limiting tripped in thermal_zone 1, which is probably the sensor for the CPU itself.

The first thing that happened as soon as the limit was reached was that the cpufreq scaling_max_freq was set to 800000 (lowest value). I guess that is correct?

After that it took exactly 60 (!) seconds before the CPU throttling state was changed from T0 to T1. 30 seconds later T1 to T2 and again 30 seconds later T2 to T3. These times are 100% reproducible.

At that point zone 1 got below the limit. Immediately throttling went from T3 to T1 and a bit later to T0. Later again scaling_max_freq was raised to 1066400 (see below) and eventually 1333000.

In total it took exactly 2 minutes to reach T3 and it also took around 2 minutes to get back to full speed again (I did not time that as exactly).

Issues
- After 'modprobe -r thermal; modprobe thermal' /proc/acpi/thermal_zone/
  is not restored: the files that were there before the module is removed
  get deleted, but no new ones are created after reloading the module.
  /sys/class/thermal/ does get created correctly again. There are no
  kernel errors during module unload/reload.
- The first period of 60 seconds before CPU throttling goes from T0 to T1
  seems rather long to me.
- The scaling_max_freq value of 1066400 looks like some kind of rounding
  error. As it is smaller than the closest possible value of 1067000, the
  max is effectively set to only 1200000.

Conclusion is that my original issue was almost certainly due to overheating to the critical value of a zone that does *not* have a passive trip point.

I think I will set 'thermal.psv=85' for now in the hope that that will avoid an emergency shutdown in the future.

I'll try to take a look at the above issues myself, but any suggestions or debugging patches from you will be most appreciated.

Thanks a lot for all the help and info so far!
> - After 'modprobe -r thermal; modprobe thermal' /proc/acpi/thermal_zone/ > is not restored The same happens for the fan module. I've done some basic debugging and AFAICT acpi_thermal_add_fs() runs correctly. So it looks as if files are created, but just not visible to users. One thing I noticed is that after module removal there still exists an empty directory /proc/acpi/thermal_zone. Could it be that that old dir "masks" the newly created one? The same happens with .30 and .28 (I tried battery in .28), so it's not some recent change. I can take the issue to lkml myself if you like.
> - After 'modprobe -r thermal; modprobe thermal' /proc/acpi/thermal_zone/
>   is not restored

This looks to be a KDE issue:

# lsof | grep /proc/acpi/ | awk '{print $1" "$5" "$9}'
acpid REG /proc/acpi/event
ksysguard DIR /proc/acpi/battery
ksysguard DIR /proc/acpi/fan
ksysguard DIR /proc/acpi/thermal_zone

No idea why it keeps the dirs locked when it's not using any files in them.

So, let's concentrate on the other two more interesting issues I mentioned in comment #14 :-)
(In reply to comment #14)
> > Yes, there is: thermal.psv=<temp>
> > But note that it changes the passive trip point in all the thermal
> > zones.
>
> Thanks. Pity that it can't be set per trip point. Hmm. Could that possibly
> be implemented using something like 'psv=0:85,1:80,3:90' (leaving trip
> points for zones that are not mentioned at their defaults)?
>

No, at least we won't do this in the ACPI thermal driver, because overriding the passive trip point is not a good idea to begin with.
It's the BIOS that decides when to notify the ACPI thermal driver to check the temperature. For example, the BIOS will send a notification if the default passive trip point is triggered. But if we override the trip point, we probably won't get a notification when the new trip point is hit.
thermal.psv says that it can override the passive trip point, but this is misleading in some cases.

> I've tested by setting psv=65 and then doing a kernel build. Results were
> interesting. In general it works nicely, but some comments below.
>
> FYI, my possible cpu frequencies are (using ondemand governor):
> 1333000, 1067000, 1200000, 933000, 800000
>
> The thermal limiting tripped in thermal_zone 1, which is probably the
> sensor for the CPU itself.
>
> The first thing that happened as soon as the limit was reached was that
> the cpufreq scaling_max_freq was set to 800000 (lowest value). I guess
> that is correct?

Yes.

> After that it took exactly 60 (!) seconds before the CPU throttling state
> was changed from T0 to T1. 30 seconds later T1 to T2 and again 30 seconds
> later T2 to T3. These times are 100% reproducible.

This depends on when the BIOS generates ACPI thermal notifications. The ACPI thermal driver checks the temperature and decides whether to change the cpu P/T state only when it receives such a notification.

> At that point zone 1 got below the limit. Immediately throttling went from
> T3 to T1 and a bit later to T0. Later again scaling_max_freq was raised
> to 1066400 (see below) and eventually 1333000.
>
> In total it took exactly 2 minutes to reach T3 and it also took around 2
> minutes to get back to full speed again (I did not time that as exactly).
>
> Issues
> - After 'modprobe -r thermal; modprobe thermal' /proc/acpi/thermal_zone/
>   is not restored: the files that were there before the module is removed
>   get deleted, but no new ones are created after reloading the module.

That's weird. The procfs files are surely created again when the driver is loaded. Will you please double check?

>   /sys/class/thermal/ does get created correctly again. There are no
>   kernel errors during module unload/reload.
> - The first period of 60 seconds before CPU throttling goes from T0 to T1
>   seems rather long to me.

That's because the system is not overheating. :)

> - The scaling_max_freq value of 1066400 looks like some kind of rounding
>   error.

No, 1066400 = 1333000 * 80%
The cooling states of a processor are:
0. T0, P0 (full frequency)
1. T0, P0 *80%
2. T0, P0 *60%
3. T0, P0 *40%
4. T1, P0 *40%
5. T2, P0 *40%
...

> Conclusion is that my original issue was almost certainly due overheating
> to critical value of a zone that does *not* have a passive trip point.

Yes, I agree.
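The cooling-state table in this reply can be written down as a small function, which also reproduces the 1066400 figure. This is a sketch of the table only, not of the kernel code: the function name is invented, continuing the pattern past state 5 is an assumption (the table ends in "..."), and the result is not clamped or rounded to the frequencies the CPU actually supports.

```python
def processor_cooling_state(state, max_freq_khz):
    """Return (t_state, freq_cap_khz) for a processor cooling state.

    States 0-3 lower the frequency cap in 20% steps down to 40%; from
    there on the cap stays at 40% and the T-state number increases.
    """
    if state < 0:
        raise ValueError("cooling state must be non-negative")
    percent = max(40, 100 - 20 * state)
    t_state = max(0, state - 3)
    # Integer arithmetic so that 1333000 * 80% comes out as exactly 1066400.
    return t_state, max_freq_khz * percent // 100


if __name__ == "__main__":
    for state in range(6):
        t_state, cap = processor_cooling_state(state, 1333000)
        print(f"state {state}: T{t_state}, cap {cap} kHz")
```

For state 1 and a 1333000 kHz maximum this gives a cap of 1066400 kHz, i.e. the value seen in scaling_max_freq is a plain percentage of the top frequency rather than an entry from the frequency table.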
(In reply to comment #15)
>
> This looks to be a KDE issue:
>
> # lsof | grep /proc/acpi/ | awk '{print $1" "$5" "$9}'
> acpid REG /proc/acpi/event
> ksysguard DIR /proc/acpi/battery
> ksysguard DIR /proc/acpi/fan
> ksysguard DIR /proc/acpi/thermal_zone
>
> No idea why it keeps the dirs locked when it's not using any files in them.
>
> So, let's concentrate on the other two more interesting issues I mentioned in
> comment #14 :-)

Good to know.

IMO, all the problems here have been verified, and there is nothing we need to do in Linux/ACPI, right?

Closing this bug report as it's not a kernel bug. Please re-open it if you still have any questions. :)
On Tuesday 18 August 2009, you wrote: > > Thanks. Pity that it can't be set per trip point. Hmm. Could that > > possibly be implemented using something like 'psv=0:85,1:80,3:90' > > (leaving trip points for zones that are not mentioned at their > > defaults)? > > no, at least we won't do this in ACPI thermal driver. > Because overriding the passive trip point is not a good idea from the > beginning. > it's BIOS that decides when to notify ACPI thermal driver to check the > temperature. For example, BIOS will send a notification if the default > passive trip point is triggered. > But if we override the trip point, we probably can't get the > notification when the new trip point is hit. OK. So setting psv=85 is not a solution and I need a modified BIOS. [Some time passes while /me tries to hack his DSDT...] TZ1/temperature:temperature: 62 C TZ1/trip_points:critical (S5): 110 C +TZ1/trip_points:passive: 95 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1 Cool, so now I have a passive trip point on TZ1 too :-) Pity that loading a custom DSDT taints the kernel. > > FYI, my possible cpu frequencies are (using ondemand governor): > > 1333000, 1067000, 1200000, 933000, 800000 [...] > > - The scaling_max_freq value of 1066400 looks like some kind of > > rounding error. > > No, 1066400 = 1333000 * 80% Ugh. Still very, very strange that it comes out so close to a valid cpufreq value, but end up being just below it. Seems illogical. The cpufreq values look almost too nice. As if they've taken 80% and then looked for a nice looking number close to that value. Thanks again for your excellent help on this issue!
(In reply to comment #19)
> On Tuesday 18 August 2009, you wrote:
> > > Thanks. Pity that it can't be set per trip point. Hmm. Could that
> > > possibly be implemented using something like 'psv=0:85,1:80,3:90'
> > > (leaving trip points for zones that are not mentioned at their
> > > defaults)?
> >
> > no, at least we won't do this in ACPI thermal driver.
> > Because overriding the passive trip point is not a good idea from the
> > beginning.
> > it's BIOS that decides when to notify ACPI thermal driver to check the
> > temperature. For example, BIOS will send a notification if the default
> > passive trip point is triggered.
> > But if we override the trip point, we probably can't get the
> > notification when the new trip point is hit.
>
> OK. So setting psv=85 is not a solution and I need a modified BIOS.
>
> [Some time passes while /me tries to hack his DSDT...]
>
> TZ1/temperature:temperature: 62 C
> TZ1/trip_points:critical (S5): 110 C
> +TZ1/trip_points:passive: 95 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1
>
> Cool, so now I have a passive trip point on TZ1 too :-)
> Pity that loading a custom DSDT taints the kernel.
>

Hah, I think you can try this in the thermal sysfs I/F:
1. Enter the directory of a thermal zone that has no passive trip point.
2. echo a proper value aaa to the "passive" file.
The thermal sysfs driver will then bind the processor cooling device to this thermal zone, together with a fake passive trip point aaa. :)
On Friday 21 August 2009, you wrote: > hah, I think you can try this in the thermal sysfs I/F: > 1. enter the thermal zones that without passive trip point. > 2. echo a proper value aaa to the "passive" file > the thermal sysfs driver will bind the processor cooling device to this > thermal zone, together with a fake passive trip point aaa. :) Yes, Matthew Garrett had the same suggestion for me on IRC. Is that option missing in Documentation/thermal/sysfs-api.txt? And even then that doc is not really suitable for end users. I tried it for TZ5, but the system immediately became dreadfully slow (big latencies). I'll open a new BR for that issue later today :-) Also, it's a bit strange that you have to set the temperature in sysfs, but AFAICT the polling_frequency can only be set in procfs (and I think the same goes for cooling_mode)?
> I tried it for TZ5, but the system immediately became dreadfully slow > (big latencies). Ah, I've found the problem. I did 'echo -n 90 >passive', but the temp has to be in millidegrees, so 'echo -n 90000 >passive'. So the latency was probably due to the system being throttled all the way down to nothing due to overheating :-) With the limit set to 90000 it's much happier.
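Since the unit mix-up is easy to repeat, here is a minimal sketch that does the conversion before writing. It assumes only what is stated above: that the "passive" file takes an integer in millidegrees Celsius. The helper names and the zone path in the usage note are hypothetical, and writing to sysfs needs root.

```python
def celsius_to_millidegrees(celsius):
    """Convert degrees Celsius to the millidegree integer sysfs expects."""
    return int(round(celsius * 1000))


def set_passive_trip(passive_path, celsius):
    """Write a passive trip point, given in degrees Celsius, to passive_path."""
    value = celsius_to_millidegrees(celsius)
    with open(passive_path, "w") as passive_file:
        passive_file.write(str(value))
    return value
```

Hypothetical usage: `set_passive_trip("/sys/class/thermal/thermal_zone5/passive", 90)` writes 90000, not 90. Which thermal_zoneN directory corresponds to TZ5 is not established in this report, so check the zone before writing.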
> Ah, I've found the problem. I did 'echo -n 90 >passive', but the temp
> has to be in millidegrees, so 'echo -n 90000 >passive'.

Would something like the patch below make sense to prevent my mistake? If you like it I'll test it and submit it to the lists.

--- a/drivers/thermal/thermal_sys.c
+++ b/drivers/thermal/thermal_sys.c
@@ -225,6 +225,12 @@ passive_store(struct device *dev,
 	if (!sscanf(buf, "%d\n", &state))
 		return -EINVAL;
 
+	/* sanity check: values below 40000 millidegrees don't make sense
+	 * and can cause the system to go into a thermal heart attack
+	 */
+	if (state && state < 40000)
+		return -EINVAL;
+
 	if (state && !tz->forced_passive) {
 		mutex_lock(&thermal_list_lock);
 		list_for_each_entry(cdev, &thermal_cdev_list, node) {
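For anyone scripting writes to the "passive" file, the same check can be applied in userspace first. This is a rough Python model of the proposed condition only: the 40000 threshold comes from the patch above, the function name is invented, and the parsing is stricter than the kernel's sscanf (which accepts a leading integer followed by other text).

```python
MIN_PASSIVE_MILLIDEGREES = 40000  # threshold taken from the proposed patch


def validate_passive(buf):
    """Parse a value destined for the 'passive' file, mirroring the patch.

    0 is allowed (it is not subject to the check); any other value below
    40000 millidegrees is rejected, like the -EINVAL return in the patch.
    """
    state = int(buf.strip())  # raises ValueError on non-numeric input
    if state and state < MIN_PASSIVE_MILLIDEGREES:
        raise ValueError(
            f"{state} is below {MIN_PASSIVE_MILLIDEGREES} millidegrees; "
            "did you pass degrees Celsius?"
        )
    return state
```

With this model, "90000" and "0" pass, while "90" is rejected, which is exactly the mistake described above.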
Frans, what's the status of these patches?
> what's the status of these patches?

As far as I'm concerned they are ready for integration. All patches except for 1/6 (trivial documentation improvement) and 4/6 have been acked.

The only patch that's still somewhat open is 4/6, but I think my version 2 is a good compromise for the concerns raised by Matthew.

Last week I sent a summary mail with a request to include them for .32 to Len, with CCs to the acpi list, you and Matthew: http://www.spinics.net/lists/linux-acpi/msg24533.html. I have not had any response to that.
Ping Len... I think we can ship this patch set in 2.6.32.
This bug report documents a property of the HP BIOS -- that on a hot day, it could hit a critical shutdown in a thermal zone that didn't have a passive trip point. As Linux was faithfully implementing the BIOS design, changing this bug report to the BIOS category, and closed as DOCUMENTED. That said, the patches on the list associated with working around this issue are useful and are now applied to the acpi tree.