Bug 14529 - fan control not working - Toshiba M900
Summary: fan control not working - Toshiba M900
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Fan
Hardware: All Linux
Importance: P1 high
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-02 03:14 UTC by brice rebsamen
Modified: 2010-09-27 00:26 UTC

See Also:
Kernel Version: 2.6.30
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output (52.57 KB, text/plain)
2009-11-03 09:31 UTC, brice rebsamen
dmesg output for kernel 2.6.30 with ACPI_DEBUG turned on (55.54 KB, application/octet-stream)
2009-11-09 03:34 UTC, brice rebsamen
dmesg output for kernel 2.6.31 with ACPI_DEBUG turned on (51.61 KB, text/plain)
2009-11-09 03:35 UTC, brice rebsamen
output of acpidump (300.93 KB, application/octet-stream)
2009-11-09 03:36 UTC, brice rebsamen

Description brice rebsamen 2009-11-02 03:14:02 UTC
I recently got a Toshiba Portégé M900 and am running Debian Sid on it. The normal operating temperature is around 75C (degrees Celsius). During CPU-intensive operations (Matlab computations, for instance), it goes up to 98C and oscillates between 98C and 95C.

$ cat /proc/acpi/thermal_zone/TZ0/trip_points
critical (S5):           108 C
passive:                 101 C: tc1=30 tc2=30 tsp=50 devices=P001 P002
active[0]:               98 C: devices=FAN0
active[1]:               82 C: devices=FAN1
active[2]:               72 C: devices=FAN2
active[3]:               62 C: devices=FAN3
active[4]:               57 C: devices=FAN4
active[5]:               45 C: devices=FAN5

I cannot believe these are normal operating temperatures. The fan that goes on at 98 degrees is the one that goes on at startup and under Windows (where the normal operating temperature is around 45C).

My idea is that the fan IDs got messed up somehow: FAN0 should be active at 45C, then FAN1, etc.

98C cannot be healthy for the CPU... and a normal operating temperature of 75C is too hot: it becomes very uncomfortable for the hands after prolonged use.
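A note on the ordering: ACPI numbers active trip points from the hottest threshold down, so active[0] sitting at 98 C is the expected layout rather than evidence of swapped IDs. The sketch below only restates the trip-point table above. It assumes a fan should be running whenever the temperature is at or above its trip point and ignores hysteresis; `fans_expected_on` is a name invented for this illustration.

```python
# Active trip points copied from the trip_points listing above (degrees Celsius).
ACTIVE_TRIPS_C = {
    "FAN0": 98,
    "FAN1": 82,
    "FAN2": 72,
    "FAN3": 62,
    "FAN4": 57,
    "FAN5": 45,
}


def fans_expected_on(temp_c, trips=ACTIVE_TRIPS_C):
    """Return the fan devices whose active trip point is at or below temp_c.

    Assumption: no hysteresis, a fan is 'expected on' as soon as the
    temperature reaches its trip point.
    """
    return sorted(name for name, trip in trips.items() if temp_c >= trip)


if __name__ == "__main__":
    for temp in (44, 75, 98):
        print(temp, fans_expected_on(temp))
```

Under that assumption, at the reported idle temperature of 75 C the devices FAN2 through FAN5 should already be engaged, which is not what the machine is doing.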
Comment 1 brice rebsamen 2009-11-03 00:37:29 UTC
Btw I forgot to mention that dmesg shows the following messages:

[   59.112238] ACPI: Device [FAN0] failed to transition to D3                   
[   59.120221] ACPI: Device [FAN1] failed to transition to D3                   
[   59.128237] ACPI: Device [FAN2] failed to transition to D3                   
[   59.132296] ACPI: Device [FAN3] failed to transition to D3                   
[   59.136265] ACPI: Device [FAN4] failed to transition to D3                   
[   60.620166] ACPI: Device [FAN0] failed to transition to D3                   
[   60.630819] ACPI: Device [FAN1] failed to transition to D3                   
[   60.637018] ACPI: Device [FAN2] failed to transition to D3                   
[   60.644149] ACPI: Device [FAN3] failed to transition to D3                   
[   60.652161] ACPI: Device [FAN4] failed to transition to D3                   
[   60.660148] ACPI: Device [FAN5] failed to transition to D3                   
[  722.448492] ACPI: Device [FAN0] failed to transition to D3                   
[  722.456463] ACPI: Device [FAN1] failed to transition to D3                   
[  722.460519] ACPI: Device [FAN2] failed to transition to D3                   
[  722.468490] ACPI: Device [FAN3] failed to transition to D3                   
[  723.956481] ACPI: Device [FAN0] failed to transition to D3                   
[  723.960546] ACPI: Device [FAN1] failed to transition to D3                   
[  723.964591] ACPI: Device [FAN2] failed to transition to D3                   
[  723.968517] ACPI: Device [FAN3] failed to transition to D3                   
[  723.976475] ACPI: Device [FAN4] failed to transition to D3                   
[  724.000537] ACPI: Device [FAN0] failed to transition to D3                   
[  724.004475] ACPI: Device [FAN1] failed to transition to D3                   
[  724.008509] ACPI: Device [FAN2] failed to transition to D3                   
[  724.016541] ACPI: Device [FAN3] failed to transition to D3                   
[  724.021032] ACPI: Device [FAN4] failed to transition to D3                   
[  725.448146] ACPI: Device [FAN0] failed to transition to D3                   
[  725.452146] ACPI: Device [FAN1] failed to transition to D3                   
[  725.456145] ACPI: Device [FAN2] failed to transition to D3                   
[  725.464144] ACPI: Device [FAN3] failed to transition to D3                   
[  725.468145] ACPI: Device [FAN4] failed to transition to D3                   
[  732.428472] ACPI: Device [FAN0] failed to transition to D3                   
[  732.440467] ACPI: Device [FAN1] failed to transition to D3                   
[  732.444568] ACPI: Device [FAN2] failed to transition to D3                   
[  732.448514] ACPI: Device [FAN3] failed to transition to D3                   
[  733.936260] ACPI: Device [FAN0] failed to transition to D3                   
[  733.944252] ACPI: Device [FAN1] failed to transition to D3                   
[  733.948271] ACPI: Device [FAN2] failed to transition to D3                   
[  733.952268] ACPI: Device [FAN3] failed to transition to D3                   
[  733.960292] ACPI: Device [FAN4] failed to transition to D3                   
[  733.976292] ACPI: Device [FAN0] failed to transition to D3
[  733.984246] ACPI: Device [FAN1] failed to transition to D3
[  733.988268] ACPI: Device [FAN2] failed to transition to D3
[  733.992268] ACPI: Device [FAN3] failed to transition to D3
[  734.000530] ACPI: Device [FAN4] failed to transition to D3
[  735.440468] ACPI: Device [FAN0] failed to transition to D3
[  735.446110] ACPI: Device [FAN1] failed to transition to D3
[  735.451070] ACPI: Device [FAN2] failed to transition to D3
[  735.456472] ACPI: Device [FAN3] failed to transition to D3
[  735.464487] ACPI: Device [FAN4] failed to transition to D3
[  736.424473] ACPI: Device [FAN0] failed to transition to D3
[  736.436470] ACPI: Device [FAN1] failed to transition to D3
[  736.440518] ACPI: Device [FAN2] failed to transition to D3
[  736.444514] ACPI: Device [FAN3] failed to transition to D3
[  736.452468] ACPI: Device [FAN4] failed to transition to D3
[  737.940475] ACPI: Device [FAN0] failed to transition to D3
[  737.944554] ACPI: Device [FAN1] failed to transition to D3
[  737.948548] ACPI: Device [FAN2] failed to transition to D3
[  737.952515] ACPI: Device [FAN3] failed to transition to D3
[  737.960470] ACPI: Device [FAN4] failed to transition to D3
[  746.812153] ACPI: Device [FAN0] failed to transition to D3
[  746.816187] ACPI: Device [FAN1] failed to transition to D3
[  746.828150] ACPI: Device [FAN2] failed to transition to D3
[  746.836148] ACPI: Device [FAN3] failed to transition to D3
[  748.328147] ACPI: Device [FAN0] failed to transition to D3
[  748.332189] ACPI: Device [FAN1] failed to transition to D3
[  748.336161] ACPI: Device [FAN2] failed to transition to D3
[  748.341992] ACPI: Device [FAN3] failed to transition to D3
[  748.352149] ACPI: Device [FAN4] failed to transition to D3
[  974.596254] ACPI: Device [FAN0] failed to transition to D3
[  974.608241] ACPI: Device [FAN1] failed to transition to D3
[  974.612275] ACPI: Device [FAN2] failed to transition to D3
[  976.120469] ACPI: Device [FAN0] failed to transition to D3
[  976.124686] ACPI: Device [FAN1] failed to transition to D3
[  976.128607] ACPI: Device [FAN2] failed to transition to D3
[  976.136469] ACPI: Device [FAN3] failed to transition to D3

As you can see, this goes on and on forever...
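To summarize a long log like this one, the messages can be tallied per device. This is a minimal sketch, assuming the dmesg text has been saved to a string; the regular expression matches only the message format shown above, and `count_failures` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
# [   59.112238] ACPI: Device [FAN0] failed to transition to D3
PATTERN = re.compile(r"ACPI: Device \[(FAN\d+)\] failed to transition to (D\d)")


def count_failures(dmesg_text):
    """Count failed power-state transitions per (device, target state)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE = """\
[   59.112238] ACPI: Device [FAN0] failed to transition to D3
[   59.120221] ACPI: Device [FAN1] failed to transition to D3
[   60.660148] ACPI: Device [FAN5] failed to transition to D3
[  722.448492] ACPI: Device [FAN0] failed to transition to D3
"""

if __name__ == "__main__":
    for (device, state), n in sorted(count_failures(SAMPLE).items()):
        print(device, state, n)
```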
Comment 2 ykzhao 2009-11-03 01:34:21 UTC
Will you please attach the output of acpidump on your box?

Will you please add the boot option of "acpi.debug_layer=0x04200000 acpi.debug_level=0x04" and attach the output of dmesg?
Thanks.
Comment 3 Len Brown 2009-11-03 02:44:43 UTC
Also, does this work properly with any other version of Linux,
newer or older than 2.6.30?
Comment 4 brice rebsamen 2009-11-03 09:30:57 UTC
I tested with Lenny, kernels 2.6.30-2 and 2.6.26-2. With 2.6.26-2, the main fan starts around 60 degrees but never stops. dmesg shows the following messages:

[  352.326859] ACPI: Transitioning device [FAN2] to D3
[  352.326871] ACPI: Unable to turn cooling device [f7440338] 'off'
[  352.338885] ACPI: Transitioning device [FAN3] to D3
[  352.338895] ACPI: Unable to turn cooling device [f74402d4] 'off'

repeated forever.

I could not set the boot options you gave me. Could you please tell me how to do it? I added them in my grub menu file

kopt=root=/dev/sda5 ro acpi.debug_layer=0x04200000 acpi.debug_level=0x04
 
and then called update-grub, which results in the following kernel line:

kernel /boot/vmlinuz-2.6.30-bpo.2-686 root=/dev/sda5 ro acpi.debug_layer=0x04200000 acpi.debug_level=0x04 quiet

However, at boot time it tells me that the options are ignored because they are unknown (or something like that). Nevertheless, I am posting the output of dmesg (attached).
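One way to confirm that the options at least reached the running kernel is to look at /proc/cmdline. The sketch below parses a command-line string and keeps the ACPI-related parameters; `acpi_params` is a hypothetical helper, and the sample string is the kernel line quoted above. Whether the kernel then acts on the options is a separate question that this check cannot answer.

```python
def acpi_params(cmdline):
    """Return the acpi.* / acpi_* parameters found in a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith(("acpi.", "acpi_")):
            key, _, value = token.partition("=")
            params[key] = value
    return params


SAMPLE = (
    "root=/dev/sda5 ro acpi.debug_layer=0x04200000 "
    "acpi.debug_level=0x04 quiet"
)

if __name__ == "__main__":
    # On a live system, replace SAMPLE with open("/proc/cmdline").read().
    print(acpi_params(SAMPLE))
```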
Comment 5 brice rebsamen 2009-11-03 09:31:31 UTC
Created attachment 23631
dmesg output
Comment 6 ykzhao 2009-11-06 05:52:25 UTC
Will you please attach the output of acpidump?

It seems that CONFIG_ACPI_DEBUG is not enabled in the kernel configuration. Will you please enable it and attach the output of dmesg?

thanks.
Comment 7 brice rebsamen 2009-11-07 11:29:00 UTC
I'll do it. I'm quite willing to help. But please guide me a bit. Does it mean I have to recompile the kernel? I've done it before, so it should not be a problem. Should I get a vanilla kernel? or the one I am using now? (debian's source)
Thanks
Comment 8 brice rebsamen 2009-11-09 03:34:06 UTC
Created attachment 23699
dmesg output for kernel 2.6.30 with ACPI_DEBUG turned on
Comment 9 brice rebsamen 2009-11-09 03:35:48 UTC
Created attachment 23700
dmesg output for kernel 2.6.31 with ACPI_DEBUG turned on

I am running kernel 2.6.30, but I also tried with 2.6.31. So here is the output. Same behavior as with 2.6.30 ...
Comment 10 brice rebsamen 2009-11-09 03:36:33 UTC
Created attachment 23701
output of acpidump
Comment 11 brice rebsamen 2009-11-09 03:39:42 UTC
So I downloaded the source packages for Debian kernels 2.6.30 and 2.6.31 and compiled them with the ACPI_DEBUG option on. I tested them with the kernel boot options you gave me.

I ran some Matlab code to load the CPU until it reached the 98C trip point. At that point the fan turns on and the temperature goes down to 95C, then the fan turns off and the temperature climbs back up to 98C, etc...

I attached the two dmesg outputs and the acpidump output.
Comment 12 Luis Maia 2009-11-13 12:22:56 UTC
I'm also running Debian with the kernel you mention, and on my laptop (LG lw20 express) I notice exactly the same behaviour, except for the rise in temperature, because for now I'm not running anything heavy enough and the processor is always running at 800 MHz.

I think the reason your fan turns on at 98C is that the motherboard is overriding the OS.

I still have a 2.6.26 kernel to try, and as soon as I get home and have time I can check whether the behaviour is the same (I don't think so).

By the way, can this be related to software suspend? I do suspend to RAM quite often!

And after a suspend:

Nov 13 10:45:06 localhost acpid: client connected from 2875[0:0]
Nov 13 10:45:06 localhost acpid: 1 client rule loaded
Nov 13 12:36:26 localhost acpid: client 2875[0:0] has disconnected
Nov 13 12:36:26 localhost kernel: [10330.516759] ACPI handle has no context!
Nov 13 12:36:26 localhost kernel: [10330.532237] ACPI handle has no context!
Nov 13 12:36:26 localhost kernel: [10330.532256] ACPI handle has no context!
Nov 13 12:36:26 localhost kernel: [10330.548205] ACPI handle has no context!
Nov 13 12:36:26 localhost kernel: [10330.548223] ACPI handle has no context!
Nov 13 12:36:26 localhost kernel: [10330.564210] ACPI handle has no context!
Nov 13 12:36:26 localhost kernel: [10330.660168] HDA Intel 0000:00:1b.0: power state changed by ACPI to D3
Nov 13 12:36:26 localhost kernel: [10330.661926] ACPI: Device [FAN0] failed to transition to D0
Comment 13 brice rebsamen 2009-11-17 11:48:06 UTC
I discovered a new piece of information today. After closing the lid, the laptop goes to sleep mode. After waking up the session, the fan (i.e. the main fan) starts spinning and stays on forever. As a result the cruising temperature is around 45C and when CPU is loaded it goes no higher than 65C. Apparently, fan's speed is constant (and low): it does not spin faster when temperature increases to 65C...
Let me know if I can report more info on that (and how to do it).
Cheers
Comment 14 brice rebsamen 2009-11-19 01:37:01 UTC
Btw, by "goes to sleep mode" I meant suspend to RAM.
Comment 15 James 2009-11-21 10:31:28 UTC
I'm affected, but not identically.

I have a Toshiba laptop: Satellite L300, model PSLB8A-0FM004.

I was running Ubuntu (2.6.28-15-generic) with the kernel option acpi_osi="Linux".  This option would allow the fan to spin down when not busy; without the option the fan would spin up and never spin down.
  i.e. acpi_osi="Linux" would allow "normal" fan behaviour.

I migrated to Debian (squeeze/sid (2.6.30-2-686)).  The fan spins up and does not spin down.  I thought I would add acpi_osi="Linux" to see if that would spin the fan down as it did with the previous (older) Ubuntu Linux kernel.  The fan did not spin down.

With Debian (squeeze/sid (2.6.30-2-686)) & acpi_osi="Linux" I get:

  ACPI: Device [FAN0] failed to transition to D3
    *(1+n)

Please let me know what I can do to help.
Comment 16 brice rebsamen 2009-11-26 03:21:05 UTC
Could it be confirmed in the first place that this is an issue that should be reported here (kernel)? Or shall I report it to the ACPI people? Or perhaps ACPI people are already notified...

Since I discovered that suspending to RAM then waking up triggers the fan I no longer risk burning my CPU (and my hands :). However I'd like to see this issue resolved one day!
Regards
Comment 17 Len Brown 2009-11-26 05:45:04 UTC

[    1.294098] ACPI: Fan [FAN0] (on)
[    1.294397] fan PNP0C0B:01: registered as cooling_device1
[    1.294467] ACPI: Fan [FAN1] (on)
[    1.295468] fan PNP0C0B:02: registered as cooling_device2
[    1.295538] ACPI: Fan [FAN2] (on)
[    1.295833] fan PNP0C0B:03: registered as cooling_device3
[    1.295903] ACPI: Fan [FAN3] (on)
[    1.296242] fan PNP0C0B:04: registered as cooling_device4
[    1.296312] ACPI: Fan [FAN4] (on)
[    1.296613] fan PNP0C0B:05: registered as cooling_device5
[    1.296685] ACPI: Fan [FAN5] (on)

Were the fans really running at max speed during boot?

[    1.368816]  thermal-0263 [00] thermal_get_temperatur: Temperature is 3612 dK

88.05 C is pretty high for a temperature at boot.

[    1.369758]  thermal-0390 [00] thermal_trips_update  : Found critical threshold [3812]
[    1.369934]  thermal-0415 [00] thermal_trips_update  : No hot threshold
[    1.524813]  thermal-0263 [00] thermal_get_temperatur: Temperature is 2732 dK
[    1.560662] thermal LNXTHERM:01: registered as thermal_zone0
[    1.560736] ACPI: Thermal Zone [TZ0] (0 C)

And I don't believe 0 C either...
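The figures quoted above follow from the debug output if "dK" is read as tenths of a kelvin (an assumption, but one consistent with the 88.05 C value). A small sketch of the arithmetic, with `decikelvin_to_celsius` as an illustrative name:

```python
def decikelvin_to_celsius(dk):
    """Convert a temperature in tenths of a kelvin to degrees Celsius.

    The subtraction is done in integer hundredths of a degree so that
    results such as 88.05 are not disturbed by floating-point drift.
    """
    return (dk * 10 - 27315) / 100


if __name__ == "__main__":
    # Values taken from the dmesg excerpt above.
    for dk in (3612, 3812, 2732):
        print(dk, "dK =", decikelvin_to_celsius(dk), "C")
```

By this reading, 3612 dK is the 88.05 C boot temperature, 3812 dK corresponds to the 108 C critical trip point, and 2732 dK is the implausible reading of about 0 C.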
Comment 18 brice rebsamen 2009-11-26 05:56:15 UTC
What happened at that time is that I rebooted right after compiling the kernel, which explains the temperature of 86C. In that case, what happens usually is that the fan blows really hard during BIOS and GRUB phases, then stops when the kernel gets loaded.

Why does it show 5 fans? Does my laptop really have 5 fans? Or are those virtual fans, i.e. a way of controlling the speed?

And the weirdest of all is that the fan control works perfectly after suspending to RAM. Does it mean that temperature monitoring and fan control work fine, and what's broken is the initialization of some parameters / modules...?
Comment 19 Zhang Rui 2009-12-04 05:44:02 UTC
Please attach the output of "grep . /sys/class/thermal/*/*".
Please "cd /proc/acpi/fan" and turn on the fans manually by setting the "state" file of each FAN device to 0.
Does the temperature go down at that point?
Comment 20 brice rebsamen 2009-12-04 14:14:17 UTC
Below is the output of grep . /sys/class/thermal/*/*

All fans are already on:
$ cat FAN*/state
status:                  on
status:                  on
status:                  on
status:                  on
status:                  on
status:                  on

Nevertheless I tried the following command as root: 'echo 0 > FAN0/state' for all 6 fans and nothing happened...

Let me know what else I can do and thank you very much for working on this.



$ grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:1  
/sys/class/thermal/cooling_device0/max_state:1  
/sys/class/thermal/cooling_device0/type:Fan     
/sys/class/thermal/cooling_device1/cur_state:1  
/sys/class/thermal/cooling_device1/max_state:1  
/sys/class/thermal/cooling_device1/type:Fan     
/sys/class/thermal/cooling_device2/cur_state:1  
/sys/class/thermal/cooling_device2/max_state:1  
/sys/class/thermal/cooling_device2/type:Fan     
/sys/class/thermal/cooling_device3/cur_state:1  
/sys/class/thermal/cooling_device3/max_state:1  
/sys/class/thermal/cooling_device3/type:Fan     
/sys/class/thermal/cooling_device4/cur_state:1  
/sys/class/thermal/cooling_device4/max_state:1  
/sys/class/thermal/cooling_device4/type:Fan     
/sys/class/thermal/cooling_device5/cur_state:1  
/sys/class/thermal/cooling_device5/max_state:1  
/sys/class/thermal/cooling_device5/type:Fan     
/sys/class/thermal/cooling_device6/cur_state:0  
/sys/class/thermal/cooling_device6/max_state:3  
/sys/class/thermal/cooling_device6/type:Processor
/sys/class/thermal/cooling_device7/cur_state:0   
/sys/class/thermal/cooling_device7/max_state:3   
/sys/class/thermal/cooling_device7/type:Processor
/sys/class/thermal/cooling_device8/cur_state:3   
/sys/class/thermal/cooling_device8/max_state:7   
/sys/class/thermal/cooling_device8/type:LCD      
/sys/class/thermal/thermal_zone0/cdev0_trip_point:7
/sys/class/thermal/thermal_zone0/cdev1_trip_point:6
/sys/class/thermal/thermal_zone0/cdev2_trip_point:5
/sys/class/thermal/thermal_zone0/cdev3_trip_point:4
/sys/class/thermal/thermal_zone0/cdev4_trip_point:3
/sys/class/thermal/thermal_zone0/cdev5_trip_point:2
/sys/class/thermal/thermal_zone0/cdev6_trip_point:1
/sys/class/thermal/thermal_zone0/cdev7_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled      
/sys/class/thermal/thermal_zone0/temp:78000        
/sys/class/thermal/thermal_zone0/trip_point_0_temp:108000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:101000  
/sys/class/thermal/thermal_zone0/trip_point_1_type:passive 
/sys/class/thermal/thermal_zone0/trip_point_2_temp:98000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:82000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:72000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:62000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/trip_point_6_temp:57000
/sys/class/thermal/thermal_zone0/trip_point_6_type:active
/sys/class/thermal/thermal_zone0/trip_point_7_temp:45000
/sys/class/thermal/thermal_zone0/trip_point_7_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
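The listing above is easier to read once each cdevN binding is paired with the temperature of the trip point it refers to. A minimal sketch, assuming the `grep` output has been saved as text in the `path:value` form shown; the function names are hypothetical, and the sample contains only a few of the lines above.

```python
def parse_grep_output(text):
    """Parse 'grep . /sys/class/thermal/*/*' output into {node: {attr: value}}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, value = line.partition(":")
        node, attr = path.split("/")[-2:]
        entries.setdefault(node, {})[attr] = value
    return entries


def binding_trip_temps(zone):
    """Map each cdevN binding of a thermal zone to its trip temperature in C."""
    suffix = "_trip_point"
    result = {}
    for attr, value in zone.items():
        if attr.startswith("cdev") and attr.endswith(suffix):
            index = int(value)
            # sysfs reports trip temperatures in millidegrees Celsius.
            millideg = int(zone[f"trip_point_{index}_temp"])
            result[attr[: -len(suffix)]] = millideg / 1000
    return result


SAMPLE = """\
/sys/class/thermal/thermal_zone0/cdev0_trip_point:7
/sys/class/thermal/thermal_zone0/cdev5_trip_point:2
/sys/class/thermal/thermal_zone0/temp:78000
/sys/class/thermal/thermal_zone0/trip_point_2_temp:98000
/sys/class/thermal/thermal_zone0/trip_point_7_temp:45000
"""

if __name__ == "__main__":
    zone = parse_grep_output(SAMPLE)["thermal_zone0"]
    print(binding_trip_temps(zone))
```

With the zone at 78 C, any binding whose trip temperature is below that value should in principle be active.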
Comment 21 Zhang Rui 2009-12-07 02:33:31 UTC
What if you boot with "acpi_enforce_resources=lax"?
Does the laptop still run at a high temperature?
Comment 22 brice rebsamen 2009-12-07 03:03:24 UTC
Yes. It still goes to 90C plus!
Comment 23 James 2009-12-08 23:59:24 UTC
I apologize for hijacking this bug report; I don't think my issue was
related.  (I did think it was when I originally added my post.)

I think I found the cause of the problem I posted in "Comment #15":
http://bugzilla.kernel.org/show_bug.cgi?id=14529#c15
I am putting it here as well in case someone else is looking for an
answer to the problem I described.

The kernel option acpi_osi="Linux" used to be the "fix" with my Ubuntu install.

I migrated to Debian with a squeeze/sid installation and the "fix"
stopped working.  I was stumped.

GRUB.

With GRUB 2 you edit '/etc/default/grub' then run 'update-grub'.

This is the line that _is_ working for me:
GRUB_CMDLINE_LINUX="acpi_osi=\"Linux\""

Note the escape characters.  Previously I wasn't escaping the
quotations therefore the quotes in the kernel option were being
stripped when they were then written to /boot/grub/grub.cfg .

James
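The quoting effect described here can be reproduced outside GRUB. /etc/default/grub is read as a shell script, and Python's `shlex` module follows POSIX-style quoting rules closely enough to show the difference. This is only an approximation of what the shell does, and `shell_value` is a made-up helper name.

```python
import shlex

# The same assignment without and with escaped inner quotes.
UNESCAPED = 'GRUB_CMDLINE_LINUX="acpi_osi="Linux""'
ESCAPED = r'GRUB_CMDLINE_LINUX="acpi_osi=\"Linux\""'


def shell_value(assignment):
    """Return the value a POSIX-style shell would assign for NAME=... input.

    Assumption: the assignment is a single shell word, as in the two
    examples above.
    """
    (word,) = shlex.split(assignment)
    _name, _, value = word.partition("=")
    return value


if __name__ == "__main__":
    print(shell_value(UNESCAPED))
    print(shell_value(ESCAPED))
```

In the unescaped form the inner quotes simply close and reopen the outer string, so they never become part of the value; only the escaped form keeps them.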
Comment 24 Zhang Rui 2009-12-09 01:08:59 UTC
James,
Please open another bug report and give a detailed description of your issue.
Please attach the dmesg output and acpidump output of your laptop as well.
Comment 25 Zhang Rui 2010-01-13 06:49:10 UTC
brice,

Is this a regression? I mean, does the problem exist in all the kernels you've tried?

Please attach the output of "grep . /sys/class/thermal/cooling_device*/*" when the temperature is higher than 45C.
Comment 26 brice rebsamen 2010-01-14 02:31:52 UTC
I've only tried kernel 2.6.30, 2.6.31 and 2.6.32. They all have the problem.

Also, as I posted above, I tested with Lenny, kernels 2.6.30-2 and 2.6.26-2. With 2.6.26-2, the main fan starts around 60 degrees but never stops. dmesg shows the following messages:

[  352.326859] ACPI: Transitioning device [FAN2] to D3
[  352.326871] ACPI: Unable to turn cooling device [f7440338] 'off'
[  352.338885] ACPI: Transitioning device [FAN3] to D3
[  352.338895] ACPI: Unable to turn cooling device [f74402d4] 'off'

repeated forever.

Now the temperature is 56C.

$ grep . /sys/class/thermal/cooling_device*/*
/sys/class/thermal/cooling_device0/cur_state:1
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device1/cur_state:1
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:1
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:1
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:3
/sys/class/thermal/cooling_device6/type:Processor
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:3
/sys/class/thermal/cooling_device7/type:Processor
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:7
/sys/class/thermal/cooling_device8/type:LCD
Comment 27 Zhang Rui 2010-01-21 07:57:55 UTC
What if you boot with the boot option "acpi.power_nocheck"?
Comment 28 brice rebsamen 2010-01-22 03:32:00 UTC
Booting with the boot option "acpi.power_nocheck" solves the problem. The fan turns faster when the temperature increases and slows down when the temperature drops, thus effectively maintaining the temperature around 45C when the CPU is moderately loaded, and around 60C when the CPU load gets high.

Thank you very much for that.
Comment 29 Zhang Rui 2010-02-21 06:48:39 UTC
        Name (FSTA, 0x00)

        PowerResource (PF0, 0x00, 0x0000)
        {
            Method (_STA, 0, NotSerialized)
            {
                If (LOr (FSTA, 0x01))
                {
                    Return (One)
                }
                Else
                {
                    Return (Zero)
                }
            }

            Method (_ON, 0, NotSerialized)
            {
                ...
                Or (FSTA, 0x01, FSTA)
                ...
            }

            Method (_OFF, 0, NotSerialized)
            {
                ...
                And (FSTA, Not (0x01), FSTA)
                ...
            }
        }

This is a BIOS problem: the fan device reports an uninitialized state.
We need to use acpi.power_nocheck to control the ACPI fan devices.
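One way to read the listing: LOr is ASL's logical OR, so `LOr (FSTA, 0x01)` is true whatever FSTA holds, and `_STA` reports the power resource as on even before `_ON` has ever run. That may be what the comment above refers to. The Python model below mirrors the three methods; `sta_bitwise` is a hypothetical alternative shown only for contrast and is not taken from the DSDT.

```python
def sta_as_written(fsta):
    """Model of _STA as listed: If (LOr (FSTA, 0x01)) Return (One)."""
    # Logical OR of FSTA with the non-zero constant 0x01 is always true.
    return 1 if (fsta != 0 or 0x01 != 0) else 0


def sta_bitwise(fsta):
    """Hypothetical variant that tests bit 0 of FSTA instead."""
    return 1 if (fsta & 0x01) else 0


def on(fsta):
    """Model of _ON: Or (FSTA, 0x01, FSTA)."""
    return fsta | 0x01


def off(fsta):
    """Model of _OFF: And (FSTA, Not (0x01), FSTA)."""
    return fsta & ~0x01


if __name__ == "__main__":
    initial = 0x00  # Name (FSTA, 0x00)
    for label, fsta in (("initial", initial),
                        ("after _ON", on(initial)),
                        ("after _OFF", off(on(initial)))):
        print(label, sta_as_written(fsta), sta_bitwise(fsta))
```

If the kernel trusts a status that always reads "on", it has no reason to call `_ON`, which would fit the observation that every fan is listed as on while nothing is spinning.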
Comment 30 Klaus Doblmann 2010-04-23 13:26:32 UTC
I've just tried to help a chap who has the same problem on an M900 running Ubuntu 10.04RC. However, adding acpi.power_nocheck did _NOT_ help in this case.
I know the guy is running the Ubuntu kernel so something could be wrong there but compiling and running a mainline kernel is a bit beyond his abilities I fear.

Could someone please take a look at this thread here and comment on it if they've got an idea: http://ubuntuforums.org/showthread.php?t=1456703

relevant conversation starts on page 4: http://ubuntuforums.org/showthread.php?t=1456703&page=4
Comment 31 brice rebsamen 2010-05-28 23:54:42 UTC
With a newer kernel (2.6.32-9 or 2.6.32-trunk-686) the problem reappears, and this time booting with the acpi.power_nocheck option does not help.

- if I boot with a cold laptop, the fan never starts blowing and the temperature rises.
- if I suspend to RAM and resume, the fan starts blowing and temperature control is fine.
- if I boot with a hot laptop, the fan starts blowing and temperature control is fine.

Reverting to using kernel 2.6.31 for now.
Comment 32 Zhang Rui 2010-06-09 05:52:42 UTC
As this is a regression, can you please run git bisect to find out which commit introduced the problem?
Comment 33 Klaus Doblmann 2010-06-09 08:48:10 UTC
Updating my report above:
I told the guy to update the BIOS very early on and I thought he had done so. As he continued to write about the problem, I found this bug report and posted comment #30.
As it turned out, the chap hadn't followed my instructions, which we found out later when someone else posted that updating the BIOS solved the problem for him. So we finally got the guy with the alleged regression to update his BIOS, and all was fine using the acpi.power_nocheck option.

@Brice: Could you make sure you're running the latest BIOS? This seems to have fixed the problem for at least two guys I know.
Comment 34 brice rebsamen 2010-06-15 08:47:42 UTC
@Klaus: I updated to the latest BIOS provided by TOSHIBA. That did not solve the problem.

@Zhang: I will try to find the bug with git-bisect, but I have never done it before, so it will take some time; besides the problem only appears when booting with a cold CPU, so definitely I will need time.
Comment 35 brice rebsamen 2010-06-16 03:42:46 UTC
I discovered a strange behavior with the 2.6.32 line of kernels. When booting with a cold CPU and then loading the CPU, the temperature rises to 90+ degrees Celsius (fan not working). After stopping the process that loads the CPU, it starts to cool down slowly. When the temperature falls below 70 something, the fan starts working, quickly bringing the temperature to a normal operating range and maintaining it there ever after (normal behavior).

I will let you know the result of the git bisect search asap.
Comment 36 brice rebsamen 2010-06-17 03:31:26 UTC
I started to test a few kernels from the git repository. My fan problem appears with a custom built 2.6.31 kernel, but not with Debian's 2.6.31 (linux-image-2.6.31-1-686). So I am not sure how to proceed. Could you please advise?
Comment 37 Zhang Rui 2010-06-22 08:35:51 UTC
(In reply to comment #35)
> I discovered a strange behavior with the 2.6.32 line of kernels. When booting
> with a cold CPU, when loading the CPU, the temperature rises up to 90+
> degrees celsius (fan not working).

What do you mean by loading the CPU?

> After stopping the process that loads the CPU, it
> starts to cool down slowly.

Which process do you stop?

(In reply to comment #36)
> I started to test a few kernels from the git repository. My fan problem
> appears with a custom built 2.6.31 kernel, but not with Debian's 2.6.31
> (linux-image-2.6.31-1-686).

I have no idea so far. :(
Comment 38 brice rebsamen 2010-06-23 09:50:05 UTC
By loading the CPU I mean running a program that is computationally intensive (a python program that does large random matrix inversions).

Here is how I reproduce the bug:
- make sure the CPU is cool (the computer has been turned off for at least 30 minutes)
- boot with a problematic kernel (see above) with acpi.power_nocheck option turned on (see above)
- run the python script to stress the CPU.
- temperature reaches 90C, fan does not start...
- kill the python script (Ctrl-C)

Option one:
- temperature goes down slowly. At about 70C the fan suddenly starts, thus quickly cooling down the CPU to 50C. From now on, the fan works properly.

Option two:
- suspend to RAM and resume
- the fan starts, thus quickly cooling down the CPU to 50C. From now on, the fan works properly.


Weird, isn't it?
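The reproduction steps above can be made easier to compare between kernels by logging the zone temperature while the CPU is loaded. This is a rough sketch, not the script actually used: the busy loop stands in for the matrix-inversion program, the sysfs path is the thermal_zone0 node shown earlier in this report, and all function names are invented for illustration.

```python
import time

ZONE_TEMP = "/sys/class/thermal/thermal_zone0/temp"


def read_temp_c(path=ZONE_TEMP):
    """Read a sysfs temperature file (millidegrees Celsius) as degrees C."""
    with open(path) as f:
        return int(f.read().strip()) / 1000


def burn_cpu(seconds):
    """Busy-loop for roughly `seconds` to keep one CPU core loaded."""
    deadline = time.monotonic() + seconds
    x = 0
    while time.monotonic() < deadline:
        x = (x * 1103515245 + 12345) % 2**31
    return x


def log_while_loading(total_s, step_s, path=ZONE_TEMP):
    """Alternate CPU load and temperature readings; return (elapsed, temp) pairs."""
    samples = []
    elapsed = 0
    while elapsed < total_s:
        burn_cpu(step_s)
        elapsed += step_s
        samples.append((elapsed, read_temp_c(path)))
    return samples


if __name__ == "__main__":
    # Load the CPU for 5 minutes, sampling every 10 seconds.
    for elapsed, temp in log_while_loading(300, 10):
        print(f"{elapsed:4d}s {temp:.1f} C")
```

Continuing to sample after the load stops would show the moment the fan kicks in as a sudden change in the cooling rate.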
Comment 39 Zhang Rui 2010-06-30 05:25:11 UTC
(In reply to comment #38)

> 
> Here is how I reproduce the bug:
> - make sure the CPU is cool (the computer has been turned off for at least 30
> minutes)
> - boot with a problematic kernel (see above) with acpi.power_nocheck option
> turned on (see above)

Please kill acpid at this point
and run "cat /proc/acpi/event".

> - run the python script to stress the CPU.
> - temperature reaches 90C, fan does not start...

Please attach the output of "grep . /sys/class/thermal/*/*" at this point.
Please also attach the output of the /proc/acpi/event file.

> - kill the python script (Ctrl-C)
> 
> Option one:
> - temperature goes down slowly. At about 70C the fan suddenly starts, thus
> quickly cooling down the CPU to 50C. from now on, the fan works properly.
> 
Please redo the same test in this case, and check whether there is an ACPI event (a new line in /proc/acpi/event) before the fan starts to spin.
Comment 40 Zhang Rui 2010-09-03 03:22:34 UTC
ping...
Comment 41 Zhang Rui 2010-09-27 00:26:06 UTC
Closing this bug as there has been no response for more than one month.
