Bug 12516 - Fan speed not adjusted to current thermal zone after boot
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Fan
Hardware: All Linux
Importance: P1 normal
Assignee: ykzhao
Reported: 2009-01-20 13:34 UTC by Frans Pop
Modified: 2009-02-26 23:59 UTC

Kernel Version: 2.6.29-rc2


Attachments
Output of acpidump (377.01 KB, text/plain)
2009-01-21 03:46 UTC, Frans Pop
Output of dmesg (32.95 KB, text/plain)
2009-01-21 03:52 UTC, Frans Pop
dmesg and /proc/acpi info with acpi debugging (88.12 KB, text/plain)
2009-01-22 12:07 UTC, Frans Pop

Description Frans Pop 2009-01-20 13:34:03 UTC
Latest working kernel version: N/A
Earliest failing kernel version: 2.6.28.1 (earlier versions not tested)
Distribution: Debian
Hardware Environment: HP 2510p Core2 Duo notebook
Software Environment: Debian Lenny

Problem Description:
When I boot the notebook, the BIOS sets the fan speed to some intermediate speed when the grub menu is displayed.

After Linux has finished booting, I would expect the fan speed to be adjusted to the proper speed for the current thermal zone, but that does not happen. Instead it remains at the "default" speed until one of the trip points is hit (e.g. due to running a CPU-intensive task). From that point onwards the fan speed is correctly adjusted dynamically.

Selected values from /proc/acpi:
$ grep . thermal_zone/*/* | head -n11; echo; grep . fan/*/*
thermal_zone/TZ0/cooling_mode:<setting not supported>
thermal_zone/TZ0/polling_frequency:<polling disabled>
thermal_zone/TZ0/state:state:                   active[4]
thermal_zone/TZ0/temperature:temperature:             51 C
thermal_zone/TZ0/trip_points:critical (S5):           256 C
thermal_zone/TZ0/trip_points:passive:                 99 C: tc1=1 tc2=2 tsp=300 devices=CPU0 CPU1
thermal_zone/TZ0/trip_points:active[0]:               88 C: devices=C3C8
thermal_zone/TZ0/trip_points:active[1]:               82 C: devices=C3C9
thermal_zone/TZ0/trip_points:active[2]:               68 C: devices=C3CA
thermal_zone/TZ0/trip_points:active[3]:               60 C: devices=C3CB
thermal_zone/TZ0/trip_points:active[4]:               40 C: devices=C3CC

fan/C3B1/state:status:                  off
fan/C3B2/state:status:                  off
fan/C3C8/state:status:                  off
fan/C3C9/state:status:                  off
fan/C3CA/state:status:                  off
fan/C3CB/state:status:                  off
fan/C3CC/state:status:                  on

This shows the fan should be running at its lowest speed (state: active[4], fan on for C3CC). However, judging by the noise level, the actual fan speed is somewhere between the C3CB and C3CA levels.
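
The expectation stated here can be written down as a small check. The Python sketch below is not kernel code; it assumes that each active[i] cooling device should be on whenever the zone temperature is at or above that trip point. The names ACTIVE_TRIPS, expected_fans and expected_state are made up for illustration, and the trip values are copied from the /proc/acpi output above.

```python
# Active trip points of TZ0, copied from the /proc/acpi output above:
# (trip index, threshold in degrees C, cooling device).
ACTIVE_TRIPS = [
    (0, 88, "C3C8"),
    (1, 82, "C3C9"),
    (2, 68, "C3CA"),
    (3, 60, "C3CB"),
    (4, 40, "C3CC"),
]


def expected_fans(temp_c, trips=ACTIVE_TRIPS):
    """Return the fan devices expected to be on at temp_c.

    Assumption (illustrative model only): a device is on when the
    temperature is at or above its active trip point.
    """
    return [dev for _idx, threshold, dev in trips if temp_c >= threshold]


def expected_state(temp_c, trips=ACTIVE_TRIPS):
    """Return the lowest-numbered active index reached, or None."""
    reached = [idx for idx, threshold, _dev in trips if temp_c >= threshold]
    return min(reached) if reached else None


if __name__ == "__main__":
    # Temperature reported in the output above.
    print(expected_state(51), expected_fans(51))
```

Under that assumption, 51 C gives state 4 with only C3CC on, which agrees with what /proc/acpi reports; the mismatch is only with the audible fan speed.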
Comment 1 ykzhao 2009-01-20 17:19:56 UTC
Will you please attach the output of acpidump and dmesg?
Thanks.
Comment 2 ykzhao 2009-01-20 17:20:35 UTC
How about the boot option of "acpi.power_nocheck=1"?
Thanks.
Comment 3 Frans Pop 2009-01-21 03:46:25 UTC
Created attachment 19918
Output of acpidump
Comment 4 Frans Pop 2009-01-21 03:52:15 UTC
Created attachment 19919
Output of dmesg

The following line is new since 2.6.29, but I would guess is unrelated:
FADT: X_PM1a_EVT_BLK.bit_width (16) does not match PM1_EVT_LEN (4)

There was at least one other report of the same message (on a totally different system): http://lkml.org/lkml/2009/1/10/15
Comment 5 Frans Pop 2009-01-21 09:17:16 UTC
> How about the boot option of "acpi.power_nocheck=1"?

If I add that option the fan behaves as I would expect and follows the thermal zone from very early in the boot (even before the root file system is mounted).
What does it do? Effectively boot the same way as if the laptop does not have mains power connected (i.e. as if running on battery)?

I can see some justification in that, but it still seems inconsistent to keep the fan running indefinitely and ignore the current thermal zone when mains power is connected, especially as eventually you will end up in the situation where the thermal zone determines fan speed anyway.

Isn't it a bit too much like random behavior to have to wait for hitting a trip point for the first time?
Comment 6 ykzhao 2009-01-21 19:34:11 UTC
(In reply to comment #4)
> Created an attachment (id=19919) [details]
> Output of dmesg
> 
> The following line is new since 2.6.29, but I would guess is unrelated:
> FADT: X_PM1a_EVT_BLK.bit_width (16) does not match PM1_EVT_LEN (4)
> 
We see this issue on many laptops. Will you please try the boot option of "acpi=rsdt"?
> There was at least one other report of the same message (on a totally
> different system): http://lkml.org/lkml/2009/1/10/15
Comment 7 ykzhao 2009-01-21 21:50:50 UTC
Hi, Frans 
> What does it do? Effectively boot the same way as if the laptop does not have
> mains power connected (i.e. as if running on battery)?
The "acpi.power_nocheck=1" option only works around the case where the ACPI object does not return the real state of the fan or power resource. For example: the power resource is already turned off, but the "on" status is always returned. Adding this option makes the kernel skip the check against that incorrect status.
Maybe on your box the incorrect fan status is returned in the boot phase. 
Please add it and see whether the FAN device speed is correct.
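
As a rough illustration of the check described above, the following Python sketch models a resource whose reported state can differ from its real state. It is a toy model under that assumption, not the kernel's implementation; PowerResource and set_power are hypothetical names.

```python
class PowerResource:
    """Toy stand-in for an ACPI fan/power resource (hypothetical)."""

    def __init__(self, real_on, reported_on):
        self.real_on = real_on          # what the hardware is actually doing
        self.reported_on = reported_on  # what the status query claims

    def switch(self, on):
        # Executing the switch brings both views into agreement.
        self.real_on = on
        self.reported_on = on


def set_power(resource, want_on, nocheck=False):
    """Request a state; return True if the switch was actually executed.

    Without nocheck, the request is skipped when the reported state
    already equals the requested one, even if that report is wrong.
    """
    if not nocheck and resource.reported_on == want_on:
        return False
    resource.switch(want_on)
    return True


if __name__ == "__main__":
    # The example from this comment: really off, but reported as on.
    fan = PowerResource(real_on=False, reported_on=True)
    print(set_power(fan, True), fan.real_on)
    print(set_power(fan, True, nocheck=True), fan.real_on)
```

In this model the first request is skipped and the resource stays off; with nocheck the switch is executed regardless of the reported state.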
Will you please enable CONFIG_ACPI_DEBUG in the kernel configuration and add the boot options "acpi.debug_layer=0x04A10000 acpi.debug_level=0x17"?
After the system is booted, please attach the output of dmesg.
Thanks.
Comment 8 Frans Pop 2009-01-22 07:24:47 UTC
>> The following line is new since 2.6.29, but I would guess is unrelated:
>> FADT: X_PM1a_EVT_BLK.bit_width (16) does not match PM1_EVT_LEN (4)
> 
> We see this issue on many laptops. Will you please try the boot option
> of "acpi=rsdt"?

That gives me the following diff in dmesg, but as you can see the message is still there:
 ACPI: RSDP 000F7960, 0024 (r2 HP    )
-ACPI: XSDT 7E7C81C8, 007C (r1 HPQOEM SLIC-MPC        1 HP          1)
-ACPI: FACP 7E7C8084, 00F4 (r4 HP     30C9            3 HP          1)
+ACPI: RSDT 7E7C8178, 0050 (r1 HP     30C9     18060820 HP          1)
+ACPI: FACP 7E7C8000, 0084 (r2 HP     30C9            2 HP          1)
 FADT: X_PM1a_EVT_BLK.bit_width (16) does not match PM1_EVT_LEN (4)
 ACPI: DSDT 7E7C8538, 13484 (r1 HP       nc2500    10000 MSFT  3000001)
 ACPI: FACS 7E7E7D80, 0040
[... (~100 lines lower down) ...]
-ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
Comment 9 Frans Pop 2009-01-22 10:43:41 UTC
>> How about the boot option of "acpi.power_nocheck=1"?
> If I add that option the fan behaves as I would expect and follows the
> thermal zone from very early in the boot (even before the root file
> system is mounted).

I've tried this boot option two more times now (once with debugging and once without) and the result is now different: the fan stays on, the same way as without the option. No idea why it seemed to work the first time. Maybe a trip point was hit by accident during the boot?

> Maybe on your box the incorrect fan status is returned in the boot phase. 
> Please add it and see whether the FAN device speed is correct.

I cannot see the actual fan _speed_ anywhere, only a fan "power level" in
/proc/acpi/fan/*/state: if more of these are "on", the fan is running faster. What I do observe after boot is that the actual fan speed is higher than the state reported in /proc/acpi/fan/*/state.
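
To make the "power level" reading concrete, here is a minimal Python sketch that lists which fan devices report "on". It assumes the line format produced by the `grep . fan/*/*` command in the description (path, then "status:", then the value); fans_on is a hypothetical helper name.

```python
def fans_on(state_lines):
    """Return the fan devices reported as "on" in grep-style state lines."""
    on = []
    for line in state_lines:
        # Example line: "fan/C3CC/state:status:                  on"
        path, _, rest = line.partition(":")
        status = rest.split(":", 1)[-1].strip()
        if status == "on":
            on.append(path.split("/")[1])
    return on


if __name__ == "__main__":
    # Lines copied from the description of this bug.
    sample = [
        "fan/C3B1/state:status:                  off",
        "fan/C3B2/state:status:                  off",
        "fan/C3C8/state:status:                  off",
        "fan/C3C9/state:status:                  off",
        "fan/C3CA/state:status:                  off",
        "fan/C3CB/state:status:                  off",
        "fan/C3CC/state:status:                  on",
    ]
    devices = fans_on(sample)
    print(len(devices), devices)
```

The count of devices returned is the reported level; it says nothing about the speed the hardware is actually running at, which is the discrepancy described here.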

I will add the debug logs you requested after this comment.
Comment 10 Frans Pop 2009-01-22 12:07:31 UTC
Created attachment 19941
dmesg and /proc/acpi info with acpi debugging

This is with the debugging you requested, but without acpi.power_nocheck.

Also included at the bottom of the log are the fan and thermal status from /proc/acpi just after booting (with the fan running at the "BIOS" speed).
After that I've included the debug messages I get when I start two empty loops to force the temperature up. This hits a trip point and makes the fan speed match the thermal zone (despite the temperature going up, the fan actually slows down). A bit later I stop the empty loops; the system drops back to the lowest thermal zone (which is the same as the initial thermal zone!) and the fan goes to its lowest speed.
Last comes the status from /proc/acpi again.

Looking at the log I see:
- lines 319-360: all fan levels initially off
- lines 640-731: binding drivers; all levels still off
- starting at line 750: detection of thermal zones
- lines 840-849: fan gets set to lowest level (C3CC), but fan speed does not change
- lines 1299-1326: fan levels consistent with previous setting, actual fan speed still at BIOS level
- lines 1338-1386: fan and thermal zone status consistent with above
- lines 1389-1415: hit trip point; actual fan speed changes to match thermal zone

So AFAICT the kernel never gets to "see" the BIOS fan level/speed.

What I find surprising is that the messages in lines 840-849 are basically identical to those in lines 1397-1406. So why does the hardware respond by changing the fan speed in the second case but not in the first one?
Comment 11 ykzhao 2009-01-22 18:51:44 UTC
Hi, Frans
Thanks for the test. From the dmesg log it seems that the ACPI fan device is controlled correctly as the temperature changes.
For example: during boot the temperature of TZ0 is 53 C, which is above the threshold of AC4. In that case the fan device C3CC is turned on.
Then, when the temperature rises, the trip points are re-checked. When the temperature is 60 C, which is above the threshold of AC3, the fan device C3CB is turned on. When the temperature drops again, the C3CB fan device is turned off.
> the messages in lines 840-849 are basically identical to those in lines 1397-1406.
But they lead to different behaviour. Maybe the major difference is that the trip points are re-checked.

How about the boot options acpi_osi="!Windows 2006" acpi_osi="!Windows 2001 SP2"?
Thanks.
Comment 12 ykzhao 2009-02-26 23:59:51 UTC
Hi, Frans
Any test result with the boot options acpi_osi="!Windows 2006" acpi_osi="!Windows 2001 SP2"?

In fact, from the test it seems that the ACPI fan is turned on/off correctly according to the temperature of the thermal zone. But sometimes the real fan speed is not what we expect. Maybe this is related to the BIOS.
As there has been no response for more than one month, the bug will be rejected.
Thanks.
