Bug 13718 - HP Pavilion dv6 weird fan operation
Summary: HP Pavilion dv6 weird fan operation
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Fan
Hardware: All Linux
Importance: P1 normal
Assignee: acpi_power-fan
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-06 08:31 UTC by russellr
Modified: 2009-08-12 06:22 UTC
CC List: 2 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump from dv6-1160tx (230.19 KB, application/octet-stream)
2009-07-06 12:26 UTC, russellr
info from /sys/class (802 bytes, text/plain)
2009-07-08 12:18 UTC, russellr

Description russellr 2009-07-06 08:31:22 UTC
Hi,

Machine is a dv6-1160tx.

I've attached the acpidump.out for this machine.

The fan goes from silent to very loud for no apparent reason and spends quite a lot of time at very loud when the temperature on both cores is < 50 degrees C.

If I force the temperature up using a CPU hogger, the fan goes on and stays on until I lower the temperature.  That's good!

What's not good is that this damned laptop spends way too much time sounding like a jet engine.

Kernel is 2.6.29.5-191 (Fedora 11).

I've tried adding a custom DSDT but it won't "take".  I'll report this as another bug.

regards,
RR
Comment 1 russellr 2009-07-06 08:49:54 UTC
DSDT bug: http://bugzilla.kernel.org/show_bug.cgi?id=13719
Comment 2 ykzhao 2009-07-06 09:38:31 UTC
Will you please attach the output of acpidump?
   Thanks.
Comment 3 russellr 2009-07-06 12:26:31 UTC
Created attachment 22232
acpidump from dv6-1160tx

Attached as requested
Comment 4 ykzhao 2009-07-07 00:56:54 UTC
Hi, Russellr
    From the acpidump it seems that there is no ACPI fan device. In that case the fan is not controlled by the OS; it may be controlled by the BIOS, and there is nothing we can do to fix it.
 
    Thanks.
Comment 5 ykzhao 2009-07-07 00:58:11 UTC
As there is no ACPI fan device, the fan can't be controlled by the OS. So this is not an ACPI bug; it is a BIOS bug.
    This bug will be rejected.

Thanks.
Comment 6 russellr 2009-07-07 08:56:46 UTC
Hi,

Thanks for looking.

I've been scouring the DSDT and adding debug to try to figure this out.

I was inclined to agree with you, but now I've done some scientific tests on this, and it *IS* controlled by Windows (and possibly Linux), somehow.

First, I lowered the ambient temperature of the room (it's Winter here) to under 20 degrees C.

TEST 1 (Windows)
~~~~~~

Next, I booted Windows Vista Home and once the CPU settled down to idle (about 2%) I began a stop watch.

I used a utility to view the CPU temperature, which was around 47 degrees C.

Over 7 minutes the fan stayed completely off.  (For the first 2m45s, the disk was still active - but the fan remained off.)

TEST 2 (Windows)
~~~~~~

I started a CPU benchmark on Windows.  The fan went to level 2, then down to level 1 in a few seconds, then up to level 2 after 30 seconds, then down to level 1 after 40 seconds, and finally down to off when the benchmark stopped.

The CPU temp reached 53 degrees during the middle part of the test.

TEST 3 (Ubuntu 9.04 - 2.6.28 kernel)
~~~~~~

Installed Ubuntu onto the hard disk and booted.

When idle, I started a stop watch.

Over a period of 7 minutes, the pattern was clear: about 40 seconds with the fan off, then 11 seconds with the fan on level 2.

This pattern repeated consistently over the whole test.

I couldn't measure the CPU temp with Ubuntu (couldn't easily find a monitor for it).  CPU temp was probably around 45 degrees C.

I didn't bother with the CPU loading test.

TEST 4 (Fedora 11 - 2.6.29 kernel)
~~~~~~

Booted F11.

When idle, I started a stop watch, and monitored the CPU temp.  It started at 41 degrees and stayed there or a bit lower.

Similar pattern: fan off for a period and then on level 2 for 8 seconds.

In this case the "off" period started at 2 minutes and then gradually decreased, as follows:
2m00s, 1m13s, 1m03s, 0m58s, 0m59s, 0m56s, 0m52s 

Obviously, this is a monotonically decreasing time, within the bounds of my measurement accuracy.

The fan "on" time, however, was consistently 8 seconds.
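
As a quick check of that claim, the readings can be converted to seconds and compared pairwise. This is only a minimal sketch: the one-second tolerance and the helper names are assumptions, not part of the original test.

```python
import re

# Fan "off" periods from TEST 4, as read off the stopwatch.
OFF_PERIODS = ["2m00s", "1m13s", "1m03s", "0m58s", "0m59s", "0m56s", "0m52s"]


def to_seconds(reading):
    """Convert a reading such as '1m13s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", reading)
    if match is None:
        raise ValueError(f"unrecognised reading: {reading!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def is_decreasing(values, tolerance=1):
    """True if no value exceeds its predecessor by more than `tolerance`."""
    return all(b <= a + tolerance for a, b in zip(values, values[1:]))


# Assumed tolerance of 1 second for hand-timed stopwatch readings.
seconds = [to_seconds(r) for r in OFF_PERIODS]
print(seconds)
print(is_decreasing(seconds))
```

With the tolerance set to zero, the only step that breaks the pattern is 0m58s to 0m59s.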

TEST 5 (Fedora 11)
~~~~~~

I used a CPU hog program.

Fan went up to level 1, then level 2 during the 4 minute test.

On cooling, the fan went down to level 1, then a very quiet level before dropping to off.

CPU got to around 60 degrees C during the test.

In this case, once the test was completed and the CPU had cooled (to around 45 degrees), there was a clear pattern: about 43 seconds off and 9 seconds on level 2.

This pattern repeated indefinitely (at least 30 minutes).

The times varied a second or two, but the pattern was clearly repeating.

CONCLUSIONS
~~~~~~~~~~~

It seems clear that Windows can tell this computer to switch the fan off and it stays off.  So, Windows can override the BIOS (or whatever it is).

It also seems clear that something - the BIOS? - can control the fan independently of the OS.  Whether it does this when the OS is running or not, is yet to be proven.

It is also clear that when required the fan will go on and can be at several speed levels.

The above tests showed 4 levels (off, very slightly on, on, and very on).

I believe I've heard other (louder) fan levels when the system gets much hotter.

Given the above information, I feel sure you must agree that something in the ACPI/DSDT can control the fan.

Somehow Windows is making this computer usable, and Linux cannot make it usable.

What do I/you/we do now to resolve this problem?

I've been reading the ACPI spec and I have in fact improved the DSDT with a custom DSDT.

The default DSDT doesn't show a TZ0 sensor.

When I fixed some compile errors in the DSDT (I think it was when I altered the CRT method to return a value in all cases) this sensor appeared in the Gnome Sensors Applet.

Interestingly, this sensor reports a much higher (about 15-20 degrees) temperature than the 2 libsensors (one for each CPU core) temperatures I monitored in the above tests (which mirrored the temperatures I found using the Windows utility).

The DSDT has two data elements called CFAN and SFAN.  SFAN is written to in a method, but CFAN isn't.

My debug indicates that this SFAN method is never called.

I've added debug to print the values of these two items and they are always zero.

I have no idea whether these items actually have anything to do with the fan.

I haven't had the courage to try writing to them yet.

What do you think?

Please help!

Please tell me how I can help.

I don't want to get into a fight with HP saying "This computer doesn't work with Linux!" because I know they are going to say "We only support Windows".

regards,
RR
Comment 7 russellr 2009-07-07 09:12:10 UTC
Hi again,

Just did another test.

I booted Fedora 11 with my custom DSDT.

The new sensor that appears in the Gnome Sensors Applet is called acpi -> TZ01.

It consistently shows about 14 degrees above the libsensors sensors.

I wonder what it is measuring?

My fan pattern on idle is about 36 seconds off, 11 seconds on level 2.

The 11 seconds is consistent, and the 36 seconds varies a little.

Just more evidence that different OSes cause the fan to operate differently.

Now we have 3 different OSes and one OS with two different DSDTs, and we have 4 different behaviours.

The actual CPU temp doesn't seem to be a factor when it is in the low-to-mid 40's.
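
To put the idle patterns side by side, the approximate off/on times reported so far can be turned into duty cycles. A minimal sketch, assuming the rounded stopwatch figures quoted in the comments above; the labels and function name are illustrative only.

```python
# Approximate idle fan cycles reported in this thread, in seconds (off, on).
# Windows is omitted: the fan stayed off for the whole 7-minute idle test.
IDLE_CYCLES = {
    "Ubuntu 9.04 (TEST 3)": (40, 11),
    "Fedora 11 (TEST 5, after load)": (43, 9),
    "Fedora 11, custom DSDT": (36, 11),
}


def duty_cycle(off_seconds, on_seconds):
    """Fraction of each cycle that the fan spends on."""
    return on_seconds / (off_seconds + on_seconds)


for name, (off, on) in IDLE_CYCLES.items():
    print(f"{name}: fan on {duty_cycle(off, on):.0%} of the time")
```

On these figures the fan runs at level 2 for roughly a sixth to a quarter of the idle time under Linux, against none at all under Windows.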

regards,
RR
Comment 8 Zhang Rui 2009-07-08 06:15:32 UTC
Please check your BIOS and see if there are any options related to the fan.
What do you get under /sys/class/hwmon/?
Does the behaviour change if you disable the ACPI thermal driver and the hwmon driver?
Comment 9 russellr 2009-07-08 12:16:56 UTC
Hi,

Thanks for your reply.

In the BIOS the only option is "FAN ON ALWAYS" (or something like that), which I've switched OFF.

I've attached listings of /sys/class/hwmon and /sys/class/thermal.

The 4 cooling devices are 2 x CPU and 2 x LCD (not sure why there would be 2 x LCD unless my external monitor is counted too). I read the "type" file to determine this.
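
For anyone repeating this, reading the "type" files can be scripted. A minimal sketch, assuming the usual /sys/class/thermal/cooling_device*/type layout; the function name is hypothetical and the root path is a parameter so it can be pointed elsewhere.

```python
import glob
import os


def cooling_device_types(root="/sys/class/thermal"):
    """Map each cooling_device* directory name to the content of its 'type' file."""
    types = {}
    for path in sorted(glob.glob(os.path.join(root, "cooling_device*"))):
        try:
            with open(os.path.join(path, "type")) as handle:
                types[os.path.basename(path)] = handle.read().strip()
        except OSError:
            # Skip entries whose 'type' file is missing or unreadable.
            continue
    return types


if __name__ == "__main__":
    for name, kind in cooling_device_types().items():
        print(f"{name}: {kind}")
```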

I removed modules: acpi_freq, hwmon, coretemp and it made no difference to the fan behaviour.

My guess is that the fan is entirely hardware controlled, but there's some secret override that an OS can perform.

I've tried zeroing the CFAN and SFAN values in the DSDT every time the temperature is read, and it makes no difference.

I think I read somewhere that Windows puts the DSDT in the registry.  Do you think there's anything I might find in there (I have to re-load Windows to find out)?

Any other ideas or suggestions?

Do you need any more info?

Thanks for your help.

regards,
RR
Comment 10 russellr 2009-07-08 12:18:13 UTC
Created attachment 22256
info from /sys/class
Comment 11 russellr 2009-07-11 18:07:10 UTC
Hi again,

More information...

If the laptop is running on battery, the fan doesn't do the on-off cycling under Linux as described in the tests.

What should I be looking for in the DSDT?

regards,
RR
Comment 12 russellr 2009-07-11 18:48:26 UTC
I was wrong.

After about 10-15 minutes on battery, the "jet engine" started again.

The pattern started as 1 minute off, 10 seconds on (level 2).

After a few minutes it changed to 50 seconds off, 9 seconds on (approx).

Then the CPU temp went up (due to some background activity) and the fan went on to level 1 for about 30 seconds.

At the end of that there was a 60 second off period, then 10 seconds on, then 50 seconds off, 10 seconds on, 44 seconds off, 17 seconds on.

I then needed to plug the power in, and a pattern started that was different from, but similar to, the one I originally recorded on mains power with the custom DSDT.

Strangely enough, it managed a period of about 3 minutes of silence for no apparent reason, before the regular jet engine started again.

Someone must know the answer to how this is controlled.

Who do I ask?

regards,
RR
Comment 13 russellr 2009-07-13 01:46:06 UTC
Hi,

More information...

I reinstalled and booted Windows and confirm that the fan behaves sensibly while using that OS.

I extracted the DSDT from the Windows Registry, and decompiled it to dsl (what MS calls asl, apparently).

I then compared it to the DSDT from a Linux boot (Ubuntu 9.04).

I had to deal with 0x0660 vs 0x660 and 1 vs 0x1 and _pr_ vs pr_, and varying spaces, and lines etc. etc.

Also, logic like "lgreaterequal()" vs "lnot(llessthan(...))"

Bottom line: there's not much difference.

There do seem to be some constants that are different.  For example 32 vs offset(0x8).  I don't know what that means.

There are buffers which seem to be initialized with hex vs strings.

It's quite possible the secret is in there somewhere.

Can you tell me where to look to narrow down the search?

What else could be controlling the fan?

Does anyone here know anything that can help?

regards,
RR
Comment 14 russellr 2009-07-13 04:57:45 UTC
Hi,

OK, HP recommended I upgrade the BIOS from F.21 to F.22.

They believe it fixes fan problems.

I tested with Windows first - no change - Windows still silent.

I then booted Fedora 11....

Was very happy for the first 5 minutes - silence!

But, then the jet engine started again; about 40 seconds of silence, then about 10 seconds of jet engine.

After about 4 minutes of this I got several continuous minutes of silence.

Then it started its 40/10 cycle again.

Someone PLEASE HELP!

regards,
RR
Comment 15 Zhang Rui 2009-08-12 06:06:41 UTC
Please attach the output of "grep . /sys/class/hwmon/hwmon*/*".
I agree with your point in comment #9.

 (In reply to comment #9)
> 
> My guess is that the fan is entirely hardware controlled, but there's some
> secret override that an OS can perform.
> 

But unfortunately the secret is beyond the scope of Linux/ACPI.
I'd like to debug further, but I'm afraid there is little we can do to help here.
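
If it is easier to capture the hwmon values from a script, the grep command requested above can be approximated in Python. This is a rough sketch rather than an exact replacement: the function name is hypothetical, and the output imitates grep's "path:line" format.

```python
import glob
import os


def dump_hwmon(pattern="/sys/class/hwmon/hwmon*/*"):
    """Return 'path:line' strings for every non-empty line in matching files."""
    lines = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # plain grep does not descend into directories
        try:
            with open(path, errors="replace") as handle:
                for line in handle:
                    text = line.rstrip("\n")
                    if text:
                        lines.append(f"{path}:{text}")
        except OSError:
            continue  # some sysfs attributes cannot be read
    return lines


if __name__ == "__main__":
    print("\n".join(dump_hwmon()))
```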
Comment 16 russellr 2009-08-12 06:22:32 UTC
OK, thanks anyway.

I've returned the computer to HP as I've managed to argue that the product is "not a general purpose computer".

They've refunded my money and I've ordered a Dell Latitude.
