Bug 31942 - full speed fan starting with 2.6.38 - HP Compaq 6715s
Summary: full speed fan starting with 2.6.38 - HP Compaq 6715s
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Fan
Hardware: All Linux
Importance: P1 normal
Assignee: acpi_power-fan
Depends on:
Blocks: 27352
Reported: 2011-03-27 01:01 UTC by katabami
Modified: 2011-04-12 21:16 UTC
CC List: 6 users

See Also:
Kernel Version: 2.6.38
Tree: Mainline
Regression: Yes

Output from 'ls -R /sys/class/thermal' (142 bytes, text/plain)
2011-03-27 01:01 UTC, katabami
cat /proc/cpuinfo (623 bytes, text/plain)
2011-03-27 01:03 UTC, katabami
cat /proc/ioports (1.49 KB, text/plain)
2011-03-27 01:04 UTC, katabami
cat /proc/iomem (1.89 KB, text/plain)
2011-03-27 01:06 UTC, katabami
'grep . /sys/class/thermal/thermal_zone*/*' from 2.6.37 (good) (1.88 KB, text/plain)
2011-03-30 06:45 UTC, katabami
'grep . /sys/class/thermal/thermal_zone*/*' from 2.6.38 (bad) (1.88 KB, text/plain)
2011-03-30 06:46 UTC, katabami
acpidump (333.74 KB, text/plain)
2011-03-30 06:46 UTC, katabami

Description katabami 2011-03-27 01:01:27 UTC
Created attachment 52102
Output from 'ls -R /sys/class/thermal' 

With kernel 2.6.38, the CPU fan runs at full speed. I have an HP Compaq 6715s with a Mobile Sempron 3400+ CPU. 2.6.37 was OK (though it had another problem).

Attached is the output from ls -R /sys/class/thermal. The result is the same
as with previous kernel versions.

Changes I've noticed are:
* thermal/{cooling_device,thermal_zone}?/power has new entries autosuspend_delay_ms, wakeup_active{,_count}, wakeup_{hit_count,last_time_ms,max_time_ms,total_time_ms}
* In the 'flags' field of /proc/cpuinfo, 'nopl' has been added.

Rafael Wysocki told me to report this here. Thank you very much in advance.
Comment 1 katabami 2011-03-27 01:03:05 UTC
Created attachment 52112
cat /proc/cpuinfo

cat /proc/cpuinfo
Comment 2 katabami 2011-03-27 01:04:06 UTC
Created attachment 52122
cat /proc/ioports

cat /proc/ioports
Comment 3 katabami 2011-03-27 01:06:06 UTC
Created attachment 52132
cat /proc/iomem

cat /proc/iomem

The output of lspci -vvv doesn't seem related to the CPU fan, so I haven't pasted it. The only modules I use are pata_acpi and the Ethernet driver (tg3).
Comment 4 Len Brown 2011-03-29 01:47:20 UTC
is it possible to bisect which change between 2.6.37
and 2.6.38 caused the failure?

please show the output from
grep . /sys/class/thermal/thermal_zone*/*

with both 2.6.37 and 2.6.38

please attach the output from acpidump
Comment 5 katabami 2011-03-30 06:43:20 UTC
Bisection will take a while, but I can do it.

Thanks Len Brown for your work in kernel acpi.
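
For readers unfamiliar with the bisection requested above, here is a minimal sketch of the idea in Python. It assumes a simplified linear history in which every commit after the first bad one is also bad. The commit names (c0 to c15) and the is_bad test are hypothetical stand-ins for "build this kernel, boot it, and listen to the fan". In practice the git bisect command does this bookkeeping on the real, non-linear history.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first bad commit and the number of commits tested.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (a linear history).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi], tested


# Hypothetical history: c0 is the good release, c15 the bad one,
# and the regression was introduced in c11.
history = [f"c{i}" for i in range(16)]
culprit, builds = first_bad(history, lambda c: int(c[1:]) >= 11)
print(culprit, builds)
```

Each test halves the remaining range, so the number of kernels to build grows only with the logarithm of the number of commits between the good and bad versions.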
Comment 6 katabami 2011-03-30 06:45:02 UTC
Created attachment 52582
'grep . /sys/class/thermal/thermal_zone*/*' from 2.6.37 (good)
Comment 7 katabami 2011-03-30 06:46:24 UTC
Created attachment 52592
'grep . /sys/class/thermal/thermal_zone*/*' from 2.6.38 (bad)

As you guessed, thresholds seem too low.
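
A sketch of how the two dumps could be compared, assuming the usual "path:value" lines that grep prints when given several files. The sample values below are made up for illustration and are not taken from the attachments. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_dump(text: str) -> Dict[str, str]:
    """Map 'thermal_zoneN/attribute' to its value from 'path:value' lines."""
    values = {}
    for line in text.splitlines():
        path, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'path:value'
        key = "/".join(path.split("/")[-2:])  # keep zone and attribute only
        values[key] = value.strip()
    return values


def changed(good: str, bad: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return attributes whose value differs, as {key: (good, bad)}."""
    g, b = parse_dump(good), parse_dump(bad)
    keys = sorted(set(g) | set(b))
    return {k: (g.get(k), b.get(k)) for k in keys if g.get(k) != b.get(k)}


# Made-up sample data (millidegrees Celsius), NOT the real attachment contents.
good_dump = """\
/sys/class/thermal/thermal_zone0/temp:45000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
"""
bad_dump = """\
/sys/class/thermal/thermal_zone0/temp:45000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:16000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
"""

for key, (old, new) in changed(good_dump, bad_dump).items():
    print(f"{key}: {old} -> {new}")
```

Only attributes that differ are printed, which makes changed trip-point thresholds stand out from the unchanged entries.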
Comment 8 katabami 2011-03-30 06:46:55 UTC
Created attachment 52602
acpidump
Comment 9 katabami 2011-04-02 05:53:53 UTC
The bad commit was:
commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12
Author: Andreas Herrmann <andreas.herrmann3@amd.com>
Date:   Thu Feb 24 15:53:46 2011 +0100

    x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems

    On some SB800 systems polarity for IOAPIC pin2 is wrongly
    specified as low active by BIOS. This caused system hangs after
    resume from S3 when HPET was used in one-shot mode on such
    systems because a timer interrupt was missed (HPET signal is
    high active).
    For more details see:
    Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
    Tested-by: Andre Przywara <andre.przywara@amd.com>
    Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
    Cc: Borislav Petkov <borislav.petkov@amd.com>
    Cc: stable@kernel.org # 37.x, 32.x
    LKML-Reference: <20110224145346.GD3658@alberich.amd.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

My PC has an SB600, which has its own entry in early-quirks.c.

I noticed one more thing. When you try to reboot the bad kernel, the screen turns black and the optical drive makes its usual "Bz-zz" noise, but it doesn't proceed any further (I waited at least 20 seconds). A correct reboot would show the HP logo as the next step.

I reverted this patch on top of 2.6.38, and it now seems to run like 2.6.36, i.e. almost OK, for hours.

Let me ask: there's another ACPI bug between 2.6.34 and 2.6.36 on my PC. I understand that the ACPI drivers underwent a serious rewrite after 2.6.36. Will a bisect reveal the problem? (See the next message for the bug details.) My impression is "no". I suspect that there's a persistent bug in my hardware or in the kernel, and that it interacts with the kernel ACPI code changes.

Thanks a lot for taking care of this issue.
Comment 10 katabami 2011-04-02 05:54:59 UTC
(Details of the other bug; not related to the bug on this page.) In short, after resuming from s2disk, s2ram sometimes fails.

Some history is needed. Up to 2.6.34 + distro (Gentoo) patch, after s2disk the fan sometimes (not always) doesn't work even after the CPU temperature exceeds 48C according to /usr/bin/sensors [1]. It can be cured with s2ram. sensors says "ALARM" in that case. (This is almost OK. I run a script to automatically trigger s2ram.)

[1] Sensor chip is:
 Adapter: SMBus PIIX4 adapter at 8200
I ignore acpitz-virtual-0.

It happens less often in winter, so I guess that the longer it takes to reach 48C,
the better the chance that it gets "cured."

With 2.6.36 + distro patch, when the fan doesn't work after s2disk, s2ram fails. The screen turns black, the cursor continues blinking at the top left, and it hangs there. sensors doesn't say "ALARM". The trace technique using /sys/power/pm_trace doesn't leave any footprint.

With 2.6.38 without the bad commit (I've only tried once so far), after s2disk the fan worked above 48C, but sensors said "ALARM", and s2ram failed in the same manner.

Thanks for reading.
Comment 11 Johannes Niediek 2011-04-04 19:28:17 UTC
I just want to report that I have the same problem on the same hardware. According to the output of "sensors", the thermal zone temperature is around 55 degrees all the time, which is much higher than with other kernels I tried.

Don't hesitate to ask for more info, I am not sure what to include here.

Thanks a lot
Comment 12 Rafael J. Wysocki 2011-04-12 21:16:27 UTC
There is the following commit in the mainline that should fix the issue:

commit 1d3e09a304e6c4e004ca06356578b171e8735d3c
Author: Andreas Herrmann <andreas.herrmann3@amd.com>
Date:   Tue Mar 15 15:31:37 2011 +0100

    x86, quirk: Fix SB600 revision check

which also is present in as

so this one should be fixed.

Please open a separate bug entry for the other bug.
