Bug 18552

Summary: 2.6.31 -> 2.6.32 regression - fans running at full-speed after resume from suspend - Intel DG35EC motherboard
Product: Drivers Reporter: Jon Dowland (jon+bugzilla.kernel.org)
Component: Hardware Monitoring Assignee: Jean Delvare (jdelvare)
Status: REJECTED INVALID    
Severity: normal CC: jdelvare, lenb, rui.zhang
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35.4 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 56331    
Attachments: output of acpidump
/sys/devices/platform/w83627ehf.1760 output
/sys/devices/platform/w83627* output, before suspend
/sys/devices/platform/w83627*

Description Jon Dowland 2010-09-15 08:51:48 UTC
Hello,

With (at least) 2.6.35.4, my CPU fan runs at full speed after resuming from suspend and can not be convinced to climb back down.

I first noticed this occur with the Debian patched kernel 2.6.32 (debian version -21).  The problem did not occur with 2.6.31 (debian version -2). I then reproduced it with a pristine 2.6.35.4.

My machine is a desktop PC with an Intel DG35EC motherboard, quad core 2 duo CPU, sblive PCI sound card, ATI graphics card of some sort (quite old).

I initially reported this upstream at http://bugs.debian.org/596741

Next time I sit down to investigate this I will try bisecting between 2.6.31 and 2.6.32, but any other suggestions for diagnosis gladly welcome.
Comment 1 Jon Dowland 2010-09-17 21:35:26 UTC
I've spent the last few days trying to bisect from .35 to .31.  Here's the log so far:


git bisect start
# bad: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect bad 9fe6206f400646a2322096b56c59891d530e8d51
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# bad: [0b94190e1e60f96962b82d35729d7d44cf298ef8] viafb: fix LCD hardware cursor regression
git bisect bad 0b94190e1e60f96962b82d35729d7d44cf298ef8
# bad: [0969afcc449d5d655784c04e938cf4cfc6e89c0e] Merge branch 'twl4030-mfd' into for-2.6.33
git bisect bad 0969afcc449d5d655784c04e938cf4cfc6e89c0e
# skip: [7e17615c45980fc34d3f7d04bc7063cfc32180ec] MIPS: Get rid of duplicate cpu_idle() prototype.
git bisect skip 7e17615c45980fc34d3f7d04bc7063cfc32180ec
# skip: [0ae6654da437db4ae6333d232e718b570c7a3eac] sata_promise: disable hotplug on 1st gen chips
git bisect skip 0ae6654da437db4ae6333d232e718b570c7a3eac
# bad: [678ad5d8aaf8925cb8465f84e1e47d9b1284666a] /proc/kcore: fix stat.st_size
git bisect bad 678ad5d8aaf8925cb8465f84e1e47d9b1284666a
# skip: [a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4] Merge branch 'docs-next' of git://git.lwn.net/linux-2.6
git bisect skip a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4
# skip: [cf33ce15463b784a1d648905fc067fa4d6b17466] net: fix hydra printk format warning
git bisect skip cf33ce15463b784a1d648905fc067fa4d6b17466
# skip: [1ed0ce000a6c20c36ec649e32fc24393ef418ed8] KVM: Use pointer to vcpu instead of vcpu_id in timer code.
git bisect skip 1ed0ce000a6c20c36ec649e32fc24393ef418ed8


The skips have been for commits where the machine would not resume from suspend at all.

If anyone can suggest a narrow range of commits, or some path specs that might help, I'd be very grateful.  This is soul-crushingly tedious :>
Comment 2 Len Brown 2010-09-21 01:56:24 UTC
did you say that 2.6.32 failed?
if yes, don't you want to mark that bad instead of going all the way
up to 2.6.35?

If you want a wild guess, I'd look at changes to drivers/acpi/ec.c
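Both suggestions can be sized up before any kernel is rebuilt. The sketch below is only an illustration: count_candidates and start_bisect are made-up names, while "git rev-list --count" and the path arguments to "git bisect start" are standard git features. It shows how many commits a bisection would have to consider for a given range, with or without a path restriction.

```shell
# Hypothetical helpers (the names are not from this report).

# Print how many commits lie between GOOD and BAD, optionally counting
# only the commits that touch the given paths.
count_candidates() {
    good=$1; bad=$2; shift 2
    git rev-list --count "$good..$bad" -- "$@"
}

# Start a bisection over the same range, restricted to the same paths.
start_bisect() {
    good=$1; bad=$2; shift 2
    git bisect start "$bad" "$good" -- "$@"
}
```

For instance, "count_candidates 74fca6a42863ffacaf7ba6f1936a9f228950f657 22763c5cf3690a681551162c15d34d935308c8d7 drivers/acpi/ec.c" would count the ec.c commits between 2.6.31 and 2.6.32, using the hashes that appear in the bisect logs in this report. A path-restricted bisection can only blame commits that touch those paths, so it helps only if the guess is right.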
Comment 3 Jon Dowland 2010-09-23 22:09:01 UTC
Hello, thank you for the suggestion.  Yes, it was daft to start the bisection such a long way after an identified good commit.

I re-started from 2.6.32. I've hit a minor stumbling block, a run of commits which won't resume from suspend at all.  I've been skipping those but it's slow progress.

After a few days of testing, I decided to save that progress and restart with just the commits that touched that wild-guess path of yours.  I also took the opportunity to trim my config down a lot (I had been using what was essentially an allmodconfig, which minimises time spent configuring by hand at the expense of compile time, and that adds up over a huge number of builds). I've finished! The log:


git bisect start '--' 'drivers/acpi/ec.c'
# bad: [17d857be649a21ca90008c6dc425d849fa83db5c] Linux 2.6.32-rc1
git bisect bad 17d857be649a21ca90008c6dc425d849fa83db5c
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# skip: [3b87bb640e77023c97cf209e3dd85887a1113ad0] Merge branch 'bjorn-start-stop-2.6.32' into release
git bisect skip 3b87bb640e77023c97cf209e3dd85887a1113ad0
# skip: [2a84cb9852f52c0cd1c48bca41a8792d44ad06cc] ACPI: EC: Merge IRQ and POLL modes
git bisect skip 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc
# skip: [cf745ec7a1222a661b2c5f0e8c2c4be81300d2a4] ACPI: EC: remove .stop() method
git bisect skip cf745ec7a1222a661b2c5f0e8c2c4be81300d2a4
# skip: [d02be04707b8ff5375a76c027327e8708877da39] ACPI: EC: remove .start() method
git bisect skip d02be04707b8ff5375a76c027327e8708877da39
# good: [f25752e67d9d9ee7562ae9944314dd8c057d3fa2] ACPI: EC: Drop orphan comment
git bisect good f25752e67d9d9ee7562ae9944314dd8c057d3fa2
# good: [762caf0baafc657c410b9c04f4a95d4e3aa4dda1] Merge branch 'ec' into release
git bisect good 762caf0baafc657c410b9c04f4a95d4e3aa4dda1
# good: [eb27cae8adaa658a0bf31631baa1ce29d8183759] ACPI: linux/acpi.h should not include linux/dmi.h

Recording the last commit (which was good) as good:

$ git bisect good
Bisecting: -1 revisions left to test after this (roughly 0 steps)
[d26f0528d588e596955bf296a609afe52eafc099] Merge branch 'misc-2.6.32' into release

I'm just trying this zeroth step. I'm not sure where to go from there but I will probably try feeding the log into the wider-scoped bisection.
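One way to feed saved verdicts into a wider bisection is sketched here. "git bisect replay" would re-run the saved log's own start line, path restriction included, so this hypothetical helper (apply_verdicts is an illustrative name) applies only the good, bad and skip lines of a saved "git bisect log" to a bisection that is already in progress.

```shell
# Hypothetical helper: apply the good/bad/skip lines of a saved
# "git bisect log" to the bisection currently in progress. Comment lines
# and the log's own "git bisect start" line are ignored.
apply_verdicts() {
    while read -r cmd sub verdict rev rest; do
        [ "$cmd $sub" = "git bisect" ] || continue
        case $verdict in
            good|bad|skip)
                git bisect "$verdict" "$rev" </dev/null >/dev/null || return 1
                ;;
        esac
    done < "$1"
}
```

A possible sequence, assuming the path-restricted log was saved as ec-bisect.log (a made-up file name): "git bisect reset", then "git bisect start" with the wider good and bad revisions, then "apply_verdicts ec-bisect.log". Verdicts that contradict each other make git bisect fail, and the helper stops at the first such failure.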
Comment 4 Jon Dowland 2010-09-27 21:48:47 UTC
Ok I finished a fresh bisect (feeding in the skip/good/bad from the path-restricted one). Unfortunately I have got a nonsensical result: 


17d857be649a21ca90008c6dc425d849fa83db5c is the first bad commit                
commit 17d857be649a21ca90008c6dc425d849fa83db5c                                 
Author: Linus Torvalds <torvalds@linux-foundation.org>                          
Date:   Sun Sep 27 14:57:48 2009 -0700                                          
                                                                                
    Linux 2.6.32-rc1                                                            
                                                                                
:100644 100644 f908accd332b877338fdf92380bf52e3734f8cec 00444a8e304f04b67c9de2f29ea543912fd67f5d M      Makefile


I'm going to have to go back and carefully check all the steps.  One possible problem (apart from me falling asleep at the keyboard and mistakenly classifying a step, perhaps) is that there are two categories of commit that I had to skip: ones that would not resume from suspend at all, and ones which would not compile.  It occurs to me that the latter were mostly in a module (edac I think) which is probably not relevant, and by tweaking my config I could probably eliminate those.
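The two kinds of skip described above can be kept apart if the reason is written down when the verdict is given. This is a minimal sketch, not something used in this report: mark_step and the BISECT_NOTES variable are made-up names, and the only git commands involved are "git rev-parse" and "git bisect good|bad|skip".

```shell
# Hypothetical helper: record why a verdict was given, then pass the
# verdict to git bisect. BISECT_NOTES is an illustrative variable naming
# the notes file (default: bisect-notes.txt in the current directory).
mark_step() {
    verdict=$1; shift
    case $verdict in
        good|bad|skip) ;;
        *) echo "usage: mark_step good|bad|skip [reason...]" >&2; return 2 ;;
    esac
    # Note the commit under test before git bisect checks out the next one.
    printf '%s %s %s\n' "$(git rev-parse HEAD)" "$verdict" "$*" \
        >> "${BISECT_NOTES:-bisect-notes.txt}"
    git bisect "$verdict"
}
```

With calls such as "mark_step skip does not compile" and "mark_step skip no resume", a later grep of the notes file lists exactly the commits worth retrying after a config change.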

The bisect log in full:


git bisect start
# bad: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect bad 9fe6206f400646a2322096b56c59891d530e8d51
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# bad: [0b94190e1e60f96962b82d35729d7d44cf298ef8] viafb: fix LCD hardware cursor regression
git bisect bad 0b94190e1e60f96962b82d35729d7d44cf298ef8
# bad: [0969afcc449d5d655784c04e938cf4cfc6e89c0e] Merge branch 'twl4030-mfd' into for-2.6.33
git bisect bad 0969afcc449d5d655784c04e938cf4cfc6e89c0e
# skip: [7e17615c45980fc34d3f7d04bc7063cfc32180ec] MIPS: Get rid of duplicate cpu_idle() prototype.
git bisect skip 7e17615c45980fc34d3f7d04bc7063cfc32180ec
# skip: [0ae6654da437db4ae6333d232e718b570c7a3eac] sata_promise: disable hotplug on 1st gen chips
git bisect skip 0ae6654da437db4ae6333d232e718b570c7a3eac
# bad: [678ad5d8aaf8925cb8465f84e1e47d9b1284666a] /proc/kcore: fix stat.st_size
git bisect bad 678ad5d8aaf8925cb8465f84e1e47d9b1284666a
# skip: [a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4] Merge branch 'docs-next' of git://git.lwn.net/linux-2.6
git bisect skip a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4
# skip: [cf33ce15463b784a1d648905fc067fa4d6b17466] net: fix hydra printk format warning
git bisect skip cf33ce15463b784a1d648905fc067fa4d6b17466
# skip: [1ed0ce000a6c20c36ec649e32fc24393ef418ed8] KVM: Use pointer to vcpu instead of vcpu_id in timer code.
git bisect skip 1ed0ce000a6c20c36ec649e32fc24393ef418ed8
# skip: [b938fb6f491113880ebaabfa06c6446723c702fd] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect skip b938fb6f491113880ebaabfa06c6446723c702fd
# skip: [b1ab7e4b2a88d3ac13771463be8f302ce1616cfc] VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
git bisect skip b1ab7e4b2a88d3ac13771463be8f302ce1616cfc
# bad: [e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a] regulator: Add WM831x DC-DC buck convertor support
git bisect bad e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a
# skip: [d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git bisect skip d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b
# skip: [d22b8ed9a3b0a157b732580258ec16b729265953] Staging: vme: add Tundra TSI148 VME-PCI Bridge driver
git bisect skip d22b8ed9a3b0a157b732580258ec16b729265953
# skip: [e681c9dd62fe8fcc5bba28a3ca3f7dc8be940206] PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
git bisect skip e681c9dd62fe8fcc5bba28a3ca3f7dc8be940206
# skip: [e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a] regulator: Add WM831x DC-DC buck convertor support
git bisect skip e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a
# bad: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git bisect bad 22763c5cf3690a681551162c15d34d935308c8d7
# bad: [0b94190e1e60f96962b82d35729d7d44cf298ef8] viafb: fix LCD hardware cursor regression
git bisect bad 0b94190e1e60f96962b82d35729d7d44cf298ef8
# bad: [0969afcc449d5d655784c04e938cf4cfc6e89c0e] Merge branch 'twl4030-mfd' into for-2.6.33
git bisect bad 0969afcc449d5d655784c04e938cf4cfc6e89c0e
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# skip: [7e17615c45980fc34d3f7d04bc7063cfc32180ec] MIPS: Get rid of duplicate cpu_idle() prototype.
git bisect skip 7e17615c45980fc34d3f7d04bc7063cfc32180ec
# skip: [0ae6654da437db4ae6333d232e718b570c7a3eac] sata_promise: disable hotplug on 1st gen chips
git bisect skip 0ae6654da437db4ae6333d232e718b570c7a3eac
# skip: [7e17615c45980fc34d3f7d04bc7063cfc32180ec] MIPS: Get rid of duplicate cpu_idle() prototype.
git bisect skip 7e17615c45980fc34d3f7d04bc7063cfc32180ec
# skip: [0ae6654da437db4ae6333d232e718b570c7a3eac] sata_promise: disable hotplug on 1st gen chips
git bisect skip 0ae6654da437db4ae6333d232e718b570c7a3eac
# skip: [a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4] Merge branch 'docs-next' of git://git.lwn.net/linux-2.6
git bisect skip a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4
# skip: [cf33ce15463b784a1d648905fc067fa4d6b17466] net: fix hydra printk format warning
git bisect skip cf33ce15463b784a1d648905fc067fa4d6b17466
# skip: [1ed0ce000a6c20c36ec649e32fc24393ef418ed8] KVM: Use pointer to vcpu instead of vcpu_id in timer code.
git bisect skip 1ed0ce000a6c20c36ec649e32fc24393ef418ed8
# skip: [b938fb6f491113880ebaabfa06c6446723c702fd] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect skip b938fb6f491113880ebaabfa06c6446723c702fd
# skip: [b1ab7e4b2a88d3ac13771463be8f302ce1616cfc] VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
git bisect skip b1ab7e4b2a88d3ac13771463be8f302ce1616cfc
# skip: [d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git bisect skip d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b
# skip: [d22b8ed9a3b0a157b732580258ec16b729265953] Staging: vme: add Tundra TSI148 VME-PCI Bridge driver
git bisect skip d22b8ed9a3b0a157b732580258ec16b729265953
# skip: [e681c9dd62fe8fcc5bba28a3ca3f7dc8be940206] PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
git bisect skip e681c9dd62fe8fcc5bba28a3ca3f7dc8be940206
# skip: [e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a] regulator: Add WM831x DC-DC buck convertor support
git bisect skip e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a
# bad: [17d857be649a21ca90008c6dc425d849fa83db5c] Linux 2.6.32-rc1
git bisect bad 17d857be649a21ca90008c6dc425d849fa83db5c
# skip: [5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3] Staging: vt6656: Integrate vt6656 into build system.
git bisect skip 5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3
# skip: [b81ad777b9ee66a69dd270a451c214b7e443a0c1] Staging: rtl8192su: remove CONFIG_RTL8192_PM ifdefs
git bisect skip b81ad777b9ee66a69dd270a451c214b7e443a0c1
# skip: [cf7474a6f4eda22603591b7d6253dffc224e4784] bnx2: Refine coalescing parameters.
git bisect skip cf7474a6f4eda22603591b7d6253dffc224e4784
# skip: [0a85b6f0ab0d2edb0d41b32697111ce0e4f43496] Staging: Comedi: Lindent changes to comdi driver in staging tree
git bisect skip 0a85b6f0ab0d2edb0d41b32697111ce0e4f43496
# skip: [c7b50db21fe8c295092518e224d60b95e69da3b0] vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
git bisect skip c7b50db21fe8c295092518e224d60b95e69da3b0
# skip: [945b4ac44e5700acd3d974c176c8ace34b4d2e8e] x86/amd-iommu: Dump illegal command on ILLEGAL_COMMAND_ERROR
git bisect skip 945b4ac44e5700acd3d974c176c8ace34b4d2e8e
# bad: [17d857be649a21ca90008c6dc425d849fa83db5c] Linux 2.6.32-rc1
git bisect bad 17d857be649a21ca90008c6dc425d849fa83db5c
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# skip: [3b87bb640e77023c97cf209e3dd85887a1113ad0] Merge branch 'bjorn-start-stop-2.6.32' into release
git bisect skip 3b87bb640e77023c97cf209e3dd85887a1113ad0
# skip: [2a84cb9852f52c0cd1c48bca41a8792d44ad06cc] ACPI: EC: Merge IRQ and POLL modes
git bisect skip 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc
# skip: [cf745ec7a1222a661b2c5f0e8c2c4be81300d2a4] ACPI: EC: remove .stop() method
git bisect skip cf745ec7a1222a661b2c5f0e8c2c4be81300d2a4
# skip: [d02be04707b8ff5375a76c027327e8708877da39] ACPI: EC: remove .start() method
git bisect skip d02be04707b8ff5375a76c027327e8708877da39
# good: [f25752e67d9d9ee7562ae9944314dd8c057d3fa2] ACPI: EC: Drop orphan comment
git bisect good f25752e67d9d9ee7562ae9944314dd8c057d3fa2
# good: [762caf0baafc657c410b9c04f4a95d4e3aa4dda1] Merge branch 'ec' into release
git bisect good 762caf0baafc657c410b9c04f4a95d4e3aa4dda1
# good: [eb27cae8adaa658a0bf31631baa1ce29d8183759] ACPI: linux/acpi.h should not include linux/dmi.h
git bisect good eb27cae8adaa658a0bf31631baa1ce29d8183759
# good: [d26f0528d588e596955bf296a609afe52eafc099] Merge branch 'misc-2.6.32' into release
git bisect good d26f0528d588e596955bf296a609afe52eafc099
# good: [be90a49ca22a95f184d9f32d35b5247b44032849] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
git bisect good be90a49ca22a95f184d9f32d35b5247b44032849
# good: [a487b6705a811087c182c8cab7e3b5845dfa6ccb] Merge branch 'for-linus' of git://neil.brown.name/md
git bisect good a487b6705a811087c182c8cab7e3b5845dfa6ccb
# good: [b9b9df62e7fd6b5f099c24bc867100ab86e1da5a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
git bisect good b9b9df62e7fd6b5f099c24bc867100ab86e1da5a
# good: [66b7ed40aaf153d634aabff409a0dda675f37f45] ACPI: remove redundant "handle" and "parent" arguments
git bisect good 66b7ed40aaf153d634aabff409a0dda675f37f45
# good: [76e0134f4154aeadac833c2daea32102c64c0bb0] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git bisect good 76e0134f4154aeadac833c2daea32102c64c0bb0
# good: [3b383767c41be070cae24875789d97b42a3e71a8] Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good 3b383767c41be070cae24875789d97b42a3e71a8
# good: [cce1d9f23213f3a8a43b6038df84a665aa8d8612] Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
git bisect good cce1d9f23213f3a8a43b6038df84a665aa8d8612
# good: [e56d953d190061938b31cabbe01b7f3d76c60bd0] ACPI: IA64=y ACPI=n build fix
git bisect good e56d953d190061938b31cabbe01b7f3d76c60bd0
# good: [6f5071020d5ec89b5d095aa488db604adb921aec] Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good 6f5071020d5ec89b5d095aa488db604adb921aec
# good: [569ec4cc779c8aae03a4659939d08822c9e4a242] ACPI: kill "unused variable ‘i’" warning
git bisect good 569ec4cc779c8aae03a4659939d08822c9e4a242
# good: [b3b75cef705708402b5d381a30fa17f89e0549b4] alpha: Fix duplicate <asm/thread_info.h> include
git bisect good b3b75cef705708402b5d381a30fa17f89e0549b4
Comment 5 Zhang Rui 2010-09-30 02:05:59 UTC
please attach the output of acpidump.
Comment 6 Jon Dowland 2010-09-30 20:37:06 UTC
Created attachment 32112
output of acpidump

output of acpidump. This was generated after booting 2.6.36-rc5, no suspending has been attempted yet.
Comment 7 Len Brown 2010-10-19 02:14:27 UTC
re: bisect failure

Note that you can limit the scope of the bisect,
say to drivers/acpi/ and that might help.
Comment 8 Zhang Rui 2010-11-08 02:40:38 UTC
Jon,

There is no ACPI fan device on this laptop, which means that the fan is controlled either by a native driver or by the BIOS.

(In reply to comment #0)
> 
> With (at least) 2.6.35.4, my CPU fan runs at full speed after resuming from
> suspend and can not be convinced to climb back down.
> 
what did you do to spin the fan?
Comment 9 Jon Dowland 2010-11-09 09:34:25 UTC
Hi Zhang,

I am not using a laptop. As per comment #0:

My machine is a desktop PC with an Intel DG35EC motherboard, quad core 2 duo
CPU, sblive PCI sound card, ATI graphics card of some sort (quite old).

> what did you do to spin the fan?

I did nothing more than wake the machine from sleep. The fans are on full blast as soon as the machine powers up from sleep.
Comment 10 Jon Dowland 2010-12-14 22:52:19 UTC
I have finally revisited this.

The bisect results are patently absurd, since they point at 2.6.32-rc1 (and rc2) as the bad commit, but all that modifies is the version definitions in the Makefile.

I rebuilt and tested the top-most good and bottom-most bad commits from the bisect, which are 2.6.32-rc1 and its immediate child.

Considering that perhaps the problem was more erratic than I thought, I tried 5 successive suspend/resume cycles.  I also tried 2.6.37-rc5+ or HEAD (probably HEAD).

All attempts were successful: the fans wound down as they should.  All tests were performed from single user mode using "pm-suspend". I would have tried more than 5 attempts for each if I had any negatives.

However, having moved to multi-user mode on 2.6.32-rc2 in order to write this message, I suspended from the GNOME menu, and on resume the problem occurred.

Anyway, this would seem to absolve the kernel at least.  Thanks to everyone for your help.
Comment 11 Zhang Rui 2010-12-15 00:45:14 UTC
please attach the output of "sensors"
Comment 12 Jon Dowland 2010-12-16 18:49:27 UTC
$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +58.0°C  (high = +84.0°C, crit = +100.0°C)  

coretemp-isa-0001
Adapter: ISA adapter
Core 2:      +61.0°C  (high = +84.0°C, crit = +100.0°C)  

coretemp-isa-0002
Adapter: ISA adapter
Core 1:      +60.0°C  (high = +84.0°C, crit = +100.0°C)  

coretemp-isa-0003
Adapter: ISA adapter
Core 3:      +65.0°C  (high = +84.0°C, crit = +100.0°C)  

w83627dhg-isa-06e0
Adapter: ISA adapter
Vcore:       +0.92 V  (min =  +0.00 V, max =  +1.74 V)   
in1:         +1.55 V  (min =  +1.22 V, max =  +0.95 V)   ALARM
AVCC:        +3.04 V  (min =  +1.41 V, max =  +2.75 V)   ALARM
VCC:         +3.04 V  (min =  +1.54 V, max =  +1.28 V)   ALARM
in4:         +1.01 V  (min =  +0.31 V, max =  +1.45 V)   
in5:         +1.38 V  (min =  +0.10 V, max =  +0.54 V)   ALARM
in6:         +0.34 V  (min =  +1.43 V, max =  +0.76 V)   ALARM
3VSB:        +3.04 V  (min =  +1.31 V, max =  +0.14 V)   ALARM
Vbat:        +3.04 V  (min =  +2.70 V, max =  +0.06 V)   ALARM
fan1:        811 RPM  (min = 3515 RPM, div = 16)  ALARM
fan2:       1360 RPM  (min =  615 RPM, div = 16)
fan3:          0 RPM  (min =  254 RPM, div = 64)  ALARM
fan4:          0 RPM  (min = 7031 RPM, div = 16)  ALARM
temp1:       +85.0°C  (high = +92.0°C, hyst = +88.0°C)  sensor = thermistor
temp2:       +74.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = diode
temp3:      -128.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = diode
cpu0_vid:   +0.000 V


In other news, I have just had a successful resume with 2.6.36-rc5 and radeon.modeset=0.  Just one so far; I need to try another four.
Comment 13 Jean Delvare 2010-12-17 09:36:16 UTC
Jon, do you have any software-driven fan speed control daemon running by any chance? For example lm-sensors's fancontrol script?

When the fans go to full speed, does it reflect in the RPM values shown by "sensors"?

Please attach the output of:
$ (cd /sys/devices/platform/w83627* && grep . *)
both when the fans are running at normal speed and when the fans are running at full speed.
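The two dumps are easier to compare if they are taken the same way each time. A minimal sketch follows; snapshot_attrs is a made-up name, and the function simply wraps the "grep . *" idiom requested above while skipping subdirectories and unreadable files.

```shell
# Hypothetical helper: print every readable attribute file in a directory
# as "name:value" lines, the format "grep . *" produces.
snapshot_attrs() (
    cd "$1" || exit 1
    for f in *; do
        # Skip subdirectories and anything that is not readable.
        if [ -f "$f" ] && [ -r "$f" ]; then
            # /dev/null as a second file makes grep prefix the file name.
            grep -s . "$f" /dev/null || :
        fi
    done
    exit 0
)
```

For example, "snapshot_attrs /sys/devices/platform/w83627ehf.1760 > before.txt" at normal fan speed and the same command into after.txt at full speed, followed by "diff -u before.txt after.txt". The file names are placeholders.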
Comment 14 Jon Dowland 2010-12-18 11:14:19 UTC
I don't have any software-driven fan speed control scripts/daemons running.

I've managed to resume with the fans at their proper speed five times now with 2.6.36 and radeon.modeset=0; removing that kernel option puts the fans at full speed again.  Glad we're starting to get somewhere...

I'm just diffing sensors output before/after suspend/resume, I'll report back.

I'll attach this now
(cd /sys/devices/platform/w83627* && grep . *) > fans_full_speed.txt
Comment 15 Jon Dowland 2010-12-18 11:16:02 UTC
Created attachment 40662
/sys/devices/platform/w83627ehf.1760 output

output of (cd /sys/devices/platform/w83627* && grep . *) after resume with 2.6.36, radeon KMS, and fans running full speed.
Comment 16 Jon Dowland 2010-12-18 16:32:05 UTC
Created attachment 40722
/sys/devices/platform/w83627* output, before suspend

fans low
Comment 17 Jon Dowland 2010-12-18 16:32:44 UTC
Created attachment 40732
/sys/devices/platform/w83627*

fans high, post resume
Comment 18 Jon Dowland 2010-12-18 16:37:35 UTC
Inline, diff of "sensors" output, before and after suspend (fans low before, fans high after).

If anything, the reported RPM values are lower after resume.

--- before      2010-12-18 11:10:18.000000000 +0000
+++ after       2010-12-18 16:34:57.000000000 +0000
@@ -1,42 +1,42 @@
 lm63-i2c-1-4c
 Adapter: Radeon i2c bit bus 0x90
-temp1:       +56.0°C  (high = +70.0°C)                  
-temp2:       +62.9°C  (low  =  +0.0°C, high = +70.0°C)  
-                      (crit = +100.0°C, hyst = +95.0°C)  
+temp1:       +44.0°C  (high = +70.0°C)                  
+temp2:       +40.8°C  (low  =  +0.0°C, high = +70.0°C)  
+                      (crit = +85.0°C, hyst = +75.0°C)  
 
 w83627dhg-isa-06e0
 Adapter: ISA adapter
-Vcore:       +0.96 V  (min =  +0.00 V, max =  +1.74 V)   
-in1:         +1.54 V  (min =  +1.22 V, max =  +0.95 V)   ALARM
-AVCC:        +3.04 V  (min =  +1.41 V, max =  +2.75 V)   ALARM
-VCC:         +3.02 V  (min =  +1.54 V, max =  +1.30 V)   ALARM
-in4:         +0.99 V  (min =  +0.31 V, max =  +1.45 V)   
-in5:         +1.37 V  (min =  +0.36 V, max =  +0.54 V)   ALARM
-in6:         +0.36 V  (min =  +1.43 V, max =  +0.76 V)   ALARM
+Vcore:       +0.92 V  (min =  +0.00 V, max =  +1.74 V)   
+in1:         +1.55 V  (min =  +1.22 V, max =  +0.95 V)   ALARM
+AVCC:        +3.04 V  (min =  +1.44 V, max =  +2.77 V)   ALARM
+VCC:         +3.04 V  (min =  +1.54 V, max =  +3.33 V)   
+in4:         +1.00 V  (min =  +0.31 V, max =  +1.51 V)   
+in5:         +1.38 V  (min =  +0.10 V, max =  +0.67 V)   ALARM
+in6:         +0.35 V  (min =  +1.46 V, max =  +1.78 V)   ALARM
 3VSB:        +3.04 V  (min =  +1.33 V, max =  +0.14 V)   ALARM
-Vbat:        +3.02 V  (min =  +2.70 V, max =  +0.06 V)   ALARM
-fan1:       1339 RPM  (min = 3515 RPM, div = 16)  ALARM
-fan2:       2057 RPM  (min =  615 RPM, div = 16)
+Vbat:        +3.60 V  (min =  +2.70 V, max =  +0.06 V)   ALARM
+fan1:       1068 RPM  (min =  390 RPM, div = 16)
+fan2:       1534 RPM  (min =  615 RPM, div = 16)
 fan3:          0 RPM  (min =  257 RPM, div = 128)  ALARM
-fan4:          0 RPM  (min = 42187 RPM, div = 32)  ALARM
-temp1:       +82.0°C  (high = +94.0°C, hyst = +120.0°C)  sensor = thermistor
-temp2:       +84.0°C  (high = +80.0°C, hyst = +75.0°C)  ALARM  sensor = diode
-temp3:       -94.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = diode
+fan4:          0 RPM  (min = 10546 RPM, div = 128)  ALARM
+temp1:       +83.0°C  (high = +94.0°C, hyst = +88.0°C)  sensor = thermistor
+temp2:       +76.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = diode
+temp3:       -76.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = diode
 cpu0_vid:   +0.000 V
 
 coretemp-isa-0000
 Adapter: ISA adapter
-Core 0:      +61.0°C  (high = +84.0°C, crit = +100.0°C)  
+Core 0:      +60.0°C  (high = +84.0°C, crit = +100.0°C)  
 
 coretemp-isa-0001
 Adapter: ISA adapter
-Core 2:      +70.0°C  (high = +84.0°C, crit = +100.0°C)  
+Core 2:      +66.0°C  (high = +84.0°C, crit = +100.0°C)  
 
 coretemp-isa-0002
 Adapter: ISA adapter
-Core 1:      +66.0°C  (high = +84.0°C, crit = +100.0°C)  
+Core 1:      +62.0°C  (high = +84.0°C, crit = +100.0°C)  
 
 coretemp-isa-0003
 Adapter: ISA adapter
-Core 3:      +75.0°C  (high = +84.0°C, crit = +100.0°C)  
+Core 3:      +68.0°C  (high = +84.0°C, crit = +100.0°C)
Comment 19 Jean Delvare 2010-12-19 13:16:18 UTC
We can see that many values change after resume. In the case of input values, it is expected that they are slightly different, either because the machine had time to cool down (for temperatures) or simply because there is always some variation in monitored values (for voltages). I am a little curious about Vbat though, as +3.60 V seems impossible. Does Vbat stick to this value forever after resuming, or does it get back to a more reasonable reading after some time?

For fans, the situation is different: the reported speeds are significantly lower after resume. This could be explained by the fact that the W83627DHG is programmed to adjust the fan speeds depending on temperature.

The changing limits are a bug: apparently the limit registers aren't preserved during suspend, so the driver should save them and restore them at resume time. I can write a patch doing that if you are interested in testing it.

That being said, this will not solve your problem. The readings from "sensors" are pretty clear that the noisy fan is neither the system fan nor the CPU fan. And the fact that booting with radeon.modeset=0 solves the problem clearly points to the graphics adapter's fan. Oddly enough, the LM63 monitoring chip on that adapter reports only temperature and not the fan speed (the fan speed monitoring pin must have been configured for its alternate usage, which is alert output). It is possible that the LM63 controls the speed of the fan.

Please report the output of:
$ (cd /sys/bus/i2c/devices/1-004c/ && grep . *)
before and after a "failed" suspend/resume.
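Until a driver fix exists, limits that are lost across suspend could in principle be written back from user space. This sketch is not the driver patch offered above, and restore_limits is a made-up name. It assumes a dump in "name:value" form taken before suspend, and that the limit attributes (names ending in _min, _max or _max_hyst) are writable on the chip in question.

```shell
# Hypothetical user-space stopgap: write saved limit values back to the
# writable limit attributes of a hwmon device directory.
# $1 = device directory, $2 = "name:value" dump taken before suspend.
restore_limits() {
    dir=$1; snapshot=$2
    while IFS=: read -r name value; do
        case $name in
            *_min|*_max|*_max_hyst)
                # Only touch attributes that exist and are writable.
                if [ -w "$dir/$name" ]; then
                    printf '%s\n' "$value" > "$dir/$name" || :
                fi
                ;;
        esac
    done < "$snapshot"
    return 0
}
```

Only limits are restored; readings and settings such as pwm1 are left alone, since the sketch is meant to undo the lost limits and nothing else.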
Comment 20 Jean Delvare 2010-12-19 13:22:21 UTC
If you have the possibility to build a kernel with CONFIG_HWMON_DEBUG_CHIP=y, it would also be interesting to see what the lm63 driver says when loaded.
Comment 21 Jon Dowland 2010-12-22 19:02:20 UTC
> Please report the output of:
> $ (cd /sys/bus/i2c/devices/1-004c/ && grep . *)
> before and after a "failed" suspend/resume.

Before: 


$ (cd /sys/bus/i2c/devices/1-004c/ && grep . *)
alarms:0
modalias:i2c:lm63
name:lm63
pwm1:12
pwm1_enable:2
temp1_input:56000
temp1_max:70000
temp1_max_alarm:0
temp2_crit:100000
temp2_crit_alarm:0
temp2_crit_hyst:95000
temp2_fault:0
temp2_input:63000
temp2_max:70000
temp2_max_alarm:0
temp2_min:0
temp2_min_alarm:0
uevent:DRIVER=lm63
uevent:MODALIAS=i2c:lm63

After:


$ (cd /sys/bus/i2c/devices/1-004c/ && grep . *)
alarms:0
modalias:i2c:lm63
name:lm63
pwm1:0
pwm1_enable:2
temp1_input:55000
temp1_max:70000
temp1_max_alarm:0
temp2_crit:85000
temp2_crit_alarm:0
temp2_crit_hyst:75000
temp2_fault:0
temp2_input:51875
temp2_max:70000
temp2_max_alarm:0
temp2_min:0
temp2_min_alarm:0
uevent:DRIVER=lm63
uevent:MODALIAS=i2c:lm63

Kernel:

2.6.36-rc5  (one of the latest ones I had pre-built that works enough to get online)
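The differences between the two dumps can be picked out mechanically. A minimal sketch, assuming each dump has been saved to a file in the "name:value" form shown above; changed_attrs is a made-up name and the comparison is plain awk.

```shell
# Hypothetical helper: given two "name:value" dumps, print each attribute
# whose value differs, as "name: before -> after".
changed_attrs() {
    awk '
        {
            name = $0;  sub(/:.*$/, "", name)       # text before the first colon
            value = $0; sub(/^[^:]*:/, "", value)   # text after the first colon
        }
        # First file: remember the values. Repeated names such as uevent
        # are joined so that each name is compared as a whole.
        FNR == NR { before[name] = before[name] " " value; next }
        { after[name] = after[name] " " value }
        END {
            for (name in after)
                if ((name in before) && before[name] != after[name])
                    printf "%s:%s ->%s\n", name, before[name], after[name]
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

Applied to the two dumps in this comment, it would list pwm1 (12 to 0), temp1_input, temp2_crit (100000 to 85000), temp2_crit_hyst (95000 to 75000) and temp2_input, and nothing else.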
Comment 22 Jon Dowland 2010-12-22 19:06:19 UTC
> If you have the possibility to build a kernel with CONFIG_HWMON_DEBUG_CHIP=y,
> it would also be interesting to see what the lm63 driver says when loaded.

I will try that, thanks.

> That being said, this will not solve your problem. The readings from
> "sensors"
> are pretty clear that the noisy fan is neither the system fan nor the CPU
> fan.
> And the fact that booting with radeon.modeset=0 solves the problem clearly
> points to the graphics adapter's fan

The radeon card is a relatively recent addition to the machine (within the
last 18 months). I've had the machine about three years.  I'm pretty sure I
remember the loud fan (that remains on post suspend) from before I put the card
in, but I will try taking the card out and see what happens, to be sure.
Comment 23 Jon Dowland 2010-12-23 23:33:31 UTC
Kernel: 2.6.37-rc7+ (e819eb8687767cefca7b6abf5ac6d5efcf581eeb)
CONFIG_HWMON_DEBUG_CHIP=y

no output on stderr or kernel ring buffer for "modprobe lm63"

output of "grep . *" in /sys/bus/i2c/drivers/lm63/1-004c, before and after a suspend/resume cycle.

before:

alarms:0
modalias:i2c:lm63
name:lm63
pwm1:12
pwm1_enable:2
temp1_input:56000
temp1_max:70000
temp1_max_alarm:0
temp2_crit:100000
temp2_crit_alarm:0
temp2_crit_hyst:95000
temp2_fault:0
temp2_input:61875
temp2_max:70000
temp2_max_alarm:0
temp2_min:0
temp2_min_alarm:0
uevent:DRIVER=lm63
uevent:MODALIAS=i2c:lm63

after:

alarms:0
modalias:i2c:lm63
name:lm63
pwm1:0
pwm1_enable:2
temp1_input:54000
temp1_max:70000
temp1_max_alarm:0
temp2_crit:85000
temp2_crit_alarm:0
temp2_crit_hyst:75000
temp2_fault:0
temp2_input:57125
temp2_max:70000
temp2_max_alarm:0
temp2_min:0
temp2_min_alarm:0
uevent:DRIVER=lm63
uevent:MODALIAS=i2c:lm63

The fan (whichever it is) is on full-speed after suspend.
Comment 24 Jon Dowland 2010-12-23 23:35:55 UTC
sensors output, post resume with 2.6.37-rc7+ (e819eb8687767cefca7b6abf5ac6d5efcf581eeb) and CONFIG_HWMON_DEBUG_CHIP=y, after "modprobe w83627ehf" (did I mention I had to manually probe that?)


$ sensors
lm63-i2c-1-4c
Adapter: Radeon i2c bit bus 0x90
temp1:       +50.0°C  (high = +70.0°C)                  
temp2:       +45.8°C  (low  =  +0.0°C, high = +70.0°C)  
                      (crit = +85.0°C, hyst = +75.0°C)  

w83627dhg-isa-06e0
Adapter: ISA adapter
Vcore:       +0.89 V  (min =  +0.00 V, max =  +1.74 V)   
in1:         +1.55 V  (min =  +1.22 V, max =  +1.98 V)   
AVCC:        +3.04 V  (min =  +1.44 V, max =  +2.77 V)   ALARM
VCC:         +3.04 V  (min =  +1.54 V, max =  +1.28 V)   ALARM
in4:         +1.00 V  (min =  +0.31 V, max =  +1.53 V)   
in5:         +1.38 V  (min =  +0.10 V, max =  +0.69 V)   ALARM
in6:         +0.35 V  (min =  +1.46 V, max =  +1.78 V)   ALARM
3VSB:        +3.04 V  (min =  +1.33 V, max =  +0.14 V)   ALARM
Vbat:        +3.02 V  (min =  +2.70 V, max =  +0.06 V)   ALARM
fan1:        869 RPM  (min =  390 RPM, div = 16)
fan2:       1360 RPM  (min =  615 RPM, div = 16)
fan3:          0 RPM  (min =  254 RPM, div = 64)  ALARM
fan4:          0 RPM  (min = 6490 RPM, div = 16)  ALARM
temp1:       +83.0°C  (high = +94.0°C, hyst = +88.0°C)  sensor = thermistor
temp2:       +72.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = diode
temp3:       -74.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = diode
cpu0_vid:   +0.000 V


I'll give it a little while and then check to see if VBat changes, as requested.
Comment 25 Jon Dowland 2010-12-24 00:20:55 UTC
VBat has held at +3.02 V.
Comment 26 Jon Dowland 2010-12-24 00:31:14 UTC
Well, my memory clearly has failed me -- I've taken the radeon out temporarily and immediately on boot I notice that the noisy fan is gone - so it *is* the radeon fan.

Without the radeon, noise levels are pretty much the same before and after resume (same kernel as the last few comments, e819eb8687767cefca7b6abf5ac6d5efcf581eeb).

I've taken the liberty of changing the bug state to "NEEDINFO" rather than "INVALID"; I hope that's OK.
Comment 27 Jon Dowland 2010-12-27 18:12:56 UTC
Specifics about the radeon card:

(via Xorg.0.log)


(--) RADEON(0): Chipset: "ATI Radeon X850 XT (R480) (PCIE)" (ChipID = 0x5d52)
...
(II) AIGLX: Loaded and initialized /usr/lib/dri/r300_dri.so

Xorg.0.log output after resume (with fan on full):


(II) AIGLX: Suspending AIGLX clients for VT switch
(II) Open ACPI successful (/var/run/acpid.socket)
(II) AIGLX: Resuming AIGLX clients after VT switch
(II) RADEON(0): EDID vendor "IVM", prod id 22027
(II) RADEON(0): Using hsync ranges from config file
(II) RADEON(0): Using vrefresh ranges from config file
(II) RADEON(0): Printing DDC gathered Modelines:
(II) RADEON(0): Modeline "1920x1080"x0.0  148.50  1920 2008 2052 2200  1080 1084 1089 1125 +hsync +vsync (67.5 kHz)
(II) RADEON(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz)
(II) RADEON(0): Modeline "800x600"x0.0   36.00  800 824 896 1024  600 601 603 625 +hsync +vsync (35.2 kHz)
(II) RADEON(0): Modeline "640x480"x0.0   31.50  640 656 720 840  480 481 484 500 -hsync -vsync (37.5 kHz)
(II) RADEON(0): Modeline "640x480"x0.0   31.50  640 664 704 832  480 489 492 520 -hsync -vsync (37.9 kHz)
(II) RADEON(0): Modeline "640x480"x0.0   30.24  640 704 768 864  480 483 486 525 -hsync -vsync (35.0 kHz)
(II) RADEON(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz)
(II) RADEON(0): Modeline "720x400"x0.0   28.32  720 738 846 900  400 412 414 449 -hsync +vsync (31.5 kHz)
(II) RADEON(0): Modeline "1280x1024"x0.0  135.00  1280 1296 1440 1688  1024 1025 1028 1066 +hsync +vsync (80.0 kHz)
(II) RADEON(0): Modeline "1024x768"x0.0   78.75  1024 1040 1136 1312  768 769 772 800 +hsync +vsync (60.0 kHz)
(II) RADEON(0): Modeline "1024x768"x0.0   75.00  1024 1048 1184 1328  768 771 777 806 -hsync -vsync (56.5 kHz)
(II) RADEON(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz)
(II) RADEON(0): Modeline "832x624"x0.0   57.28  832 864 928 1152  624 625 628 667 -hsync -vsync (49.7 kHz)
(II) RADEON(0): Modeline "800x600"x0.0   49.50  800 816 896 1056  600 601 604 625 +hsync +vsync (46.9 kHz)
(II) RADEON(0): Modeline "800x600"x0.0   50.00  800 856 976 1040  600 637 643 666 +hsync +vsync (48.1 kHz)
(II) RADEON(0): Modeline "1152x864"x0.0  108.00  1152 1216 1344 1600  864 865 868 900 +hsync +vsync (67.5 kHz)
(II) RADEON(0): Modeline "1280x960"x0.0  108.00  1280 1376 1488 1800  960 961 964 1000 +hsync +vsync (60.0 kHz)
(II) RADEON(0): Modeline "1280x1024"x0.0  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsync +vsync (64.0 kHz)
(II) RADEON(0): Modeline "1440x900"x0.0   88.75  1440 1488 1520 1600  900 903 909 926 +hsync -vsync (55.5 kHz)
(II) RADEON(0): Modeline "1440x900"x0.0  136.75  1440 1536 1688 1936  900 903 909 942 -hsync +vsync (70.6 kHz)
(II) RADEON(0): Modeline "1680x1050"x0.0  119.00  1680 1728 1760 1840  1050 1053 1059 1080 +hsync -vsync (64.7 kHz)


dmesg output after resume, grepping for radeon:


[ 3118.283789] Back to C!
snip (to establish resume tick)
[ 3118.615004] radeon 0000:01:00.0: restoring config space at offset 0xf (was 0x1ff, writing 0x10b)
[ 3118.615010] radeon 0000:01:00.0: restoring config space at offset 0xc (was 0x0, writing 0xfffe0000)
[ 3118.615016] radeon 0000:01:00.0: restoring config space at offset 0x8 (was 0x1, writing 0x3001)
[ 3118.615020] radeon 0000:01:00.0: restoring config space at offset 0x6 (was 0x4, writing 0x90100004)
[ 3118.615025] radeon 0000:01:00.0: restoring config space at offset 0x4 (was 0xc, writing 0x8000000c)
[ 3118.615029] radeon 0000:01:00.0: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
[ 3118.615033] radeon 0000:01:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
[ 3118.615917] radeon 0000:01:00.0: setting latency timer to 64
[ 3118.615926] radeon 0000:01:00.0: f6c15800 unpin not necessary
[ 3118.643391] [drm] radeon: 4 quad pipes, 1 z pipes initialized.
[ 3118.643395] radeon 0000:01:00.0: WB enabled
[ 3118.643415] [drm] radeon: ring at 0x0000000060001000
Comment 28 Jean Delvare 2011-01-05 17:13:40 UTC
Jon, sorry for the long silence, I was on vacation.

(In reply to comment #23)
> Kernel: 2.6.37-rc7+ (e819eb8687767cefca7b6abf5ac6d5efcf581eeb)
> CONFIG_HWMON_DEBUG_CHIP=y
> 
> no output on stderr or kernel ring buffer for "modprobe lm63"

Most probably because the lm63 module was already loaded. Try "dmesg | grep lm63" after boot, or "rmmod lm63 && modprobe lm63" and look again.

> output of "grep . *" in /sys/bus/i2c/drivers/lm63/1-004c, before and after a
> suspend/resume cycle.
> 
> before:
> 
> pwm1:12
> temp2_crit:100000
> temp2_crit_hyst:95000
> 
> after:
> 
> pwm1:0
> temp2_crit:85000
> temp2_crit_hyst:75000

As you can see, the critical limit for the remote sensor of the LM63 on your graphics adapter changed. 85/75°C are the hardware defaults. The lm63 driver doesn't currently preserve limits over suspend/resume. That being said, I'm unsure if this explains your actual problem, as the measured temperature is still below the new limit, so that wouldn't cause the fan to kick in. But it might be a similar problem, for example with the automatic fan speed lookup table. The lm63 driver doesn't support this feature yet.
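
For anyone wanting to repeat the before/after comparison without eyeballing two "grep . *" dumps, here is a minimal Python sketch. The helper names (snapshot, changed, drop_measurements, over_limit) are made up for illustration and are not part of the lm63 driver or lm-sensors; the only thing assumed about the driver is the sysfs attribute naming seen in the dumps above (values in millidegrees, measurements ending in "_input").

```python
import os


def snapshot(directory):
    """Read every readable regular file in `directory`, much like `grep . *`."""
    values = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories such as power/ or subsystem/
        try:
            with open(path, errors="replace") as f:
                values[name] = f.read().strip()
        except OSError:
            continue  # some sysfs attributes are write-only or fail on read
    return values


def changed(before, after):
    """Return {name: (old, new)} for every attribute whose value differs."""
    names = sorted(set(before) | set(after))
    return {n: (before.get(n), after.get(n))
            for n in names if before.get(n) != after.get(n)}


def drop_measurements(diff):
    """Drop *_input entries; measured values are expected to move around."""
    return {n: v for n, v in diff.items() if not n.endswith("_input")}


def over_limit(values, sensor="temp2"):
    """True if the measured temperature has reached the critical limit."""
    return int(values[sensor + "_input"]) >= int(values[sensor + "_crit"])
```

To use it, call snapshot("/sys/bus/i2c/drivers/lm63/1-004c") before suspending and again after resume, then print drop_measurements(changed(before, after)). With the values posted in this bug, that would leave pwm1, temp2_crit and temp2_crit_hyst as the changed settings, and over_limit() on the post-resume values would be false (57125 is below 85000).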

(In reply to comment #26)
> Well, my memory clearly has failed me -- I've taken the radeon out
> temporarily and immediately on boot I notice that the noisy fan is gone -
> so it *is* the radeon fan.

I knew it :p

> I've taken the liberty of changing the bug state to "NEEDINFO" rather than
> "INVALID", I hope that's ok...

Not really. Your problem is entirely different from the original bug. This isn't a regression, and you don't have an Intel DG35EC motherboard. So, now that we have clarified what your actual problem is, it would be much better if you would create a _new_, clean bug in the right section, so that the right people (radeon KMS driver maintainers) start looking into it. Feel free to include me in the Cc list, in case I can help with the lm63 driver.
Comment 29 Jon Dowland 2011-02-18 22:36:18 UTC
> Not really. Your problem is entirely different from the original bug.
> This isn't a regression, and you don't have an Intel DG35EC motherboard.

I do have that motherboard, but yes, it proved to be irrelevant, I see.

> So, now that we have clarified what your actual problem is, it would be
> much better if you would create a _new_, clean bug in the right section,
> so that the right people (radeon KMS driver maintainers) start looking
> into it.

Thank you for the suggestion.  I didn't realise how you folks handle these
things. In my community (Debian), we'd treat the bug as the constant, and
alter what it was filed against as we clarified which component was responsible.
It's simply a different way of doing things, I see.

Thank you for all your help.