Hello,

With (at least) 2.6.35.4, my CPU fan runs at full speed after resuming from suspend and cannot be convinced to climb back down. I first noticed this with the Debian-patched kernel 2.6.32 (Debian version -21). The problem did not occur with 2.6.31 (Debian version -2). I then reproduced it with a pristine 2.6.35.4.

My machine is a desktop PC with an Intel DG35EC motherboard, quad core 2 duo CPU, sblive PCI sound card, ATI graphics card of some sort (quite old).

I initially reported this to Debian at http://bugs.debian.org/596741

Next time I sit down to investigate this I will try bisecting between 2.6.31 and 2.6.32, but any other suggestions for diagnosis are gladly welcome.
I've spent the last few days trying to bisect from 2.6.35 to 2.6.31. Here's the log so far:

git bisect start
# bad: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect bad 9fe6206f400646a2322096b56c59891d530e8d51
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# bad: [0b94190e1e60f96962b82d35729d7d44cf298ef8] viafb: fix LCD hardware cursor regression
git bisect bad 0b94190e1e60f96962b82d35729d7d44cf298ef8
# bad: [0969afcc449d5d655784c04e938cf4cfc6e89c0e] Merge branch 'twl4030-mfd' into for-2.6.33
git bisect bad 0969afcc449d5d655784c04e938cf4cfc6e89c0e
# skip: [7e17615c45980fc34d3f7d04bc7063cfc32180ec] MIPS: Get rid of duplicate cpu_idle() prototype.
git bisect skip 7e17615c45980fc34d3f7d04bc7063cfc32180ec
# skip: [0ae6654da437db4ae6333d232e718b570c7a3eac] sata_promise: disable hotplug on 1st gen chips
git bisect skip 0ae6654da437db4ae6333d232e718b570c7a3eac
# bad: [678ad5d8aaf8925cb8465f84e1e47d9b1284666a] /proc/kcore: fix stat.st_size
git bisect bad 678ad5d8aaf8925cb8465f84e1e47d9b1284666a
# skip: [a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4] Merge branch 'docs-next' of git://git.lwn.net/linux-2.6
git bisect skip a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4
# skip: [cf33ce15463b784a1d648905fc067fa4d6b17466] net: fix hydra printk format warning
git bisect skip cf33ce15463b784a1d648905fc067fa4d6b17466
# skip: [1ed0ce000a6c20c36ec649e32fc24393ef418ed8] KVM: Use pointer to vcpu instead of vcpu_id in timer code.
git bisect skip 1ed0ce000a6c20c36ec649e32fc24393ef418ed8

The skips have been for commits where the machine would not resume from suspend at all.

If anyone can suggest a narrow range of commits, or some path specs that might help, I'd be very grateful. This is soul-crushingly tedious :>
Did you say that 2.6.32 failed? If so, don't you want to mark that as bad instead of going all the way up to 2.6.35?

If you want a wild guess, I'd look at changes to drivers/acpi/ec.c
Hello, thank you for the suggestion. Yes, it was daft to start the bisection such a long way after an identified good commit. I re-started from 2.6.32.

I've hit a minor stumbling block, a run of commits which won't resume from suspend at all. I've been skipping those but it's slow progress.

After a few days of testing, I decided to save that progress and restart with just commits that touched that wild-guess path of yours. I also took the opportunity to trim my config down a lot (was using what was essentially an allmodconfig, optimising for "manually configuring time" at the expense of "huge number of compiles to try" time).

I've finished! The log:

git bisect start '--' 'drivers/acpi/ec.c'
# bad: [17d857be649a21ca90008c6dc425d849fa83db5c] Linux 2.6.32-rc1
git bisect bad 17d857be649a21ca90008c6dc425d849fa83db5c
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# skip: [3b87bb640e77023c97cf209e3dd85887a1113ad0] Merge branch 'bjorn-start-stop-2.6.32' into release
git bisect skip 3b87bb640e77023c97cf209e3dd85887a1113ad0
# skip: [2a84cb9852f52c0cd1c48bca41a8792d44ad06cc] ACPI: EC: Merge IRQ and POLL modes
git bisect skip 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc
# skip: [cf745ec7a1222a661b2c5f0e8c2c4be81300d2a4] ACPI: EC: remove .stop() method
git bisect skip cf745ec7a1222a661b2c5f0e8c2c4be81300d2a4
# skip: [d02be04707b8ff5375a76c027327e8708877da39] ACPI: EC: remove .start() method
git bisect skip d02be04707b8ff5375a76c027327e8708877da39
# good: [f25752e67d9d9ee7562ae9944314dd8c057d3fa2] ACPI: EC: Drop orphan comment
git bisect good f25752e67d9d9ee7562ae9944314dd8c057d3fa2
# good: [762caf0baafc657c410b9c04f4a95d4e3aa4dda1] Merge branch 'ec' into release
git bisect good 762caf0baafc657c410b9c04f4a95d4e3aa4dda1
# good: [eb27cae8adaa658a0bf31631baa1ce29d8183759] ACPI: linux/acpi.h should not include linux/dmi.h

Recording the last commit (which was good) as good:

$ git bisect good
Bisecting: -1 revisions left to test after this (roughly 0 steps)
[d26f0528d588e596955bf296a609afe52eafc099] Merge branch 'misc-2.6.32' into release

I'm just trying this zeroth step. I'm not sure where to go from there but I will probably try feeding the log into the wider-scoped bisection.
Ok I finished a fresh bisect (feeding in the skip/good/bad from the path-restricted one). Unfortunately I have got a nonsensical result:

17d857be649a21ca90008c6dc425d849fa83db5c is the first bad commit
commit 17d857be649a21ca90008c6dc425d849fa83db5c
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun Sep 27 14:57:48 2009 -0700

    Linux 2.6.32-rc1

:100644 100644 f908accd332b877338fdf92380bf52e3734f8cec 00444a8e304f04b67c9de2f29ea543912fd67f5d M Makefile

I'm going to have to go back and carefully check all the steps. One possible problem (apart from me falling asleep at the keyboard and mistakenly classifying a step, perhaps) is that there are two categories of commit that I had to skip: ones that would not resume from suspend at all, and ones which would not compile. It occurs to me that the latter were mostly in a module (edac I think) which is probably not relevant, and by tweaking my config I could probably eliminate those.

The bisect log in full:

git bisect start
# bad: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect bad 9fe6206f400646a2322096b56c59891d530e8d51
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# bad: [0b94190e1e60f96962b82d35729d7d44cf298ef8] viafb: fix LCD hardware cursor regression
git bisect bad 0b94190e1e60f96962b82d35729d7d44cf298ef8
# bad: [0969afcc449d5d655784c04e938cf4cfc6e89c0e] Merge branch 'twl4030-mfd' into for-2.6.33
git bisect bad 0969afcc449d5d655784c04e938cf4cfc6e89c0e
# skip: [7e17615c45980fc34d3f7d04bc7063cfc32180ec] MIPS: Get rid of duplicate cpu_idle() prototype.
git bisect skip 7e17615c45980fc34d3f7d04bc7063cfc32180ec
# skip: [0ae6654da437db4ae6333d232e718b570c7a3eac] sata_promise: disable hotplug on 1st gen chips
git bisect skip 0ae6654da437db4ae6333d232e718b570c7a3eac
# bad: [678ad5d8aaf8925cb8465f84e1e47d9b1284666a] /proc/kcore: fix stat.st_size
git bisect bad 678ad5d8aaf8925cb8465f84e1e47d9b1284666a
# skip: [a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4] Merge branch 'docs-next' of git://git.lwn.net/linux-2.6
git bisect skip a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4
# skip: [cf33ce15463b784a1d648905fc067fa4d6b17466] net: fix hydra printk format warning
git bisect skip cf33ce15463b784a1d648905fc067fa4d6b17466
# skip: [1ed0ce000a6c20c36ec649e32fc24393ef418ed8] KVM: Use pointer to vcpu instead of vcpu_id in timer code.
git bisect skip 1ed0ce000a6c20c36ec649e32fc24393ef418ed8
# skip: [b938fb6f491113880ebaabfa06c6446723c702fd] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect skip b938fb6f491113880ebaabfa06c6446723c702fd
# skip: [b1ab7e4b2a88d3ac13771463be8f302ce1616cfc] VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
git bisect skip b1ab7e4b2a88d3ac13771463be8f302ce1616cfc
# bad: [e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a] regulator: Add WM831x DC-DC buck convertor support
git bisect bad e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a
# skip: [d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git bisect skip d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b
# skip: [d22b8ed9a3b0a157b732580258ec16b729265953] Staging: vme: add Tundra TSI148 VME-PCI Bridge driver
git bisect skip d22b8ed9a3b0a157b732580258ec16b729265953
# skip: [e681c9dd62fe8fcc5bba28a3ca3f7dc8be940206] PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
git bisect skip e681c9dd62fe8fcc5bba28a3ca3f7dc8be940206
# skip: [e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a] regulator: Add WM831x DC-DC buck convertor support
git bisect skip e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a
# bad: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git bisect bad 22763c5cf3690a681551162c15d34d935308c8d7
# bad: [0b94190e1e60f96962b82d35729d7d44cf298ef8] viafb: fix LCD hardware cursor regression
git bisect bad 0b94190e1e60f96962b82d35729d7d44cf298ef8
# bad: [0969afcc449d5d655784c04e938cf4cfc6e89c0e] Merge branch 'twl4030-mfd' into for-2.6.33
git bisect bad 0969afcc449d5d655784c04e938cf4cfc6e89c0e
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# skip: [7e17615c45980fc34d3f7d04bc7063cfc32180ec] MIPS: Get rid of duplicate cpu_idle() prototype.
git bisect skip 7e17615c45980fc34d3f7d04bc7063cfc32180ec
# skip: [0ae6654da437db4ae6333d232e718b570c7a3eac] sata_promise: disable hotplug on 1st gen chips
git bisect skip 0ae6654da437db4ae6333d232e718b570c7a3eac
# skip: [7e17615c45980fc34d3f7d04bc7063cfc32180ec] MIPS: Get rid of duplicate cpu_idle() prototype.
git bisect skip 7e17615c45980fc34d3f7d04bc7063cfc32180ec
# skip: [0ae6654da437db4ae6333d232e718b570c7a3eac] sata_promise: disable hotplug on 1st gen chips
git bisect skip 0ae6654da437db4ae6333d232e718b570c7a3eac
# skip: [a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4] Merge branch 'docs-next' of git://git.lwn.net/linux-2.6
git bisect skip a9bbd210a44102cc50b30a5f3d111dbf5f2f9cd4
# skip: [cf33ce15463b784a1d648905fc067fa4d6b17466] net: fix hydra printk format warning
git bisect skip cf33ce15463b784a1d648905fc067fa4d6b17466
# skip: [1ed0ce000a6c20c36ec649e32fc24393ef418ed8] KVM: Use pointer to vcpu instead of vcpu_id in timer code.
git bisect skip 1ed0ce000a6c20c36ec649e32fc24393ef418ed8
# skip: [b938fb6f491113880ebaabfa06c6446723c702fd] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
git bisect skip b938fb6f491113880ebaabfa06c6446723c702fd
# skip: [b1ab7e4b2a88d3ac13771463be8f302ce1616cfc] VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
git bisect skip b1ab7e4b2a88d3ac13771463be8f302ce1616cfc
# skip: [d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git bisect skip d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b
# skip: [d22b8ed9a3b0a157b732580258ec16b729265953] Staging: vme: add Tundra TSI148 VME-PCI Bridge driver
git bisect skip d22b8ed9a3b0a157b732580258ec16b729265953
# skip: [e681c9dd62fe8fcc5bba28a3ca3f7dc8be940206] PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
git bisect skip e681c9dd62fe8fcc5bba28a3ca3f7dc8be940206
# skip: [e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a] regulator: Add WM831x DC-DC buck convertor support
git bisect skip e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a
# bad: [17d857be649a21ca90008c6dc425d849fa83db5c] Linux 2.6.32-rc1
git bisect bad 17d857be649a21ca90008c6dc425d849fa83db5c
# skip: [5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3] Staging: vt6656: Integrate vt6656 into build system.
git bisect skip 5d1fe0c98f2aef99fb57aaf6dd25e793c186cea3
# skip: [b81ad777b9ee66a69dd270a451c214b7e443a0c1] Staging: rtl8192su: remove CONFIG_RTL8192_PM ifdefs
git bisect skip b81ad777b9ee66a69dd270a451c214b7e443a0c1
# skip: [cf7474a6f4eda22603591b7d6253dffc224e4784] bnx2: Refine coalescing parameters.
git bisect skip cf7474a6f4eda22603591b7d6253dffc224e4784
# skip: [0a85b6f0ab0d2edb0d41b32697111ce0e4f43496] Staging: Comedi: Lindent changes to comdi driver in staging tree
git bisect skip 0a85b6f0ab0d2edb0d41b32697111ce0e4f43496
# skip: [c7b50db21fe8c295092518e224d60b95e69da3b0] vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write()
git bisect skip c7b50db21fe8c295092518e224d60b95e69da3b0
# skip: [945b4ac44e5700acd3d974c176c8ace34b4d2e8e] x86/amd-iommu: Dump illegal command on ILLEGAL_COMMAND_ERROR
git bisect skip 945b4ac44e5700acd3d974c176c8ace34b4d2e8e
# bad: [17d857be649a21ca90008c6dc425d849fa83db5c] Linux 2.6.32-rc1
git bisect bad 17d857be649a21ca90008c6dc425d849fa83db5c
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
# skip: [3b87bb640e77023c97cf209e3dd85887a1113ad0] Merge branch 'bjorn-start-stop-2.6.32' into release
git bisect skip 3b87bb640e77023c97cf209e3dd85887a1113ad0
# skip: [2a84cb9852f52c0cd1c48bca41a8792d44ad06cc] ACPI: EC: Merge IRQ and POLL modes
git bisect skip 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc
# skip: [cf745ec7a1222a661b2c5f0e8c2c4be81300d2a4] ACPI: EC: remove .stop() method
git bisect skip cf745ec7a1222a661b2c5f0e8c2c4be81300d2a4
# skip: [d02be04707b8ff5375a76c027327e8708877da39] ACPI: EC: remove .start() method
git bisect skip d02be04707b8ff5375a76c027327e8708877da39
# good: [f25752e67d9d9ee7562ae9944314dd8c057d3fa2] ACPI: EC: Drop orphan comment
git bisect good f25752e67d9d9ee7562ae9944314dd8c057d3fa2
# good: [762caf0baafc657c410b9c04f4a95d4e3aa4dda1] Merge branch 'ec' into release
git bisect good 762caf0baafc657c410b9c04f4a95d4e3aa4dda1
# good: [eb27cae8adaa658a0bf31631baa1ce29d8183759] ACPI: linux/acpi.h should not include linux/dmi.h
git bisect good eb27cae8adaa658a0bf31631baa1ce29d8183759
# good: [d26f0528d588e596955bf296a609afe52eafc099] Merge branch 'misc-2.6.32' into release
git bisect good d26f0528d588e596955bf296a609afe52eafc099
# good: [be90a49ca22a95f184d9f32d35b5247b44032849] Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
git bisect good be90a49ca22a95f184d9f32d35b5247b44032849
# good: [a487b6705a811087c182c8cab7e3b5845dfa6ccb] Merge branch 'for-linus' of git://neil.brown.name/md
git bisect good a487b6705a811087c182c8cab7e3b5845dfa6ccb
# good: [b9b9df62e7fd6b5f099c24bc867100ab86e1da5a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
git bisect good b9b9df62e7fd6b5f099c24bc867100ab86e1da5a
# good: [66b7ed40aaf153d634aabff409a0dda675f37f45] ACPI: remove redundant "handle" and "parent" arguments
git bisect good 66b7ed40aaf153d634aabff409a0dda675f37f45
# good: [76e0134f4154aeadac833c2daea32102c64c0bb0] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git bisect good 76e0134f4154aeadac833c2daea32102c64c0bb0
# good: [3b383767c41be070cae24875789d97b42a3e71a8] Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good 3b383767c41be070cae24875789d97b42a3e71a8
# good: [cce1d9f23213f3a8a43b6038df84a665aa8d8612] Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
git bisect good cce1d9f23213f3a8a43b6038df84a665aa8d8612
# good: [e56d953d190061938b31cabbe01b7f3d76c60bd0] ACPI: IA64=y ACPI=n build fix
git bisect good e56d953d190061938b31cabbe01b7f3d76c60bd0
# good: [6f5071020d5ec89b5d095aa488db604adb921aec] Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good 6f5071020d5ec89b5d095aa488db604adb921aec
# good: [569ec4cc779c8aae03a4659939d08822c9e4a242] ACPI: kill "unused variable ‘i’" warning
git bisect good 569ec4cc779c8aae03a4659939d08822c9e4a242
# good: [b3b75cef705708402b5d381a30fa17f89e0549b4] alpha: Fix duplicate <asm/thread_info.h> include
git bisect good b3b75cef705708402b5d381a30fa17f89e0549b4
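A side note for anyone replaying this log: it records e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a once as bad and later as skip, and several entries appear more than once because earlier logs were fed back in. The following is a minimal Python sketch, not part of the original report, that scans text in the format produced by "git bisect log" and lists commits that were given more than one verdict. The function names and the sample string are illustrative assumptions, and whether the conflicting mark contributed to the odd result is not established here.

```python
import re
from collections import defaultdict

# Matches the comment lines that "git bisect log" emits, for example:
#   # bad: [0b94190e...] viafb: fix LCD hardware cursor regression
VERDICT_RE = re.compile(r"#\s*(good|bad|skip):\s*\[([0-9a-f]{40})\]")


def verdicts_by_commit(log_text):
    """Map each commit hash to the set of verdicts recorded for it."""
    seen = defaultdict(set)
    for verdict, sha in VERDICT_RE.findall(log_text):
        seen[sha].add(verdict)
    return dict(seen)


def conflicting(log_text):
    """Return {sha: verdicts} for commits marked in more than one way."""
    return {
        sha: verdicts
        for sha, verdicts in verdicts_by_commit(log_text).items()
        if len(verdicts) > 1
    }


# Hypothetical excerpt built from entries that appear in the log above.
sample = """\
# bad: [e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a] regulator: Add WM831x DC-DC buck convertor support
git bisect bad e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a
# skip: [e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a] regulator: Add WM831x DC-DC buck convertor support
git bisect skip e4ee831f949a7c7746a56bcf1e7ca057d6f69e2a
# good: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
git bisect good 74fca6a42863ffacaf7ba6f1936a9f228950f657
"""

for sha, verdicts in conflicting(sample).items():
    print(sha[:12], sorted(verdicts))
```

Because the pattern is matched with findall rather than line by line, the same function also works on a log whose line breaks were lost in transit.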
Please attach the output of acpidump.
Created attachment 32112: output of acpidump

This was generated after booting 2.6.36-rc5; no suspending has been attempted yet.
re: bisect failure Note that you can limit the scope of the bisect, say to drivers/acpi/ and that might help.
Jon, there is no ACPI fan device on this laptop, which means that the fan is controlled either by a native driver or by the BIOS.

(In reply to comment #0)
> With (at least) 2.6.35.4, my CPU fan runs at full speed after resuming from
> suspend and can not be convinced to climb back down.

What did you do to spin the fan?
Hi Zhang,

I am not using a laptop; as per #1, my machine is a desktop PC with an Intel DG35EC motherboard, quad core 2 duo CPU, sblive PCI sound card, ATI graphics card of some sort (quite old).

> what did you do to spin the fan?

I did nothing more than wake the machine from sleep. The fans are on full blast as soon as the machine powers up from sleep.
I have finally revisited this. The bisect results are patently absurd, since they point at 2.6.32-rc1 (and rc2) as the bad commit, but all that modifies is the version definitions in the Makefile. I rebuilt and tested the top-most good and bottom-most bad commits from the bisect, which are 2.6.32-rc1 and its immediate child. Considering that perhaps the problem was more erratic than I thought, I tried 5 successive suspend/resume cycles. I also tried 2.6.37-rc5+ or HEAD (probably HEAD). All attempts were successful: the fans wound down as they should. All tests were performed from single user mode using "pm-suspend". I would have tried more than 5 attempts for each if I had any negatives. However, moving to multi-user mode from 2.6.32-rc2 in order to write this message, an attempt to resume from suspend initiated from the GNOME menu has the problem occurring. Anyway, this would seem to absolve the kernel at least. Thanks to everyone for your help.
Please attach the output of "sensors".
$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +58.0°C (high = +84.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 2: +61.0°C (high = +84.0°C, crit = +100.0°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 1: +60.0°C (high = +84.0°C, crit = +100.0°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3: +65.0°C (high = +84.0°C, crit = +100.0°C)

w83627dhg-isa-06e0
Adapter: ISA adapter
Vcore: +0.92 V (min = +0.00 V, max = +1.74 V)
in1: +1.55 V (min = +1.22 V, max = +0.95 V) ALARM
AVCC: +3.04 V (min = +1.41 V, max = +2.75 V) ALARM
VCC: +3.04 V (min = +1.54 V, max = +1.28 V) ALARM
in4: +1.01 V (min = +0.31 V, max = +1.45 V)
in5: +1.38 V (min = +0.10 V, max = +0.54 V) ALARM
in6: +0.34 V (min = +1.43 V, max = +0.76 V) ALARM
3VSB: +3.04 V (min = +1.31 V, max = +0.14 V) ALARM
Vbat: +3.04 V (min = +2.70 V, max = +0.06 V) ALARM
fan1: 811 RPM (min = 3515 RPM, div = 16) ALARM
fan2: 1360 RPM (min = 615 RPM, div = 16)
fan3: 0 RPM (min = 254 RPM, div = 64) ALARM
fan4: 0 RPM (min = 7031 RPM, div = 16) ALARM
temp1: +85.0°C (high = +92.0°C, hyst = +88.0°C) sensor = thermistor
temp2: +74.0°C (high = +80.0°C, hyst = +75.0°C) sensor = diode
temp3: -128.0°C (high = +80.0°C, hyst = +75.0°C) sensor = diode
cpu0_vid: +0.000 V

In other news, I have just had a successful resume with 2.6.36-rc5 and radeon.modeset=0. Just one; I need to try another four.
Jon, do you have any software-driven fan speed control daemon running by any chance? For example lm-sensors's fancontrol script?

When the fans go to full speed, does it reflect in the RPM values shown by "sensors"?

Please attach the output of:

$ (cd /sys/devices/platform/w83627* && grep . *)

both when the fans are running at normal speed and when the fans are running at full speed.
I don't have any software-driven fan speed control scripts/daemons running. I've managed to resume and have the fans at their proper speed five times now with 2.6.36, radeon.modeset=0, and removing that kopt has the fans at full speed again. Glad we're starting to get somewhere... I'm just diffing sensors output before/after suspend/resume, I'll report back. I'll attach this now (cd /sys/devices/platform/w83627* && grep . *) > fans_full_speed.txt
Created attachment 40662: /sys/devices/platform/w83627ehf.1760 output

Output of (cd /sys/devices/platform/w83627* && grep . *) after resume with 2.6.36, radeon KMS, and fans running full speed.
Created attachment 40722: /sys/devices/platform/w83627* output, before suspend, fans low
Created attachment 40732: /sys/devices/platform/w83627* output, fans high, post resume
Inline, diff of "sensors" output, before and after suspend (fans low before, fans high after). If anything, the reported RPM values are lower after resume.

--- before 2010-12-18 11:10:18.000000000 +0000
+++ after 2010-12-18 16:34:57.000000000 +0000
@@ -1,42 +1,42 @@
 lm63-i2c-1-4c
 Adapter: Radeon i2c bit bus 0x90
-temp1: +56.0°C (high = +70.0°C)
-temp2: +62.9°C (low = +0.0°C, high = +70.0°C)
- (crit = +100.0°C, hyst = +95.0°C)
+temp1: +44.0°C (high = +70.0°C)
+temp2: +40.8°C (low = +0.0°C, high = +70.0°C)
+ (crit = +85.0°C, hyst = +75.0°C)

 w83627dhg-isa-06e0
 Adapter: ISA adapter
-Vcore: +0.96 V (min = +0.00 V, max = +1.74 V)
-in1: +1.54 V (min = +1.22 V, max = +0.95 V) ALARM
-AVCC: +3.04 V (min = +1.41 V, max = +2.75 V) ALARM
-VCC: +3.02 V (min = +1.54 V, max = +1.30 V) ALARM
-in4: +0.99 V (min = +0.31 V, max = +1.45 V)
-in5: +1.37 V (min = +0.36 V, max = +0.54 V) ALARM
-in6: +0.36 V (min = +1.43 V, max = +0.76 V) ALARM
+Vcore: +0.92 V (min = +0.00 V, max = +1.74 V)
+in1: +1.55 V (min = +1.22 V, max = +0.95 V) ALARM
+AVCC: +3.04 V (min = +1.44 V, max = +2.77 V) ALARM
+VCC: +3.04 V (min = +1.54 V, max = +3.33 V)
+in4: +1.00 V (min = +0.31 V, max = +1.51 V)
+in5: +1.38 V (min = +0.10 V, max = +0.67 V) ALARM
+in6: +0.35 V (min = +1.46 V, max = +1.78 V) ALARM
 3VSB: +3.04 V (min = +1.33 V, max = +0.14 V) ALARM
-Vbat: +3.02 V (min = +2.70 V, max = +0.06 V) ALARM
-fan1: 1339 RPM (min = 3515 RPM, div = 16) ALARM
-fan2: 2057 RPM (min = 615 RPM, div = 16)
+Vbat: +3.60 V (min = +2.70 V, max = +0.06 V) ALARM
+fan1: 1068 RPM (min = 390 RPM, div = 16)
+fan2: 1534 RPM (min = 615 RPM, div = 16)
 fan3: 0 RPM (min = 257 RPM, div = 128) ALARM
-fan4: 0 RPM (min = 42187 RPM, div = 32) ALARM
-temp1: +82.0°C (high = +94.0°C, hyst = +120.0°C) sensor = thermistor
-temp2: +84.0°C (high = +80.0°C, hyst = +75.0°C) ALARM sensor = diode
-temp3: -94.0°C (high = +80.0°C, hyst = +75.0°C) sensor = diode
+fan4: 0 RPM (min = 10546 RPM, div = 128) ALARM
+temp1: +83.0°C (high = +94.0°C, hyst = +88.0°C) sensor = thermistor
+temp2: +76.0°C (high = +80.0°C, hyst = +75.0°C) sensor = diode
+temp3: -76.5°C (high = +80.0°C, hyst = +75.0°C) sensor = diode
 cpu0_vid: +0.000 V

 coretemp-isa-0000
 Adapter: ISA adapter
-Core 0: +61.0°C (high = +84.0°C, crit = +100.0°C)
+Core 0: +60.0°C (high = +84.0°C, crit = +100.0°C)

 coretemp-isa-0001
 Adapter: ISA adapter
-Core 2: +70.0°C (high = +84.0°C, crit = +100.0°C)
+Core 2: +66.0°C (high = +84.0°C, crit = +100.0°C)

 coretemp-isa-0002
 Adapter: ISA adapter
-Core 1: +66.0°C (high = +84.0°C, crit = +100.0°C)
+Core 1: +62.0°C (high = +84.0°C, crit = +100.0°C)

 coretemp-isa-0003
 Adapter: ISA adapter
-Core 3: +75.0°C (high = +84.0°C, crit = +100.0°C)
+Core 3: +68.0°C (high = +84.0°C, crit = +100.0°C)
We can see that many values change after resume. For input values, slight differences are expected, either because the machine had time to cool down (temperatures) or simply because there are always some variations in monitored values (voltages). I am a little curious about Vbat though, as +3.60 V seems impossible. Does Vbat stick to this value forever after resuming, or does it return to a more reasonable reading after some time?

For fans, the situation is different: the reported speeds are significantly lower after resume. This could be explained by the fact that the W83627DHG is programmed to adjust the fan speeds depending on temperature.

The changing limits are a bug: apparently the limit registers aren't preserved during suspend, so the driver should save them and restore them at resume time. I can write a patch doing that if you are interested in testing it.

That being said, this will not solve your problem. The readings from "sensors" are pretty clear that the noisy fan is neither the system fan nor the CPU fan. And the fact that booting with radeon.modeset=0 solves the problem clearly points to the graphics adapter's fan. Oddly enough, the LM63 monitoring chip on that adapter reports only temperature and not the fan speed (the fan speed monitoring pin must have been configured for its alternate function, which is alert output). It is possible that the LM63 controls the speed of the fan.

Please report the output of:

$ (cd /sys/bus/i2c/devices/1-004c/ && grep . *)

before and after a "failed" suspend/resume.
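On the limit registers not being preserved: the proper fix belongs in the driver, as described above. Purely as an illustration of the save-and-restore idea, here is a hedged userspace sketch in Python. It assumes the limits are exposed as files ending in _min, _max, _crit or _crit_hyst in a hwmon-style directory and that they are writable (many are read-only or need root), and it runs against a temporary directory rather than real sysfs. All function names are hypothetical.

```python
import os
import tempfile

# Assumption: attribute files with these suffixes hold limits, not readings.
LIMIT_SUFFIXES = ("_min", "_max", "_crit", "_crit_hyst")


def snapshot_limits(device_dir):
    """Read every limit-like attribute in device_dir into a dict."""
    saved = {}
    for name in sorted(os.listdir(device_dir)):
        if name.endswith(LIMIT_SUFFIXES):
            with open(os.path.join(device_dir, name)) as f:
                saved[name] = f.read().strip()
    return saved


def restore_limits(device_dir, saved):
    """Write back any saved limit whose current value differs.

    Returns the names that were rewritten. Attributes that cannot be
    read or written (read-only, missing, no permission) are skipped.
    """
    rewritten = []
    for name, value in saved.items():
        path = os.path.join(device_dir, name)
        try:
            with open(path) as f:
                current = f.read().strip()
            if current != value:
                with open(path, "w") as f:
                    f.write(value)
                rewritten.append(name)
        except OSError:
            continue
    return rewritten


def demo():
    """Simulate a suspend/resume that resets two limits to other values."""
    with tempfile.TemporaryDirectory() as d:
        def put(name, value):
            with open(os.path.join(d, name), "w") as f:
                f.write(value + "\n")

        put("temp2_crit", "100000")
        put("temp2_crit_hyst", "95000")
        put("temp2_max", "70000")
        put("temp2_input", "63000")   # a reading, so it is not saved
        saved = snapshot_limits(d)

        put("temp2_crit", "85000")    # values as if reset on resume
        put("temp2_crit_hyst", "75000")
        rewritten = restore_limits(d, saved)
        return saved, rewritten, snapshot_limits(d)


saved, rewritten, final = demo()
print("restored:", rewritten)
```

On a real system the snapshot would be taken before suspend and the restore run after resume, for instance from a suspend hook; that wiring is left out here because it depends on the distribution.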
If you have the possibility to build a kernel with CONFIG_HWMON_DEBUG_CHIP=y, it would also be interesting to see what the lm63 driver says when loaded.
> Please report the output of:
> $ (cd /sys/bus/i2c/devices/1-004c/ && grep . *)
> before and after a "failed" suspend/resume.

Before:

$ (cd /sys/bus/i2c/devices/1-004c/ && grep . *)
alarms:0
modalias:i2c:lm63
name:lm63
pwm1:12
pwm1_enable:2
temp1_input:56000
temp1_max:70000
temp1_max_alarm:0
temp2_crit:100000
temp2_crit_alarm:0
temp2_crit_hyst:95000
temp2_fault:0
temp2_input:63000
temp2_max:70000
temp2_max_alarm:0
temp2_min:0
temp2_min_alarm:0
uevent:DRIVER=lm63
uevent:MODALIAS=i2c:lm63

After:

$ (cd /sys/bus/i2c/devices/1-004c/ && grep . *)
alarms:0
modalias:i2c:lm63
name:lm63
pwm1:0
pwm1_enable:2
temp1_input:55000
temp1_max:70000
temp1_max_alarm:0
temp2_crit:85000
temp2_crit_alarm:0
temp2_crit_hyst:75000
temp2_fault:0
temp2_input:51875
temp2_max:70000
temp2_max_alarm:0
temp2_min:0
temp2_min_alarm:0
uevent:DRIVER=lm63
uevent:MODALIAS=i2c:lm63

Kernel: 2.6.36-rc5 (one of the latest ones I had pre-built that works enough to get online)
> If you have the possibility to build a kernel with CONFIG_HWMON_DEBUG_CHIP=y,
> it would also be interesting to see what the lm63 driver says when loaded.

I will try that, thanks.

> That being said, this will not solve your problem. The readings from "sensors"
> are pretty clear that the noisy fan is neither the system fan nor the CPU fan.
> And the fact that booting with radeon.modeset=0 solves the problem clearly
> points to the graphics adapter's fan

The radeon card is a relatively recent addition to the machine (within the last 18 months). I've had the machine about three years. I'm pretty sure I remember the loud fan (that remains on post suspend) from before I put the card in, but I will try taking the card out and see what happens, to be sure.
Kernel: 2.6.37-rc7+ (e819eb8687767cefca7b6abf5ac6d5efcf581eeb)
CONFIG_HWMON_DEBUG_CHIP=y

no output on stderr or kernel ring buffer for "modprobe lm63"

output of "grep . *" in /sys/bus/i2c/drivers/lm63/1-004c, before and after a suspend/resume cycle.

before:

alarms:0
modalias:i2c:lm63
name:lm63
pwm1:12
pwm1_enable:2
temp1_input:56000
temp1_max:70000
temp1_max_alarm:0
temp2_crit:100000
temp2_crit_alarm:0
temp2_crit_hyst:95000
temp2_fault:0
temp2_input:61875
temp2_max:70000
temp2_max_alarm:0
temp2_min:0
temp2_min_alarm:0
uevent:DRIVER=lm63
uevent:MODALIAS=i2c:lm63

after:

alarms:0
modalias:i2c:lm63
name:lm63
pwm1:0
pwm1_enable:2
temp1_input:54000
temp1_max:70000
temp1_max_alarm:0
temp2_crit:85000
temp2_crit_alarm:0
temp2_crit_hyst:75000
temp2_fault:0
temp2_input:57125
temp2_max:70000
temp2_max_alarm:0
temp2_min:0
temp2_min_alarm:0
uevent:DRIVER=lm63
uevent:MODALIAS=i2c:lm63

The fan (whichever it is) is on full-speed after suspend.
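Comparing two such dumps by eye is error-prone, so here is a small Python sketch (an illustration only, not something used in this report) that parses the "name:value" lines printed by "grep . *" and reports which attributes differ. The helper names are made up for this example, and the two sample strings are a subset of the before and after values quoted above.

```python
def parse_attrs(dump):
    """Parse 'name:value' lines, as printed by `grep . *`, into a dict."""
    attrs = {}
    for line in dump.splitlines():
        name, sep, value = line.strip().partition(":")
        # uevent spans several lines with the same name, so it is skipped.
        if sep and name != "uevent":
            attrs[name] = value
    return attrs


def changed(before, after):
    """Return {name: (old, new)} for every attribute whose value differs."""
    b, a = parse_attrs(before), parse_attrs(after)
    return {
        name: (b.get(name), a.get(name))
        for name in sorted(b.keys() | a.keys())
        if b.get(name) != a.get(name)
    }


BEFORE = """\
modalias:i2c:lm63
pwm1:12
pwm1_enable:2
temp1_input:56000
temp1_max:70000
temp2_crit:100000
temp2_crit_hyst:95000
temp2_input:61875
uevent:DRIVER=lm63
"""

AFTER = """\
modalias:i2c:lm63
pwm1:0
pwm1_enable:2
temp1_input:54000
temp1_max:70000
temp2_crit:85000
temp2_crit_hyst:75000
temp2_input:57125
uevent:DRIVER=lm63
"""

diff = changed(BEFORE, AFTER)
for name, (old, new) in diff.items():
    print(f"{name}: {old} -> {new}")

# Readings (*_input) are expected to drift; the rest are settings or limits.
settings_changed = [n for n in diff if not n.endswith("_input")]
print("non-measurement attributes that changed:", settings_changed)
```

Filtering out the *_input attributes leaves pwm1, temp2_crit and temp2_crit_hyst, which are the values that differ without being live measurements.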
sensors output, post resume with 2.6.37-rc7+ (e819eb8687767cefca7b6abf5ac6d5efcf581eeb) and CONFIG_HWMON_DEBUG_CHIP=y, after "modprobe w83627ehf" (did I mention I had to manually probe that?)

$ sensors
lm63-i2c-1-4c
Adapter: Radeon i2c bit bus 0x90
temp1: +50.0°C (high = +70.0°C)
temp2: +45.8°C (low = +0.0°C, high = +70.0°C)
 (crit = +85.0°C, hyst = +75.0°C)

w83627dhg-isa-06e0
Adapter: ISA adapter
Vcore: +0.89 V (min = +0.00 V, max = +1.74 V)
in1: +1.55 V (min = +1.22 V, max = +1.98 V)
AVCC: +3.04 V (min = +1.44 V, max = +2.77 V) ALARM
VCC: +3.04 V (min = +1.54 V, max = +1.28 V) ALARM
in4: +1.00 V (min = +0.31 V, max = +1.53 V)
in5: +1.38 V (min = +0.10 V, max = +0.69 V) ALARM
in6: +0.35 V (min = +1.46 V, max = +1.78 V) ALARM
3VSB: +3.04 V (min = +1.33 V, max = +0.14 V) ALARM
Vbat: +3.02 V (min = +2.70 V, max = +0.06 V) ALARM
fan1: 869 RPM (min = 390 RPM, div = 16)
fan2: 1360 RPM (min = 615 RPM, div = 16)
fan3: 0 RPM (min = 254 RPM, div = 64) ALARM
fan4: 0 RPM (min = 6490 RPM, div = 16) ALARM
temp1: +83.0°C (high = +94.0°C, hyst = +88.0°C) sensor = thermistor
temp2: +72.0°C (high = +80.0°C, hyst = +75.0°C) sensor = diode
temp3: -74.0°C (high = +80.0°C, hyst = +75.0°C) sensor = diode
cpu0_vid: +0.000 V

I'll give it a little while and then check to see if VBat changes, as requested.
VBat has held at +3.02 V.
Well, my memory clearly has failed me -- I've taken the radeon out temporarily and immediately on boot I notice that the noisy fan is gone - so it *is* the radeon fan. Noise levels are pretty much the same before and after resume (same kernel as last few comments, e819eb8687767cefca7b6abf5ac6d5efcf581eeb).

I've taken the liberty of changing the bug state to "NEEDINFO" rather than "INVALID", I hope that's ok...
Specifics about the radeon card: (via Xorg.0.log)

(--) RADEON(0): Chipset: "ATI Radeon X850 XT (R480) (PCIE)" (ChipID = 0x5d52)
...
(II) AIGLX: Loaded and initialized /usr/lib/dri/r300_dri.so

Xorg.0.log output after resume (with fan on full):

(II) AIGLX: Suspending AIGLX clients for VT switch
(II) Open ACPI successful (/var/run/acpid.socket)
(II) AIGLX: Resuming AIGLX clients after VT switch
(II) RADEON(0): EDID vendor "IVM", prod id 22027
(II) RADEON(0): Using hsync ranges from config file
(II) RADEON(0): Using vrefresh ranges from config file
(II) RADEON(0): Printing DDC gathered Modelines:
(II) RADEON(0): Modeline "1920x1080"x0.0 148.50 1920 2008 2052 2200 1080 1084 1089 1125 +hsync +vsync (67.5 kHz)
(II) RADEON(0): Modeline "800x600"x0.0 40.00 800 840 968 1056 600 601 605 628 +hsync +vsync (37.9 kHz)
(II) RADEON(0): Modeline "800x600"x0.0 36.00 800 824 896 1024 600 601 603 625 +hsync +vsync (35.2 kHz)
(II) RADEON(0): Modeline "640x480"x0.0 31.50 640 656 720 840 480 481 484 500 -hsync -vsync (37.5 kHz)
(II) RADEON(0): Modeline "640x480"x0.0 31.50 640 664 704 832 480 489 492 520 -hsync -vsync (37.9 kHz)
(II) RADEON(0): Modeline "640x480"x0.0 30.24 640 704 768 864 480 483 486 525 -hsync -vsync (35.0 kHz)
(II) RADEON(0): Modeline "640x480"x0.0 25.18 640 656 752 800 480 490 492 525 -hsync -vsync (31.5 kHz)
(II) RADEON(0): Modeline "720x400"x0.0 28.32 720 738 846 900 400 412 414 449 -hsync +vsync (31.5 kHz)
(II) RADEON(0): Modeline "1280x1024"x0.0 135.00 1280 1296 1440 1688 1024 1025 1028 1066 +hsync +vsync (80.0 kHz)
(II) RADEON(0): Modeline "1024x768"x0.0 78.75 1024 1040 1136 1312 768 769 772 800 +hsync +vsync (60.0 kHz)
(II) RADEON(0): Modeline "1024x768"x0.0 75.00 1024 1048 1184 1328 768 771 777 806 -hsync -vsync (56.5 kHz)
(II) RADEON(0): Modeline "1024x768"x0.0 65.00 1024 1048 1184 1344 768 771 777 806 -hsync -vsync (48.4 kHz)
(II) RADEON(0): Modeline "832x624"x0.0 57.28 832 864 928 1152 624 625 628 667 -hsync -vsync (49.7 kHz)
(II) RADEON(0): Modeline "800x600"x0.0 49.50 800 816 896 1056 600 601 604 625 +hsync +vsync (46.9 kHz)
(II) RADEON(0): Modeline "800x600"x0.0 50.00 800 856 976 1040 600 637 643 666 +hsync +vsync (48.1 kHz)
(II) RADEON(0): Modeline "1152x864"x0.0 108.00 1152 1216 1344 1600 864 865 868 900 +hsync +vsync (67.5 kHz)
(II) RADEON(0): Modeline "1280x960"x0.0 108.00 1280 1376 1488 1800 960 961 964 1000 +hsync +vsync (60.0 kHz)
(II) RADEON(0): Modeline "1280x1024"x0.0 108.00 1280 1328 1440 1688 1024 1025 1028 1066 +hsync +vsync (64.0 kHz)
(II) RADEON(0): Modeline "1440x900"x0.0 88.75 1440 1488 1520 1600 900 903 909 926 +hsync -vsync (55.5 kHz)
(II) RADEON(0): Modeline "1440x900"x0.0 136.75 1440 1536 1688 1936 900 903 909 942 -hsync +vsync (70.6 kHz)
(II) RADEON(0): Modeline "1680x1050"x0.0 119.00 1680 1728 1760 1840 1050 1053 1059 1080 +hsync -vsync (64.7 kHz)

dmesg output post suspend, grepping for radeon

[ 3118.283789] Back to C!
snip (to establish resume tick)
[ 3118.615004] radeon 0000:01:00.0: restoring config space at offset 0xf (was 0x1ff, writing 0x10b)
[ 3118.615010] radeon 0000:01:00.0: restoring config space at offset 0xc (was 0x0, writing 0xfffe0000)
[ 3118.615016] radeon 0000:01:00.0: restoring config space at offset 0x8 (was 0x1, writing 0x3001)
[ 3118.615020] radeon 0000:01:00.0: restoring config space at offset 0x6 (was 0x4, writing 0x90100004)
[ 3118.615025] radeon 0000:01:00.0: restoring config space at offset 0x4 (was 0xc, writing 0x8000000c)
[ 3118.615029] radeon 0000:01:00.0: restoring config space at offset 0x3 (was 0x800000, writing 0x800010)
[ 3118.615033] radeon 0000:01:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
[ 3118.615917] radeon 0000:01:00.0: setting latency timer to 64
[ 3118.615926] radeon 0000:01:00.0: f6c15800 unpin not necessary
[ 3118.643391] [drm] radeon: 4 quad pipes, 1 z pipes initialized.
[ 3118.643395] radeon 0000:01:00.0: WB enabled
[ 3118.643415] [drm] radeon: ring at 0x0000000060001000
Jon, sorry for the long silence, I was on vacation.

(In reply to comment #23)
> Kernel: 2.6.37-rc7+ (e819eb8687767cefca7b6abf5ac6d5efcf581eeb)
> CONFIG_HWMON_DEBUG_CHIP=y
>
> no output on stderr or kernel ring buffer for "modprobe lm63"

Most probably because the lm63 module was already loaded. Try "dmesg | grep lm63" after boot, or "rmmod lm63 && modprobe lm63" and look again.

> output of "grep . *" in /sys/bus/i2c/drivers/lm63/1-004c, before and after a
> suspend/resume cycle.
>
> before:
>
> pwm1:12
> temp2_crit:100000
> temp2_crit_hyst:95000
>
> after:
>
> pwm1:0
> temp2_crit:85000
> temp2_crit_hyst:75000

As you can see, the critical limit for the remote sensor of the LM63 on your graphics adapter changed. 85/75°C are the hardware defaults. The lm63 driver doesn't currently preserve limits over suspend/resume.

That being said, I'm unsure if this explains your actual problem, as the measured temperature is still below the new limit, so that wouldn't cause the fan to kick in. But it might be a similar problem, for example with the automatic fan speed lookup table. The lm63 driver doesn't support this feature yet.

(In reply to comment #26)
> Well, my memory clearly has failed me -- I've taken the radeon out temporarily
> and immediately on boot I notice that the noisy fan is gone - so it *is* the
> radeon fan.

I knew it :p

> I've taken the liberty of changing the bug state to "NEEDINFO" rather than
> "INVALID", I hope that's ok...

Not really. Your problem is entirely different from the original bug. This isn't a regression, and you don't have an Intel DG35EC motherboard. So, now that we have clarified what your actual problem is, it would be much better if you would create a _new_, clean bug in the right section, so that the right people (radeon KMS driver maintainers) start looking into it.

Feel free to include me in the Cc list, in case I can help with the lm63 driver.
> Not really. Your problem is entirely different from the original bug.
> This isn't a regression, and you don't have an Intel DG35EC motherboard.

I do have that motherboard, but yes, it proved to be irrelevant, I see.

> So, now that we have clarified what your actual problem is, it would be
> much better if you would create a _new_, clean bug in the right section,
> so that the right people (radeon KMS driver maintainers) start looking
> into it.

Thank you for the suggestion. I didn't realise how you folks handle these things. In my community (Debian), we'd treat the bug as the constant, and alter what it was filed against as we clarified which component was responsible. It's simply a different way of doing things, I see.

Thank you for all your help.