Newer kernels read the temperature of my Radeon card and report that it runs at around 80 C on an idle desktop. For comparison, my CPU is at only 34 C.
To make sure nothing in the desktop was causing it, I booted the kernel with init=/bin/bash, and the temperature still rose to 83 C. My guess is that this is a defect in the firmware, or in the driver's interface to it, and that the firmware is running in an infinite loop instead of going idle.
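A minimal sketch of reading that temperature from userspace, assuming the radeon driver exposes it through the standard hwmon sysfs ABI: a `name` file containing `radeon` and a `temp1_input` file in millidegrees Celsius. It also assumes that some older kernels put these files under `hwmonN/device/` rather than directly in `hwmonN/`, so both locations are checked. The function names are illustrative only.

```python
import glob
import os


def read_millidegrees(path):
    """Read a hwmon temp*_input file (millidegrees Celsius) and return degrees C."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def find_radeon_temps(sysfs_root="/sys/class/hwmon"):
    """Map each hwmon directory whose 'name' is 'radeon' to its temp1_input in C."""
    temps = {}
    for hwmon in sorted(glob.glob(os.path.join(sysfs_root, "hwmon*"))):
        # Assumption: attributes live in hwmonN/ on newer kernels and in
        # hwmonN/device/ on some older ones, so check both locations.
        for base in (hwmon, os.path.join(hwmon, "device")):
            name_file = os.path.join(base, "name")
            temp_file = os.path.join(base, "temp1_input")
            if not (os.path.isfile(name_file) and os.path.isfile(temp_file)):
                continue
            with open(name_file) as f:
                is_radeon = f.read().strip() == "radeon"
            if is_radeon:
                temps[hwmon] = read_millidegrees(temp_file)
                break
    return temps
```

Calling `find_radeon_temps()` with no argument scans the live `/sys/class/hwmon`; passing another root makes it easy to test against a copied directory tree.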
What kernel versions were OK? 2.6.37?
It isn't a regression. Older kernels did not report the temperature at all. I noticed an odd pattern today as well. After a cold boot, the temperature runs up to 80+, but after suspending and resuming, it remains around 66.
Is this the same as bug #29572?
No.
Rafael, why is this marked as a regression? The reporter explicitly stated it was not.
Presumably by mistake. Sorry.
My system is also affected; I noticed it after upgrading to Ubuntu Natty (kernel 2.6.38-8.41-generic). With no load, my video card reports 82 degrees Celsius. The graphics card is an ATI Technologies Inc Barts PRO [ATI Radeon HD 6800 Series]. I'm using the radeon kernel module with the radeondrmfb framebuffer device. I think it's related to putting the console in framebuffer mode; the card is quiet in text mode.
# echo low > /sys/class/drm/card0/device/power_profile
"resolves" the issue. The default KMS power-management settings put the card in high-performance mode on AC power.
# echo dynpm > /sys/class/drm/card0/device/power_method
Dynamic frequency scaling may work for you, though the screen flashes when the power level changes, and the card still seems to run too hot. I'm sticking to low for most purposes. This page was very helpful: https://wiki.archlinux.org/index.php?title=ATI&oldid=135045
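The same settings can be applied from a script. This is a minimal sketch, assuming the radeon KMS `power_method` and `power_profile` attributes used above and the values this driver generation accepts (`profile` or `dynpm` for the method; `default`, `auto`, `low`, `mid`, `high` for the profile, which only matters under the `profile` method). `set_radeon_power` and `read_radeon_power` are hypothetical helper names. Both arguments are validated before anything is written, so a typo cannot leave the card half-configured.

```python
import os

# Values accepted by the radeon KMS power-management interface of this era.
POWER_METHODS = ("profile", "dynpm")
POWER_PROFILES = ("default", "auto", "low", "mid", "high")


def _write_attr(path, value):
    # Mirrors `echo value > path`, which also appends a newline.
    with open(path, "w") as f:
        f.write(value + "\n")


def _read_attr(path):
    with open(path) as f:
        return f.read().strip()


def set_radeon_power(device_dir, method="profile", profile="low"):
    """Select a power method and, for the "profile" method, a power profile."""
    if method not in POWER_METHODS:
        raise ValueError(f"unknown power_method: {method!r}")
    if method == "profile" and profile not in POWER_PROFILES:
        raise ValueError(f"unknown power_profile: {profile!r}")
    _write_attr(os.path.join(device_dir, "power_method"), method)
    if method == "profile":
        _write_attr(os.path.join(device_dir, "power_profile"), profile)


def read_radeon_power(device_dir):
    """Return (power_method, power_profile) as currently reported."""
    return (_read_attr(os.path.join(device_dir, "power_method")),
            _read_attr(os.path.join(device_dir, "power_profile")))
```

On the affected machine this would be run as root, e.g. `set_radeon_power("/sys/class/drm/card0/device", "profile", "low")`.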
This seems to be a regression in my case. Mobile FireGL V5250, temperature readings from thinkpad-acpi:
2.6.37.2, KMS, profile=default/high: temperature=67
2.6.37.2, KMS, profile=mid: temperature=64
2.6.38.2, KMS, profile=default/high: temperature=71
2.6.38.2, KMS, profile=mid: temperature=64
Some older kernels, and Windows, with default GPU clocks: temperature=67
With pcie_aspm=force, the mobile Radeon's temperature is back to normal.
Can you attach dmesg output from your system with the 3.0-rc4 kernel, please?
Created attachment 64452: dmesg output for 3.0-rc5 kernel
Commit "PCI: Rework ASPM disable code", added in 3.0.20 and 3.2.5, has made the situation worse. I can no longer enable ASPM on my ThinkPad T60, even with the "pcie_aspm=force" kernel parameter, so the radeon card is now always hot.

3.2.4 kernel:
# dmesg | grep ASPM
[ 0.000000] PCIe ASPM is forcibly enabled
[ 0.161612] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 3.612673] e1000e 0000:02:00.0: Disabling ASPM L0s L1
# lspci -vv -s 01:00.0 | grep ASPM
LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
# cat /proc/acpi/ibm/thermal
temperatures: 49 41 37 68 36 -128 33 -128 42 54 55 -128 -128 -128 -128 -128

3.2.5 kernel:
# dmesg | grep ASPM
[ 0.000000] PCIe ASPM is forcibly enabled
[ 0.161614] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 3.523647] e1000e 0000:02:00.0: Disabling ASPM L0s L1
# lspci -vv -s 01:00.0 | grep ASPM
LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
# cat /proc/acpi/ibm/thermal
temperatures: 51 41 37 72 36 -128 33 -128 43 55 59 -128 -128 -128 -128 -128

I have also tested kernels 3.2.7 and 3.3-rc4, with the same result.
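Comparing runs like the two above can be automated. This minimal sketch pulls the ASPM state out of the `LnkCtl:` line of `lspci -vv` and turns `/proc/acpi/ibm/thermal` into an index-to-temperature map, skipping the `-128` entries that thinkpad-acpi uses for absent sensors. The function names are hypothetical, and which sensor index corresponds to the GPU depends on the ThinkPad model, so no mapping is assumed here.

```python
import re


def lnkctl_aspm(lspci_output):
    """Return the ASPM state from the LnkCtl line, e.g. 'L0s L1 Enabled' or 'Disabled'."""
    for line in lspci_output.splitlines():
        match = re.search(r"LnkCtl:\s*ASPM\s+([^;]+);", line)
        if match:
            return match.group(1).strip()
    return None  # no LnkCtl line (e.g. not a PCIe device, or not run as root)


def thinkpad_temps(thermal_text, absent=-128):
    """Parse /proc/acpi/ibm/thermal into {sensor_index: degrees C}, dropping absent sensors."""
    _, _, values = thermal_text.partition(":")
    temps = {}
    for index, raw in enumerate(values.split()):
        value = int(raw)
        if value != absent:
            temps[index] = value
    return temps
```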
I tested the 3.3.0 kernel today and nothing changed, so I looked deeper into the ASPM registers:

Windows XP and Linux kernels prior to 2.6.38:
root complex 00:01.0 0xB0 == 0x03 (L1 and L0s)
video card 01:00.0 0x68 == 0x43 (L1 and L0s)

Linux 3.2.4:
00:01.0 0xB0 == 0x00 (L0 only)
01:00.0 0x68 == 0x40 (L0 only)
with pcie_aspm=force:
00:01.0 0xB0.b = 0x43 (L1 and L0s)
01:00.0 0x68.b = 0x43 (L1 and L0s)

Linux 3.2.5 and 3.3.0:
00:01.0 0xB0.b = 0x40 (L0 only)
01:00.0 0x68.b = 0x40 (L0 only)
with pcie_aspm=force:
00:01.0 0xB0.b = 0x40 (L0 only)
01:00.0 0x68.b = 0x40 (L0 only)

I also had working ASPM for the network devices (Ethernet and wireless) with Windows XP and kernels prior to 2.6.38, but since 2.6.38 ASPM does not turn on for them even with the force parameter. And after another rework of the ASPM code in the 3.0.20, 3.2.5 and 3.3 kernels, ASPM does not turn on for the video card either, despite the force parameter. I can enable ASPM on my devices with setpci:
setpci -s 00:01.0 0xB0.b=0x3:3
setpci -s 01:00.0 0x68.b=0x3:3
This works without problems, just as it did before 2.6.38. But in my opinion, the ASPM handling code in Linux definitely needs another rework.
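The register values above can be read with the PCI Express Link Control register layout: bits 1:0 are the ASPM Control field (00 disabled, 01 L0s, 10 L1, 11 L0s and L1) and bit 6 is Common Clock Configuration, which is why 0x40 means ASPM off (the link stays in L0) while 0x43 means L0s and L1 are enabled. The sketch assumes, as the values suggest, that offsets 0xB0 and 0x68 are the Link Control registers of the two devices. It also models setpci's VALUE:MASK form, in which only the bits set in MASK are changed; the helper names are hypothetical.

```python
# ASPM Control field (Link Control bits 1:0).
ASPM_STATES = {0b00: "Disabled", 0b01: "L0s", 0b10: "L1", 0b11: "L0s L1"}
COMMON_CLOCK = 1 << 6  # Link Control bit 6: Common Clock Configuration


def decode_lnkctl(byte):
    """Return (ASPM state, common-clock flag) for the low byte of Link Control."""
    return ASPM_STATES[byte & 0b11], bool(byte & COMMON_CLOCK)


def setpci_masked(old, value, mask):
    """Model `setpci REG.b=VALUE:MASK`: keep bits outside MASK, take bits inside it."""
    return (old & ~mask & 0xFF) | (value & mask)
```

For example, `setpci_masked(0x40, 0x3, 0x3)` yields 0x43: the common-clock bit is preserved and both ASPM bits are set, matching the pre-2.6.38 value of the video card register.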
Your network driver is explicitly turning off ASPM, so that's completely unrelated to the core ASPM handling code. pcie_aspm=force will only enable ASPM handling, it won't change the policy. If your BIOS didn't enable L1 and you want L1 enabled, you have to set the policy to powersave.
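Changing the policy can also be scripted. The kernel exposes it at `/sys/module/pcie_aspm/parameters/policy`, listing every available policy and bracketing the active one (for example `[default] performance powersave`). This is a minimal sketch with hypothetical helper names; after writing a new policy, the Link Control register should be re-read to confirm the change actually reached the hardware.

```python
POLICY_PATH = "/sys/module/pcie_aspm/parameters/policy"


def parse_aspm_policy(text):
    """Split the policy file into (current, available); the active one is in brackets."""
    available, current = [], None
    for token in text.split():
        name = token.strip("[]")
        if token.startswith("["):
            current = name
        available.append(name)
    return current, available


def set_aspm_policy(policy, path=POLICY_PATH):
    """Write a new ASPM policy after checking that the kernel offers it."""
    with open(path) as f:
        _, available = parse_aspm_policy(f.read())
    if policy not in available:
        raise ValueError(f"{policy!r} not in {available}")
    with open(path, "w") as f:
        f.write(policy + "\n")
```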
To Matthew Garrett: I agree about the network cards, but the current situation with the video card worries me a lot. Before the 3.0.20, 3.2.5 and 3.3 kernels, ThinkPad T60 users could simply add "pcie_aspm=force" to the kernel command line and get ASPM working for their radeon card. But after your last patch to the ASPM code, we can't get ASPM working through kernel parameters or sysfs. "pcie_aspm=force" now does nothing except allow the policy to be changed, and changing the policy also does nothing: I switched it to powersave and then checked the registers, and got the same 0x40 values. So writing the registers directly with setpci is now our only way to get ASPM working, and it should not be that way.