I have a small fanless Atom330+ION box with an Asrock ION330 motherboard. It is installed in a passively cooled case with hard TDP limits. The box has been well tested and runs perfectly with kernels up to 3.4: the CPU temperature stays at about 40-45 degrees Celsius and reaches 58-60 at peak load. After upgrading the kernel to 3.6, the temperature of both Atom cores rises to 60 within 1 minute after boot and exceeds 80 under load. The heat spreads to other hardware components, and my HDD SMART data now has permanent overheat warnings (which makes the warranty void). The symptoms were well described in this article at Phoronix: http://www.phoronix.com/scan.php?page=article&item=linux_power_20&num=2 I can confirm that it is true. The article says that kernel 3.7 (which I haven't tested yet) still has this regression. It may be similar or related to bug 48721, but in my case the hardware is different. The video is a built-in Nvidia ION, driven by the Nvidia proprietary driver. The Nvidia driver 310.19 (latest) and all other software packages (XBMC) remained the same before and after the bad kernel upgrade.
Probably not the same as bug 48721, because that one seems to be related to use of the Intel graphics driver, and here you've got Nvidia graphics. Please run powertop on both the working and failing configurations and report which C-states and which P-states are being used. Note that powertop can be run with the --html option to create a file you can attach to this bug report.
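If powertop is not at hand, the C-state counters can also be read straight from the kernel's cpuidle sysfs directory. The following is only a minimal Python sketch, assuming the standard layout /sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,usage,time}; the helper names read_cstates and residency_percent are made up for illustration, and the percentages are shares of accumulated idle time only (time spent running in C0 is not counted).

```python
from pathlib import Path


def read_cstates(cpu_dir):
    """Return [(name, usage, time_us), ...] for every cpuidle state of one CPU.

    cpu_dir is a directory such as /sys/devices/system/cpu/cpu0 (assumed layout).
    """
    idle = Path(cpu_dir) / "cpuidle"
    states = []
    # Sort numerically so that state10 comes after state9.
    for state in sorted(idle.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text())    # number of times entered
        time_us = int((state / "time").read_text())   # microseconds spent in state
        states.append((name, usage, time_us))
    return states


def residency_percent(states):
    """Share of the accumulated idle time spent in each state, in percent."""
    total = sum(time_us for _, _, time_us in states)
    if total == 0:
        return {}
    return {name: 100.0 * time_us / total for name, _, time_us in states}


if __name__ == "__main__":
    cpu_root = Path("/sys/devices/system/cpu")
    for cpu in sorted(cpu_root.glob("cpu[0-9]*")):
        for name, share in residency_percent(read_cstates(cpu)).items():
            print(f"{cpu.name} {name}: {share:.1f}%")
```

Running it once on the good kernel and once on the bad one, after a similar idle period, should show whether the deeper C-states are still being entered on the overheating kernel.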
Finally I have found some time to test it again. I have made 3 powertop reports with the corresponding sensors output: 1) kernel 3.4.8-pae-slava1; 2) kernel 3.4.8-pae-alt1 (stock distro kernel with pae); 3) kernel 3.5.7-pae-alt1 (stock distro kernel with pae).
Created attachment 91871 [details] Kernel 3.5.7 which overheats the system. All kernels from 3.5.x onward show the identical overheating problem.
Created attachment 91881 [details] kernel3.5.7 temperature sensors readings Here the system is perfectly idle. The data was taken immediately after a cold boot. If I run any applications the temperatures jump higher.
Created attachment 91891 [details] kernel 3.4.8 which is nice and cool
Created attachment 91901 [details] kernel3.4.8 temperature sensors readings Taken at idle after a cold boot. All conditions identical to the kernel3.5.7 sensors reading except the kernel versions.
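To put numbers on the difference between two such sensors dumps, something like the following could be used. It is only a sketch in Python, assuming the usual lm-sensors text layout (a chip name line, an "Adapter:" line, then "Label: +NN.N°C ..." lines); the function names parse_sensors and compare are hypothetical, and only the first temperature on each line is taken.

```python
import re
import sys

# "Core 0:       +61.0°C  (crit = ...)" -> label "Core 0", temp "+61.0"
TEMP_LINE = re.compile(r"^(?P<label>[^:]+):\s*(?P<temp>[+-]?\d+(?:\.\d+)?)\s*°C")


def parse_sensors(text):
    """Map 'chip/label' to the first temperature (degrees C) on each line."""
    readings = {}
    chip = "unknown"
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" not in line:
            chip = line  # a line without a colon is taken as the chip name
            continue
        match = TEMP_LINE.match(line)
        if match:
            key = f"{chip}/{match.group('label').strip()}"
            readings[key] = float(match.group("temp"))
    return readings


def compare(cool, hot):
    """Temperature increase (hot minus cool) for sensors present in both dumps."""
    return {key: hot[key] - cool[key] for key in sorted(cool.keys() & hot.keys())}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_sensors.py <good-kernel dump> <bad-kernel dump>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        cool = parse_sensors(f.read())
    with open(sys.argv[2], encoding="utf-8", errors="replace") as f:
        hot = parse_sensors(f.read())
    for key, delta in compare(cool, hot).items():
        print(f"{key}: {delta:+.1f} C")
```

Both dumps should be taken under the same conditions (idle, right after a cold boot) for the deltas to mean anything.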
Created attachment 91911 [details] custom kernel 3.4.8 which runs best This kernel is a tweaked version of the other 3.4.8 kernel. It shows better or similar thermal performance but seems a bit more responsive.
Created attachment 91921 [details] custom 3.4.8 kernel config patch to show the difference from the stock. This patch shows all differences between the two 3.4.8 kernels.
I also tried later kernels, both ones optimized for Atom and lower latency and the distro stock ones. They all overheat, and the temperatures are not visibly different from those with 3.5.7. Kernels optimized for the Atom CPU behave similarly to non-optimized generic Pentium4 kernels.
please attach the dmesg output for both 3.4.8 and 3.5.7.
Created attachment 98631 [details] dmesg output with kernel 3.4.8
please attach the acpidump output of this box.
There is no such command on my system. A web search gave me only dead links, man pages and debs, but I run an rpm-based distribution. Where can I download this tool?
Created attachment 98911 [details] acpidump source. Please build it and run acpidump > acpidump.out with root privileges.
Created attachment 99051 [details] Result of running acpidump
There is no ACPI Fan/Thermal control on this platform. Len, can you please continue to look at this problem?
> nvidia: module license 'NVIDIA' taints kernel. please contact nvidia for support, or re-open when you can reproduce this w/o their proprietary software.
1) The purpose of this machine is to play sound and video. It is physically impossible to install a different video card. There is no other a/v output. It is impossible to run the box without the nvidia driver. HOWEVER, 2) The bug is reproducible by booting different versions of the linux kernel while running THE SAME nvidia blob, i.e. the system can run fine with the nvidia driver it uses. I did not change, reinstall, update or do anything else with the nvidia driver before or after the kernel swap which triggered the problem. Only the kernel<->blob interface got rebuilt, but it was built from the same source rpm package for both the good and the bad kernels. From this I conclude that the kernel is to blame and not nvidia. I need good justification to ask nvidia support, and I need some evidence that their driver has something to do with the problem. Could you give me any such evidence, or tips on how to get the truly relevant technical info to make such a request?
Nvidia have their source code and can read ours; the reverse is not true. Only they can help you.
While this statement is true, it is irrelevant to this technical issue. On one hand, I see the fact that the system runs perfectly well and cool with an old kernel and goes mad with a newer kernel while the nvidia code remains the same, identical, constant, unchanged, frozen... This is (possibly superficial) evidence that nvidia is not involved in the problem, unless there are some changes in the kernel that break mutual compatibility. On the other hand, I have nothing to support the claim that the nvidia driver is defective in this area. Nvidia support will simply send me back to you, and I am unable to make them do anything better. We probably need to file such a request together, with the relevant technical references. BTW, ION1 is a popular product, but it has been discontinued since ION2, and Nvidia tends to drop support for such products. Still, ION seems to be the only way to build a home theater grade _silent_ HD Audio/Video player box _with_no_moving_parts_. It is a whole class of devices where the linux OS has (used to have?) an edge over other OSes.