Bug 8866
Summary: | k8temp sensor displays wrong temperature | ||
---|---|---|---|
Product: | Drivers | Reporter: | Andrey Panov (panov) |
Component: | Hardware Monitoring | Assignee: | Rudolf Marek (r.marek) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alan, drescherjm, herrmann.der.user, jdelvare, just.for.lkml, w.kless |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.22.1 | Subsystem: | |
Regression: | --- | Bisected commit-id: |
Description
Andrey Panov
2007-08-08 16:35:44 UTC
I have the same problem with a BE-2400. Even under load the reading from k8temp stays at ~24..26 C, and I'm using a passive cooler that definitely feels warmer than 37 C to the touch.

I looked for documentation of register 0xE4 of PCI device 0x1103 and found AMD document #32559 on amd.com. I don't know the URL; I downloaded it nearly a year ago. Section 4.6.23, on pages 175..177, shows that bits 24 to 28 hold a "TjOffset". This is used to calculate the real temperature as Tcontrol = CurTmp - TjOffset*2 - 49, where CurTmp is the set of bits that k8temp currently uses. All of bits 24..28 are zero on my CPU, so this did not help.

Document #31116 (http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116.pdf) says this about 0xE4 on page 242: "14:8 DiodeOffset. Read-only. Reset: value varies by product. This field is used to specify the correction value applied to thermal diode measurements. See also section 2.10.2 [Thermal Diode] on page 110. It is encoded as follows: 00h is undefined. 01h to 3Fh: correction = +11C - DiodeOffset, or {01h to 3Fh} = {+10C to -52C}. 40h to 7Fh: undefined."

DiodeOffset is also mentioned in #32559. However, that PDF is a little unclear about it, and that version does not include bit 14 in the field.

I changed this line in k8temp from:
#define TEMP_FROM_REG(val) (((((val) >> 16) & 0xff) - 49) * 1000)
to:
#define TEMP_FROM_REG(val) (((((val) >> 16) & 0xff) + 11 - (((val)>>8) & 0x3f) ) * 1000)
With that change I get much better values: temp1_input:56000 temp1_input_raw:4e613a temp2_input:47000 temp2_input_raw:45617a temp3_input:53000 temp3_input_raw:4b613e temp4_input:46000 temp4_input_raw:44a17e. These values come directly from sysfs. The _raw version is the value as read with pci_read_config_dword.

The above change is obviously wrong when DiodeOffset == 0. It might also be wrong for the Barcelona / Phenom cores if bit 14 is set. Still, for a significant number of current K8 cores, an adjustment like this for the DiodeOffset != 0 case would make a lot of sense.
There is even an extra bold note on http://www.lm-sensors.org/wiki/Devices saying that k8temp is partly broken, so I suspect a large number of people stumble over this.

My /proc/cpuinfo:

processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 107 model name : AMD Athlon(tm) X2 Dual Core Processor BE-2400 stepping : 2 cpu MHz : 2300.000 cache size : 512 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch bogomips : 4727.07 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp tm stc 100mhzsteps

processor : 1 vendor_id : AuthenticAMD cpu family : 15 model : 107 model name : AMD Athlon(tm) X2 Dual Core Processor BE-2400 stepping : 2 cpu MHz : 2300.000 cache size : 512 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch bogomips : 4727.07 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp tm stc 100mhzsteps

Just ask if more information is needed.

Hi,

It seems that applying the diode offset may bring the values back to a reasonable range. The open question is whether this correction also applies to the digital thermal sensor, and I never got a reasonable answer from AMD on that. A second problem is that the processor's internal connection to the sensor is broken. This is true for all revF CPUs so far, and even for Barcelonas.
Please check Erratum 141. In the meantime I will try to contact AMD again and get some feedback.
Rudolf

Hi again,

I tried to get more information out of AMD. There is no reliable way to fix this with the diode offset, so I will at least change the driver to print a warning.
Rudolf

Code fix? Where? I can't see any code change upstream.

Sorry, I was under the impression that the driver warning had been implemented.

I am not aware of any such patch. If it exists, it's not in mainline. Please reopen this bug.

I'm working on that. Expect this soon.
Rudolf

Any progress on this? I saw this today on a 2.6.26 kernel with an Asus M2N board and an X2 5600+ CPU (65W version): # sensors k8temp-pci-00c3 Adapter: PCI adapter core0 temp: +6.0 C temp2: +0.0 C core1 temp: +7.0 C temp4: -9.0 C it8716-isa-0290 Adapter: ISA adapter VCore: +0.99 V (min = +0.00 V, max = +4.08 V) VDDR: +3.17 V (min = +0.00 V, max = +4.08 V) +5V: +4.70 V (min = +0.00 V, max = +6.85 V) +12V: +11.71 V (min = +0.00 V, max = +16.32 V) 5VSB: +4.70 V (min = +0.00 V, max = +6.85 V) VBat: +2.98 V CPU Fan: 2566 RPM (min = 0 RPM) Chassis Fan 1: 0 RPM (min = 0 RPM) Power Supply Fan: 0 RPM (min = 0 RPM) CPU Temp: +25.0 C (low = -1.0 C, high = +127.0 C) sensor = thermal diode MB Temp: +38.0 C (low = -1.0 C, high = +127.0 C) sensor = transistor MB Temp: -6.0 C (low = -1.0 C, high = +127.0 C) sensor = transistor cpu0_vid: +1.100 V

The k8temp patches that went into 2.6.29 should make the temperature readings look slightly better. Nevertheless, the thermal sensors on these CPUs are unreliable. This is a hardware problem, and no software workaround is possible.

Looking at k8temp.c from 2.6.29-rc4, I see a fixed temp_offset of 21000, i.e. 21 °C.
My system is model 107 == 0x6b, so it would trigger this fix: vendor_id : AuthenticAMD cpu family : 15 model : 107 model name : AMD Athlon(tm) X2 Dual Core Processor BE-2400 stepping : 2

With my modified formula from comment #1, I see what I think are correct readings: f71882fg-isa-0600 Adapter: ISA adapter CPU: +34 C (high = +85 C, hyst = +81 C) k8temp-pci-00c3 Adapter: PCI adapter Core0 Temp: +42 C Core0 Temp: +33 C Core1 Temp: +38 C Core1 Temp: +29 C

The second temperature from Core0 seems to match the CPU temperature from the external sensor, although it changes somewhat more slowly. When the reading was 33 °C, the raw output from the PCI register was 37217a. That means my CPU/board would need an adjustment of 27000, not 21000. I would still strongly suggest reading the adjustment from the PCI register instead of hardcoding it. Page 242 of http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116.pdf gives the formula I'm using. It also states that the offset should be read from the same PCI register as the raw temperature values. I would assume this offset is board-specific. My MSI K9AG may then have a different offset than the board of Andreas Herrmann, the author of the patch that added the offset.

You did notice that Andreas Herrmann works for AMD, didn't you? I bet he knows what he's doing. In any case, the bottom line is that these CPU sensors are unreliable. You can apply pretty much any offset you like to get the readings you want; the numbers don't really matter.

FYI, DiodeOffset is used "to correct the measurement made by an external temperature sensor." An on-die thermal diode can be connected to an external sensor via 2 chip pins. You need to add DiodeOffset to relate this external sensor value correctly to TcontrolMax.

Some notes regarding comment #10:

(1) I don't think that the values are correct.
You have CPU: +34 C (high = +85 C, hyst = +81 C) and Core0 Temp: +42 C. With the fix that is in the mainline kernel, you would have seen Core0 Temp: +36 C. That is much closer to the other CPU temperature value reported by an external sensor. I guess this sensor is connected to the on-die diode, but I am not sure.

(2) The referenced document is the BKDG for family 10h CPUs (e.g. Phenom). AFAIK your CPU is family 0xf (K8).

(3) Your DiodeOffset is -22 degC. I guess TcontrolMax for your part is 70 degC (see http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/33954.pdf). Suppose your mainboard has an external sensor connected to the on-die diode. If that sensor reports 92 degC, you have reached TcontrolMax. (TcontrolMax "represents the maximum allowed TCONTROL value for the processor to be within its functional temperature specification.")