Bug 76321

Summary: Incorrect hwmon temperature when radeon card is turned off
Product: Drivers Reporter: Pali Rohár (pali)
Component: Video(DRI - non Intel)Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX    
Severity: high CC: alexdeucher, szg00000
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.15-rc3 Subsystem:
Regression: No Bisected commit-id:
Attachments: possible fix
possible fix v2
patch for get/set dpm state
patch v2 for get/set dpm state

Description Pali Rohár 2014-05-16 07:09:46 UTC
When radeon card is turned off (automatically via runpm) sensors program show totally incorrect temperature:

radeon-pci-0100
Adapter: PCI adapter
temp1:       +511.0°C  (crit = +120.0°C, hyst = +90.0°C)

When some opengl program is running with DRI_PRIME=1 then sensors reports:

radeon-pci-0100
Adapter: PCI adapter
temp1:        +46.0°C  (crit = +120.0°C, hyst = +90.0°C)

So kernel radeon driver should not report temperature (e.g. returns -EINVAL in hwmon) when radeon card is turned off (via vgaswitcheroo or runpm).
Comment 1 Alex Deucher 2014-05-16 15:36:10 UTC
Created attachment 136331 [details]
possible fix

Does this patch help?
Comment 2 Pali Rohár 2014-05-16 17:42:14 UTC
Here is output from sensors with your patch (when card is turned off):

radeon-pci-0100
Adapter: PCI adapter
ERROR: Can't get value of subfeature temp1_crit: Can't read
ERROR: Can't get value of subfeature temp1_crit_hyst: Can't read
temp1:            N/A  (crit =  +0.0°C, hyst =  +0.0°C)

Values crit & hyst worked fine (as it is not runtime value, but static).

And also your patch disabling showing/changing dpm/pm profile. I do not know if there is some problem to use that sysfs files when card is turned off, but if not I think there is no reason to disable it.

So in my opinion only this part of patch is needed:

@@ -538,8 +570,13 @@ static ssize_t radeon_hwmon_show_temp(struct device *dev,
 				      char *buf)
 {
 	struct radeon_device *rdev = dev_get_drvdata(dev);
+	struct drm_device *ddev = rdev->ddev;
 	int temp;
 
+	if  ((rdev->flags & RADEON_IS_PX) &&
+	     (ddev->switch_power_state != DRM_SWITCH_POWER_ON))
+		return -EINVAL;
+
 	if (rdev->asic->pm.get_temperature)
 		temp = radeon_get_temperature(rdev);
 	else

Or is really needed other hunks too?
Comment 3 Alex Deucher 2014-05-16 22:32:31 UTC
Created attachment 136431 [details]
possible fix v2

You can't change any hardware state when the card is powered down.  This patch is a little less strict however.
Comment 4 Pali Rohár 2014-05-17 18:57:31 UTC
With v2 patch now sensors does not report any error (when card is turned off):

$ sensors
radeon-pci-0100
Adapter: PCI adapter
temp1:            N/A  (crit = +120.0°C, hyst = +90.0°C)

This looks ok.

And for power_dpm_state & power_dpm_force_performance_level: I understand that it cannot be changed when card is turned off (I see that it also disappear from PCI bus space), but I would like to see option to set "initial" dpm state/level which will be used when card is turned om again. This can be usefull for scripts which setting powersave options depending on hostplugging AC adapter. What do you think about using same sysfs entries for it?
Comment 5 Alex Deucher 2014-05-19 14:31:36 UTC
(In reply to Pali Rohár from comment #4)
> 
> And for power_dpm_state & power_dpm_force_performance_level: I understand
> that it cannot be changed when card is turned off (I see that it also
> disappear from PCI bus space), but I would like to see option to set
> "initial" dpm state/level which will be used when card is turned om again.
> This can be usefull for scripts which setting powersave options depending on
> hostplugging AC adapter. What do you think about using same sysfs entries
> for it?

That's fine if someone wants to implement it.  It's not a high priority for me at the moment.
Comment 6 Pali Rohár 2014-05-20 17:06:35 UTC
Ok, I will look at it and will try to implemenent it. So can you commit radeon_hwmon_show_temp() part of patch?
Comment 7 Alex Deucher 2014-05-20 18:41:36 UTC
Yes, I already sent attachment 136431 [details] upstream.
Comment 8 Pali Rohár 2014-08-09 20:33:19 UTC
Created attachment 146011 [details]
patch for get/set dpm state

(In reply to Alex Deucher from comment #5)
> (In reply to Pali Rohár from comment #4)
> > 
> > And for power_dpm_state & power_dpm_force_performance_level: I understand
> > that it cannot be changed when card is turned off (I see that it also
> > disappear from PCI bus space), but I would like to see option to set
> > "initial" dpm state/level which will be used when card is turned om again.
> > This can be usefull for scripts which setting powersave options depending
> on
> > hostplugging AC adapter. What do you think about using same sysfs entries
> > for it?
> 
> That's fine if someone wants to implement it.  It's not a high priority for
> me at the moment.

It is implemented in attached patch. I tested it and it working fine on my system. My scripts automacitally change dpm state to battery when power adapter is unplugged. And when radeon card is powered on it initialize in battery state (even if last state was performance before turning card off).
Comment 9 Alex Deucher 2014-08-11 16:48:44 UTC
(In reply to Pali Rohár from comment #8)
> Created attachment 146011 [details]
> patch for get/set dpm state
> 
> 
> It is implemented in attached patch. I tested it and it working fine on my
> system. My scripts automacitally change dpm state to battery when power
> adapter is unplugged. And when radeon card is powered on it initialize in
> battery state (even if last state was performance before turning card off).

Patch looks fine wth respect to the power state change.  You can drop the changes to radeon_[get|set]_dpm_forced_performance_level() however.  That won't work without changes to the current force_performance_level interfaces.  If you can clean up the patch as requested and generate a proper git patch, I'll apply it.
Comment 10 Pali Rohár 2014-08-11 17:05:37 UTC
Created attachment 146241 [details]
patch v2 for get/set dpm state

New patch v2 without changes to radeon_[get|set]_dpm_forced_performance_level().
Comment 11 Alex Deucher 2014-08-11 19:21:21 UTC
Applied to my tree.  thanks.
Comment 12 Pali Rohár 2014-08-12 10:20:22 UTC
Alex, can you send that patch to stable trees like that in attachment 136431 [details]? Because before these patches sysfs reported battery/performance dpm state when card was turned off and some script used that information...
Comment 13 Alex Deucher 2014-08-12 13:14:28 UTC
Yes, I've cc'ed stable.