Bug 82781

Summary: Option to disable mclk reclocking with AMD R9 280X (TAHITI) to avoid screen flickering on VGA/CRT
Product: Drivers
Reporter: Christian Birchinger (joker)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.16.1
Tree: Mainline
Regression: No
Attachments: add some debugging output; dmesg output; Xorg log file; EDID info; possible fix

Description Christian Birchinger 2014-08-19 15:11:12 UTC
I would like an option to forcibly disable mclk reclocking, either as a module parameter
or, preferably, at runtime as an additional sysfs option in /sys/class/drm/card[n]/device,
probably named "power_<something>". The reasons follow.

I switched from an AMD 5570 to an R9 280X card. Both have working and stable DPM.

With the new AMD R9 280X I noticed occasional screen flickering and narrowed it down
to the mclk reclocking between "15000" (level 0) and "160000" (levels 1-3).

By connecting my TV over HDMI I noticed that the flickering stops, as multi-head is known to
disable mclk reclocking (I've checked si_dpm.c).

I have now hardcoded "disable_mclk_switching = true;" there and have a perfectly working,
flicker-free screen.

The sclk still changes fine, and while this probably does not save as much power as full
reclocking, it is of course a lot better than totally disabling DPM or forcing it to "low"
or "high" based on the tasks I'm doing.
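
For reference, the manual workaround mentioned above (forcing the DPM level per task) could be sketched as a small helper. This is only an illustration: the helper name set_dpm_level is made up here, the path assumes the card is card0, and it relies on the radeon power_dpm_force_performance_level sysfs file accepting "auto", "low" and "high". Writing to that file requires root.

```c
#include <stdio.h>
#include <string.h>

/* Assumed path for the first card; adjust cardN as needed. */
#define DPM_LEVEL_PATH \
    "/sys/class/drm/card0/device/power_dpm_force_performance_level"

/*
 * Hypothetical helper: write a DPM performance level to the given file.
 * Returns 0 on success, -1 on an invalid level or an I/O error.
 */
int set_dpm_level(const char *path, const char *level)
{
    static const char *const valid[] = { "auto", "low", "high" };
    int ok = 0;
    size_t i;
    FILE *f;

    /* Reject anything that is not one of the known level names. */
    for (i = 0; i < sizeof(valid) / sizeof(valid[0]); i++)
        if (strcmp(level, valid[i]) == 0)
            ok = 1;
    if (!ok)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(level, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A call such as set_dpm_level(DPM_LEVEL_PATH, "high") before a demanding task and set_dpm_level(DPM_LEVEL_PATH, "auto") afterwards would mimic the manual switching described above.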

I've also noticed this effect on Windows using the proprietary drivers, even though a lot less
than on Linux. Therefore I expect this to be a hardware-related issue, probably only visible
on VGA analog output to CRT screens, so I fear it's not fixable in an automatic way.

An option to manually disable this feature in sysfs would help a lot. At runtime, mclk reclocking
could be re-enabled while DPMS is active (i.e. the screen is turned off). If this could be detected
automatically it would be even better.
Comment 1 Alex Deucher 2014-08-19 15:23:26 UTC
Created attachment 147291 [details]
add some debugging output

I'd rather try and fix this properly.  Is it only a problem with specific monitors?  If so which ones?  Please attach your xorg log and dmesg output.  Please apply the attached patch and attach your dmesg output with it applied when the problematic monitor is attached and you see the flickering.
Comment 2 Christian Birchinger 2014-08-19 15:56:20 UTC
Created attachment 147301 [details]
dmesg output

OK, I've attached the dmesg output.

Back with the 5570 I corrected my sync timings following your suggestions.
I can also attach my EDID info in case the values that were fine with the
5570 are somehow not acceptable anymore. They should be pretty vanilla
VESA modes now (Windows Generic Monitor uses the same values here).
Comment 3 Christian Birchinger 2014-08-19 15:57:36 UTC
Created attachment 147311 [details]
Xorg log file
Comment 4 Christian Birchinger 2014-08-19 16:00:44 UTC
Created attachment 147321 [details]
EDID info

EDID info for the monitor

The monitor is an EIZO F930 connected over BNC (so there is no real EDID from the device itself).

The values should be pretty standard VESA, and have been running stably with various cards for many years (probably 10 or more).
Comment 5 Alex Deucher 2014-08-19 16:16:30 UTC
Created attachment 147331 [details]
possible fix

As you can see from your log, the vblank time for the mode you've selected is right at the mclk switching limit:
[    5.943137] vblank_time 450, switch_limit 450
so I guess in your case, there are times when it doesn't quite complete in time.  A minor adjustment to the comparison should fix it.
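
To illustrate why the comparison matters at the boundary, here is a minimal standalone sketch. It is not the driver code: vblank_too_short_strict and vblank_too_short_inclusive are names invented for this illustration, and it assumes that the check before the adjustment used a strict "<". With both values at 450, only the inclusive form reports the vblank as too short, which is the case that disables mclk switching.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed original behaviour: equal values still allow mclk switching. */
bool vblank_too_short_strict(uint32_t vblank_time_us, uint32_t switch_limit_us)
{
    return vblank_time_us < switch_limit_us;
}

/* Adjusted comparison: a vblank exactly at the limit counts as too short. */
bool vblank_too_short_inclusive(uint32_t vblank_time_us, uint32_t switch_limit_us)
{
    return vblank_time_us <= switch_limit_us;
}
```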
Comment 6 Alex Deucher 2014-08-19 16:27:32 UTC
Since you seem to be using a custom EDID, you can also adjust the modelines to increase the vblank_time (to give the GPU more time to switch the mclk) or decrease it (to disable mclk switching). I suspect the vblank period is not reliable on your monitor, and since it's currently set to exactly the switching limit, the margin for error is very small. The switch may not have finished completely during the vblank period if the vblank period happens to be shorter in some cases. That's why you see the flickering. Tweaking your modeline to increase the vblank period should avoid that.
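
As a rough aid for that kind of tweak, the sketch below estimates how many blanking lines a mode needs so that its vblank period reaches a given time. It uses plain arithmetic only (one line lasts htotal / pixel clock) and is not taken from the driver; the function name and the choice of a target such as 500 us are assumptions for illustration.

```c
#include <stdint.h>

/*
 * Smallest number of blanking lines whose total duration is at least
 * target_us, for a mode with the given htotal (pixels per line) and
 * pixel clock in kHz. One line lasts htotal * 1000 / clock_khz us.
 * Returns 0 if htotal is 0 (invalid mode).
 */
uint32_t min_blank_lines(uint32_t htotal, uint32_t clock_khz, uint32_t target_us)
{
    uint64_t num, den;

    if (htotal == 0)
        return 0;

    num = (uint64_t)target_us * clock_khz;  /* target length in pixels * 1000 */
    den = (uint64_t)htotal * 1000;          /* one line in pixels * 1000 */
    return (uint32_t)((num + den - 1) / den);  /* round up to whole lines */
}
```

For a mode with htotal 2160 and a 229.50 MHz pixel clock, min_blank_lines(2160, 229500, 500) gives 54 lines, so vtotal would have to be at least vdisplay + 54 to reach a 500 us vblank.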
Comment 7 Christian Birchinger 2014-08-19 16:38:34 UTC
Yes, that patch would indeed fix it in my case.

I was pretty sure I took standard VESA modes, and the fact that the Windows Generic Monitor (no overrides or "hacks" there) also uses the same timings tells
me that they are rather "normal" modes.

I'm going to take your fix and live with it because I prefer that all my screen
modes remain exactly the same between OS switches.

But out of curiosity: do you have a suggestion for mode sources with higher
vblank pauses which are still considered "standard" and not totally unique and custom? Or might it be possible that my 85 Hz mode simply has this 450 as standard?

Anyway, thanks for your help.
Comment 8 Christian Birchinger 2014-08-19 17:08:10 UTC
I've applied your patch and rebooted, but I still see the mclk changing.

I know I'm doing something wrong but I cannot see what.

I've checked the source, and "vblank_time <= switch_limit" is there.
Also, 450 and 450 get printed, so the test must match. "make modules" tells me everything is built, and the timestamp on radeon.ko is recent, so it's installed.
Comment 9 Alex Deucher 2014-08-19 17:11:01 UTC
(In reply to Christian Birchinger from comment #7)
> Yes, that patch would indeed fix it in my case.
> 
> I was pretty sure I took standard VESA modes, and the fact that the Windows
> Generic Monitor (no overrides or "hacks" there) also uses the same
> timings tells me that they are rather "normal" modes.
> 
> I'm going to take your fix and live with it because I prefer that all my
> screen modes remain exactly the same between OS switches.
> 
> But out of curiosity: do you have a suggestion for mode sources with
> higher vblank pauses which are still considered "standard" and not totally
> unique and custom? Or might it be possible that my 85 Hz mode simply has
> this 450 as standard?
>

There's nothing wrong with the modes or non-standard about a mode with a 450 us vblank time. The issue is that it takes 450 us for the mclk to change. It just so happens that the vblank period is the same as the switch limit, so there is very little margin for error.

As for tweaking your modeline, you can try adjusting the htotal or vblank_end parameters.  CRT multi-sync monitors are pretty flexible when it comes to timing.

You can use gtf or cvt to generate VESA-compatible modelines, e.g.,
$ gtf 1600 1200 85

  # 1600x1200 @ 85.00 Hz (GTF) hsync: 107.10 kHz; pclk: 234.76 MHz
  Modeline "1600x1200_85.00"  234.76  1600 1720 1896 2192  1200 1201 1204 1260  -HSync +Vsync

$ cvt 1600 1200 85
# 1600x1200 84.95 Hz (CVT 1.92M3) hsync: 107.21 kHz; pclk: 235.00 MHz
Modeline "1600x1200_85.00"  235.00  1600 1728 1896 2192  1200 1203 1207 1262 -hsync +vsync

your mode:
[    30.543] (II) RADEON(0): Modeline "1600x1200"x85.0  229.50  1600 1664 1856 2160  1200 1201 1204 1250 +hsync +vsync (106.2 kHz eP)

gtf was the VESA formula used for CRTs and cvt is the newer formula designed for DFPs. I'm not sure how your modes were generated, but they don't seem to follow either gtf or cvt. Both the gtf (540 us vblank period) and cvt (558 us vblank period) modes should work fine as there is plenty of margin.
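
The vblank figures quoted in this thread can be reproduced with a short calculation. The sketch below is an assumption about how the numbers come about, not a copy of the driver code: it truncates the line time to whole microseconds before multiplying by the number of blanking lines, and it treats vtotal - vdisplay as the blanking interval (no borders). Under those assumptions it yields 450, 540 and 558 us for the three modelines above. Computed without the early truncation, the same modes come out at roughly 470, 560 and 578 us, so the logged value may understate the real vblank period somewhat.

```c
#include <stdint.h>

/* Vblank time in us, with the line time truncated to whole microseconds
 * first (the assumption that reproduces the figures quoted above). */
uint32_t vblank_time_us_truncated(uint32_t clock_khz, uint32_t htotal,
                                  uint32_t vdisplay, uint32_t vtotal)
{
    uint32_t line_time_us = htotal * 1000 / clock_khz;
    return (vtotal - vdisplay) * line_time_us;
}

/* Same quantity computed in pixels first, so that only the final
 * result is truncated to whole microseconds. */
uint32_t vblank_time_us_precise(uint32_t clock_khz, uint32_t htotal,
                                uint32_t vdisplay, uint32_t vtotal)
{
    uint64_t blank_pixels = (uint64_t)htotal * (vtotal - vdisplay);
    return (uint32_t)(blank_pixels * 1000 / clock_khz);
}
```

For example, vblank_time_us_truncated(229500, 2160, 1200, 1250) corresponds to the reporter's mode, and the gtf and cvt modelines use (234760, 2192, 1200, 1260) and (235000, 2192, 1200, 1262) respectively.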
Comment 10 Christian Birchinger 2014-08-19 17:58:04 UTC
I don't know why, but the only way I can stop it from changing mclk is by doing this:

si_dpm.c

/*
  if ((rdev->pm.dpm.new_active_crtc_count > 1) ||
      ni_dpm_vblank_too_short(rdev))
*/
    disable_mclk_switching = true;

I know ni_dpm_vblank_too_short must report true because "450 <= 450".
I'm totally clueless as to why only hardcoding "true" keeps mclk at the maximum of 160000.
Also, any printk I add after this won't get printed.
Comment 11 Christian Birchinger 2014-08-19 18:46:43 UTC
Simply doing this also works:

ni_dpm.c:

bool ni_dpm_vblank_too_short(struct radeon_device *rdev)
{
  struct rv7xx_power_info *pi = rv770_get_pi(rdev);
  u32 vblank_time = r600_dpm_get_vblank_time(rdev);
  /* we never hit the non-gddr5 limit so disable it */
  u32 switch_limit = pi->mem_gddr5 ? 450 : 0;

  return true;

Adding a hardcoded true also disables mclk changes, which only leads
to one conclusion about this check:

  if (vblank_time <= switch_limit)
    return true;
  else
    return false;

Somehow the values are not always 450, but I cannot see such a case
in the printk debug output.
Comment 12 Alex Deucher 2014-08-19 19:12:23 UTC
I'm not sure what's going on.  Maybe a gcc bug?  Can you try a modeline with a longer vblank period?  E.g.,

xrandr --newmode 1600x1200_gtf  234.76  1600 1720 1896 2192  1200 1201 1204 1260  -HSync +Vsync
xrandr --addmode DVI-0 1600x1200_gtf
xrandr --output DVI-0 --mode 1600x1200_gtf

or
xrandr --newmode 1600x1200_cvt 235.00  1600 1728 1896 2192  1200 1203 1207 1262 -hsync +vsync
xrandr --addmode DVI-0 1600x1200_cvt
xrandr --output DVI-0 --mode 1600x1200_cvt
Comment 13 Christian Birchinger 2014-08-19 19:39:48 UTC
The GTF one is unusable. I see a totally desynced image with lines just flickering around.

The CVT one works, although with huge black borders. However, it still flickers every 20-60 seconds. So longer vblank periods don't seem to fix my issue.