Bug 75241

Summary: radeon_compute_pll_avivo broken in 3.15-rc3
Product: Drivers Reporter: Clemens Ladisch (clemens)
Component: Video(DRI - non Intel) Assignee: drivers_video-dri
Status: NEW ---    
Severity: high CC: alexdeucher, benh, bugzilla, deathsimple, szg00000, tasev.stefanoska
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.15-rc3 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Possible fix.
dmesg working 3.15-rc5
dmesg broken 3.15-rc5
dmesg after boot working with max divider 32
dmesg after suspend resume broken
Possible fix v2.
Possible fix v3.

Description Clemens Ladisch 2014-05-01 13:51:26 UTC
After upgrading from rc2 to rc3, my RS880 no longer outputs a signal
that my monitor is able to show.

Bisected to this:

commit c2fb3094669a3205f16a32f4119d0afe40b1a1fd
Author: Christian König <christian.koenig@amd.com>
Date:   Sun Apr 20 13:24:32 2014 +0200

    drm/radeon: improve PLL limit handling in post div calculation
    
    This improves the PLL parameters when we work at
    the limits of the allowed ranges.


Debug output with black screen:

kernel: [drm:drm_crtc_helper_set_config] attempting to set mode from userspace
kernel: [drm:drm_mode_debug_printmodeline] Modeline 29:"1600x1200" 60 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x48 0x5
kernel: [drm:radeon_encoder_set_active_device] setting active device to 00000200 from 00000200 00000200 for encoder 2
kernel: [drm:drm_crtc_helper_set_mode] [CRTC:14]
kernel: [drm:radeon_atom_encoder_dpms] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
kernel: [drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 2036.3 ref: 30, post 6

With this commit reverted (and working screen):

kernel: [drm:radeon_compute_pll_avivo] 162000 - 16106, pll dividers - fb: 1023.5 ref: 13, post 7
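The dividers on these debug lines can be checked by hand. Below is a minimal sketch that recomputes the second number on the line from fb, ref and post. It assumes a 14.32 MHz reference clock (1432 in 10 kHz units). That value is inferred by fitting the logged numbers in this thread, not taken from the driver. dot_clock_khz is a hypothetical helper name.

```c
/*
 * Recompute the output clock from the dividers printed by
 * radeon_compute_pll_avivo.
 * ASSUMPTION: ref_freq_10khz = 1432 (14.32 MHz) for this RS880; inferred
 * from the logs, not read from the driver.
 * fb_int / fb_frac are the integer and tenths parts of e.g. "fb: 2036.3".
 * Returns kHz, like the second number on the log line.
 */
static unsigned dot_clock_khz(unsigned ref_freq_10khz, unsigned fb_int,
                              unsigned fb_frac, unsigned ref_div,
                              unsigned post_div)
{
    /* feedback divider in tenths: 2036.3 -> 20363 */
    unsigned fb_tenths = fb_int * 10 + fb_frac;

    /* integer division truncates; result is in 10 kHz units */
    unsigned clk_10khz = ref_freq_10khz * fb_tenths /
                         (ref_div * post_div * 10);

    /* scale 10 kHz units up to kHz */
    return clk_10khz * 10;
}
```

Under that assumption, fb 2036.3 / ref 30 / post 6 gives 161990 kHz, matching the broken log line, and other divider sets from this thread (for example 1425.4 / 21 / 6) come out the same way.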
Comment 1 Christian König 2014-05-01 14:48:41 UTC
Thanks for the info, could you provide the debug output of 3.14 as well? I especially need the line with radeon_compute_pll_avivo.

The problem isn't really caused by the patch you bisected; it's more an issue with the new PLL code.

Thanks in advance,
Christian.
Comment 2 Christian König 2014-05-01 17:02:16 UTC
Created attachment 134611 [details]
Possible fix.

Please try the attached patch, it might fix the issue.
Comment 3 Clemens Ladisch 2014-05-01 19:24:13 UTC
3.14:
16205, pll dividers - fb: 135.8 ref: 2, post 6

With the patch:
162000 - 161990, pll dividers - fb: 271.5 ref: 4, post 6

And the patch indeed fixes this.
Comment 4 Christian König 2014-05-02 12:37:39 UTC
(In reply to Clemens Ladisch from comment #3)
> 3.14:
> 16205, pll dividers - fb: 135.8 ref: 2, post 6
> 
> With the patch:
> 162000 - 161990, pll dividers - fb: 271.5 ref: 4, post 6
> 
> And the patch indeed fixes this.

Thanks a lot for the info. Going to push the patch with the next bugfix release.

Could you try higher values for the limit as well, and try to figure out the maximum your monitor can still handle?

It might also make sense to temporarily comment out the following line and see what you get for the parameters and whether those still work fine:

avivo_reduce_ratio(&fb_div, &ref_div, fb_div_min, ref_div_min);

And by the way: What monitor is this?
Comment 5 Clemens Ladisch 2014-05-02 21:08:22 UTC
It's an Eizo S2100, but this should not matter because the clocks seen by the
monitor are always about the same (162MHz/75kHz/60Hz).  If some were out of
range, the monitor would show an error message, but with the PLL problem, the
monitor does not appear to detect even an out-of-range signal.   I'd guess the
PLL itself cannot handle the parameters.
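The 162MHz/75kHz/60Hz figures follow directly from the modeline in the original report (pixel clock 162000 kHz, htotal 2160, vtotal 1250). A small sketch of that arithmetic; hsync_khz and vrefresh_hz are hypothetical names, and only the pixel clock and the two totals from the modeline are used.

```c
/* Horizontal frequency in kHz: pixel clock divided by pixels per line. */
static double hsync_khz(unsigned clock_khz, unsigned htotal)
{
    return (double)clock_khz / htotal;
}

/* Vertical refresh in Hz: pixels per second divided by pixels per frame. */
static double vrefresh_hz(unsigned clock_khz, unsigned htotal,
                          unsigned vtotal)
{
    return (double)clock_khz * 1000.0 / ((double)htotal * vtotal);
}
```

For "1600x1200" this gives 75 kHz and 60 Hz, independent of how the PLL dividers are chosen, which is why the dividers rather than the timings are the suspect here.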

The largest working ref_div_max limit is 131.

with 131:  162000 - 161990, pll dividers - fb: 1425.4 ref: 21, post 6

with 132:  162000 - 162000, pll dividers - fb: 1493.3 ref: 22, post 6

avivo_reduce_ratio does not change these values.
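A ratio reduction of this kind divides both dividers by their greatest common divisor and then scales them back up to any required minimums. The sketch below is an illustrative re-implementation under that assumption (reduce_ratio and gcd_u are hypothetical names, not the driver source). Since 14254 and 21 share no common factor, it leaves fb 1425.4 / ref 21 unchanged, consistent with the observation above.

```c
/* Euclid's algorithm; callers pass non-zero values. */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Reduce nom/den to lowest terms, then multiply both by the same integer
 * factor until each reaches its minimum. The ratio is never changed.
 */
static void reduce_ratio(unsigned *nom, unsigned *den,
                         unsigned nom_min, unsigned den_min)
{
    unsigned tmp = gcd_u(*nom, *den);

    *nom /= tmp;
    *den /= tmp;

    /* numerator too small: scale up by ceil(nom_min / nom) */
    if (*nom < nom_min) {
        tmp = (nom_min + *nom - 1) / *nom;
        *nom *= tmp;
        *den *= tmp;
    }

    /* denominator too small: scale up by ceil(den_min / den) */
    if (*den < den_min) {
        tmp = (den_min + *den - 1) / *den;
        *nom *= tmp;
        *den *= tmp;
    }
}
```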
Comment 6 Christian König 2014-05-03 11:25:04 UTC
(In reply to Clemens Ladisch from comment #5)
> It's an Eizo S2100, but this should not matter because the clocks seen by the
> monitor are always about the same (162MHz/75kHz/60Hz).  If some were out of
> range, the monitor would show an error message, but with the PLL problem, the
> monitor does not appear to detect even an out-of-range signal.   I'd guess the
> PLL itself cannot handle the parameters.

The PLL should be able to handle this quite fine. It's just that when you increase the reference and post dividers, you can better match the wanted frequency at the cost of increased jitter and reduced signal stability.

I have one monitor here that works with practically everything I give it, another that can't handle it when the frequency doesn't match precisely, and a third that doesn't like high jitter in the signal.

The trick is to find the right sweet spot where you can make everybody happy.
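The accuracy half of that trade-off can be illustrated with a brute-force search. This sketch fixes the post divider, tries every reference divider up to a cap, and reports the smallest frequency error reachable with a feedback divider expressed in tenths. It assumes 10 kHz units, the 14.32 MHz (1432) reference clock that fits the RS880 logs in this report, and no feedback-divider or VCO limits. It models nothing about jitter, and best_error is a hypothetical name.

```c
/*
 * Smallest |output - target| reachable with ref_div in [1, ref_div_max]
 * for a fixed post divider. Units: 10 kHz. Feedback divider in tenths.
 * ASSUMPTION: no fb divider or VCO limits are applied.
 */
static double best_error(unsigned ref_freq, unsigned target,
                         unsigned post_div, unsigned ref_div_max)
{
    double best = -1.0;

    for (unsigned ref_div = 1; ref_div <= ref_div_max; ++ref_div) {
        /* ideal feedback divider (in tenths) for this ref_div */
        double ideal = (double)target * ref_div * post_div * 10 / ref_freq;
        unsigned fb_tenths = (unsigned)(ideal + 0.5);   /* round */

        /* exact output frequency for the rounded feedback divider */
        double out = (double)ref_freq * fb_tenths /
                     (ref_div * post_div * 10.0);
        double err = out > target ? out - target : target - out;

        if (best < 0 || err < best)
            best = err;
    }
    return best;
}
```

For a 162 MHz target (16200) with post 6, a cap of 5 leaves an error of 0.5 (at ref 4), while a cap of 21 brings it down to about 0.22 (at ref 21). A larger cap can never do worse, since it only adds candidates; whether the extra precision is worth the jitter is the monitor-dependent part.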

> The largest working ref_div_max limit is 131.

Thanks a lot. I'm going to use 128 then (just because it's a nice round number) until somebody else starts to complain that his monitor doesn't like the signal.

Christian.
Comment 7 Tasev Nikola 2014-05-15 15:14:31 UTC
Hi

I'm still randomly getting the frequency out of range problem
with kernel 3.15-rc5:
twice today when booting, and once yesterday after a suspend/resume.
My screen is a Belinea 2080S2 (1600x1200).
I first reported this as bug 75471, which has the dmesg output.

Nikola
Comment 8 Alex Deucher 2014-05-15 15:26:53 UTC
Does this patch help:
http://lists.freedesktop.org/archives/dri-devel/2014-May/059469.html
Comment 9 Christian König 2014-05-15 15:33:23 UTC
(In reply to Alex Deucher from comment #8)
> Does this patch help:
> http://lists.freedesktop.org/archives/dri-devel/2014-May/059469.html

Unlikely. Tasev has an RS780; on those the feedback divider is usually around 1000. This patch only moves the feedback divider limit for .5 from 14 down to 13.

Does it help if you suspend/resume again after this issue? Could it be that we are seeing a crash somewhere else?
Comment 10 Tasev Nikola 2014-05-15 16:58:50 UTC
I'm compiling a patched kernel now.
I will test it, but to be sure I will probably need 4-5 days,
because I have been using 3.15-rc5 since Sunday and the problem only appeared now.
Comment 11 Tasev Nikola 2014-05-15 17:57:42 UTC
Hi 

I just noticed that 3.15-rc5 boots successfully only once in 5-6 attempts.
When I first tried it on Sunday, I was just lucky that it booted the first time.
I suspend/resume this computer rather than shutting it down, and I did not shut it down until yesterday, after the failure on suspend/resume.
Now, with or without the patch, it boots without the out of range frequency problem only once in 5-6 attempts.
Attached are dmesg logs, without the patch, from a working and a non-working boot.
Comment 12 Tasev Nikola 2014-05-15 17:58:53 UTC
Created attachment 136241 [details]
dmesg working 3.15-rc5
Comment 13 Tasev Nikola 2014-05-15 17:59:42 UTC
Created attachment 136251 [details]
dmesg broken 3.15-rc5
Comment 14 Tasev Nikola 2014-05-16 19:51:51 UTC
Hi 

I tried a Medion 1280x1024 monitor today and everything works without
problems.
It seems that only the combination RS880 + Belinea 2080S2 has problems
with the new PLL code.
I tried different values from 128 down to 90 for ref_div_max, but none of them work
with my Belinea 1600x1200 screen.
Comment 15 Christian König 2014-05-20 13:20:22 UTC
(In reply to Tasev Nikola from comment #14)
> I tried different values from 128 down to 90 for ref_div_max, but none of them work
> with my Belinea 1600x1200 screen.

Try going down to at least 32; this would match the behaviour of 3.14.

The problem is that in both the working and broken case the calculated parameters are the same.

Broken: [   23.511041] [drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 1425.4 ref: 21, post 6

Working: [   23.560826] [drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 1425.4 ref: 21, post 6

So I'm not really sure what else could go wrong here.
Comment 16 Tasev Nikola 2014-05-20 15:25:29 UTC
Hi 

I tried 64, 48 and 32 for ref_div_max.
The only one working at boot is 32, but after the first suspend/resume the
out of range frequency problem appears again. I tried a second suspend/resume with
the same result. I also tried the patch from comment 8, with the same
result: boot successful, fail after resume.
And you're right, the calculated parameters are again the same in both the working and broken case.
The dmesg logs after boot and after suspend/resume are attached.
Comment 17 Tasev Nikola 2014-05-20 15:26:47 UTC
Created attachment 136831 [details]
dmesg after boot working with max divider 32
Comment 18 Tasev Nikola 2014-05-20 15:27:52 UTC
Created attachment 136841 [details]
dmesg after suspend resume broken
Comment 19 Christian König 2014-05-20 15:51:00 UTC
From the logs, you are always getting the same set of parameters, even when you change the maximum used in the fix:

[drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 1425.4 ref: 21, post 6

With a maximum of 32 and a post divider of 6 the ref divider shouldn't be more than 5, but it still stays at 21.

This means there is something wrong with the way you install the kernel module (or with the modification you make). Please double-check that the right kernel module is loaded.
Comment 20 Tasev Nikola 2014-05-20 18:56:13 UTC
You're right again.

It seems that just building the module doesn't work for me. I built a new kernel from source with ref_div_max set to 124, and it seems to work for now.
[drm:radeon_compute_pll_avivo] 162000 - 161990, pll dividers - fb: 1346.2 ref: 17, post 7
I rebooted 3 times and it always booted fine. I will test it for a few days and report back whether everything works.
Sorry for my previous post.
Comment 21 Tasev Nikola 2014-05-21 18:07:38 UTC
Hi

With ref_div_max set to 124, everything works fine.
If I should try another value, just let me know.
Comment 22 Christian König 2014-05-21 18:23:10 UTC
(In reply to Tasev Nikola from comment #21)
> Hi
> 
> With ref_div_max set to 124, everything works fine.
> If I should try another value, just let me know.

I'm going to submit a patch with value 114, just to have some more room for errors.

I know that values below 100 cause problems for another user, so if 114 works for you, we have probably found the sweet spot.
Comment 23 Tasev Nikola 2014-05-23 17:16:07 UTC
With ref_div_max 114 everything works fine for me.
Comment 24 Tasev Nikola 2014-06-01 08:37:43 UTC
The new calculation
ref_div_max = max(min(100 / post_div, ref_div_max), 1u);
works fine with my Belinea 1600x1200 screen.
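Because of the integer division, this cap keeps ref_div * post_div at or below the constant. Here is the same expression as a standalone C function, with the constant turned into a parameter so the values tried in this thread (32, 100, 128) can be compared. capped_ref_div_max, min_u and max_u are hypothetical names; min_u and max_u stand in for the kernel's min()/max().

```c
static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }
static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/*
 * Per-post-divider cap on the reference divider.
 * limit / post_div truncates, e.g. 100 / 6 == 16 and 32 / 6 == 5.
 * The outer max_u keeps the cap at 1 or more even for huge post_div.
 */
static unsigned capped_ref_div_max(unsigned limit, unsigned post_div,
                                   unsigned ref_div_max)
{
    return max_u(min_u(limit / post_div, ref_div_max), 1u);
}
```

For post_div 4 to 7 this yields 25, 20, 16, 14 with a constant of 100 and 32, 25, 21, 18 with 128; with 32 and post_div 6 it yields 5, the bound mentioned in comment 19.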
Comment 25 Dan Merillat 2014-07-02 07:32:41 UTC
Unfortunately, I had to set this down to 32 to work on my system.

Radeon HD 3200 (onboard, RS780)
Monitor Viewsonic G225f
Kernel 3.16-rc3

Nonworking:
[drm:radeon_compute_pll_avivo] 229500 - 229500, pll dividers - fb: 1602.7 ref: 25, post 4

Working:
[drm:radeon_compute_pll_avivo] 229500 - 229500, pll dividers - fb: 240.4 ref: 3, post 5

CRTs are getting increasingly rare - perhaps a tunable for this so us fogies with 100 pound monitors can set it where it works on our system?  For me, it's a trivial patch to carry forward but setting something like drm.ref_div_tweak=32 in my grub config would be easier.

I haven't been able to use a kernel since commit 3216701   drm/radeon: rework finding display PLL numbers v2.
Comment 26 Christian König 2014-07-04 13:43:22 UTC
Created attachment 142051 [details]
Possible fix v2.

Does this patch fix the issue for you?
Comment 27 Dan Merillat 2014-07-06 18:37:37 UTC
No, I reverted to a clean 3.16-rc3 (changed the 32 back to 100) and applied the patch:

[drm:radeon_compute_pll_avivo] 229500 - 229500, pll dividers - fb: 1602.7 ref: 20, post 5

fb: is the same, ref and post are different.  Same results as without the patch - the monitor wakes up out of sleep, but doesn't display anything.

I can't get the OSD to display, so I don't know what it thinks the sync rates are.
Comment 28 Christian König 2014-07-07 12:41:29 UTC
Created attachment 142281 [details]
Possible fix v3.

How about this one? Does it fix the issue as well?
Comment 29 Dan Merillat 2014-08-06 06:00:55 UTC
(In reply to Christian König from comment #28)
> Created attachment 142281 [details]
> Possible fix v3.
> 
> How about this one? Does it fix the issue as well?

Sorry for the long delay in getting back to you.

3.16 stock does not work on my monitor, this patch (alone) fixes it.

I don't have a scope at my house, but at the office when this happens all signal lines on the VGA are idle.
Comment 30 Benjamin Herrenschmidt 2016-07-02 10:21:36 UTC
Your latest change broke it for me. Sorry for the delay in noticing; that combination of machine and monitor was stuck in the dark ages for a while...

The combo is Radeon R9 290 (from Sapphire) and good old Apple Cinema Display 23" (1920x1200x60 fixed resolution display) on DVI.

I get a black screen with radeon. It works with Alex's amdgpu. The one-liner
that fixes it is in the PLL calculation:

-ref_div_max = max(min(100 / post_div, ref_div_max), 1u);
+ref_div_max = max(min(128 / post_div, ref_div_max), 1u);

I noticed other differences, though: the max fb div is 2047 with radeon and 4095 with amdgpu, but the above is the key.

This is a trace of the amdgpu calculation (which works) after I sprinkled printks around:

[    3.471131] fb_div_min/max=4/4095 pll_flags=400
[    3.471132] by 10 ! fb_div_min/max=40/40950
[    3.471133] ref_div_min=2 (from 0/2)
[    3.471133] ref_div_max=1023 (from 0/1023)
[    3.471134] vco_min/max=600000/1200000
[    3.471134] post_div_min/max=4/7
[    3.471135] initial nom=153970, den=2700
[    3.471136] reduced nom=15397, den=270
[    3.471136] - trying post_div 4, ref_div_max=32
[    3.471137]   tentative ref_div=32m, fb_div=7299
[    3.471137]   adjusted ref_div=32m, fb_div=7299
[    3.471138] diff=7, diff_best=-1
[    3.471138] - trying post_div 5, ref_div_max=25
[    3.471139]   tentative ref_div=25m, fb_div=7128
[    3.471139]   adjusted ref_div=25m, fb_div=7128
[    3.471139] diff=6, diff_best=7
[    3.471140] - trying post_div 6, ref_div_max=21
[    3.471140]   tentative ref_div=21m, fb_div=7185
[    3.471141]   adjusted ref_div=21m, fb_div=7185
[    3.471141] diff=6, diff_best=6
[    3.471141] - trying post_div 7, ref_div_max=18
[    3.471142]   tentative ref_div=18m, fb_div=7185
[    3.471142]   adjusted ref_div=18m, fb_div=7185
[    3.471150] diff=6, diff_best=6
[    3.471150] post_div_best=7
[    3.471151] - trying post_div 7, ref_div_max=18
[    3.471151]   tentative ref_div=18m, fb_div=7185
[    3.471152]   adjusted ref_div=18m, fb_div=7185
[    3.471153] [drm:amdgpu_pll_compute] 153970 - 153960, pll dividers - fb: 239.5 ref: 6, post 7

Now this is with radeon (NOTE: I bumped the max fb div to the same value as amdgpu when taking this trace, but that had no effect):

[    4.718126] fb_div_min/max=4/4095 pll_flags=410
[    4.718126] by 10 ! fb_div_min/max=40/40950
[    4.718127] ref_div_min=2 (from 0/2)
[    4.718128] ref_div_max=1023 (from 0/1023)
[    4.718128] vco_min/max=600000/1200000
[    4.718129] post_div_min/max=4/7
[    4.718129] initial nom=153970, den=2700
[    4.718130] reduced nom=15397, den=270
[    4.718130] - trying post_div 4, ref_div_max=25
[    4.718131]   tentative ref_div=25m, fb_div=5703
[    4.718131]   adjusted ref_div=25m, fb_div=5703
[    4.718132] diff=11, diff_best=-1
[    4.718133] - trying post_div 5, ref_div_max=20
[    4.718133]   tentative ref_div=20m, fb_div=5703
[    4.718133]   adjusted ref_div=20m, fb_div=5703
[    4.718134] diff=11, diff_best=11
[    4.718134] - trying post_div 6, ref_div_max=16
[    4.718135]   tentative ref_div=16m, fb_div=5474
[    4.718135]   adjusted ref_div=16m, fb_div=5474
[    4.718136] diff=14, diff_best=11
[    4.718136] - trying post_div 7, ref_div_max=14
[    4.718136]   tentative ref_div=14m, fb_div=5589
[    4.718137]   adjusted ref_div=14m, fb_div=5589
[    4.718137] diff=12, diff_best=11
[    4.718138] post_div_best=5
[    4.718138] - trying post_div 5, ref_div_max=20
[    4.718139]   tentative ref_div=20m, fb_div=5703
[    4.718139]   adjusted ref_div=20m, fb_div=5703
[    4.718141] [drm:radeon_compute_pll_avivo] 153970 - 153980, pll dividers - fb: 570.3 ref: 20, post 5

The modeline is:

Modeline 55:"1920x1200" 60 153970 1920 1968 2000 2080 1200 1203 1209 1235 0x48 0x9

It is consistent between the two drivers.
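The two traces can be reproduced with a simplified search. The sketch below is a reconstruction based only on what the printk output shows: reduce target/reference to lowest terms, cap the reference divider per post divider, pick the closest feedback divider (in tenths), keep the post divider with the smallest error (ties going to the larger one), then reduce fb/ref again. It leaves out the VCO-derived post divider range, the feedback divider maximum and any fractional-divider jitter handling, so it is not the radeon or amdgpu code; search_pll, struct pll_result and the helpers are hypothetical.

```c
struct pll_result {
    unsigned fb_div;   /* feedback divider in tenths: 5703 means 570.3 */
    unsigned ref_div;
    unsigned post_div;
};

static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* round n / d to the nearest integer (unsigned operands) */
static unsigned div_round_closest(unsigned n, unsigned d)
{
    return (n + d / 2) / d;
}

static struct pll_result search_pll(unsigned target, unsigned ref_freq,
                                    unsigned post_min, unsigned post_max,
                                    unsigned limit, unsigned ref_div_max)
{
    struct pll_result best = {0, 0, 0};
    unsigned diff_best = ~0u;

    /* target / ref_freq as a reduced fraction: 153970/2700 -> 15397/270 */
    unsigned g = gcd_u(target, ref_freq);
    unsigned nom = target / g;
    unsigned den = ref_freq / g;

    for (unsigned post = post_min; post <= post_max; ++post) {
        /* per-post-divider cap, same form as the quoted one-liner */
        unsigned cap = limit / post;
        if (cap > ref_div_max)
            cap = ref_div_max;
        if (cap < 1)
            cap = 1;

        /* reference divider closest to den / post, clamped to [1, cap] */
        unsigned ref = div_round_closest(den, post);
        if (ref < 1)
            ref = 1;
        if (ref > cap)
            ref = cap;

        /* feedback divider (tenths) that best matches the ratio */
        unsigned fb = div_round_closest(nom * ref * post, den);

        /* truncated output clock and its distance from the target */
        unsigned out = ref_freq * fb / (ref * post);
        unsigned diff = out > target ? out - target : target - out;

        /* "<=" lets a larger post divider win a tie, as in both traces */
        if (diff <= diff_best) {
            diff_best = diff;
            best.fb_div = fb;
            best.ref_div = ref;
            best.post_div = post;
        }
    }

    if (best.post_div == 0)   /* empty post divider range */
        return best;

    /* final reduction of fb / ref to lowest terms */
    g = gcd_u(best.fb_div, best.ref_div);
    best.fb_div /= g;
    best.ref_div /= g;
    return best;
}
```

With target 153970, reference 2700 and post dividers 4 to 7, a cap constant of 100 gives fb 5703 (570.3), ref 20, post 5, as in the radeon trace, while 128 gives fb 2395 (239.5), ref 6, post 7, as in the amdgpu trace.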
Comment 31 Benjamin Herrenschmidt 2016-07-02 10:58:10 UTC
Note: It's an LCD :-) It's one of those fixed-mode panels Apple has always been fond of, one of the very first 1920x1200 out there.
Comment 32 Benjamin Herrenschmidt 2016-07-02 10:59:57 UTC
Note 2: Catalyst and the Windows driver both work fine. Is there any way to know what formula these two use (I assume it's the same code)?
Comment 33 Christian König 2016-07-02 16:16:36 UTC
(In reply to Benjamin Herrenschmidt from comment #30)
> The combo is Radeon R9 290 (from Sapphire) and good old Apple Cinema Display
> 23" (1920x1200x60 fixed resolution display) on DVI.

Well, this bug report is about nearly ten-year-old hardware and was fixed almost two years ago (we just forgot to close it).

So please open a separate bug report, preferably in the FDO bugzilla.
Comment 34 Benjamin Herrenschmidt 2016-07-02 21:47:45 UTC
Well, the Apple Cinema Display is nearly 10 years old too :-) But at least it's an LCD... I will open a new bug on FDO.
Comment 35 Benjamin Herrenschmidt 2016-07-05 09:04:16 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=96789