Bug 76761

Summary: radeon DPM breaks suspend to disk and resume from RAM in 3.14
Product: Drivers
Reporter: Vitaliy Filippov (vitalif)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: high
CC: alexdeucher, bjoernv, hafflys, swoorupj, szg00000
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.14.4 (debian 3.14.4-1 amd64)
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: possible fix, better fix

Description Vitaliy Filippov 2014-05-22 19:35:12 UTC
Hi!

Suspend/resume has broken again on my laptop with the 3.14 kernel. It works without any problem when I boot without the radeon module, so I assume that's the cause.

The problem in detail:
- the machine suspends to RAM without a problem, but does not resume (it hangs with a black screen during resume)
- the machine does not suspend to disk, either via s2disk or via echo disk > /sys/power/state (it just hangs with a black screen after "snapshotting system"; magic SysRq doesn't work, so I assume it's an oops)

Without the radeon module, everything works OK; the card is RV730.

Are there any other details I should provide?
Comment 1 Alex Deucher 2014-05-22 19:51:53 UTC
Can you bisect?
Comment 2 Vitaliy Filippov 2014-05-22 19:56:44 UTC
Yes, I can try to bisect...

I just found Bug 63391 (Radeon RS880 doesn't resume from suspend with radeon dpm enabled) and also tried disabling DPM: suspend works fine without it.

But DPM is an important feature for me; without it the laptop easily gets VERY HOT! So I can't just disable DPM as was done in Bug 63391...
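DPM is normally turned off for this kind of test by booting with radeon.dpm=0 on the kernel command line. To confirm which setting a given boot actually used, a minimal Python check could read the module parameter. This sketch assumes the radeon module exposes its dpm parameter at /sys/module/radeon/parameters/dpm with the values -1 (auto), 0 (disabled) and 1 (enabled); the helper name radeon_dpm_setting is hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the radeon "dpm" module parameter (read-only).
DPM_PARAM = Path("/sys/module/radeon/parameters/dpm")

# Assumed value meanings, following the parameter's description (-1 = auto).
DPM_MEANING = {-1: "auto", 0: "disabled", 1: "enabled"}


def radeon_dpm_setting(param_path=DPM_PARAM):
    """Describe the radeon dpm parameter, or return None if it cannot be read."""
    try:
        raw = Path(param_path).read_text().strip()
    except OSError:
        # radeon not loaded, not Linux, or no permission to read the file.
        return None
    try:
        value = int(raw)
    except ValueError:
        return "unknown ({!r})".format(raw)
    return DPM_MEANING.get(value, "unknown ({})".format(value))


if __name__ == "__main__":
    setting = radeon_dpm_setting()
    print("radeon module not loaded" if setting is None else "radeon dpm: " + setting)
```

Returning None instead of raising keeps the check usable on machines where radeon is not loaded at all, which is exactly the "boot without radeon" case from the description.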
Comment 3 Vitaliy Filippov 2014-05-23 23:24:02 UTC
OK, I did the bisect.

The result is:

6c7bccea390853bdec5b76fe31fc50f3b36f75d5 is the first bad commit
commit 6c7bccea390853bdec5b76fe31fc50f3b36f75d5
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Dec 18 14:07:14 2013 -0500

    drm/radeon/pm: move pm handling into the asic specific code
    
    We need more control over the ordering of dpm init with
    respect to the rest of the asic.  Specifically, the SMC
    has to be initialized before the rlc and cg/pg.  The pm
    code currently initializes late in the driver, but we need
    it to happen much earlier so move pm handling into the asic
    specific callbacks.
    
    This makes dpm more reliable and makes clockgating work
    properly on CIK parts and should help on SI parts as well.
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
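The ordering constraint described in this commit message can be modeled in a few lines. This is a hypothetical Python sketch: the step labels ("smc/dpm", "rlc", "cg/pg") and the functions init_order and smc_before_rlc_and_cgpg are illustrative only and do not correspond to real radeon driver functions. It only shows why moving pm init from late generic code into the asic-specific code lets the SMC come up before the RLC and clock/power gating.

```python
# Hypothetical model of the init ordering described in the commit message.
# Step names are illustrative labels, not real radeon driver functions.
ASIC_BLOCKS = ["rlc", "cg/pg"]  # RLC, then clock gating / power gating


def init_order(pm_in_asic_callbacks):
    """Return the order in which the modeled blocks are brought up."""
    if pm_in_asic_callbacks:
        # After the change: asic-specific init starts the SMC/DPM first.
        return ["smc/dpm"] + ASIC_BLOCKS
    # Before the change: generic pm init ran late, after the asic blocks.
    return ASIC_BLOCKS + ["smc/dpm"]


def smc_before_rlc_and_cgpg(order):
    """Check the constraint: the SMC must come up before the RLC and cg/pg."""
    smc = order.index("smc/dpm")
    return all(smc < order.index(block) for block in ASIC_BLOCKS)


if __name__ == "__main__":
    for moved in (False, True):
        order = init_order(moved)
        print(moved, order, smc_before_rlc_and_cgpg(order))
```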
Comment 4 Vitaliy Filippov 2014-05-24 16:43:55 UTC
Do you need any other details about this bug? :)
Comment 5 Vitaliy Filippov 2014-05-27 21:23:41 UTC
Alex, do you have any suggestions?..
Comment 6 Alex Deucher 2014-05-28 00:11:48 UTC
It was a long weekend in the US.  I haven't had a chance to look into it yet.
Comment 7 Alex Deucher 2014-05-30 14:31:21 UTC
Created attachment 137731
possible fix

Does this patch help?
Comment 8 Alex Deucher 2014-05-30 14:49:59 UTC
Created attachment 137741
better fix

Please try this patch instead.
Comment 9 Vitaliy Filippov 2014-05-30 22:34:41 UTC
Yes, everything works with it, thanks!

So, in which kernel version are you planning to include this fix?
Comment 10 Alex Deucher 2014-05-30 22:42:48 UTC
3.15 if we can make it in time, otherwise 3.16. I've cc'ed stable, so it will show up in older kernels as well once it's applied upstream.
Comment 11 Vitaliy Filippov 2014-06-02 21:33:42 UTC
Thanks, I'll wait for the inclusion :-)
Comment 12 swoorupj 2014-06-09 07:19:00 UTC
Hi Alex,

I have reported the same issue on bugs.freedesktop. I tried out the 3.15 Linux kernel on Arch Linux just today, and this issue still occurs for me.

Resuming from suspend still results in a blank screen when using the radeon module in the kernel.

dmesg logs: http://ix.io/cSi

I have an AMD A6-4400M APU that integrates a Radeon HD 7520G.

Kind Regards,
Swoorup Joshi
Comment 13 Alex Deucher 2014-06-09 13:46:10 UTC
(In reply to swoorupj from comment #12)
> Hi Alex,
> 
> I have reported the same issue on bugs.freedesktop. I tried out the 3.15
> Linux kernel on Arch Linux just today, and this issue still occurs for
> me.

Is it the same commit (6c7bccea390853bdec5b76fe31fc50f3b36f75d5) that caused the regression for you as well?  If not, you are experiencing a different issue.
Comment 14 swoorupj 2014-06-10 03:25:34 UTC
(In reply to Alex Deucher from comment #13)
> (In reply to swoorupj from comment #12)
> > Hi Alex,
> > 
> > I have reported the same issue on bugs.freedesktop. I tried out the 3.15
> > Linux kernel on Arch Linux just today, and this issue still occurs for
> > me.
> 
> Is it the same commit (6c7bccea390853bdec5b76fe31fc50f3b36f75d5) that caused
> the regression for you as well?  If not, you are experiencing a different
> issue.

No, this issue has been occurring since kernel 3.8.
Comment 15 swoorupj 2014-06-10 03:36:41 UTC
Is there a way, or a patch, that would let me dump the state of the card so I can compare it before and after suspending?
Comment 16 Vitaliy Filippov 2014-06-10 07:22:18 UTC
Since you know everything was ok in 3.8, maybe you'll just also try to bisect? :)
Comment 17 swoorupj 2014-06-10 08:29:08 UTC
Sorry, I misled you. I meant that I started using Linux with 3.8, and I have had this problem ever since. I don't recall it ever working with KMS.
Comment 18 Alex Deucher 2014-06-10 14:00:12 UTC
swoorupj, you have some other issue, then. It's not related to this bug.
Comment 19 StephenH 2014-08-29 18:28:44 UTC
Fedora 3.15.x kernels on an Acer Aspire One netbook with an AMD C50 processor and the radeon driver have this problem.
Kernels 3.14.x and earlier did not have this problem.

Whatever changed between 3.14.x and 3.15.x caused this problem. I have a bug report at https://bugzilla.redhat.com/show_bug.cgi?id=1121838, but I think the problem will need to be addressed here, upstream of Fedora.

Kernels 3.14.x and earlier: work fine
Kernels 3.15.x: do not work

In the Bugzilla report cited above, I asked what changed that could cause this and how I can fix it. Fedora is working to rebase Fedora 20 to 3.16.x, so if the problem has not been fixed by then, there will be problems. This will be even more of an issue once Fedora 21 is released and Fedora 19 reaches end of life (EOL). I am currently running Fedora 20, but I have to use a Fedora 19 kernel (3.14.17-100.fc19.x86_64) in order to have suspend/resume work properly.

Please don't ask me to bisect unless you give me explicit directions on how to do so. I have no idea what bisecting does or how to do it. I'm not a developer.
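Bisecting means binary-searching the commit history between a known-good and a known-bad kernel, building and testing one kernel per step, so roughly log2(N) test builds are needed for N commits. In git the loop is driven by git bisect start, git bisect bad <rev> and git bisect good <rev> (for example v3.15 as bad and v3.14 as good), after which each test kernel is marked with git bisect good or git bisect bad until git names the first bad commit. The Python sketch below only illustrates the search itself: first_bad_commit is a hypothetical helper, is_bad stands in for "build this kernel, suspend, resume, and check", and it assumes that once a commit is bad, every later commit is bad too.

```python
def first_bad_commit(commits, is_bad):
    """Return (first_bad, tests_run) using binary search.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after a bad one is also bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1  # indices of the known-good and known-bad ends
    tests_run = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests_run += 1  # one kernel build + suspend/resume test per step
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return commits[bad], tests_run


# Toy history: commits 0..99, with the regression introduced at commit 37.
if __name__ == "__main__":
    culprit, tests = first_bad_commit(list(range(100)), lambda c: c >= 37)
    print(culprit, tests)
```

With 100 toy commits and the regression at commit 37, the search finds it after 6 test builds instead of up to 99 one-by-one checks, which is why bisecting a whole kernel release is practical.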