Bug 74751

Summary: resume from suspend broken with 3.15-rc1 and rc2 kernels
Product: Drivers
Reporter: Tasev Nikola (tasev.stefanoska)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: daniel, montonen.niko, wshuman3
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.15-rc2
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: bisect.log
dmesg
lspci grep VGA
dmidecode
dmesg working kernel after suspend/resume
dmesg broken kernel before suspend/resume
dmesg from working kernel
dmesg from broken kernel
commit 25f397a429dfa43f22c278d0119a60a343aa568f from gitk
dmesg with devices for sys-power-pm-test
resume fbcon later

Description Tasev Nikola 2014-04-24 16:32:38 UTC
Created attachment 133611 [details]
bisect.log

Since the 3.15-rc1 kernel, resume from suspend is broken on my HP dm1 laptop with an AMD E-350 and Radeon HD 6310 graphics.

After bisecting, I found that commit:

 commit 25f397a429dfa43f22c278d0119a60a343aa568f
    Author: Daniel Vetter <daniel.vetter@ffwll.ch>
    Date:   Fri Jul 19 18:57:11 2013 +0200

was the first bad commit.

After reverting the commit and recompiling the 3.15-rc2 kernel, suspend/resume works again, with one small problem.

After the resume, the laptop immediately suspends again. After pressing any key, it resumes normally.

If I suspend with the parameter --quirk-dpms-on, the laptop suspends and resumes without any problem.

Up to and including the 3.14.1 kernel, everything works fine.

I don't know if I reported the bug in the right place. I don't know if it is ACPI, DRM, or radeon related, but at least it seems DPMS related.

If you need any other info please let me know. 

Attached are the bisect.log, dmesg, lspci and dmidecode.

Best regards.
Comment 1 Tasev Nikola 2014-04-24 16:33:47 UTC
Created attachment 133621 [details]
dmesg
Comment 2 Tasev Nikola 2014-04-24 16:34:24 UTC
Created attachment 133631 [details]
lspci grep VGA
Comment 3 Tasev Nikola 2014-04-24 16:34:44 UTC
Created attachment 133641 [details]
dmidecode
Comment 4 Daniel Vetter 2014-04-25 13:56:08 UTC
Can you please boot with drm.debug=0xe on broken kernels and on working kernels, do a suspend/resume and then attach dmesg for each? Please make sure early boot messages are not cut off, increasing dmesg logsize with log_buf_len if needed.
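
A minimal sketch for checking that the requested parameters actually reached the kernel, assuming a POSIX shell. The helper name cmdline_param and the value log_buf_len=4M are illustrative assumptions; drm.debug=0xe is the value requested above.

```shell
#!/bin/sh
# Print the value of a kernel command-line parameter; return 1 if it is absent.
# $1: the command line text (normally the contents of /proc/cmdline)
# $2: the parameter name, e.g. drm.debug
# cmdline_param is a hypothetical helper name, not an existing tool.
cmdline_param() (
    set -f   # disable globbing while the text is split into words
    for word in $1; do
        case "$word" in
            "$2"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
)

# Example with a literal string (log_buf_len=4M is an assumed value).
sample='root=/dev/sda1 ro drm.debug=0xe log_buf_len=4M'
cmdline_param "$sample" drm.debug     # prints 0xe
cmdline_param "$sample" log_buf_len   # prints 4M
```

On a running system, pass the real command line instead of the sample: cmdline_param "$(cat /proc/cmdline)" drm.debug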
Comment 5 Daniel Vetter 2014-04-25 13:58:54 UTC
Hm, 25f397a429dfa43f22c was already merged into 3.11, but you say that 3.14 works fine. Is this really the right first bad commit git bisect found? I'm confused ...
Comment 6 Tasev Nikola 2014-04-25 18:01:44 UTC
Hi Daniel,

First, sorry if I confused you, I'm just an average user.

But yes, this really is the first bad commit git bisect found.
I saw that it is just a simple one-line patch, but it probably masked another problem elsewhere, because the break in the patch is just before the DPMS screen code. And yes, moving the break back up to where it was before the patch and recompiling the kernel fixes the problem for now.

I don't know where the problem comes from. And yes, all the kernels before 3.15-rc1 work without a problem.

With the broken (unpatched) 3.15-rc2 kernel, my computer froze immediately after resume. I only have a black screen, I can't log into the console, and I must shut down the computer by holding the power button for 10 seconds. I attached the dmesg from before suspend/resume.

For the working 3.15-rc2 with the patch, I attached the dmesg with drm.debug=0xe after suspend/resume.
Comment 7 Tasev Nikola 2014-04-25 18:03:19 UTC
Created attachment 133791 [details]
dmesg working kernel after suspend/resume
Comment 8 Tasev Nikola 2014-04-25 18:04:07 UTC
Created attachment 133801 [details]
dmesg broken kernel before suspend/resume
Comment 9 Tasev Nikola 2014-04-28 14:20:26 UTC
Hi,

I just tried 3.15-rc3; the bug is still there.
Comment 10 Niko Montonen 2014-04-29 07:07:32 UTC
I have a Lenovo ThinkPad Edge E325 with the upgraded version of Tasev's APU,
the E450, with the HD6320 Wrestler graphics chipset, and I'm suffering from an
issue that would appear to be the same as this.

I'm currently running 3.14.1 on the machine, and it works just fine, but
3.15-rc1 and 3.15-rc3 both fail to resume from suspend (I haven't tested rc2).
The machine hangs completely (network interfaces do not resume etc.), so I'm
unable to get dmesg after suspend.

I feel it's worth noting that the machine also completely ignores the lid
closing with 3.15-rc3, and I believe there are lots of ACPI changes in 3.15,
which would explain a lot.

Are there notes somewhere on how to get some useful debug info for situations
like this?
Comment 11 Niko Montonen 2014-04-29 07:08:02 UTC
Created attachment 134161 [details]
dmesg from working kernel
Comment 12 Niko Montonen 2014-04-29 07:08:25 UTC
Created attachment 134171 [details]
dmesg from broken kernel
Comment 13 Daniel Vetter 2014-04-29 08:41:08 UTC
Now I'm even more confused, since 25f397a429dfa is a lot more than a one-line patch. And it _really_ is included in 3.14 already, so git bisect can't possibly list that one as the offending commit for a post-3.14 regression, the tool doesn't work like that.

Can you please double-check the sha1 and perhaps cite the full commit message + patch to make sure we're talking about the same?
Comment 14 Tasev Nikola 2014-04-29 12:30:13 UTC
Hi,

Here is the patch from the commit that I found with git bisect.
It's a copy-paste from gitk for the commit 25f397a429dfa43f22c278d0119a60a343aa568f



---------------------- drivers/gpu/drm/drm_crtc_helper.c ----------------------
index c0f2d62..8108db9 100644
@@ -695,12 +695,13 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set)
 				if (new_encoder == NULL)
 					/* don't break so fail path works correct */
 					fail = 1;
-				break;
 
 				if (connector->dpms != DRM_MODE_DPMS_ON) {
 					DRM_DEBUG_KMS("connector dpms not on, full mode switch\n");
 					mode_changed = true;
 				}
+
+				break;
 			}
 		}

I just moved the break in the code back to where it was before the patch, and after
recompiling 3.15-rc2, suspend/resume works again.
But like I said, I'm just an average user, so sorry if I did something wrong.
I attached a full copy-paste from gitk for the commit 25f397a429dfa43f22c278d0119a60a343aa568f
Comment 15 Tasev Nikola 2014-04-29 12:31:22 UTC
Created attachment 134191 [details]
commit 25f397a429dfa43f22c278d0119a60a343aa568f from gitk
Comment 16 Tasev Nikola 2014-04-29 12:41:21 UTC
For bisecting I did this:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-git

git bisect start | tee -a /root/bisect.log
git bisect bad v3.15-rc1 | tee -a /root/bisect.log
git bisect good v3.14 | tee -a /root/bisect.log

Then I compiled the kernel each time with
CONCURRENCY_LEVEL=4 make-kpkg --initrd kernel_image  modules_image

After testing each kernel, I ran

git bisect good or git bisect bad

then make clean, build again, test, and so on.
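
A minimal sketch for reading the result out of the saved bisect output, assuming a POSIX shell and assuming the final git bisect step was also written to the log (the commands above only show tee on the first three steps). When bisection finishes, git prints a line of the form "<sha1> is the first bad commit" followed by that commit's message, and the message may itself quote other commits. The helper name first_bad_commit is hypothetical.

```shell
#!/bin/sh
# Print the sha1 that git bisect itself reported as the first bad commit.
# Reads the saved bisect output on stdin. Only a line of the exact form
#   <40-hex-sha1> is the first bad commit
# is matched, so sha1s quoted inside a commit message are ignored.
# first_bad_commit is a hypothetical helper name, not a git command.
first_bad_commit() {
    sed -n 's/^\([0-9a-f]\{40\}\) is the first bad commit$/\1/p'
}
```

Possible usage: sha=$(first_bad_commit < /root/bisect.log), followed by git show --stat "$sha" to see the full message and the files touched by that exact commit.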
Comment 17 Tasev Nikola 2014-05-05 16:22:12 UTC
Hi, 

I just tested 3.15-rc4 today and the bug is still there.

Is there something I could do or test to help debug this?
Comment 18 Tasev Nikola 2014-05-12 14:25:12 UTC
Hi

Just tested 3.15-rc5, still not working.
Comment 19 William Shuman 2014-05-12 14:26:11 UTC
Try the patch mentioned in https://bugzilla.kernel.org/show_bug.cgi?id=75651
Comment 20 Tasev Nikola 2014-05-13 14:56:02 UTC
I tried the patch just in case but it didn't help.
Thank you anyway.
Comment 21 Daniel Vetter 2014-05-15 15:17:59 UTC
Ok, the offending commit is actually 177cf92de4aa97ec1435987e91696ed8b5023130, at least that one matches the diff of your change. It references the other commit, but that's not the commit itself.

For debugging it might be useful to do the suspend partially. You can disable certain parts of suspend (it will immediately resume if you do this):

# echo <mode> > /sys/power/pm_test

See

# cat /sys/power/pm_test

for a list of all possible values. If you manage to reproduce the bug with this please attach the drm.debug=0xe dmesg.
Comment 22 Tasev Nikola 2014-05-25 18:30:40 UTC
Hi Daniel,

I didn't notice your last reply until today, when I was testing
the 3.15-rc6 kernel to report that it is still not working.

cat /sys/power/pm_test gives me:
[none] core processors platform devices freezer

So I echoed the different modes to /sys/power/pm_test one by one
and ran pm-suspend each time, but I could not reproduce the bug;
the computer resumes normally after 3-4 seconds.

If you need the dmesg after each suspend/resume, I can
send it to you, just tell me.

Just for info, I also tested the rc6 kernel with the patch
in comment 14 reverted, and it works OK after resume.
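
The one-by-one pass over the pm_test modes can be sketched as below, assuming a POSIX shell. This is a dry run that only prints the commands; the helper names pm_test_modes and pm_test_plan and the dmesg-<mode>.txt file names are assumptions made for illustration.

```shell
#!/bin/sh
# List the test modes offered by /sys/power/pm_test, skipping "none".
# $1: the file contents, e.g. "[none] core processors platform devices freezer"
# (the brackets mark the currently selected mode and are stripped).
pm_test_modes() {
    printf '%s\n' "$1" | sed 's/[][]//g' | tr ' ' '\n' | grep -v -e '^none$' -e '^$'
}

# Print (dry run) the commands for one pass over every mode.
# Nothing is written to /sys here; run the printed lines as root to apply them.
pm_test_plan() {
    for mode in $(pm_test_modes "$1"); do
        printf 'echo %s > /sys/power/pm_test && pm-suspend && dmesg > dmesg-%s.txt\n' \
            "$mode" "$mode"
    done
    # Switch test mode off again so that later suspends are real ones.
    printf 'echo none > /sys/power/pm_test\n'
}

pm_test_plan "[none] core processors platform devices freezer"
```

On the affected machine, the argument would be "$(cat /sys/power/pm_test)" rather than the literal string, and booting with drm.debug=0xe keeps the saved dmesg files useful for the DRM side.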
Comment 23 Tasev Nikola 2014-05-27 12:20:50 UTC
Hi 

I tried different values for /sys/power/pm_test again with the
3.15-rc7 kernel, and I noticed that with devices selected in
/sys/power/pm_test it took a long time to resume (12-15 seconds).

Looking into dmesg, I can see GPU lockup and UVD errors (at the line with timestamp 363.74).
I don't know if this is related to my resume problem.
The dmesg is attached.
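
A small sketch for pulling such lines out of a long dmesg, assuming a POSIX shell with grep. The helper name gpu_errors is hypothetical, and the two search terms are simply the ones mentioned above, not an exhaustive list of radeon error messages.

```shell
#!/bin/sh
# Keep only the lines that mention a GPU lockup or UVD, ignoring case.
# Reads dmesg text on stdin. gpu_errors is a hypothetical helper name.
gpu_errors() {
    grep -i -E 'gpu lockup|uvd'
}
```

Possible usage: dmesg | gpu_errors, or gpu_errors < saved-dmesg.txt for a log that was saved earlier.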
Comment 24 Tasev Nikola 2014-05-27 12:22:39 UTC
Created attachment 137481 [details]
dmesg with devices for sys-power-pm-test
Comment 25 Daniel Vetter 2014-05-30 08:13:33 UTC
Created attachment 137721 [details]
resume fbcon later

Hm, I somehow forgot to attach this yesterday. Please test this patch instead of any reverts.
Comment 26 Tasev Nikola 2014-06-01 08:09:22 UTC
(In reply to Daniel Vetter from comment #25)
> Created attachment 137721 [details]
> resume fbcon later
> 
> Hm, I somehow forgot to attach this yesterday. Please test this patch
> instead of any reverts.

The patch works fine.
Thank you
Comment 27 Daniel Vetter 2014-06-18 14:59:13 UTC
Patch is merged upstream, thanks for the report and testing.