Bug 88921

Summary: System reboots upon suspending when Radeon 5770 card is used
Product: Drivers
Reporter: Eduard Bloch (blade)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID
Severity: normal
CC: aaron.lu, alexdeucher
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.13.rc*
Subsystem:
Regression: No
Bisected commit-id:
Attachments: lspci -v output, dmesg output, dmesg covering X server lockup

Description Eduard Bloch 2014-11-25 20:03:26 UTC
Created attachment 158831 [details]
lspci -v output

Hello,

I have a workstation with an ASRock A785GXH/128M mainboard (latest BIOS) and a Phenom II X4 955 processor, running Debian Unstable (amd64). It has an onboard IGP, and without an additional video card this setup works just fine even with recent kernels, i.e. S3 suspend/resume works without any glitches.

The trouble starts when I add my Radeon 5770 video card. Prior to kernel 3.13.rc-something it worked OK, i.e. S3 suspend put the system to sleep, the power LED started blinking, and after approximately 5 seconds the fans stopped (that extra delay always differed slightly from the behavior under Windows, but I didn't care much).

Since that version, the system behaves differently. Instead of turning off the fans, the whole system suddenly REBOOTS! In some of the latest versions (maybe 3.16, not sure) the ~5s delay disappeared, so the system now reboots even sooner (*sic*).

The behavior is also slightly random: sometimes suspend worked once but rebooted the next time, sometimes it even worked twice, but those lucky cases were rare and not reproducible. The odds seem to vary with kernel versions, but this may be just my imagination or a side effect of other software components (Xorg has been upgraded in the meantime).

I tried to bisect the issue once before but the result was inconclusive, probably because of the mentioned Heisenbug nature. I could narrow it down to a potential range of commits but unfortunately one of them was a huge DRM code update which caused lots of build errors and it was hard to follow.
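
The randomness described above also determines how many clean suspend cycles a bisect step needs before a commit can be called good. Below is a minimal sketch of that arithmetic, assuming each suspend on a bad kernel fails independently with a fixed probability. The function name and the example failure rates are hypothetical and were not measured on this system.

```python
import math


def trials_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Return how many consecutive clean suspend cycles are needed before
    marking a commit as good with the given confidence.

    Assumption (hypothetical model): on a bad kernel, each suspend fails
    independently with probability `failure_rate`.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A bad kernel survives n cycles with probability (1 - p)^n.
    # Solve (1 - p)^n <= 1 - confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    for rate in (0.5, 0.2):
        print(f"failure rate {rate}: {trials_needed(rate)} clean cycles")
```

Under this model, a kernel that reboots on half of all suspends needs 5 clean cycles per bisect step for 95% confidence, and one that reboots on a fifth of them needs 14. Marking a commit good after a single successful suspend could therefore easily send the bisect in the wrong direction.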

I am not sure whether the delay happens with onboard video too but I could test it if someone wants to know.
Comment 1 Aaron Lu 2014-12-22 07:42:02 UTC
This looks DRM-related; reassigning there.
Comment 2 Alex Deucher 2014-12-22 16:49:36 UTC
Can you bisect?  Please also attach your dmesg output.
Comment 3 Eduard Bloch 2014-12-23 15:25:42 UTC
Created attachment 161711 [details]
dmesg output

As mentioned, I tried to bisect a couple of months ago without much luck, either because of compiling problems or the Heisenbug behavior.

In the meantime, I removed the no_console_suspend argument from the kernel command line (which I originally added while bisecting), and now suspend seems to be much more stable; I don't think I have seen the failure again since I switched to 3.18rc7 a couple of weeks ago (I am now on 3.18.1).

That might be pure coincidence, or maybe something related has changed (I also upgraded Debian in the meantime); I cannot tell the exact reason right now. I could try adding the no_console_suspend switch again after Xmas to check whether the problem returns.
Comment 4 Eduard Bloch 2014-12-28 22:48:36 UTC
Created attachment 162051 [details]
dmesg covering X server lockup

This is getting weirder and weirder. Sometimes I can boot the system and suspend it five times a day with no issues. Sometimes it reboots as described in the original description.

And sometimes it seems to hang for about 10 seconds before the fans are turned off (this feels like the pre-3.13rcX kernels). When I resume it, the X picture is frozen, but I can still switch to the FB console and kill X from there. However, when X is restarted, it feels slightly sluggish, as if some applications had no acceleration. I haven't tested movie playback or other things in that mode yet, and Firefox freezes at 100% CPU time when I visit some sites like *sic* kernel bugzilla.

All those observations were made with the same kernel, 3.18.1. I attached the dmesg log again; it contains something about an r600 exception, which looked related.
Comment 5 Eduard Bloch 2015-09-10 17:31:39 UTC
In the meantime, I replaced the gfx card with an Nvidia GTX750 and it just works, with either nouveau or nvidia. Nouveau crashes unless GLX is disabled, but there are no such glitches with regard to power management anymore.

If some kernel hacker wants this Radeon card for investigation, please let me know. I am closing this bug now.