Bug 70411

Summary: 7970M PRIME: lockup with Distance game, recovery doesn't work
Product: Drivers
Reporter: Christoph Haag (haagch.christoph)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.13-3.14-rc2
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg of radeon getting stuck

Description Christoph Haag 2014-02-11 21:38:15 UTC
Created attachment 125651 [details]
dmesg of radeon getting stuck

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wimbledon XT [Radeon HD 7970M]

I am using PRIME to render on the Radeon GPU.

This has been going on for a while, and unfortunately I don't know since when. I may test older kernels later.

It worked some time ago (again, no specifics, but I think it was 3.12, without dpm, which may be relevant).

I am reporting this because it happens specifically with the Distance game, in the alpha version that is only available to Kickstarter backers (I believe). http://survivethedistance.com/ (Surely there is a way to get this into the hands of driver developers, if it is useful for finding out whether more GPUs and configurations are affected and for fixing bugs!)

Well, it's an alpha, so it may be buggy, but it shouldn't bring the radeon driver down to an unrecoverable state... (I believe restarting X doesn't help, and rmmod radeon doesn't really work currently.)
I don't really see lockups with other 3D programs, so this is specific to Distance.

It's reproducible every time, so I can get more detail later. For now there's the dmesg: starting the game and showing the intro works, but shortly before the menu is displayed X freezes for a few seconds. Once X is usable again, I see the messages starting at 82688.801309 in dmesg. The game window is still there, but it is frozen and can be killed with -9.

Unfortunately the GPU has not recovered, and e.g. trying to run DRI_PRIME=1 glxgears results in this message:

radeon: The kernel rejected CS, see dmesg for more information.
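
For anyone going through the attached dmesg, the relevant part is everything from timestamp 82688.801309 onward. A minimal sketch of pulling out those lines, assuming the usual "[seconds.micros] message" dmesg line format; the helper name lines_from and the "radeon" filter string are illustrative choices, not part of any tool:

```python
import re

# dmesg lines normally start with a bracketed timestamp in seconds since boot
_TS = re.compile(r"^\[\s*(\d+\.\d+)\]")


def lines_from(dmesg_text, start, needle="radeon"):
    """Return the lines stamped at or after `start` seconds that mention `needle`.

    Hypothetical helper for illustration; lines without a timestamp are skipped.
    """
    selected = []
    for line in dmesg_text.splitlines():
        match = _TS.match(line)
        if match and float(match.group(1)) >= start and needle in line:
            selected.append(line)
    return selected
```

Called on the saved attachment text as lines_from(text, 82688.801309), this would keep only the radeon lines logged from the freeze onward.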
Comment 1 Alex Deucher 2014-02-12 03:49:03 UTC
This is most likely a 3D driver (mesa) issue rather than a kernel issue.  Did you update your 3D driver recently?  If not, can you bisect?
Comment 2 Michel Dänzer 2014-02-12 07:58:53 UTC
Does setting the environment variable R600_DEBUG=nohyperz for running the game work around the problem?
Comment 3 Christoph Haag 2014-02-12 08:20:05 UTC
I should have mentioned that Mesa is a relatively recent git build, a21552a.

(In reply to Michel Dänzer from comment #2)
> Does setting the environment variable R600_DEBUG=nohyperz for running the
> game work around the problem?

Yes, with R600_DEBUG=nohyperz the game works and there is no lockup.
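
For reference, a minimal sketch of a launcher that applies this workaround together with PRIME offloading. It assumes R600_DEBUG takes a comma-separated flag list; the function names and the ./Distance path are hypothetical:

```python
import os
import subprocess


def prime_env(disable_hyperz=True, base=None):
    """Return a copy of the environment set up for PRIME offloading."""
    env = dict(os.environ if base is None else base)
    env["DRI_PRIME"] = "1"  # render on the discrete Radeon GPU
    if disable_hyperz:
        # Assumption: R600_DEBUG is a comma-separated list, so keep existing flags
        flags = [f for f in env.get("R600_DEBUG", "").split(",") if f]
        if "nohyperz" not in flags:
            flags.append("nohyperz")
        env["R600_DEBUG"] = ",".join(flags)
    return env


def launch(command, disable_hyperz=True):
    """Run `command` (a list of arguments) with the workaround applied."""
    return subprocess.run(command, env=prime_env(disable_hyperz)).returncode


if __name__ == "__main__":
    # "./Distance" is a placeholder for wherever the game binary lives
    launch(["./Distance"])
```

The shell equivalent is simply DRI_PRIME=1 R600_DEBUG=nohyperz followed by the game command.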

I guess it's possible that it stopped working when HyperZ was enabled by default.

Still, this lockup is not nice since currently only a reboot makes the radeon gpu usable again.
Comment 4 Alex Deucher 2014-02-12 14:59:54 UTC
(In reply to Christoph Haag from comment #3)
> 
> Still, this lockup is not nice since currently only a reboot makes the
> radeon gpu usable again.

Welcome to GPUs.  Unfortunately, there's not really a whole lot we can do other than disabling hyperz or fixing the hyperz setup in mesa that's causing the problem.  Either way this isn't really a kernel bug.
Comment 5 Christoph Haag 2014-05-26 23:24:41 UTC
Well then, no reason to keep this open.