Bug 69901 - intel ivy bridge/radeonsi PRIME hang since 3.14
Summary: intel ivy bridge/radeonsi PRIME hang since 3.14
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-03 13:50 UTC by Christoph Haag
Modified: 2014-02-10 11:39 UTC
CC List: 3 users

See Also:
Kernel Version: 3.14-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
sysprof output: X hanging after rendering with PRIME (20.39 KB, application/octet-stream)
2014-02-03 13:51 UTC, Christoph Haag
sysprof output: kwin hanging after rendering with PRIME (75.91 KB, application/octet-stream)
2014-02-03 13:52 UTC, Christoph Haag
Patch that may fix the problem (986 bytes, patch)
2014-02-05 08:24 UTC, Thomas Hellstrom

Description Christoph Haag 2014-02-03 13:50:26 UTC
I have these two GPUs in my laptop:

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wimbledon XT [Radeon HD 7970M]

Running xrandr --setprovideroffloadsink radeon Intel and then DRI_PRIME=1 glxgears works fine on 3.13.

On 3.14-rc1 it works for a short time, but then it hangs.

With kwin, it is kwin that hangs; with compton, it is actually X.
Hanging means the process is unkillable, uses 100% CPU (all red, i.e. kernel time, in htop's detailed CPU view), and completely blocks graphical output in X. Whether switching to a TTY still works seems to be a matter of luck.

I filed this against Intel because sysprof showed the CPU usage originating from libdrm_intel.so when kwin hung. The other sysprof log shows nothing from Intel, so maybe it is not actually Intel's problem.
Comment 1 Christoph Haag 2014-02-03 13:51:39 UTC
Created attachment 124311
sysprof output: X hanging after rendering with PRIME
Comment 2 Christoph Haag 2014-02-03 13:52:23 UTC
Created attachment 124321
sysprof output: kwin hanging after rendering with PRIME
Comment 3 Chris Wilson 2014-02-03 21:21:44 UTC
It looks like memory corruption striking the shmemfs used to back swappable GEM objects (in both drivers). In both profiles, a deferred file cleanup is stuck in an infinite loop (my guess is that the cleanup itself is started by an OOPS and SIGKILL), so the stuck CPU looks like just another symptom. Please enable all the mm/vm and lockdep kernel debugging options and see if that generates a clue.
Comment 4 Christoph Haag 2014-02-04 16:22:56 UTC
I did a bisect first, and I think (!) this is the result:

58aa6622d32af7d2c08d45085f44c54554a16ed7 is the first bad commit
Comment 5 Thomas Hellstrom 2014-02-04 17:25:26 UTC
This is probably TTM clearing the page::mapping and page::index members of the Intel pages. I don't have time to put together a patch tonight, but I probably will tomorrow.

/Thomas
Comment 6 Thomas Hellstrom 2014-02-05 08:24:00 UTC
Created attachment 124621
Patch that may fix the problem

Could you try the attached patch out to see if it fixes the problem?
Comment 7 Christoph Haag 2014-02-05 11:32:17 UTC
Yes, it fixes it; no more lockups.
Comment 8 Thomas Hellstrom 2014-02-05 12:00:30 UTC
Great. I'll include the patch in my next pull request.
Comment 9 Christoph Haag 2014-02-10 11:39:49 UTC
Thanks, fixed in rc2.
