Bug 18242 - kernel should forbid app from using GPU if it causes lockups
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All
OS: Linux
Importance: P1 enhancement
Assigned To: drivers_video-dri
Depends on:
Reported: 2010-09-11 07:28 UTC by Török Edwin
Modified: 2012-05-12 15:49 UTC (History)
CC List: 1 user

See Also:
Kernel Version:
Tree: Mainline
Regression: No


Description Török Edwin 2010-09-11 07:28:52 UTC
While testing r600g (still in development), I found that some Mesa demos/games lock up the GPU.
Most of the time the kernel recovers from this nicely; however, the app just locks up the GPU over and over again.

The kernel doesn't even identify the correct process. For example, this time the culprit was the gloss demo (using direct rendering), yet the log names Xorg:
[ 3070.276433] GPU lockup (waiting for 0x0003708B last fence id 0x00037088)
[ 3070.276467] Pid: 4274, comm: Xorg Not tainted 2.6.36-rc3-phenom #96

It could also have been an app using LIBGL_ALWAYS_INDIRECT=1, though, so simply killing the process blamed for the lockup is not a good idea (it could easily get X killed).

I think that once the kernel determines which app caused a lockup, it should forbid that app from sending any more GPU commands (perhaps by failing every ioctl it issues?). In this case it would first forbid Xorg, then see another lockup, and then forbid gloss.
It should probably print a message like:
Process '<processname>' (pid <pid>) caused a GPU lockup, forbidding GPU commands for 'N minutes'. To reenable do 'echo <pid> >/sys/kernel/..../gpu_reenable'.
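The proposal amounts to a per-process ban table that is consulted on every GPU ioctl. The following is a minimal user-space sketch of that policy in C, assuming a fixed-size table, a caller-supplied clock, and -EPERM as the error a banned process gets back. Every name here (MAX_BANNED, gpu_ban_on_lockup, gpu_ioctl_check, gpu_reenable) is hypothetical and does not correspond to existing radeon/DRM code; the sysfs path is left elided exactly as in the report.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical sketch of the proposed policy; none of these names exist in the kernel. */
#define MAX_BANNED 64

struct gpu_ban {
    pid_t pid;
    char comm[16];   /* process name, same size as the kernel's task comm field */
    time_t until;    /* GPU commands refused while now < until; 0 means not banned */
};

static struct gpu_ban bans[MAX_BANNED];

/* Prefer the process's existing entry; otherwise take the first empty or expired one. */
static struct gpu_ban *find_slot(pid_t pid, time_t now)
{
    struct gpu_ban *free_slot = NULL;
    for (int i = 0; i < MAX_BANNED; i++) {
        if (bans[i].pid == pid)
            return &bans[i];
        if (!free_slot && bans[i].until <= now)
            free_slot = &bans[i];
    }
    return free_slot; /* NULL if the table is full */
}

/* Called when a lockup is blamed on `pid`: ban it for `minutes` and log the proposed message. */
static int gpu_ban_on_lockup(pid_t pid, const char *comm, time_t now, int minutes)
{
    assert(pid > 0 && minutes > 0);
    struct gpu_ban *b = find_slot(pid, now);
    if (!b)
        return -ENOSPC;
    b->pid = pid;
    snprintf(b->comm, sizeof(b->comm), "%s", comm);
    b->until = now + (time_t)minutes * 60;
    printf("Process '%s' (pid %d) caused a GPU lockup, forbidding GPU commands for %d minutes. "
           "To reenable do 'echo %d >/sys/kernel/..../gpu_reenable'.\n",
           b->comm, (int)pid, minutes, (int)pid);
    return 0;
}

/* Gate run at the start of every GPU ioctl: fail while the caller's ban is active. */
static int gpu_ioctl_check(pid_t pid, time_t now)
{
    for (int i = 0; i < MAX_BANNED; i++)
        if (bans[i].pid == pid && bans[i].until > now)
            return -EPERM;
    return 0;
}

/* What writing `pid` to the (hypothetical) gpu_reenable file would do: lift that ban early. */
static void gpu_reenable(pid_t pid)
{
    for (int i = 0; i < MAX_BANNED; i++)
        if (bans[i].pid == pid)
            bans[i].until = 0;
}
```

Replaying the scenario from the report: the first lockup is blamed on Xorg (pid 4274), so Xorg's ioctls start failing; if the next lockup happens anyway, it is blamed on the direct-rendering client (gloss), which is banned too, and echoing Xorg's pid to the re-enable file lifts only Xorg's ban. A timed ban rather than a kill follows the report's concern that killing the blamed process could take down X. A real in-kernel version would also need locking and some guard against pid reuse.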

Of course, it would be best if the kernel refused the GPU commands that lead to a lockup in the first place, but it may not be possible to determine in general whether certain GPU instructions will cause a lockup.
Comment 1 Alan 2012-05-12 15:48:53 UTC
Closing. This doesn't tell anyone anything they don't already know.
