Bug 116101

Summary: "RIP radeon_gem_va_ioctl+0x35/0x650", "Userspace still has active objects", and "trying to unbind memory from uninitialized GART !" when unbinding from radeon
Product: Drivers
Reporter: Joe P. (MonopolyMan720)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID
Severity: normal
CC: deathsimple
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.5.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Journalctl output

Description Joe P. 2016-04-09 20:39:18 UTC
Created attachment 212291
Journalctl output

I am attempting to unbind my R9 290 from radeon and rebind it to vfio. However, the entire system hangs when "echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind" is run as root.

Steps to Reproduce:
Run the following script as root:

#!/bin/bash
set -x
echo "1002 67b1" > /sys/bus/pci/drivers/vfio-pci/new_id
echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo "0000:01:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
echo "1002 67b1" > /sys/bus/pci/drivers/vfio-pci/remove_id

echo "1002 aac8" > /sys/bus/pci/drivers/vfio-pci/new_id
echo "0000:01:00.1" > /sys/bus/pci/devices/0000:01:00.1/driver/unbind
echo "0000:01:00.1" > /sys/bus/pci/drivers/vfio-pci/bind
echo "1002 aac8" > /sys/bus/pci/drivers/vfio-pci/remove_id

set +x

Actual Results: 
System hangs on "echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind"

Journalctl shows "Apr 09 16:10:26 joey-arch-pc kernel: radeon 0000:01:00.0: Userspace still has active objects !" followed by numerous "Apr 09 16:10:26 joey-arch-pc kernel: trying to unbind memory from uninitialized GART !"

Expected Results: 

The GPU should unbind from radeon and rebind to vfio-pci without hanging.

Additional information:
I experienced the same issue on 4.1.20-1. 

I know of instances where the same script results in the expected outcome with a Cayman PRO graphics card. 

Attached is the output of journalctl for the entire boot. Go to 16:10:26 to see relevant call traces.
Comment 1 Christian König 2016-04-11 07:36:33 UTC
Well, as the error message "Userspace still has active objects !" already suggests, you have an application which is still using the hardware.

So unbinding and rebinding it isn't possible in this state.
Comment 2 Joe P. 2016-04-11 14:46:26 UTC
Right now the radeon card isn't being used for anything. I am using my Intel iGPU as my output but using PRIME offloading to play games via the 290 (which works fine). Sorry, I should have specified that in the original report.
Comment 3 Joe P. 2016-04-11 14:54:03 UTC
(In reply to Christian König from comment #1)
> Well, as the error message "Userspace still has active objects !" already
> suggests, you have an application which is still using the hardware.
> 
> So unbinding and rebinding it isn't possible in this state.

Just to show that PRIME offloading is working:

$ glxinfo | grep "OpenGL renderer"
OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Server 

$ DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: Gallium 0.4 on AMD HAWAII (DRM 2.43.0, LLVM 3.7.1)
Comment 4 Christian König 2016-04-11 18:43:21 UTC
It doesn't matter what OpenGL driver you currently use for the desktop. The hardware is in use by something so unbinding can't work correctly.

Try "sudo lsof /dev/dri/card1" (or whatever number your Radeon card is) to figure out which process is using it.

It might be that X has opened it for PRIME support or something similar without your knowledge.

Another possibility is that a framebuffer device is still bound to it.
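
The check suggested above can be sketched as a small helper that first looks up which /dev/dri nodes belong to a given PCI address, so the card number does not have to be guessed. This is only an illustration: the function names and the SYSFS_ROOT override are hypothetical, and it assumes lsof is installed and that the device exposes its DRM nodes under the "drm" directory of its PCI sysfs entry.

```shell
#!/bin/bash
# Sketch: find the /dev/dri nodes that belong to one PCI device and list
# the processes holding them open.
# SYSFS_ROOT is a hypothetical override (not a kernel or lsof feature) so
# the lookup can be tried against a fake directory tree.

drm_nodes_for_pci() {
    local addr="$1" entry
    # Each DRM node of the device shows up as a directory such as
    # .../drm/card1 or .../drm/renderD128 under its PCI sysfs entry.
    for entry in "${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr/drm"/*; do
        [ -e "$entry" ] || continue   # glob did not match: no DRM nodes
        echo "/dev/dri/${entry##*/}"
    done
}

show_drm_users() {
    local addr="$1" node
    for node in $(drm_nodes_for_pci "$addr"); do
        echo "== $node"
        # lsof exits non-zero when it finds nothing holding the file open
        lsof "$node" 2>/dev/null || echo "no open handles reported"
    done
}
```

Running "show_drm_users 0000:01:00.0" as root would then print each node followed by whatever lsof reports for it. A framebuffer console bound to the card would not necessarily show up in this output, so an empty result does not prove the device is idle.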
Comment 5 Joe P. 2016-04-12 23:26:41 UTC
(In reply to Christian König from comment #4)
> It doesn't matter what OpenGL driver you currently use for the desktop. The
> hardware is in use by something so unbinding can't work correctly.
> 
> Try "sudo lsof /dev/dri/card1" (or whatever number your Radeon card is) to
> figure out which process is using it.
> 
> It might be that X has opened it for PRIME support or something similar
> without your knowledge.
> 
> Another possibility is that a framebuffer device is still bound to it.

Wow, I can't believe it was that simple. I checked and X was using the device. I set the ignore option of the 290 to true in the X configuration file and it worked fine: X no longer uses the device and I am still able to use PRIME offloading. I'm having some issues when using PRIME, but that's a separate issue and I can now unbind the device. Thanks. Marking the bug as invalid.
Comment 6 Michel Dänzer 2016-04-13 00:25:32 UTC
There's still a bug though: if unbinding can't work, it should be refused upfront, not attempted anyway, breaking things.
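
The "refuse upfront" behaviour described here can be approximated from the script side, without any kernel change. The sketch below is an assumption-laden illustration, not how radeon or the PCI core behaves: guarded_unbind, node_in_use and SYSFS_ROOT are hypothetical names, and the check relies on "fuser -s" from psmisc being available.

```shell
#!/bin/bash
# Sketch: refuse to unbind a PCI device while one of its DRM nodes is open.
# node_in_use, guarded_unbind and SYSFS_ROOT are hypothetical names.

node_in_use() {
    # fuser -s (psmisc) prints nothing and exits 0 if any process has
    # the file open; it exits non-zero otherwise.
    fuser -s "$1" 2>/dev/null
}

guarded_unbind() {
    local addr="$1" dev entry node
    dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr"
    for entry in "$dev"/drm/*; do
        [ -e "$entry" ] || continue   # device exposes no DRM nodes
        node="/dev/dri/${entry##*/}"
        if node_in_use "$node"; then
            echo "refusing to unbind $addr: $node is still open" >&2
            return 1
        fi
    done
    # Same write as in the original reproduction script.
    echo "$addr" > "$dev/driver/unbind"
}
```

In the reproduction script, the plain unbind line could be swapped for "guarded_unbind 0000:01:00.0 || exit 1". The check is inherently racy, since a process can open the node between the fuser call and the write, and it does not see in-kernel users such as a bound framebuffer console, so it only reduces the chance of hitting the hang.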