Bug 200869

Summary: UVD causes amdgpu crash on CIK: [amdgpu]] UVD not responding, trying to reset the VCPU
Product: Drivers
Reporter: Janpieter Sollie (janpieter.sollie)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.17.17
Subsystem:
Regression: No
Bisected commit-id:
Attachments: attachments from the firepro machine

Description Janpieter Sollie 2018-08-20 18:47:08 UTC
Created attachment 277985
attachments from the firepro machine

Neither VDPAU nor VAAPI works with my FirePro W5100 card, which uses a Bonaire chip.
The module needs to be loaded three times before the card is initialized successfully. Three is not a random number; the sequence is the same every time:
- First time, the UVD fails.
- Second time, the SDMA fails.
- Third time, the GFX ring buffer test fails, but the GPU works.
When HW acceleration is used, the GPU crashes again and the process turns into a zombie.
See the kernel log for more details.
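To compare the three load attempts, it can help to group the amdgpu lines of a saved kernel log by the hardware block they mention. The following is only a minimal sketch: the function name `group_amdgpu_lines` is made up for illustration, and the plain substring matching on "UVD", "SDMA" and "GFX" is an assumption about how the messages are worded, not something taken from the attached logs.

```python
# Block names come from the description above; matching them as plain
# case-insensitive substrings is an assumption, not a driver convention.
BLOCKS = ("UVD", "SDMA", "GFX")


def group_amdgpu_lines(log_text, blocks=BLOCKS):
    """Group log lines that mention amdgpu by the block name they contain."""
    grouped = {block: [] for block in blocks}
    for line in log_text.splitlines():
        lowered = line.lower()
        if "amdgpu" not in lowered:
            continue  # skip messages from other drivers
        for block in blocks:
            if block.lower() in lowered:
                grouped[block].append(line.strip())
    return grouped
```

Calling it on the text of a dmesg dump taken after each load attempt would show which block complained in which attempt.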
Attached:
- dmesg log
- Xorg log
- ps aux
Comment 1 Janpieter Sollie 2018-08-21 06:18:33 UTC
Small update:

When loading the module with the following parameters:
amdgpu si_support=0 gpu_recovery=1 dpm=1 dc=0

I still need to load/unload the module 3 times, but it works for non-1080 videos.
For 1080 videos, mplayer now opens a black window (it did not do that before) but shows nothing in it.
Killing the mplayer process with kill -9 also terminates it properly.
These steps do not result in any extra information in dmesg, though.
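Whether parameters like these actually reached the driver can be checked through sysfs, where a loaded module exposes its readable parameters under /sys/module/<module>/parameters. The sketch below assumes that layout; the helper names `read_module_params` and `find_mismatches` are hypothetical, and the comparison is a plain string match, so values the driver reports in a different form would need extra handling.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {name: value} for the parameters a loaded module exposes in sysfs."""
    param_dir = Path(sysfs_root) / module / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params  # module not loaded, or it exposes no parameters
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by the current user.
            params[entry.name] = None
    return params


def find_mismatches(requested, actual):
    """Return {name: (wanted, found)} for every requested value that differs."""
    return {
        name: (wanted, actual.get(name))
        for name, wanted in requested.items()
        if actual.get(name) != wanted
    }


if __name__ == "__main__":
    # Values from the parameter line quoted in this comment.
    requested = {"si_support": "0", "gpu_recovery": "1", "dpm": "1", "dc": "0"}
    actual = read_module_params("amdgpu")
    for name, (wanted, found) in find_mismatches(requested, actual).items():
        print(f"{name}: requested {wanted}, found {found}")
```

An empty result from `find_mismatches` suggests the requested values are in effect. Any line printed points to a parameter that was ignored or is not exposed.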
Comment 2 Janpieter Sollie 2018-08-21 18:34:34 UTC
2nd update:
After double-checking /sys/module/amdgpu, it turned out that the parameters for amdgpu were not applied at all when put in /etc/modprobe.d/options. I changed the lilo startup string to the following:
amdgpu.dc=1 amdgpu.audio=1 amdgpu.si_support=0
which seems to work properly. Sorry for the incorrect bug report.
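The two ways of passing parameters seen in this report differ mainly in syntax: a modprobe-style line names the module once and then lists `name=value` pairs, while the kernel command line prefixes every parameter with the module name and a dot. The hypothetical helper `to_kernel_cmdline` below sketches that conversion. As a side note that this report does not confirm as the cause, modprobe only reads files in /etc/modprobe.d whose names end in .conf, which is one possible reason a file named "options" would be ignored.

```python
def to_kernel_cmdline(options_line):
    """Convert 'module name=value ...' into 'module.name=value ...'."""
    fields = options_line.split()
    if fields and fields[0] == "options":
        fields = fields[1:]  # also accept a full modprobe.d 'options' line
    if not fields:
        raise ValueError("expected a module name followed by parameters")
    module, params = fields[0], fields[1:]
    return " ".join(f"{module}.{param}" for param in params)


print(to_kernel_cmdline("amdgpu dc=1 audio=1 si_support=0"))
# amdgpu.dc=1 amdgpu.audio=1 amdgpu.si_support=0
```

The resulting string can be appended to the boot loader's kernel arguments. The values under /sys/module/amdgpu/parameters remain the place to confirm that they took effect.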