Bug 200869 - UVD cause amdgpu crash on CIK: [amdgpu]] UVD not responding, trying to reset the VCPU
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-20 18:47 UTC by Janpieter Sollie
Modified: 2018-08-21 18:34 UTC
CC List: 0 users

See Also:
Kernel Version: 4.17.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments
attachments from the firepro machine (130.00 KB, application/x-tar)
2018-08-20 18:47 UTC, Janpieter Sollie

Description Janpieter Sollie 2018-08-20 18:47:08 UTC
Created attachment 277985
attachments from the firepro machine

Neither VDPAU nor VAAPI works with my FirePro W5100 card, which uses a Bonaire chip.
The module needs to be loaded 3 times before the card initializes successfully. 3 is not a random number; the sequence is the same every time:
- First time, the UVD fails
- Second time, the SDMA fails
- Third time, the GFX ring buffer test fails, but the GPU works
When HW acceleration is used, the GPU crashes again and the process becomes a zombie.
See the kernel log for more details.
Attached:
- dmesg log
- Xorg log
- ps aux
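
For anyone comparing the three load attempts, a minimal Python sketch that groups dmesg lines by the block they mention could look like this. The block names come from the description above; the function name `lines_by_block` and the loose, case-insensitive substring matching are assumptions for illustration, not something the driver prescribes.

```python
# Block names as used in the description: UVD, SDMA, GFX.
BLOCKS = ("UVD", "SDMA", "GFX")


def lines_by_block(log_text, driver="amdgpu"):
    """Group log lines that mention the driver by the hardware block they name.

    Matching is a plain case-insensitive substring test, so a line may be
    listed under more than one block.
    """
    found = {block: [] for block in BLOCKS}
    for line in log_text.splitlines():
        lowered = line.lower()
        if driver.lower() not in lowered:
            continue  # skip lines from other drivers
        for block in BLOCKS:
            if block.lower() in lowered:
                found[block].append(line.strip())
    return found
```

Feeding it the saved dmesg output once per load attempt should make it easy to see which block complains in which attempt.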
Comment 1 Janpieter Sollie 2018-08-21 06:18:33 UTC
Small update:

When loading the module with the following parameters:
amdgpu si_support=0 gpu_recovery=1 dpm=1 dc=0

I still need to load/unload the module 3 times, but it works for non-1080 videos.
For 1080 videos, mplayer now opens a black window (it did not do that before), but nothing is shown in it.
Killing the mplayer process with kill -9 now also terminates it properly.
These steps do not produce any extra information in dmesg, though.
Comment 2 Janpieter Sollie 2018-08-21 18:34:34 UTC
2nd update:
After double-checking /sys/module/amdgpu, it turned out that the parameters for amdgpu were not applied at all when put in /etc/modprobe.d/options. I changed the lilo kernel startup string to the following:
amdgpu.dc=1 amdgpu.audio=1 amdgpu.si_support=0
which seems to work properly.  Sorry for the incorrect bug report.
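
The check described above can be sketched in Python: compare the `amdgpu.*` options on the kernel command line with what sysfs reports under /sys/module/amdgpu/parameters. The helper names are hypothetical, and values are compared as plain strings, so a parameter the kernel prints in another form (for example a boolean shown as Y or N) would be flagged even if it was applied.

```python
from pathlib import Path


def requested_params(cmdline, module):
    """Collect `module.name=value` tokens from a kernel command line string."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            params[name] = value
    return params


def applied_params(module, sysfs_root="/sys/module"):
    """Read the values sysfs exposes for a loaded module's parameters."""
    param_dir = Path(sysfs_root) / module / "parameters"
    values = {}
    if not param_dir.is_dir():
        return values  # module not loaded, or it exposes no parameters
    for entry in sorted(param_dir.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # some parameters are not readable by every user
    return values


def mismatches(requested, applied):
    """Map each differing parameter to (requested value, applied value or None)."""
    return {
        name: (want, applied.get(name))
        for name, want in requested.items()
        if applied.get(name) != want
    }
```

On the affected machine, `mismatches(requested_params(Path("/proc/cmdline").read_text(), "amdgpu"), applied_params("amdgpu"))` should come back empty if the options from the startup string were actually picked up.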
