Bug 107381 - radeon VCE init error (-110) -- AMD/Intel Mars Hybrid Graphics
Summary: radeon VCE init error (-110) -- AMD/Intel Mars Hybrid Graphics
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-06 17:09 UTC by Andrew Schmadel
Modified: 2017-08-29 10:32 UTC (History)
CC List: 23 users

See Also:
Kernel Version: 4.3
Tree: Mainline
Regression: No


Attachments
dmesg output (9.67 KB, application/octet-stream)
2015-11-06 17:09 UTC, Andrew Schmadel
dmesg output after gnomeshell locked up, crashed and restarted (13.59 KB, application/octet-stream)
2015-11-24 20:11 UTC, Jean-Pierre van Riel
dmesg output after forcing webgl rendering with radeon and firefox (18.65 KB, application/octet-stream)
2015-11-24 20:12 UTC, Jean-Pierre van Riel
dmesg under kernel 4.4 - archlinux x64 (64.30 KB, text/plain)
2016-03-03 15:07 UTC, Robin KERDILES
Ubuntu 15.10 Kernel 4.5-rc6 (86.43 KB, text/plain)
2016-03-04 10:34 UTC, corentin.dehay

Description Andrew Schmadel 2015-11-06 17:09:02 UTC
Created attachment 192261 [details]
dmesg output

Since upgrading to Ubuntu 15.10, I have encountered graphics performance issues, and have occasionally experienced lockups during boot.

I have encountered this issue on kernel 4.2.0 and 4.3.0, and it seems to have affected users on other distributions as well:

https://bugs.launchpad.net/fedora/+source/linux/+bug/1512848
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803087
https://bugzilla.redhat.com/show_bug.cgi?id=1262649

Notably, this issue seems to primarily affect users with the ATI "Mars" chipset, on machines that have an Intel/AMD hybrid graphics configuration.

This shows up in dmesg (full log attached, because there's a fair amount of seemingly-useful context): 
[ 4.917369] radeon 0000:01:00.0: VCE init error (-110).


Some other context from my PC: 

$ xrandr --listproviders
Providers: number : 3
Provider 0: id: 0x6a cap: 0x9, Source Output, Sink Offload crtcs: 4 outputs: 5 associated providers: 2 name:Intel
Provider 1: id: 0x41 cap: 0x6, Sink Output, Source Offload crtcs: 2 outputs: 0 associated providers: 2 name:radeon
Provider 2: id: 0x41 cap: 0x6, Sink Output, Source Offload crtcs: 2 outputs: 0 associated providers: 2 name:radeon

$ lspci -k (trimmed to omit likely-irrelevant devices)
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
        Subsystem: Samsung Electronics Co Ltd Device c0e6
        Kernel driver in use: ivb_uncore
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
        Kernel driver in use: pcieport
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
        DeviceName: Onboard IGD
        Subsystem: Samsung Electronics Co Ltd Device c0e6
        Kernel driver in use: i915
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Mars [Radeon HD 8670A/8670M/8750M] (rev ff)
        Kernel driver in use: radeon
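
The "(rev ff)" suffix on the Mars device above may be relevant. A revision of ff commonly means the PCI config space read back as all ones, which is typical when the discrete GPU is powered down at the moment lspci runs. That is a general observation, not something this report confirms. A minimal sketch that flags such devices (the helper name is made up for illustration):

```shell
# Hypothetical helper: print lspci lines whose revision reads as ff.
# Reads lspci output on stdin; prints nothing if no device matches.
rev_ff_devices() {
    grep -F '(rev ff)'
}

# Typical use (not run here, since it depends on the local hardware):
#   lspci | rev_ff_devices
```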
Comment 1 Jean-Pierre van Riel 2015-11-24 20:10:29 UTC
Similar frequent 'radeon 0000:01:00.0: VCE init error (-110)' issues for me. 

Gnome shell lag gets quite bad at times. At one point, it froze for a fairly long while, and restarted before I'd finished switching and logging into another console.

I noticed libgjs.so.0.0.0 segfaulted during the lockup.

I found the following to be an interesting way to compare the iGPU and the dGPU, but it didn't trigger the bug:
$ DRI_PRIME=0 vblank_mode=0 firefox http://oortonline.gl/#run
$ DRI_PRIME=1 vblank_mode=0 firefox http://oortonline.gl/#run
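
To confirm which GPU each DRI_PRIME value actually selects, the renderer string reported by glxinfo can be compared for both settings. A rough sketch, assuming glxinfo is installed (it usually ships in mesa-utils or the distribution's equivalent); the helper name is hypothetical:

```shell
# Hypothetical helper: extract the first "OpenGL renderer string" line
# from glxinfo output supplied on stdin.
renderer_line() {
    grep 'OpenGL renderer string' | head -n 1
}

# Typical use (not run here, since it needs a running X session):
#   DRI_PRIME=0 glxinfo | renderer_line
#   DRI_PRIME=1 glxinfo | renderer_line
```

If both commands print the same renderer, offloading to the discrete GPU is probably not taking effect.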

I was able to trigger more errors with radeon and saw this in dmesg

[68750.862570] [drm:r600_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
[68750.862583] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on GFX ring (-35).
[68750.862588] [drm:radeon_resume_kms [radeon]] *ERROR* ib ring test failed (-35).
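
The numbers in parentheses are negated kernel errno values. On Linux, 110 is ETIMEDOUT and 35 is EDEADLK, so the VCE error from the description is a timeout, while the fence wait failure above reports a deadlock (which, as far as I know, radeon uses to signal a GPU lockup). A small sketch that decodes just these two codes; the table is hand-written for this thread rather than read from the kernel headers, and the helper name is hypothetical:

```shell
# Hypothetical helper: map the errno values seen in this report to names.
# Accepts the value with or without the leading minus sign.
errno_name() {
    case "${1#-}" in
        110) echo "ETIMEDOUT (timed out)" ;;
        35)  echo "EDEADLK (resource deadlock avoided)" ;;
        *)   echo "unknown" ;;
    esac
}

# Example: errno_name -110
```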

More info on my hardware

$  xrandr --listproviders
Providers: number : 3
Provider 0: id: 0x6e cap: 0x9, Source Output, Sink Offload crtcs: 4 outputs: 8 associated providers: 2 name:Intel
Provider 1: id: 0x42 cap: 0x6, Sink Output, Source Offload crtcs: 2 outputs: 1 associated providers: 2 name:radeon
Provider 2: id: 0x42 cap: 0x6, Sink Output, Source Offload crtcs: 2 outputs: 1 associated providers: 2 name:radeon

$ lspci -k
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
	Kernel driver in use: pcieport
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
	DeviceName:  Onboard IGD
	Subsystem: Dell Device 05be
	Kernel driver in use: i915
...
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Mars XTX [Radeon HD 8790M] (rev ff)
	Kernel driver in use: radeon

$ sudo lshw -c display
  *-display UNCLAIMED     
       description: VGA compatible controller
       product: Mars XTX [Radeon HD 8790M]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list
       configuration: latency=0
       resources: memory:e0000000-efffffff memory:f7c00000-f7c3ffff ioport:e000(size=256) memory:f7c40000-f7c5ffff
  *-display
       description: VGA compatible controller
       product: 4th Gen Core Processor Integrated Graphics Controller
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 06
       width: 64 bits
       clock: 33MHz
       capabilities: msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:30 memory:f5800000-f5bfffff memory:d0000000-dfffffff ioport:f000(size=64)

To see switcheroo options
# cat /sys/kernel/debug/vgaswitcheroo/switch
0:DIS: :DynOff:0000:01:00.0
1:IGD:+:Pwr:0000:00:02.0
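
In that output, "+" marks the GPU currently driving the display and the fourth field is the power state, so the discrete card (DIS) is dynamically powered off here while the integrated one (IGD) is active. A minimal sketch that makes this easier to read, assuming the colon-separated layout shown above; the helper name is hypothetical:

```shell
# Hypothetical helper: summarise /sys/kernel/debug/vgaswitcheroo/switch.
# Each input line looks like  id:TYPE:active-flag:power-state:pci-address
# where the PCI address itself contains colons (fields 5 to 7).
switcheroo_summary() {
    awk -F: '{
        state = ($3 == "+") ? "active" : "inactive"
        printf "%s %s %s:%s:%s %s\n", $2, $4, $5, $6, $7, state
    }'
}

# Typical use, as root (not run here):
#   cat /sys/kernel/debug/vgaswitcheroo/switch | switcheroo_summary
```

For the two lines above this prints "DIS DynOff 0000:01:00.0 inactive" and "IGD Pwr 0000:00:02.0 active".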

dmesg files attached
Comment 2 Jean-Pierre van Riel 2015-11-24 20:11:12 UTC
Created attachment 195321 [details]
dmesg output after gnomeshell locked up, crashed and restarted
Comment 3 Jean-Pierre van Riel 2015-11-24 20:12:16 UTC
Created attachment 195331 [details]
dmesg output after forcing webgl rendering with radeon and firefox
Comment 4 Robin KERDILES 2016-03-03 15:05:49 UTC
This issue still affects kernels 4.4 and 4.5-rc6.
Distribution: Arch Linux x64
Hardware: Radeon HD 8750M
Comment 5 Robin KERDILES 2016-03-03 15:07:41 UTC
Created attachment 206701 [details]
dmesg under kernel 4.4 - archlinux x64
Comment 6 corentin.dehay 2016-03-04 10:34:24 UTC
Created attachment 206781 [details]
Ubuntu 15.10 Kernel 4.5-rc6

dmesg from Ubuntu 15.10 with kernel 4.5-rc6, in case it helps with a fix.
Comment 7 Stratos Zolotas 2016-03-06 10:52:26 UTC
I'm experiencing the same issue, and I'm not on a hybrid configuration. I have a two-GPU, Radeon-only setup on openSUSE Tumbleweed.

uname -a
Linux teras 4.4.3-1-default #1 SMP PREEMPT Fri Feb 26 09:54:10 UTC 2016 (171b8f1) x86_64 x86_64 x86_64 GNU/Linux

sudo lspci | grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland XT [Radeon HD 8670 / R7 250/350]
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Redwood XT [Radeon HD 5670/5690/5730]

dmesg output:
[    2.750393] radeon 0000:01:00.0: VCE init error (-110).
Comment 8 chico76 2016-05-07 09:58:50 UTC
I have the same problem. I tested with 4.4.8-1-lts and 4.5.1, and I recently compiled the 4.5.3 kernel on Arch Linux.


I have a hybrid setup: the ARUBA card is working fine, and the OLAND is the one getting the VCE init error (-110).


maj 07 10:01:43 noname kernel: radeon 0000:01:00.0: VCE init error (-110).
maj 07 10:01:44 noname kernel: [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
maj 07 10:01:44 noname kernel: [drm:si_resume [radeon]] *ERROR* si startup failed on resume
maj 07 10:01:44 noname acpid[554]: client connected from 666[0:1000]

If it helps, the error seems to come from radeon/si.c, after the comment:
/* allocate wb buffer */

That said, I am not qualified to look into this; I haven't programmed in 10 years...
Comment 9 Stratos Zolotas 2016-05-07 10:54:31 UTC
After removing my Redwood card and replacing it with a second Oland, I now get the error twice, with the 4.5.2 kernel:

sudo lspci | grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland XT [Radeon HD 8670 / R7 250/350]
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland PRO [Radeon R7 240/340]

[    2.593576] [drm] Found VCE firmware/feedback version 50.0.1 / 17!
[    2.774303] radeon 0000:01:00.0: VCE init error (-110).
[    4.160741] [drm] Found VCE firmware/feedback version 50.0.1 / 17!
[    4.268492] radeon 0000:02:00.0: VCE init error (-110).
Comment 10 chico76 2016-05-11 19:47:14 UTC
Does anyone have any idea in which kernel version the OLAND chip started to fail?
Maybe I could try to bisect the kernel if I knew a version where it was working.
Comment 11 bsarels 2016-06-16 07:35:16 UTC
Hello,

I'm affected too on a HP Elitebook 840 G1.

sudo lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Mars [Radeon HD 8730M] (rev ff)

I'm willing to help if possible.
Comment 12 Gauthier P. 2016-10-02 19:05:22 UTC
Hello,

I can confirm the bug on an HP Elitebook 840 G1, running Arch Linux and the latest available kernel (Linux arch 4.7.5-1-ARCH #1 SMP PREEMPT Sat Sep 24 13:04:22 CEST 2016 x86_64 GNU/Linux).

The problem also occurs on my own build of the latest kernel (4.7.6).

I hope to see this problem solved.

Sincerely,
Comment 13 madbiologist 2016-11-07 11:38:25 UTC
VCE 1.0 support was added in kernel 4.2. If you are able to build your own kernel with this commit reverted it should fix the issue:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=a918efab631a5112d9d168700458317ad77f269c
Comment 14 Bhaskar 2016-12-01 12:29:52 UTC
This issue still does not seem to be fixed. My kernel version is 4.4.0-51-generic and the issue still occurs.

Here's the output of dmesg | egrep -i 'vce|error' :

[    2.057617] [drm] Found VCE firmware/feedback version 40.2.2 / 15!
[    2.271399] [drm] VCE initialized successfully.
[    4.627532] kfd kfd: error getting iommu info. is the iommu enabled?
[    4.627538] kfd kfd: Error initializing iommuv2 for device (1002:130a)
[    4.627729] kfd kfd: device (1002:130a) NOT added due to errors
[    4.752479] [drm] Found VCE firmware/feedback version 50.0.1 / 17!
[    6.235140] radeon 0000:01:00.0: VCE init error (-110).
[    7.301364] [drm:radeon_acpi_init [radeon]] *ERROR* Cannot find a backlight controller
[   20.220010] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
[   31.605179] radeon 0000:01:00.0: VCE init error (-110).
[   51.400194] radeon 0000:01:00.0: VCE init error (-110).
[   74.917029] radeon 0000:01:00.0: VCE init error (-110).
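
When the message repeats like this, counting failures per PCI device makes logs from different machines easier to compare. A minimal sketch, assuming the two message forms quoted in this thread ("VCE init error" and "failed VCE resume"); the helper name is hypothetical:

```shell
# Hypothetical helper: count VCE failures per PCI device in kernel log
# text supplied on stdin (dmesg or journalctl -k output).
vce_failures() {
    awk '/VCE init error|failed VCE resume/ {
        for (i = 1; i < NF; i++) {
            if ($i == "radeon") {
                dev = $(i + 1)
                sub(/:$/, "", dev)   # drop the colon after the address
                count[dev]++
                break
            }
        }
    }
    END { for (d in count) print d, count[d] }' | sort
}

# Typical use (not run here):
#   dmesg | vce_failures
```

For the dmesg lines quoted above this prints "0000:01:00.0 4".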



uname -a
Linux gublu 4.4.0-51-generic #72-Ubuntu SMP Thu Nov 24 18:29:54 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


sudo lspci | grep VGA
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R6 Graphics]
Comment 15 Stratos Zolotas 2016-12-01 12:36:14 UTC
I'm on 4.8.10 and it is still not fixed. I have two AMD GPUs (both Oland) and the issue appears on both, so don't expect to see a fix in any released kernel. I haven't tested any 4.9 pre-release yet.

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland XT [Radeon HD 8670 / R7 250/350]
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland PRO [Radeon R7 240/340]

dmesg | egrep -i 'vce|error'
[    3.297273] [drm] Found VCE firmware/feedback version 50.0.1 / 17!
[    3.456641] radeon 0000:01:00.0: failed VCE resume (-110).
[    4.878397] [drm] Found VCE firmware/feedback version 50.0.1 / 17!
[    4.985278] radeon 0000:02:00.0: failed VCE resume (-110).

uname -a
Linux teras 4.8.10-1-default #1 SMP PREEMPT Mon Nov 21 13:50:28 UTC 2016 (d1ec066) x86_64 x86_64 x86_64 GNU/Linux
Comment 16 WHAT 2016-12-30 20:01:29 UTC
openSUSE Leap 42.2 with kernel 4.4.36 is also affected, showing "radeon VCE init error (-110)" on startup, Radeon R7 M265 GPU not working (lspci -k output with (rev ff)); Unable to install proprietary fglrx driver because the installer can't find the AMD GPU.
Comment 17 Jean-Pierre van Riel 2017-01-08 12:58:42 UTC
Until VCE gets fixed for Mars/, there might be a way to disable VCE (I need to look into it more)

https://github.com/torvalds/linux/commit/fabb5935871db1f31fcd2684fd154e24de04d917#diff-9bc1b4aaf15dd521a1991717e4e2a2e0
Comment 18 Tabs 2017-01-24 08:41:30 UTC
It is possible to disable VCE in the latest drivers (the commit was made on Mar 18, 2016).

To make the change persistent, add this line to /etc/modprobe.d/radeon.conf:
options radeon vce=0

After the next reboot, you can check that the change was applied with:
systool -vm radeon
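
Where systool is not installed, the parameter can usually be read straight from sysfs. A minimal sketch, assuming the loaded radeon module exposes it at /sys/module/radeon/parameters/vce (the standard location for module parameters, not verified against every kernel version); the helper name is hypothetical:

```shell
# Hypothetical helper: report the state of the radeon "vce" parameter.
# The optional argument overrides the sysfs path, which is an assumption
# based on the standard /sys/module/<name>/parameters layout.
vce_param_state() {
    param_file=${1:-/sys/module/radeon/parameters/vce}
    if [ ! -r "$param_file" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$param_file")
    if [ "$value" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Typical use (not run here): vce_param_state
```

If radeon is loaded from the initramfs, the initramfs may need to be regenerated before the modprobe.d option takes effect at boot.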

It stops the error messages for me, but I have the feeling that the UI is much slower (probably 2D acceleration has been disabled in the process).
Comment 19 Michel Dänzer 2017-01-24 08:56:42 UTC
(In reply to Tabs from comment #18)
> It stops the error messages for me, but I have the feeling that the UI is
> much slower (probably 2D acceleration has been disabled in the process).

You can check this in the Xorg log file.
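
A rough sketch of that check, assuming the log is at /var/log/Xorg.0.log (some setups write it under ~/.local/share/xorg/ instead) and that the relevant lines mention "accel" or "glamor"; the exact wording depends on the X driver in use, and the helper name is hypothetical:

```shell
# Hypothetical helper: show acceleration-related lines from an Xorg log.
# The default path is an assumption; pass another path as the argument.
xorg_accel_lines() {
    grep -iE 'accel|glamor' -- "${1:-/var/log/Xorg.0.log}"
}

# Typical use (not run here): xorg_accel_lines
```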
