Bug 200517

Summary: Vega 8/Radeon 535 hybrid graphics - amdgpu crash on modesetting
Product: Drivers
Reporter: bakarichard91
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: alexdeucher, peter
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.18.0rc5
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg, lspci -nn, dmesg full, possible fix

Description bakarichard91 2018-07-17 01:31:33 UTC
Created attachment 277363 [details]
dmesg

Notebook: Acer A315-41G-*

4.* kernels boot only with the following parameters:
amdgpu.runpm=0 radeon.modeset=0, or when compiled without VGA switcheroo.

lspci and dmesg attached

Even with these parameters, significant heating occurs; however, this can be fixed by running the "echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch" command.

It would be best if both GPUs worked correctly, as they do on Windows 10.
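
The workaround above writes OFF to the vga_switcheroo debugfs file, which requires root. With a plain `sudo echo ... > file`, the redirection is performed by the unprivileged calling shell, so the write is usually piped through `sudo tee` instead. A minimal sketch follows; `switcheroo_off` is a hypothetical helper name, and the default path is the one quoted in this report (it assumes debugfs is mounted).

```shell
#!/bin/sh
# Hypothetical helper: write OFF to the vga_switcheroo control file.
# The default path is the one quoted in the report; debugfs must be mounted.
switcheroo_off() {
    switch_file="${1:-/sys/kernel/debug/vgaswitcheroo/switch}"
    if [ -w "$switch_file" ]; then
        # Already writable (for example when running as root): write directly.
        printf 'OFF\n' > "$switch_file"
    else
        # Pipe through tee so that the write itself happens as root.
        printf 'OFF\n' | sudo tee "$switch_file" > /dev/null
    fi
}
```

Calling `switcheroo_off` with no argument targets the real control file; passing a path as the first argument is only useful for trying the helper against a scratch file.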
Comment 1 bakarichard91 2018-07-17 01:34:04 UTC
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15d0
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 15d1
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15db
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15dc
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15e8
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15e9
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ea
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15eb
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ec
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ed
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ee
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ef
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] (rev d1)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader (rev 01)
02:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)
03:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31)
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega [Radeon Vega 8 Mobile] (rev c4)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device 15de
04:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 15df
04:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15e0
04:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15e1
04:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Device 15e3
05:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)
Comment 2 Alex Deucher 2018-07-17 15:04:33 UTC
Duplicate of:
https://bugs.freedesktop.org/show_bug.cgi?id=105760
Comment 3 Alex Deucher 2018-07-17 15:05:56 UTC
Please attach your full dmesg output from boot and `lspci -nn` output.
Comment 4 bakarichard91 2018-07-17 15:47:30 UTC
Hi,

Sorry for the duplicate. It's worth mentioning that APIC problems also occurred, which I fixed using kernel parameters (manual PCI addressing).
Comment 5 bakarichard91 2018-07-17 15:48:16 UTC
Created attachment 277371 [details]
lspci -nn
Comment 6 bakarichard91 2018-07-17 15:48:52 UTC
Created attachment 277373 [details]
dmesg full
Comment 7 Alex Deucher 2018-07-17 15:55:39 UTC
Created attachment 277375 [details]
possible fix

Does this patch fix the issue?
Comment 8 bakarichard91 2018-07-17 19:35:35 UTC
Hi, I've patched it and it loads perfectly without any additional parameters. I'm checking the temperature now.

Will this ever be merged to the "master"?

Thanks for the help. Linux support is better than the support for any commercial product.
Comment 9 Alex Deucher 2018-07-17 19:38:11 UTC
(In reply to bakarichard91 from comment #8)
> Hi, I've patched it and it loads perfectly without any additional
> parameters. I'm checking the temperature now.
> 
> Will this ever be merged to the "master"?

Assuming it fixes the issue, I'll go ahead and apply it to upstream and stable kernels.
Comment 10 Alex Deucher 2018-07-17 19:40:14 UTC
(In reply to Alex Deucher from comment #2)
> Duplicate of:
> https://bugs.freedesktop.org/show_bug.cgi?id=105760

This is apparently not a duplicate of that bug.  Similar symptoms, but different root cause.
Comment 11 bakarichard91 2018-07-17 20:58:09 UTC
I played Vietcong, but the GPU didn't switch. Is that okay? Is there any way to monitor frequencies? The temperature seems to be OK now, though.
Comment 12 Alex Deucher 2018-07-18 14:17:25 UTC
If you are using X, you need to use xrandr to set up the dGPU for render offload.  E.g., see:
https://wiki.archlinux.org/index.php/PRIME
or similar pages.
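
As a rough sketch of the xrandr steps that such pages describe: `provider_count` and `enable_offload` are hypothetical helper names, and the default provider indices (1 for the dGPU, 0 for the GPU driving the display) are assumptions that should be checked against the output of `xrandr --listproviders` on the actual machine.

```shell
#!/bin/sh
# Read `xrandr --listproviders` output on stdin and print the provider count.
provider_count() {
    awk '/^Providers: number/ { print $NF }'
}

# Hypothetical helper: make provider $1 (the dGPU) an offload source for
# provider $2 (the GPU driving the display). The defaults 1 and 0 are
# assumptions; check `xrandr --listproviders` first.
enable_offload() {
    n=$(xrandr --listproviders | provider_count)
    if [ "${n:-0}" -lt 2 ]; then
        echo "only ${n:-0} provider(s) visible; nothing to offload to" >&2
        return 1
    fi
    xrandr --setprovideroffloadsink "${1:-1}" "${2:-0}"
}

# Usage, once offload is configured:
#   DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
```

Running an application with `DRI_PRIME=1` set is what asks Mesa to render on the offload GPU; without it, rendering stays on the GPU that drives the display.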
Comment 13 bakarichard91 2018-07-18 15:00:27 UTC
That is clear, thanks.

A separate question, since you seem well placed to answer: why do you think my GPU and CPU core temperatures are about 5-8 °C higher than on Windows 10? Is this the best the open/free drivers can do? TLP is in use, of course.
As far as I know, my Acer notebook's power management is entirely ACPI-controlled; however, the DSDT table was broken under Linux, and I had to manually address two PCI/APIC controllers just to boot the kernel.
Do you have any suggestions for what I can do to keep my legs from getting burnt?
Thanks.
Comment 14 Alex Deucher 2018-07-18 15:18:18 UTC
On the GPU side, we are working on enabling gfxoff and stutter mode which should save some additional power.  They are not enabled by default yet, but should be in the near future.  For the platform side (CPU, FCH, etc.), you may want to run something like powertop and see what can be done to save power on things like USB, SATA, etc.
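
On the earlier question of monitoring frequencies: amdgpu exposes its clock levels through sysfs files such as `pp_dpm_sclk` and `pp_dpm_mclk`, where the active level is marked with an asterisk. The sketch below extracts that level; `active_clock` is a hypothetical helper name, and `card0` is an assumption, since on a hybrid system the iGPU and dGPU may be enumerated in either order.

```shell
#!/bin/sh
# Print the currently selected clock level from an amdgpu pp_dpm_* file.
# Each line looks like "1: 400Mhz *"; the asterisk marks the active level.
active_clock() {
    awk '/\*/ { print $2 }' "$1"
}

# Assumption: the GPU of interest is card0; adjust the index as needed.
# active_clock /sys/class/drm/card0/device/pp_dpm_sclk   # shader clock
# active_clock /sys/class/drm/card0/device/pp_dpm_mclk   # memory clock
```

With debugfs mounted, `/sys/kernel/debug/dri/0/amdgpu_pm_info` (readable as root, again assuming index 0) gives a fuller summary of clocks and power state.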
Comment 15 bakarichard91 2018-07-28 17:48:20 UTC
Hi Alex, kernel 4.18rc6 works perfectly. I think this can be closed. Thanks again.
(The power problems were fixed by patching the DSDT/SSDT tables.)
Comment 16 Peter Wu 2018-08-29 21:12:18 UTC
Hi Richard,

In an attempt to understand the problem better, could you attach the files produced by these two commands:

sudo acpidump > acpidump.txt
sudo lspci -nnvvvxxxx > lspci-nnvvvxxxx.txt