Created attachment 283823
dmesg

After upgrading the kernel from 5.1.14 to 5.2.1 I see many artifacts during the desktop session. Also, when resuming from suspend, the external monitor turns green and the kernel crashes. See the attached dmesg.
Created attachment 283825
lspci
Can you bisect?
Well, that took me some time... Looks like this is the cause:

005440066f929ba0dca8f4e0aebfbf8daac592cc is the first bad commit
commit 005440066f929ba0dca8f4e0aebfbf8daac592cc
Author: Huang Rui <ray.huang@amd.com>
Date:   Wed Mar 13 20:21:00 2019 +0800

    drm/amdgpu: enable gfxoff again on raven series (v2)

    This patch enables gfxoff and stutter mode again, since we take more
    testing on raven series.

    For raven2 and picasso, we can enable it directly. And for raven, we need
    check the RLC/SMC ucode version cannot be less than #531/0x1e45.

    v2: add smc version checking for raven.

    Signed-off-by: Huang Rui <ray.huang@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
    Tested-by: Likun Gao <Likun.Gao@amd.com> (v2)
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c          |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c             |  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c               | 21 +++++++++++++++++++++
 drivers/gpu/drm/amd/powerplay/hwmgr/smu10_hwmgr.c   | 13 ++++---------
 drivers/gpu/drm/amd/powerplay/smumgr/smu10_smumgr.c |  4 ++++
 5 files changed, 33 insertions(+), 11 deletions(-)
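For readers following along, the firmware gate described in the commit message can be modeled roughly as below. This is a minimal Python sketch and not the driver's code: the function name `gfxoff_allowed`, the ASIC strings, and the fallback for other ASICs are assumptions made for illustration. Only the two thresholds (#531 for RLC, 0x1e45 for SMC) come from the commit message.

```python
# Hypothetical model of the rule stated in the commit message.
# Names and structure are illustrative, not taken from amdgpu.
RAVEN_MIN_RLC_VERSION = 531      # "#531" in the commit message
RAVEN_MIN_SMC_VERSION = 0x1E45   # "0x1e45" in the commit message


def gfxoff_allowed(asic: str, rlc_version: int, smc_version: int) -> bool:
    """Return True if gfxoff would be enabled under the described rule."""
    if asic in ("raven2", "picasso"):
        # The commit message says these can enable gfxoff directly.
        return True
    if asic == "raven":
        # Raven needs both firmware versions at or above the thresholds.
        return (rlc_version >= RAVEN_MIN_RLC_VERSION
                and smc_version >= RAVEN_MIN_SMC_VERSION)
    # Assumption: anything else is outside the scope of this sketch.
    return False
```

If the model is right, a Raven system whose RLC or SMC firmware is older than these versions would keep gfxoff disabled, so the firmware versions reported in dmesg may be worth comparing between affected and unaffected machines.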
I'm seeing the same problems when running 5.2.x that were not present in 5.1. The commit above is the source of the visual artifacts, but I believe the lockup issue was introduced later. Is there any help I can provide in testing a fix?

It looks like there might have been some previous effort here:
https://www.spinics.net/lists/amd-gfx/msg32192.html

I created https://bugzilla.kernel.org/show_bug.cgi?id=204611 to track the lockup issue.
This issue should be fixed with this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=98f58ada2d37e68125c056f1fc005748251879c2
(In reply to Alex Deucher from comment #5)
> This issue should be fixed with this patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=98f58ada2d37e68125c056f1fc005748251879c2

Is this patch going to 5.2?
Yes.
I applied this to 5.2.10 and I'm still seeing artifacts.
(In reply to tones111 from comment #8)
> I applied this to 5.2.10 and I'm still seeing artifacts.

Sorry, I realized that statement doesn't give much context to work with. My system has an R5 2500U. lspci shows the following:

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series]
	Flags: bus master, fast devsel, latency 0, IRQ 51
	Memory at b0000000 (64-bit, prefetchable) [size=256M]
	Memory at c0000000 (64-bit, prefetchable) [size=2M]
	I/O ports at 1000 [size=256]
	Memory at c0600000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
	Capabilities: [64] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/4 Maskable- 64bit+
	Capabilities: [c0] MSI-X: Enable- Count=3 Masked-
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [200] Resizable BAR <?>
	Capabilities: [270] Secondary PCI Express <?>
	Capabilities: [2b0] Address Translation Service (ATS)
	Capabilities: [2c0] Page Request Interface (PRI)
	Capabilities: [2d0] Process Address Space ID (PASID)
	Capabilities: [320] Latency Tolerance Reporting
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
Hello everyone,

do the artifacts look like the ones in this picture, or am I having a different issue?
http://e-x-a.org/stuff/amdgpu-artefacts.jpg
(Taken with a phone, as the artifacts cannot be captured in a screenshot.)

The squares appear around small things that change (especially terminal text) and disappear after about half a second. Notably, they are only seen in Xfce (I suspect a compositor is needed); not in LightDM (which does not do compositing), nor around any frequently refreshed/accelerated surface (glxgears and animations in Firefox are clean).

Mine is:

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c3) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series]
	Flags: bus master, fast devsel, latency 0, IRQ 58
	Memory at b0000000 (64-bit, prefetchable) [size=256M]
	Memory at c0000000 (64-bit, prefetchable) [size=2M]
	I/O ports at 1000 [size=256]
	Memory at c0800000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
	Capabilities: [64] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/4 Maskable- 64bit+
	Capabilities: [c0] MSI-X: Enable- Count=3 Masked-
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [200] Resizable BAR <?>
	Capabilities: [270] Secondary PCI Express <?>
	Capabilities: [2b0] Address Translation Service (ATS)
	Capabilities: [2c0] Page Request Interface (PRI)
	Capabilities: [2d0] Process Address Space ID (PASID)
	Capabilities: [320] Latency Tolerance Reporting
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

The problem happens on all 5.2 kernels I tried (from Debian). "Debian stable" 4.19 and the one 5.1 kernel I tried are OK.

If this is a different kind of artifact, please let me know (I'd open a separate bug).

Thanks in advance!
-mk
Some good news. After a BIOS update to Lenovo's E485/E585 1.54 I no longer need to provide additional boot arguments for the machine to come up, and the visual artifacts have gone away.

I used to see issues with some fonts in Firefox that looked similar to your screenshot. The easiest way for me to reproduce the problem was to resize my terminal (Alacritty) or scroll around in gitk or gvim. After a few days running on the new BIOS I haven't seen the artifacts, so this bug looks to be resolved for me since kernel 5.2.11.

Thanks!
That sounds great, thank you very much for the information and confirmation. I will try to update the BIOS and confirm ASAP.
After the BIOS upgrade the kernel parameters can be removed, but the kernel (5.2.16) now locks up when entering XFCE (it survives lightdm though). The error is almost the same as in the posted dmesg; I'll attach mine with backtraces in a few seconds.

Highlights:

This gets printed before each warning:
[ 66.159175] [drm] pstate TEST_DEBUG_DATA: 0x36F60000

R08 increases by some value between 49 and 56 with each subsequent warning (the value is sometimes in R10 instead).

Userspace otherwise seems to be working (the logs are from syslog); the display just doesn't show anything.

I will try a few other kernels available for Debian and eventually bisect.
Created attachment 285069
syslog from 5.2.16 with warnings
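In case it helps anyone checking the register pattern in a log like this one, here is a minimal Python sketch that pulls a register's values out of the warning dumps and prints the step between consecutive warnings. It assumes the usual x86_64 register dump layout (`R08:` followed by 16 hex digits); the function names and the command-line handling are hypothetical, not part of any existing tool.

```python
import re
import sys


def register_values(log_text, register="R08"):
    """Collect every value of `register` found in register-dump lines."""
    # Assumed format: "R08: 0000000000000031" (16 hex digits).
    pattern = re.compile(
        r"\b" + re.escape(register) + r":\s*([0-9a-fA-F]{16})\b"
    )
    return [int(m.group(1), 16) for m in pattern.finditer(log_text)]


def successive_deltas(values):
    """Differences between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 script.py <syslog file> [register name]
    name = sys.argv[2] if len(sys.argv) > 2 else "R08"
    with open(sys.argv[1], errors="replace") as fh:
        print(successive_deltas(register_values(fh.read(), name)))
```

Passing `R10` as the second argument covers the warnings where the value shows up in that register instead.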
I updated the kernel to 5.3 and the problem disappeared. I did not update the BIOS or anything like that. Perhaps the problem you are facing is different from the one originally reported.