A regression was introduced between 5.1.21 and 5.2.0. It is still present to this day (5.14.0).

Since 5.2.0, if I boot my computer, within 5 minutes it will suddenly freeze with little blue dots all over the screen. There is nothing to do except reboot the computer.

My hardware is:
CPU: AMD Ryzen 3 2200G
Motherboard: ASRock B450M-HDV
Motherboard BIOS version: 1.10 (I'm not sure)
GPU: None (I use the integrated GPU in the CPU)

I managed to git bisect the faulty commit:

005440066f929ba0dca8f4e0aebfbf8daac592cc is the first bad commit
commit 005440066f929ba0dca8f4e0aebfbf8daac592cc
Author: Huang Rui <ray.huang@amd.com>
Date: Wed Mar 13 20:21:00 2019 +0800

    drm/amdgpu: enable gfxoff again on raven series (v2)

    This patch enables gfxoff and stutter mode again, since we take more testing on raven series.

    For raven2 and picasso, we can enable it directly. And for raven, we need check the RLC/SMC ucode version cannot be less than #531/0x1e45.

    v2: add smc version checking for raven.

    Signed-off-by: Huang Rui <ray.huang@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
    Tested-by: Likun Gao <Likun.Gao@amd.com> (v2)
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 76a7156f7ff7f32be629f1dffe761499360e49f7 f903deb8648b1a3dbe98fe15a78661bc6646cadd M drivers

If you have questions or need more information, I am at your disposal.
CC'ing Huang Rui
Please attach your full dmesg output. Does updating to the newest firmware help?
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu
Created attachment 298909 [details] dmesg output
I attached my dmesg output. Unfortunately, using the latest firmware did not improve the situation.
Please make sure your kernel has this patch:

commit 7af2a5771e0918cdadb1614c1f81dd67a58e00aa
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Wed Jan 15 12:26:51 2020 -0500

    drm/amdgpu: attempt to enable gfxoff on more raven1 boards (v2)

    Switch to a blacklist so we can disable specific boards that are problematic.

    v2: make the blacklist non-raven specific.

    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
I cloned the stable linux repository at https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git. I tried two kernel versions:
- git checkout v5.14.7
- git checkout 7af2a5771e0918cdadb1614c1f81dd67a58e00aa

Both exhibit the problem.
Any update?
(In reply to pierre.o.tardif from comment #7)
> Any update?

Commit 005440066f929ba0dca8f4e0aebfbf8daac592cc enabled the gfxoff feature which was apparently problematic on your board. Commit 7af2a5771e0918cdadb1614c1f81dd67a58e00aa disables gfxoff again on your board. So it seems to be some other issue I guess.
I investigated the problem some more. Commit 005440066f929ba0dca8f4e0aebfbf8daac592cc enabled gfxoff, but it /also/ enabled stutter mode. It changed this line:

/* OverDrive(bit 14),gfxoff(bit 15),stutter mode(bit 17) disabled by default*/
uint amdgpu_pp_feature_mask = 0xfffd3fff;

to this line:

/* OverDrive(bit 14) disabled by default*/
uint amdgpu_pp_feature_mask = 0xffffbfff;

Two bits were changed: the gfxoff bit (bit 15) and the stutter mode bit (bit 17).

I did a `git checkout 005440066f929ba0dca8f4e0aebfbf8daac592cc` and cleared the stutter mode bit again, meaning that I changed the line to this:

uint amdgpu_pp_feature_mask = 0xfffdbfff;

I recompiled the kernel and it works! So the problem indeed seems to be stutter mode, not gfxoff.

I don't know what gfxoff or stutter mode do, but maybe stutter mode should be disabled on my particular setup?
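
For anyone who wants to double-check the mask arithmetic, here is a minimal standalone C sketch (not kernel code). The macro and helper names are made up for illustration; only the bit positions (14, 15, 17) and the three mask values come from the source comment and the lines quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions as stated in the quoted source comment.
 * The macro names are hypothetical, chosen for this sketch only. */
#define PP_BIT_OVERDRIVE 14u
#define PP_BIT_GFXOFF    15u
#define PP_BIT_STUTTER   17u

/* Returns 1 if the given feature bit is set in the mask, 0 otherwise. */
static inline int pp_bit_enabled(uint32_t mask, unsigned bit)
{
    assert(bit < 32);
    return (int)((mask >> bit) & 1u);
}

/* Returns a copy of the mask with the given feature bit cleared. */
static inline uint32_t pp_clear_bit(uint32_t mask, unsigned bit)
{
    assert(bit < 32);
    return mask & ~(UINT32_C(1) << bit);
}

/* Prints the mask and the state of the three bits discussed here. */
static inline void pp_describe(const char *label, uint32_t mask)
{
    printf("%-16s 0x%08x overdrive=%d gfxoff=%d stutter=%d\n",
           label, (unsigned)mask,
           pp_bit_enabled(mask, PP_BIT_OVERDRIVE),
           pp_bit_enabled(mask, PP_BIT_GFXOFF),
           pp_bit_enabled(mask, PP_BIT_STUTTER));
}
```

Calling `pp_describe()` from a small `main()` on 0xfffd3fff, 0xffffbfff and 0xfffdbfff shows which of the three bits each value leaves set, and `pp_clear_bit(0xffffbfff, PP_BIT_STUTTER)` reproduces the value I tested with.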
What should I do if I want to get this fixed? Should I try to write a patch? Will somebody review my patch if I manage to write one?
Created attachment 299287 [details]
possible fix

This patch should fix it.
I confirm that the patch does work!