Bug 214417 - Regression with AMD Ryzen 3 2200G IGPU. Everything freezes within 5 minutes of boot.
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(Other)
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: drivers_video-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-15 17:34 UTC by pierre.o.tardif
Modified: 2021-10-23 04:44 UTC

See Also:
Kernel Version: From 5.2.0 to 5.14.0
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg output (118.30 KB, text/plain)
2021-09-22 04:07 UTC, pierre.o.tardif
possible fix (2.30 KB, patch)
2021-10-20 20:49 UTC, Alex Deucher

Description pierre.o.tardif 2021-09-15 17:34:24 UTC
A regression was introduced between 5.1.21 and 5.2.0, and it is still present in the latest version I tested (5.14.0).

Since 5.2.0, my computer freezes within 5 minutes of booting, with little blue dots all over the screen. The only way to recover is to reboot.

My hardware is:

CPU: AMD Ryzen 3 2200G
Motherboard: ASRock B450M-HDV
Motherboard BIOS version: 1.10 (I'm not sure)
GPU: None (I use the integrated GPU in the CPU)

I used git bisect to find the faulty commit:


005440066f929ba0dca8f4e0aebfbf8daac592cc is the first bad commit
commit 005440066f929ba0dca8f4e0aebfbf8daac592cc
Author: Huang Rui <ray.huang@amd.com>
Date:   Wed Mar 13 20:21:00 2019 +0800

    drm/amdgpu: enable gfxoff again on raven series (v2)
    
    This patch enables gfxoff and stutter mode again, since we take more testing on
    raven series. For raven2 and picasso, we can enable it directly. And for raven,
    we need check the RLC/SMC ucode version cannot be less than #531/0x1e45.
    
    v2: add smc version checking for raven.
    
    Signed-off-by: Huang Rui <ray.huang@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
    Tested-by: Likun Gao <Likun.Gao@amd.com> (v2)
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 76a7156f7ff7f32be629f1dffe761499360e49f7 f903deb8648b1a3dbe98fe15a78661bc6646cadd M	drivers


If you have questions or need more information, I am at your disposal.
Comment 1 Artem S. Tashkinov 2021-09-19 12:02:58 UTC
CC'ing Huang Rui
Comment 2 Alex Deucher 2021-09-21 13:36:36 UTC
Please attach your full dmesg output.  Does updating to the newest firmware:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu
help?
Comment 3 pierre.o.tardif 2021-09-22 04:07:01 UTC
Created attachment 298909
dmesg output
Comment 4 pierre.o.tardif 2021-09-22 04:11:05 UTC
I attached my dmesg output.

Unfortunately, using the latest firmware did not improve the situation.
Comment 5 Alex Deucher 2021-09-22 13:25:43 UTC
Please make sure your kernel has this patch:

commit 7af2a5771e0918cdadb1614c1f81dd67a58e00aa
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Jan 15 12:26:51 2020 -0500

    drm/amdgpu: attempt to enable gfxoff on more raven1 boards (v2)
    
    Switch to a blacklist so we can disable specific boards
    that are problematic.
    
    v2: make the blacklist non-raven specific.
    
    Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Comment 6 pierre.o.tardif 2021-09-23 02:38:12 UTC
I cloned the stable linux repository at https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git.

I tried two kernel versions:
- git checkout v5.14.7
- git checkout 7af2a5771e0918cdadb1614c1f81dd67a58e00aa

Both exhibit the problem.
Comment 7 pierre.o.tardif 2021-10-02 20:42:25 UTC
Any update?
Comment 8 Alex Deucher 2021-10-04 13:30:01 UTC
(In reply to pierre.o.tardif from comment #7)
> Any update?

Commit 005440066f929ba0dca8f4e0aebfbf8daac592cc enabled the gfxoff feature which was apparently problematic on your board.  Commit 7af2a5771e0918cdadb1614c1f81dd67a58e00aa disables gfxoff again on your board.  So it seems to be some other issue I guess.
Comment 9 pierre.o.tardif 2021-10-10 00:13:58 UTC
I investigated the problem some more. Commit 005440066f929ba0dca8f4e0aebfbf8daac592cc enabled gfxoff, but it /also/ enabled stutter mode. It changed this line:

/* OverDrive(bit 14),gfxoff(bit 15),stutter mode(bit 17) disabled by default*/
uint amdgpu_pp_feature_mask = 0xfffd3fff;

to this line:

/* OverDrive(bit 14) disabled by default*/
uint amdgpu_pp_feature_mask = 0xffffbfff;

Two bits were changed: the gfxoff bit, and the stutter mode bit.

I ran `git checkout 005440066f929ba0dca8f4e0aebfbf8daac592cc` and cleared the stutter mode bit again while leaving the gfxoff bit set, changing the line to:

uint amdgpu_pp_feature_mask = 0xfffdbfff;

I recompiled the kernel and it works! So the problem seems not to be gfxoff, but stutter mode.
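
The bit arithmetic above can be double-checked without rebuilding anything. The following is a minimal Python sketch, not kernel code: the names `FEATURE_BITS`, `changed_bits` and `feature_state` are made up for illustration, and the bit positions are taken only from the source comment quoted above (OverDrive = bit 14, gfxoff = bit 15, stutter mode = bit 17).

```python
# Bit positions as named in the kernel source comment quoted above.
# The dictionary and function names here are hypothetical helpers.
FEATURE_BITS = {14: "OverDrive", 15: "gfxoff", 17: "stutter mode"}

OLD_MASK = 0xfffd3fff   # default before commit 005440066f92
NEW_MASK = 0xffffbfff   # default after commit 005440066f92
TEST_MASK = 0xfffdbfff  # value used for the test build in this comment


def changed_bits(a, b):
    """Return the positions of the bits that differ between two 32-bit masks."""
    diff = (a ^ b) & 0xffffffff
    return [bit for bit in range(32) if (diff >> bit) & 1]


def feature_state(mask):
    """Map each named feature to whether its bit is set in the mask."""
    return {name: bool((mask >> bit) & 1) for bit, name in FEATURE_BITS.items()}


if __name__ == "__main__":
    flipped = changed_bits(OLD_MASK, NEW_MASK)
    print("bits changed by the commit:", [FEATURE_BITS.get(b, b) for b in flipped])
    for label, mask in (("old", OLD_MASK), ("new", NEW_MASK), ("test", TEST_MASK)):
        print(f"{label:4} {mask:#010x} {feature_state(mask)}")
```

XOR-ing the old and new defaults isolates exactly the bits the commit flipped, and decoding the test value shows that it differs from the new default only in the stutter mode bit, which is what makes the test build a clean experiment.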

I don't know what gfxoff or stutter mode do, but maybe stutter mode should be disabled on my particular setup?
Comment 10 pierre.o.tardif 2021-10-19 01:32:29 UTC
What should I do if I want to get this fixed? Should I try to write a patch? Will somebody review my patch if I manage to write one?
Comment 11 Alex Deucher 2021-10-20 20:49:20 UTC
Created attachment 299287
possible fix

This patch should fix it.
Comment 12 pierre.o.tardif 2021-10-23 04:44:09 UTC
I confirm that the patch does work!
