Bug 216554

Summary: amdgpu is broken in linux-5.4.215 and newer
Product: Drivers Reporter: herrtimson
Component: Video(Other) Assignee: drivers_video-other
Status: NEW ---    
Severity: high CC: regressions
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.4.215 Subsystem:
Regression: No Bisected commit-id:

Description herrtimson 2022-10-05 12:18:51 UTC
hi there, 

I ran into a non-working Xorg (a black screen of death while it initializes during boot) after I downloaded the patch for Linux kernel 5.4.215 the other day, applied it, compiled, and rebooted into the new kernel.

Going back and booting 5.4.214 works just fine.

I'm using amdgpu, and since there are only a handful of small patches concerning gpu/drm in the 5.4.215 patch, I decided to work out which one breaks the amdgpu module by first reverting all of them and then re-adding them one by one.

My verdict is that this commit has broken the amdgpu module:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c?id=v5.4.215&id2=v5.4.214

If I revert this single patch from the big 5.4.215 patch, I get a working amdgpu and thus a working Xorg; if I don't, I get a black screen of death during boot.
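
The revert-everything-then-re-add-one-by-one procedure described above can be sketched roughly as follows. This is only an illustration under assumptions: `find_breaking_patch`, `boots_ok`, and the patch names are hypothetical, and in the actual report the "does it boot" check was a manual rebuild and reboot, not a function call.

```python
from typing import Callable, List, Optional, Sequence


def find_breaking_patch(
    patches: Sequence[str],
    boots_ok: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Re-add reverted patches one at a time on top of a known-good base.

    patches:  candidate patches that were all reverted first (hypothetical names).
    boots_ok: stand-in for "rebuild, reboot, check whether Xorg comes up";
              an assumption, since the report did this step by hand.
    Returns the first patch whose re-addition makes the check fail, else None.
    """
    applied: List[str] = []
    for patch in patches:
        candidate = applied + [patch]
        if boots_ok(candidate):
            # Still working: keep this patch and move on to the next one.
            applied = candidate
        else:
            # First failure after re-adding exactly one patch: this is the suspect.
            return patch
    return None


if __name__ == "__main__":
    # Hypothetical patch names; the check simulates one patch breaking the boot.
    candidates = ["drm-patch-a", "drm-patch-b", "drm-patch-c"]
    culprit = find_breaking_patch(
        candidates, lambda applied: "drm-patch-b" not in applied
    )
    print(culprit)
```

With only a handful of candidate patches, a linear walk like this needs at most one rebuild per patch; for a longer list, a binary search over the patch series would need fewer rebuilds.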


I'm a Gentoo user, on amd64.

I don't have any logs to share, as the black screen of death doesn't show me anything useful. But reverting the two changed lines makes it all work again.
Comment 1 herrtimson 2022-10-05 12:23:53 UTC
There is one patch for the file ../drivers/gpu/drm/amd/amdgpu/amdgpu_display.c that landed in 6.0_rc two weeks ago. According to its commit message, it is a fix for the earlier commit that has been backported to 5.4.215:

 drm/amdgpu: don't register a dirty callback for non-atomic

Some asics still support non-atomic code paths.

Fixes: 66f9962 ("drm/amdgpu: use dirty framebuffer helper")
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

this is the link: https://github.com/torvalds/linux/commit/abbc7a3dafb91b9d4ec56b70ec9a7520f8e13334


I don't know much about the kernel, let alone hacking around in it, but it seems this may be the fix that is needed? Sadly I cannot test it, as the third hunk doesn't apply cleanly to the 5.4 branch.
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-06 05:49:06 UTC
A patch to revert the two changes is already prepared; I pointed the developer to this report to ensure they're aware of it:
https://lore.kernel.org/all/2da7598f-26ae-3da2-2534-d843aae7140c@leemhuis.info/
Comment 3 herrtimson 2022-10-06 05:55:31 UTC
thanks, seems it causes performance issues for linux-5.19 kernel and above? 

the patch quoted via > in the message you linked above - is it the supposed fix for the kernel?
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-06 06:07:23 UTC
(In reply to herrtimson from comment #3)
> thanks, seems it causes performance issues for linux-5.19 kernel and above? 

Apparently.

> the patch quoted via > in the message you linked above - is it the supposed
> fix for the kernel?

Not for 5.4.y. Nearly all things need to be fixed in mainline first and then are backported, otherwise it quickly gets messy.