Created attachment 279511: dmesg

How to reproduce: Boot and start any graphical application, for example gdm.
Expected: gdm starts.
Actual: The screen goes black with the cursor visible in the top-left corner, and the log is full of [Hardware Error] messages.

I have bisected the bug; it appears to have been introduced in commit 284dec4317c8e76f45d3ce922f673c80331812f1. Let me know if you think the hardware is actually broken, but I have had no issues on Windows and no hardware errors on the 4.19 kernel.
Created attachment 279513: /proc/cpuinfo
Created attachment 279515: lspci -vvv
Created attachment 279517: .config
If you've bisected it, then it's unlikely to be a hardware issue. If you revert the commit, does everything work again?
That's:

commit 284dec4317c8e76f45d3ce922f673c80331812f1
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Aug 22 16:44:56 2018 +0200

    drm/amdgpu: enable GTT PD/PT for raven v3
Currently GTT is only used for PD/PT as a last resort, when so little stolen memory is assigned to the APU that it won't work at all otherwise. The faulting address looks suspiciously like we fail to handle an error code correctly somewhere and instead use the value as a DMA address. What is your BIOS setting for the stolen VRAM?
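The suspicion in the comment above can be illustrated with a small, self-contained C sketch. It is not amdgpu code: the helper name errno_as_dma_addr is hypothetical, and it only shows how a negative error code, if stored unchecked in a 64-bit address variable, becomes an address at the very top of the 64-bit range. The actual faulting address for this bug is in the dmesg attachment, not reproduced here.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical illustration, not driver code: what a negative errno
 * value looks like when it is mistakenly used as a 64-bit DMA address
 * instead of being checked as an error. */
static uint64_t errno_as_dma_addr(int err)
{
    /* Converting a negative value to an unsigned type wraps modulo 2^64,
     * which is well-defined in C, so -12 becomes 2^64 - 12. */
    return (uint64_t)(int64_t)err;
}
```

On Linux, where ENOMEM is 12, errno_as_dma_addr(-ENOMEM) returns 0xfffffffffffffff4. An address that close to 2^64 is why a fault at such a location may hint at an unhandled error code rather than a real mapping.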
Everything works after a revert. There was a conflict, so I am attaching a diff of the revert.
Created attachment 279521: revert of 284dec
> What is your BIOS setting for the stolen VRAM?

I can't find any such setting in the BIOS. I really do not have any video-related options.
(In reply to Christian König from comment #6)
> > What is your BIOS setting for the stolen VRAM?

[ 2.665594] [drm] amdgpu: 256M of VRAM memory ready
The bug is no longer present in v4.20-rc5.
I was curious what could have fixed that. I tried to reproduce it on a completely different notebook with a 2500U (EliteBook 745 G5), but I wasn't getting any MCE-reported errors with commit 284dec4317c8 ("drm/amdgpu: enable GTT PD/PT for raven v3"), probably because it has more stolen VRAM:

[ 5.232179] [drm] amdgpu: 1024M of VRAM memory ready

It's probably one of these:

git log --oneline v4.20-rc2..v4.20-rc5 drivers/gpu/drm/amd/amdgpu/
ad97d9de4583 drm/amdgpu: Add delay after enable RLC ucode
1954db153d18 drm/amdgpu: Avoid endless loop in GPUVM fragment processing
9ce2b991f7ea drm/amdgpu: Cast to uint64_t before left shift
a5d0f4565996 drm/amdgpu: Enable HDP memory light sleep
8d4d7c589947 drm/amdgpu: Add missing firmware entry for HAINAN
919a52fc4ca1 drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset
69756c6ff0de drm/amdgpu: Add amdgpu "max bpc" connector property (v2)
c1a17777eb45 drm/amdgpu: fix huge page handling on Vega10
c837243ff401 drm/amdgpu: fix bug with IH ring setup
5581c670fb7e drm/amdgpu: set system aperture to cover whole FB region
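The affected and unaffected machines in this thread differ in how much stolen VRAM the driver reports at boot (256M versus 1024M). For a quick manual check, grepping dmesg for "VRAM memory ready" is enough. For tooling that collects the figure from many boot logs, a minimal C sketch might look like the following. The function name amdgpu_vram_mib is hypothetical, and it assumes the log line has exactly the "[drm] amdgpu: <N>M of VRAM memory ready" form quoted in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the stolen-VRAM size in MiB from one
 * kernel log line such as "[drm] amdgpu: 256M of VRAM memory ready".
 * Returns -1 if the line is not an amdgpu VRAM report in that form. */
static long amdgpu_vram_mib(const char *line)
{
    const char *p = strstr(line, "[drm] amdgpu: ");
    long mib = 0;
    int consumed = 0;

    if (p == NULL)
        return -1;

    /* %n is stored only if the whole literal suffix matched, so a line
     * reporting some other memory type leaves consumed at 0. */
    if (sscanf(p, "[drm] amdgpu: %ldM of VRAM memory ready%n",
               &mib, &consumed) != 1 || consumed == 0)
        return -1;

    return mib;
}
```

Applied to the two log lines quoted in this thread, it returns 256 and 1024 respectively.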
I am reopening because I realized the bug is still present: I had accidentally tested with the "iommu=soft" kernel parameter. With this parameter the bug does not appear and the system is usable.
I'm experiencing this bug on a Lenovo E585. My boot logs also show only "256M of VRAM memory ready" when running either 4.19.12 or commit 284dec4317c8. 4.19.12 boots fine, but the commit in question causes a lockup. Is there any data I can collect, or any patches I can test, to help resolve this? Thanks for any insight or direction.
Created attachment 280291: v5.0-rc1 boot lockup

Boot log from running v5.0-rc1 attached.
This should be fixed with: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c1eba86339c8517814863bc7dd21e2661a84e77
I confirm the fix works. v5.0-rc3 also works. Let me know if you need any more testing.