Created attachment 285871: Script from radeontop to read AMD GPU IDs. Running a utility named radeontop on an AMD APU causes a freeze while it attempts to read amdgpu IDs. Attached is the script. It would be nice to provide a better method to read AMD GPU cards.
It is happening for me too, with Vega integrated graphics. The system freezes completely with no graphics load while the utility is running.
I think this might be a regression, since radeontop worked fine with 4.19 on my Acer Nitro with Ryzen 5 2500U Raven + Polaris RX 560. The freezes are also not instant: I get anywhere from a few seconds to a few minutes before it hangs (this might depend on load). Some more information might be here: https://github.com/clbr/radeontop/issues/87
Created attachment 285881: possible fix. Assuming radeontop uses the info ioctl to query the registers, this patch should fix it. If it mmaps the register BAR directly, there's nothing you can do. Accessing registers while the gfx block is off will lead to garbage data and possibly hang the chip.
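The failure mode described above can be illustrated with a small userspace model. This is only a sketch and not the attached patch: `struct fake_gpu`, `gfx_off_ctrl`, `read_gfx_register_raw` and `read_gfx_register` are hypothetical names, and the model assumes the driver can keep the gfx block powered for the duration of a read by holding a reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device; not a real amdgpu structure. */
struct fake_gpu {
    int      gfx_off_block_count; /* > 0: someone forbids gfx power-down */
    bool     gfx_powered;         /* current power state of the gfx block */
    uint32_t reg_value;           /* pretend register contents */
};

#define GARBAGE 0xffffffffu

/*
 * enable == false: take a hold that forbids power-down and wake the block.
 * enable == true:  drop the hold; in this simplified model the idle block
 *                  powers down as soon as the last hold is released.
 */
static void gfx_off_ctrl(struct fake_gpu *gpu, bool enable)
{
    if (!enable) {
        gpu->gfx_off_block_count++;
        gpu->gfx_powered = true;
    } else {
        assert(gpu->gfx_off_block_count > 0); /* unbalanced release */
        if (--gpu->gfx_off_block_count == 0)
            gpu->gfx_powered = false;
    }
}

/* Unguarded read: garbage when the block is off (real hardware may hang). */
static uint32_t read_gfx_register_raw(const struct fake_gpu *gpu)
{
    return gpu->gfx_powered ? gpu->reg_value : GARBAGE;
}

/* Guarded read: keep the block powered for the duration of the access. */
static uint32_t read_gfx_register(struct fake_gpu *gpu)
{
    uint32_t value;

    gfx_off_ctrl(gpu, false);
    value = read_gfx_register_raw(gpu);
    gfx_off_ctrl(gpu, true);
    return value;
}
```

In this model, a read routed through `read_gfx_register` always sees the block powered, while a direct mapping of the BAR corresponds to calling `read_gfx_register_raw` with nothing to hold the power state.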
Created attachment 285883: possible fix. Updated patch to handle cached registers properly.
Hi. Please add the patch to the 4.19.x LTS kernels too. Thanks.
Created attachment 285923: possible fix. Better fix.
As users reported, this bug should only affect kernels 5.2+. By default, radeontop calls amdgpu_read_mm_registers, amdgpu_query_info and amdgpu_query_sensor_info, but it can be forced from the command line to read the BAR from /dev/mem. There is a kernel dump at https://github.com/clbr/radeontop/issues/87#issuecomment-529267244 Thank you for the patch, but I cannot test it as my hardware is not affected (KAVERI).
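To check a machine against the range mentioned here, a release string as printed by `uname -r` can be compared against 5.2. This is a minimal sketch: `struct kver`, `parse_kernel_release` and `kver_at_least` are hypothetical helper names, and the 5.2 threshold is only what the reports so far suggest, not a confirmed boundary.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper type: only major.minor matter for this comparison. */
struct kver {
    int major;
    int minor;
};

/* Parse the leading "major.minor" of a release string such as "4.19.83". */
static bool parse_kernel_release(const char *release, struct kver *out)
{
    return sscanf(release, "%d.%d", &out->major, &out->minor) == 2;
}

/* True if v is the given major.minor or newer. */
static bool kver_at_least(struct kver v, int major, int minor)
{
    return v.major > major || (v.major == major && v.minor >= minor);
}
```

A caller would parse the release string once and then test `kver_at_least(v, 5, 2)`; anything after the minor number, such as a patch level or distribution suffix, is ignored.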
Please read https://github.com/lestofante/ksysguard-gpu/issues/4 The same issue occurs on the 4.19.x LTS kernel.
Thanks, I was not aware of it. Could it be different hardware from the machines on which kernels 4.19/5.1 work?
AMD Ryzen 5 2600G + AMD RX560 (multiseat system). In my case the system froze after a few days on kernel 4.19.83.
(In reply to Trek from comment #7)
> by default, radeontop calls amdgpu_read_mm_registers, amdgpu_query_info and
> amdgpu_query_sensor_info, but it can be forced by the command line to read
> BAR from /dev/mem

If you access the BAR directly you will likely have problems in certain power saving modes.

Can someone test the patch?
I need approximately 3-5 days for testing, because this bug is intermittent.
(In reply to Alex Deucher from comment #11)
> (In reply to Trek from comment #7)
> > by default, radeontop calls amdgpu_read_mm_registers, amdgpu_query_info and
> > amdgpu_query_sensor_info, but it can be forced by the command line to read
> > BAR from /dev/mem
>
> If you access the BAR directly you will likely have problems in certain
> power saving modes.
>
> Can someone test the patch?

Currently building on https://copr.fedorainfracloud.org/coprs/luya/kernel-amgpu-gfxoff/build/1095660/
(In reply to Alex Deucher from comment #11)
> If you access the BAR directly you will likely have problems in certain
> power saving modes.

Thank you, I'll add a warning message when accessing the BAR directly.
Created attachment 285947: dmesg from AMD Raven Ridge Ryzen 2500U. dmesg showing the latest kernel git snapshot.
Created attachment 285949: amdgpu firmware info. Firmware information for amdgpu as installed on the test system.
Created attachment 285951: Screenshot of radeontop running with patched kernel. Running radeontop with the patched test kernel, I can confirm that the patch fixes the freezing issue: it no longer occurs, and the card is correctly picked up.
Reading another bug report, https://bugzilla.kernel.org/show_bug.cgi?id=204689, taken from the amd-gfx mailing list: could that issue be related? Anyway, radeontop still runs with the patched kernel. There was no noticeable freeze when I tested with Blender rendering the old Ryzen CPU 3D model with GPU compute running on rocm-opencl (which needs optimization compared to amdgpu-pro-opencl). Alex, would it be possible to submit the patch to patchwork.kernel.org? Thanks.
(In reply to Luya Tshimbalanga from comment #18)
> Reading another bug report on
> https://bugzilla.kernel.org/show_bug.cgi?id=204689 taken from amdgfx mailing
> list, could that issue related?

Not likely.
I confirm the fix landed on kernel 5.4. Thanks Alex for a quick investigation. Closing this report.
(In reply to Luya Tshimbalanga from comment #20)
> I confirm the fix landed on kernel 5.4. Thanks Alex for a quick
> investigation. Closing this report.

For me it is happening again; I don't know since which kernel. I have an Asus with a Ryzen 5 3550H.
(In reply to albertogomezmarin from comment #21)
> (In reply to Luya Tshimbalanga from comment #20)
> > I confirm the fix landed on kernel 5.4. Thanks Alex for a quick
> > investigation. Closing this report.
>
> For me It Is happening again, i dont know since what kernel. Ivhace an Asus
> with ryzen 5 3550H

Did the latest updated kernel resolve the issue?
I did not test it. I don't have the laptop here to do so. I now have another laptop, with a Ryzen 7 3700U.
It is still happening. For me it is an almost instant lock-up. REISUB does not work.

CPU: Quad Core AMD Ryzen 3 2200G with Radeon Vega Graphics (-MCP-)
Kernel: 5.6.17-pclos1 x86_64
Shell: bash 4.4.23