Whenever I load the AMDGPU driver, the system hangs with "IB test failed on sdma0". Is the AMDGPU driver not supported on 32-bit systems? My Linux is LFS-10.0. Hardware: mainboard Biostar B450GT with the newest BIOS (updated 12/08/2020), CPU Ryzen 5 2400G, memory 32 GB DDR4 in dual channel. I have tested many kernel versions and amdgpu firmware versions, but it hangs every time I modprobe amdgpu. I have also installed Ubuntu 20.04 LTS (with the 5.8.0-25 kernel), and AMDGPU works very well on it.
Created attachment 294855 Main Error
Created attachment 294857 Kernel log without "kfd kfd: added device 1002:15dd" and "amdgpu: Topology: Add APU node [0x0:0x0]"
Created attachment 294859 Kernel config file
Created attachment 294861 GCC version
I have tested many kernels and firmware versions, but all of them failed! Compared with the Ubuntu 20.04 LTS kernel log, my kernel log lacks "kfd: added device" and "amdgpu 0000:06:00.0: amdgpu: Topology: Add APU node [0x0:0x0]". It looks like an sdma0 and vcn bug, with some memory faults. This is the main error log:
Jan 26 09:58:07 Pink kernel: [ 69.141903] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110).
Jan 26 09:58:07 Pink kernel: [ 69.145985] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:157 vmid:10 pasid:0, for process pid 0 thread pid 0)
Jan 26 09:58:07 Pink kernel: [ 69.146002] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x000000f40021b000 from client 27
Jan 26 09:58:07 Pink kernel: [ 69.146012] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00A00B3A
Jan 26 09:58:07 Pink kernel: [ 69.146020] amdgpu 0000:06:00.0: amdgpu: ^I Faulty UTCL2 client ID: CPC (0x5)
Jan 26 09:58:07 Pink kernel: [ 69.146027] amdgpu 0000:06:00.0: amdgpu: ^I MORE_FAULTS: 0x0
Jan 26 09:58:07 Pink kernel: [ 69.146033] amdgpu 0000:06:00.0: amdgpu: ^I WALKER_ERROR: 0x5
Jan 26 09:58:07 Pink kernel: [ 69.146040] amdgpu 0000:06:00.0: amdgpu: ^I PERMISSION_FAULTS: 0x3
Jan 26 09:58:07 Pink kernel: [ 69.146046] amdgpu 0000:06:00.0: amdgpu: ^I MAPPING_ERROR: 0x1
Jan 26 09:58:07 Pink kernel: [ 69.146052] amdgpu 0000:06:00.0: amdgpu: ^I RW: 0x0
Jan 26 09:58:07 Pink kernel: [ 69.146067] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:157 vmid:10 pasid:0, for process pid 0 thread pid 0)
Jan 26 09:58:07 Pink kernel: [ 69.146077] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x000000f40021b000 from client 27
Jan 26 09:58:07 Pink kernel: [ 69.146086] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00A00B3A
Jan 26 09:58:07 Pink kernel: [ 69.146094] amdgpu 0000:06:00.0: amdgpu: ^I Faulty UTCL2 client ID: CPC (0x5)
Jan 26 09:58:07 Pink kernel: [ 69.146100] amdgpu 0000:06:00.0: amdgpu: ^I MORE_FAULTS: 0x0
Jan 26 09:58:07 Pink kernel: [ 69.146106] amdgpu 0000:06:00.0: amdgpu: ^I WALKER_ERROR: 0x5
Jan 26 09:58:07 Pink kernel: [ 69.146112] amdgpu 0000:06:00.0: amdgpu: ^I PERMISSION_FAULTS: 0x3
Jan 26 09:58:07 Pink kernel: [ 69.146118] amdgpu 0000:06:00.0: amdgpu: ^I MAPPING_ERROR: 0x1
Jan 26 09:58:07 Pink kernel: [ 69.146124] amdgpu 0000:06:00.0: amdgpu: ^I RW: 0x0
Jan 26 09:58:07 Pink kernel: [ 69.146514] mce: [Hardware Error]: Machine check events logged
Jan 26 09:58:07 Pink kernel: [ 69.146526] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 20: dc2030000001085b
Jan 26 09:58:07 Pink kernel: [ 69.146533] mce: [Hardware Error]: TSC 52c0cc15a4 ADDR 7ffcffffff40 SYND 5b240204 IPID 2e00000000
Jan 26 09:58:07 Pink kernel: [ 69.146545] mce: [Hardware Error]: PROCESSOR 2:810f10 TIME 1611655087 SOCKET 0 APIC 0 microcode 8101016
Jan 26 09:58:08 Pink kernel: [ 70.150550] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:10 Pink kernel: [ 71.172270] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:11 Pink kernel: [ 72.193987] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:12 Pink kernel: [ 73.215700] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:13 Pink kernel: [ 74.237417] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:14 Pink kernel: [ 75.259129] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:15 Pink kernel: [ 76.280848] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:16 Pink kernel: [ 77.302559] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:17 Pink kernel: [ 78.324274] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:18 Pink kernel: [ 79.345988] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, trying to reset the VCPU!!!
Jan 26 09:58:18 Pink kernel: [ 79.366046] [drm:vcn_v1_0_set_powergating_state [amdgpu]] *ERROR* VCN decode not responding, giving up!!!
Jan 26 09:58:18 Pink kernel: [ 79.366067] [drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <vcn_v1_0> failed -1
Jan 26 09:58:18 Pink kernel: [ 79.366137] amdgpu 0000:06:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:16 vmid:0 pasid:0, for process pid 0 thread pid 0)
Jan 26 09:58:18 Pink kernel: [ 79.366150] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 18
Jan 26 09:58:18 Pink kernel: [ 79.366159] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000420
Jan 26 09:58:18 Pink kernel: [ 79.366166] amdgpu 0000:06:00.0: amdgpu: ^I Faulty UTCL2 client ID: VCN (0x2)
Jan 26 09:58:18 Pink kernel: [ 79.366172] amdgpu 0000:06:00.0: amdgpu: ^I MORE_FAULTS: 0x0
Jan 26 09:58:18 Pink kernel: [ 79.366177] amdgpu 0000:06:00.0: amdgpu: ^I WALKER_ERROR: 0x0
Jan 26 09:58:18 Pink kernel: [ 79.366182] amdgpu 0000:06:00.0: amdgpu: ^I PERMISSION_FAULTS: 0x2
Jan 26 09:58:18 Pink kernel: [ 79.366187] amdgpu 0000:06:00.0: amdgpu: ^I MAPPING_ERROR: 0x0
Jan 26 09:58:18 Pink kernel: [ 79.366192] amdgpu 0000:06:00.0: amdgpu: ^I RW: 0x0
Jan 26 09:58:19 Pink kernel: [ 80.405920] amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec (-110).
Jan 26 09:58:20 Pink kernel: [ 81.429922] amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc0 (-110).
Jan 26 09:58:21 Pink kernel: [ 82.453922] amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc1 (-110).
Jan 26 09:59:18 Pink kernel: Kernel logging (proc) stopped.
Jan 26 09:59:18 Pink kernel: Kernel log daemon terminating.
Jan 26 10:00:07 Pink kernel: klogd 1.5.1, log source = /proc/kmsg started.
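For anyone reading the fault lines in this log, here is a minimal Python sketch of how the raw VM_L2_PROTECTION_FAULT_STATUS value relates to the decoded fields the driver prints after it. The bit layout in FIELDS is an assumption (a GFX9-style layout); it is not taken from this report, although it does reproduce the field values shown in the log for both 0x00A00B3A and 0x00000420. The function name decode_fault_status is made up for illustration.

```python
# Assumed GFX9-style layout of VM_L2_PROTECTION_FAULT_STATUS.
# Each entry is (field name, lowest bit, width in bits). This layout is an
# assumption for illustration; check the register headers for your ASIC.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),   # printed by the driver as "Faulty UTCL2 client ID"
    ("RW", 18, 1),
    ("VMID", 20, 4),
]


def decode_fault_status(status):
    """Split a raw status word into its bit fields (hypothetical helper)."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


if __name__ == "__main__":
    # The two raw values reported in the log above.
    for raw in (0x00A00B3A, 0x00000420):
        fields = decode_fault_status(raw)
        decoded = ", ".join(f"{k}=0x{v:x}" for k, v in fields.items())
        print(f"0x{raw:08X}: {decoded}")
```

Separately, the (-110) in the "IB test failed" lines is a negative errno; on Linux, errno 110 is ETIMEDOUT, so the ring tests timed out rather than returning a wrong result.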
Does it work properly on 64bit?
Yes. Maybe the Ubuntu kernel applies some patch? Or does the AMDGPU driver only work on x86_64? The radeon driver works well on a 32-bit kernel. I have radeon graphics cards with Caicos and Oland chipsets, and both are driven perfectly on LFS-10.0 on the i686 arch!
(In reply to bolando from comment #7)
> Yes. Maybe the Ubuntu kernel applies some patch? Or does the AMDGPU driver
> only work on x86_64? The radeon driver works well on a 32-bit kernel. I have
> radeon graphics cards with Caicos and Oland chipsets, and both are driven
> perfectly on LFS-10.0 on the i686 arch!
It should work in theory. That said, we don't do regular validation of 32-bit any more.
(In reply to Alex Deucher from comment #8)
> (In reply to bolando from comment #7)
> > Yes. Maybe the Ubuntu kernel applies some patch? Or does the AMDGPU driver
> > only work on x86_64? The radeon driver works well on a 32-bit kernel. I have
> > radeon graphics cards with Caicos and Oland chipsets, and both are driven
> > perfectly on LFS-10.0 on the i686 arch!
>
> It should work in theory. That said, we don't do regular validation of
> 32-bit any more.
Thanks for your reply. Since drivers are developed to be general-purpose, AMDGPU should work on the 32-bit arch, but I don't know what is wrong with it. On my system the AMDGPU driver lacks kfd, the APU node topology and amdgpudrmfb (the fb0 interface); I want to know how to fix that. The firmware and kernel are nearly the newest, but 5.10.9 does more when resetting the GPU and shows more debug information than 5.8.0.
Does setting CONFIG_HSA_AMD=n fix it?
There is no HSA_AMD option; it is only available for 64-bit kernels.
I'd recommend running a 64-bit kernel, even if all user-space is 32-bit.
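To see which combination a given machine is actually running, a small Python sketch like the following can compare the kernel architecture reported by uname with the pointer width of the running process. The KERNEL_BITS table and the describe helper are illustrative names, and the table only covers x86 machine strings.

```python
import platform
import struct

# uname machine strings for x86 kernels only; anything else is reported
# as unknown. This table is an illustrative assumption, not exhaustive.
KERNEL_BITS = {
    "x86_64": 64,
    "amd64": 64,
    "i386": 32,
    "i486": 32,
    "i586": 32,
    "i686": 32,
}


def userspace_bits():
    """Pointer width of the running process, in bits."""
    return struct.calcsize("P") * 8


def describe(machine, user_bits):
    """Summarize kernel and process bitness (hypothetical helper)."""
    kernel_bits = KERNEL_BITS.get(machine.lower())
    if kernel_bits is None:
        return f"unknown kernel architecture {machine!r}, {user_bits}-bit process"
    return f"{kernel_bits}-bit kernel, {user_bits}-bit process"


if __name__ == "__main__":
    print(describe(platform.machine(), userspace_bits()))
```

On a 32-bit user-space booted with a 64-bit kernel this would normally report a 64-bit kernel with a 32-bit process. One caveat: a process started through setarch or linux32 sees a 32-bit machine string even on a 64-bit kernel, so the result reflects what the process is told, not necessarily the hardware.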
(In reply to Michel Dänzer from comment #12)
> I'd recommend running a 64-bit kernel, even if all user-space is 32-bit.
I have just finished LFS-10.0 after several weeks; it was very hard work. If I enable 64-bit kernel support, I need to recompile everything on LFS-10.0 over the next weeks. Are there any other solutions for the 32-bit arch?
> If I enable 64-bit kernel support, I need to recompile everything on
> LFS-10.0 over the next weeks.
You shouldn't. 32-bit user-space works fine with a 64-bit kernel.
(In reply to Michel Dänzer from comment #14)
> > If I enable 64-bit kernel support, I need to recompile everything on
> > LFS-10.0 over the next weeks.
>
> You shouldn't. 32-bit user-space works fine with a 64-bit kernel.
Thanks for the reply. My LFS-10.0 is built for 32-bit, so I couldn't select a 64-bit kernel config when recompiling the Linux kernel; only a 32-bit kernel can be built. I really want to know: will the 32-bit arch no longer be supported by the AMDGPU driver from now on?
(In reply to bolando from comment #15)
> (In reply to Michel Dänzer from comment #14)
> > > If I enable 64-bit kernel support, I need to recompile everything on
> > > LFS-10.0 over the next weeks.
> >
> > You shouldn't. 32-bit user-space works fine with a 64-bit kernel.
>
> Thanks for the reply. My LFS-10.0 is built for 32-bit, so I couldn't select
> a 64-bit kernel config when recompiling the Linux kernel; only a 32-bit
> kernel can be built. I really want to know: will the 32-bit arch no longer
> be supported by the AMDGPU driver from now on?
Anecdotally it works for some people. It may depend on the platform and device.
(In reply to Alex Deucher from comment #16)
> (In reply to bolando from comment #15)
> > (In reply to Michel Dänzer from comment #14)
> > > > If I enable 64-bit kernel support, I need to recompile everything on
> > > > LFS-10.0 over the next weeks.
> > >
> > > You shouldn't. 32-bit user-space works fine with a 64-bit kernel.
> >
> > Thanks for the reply. My LFS-10.0 is built for 32-bit, so I couldn't select
> > a 64-bit kernel config when recompiling the Linux kernel; only a 32-bit
> > kernel can be built. I really want to know: will the 32-bit arch no longer
> > be supported by the AMDGPU driver from now on?
>
> Anecdotally it works for some people. It may depend on the platform and
> device.
Are you a guru from the AMDGPU driver development team? I have reviewed the 5.10.11 kernel changelog and found your name! So it anecdotally works on 32-bit systems? It seems that few people use 32-bit systems. The LFS book doesn't recommend building an x86_64 system, so I built a 32-bit system. Newer kernels do work better with the AMDGPU driver, so hopefully one day I can use the Raven APU with a new Linux kernel. Thanks a lot!
I have compiled an x64 kernel for my LFS-10.0. Booting with the x64 kernel, the screen froze again when the amdgpu driver was loaded. I checked the kernel log: everything seems OK, but there is no amdgpudrmfb. I tried to start X11, but it failed because no suitable modes were found.
Please attach your xorg log and dmesg output. Note that if you want an fbdev interface for the console, you need to enable CONFIG_DRM_FBDEV_EMULATION=y in your config.
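As a rough way to check options such as CONFIG_DRM_FBDEV_EMULATION or CONFIG_HSA_AMD in a kernel .config, here is a small Python sketch. It assumes the usual .config text format ("CONFIG_X=y" for set options, "# CONFIG_X is not set" for disabled ones). The SAMPLE fragment is hypothetical and not taken from the attached config, and parse_kconfig and option_state are illustrative names.

```python
# Hypothetical .config fragment for illustration; not the attached config.
SAMPLE = """\
CONFIG_DRM=y
CONFIG_DRM_AMDGPU=m
# CONFIG_DRM_FBDEV_EMULATION is not set
"""

NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text):
    """Map option names to their values; disabled options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO
            options[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def option_state(options, name):
    """Return 'y', 'm', 'n', another value, or 'absent' if never mentioned."""
    return options.get(name, "absent")


if __name__ == "__main__":
    opts = parse_kconfig(SAMPLE)
    for name in ("CONFIG_DRM_AMDGPU", "CONFIG_DRM_FBDEV_EMULATION", "CONFIG_HSA_AMD"):
        print(name, "->", option_state(opts, name))
```

To use it on a real tree, read the .config file into a string and pass it to parse_kconfig. An option reported as "absent" is not mentioned in the file at all, which is what you would expect for an option that is not offered on that architecture.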
Everything is OK! I recompiled the 5.10.8 x64 kernel and AMDGPU now loads successfully! It seems the 32-bit kernel is not supported now. I think the AMDGPU driver needs IOMMU and HSA support, which the 32-bit kernel doesn't provide. Thanks to everyone who replied and helped me! Thanks a lot!
(In reply to Michel Dänzer from comment #14)
> > If I enable 64-bit kernel support, I need to recompile everything on
> > LFS-10.0 over the next weeks.
>
> You shouldn't. 32-bit user-space works fine with a 64-bit kernel.
Your advice was very effective! I used Ubuntu to compile the x64 kernel. Thanks!