This appears to be a regression in 5.19-rc3 (and rc2; I didn't test anything earlier). It works fine on 5.18.7. Both are custom builds. There are also no issues on 5.18.0.

Debian, amd64.

44:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c0)
CPU: AMD Threadripper 2950X, stock
Memory: 8x32GB ECC
Motherboard: MSI MEG Creation X399

Booting looks fine, but when the Xorg server starts, the screen looks corrupted, and within seconds it freezes and stops responding.

Dmesg output:

[ 140.683672] amdgpu 0000:44:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:1 pasid:32769, for process Xorg pid 2994 thread Xorg:cs0 pid 3237)
[ 140.683678] amdgpu 0000:44:00.0: amdgpu: in page starting at address 0x0000800106ef5000 from client 0x1b (UTCL2)
[ 140.683681] amdgpu 0000:44:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0014115B
[ 140.683682] amdgpu 0000:44:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ 140.683684] amdgpu 0000:44:00.0: amdgpu: MORE_FAULTS: 0x1
[ 140.683685] amdgpu 0000:44:00.0: amdgpu: WALKER_ERROR: 0x5
[ 140.683686] amdgpu 0000:44:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 140.683686] amdgpu 0000:44:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 140.683687] amdgpu 0000:44:00.0: amdgpu: RW: 0x1
...
[ 151.015508] gmc_v10_0_process_interrupt: 699 callbacks suppressed
...

Eventually the GPU resets, but it is still not usable:

[ 161.261520] amdgpu 0000:44:00.0: amdgpu: IH ring buffer overflow (0x0008D620, 0x00002680, 0x0000D640)
[ 161.270648] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=100, emitted seq=103
[ 161.270854] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2994 thread Xorg:cs0 pid 3237
[ 161.271004] amdgpu 0000:44:00.0: amdgpu: GPU reset begin!
[ 161.830407] amdgpu 0000:44:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 161.830517] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[ 162.084366] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[ 162.101328] [drm] free PSP TMR buffer
[ 162.149879] CPU: 15 PID: 188 Comm: kworker/u128:14 Tainted: G W E 5.19.0-rc3 #1
[ 162.149883] Hardware name: Micro-Star International Co., Ltd. MS-7B92/MEG X399 CREATION (MS-7B92), BIOS 1.30 03/25/2019
[ 162.149884] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[ 162.149890] Call Trace:
[ 162.149892] <TASK>
[ 162.149893] dump_stack_lvl+0x34/0x45
[ 162.149898] amdgpu_do_asic_reset+0x1b/0x3db [amdgpu]
[ 162.150047] amdgpu_device_gpu_recover_imp.cold+0x57e/0x910 [amdgpu]
[ 162.150194] amdgpu_job_timedout+0x14b/0x180 [amdgpu]
[ 162.150323] ? finish_task_switch.isra.0+0x7d/0x270
[ 162.150326] drm_sched_job_timedout+0x5b/0xf0 [gpu_sched]
[ 162.150330] process_one_work+0x1ab/0x300
[ 162.150332] worker_thread+0x48/0x3c0
[ 162.150334] ? rescuer_thread+0x3c0/0x3c0
[ 162.150336] kthread+0xd1/0x100
[ 162.150338] ? kthread_complete_and_exit+0x20/0x20
[ 162.150339] ret_from_fork+0x1f/0x30
[ 162.150342] </TASK>
[ 162.150351] amdgpu 0000:44:00.0: amdgpu: MODE1 reset
[ 162.150354] amdgpu 0000:44:00.0: amdgpu: GPU mode1 reset
[ 162.150417] amdgpu 0000:44:00.0: amdgpu: GPU smu mode1 reset
[ 162.653371] amdgpu 0000:44:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 162.653516] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[ 162.653537] [drm] VRAM is lost due to GPU reset!
[ 162.653541] [drm] PSP is resuming...
[ 162.834166] [drm] reserve 0xa00000 from 0x8001000000 for PSP TMR
[ 162.948850] amdgpu 0000:44:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 162.948853] amdgpu 0000:44:00.0: amdgpu: SMU is resuming...
[ 162.948884] amdgpu 0000:44:00.0: amdgpu: use vbios provided pptable
[ 163.025704] amdgpu 0000:44:00.0: amdgpu: SMU is resumed successfully!
[ 163.027473] [drm] DMUB hardware initialized: version=0x02020003
[ 163.280274] [drm] kiq ring mec 2 pipe 1 q 0
[ 163.284624] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 163.284906] [drm] JPEG decode initialized successfully.
[ 163.284926] amdgpu 0000:44:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 163.284928] amdgpu 0000:44:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 163.284930] amdgpu 0000:44:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 163.284931] amdgpu 0000:44:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 163.284932] amdgpu 0000:44:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 163.284934] amdgpu 0000:44:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 163.284935] amdgpu 0000:44:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 163.284936] amdgpu 0000:44:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 163.284937] amdgpu 0000:44:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 163.284938] amdgpu 0000:44:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 163.284940] amdgpu 0000:44:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 163.284941] amdgpu 0000:44:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 163.284942] amdgpu 0000:44:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[ 163.284943] amdgpu 0000:44:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[ 163.284944] amdgpu 0000:44:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ 163.284945] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ 163.284947] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ 163.284948] amdgpu 0000:44:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[ 163.284949] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[ 163.284950] amdgpu 0000:44:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[ 163.284951] amdgpu 0000:44:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[ 163.292565] amdgpu 0000:44:00.0: amdgpu: recover vram bo from shadow start
[ 163.292579] amdgpu 0000:44:00.0: amdgpu: recover vram bo from shadow done
[ 163.292582] [drm] Skip scheduling IBs!
[ 163.292583] [drm] Skip scheduling IBs!
[ 163.292598] amdgpu 0000:44:00.0: amdgpu: GPU reset(3) succeeded!
[ 163.292618] [drm] Skip scheduling IBs!
[ 163.292626] [drm] Skip scheduling IBs!
[ 163.292629] [drm] Skip scheduling IBs!
[ 163.989966] usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
[ 166.265393] amdgpu_cs_ioctl: 3200 callbacks suppressed
[ 166.265397] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 166.265812] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 166.282284] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 166.283327] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 171.486759] amdgpu_cs_ioctl: 65 callbacks suppressed
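
For anyone cross-checking the page fault lines, here is a minimal sketch that splits the raw GCVM_L2_PROTECTION_FAULT_STATUS value into the fields the driver prints after it. The bit offsets and widths in FIELDS are an assumption, not taken from this report: they were chosen so that 0x0014115B decodes to the same values shown in the dmesg output, and they should be checked against the kernel's register headers before being relied on. The name decode_fault_status is hypothetical.

```python
# Assumed (bit offset, width) layout of GCVM_L2_PROTECTION_FAULT_STATUS.
# This layout is an assumption that happens to match the values logged
# in this report; verify it against the amdgpu register headers.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),    # printed as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Extract each assumed bit field from the raw status register value."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value taken from the dmesg output in this report.
    for name, value in decode_fault_status(0x0014115B).items():
        print(f"{name}: {value:#x}")
```

Under that assumed layout, the decoded fields agree with the individual MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR and RW lines, and with vmid:1 in the page fault header.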
Created attachment 301271: lspci -vvv output
Created attachment 301272: dmesg-amdgpu-fail-5.19.0-rc3.txt
Created attachment 301273: dmesg 5.19-rc2
Created attachment 301274: dpkg-l.txt
Created attachment 301275: config-5.19.0-rc3.txt
Created attachment 301276: inxi-full-systeminfo.txt
The issue can occur even before logging in from the display manager to the desktop environment. I use lightdm.
# cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 38, firmware version: 0x0000003e
PFP feature version: 38, firmware version: 0x00000056
CE feature version: 38, firmware version: 0x00000024
RLC feature version: 1, firmware version: 0x0000005b
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
MEC feature version: 38, firmware version: 0x00000058
MEC2 feature version: 38, firmware version: 0x00000058
SOS feature version: 0, firmware version: 0x00210862
ASD feature version: 553648218, firmware version: 0x2100005a
TA XGMI feature version: 0x00000000, firmware version: 0x2000000b
TA RAS feature version: 0x00000000, firmware version: 0x1b00012a
TA HDCP feature version: 0x00000000, firmware version: 0x1700001f
TA DTM feature version: 0x00000000, firmware version: 0x12000009
TA RAP feature version: 0x00000000, firmware version: 0x0700000e
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 0, firmware version: 0x003a4700 (58.71.0)
SDMA0 feature version: 52, firmware version: 0x0000004c
SDMA1 feature version: 52, firmware version: 0x0000004c
SDMA2 feature version: 52, firmware version: 0x0000004c
SDMA3 feature version: 52, firmware version: 0x0000004c
VCN feature version: 0, firmware version: 0x0210d02a
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x02020003
TOC feature version: 0, firmware version: 0x00000000
VBIOS version: 113-69XB6SSB1-D01
Bisected:

c9cad937c0c58618fe5b0310fd539a854dc1ae95 is the first bad commit
commit c9cad937c0c58618fe5b0310fd539a854dc1ae95
Author: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Date: Fri Apr 8 04:18:43 2022 +0530

    drm/amdgpu: add drm buddy support to amdgpu
Duplicate of:
https://bugzilla.kernel.org/show_bug.cgi?id=216120
https://gitlab.freedesktop.org/drm/amd/-/issues/2050