Bug 211157

Summary: amdgpu VM_L2_PROTECTION_FAULT_STATUS:0x00701431
Product: Drivers Reporter: Alexey Kuznetsov (axet)
Component: Video(Other) Assignee: drivers_video-other
Status: RESOLVED ANSWERED    
Severity: normal CC: antonio.chirizzi, bugzilla+kernel.org, bxkx, darinp, dev.claude.petrescu, dev, dmitry.myachin, fierevere, forum, kernel.org, kunal.bhat2001, matejm98mthw, michal.przybylowicz, mydellpc07, oneofone, rafael.ristovski, skobkin-ru, stefanspr94, vlastimil.holer, xfrtqtjbranckruuzs
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.10.0-1-amd64 Subsystem:
Regression: No Bisected commit-id:

Description Alexey Kuznetsov 2021-01-12 18:53:42 UTC
My laptop fails occasionally during gameplay (about twice a day). First it starts lagging badly, and within a second it freezes completely. I can still log in remotely over SSH, but the keyboard does not respond: neither Caps Lock nor Alt+F1 works.

dmesg shows related errors:

[ 7738.264852] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x00008001bfd14000 from client 27
[ 7738.264856] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00701431
[ 7738.264860] amdgpu 0000:03:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[ 7738.264864] amdgpu 0000:03:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[ 7738.264867] amdgpu 0000:03:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7738.264871] amdgpu 0000:03:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[ 7738.264874] amdgpu 0000:03:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7738.264878] amdgpu 0000:03:00.0: amdgpu: 	 RW: 0x0
[ 7738.265691] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:7 pasid:32778, for process PathOfExile_x64 pid 14159 thread PathOfExile_x64 pid 14231)
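The decoded lines (MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR, client ID, RW) are bit fields of the single VM_L2_PROTECTION_FAULT_STATUS word. The sketch below splits a raw status value back into those fields so that values from different reports can be compared. The bit positions are an assumption inferred from this report: they reproduce the decoded fields for every status value logged here, and bits 20-23 agree with the vmid in the adjacent retry-fault line for the non-zero statuses. Check the amdgpu sources for your ASIC before relying on them. `decode_fault_status`, `FaultStatus` and `KNOWN_CLIENTS` are hypothetical names, and the client table only holds the three IDs that appear in these logs.

```python
# Hypothetical helper: decode a VM_L2_PROTECTION_FAULT_STATUS value.
# Bit layout inferred from the decoded log lines in this report (assumption);
# verify against the amdgpu sources for your ASIC.
from dataclasses import dataclass

# Client IDs observed in this report's logs only; not a complete table.
KNOWN_CLIENTS = {0x0: "CB", 0x8: "TCP", 0xA: "SQC (data)"}


@dataclass
class FaultStatus:
    more_faults: int
    walker_error: int
    permission_faults: int
    mapping_error: int
    client_id: int
    rw: int
    vmid: int

    @property
    def client_name(self) -> str:
        return KNOWN_CLIENTS.get(self.client_id, "unknown")


def decode_fault_status(status: int) -> FaultStatus:
    """Split the 32-bit status word into its (assumed) bit fields."""
    return FaultStatus(
        more_faults=status & 0x1,               # bit 0
        walker_error=(status >> 1) & 0x7,       # bits 1-3
        permission_faults=(status >> 4) & 0xF,  # bits 4-7
        mapping_error=(status >> 8) & 0x1,      # bit 8
        client_id=(status >> 9) & 0x1FF,        # bits 9-17
        rw=(status >> 18) & 0x1,                # bit 18
        vmid=(status >> 20) & 0xF,              # bits 20-23
    )


if __name__ == "__main__":
    s = decode_fault_status(0x00701431)
    print(s.client_name, hex(s.permission_faults), s.rw, s.vmid)
```

Run as a script, it prints `SQC (data) 0x3 0 7`, which matches the log above: client SQC (data), PERMISSION_FAULTS 0x3, RW 0x0 and vmid:7.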


Full log: https://linux-hardware.org/?probe=7f9336625d
Comment 1 Arunanshu Biswas 2021-05-15 15:56:25 UTC
Same here. This happens with 5.11 and 5.12 (AFAIK). Strange boxes appear on screen, the system freezes, etc., at completely random intervals.
Comment 2 Sylvia 2021-06-06 18:59:04 UTC
It started happening after 5.12.4 or so.

The bug has also been "backported" to 5.10.x.

It happens often, always with Firefox, usually when opening graphics-heavy pages under CPU load.

Sometimes the system recovers; sometimes it freezes completely.


Hardware: Lenovo 330S 15ARR 
AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx AuthenticAMD GNU/Linux
AMD Radeon(TM) Vega 8 Graphics

Jun 06 21:40:02 [kernel] [11217.699553] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800119c80000 from client 27
Jun 06 21:40:02 [kernel] [11217.699555] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x006C0071
Jun 06 21:40:02 [kernel] [11217.699557] amdgpu 0000:04:00.0: amdgpu:      Faulty UTCL2 client ID: CB (0x0)
Jun 06 21:40:02 [kernel] [11217.699559] amdgpu 0000:04:00.0: amdgpu:      MORE_FAULTS: 0x1
Jun 06 21:40:02 [kernel] [11217.699560] amdgpu 0000:04:00.0: amdgpu:      WALKER_ERROR: 0x0
Jun 06 21:40:02 [kernel] [11217.699562] amdgpu 0000:04:00.0: amdgpu:      PERMISSION_FAULTS: 0x7
Jun 06 21:40:02 [kernel] [11217.699564] amdgpu 0000:04:00.0: amdgpu:      MAPPING_ERROR: 0x0
Jun 06 21:40:02 [kernel] [11217.699566] amdgpu 0000:04:00.0: amdgpu:      RW: 0x1
Jun 06 21:40:07 [kernel] [11222.696469] gmc_v9_0_process_interrupt: 5900 callbacks suppressed
Jun 06 21:40:07 [kernel] [11222.696476] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32777, for process GPU Process pid 3227 thread firefox:cs0 pid 3288)



Jun 06 21:40:07 [kernel] [11222.699857] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800119c80000 from client 27
Jun 06 21:40:07 [kernel] [11222.699858] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x006C0071
Jun 06 21:40:07 [kernel] [11222.699860] amdgpu 0000:04:00.0: amdgpu:      Faulty UTCL2 client ID: CB (0x0)
Jun 06 21:40:07 [kernel] [11222.699862] amdgpu 0000:04:00.0: amdgpu:      MORE_FAULTS: 0x1
Jun 06 21:40:07 [kernel] [11222.699863] amdgpu 0000:04:00.0: amdgpu:      WALKER_ERROR: 0x0
Jun 06 21:40:07 [kernel] [11222.699865] amdgpu 0000:04:00.0: amdgpu:      PERMISSION_FAULTS: 0x7
Jun 06 21:40:07 [kernel] [11222.699867] amdgpu 0000:04:00.0: amdgpu:      MAPPING_ERROR: 0x0
Jun 06 21:40:07 [kernel] [11222.699868] amdgpu 0000:04:00.0: amdgpu:      RW: 0x1
Jun 06 21:40:07 [kernel] [11222.700717] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32777, for process GPU Process pid 3227 thread firefox:cs0 pid 3288)
Comment 3 Arunanshu Biswas 2021-06-06 21:28:39 UTC
I think I've found a solution. It applies to TLP users specifically, but non-TLP users can try it too.

In the `/etc/tlp.conf` file, set:

RUNTIME_PM_DRIVER_BLACKLIST=""
# originally, it looked like this:
# RUNTIME_PM_DRIVER_BLACKLIST="amdgpu mei_me nouveau pcieport radeon"
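To check which runtime PM mode the GPU actually ends up in after editing `/etc/tlp.conf`, one can read the device's standard sysfs attribute `power/control`, which holds `auto` when runtime suspend is allowed and `on` when the device is kept active. A minimal sketch, assuming the usual `/sys/bus/pci/devices/<address>` layout; `runtime_pm_mode` is a hypothetical helper, and the value you see also depends on TLP's other RUNTIME_PM_* settings.

```python
# Sketch: report the runtime PM mode of a PCI device by reading the
# sysfs attribute <device>/power/control ("auto" or "on").
from pathlib import Path


def runtime_pm_mode(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> str:
    """Return 'auto' (runtime suspend allowed) or 'on' (device kept active)."""
    control = Path(sysfs_root) / pci_addr / "power" / "control"
    return control.read_text().strip()
```

For example, `runtime_pm_mode("0000:04:00.0")` targets the GPU address from comment 2's log; substitute the address shown in your own `amdgpu 0000:xx:xx.x` messages. The `sysfs_root` parameter only exists so the function can be exercised against a fake directory tree.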
Comment 4 Arunanshu Biswas 2021-06-06 21:29:59 UTC
(In reply to Arunanshu Biswas from comment #3)
> I think I found a solution. This applies to TLP users specifically, but
> non-TLP users can try that too.
> 
> In `/etc/tlp.conf` file, set this to:
> 
> RUNTIME_PM_DRIVER_BLACKLIST=""
> # originally, it looks like this:
> # RUNTIME_PM_DRIVER_BLACKLIST="amdgpu mei_me nouveau pcieport radeon"

Or you can simply blacklist it in modprobe.
Comment 5 Sylvia 2021-06-21 16:24:20 UTC
I removed ivrs_ioapic[32]=00:14.0 from the boot parameters, and the problem seems to have vanished (or maybe just for now; I'm still keeping an eye out for crashes). This parameter was required to boot my Lenovo laptop with earlier kernels.

00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
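To confirm whether a parameter such as `ivrs_ioapic[32]=00:14.0` is still being passed to the running kernel, one can inspect `/proc/cmdline`. A small sketch; `find_boot_params` is a hypothetical helper, and splitting with `shlex` is an assumption that only approximates how the kernel tokenizes quoted values.

```python
# Sketch: list kernel boot parameters whose text starts with a given prefix,
# e.g. to confirm that ivrs_ioapic[...] is (or is no longer) on the cmdline.
import shlex


def find_boot_params(cmdline: str, prefix: str) -> list[str]:
    """Return whitespace-separated cmdline tokens that start with prefix."""
    return [tok for tok in shlex.split(cmdline) if tok.startswith(prefix)]
```

Usage: `find_boot_params(open("/proc/cmdline").read(), "ivrs_ioapic")` should return an empty list once the parameter has been removed and the machine rebooted.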
Comment 6 Dmitry Myachin 2021-06-23 14:53:56 UTC
I have the same problem.

Everything works fine for a while, and then the screen goes black. Music keeps playing, but the keyboard's multimedia keys no longer stop it. Alt+F* does not work either.


Kernel: 5.12.12-zen1-1-zen
CPU: AMD Ryzen 5 3400G (8) @ 3.700GHz
GPU: AMD ATI 07:00.0 Picasso


Arch Linux. It does not matter whether the kernel is -zen or not.
Comment 7 Sylvia 2021-06-25 11:05:40 UTC
(In reply to Sylvia from comment #5)
> I have removed ivrs_ioapic[32]=00:14.0 froom boot params, and it seems to be
> that problem vanished (or just for now.. still keeping an eye for crashes),



Still happening:

[  430.943898] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32775, for process GPU Process pid 2960 thread firefox:cs0 pid 3006)
[  430.943905] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800112200000 from client 27
[  430.943908] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
[  430.943912] amdgpu 0000:04:00.0: amdgpu:      Faulty UTCL2 client ID: TCP (0x8)
[  430.943914] amdgpu 0000:04:00.0: amdgpu:      MORE_FAULTS: 0x1
[  430.943916] amdgpu 0000:04:00.0: amdgpu:      WALKER_ERROR: 0x0
[  430.943918] amdgpu 0000:04:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[  430.943921] amdgpu 0000:04:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  430.943923] amdgpu 0000:04:00.0: amdgpu:      RW: 0x1

[  435.948029] gmc_v9_0_process_interrupt: 47127 callbacks suppressed
[  440.952210] gmc_v9_0_process_interrupt: 47115 callbacks suppressed
[  445.955904] gmc_v9_0_process_interrupt: 615433 callbacks suppressed
[  455.963501] gmc_v9_0_process_interrupt: 623975 callbacks suppressed
[  460.967605] gmc_v9_0_process_interrupt: 622849 callbacks suppressed
[  465.971907] gmc_v9_0_process_interrupt: 624947 callbacks suppressed
[  470.976127] gmc_v9_0_process_interrupt: 625531 callbacks suppressed
[  480.104842] gmc_v9_0_process_interrupt: 107333 callbacks suppressed
[  485.107666] gmc_v9_0_process_interrupt: 624573 callbacks suppressed
[  490.111856] gmc_v9_0_process_interrupt: 624584 callbacks suppressed
[  490.266839] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[  495.115724] gmc_v9_0_process_interrupt: 626231 callbacks suppressed
[  500.119513] gmc_v9_0_process_interrupt: 626410 callbacks suppressed

[  500.506030] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[  505.123885] gmc_v9_0_process_interrupt: 124455 callbacks suppressed

[  510.128389] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32775, for process GPU Process pid 2960 thread firefox:cs0 pid 3006)
[  510.128391] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x000080011220f000 from client 27
[  510.128393] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
[  510.128395] amdgpu 0000:04:00.0: amdgpu:      Faulty UTCL2 client ID: TCP (0x8)
[  510.128396] amdgpu 0000:04:00.0: amdgpu:      MORE_FAULTS: 0x1
[  510.128397] amdgpu 0000:04:00.0: amdgpu:      WALKER_ERROR: 0x0
[  510.128399] amdgpu 0000:04:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[  510.128400] amdgpu 0000:04:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  510.128402] amdgpu 0000:04:00.0: amdgpu:      RW: 0x1
[  510.743952] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
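The `gmc_v9_0_process_interrupt: N callbacks suppressed` lines come from the kernel's printk rate limiting: each one reports how many messages from that function were suppressed during the preceding rate-limit interval. Summing them gives a rough idea of the size of the fault storm (several hundred thousand suppressed messages per interval in the log above). A sketch; `suppressed_per_function` is a hypothetical helper, and the regex only targets the message format seen in this report.

```python
# Sketch: total the "N callbacks suppressed" counts emitted by the kernel's
# rate limiter, grouped by the function name that precedes them.
import re

SUPPRESSED_RE = re.compile(r"(\w+): (\d+) callbacks suppressed")


def suppressed_per_function(log: str) -> dict[str, int]:
    """Map each rate-limited function name to its total suppressed count."""
    totals: dict[str, int] = {}
    for m in SUPPRESSED_RE.finditer(log):
        func, count = m.group(1), int(m.group(2))
        totals[func] = totals.get(func, 0) + count
    return totals
```

The pattern works for both the dmesg style (`[  435.948029] gmc_v9_0_...`) and the journalctl style (`Jul 06 ... kernel: gmc_v9_0_...`) used in this report.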
Comment 8 Vlastimil Holer 2021-07-06 01:47:06 UTC
Still an issue with 5.13.0
(AMD Ryzen 5 PRO 3500U, Fedora 34, Mesa 21.1.4)

Jul 06 03:34:26 kernel: gmc_v9_0_process_interrupt: 116 callbacks suppressed
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a2000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a0000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a3000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a1000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a4000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a5000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a6000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a8000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a7000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110aa000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:36 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=413630, emitted seq=413632
Jul 06 03:34:36 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1574 thread Xorg:cs0 pid 1627
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fde0 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe00 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe20 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe40 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe60 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe80 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: amd_iommu_report_page_fault: 21 callbacks suppressed
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041fea0 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041fec0 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041fee0 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041ff00 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041ff20 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041ff40 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041ff60 flags=0x0070]
Jul 06 03:34:36 kernel: [drm] free PSP TMR buffer
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
Jul 06 03:34:36 kernel: [drm] PCIE GART of 1024M enabled.
Jul 06 03:34:36 kernel: [drm] PTB located at 0x000000F401FA4000
Jul 06 03:34:36 kernel: [drm] PSP is resuming...
Jul 06 03:34:36 kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jul 06 03:34:37 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jul 06 03:34:37 kernel: [drm] VCN decode and encode initialized successfully(under SPG Mode).
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow start
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow done
Jul 06 03:34:37 kernel: [drm] Skip scheduling IBs!
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(2) succeeded!
Jul 06 03:34:47 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=413639, emitted seq=413641
Jul 06 03:34:47 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1574 thread Xorg:cs0 pid 1627
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Jul 06 03:34:47 kernel: amd_iommu_report_page_fault: 36 callbacks suppressed
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1104075a0 flags=0x0070]
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:47 kernel: [drm] free PSP TMR buffer
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
Jul 06 03:34:47 kernel: [drm] PCIE GART of 1024M enabled.
Jul 06 03:34:47 kernel: [drm] PTB located at 0x000000F401FA4000
Jul 06 03:34:47 kernel: [drm] PSP is resuming...
Jul 06 03:34:47 kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jul 06 03:34:47 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jul 06 03:34:48 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
Jul 06 03:34:48 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
Jul 06 03:34:48 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(4) failed
Jul 06 03:34:48 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset end with ret = -110
Jul 06 03:34:58 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 06 03:35:08 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
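When a log contains thousands of `retry page fault` lines, counting them per process shows which client triggers the faults (Xorg here; Firefox's GPU process and PathOfExile_x64 elsewhere in this report). A minimal sketch; the regex is written against the line format shown in these logs, and `RETRY_RE` and `count_faults_by_process` are hypothetical names.

```python
# Sketch: extract the fields of amdgpu "retry page fault" header lines and
# count faults per (process, pid, vmid).
import re
from collections import Counter

RETRY_RE = re.compile(
    r"\[(?P<hub>\w+)\] retry page fault \(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), for process (?P<process>.+?) pid (?P<pid>\d+) "
    r"thread (?P<thread>.+?) pid (?P<tid>\d+)\)"
)


def count_faults_by_process(log: str) -> Counter:
    """Count retry-fault lines keyed by (process name, pid, vmid)."""
    counts: Counter = Counter()
    for m in RETRY_RE.finditer(log):
        counts[(m["process"], int(m["pid"]), int(m["vmid"]))] += 1
    return counts
```

The lazy `.+?` groups allow process names containing spaces, such as `GPU Process` in comment 2.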
Comment 9 Mthw 2021-07-27 13:29:55 UTC
Happened today on an AMD 3550H, Arch Linux, linux 5.13.5, linux-firmware 20210315.3568f96. The system froze for a few seconds and then recovered, though I had to restart Firefox. Log:

júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e35000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401031
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e36000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e32000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e26000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e33000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e0b000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e03000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e0a000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e27000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e02000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Comment 10 Mthw 2021-07-27 13:33:47 UTC
Since this issue affects multiple users, can it be marked CONFIRMED? Also, does anyone know of a linux and linux-firmware version combination that is proven to work correctly? I personally haven't had any issues yet on 5.10.17 + 20210315.3568f96.
Comment 11 Kunal Bhat 2021-07-28 08:49:23 UTC
(In reply to Mthw from comment #10)
> Since this issue is present for multiple users can it be considered
> CONFIRMED? Also, does anyone know of a linux and linux-firmware version that
> is proven to be working correctly? I personally didn't have an issues yet on
> 5.10.17 + 20210315.3568f96.

Seems to be working fine on any linux-firmware version older than 20210517.
Comment 12 Arunanshu Biswas 2021-08-13 06:05:58 UTC
Using amdgpu.noretry=0 is the best workaround right now. The workarounds I suggested earlier all failed at one point or another.
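For reference, `amdgpu.noretry=0` goes on the kernel command line (typically via the bootloader configuration). As I understand the driver's parameter description, 0 means retry faults enabled, 1 disabled and -1 automatic. A minimal Python sketch for confirming that the setting actually reached the running kernel: it parses `/proc/cmdline` and, assuming the driver exposes the value at `/sys/module/amdgpu/parameters/noretry`, prints the effective setting. The helper names are hypothetical, and the parser ignores quoted values containing spaces.

```python
from __future__ import annotations

from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into key -> value (bare flags map to "")."""
    params: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def noretry_setting(cmdline: str) -> str | None:
    """Return the value passed as amdgpu.noretry=..., or None if absent."""
    return parse_cmdline(cmdline).get("amdgpu.noretry")


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print("amdgpu.noretry on cmdline:", noretry_setting(proc_cmdline.read_text()))
    # Assumed sysfs location of the module parameter; only read if present.
    sysfs = Path("/sys/module/amdgpu/parameters/noretry")
    if sysfs.exists():
        print("effective noretry:", sysfs.read_text().strip())
```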
Comment 13 Sylvia 2021-08-27 09:54:32 UTC
Updating the BIOS to the latest version helped for me.

This one works:
LENOVO 81FB/LNVNB161216, BIOS 7WCN38WW 11/04/2019

This one was faulty:
LENOVO 81FB/LNVNB161216, BIOS 7WCN36WW 05/10/2019

Maybe the older BIOS is not fully compatible with recent AMD microcode updates?
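When comparing BIOS versions like this, the same identifiers can be read from sysfs. A small Python sketch, assuming the usual `/sys/class/dmi/id` attributes (`sys_vendor`, `product_name`, `board_name`, `bios_version`, `bios_date`) are present and readable; it formats them roughly like the lines above. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

DMI_DIR = Path("/sys/class/dmi/id")


def read_dmi(field: str, base: Path = DMI_DIR) -> str:
    """Read one DMI attribute, returning "?" if it is missing or unreadable."""
    try:
        return (base / field).read_text().strip()
    except OSError:
        return "?"


def bios_summary(base: Path = DMI_DIR) -> str:
    """Format vendor, product/board and BIOS version/date on one line."""
    vendor = read_dmi("sys_vendor", base)
    product = read_dmi("product_name", base)
    board = read_dmi("board_name", base)
    version = read_dmi("bios_version", base)
    date = read_dmi("bios_date", base)
    return f"{vendor} {product}/{board}, BIOS {version} {date}"


if __name__ == "__main__":
    print(bios_summary())
```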
Comment 14 DocMAX 2021-11-02 23:47:32 UTC
For me, this error started with kernel 5.14; there were no issues on 5.13.13.
AMD APU: 4800U.
Comment 15 DocMAX 2021-11-02 23:52:33 UTC
Here is a snippet:

Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000002000 from IH client 0x12 (VMC)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000003000 from IH client 0x12 (VMC)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000031
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000004000 from IH client 0x12 (VMC)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000005000 from IH client 0x12 (VMC)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Comment 16 datenwolf 2021-12-24 19:49:12 UTC
I have the same issue. System information (inxi)

```
    System:    Host: kraeh.datenwolf.net Kernel: 5.15.11_1 x86_64 bits: 64 Desktop: awesome 4.3
               Distro: void
    Machine:   Type: Laptop System: LENOVO product: 20UES00L00 v: ThinkPad T14 Gen 1 serial: PF2CQSMV
               Mobo: LENOVO model: 20UES00L00 serial: L1HF06R00VB UEFI: LENOVO v: R1BET61W(1.30 )
               date: 12/21/2020
    CPU:       Info: 8-Core model: AMD Ryzen 7 PRO 4750U with Radeon Graphics bits: 64 type: MT MCP cache:
               L2: 4 MiB
               Speed: 1396 MHz min/max: 1400/1700 MHz Core speeds (MHz): 1: 1396 2: 1396 3: 1397 4: 1397
               5: 1397 6: 1397 7: 1418 8: 1397 9: 1398 10: 1397 11: 1397 12: 1397 13: 1387 14: 1396
               15: 1415 16: 1397
    Graphics:  Device-1: AMD Renoir driver: amdgpu v: kernel
               Device-2: Chicony Integrated Camera type: USB driver: uvcvideo
               Display: server: X.Org 1.20.14 driver: loaded: amdgpu,ati unloaded: fbdev,modesetting,vesa
               resolution: 1920x1080~60Hz
               OpenGL: renderer: AMD RENOIR (DRM 3.42.0 5.15.11_1 LLVM 12.0.1) v: 4.6 Mesa 21.3.2
    Audio:     Device-1: AMD driver: snd_hda_intel
               Device-2: AMD Raven/Raven2/FireFlight/Renoir Audio Processor driver: snd_rn_pci_acp3x
               Device-3: AMD Family 17h HD Audio driver: snd_hda_intel
               Sound Server-1: ALSA v: k5.15.11_1 running: yes
               Sound Server-2: PipeWire v: 0.3.42 running: yes
```

I reliably run into this issue when launching Elite: Dangerous through Steam/Proton. The crash happens during the planet generation shader profiling phase. The kernel log shows this:

```
    [  732.515287] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:16 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
    [  732.515299] amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x0000000000010000 from IH client 0x12 (VMC)
    [  732.515306] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000021
    [  732.515309] amdgpu 0000:07:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
    [  732.515312] amdgpu 0000:07:00.0: amdgpu: 	 MORE_FAULTS: 0x1
    [  732.515314] amdgpu 0000:07:00.0: amdgpu: 	 WALKER_ERROR: 0x0
    [  732.515316] amdgpu 0000:07:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x2
    [  732.515318] amdgpu 0000:07:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
    [  732.515320] amdgpu 0000:07:00.0: amdgpu: 	 RW: 0x0
    [  732.515322] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
    [  732.515327] amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x0000000000011000 from IH client 0x12 (VMC)
    [  732.515331] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
    [  732.515333] amdgpu 0000:07:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
    [  732.515335] amdgpu 0000:07:00.0: amdgpu: 	 MORE_FAULTS: 0x0
    [  732.515337] amdgpu 0000:07:00.0: amdgpu: 	 WALKER_ERROR: 0x0
    [  732.515339] amdgpu 0000:07:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
    [  732.515341] amdgpu 0000:07:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
    [  732.515343] amdgpu 0000:07:00.0: amdgpu: 	 RW: 0x0
    
    ...
    
    [  742.526075] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=12215, emitted seq=12218
    [  742.526228] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
    [  742.526361] amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
    [  742.630706] [drm] free PSP TMR buffer
    [  742.658826] amdgpu 0000:07:00.0: amdgpu: MODE2 reset
    [  742.658903] amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
    [  742.659047] [drm] PCIE GART of 1024M enabled.
    [  742.659066] [drm] PTB located at 0x000000F400900000
    [  742.659080] [drm] VRAM is lost due to GPU reset!
    [  742.659470] [drm] PSP is resuming...
    [  742.679525] [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
    [  742.762836] amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
    [  742.771617] amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
    [  742.771622] amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
    [  742.771627] amdgpu 0000:07:00.0: amdgpu: SMU is resuming...
    [  742.772490] amdgpu 0000:07:00.0: amdgpu: SMU is resumed successfully!
    [  743.018374] [drm] kiq ring mec 2 pipe 1 q 0
    [  743.221239] amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
    [  743.221401] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
    [  743.221590] amdgpu 0000:07:00.0: amdgpu: GPU reset(1) failed
    [  743.221704] amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
    [  747.518075] gmc_v9_0_process_interrupt: 708172 callbacks suppressed
    [  747.518084] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:136 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
    [  747.518093] amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x0000000000014000 from IH client 0x12 (VMC)
    [  747.518100] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000111
    [  747.518102] amdgpu 0000:07:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
    [  747.518105] amdgpu 0000:07:00.0: amdgpu: 	 MORE_FAULTS: 0x1
    [  747.518106] amdgpu 0000:07:00.0: amdgpu: 	 WALKER_ERROR: 0x0
    [  747.518108] amdgpu 0000:07:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x1
    [  747.518110] amdgpu 0000:07:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
    [  747.518112] amdgpu 0000:07:00.0: amdgpu: 	 RW: 0x0
    [  747.518114] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:16 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
    [  747.518118] amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x0000000000010000 from IH client 0x12 (VMC)
    [  747.518122] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000111
    [  747.518124] amdgpu 0000:07:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
    [  747.518126] amdgpu 0000:07:00.0: amdgpu: 	 MORE_FAULTS: 0x1
    [  747.518128] amdgpu 0000:07:00.0: amdgpu: 	 WALKER_ERROR: 0x0
    [  747.518129] amdgpu 0000:07:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x1
    [  747.518131] amdgpu 0000:07:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
    [  747.518133] amdgpu 0000:07:00.0: amdgpu: 	 RW: 0x0
```
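The `-110` in the reset-failure lines above is a negative Linux errno value: 110 is `ETIMEDOUT`, so after the MODE2 reset the sdma0 ring test timed out and the reset was reported as failed. A small Python sketch that picks such codes out of a log line and names them. The table is a hand-copied subset of the Linux values (errno numbers differ on other operating systems), and the helper name is hypothetical.

```python
from __future__ import annotations

import re

# A few Linux errno values (asm-generic errno headers).
# Hard-coded on purpose: the numbers differ on other operating systems.
LINUX_ERRNO = {
    5: "EIO",
    12: "ENOMEM",
    22: "EINVAL",
    62: "ETIME",
    110: "ETIMEDOUT",
}

# Negative integers such as "(-110)", "failed -110" or "ret = -110";
# the lookbehind skips hyphens inside tokens like "5.19.0-76051900".
NEG_CODE = re.compile(r"(?<![\w.])-(\d+)\b")


def errno_names(line: str) -> list[str]:
    """Return symbolic names for every negative error code in a log line."""
    names = []
    for match in NEG_CODE.finditer(line):
        code = int(match.group(1))
        names.append(LINUX_ERRNO.get(code, f"errno {code}"))
    return names
```

For example, `errno_names("amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110")` returns `["ETIMEDOUT"]`.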
Comment 17 Antonio Chirizzi 2022-01-09 17:24:45 UTC
[Sun Jan  9 04:08:13 2022] Linux version 5.15.12-200.fc35.x86_64
[Sun Jan  9 04:08:13 2022] smpboot: CPU0: AMD Ryzen 7 5700G with Radeon Graphics (family: 0x19, model: 0x50, stepping: 0x0)
...
[Sun Jan  9 04:08:13 2022] DMI: To Be Filled By O.E.M. To Be Filled By O.E.M./X300M-STX, BIOS P1.70 07/01/2021
...
[Sun Jan  9 16:43:53 2022] [drm] kiq ring mec 2 pipe 1 q 0
[Sun Jan  9 16:43:53 2022] [drm] DMUB hardware initialized: version=0x0101001C
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:7 pasid:32769, for process Xorg pid 2217 thread Xorg:cs0 pid 2218)
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x007C0071
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          RW: 0x1
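Each of these dumps is a field-by-field decode of one 32-bit status value. A minimal Python sketch of that decode, assuming this layout: MORE_FAULTS bit 0, WALKER_ERROR bits 1-3, PERMISSION_FAULTS bits 4-7, MAPPING_ERROR bit 8, CID (client ID) bits 9-17, RW bit 18, VMID bits 20-23. The layout is my reading of the amdgpu register headers, cross-checked against the `VM_`, `GCVM_` and `MMVM_` values posted in this report rather than against hardware documentation. Client names are left out on purpose: the same CID maps to different clients on different hubs (0x0 is CB on gfxhub but MP1 on mmhub in the logs here). The function name is hypothetical.

```python
from __future__ import annotations

# (name, lowest bit, width in bits); assumed layout, see above.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),
    ("RW", 18, 1),
    ("VMID", 20, 4),
]


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a *_L2_PROTECTION_FAULT_STATUS value into its bitfields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


if __name__ == "__main__":
    # Status value from the Xorg fault above.
    for name, value in decode_fault_status(0x007C0071).items():
        print(f"{name}: 0x{value:x}")
```

For 0x007C0071 this yields MORE_FAULTS 0x1, PERMISSION_FAULTS 0x7, CID 0x0, RW 0x1 and VMID 0x7, which agrees with the decoded lines above and with `vmid:7` in the header line.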
Comment 18 DocMAX 2022-01-16 11:46:34 UTC
No crashes anymore on kernel 5.16.
Comment 19 Michal Przybylowicz 2022-06-30 13:59:11 UTC
No crash on kernel 5.18.6-xanmod1-x64v2, but the errors are still in the log:

Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:3 pasid:32775, for process vivaldi-bin pid 1573 thread vivaldi-bi:cs0 pid 1587)
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000080013f43f000 from client 0x12 (VMC)
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00305631
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: VCN0 (0x2b)
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:3 pasid:32775, for process vivaldi-bin pid 1573 thread vivaldi-bi:cs0 pid 1587)
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000080013f441000 from client 0x12 (VMC)
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00305631
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: VCN0 (0x2b)
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 30 13:19:19 dagon kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Comment 20 Jack W 2022-07-27 00:39:29 UTC
Special thanks to the dev who added support for the AMD 6000M (mobile) GPU series in 5.16, whoever you are.

Hi, I am also having this issue. Whenever the "*_L2_PROTECTION_FAULT_STATUS" errors appear (note the asterisk), the GUI also freezes.

I am currently running 5.18.14 and I am getting:

[  668.772580] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800380830000 from client 0x1b (UTCL2)
[  668.772584] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00201031
[  668.772586] amdgpu 0000:03:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[  668.772589] amdgpu 0000:03:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[  668.772590] amdgpu 0000:03:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[  668.772593] amdgpu 0000:03:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[  668.772595] amdgpu 0000:03:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[  668.772596] amdgpu 0000:03:00.0: amdgpu: 	 RW: 0x0
[  668.772601] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32770, for process Xorg pid 3051 thread Xorg:cs0 pid 3058)

___________

- Note that I am running an external monitor in addition to the laptop display.
- The monitor is connected over DisplayPort, but HDMI also exhibits the issue.
- The issue seems to occur every 200-600 seconds (sometimes even more often than that), and the excerpt above is printed a few times whenever the freezes happen.
- Without the external monitor, the issue does not occur at all.

As others have also said (maybe I saw it on other forums, Linux Mint I think?), the system ultimately ends up with a black screen after a few of these occurrences.
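To check an "every 200-600 seconds" pattern like this against a log, the fault header lines can be grouped into bursts and the gaps between bursts measured. A rough Python sketch, assuming dmesg-style `[seconds.micro]` timestamps (not `dmesg -T` or journalctl output) and treating faults less than 5 seconds apart as one burst; that threshold is an arbitrary choice, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# dmesg-style timestamp at the start of a line, e.g. "[  668.772601]".
TS = re.compile(r"^\[\s*(\d+\.\d+)\]")


def fault_times(lines: list[str]) -> list[float]:
    """Timestamps (seconds since boot) of page-fault header lines."""
    times = []
    for line in lines:
        m = TS.match(line)
        if m and "page fault" in line:
            times.append(float(m.group(1)))
    return times


def burst_intervals(times: list[float], gap: float = 5.0) -> list[float]:
    """Seconds between the starts of consecutive fault bursts.

    Faults closer together than `gap` seconds count as one burst; the
    5-second default is an arbitrary assumption, not a driver value.
    """
    starts: list[float] = []
    last = None
    for t in sorted(times):
        if last is None or t - last > gap:
            starts.append(t)
        last = t
    return [b - a for a, b in zip(starts, starts[1:])]
```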
Comment 21 Jack W 2022-07-27 09:22:24 UTC
In my case, with both the integrated and dedicated GPUs being AMD, this MAY be related to AMD's SmartShift.

I have a feeling that these freezes every 200-600 s (sometimes more often) happen when SmartShift shifts power around at the hardware level, underneath the kernel.

I cannot fully confirm it, but setting DRI_PRIME=1 as an environment variable in /etc/environment stops the crashes entirely, even while the external monitor is connected (which is really the only situation in which it crashes for me).
From what I understand, this variable makes amdgpu use only the integrated graphics card, thus bypassing any effect SmartShift may have on the graphics themselves; hence the hunch.

This is not authoritative, just a hunch based on observations. I hope it helps whoever ends up looking at this, if anybody :(
Comment 22 dev.claude.petrescu 2022-08-02 12:38:58 UTC
Bump: confirming that the error is still present on v5.19.0, which was released just a day or two ago.
Comment 24 darinp 2022-09-12 04:53:24 UTC
Here is what I did to fix it on Ubuntu 22.04:

I consistently got this every time I started the latest stable Chrome:

amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:153 vmid:4 pasid:32792, for process chrome pid 2633417 thread chrome pid 2633417)
amdgpu 0000:2b:00.0: amdgpu:   in page starting at address 0x0000800000240000 from IH client 0x1b (UTCL2)
amdgpu 0000:2b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401533
amdgpu 0000:2b:00.0: amdgpu:      Faulty UTCL2 client ID: SQC (data) (0xa)
amdgpu 0000:2b:00.0: amdgpu:      MORE_FAULTS: 0x1
amdgpu 0000:2b:00.0: amdgpu:      WALKER_ERROR: 0x1
amdgpu 0000:2b:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
amdgpu 0000:2b:00.0: amdgpu:      MAPPING_ERROR: 0x1
amdgpu 0000:2b:00.0: amdgpu:      RW: 0x0

I tried 6.0.0-rc5 which includes https://github.com/torvalds/linux/commit/b8983d42524f10ac6bf35bbce6a7cc8e45f61e04 and still got the problem.

Also tried amdgpu.noretry=0 and was still getting it. 

I then looked at the Mesa drivers and saw that some of them were still coming from ppa:oibaf/graphics-drivers, which I had already removed.
So I ran:

add-apt-repository ppa:oibaf/graphics-drivers
ppa-purge ppa:oibaf/graphics-drivers
add-apt-repository --remove ppa:oibaf/graphics-drivers
apt update

Then I rebooted, and the problem was gone.
Comment 25 bxkx 2022-10-02 16:33:41 UTC
I see similar errors. They happened when I tried to start a game.
Are they related? For example, I don't have "no-retry" in my log.

amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32776, for process steamwebhelper pid 3552 thread steamwebhe:cs0 pid 3554)
amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800111cc0000 from client 0x1b (UTCL2)
amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00241051
amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:03:00.0: amdgpu:          RW: 0x1
amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32776, for process steamwebhelper pid 3552 thread steamwebhe:cs0 pid 3554)
amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800111cc0000 from client 0x1b (UTCL2)
amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32776, for process steamwebhelper pid 3552 thread steamwebhe:cs0 pid 3554)
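On the missing "no-retry": in the logs posted in this report, the `VM_L2_PROTECTION_FAULT_STATUS` dumps carry a `retry` or `no-retry` qualifier in the header line, while the `GCVM_`/`MMVM_` dumps (comments 19, 20 and the one above) print a plain `[gfxhub] page fault` or `[mmhub] page fault`. So the missing word on its own does not seem to indicate a different problem. A hedged Python sketch that parses both header variants so they can be compared side by side; the regex is written against the lines in this report rather than against the driver source, and the names are hypothetical.

```python
from __future__ import annotations

import re

# Header variants seen in this report:
#   "[gfxhub0] retry page fault (...)", "[mmhub0] no-retry page fault (...)",
#   "[gfxhub] page fault (...)" (no qualifier).
HEADER = re.compile(
    r"\[(?P<hub>\w+)\] (?:(?P<kind>retry|no-retry) )?page fault "
    r"\(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) "
    r"pasid:(?P<pasid>\d+), for process (?P<process>.*?) pid (?P<pid>\d+)"
)


def parse_fault_header(line: str) -> dict[str, str | None] | None:
    """Parse an amdgpu page-fault header line; None if it is not one."""
    m = HEADER.search(line)
    return m.groupdict() if m else None
```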
Comment 26 dev 2022-10-07 04:13:29 UTC
I've been encountering this or a similar error for a few months now: my screen locks up and/or gets covered in colored squares. I've tried setting amdgpu.noretry=0 in my kernel parameters without any luck. This is also a fresh install (an unsuccessful attempt to resolve this error), so I know there aren't any old PPAs lying around. I'm using a desktop computer with a dedicated GPU (Vega 56) on kernel 5.19.0. I always have Firefox running when this happens, sometimes Chromium (from Flatpak) as well.

Here's the error message I've been getting:
amdgpu 0000:0b:00.0: amdgpu: IH ring buffer overflow (0x00080FA0, 0x00000000, 0x00000FC0)
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c35000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c3f000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c31000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c33000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c30000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c32000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c3b000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c39000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c3d000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
amdgpu 0000:0b:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 1914 thread Xorg:cs0 pid 1916)
amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800101c34000 from IH client 0x1b (UTCL2)
amdgpu 0000:0b:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:0b:00.0: amdgpu:          RW: 0x1
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=7356381, emitted seq=7356383
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1914 thread Xorg:cs0 pid 1916
amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
[drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
[drm] free PSP TMR buffer
CPU: 9 PID: 23890 Comm: kworker/u64:3 Tainted: G           OE     5.19.0-76051900-generic #202207312230~1663791054~22.04~28340d4
Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS (WI-FI), BIOS 1404 11/08/2019
Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Call Trace:
 <TASK>
 show_stack+0x52/0x5c
 dump_stack_lvl+0x49/0x63
 dump_stack+0x10/0x16
 amdgpu_do_asic_reset+0x2b/0x435 [amdgpu]
 amdgpu_device_gpu_recover_imp.cold+0x706/0x7d3 [amdgpu]
 amdgpu_job_timedout+0x153/0x190 [amdgpu]
 ? finish_task_switch.isra.0+0x81/0x280
 drm_sched_job_timedout+0x6d/0x110 [gpu_sched]
 process_one_work+0x21f/0x3f0
 worker_thread+0x50/0x3e0
 ? rescuer_thread+0x3a0/0x3a0
 kthread+0xee/0x120
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 </TASK>
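Dumps like this one repeat the same fault block many times, differing only in the address (the `IH ring buffer overflow` line suggests even more were dropped). A rough Python sketch that collapses such a log into one entry per (process, status value), with a count and the set of faulting page addresses. It assumes each block starts with its `page fault` header line followed by the address and status lines, as in this comment; the function name is hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

PROC = re.compile(r"page fault \(.*?for process (.*?) pid \d+")
ADDR = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")
STATUS = re.compile(r"L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")


def summarize(lines: list[str]) -> dict[tuple[str, str], dict]:
    """Collapse repeated fault dumps into per-(process, status) summaries.

    Assumes each dump starts with its 'page fault' header line, followed
    by the address and status lines, as in the logs in this report.
    """
    groups: dict[tuple[str, str], dict] = defaultdict(
        lambda: {"count": 0, "addresses": set()}
    )
    process, address = None, None
    for line in lines:
        if m := PROC.search(line):
            process, address = m.group(1), None
        elif m := ADDR.search(line):
            address = int(m.group(1), 16)
        elif (m := STATUS.search(line)) and process is not None:
            group = groups[(process, m.group(1).lower())]
            group["count"] += 1
            if address is not None:
                group["addresses"].add(address)
    return dict(groups)
```

Fed the Xorg excerpt above, every block shares status 0x00341051, so the whole flood collapses into a single ("Xorg", "0x00341051") entry.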
Comment 27 Alexey Kuznetsov 2023-04-19 12:56:23 UTC
Still seeing this on 6.1.0-7-amd64. Logs:

* https://linux-hardware.org/?probe=d669bcc680&log=dmesg.1
Comment 28 Jack W 2023-04-19 15:11:23 UTC
I'm still getting this cr*p nearly a year later.
State of Linux in 2023.
Comment 29 Timon Z. 2023-07-28 09:22:31 UTC
Same issue for me :-(

cat /proc/version

Linux version 5.19.0-50-generic (buildd@lcy02-amd64-030) (x86_64-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #50-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul 10 18:24:29 UTC 2023
Comment 30 oneofone 2023-10-19 16:31:57 UTC
Still an issue for me, although it doesn't lock up anymore, just spams dmesg.

Linux 6.6.0-rc6 / Arch.

It happens with both Mesa 23.2.1 and Mesa from git.
Comment 31 Artem S. Tashkinov 2024-03-13 09:10:33 UTC
The AMD driver stack changes rapidly and contains lots of shared code across products so it's possible that it has already been fixed. Please upgrade to a current stable kernel and userspace stack and try again. If you still experience this issue with the latest driver stack, please capture relevant logging and open a new issue referring back to this one:

https://gitlab.freedesktop.org/drm/amd/-/issues

The current stable kernel releases are 6.8 and 6.7.9.