Bug 211157 - amdgpu VM_L2_PROTECTION_FAULT_STATUS:0x00701431
Summary: amdgpu VM_L2_PROTECTION_FAULT_STATUS:0x00701431
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(Other)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-other
 
Reported: 2021-01-12 18:53 UTC by Alexey Kuznetsov
Modified: 2022-01-16 11:46 UTC
CC List: 10 users

See Also:
Kernel Version: 5.10.0-1-amd64
Tree: Mainline
Regression: No


Description Alexey Kuznetsov 2021-01-12 18:53:42 UTC
My laptop fails occasionally during gameplay (about twice a day). First it starts lagging badly, and within a second it freezes completely. I am still able to log in remotely over ssh, but the keyboard, Caps Lock, and Alt+F1 do not work.

dmesg shows the related errors:

[ 7738.264852] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x00008001bfd14000 from client 27
[ 7738.264856] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00701431
[ 7738.264860] amdgpu 0000:03:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[ 7738.264864] amdgpu 0000:03:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[ 7738.264867] amdgpu 0000:03:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7738.264871] amdgpu 0000:03:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[ 7738.264874] amdgpu 0000:03:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7738.264878] amdgpu 0000:03:00.0: amdgpu: 	 RW: 0x0
[ 7738.265691] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:7 pasid:32778, for process PathOfExile_x64 pid 14159 thread PathOfExile_x64 pid 14231)


Full log: https://linux-hardware.org/?probe=7f9336625d
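
For anyone comparing fault reports, the status word can be split back into the fields the driver prints. This is a minimal Python sketch, not the driver's code: the bit positions are an assumption based on the gfx9 register layout and were checked only against the values quoted in this bug, the client-name table covers just the three IDs seen here, and `decode_fault_status` is a hypothetical helper name.

```python
# Assumed bit layout of VM_L2_PROTECTION_FAULT_STATUS on gfx9:
# name -> (shift, width). Checked only against values seen in this bug.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}

# Only the UTCL2 client IDs that appear in the logs of this report.
CLIENTS = {0x0: "CB", 0x8: "TCP", 0xA: "SQC (data)"}


def decode_fault_status(status):
    """Split a 32-bit status word into the named bit fields."""
    out = {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }
    out["CLIENT"] = CLIENTS.get(out["CID"], "unknown")
    return out


if __name__ == "__main__":
    for value in (0x00701431, 0x006C0071, 0x00641051, 0x00401031):
        print(f"0x{value:08X}", decode_fault_status(value))
```

For 0x00701431 this gives client ID 0xa (SQC data), PERMISSION_FAULTS 0x3, RW 0x0 and VMID 7, which agrees with the dmesg lines above.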
Comment 1 Arunanshu Biswas 2021-05-15 15:56:25 UTC
Same here. This happens with 5.11 and 5.12 (AFAIK). Strange boxes appear on screen, the system freezes, etc., at random intervals.
Comment 2 Sylvia 2021-06-06 18:59:04 UTC
It started to happen after 5.12.4 or so...

The bug has also been "backported" to 5.10.x.

It happens often, always with Firefox, usually when opening graphics-heavy pages under CPU load.

Sometimes it recovers; sometimes it freezes the system completely.


Hardware: Lenovo 330S 15ARR 
AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx AuthenticAMD GNU/Linux
AMD Radeon(TM) Vega 8 Graphics

Jun 06 21:40:02 [kernel] [11217.699553] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800119c80000 from client 27
Jun 06 21:40:02 [kernel] [11217.699555] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x006C0071
Jun 06 21:40:02 [kernel] [11217.699557] amdgpu 0000:04:00.0: amdgpu: _ Faulty UTCL2 client ID: CB (0x0)
Jun 06 21:40:02 [kernel] [11217.699559] amdgpu 0000:04:00.0: amdgpu: _ MORE_FAULTS: 0x1
Jun 06 21:40:02 [kernel] [11217.699560] amdgpu 0000:04:00.0: amdgpu: _ WALKER_ERROR: 0x0
Jun 06 21:40:02 [kernel] [11217.699562] amdgpu 0000:04:00.0: amdgpu: _ PERMISSION_FAULTS: 0x7
Jun 06 21:40:02 [kernel] [11217.699564] amdgpu 0000:04:00.0: amdgpu: _ MAPPING_ERROR: 0x0
Jun 06 21:40:02 [kernel] [11217.699566] amdgpu 0000:04:00.0: amdgpu: _ RW: 0x1
Jun 06 21:40:07 [kernel] [11222.696469] gmc_v9_0_process_interrupt: 5900 callbacks suppressed
Jun 06 21:40:07 [kernel] [11222.696476] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32777, for process GPU Process pid 3227 thread firefox:cs0 pid 3288)



Jun 06 21:40:07 [kernel] [11222.699857] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800119c80000 from client 27
Jun 06 21:40:07 [kernel] [11222.699858] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x006C0071
Jun 06 21:40:07 [kernel] [11222.699860] amdgpu 0000:04:00.0: amdgpu: _ Faulty UTCL2 client ID: CB (0x0)
Jun 06 21:40:07 [kernel] [11222.699862] amdgpu 0000:04:00.0: amdgpu: _ MORE_FAULTS: 0x1
Jun 06 21:40:07 [kernel] [11222.699863] amdgpu 0000:04:00.0: amdgpu: _ WALKER_ERROR: 0x0
Jun 06 21:40:07 [kernel] [11222.699865] amdgpu 0000:04:00.0: amdgpu: _ PERMISSION_FAULTS: 0x7
Jun 06 21:40:07 [kernel] [11222.699867] amdgpu 0000:04:00.0: amdgpu: _ MAPPING_ERROR: 0x0
Jun 06 21:40:07 [kernel] [11222.699868] amdgpu 0000:04:00.0: amdgpu: _ RW: 0x1
Jun 06 21:40:07 [kernel] [11222.700717] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32777, for process GPU Process pid 3227 thread firefox:cs0 pid 3288)
Comment 3 Arunanshu Biswas 2021-06-06 21:28:39 UTC
I think I found a solution. This applies to TLP users specifically, but non-TLP users can try that too.

In `/etc/tlp.conf` file, set this to:

RUNTIME_PM_DRIVER_BLACKLIST=""
# originally, it looks like this:
# RUNTIME_PM_DRIVER_BLACKLIST="amdgpu mei_me nouveau pcieport radeon"
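
To confirm which value is actually in effect in the config text, something like the following Python sketch could be used. It is only an illustration: `runtime_pm_blacklist` is a made-up helper, it looks at a single file's text and takes the last uncommented assignment, and it does not account for any other place TLP may read settings from.

```python
import re

# Matches an uncommented assignment such as RUNTIME_PM_DRIVER_BLACKLIST="a b".
# Lines starting with '#' do not match because of the leading anchor.
_ASSIGN = re.compile(r'^\s*RUNTIME_PM_DRIVER_BLACKLIST\s*=\s*"?([^"#]*)"?')


def runtime_pm_blacklist(conf_text):
    """Return the driver list from the last uncommented assignment.

    Returns None when the setting is not assigned anywhere in the text.
    """
    result = None
    for line in conf_text.splitlines():
        match = _ASSIGN.match(line)
        if match:
            result = match.group(1).split()
    return result


if __name__ == "__main__":
    sample = (
        'RUNTIME_PM_DRIVER_BLACKLIST=""\n'
        '# RUNTIME_PM_DRIVER_BLACKLIST="amdgpu mei_me nouveau pcieport radeon"\n'
    )
    print(runtime_pm_blacklist(sample))  # prints []
```

An empty list means the setting is present but excludes no drivers; `None` means the line is missing or commented out.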
Comment 4 Arunanshu Biswas 2021-06-06 21:29:59 UTC
(In reply to Arunanshu Biswas from comment #3)
> I think I found a solution. This applies to TLP users specifically, but
> non-TLP users can try that too.
> 
> In `/etc/tlp.conf` file, set this to:
> 
> RUNTIME_PM_DRIVER_BLACKLIST=""
> # originally, it looks like this:
> # RUNTIME_PM_DRIVER_BLACKLIST="amdgpu mei_me nouveau pcieport radeon"

Or you can simply blacklist it in modprobe.
Comment 5 Sylvia 2021-06-21 16:24:20 UTC
I have removed ivrs_ioapic[32]=00:14.0 from the boot parameters, and the problem seems to have vanished (at least for now; I am still keeping an eye out for crashes). This parameter was required to boot my Lenovo laptop with earlier kernels.

00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
Comment 6 Dmitry Myachin 2021-06-23 14:53:56 UTC
I have the same problem.

Everything works fine for some time, and then the screen goes black. Music keeps playing, but the keyboard's multimedia key no longer stops it. Alt+F* does not work either.


Kernel: 5.12.12-zen1-1-zen
CPU: AMD Ryzen 5 3400G (8) @ 3.700GHz
GPU: AMD ATI 07:00.0 Picasso


Arch Linux. It happens with both the -zen and the non-zen kernel.
Comment 7 Sylvia 2021-06-25 11:05:40 UTC
(In reply to Sylvia from comment #5)
> I have removed ivrs_ioapic[32]=00:14.0 from boot params, and it seems to be
> that problem vanished (or just for now.. still keeping an eye for crashes),



It still happens:

[  430.943898] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32775, for process GPU Process pid 2960 thread firefox:cs0 pid 3006)
[  430.943905] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800112200000 from client 27
[  430.943908] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
[  430.943912] amdgpu 0000:04:00.0: amdgpu:      Faulty UTCL2 client ID: TCP (0x8)
[  430.943914] amdgpu 0000:04:00.0: amdgpu:      MORE_FAULTS: 0x1
[  430.943916] amdgpu 0000:04:00.0: amdgpu:      WALKER_ERROR: 0x0
[  430.943918] amdgpu 0000:04:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[  430.943921] amdgpu 0000:04:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  430.943923] amdgpu 0000:04:00.0: amdgpu:      RW: 0x1

[  435.948029] gmc_v9_0_process_interrupt: 47127 callbacks suppressed
[  440.952210] gmc_v9_0_process_interrupt: 47115 callbacks suppressed
[  445.955904] gmc_v9_0_process_interrupt: 615433 callbacks suppressed
[  455.963501] gmc_v9_0_process_interrupt: 623975 callbacks suppressed
[  460.967605] gmc_v9_0_process_interrupt: 622849 callbacks suppressed
[  465.971907] gmc_v9_0_process_interrupt: 624947 callbacks suppressed
[  470.976127] gmc_v9_0_process_interrupt: 625531 callbacks suppressed
[  480.104842] gmc_v9_0_process_interrupt: 107333 callbacks suppressed
[  485.107666] gmc_v9_0_process_interrupt: 624573 callbacks suppressed
[  490.111856] gmc_v9_0_process_interrupt: 624584 callbacks suppressed
[  490.266839] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[  495.115724] gmc_v9_0_process_interrupt: 626231 callbacks suppressed
[  500.119513] gmc_v9_0_process_interrupt: 626410 callbacks suppressed

[  500.506030] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[  505.123885] gmc_v9_0_process_interrupt: 124455 callbacks suppressed

[  510.128389] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32775, for process GPU Process pid 2960 thread firefox:cs0 pid 3006)
[  510.128391] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x000080011220f000 from client 27
[  510.128393] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
[  510.128395] amdgpu 0000:04:00.0: amdgpu:      Faulty UTCL2 client ID: TCP (0x8)
[  510.128396] amdgpu 0000:04:00.0: amdgpu:      MORE_FAULTS: 0x1
[  510.128397] amdgpu 0000:04:00.0: amdgpu:      WALKER_ERROR: 0x0
[  510.128399] amdgpu 0000:04:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[  510.128400] amdgpu 0000:04:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  510.128402] amdgpu 0000:04:00.0: amdgpu:      RW: 0x1
[  510.743952] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
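
The "callbacks suppressed" lines come from the kernel's printk rate limiter and give the number of messages it dropped since its previous report, so they hint at how fast the faults are arriving. A rough Python sketch for totalling them from saved dmesg output follows. The function names are made up for illustration, and the regular expression assumes the bracketed-timestamp format shown above (journal-style prefixes would need a different pattern).

```python
import re

# Matches e.g. "[  435.948029] gmc_v9_0_process_interrupt: 47127 callbacks suppressed"
_SUPPRESSED = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+gmc_v9_0_process_interrupt: (\d+) callbacks suppressed"
)


def suppressed_faults(dmesg_text):
    """Return (timestamp_seconds, suppressed_count) pairs in log order."""
    return [(float(ts), int(count)) for ts, count in _SUPPRESSED.findall(dmesg_text)]


def total_suppressed(dmesg_text):
    """Sum the suppressed-message counts over the whole log text."""
    return sum(count for _, count in suppressed_faults(dmesg_text))


if __name__ == "__main__":
    sample = (
        "[  435.948029] gmc_v9_0_process_interrupt: 47127 callbacks suppressed\n"
        "[  440.952210] gmc_v9_0_process_interrupt: 47115 callbacks suppressed\n"
    )
    print(suppressed_faults(sample))
    print(total_suppressed(sample))  # prints 94242
```

The total is a lower bound on the number of fault messages, since the lines that were actually printed are not included in the suppressed counts.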
Comment 8 Vlastimil Holer 2021-07-06 01:47:06 UTC
Still an issue with 5.13.0
(AMD Ryzen 5 PRO 3500U, Fedora 34, Mesa 21.1.4)

Jul 06 03:34:26 kernel: gmc_v9_0_process_interrupt: 116 callbacks suppressed
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a2000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a0000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a3000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a1000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a4000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a5000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a6000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a8000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110a7000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 1574 thread Xorg:cs0 pid 1627)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address 0x00008001110aa000 from IH client 0x1b (UTCL2)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 06 03:34:26 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1
Jul 06 03:34:36 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=413630, emitted seq=413632
Jul 06 03:34:36 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1574 thread Xorg:cs0 pid 1627
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fde0 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe00 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe20 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe40 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe60 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11041fe80 flags=0x0070]
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: amd_iommu_report_page_fault: 21 callbacks suppressed
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041fea0 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041fec0 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041fee0 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041ff00 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041ff20 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041ff40 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:36 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x11041ff60 flags=0x0070]
Jul 06 03:34:36 kernel: [drm] free PSP TMR buffer
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
Jul 06 03:34:36 kernel: [drm] PCIE GART of 1024M enabled.
Jul 06 03:34:36 kernel: [drm] PTB located at 0x000000F401FA4000
Jul 06 03:34:36 kernel: [drm] PSP is resuming...
Jul 06 03:34:36 kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jul 06 03:34:36 kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jul 06 03:34:37 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jul 06 03:34:37 kernel: [drm] VCN decode and encode initialized successfully(under SPG Mode).
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow start
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow done
Jul 06 03:34:37 kernel: [drm] Skip scheduling IBs!
Jul 06 03:34:37 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(2) succeeded!
Jul 06 03:34:47 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=413639, emitted seq=413641
Jul 06 03:34:47 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1574 thread Xorg:cs0 pid 1627
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Jul 06 03:34:47 kernel: amd_iommu_report_page_fault: 36 callbacks suppressed
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1104075a0 flags=0x0070]
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110440000 flags=0x0070]
Jul 06 03:34:47 kernel: [drm] free PSP TMR buffer
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
Jul 06 03:34:47 kernel: [drm] PCIE GART of 1024M enabled.
Jul 06 03:34:47 kernel: [drm] PTB located at 0x000000F401FA4000
Jul 06 03:34:47 kernel: [drm] PSP is resuming...
Jul 06 03:34:47 kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jul 06 03:34:47 kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jul 06 03:34:47 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jul 06 03:34:48 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
Jul 06 03:34:48 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
Jul 06 03:34:48 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(4) failed
Jul 06 03:34:48 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset end with ret = -110
Jul 06 03:34:58 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 06 03:35:08 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Comment 9 Mthw 2021-07-27 13:29:55 UTC
It happened today on an AMD 3550H with Arch Linux, linux 5.13.5, and linux-firmware 20210315.3568f96. The system froze for a few seconds and then recovered, though I had to restart Firefox. Log:

júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e35000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00401031
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e36000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e32000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e26000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e33000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e0b000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e03000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e0a000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e27000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32778, for process firefox pid 23395 thread firefox:cs0 pid 23464)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000800120e02000 from IH client 0x1b (UTCL2)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
júl 27 15:04:40 tuf-red kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Comment 10 Mthw 2021-07-27 13:33:47 UTC
Since this issue is present for multiple users, can it be considered CONFIRMED? Also, does anyone know of a linux and linux-firmware version that is proven to be working correctly? I personally didn't have any issues yet on 5.10.17 + 20210315.3568f96.
Comment 11 Kunal Bhat 2021-07-28 08:49:23 UTC
(In reply to Mthw from comment #10)
> Since this issue is present for multiple users can it be considered
> CONFIRMED? Also, does anyone know of a linux and linux-firmware version that
> is proven to be working correctly? I personally didn't have any issues yet on
> 5.10.17 + 20210315.3568f96.

It seems to work fine with any linux-firmware version older than 20210517.
Comment 12 Arunanshu Biswas 2021-08-13 06:05:58 UTC
Using amdgpu.noretry=0 is the best workaround right now. The previous answers (by me) failed at one point or another.
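
To confirm that the parameter was actually passed at boot, the kernel command line can be checked. This is a minimal sketch, assuming the option was given as `amdgpu.noretry=<n>` on the command line; the helper name `find_noretry` is hypothetical, and a value set through modprobe.d would not show up this way.

```python
import re
from typing import Optional


def find_noretry(cmdline: str) -> Optional[int]:
    """Return the last amdgpu.noretry value found on a kernel command line.

    Returns None when the option is not present. The last occurrence is
    used because later options normally override earlier ones.
    """
    matches = re.findall(r"(?:^|\s)amdgpu\.noretry=(-?\d+)(?=\s|$)", cmdline)
    return int(matches[-1]) if matches else None


# Illustrative command line (hypothetical values, not taken from a real system).
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 quiet amdgpu.noretry=0"
print(find_noretry(example))
```

On a running system the string to pass in is the content of `/proc/cmdline`.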
Comment 13 Sylvia 2021-08-27 09:54:32 UTC
Updating the BIOS to the latest version helped for me.

This one works:
LENOVO 81FB/LNVNB161216, BIOS 7WCN38WW 11/04/2019

This one was faulty:
LENOVO 81FB/LNVNB161216, BIOS 7WCN36WW 05/10/2019

Maybe the older BIOS is not fully compatible with recent AMD microcode updates?
Comment 14 DocMAX 2021-11-02 23:47:32 UTC
For me, this error started with kernel 5.14; there were no issues on 5.13.13.
AMD APU: 4800U.
Comment 15 DocMAX 2021-11-02 23:52:33 UTC
Here is a snippet:

Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000002000 from IH client 0x12 (VMC)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000003000 from IH client 0x12 (VMC)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000031
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000004000 from IH client 0x12 (VMC)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000005000 from IH client 0x12 (VMC)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Nov 02 21:54:07 zeus kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Comment 16 datenwolf 2021-12-24 19:49:12 UTC
I have the same issue. System information (inxi)

```
    System:    Host: kraeh.datenwolf.net Kernel: 5.15.11_1 x86_64 bits: 64 Desktop: awesome 4.3
               Distro: void
    Machine:   Type: Laptop System: LENOVO product: 20UES00L00 v: ThinkPad T14 Gen 1 serial: PF2CQSMV
               Mobo: LENOVO model: 20UES00L00 serial: L1HF06R00VB UEFI: LENOVO v: R1BET61W(1.30 )
               date: 12/21/2020
    CPU:       Info: 8-Core model: AMD Ryzen 7 PRO 4750U with Radeon Graphics bits: 64 type: MT MCP cache:
               L2: 4 MiB
               Speed: 1396 MHz min/max: 1400/1700 MHz Core speeds (MHz): 1: 1396 2: 1396 3: 1397 4: 1397
               5: 1397 6: 1397 7: 1418 8: 1397 9: 1398 10: 1397 11: 1397 12: 1397 13: 1387 14: 1396
               15: 1415 16: 1397
    Graphics:  Device-1: AMD Renoir driver: amdgpu v: kernel
               Device-2: Chicony Integrated Camera type: USB driver: uvcvideo
               Display: server: X.Org 1.20.14 driver: loaded: amdgpu,ati unloaded: fbdev,modesetting,vesa
               resolution: 1920x1080~60Hz
               OpenGL: renderer: AMD RENOIR (DRM 3.42.0 5.15.11_1 LLVM 12.0.1) v: 4.6 Mesa 21.3.2
    Audio:     Device-1: AMD driver: snd_hda_intel
               Device-2: AMD Raven/Raven2/FireFlight/Renoir Audio Processor driver: snd_rn_pci_acp3x
               Device-3: AMD Family 17h HD Audio driver: snd_hda_intel
               Sound Server-1: ALSA v: k5.15.11_1 running: yes
               Sound Server-2: PipeWire v: 0.3.42 running: yes
```

I reliably run into this issue when launching Elite:Dangerous through Steam/Proton. The crash happens during the planet generation shader profiling phase. The kernel log shows this:

```
    [  732.515287] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:16 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
    [  732.515299] amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x0000000000010000 from IH client 0x12 (VMC)
    [  732.515306] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000021
    [  732.515309] amdgpu 0000:07:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
    [  732.515312] amdgpu 0000:07:00.0: amdgpu: 	 MORE_FAULTS: 0x1
    [  732.515314] amdgpu 0000:07:00.0: amdgpu: 	 WALKER_ERROR: 0x0
    [  732.515316] amdgpu 0000:07:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x2
    [  732.515318] amdgpu 0000:07:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
    [  732.515320] amdgpu 0000:07:00.0: amdgpu: 	 RW: 0x0
    [  732.515322] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:24 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
    [  732.515327] amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x0000000000011000 from IH client 0x12 (VMC)
    [  732.515331] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
    [  732.515333] amdgpu 0000:07:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
    [  732.515335] amdgpu 0000:07:00.0: amdgpu: 	 MORE_FAULTS: 0x0
    [  732.515337] amdgpu 0000:07:00.0: amdgpu: 	 WALKER_ERROR: 0x0
    [  732.515339] amdgpu 0000:07:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
    [  732.515341] amdgpu 0000:07:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
    [  732.515343] amdgpu 0000:07:00.0: amdgpu: 	 RW: 0x0
    
    ...
    
    [  742.526075] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=12215, emitted seq=12218
    [  742.526228] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
    [  742.526361] amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
    [  742.630706] [drm] free PSP TMR buffer
    [  742.658826] amdgpu 0000:07:00.0: amdgpu: MODE2 reset
    [  742.658903] amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
    [  742.659047] [drm] PCIE GART of 1024M enabled.
    [  742.659066] [drm] PTB located at 0x000000F400900000
    [  742.659080] [drm] VRAM is lost due to GPU reset!
    [  742.659470] [drm] PSP is resuming...
    [  742.679525] [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
    [  742.762836] amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
    [  742.771617] amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
    [  742.771622] amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
    [  742.771627] amdgpu 0000:07:00.0: amdgpu: SMU is resuming...
    [  742.772490] amdgpu 0000:07:00.0: amdgpu: SMU is resumed successfully!
    [  743.018374] [drm] kiq ring mec 2 pipe 1 q 0
    [  743.221239] amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
    [  743.221401] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
    [  743.221590] amdgpu 0000:07:00.0: amdgpu: GPU reset(1) failed
    [  743.221704] amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
    [  747.518075] gmc_v9_0_process_interrupt: 708172 callbacks suppressed
    [  747.518084] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:136 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
    [  747.518093] amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x0000000000014000 from IH client 0x12 (VMC)
    [  747.518100] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000111
    [  747.518102] amdgpu 0000:07:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
    [  747.518105] amdgpu 0000:07:00.0: amdgpu: 	 MORE_FAULTS: 0x1
    [  747.518106] amdgpu 0000:07:00.0: amdgpu: 	 WALKER_ERROR: 0x0
    [  747.518108] amdgpu 0000:07:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x1
    [  747.518110] amdgpu 0000:07:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
    [  747.518112] amdgpu 0000:07:00.0: amdgpu: 	 RW: 0x0
    [  747.518114] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:16 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
    [  747.518118] amdgpu 0000:07:00.0: amdgpu:   in page starting at address 0x0000000000010000 from IH client 0x12 (VMC)
    [  747.518122] amdgpu 0000:07:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000111
    [  747.518124] amdgpu 0000:07:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
    [  747.518126] amdgpu 0000:07:00.0: amdgpu: 	 MORE_FAULTS: 0x1
    [  747.518128] amdgpu 0000:07:00.0: amdgpu: 	 WALKER_ERROR: 0x0
    [  747.518129] amdgpu 0000:07:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x1
    [  747.518131] amdgpu 0000:07:00.0: amdgpu: 	 MAPPING_ERROR: 0x1
    [  747.518133] amdgpu 0000:07:00.0: amdgpu: 	 RW: 0x0
```
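
With hundreds of thousands of suppressed callbacks, it can help to summarize the log instead of reading it line by line. The following is a rough sketch that counts page faults per hub, fault type, and process. The regular expression is written against the line format seen in the logs in this thread and may need adjusting for other kernel versions; the function name `summarize_faults` is hypothetical.

```python
import re
from collections import Counter

# Matches the "[hub] retry/no-retry page fault (...)" header line as it
# appears in the logs quoted in this thread (assumed format).
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] (?P<kind>no-retry|retry) page fault "
    r"\(src_id:\d+ ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) pasid:\d+, "
    r"for process (?P<process>.*?) pid (?P<pid>\d+) thread"
)


def summarize_faults(lines):
    """Count fault header lines by (hub, kind, process name)."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            # The process name is empty when the kernel could not attribute the fault.
            counts[(m["hub"], m["kind"], m["process"] or "<none>")] += 1
    return counts


sample = [
    "[  732.515287] amdgpu 0000:07:00.0: amdgpu: [mmhub0] no-retry page fault "
    "(src_id:0 ring:16 vmid:0 pasid:0, for process  pid 0 thread  pid 0)",
    "[  732.515299] amdgpu 0000:07:00.0: amdgpu:   in page starting at address "
    "0x0000000000010000 from IH client 0x12 (VMC)",
]
for key, n in summarize_faults(sample).items():
    print(key, n)
```

Feeding it the output of `dmesg` or `journalctl -k` line by line gives a quick view of whether the faults come from one process or from the kernel itself (empty process name, vmid 0).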
Comment 17 Antonio Chirizzi 2022-01-09 17:24:45 UTC
[Sun Jan  9 04:08:13 2022] Linux version 5.15.12-200.fc35.x86_64
[Sun Jan  9 04:08:13 2022] smpboot: CPU0: AMD Ryzen 7 5700G with Radeon Graphics (family: 0x19, model: 0x50, stepping: 0x0)
...
[Sun Jan  9 04:08:13 2022] DMI: To Be Filled By O.E.M. To Be Filled By O.E.M./X300M-STX, BIOS P1.70 07/01/2021
...
[Sun Jan  9 16:43:53 2022] [drm] kiq ring mec 2 pipe 1 q 0
[Sun Jan  9 16:43:53 2022] [drm] DMUB hardware initialized: version=0x0101001C
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:7 pasid:32769, for process Xorg pid 2217 thread Xorg:cs0 pid 2218)
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x007C0071
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Sun Jan  9 16:43:53 2022] amdgpu 0000:05:00.0: amdgpu:          RW: 0x1
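
For anyone comparing status values across reports, the fields printed under VM_L2_PROTECTION_FAULT_STATUS can be recomputed from the raw register value. This is a sketch only: the bit positions below are an assumption based on the GMC v9 register layout, cross-checked against the decoded fields the kernel printed in this thread (for example 0x007C0071 and 0x00000111). Other hardware generations may use a different layout.

```python
# (shift, width) per field. ASSUMED layout for GMC v9 class hardware,
# verified only against the log lines quoted in this bug report.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # the "Faulty UTCL2 client ID" shown in the log
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict:
    """Split a raw VM_L2_PROTECTION_FAULT_STATUS value into named fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


for raw in (0x007C0071, 0x00000111):
    decoded = decode_fault_status(raw)
    print(f"0x{raw:08X}:", ", ".join(f"{k}=0x{v:x}" for k, v in decoded.items()))
```

For 0x007C0071 this yields PERMISSION_FAULTS 0x7 and RW 0x1, matching what the kernel printed above.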
Comment 18 DocMAX 2022-01-16 11:46:34 UTC
No more crashes on kernel 5.16.
