Whenever X is running I get persistent page faults like this:

Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:169 vmid:0 pasid:0, for process pid 0 thread pid 0)
Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: in page starting at address 0x00000000fffb0000 from client 18
Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00041F52
Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: Faulty UTCL2 client ID: 0xf
Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: MORE_FAULTS: 0x0
Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: WALKER_ERROR: 0x1
Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: PERMISSION_FAULTS: 0x5
Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: MAPPING_ERROR: 0x1
Jul 21 15:19:16 jay-X470-AORUS-ULTRA-GAMING kernel: amdgpu 0000:0c:00.0: amdgpu: RW: 0x1

Sometimes I get several of these per second. Sometimes there are none for a few minutes. If I boot into runlevel 3 (i.e. without starting X) I get one of these during boot, but then there are no more after that.

I'm running Ubuntu 20.04 but I also saw this on 18.04. Kernel version is 5.4.0-42-generic but I also saw this with 5.3.0-51-generic. I'm using the amdgpu-pro drivers.

Graphics card is a Navi 10. Motherboard is a Gigabyte X470 AORUS ULTRA GAMING. CPU is an AMD Ryzen 9 3900X.

A very similar sounding bug was reported here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1888116
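For anyone comparing fault reports, the individual fields in the log above appear to be plain bit slices of the GCVM_L2_PROTECTION_FAULT_STATUS value. The sketch below is a rough decoder, not the driver's code: the bit positions in FIELDS are an assumption that was only checked against the single value 0x00041F52 logged above, and the names FIELDS and decode_fault_status are made up for illustration.

```python
# Assumed bit layout: (field name, lowest bit, width in bits).
# Only verified against the one status value quoted in this report.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),  # logged as "Faulty UTCL2 client ID"
    ("RW", 18, 1),
]


def decode_fault_status(status: int) -> dict:
    """Slice a raw fault status word into its named fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


decoded = decode_fault_status(0x00041F52)
for name, value in decoded.items():
    print(f"{name}: {value:#x}")
```

With this layout the decoded values agree with the ones the kernel printed (client ID 0xf, WALKER_ERROR 0x1, PERMISSION_FAULTS 0x5, MAPPING_ERROR 0x1, RW 0x1), so it may be handy for checking whether other reports show the same kind of fault.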
This is most likely a userspace issue (e.g., mesa). The kernel driver is just the messenger.
Wouldn't there normally be a useful pid in the first line if it came from userspace?
Hi Alex, I asked Jay to report this because (1) the fact that there's a fault during boot is suspicious and points in the direction of this being the kernel's fault and (2) the fact that it's an *mmhub* fault is even more suspicious. Certainly this seems to happen without Mesa video encode/decode activity, so it can't really be Mesa's (or any graphics driver's) fault. Someone suggested that audio support also goes through mmhub and that it may be related. I have no idea if that's true.
Please attach your full dmesg output and xorg log (if using X).
Created attachment 290439: output of journalctl -b-5 -k
FWIW, this still (or again) happens on 5.9.0-RC7 with a Navi 10. It did not happen on 5.8.6, though that kernel was built with a slightly different .config. Attaching a full dmesg. Note that the page faults start very shortly after the snd_hda_intel initialization, which activates amdgpu.
Created attachment 292697: dmesg on 5.9.0-RC7