Created attachment 304361 [details]
softlockup

Updating linux-firmware to the latest git version causes my PC to lock up during boot. I have a 3900X paired with a 7900 XTX, running Arch Linux with the 6.3.4 xanmod kernel (this also happens with the kernel from the core repo) and mesa 23.1.1, if that matters.

During boot I see the following error printed and the system locks up completely; only a hard reset helps:

`May 31 07:20:40 valhalla kernel: watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [swapper/5:0]`

It is accompanied by a lot of amdgpu errors in the journal (each followed by a stack trace):

```
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:9 pasid:32768, for process pid 0 thread pid 0)
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: in page starting at address 0x0000ffff0021a000 from client 10
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00900831
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Faulty UTCL2 client ID: CPF (0x4)
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: MORE_FAULTS: 0x1
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: WALKER_ERROR: 0x0
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: MAPPING_ERROR: 0x0
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: RW: 0x0
```

The full journal log is in "softlockup".

The problems begin with [this commit, ffe1a41e](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=ffe1a41e2ddbc39109b12d95dcac282d90eba8fc), though not with the soft lockup described above: instead, after the initramfs loads, I get the BIOS splash screen back and the boot is stuck there.
There are different amdgpu errors (followed by a stack trace) during this:

```
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to enable requested dpm features!
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to setup smc hw!
May 31 09:18:37 valhalla kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <smu> failed -62
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu: finishing device.
```

The logs for this case are in "amdgpu_error". Note that by the end of that log the system appears to be running, but since I only saw the BIOS splash screen, I rebooted via SysRq (REISUB).

The commit after ffe1a41 ([56832557](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=568325574a3b6148f3296984aa24fcd1fb4b912c), or possibly the one after that, [39d6fcc](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=39d6fcc73100ae4aeeec0194bbf102c672673edd); I'm not sure at the moment) gets past the splash screen, but that is where the soft lockup starts to happen.
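
To tell the two failure modes above apart in a saved journal, something like the following might help. It is only a rough sketch: the message fragments are copied from the excerpts in this report, while the label names and the `classify_journal` helper are made up for illustration and are not part of any amdgpu tooling.

```python
import re

# Message fragments taken from the journal excerpts in this report.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "soft_lockup": re.compile(r"watchdog: BUG: soft lockup"),
    "gfxhub_page_fault": re.compile(r"amdgpu: \[gfxhub\] page fault"),
    "smu_busy": re.compile(r"SMU: I'm not done with your previous command"),
    "smu_hw_init_failed": re.compile(r"hw_init of IP block <smu> failed"),
}


def classify_journal(lines):
    """Return the set of signature labels that occur in the given journal lines."""
    found = set()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                found.add(label)
    return found
```

The lines could come from the kernel log of the previous boot (for example the output of `journalctl -k -b -1` saved to a file). A result containing `smu_busy` or `smu_hw_init_failed` would correspond to the "stuck at splash screen" case, and `soft_lockup` or `gfxhub_page_fault` to the lockup case.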
Created attachment 304362 [details] amdgpu_error
Does this kernel change fix the issues? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ee33d905f89c18d4b33da6e5eefdae6060502df
(In reply to Alex Deucher from comment #2)
> Does this kernel change fix the issues?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ee33d905f89c18d4b33da6e5eefdae6060502df

Well, it turns out Arch updated the kernel to 6.3.5 yesterday evening, and xanmod followed this morning, which I hadn't noticed earlier (I first encountered the issue about 2 days ago). That release already includes this patch, and it's indeed working now. I guess this can be closed. Thank you!
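
For anyone hitting the same symptoms, a quick way to check whether the running kernel is at least 6.3.5 could look like the sketch below. It assumes a plain numeric comparison of the release string is good enough; it does not account for distribution backports or for other stable branches, and `parse_release` and `has_fix` are hypothetical helper names.

```python
import platform
import re

# Per this thread, 6.3.5 already includes the kernel patch linked above.
FIXED_IN = (6, 3, 5)


def parse_release(release):
    """Extract (major, minor, patch) from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. "6.4-rc1") is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def has_fix(release):
    """True if the release is numerically at or above FIXED_IN (assumption: no backports)."""
    return parse_release(release) >= FIXED_IN


if __name__ == "__main__":
    # platform.release() reports the running kernel's release string on Linux.
    print(platform.release(), has_fix(platform.release()))
```

A release string that starts with `6.3.4` would come out as not fixed, and one that starts with `6.3.5` as fixed, which matches what was observed here.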