Bug 214963 - [amdgpu] resuming from suspend fails when IOMMU is missing
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-07 22:48 UTC by spasswolf
Modified: 2021-12-28 11:42 UTC

See Also:
Kernel Version: 5.15.0, 5.15.1, 5.15.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description spasswolf 2021-11-07 22:48:51 UTC
On the HP bw064-ng laptop the BIOS does not correctly initialize the IOMMU. This breaks resuming from suspend:
Nov  7 23:23:02 bart kernel: [   96.378265] kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874
Nov  7 23:23:02 bart kernel: [   96.378271] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_resume failed (-6).
Nov  7 23:23:02 bart kernel: [   96.378274] PM: dpm_run_callback(): pci_pm_resume+0x0/0x110 returns -6
Nov  7 23:23:02 bart kernel: [   96.378283] amdgpu 0000:00:01.0: PM: failed to resume async: error -6

This leads to the following follow-up errors:
Nov 7 23:23:03 bart kernel: [   97.580799] [drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
...
Nov  7 23:23:04 bart kernel: [   98.646241] [drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
...
Nov  7 23:23:05 bart kernel: [   99.709472] [drm] perform_link_training_with_retries: Link training attempt 3 of 4 failed
Nov  7 23:23:06 bart kernel: [  100.772840] [drm] enabling link 0 failed: 15
Nov  7 23:23:22 bart kernel: [  106.160854] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Nov  7 23:23:22 bart kernel: [  116.397484] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:47:crtc-0] flip_done timed out

Simply changing the amdgpu_device_ip_resume function in drivers/gpu/drm/amd/amdgpu/amdgpu_device.c to
static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
{
	int r;

#if 0 
	r = amdgpu_amdkfd_resume_iommu(adev);
	if (r)
		return r;
#endif

	r = amdgpu_device_ip_resume_phase1(adev);
	if (r)
		return r;

	r = amdgpu_device_fw_loading(adev);
	if (r)
		return r;

	r = amdgpu_device_ip_resume_phase2(adev);

	return r;
}
makes resuming from suspend work for me again. Of course this is not a viable solution, because systems with a working IOMMU need the amdgpu_amdkfd_resume_iommu call.
One could probably add a flag so that amdgpu_amdkfd_resume_iommu returns 0 when the IOMMU is missing.
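
A rough userspace sketch of that flag idea follows. It is not the actual driver code or the fix that was merged: struct fake_adev, its fields iommu_present and hw_resume_result, and all fake_* functions are hypothetical stand-ins that only model the control flow. The simulated hardware path returns whatever error it is told to (for example -ENXIO, which is the -6 seen in the log above), and the wrapper skips it when the flag says no IOMMU was set up.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_device. The real struct has no
 * fields with these names; they exist only for this sketch. */
struct fake_adev {
	bool iommu_present;    /* assumed flag, recorded once at init time */
	int  hw_resume_result; /* simulated result of the hardware resume path */
};

/* Simulated hardware IOMMU resume: returns the configured result,
 * e.g. -ENXIO when the BIOS left the IOMMU uninitialized. */
static int fake_hw_resume_iommu(const struct fake_adev *adev)
{
	return adev->hw_resume_result;
}

/* Wrapper modelling the proposal: report success without touching the
 * hardware when no IOMMU was set up, otherwise resume it as before. */
static int fake_resume_iommu(const struct fake_adev *adev)
{
	if (!adev->iommu_present)
		return 0;
	return fake_hw_resume_iommu(adev);
}

/* Mirrors the shape of the resume sequence above: the IOMMU step comes
 * first and any error from it aborts the rest of the resume. */
static int fake_device_ip_resume(const struct fake_adev *adev)
{
	int r;

	r = fake_resume_iommu(adev);
	if (r)
		return r;

	/* phase1, firmware loading and phase2 would follow here */
	return 0;
}
```

With this shape, a machine without a usable IOMMU gets past the first step, while a real resume failure on a machine with a working IOMMU is still propagated to the caller.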
