Bug 185681

Summary: amdgpu: powerplay initialization failed
Product: Drivers
Reporter: René Linder (rene.linder)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: high
CC: alexdeucher, nayan26deshmukh
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.9.0-rc3-1.gb005706-vanilla
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output
Kernel 4.8.4 dmesg Output
patch 1/3
patch 2/3
patch 3/3
The working dmesg log from kernel 4.9-rc4
patch 3/3 (updated)
patch 4/3
Tested with latest patches on 4.9-rc4

Description René Linder 2016-10-31 14:48:53 UTC
Created attachment 243351
dmesg output

Since rc2/rc3, initialization of the Topaz GPU fails at startup; kernel 4.8.4 works fine.

I don't currently have an rc1 build on my notebook to test with.

dmesg output attached.
Comment 1 Nayan Deshmukh 2016-10-31 14:59:31 UTC
This is a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=98357

Is your Topaz working with/after 4.8-rc3?
Comment 2 Alex Deucher 2016-10-31 15:21:03 UTC
Can you bisect?
Comment 3 René Linder 2016-10-31 15:30:38 UTC
Created attachment 243361
Kernel 4.8.4 dmesg Output
Comment 4 René Linder 2016-10-31 15:32:53 UTC
I'm going to try reverting the latest powerplay patches. This will take a while.
Comment 5 René Linder 2016-11-04 08:08:13 UTC
Going back to where the power management code (smu7_hwmgr.c) changed, this is the commit:

Since commit ab4f06d3adcc5165b13ed2e657050fd1808f319b (agd5f/linux origin/drm-fixes-4.9):
-			iceland_hwmgr_init(hwmgr);
+			topaz_set_asic_special_caps(hwmgr);
+			hwmgr->feature_mask &= ~(PP_SMC_VOLTAGE_CONTROL_MASK |

The problem is that it gets an empty (NULL) table back here:
        struct phm_ppt_v1_information *table_info =
                        (struct phm_ppt_v1_information *)hwmgr->pptable;
        struct phm_ppt_v1_clock_voltage_dependency_table *sclk_table = NULL;

        /* debug output: on Topaz (V0 table) hwmgr->pptable is NULL here */
        printk("Table info");
        if (table_info == NULL) {
                printk("failed\n");
                return -EINVAL;
        }
        printk("successful\n");

dmesg output of this part:
[    9.616470] [drm] Memory usable by graphics device = 2048M
[    9.616473] [drm] Replacing VGA console driver
[    9.616513] [drm] ACPI BIOS requests an excessive sleep of 5000 ms, using 1500 ms instead
[    9.623922] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    9.623924] [drm] Driver supports precise vblank timestamp query.
[    9.627447] vga_switcheroo: enabled
[    9.627897] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    9.639545] ATOM BIOS: HP
[    9.639562] [drm] GPU posting now...
[    9.643523] [drm] Changing default dispclk from 0Mhz to 600Mhz
[    9.646378] [TTM] Zone  kernel: Available graphics memory: 1910192 kiB
[    9.646380] [TTM] Initializing pool allocator
[    9.646387] [TTM] Initializing DMA pool allocator
[    9.646413] amdgpu 0000:0a:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[    9.646415] amdgpu 0000:0a:00.0: GTT: 1865M 0x0000000040000000 - 0x00000000B496BFFF
[    9.646426] [drm] Detected VRAM RAM=1024M, BAR=256M
[    9.646428] [drm] RAM width 64bits DDR3
[    9.646448] [drm] amdgpu: 1024M of VRAM memory ready
[    9.646450] [drm] amdgpu: 1865M of GTT memory ready.
[    9.646470] [drm] GART: num cpu pages 477548, num gpu pages 477548
[    9.648496] [drm] PCIE GART of 1865M enabled (table at 0x0000000000040000).
[    9.648543] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    9.648544] [drm] Driver supports precise vblank timestamp query.
[    9.648598] amdgpu 0000:0a:00.0: amdgpu: using MSI.
[    9.648633] [drm] amdgpu: irq initialized.
[    9.648642] Table info
[    9.648643] failed
[    9.648644] Get EVV Voltage Failed.  Abort Driver loading!
[    9.648646] amdgpu: powerplay initialization failed
[    9.648706] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <amdgpu_powerplay> failed -22
[    9.648713] amdgpu 0000:0a:00.0: amdgpu_init failed
[    9.650148] [TTM] Finalizing pool allocator
[    9.650152] [TTM] Finalizing DMA pool allocator
[    9.650181] [TTM] Zone  kernel: Used memory at exit: 0 kiB
[    9.650182] [drm] amdgpu: ttm finalized
[    9.650187] amdgpu 0000:0a:00.0: Fatal error during GPU init
[    9.650193] [drm] amdgpu: finishing device.
[    9.650195] [TTM] Memory type 2 has not been initialized
[    9.650232] vga_switcheroo: disabled
[    9.650546] amdgpu: probe of 0000:0a:00.0 failed with error -22
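The failure chain above can be modelled in a few lines of userspace C. This is a hypothetical, heavily simplified sketch: struct hwmgr_model, struct v1_info_model, get_evv_voltage_v1_only() and powerplay_sw_init_model() are invented stand-ins, not the driver's real types or functions. It shows how a routine that assumes a V1 table returns -EINVAL when hwmgr->pptable was never filled, and how that value, passed up unchanged, is what the log prints as -22 (EINVAL is 22 on Linux).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the powerplay structures;
 * the real driver types contain many more fields. */
enum pp_table_version_model { PP_TABLE_V0_MODEL, PP_TABLE_V1_MODEL };

struct v1_info_model {
    unsigned int sclk_dep_count; /* placeholder for the V1 dependency tables */
};

struct hwmgr_model {
    enum pp_table_version_model pp_table_version;
    void *pptable; /* in this model only filled in for V1 tables */
};

/* Mirrors the debug guard: a routine that assumes a V1 table bails out
 * with -EINVAL when hwmgr->pptable was never set. */
static int get_evv_voltage_v1_only(const struct hwmgr_model *hw)
{
    const struct v1_info_model *table_info = hw->pptable;

    if (table_info == NULL)
        return -EINVAL; /* shows up as -22 in dmesg on Linux */
    return 0;
}

/* Hypothetical stand-in for the sw_init hook: the error code is passed
 * up unchanged, which matches "sw_init ... failed -22" in the log. */
static int powerplay_sw_init_model(const struct hwmgr_model *hw)
{
    assert(hw != NULL);
    return get_evv_voltage_v1_only(hw);
}
```

For example, a V0 model with pptable == NULL makes powerplay_sw_init_model() return -EINVAL, while a V1 model with a filled table returns 0.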
Comment 6 René Linder 2016-11-04 14:37:51 UTC
I found some missing or wrong parts. Looking at how PP_TABLE_V0 and PP_TABLE_V1 are handled, throughout the code V0 data is always accessed directly via hwmgr-> and V1 data via table_info->.

I never found any code in the V0 path that fills the pptable. I have now removed the smu7_get_evv_voltage call from the V0 path, and it seems to work partially. I think something was forgotten when the old Iceland-specific code path was reworked.
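The V0/V1 access pattern described above can be sketched as a single accessor that picks the data source from the table version. This is a hypothetical model, not driver code: the structs, the v0_dep_on_sclk field and get_sclk_dep_table() are invented for illustration. The point it shows is that the V0 branch never reads hwmgr->pptable, so a missing V1 table cannot break V0 parts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified models of the two layouts: V0 data hangs
 * directly off hwmgr, V1 data lives in the table behind hwmgr->pptable. */
enum pp_table_version_model { PP_TABLE_V0_MODEL, PP_TABLE_V1_MODEL };

struct dep_table_model {
    unsigned int count;
    unsigned long clk[8];
};

struct v1_info_model {
    struct dep_table_model vdd_dep_on_sclk;
};

struct hwmgr_model {
    enum pp_table_version_model pp_table_version;
    void *pptable;                         /* V1 only */
    struct dep_table_model v0_dep_on_sclk; /* V0 only, hypothetical field */
};

/* Return the sclk dependency table from the right place for each table
 * version; NULL only if a V1 part has no table at all. */
static const struct dep_table_model *
get_sclk_dep_table(const struct hwmgr_model *hw)
{
    assert(hw != NULL);
    if (hw->pp_table_version == PP_TABLE_V0_MODEL)
        return &hw->v0_dep_on_sclk; /* never touches hwmgr->pptable */

    const struct v1_info_model *table_info = hw->pptable;
    return table_info ? &table_info->vdd_dep_on_sclk : NULL;
}
```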
Comment 7 René Linder 2016-11-04 15:04:11 UTC
https://cgit.freedesktop.org/~agd5f/linux/commit/drivers/gpu/drm/amd/powerplay?h=drm-fixes-4.9&id=025f8bfb84cbcaa78df31ab00d7e3c5f979e9e27

There you can see iceland_get_evv_voltage, which works, whereas the new smu7_get_evv_voltage always uses the pptable right at the beginning, and that's wrong.

There is an if for the V0 path, but it comes too late and is wrong:


Wrong part 1: this V0 check must be done before hwmgr->pptable (table_info) is requested.
Wrong part 2: the lookup table passed to phm_get_sclk_for_voltage_evv() should come from hwmgr, not from table_info.

			if ((hwmgr->pp_table_version == PP_TABLE_V0)
				|| !phm_get_sclk_for_voltage_evv(hwmgr,
					table_info->vddc_lookup_table, vv_id, &sclk)) {
				if (phm_cap_enabled(hwmgr->platform_descriptor.platformCaps,
						PHM_PlatformCaps_ClockStretcher)) {
					for (j = 1; j < sclk_table->count; j++) {
						if (sclk_table->entries[j].clk == sclk &&
								sclk_table->entries[j].cks_enable == 0) {
							sclk += 5000;
							break;
						}
					}
				}
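The two problems pointed out in Comment 7 (the V0 check comes after table_info is used, and the V0 path reads from table_info instead of hwmgr) can be illustrated by reordering the logic. This is a hypothetical sketch, not the fix that was merged: the structs, the v0_sclk_table field, lookup_sclk_model() (a stand-in for phm_get_sclk_for_voltage_evv(), returning 0 on success) and evv_sclk_model() are invented, and the lookup is simplified to index the sclk table directly. The clock-stretcher adjustment follows the quoted fragment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum pp_table_version_model { PP_TABLE_V0_MODEL, PP_TABLE_V1_MODEL };

struct sclk_entry_model {
    unsigned long clk;
    bool cks_enable;
};

struct sclk_table_model {
    unsigned int count;
    struct sclk_entry_model entries[8];
};

struct v1_info_model {
    struct sclk_table_model vdd_dep_on_sclk;
};

struct hwmgr_model {
    enum pp_table_version_model pp_table_version;
    void *pptable;                        /* V1 only */
    bool clock_stretcher;                 /* models PHM_PlatformCaps_ClockStretcher */
    struct sclk_table_model v0_sclk_table; /* V0 only, hypothetical field */
};

/* Hypothetical stand-in for phm_get_sclk_for_voltage_evv(): 0 on success. */
static int lookup_sclk_model(const struct sclk_table_model *t,
                             unsigned int vv_id, unsigned long *sclk)
{
    if (vv_id >= t->count)
        return -EINVAL;
    *sclk = t->entries[vv_id].clk;
    return 0;
}

static int evv_sclk_model(const struct hwmgr_model *hw, unsigned int vv_id,
                          unsigned long *sclk)
{
    const struct sclk_table_model *sclk_table;

    assert(hw != NULL && sclk != NULL);

    /* Decide on the table version before hwmgr->pptable is touched. */
    if (hw->pp_table_version == PP_TABLE_V0_MODEL) {
        sclk_table = &hw->v0_sclk_table; /* V0 data comes from hwmgr */
    } else {
        const struct v1_info_model *table_info = hw->pptable;
        if (table_info == NULL)
            return -EINVAL;
        sclk_table = &table_info->vdd_dep_on_sclk;
    }

    if (lookup_sclk_model(sclk_table, vv_id, sclk))
        return -EINVAL;

    /* Clock-stretcher adjustment, as in the quoted fragment. */
    if (hw->clock_stretcher) {
        for (unsigned int j = 1; j < sclk_table->count; j++) {
            if (sclk_table->entries[j].clk == *sclk &&
                !sclk_table->entries[j].cks_enable) {
                *sclk += 5000;
                break;
            }
        }
    }
    return 0;
}
```

With this ordering a V0 part with no pptable still gets a valid sclk, and only a V1 part with a missing table fails with -EINVAL.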
Comment 8 Alex Deucher 2016-11-09 23:11:59 UTC
Created attachment 244131
patch 1/3

Does the attached patch set fix the issue?
Comment 9 Alex Deucher 2016-11-09 23:12:25 UTC
Created attachment 244141
patch 2/3
Comment 10 Alex Deucher 2016-11-09 23:12:47 UTC
Created attachment 244151
patch 3/3
Comment 11 René Linder 2016-11-10 11:02:00 UTC
That worked :-) Now my graphics card behaves as it should, I think.

A message like this still appears in the dmesg log:

[ powerplay ] VBIOS did not find boot engine clock value                         in dependency table. Using Memory DPM level 0!

glxspheres and glxgears work.

I applied the patches to both the rc3 and rc4 kernels, and both worked.
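The powerplay message quoted above describes a lookup with a fallback: the boot clock reported by the VBIOS is searched for in a dependency table, and DPM level 0 is used when it is absent. A hypothetical sketch of that pattern follows; the table layout and find_boot_level_model() are invented for illustration and are not the driver's code. It also shows why the message is only a warning: the lookup still yields a valid level, consistent with the card working normally.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified dependency table. */
struct dep_table_model {
    unsigned int count;
    unsigned long clk[8];
};

/* Return the DPM level whose clock matches the boot clock, or warn and
 * fall back to level 0 when the boot clock is not in the table. */
static unsigned int find_boot_level_model(const struct dep_table_model *t,
                                          unsigned long boot_clk)
{
    assert(t != NULL);
    for (unsigned int i = 0; i < t->count; i++)
        if (t->clk[i] == boot_clk)
            return i;

    fprintf(stderr, "boot clock %lu not in dependency table, using DPM level 0\n",
            boot_clk);
    return 0;
}
```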
Comment 12 René Linder 2016-11-10 11:03:06 UTC
Created attachment 244161
The working dmesg log from kernel 4.9-rc4
Comment 13 René Linder 2016-11-10 11:32:14 UTC
Also tested MegaGlest, the System Shock demo (the new one using the Unity engine), Cube 2: Sauerbraten, VDrift and TORCS; everything works fine.
Comment 14 Alex Deucher 2016-11-10 16:14:34 UTC
Created attachment 244171
patch 3/3 (updated)

Please try this updated patch 3 and the new patch 4 in combination with the original patches 1, 2.
Comment 15 Alex Deucher 2016-11-10 16:15:01 UTC
Created attachment 244181
patch 4/3
Comment 16 Alex Deucher 2016-11-10 21:54:18 UTC
See also:
https://bugs.freedesktop.org/show_bug.cgi?id=98357
Comment 17 René Linder 2016-11-15 11:55:35 UTC
For me the new patches work fine, except for the message that it didn't find the default frequency (see dmesg log). But reclocking and everything else I have on my notebook works fine.
Comment 18 René Linder 2016-11-15 11:56:34 UTC
Created attachment 244601
Tested with latest patches on 4.9-rc4
Comment 19 René Linder 2016-11-18 18:01:31 UTC
The fix is in 4.9-rc5 and works.