Created attachment 255215 [details] zip file with all listed attachments There seems to be a logic error in how the memory sizes for TTM are specified in the amdgpu module on the SI architecture: while the Fiji card boots fine, the Cape Verde card triggers a kernel BUG. dmesg, .config and a proposed patch are in the attachment. The problem lies in linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c: the computed p_size is reduced to 0 when the page shift is too big for the size being shifted. I managed to work around the problem by changing the expression "adev->gds.mem.total_size >> PAGE_SHIFT)" in amdgpu_ttm_init to "(adev->gds.mem.total_size >> PAGE_SHIFT) + 1)", and doing the same for "adev->gds.gws.total_size" and "adev->gds.oa.total_size", though I am not sure this is the correct solution. I guess the problem is that my SI card is limited in memory and the page shift is 12.
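To illustrate the arithmetic described above, here is a minimal standalone sketch in plain C, not the kernel code. It assumes a page shift of 12 as reported; the DEMO_* macros and the three helper names are hypothetical and exist only for this illustration. It compares the plain shift, the "+ 1" workaround, and a round-up variant (shown only for comparison, not as the fix from the attached patches).

```c
#include <stdint.h>

/* Assumption: page shift of 12 (4 KiB pages), as reported in the comment. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)

/* Plain shift, as in the quoted expression: truncates toward zero,
 * so any size below one page yields 0 pages. */
static uint64_t pages_truncated(uint64_t bytes)
{
    return bytes >> DEMO_PAGE_SHIFT;
}

/* The workaround from the comment: always add one page.
 * Never returns 0, but over-counts when the size is page-aligned. */
static uint64_t pages_plus_one(uint64_t bytes)
{
    return (bytes >> DEMO_PAGE_SHIFT) + 1;
}

/* Hypothetical alternative for comparison: round up to whole pages.
 * Returns 0 only when the size itself is 0. */
static uint64_t pages_rounded_up(uint64_t bytes)
{
    return (bytes + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}
```

With a hypothetical size of 64 bytes, the plain shift gives 0 pages while both alternatives give 1. With exactly 4096 bytes, the "+ 1" variant gives 2 pages where rounding up gives 1, which may be one reason to doubt that "+ 1" is the correct solution.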
Created attachment 255263 [details] patch 1/2 Does this patch set fix the issue?
Created attachment 255265 [details] patch 2/2
We're one step further: see the triplefault.txt output. I set the kernel verbosity to 7 and did a modprobe amdgpu (the module is blacklisted). The error is gone, but the machine hits a triple fault (I suspect it does; don't blame me if it doesn't) and because of that it immediately reboots without a panic. Should I file a new bug for that, or can you have a look at it? Note that this does not happen with dpm disabled.
Created attachment 255283 [details] /proc/kmsg output
Did this work with kernel 4.10 or older?
No, the output is exactly the same: after the 4 ring tests, it reboots.
That should be tracked in a separate report then.
Is there any information besides my kernel config, lsmod, /proc/kmsg and lspci that you need to handle this as a new report? I know how annoying it is when users come to me saying 'it doesn't work and I know nothing about it', so I'd like to provide all the info you need.