Bug 194867
Summary: | DRM BUG while initializing cape verde (2nd card) | ||
---|---|---|---|
Product: | Drivers | Reporter: | Janpieter Sollie (janpieter.sollie) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alexdeucher |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.11-rc2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | zip file with all listed attachments; patch 1/2; patch 2/2; /proc/kmsg output | ||
Created attachment 255263 [details]
patch 1/2
Does this patch set fix the issue?
Created attachment 255265 [details]
patch 2/2
We're one step further: see the triplefault.txt output. I set the kernel verbosity to 7 and ran modprobe amdgpu (the module is blacklisted). The error is gone, but the machine appears to hit a triple fault (that is my suspicion; I cannot confirm it) and therefore reboots immediately without a panic. Should I file a new bug for that, or can you have a look at it? Note that this does not happen with dpm disabled.
Created attachment 255283 [details]
/proc/kmsg output
Did this work with kernel 4.10 or older?
No, the output is exactly the same: after the 4 ring tests, it reboots.
That should be tracked in a separate report then.
Is there any information beyond my kernel config, lsmod, /proc/kmsg and lspci that you need in order to handle this as a new report? I get annoyed myself by users who come to me saying "it doesn't work and I know nothing about it", so I'd like to provide all the information you need.
Created attachment 255215 [details]
zip file with all listed attachments
There seems to be a logic error in how the amdgpu module specifies the memory sizes for ttm on the SI architecture: while the Fiji card boots fine, the Cape Verde card triggers a kernel BUG. dmesg, .config and a proposed patch are in the attachment.
The problem lies in linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c: the computed p_size is reduced to 0 when the page shift is too large for the size being shifted.
I managed to work around the problem by changing the expression "adev->gds.mem.total_size >> PAGE_SHIFT)" in amdgpu_ttm_init to "(adev->gds.mem.total_size >> PAGE_SHIFT) + 1)", and doing the same for "(adev->gds.gws.total_size" and "adev->gds.oa.total_size", though I am not sure this is the correct solution. The problem is that my SI card is limited in memory (I guess) and the page shift is 12.
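To illustrate the arithmetic described above, here is a minimal standalone sketch in plain C (not kernel code, and not the attached patches). It assumes a page shift of 12 as stated in the report; the 64-byte size and all function and macro names are hypothetical and chosen only for the demonstration. It compares the truncating shift, the "+ 1" workaround, and a round-up variant in the spirit of the kernel's DIV_ROUND_UP macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12, as in the report. */
#define DEMO_PAGE_SHIFT 12u
#define DEMO_PAGE_SIZE  (1ull << DEMO_PAGE_SHIFT)

/* Truncating conversion, mirroring "total_size >> PAGE_SHIFT". */
uint64_t pages_truncated(uint64_t size_bytes)
{
    return size_bytes >> DEMO_PAGE_SHIFT;
}

/* The reporter's workaround: always add one page after the shift. */
uint64_t pages_plus_one(uint64_t size_bytes)
{
    return (size_bytes >> DEMO_PAGE_SHIFT) + 1;
}

/* Alternative for comparison: round up to whole pages. */
uint64_t pages_rounded_up(uint64_t size_bytes)
{
    return (size_bytes + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}

/* Hypothetical 64-byte region: smaller than one page. */
void demo_self_check(void)
{
    assert(pages_truncated(64) == 0);   /* the size collapses to zero pages */
    assert(pages_plus_one(64) == 1);    /* workaround yields one page */
    assert(pages_rounded_up(64) == 1);  /* rounding up also yields one page */
}
```

The two non-truncating variants differ for sizes that are already page-aligned: the "+ 1" form reports one page more than needed (4096 bytes gives 2), whereas rounding up gives exactly 1. Which behaviour is appropriate for the driver is a question for the maintainers.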