Created attachment 257863: dmesg output. The clinfo command no longer works since I started testing 4.13-rc4 on my PC; the kernel error message is in the attachment. I'm using the amdgpu-pro libraries (not the kernel driver, really only the libraries) on a dual Opteron with an R9 Nano and an HD 7700 installed. I must say I was very happy that DPM works now, but clinfo calls not working is a bit of a bummer :(
Created attachment 257865: lspci output
Created attachment 257867: kernel config
Created attachment 257881: working dmesg. With amdgpu.dpm=0, the driver seems to initialize the device correctly.
Created attachment 257883: working clinfo. clinfo output with amdgpu.dpm=0.
The system works with amdgpu.dpm=0. I attached some information about the working system. Please note that I DO NOT use the amdgpu-pro kernel module, only its libraries.
Can you bisect the kernel?
I'm not a kernel developer, but I'm willing to help where I can. What do you need me to do for the bisection?
You don't need to be a developer; you just need to compile and test a number of kernel Git commits. Search for "git bisect howto".
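For readers new to bisection: git bisect performs a binary search over the commits between a known-good and a known-bad kernel, so each compile-and-test round roughly halves the remaining range. The Python sketch below models only that search logic. The `first_bad_commit` function and the `is_bad` predicate are hypothetical illustrations; `is_bad` stands in for "build this commit, boot it and run clinfo". The sketch treats history as a linear list, whereas git bisect itself also has to handle merge commits.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and once the regression appears it stays
    (the same assumption a bisection relies on).
    """
    good, bad = 0, len(commits) - 1  # indices of the known-good and known-bad commits
    while bad - good > 1:
        mid = (good + bad) // 2
        # Hypothetical test step: in practice, build and boot this kernel, then run clinfo.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(100)]
    # Pretend the regression was introduced in commit c37.
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 37))  # prints c37
```

With 100 commits in range, this finds the culprit after 6 tests, which is why bisecting is practical even across a whole kernel release cycle.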
I just browsed through a few howtos. It won't be easy to pinpoint the problem: in 4.10, with DPM enabled, the system hit a triple fault and then crashed (see 194899). Do you want a bisection from that version to the current state, or do I need to do something else?
(In reply to Janpieter Sollie from comment #9)
> It won't be easy to point to the problem: in 4.10, it hit a triple fault and
> then crashed with dpm enabled. do you want a bisection from that one(see
> 194899) to the current status or do I need to do something else?

Hmm, I guess I misread the bug description as meaning it worked properly before. If that's not the case, there's probably no point in bisecting.
The problem SEEMS to be related to CIK support and the upgrade to rc6: disabling CIK support in my kernel and upgrading to rc6 solved the problem. Probably CIK and SI support are not cooperating properly yet.
(In reply to Janpieter Sollie from comment #11)
> disabling CIK support in my kernel and upgrading it to rc6 solved the
> problem.
> Probably CIK and SI are not really cooperating properly yet.

Weird, does rc6 still work with CIK support enabled?
Yes. But I really think the problem is in the application layer: I don't see any errors in dmesg when running clinfo, but when I run the application I'm developing, I see the following errors in dmesg:

[31637.263268] amdgpu 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0x00000000ff9f4000 flags=0x0000]
[31637.263379] amdgpu 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014 address=0x00000000ff9e4000 flags=0x0000]
...

and the application hangs.

The interesting part: to make sure the driver does not "accidentally work", I added a Polaris device to the system. amdgpu recognised the Polaris, Fiji and SI cards, but only the SI card produces these faults. Do you know how I can figure out whether this is a kernel, middle-layer or application-layer problem?