01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wimbledon XT [Radeon HD 7970M] (rev ff)
linux 3.16.1

drmCommandWriteRead() in mesa already receives the wrong value, so I am filing this here.

As long as the GPU is not powered off between runs, the value stays the same. Each time the GPU is powered off and on again, the value increases by 100.

Test program output:

    $ g++ numcomp.cpp -o numcomp -lOpenCL; ./numcomp; ./numcomp; sleep 10; ./numcomp; ./numcomp
    OpenCL Number of compute units: 7500
    OpenCL Number of compute units: 7500
    OpenCL Number of compute units: 7600
    OpenCL Number of compute units: 7600

Test program source:

    #include <CL/cl.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        int err, numberOfComputeUnits = 0;
        std::vector<cl::Platform> platformList;
        cl::Platform::get(&platformList);
        // Create a GPU context on the first platform.
        cl_context_properties cprops[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties) (platformList[0])(), 0 };
        cl::Context *context = new cl::Context(CL_DEVICE_TYPE_GPU, cprops, NULL, NULL, &err);
        std::vector<cl::Device> devices = context->getInfo<CL_CONTEXT_DEVICES>();
        // Query the compute unit count of the first device.
        devices[0].getInfo(CL_DEVICE_MAX_COMPUTE_UNITS, &numberOfComputeUnits);
        std::cout << "OpenCL Number of compute units: " << numberOfComputeUnits << std::endl;
        return 0;
    }

Other tools, such as http://graphics.stanford.edu/~yoel/notes/clInfo.c, show the same problem:

    $ ./clInfo | grep MAX_COMPUTE_UNITS
    device[0x1118288]: MAX_COMPUTE_UNITS: 10400
Created attachment 147041: possible fix

The attached patch should fix it.
The patch fixes the increasing value, but now the query always returns 100. I don't think the HD 7970M has 100 compute units; http://www.amd.com/de-de/products/graphics/notebook/7900m#2 says "20 Compute Units (1280 Stream Processors)".

To see how it adds up to 100, I added some debug output like this:

    printk("rdbg max_shader_engines: %d\n", rdev->config.si.max_shader_engines);
    printk("rdbg max_sh_per_se: %d\n", rdev->config.si.max_sh_per_se);
    printk("rdbg max_cu_per_sh: %d\n", rdev->config.si.max_cu_per_sh);
    for (i = 0; i < rdev->config.si.max_shader_engines; i++) {
        for (j = 0; j < rdev->config.si.max_sh_per_se; j++) {
            for (k = 0; k < rdev->config.si.max_cu_per_sh; k++) {
                rdev->config.si.active_cus += hweight32(si_get_cu_active_bitmap(rdev, i, j));
                printk("rdbg inner: rdev->config.si.active_cus: %d, hweight32(si_get_cu_active_bitmap(rdev, %d, %d)): %d\n",
                       rdev->config.si.active_cus, i, j, hweight32(si_get_cu_active_bitmap(rdev, i, j)));
            }
        }
    }

And then I got this output:

    rdbg max_shader_engines: 2
    rdbg max_sh_per_se: 2
    rdbg max_cu_per_sh: 5
    rdbg inner: rdev->config.si.active_cus: 5, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 10, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 15, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 20, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 25, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 30, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 35, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 40, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 45, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 50, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 55, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 60, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 65, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 70, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 75, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
    rdbg inner: rdev->config.si.active_cus: 80, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 85, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 90, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 95, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5
    rdbg inner: rdev->config.si.active_cus: 100, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5

I think the k loop already iterates over the compute units, but instead of adding 1 per unit to the total, it adds hweight32(si_get_cu_active_bitmap(rdev, i, j)), which is already the number of active compute units in SH (i, j), here 5. Each SH is therefore counted max_cu_per_sh times: 2 * 2 * 5 * 5 = 100 instead of 2 * 2 * 5 = 20.
Created attachment 147161: possible fix

This should do the trick.
    OpenCL Number of compute units: 20

Works for me.
Since the fix is in the 3.17 rc I use, I'm closing this as fixed.