Bug 82581 - CL_DEVICE_MAX_COMPUTE_UNITS increases by 100 every time runpm powers on 7970M pitcairn
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
Reported: 2014-08-17 11:03 UTC by Christoph Haag
Modified: 2014-09-02 13:28 UTC

Kernel Version: 3.16.1
Regression: No


Attachments
possible fix (1.62 KB, patch)
2014-08-18 14:20 UTC, Alex Deucher
possible fix (2.48 KB, patch)
2014-08-18 21:11 UTC, Alex Deucher

Description Christoph Haag 2014-08-17 11:03:43 UTC
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wimbledon XT [Radeon HD 7970M] (rev ff)

linux 3.16.1

drmCommandWriteRead() in Mesa already receives the wrong value, so I am filing this against the kernel driver.

As long as the GPU is not powered off between runs, the value stays the same.
Each time the GPU is powered off and on again, the value increases by 100.

Test program output:

$ g++ numcomp.cpp -o numcomp -lOpenCL; ./numcomp; ./numcomp; sleep 10; ./numcomp; ./numcomp
OpenCL Number of compute units: 7500
OpenCL Number of compute units: 7500
OpenCL Number of compute units: 7600
OpenCL Number of compute units: 7600


Test program source:

#include <CL/cl.hpp>
#include <iostream>
#include <vector>
int main() {
  cl_int err = CL_SUCCESS;
  cl_uint numberOfComputeUnits = 0;  // CL_DEVICE_MAX_COMPUTE_UNITS is a cl_uint
  std::vector<cl::Platform> platformList;
  cl::Platform::get(&platformList);
  // Use the first platform and request a context covering its GPU devices.
  cl_context_properties cprops[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties) (platformList[0])(), 0 };
  cl::Context context(CL_DEVICE_TYPE_GPU, cprops, NULL, NULL, &err);
  std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
  devices[0].getInfo(CL_DEVICE_MAX_COMPUTE_UNITS, &numberOfComputeUnits);
  std::cout << "OpenCL Number of compute units: " << numberOfComputeUnits << std::endl;
  return 0;
}


Other tools such as http://graphics.stanford.edu/~yoel/notes/clInfo.c show the same problem:
$ ./clInfo | grep MAX_COMPUTE_UNITS
        device[0x1118288]: MAX_COMPUTE_UNITS: 10400
Comment 1 Alex Deucher 2014-08-18 14:20:29 UTC
Created attachment 147041
possible fix

The attached patch should fix it.
Comment 2 Christoph Haag 2014-08-18 19:02:22 UTC
The patch stops the value from increasing.

But now it always returns 100.

I don't think the HD 7970M has 100 compute units.

http://www.amd.com/de-de/products/graphics/notebook/7900m#2
says
"20 Compute Units (1280 Stream Processors)"

To see how it adds up to 100 I added some debug info like this:

printk("rdbg max_shader_engines: %d\n", rdev->config.si.max_shader_engines);
printk("rdbg max_sh_per_se: %d\n", rdev->config.si.max_sh_per_se);
printk("rdbg max_cu_per_sh: %d\n", rdev->config.si.max_cu_per_sh);
for (i = 0; i < rdev->config.si.max_shader_engines; i++) {
        for (j = 0; j < rdev->config.si.max_sh_per_se; j++) {
                for (k = 0; k < rdev->config.si.max_cu_per_sh; k++) {
                        rdev->config.si.active_cus +=
                                hweight32(si_get_cu_active_bitmap(rdev, i, j));
                        printk("rdbg inner: rdev->config.si.active_cus: %d, hweight32(si_get_cu_active_bitmap(rdev, %d, %d)): %d\n", rdev->config.si.active_cus, i, j, hweight32(si_get_cu_active_bitmap(rdev, i, j)));
                }
        }
}

And then I got this output:

rdbg max_shader_engines: 2
rdbg max_sh_per_se: 2
rdbg max_cu_per_sh: 5
rdbg inner: rdev->config.si.active_cus: 5, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
rdbg inner: rdev->config.si.active_cus: 10, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
rdbg inner: rdev->config.si.active_cus: 15, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
rdbg inner: rdev->config.si.active_cus: 20, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
rdbg inner: rdev->config.si.active_cus: 25, hweight32(si_get_cu_active_bitmap(rdev, 0, 0)): 5
rdbg inner: rdev->config.si.active_cus: 30, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
rdbg inner: rdev->config.si.active_cus: 35, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
rdbg inner: rdev->config.si.active_cus: 40, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
rdbg inner: rdev->config.si.active_cus: 45, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
rdbg inner: rdev->config.si.active_cus: 50, hweight32(si_get_cu_active_bitmap(rdev, 0, 1)): 5
rdbg inner: rdev->config.si.active_cus: 55, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
rdbg inner: rdev->config.si.active_cus: 60, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
rdbg inner: rdev->config.si.active_cus: 65, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
rdbg inner: rdev->config.si.active_cus: 70, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
rdbg inner: rdev->config.si.active_cus: 75, hweight32(si_get_cu_active_bitmap(rdev, 1, 0)): 5
rdbg inner: rdev->config.si.active_cus: 80, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5
rdbg inner: rdev->config.si.active_cus: 85, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5
rdbg inner: rdev->config.si.active_cus: 90, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5
rdbg inner: rdev->config.si.active_cus: 95, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5
rdbg inner: rdev->config.si.active_cus: 100, hweight32(si_get_cu_active_bitmap(rdev, 1, 1)): 5

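I think the k loop already iterates over the compute units, but instead of adding 1 per unit, each iteration adds hweight32(si_get_cu_active_bitmap(rdev, i, j)), which is itself the number of active compute units in that shader array. Every shader array is therefore counted max_cu_per_sh (5) times: 2 * 2 * 5 * 5 = 100 instead of 2 * 2 * 5 = 20.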
Comment 3 Alex Deucher 2014-08-18 21:11:10 UTC
Created attachment 147161
possible fix

This should do the trick.
Comment 4 Christoph Haag 2014-08-19 09:19:44 UTC
OpenCL Number of compute units: 20

Works for me.
Comment 5 Christoph Haag 2014-09-02 13:28:13 UTC
The fix is in the 3.17 rc I am running, so I'm closing this as fixed.
