How can I get the number of cores in a CUDA device?

The number of cores per multiprocessor is the only "missing" piece of data. It is not provided directly in the cudaDeviceProp structure, but it can be inferred from NVIDIA's published per-architecture specifications using the devProp.major and devProp.minor entries, which together make up the CUDA compute capability of the device. Multiplying that per-SM figure by devProp.multiProcessorCount gives the total.
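Written as a formula, where coresPerSM is just shorthand for the figure looked up from the compute capability, and using as an illustration a GeForce RTX 3090 (compute capability 8.6, 82 SMs, 128 FP32 cores per SM):

```latex
\text{cores} = \texttt{multiProcessorCount} \times \text{coresPerSM}(\texttt{major}, \texttt{minor})
\qquad \text{e.g.} \quad 82 \times 128 = 10496
```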

Something like this should work:

#include <stdio.h>
#include "cuda_runtime_api.h"

// You must first call cudaGetDeviceProperties(), then pass the
// cudaDeviceProp structure it fills in to this function.
// Returns the number of FP32 "SP" cores, or 0 for an unknown device.
int getSPcores(cudaDeviceProp devProp)
{
    int cores = 0;
    int mp = devProp.multiProcessorCount;  // number of SMs
    switch (devProp.major) {
    case 2: // Fermi
        if (devProp.minor == 1) cores = mp * 48;
        else cores = mp * 32;
        break;
    case 3: // Kepler
        cores = mp * 192;
        break;
    case 5: // Maxwell
        cores = mp * 128;
        break;
    case 6: // Pascal
        if ((devProp.minor == 1) || (devProp.minor == 2)) cores = mp * 128;
        else if (devProp.minor == 0) cores = mp * 64;
        else printf("Unknown device type\n");
        break;
    case 7: // Volta (7.0, 7.2) and Turing (7.5)
        if ((devProp.minor == 0) || (devProp.minor == 2) || (devProp.minor == 5)) cores = mp * 64;
        else printf("Unknown device type\n");
        break;
    case 8: // Ampere (8.0, 8.6, 8.7) and Ada (8.9)
        if (devProp.minor == 0) cores = mp * 64;
        else if ((devProp.minor == 6) || (devProp.minor == 7) || (devProp.minor == 9)) cores = mp * 128;
        else printf("Unknown device type\n");
        break;
    case 9: // Hopper
        if (devProp.minor == 0) cores = mp * 128;
        else printf("Unknown device type\n");
        break;
    default:
        printf("Unknown device type\n");
        break;
    }
    return cores;
}

(coded in browser)

"Cores" is a bit of a marketing term. The most common connotation, in my opinion, is to equate it with the SP (single-precision, FP32) units in the SM, and that is the meaning I have used here. I've also omitted cc 1.x devices, as those device types are no longer supported in CUDA 7.0 and CUDA 7.5.
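The same mapping can also be written as a table keyed on compute capability, which is easier to extend when a new architecture appears and can be tested without a GPU. This is only a sketch: cores_per_sm and total_cores are hypothetical helper names, and the 7.2, 8.7, 8.9 and 9.0 rows follow NVIDIA's published per-SM figures, so treat them as an assumption to verify against the current CUDA programming guide.

```c
#include <stddef.h>

/* One row per compute capability: major, minor, FP32 cores per SM.
   The values are assumed from NVIDIA's published per-SM figures. */
typedef struct {
    int major;
    int minor;
    int cores_per_sm;
} sm_cores_entry;

static const sm_cores_entry kSmCores[] = {
    {2, 0, 32},  {2, 1, 48},                               /* Fermi        */
    {3, 0, 192}, {3, 2, 192}, {3, 5, 192}, {3, 7, 192},    /* Kepler       */
    {5, 0, 128}, {5, 2, 128}, {5, 3, 128},                 /* Maxwell      */
    {6, 0, 64},  {6, 1, 128}, {6, 2, 128},                 /* Pascal       */
    {7, 0, 64},  {7, 2, 64},  {7, 5, 64},                  /* Volta/Turing */
    {8, 0, 64},  {8, 6, 128}, {8, 7, 128}, {8, 9, 128},    /* Ampere/Ada   */
    {9, 0, 128},                                           /* Hopper       */
};

/* Returns FP32 cores per SM, or -1 if the compute capability is not listed. */
int cores_per_sm(int major, int minor)
{
    size_t n = sizeof kSmCores / sizeof kSmCores[0];
    for (size_t i = 0; i < n; ++i) {
        if (kSmCores[i].major == major && kSmCores[i].minor == minor)
            return kSmCores[i].cores_per_sm;
    }
    return -1;
}

/* Total cores = SM count * cores per SM, or -1 for an unknown device.
   major, minor and sm_count would come from cudaDeviceProp's major,
   minor and multiProcessorCount fields. */
int total_cores(int major, int minor, int sm_count)
{
    int per_sm = cores_per_sm(major, minor);
    return per_sm < 0 ? -1 : per_sm * sm_count;
}
```

For example, total_cores(8, 6, 82) returns 10496, while an unlisted capability such as 1.3 returns -1 instead of silently reporting 0 cores.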

A pythonic version is here


On Linux, you can run the following command to get the number of CUDA cores:

nvidia-settings -q CUDACores -t

To get the output of this command in C, use the popen function.
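A minimal sketch of that approach, assuming the terse (-t) output starts with the core count on its first line. With several GPUs it may print one value per GPU, and this sketch only reads the first one. parse_core_count and query_cuda_cores are hypothetical helper names; popen and pclose are the POSIX functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Parses the leading integer of a line such as "10496\n".
   Returns -1 if the line does not start with a number. */
int parse_core_count(const char *line)
{
    char *end;
    long value = strtol(line, &end, 10);
    if (end == line || value < 0)
        return -1;
    return (int)value;
}

/* Runs nvidia-settings through popen() and returns the first core count
   it prints, or -1 if the command cannot be run or prints no number. */
int query_cuda_cores(void)
{
    FILE *pipe = popen("nvidia-settings -q CUDACores -t", "r");
    if (pipe == NULL)
        return -1;

    char line[64];
    int cores = -1;
    if (fgets(line, sizeof line, pipe) != NULL)
        cores = parse_core_count(line);

    pclose(pipe);
    return cores;
}
```

Keeping the parsing separate from the popen call makes it easy to check the parser without an NVIDIA driver installed.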


As Vraj Pandya already said, there is a function (_ConvertSMVer2Cores) in the Common/helper_cuda.h file of NVIDIA's cuda-samples GitHub repository that provides this functionality. It returns the number of cores per multiprocessor for a given compute capability, so you just need to multiply its result by the multiprocessor count of the GPU.

Just wanted to provide a current link.

#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>
#include <helper_cuda.h> // This header must be on the compiler's include path
                         // (e.g. pass -I pointing at the samples' Common folder
                         // to nvcc). It also requires `helper_string.h`, which
                         // lives in the same folder as `helper_cuda.h`.

int main()
{
    int deviceID;
    cudaDeviceProp props;

    cudaGetDevice(&deviceID);
    cudaGetDeviceProperties(&props, deviceID);

    // Cores per SM for this compute capability, times the number of SMs.
    int CUDACores = _ConvertSMVer2Cores(props.major, props.minor) * props.multiProcessorCount;
    printf("CUDA cores: %d\n", CUDACores);
    return 0;
}

Tags:

C

Cuda

Nvidia