While the OpenCL API is written in C, the OpenCL 1.1 specification also comes with a specification for C++ bindings. In this post I show how to use the C++ bindings instead of the C API, using the simple vector addition example from my previous post, "Getting started with OpenCL and GPU computing".

Download the bindings

The C++ bindings for OpenCL can be downloaded directly from the Khronos website. Put the cl.hpp file in the include folder alongside cl.h and the other OpenCL header files. You might not need to download the file at all, as it is usually already present in the include folder of the vendor's SDK (ATI Stream and CUDA Toolkit). Note that you have to set up OpenCL before you can use the C++ bindings. If you haven’t set up OpenCL yet, read through my "Getting started with OpenCL and GPU computing" post first.

Using the bindings

To use the C++ bindings, simply include cl.hpp instead of cl.h. The bindings provide several useful classes such as Platform, Device, Context, CommandQueue and Program, which wrap the corresponding C API objects and make your life easier. All of them are defined in the namespace cl.

API

An overview of all of the objects, functions and variables can be found in the official C++ bindings specification from Khronos.

Exceptions

If you want to use exceptions instead of checking for errors after every function call, define the preprocessor macro __CL_ENABLE_EXCEPTIONS before including cl.hpp and wrap your code in a try-catch block. See the example below for more details.
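When an exception is caught, err() on the error object returns the raw numeric status code, which is not very readable on its own. As a minimal sketch, a small lookup function can translate the most common codes into their symbolic names. The function name clErrorName and the selection of codes are my own choices for illustration and are not part of the bindings; the numeric values are the ones defined in cl.h.

```cpp
#include <string>

// Translate a few common OpenCL status codes into their symbolic names.
// Hypothetical helper: not part of cl.hpp. The numeric values match the
// constants defined in cl.h; codes not listed fall through to the default.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -32: return "CL_INVALID_PLATFORM";
        case -34: return "CL_INVALID_CONTEXT";
        case -46: return "CL_INVALID_KERNEL_NAME";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        default:  return "unknown OpenCL error " + std::to_string(code);
    }
}
```

Inside a catch block you could then print clErrorName(error.err()) next to error.what() rather than the bare number.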

The vector addition example

This is the vector addition example from my previous post, "Getting started with OpenCL and GPU computing", which simply computes the element-wise sum of two lists in parallel on the GPU. According to the bindings specification, the context properties array (cps in the code) can be omitted from the Context constructor, in which case the platform is selected automatically. When I tried that, however, I got an error from clCreateContextFromType. That is why the code queries the available platforms and explicitly passes the first one it finds to the context, which in my opinion shouldn’t be necessary. Compile the program with your favorite C++ compiler, using the same compile options as for OpenCL C.

#define __NO_STD_VECTOR // Use cl::vector instead of STL version
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
#include <utility>
#include <iostream>
#include <fstream>
#include <string>
#include <iterator> // std::istreambuf_iterator
using namespace cl;
 
int main() {
    // Create the two input vectors
    const int LIST_SIZE = 1000;
    int *A = new int[LIST_SIZE]; 
    int *B = new int[LIST_SIZE];
    for(int i = 0; i < LIST_SIZE; i++) {
        A[i] = i;
        B[i] = LIST_SIZE - i;
    }
 
    try {
        // Get available platforms
        vector<Platform> platforms;
        Platform::get(&platforms);
 
        // Select the default platform and create a context using this platform and the GPU
        cl_context_properties cps[3] = { 
            CL_CONTEXT_PLATFORM, 
            (cl_context_properties)(platforms[0])(), 
            0 
        };
        Context context( CL_DEVICE_TYPE_GPU, cps);
 
        // Get a list of devices on this platform
        vector<Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
 
        // Create a command queue and use the first device
        CommandQueue queue = CommandQueue(context, devices[0]);
 
        // Read source file
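        std::ifstream sourceFile("vector_add_kernel.cl");
        if (!sourceFile) {
            // Without this check a missing file yields an empty program source
            std::cerr << "Could not open vector_add_kernel.cl" << std::endl;
            return 1;
        }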
        std::string sourceCode(
            std::istreambuf_iterator<char>(sourceFile),
            (std::istreambuf_iterator<char>()));
        Program::Sources source(1, std::make_pair(sourceCode.c_str(), sourceCode.length()+1));
 
        // Make program of the source code in the context
        Program program = Program(context, source);
 
        // Build program for these specific devices
        program.build(devices);
 
        // Make kernel
        Kernel kernel(program, "vector_add");
 
        // Create memory buffers
        Buffer bufferA = Buffer(context, CL_MEM_READ_ONLY, LIST_SIZE * sizeof(int));
        Buffer bufferB = Buffer(context, CL_MEM_READ_ONLY, LIST_SIZE * sizeof(int));
        Buffer bufferC = Buffer(context, CL_MEM_WRITE_ONLY, LIST_SIZE * sizeof(int));
 
        // Copy lists A and B to the memory buffers
        queue.enqueueWriteBuffer(bufferA, CL_TRUE, 0, LIST_SIZE * sizeof(int), A);
        queue.enqueueWriteBuffer(bufferB, CL_TRUE, 0, LIST_SIZE * sizeof(int), B);
 
        // Set arguments to kernel
        kernel.setArg(0, bufferA);
        kernel.setArg(1, bufferB);
        kernel.setArg(2, bufferC);
 
        // Run the kernel on specific ND range
        NDRange global(LIST_SIZE);
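        // A work-group size of 1 always divides the global size but is usually
        // inefficient; passing NullRange instead lets the implementation choose
        NDRange local(1);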
        queue.enqueueNDRangeKernel(kernel, NullRange, global, local);
 
        // Read buffer C into a local list
        int *C = new int[LIST_SIZE];
        queue.enqueueReadBuffer(bufferC, CL_TRUE, 0, LIST_SIZE * sizeof(int), C);
 
        for(int i = 0; i < LIST_SIZE; i++)
            std::cout << A[i] << " + " << B[i] << " = " << C[i] << std::endl;

        // Release the result array
        delete[] C;
    } catch(const Error& error) {
        // what() names the failing OpenCL call, err() is the numeric status code
        std::cout << error.what() << "(" << error.err() << ")" << std::endl;
    }
 
    // Release the host input arrays
    delete[] A;
    delete[] B;
    return 0;
}
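
Printing all 1000 sums makes it hard to spot a wrong result. As a rough sketch, the result read back from the device can instead be compared against the same sum computed on the host. The helper name countMismatches is hypothetical and not part of the original example; it only assumes the three host arrays A, B and C from the listing above.

```cpp
#include <cstddef>

// Count the positions where C[i] differs from A[i] + B[i].
// Returns 0 when the device result matches the host computation.
// Hypothetical helper for checking the example's output.
std::size_t countMismatches(const int* A, const int* B, const int* C, std::size_t n) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (C[i] != A[i] + B[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling countMismatches(A, B, C, LIST_SIZE) right after enqueueReadBuffer should give 0 if the kernel ran correctly. Since A[i] = i and B[i] = LIST_SIZE - i, every entry of C is expected to equal LIST_SIZE.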