If used correctly, OpenCL images / textures can give you large speedups on GPUs. In this post, I’ll show you a very short example of how to use OpenCL to blur/smooth an image. The goal is to show how images/textures are used in OpenCL and the benefits of using them.

### How Gaussian blurring works

Blurring smooths an image, as shown at the top of this post. In practice, this is done by discrete convolution of the image with a mask. Gaussian blurring uses an approximation of the Gaussian distribution as the convolution mask. Discrete convolution performs the same set of instructions on each pixel and is thus ideal for execution on a GPU. Each pixel gets a new value that is a weighted average of its closest neighbors. Mathematically, the convolution of an image I with a Gaussian mask G for a specific $$\sigma$$ can be expressed as:

$$(I * G_\sigma)(\vec v) = \frac{1}{Z} \sum^N_{a=-N} \sum^N_{b=-N} I(\vec v+(a,b)) e^{-\frac{a^2+b^2}{2\sigma^2}}$$

where Z is a normalization constant equal to the sum of all the coefficients in the mask.

### OpenCL – Images and Textures

Most modern GPUs have a separate texture cache. These texture caches exist on GPUs because many video games and 3D applications use texture mapping to map images onto 3D objects to create a realistic 3D scene. Textures are simply images, either 1-, 2- or 3-dimensional. When a specific pixel in the texture is requested, the GPU will store that pixel and its neighbors in a special buffer close to where the actual calculations are performed. Unlike regular linear storage buffers, which only cache in one dimension, textures can cache neighboring data in 2 or 3 dimensions. Thus, when a pixel is requested, the neighboring pixels above and below, as well as those to the left and right, are cached.

Note that writing to a 3D texture from inside a kernel is not supported in OpenCL by default. It is enabled through an extension called cl_khr_3d_image_writes. AMD supports this extension, while NVIDIA does not.

Caching is one of the main features of textures, but other nice features are:

• Interpolation – Request a point in the image and get a linear approximation of that point computed from the closest neighboring pixels (enabled with the CLK_FILTER_LINEAR flag in the sampler).
• Data type conversion – The texture fetch units on the GPU can convert between different data types. Used through the suffixes of the read_image and write_image functions (f for float, i for integer and ui for unsigned integer).
• Normalized data types – These data types (CL_SNORM_INT8, CL_UNORM_INT8, CL_SNORM_INT16, CL_UNORM_INT16) can reduce the amount of memory used and transferred and may give you large speedups. Use them if you are working with floats in a normalized range (0.0 to 1.0 or -1.0 to 1.0) and accuracy is not that important. These data types store each float as an 8- or 16-bit integer in the texture, and the texture fetch unit converts it back to a float in the normalized range when requested.
• Out-of-bounds handling – Reading outside a texture will not cause a segmentation fault. Instead, you define in the sampler what the read_image functions should return (CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP, CLK_ADDRESS_NONE, ...).
• Channels – Each pixel in a texture can contain 1 to 4 channels. This is very useful if you want to pack several values into each pixel. The channels are defined when the image is created; ImageFormat(CL_RGBA, CL_FLOAT) is, for instance, a 4-channel (RGBA: red, green, blue, alpha) image of floats.

### The code

#### OpenCL Kernel Code

Below is the OpenCL code for the Gaussian blur kernel. The write_image{f|i|ui} functions are used to write to an image in a kernel and the read_image{f|i|ui} functions to read. Note that the read_image functions always return a vector of size 4, and since we only use one channel in this application, we retrieve the first component with read_imagef(..).x

Another thing to note is that you can only read OR write to an image in a kernel; you can't read from and write to the same image. If you want to use images in several consecutive kernel calls, I suggest using a dual buffering approach.

When reading images you have to define a sampler, which you can see in the first line of the code below. Normalized coordinates enable accessing the image using normalized floating-point coordinates (0.0 to 1.0). CLK_ADDRESS_CLAMP_TO_EDGE means that if a pixel outside the image is requested, the pixel value closest to the edge will be used. The last flag, CLK_FILTER_NEAREST, instructs the fetch unit to fetch the pixel closest to the given coordinate. The alternative is CLK_FILTER_LINEAR, which returns a linear approximation computed from the 4 (2D) or 8 (3D) closest pixels. For more info on samplers, see the specification.