## OpenCL-OpenGL interoperability problems on AMD GPUs and Linux

After experimenting with the OpenCL-OpenGL interoperability on AMD GPUs on Ubuntu Linux I got some cryptic error messages from X (see below). This happens both for the AMD APP samples like SimpleGL and my own OpenCL implementation of Marching Cubes.

Error message:

XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0"
after 28 requests (28 known processed) with 0 events remaining.


Or this message:

X Error of failed request:  BadMatch (invalid parameter attributes)
Major opcode of failed request:  160 (GLX)
Minor opcode of failed request:  5 (X_GLXMakeCurrent)
Serial number of failed request:  28
Current serial number in output stream:  28


The problem seems to be that the dynamic linker links to the wrong OpenGL libraries. When using OpenCL-OpenGL interoperability we want to use AMD's OpenGL implementation and not Mesa. To fix this, set the following environment variable before running your code:

export LD_LIBRARY_PATH=/usr/lib/fglrx

You can add this to your .bashrc file if you want it to be permanent.

## Level set segmentation on GPUs using OpenCL

Brain segmented from synthetic MR images (generated at BrainWeb) on the GPU using OpenCL and the Level Set method

The level set method is a mathematical technique for evolving contours in Cartesian grids such as images. The method works by considering a function $$\phi$$, called the level set function, which has one more dimension than the Cartesian grid we want to evolve the contour on. Thus, for a 2D image the level set function defines a 3D surface, while for a 3D volume the level set function is a 4D hypersurface. For each point (x, y, z) on the grid, it defines the height h from the surface to the grid at a given time t: $$h = \phi(x,y,z,t)$$

The actual contour is defined by the zero level set, which is the set of coordinates (x, y, z) where the level set function is zero:

$$\phi(x,y,z,t) = 0$$

To move the contour, the level set function is evolved in time according to the following partial differential equation:

$$\frac{\partial \phi}{\partial t} = -F|\nabla \phi|$$

F is called the speed function and defines how fast and in which direction the contour moves. The speed function can be tailored for any problem. In image segmentation it is usual to model the speed function to be high at coordinates where the image has a desired intensity and vice versa. To make the contour smooth and avoid leaking into surrounding regions, a curvature term ($$\kappa = \nabla \cdot \frac{\nabla \phi}{|\nabla \phi|}$$) is often included in the speed function. A popular choice of speed function for image segmentation is:

$$F = -\alpha (\epsilon - |T - I(x,y,z)|) + (1-\alpha)\kappa(x,y,z)$$

Here $$\alpha \in [0,1]$$ is a weighting parameter between the intensity and the curvature term. The parameters T and $$\epsilon$$ are used to drive the contour toward voxels with intensity in the range $$I \in [T-\epsilon,T+\epsilon]$$.
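As a rough illustration, the speed function above can be written as a small C++ function evaluated per voxel. This is only a sketch that transcribes the equation term by term; the function and parameter names are made up for this example and are not taken from the actual implementation.

```cpp
#include <cmath>

// Speed function F = -alpha * (epsilon - |T - I|) + (1 - alpha) * kappa,
// transcribed directly from the equation above (hypothetical helper).
//   intensity: image intensity I at the voxel
//   threshold: target intensity T
//   epsilon:   half-width of the accepted intensity range
//   alpha:     weight in [0,1] between the intensity and the curvature term
//   kappa:     curvature of the level set function at the voxel
double speedFunction(double intensity, double threshold, double epsilon,
                     double alpha, double kappa) {
    const double intensityTerm = epsilon - std::fabs(threshold - intensity);
    return -alpha * intensityTerm + (1.0 - alpha) * kappa;
}
```

For example, with T = 125, $$\epsilon = 40$$, $$\alpha = 0.5$$ and zero curvature, a voxel with intensity 125 gives F = -20, while a voxel with intensity 200 (outside the range) gives F = 17.5, so the sign of the intensity term flips at the borders of the range.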

Level set surface moving in the image plane. The red circles show the zero level set at various time steps. As time progresses, the surface is moved down through the image plane and the zero level set changes according to the shape of the surface.

The level set method is very computationally expensive because each voxel has to be updated for each iteration. However, each voxel can be updated in parallel using the same instructions, making level sets ideal for GPUs (see [2,3,4] for details on different GPU implementations). I have created a simple GPU accelerated version of level set volume segmentation using OpenCL. The implementation uses 3D textures on the GPU to reduce memory access latency. Read more on textures in OpenCL in my previous post on Gaussian Blur using OpenCL. If you want to further optimize the level set computation, you should look into the narrow band, sparse field or fast marching methods (see [1] for more details).

The level set gradient $$\nabla \phi$$ and the curvature $$\kappa$$ have to be approximated numerically. This can be done using the upwinding scheme.
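To make the idea of upwinding concrete, here is a minimal sketch of one explicit (forward Euler) time step of $$\frac{\partial \phi}{\partial t} = -F|\nabla \phi|$$ using first-order upwind differences. It is deliberately reduced to a 1D grid with unit spacing and runs on the CPU, so it is not the OpenCL kernel used in the implementation; the function name, the clamped borders and the time step handling are assumptions made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One forward Euler step of d(phi)/dt = -F * |grad(phi)| on a 1D grid with
// unit spacing, using first-order upwind differences (illustrative sketch).
std::vector<double> upwindStep(const std::vector<double>& phi,
                               const std::vector<double>& F, double dt) {
    const int n = static_cast<int>(phi.size());
    std::vector<double> next(phi);
    for (int i = 0; i < n; ++i) {
        // One-sided differences; neighbor indices are clamped at the borders
        const double backward = phi[i] - phi[std::max(i - 1, 0)];
        const double forward = phi[std::min(i + 1, n - 1)] - phi[i];

        // Keep only the parts of the differences that look "upwind"
        const double bPos = std::max(backward, 0.0), bNeg = std::min(backward, 0.0);
        const double fPos = std::max(forward, 0.0), fNeg = std::min(forward, 0.0);

        // Gradient magnitude to use where F > 0 and where F < 0, respectively
        const double gradPlus = std::sqrt(bPos * bPos + fNeg * fNeg);
        const double gradMinus = std::sqrt(bNeg * bNeg + fPos * fPos);

        next[i] = phi[i] - dt * (std::max(F[i], 0.0) * gradPlus +
                                 std::min(F[i], 0.0) * gradMinus);
    }
    return next;
}
```

With a signed distance function as input and a constant positive F, each step lowers $$\phi$$ on both sides of the contour by F times the time step, which moves the zero level set outward. In 3D the same one-sided differences are taken along y and z as well and summed under the square root.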

The level set function has to be initialized. It is common to initialize it to the signed distance transform, which gives the distance from each voxel to the initial contour. The signed distance is negative for voxels inside the initial contour and positive outside. If we use a spherical initial contour, the signed distance transform can easily be calculated in parallel for each voxel using the equation $$d = |\vec x - \vec c| - r$$ where $$\vec x$$ is the coordinate of the voxel, $$\vec c$$ is the position of the center and r is the radius.
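The spherical initialization can be sketched in a few lines of C++. This is a sequential CPU version for illustration only; the function name and the memory layout (x varies fastest) are assumptions for this example. Since every voxel is computed independently, the loop body maps directly to one work-item per voxel on the GPU.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Fill a sizeX * sizeY * sizeZ volume (x varies fastest) with the signed
// distance d = |x - c| - r to a sphere with center (cx, cy, cz) and radius r.
// Values are negative inside the sphere and positive outside.
std::vector<float> initSphereLevelSet(int sizeX, int sizeY, int sizeZ,
                                      float cx, float cy, float cz, float r) {
    std::vector<float> phi(static_cast<std::size_t>(sizeX) * sizeY * sizeZ);
    for (int z = 0; z < sizeZ; ++z) {
        for (int y = 0; y < sizeY; ++y) {
            for (int x = 0; x < sizeX; ++x) {
                const float dx = static_cast<float>(x) - cx;
                const float dy = static_cast<float>(y) - cy;
                const float dz = static_cast<float>(z) - cz;
                const std::size_t index =
                    x + static_cast<std::size_t>(sizeX) *
                            (y + static_cast<std::size_t>(sizeY) * z);
                phi[index] = std::sqrt(dx * dx + dy * dy + dz * dz) - r;
            }
        }
    }
    return phi;
}
```

For an 11x11x11 volume with the center at (5, 5, 5) and radius 2, the center voxel gets the value -2 and the voxel at (8, 5, 5) gets the value 1.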

The program uses the Simple Image Processing Library (SIPL) for loading, storing and displaying the volumes. This library is dependent on GTK 2.

    # Install dependencies (OpenCL has to be installed manually)
    sudo apt-get install libgtk2.0-dev

    # Download
    git clone git://github.com/smistad/OpenCL-Level-Set-Segmentation.git
    cd OpenCL-Level-Set-Segmentation
    git submodule init
    git submodule update

    # Compile and run
    cmake .
    make
    ./levelSetSeg example_data/mr_brain.mhd result.mhd 100 100 100 10 2000 125 40 0.05 125 255

## References

1. Level Set Methods and Fast Marching Methods by J.A. Sethian. Cambridge University Press
2. Rumpf, M., Strzodka, R. Level set segmentation in graphics hardware. Proceedings 2001 International Conference on Image Processing 1103–1106
3. Lefohn, A., Cates, J., & Whitaker, R. Interactive, GPU-based level sets for 3D segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2003. 564–572
4. Roberts, M., Packer, J., Sousa, M. C., & Mitchell, J. R. (2010). A Work-Efficient GPU Algorithm for Level Set Segmentation. Proceedings of the Conference on High Performance Graphics. 123–132.
5. BrainWeb. http://brainweb.bic.mni.mcgill.ca/brainweb/

## Memory-mapped files using the boost library

The objective of memory mapping files is to increase I/O performance. Memory mapping a file creates a pointer to a segment in virtual memory and the actual loading is performed by the operating system one page at a time. For large files, this can be much faster than using traditional methods in C such as fopen/fread/fwrite.

In this post, I show an example of how to use the boost iostreams library to create a memory mapped file that, unlike mmap, works for both Windows and Linux.

Start by installing the boost iostreams library. On Ubuntu this is done by installing the libboost-iostreams-dev package.

sudo apt-get install libboost-iostreams-dev

The example below will create a memory mapping of 1000000 integers for the file filename.raw. The integers will be available from the pointer called data.

    #include <boost/iostreams/device/mapped_file.hpp>
    #include <iostream>

    int main() {
        boost::iostreams::mapped_file_source file;
        int numberOfElements = 1000000;
        int numberOfBytes = numberOfElements*sizeof(int);
        file.open("filename.raw", numberOfBytes);

        // Check if file was successfully opened
        if(file.is_open()) {
            // Get pointer to the data (read-only, since this is a mapped_file_source)
            const int * data = reinterpret_cast<const int *>(file.data());

            // Do something with the data
            for(int i = 0; i < numberOfElements; i++)
                std::cout << data[i] << " ";

            // Remember to unmap the file
            file.close();
        } else {
            std::cout << "could not map the file filename.raw" << std::endl;
        }
    }
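The example expects a file called filename.raw that holds 1000000 integers. If you don't have such a file at hand, a minimal sketch for generating one with only the standard library could look like this. The function name is made up for this example, and the generated values (0, 1, 2, ...) are just placeholder data, not the contents of the sample file mentioned further down.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Write numberOfElements consecutive integers (0, 1, 2, ...) to a binary file
// in the machine's native int format. Returns true if the write succeeded.
bool writeRawIntegers(const std::string& filename, int numberOfElements) {
    std::vector<int> values(numberOfElements);
    for (int i = 0; i < numberOfElements; ++i)
        values[i] = i;

    std::ofstream out(filename.c_str(), std::ios::binary);
    if (!out)
        return false;

    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(int)));
    out.close();
    return !out.fail();
}
```

Calling writeRawIntegers("filename.raw", 1000000) before running the memory mapping example should then make it print the numbers 0 to 999999.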

Here is a minimal CMakeLists.txt file for compiling this example together with the boost iostreams library.

    cmake_minimum_required(VERSION 2.8)
    find_package(Boost COMPONENTS iostreams REQUIRED)

    add_executable(memory-map main.cpp)
    target_link_libraries(memory-map ${Boost_LIBRARIES})

As usual you can download/clone the code and the sample raw file from my GitHub page.

## GPU-based Gradient Vector Flow using OpenCL

Illustration of Gradient Vector Flow performed on an image. The colors represent the vector direction.

Gradient Vector Flow (GVF) is a feature-preserving diffusion of gradient information. It was originally introduced by Xu and Prince to drive snakes, or active contours, towards edges of interest in image segmentation. However, GVF is also used for detection of tubular structures and skeletonization.

I recently published an article in the Journal of Real-Time Image Processing entitled “Real-time gradient vector flow on GPUs using OpenCL” describing an optimized OpenCL implementation of Gradient Vector Flow (GVF) that runs on GPUs and CPUs for both 2D and 3D.

## Gaussian Blur using OpenCL and the built-in Images/Textures

If used correctly, OpenCL images / textures can give you large speedups on GPUs. In this post, I’ll show you a very short example of how to use OpenCL to blur/smooth an image. The goal is to show how images/textures are used in OpenCL and the benefits of using them.

## Measuring runtime in milliseconds using the C++ 11 chrono library

I have been playing around with the new C++ 11 standard. It includes a nice new library called chrono which includes some useful clocks and timers. Below is an example of some macros you can use to time your applications in milliseconds and print out the result. Timing can be turned off by removing the #define TIMING line. Remember to compile the program with C++11 (or C++0x) enabled. For GCC this should be: g++ main.cpp -std=c++0x

    #include <iostream>
    #include <chrono>
    #include <unistd.h> // sleep()

    #define TIMING

    #ifdef TIMING
    #define INIT_TIMER auto start = std::chrono::high_resolution_clock::now();
    #define START_TIMER start = std::chrono::high_resolution_clock::now();
    #define STOP_TIMER(name) std::cout << "RUNTIME of " << name << ": " << \
        std::chrono::duration_cast<std::chrono::milliseconds>( \
                std::chrono::high_resolution_clock::now()-start \
        ).count() << " ms " << std::endl;
    #else
    #define INIT_TIMER
    #define START_TIMER
    #define STOP_TIMER(name)
    #endif

    int main() {
        INIT_TIMER
        START_TIMER
        sleep(2);
        STOP_TIMER("sleeping for 2 seconds")
        START_TIMER
        long unsigned int b = 0;
        for(int i = 0; i < 10000000; i++) {
            b += i;
        }
        STOP_TIMER("some long loop")
    }

Example output:

    RUNTIME of sleeping for 2 seconds: 2000 ms
    RUNTIME of some long loop: 24 ms

## Getting started with Google Test (GTest) on Ubuntu

Google Test is a framework for writing C++ unit tests. In this short post, I explain how to set it up in Ubuntu. Start by installing the gtest development package:

    sudo apt-get install libgtest-dev

Note that this package only installs source files. You have to compile the code yourself to create the necessary library files. These source files should be located at /usr/src/gtest. Browse to this folder and use cmake to compile the library:

    sudo apt-get install cmake # install cmake
    cd /usr/src/gtest
    sudo cmake CMakeLists.txt
    sudo make

    # copy or symlink libgtest.a and libgtest_main.a to your /usr/lib folder
    sudo cp *.a /usr/lib

Let's say we now want to test the following simple squareRoot function:

    // whattotest.cpp
    #include <math.h>

    double squareRoot(const double a) {
        double b = sqrt(a);
        if(b != b) { // nan check
            return -1.0;
        }else{
            return b;
        }
    }

In the following code, we create two tests that test the function using a simple assertion. There exist many other assertion macros in the framework (see http://code.google.com/p/googletest/wiki/Primer#Assertions). The code contains a small main function that will run all of the tests automatically. Nice and simple!

    // tests.cpp
    #include "whattotest.cpp"
    #include <gtest/gtest.h>

    TEST(SquareRootTest, PositiveNos) {
        ASSERT_EQ(6, squareRoot(36.0));
        ASSERT_EQ(18.0, squareRoot(324.0));
        ASSERT_EQ(25.4, squareRoot(645.16));
        ASSERT_EQ(0, squareRoot(0.0));
    }

    TEST(SquareRootTest, NegativeNos) {
        ASSERT_EQ(-1.0, squareRoot(-15.0));
        ASSERT_EQ(-1.0, squareRoot(-0.2));
    }

    int main(int argc, char **argv) {
        testing::InitGoogleTest(&argc, argv);
        return RUN_ALL_TESTS();
    }

The next step is to compile the code. I’ve set up a small CMakeLists.txt file below to compile the tests. This file locates the Google Test library and links it with the test application. Note that we also have to link to the pthread library or the application won’t link.

    cmake_minimum_required(VERSION 2.6)

    # Locate GTest
    find_package(GTest REQUIRED)
    include_directories(${GTEST_INCLUDE_DIRS})

    # Link runTests with what we want to test and the GTest and pthread library
    add_executable(runTests tests.cpp)
    target_link_libraries(runTests ${GTEST_LIBRARIES} pthread)

Compile and run the tests:

    cmake CMakeLists.txt
    make
    ./runTests

## Simple Image Processing Library

I do a lot of image processing, both on images and on 3D images / volumes. There exist many image processing libraries out there. Some are big and some are small, but none seems to fit my taste. ITK is one of the major image processing libraries used in my field of research, but this library is, in my opinion, extremely cumbersome. And I can’t be the only one who thinks so, since an alternative called Simple ITK has been made. There exist many other image processing libraries that try to be simple to use, but most of them don’t allow you to do volume processing, which I do a lot of.

I want a library that allows me to quickly go from an algorithm concept to getting actual pictures on the screen so that I can quickly verify the results. So far I’ve been using Matlab for prototyping image processing algorithms, and it has worked quite well, but as I see it Matlab has two major problems: speed, and the fact that computation and GUI run in one thread.

Long story short, I’ve made my own Simple Image Processing Library (SIPL) which I now use in my research. I’ve added a short guide here on how to use and install it in case anybody else feels the same as I do and thinks this library could be of use to them as well. Also, this small library is still in development, so if you have any feedback, suggestions, comments or bug reports please let me know.

Main goals of the library:

• Simple and condensed – Easy to get from an algorithm concept to pictures on the screen
• GUI in separate thread – Display and explore images interactively while computation is still going on
• Cross-platform – Linux, Windows and Mac compatible

## Marching Cubes implementation using OpenCL and OpenGL

In a school project I recently created a fast implementation of Marching Cubes that uses OpenCL to extract surfaces from volumetric datasets and OpenGL to render the surfaces on screen. I wrote a paper together with my two supervisors about the implementation and presented it at the Joint Workshop on High Performance and Distributed Computing for Medical Imaging at the MICCAI 2011 conference. Our implementation achieved real-time speeds for volumes of sizes up to 512x512x512 on a standard GPU with 1GB memory. The paper entitled “Real-Time Surface Extraction and Visualization of Medical Images using OpenCL and GPUs” describing the implementation can be downloaded here. The source code of the implementation can be downloaded from my GitHub page.

## OpenCL C++ Utilities

I recently created a small utility library for OpenCL with C++. It consists of a set of functions based on the OpenCL C++ bindings that help with setting up an OpenCL context, compiling OpenCL code and viewing errors. I hope these functions can be useful for others and I’m planning on adding more utility functions in the future. Note that I haven’t tested it on all platforms yet. Feedback and comments are most welcome.