Simple C++ sample showing how to use OpenCL v1.2 on Windows/Linux/OSX with no 3rd party SDK installs

Overview

simple_opencl

This is a simple and practical C++ sample showing how to use OpenCL v1.2 on Windows/Linux/OSX with no 3rd party SDK installs required under Windows/OSX.

Unlike every other OpenCL example I've seen, this example demonstrates a bunch of things you would need to do in practice to build an app using OpenCL:

  • Importantly, no 3rd party SDK dependencies are required to compile/link under Windows. All required OpenCL headers and the two import .LIB files are in the "OpenCL" directory.
  • How to safely use OpenCL from multiple threads (by using a local, per-thread context, and creating your command queue/kernels on that context).
  • How to work around AMD driver serialization issues if you use OpenCL from multiple threads.
  • How to load your program kernel source code from either a file or an array in a C-style header.

Windows has strong OpenCL v1.2 support in the NVidia, AMD, and Intel drivers. In my testing, even brand new Windows AMD machines right out of the box with no updates have working OpenCL drivers. Some of OpenCL v1.2's strengths are its maturity, driver support, ease of use, and the fact that no large 3rd party SDKs or libraries are required to use it.

Here's a good introductory book on OpenCL. (This is the book I used to learn it.)

Building

Use "cmake .". Then under OSX/Linux, use "make".

Under Windows, load the generated .SLN with Visual Studio 2019/2022. All included headers/import libs are in the project, so no 3rd party SDKs are required. Be sure to right click on "simple_ocl" and select "Set as Startup Project" before running.

Under Linux, you will need a driver with OpenCL support, and the OpenCL headers/libraries. The easiest thing to do is to use the NVidia proprietary driver, then use "sudo apt-get install nvidia-cuda-toolkit". This page may help. You may also need to install "opencl-headers". Install and run the "clinfo" app to validate that your driver supports OpenCL. CMake will automatically find the OpenCL headers/libraries.

Under OSX (High Sierra), it just works for me. CMake handles finding the libs/headers.

Running

The "bin" directory contains the sample executable, "simple_ocl". Running it will generate a buffer of random values. The GPU kernel will modify this buffer, and the CPU will read it back and validate its contents.

You should see something like this (note the random numbers will likely be different for you):

OpenCL platform version: "OpenCL 3.0 CUDA 11.4.94"
Serializing OpenCL calls across threads: 0
OpenCL device initialized successfully
Using kernel source code from array in header src/ocl_kernels.h
OpenCL context initialized successfully
Running "process_buffer" kernel
Validation succeeded
Input/output buffer contents (first 16 bytes):
41 41
35 34
190 188
132 135
225 229
108 105
214 208
174 169
82 90
144 153
73 67
241 250
241 253
187 182
233 231
235 228

Design

This sample was derived from how we're using OpenCL in Basis Universal, our GPU texture interchange library/tool. OpenCL support is optional, so we placed all the OpenCL code in one .cpp file and exposed a simple C-style API to its functionality. When OpenCL is not compiled in, we use dummy functions for this API which always return false. We have one large kernel source file which exposes multiple kernels. We pass C-style structs, OpenCL buffers, and floats/ints/etc. to/from our kernels.
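
The optional-backend pattern described above could look roughly like the sketch below. The macro SKETCH_SUPPORT_OPENCL and the functions sketch_opencl_init(), sketch_opencl_process(), and process_with_fallback() are hypothetical names made up for illustration; they are not the names used in Basis Universal or in this sample. The CPU fallback is a placeholder that just copies the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical build switch: set to 1 when the OpenCL backend is compiled in.
#ifndef SKETCH_SUPPORT_OPENCL
#define SKETCH_SUPPORT_OPENCL 0
#endif

// C-style API seen by callers (hypothetical names). Every function reports
// success/failure with a bool so callers never need the OpenCL headers.
bool sketch_opencl_init();
bool sketch_opencl_process(const std::uint8_t* pSrc, std::uint8_t* pDst, std::size_t size);

#if SKETCH_SUPPORT_OPENCL
// The real implementations would live in the one .cpp file that includes the
// OpenCL headers. They are intentionally omitted from this sketch.
#else
// Dummy implementations: OpenCL is not compiled in, so every call fails.
bool sketch_opencl_init() { return false; }
bool sketch_opencl_process(const std::uint8_t*, std::uint8_t*, std::size_t) { return false; }
#endif

// Placeholder CPU path (a plain copy) so the sketch has something to fall back to.
static void cpu_process(const std::uint8_t* pSrc, std::uint8_t* pDst, std::size_t size)
{
    for (std::size_t i = 0; i < size; i++)
        pDst[i] = pSrc[i];
}

// Tries the GPU path first and falls back to the CPU path if it is unavailable.
// Returns true only if the GPU path was actually used.
bool process_with_fallback(const std::vector<std::uint8_t>& src, std::vector<std::uint8_t>& dst)
{
    dst.resize(src.size());

    static const bool s_gpu_ok = sketch_opencl_init();
    if (s_gpu_ok && sketch_opencl_process(src.data(), dst.data(), src.size()))
        return true;

    cpu_process(src.data(), dst.data(), src.size());
    return false;
}
```

Because the stubs share the real API's signatures, calling code compiles and behaves sensibly whether or not the OpenCL backend is built.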

simple_ocl_wrapper.h contains a basic C++ wrapper on top of the C OpenCL API. OpenCL does have its own standard C++ wrapper, but by writing your own you can control exactly how OpenCL is called, which features are exposed, and what C++ features are utilized by the wrapper. (Also, the entire point of this sample is how to directly use OpenCL with as few bloated libs/wrappers/SDK's/frameworks/etc. in between you and the API as possible.)

ocl_device.cpp/h uses this wrapper to create the OpenCL device. It exposes a simple C-style API that callers can use to initialize/deinitialize the device, and create/destroy per-thread contexts and kernels. Out of the box it supports a single kernel source code file (which can contain multiple kernels), which can be loaded either from disk or from a C-style array in a header file. On AMD drivers only, this code automatically serializes all calls made into the driver, to avoid race conditions in AMD's driver when OpenCL is called from multiple threads.
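
One way to serialize driver calls on a single vendor only is to wrap each call in a helper that takes a global mutex when a flag is set. The sketch below is an assumption about how this could be done, not the code in ocl_device.cpp: the names g_ocl_mutex, g_serialize_calls, configure_serialization(), and serialized_call() are hypothetical, and detecting AMD by substring match on a vendor string is also an assumption.

```cpp
#include <mutex>
#include <string>

// Hypothetical globals for this sketch (not the sample's actual names).
static std::mutex g_ocl_mutex;
static bool g_serialize_calls = false;

// Call once at device init, before any worker threads start, with the
// driver's vendor string.
// ASSUMPTION: AMD is detected by substring match; the sample may do this differently.
void configure_serialization(const std::string& vendor)
{
    g_serialize_calls =
        vendor.find("Advanced Micro Devices") != std::string::npos ||
        vendor.find("AMD") != std::string::npos;
}

// Runs f() and returns its result. The global mutex is held for the duration
// of the call only when serialization is enabled; otherwise there is no locking.
template <typename F>
auto serialized_call(F&& f) -> decltype(f())
{
    std::unique_lock<std::mutex> lock(g_ocl_mutex, std::defer_lock);
    if (g_serialize_calls)
        lock.lock();
    return f();
}
```

Each OpenCL call would then be issued through serialized_call([&] { return ...; }). Note that std::mutex is not recursive, so in this sketch a wrapped call must not itself call serialized_call().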

simple_ocl.cpp uses the C-style API exposed by ocl_device.h. It creates a byte buffer of random numbers, then calls opencl_process_buffer() in ocl_device.cpp, which processes this buffer into an output buffer.

Modifying the kernel source code

By default, this sample compiles the OpenCL program from an array of text in src/ocl_kernels.h. This header file was created using the xxd tool with the -i option from the kernel source code file located under bin/ocl_kernels.cl. If you want the sample to always load the kernel source code from the "bin" directory instead, set OCL_USE_KERNELS_HEADER to 0 in src/ocl_device.cpp.
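
For reference, xxd -i emits an unsigned char array plus an unsigned int length variable, and the array is not NUL-terminated. The sketch below shows one way to turn either that array or a file on disk into a source string. The array sketch_kernels_cl is a tiny stand-in (the real array name depends on the file name passed to xxd), and get_kernel_source() with its runtime use_header flag is a hypothetical helper; the sample itself selects the path at compile time with OCL_USE_KERNELS_HEADER.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Stand-in for the output of `xxd -i`: here, the 6 bytes of "// cl\n".
// Hypothetical name; note the data is NOT NUL-terminated.
static const unsigned char sketch_kernels_cl[] = { 0x2f, 0x2f, 0x20, 0x63, 0x6c, 0x0a };
static const unsigned int sketch_kernels_cl_len = 6;

// Builds the source string from the embedded array, using the explicit length
// because there is no terminating NUL to rely on.
static std::string kernel_source_from_header()
{
    return std::string(reinterpret_cast<const char*>(sketch_kernels_cl), sketch_kernels_cl_len);
}

// Reads the whole file into src. Returns false if the file cannot be opened.
static bool kernel_source_from_file(const std::string& filename, std::string& src)
{
    std::ifstream file(filename, std::ios::binary);
    if (!file)
        return false;

    std::ostringstream contents;
    contents << file.rdbuf();
    src = contents.str();
    return true;
}

// Hypothetical helper: picks the embedded array or the on-disk file.
bool get_kernel_source(bool use_header, const std::string& filename, std::string& src)
{
    if (use_header)
    {
        src = kernel_source_from_header();
        return true;
    }
    return kernel_source_from_file(filename, src);
}
```

The resulting string's data pointer and length can then be handed to clCreateProgramWithSource(), which accepts explicit lengths and therefore does not require NUL-terminated source.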

Owner
Rich Geldreich