Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

Overview

Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common algorithms such as scan, reduce, transform, and sort. The Bolt interface was modeled on the C++ Standard Template Library (STL). Developers familiar with the STL will recognize many of the Bolt APIs and customization techniques.

The primary goal of Bolt is to make it easier for developers to utilize the inherent performance and power efficiency benefits of heterogeneous computing. It has interfaces that are easy to use, and has comprehensive documentation for the library routines, memory management, control interfaces, and host/device code sharing.

Compared to writing the equivalent functionality in OpenCL™, you’ll find that Bolt requires significantly fewer lines of code and less developer effort. Bolt is designed to provide a standard way to develop an application that can execute on either a regular CPU or any available OpenCL™-capable accelerated compute unit, with a single code path.

Here's a link to our BOLT wiki page.

Prerequisites

Windows

  1. Visual Studio 2010 or later (Visual Studio 2012 for C++ AMP)
  2. Tested with 32/64 bit Windows® 7/8 and Windows® Blue
  3. CMake 2.8.10
  4. TBB 4.1 Update 1 or above (for the multicore CPU path only). See Building Bolt with TBB.
  5. APP SDK 2.8 or later.

Note: If both Visual Studio 2012 and Visual Studio 2010 are installed, Visual Studio 2010 should be updated to SP1.

Linux

  1. GCC 4.6.3 or above
  2. Tested with OpenSuse 12.3, RHEL 6.4 64bit, RHEL 6.3 32bit, Ubuntu 13.04
  3. CMake 2.8.10
  4. TBB 4.1 Update 1 or above (for the multicore CPU path only). See Building Bolt with TBB.
  5. APP SDK 2.8 or later.

Note: The pre-built Bolt binaries for Linux are built with GCC 4.7.3. Use the same GCC version to build your application; otherwise, build Bolt from source with GCC 4.6.3 or higher.

Catalyst™ package

The latest Catalyst driver contains the most recent OpenCL runtime. The recommended Catalyst package is the latest 13.11 Beta driver.

Versions 13.4 and higher are supported.

Note: 13.9 is not supported.

Supported Devices

AMD APU Family with AMD Radeon™ HD Graphics

  • A-Series
  • C-Series
  • E-Series
  • E2-Series
  • G-Series
  • R-Series

AMD Radeon™ HD Graphics

  • 7900 Series (7990, 7970, 7950)
  • 7800 Series (7870, 7850)
  • 7700 Series (7770, 7750)

AMD Radeon™ HD Graphics

  • 6900 Series (6990, 6970, 6950)
  • 6800 Series (6870, 6850)
  • 6700 Series (6790, 6770, 6750)
  • 6600 Series (6670)
  • 6500 Series (6570)
  • 6400 Series (6450)
  • 6xxxM Series

AMD Radeon™ Rx 2xx Graphics

  • R9 2xx Series
  • R8 2xx Series
  • R7 2xx Series

AMD FirePro™ Professional Graphics

  • W9100

Compiled binary packages for Windows (zip packages) for Bolt may be downloaded from the Bolt landing page hosted on AMD's Developer Central website.

Examples

The simple example below shows how to use Bolt to sort a random array of 8192 integers.

#include <bolt/cl/sort.h>
#include <vector>
#include <algorithm>
#include <cstdlib>

int main ()
{
    // generate random data (on host)
    size_t length = 8192;
    std::vector<int> a (length);
    std::generate ( a.begin (), a.end(), rand );

    // sort, run on best device in the platform
    bolt::cl::sort(a.begin(), a.end());
    return 0;
}

The code will be familiar to programmers who have used the C++ Standard Template Library; the difference is the include file (bolt/cl/sort.h) and the bolt::cl namespace before the sort call. Bolt developers do not need to learn a new device-specific programming model to leverage the power and performance advantages of heterogeneous computing.
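
As a rough illustration of that drop-in relationship, the sketch below keeps a single call site and selects either std::sort or bolt::cl::sort with a preprocessor macro. USE_BOLT and sort_values are hypothetical names chosen for this example and are not part of Bolt; with the macro undefined, the code builds with only the standard library.

```cpp
#include <algorithm>
#include <vector>

#ifdef USE_BOLT                 // hypothetical macro, not defined by Bolt
#include <bolt/cl/sort.h>
#endif

// Hypothetical helper: one call site, two possible back ends.
void sort_values(std::vector<int>& values)
{
#ifdef USE_BOLT
    // Bolt path: same iterator-pair interface as the standard algorithm.
    bolt::cl::sort(values.begin(), values.end());
#else
    // Standard library path, used when USE_BOLT is not defined.
    std::sort(values.begin(), values.end());
#endif
}
```

Calling sort_values on a std::vector<int> should leave it in ascending order with either back end, which can be checked with std::is_sorted.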

#include <bolt/cl/device_vector.h>
#include <bolt/cl/scan.h>
#include <vector>
#include <numeric>

int main()
{
  size_t length = 1024;
  // Create device_vector and initialize it to 1
  bolt::cl::device_vector< int > boltInput( length, 1 );

  // Calculate the inclusive_scan of the device_vector
  bolt::cl::inclusive_scan(boltInput.begin(),boltInput.end(),boltInput.begin( ) );

  // Create an std vector and initialize it to 1
  std::vector<int> stdInput( length, 1 );
 
  // Calculate the inclusive_scan of the std vector
  bolt::cl::inclusive_scan(stdInput.begin( ),stdInput.end( ),stdInput.begin( ) );
  return 0;
}

This example shows how Bolt simplifies management of heterogeneous memory. The creation and destruction of device resident memory is abstracted inside of the bolt::cl::device_vector<> class, which provides an interface familiar to nearly all C++ programmers. All of Bolt’s provided algorithms can take either the normal std::vector or the bolt::cl::device_vector<> class, which allows the user to control when and where memory is transferred between host and device to optimize performance.
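
To make the expected result concrete: an inclusive scan replaces each element with the sum of itself and all preceding elements, so, assuming the default addition operator, a vector of 1024 ones becomes 1, 2, ..., 1024. The host-only sketch below computes that reference result with std::partial_sum. The name reference_inclusive_scan is hypothetical, and the function does not call Bolt; it is only a way to check a scan result on the host.

```cpp
#include <numeric>
#include <vector>

// Host-only reference for an inclusive scan (prefix sum).
// Uses the standard library only; this is not a Bolt call.
std::vector<int> reference_inclusive_scan(const std::vector<int>& input)
{
    std::vector<int> output(input.size());
    // output[i] = input[0] + input[1] + ... + input[i]
    std::partial_sum(input.begin(), input.end(), output.begin());
    return output;
}
```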

Copyright and Licensing information

© 2012,2014 Advanced Micro Devices, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Issues
  • scatter_if in bolt::amp not possible?

    Hi, I'm studying Bolt and wanted to implement an example program that needs scatter_if operation (http://thrust.github.io/doc/group__scattering.html#ga1079bc05bcb3d4b5080f1e07444fee37). I started to port thrust scatter_if code, which uses permutation_iterator but came across this (https://groups.google.com/forum/#!topic/thrust-users/Xe2JkFy_hUk). The Google Group post claims that permutation_iterator in AMP kernel is not possible because of the restriction AMP put on the use of pointer in kernel (ie. restrict(amp)). Is this true? If so, is it possible to implement permutation_iterator in bolt::cl?

    BTW, Bolt forum in AMD Dev Central does not seem to work correctly. It is set to private and my post there does not seem to go through. :(

    enhancement 
    opened by briansp2020 5
  • Differences in develop and master branch.

    I noticed that there are some differences between develop branch and master/v1.0 branch. Is that intentional? Most of the differences are minor documentation differences that probably won't affect much. But it seems like something was missed when syncing v1.0/master/develop branches in preparation of v1.0 release.

    I'm just getting started with the Git way of doing things. I've only been using SVN. So, it's possible that I'm missing something...

    Also, to submit an entry for Bolt Sample Code Contest, I should open a pull request to develop branch, right? Or does it not matter?

    question 
    opened by briansp2020 4
  • Fix various CMake issues when building on Linux

    The following commits fix various issues when building Bolt on Linux:

    • Do not enable amp build by default unless VS compiler is used
    • Correctly call bootstrap and b2 when building boost
    • Correct some filename cases

    So far, compilation still fails with g++-4.7, but at least compilation starts.

    opened by mdlh 3
  • Missing Linux installation instructions

    After downloading the binary tarball for Linux and unpacking it, I find a directory with some stuff in it, but no instructions on installation. For example, should one copy include, lib and lib64 to /usr/local? Or is it intended that Bolt applications should set up -I and -L compiler flags to wherever one unpacked Bolt? Is any special care needed if one already has Boost (and no doubt a different version of Boost) installed to avoid version conflicts?

    opened by bmerry 2
  • Style checker to use for Bolt

    Does Bolt have a recommended style checker to use? I noticed that even though Bolt has coding style guideline, not all of it is used/enforced. In particular, use of tab character and indentation seem wrong and seems to be different depending who checks in the code.

    The guideline states "Use only spaces, and indent 2 spaces at a time", but most of the code appears to be indented with 4 spaces and uses tab characters as well as spaces.

    opened by briansp2020 2
  • Bolt_1.2: bolt:cl::sort is hanging for higher odd buffer sizes with 1000 iterations.

    Running bolt::cl::sort for 1000 iterations hangs for float and double data when the buffer size is a large odd power of two, such as 2^23 or 2^25. There are no issues with sizes that are not powers of two, or with even powers of two such as 2^24 and 2^26.

    opened by jhkumar 1
  • Bolt1.2: Some APP SDK samples in 2.9 are failing to build with respect to bolt 1.2 package.

    When we try to build the Bolt samples of the AMD APP SDK by linking to the Bolt 1.2 package, two samples (BoxFilterSAT and Stockdataanalysis) fail to build with compilation errors.

    opened by jhkumar 1
  • Problem with gtest download URL in cmake file

    The download URL for gtest in superbuild/ExternalGtest.cmake is not working. It begins with "https://...", after deleting the "s" character, it works fine.

    opened by pmarcelll 1
  • Develop

    I have added the TBB exceptional code path and restructured the code for the serial, TBB, and default (OpenCL) paths.

    I have added the appropriate Google Test cases so that the code paths are exercised exhaustively.

    Ensured that no line exceeds 120 columns and replaced tabs with 4 spaces.

    Reported some issues in the AMP routines.

    opened by rkskvk 1
  • bolt 1.2, typo in transform_reduce.inl

    Bolt 1.2, file include/bolt/cl/detail/transform_reduce.inl, lines 446-447:

    dblog->CodePathTaken(BOLTLOG::BOLT_TRANSFORMREDUCE,BOLTLOG::BOLT_MULTICORE_CPU,"

    ::Transform_Reduce::MULTICORE_CPU");

    The opening and closing string markers (") are on two different lines, which makes the compiler issue unnecessary warnings.

    Z Koza

    opened by zkoza 0
  • Bolt1.2: Ubuntu 32bit: gcc4.8.1: std::stable_sort api compilation failure

    The failure occurs when a std::sort function is called on a device_vector, that is, when the function runs on the CPU while the data resides on the GPU. This works fine on Windows 32/64-bit and Linux 64-bit. We see the failure only on 32-bit Linux, which may be due to a compiler restriction.

    opened by jhkumar 0
  • throw opencl kernel compile issue when run test opencl case for example  clBolt.Test.StableSort

    Hi, I cloned the Bolt code and compiled it on ROCm 1.9 with opencl-runtime. The project builds successfully with the cmake command "cmake -DBOOST_LIBRARYDIR=/home/qcxie/software/boost_1_65_1/stage/lib -DBOOST_ROOT=/home/qcxie/software/boost_1_65_1 -DGTEST_ROOT=/home/qcxie/software/boost_1_65_1 -DCMAKE_BUILD_TYPE=Debug -DBolt_BUILD64=1 -DCMAKE_CXX_FLAGS="-std =c++14 -fpermissive -I /opt/rocm/opencl/include -L/opt/rocm/opencl/lib/x86_64 -lOpenCL" ../", but running the test cases throws OpenCL kernel compile errors. For example, clBolt.Test.StableSort fails with " error: unknown type name 'namespace' namespace bolt { namespace cl { ". How should I configure the build, or which build-program options should I set, to fix this issue? Thanks very much.

    opened by xqch1983 0
  • Fixed "%d , gx " console spam.

    It looks like there was an accidental printf leftover from debugging. The problem is evident in the MonteCarloPI sample and is fixed by removing the printf.

    opened by jhoffman0x 0
  • unable to input bolt::cl::transform_iterator into bolt::cl::copy

    #include <iostream>
    #include <vector>
    #include <bolt/cl/iterator/counting_iterator.h>
    #include <bolt/cl/iterator/transform_iterator.h>
    #include <bolt/cl/functional.h>
    #include <bolt/cl/device_vector.h>
    #include <bolt/cl/copy.h>
    
    BOLT_FUNCTOR(GetSquare,
    struct GetSquare
    {
    public:
        int operator()(const int& globalId) const
        {
            return globalId*globalId;
        }
    };);
    
    int main()
    {
        const std::size_t n=10;
    
        bolt::cl::control ctrl = bolt::cl::control::getDefault();
    
        bolt::cl::device_vector<int> debug(n);
    
        auto globalId = bolt::cl::make_counting_iterator(0);
    
        // This is OK
        // bolt::cl::transform(globalId, globalId + n, debug.begin(), GetSquare());
    
        // This causes compilation error
        auto square = bolt::cl::make_transform_iterator(globalId, GetSquare());
        bolt::cl::copy(square, square + n, debug.begin());
    
        for(int i = 0; i < n; i++)
        {
            std::cout << i << ": " << debug[i] << std::endl;
        }
    
        return 0;
    }
    

    The problem seems to be that bolt::cl::transform_iterator provides getContainer() only as a member template that requires an explicit template argument

    template<typename Container >
    Container& getContainer() const
    {
        return this->base().getContainer( );
    }
    

    but bolt::cl::copy needs ITERATOR::getContainer() without any template type

    V_OPENCL( kernels[whichKernel].setArg( 0, first.getContainer().getBuffer()), "Error setArg kernels[ 0 ]" );
    

    This can be solved with C++11

    auto getContainer() const -> decltype(base().getContainer())
    {
        return this->base().getContainer();
    }
    

    I have no idea how to do this in C++03 (boost::result_of?).

    opened by aokomoriuta 0
  • Don't default to 32 bit builds.

    Linux systems do not have 32 bit headers and libraries installed by default. Debugging the errors that arise because of this causes a lot of overhead on the user's end.

    opened by pavanky 0
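
The transform_iterator issue above proposes a C++11 trailing return type so that an iterator adaptor can forward getContainer() without a template argument. The sketch below shows that technique in isolation. Container, BaseIterator, and ForwardingIterator are hypothetical stand-ins written for this example; they are not Bolt classes, and this is not a patch for Bolt itself.

```cpp
#include <utility>

// Hypothetical stand-in for a device container.
struct Container { int id; };

// Hypothetical stand-in for an iterator that knows its container.
struct BaseIterator {
    Container* owner;
    Container& getContainer() const { return *owner; }
};

// Hypothetical adaptor: forwards getContainer() with no template parameter.
// The return type is deduced from the wrapped iterator via decltype.
template <typename Base>
class ForwardingIterator {
public:
    explicit ForwardingIterator(Base b) : base_(b) {}

    const Base& base() const { return base_; }

    auto getContainer() const
        -> decltype(std::declval<const Base&>().getContainer())
    {
        return base_.getContainer();
    }

private:
    Base base_;
};
```

Because decltype is applied to a call expression that returns a reference, the adaptor returns a reference to the same container object as the wrapped iterator, so a caller can write it.getContainer() without naming the container type.
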
Releases(v1.3GA)
  • v1.3GA(Dec 9, 2014)

    Bolt 1.3 was released on 9th of Dec, 2014. Bolt was made Open Source on GitHub with Apache License with 1.0 Beta. For Bolt related documentation go to the Bolt wiki page. To download the Bolt Binaries go to Bolt Landing Page.

    This release contains the following features:

    1. The following new functions with Serial, TBB (Multicore) and C++ AMP paths for the amp namespace: copy_if, find, find_if, find_if_not, transform_if, for_each, for_each_n, replace, replace_if, replace_copy, replace_copy_if, remove, remove_if, remove_copy, remove_copy_if, unique, unique_copy, all_of, any_of, none_of.

    2. Performance optimizations of key routines (scan and sort family) for OpenCL backend.

    3. Transform Iterator support for AMP backend for all functions.

    4. Support for OpenCL CPU device Command Queue.

    5. List of Supported Functions

    6. List of Known Issues

    7. Bug Fixes.

    Source code(tar.gz)
    Source code(zip)
  • v1.3Alpha(Nov 25, 2014)

    Bolt 1.3 Alpha was released on 25th of November, 2014. Bolt was made Open Source on GitHub with Apache License with 1.0 Beta. For Bolt related documentation go to the Bolt wiki page. To download the Bolt Binaries go to Bolt Landing Page.

    This release contains the following features:

    1. Performance optimizations of key routines on OpenCL backend.
    2. New routines added for AMP backend – for_each, copy_if, transform_if.
    3. Transform Iterator support for AMP backend.
    4. Bug Fixes.
    5. Support for OpenCL CPU device Command Queue.
    Source code(tar.gz)
    Source code(zip)
  • v1.2GA(Jul 2, 2014)

    Bolt 1.2 was released on 2nd of July, 2014. Bolt was made Open Source on GitHub with Apache License with 1.0 Beta. For Bolt related documentation go to the Bolt wiki page. To download the Bolt Binaries go to Bolt Landing Page.

    This release contains the following features:

    1. Bolt C++ AMP function parity with Bolt OpenCL/MultiCore/Serial path. Now we have C++ AMP, OpenCL, MultiCore and Serial path for all routines. New C++ AMP functions:

    • constant_iterator
    • copy
    • copy_n
    • counting_iterator
    • inclusive_scan_by_key
    • exclusive_scan_by_key
    • fill
    • fill_n
    • generate
    • generate_n
    • inner_product
    • max_element
    • min_element
    • reduce_by_key
    • sort_by_key
    • stable_sort
    • stable_sort_by_key
    • transform_exclusive_scan
    • transform_inclusive_scan
    • binary_search
    • merge
    • scatter
    • scatter_if
    • gather
    • gather_if

    2. Added transform iterator support for OpenCL path for all Bolt routines.

    3. Added permutation iterator support for C++ AMP path for all Bolt routines.

    4. Performance optimizations for Reduce, Scan and Sort families.

    5. List of Supported Functions

    6. List of Known Issues

    7. Bug Fixes.

    Source code(tar.gz)
    Source code(zip)
  • v1.1GA(May 29, 2014)

    Bolt 1.1 is released on 11th of November, 2013. Bolt was made Open Source on GitHub with Apache License with 1.0 Beta. For Bolt related documentation go to the Bolt wiki page. To download the Bolt Binaries go to Bolt Landing Page

    This release contains the following features:

    1. Linux support for GCC 4.6 and above.

    2. Performance optimizations for the routines below on the OpenCL path.

    • transform_scan (inclusive/exclusive)
    • reduce, transform_reduce, min, max, count, count_if
    • reduce_by_key
    • sort and stable_sort with ints and unsigned ints as data types.
    • sort_by_key and stable_sort_by_key with ints and unsigned ints as keys.

    3. Added the following new functions for the OpenCL, TBB and Serial paths:

    • binary_search
    • scatter
    • scatter_if
    • gather
    • gather_if
    • merge

    4. Added MultiCore code path for all routines including new ones. Now we have OpenCL, MultiCore and Serial path for all routines.

    5. Bug Fixes.

    6. Debug Log facility to determine the executed code path.

    Source code(tar.gz)
    Source code(zip)
  • v1.0GA(Jul 9, 2013)

    The current Bolt release is 1.0 GA, which was released on 9th of July, 2013. Bolt was made Open Source on GitHub with Apache License with 1.0 Beta. This is the second release after the 1.0 Beta drop. For Bolt related documentation go to the Bolt wiki page. To download the Bolt Binaries go to Bolt Landing Page.

    This release contains the following features:

    • Serial code paths for all routines including routines which do not have a TBB implementation.
    • Performance optimizations for Scan and scan by key, Sort and Stable Sort routines. (Up to 45% for Scan routines for certain data sizes)
    • Provided Offset support to all the routines for both the source and destination iterators.
    • Added support for “Sort by Key” routines to work with non-power-of-2 buffer sizes in the OpenCL path, using merge sort.
    • Added support for AMP Sort with non-power-of-2 buffer sizes.
    • Moved the TBB code to a separate TBB folder so that there is no duplication of code in OpenCL and AMP.
    • Bug Fixes.
    Source code(tar.gz)
    Source code(zip)
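
The release notes above list scatter, scatter_if, gather and gather_if among the added routines. For readers unfamiliar with these operations, the serial host-side sketch below spells out their usual Thrust-style meaning, as described in the Thrust documentation that the scatter_if issue links to. That Bolt follows exactly these semantics is an assumption; reference_scatter_if and reference_gather are hypothetical helper names, and neither calls Bolt.

```cpp
#include <cstddef>
#include <vector>

// Serial reference: where stencil[i] is nonzero, copy input[i] to output[map[i]].
// The caller must ensure every used map[i] is a valid index into output.
void reference_scatter_if(const std::vector<int>& input,
                          const std::vector<std::size_t>& map,
                          const std::vector<int>& stencil,
                          std::vector<int>& output)
{
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (stencil[i] != 0) {
            output[map[i]] = input[i];
        }
    }
}

// Serial reference: output[i] = input[map[i]].
// The caller must ensure every map[i] is a valid index into input.
std::vector<int> reference_gather(const std::vector<std::size_t>& map,
                                  const std::vector<int>& input)
{
    std::vector<int> output(map.size());
    for (std::size_t i = 0; i < map.size(); ++i) {
        output[i] = input[map[i]];
    }
    return output;
}
```

Scatter writes through the index map and gather reads through it, so gathering with the same map reverses a scatter whenever the map is a permutation.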