A C++ GPU Computing Library for OpenCL

Overview

Boost.Compute

Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL.

The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers.

On top of the core library is a generic, STL-like interface providing common algorithms (e.g. transform(), accumulate(), sort()) along with common containers (e.g. vector<T>, flat_set<T>). It also features a number of extensions including parallel-computing algorithms (e.g. exclusive_scan(), scatter(), reduce()) and a number of fancy iterators (e.g. transform_iterator<>, permutation_iterator<>, zip_iterator<>).

The full documentation is available at http://boostorg.github.io/compute/.

Example

The following example shows how to sort a vector of floats on the GPU:

#include <vector>
#include <algorithm>
#include <cstdlib> // for rand()
#include <boost/compute.hpp>

namespace compute = boost::compute;

int main()
{
    // get the default compute device
    compute::device gpu = compute::system::default_device();

    // create a compute context and command queue
    compute::context ctx(gpu);
    compute::command_queue queue(ctx, gpu);

    // generate random numbers on the host
    std::vector<float> host_vector(1000000);
    std::generate(host_vector.begin(), host_vector.end(), rand);

    // create vector on the device
    compute::vector<float> device_vector(1000000, ctx);

    // copy data to the device
    compute::copy(
        host_vector.begin(), host_vector.end(), device_vector.begin(), queue
    );

    // sort data on the device
    compute::sort(
        device_vector.begin(), device_vector.end(), queue
    );

    // copy data back to the host
    compute::copy(
        device_vector.begin(), device_vector.end(), host_vector.begin(), queue
    );

    return 0;
}

Boost.Compute is a header-only library, so it does not need to be built or linked itself; only the OpenCL library must be linked. The example above can be compiled with:

g++ -I/path/to/compute/include sort.cpp -lOpenCL

More examples can be found in the tutorial and under the examples directory.

Support

Questions about the library (both usage and development) can be posted to the mailing list.

Bugs and feature requests can be reported through the issue tracker.

Also feel free to send me an email with any problems, questions, or feedback.

Help Wanted

The Boost.Compute project is currently looking for additional developers with interest in parallel computing.

Please send an email to Kyle Lutz ([email protected]) for more information.

Comments
  • How to implement `maskedSelect` using boost.compute, in opencl?

    Hi Karl,

    I need to port maskedSelect from CUDA to OpenCL. Currently, the code uses Thrust, i.e. https://github.com/torch/cutorch/blob/master/lib/THC/generic/THCTensorMasked.cu#L116. Any thoughts on how to write a maskedSelect in Boost.Compute, using OpenCL?

    Hugh

    opened by hughperkins 40
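
The excerpt does not include an answer to this question. As a rough host-side sketch of one common way to structure such a selection, the code below expresses maskedSelect as an exclusive scan of the mask followed by a scatter, the two primitives the README lists as exclusive_scan() and scatter(). It uses only the standard library and is not Boost.Compute code; the function name masked_select and the unsigned char mask type are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for maskedSelect: keep values[i] wherever mask[i] != 0.
// Hypothetical helper, written as the two data-parallel steps a device
// version could use (exclusive scan, then scatter).
std::vector<float> masked_select(const std::vector<float>& values,
                                 const std::vector<unsigned char>& mask)
{
    assert(values.size() == mask.size());

    // Step 1: exclusive scan of the mask. offsets[i] is the number of
    // selected elements before position i, i.e. the output slot of element i.
    std::vector<std::size_t> offsets(mask.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        offsets[i] = running;
        running += (mask[i] != 0) ? 1 : 0;
    }

    // Step 2: scatter. Each selected element writes to its own slot, so the
    // iterations do not depend on each other.
    std::vector<float> out(running);
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (mask[i] != 0) {
            out[offsets[i]] = values[i];
        }
    }
    return out;
}
```

The output keeps the input order of the selected elements. A result computed on the device can be compared against this reference when porting.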
  • Tests failing

    Hello Kyle,

    Here is the output of make test on NVIDIA and AMD cards:

    NVIDIA Tesla K20:

    Test project /home/demidov/work/opencl/compute/build
          Start  1: core.buffer
     1/69 Test  #1: core.buffer ............................   Passed    0.53 sec
          Start  2: core.command_queue
     2/69 Test  #2: core.command_queue .....................   Passed    0.88 sec
          Start  3: core.device
     3/69 Test  #3: core.device ............................   Passed    0.12 sec
          Start  4: core.image2d
     4/69 Test  #4: core.image2d ...........................   Passed    0.51 sec
          Start  5: core.image3d
     5/69 Test  #5: core.image3d ...........................   Passed    0.51 sec
          Start  6: core.image_sampler
     6/69 Test  #6: core.image_sampler .....................   Passed    0.51 sec
          Start  7: core.kernel
     7/69 Test  #7: core.kernel ............................***Failed    0.50 sec
          Start  8: core.program
     8/69 Test  #8: core.program ...........................   Passed    0.49 sec
          Start  9: core.system
     9/69 Test  #9: core.system ............................   Passed    0.12 sec
          Start 10: core.type_traits
    10/69 Test #10: core.type_traits .......................   Passed    0.01 sec
          Start 11: core.types
    11/69 Test #11: core.types .............................   Passed    0.00 sec
          Start 12: algorithm.accumulate
    12/69 Test #12: algorithm.accumulate ...................***Failed    0.51 sec
          Start 13: algorithm.adjacent_difference
    13/69 Test #13: algorithm.adjacent_difference ..........   Passed    0.50 sec
          Start 14: algorithm.adjacent_find
    14/69 Test #14: algorithm.adjacent_find ................   Passed    0.50 sec
          Start 15: algorithm.any_all_none_of
    15/69 Test #15: algorithm.any_all_none_of ..............   Passed    0.51 sec
          Start 16: algorithm.binary_search
    16/69 Test #16: algorithm.binary_search ................   Passed    0.51 sec
          Start 17: algorithm.copy
    17/69 Test #17: algorithm.copy .........................***Failed    0.51 sec
          Start 18: algorithm.copy_if
    18/69 Test #18: algorithm.copy_if ......................   Passed    0.50 sec
          Start 19: algorithm.count
    19/69 Test #19: algorithm.count ........................***Failed    0.51 sec
          Start 20: algorithm.equal
    20/69 Test #20: algorithm.equal ........................   Passed    0.50 sec
          Start 21: algorithm.equal_range
    21/69 Test #21: algorithm.equal_range ..................   Passed    0.53 sec
          Start 22: algorithm.extrema
    22/69 Test #22: algorithm.extrema ......................   Passed    0.49 sec
          Start 23: algorithm.fill
    23/69 Test #23: algorithm.fill .........................   Passed    0.51 sec
          Start 24: algorithm.find
    24/69 Test #24: algorithm.find .........................   Passed    0.50 sec
          Start 25: algorithm.for_each
    25/69 Test #25: algorithm.for_each .....................   Passed    0.50 sec
          Start 26: algorithm.gather
    26/69 Test #26: algorithm.gather .......................   Passed    0.50 sec
          Start 27: algorithm.generate
    27/69 Test #27: algorithm.generate .....................   Passed    0.49 sec
          Start 28: algorithm.histogram
    28/69 Test #28: algorithm.histogram ....................   Passed    0.51 sec
          Start 29: algorithm.inner_product
    29/69 Test #29: algorithm.inner_product ................   Passed    0.51 sec
          Start 30: algorithm.inplace_reduce
    30/69 Test #30: algorithm.inplace_reduce ...............***Failed    2.26 sec
          Start 31: algorithm.insertion_sort
    31/69 Test #31: algorithm.insertion_sort ...............***Failed    1.69 sec
          Start 32: algorithm.iota
    32/69 Test #32: algorithm.iota .........................   Passed    2.89 sec
          Start 33: algorithm.is_sorted
    33/69 Test #33: algorithm.is_sorted ....................   Passed    2.76 sec
          Start 34: algorithm.merge
    34/69 Test #34: algorithm.merge ........................   Passed    1.71 sec
          Start 35: algorithm.mismatch
    35/69 Test #35: algorithm.mismatch .....................   Passed    0.51 sec
          Start 36: algorithm.partial_sum
    36/69 Test #36: algorithm.partial_sum ..................   Passed    0.51 sec
          Start 37: algorithm.partition
    37/69 Test #37: algorithm.partition ....................***Failed    6.62 sec
          Start 38: algorithm.radix_sort
    38/69 Test #38: algorithm.radix_sort ...................***Failed    2.57 sec
          Start 39: algorithm.random_shuffle
    39/69 Test #39: algorithm.random_shuffle ...............   Passed    1.04 sec
          Start 40: algorithm.reduce
    40/69 Test #40: algorithm.reduce .......................***Failed    1.68 sec
          Start 41: algorithm.remove
    41/69 Test #41: algorithm.remove .......................   Passed    6.25 sec
          Start 42: algorithm.replace
    42/69 Test #42: algorithm.replace ......................   Passed    1.08 sec
          Start 43: algorithm.reverse
    43/69 Test #43: algorithm.reverse ......................   Passed    3.36 sec
          Start 44: algorithm.scan
    44/69 Test #44: algorithm.scan .........................***Failed    0.51 sec
          Start 45: algorithm.scatter
    45/69 Test #45: algorithm.scatter ......................***Failed    1.11 sec
          Start 46: algorithm.sort
    46/69 Test #46: algorithm.sort .........................***Failed   11.38 sec
          Start 47: algorithm.stable_sort
    47/69 Test #47: algorithm.stable_sort ..................   Passed    0.50 sec
          Start 48: algorithm.transform
    48/69 Test #48: algorithm.transform ....................***Failed    4.70 sec
          Start 49: algorithm.transform_reduce
    49/69 Test #49: algorithm.transform_reduce .............***Failed    1.13 sec
          Start 50: container.allocator
    50/69 Test #50: container.allocator ....................   Passed    0.50 sec
          Start 51: container.array
    51/69 Test #51: container.array ........................***Failed    0.51 sec
          Start 52: container.flat_map
    52/69 Test #52: container.flat_map .....................   Passed   19.66 sec
          Start 53: container.flat_set
    53/69 Test #53: container.flat_set .....................***Failed    1.70 sec
          Start 54: container.stack
    54/69 Test #54: container.stack ........................   Passed    1.63 sec
          Start 55: container.string
    55/69 Test #55: container.string .......................   Passed    0.50 sec
          Start 56: container.valarray
    56/69 Test #56: container.valarray .....................   Passed    0.51 sec
          Start 57: container.vector
    57/69 Test #57: container.vector .......................   Passed    7.94 sec
          Start 58: iterator.adjacent_transform_iterator
    58/69 Test #58: iterator.adjacent_transform_iterator ...   Passed    1.07 sec
          Start 59: iterator.zip_iterator
    59/69 Test #59: iterator.zip_iterator ..................   Passed    1.46 sec
          Start 60: random.mersenne_twister
    60/69 Test #60: random.mersenne_twister ................   Passed    1.15 sec
          Start 61: blas.gemm
    61/69 Test #61: blas.gemm ..............................***Failed    1.67 sec
          Start 62: blas.gemv
    62/69 Test #62: blas.gemv ..............................   Passed    1.07 sec
          Start 63: blas.iamax
    63/69 Test #63: blas.iamax .............................   Passed    1.08 sec
          Start 64: blas.norm2
    64/69 Test #64: blas.norm2 .............................   Passed    1.62 sec
          Start 65: ext.complex
    65/69 Test #65: ext.complex ............................   Passed    3.91 sec
          Start 66: ext.lambda
    66/69 Test #66: ext.lambda .............................   Passed    1.65 sec
          Start 67: ext.malloc
    67/69 Test #67: ext.malloc .............................   Passed    0.51 sec
          Start 68: ext.pair
    68/69 Test #68: ext.pair ...............................   Passed    2.82 sec
          Start 69: ext.tuple
    69/69 Test #69: ext.tuple ..............................***Failed    0.51 sec
    
    74% tests passed, 18 tests failed out of 69
    
    Total Test time (real) = 119.04 sec
    
    The following tests FAILED:
          7 - core.kernel (Failed)
         12 - algorithm.accumulate (Failed)
         17 - algorithm.copy (Failed)
         19 - algorithm.count (Failed)
         30 - algorithm.inplace_reduce (Failed)
         31 - algorithm.insertion_sort (Failed)
         37 - algorithm.partition (Failed)
         38 - algorithm.radix_sort (Failed)
         40 - algorithm.reduce (Failed)
         44 - algorithm.scan (Failed)
         45 - algorithm.scatter (Failed)
         46 - algorithm.sort (Failed)
         48 - algorithm.transform (Failed)
         49 - algorithm.transform_reduce (Failed)
         51 - container.array (Failed)
         53 - container.flat_set (Failed)
         61 - blas.gemm (Failed)
         69 - ext.tuple (Failed)
    

    AMD Capeverde:

    Test project /home/demidov/work/opencl/compute/build/test
          Start  1: core.buffer
     1/69 Test  #1: core.buffer ............................   Passed    2.12 sec
          Start  2: core.command_queue
     2/69 Test  #2: core.command_queue .....................   Passed    0.89 sec
          Start  3: core.device
     3/69 Test  #3: core.device ............................   Passed    0.29 sec
          Start  4: core.image2d
     4/69 Test  #4: core.image2d ...........................***Failed    0.88 sec
          Start  5: core.image3d
     5/69 Test  #5: core.image3d ...........................   Passed    0.29 sec
          Start  6: core.image_sampler
     6/69 Test  #6: core.image_sampler .....................   Passed    0.29 sec
          Start  7: core.kernel
     7/69 Test  #7: core.kernel ............................***Failed    0.29 sec
          Start  8: core.program
     8/69 Test  #8: core.program ...........................***Failed    0.29 sec
          Start  9: core.system
     9/69 Test  #9: core.system ............................   Passed    0.29 sec
          Start 10: core.type_traits
    10/69 Test #10: core.type_traits .......................   Passed    0.01 sec
          Start 11: core.types
    11/69 Test #11: core.types .............................   Passed    0.00 sec
          Start 12: algorithm.accumulate
    12/69 Test #12: algorithm.accumulate ...................***Failed    2.64 sec
          Start 13: algorithm.adjacent_difference
    13/69 Test #13: algorithm.adjacent_difference ..........***Failed    0.60 sec
          Start 14: algorithm.adjacent_find
    14/69 Test #14: algorithm.adjacent_find ................***Failed    0.90 sec
          Start 15: algorithm.any_all_none_of
    15/69 Test #15: algorithm.any_all_none_of ..............***Failed    0.90 sec
          Start 16: algorithm.binary_search
    16/69 Test #16: algorithm.binary_search ................***Failed    1.48 sec
          Start 17: algorithm.copy
    17/69 Test #17: algorithm.copy .........................***Failed   13.78 sec
          Start 18: algorithm.copy_if
    18/69 Test #18: algorithm.copy_if ......................***Failed    2.04 sec
          Start 19: algorithm.count
    19/69 Test #19: algorithm.count ........................***Failed    2.34 sec
          Start 20: algorithm.equal
    20/69 Test #20: algorithm.equal ........................***Failed    2.52 sec
          Start 21: algorithm.equal_range
    21/69 Test #21: algorithm.equal_range ..................***Failed    0.88 sec
          Start 22: algorithm.extrema
    22/69 Test #22: algorithm.extrema ......................***Failed    2.36 sec
          Start 23: algorithm.fill
    23/69 Test #23: algorithm.fill .........................***Failed    1.18 sec
          Start 24: algorithm.find
    24/69 Test #24: algorithm.find .........................***Failed    2.95 sec
          Start 25: algorithm.for_each
    25/69 Test #25: algorithm.for_each .....................***Failed    0.91 sec
          Start 26: algorithm.gather
    26/69 Test #26: algorithm.gather .......................***Failed    1.49 sec
          Start 27: algorithm.generate
    27/69 Test #27: algorithm.generate .....................***Failed    0.60 sec
          Start 28: algorithm.histogram
    28/69 Test #28: algorithm.histogram ....................***Failed    0.89 sec
          Start 29: algorithm.inner_product
    29/69 Test #29: algorithm.inner_product ................***Failed    1.18 sec
          Start 30: algorithm.inplace_reduce
    30/69 Test #30: algorithm.inplace_reduce ...............***Failed    1.77 sec
          Start 31: algorithm.insertion_sort
    31/69 Test #31: algorithm.insertion_sort ...............***Failed    6.14 sec
          Start 32: algorithm.iota
    32/69 Test #32: algorithm.iota .........................***Failed    0.88 sec
          Start 33: algorithm.is_sorted
    33/69 Test #33: algorithm.is_sorted ....................***Failed    2.04 sec
          Start 34: algorithm.merge
    34/69 Test #34: algorithm.merge ........................***Failed    1.18 sec
          Start 35: algorithm.mismatch
    35/69 Test #35: algorithm.mismatch .....................***Failed    1.19 sec
          Start 36: algorithm.partial_sum
    36/69 Test #36: algorithm.partial_sum ..................***Failed    0.88 sec
          Start 37: algorithm.partition
    37/69 Test #37: algorithm.partition ....................***Failed    2.66 sec
          Start 38: algorithm.radix_sort
    38/69 Test #38: algorithm.radix_sort ...................***Failed    6.13 sec
          Start 39: algorithm.random_shuffle
    39/69 Test #39: algorithm.random_shuffle ...............***Failed    1.79 sec
          Start 40: algorithm.reduce
    40/69 Test #40: algorithm.reduce .......................***Failed    3.51 sec
          Start 41: algorithm.remove
    41/69 Test #41: algorithm.remove .......................***Failed    0.90 sec
          Start 42: algorithm.replace
    42/69 Test #42: algorithm.replace ......................***Failed    0.89 sec
          Start 43: algorithm.reverse
    43/69 Test #43: algorithm.reverse ......................***Failed    1.18 sec
          Start 44: algorithm.scan
    44/69 Test #44: algorithm.scan .........................***Failed    3.23 sec
          Start 45: algorithm.scatter
    45/69 Test #45: algorithm.scatter ......................***Failed    1.76 sec
          Start 46: algorithm.sort
    46/69 Test #46: algorithm.sort .........................***Failed    9.35 sec
          Start 47: algorithm.stable_sort
    47/69 Test #47: algorithm.stable_sort ..................***Failed    0.90 sec
          Start 48: algorithm.transform
    48/69 Test #48: algorithm.transform ....................***Failed   12.02 sec
          Start 49: algorithm.transform_reduce
    49/69 Test #49: algorithm.transform_reduce .............***Failed    1.52 sec
          Start 50: container.allocator
    50/69 Test #50: container.allocator ....................   Passed    0.59 sec
          Start 51: container.array
    51/69 Test #51: container.array ........................***Failed    6.25 sec
          Start 52: container.flat_map
    52/69 Test #52: container.flat_map .....................***Failed    5.27 sec
          Start 53: container.flat_set
    53/69 Test #53: container.flat_set .....................***Failed    3.85 sec
          Start 54: container.stack
    54/69 Test #54: container.stack ........................***Failed    2.64 sec
          Start 55: container.string
    55/69 Test #55: container.string .......................   Passed    0.60 sec
          Start 56: container.valarray
    56/69 Test #56: container.valarray .....................***Failed    3.52 sec
          Start 57: container.vector
    57/69 Test #57: container.vector .......................***Failed   54.50 sec
          Start 58: iterator.adjacent_transform_iterator
    58/69 Test #58: iterator.adjacent_transform_iterator ...***Failed    1.48 sec
          Start 59: iterator.zip_iterator
    59/69 Test #59: iterator.zip_iterator ..................***Failed    1.49 sec
          Start 60: random.mersenne_twister
    60/69 Test #60: random.mersenne_twister ................***Failed    0.60 sec
          Start 61: blas.gemm
    61/69 Test #61: blas.gemm ..............................***Failed    0.90 sec
          Start 62: blas.gemv
    62/69 Test #62: blas.gemv ..............................***Failed    0.61 sec
          Start 63: blas.iamax
    63/69 Test #63: blas.iamax .............................***Failed    0.61 sec
          Start 64: blas.norm2
    64/69 Test #64: blas.norm2 .............................***Failed    0.60 sec
          Start 65: ext.complex
    65/69 Test #65: ext.complex ............................***Failed    4.40 sec
          Start 66: ext.lambda
    66/69 Test #66: ext.lambda .............................***Failed    3.24 sec
          Start 67: ext.malloc
    67/69 Test #67: ext.malloc .............................   Passed    0.88 sec
          Start 68: ext.pair
    68/69 Test #68: ext.pair ...............................***Failed    5.90 sec
          Start 69: ext.tuple
    69/69 Test #69: ext.tuple ..............................   Passed    2.37 sec
    
    17% tests passed, 57 tests failed out of 69
    
    Total Test time (real) = 206.29 sec
    
    The following tests FAILED:
          4 - core.image2d (Failed)
          7 - core.kernel (Failed)
          8 - core.program (Failed)
         12 - algorithm.accumulate (Failed)
         13 - algorithm.adjacent_difference (Failed)
         14 - algorithm.adjacent_find (Failed)
         15 - algorithm.any_all_none_of (Failed)
         16 - algorithm.binary_search (Failed)
         17 - algorithm.copy (Failed)
         18 - algorithm.copy_if (Failed)
         19 - algorithm.count (Failed)
         20 - algorithm.equal (Failed)
         21 - algorithm.equal_range (Failed)
         22 - algorithm.extrema (Failed)
         23 - algorithm.fill (Failed)
         24 - algorithm.find (Failed)
         25 - algorithm.for_each (Failed)
         26 - algorithm.gather (Failed)
         27 - algorithm.generate (Failed)
         28 - algorithm.histogram (Failed)
         29 - algorithm.inner_product (Failed)
         30 - algorithm.inplace_reduce (Failed)
         31 - algorithm.insertion_sort (Failed)
         32 - algorithm.iota (Failed)
         33 - algorithm.is_sorted (Failed)
         34 - algorithm.merge (Failed)
         35 - algorithm.mismatch (Failed)
         36 - algorithm.partial_sum (Failed)
         37 - algorithm.partition (Failed)
         38 - algorithm.radix_sort (Failed)
         39 - algorithm.random_shuffle (Failed)
         40 - algorithm.reduce (Failed)
         41 - algorithm.remove (Failed)
         42 - algorithm.replace (Failed)
         43 - algorithm.reverse (Failed)
         44 - algorithm.scan (Failed)
         45 - algorithm.scatter (Failed)
         46 - algorithm.sort (Failed)
         47 - algorithm.stable_sort (Failed)
         48 - algorithm.transform (Failed)
         49 - algorithm.transform_reduce (Failed)
         51 - container.array (Failed)
         52 - container.flat_map (Failed)
         53 - container.flat_set (Failed)
         54 - container.stack (Failed)
         56 - container.valarray (Failed)
         57 - container.vector (Failed)
         58 - iterator.adjacent_transform_iterator (Failed)
         59 - iterator.zip_iterator (Failed)
         60 - random.mersenne_twister (Failed)
         61 - blas.gemm (Failed)
         62 - blas.gemv (Failed)
         63 - blas.iamax (Failed)
         64 - blas.norm2 (Failed)
         65 - ext.complex (Failed)
         66 - ext.lambda (Failed)
         68 - ext.pair (Failed)
    
    opened by ddemidov 38
  • compute::vector<custom_type> and resize

    When I create a vector of a custom type and resize it, I get a compilation error:

    C:\boost_gcc\include\boost-1_61/boost/compute/type_traits/type_name.hpp:98:46: error: incomplete type 'boost::compute::detail::type_name_trait<custom_type>' used in nested name specifier
         return detail::type_name_trait::value();

    Code example:

    #include <boost/compute.hpp>
    
    using namespace std;
    
    struct custom_type {
        boost::compute::int_ someInt;
        boost::compute::float2_ somePoint;
    };
    
    int main() {
        boost::compute::vector<custom_type> v;
        v.resize(10);
    
        return 0;
    }
    
    

    Please fix it.

    opened by dPavelDev 22
  • compute::sort fails with as little as 33 items

    Using https://github.com/kylelutz/compute/blob/master/example/sort_vector.cpp, we can make the sort fail by initializing host_vector with more than 32 items. Tested on Mac OS X with an HD 4000 and on Linux with an NVIDIA card.

    On Mac OS X:

    input: [ 7, 49, 73, 58, 30, 72, 44, 78, 23, 9, 40, 65, 92, 42, 87, 3, 27, 29, 40, 12, 3, 69, 9, 57, 60, 33, 99, 78, 16, 35, 97, 26, 12 ]
    terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::context_error> >'
      what():  [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
    [1]    92979 abort      ./sort_vector.osx
    

    On Linux/NVIDIA:

    input: [ 83, 86, 77, 15, 93, 35, 86, 92, 49, 21, 62, 27, 90, 59, 63, 26, 40, 26, 72, 36, 11, 68, 67, 29, 82, 30, 62, 23, 67, 35, 29, 2, 22 ]
    terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::runtime_exception> >'
      what():  Build Program Failure
    Stack dump:
    [1]    19134 segmentation fault (core dumped)  ./sort_vector.linux
    

    Easy copy-and-paste reproduction: https://gist.github.com/hansbogert/10975461

    bug 
    opened by hansbogert 22
  • test suite failing with pocl icd

    I have tried running the test suite with the pocl OpenCL ICD, which allows running OpenCL kernels on the CPU. This is potentially useful for Linux packaging build farms, which may have little OpenCL compute capability beyond the CPU for running the test suite.

    My setup consists of an up-to-date Debian testing installation. The pocl OpenCL ICD driver is provided by the pocl-opencl-icd package.

    The build was performed with the CMake options -DBOOST_COMPUTE_BUILD_TESTS=ON -DBOOST_COMPUTE_THREAD_SAFE=ON on both the v0.4 tagged release and the most recent develop snapshot. The test suite was run via make test.

    Please find the respective reports in separate posts below. Feel free to ask for further information and I'll try to help as much as I can.

    Best regards,

    opened by ghisvail 21
  • Problem using arbitrary struct as value parameter in compute::sort_by_key()

    Hello Kyle,

    I tried to use compute::sort_by_key with a vector as the key and a struct like this

    typedef struct
    {
        float a;
        float v[20];
    } imx;
    vector<imx>
    

    as the value (or payload). Unfortunately, I cannot use this struct, only basic data types such as int, float, or double. From browsing the code, it does not look too difficult to extend the current behaviour.

    /Jesko

    //--------------------------------------------------------- code snippet below
        std::vector<float> host_vector_keys(33);
        std::vector<imx> host_vector_payload(host_vector_keys.size());
        compute::vector<float> device_vector_keys = host_vector_keys;
        compute::vector<imx> device_vector_payload = host_vector_payload;
        compute::sort_by_key(device_vector_keys.begin(), device_vector_keys.end(), device_vector_payload.begin());
    

    bug question 
    opened by jesko42 17
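
One way to sidestep sorting the struct directly, offered here only as an unconfirmed sketch and not as the fix adopted in the thread, is to sort an index vector by key and then gather the payload through the sorted indices. On the device this would roughly correspond to sort_by_key with a basic integer type as the value, followed by a gather step. The code below shows the idea on the host with the standard library only; sort_payload_by_key is a hypothetical helper name, and whether imx can be stored in a compute::vector at all is a separate question this sketch does not address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Payload type from the report above.
struct imx {
    float a;
    float v[20];
};

// Host-side sketch: reorder `payload` by ascending `keys` without sorting
// the structs themselves. Only an index vector is sorted by key; keys and
// payload are then gathered through the sorted indices.
void sort_payload_by_key(std::vector<float>& keys, std::vector<imx>& payload)
{
    assert(keys.size() == payload.size());

    // indices = 0, 1, 2, ...
    std::vector<std::size_t> indices(keys.size());
    std::iota(indices.begin(), indices.end(), std::size_t(0));

    // Sort the indices by the key they refer to (stable for equal keys).
    std::stable_sort(indices.begin(), indices.end(),
                     [&keys](std::size_t lhs, std::size_t rhs) {
                         return keys[lhs] < keys[rhs];
                     });

    // Gather keys and payload through the sorted indices.
    std::vector<float> sorted_keys(keys.size());
    std::vector<imx> sorted_payload(payload.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        sorted_keys[i] = keys[indices[i]];
        sorted_payload[i] = payload[indices[i]];
    }
    keys.swap(sorted_keys);
    payload.swap(sorted_payload);
}
```

The indirection costs one extra pass over the payload, but the sort itself only moves small integer indices instead of 84-byte structs.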
  • Geometry "builtins" normalize and fast_normalize not declared

    The following code to calculate the normals of a vector of points fails to compile:

    #include <boost/compute/functional/math.hpp>
    #include <boost/compute/functional/geometry.hpp>
    #include <boost/compute/lambda.hpp>
    
    namespace compute = boost::compute;
    
    ...
      // Calculate normals on the device
      using namespace boost::compute::lambda;
      boost::compute::transform(device_points.begin(), device_points.end() -1,
                                device_points.begin() +1, device_normals.begin(),
                                normalize(cross(_1, _2)), queue);
    

    with:

    error: 'normalize' was not declared in this scope
                                normalize(cross(_1, _2)), queue);
                                                      ^
    

    Replacing normalize with fast_normalize fails with the same error, as do the "proper" English spellings normalise and fast_normalise ;). However, replacing normalize with cross compiles and runs fine, but calculates the wrong result (obviously!).

    I can see the builtins for normalize and fast_normalize declared in geometry.hpp along with cross, etc.:

    namespace boost {
    namespace compute {
    BOOST_COMPUTE_DECLARE_BUILTIN_FUNCTION(cross, T (T, T), class T)
    ...
    BOOST_COMPUTE_DECLARE_BUILTIN_FUNCTION(normalize, T (T), class T)
    BOOST_COMPUTE_DECLARE_BUILTIN_FUNCTION(fast_normalize, T (T), class T)
    } // end compute namespace
    } // end boost namespace
    

    But using boost::compute::normalize(cross(_1, _2)) gives:

    error: missing template arguments before '(' token
                                 boost::compute::normalize(cross(_1, _2)), queue);
                                                         ^
    

    I've got a workaround with a kernel function using boost::compute::function, but it would be simpler (and possibly quicker) to use normalize.

    I'm compiling under Windows 10 using MinGW 5.3.0 and Compute from Boost 1.61.

    opened by kenba 15
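
For checking the output of such a workaround, a small host-side reference of what normalize(cross(a, b)) is meant to compute can help. The sketch below uses only the standard library; vec3, cross3, normalize3 and unit_normal are hypothetical names chosen for illustration, and returning a zero vector unchanged is a choice made in this sketch.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3-component vector used only for this host-side reference.
struct vec3 {
    float x, y, z;
};

// Cross product a x b.
vec3 cross3(const vec3& a, const vec3& b)
{
    vec3 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}

// Scale v to unit length. A zero-length vector is returned unchanged to
// avoid dividing by zero.
vec3 normalize3(const vec3& v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len == 0.0f) {
        return v;
    }
    vec3 r;
    r.x = v.x / len;
    r.y = v.y / len;
    r.z = v.z / len;
    return r;
}

// Unit normal of the plane spanned by a and b.
vec3 unit_normal(const vec3& a, const vec3& b)
{
    return normalize3(cross3(a, b));
}
```

Device results are best compared against this reference with a small tolerance, since floating-point rounding can differ between the host and the device.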
  • Sort algorithms fail running on AMD Radeon RX Vega 56

    Similar to the issue reported in #795, I am facing all kinds of sort() failures with AMD RX Vega GPUs.

    I ran the tests with the command ctest --output-on-failure. Here is the summary:

    Test result for drivers Adrenalin 18.5.2 and 18.9.1:

    The following tests FAILED:
          54 - algorithm.radix_sort (Failed)
          55 - algorithm.radix_sort_by_key (Failed)
          75 - algorithm.sort_by_key (Failed)
          77 - algorithm.stable_sort (Failed)
         143 - misc.amd_cpp_kernel_language (Failed)
         148 - example.amd_cpp_kernel (Exit code 0xc0000409)

    Test result for driver Adrenalin 18.12.2:

    The following tests FAILED:
          41 - algorithm.insertion_sort (Failed)
          45 - algorithm.merge_sort_gpu (Failed)
          54 - algorithm.radix_sort (Failed)
          55 - algorithm.radix_sort_by_key (Failed)
          74 - algorithm.sort (Failed)
          75 - algorithm.sort_by_key (Failed)
          77 - algorithm.stable_sort (Failed)
         143 - misc.amd_cpp_kernel_language (Failed)
         148 - example.amd_cpp_kernel (Exit code 0xc0000409)

    Tests 41 (algorithm.insertion_sort), 45 (algorithm.merge_sort_gpu), and 74 (algorithm.sort) failed on the new driver but not on the older drivers.

    Curiously, it gets even worse with the latest drivers.

    Note 1. For complete failure reports look here.
    Note 2. As I recall, AMD Radeon HD 6770 passes every test (maybe except for the amd_cpp_kernel tests)

    driver bug 
    opened by rosenrodt 12
  • Add missing lambda wrappers for builtin OpenCL funcs

    I added a few missing lambda wrappers for OpenCL builtin functions (see https://github.com/boostorg/compute/issues/659), but I noticed that many more are missing. @kylelutz, did you skip them on purpose or should I add all of them?

    opened by jszuppe 12
  • BOOST_COMPUTE_ADAPT_STRUCT crash with transform

    This code always crashes once transform is called.

    BOOST_COMPUTE_FUNCTION(bool, modify, (test::clParticle a),
    {
        a.y += 0.5;
        return true;
    });
    
    boost::compute::transform   (gpu_particles.begin(), gpu_particles.end(), gpu_particles.begin(), modify, 
    
    opened by Naviee 12
  • Improving Nbody Example

    I have to admit that I am not aware of most features of Boost.Compute's features. I am trying to follow the suggestions that were made here. The current state is here. I need help working further on it. (please do not wonder to much about that only the x component is unequal to zero, this was part of an experiment).

    I created a boost::compute::vector<float4_> to be able to use the fill() algorithm. Is my assumption correct that the underlying buffer obtained from boost::compute::vector<float4_>::get_buffer() is a segment of memory containing the float4 values contiguously?
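The layout being asked about can be modelled on the host. The sketch below is an illustration only: `Float4`, `flatten` and `flat_index` are hypothetical names, and `Float4` is a stand-in, not `boost::compute::float4_`. It assumes elements are stored back to back with no padding, as they are in a `std::vector`, and shows what a consumer of the raw memory would then see.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a 4-component float vector type. It is NOT
// boost::compute::float4_, only a host-side model of the same idea.
struct Float4 {
    float v[4]; // x, y, z, w stored back to back
};

// The model only holds if the type carries no padding.
static_assert(sizeof(Float4) == 4 * sizeof(float),
              "Float4 is expected to have no padding");

// Copy the raw bytes of a Float4 array into a flat float array,
// which is how a consumer of the raw buffer would see the data.
std::vector<float> flatten(const std::vector<Float4>& elements)
{
    std::vector<float> flat(elements.size() * 4);
    if (!elements.empty()) {
        std::memcpy(flat.data(), elements.data(),
                    elements.size() * sizeof(Float4));
    }
    return flat;
}

// Index of component c (0..3) of element i in the flat view.
std::size_t flat_index(std::size_t i, std::size_t c)
{
    return i * 4 + c;
}
```

Under that assumption the flat view is interleaved (x0, y0, z0, w0, x1, y1, ...), so component `c` of element `i` sits at index `i * 4 + c`. Whether a particular device buffer matches this model should be checked against the library documentation.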

    I have no idea how to generate the initial positions randomly on the device. The problem is that OpenCL-OpenGL interaction is possible in one direction only. Can I create a vector, or anything else that offers iterators, from my VBO, or is there another way to initialize it with random values?

    Also, I seem to have messed up the drawing routine: no vertices appear. There is probably something wrong with my glVertexPointer() and/or my glDrawArrays() call (already tried to use. I definitely lack the OpenGL experience to see what's wrong.

    opened by f-koehler 12
  • Please do not set CMAKE_MODULE_PATH.

    https://github.com/boostorg/compute/blob/36350b7/CMakeLists.txt#L64 sets CMAKE_MODULE_PATH and overwrites CMake and possibly toolchain and Compiler/Platform settings.

    This prevents users from providing find_package scripts or configs that use include(... relative to the default or toolchain CMAKE_MODULE_PATH.

    Please replace it with

    list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
    

    like the hof library does, or

    list(INSERT CMAKE_MODULE_PATH 0 ${CMAKE_BINARY_DIR}/cmake)
    

    like the gil does.

    opened by qis 0
  • Minimum Boost Version

    According to https://www.boost.org/users/history/, Compute was added in 1.61. Does the Boost.Compute repo work with versions earlier than 1.61, since I'm only including header files? Does anyone know?

    Best, Marc

    opened by geoeo 0
Owner
Boost.org
Boost provides free peer-reviewed portable C++ source libraries.
OpenCL based GPU accelerated SPH fluid simulation library

libclsph An OpenCL based GPU accelerated SPH fluid simulation library Can I see it in action? Demo #1 Demo #2 Why? Libclsph was created to explore the

null 47 Jul 27, 2022
Patterns and behaviors for GPU computing

moderngpu 2.0 (c) 2016 Sean Baxter You can drop me a line here Full documentation with github wiki under heavy construction. Latest update: 2.12 2016

null 1.4k Jan 5, 2023
ParallelComputingPlayground - Shows different programming techniques for parallel computing on CPU and GPU

ParallelComputingPlayground Shows different programming techniques for parallel computing on CPU and GPU. Purpose The idea here is to compute a Mandel

Morten Nobel-Jørgensen 2 May 16, 2020
VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP

VexCL VexCL is a vector expression template library for OpenCL/CUDA. It has been created for ease of GPGPU development with C++. VexCL strives to redu

Denis Demidov 683 Nov 27, 2022
A small C OpenCL wrapper

oclkit, plain and stupid OpenCL helper oclkit is a small set of C functions, to avoid writing the same OpenCL boiler plate over and over again, yet ke

Matthias Vogelgesang 15 Jul 22, 2022
Fidelius - YeeZ Privacy Computing

Fidelius - YeeZ Privacy Computing Introduction In order to empower data collaboration between enterprises and help enterprises use data to enhance the

YeeZTech 59 Dec 9, 2022
A C++17 thread pool for high-performance scientific computing.

We present a modern C++17-compatible thread pool implementation, built from scratch with high-performance scientific computing in mind. The thread pool is implemented as a single lightweight and self-contained class, and does not have any dependencies other than the C++17 standard library, thus allowing a great degree of portability

Barak Shoshany 1.1k Jan 4, 2023
ArrayFire: a general purpose GPU library.

ArrayFire is a general-purpose library that simplifies the process of developing software that targets parallel and massively-parallel architectures i

ArrayFire 4k Dec 27, 2022
Optimized primitives for collective multi-GPU communication

NCCL Optimized primitives for inter-GPU communication. Introduction NCCL (pronounced "Nickel") is a stand-alone library of standard communication rout

NVIDIA Corporation 1.9k Dec 30, 2022
stdgpu: Efficient STL-like Data Structures on the GPU

stdgpu: Efficient STL-like Data Structures on the GPU Features | Examples | Documentation | Building | Integration | Contributing | License | Contact

Patrick Stotko 777 Jan 8, 2023
Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common

null 360 Dec 27, 2022
oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html

oneAPI DPC++ Library (oneDPL) The oneAPI DPC++ Library (oneDPL) aims to work with the oneAPI DPC++ Compiler to provide high-productivity APIs to devel

oneAPI-SRC 646 Dec 29, 2022
C++React: A reactive programming library for C++11.

C++React is reactive programming library for C++14. It enables the declarative definition of data dependencies between state and event flows. Based on

Sebastian 968 Dec 22, 2022
A library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies.

Fiber Tasking Lib This is a library for enabling task-based multi-threading. It allows execution of task graphs with arbitrary dependencies. Dependenc

RichieSams 796 Dec 30, 2022
The C++ Standard Library for Parallelism and Concurrency

Documentation: latest, development (master) HPX HPX is a C++ Standard Library for Concurrency and Parallelism. It implements all of the corresponding

The STE||AR Group 2.1k Jan 3, 2023
A C++ library of Concurrent Data Structures

CDS C++ library The Concurrent Data Structures (CDS) library is a collection of concurrent containers that don't require external (manual) synchroniza

Max Khizhinsky 2.2k Jan 3, 2023
A header-only C++ library for task concurrency

transwarp Doxygen documentation transwarp is a header-only C++ library for task concurrency. It allows you to easily create a graph of tasks where eve

Christian Blume 592 Dec 19, 2022
Concurrent Programming Library (Coroutine) for C11

libconcurrent tiny asymmetric-coroutine library. Description asymmetric-coroutine bidirectional communication by yield_value/resume_value native conte

sharow 350 Sep 2, 2022
Simple and fast C library implementing a thread-safe API to manage hash-tables, linked lists, lock-free ring buffers and queues

libhl C library implementing a set of APIs to efficiently manage some basic data structures such as : hashtables, linked lists, queues, trees, ringbuf

Andrea Guzzo 392 Dec 3, 2022