ArrayFire: a general purpose GPU library.

Overview

ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs, and other hardware acceleration devices. The library serves users in every technical computing market.

Several of ArrayFire's benefits include:

  • Hundreds of accelerated tensor computing functions, in the following areas:
    • Array handling
    • Computer vision
    • Image processing
    • Linear algebra
    • Machine learning
    • Standard math
    • Signal processing
    • Statistics
    • Vector algorithms
  • Easy to use, stable, well-documented API
  • Rigorous benchmarks and tests ensuring top performance and numerical accuracy
  • Cross-platform compatibility with support for CUDA, OpenCL, and native CPU on Windows, Mac, and Linux
  • Built-in visualization functions through Forge
  • Commercially friendly open-source licensing
  • Enterprise support from ArrayFire

ArrayFire provides software developers with a high-level abstraction of data that resides on the accelerator, the af::array object. Developers write code that performs operations on ArrayFire arrays, which, in turn, are automatically translated into near-optimal kernels that execute on the computational device.
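The record-now, fuse-later idea behind this translation can be sketched in plain C++ with expression templates. This is a minimal conceptual sketch, not ArrayFire's internals; all names here (Vec, Add, Mul, eval) are invented for illustration:

```cpp
#include <cstddef>
#include <vector>

// A tiny array type standing in for af::array.
struct Vec {
    std::vector<float> data;
    explicit Vec(std::size_t n, float v = 0.f) : data(n, v) {}
    float operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Operator nodes: they record the operation instead of performing it.
// Note: they hold references, so expressions must be evaluated within
// the full expression that builds them.
template <class L, class R>
struct Add {
    const L& l; const R& r;
    float operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class L, class R>
struct Mul {
    const L& l; const R& r;
    float operator[](std::size_t i) const { return l[i] * r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class L, class R> Add<L, R> operator+(const L& l, const R& r) { return {l, r}; }
template <class L, class R> Mul<L, R> operator*(const L& l, const R& r) { return {l, r}; }

// eval() walks the recorded expression once: a single fused loop with no
// temporaries, instead of one loop (and one temporary) per operator --
// the same idea ArrayFire applies when it JIT-compiles a fused kernel.
template <class E>
Vec eval(const E& e) {
    Vec out(e.size());
    for (std::size_t i = 0; i < e.size(); ++i) out.data[i] = e[i];
    return out;
}
```

With this, `eval(a * x + b)` executes one pass over the data; ArrayFire's JIT does the analogous fusion on the accelerator.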

ArrayFire runs on devices ranging from low-power mobile phones to high-power GPU-enabled supercomputers. ArrayFire runs on CPUs from all major vendors (Intel, AMD, ARM), GPUs from the prominent manufacturers (NVIDIA, AMD, and Qualcomm), as well as a variety of other accelerator devices on Windows, Mac, and Linux.

Getting ArrayFire

Instructions to install or to build ArrayFire from source can be found on the wiki.

Conway's Game of Life Using ArrayFire

Visit the Wikipedia page for a description of Conway's Game of Life.

static const float h_kernel[] = { 1, 1, 1, 1, 0, 1, 1, 1, 1 };
static const array kernel(3, 3, h_kernel, afHost);

array state = (randu(128, 128, f32) > 0.5).as(f32); // Init state
Window myWindow(256, 256);
while(!myWindow.close()) {
    array nHood = convolve(state, kernel); // Obtain neighbors
    array C0 = (nHood == 2);  // Survival: a live cell with 2 neighbors lives on
    array C1 = (nHood == 3);  // Any cell with 3 neighbors is alive next step
    state = state * C0 + C1;  // Update state
    myWindow.image(state);    // Display
}

The complete source code can be found here.

Perceptron

array predict(const array &X, const array &W) {
    return sigmoid(matmul(X, W));
}

array train(const array &X, const array &Y,
        double alpha = 0.1, double maxerr = 0.05,
        int maxiter = 1000, bool verbose = false) {
    array Weights = constant(0, X.dims(1), Y.dims(1));

    for (int i = 0; i < maxiter; i++) {
        array P   = predict(X, Weights);
        array err = Y - P;
        if (mean<float>(abs(err)) < maxerr) break;
        Weights += alpha * matmulTN(X, err);
    }
    return Weights;
}
...

array Weights = train(train_feats, train_targets);
array test_outputs  = predict(test_feats, Weights);
display_results<true>(test_images, test_outputs,
                      test_targets, 20);

The complete source code can be found here.

For more code examples, visit the examples/ directory.

Documentation

You can find the complete documentation here.

Language support

ArrayFire has several official and community-maintained language APIs:

  • C++ (official)
  • Python, Rust, Julia, Nim (community-maintained wrappers)
  • In-progress wrappers: .NET, Fortran, Go, Java, Lua, NodeJS, R, Ruby

Contributing

The community of ArrayFire developers invites you to build with us if you are interested and able to write top-performing tensor functions. Together we can fulfill The ArrayFire Mission for fast scientific computing for all.

Contributions of any kind are welcome! Please refer to the wiki and our Code of Conduct to learn more about how you can get involved with the ArrayFire Community through Sponsorship, Developer Commits, or Governance.

Citations and Acknowledgements

If you redistribute ArrayFire, please follow the terms established in the license. If you wish to cite ArrayFire in an academic publication, please use the following citation document.

ArrayFire development is funded by AccelerEyes LLC and several third parties, please see the list of acknowledgements for an expression of our gratitude.

Support and Contact Info

Trademark Policy

The literal mark "ArrayFire" and ArrayFire logos are trademarks of AccelerEyes LLC (dba ArrayFire). If you wish to use either of these marks in your own project, please consult ArrayFire's Trademark Policy.

Issues
  • Build error on OSX

    While building af from source, I got the error shown below. All the dependencies are installed (CUDA version: 6.5). How can I fix the issue?

    Any advice is appreciated.

    CMake Error at afcuda_generated_copy.cu.o.cmake:264 (message):
      Error generating file
      /Users/kerkil/Git/arrayfire/build/src/backend/cuda/CMakeFiles/afcuda.dir//./afcuda_generated_copy.cu.o
    
    
    make[2]: *** [src/backend/cuda/CMakeFiles/afcuda.dir/./afcuda_generated_copy.cu.o] Error 1
    2 errors detected in the compilation of "/var/folders/7m/g0rk38md4z75ryh_h6b60d040000gn/T//tmpxft_00003ee2_00000000-6_count.cpp1.ii".
    CMake Error at afcuda_generated_count.cu.o.cmake:264 (message):
      Error generating file
      /Users/kerkil/Git/arrayfire/build/src/backend/cuda/CMakeFiles/afcuda.dir//./afcuda_generated_count.cu.o
    
    
    make[2]: *** [src/backend/cuda/CMakeFiles/afcuda.dir/./afcuda_generated_count.cu.o] Error 1
    make[1]: *** [src/backend/cuda/CMakeFiles/afcuda.dir/all] Error 2
    make: *** [all] Error 2
    
    build OSX CUDA 
    opened by kerkilchoi 56
  • Add framework for extensible ArrayFire memory managers

    Motivation

    Many different use cases require performance across many different memory allocation patterns. Even different devices/backends have different costs associated with memory allocations/manipulations. Having the flexibility to implement different memory management schemes can help optimize performance for the use case and backend.

    Framework

    • The basic interface lives in a new header: include/af/memory.h and includes two interfaces:
      • A C-style interface defined in the af_memory_manager struct, which includes function pointers through which custom memory manager implementations are defined, along with device/backend-specific functions that can be called by the implementation (e.g. nativeAlloc) and that are set dynamically. Typesafe C-style struct inheritance should be used.
      • A C++ style interface using MemoryManagerBase, which defines pure-virtual methods for the API along with device/backend-specific functions as above.

    A C++ implementation is simple, and requires only:

    #include <af/memory.h>
    ...
    class MyCustomMemoryManager : public af::MemoryManagerBase {
    ...
      void* alloc(const size_t size, bool user_lock) override {
        ...
        void* ptr = this->nativeAlloc(...);
        ...
      }
    };
    
    // In some code run at startup:
    af::MemoryManagerBase* p = new MyCustomMemoryManager();
    af::setMemoryManager(p);
    

    For the C API:

    #include <af/memory.h>
    ...
    void* af_memory_manager_impl_alloc(const size_t size, bool user_lock) {
      ...
    }
    
    typedef struct af_memory_manager_impl {
      af_memory_manager manager; // inherit base methods
      // define custom implementation
    } af_memory_manager_impl;
    
    // In some code run at startup:
    ...
    af_memory_manager* p = (af_memory_manager*)malloc(sizeof(af_memory_manager_impl));
    p->af_memory_manager_alloc = &af_memory_manager_impl_alloc;
    ...
    af_set_memory_manager(p);
    
    

    Details

    • The F-bound polymorphism pattern present in the existing MemoryManager implementation is removed; this was required because it precluded dynamic dispatch to a derived implementation.
    • New interfaces are defined for C/C++ (see below)
    • MemoryManagerCWrapper wraps a C struct implementation of a memory manager and facilitates using the same backend and DeviceManager APIs to manipulate a manager implemented in C.

    API Design Decisions

    • If a custom memory manager is not defined or set, the default memory manager will be used. While the default memory manager now implements the new interface, its behavior is completely identical to the existing behavior by default (as verified by tests)
    • Memory managers should be stored on the existing DeviceManager framework so as to preserve the integrity of existing backend APIs; memory managers can exist on a per-backend basis and work with the unified backend.
    • Existing ArrayFire APIs expect garbage collection and memory step sizing to be implemented in a memory manager. These and a few other slightly opinionated methods are included in the overall API.
      • That said, these methods can be noops or throw exceptions (e.g. garbage collection) if the style of custom memory manager implementation doesn't implement those facilities.
    • Setting a memory manager should use one API in the same C/C++ fashion so as to be compatible with the unified backend via dynamic invocation of symbols in a shared object. The C and C++ APIs should have a polymorphic relationship such that either can be passed to the public API (af::MemoryManagerBase is thus a subtype of af_memory_manager, a C struct)

    • Adds tests defining custom memory manager implementations in both the C++ and C APIs, testing end-to-end Array allocations and public AF API calls (e.g. garbage collection, step size).
    opened by jacobkahn 52
  • Speedup of kernel caching mechanism by hashing sources at compile time

    Measured results:

    a) 1 million calls of join(1, af::dim4(10,10), af::dim4(10,10)) --> 63% faster vs master 3.8.0

    b) example/neural_network.cpp with varying batch sizes (to switch saturation between CPU and GPU). The best test accuracy is obtained with a batch size around 48 (the reason to go so small) on an AMD A10-7870K (AMD Radeon R7 Graphics, 8 CU); on faster GPUs the improvement will persist at higher batch sizes. --> up to 18% faster vs master 3.8.0 (Timings: neural_network cacheHashing.xlsx)

    c) memory footprint reduced by 37%, and on top the kernel sources are no longer copied internally. All the OpenCL .cl kernel source files occupy 451KB, vs the remaining code strings in the generated obfuscated hpp files occupying only 283KB. I assume that a similar effect is visible with the CUDA kernel code.

    Description

    Changes in backend/common/kernel_cache.cpp & backend/common/util.cpp

    1. Hashing is now incremental, so that only the dynamic parts are calculated
    2. Hashkey changed from string to size_t, to speed up the find functions on the map
    3. The hashkey is now calculated for each kernel at compile time by bin2cpp.exe
    4. The hashkey of multiple sources is obtained by re-hashing the individual hashes
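    The re-hashing step in point 4 can be sketched as follows. The FNV-1a function here is purely illustrative (ArrayFire's deterministicHash may use a different algorithm); the point is the incremental seeding, which lets the combined key be computed from per-source hashes without re-reading the source text:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative FNV-1a hash; the seed parameter enables incremental chaining.
inline std::size_t fnv1a(const void* data, std::size_t len,
                         std::size_t seed = 14695981039346656037ULL) {
    const auto* p = static_cast<const unsigned char*>(data);
    std::size_t h = seed;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Per-source hashes are computed once (at compile time in this PR); the key
// for a kernel built from several sources re-hashes those hashes plus the
// dynamic options string, so full sources are never traversed at run time.
inline std::size_t combine(const std::vector<std::size_t>& sourceHashes,
                           const std::string& options) {
    std::size_t h = fnv1a(options.data(), options.size());
    for (std::size_t s : sourceHashes)
        h = fnv1a(&s, sizeof(s), h);  // incremental: seed with previous hash
    return h;
}
```

    The resulting size_t key is also what makes the map lookup a single 64-bit comparison per kernel instead of a string comparison.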

    Changes in all OpenCL kernels in backend/opencl/kernel/*.hpp

    1. The struct common::Source now contains: Source code, Source length & Source hash
    2. The static struct, generated at compile time, is now used directly

    Changes in interfaces:

    1. deterministicHash overloaded for incremental hashing of string, vector and vector
    2. getKernel now accepts a common::Sources object instead of a vector

    Current flow of data:

    New kernel:

    0. Compile time, kernel.cl: static const common::Source{*code, length, hash}
    1. Rep, kernel.hpp: vector
    2. Rep, kernel_cache.cpp: string tInstance <-- built directly from input data (fast)
    3. Rep, util.cpp: size_t moduleKey <-- combined hashes for multiple sources <-- incremental hashes of options & tInstance (fast)

    Search kernel:

    1. Rep, kernel_cache.cpp: search map with moduleKey (1 cmp 64bit instruction per kernel)

    Previous (3.8.0 master) flow of data:

    New kernel:

    1. Once, kernel.hpp: static const string <-- main kernel codefile cached
    2. Rep, kernel.hpp: vector <-- build combined kernel codefiles
    3. Rep, kernel_cache.cpp: vector args <-- transform targs vector into args (replace)
    4. Rep, kernel_cache.cpp: string tInstance <-- build from args vector
    5. Rep, kernel_cache.cpp: vector hashingVals <-- copy tInstance + kernel codefiles + options
    6. Rep, util.cpp: string accumStr <-- copy vector into 1 string
    7. Rep, util.cpp: size_t hashkey <-- hash on full string (slow)
    8. Rep, kernel_cache.cpp: string moduleKey <-- convert size_t to string

    Search kernel:

    1. Rep, kernel_cache.cpp: search map with moduleKey (1 cmp per char, ~23 8-bit cmp instructions per kernel)

    Changes to Users

    None

    Checklist

    • [x] Rebased on latest master (Nov 18, 2020)
    • [x] Code compiles
    • [x] Tests pass
    • [-] Functions added to unified API
    • [-] Functions documented
    perf 
    opened by willyborn 43
  • Fontconfig error: Cannot load default config file on Mac OSX 10.11.3

    Hi Everyone,

    I have a MacBook Pro with an NVIDIA discrete graphics card (750M). I managed to compile arrayfire without errors, but when I try to run filters_cuda, for example, I get:

    ArrayFire v3.3.0 (CUDA, 64-bit Mac OSX, build fd660a0)
    Platform: CUDA Toolkit 7.5, Driver: CUDA Driver Version: 7050
    [0] GeForce GT 750M, 2048 MB, CUDA Compute 3.0
    Fontconfig error: Cannot load default config file
    ArrayFire Exception (Internal error:998): @Freetype library:217: font face creation failed(3001)
    In function void af::Window::initWindow(const int, const int, const char *const)
    In file src/api/cpp/graphics.cpp:19
    libc++abi.dylib: terminating with uncaught exception of type af::exception: ArrayFire Exception (Internal error:998): @Freetype library:217: font face creation failed(3001)
    In function void af::Window::initWindow(const int, const int, const char *const)
    In file src/api/cpp/graphics.cpp:19
    Abort trap: 6

    I checked with brew install and I have both "fontconfig" and "freetype" libraries.

    OSX 
    opened by dvasiliu 42
  • NVCC does not support Apple Clang version 8.x

    Error message: nvcc fatal : The version ('80000') of the host compiler ('Apple clang') is not supported

    Steps to fix:

    1. Log in to https://developer.apple.com/downloads/
    2. Download Xcode CLT (Command Line Tools) 7.3
    3. Install CLT
    4. Run sudo xcode-select --switch /Library/Developer/CommandLineTools
    5. Verify that clang has been downgraded via clang --version

    Source: http://stackoverflow.com/a/36590330/701646

    Edit: Update to 7.3 and fail at 8.0

    OSX known issue 
    opened by mlloreda 37
  • Lapack tests fail if CLBlast is used

    A couple of BLAS-related tests (e.g. cholesky_dense, solve_dense, inverse_dense and LU) fail on a GTX Titan using OpenCL if compiled with CLBlast on Windows. cholesky_dense gives errors like "Matrix C's OpenCL buffer is too small", so I added some printf debugging to CLBlast's TestMatrixC:

    printf("ld          == %d\n", ld);
    printf("one         == %d\n", one);
    printf("two         == %d\n", two);
    printf("offset      == %d\n", offset);
    printf("buffer.size == %d\n", buffer.GetSize());
    printf("req size   == %d\n", required_size); 
    

    and ArrayFire's gpu_blas_herk_func in magma_blast_cblast.h:

    printf("triangle      == %d\n", triangle);
    printf("transpose     == %d\n", a_transpose);
    printf("n             == %d\n", n);
    printf("k             == %d\n", k);
    printf("a_buffer      == %d\n", a_buffer);
    printf("a_offset      == %d\n", a_offset);
    printf("a_ld          == %d\n", a_ld);
    printf("c_buffer      == %d\n", c_buffer);
    printf("c_offset      == %d\n", c_offset);
    printf("c_ld          == %d\n", c_ld);
    

    With this, cholesky_dense_opencl produced the following output:

    cholesky.txt

    I don't know if this is caused by an error in CLBlast (probably not, CLBlast's test all pass) or by the integration in arrayfire.

    Maybe @CNugteren could take a look at it?

    inverse_LU_solve.txt

    opened by fzimmermann89 35
  • Memory access errors after many iterations, cause: convolve or sum ?

    Access Violation Error after

    Exception thrown at 0x00007FFBB53B669F (ntdll.dll) in MNIST_CNN-Toy.exe: 0xC0000005: 
    Access violation reading location 0x0000000000000010.
    

    from the line below:

    new_values(af::span, af::span, af::span, kernel) = 
    af::sum(af::convolve2(gradient, filter(af::span, af::span, kernel), AF_CONV_EXPAND), 3);
    

    in the code hosted at the URLs below: https://github.com/Reithan/MachineLearning https://github.com/Reithan/MNIST_CNN-Toy

    Issue originally reported on slack channel by Reithan

    bug 
    opened by 9prady9 34
  • Compiling ArrayFire with FlexiBLAS

    We are currently deploying a new AMD cluster, on which BLIS performs better than MKL, so we are moving away from building against MKL to use FlexiBLAS, which can be switched between MKL, BLIS, OpenBLAS or other BLAS/LAPACK libraries at run time.

    Would it be possible to build ArrayFire against FlexiBLAS instead of MKL ?

    At the moment, building it without MKL complains with

    CMake Error at CMakeModules/InternalUtils.cmake:10 (message):
      MKL not found
    Call Stack (most recent call first):
    

    Does ArrayFire need MKL itself? Or does it simply need BLAS/LAPACK?

    feature 
    opened by mboisson 33
  • test/threading_cuda random crash at exit

    Worrying crash in test/threading_cuda:

    ArrayFire v3.7.0 (CUDA, 64-bit Linux, build 70ef1989)
    Platform: CUDA Toolkit 10.0, Driver: 440.33.01
    [0] GeForce GTX 1080 Ti, 11179 MB, CUDA Compute 6.1

    $ test/threading_cuda
    Running main() from /local/nbuild/jenkins/workspace/AF-Release-Linux/test/gtest/googletest/src/gtest_main.cc
    [==========] Running 9 tests from 1 test case.
    [----------] Global test environment set-up.
    [----------] 9 tests from Threading
    [ RUN ] Threading.SetPerThreadActiveDevice
    Image IO Not Configured. Test will exit
    [ OK ] Threading.SetPerThreadActiveDevice (0 ms)
    [ RUN ] Threading.SimultaneousRead
    [ OK ] Threading.SimultaneousRead (5594 ms)
    [ RUN ] Threading.MemoryManagementScope
    [ OK ] Threading.MemoryManagementScope (2008 ms)
    [ RUN ] Threading.MemoryManagement_JIT_Node
    [ OK ] Threading.MemoryManagement_JIT_Node (8 ms)
    [ RUN ] Threading.FFT_R2C
    [ OK ] Threading.FFT_R2C (687 ms)
    [ RUN ] Threading.FFT_C2C
    [ OK ] Threading.FFT_C2C (13 ms)
    [ RUN ] Threading.FFT_ALL
    [ OK ] Threading.FFT_ALL (12 ms)
    [ RUN ] Threading.BLAS
    [ OK ] Threading.BLAS (339 ms)
    [ RUN ] Threading.Sparse
    [ OK ] Threading.Sparse (12699 ms)
    [----------] 9 tests from Threading (21360 ms total)

    [----------] Global test environment tear-down
    [==========] 9 tests from 1 test case ran. (21360 ms total)
    [ PASSED ] 9 tests.

    YOU HAVE 2 DISABLED TESTS

    Segmentation fault

    with gdb:

    ...
    [ OK ] Threading.Sparse (12035 ms)
    [----------] 9 tests from Threading (20924 ms total)
    [----------] Global test environment tear-down
    [==========] 9 tests from 1 test case ran. (20924 ms total)
    [ PASSED ] 9 tests.

    YOU HAVE 2 DISABLED TESTS

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000000000 in ?? ()
    (gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00007fffe3f9f8ad in ?? () from /usr/local/cuda-10.0/lib64/libnvrtc.so.10.0
    #2  0x00007fffe3f9f8d5 in ?? () from /usr/local/cuda-10.0/lib64/libnvrtc.so.10.0
    #3  0x00007fffe5577d9d in __cxa_finalize () from /lib64/libc.so.6
    #4  0x00007fffe3c26e06 in ?? () from /usr/local/cuda-10.0/lib64/libnvrtc.so.10.0
    #5  0x0000000000000016 in ?? ()
    #6  0x0000000000000000 in ?? ()

    opened by WilliamTambellini 32
  • [Build]Undefined reference to afcu_set_native_id

    Error while trying to build the program.

    Description

    I'm trying to use the setNativeId function from af/cuda.h, but CMake cannot find any reference to that function. I'm including the af/cuda.h file correctly in C++, so I think the issue is in CMake. CMake gets /opt/arrayfire/include included, but it still doesn't work properly. My AF version is 3.6.2. I don't know if I need some additional headers for this function. I also tried to include /opt/arrayfire/include/af/cuda.h as an additional path, but it also failed.

    Error Log

    [email protected]:/workspaces/fiber/Debug> cmake .. -DBUILD_CUDA=ON
    Disabling BUILD_NIGHTLY_TESTS due to BUILD_TESTS
    -- Found libaf : ArrayFire::afcuda
    -- ArrayFire Include Dir /opt/arrayfire/include
    Enabled building tests
    Enabled building matio due to building tests
    -- Found PythonInterp: /usr/bin/python (found suitable version "3.4.6", minimum required is "3") 
    Build Test spot
    Build Test alpha
    Build Test interpolation
    Build Test ioMatlab
    Build Test param
    Build Test pmd
    Build Test raman
    Build Test smf_ssfm
    Build Test stepSizeDistribution
    Build Test utility
    Build Test fiber
     ##### Final Configuration ##### 
    SPOT_VERSION : v3.2-76-g351b46a
    CMAKE_BUILD_TYPE : DEBUG
    BUILD_MATLAB : ON
    BUILD_TESTS : ON
    BUILD_NIGHTLY_TESTS : OFF
    BUILD_SINGLETEST : 
    BUILD_PYTHON : ON
    BUILD_MATIO : ON
    BUILD_CUDA : ON
    CIRUNNER : OFF
    TEST_GPU : OFF
    BENCHMARK_LINEAR : ON
    BENCHMARK_NONLINEAR : ON
     ##### End of Configuration ##### 
    /opt/arrayfire/include
    -- Found MatLibs
    -- Found libmatio : /usr/local/lib/libmatio.so
    -- Found PythonInterp: /usr/bin/python (found version "3.4.6") 
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /workspaces/fiber/Debug
    [email protected]:/workspaces/fiber/Debug> make -j8
    [  0%] Built target header_obj
    Scanning dependencies of target source_obj
    [  2%] Built target cuda_kernels
    [  9%] Built target gmock_main
    [  5%] Built target gmock
    [ 21%] Built target gtest
    [ 25%] Built target obj
    [ 27%] Built target gtest_main
    [ 42%] Built target test_obj
    Scanning dependencies of target cuda_kernel
    [ 43%] Building Fortran object CMakeFiles/source_obj.dir/src/bvp_solver/BVP_M-2.f90.o
    [ 44%] Linking CXX executable interpolation
    [ 45%] Linking CXX executable ioMatlab
    [ 46%] Linking CXX executable pmd
    [ 47%] Linking CXX executable smf_ssfm
    [ 48%] Linking CXX executable alpha
    [ 50%] Building CXX object test/cuda/CMakeFiles/cuda_kernel.dir/cuda_kernels.cpp.o
    [ 51%] Linking CXX executable param
    f951: Warning: Nonexistent include directory '/usr/include/eigen3' [-Wmissing-include-dirs]
    f951: Fatal Error: '/opt/arrayfire/include/af/cuda.h' is not a directory
    compilation terminated.
    CMakeFiles/source_obj.dir/build.make:166: recipe for target 'CMakeFiles/source_obj.dir/src/bvp_solver/BVP_M-2.f90.o' failed
    make[2]: *** [CMakeFiles/source_obj.dir/src/bvp_solver/BVP_M-2.f90.o] Error 1
    CMakeFiles/Makefile2:109: recipe for target 'CMakeFiles/source_obj.dir/all' failed
    make[1]: *** [CMakeFiles/source_obj.dir/all] Error 2
    make[1]: *** Waiting for unfinished jobs....
    /usr/lib64/gcc/x86_64-suse-linux/6/../../../../x86_64-suse-linux/bin/ld: CMakeFiles/test_obj.dir/__/src/param.cpp.o: in function `afcu::setNativeId(int)':
    /opt/arrayfire/include/af/cuda.h:115: undefined reference to `afcu_set_native_id'
    collect2: error: ld returned 1 exit status
    test/CMakeFiles/alpha.dir/build.make:166: recipe for target 'test/alpha' failed
    make[2]: *** [test/alpha] Error 1
    CMakeFiles/Makefile2:544: recipe for target 'test/CMakeFiles/alpha.dir/all' failed
    make[1]: *** [test/CMakeFiles/alpha.dir/all] Error 2
    /usr/lib64/gcc/x86_64-suse-linux/6/../../../../x86_64-suse-linux/bin/ld: CMakeFiles/test_obj.dir/__/src/param.cpp.o: in function `afcu::setNativeId(int)':
    /opt/arrayfire/include/af/cuda.h:115: undefined reference to `afcu_set_native_id'
    collect2: error: ld returned 1 exit status
    test/CMakeFiles/interpolation.dir/build.make:166: recipe for target 'test/interpolation' failed
    make[2]: *** [test/interpolation] Error 1
    CMakeFiles/Makefile2:387: recipe for target 'test/CMakeFiles/interpolation.dir/all' failed
    make[1]: *** [test/CMakeFiles/interpolation.dir/all] Error 2
    /usr/lib64/gcc/x86_64-suse-linux/6/../../../../x86_64-suse-linux/bin/ld: CMakeFiles/test_obj.dir/__/src/param.cpp.o: in function `afcu::setNativeId(int)':
    /opt/arrayfire/include/af/cuda.h:115: undefined reference to `afcu_set_native_id'
    collect2: error: ld returned 1 exit status
    test/CMakeFiles/ioMatlab.dir/build.make:166: recipe for target 'test/ioMatlab' failed
    make[2]: *** [test/ioMatlab] Error 1
    CMakeFiles/Makefile2:307: recipe for target 'test/CMakeFiles/ioMatlab.dir/all' failed
    make[1]: *** [test/CMakeFiles/ioMatlab.dir/all] Error 2
    /usr/lib64/gcc/x86_64-suse-linux/6/../../../../x86_64-suse-linux/bin/ld: CMakeFiles/test_obj.dir/__/src/param.cpp.o: in function `afcu::setNativeId(int)':
    /opt/arrayfire/include/af/cuda.h:115: undefined reference to `afcu_set_native_id'
    collect2: error: ld returned 1 exit status
    /usr/lib64/gcc/x86_64-suse-linux/6/../../../../x86_64-suse-linux/bin/ld: CMakeFiles/test_obj.dir/__/src/param.cpp.o: in function `afcu::setNativeId(int)':
    /opt/arrayfire/include/af/cuda.h:115: undefined reference to `afcu_set_native_id'
    collect2: error: ld returned 1 exit status
    test/CMakeFiles/pmd.dir/build.make:166: recipe for target 'test/pmd' failed
    make[2]: *** [test/pmd] Error 1
    CMakeFiles/Makefile2:347: recipe for target 'test/CMakeFiles/pmd.dir/all' failed
    make[1]: *** [test/CMakeFiles/pmd.dir/all] Error 2
    test/CMakeFiles/smf_ssfm.dir/build.make:166: recipe for target 'test/smf_ssfm' failed
    make[2]: *** [test/smf_ssfm] Error 1
    CMakeFiles/Makefile2:427: recipe for target 'test/CMakeFiles/smf_ssfm.dir/all' failed
    make[1]: *** [test/CMakeFiles/smf_ssfm.dir/all] Error 2
    /usr/lib64/gcc/x86_64-suse-linux/6/../../../../x86_64-suse-linux/bin/ld: CMakeFiles/test_obj.dir/__/src/param.cpp.o: in function `afcu::setNativeId(int)':
    /opt/arrayfire/include/af/cuda.h:115: undefined reference to `afcu_set_native_id'
    collect2: error: ld returned 1 exit status
    test/CMakeFiles/param.dir/build.make:166: recipe for target 'test/param' failed
    make[2]: *** [test/param] Error 1
    CMakeFiles/Makefile2:467: recipe for target 'test/CMakeFiles/param.dir/all' failed
    make[1]: *** [test/CMakeFiles/param.dir/all] Error 2
    [ 52%] Linking CUDA device code CMakeFiles/cuda_kernel.dir/cmake_device_link.o
    [ 53%] Linking CXX executable cuda_kernel
    /usr/lib64/gcc/x86_64-suse-linux/6/../../../../x86_64-suse-linux/bin/ld: CMakeFiles/obj.dir/__/__/src/param.cpp.o: in function `afcu::setNativeId(int)':
    /opt/arrayfire/include/af/cuda.h:115: undefined reference to `afcu_set_native_id'
    collect2: error: ld returned 1 exit status
    test/cuda/CMakeFiles/cuda_kernel.dir/build.make:162: recipe for target 'test/cuda/cuda_kernel' failed
    make[2]: *** [test/cuda/cuda_kernel] Error 1
    CMakeFiles/Makefile2:932: recipe for target 'test/cuda/CMakeFiles/cuda_kernel.dir/all' failed
    make[1]: *** [test/cuda/CMakeFiles/cuda_kernel.dir/all] Error 2
    Makefile:140: recipe for target 'all' failed
    make: *** [all] Error 2
    

    Build Environment

    Compiler version: gcc 6.3.0
    Operating system: Debian GNU/Linux 9.13
    Build environment / CMake variables: cmake .. -DBUILD_CUDA=ON

    question build 
    opened by DamjanBVB 30
  • [Question] Potential issue with arrayfire/cuda interoperability when using cusolverDn library inside an arrayfire application

    Hello,

    I'm trying to use the function "cusolverDnSsyevd" of the "cusolverDn" library to perform EVD of a symmetric hermitian matrix inside an arrayfire program.

    The code is here, as well as the compilation line I'm using: https://github.com/PierreLadis/arrayfire_tests/blob/c4cc07830f2a3690837e15c5afb4491100370f06/test_EVCD3.cpp

    I tried two different ways:

    1). function "evd_interop" (see code by following the link above): I started by following the guidelines for arrayfire/cuda interoperability detailed in the paragraph: "Adding custom CUDA kernels to an existing ArrayFire application". I start by retrieving the id of the cuda stream used by arrayfire and then, I make use of the function "cusolverDnSetStream" to force "cuSolver" to work inside that stream. The cuda part completes normally. I get an error when I try to use the results of the cuda function with arrayfire: "terminate called after throwing an instance of 'af::exception' what(): ArrayFire Exception (Internal error:998): "

    2). function "evd_standalone" (see code): I did a standalone cuda program that does evd. (I checked as a first step that it doesn't contain errors by using it outside arrayfire context in a standalone program compiled without linking to arrayfire). It also raises an error (same as before) when I'm trying to use the results from it with arrayfire after the function "evd_standalone" has completed.

    I also tried to make cublas matrix multiplication work using the same framework of code for interoperability: it completes without any problem. So the problem seems specific to the use of "cuSolverDn" with arrayfire.

    I'm using precompiled binaries of arrayfire 3.7.2 on a Linux CentOS machine with 4 GPUs. Here is the output of af::info():

    ArrayFire v3.7.2 (CUDA, 64-bit Linux, build 218dd2c)
    Platform: CUDA Runtime 10.0, Driver: 410.79
    [0] GeForce GTX 1080 Ti, 11179 MB, CUDA Compute 6.1
    -1- GeForce GTX 1080 Ti, 11179 MB, CUDA Compute 6.1
    -2- GeForce GTX 1080 Ti, 11179 MB, CUDA Compute 6.1
    -3- GeForce GTX 1080 Ti, 11176 MB, CUDA Compute 6.1

    Thank you in advance for considering my request.

    question 
    opened by PierreLadis 30
  • Call setDevice on each thread at entry point.

    This PR improves the af_init function so that it calls cudaSetDevice for each new thread when creating any ArrayFire objects.

    Description

    CUDA requires that cudaSetDevice be called in each thread before any other calls are made to the CUDA API. This is done by default on the main thread, but not on newly created threads. This commit changes the behavior of the af_init function so that it calls cudaSetDevice when creating a new object in ArrayFire.

    This commit also refactors the af_init function so that it calls a lower overhead init function which initializes the device manager.
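    The per-thread, first-use initialization this PR relies on can be sketched in portable C++. This is an illustrative pattern, not the PR's code: initCount stands in for the cudaSetDevice call.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Counts how many times the thread-entry initialization actually runs.
std::atomic<int> initCount{0};

void ensureThreadInit() {
    // thread_local guarantees the lambda runs once per thread, on first use,
    // mirroring "call setDevice on each thread at entry point".
    thread_local bool done = [] {
        initCount.fetch_add(1);  // the real code would call cudaSetDevice here
        return true;
    }();
    (void)done;
}

void work() {
    for (int i = 0; i < 3; ++i) ensureThreadInit();  // only the first call initializes
}
```

    Calling ensureThreadInit() at the top of every API entry point gives each thread exactly one device-setup call, with no cost on subsequent calls.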

    Changes to Users

    Users no longer need to call af::setDevice in new threads before they can use ArrayFire functions.

    Checklist

    • [x] Rebased on latest master
    • [x] Code compiles
    • [x] Tests pass
    • ~[ ] Functions added to unified API~
    • ~[ ] Functions documented~
    opened by umar456 0
  • [Question] Convolution mode (af_conv_mode) for no padding

    Hi arrayfire,

    I have a question about the convolution mode for no padding.

    Here is a background of this question.

    Suppose I have a rectangular image whose dimensions are [x, y, 1, 1].

    I have a filter whose dimensions are [x, 1, 1, 1].

    What I want is a simple 1-D convolution between the image and the filter along the y-axis without any padding. In my application, the convolution result at the image edges is not considered a valid value, so I prefer to reduce the output dimensions to contain only the valid values. This does fewer multiplications and generally speeds up my application (which processes on the order of 10^4 images on the CPU).

    So, the dimension of the output I need is [1, y, 1, 1].

    But it seems the arrayfire convolution operations support only 2 modes regarding the padding (AF_CONV_DEFAULT, AF_CONV_EXPAND). The former keeps the output dimensions the same as the input dimensions. The latter will even expand the output dimensions. Neither shrinks the output dimensions.

    So, after an arrayfire convolution operation, I always get [x, y, 1, 1] or greater.

    The questions are

    1. Is there a convolution mode that supports the output dimension shrinking by skipping any padding?
    2. If there is no such mode, is there any way you can suggest within arrayfire functions capability?

    Thank you.

    Useong Kim

    opened by useongkim 0
  • [Question] is there a way to enforce that arrayfire use only single thread in cpu backend (or cpu with unified backend)?

    [Question] is there a way to enforce that arrayfire use only single thread in cpu backend (or cpu with unified backend)?


    Hello arrayfire,

    I am building a c++ scientific application on top of arrayfire unified backend (v3.8.2)

    Here, I would like to ask about cpu backend (as a part of the unified backend).

    I have a heavy simulator and a substantial dataset (~10,000 samples). I need to design a calibrator that searches for model parameters in the simulator.

    My design is mostly inspired by Flashlight (https://github.com/flashlight/flashlight). The model training process is basically the same as neural net training (forward, backward, and optimizer step).

    At first I wrote a single-threaded application with the expectation that ArrayFire would handle the parallelism implicitly, but it did not fully utilize all the CPU cores. The CPU core utilization was more or less 1,000% on an Intel Xeon processor with 24 cores.

    Full core utilization is important. So, given the dataset, I am trying to accelerate the model training process by mimicking distributed data parallelism on a single machine using multiple threads. That means I am creating 24 model clones, splitting the data evenly across the clones, and running the forward simulator (optionally backward as well) in parallel with 24 threads. Each thread has a single clone and is not expected to launch a new child thread during the computation.

    This design in fact hurt performance. My multithreaded application ran slower than the single-threaded one. (Each thread in the multithreaded application, with a partial dataset, ran slower than the single-threaded application with the full dataset.) I think it may be related to the implicit multi-threading in ArrayFire.

    So the question is: is there a way to force the ArrayFire CPU backend to use only a single thread?

    Thank you.

    Useong Kim

    opened by useongkim 3
  • [Question] Potential issue with implicit conversion of 0 literal to pointer types

    [Question] Potential issue with implicit conversion of 0 literal to pointer types

    This is a relatively minor issue, but it is frustrating when it pops up. There are two conflicting constructors for af::dim4 that cause problems when trying to instantiate one with dimensions 0, 0. This is not something you have to do for any function except af::convolve2NN. C++ will implicitly convert a 0 integer literal to any pointer type; I only found this out because I was trying to run a convolution with zero padding, and the af::dim4(0,0) call was ambiguous. Short of refactoring the pointer constructor into a static builder function (which would probably be too big an API change to be worthwhile), I think the only fix would be a custom type for the parameters of af::convolve2NN. That would make sense anyway, as the dim4 type isn't really meant to be a general 4-component vector, so its use in af::convolve2NN seems a bit shoehorned in.

    If this isn't something you'd consider implementing, feel free to just close this issue. I know it's not a big problem, really just an obscure edge-case scenario.

    opened by errata-c 0
  • [BUG] MacOS Download link broken on official website

    [BUG] MacOS Download link broken on official website

    The MacOS download link on the official website gives me the following error:

    <Error>
    <Code>NoSuchKey</Code>
    <Message>The specified key does not exist.</Message>
    <Key>3.8.2/ArrayFire-v3.8.2_OSX_x86_64.pkg</Key>
    <RequestId>B4AB3N989RCGSZ10</RequestId>
    <HostId>P1jNo0kzuRJD5M1iRYsldlm02aZRRehB935zQBx+bkyoCZEyrktw9ieJWnCifXebgefhBvNvlUU=</HostId>
    </Error>
    
    bug 
    opened by LilithHafner 1
  • [Question] How to query the size of device RAM?

    [Question] How to query the size of device RAM?

    Hello,

    I'd like to dynamically use different sizes of a 3D tensor based on how much device memory (video RAM, regular RAM, FPGA memory, etc.) is available at a given time.

    There seem to be a few functions adjacent to this (device_mem_info, etc.), but nothing seems to provide the total device memory size, which is what I need to know.

    Thank you

    opened by wbrickner 1
Releases(v3.8.2)
  • v3.8.2(May 19, 2022)

    v3.8.2

    Improvements

    • Optimize JIT by removing some consecutive cast operations #3031
    • Add driver checks for CUDA 11.5 and 11.6 #3203
    • Improve the timing algorithm used for timeit #3185
    • Dynamically link against CUDA numeric libraries by default #3205
    • Add support for pruning CUDA binaries to reduce static binary sizes #3234 #3237
    • Remove unused cuDNN libraries from installations #3235
    • Add support to statically link NVRTC libraries after CUDA 11.5 #3236
    • Add support for compiling with ccache when building the CUDA backend #3241

    Fixes

    • Fix issue with consecutive moddims operations in the CPU backend #3232
    • Better floating point comparisons for tests #3212
    • Fix several warnings and inconsistencies with doxygen and documentation #3226
    • Fix issue when passing empty arrays into join #3211
    • Fix default value for the AF_COMPUTE_LIBRARY when not set #3228
    • Fix missing symbol issue when MKL is statically linked #3244
    • Remove linking of OpenCL's library to the unified backend #3244

    Contributions

    Special thanks to our contributors: Jacob Kahn Willy Born

    Source code(tar.gz)
    Source code(zip)
    arrayfire-full-3.8.2.tar.bz2(64.07 MB)
  • v3.8.1(Jan 18, 2022)

    v3.8.1

    Improvements

    • moddims now uses JIT approach for certain special cases - #3177
    • Embed Version Info in Windows DLLs - #3025
    • OpenCL device max parameter is now queried from device properties - #3032
    • JIT Performance Optimization: Unique funcName generation sped up - #3040
    • Improved readability of log traces - #3050
    • Use short function name in non-debug build error messages - #3060
    • SIFT/GLOH are now available as part of website binaries - #3071
    • Short-circuit zero elements case in detail::copyArray backend function - #3059
    • Speedup of kernel caching mechanism - #3043
    • Add short-circuit check for empty Arrays in JIT evalNodes - #3072
    • Performance optimization of indexing using dynamic thread block sizes - #3111
    • Starting with this release, ArrayFire uses the Intel MKL single dynamic library, which resolves many of the linking issues the unified library had when user applications used MKL themselves - #3120
    • Add shortcut check for zero elements in af_write_array - #3130
    • Speedup join by eliminating temp buffers for cascading joins - #3145
    • Added batch support for solve - #1705
    • Use pinned memory to copy device pointers in CUDA solve - #1705
    • Added package manager instructions to docs - #3076
    • CMake Build Improvements - #3027 , #3089 , #3037 , #3072 , #3095 , #3096 , #3097 , #3102 , #3106 , #3105 , #3120 , #3136 , #3135 , #3137 , #3119 , #3150 , #3138 , #3156 , #3139 , #1705 , #3162
    • CPU backend improvements - #3010 , #3138 , #3161
    • CUDA backend improvements - #3066 , #3091 , #3093 , #3125 , #3143 , #3161
    • OpenCL backend improvements - #3091 , #3068 , #3127 , #3010 , #3039 , #3138 , #3161
    • General(including JIT) performance improvements across backends - #3167
    • Testing improvements - #3072 , #3131 , #3151 , #3141 , #3153 , #3152 , #3157 , #1705 , #3170 , #3167
    • Update CLBlast to latest version - #3135 , #3179
    • Improved Otsu threshold computation helper in canny algorithm - #3169
    • Modified default parameters for fftR2C and fftC2R C++ API from 0 to 1.0 - #3178
    • Use appropriate MKL getrs_batch_strided API based on MKL Versions - #3181

    Fixes

    • Fixed a bug in JIT kernel disk caching - #3182
    • Fixed stream used by thrust(CUDA backend) functions - #3029
    • Added workaround for new cuSparse API that was added by CUDA amid fix releases - #3057
    • Fixed const array indexing inside gfor - #3078
    • Handle zero elements in copyData to host - #3059
    • Fixed double free regression in OpenCL backend - #3091
    • Fixed an infinite recursion bug in NaryNode JIT Node - #3072
    • Added missing input validation check in sparse-dense arithmetic operations - #3129
    • Fixed bug in getMappedPtr in OpenCL due to invalid lambda capture - #3163
    • Fixed bug in getMappedPtr on Arrays that are not ready - #3163
    • Fixed edgeTraceKernel for CPU devices on OpenCL backend - #3164
    • Fixed windows build issue(s) with VS2019 - #3048
    • API documentation fixes - #3075 , #3076 , #3143 , #3161
    • CMake Build Fixes - #3088
    • Fixed the tutorial link in README - #3033
    • Fixed function name typo in timing tutorial - #3028
    • Fixed couple of bugs in CPU backend canny implementation - #3169
    • Fixed the reference count of arrays used in JIT operations. This relates to ArrayFire's internal memory bookkeeping; the behavior/accuracy of ArrayFire code was not broken earlier. The fix corrects the reference count to the optimal value in these scenarios, which may reduce memory usage in some narrow cases - #3167
    • Added assert that checks if topk is called with a negative value for k - #3176
    • Fixed an Issue where countByKey would give incorrect results for any n > 128 - #3175

    Contributions

    Special thanks to our contributors: HO-COOH, Willy Born, Gilad Avidov, Pavan Yalamanchili

    Source code(tar.gz)
    Source code(zip)
    arrayfire-full-3.8.1.tar.bz2(63.82 MB)
  • v3.7.3(Nov 23, 2020)

    v3.7.3

    Improvements

    • Add f16 support for histogram - #2984
    • Update confidence connected components example for better illustration - #2968
    • Enable disk caching of OpenCL kernel binaries - #2970
    • Refactor extension of kernel binaries stored to disk .bin - #2970
    • Add minimum driver versions for CUDA toolkit 11 in internal map - #2982
    • Improve warning messages from run-time kernel compilation functions - #2996

    Fixes

    • Fix bias factor of variance in var_all and cov functions - #2986
    • Fix a race condition in confidence connected components function for OpenCL backend - #2969
    • Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - #2970
    • Fix randn by passing in correct values to Box-Muller - #2980
    • Fix rounding issues in Box-Muller function used for RNG - #2980
    • Fix problems in RNG for older compute architectures with fp16 - #2980 #2996
    • Fix performance regression of approx functions - #2977
    • Remove assert requiring that signal/filter types be the same - #2993
    • Fix checkAndSetDevMaxCompute when the device cc is greater than max - #2996
    • Fix documentation errors and warnings - #2973 , #2987
    • Add missing opencl-arrayfire interoperability functions in unified backend - #2981
    • Fix constexpr-related compilation error with VS2019 and Clang compilers - #3049

    Contributions

    Special thanks to our contributors: P. J. Reed

    Source code(tar.gz)
    Source code(zip)
    arrayfire-full-3.7.3.tar.bz2(51.45 MB)
  • v3.8.0(Jan 8, 2021)

    v3.8.0

    New Functions

    • Ragged max reduction - #2786
    • Initialization list constructor for array class - #2829 , #2987
    • New API for the following statistics functions: cov, var and stdev - #2986
    • Bit-wise operator support for array and C API (af_bitnot) - #2865
    • allocV2 and freeV2 which return cl_mem on OpenCL backend - #2911
    • Move constructor and move assignment operator for Dim4 class - #2946

    Improvements

    • Add f16 support for histogram - #2984
    • Update confidence connected components example for better illustration - #2968
    • Enable disk caching of OpenCL kernel binaries - #2970
    • Refactor extension of kernel binaries stored to disk .bin - #2970
    • Add minimum driver versions for CUDA toolkit 11 in internal map - #2982
    • Improve warning messages from run-time kernel compilation functions - #2996

    Fixes

    • Fix bias factor of variance in var_all and cov functions - #2986
    • Fix a race condition in confidence connected components function for OpenCL backend - #2969
    • Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - #2970
    • Fix randn by passing in correct values to Box-Muller - #2980
    • Fix rounding issues in Box-Muller function used for RNG - #2980
    • Fix problems in RNG for older compute architectures with fp16 - #2980 #2996
    • Fix performance regression of approx functions - #2977
    • Remove assert requiring that signal/filter types be the same - #2993
    • Fix checkAndSetDevMaxCompute when the device cc is greater than max - #2996
    • Fix documentation errors and warnings - #2973 , #2987
    • Add missing opencl-arrayfire interoperability functions in unified backend - #2981

    Contributions

    Special thanks to our contributors: P. J. Reed

    Source code(tar.gz)
    Source code(zip)
    arrayfire-full-3.8.0.tar.bz2(51.46 MB)
  • v3.8.rc(Oct 5, 2020)

    v3.8.0 Release Candidate

    New Functions

    • Ragged max reduction - #2786
    • Initialization list constructor for array class - #2829 , #2987
    • New API for the following statistics functions: cov, var and stdev - #2986
    • Bit-wise operator support for array and C API (af_bitnot) - #2865
    • allocV2 and freeV2 which return cl_mem on OpenCL backend - #2911
    • Move constructor and move assignment operator for Dim4 class - #2946

    Improvements

    • Add f16 support for histogram - #2984
    • Update confidence connected components example for better illustration - #2968
    • Enable disk caching of OpenCL kernel binaries - #2970
    • Refactor extension of kernel binaries stored to disk .bin - #2970
    • Add minimum driver versions for CUDA toolkit 11 in internal map - #2982
    • Improve warning messages from run-time kernel compilation functions - #2996

    Fixes

    • Fix bias factor of variance in var_all and cov functions - #2986
    • Fix a race condition in confidence connected components function for OpenCL backend - #2969
    • Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - #2970
    • Fix randn by passing in correct values to Box-Muller - #2980
    • Fix rounding issues in Box-Muller function used for RNG - #2980
    • Fix problems in RNG for older compute architectures with fp16 - #2980 #2996
    • Fix performance regression of approx functions - #2977
    • Remove assert requiring that signal/filter types be the same - #2993
    • Fix checkAndSetDevMaxCompute when the device cc is greater than max - #2996
    • Fix documentation errors and warnings - #2973 , #2987
    • Add missing opencl-arrayfire interoperability functions in unified backend - #2981

    Contributions

    Special thanks to our contributors: P. J. Reed

    Source code(tar.gz)
    Source code(zip)
    arrayfire-full-3.8.rc.tar.bz2(51.47 MB)
  • v3.7.2(Jul 13, 2020)

    v3.7.2

    Improvements

    • Cache CUDA kernels to disk to improve load times (thanks to @cschreib-ibex) #2848
    • Statically link against CUDA libraries #2785
    • Make cuDNN an optional build dependency #2836
    • Improve support for different compilers and operating systems #2876 #2945 #2925 #2942 #2943 #2945
    • Improve performance of join and transpose on CPU #2849
    • Improve documentation #2816 #2821 #2846 #2918 #2928 #2947
    • Reduce binary size using NVRTC and by reducing template instantiations #2849 #2861 #2890
    • Improve reduceByKey performance on OpenCL by using builtin functions #2851
    • Improve support for Intel OpenCL GPUs #2855
    • Allow statically linking against MKL #2877 (Sponsored by SDL)
    • Better support for older CUDA toolkits #2923
    • Add support for CUDA 11 #2939
    • Add support for ccache for faster builds #2931
    • Add support for the conan package manager on linux #2875
    • Propagate build errors up the stack in AFError exceptions #2948 #2957
    • Improve runtime dependency library loading #2954
    • Improved cuDNN runtime checks and warnings #2960
    • Document af_memory_manager_* native memory return values #2911
    • Add support for cuDNN 8 #2963

    Fixes

    • Fix crash when allocating large arrays #2827
    • Fix various compiler warnings #2827 #2849 #2872 #2876
    • Fix minor leaks in OpenCL functions #2913
    • Various continuous integration related fixes #2819
    • Fix zero padding with convolve2NN #2820
    • Fix af_get_memory_pressure_threshold return value #2831
    • Increased the max filter length for morph
    • Handle empty array inputs for LU, QR, and Rank functions #2838
    • Fix FindMKL.cmake script for sequential threading library #2840
    • Various internal refactoring #2839 #2861 #2864 #2873 #2890 #2891 #2913
    • Fix OpenCL 2.0 builtin function name conflict #2851
    • Fix error caused when releasing memory with multiple devices #2867
    • Fix missing set stacktrace symbol from unified API #2915
    • Fixed bugs in ReduceByKey #2957
    • Add clblast patch to handle custom context with multiple devices #2967

    Contributions

    Special thanks to our contributors: Corentin Schreiber Jacob Kahn Paul Jurczak Christoph Junghans

    Source code(tar.gz)
    Source code(zip)
    arrayfire-full-3.7.2.tar.bz2(51.47 MB)
  • v3.7.1(Mar 28, 2020)

    v3.7.1

    Improvements

    • Improve mtx download for test data #2742
    • Improve Documentation #2754 #2792 #2797
    • Remove verbose messages in older CMake versions #2773
    • Reduce binary size with the use of NVRTC #2790
    • Use texture memory to load LUT in orb and fast #2791
    • Add missing print function for f16 #2784
    • Add checks for f16 support in the CUDA backend #2784
    • Create a thrust policy to intercept temporary buffer allocations #2806

    Fixes

    • Fix segfault on exit when ArrayFire is not initialized in the main thread
    • Fix support for CMake 3.5.1 #2771 #2772 #2760
    • Fix evalMultiple if the input array sizes aren't the same #2766
    • Fix error when AF_BACKEND_DEFAULT is passed directly to backend #2769
    • Workaround name collision with AMD OpenCL implementation #2802
    • Fix on-exit errors with the unified backend #2769
    • Fix check for f16 compatibility in OpenCL #2773
    • Fix matmul on Intel OpenCL when passing same array as input #2774
    • Fix CPU OpenCL blas batching #2774
    • Fix memory pressure in the default memory manager #2801

    Contributions

    Special thanks to our contributors: padentomasello glavaux2

    Source code(tar.gz)
    Source code(zip)
    arrayfire-full-3.7.1.tar.bz2(50.61 MB)
  • v3.7.0(Feb 13, 2020)

    v3.7.0

    Major Updates

    • Added the ability to customize the memory manager(Thanks jacobkahn and flashlight) [#2461]
    • Added 16-bit floating point support for several functions [#2413] [#2587] [#2585] [#2587] [#2583]
    • Added sumByKey, productByKey, minByKey, maxByKey, allTrueByKey, anyTrueByKey, countByKey [#2254]
    • Added confidence connected components [#2748]
    • Added neural network based convolution and gradient functions [#2359]
    • Added a padding function [#2682]
    • Added pinverse for pseudo inverse [#2279]
    • Added support for uniform ranges in approx1 and approx2 functions. [#2297]
    • Added support to write to preallocated arrays for some functions [#2599] [#2481] [#2328] [#2327]
    • Added meanvar function [#2258]
    • Add support for sparse-sparse arithmetic support [#2312]
    • Added rsqrt function for reciprocal square root [#2500]
    • Added a lower level af_gemm function for general matrix multiplication [#2481]
    • Added a function to set the cuBLAS math mode for the CUDA backend [#2584]
    • Separate debug symbols into separate files [#2535]
    • Print stacktraces on errors [#2632]
    • Support move constructor for af::array [#2595]
    • Expose events in the public API [#2461]
    • Add setAxesLabelFormat to format labels on graphs [#2495]
    • Added deconvolution functions [#1881]

    Improvements

    • Better error messages for systems with driver or device incompatibilities [#2678] [#2448][#2761]
    • Optimized unified backend function calls [#2695]
    • Optimized anisotropic smoothing [#2713]
    • Optimized canny filter for CUDA and OpenCL [#2727]
    • Better MKL search script [#2738][#2743][#2745]
    • Better logging of different submodules in ArrayFire [#2670] [#2669]
    • Improve documentation [#2665] [#2620] [#2615] [#2639] [#2628] [#2633] [#2622] [#2617] [#2558] [#2326][#2515]
    • Optimized af::array assignment [#2575]
    • Update the k-means example to display the result [#2521]

    Fixes

    • Fix multi-config generators [#2736]
    • Fix access errors in canny [#2727]
    • Fix segfault in the unified backend if no backends are available [#2720]
    • Fix access errors in scan-by-key [#2693]
    • Fix sobel operator [#2600]
    • Fix an issue with the random number generator and s16 [#2587]
    • Fix issue with boolean product reduction [#2544]
    • Fix array_proxy move constructor [#2537]
    • Fix convolve3 launch configuration [#2519]
    • Fix an issue where the fft function modified the input array [#2520]
    • Added a work around for nvidia-opencl runtime if forge dependencies are missing [#2761]

    Contributions

    Special thanks to our contributors: @jacobkahn @WilliamTambellini @lehins @r-barnes @gaika @ShalokShalom

    Source code(tar.gz)
    Source code(zip)
  • v3.6.4(May 20, 2019)

    v3.6.4

    The source code with sub-modules can be downloaded directly from the following link:

    http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.4.tar.bz2

    Fixes

    • Address a JIT performance regression due to moving kernel arguments to shared memory #2501
    • Fix the default parameter for setAxisTitle #2491
    Source code(tar.gz)
    Source code(zip)
  • v3.6.3(Apr 22, 2019)

    v3.6.3

    The source code with sub-modules can be downloaded directly from the following link:

    http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.3.tar.bz2

    Improvements

    • Graphics are now a runtime dependency instead of a link time dependency #2365
    • Reduce the CUDA backend binary size using runtime compilation of kernels #2437
    • Improved batched matrix multiplication on the CPU backend by using Intel MKL's cblas_Xgemm_batched #2206
    • Print JIT kernels to disk or stream using the AF_JIT_KERNEL_TRACE environment variable #2404
    • void* pointers are now allowed as arguments to af::array::write() #2367
    • Slightly improve the efficiency of JITed tile operations #2472
    • Make random number generation on the CPU backend consistent with CUDA and OpenCL #2435
    • Handled very large JIT tree generations #2484 #2487

    Bug Fixes

    • Fixed af::array::array_proxy move assignment operator #2479
    • Fixed input array dimensions validation in svdInplace() #2331
    • Fixed the typedef declaration for window resource handle #2357.
    • Increase compatibility with GCC 8 #2379
    • Fixed af::write tests #2380
    • Fixed a bug in broadcast step of 1D exclusive scan #2366
    • Fixed OpenGL related build errors on OSX #2382
    • Fixed multiple array evaluation. Performance improvement. #2384
    • Fixed buffer overflow and expected output of kNN SSD small test #2445
    • Fixed MKL linking order to enable threaded BLAS #2444
    • Added validations for forge module plugin availability before calling resource cleanup #2443
    • Improve compatibility on MSVC toolchain(_MSC_VER > 1914) with the CUDA backend #2443
    • Fixed BLAS gemm func generators for newest MSVC 19 on VS 2017 #2464
    • Fix errors on exits when using the cuda backend with unified #2470

    Documentation

    • Updated svdInplace() documentation following a bugfix #2331
    • Fixed a typo in matrix multiplication documentation #2358
    • Fixed a code snippet demonstrating C-API use #2406
    • Updated hamming matcher implementation limitation #2434
    • Added illustration for the rotate function #2453

    Misc

    • Use cudaMemcpyAsync instead of cudaMemcpy throughout the codebase #2362
    • Display a more informative error message if CUDA driver is incompatible #2421 #2448
    • Changed forge resource management to use smart pointers #2452
    • Deprecated intl and uintl typedefs in API #2360
    • Enabled graphics by default for all builds starting with v3.6.3 #2365
    • Fixed several warnings #2344 #2356 #2361
    • Refactored initArray() calls to use createEmptyArray(). initArray() is for internal use only by Array class. #2361
    • Refactored void* memory allocations to use unsigned char type #2459
    • Replaced deprecated MKL API with in-house implementations for sparse to sparse/dense conversions #2312
    • Reorganized and fixed some internal backend API #2356
    • Updated compilation order of CUDA files to speed up compile time #2368
    • Removed conditional graphics support builds after enabling runtime loading of graphics dependencies #2365
    • Marked graphics dependencies as optional in CPack RPM config #2365
    • Refactored a sparse arithmetic backend API #2379
    • Fixed const correctness of af_device_array API #2396
    • Update Forge to v1.0.4 #2466
    • Manage Forge resources from the DeviceManager class #2381
    • Fixed non-mkl & non-batch blas upstream call arguments #2401
    • Link MKL with OpenMP instead of TBB by default
    • use clang-format to format source code

    Contributions

    Special thanks to our contributors: Alessandro Bessi zhihaoy Jacob Kahn William Tambellini

    Source code(tar.gz)
    Source code(zip)
  • v3.6.2(Nov 29, 2018)

    v3.6.2

    The source code with sub-modules can be downloaded directly from the following link:

    http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.2.tar.bz2

    Features

    • Batching support for cond argument in select() [#2243]
    • Broadcast batching for matmul [#2315]
    • Add support for multiple nearest neighbours from nearestNeighbour() [#2280]

    Improvements

    • Performance improvements in morph() [#2238]
    • Fix linking errors when compiling without Freeimage/Graphics [#2248]
    • Fixes to improve the usage of ArrayFire as a sub-project [#2290]
    • Allow custom library path for loading dynamic backend libraries [#2302]

    Bug fixes

    • Fix overflow in dim4::ndims. [#2289]
    • Remove setDevice from af::array destructor [#2319]
    • Fix pow precision for integral types [#2305]
    • Fix issues with tile with a large repeat dimension [#2307]
    • Fix grid based indexing calculation in af_draw_hist [#2230]
    • Fix bug when using an af::array for indexing [#2311]
    • Fix CLBlast errors on exit on Windows [#2222]

    Documentation

    • Improve unwrap documentation [#2301]
    • Improve wrap documentation [#2320]
    • Fix and improve accum documentation [#2298]
    • Improve tile documentation [#2293]
    • Clarify approx* indexing in documentation [#2287]
    • Update examples of select in detailed documentation [#2277]
    • Update lookup examples [#2288]
    • Update set documentation [#2299]

    Misc

    • New ArrayFire ASSERT utility functions [#2249][#2256][#2257][#2263]
    • Improve error messages in JIT [#2309]
    • af* library and dependencies directory changed to lib64 [#2186]

    Contributions

    Thank you to our contributors: Jacob Kahn Vardan Akopian

    Source code(tar.gz)
    Source code(zip)
  • v3.6.1(Jul 6, 2018)

    v3.6.1

    The source code for this release can be downloaded here: http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.1.tar.bz2

    Improvements

    • FreeImage is now a run-time dependency [#2164]
    • Reduced binary size by setting the symbol visibility to hidden [#2168]
    • Add logging to memory manager and unified loader using the AF_TRACE environment variable [#2169][#2216]
    • Improved CPU Anisotropic Diffusion performance [#2174]
    • Perform normalization after FFT for improved accuracy [#2185, #2192]
    • Updated CLBlast to v1.4.0 [#2178]
    • Added additional validation when using af::seq for indexing [#2153]
    • Perform checks for unsupported cards by the CUDA implementation [#2182]
    • Avoid selecting backend if no devices are found. [#2218]

    Bug Fixes

    • Fixed region when all pixels were the foreground or background [#2152]
    • Fixed several memory leaks [#2202, #2201, #2180, #2179, #2177, #2175]
    • Fixed bug in setDevice which didn't allow you to select the last device [#2189]
    • Fixed bug in min/max where the first element of the array was a NaN value [#2155]
    • Fixed graphics window indexing [#2207]
    • Fixed renaming issue when installing cuda libraries on OSX [#2221]
    • Fixed NSIS installer PATH variable [#2223]
    Source code(tar.gz)
    Source code(zip)
  • v3.6.0(May 4, 2018)

    v3.6.0

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.0.tar.bz2

    Major Updates

    • Added the topk() function.
    • Added batched matrix multiply support.
    • Added anisotropic diffusion, anisotropicDiffusion(). Documentation.

    Features

    • Added support for batched matrix multiply.
    • New anisotropic diffusion function, anisotropicDiffusion(). Documentation.
    • New topk() function, which returns the top k elements along a given dimension of the input. Documentation.
    • New gradient diffusion example.

    Improvements

    • JITed select() and shift() functions for the CUDA and OpenCL backends.
    • Significant CMake improvements.
    • Improved the quality of the random number generator.
    • Corrected assert function calls in select() tests.
    • Modified the af_colormap struct to match forge's definition.
    • Improved the Black-Scholes example.
    • Used CPack to generate installers, beginning with this release.
    • Refactored the black_scholes_options example to use the built-in af::erfc function for the cumulative normal distribution.
    • Reduced the scope of mutexes in the memory manager.
    • Official installers do not require the CUDA toolkit to be installed starting with v3.6.0.

    Bug fixes

    • Fixed shfl_down() warnings with CUDA 9.
    • Disabled CUDA JIT debug flags on the ARM architecture.
    • Fixed the CLBlast install lib dir on Linux platforms where the lib directory has an arch (64) suffix.
    • Fixed an assert condition in the 3D morph OpenCL kernel.
    • Fixed JIT errors with large non-linear kernels.
    • Fixed a bug in the CPU JIT after moddims was called.
    • Fixed a deadlock scenario caused by the method MemoryManager::nativeFree.

    Documentation

    • Fixed variable name typo in vectorization.md. 1
    • Fixed AF_API_VERSION value in Doxygen config file. 2

    Known issues

    • NVCC does not currently support platform toolset v141 (Visual Studio 2017 R15.6). Use the v140 platform toolset instead. You may pass the toolset version to CMake via the -T flag, like so: cmake -G "Visual Studio 15 2017 Win64" -T v140.
      • To download and install other platform toolsets, visit https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017
    • Several OpenCL tests failing on OSX:
      • canny_opencl, fft_opencl, gen_assign_opencl, homography_opencl, reduce_opencl, scan_by_key_opencl, solve_dense_opencl, sparse_arith_opencl, sparse_convert_opencl, where_opencl

    Contributions

    Special thanks to our contributors: Adrien F. Vincent, Cedric Nugteren, Felix, Filip Matzner, HoneyPatouceul, Patrick Lavin, Ralf Stubner, William Tambellini

    Source code(tar.gz)
    Source code(zip)
  • v3.5.1(Sep 19, 2017)

    v3.5.1

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.1.tar.bz2

    Installer CUDA Version: 8.0 (Required) Installer OpenCL Version: 1.2 (Minimum)

    Improvements

    • Relaxed af::unwrap() function's arguments. 1
    • Changed behavior of af::array::allocated() to specify memory allocated. 1
    • Removed restriction on the number of bins for af::histogram() on CUDA and OpenCL kernels. 1

    Performance

    • Improved JIT performance. 1
    • Improved CPU element-wise operation performance. 1
    • Improved regions performance using texture objects. 1

    Bug fixes

    • Fixed overflow issues in mean. 1
    • Fixed memory leak when chaining indexing operations. 1
    • Fixed bug in array assignment when using an empty array to index. 1
    • Fixed bug with af::matmul() which occurred when its RHS argument was an indexed vector.
    • Fixed deadlock bug when a sparse array was used with a JIT array.
    • Fixed pixel tests for FAST kernels. 1
    • Fixed af::replace so that it is now copy-on-write. 1
    • Fixed launch configuration issues in CUDA JIT. 1
    • Fixed segfaults and "Pure Virtual Call" error warnings when exiting on Windows. 1 2
    • Workaround for clEnqueueReadBuffer bug on OSX. 1

    Build

    • Fixed issues when compiling with GCC 7.1. 1 2
    • Eliminated unnecessary Boost dependency from CPU and CUDA backends. 1

    Misc

    • Updated support links to point to Slack instead of Gitter. 1
    Source code(tar.gz)
    Source code(zip)
  • v3.5.0(Jun 23, 2017)

    v3.5.0

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.0.tar.bz2

    Installer CUDA Version: 8.0 (Required) Installer OpenCL Version: 1.2 (Minimum)

    Major Updates

    • ArrayFire now supports threaded applications. 1
    • Added Canny edge detector. 1
    • Added Sparse-Dense arithmetic operations. 1

    Features

    • ArrayFire Threading
      • af::array can be read by multiple threads
      • All ArrayFire functions can be executed concurrently by multiple threads
      • Threads can operate on different devices to simplify multi-device workloads
    • New Canny edge detector function, af::canny(). 1
      • Can automatically calculate high threshold with AF_CANNY_THRESHOLD_AUTO_OTSU
      • Supports both L1 and L2 Norms to calculate gradients
    • New tuned OpenCL BLAS backend, CLBlast.
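
    A hedged sketch of the new Canny edge detector (untested; the input file name is a placeholder and the threshold ratios are illustrative):

    ```cpp
    #include <arrayfire.h>

    int main() {
        // Load an image as grayscale (the path is a placeholder).
        af::array img = af::loadImage("input.jpg", false);

        // Let Otsu's method pick the high threshold automatically;
        // the ratio arguments here are illustrative defaults.
        af::array edges = af::canny(img, AF_CANNY_THRESHOLD_AUTO_OTSU,
                                    0.1f, 0.3f);
        return 0;
    }
    ```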

    Improvements

    • Converted CUDA JIT to use NVRTC instead of NVVM.
    • Performance improvements in af::reorder(). 1
    • Performance improvements in array::scalar(). 1
    • Improved unified backend performance. 1
    • ArrayFire now depends on Forge v1.0. 1
    • Can now specify the FFT plan cache size using the af::setFFTPlanCacheSize() function.
    • Get the number of physical bytes allocated by the memory manager with af_get_allocated_bytes(). 1
    • af::dot() can now return a scalar value to the host. 1
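
    The two host-facing additions above can be sketched as follows (untested; assumes a working ArrayFire 3.5 build, and the cache size is arbitrary):

    ```cpp
    #include <arrayfire.h>

    int main() {
        // Cache up to 16 FFT plans so repeated transforms can reuse them.
        af::setFFTPlanCacheSize(16);

        // af::dot() can now return its result directly as a host scalar.
        af::array x = af::randu(1000);
        af::array y = af::randu(1000);
        float d = af::dot<float>(x, y);
        (void)d;
        return 0;
    }
    ```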

    Bug Fixes

    • Fixed improper release of default Mersenne random engine. 1
    • Fixed af::randu() and af::randn() ranges for floating point types. 1
    • Fixed assignment bug in CPU backend. 1
    • Fixed complex (c32,c64) multiplication in OpenCL convolution kernels. 1
    • Fixed inconsistent behavior with af::replace() and replace_scalar(). 1
    • Fixed memory leak in af_fir(). 1
    • Fixed memory leaks in af_cast for sparse arrays. 1
    • Fixed correctness of af_pow for complex numbers by using Cartesian form. 1
    • Corrected af::select() with indexing in CUDA and OpenCL backends. 1
    • Workaround for VS2015 compiler ternary bug. 1
    • Fixed memory corruption in cuda::findPlan(). 1
    • Added argument checks in af_create_sparse_array to reject inputs of type int64. 1

    Build fixes

    • On OSX, utilize new GLFW package from the brew package manager. 1 2
    • Fixed CUDA PTX names generated by CMake v3.7. 1
    • Support gcc > 5.x for CUDA. 1

    Examples

    • New genetic algorithm example. 1

    Documentation

    • Updated README.md to improve readability and formatting. 1
    • Updated README.md to mention Julia and Nim wrappers. 1
    • Improved installation instructions - docs/pages/install.md. 1

    Miscellaneous

    • A few improvements for ROCm support. 1
    • Removed CUDA 6.5 support. 1

    Known issues

    • Windows
      • The Windows NVIDIA driver version 37x.xx contains a bug which causes fftconvolve_opencl to fail. Upgrade or downgrade to a different version of the driver to avoid this failure.
      • The following tests fail on Windows with NVIDIA hardware: threading_cuda, qr_dense_opencl, solve_dense_opencl.
    • macOS
      • The Accelerate framework, used by the CPU backend on macOS, leverages Intel graphics cards (Iris) when there are no discrete GPUs available. This OpenCL implementation is known to give incorrect results on the following tests: lu_dense_{cpu,opencl}, solve_dense_{cpu,opencl}, inverse_dense_{cpu,opencl}.
      • Certain tests intermittently fail on macOS with NVIDIA GPUs apparently due to inconsistent driver behavior: fft_large_cuda and svd_dense_cuda.
      • The following tests are currently failing on macOS with AMD GPUs: cholesky_dense_opencl and scan_by_key_opencl.
    Source code(tar.gz)
    Source code(zip)
  • v3.4.2(Dec 21, 2016)

    v3.4.2

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.2.tar.bz2

    Installer CUDA Version: 8.0 (Required) Installer OpenCL Version: 1.2 (Minimum)

    Deprecation Announcement

    This release supports CUDA 6.5 and higher. The next ArrayFire release will support CUDA 7.0 and higher, dropping support for CUDA 6.5. Reasons for no longer supporting CUDA 6.5 include:

    • CUDA 7.0 NVCC supports the C++11 standard (whereas CUDA 6.5 does not), which is used by ArrayFire's CPU and OpenCL backends.
    • Very few ArrayFire users still use CUDA 6.5.

    As a result, the older Jetson TK1 / Tegra K1 will no longer be supported in the next ArrayFire release. The newer Jetson TX1 / Tegra X1 will continue to have full capability with ArrayFire.

    Docker

    Improvements

    • Implemented sparse storage format conversions between AF_STORAGE_CSR and AF_STORAGE_COO. 1
      • Directly convert between AF_STORAGE_COO <--> AF_STORAGE_CSR using the af::sparseConvertTo() function.
      • af::sparseConvertTo() now also supports converting to dense.
    • Added cast support for sparse arrays. 1
      • Casting only changes the values array and the type. The row and column index arrays are not changed.
    • Reintroduced automated computation of chart axes limits for graphics functions. 1
      • The axes limits will always be the minimum/maximum of the current and new limit.
      • The user can still set limits from API calls. If the user sets a limit from the API call, then the automatic limit setting will be disabled.
    • Using boost::scoped_array instead of boost::scoped_ptr when managing array resources. 1
    • Internal performance improvements to getInfo() by using const references to avoid unnecessary copying of ArrayInfo objects. 1
    • Added support for scalar af::array inputs for af::convolve() and set functions. 1 2 3
    • Performance fixes in af::fftConvolve() kernels. 1 2
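
    The storage conversions above can be sketched roughly like this (untested; assumes ArrayFire headers and a backend, and the density threshold is arbitrary):

    ```cpp
    #include <arrayfire.h>

    int main() {
        // A dense matrix that is ~90% zeros (threshold is arbitrary).
        af::array dense = (af::randu(100, 100) > 0.9f).as(f32);

        af::array csr  = af::sparse(dense, AF_STORAGE_CSR);           // dense -> CSR
        af::array coo  = af::sparseConvertTo(csr, AF_STORAGE_COO);    // CSR -> COO
        af::array back = af::sparseConvertTo(coo, AF_STORAGE_DENSE);  // COO -> dense
        return 0;
    }
    ```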

    Build

    • Support for Visual Studio 2015 compilation. 1 2
    • Fixed FindCBLAS.cmake when PkgConfig is used. 1

    Bug fixes

    • Fixes to JIT when tree is large. 1 2
    • Fixed indexing bug when converting dense to sparse af::array as AF_STORAGE_COO. 1
    • Fixed af::bilateral() OpenCL kernel compilation on OS X. 1
    • Fixed memory leak in af::regions() (CPU) and af::rgb2ycbcr(). 1 2 3

    Installers

    • Major OS X installer fixes. 1
      • Fixed installation scripts.
      • Fixed installation symlinks for libraries.
    • Windows installer now ships with more pre-built examples.

    Examples

    • Added af::choleskyInPlace() calls to cholesky.cpp example. 1

    Documentation

    • Added u8 as supported data type in getting_started.md. 1
    • Fixed typos. 1

    CUDA 8 on OSX

    Known Issues

    • Known failures with CUDA 6.5. These include all functions that use sorting. As a result, sparse storage format conversion between AF_STORAGE_COO and AF_STORAGE_CSR has been disabled for CUDA 6.5.
    Source code(tar.gz)
    Source code(zip)
  • v3.4.1(Oct 15, 2016)

    v3.4.1

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.1.tar.bz2

    Installer CUDA Version: 8.0 (Required) Installer OpenCL Version: 1.2 (Minimum)

    Installers

    • Installers for Linux, OS X and Windows
      • CUDA backend now uses CUDA 8.0.
      • Uses Intel MKL 2017.
      • CUDA Compute 2.x (Fermi) is no longer compiled into the library.
    • Installer for OS X
      • The libraries shipping in the OS X Installer are now compiled with Apple Clang v7.3.1 (previously v6.1.0).
      • The OS X version used is 10.11.6 (previously 10.10.5).
    • Installer for Jetson TX1 / Tegra X1
      • Requires JetPack for L4T 2.3 (containing Linux for Tegra r24.2 for TX1).
      • CUDA backend now uses CUDA 8.0 64-bit.
      • Using CUDA's cusolver instead of CPU fallback.
      • Uses OpenBLAS for CPU BLAS.
      • All ArrayFire libraries are now 64-bit.

    Improvements

    • Add sparse array support to af::eval(). 1
    • Add OpenCL-CPU fallback support for sparse af::matmul() when running on a unified memory device. Uses MKL Sparse BLAS.
    • When using CUDA libdevice, pick the correct compute version based on device. 1
    • OpenCL FFT now also supports prime factors 7, 11 and 13. 1 2

    Bug Fixes

    • Allow CUDA libdevice to be detected from custom directory.
    • Fix aarch64 detection on Jetson TX1 64-bit OS. 1
    • Add missing definition of af_set_fft_plan_cache_size in unified backend. 1
    • Fix initial values for af::min() and af::max() operations. 1 2
    • Fix distance calculation in af::nearestNeighbour for CUDA and OpenCL backends. 1 2
    • Fix OpenCL bug where scalars were passed incorrectly to compile options. 1
    • Fix bug in af::Window::surface() with respect to dimensions and ranges. 1
    • Fix possible double free corruption in af_assign_seq(). 1
    • Add missing eval for key in af::scanByKey in CPU backend. 1
    • Fixed creation of the sparse values array when using AF_STORAGE_COO. 1

    Examples

    • Add a Conjugate Gradient solver example to demonstrate sparse and dense matrix operations. 1

    CUDA Backend

    • When using CUDA 8.0, compute 2.x is no longer in the default compute list.
      • This follows CUDA 8.0 deprecating computes 2.x.
      • Default computes for CUDA 8.0 will be 30, 50, 60.
    • When using CUDA pre-8.0, the default selection remains 20, 30, 50.
    • CUDA backend now uses -arch=sm_30 for PTX compilation as default.
      • Unless compute 2.0 is enabled.

    Known Issues

    • af::lu() on CPU is known to give incorrect results when run on OS X 10.11 or 10.12 and compiled with the Accelerate framework. 1
      • Since the OS X Installer libraries use MKL rather than the Accelerate framework, this issue does not affect them.
    Source code(tar.gz)
    Source code(zip)
  • v3.4.0(Sep 13, 2016)

    v3.4.0

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.0.tar.bz2

    Installer CUDA Version: 7.5 (Required) Installer OpenCL Version: 1.2 (Minimum)

    Major Updates

    • [Sparse Matrix and BLAS](ref sparse_func). 1 2
    • Faster JIT for CUDA and OpenCL. 1 2
    • Support for [random number generator engines](ref af::randomEngine). 1 2
    • Improvements to graphics. 1 2

    Features

    • [Sparse Matrix and BLAS](ref sparse_func) 1 2
      • Support for [CSR](ref AF_STORAGE_CSR) and [COO](ref AF_STORAGE_COO) [storage types](ref af_storage).
      • Sparse-Dense Matrix Multiplication and Matrix-Vector Multiplication as a part of af::matmul() using AF_STORAGE_CSR format for sparse.
      • Conversion to and from [dense](ref AF_STORAGE_DENSE) matrix to [CSR](ref AF_STORAGE_CSR) and [COO](ref AF_STORAGE_COO) [storage types](ref af_storage).
    • Faster JIT 1 2
      • Performance improvements for CUDA and OpenCL JIT functions.
      • Support for evaluating multiple outputs in a single kernel. See af::array::eval() for more.
    • [Random Number Generation](ref af::randomEngine) 1 2
      • af::randomEngine(): A random engine class to handle setting the type and seed for random number generator engines.
      • Supported engine types are Philox, Threefry, and Mersenne Twister.
    • Graphics 1 2
      • Using Forge v0.9.0
      • [Vector Field](ref af::Window::vectorField) plotting functionality. 1
      • Removed GLEW and replaced with glbinding.
        • Removed usage of GLEW after support for MX (multithreaded) was dropped in v2.0. 1
      • Multiple overlays on the same window are now possible.
        • Overlays support for same type of object (2D/3D)
        • Supported by af::Window::plot, af::Window::hist, af::Window::surface, af::Window::vectorField.
      • New API to set axes limits for graphs.
        • Draw calls do not automatically compute the limits. This is now under user control.
        • af::Window::setAxesLimits can be used to set axes limits automatically or manually.
        • af::Window::setAxesTitles can be used to set axes titles.
      • New API for plot and scatter:
        • af::Window::plot() and af::Window::scatter() can now handle both 2D and 3D data and determine the appropriate order.
        • af_draw_plot_nd()
        • af_draw_plot_2d()
        • af_draw_plot_3d()
        • af_draw_scatter_nd()
        • af_draw_scatter_2d()
        • af_draw_scatter_3d()
    • New [interpolation methods](ref af_interp_type) 1
      • Applies to
        • af::resize()
        • af::transform()
        • af::approx1()
        • af::approx2()
    • Support for [complex mathematical functions](ref mathfunc_mat) 1
      • Add complex support for trig_mat, af::sqrt(), af::log().
    • af::medfilt1(): Median filter for 1-d signals 1
    • Generalized scan functions: scan_func_scan and scan_func_scanbykey
      • Now supports inclusive or exclusive scans
      • Supports binary operations defined by af_binary_op. 1
    • [Image Moments](ref moments_mat) functions 1
    • Add af::getSizeOf() function for af_dtype 1
    • Explicitly instantiate af::array::device() for void*. 1
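
    A minimal sketch of the new random engine API (untested; assumes an ArrayFire 3.4 build, and the seeds are arbitrary):

    ```cpp
    #include <arrayfire.h>

    int main() {
        // Create an engine with an explicit type and seed...
        af::randomEngine eng(AF_RANDOM_ENGINE_PHILOX, 12345);
        af::array u = af::randu(af::dim4(10), f32, eng);

        // ...or set the process-wide defaults used by randu()/randn().
        af::setDefaultRandomEngineType(AF_RANDOM_ENGINE_MERSENNE);
        af::setSeed(42);
        af::array n = af::randn(10);
        return 0;
    }
    ```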

    Bug Fixes

    • Fixes to edge-cases in morph_mat. 1
    • Makes JIT tree size consistent between devices. 1
    • Delegate higher-dimension in convolve_mat to correct dimensions. 1
    • Indexing fixes with C++11. 1 2
    • Handle empty arrays as inputs in various functions. 1
    • Fix bug when single element input to af::median. 1
    • Fix bug in calculation of time from af::timeit(). 1
    • Fix bug in floating point numbers in af::seq. 1
    • Fixes for OpenCL graphics interop on NVIDIA devices. 1
    • Fix bug when compiling large kernels for AMD devices. 1
    • Fix bug in af::bilateral when shared memory is over the limit. 1
    • Fix bug in kernel header compilation tool bin2cpp. 1
    • Fix initial values for morph_mat functions. 1
    • Fix bugs in af::homography() CPU and OpenCL kernels. 1
    • Fix bug in CPU TNJ. 1

    Improvements

    • CUDA 8 and compute 6.x (Pascal) support; the current installer ships with CUDA 7.5. 1 2 3
    • User controlled FFT plan caching. 1
    • CUDA performance improvements for image_func_wrap, image_func_unwrap and approx_mat. 1
    • Fallback for CUDA-OpenGL interop when a device does not support OpenGL. 1
    • Additional forms of batching with the transform_func_transform functions. New behavior defined here. 1
    • Update to OpenCL2 headers. 1
    • Support for integration with external OpenCL contexts. 1
    • Performance improvements to internal copy in the CPU backend. 1
    • Performance improvements to af::select and af::replace CUDA kernels. 1
    • Enable OpenCL-CPU offload by default for devices with Unified Host Memory. 1
      • To disable, use the environment variable AF_OPENCL_CPU_OFFLOAD=0.
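
    The offload toggle is a plain environment variable; a minimal shell sketch (the application name is hypothetical):

    ```shell
    # Disable OpenCL-CPU offload before launching an ArrayFire program.
    export AF_OPENCL_CPU_OFFLOAD=0
    ./my_opencl_app   # hypothetical application binary
    ```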

    Build

    • Compilation speedups. 1
    • Build fixes with MKL. 1
    • Error message when CMake CUDA Compute Detection fails. 1
    • Several CMake build issues with Xcode generator fixed. 1 2
    • Fix multiple OpenCL definitions at link time. 1
    • Fix lapacke detection in CMake. 1
    • Update build tags of
    • Fix builds with GCC 6.1.1 and GCC 5.3.0. 1

    Installers

    • All installers now ship with ArrayFire libraries built with MKL 2016.
    • All installers now ship with Forge development files and examples included.
    • CUDA Compute 2.0 has been removed from the installers. Please contact us directly if you have a special need.

    Examples

    • Added [example simulating gravity](ref graphics/field.cpp) for demonstration of vector field.
    • Improvements to financial/black_scholes_options.cpp example.
    • Improvements to graphics/gravity_sim.cpp example.
    • Fix graphics examples to use af::Window::setAxesLimits and af::Window::setAxesTitles functions.

    Documentation & Licensing

    • ArrayFire copyright and trademark policy
    • Fixed grammar in license.
    • Add license information for glbinding.
    • Removed license information for GLEW.
    • Random123 now applies to all backends.
    • Random number functions are now under random_mat.

    Deprecations

    The following functions have been deprecated and may be modified or removed permanently from future versions of ArrayFire.

    • af::Window::plot3(): Use af::Window::plot instead.
    • af_draw_plot(): Use af_draw_plot_nd or af_draw_plot_2d instead.
    • af_draw_plot3(): Use af_draw_plot_nd or af_draw_plot_3d instead.
    • af::Window::scatter3(): Use af::Window::scatter instead.
    • af_draw_scatter(): Use af_draw_scatter_nd or af_draw_scatter_2d instead.
    • af_draw_scatter3(): Use af_draw_scatter_nd or af_draw_scatter_3d instead.

    Known Issues

    Certain CUDA functions are known to be broken on Tegra K1. The following ArrayFire tests are currently failing:

    • assign_cuda
    • harris_cuda
    • homography_cuda
    • median_cuda
    • orb_cuda
    • sort_cuda
    • sort_by_key_cuda
    • sort_index_cuda
    Source code(tar.gz)
    Source code(zip)
  • v3.3.2(Apr 26, 2016)

    v3.3.2

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.3.2.tar.bz2

    Improvements

    • Family of [Sort](ref sort_mat) functions now support higher order dimensions.
    • Improved performance of batched sort on dim 0 for all [Sort](ref sort_mat) functions.
    • [Median](ref stat_func_median) now also supports higher order dimensions.

    Bug Fixes

    Build

    Documentation

    • Fixed documentation for \ref af::replace().
    • Fixed images in [Using on OSX](ref using_on_osx) page.

    Installer

    • Linux x64 installers will now be compiled with GCC 4.9.2.
    • OSX installer gives better error messages on brew failures and now includes link to Fixing OS X Installer Failures for brew installation failures.
    Source code(tar.gz)
    Source code(zip)
  • v3.3.1(Mar 17, 2016)

    v3.3.1

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.3.1.tar.bz2

    Bug Fixes

    • Fixes to \ref af::array::device()
      • CPU Backend: evaluate arrays before returning pointer with asynchronous calls in CPU backend.
      • OpenCL Backend: fix segfaults when requested for device pointers on empty arrays.
    • Fixed \ref af::array::operator%() from using rem to mod.
    • Fixed array destruction when backends are switched in Unified API.
    • Fixed indexing after \ref af::moddims() is called.
    • Fixes FFT calls for CUDA and OpenCL backends when used on multiple devices.
    • Fixed unresolved external for some functions from \ref af::array::array_proxy class.

    Build

    • CMake compiles files in alphabetical order.
    • CMake fixes for BLAS and LAPACK on some Linux distributions.

    Improvements

    Documentation

    • Reorganized, cleaner README file.
    • Replaced non-free lena image in assets with free-to-distribute lena image.
    Source code(tar.gz)
    Source code(zip)
  • v3.3.0(Feb 28, 2016)

    v3.3.0

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.3.0.tar.bz2

    Major Updates

    • CPU backend supports asynchronous execution.
    • Performance improvements to OpenCL BLAS and FFT functions.
    • Improved performance of memory manager.
    • Improvements to visualization functions.
    • Improved sorted order for OpenCL devices.
    • Integration with external OpenCL projects.

    Features

    • \ref af::getActiveBackend(): Returns the current backend being used.
    • Scatter plot added to graphics.
    • \ref af::transform() now supports perspective transformation matrices.
    • \ref af::infoString(): Returns af::info() as a string.
    • \ref af::printMemInfo(): Prints a table showing information about buffers from the memory manager
      • The \ref AF_MEM_INFO macro prints numbers and total sizes of all buffers (requires including af/macros.h)
    • \ref af::allocHost(): Allocates memory on host.
    • \ref af::freeHost(): Frees host side memory allocated by arrayfire.
    • OpenCL functions can now use CPU implementation.
      • Currently limited to Unified Memory devices (CPU and On-board Graphics).
      • Functions: af::matmul() and all [LAPACK](ref linalg_mat) functions.
      • Takes advantage of optimized libraries such as MKL without doing memory copies.
      • Use the environment variable AF_OPENCL_CPU_OFFLOAD=1 to take advantage of this feature.
    • Functions specific to OpenCL backend.
      • \ref afcl::addDevice(): Adds an external device and context to ArrayFire's device manager.
      • \ref afcl::deleteDevice(): Removes an external device and context from ArrayFire's device manager.
      • \ref afcl::setDevice(): Sets an external device and context from ArrayFire's device manager.
      • \ref afcl::getDeviceType(): Gets the device type of the current device.
      • \ref afcl::getPlatform(): Gets the platform of the current device.
    • \ref af::createStridedArray() allows array creation with user-defined strides and a device pointer.
    • Expose functions that provide information about memory layout of Arrays.
      • \ref af::getStrides(): Gets the strides for each dimension of the array.
      • \ref af::getOffset(): Gets the offsets for each dimension of the array.
      • \ref af::getRawPtr(): Gets raw pointer to the location of the array on device.
      • \ref af::isLinear(): Returns true if all elements in the array are contiguous.
      • \ref af::isOwner(): Returns true if the array owns the raw pointer, false if it is a sub-array.
    • \ref af::getDeviceId(): Gets the device id on which the array resides.
    • \ref af::isImageIOAvailable(): Returns true if ArrayFire was compiled with Freeimage enabled
    • \ref af::isLAPACKAvailable(): Returns true if ArrayFire was compiled with LAPACK functions enabled
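
    The new memory-layout queries can be sketched as follows (untested; assumes an ArrayFire 3.3 build with af/internal.h available):

    ```cpp
    #include <arrayfire.h>
    #include <af/internal.h>

    int main() {
        af::array a = af::randu(4, 6);

        af::dim4 strides = af::getStrides(a);  // element strides per dimension
        dim_t offset     = af::getOffset(a);   // offset of the first element
        bool linear      = af::isLinear(a);    // contiguous in memory?
        bool owner       = af::isOwner(a);     // owns its buffer, or a sub-array?
        void* raw        = af::getRawPtr(a);   // device pointer to the data

        (void)strides; (void)offset; (void)linear; (void)owner; (void)raw;
        return 0;
    }
    ```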

    Bug Fixes

    Improvements

    • Optionally offload BLAS and LAPACK functions to CPU implementations to improve performance.
    • Performance improvements to the memory manager.
    • Error messages are now more detailed.
    • Improved sorted order for OpenCL devices.
    • JIT heuristics can now be tweaked using environment variables. See [Environment Variables](ref configuring_environment) tutorial.
    • Add BUILD_<BACKEND> options to examples and tests to toggle backends when compiling independently.

    Examples

    • New visualization [example simulating gravity](ref graphics/gravity_sim.cpp).

    Build

    • Support for Intel icc compiler
    • Support to compile with Intel MKL as a BLAS and LAPACK provider
    • Tests are now available for building as standalone (like examples)
    • Tests can now be built as a single file for each backend
    • Better handling of NONFREE build options
    • Searching for GLEW in CMake default paths
    • Fixes for compiling with MKL on OSX.

    Installers

    • Improvements to OSX Installer
      • CMake config files are now installed with libraries
      • Independent options for installing examples and documentation components

    Deprecations

    • af_lock_device_arr is now deprecated to be removed in v4.0.0. Use \ref af_lock_array() instead.
    • af_unlock_device_arr is now deprecated to be removed in v4.0.0. use \ref af_unlock_array() instead.

    Documentation

    • Fixes to documentation for \ref matchTemplate().
    • Improved documentation for deviceInfo.
    • Fixes to documentation for \ref exp().

    Known Issues

    Source code(tar.gz)
    Source code(zip)
  • v3.3.alpha(Feb 4, 2016)

  • v3.2.2(Dec 31, 2015)

    Release Notes {#releasenotes}

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.2.2.tar.bz2

    v3.2.2

    Bug Fixes

    • Fixed memory leak in CUDA Random number generators
    • Fixed bug in af::select() and af::replace() tests
    • Fixed exception thrown when printing empty arrays with af::print()
    • Fixed bug in CPU random number generation. Changed the generator to mt19937
    • Fixed exception handling (internal)
      • Exceptions now show function, short file name and line number
      • Added AF_RETURN_ERROR macro to handle returning errors.
      • Removed THROW macro, and renamed AF_THROW_MSG to AF_THROW_ERR.
    • Fixed bug in \ref af::identity() that may have affected CUDA Compute 5.2 cards

    Build

    • Added a MIN_BUILD_TIME option to build with minimum optimization compiler flags resulting in faster compile times
    • Fixed issue in CBLAS detection by CMake
    • Fixed tests failing for builds without optional components FreeImage and LAPACK
    • Added a test for unified backend
    • Only info and backend tests are now built for unified backend
    • Sorted test execution alphabetically
    • Fixed compilation flags and errors in tests and examples
    • Moved AF_REVISION and AF_COMPILER_STR into src/backend, because the revision changes with every commit and, in the old code, this forced a rebuild of all of ArrayFire.
      • v3.3 will add an af_get_revision() function to get the revision string.
    • Clean up examples
      • Remove getchar for Windows (this will be handled by the installer)
      • Other miscellaneous code cleanup
      • Fixed bug in [plot3.cpp](ref graphics/plot3.cpp) example
    • Rename clBLAS/clFFT external project suffix from external -> ext
    • Add OpenBLAS as a lapack/lapacke alternative

    Improvements

    • Added \ref AF_MEM_INFO macro to print memory info from ArrayFire's memory manager
    • Added additional paths for searching for libaf* for Unified backend on unix-style OS.
      • Note: This still requires dependencies such as forge, CUDA, NVVM etc to be in LD_LIBRARY_PATH as described in [Unified Backend](ref unifiedbackend)
    • Create streams for devices only when required in CUDA Backend

    Documentation

    • Hide scrollbars appearing for pre and code styles
    • Fix documentation for af::replace
    • Add code sample for converting the output of af::getAvailableBackends() into bools
    • Minor fixes in documentation
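
    The code sample mentioned above amounts to testing bits of the mask returned by af::getAvailableBackends(). A self-contained sketch, using the flag values from af/defines.h and a hypothetical mask in place of the real call:

    ```cpp
    #include <cassert>
    #include <cstdio>

    // Backend flag values as defined in af/defines.h (af_backend enum).
    enum { AF_BACKEND_CPU = 1, AF_BACKEND_CUDA = 2, AF_BACKEND_OPENCL = 4 };

    int main() {
        // In a real program: int backends = af::getAvailableBackends();
        // Here we use a hypothetical mask meaning "CPU and OpenCL available".
        int backends = AF_BACKEND_CPU | AF_BACKEND_OPENCL;

        bool cpu    = (backends & AF_BACKEND_CPU)    != 0;
        bool cuda   = (backends & AF_BACKEND_CUDA)   != 0;
        bool opencl = (backends & AF_BACKEND_OPENCL) != 0;

        std::printf("cpu=%d cuda=%d opencl=%d\n", cpu, cuda, opencl);
        assert(cpu && !cuda && opencl);
        return 0;
    }
    ```
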
    Source code(tar.gz)
    Source code(zip)
  • v3.2.1(Dec 5, 2015)

    Release Notes {#releasenotes}

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.2.1.tar.bz2

    v3.2.1

    Bug Fixes

    • Fixed bug in homography()
    • Fixed bug in behavior of af::array::device()
    • Fixed bug when indexing with span along trailing dimension
    • Fixed bug when indexing in [GFor](ref gfor)
    • Fixed bug in CPU information fetching
    • Fixed compilation bug in unified backend caused by missing link library
    • Add missing symbol for af_draw_surface()

    Build

    • Tests can now be used as a standalone project
      • Tests can now be built using pre-compiled libraries
      • Similar to how the examples are built
    • The install target now installs the examples source irrespective of the BUILD_EXAMPLES value
      • Examples are not built if BUILD_EXAMPLES is off

    Documentation

    • HTML documentation is now built and installed in docs/html
    • Added documentation for \ref af::seq class
    • Updated Matrix Manipulation tutorial
    • Examples list is now generated by CMake
      • Examples are now listed as dir/example.cpp
    • Removed dummy groups used for indexing documentation (affected doxygen < 1.8.9)
    Source code(tar.gz)
    Source code(zip)
  • v3.2.0(Nov 13, 2015)

    Release Notes

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.2.0.tar.bz2

    Major Updates

    • Added Unified backend
      • Allows switching backends at runtime
      • Read [Unified Backend](ref unifiedbackend) for more.
    • Support for 16-bit integers (\ref s16 and \ref u16)
      • All functions that support 32-bit integer types (\ref s32, \ref u32) now also support 16-bit integer types

    Function Additions

    • Unified Backend
      • \ref setBackend() - Sets a backend as active
      • \ref getBackendCount() - Gets the number of backends available for use
      • \ref getAvailableBackends() - Returns information about available backends
      • \ref getBackendId() - Gets the backend enum for an array
    • Vision
      • \ref homography() - Homography estimation
      • \ref gloh() - GLOH Descriptor for SIFT
    • Image Processing
      • \ref loadImageNative() - Load an image as native data without modification
      • \ref saveImageNative() - Save an image without modifying data or type
    • Graphics
      • \ref af::Window::plot3() - 3-dimensional line plot
      • \ref af::Window::surface() - 3-dimensional surface plot
    • Indexing
      • \ref af_create_indexers()
      • \ref af_set_array_indexer()
      • \ref af_set_seq_indexer()
      • \ref af_set_seq_param_indexer()
      • \ref af_release_indexers()
    • CUDA Backend Specific
      • \ref setNativeId() - Set the CUDA device with given native id as active
        • ArrayFire uses a modified order for devices. The native id for a device can be retrieved using nvidia-smi
    • OpenCL Backend Specific
      • \ref setDeviceId() - Set the OpenCL device using the clDeviceId

    Other Improvements

    • Added \ref c32 and \ref c64 support for \ref isNaN(), \ref isInf() and \ref iszero()
    • Added CPU information for x86 and x86_64 architectures in CPU backend's \ref info()
    • Batch support for \ref approx1() and \ref approx2()
      • Now can be used with gfor as well
    • Added \ref s64 and \ref u64 support to:
      • \ref sort() (along with sort index and sort by key)
      • \ref setUnique(), \ref setUnion(), \ref setIntersect()
      • \ref convolve() and \ref fftConvolve()
      • \ref histogram() and \ref histEqual()
      • \ref lookup()
      • \ref mean()
    • Added \ref AF_MSG macro

    Build Improvements

    • Submodules update is now automatically called if not cloned recursively
    • Fixes for compilation on Visual Studio 2015
    • Option to fall back to CPU LAPACK for linear algebra functions when using CUDA 6.5 or older

    Bug Fixes

    Documentation Updates

    • Improved tutorials documentation
      • More detailed Using on [Linux](ref using_on_linux), [OSX](ref using_on_osx), and [Windows](ref using_on_windows) pages
    • Added return type information for functions that return different type arrays

    New Examples

    • Graphics
      • [Plot3](ref plot3.cpp)
      • [Surface](ref surface.cpp)
    • [Shallow Water Equation](ref swe.cpp)
    • [Basic](ref basic.cpp) as a Unified backend example

    Installers

    • All installers now include the Unified backend and corresponding CMake files
    • Visual Studio projects include Unified in the Platform Configurations
    • Added installer for Jetson TX1
    • SIFT and GLOH do not ship with the installers as SIFT is protected by patents that do not allow commercial distribution without licensing.
    Source code(tar.gz)
    Source code(zip)
  • v3.1.3(Oct 18, 2015)

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.1.3.tar.bz2

    Bug Fixes

    • Fixed bugs in various OpenCL kernels that were missing offset additions
    • Remove ARCH_32 and ARCH_64 flags
    • Fix missing symbols when freeimage is not found
    • Use CUDA driver version for Windows
    • Improvements to SIFT
    • Fixed memory leak in median
    • Fixes for Windows compilation when not using MKL #1047
    • Fixes for building without LAPACK

    Other

    • Documentation: Fixed documentation for select and replace
    • Documentation: Fixed documentation for af_isnan
    Source code(tar.gz)
    Source code(zip)
  • v3.1.2(Sep 26, 2015)

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.1.2.tar.bz2

    Bug Fixes

    • Fixed bug in assign that was causing test to fail
    • Fixed bug in convolve. Frequency condition now depends on kernel size only
    • Fixed bug in indexed reductions for complex type in OpenCL backend
    • Fixed bug in kernel name generation in ireduce for OpenCL backend
    • Fixed non-linear to linear indices in ireduce
    • Fixed bug in reductions for small arrays
    • Fixed bug in histogram for indexed arrays
    • Fixed compiler error CPUID for non-compliant devices
    • Fixed failing tests on i386 platforms
    • Add missing AFAPI

    Other

    • Documentation: Added missing examples and other corrections
    • Documentation: Fixed warnings in documentation building
    • Installers: Send error messages to log file in OSX Installer
    Source code(tar.gz)
    Source code(zip)
  • v3.1.1(Sep 13, 2015)

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.1.1.tar.bz2

    Installers

    • CUDA backend now depends on CUDA 7.5 toolkit
    • OpenCL backend now requires OpenCL 1.2 or greater

    Bug Fixes

    • Fixed bug in reductions after indexing
    • Fixed bug in indexing when using reverse indices

    Build

    • CMake now includes PKG_CONFIG in the search path for CBLAS and LAPACKE libraries
    • The heston_model.cpp example now builds with the default ArrayFire CMake files after installation

    Other

    Source code(tar.gz)
    Source code(zip)
  • v3.1.0(Aug 28, 2015)

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.1.0.tar.bz2

    Release Notes {#releasenotes}

    v3.1.0

    Function Additions

    • Computer Vision Functions
      • nearestNeighbour() - Nearest Neighbour with SAD, SSD and SHD distances
      • harris() - Harris Corner Detector
      • susan() - Susan Corner Detector
      • sift() - Scale Invariant Feature Transform (SIFT)
        • "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image," David G. Lowe, US Patent 6,711,293 (March 23, 2004). Provisional application filed March 8, 1999. Assignee: The University of British Columbia. For further details, contact David Lowe ([email protected]) or the University-Industry Liaison Office of the University of British Columbia.
        • SIFT is available for compiling but does not ship with ArrayFire hosted installers/pre-built libraries
      • dog() - Difference of Gaussians
    • Image Processing Functions
      • ycbcr2rgb() and rgb2ycbcr() - RGB <-> YCbCr color space conversion
      • wrap() and unwrap() - Wrap and Unwrap
      • sat() - Summed Area Tables
      • loadImageMem() and saveImageMem() - Load and Save images to/from memory
        • af_image_format - Added imageFormat (af_image_format) enum
    • Array & Data Handling
      • copy() - Copy
      • array::lock() and array::unlock() - Lock and Unlock
      • select() and replace() - Select and Replace
      • Get array reference count (af_get_data_ref_count)
    • Signal Processing
      • fftInPlace() - 1D in place FFT
      • fft2InPlace() - 2D in place FFT
      • fft3InPlace() - 3D in place FFT
      • ifftInPlace() - 1D in place Inverse FFT
      • ifft2InPlace() - 2D in place Inverse FFT
      • ifft3InPlace() - 3D in place Inverse FFT
      • fftR2C() - Real to complex FFT
      • fftC2R() - Complex to Real FFT
    • Linear Algebra
      • svd() and svdInPlace() - Singular Value Decomposition
    • Other operations
      • sigmoid() - Sigmoid
      • Sum (with option to replace NaN values)
      • Product (with option to replace NaN values)
    • Graphics
      • Window::setSize() - Window resizing using Forge API
    • Utility
      • Allow users to set print precision (print, af_print_array_gen)
      • saveArray() and readArray() - Stream arrays to binary files
      • toString() - Returns the array and its data as a string
    • CUDA specific functionality
      • getStream() - Returns default CUDA stream ArrayFire uses for the current device
      • getNativeId() - Returns native id of the CUDA device

    Improvements

    • dot
      • Allow complex inputs with conjugate option
    • AF_INTERP_LOWER interpolation
      • For resize, rotate and transform based functions
    • 64-bit integer support
      • For reductions, random, iota, range, diff1, diff2, accum, join, shift and tile
    • convolve
      • Support for non-overlapping batched convolutions
    • Complex Arrays
      • Fix binary ops on complex inputs of mixed types
      • Complex type support for exp
    • tile
      • Performance improvements by using JIT when possible.
    • Add AF_API_VERSION macro
      • Allows disabling of API to maintain consistency with previous versions
    • Other Performance Improvements
      • Use reference counting to reduce unnecessary copies
    • CPU Backend
      • Device properties for CPU
      • Improved performance when all buffers are indexed linearly
    • CUDA Backend
      • Use streams in CUDA (no longer using default stream)
      • Using async cudaMem ops
      • Add 64-bit integer support for JIT functions
      • Performance improvements for CUDA JIT for non-linear 3D and 4D arrays
    • OpenCL Backend
      • Improve compilation times for OpenCL backend
      • Performance improvements for non-linear JIT kernels on OpenCL
      • Improved shared memory load/store in many OpenCL kernels (PR 933)
      • Using cl.hpp v1.2.7

    Bug Fixes

    • Common
      • Fix compatibility of c32/c64 arrays when operating with scalars
      • Fix median for all values of an array
      • Fix double free issue when indexing (30cbbc7)
      • Fix bug in rank
      • Fix default values for scale throwing exception
      • Fix conjg raising exception on real input
      • Fix bug when using conjugate transpose for vector input
      • Fix issue with const input for array_proxy::get()
    • CPU Backend
      • Fix randn generating same sequence for multiple calls
      • Fix setSeed for randu
      • Fix casting to and from complex
      • Check NULL values when allocating memory
      • Fix offset issue for CPU element-wise operations

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.1.0.tar.bz2

    New Examples

    • Match Template
    • Susan
    • Heston Model (contributed by Michael Nowotny)

    Distribution Changes

    • Fixed automatic detection of ArrayFire when using with CMake in the Windows Installer
    • Compiling ArrayFire with FreeImage as a static library for Linux x86 installers

    Known Issues

    • OpenBlas can cause issues with QR factorization in CPU backend
    • FreeImage older than 3.10 can cause issues with loadImageMem and saveImageMem
    • OpenCL backend issues on OSX
      • AMD GPUs not supported because of driver issues
      • Intel CPUs not supported
      • Linear algebra functions do not work on Intel GPUs.
    • Stability and correctness issues with open-source OpenCL implementations such as Beignet and GalliumCompute
    Source code(tar.gz)
    Source code(zip)
  • v3.0.2(Jun 26, 2015)

    The source code with submodules can be downloaded directly from the following link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.0.2.tar.bz2

    Bug Fixes

    • Added missing symbols from the compatible API
    • Fixed a bug affecting corner rows and elements in grad()
    • Fixed linear interpolation bugs affecting large images in the following:
      • approx1()
      • approx2()
      • resize()
      • rotate()
      • scale()
      • skew()
      • transform()

    Documentation

    • Added missing documentation for constant()
    • Added missing documentation for array::scalar()
    • Added supported input types for functions in arith.h
    Source code(tar.gz)
    Source code(zip)