SIMD Vector Classes for C++

Overview

You may be interested in switching to std-simd. Features present in Vc 1.4 and not present in std-simd will eventually turn into Vc 2.0, which then depends on std-simd.

Vc: portable, zero-overhead C++ types for explicitly data-parallel programming

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores.

Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.

The Vc library provides the missing link. Its types enable explicitly stating data-parallel operations on multiple values. The parallelism is therefore added via the type system. Competing approaches state the parallelism via new control structures and consequently new semantics inside the body of these control structures.

Vc is a free software library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets. Thus an application written with Vc can be compiled for:

  • AVX and AVX2
  • SSE2 up to SSE4.2 or SSE4a
  • Scalar
  • AVX-512 (Vc 2 development)
  • NEON (in development)
  • NVIDIA GPUs / CUDA (research)

After Intel dropped MIC support with ICC 18, Vc 1.4 also removes support for it.

Examples

Usage on Compiler Explorer

Scalar Product

Let's start from the code for calculating a 3D scalar product using builtin floats:

using Vec3D = std::array<float, 3>;
float scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Using Vc, we can easily vectorize the code using the float_v type:

using Vc::float_v;
using Vec3D = std::array<float_v, 3>;
float_v scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

The above will scale to 1, 4, 8, 16, etc. scalar products calculated in parallel, depending on the target hardware's capabilities.
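To make the scaling claim concrete without requiring Vc, here is a minimal sketch. It uses a hypothetical stand-in type `FloatV<N>` (not part of Vc, and not how Vc is implemented) that holds N floats and applies each operator lane by lane. The helper `broadcast` is likewise an assumption for illustration. The body of `scalar_product` stays the same for every N, which is the property `float_v` provides by choosing N from the target's SIMD width.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD vector type; NOT the Vc implementation.
// It stores N lanes and applies every operator to all lanes.
template <std::size_t N>
struct FloatV {
    std::array<float, N> lane{};
};

// Lane-wise multiplication.
template <std::size_t N>
FloatV<N> operator*(const FloatV<N>& a, const FloatV<N>& b) {
    FloatV<N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Lane-wise addition.
template <std::size_t N>
FloatV<N> operator+(const FloatV<N>& a, const FloatV<N>& b) {
    FloatV<N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Hypothetical helper: copy one scalar into every lane.
template <std::size_t N>
FloatV<N> broadcast(float x) {
    FloatV<N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = x;
    return r;
}

template <std::size_t N>
using Vec3D = std::array<FloatV<N>, 3>;

// Same expression as the scalar version; computes N scalar products at once.
template <std::size_t N>
FloatV<N> scalar_product(const Vec3D<N>& a, const Vec3D<N>& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

With real SIMD types the lane loops become single instructions; the sketch only shows that the parallel width lives in the type rather than in the algorithm.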

For comparison, the same vectorization using Intel SSE intrinsics is more verbose and uses prefix notation (i.e. function calls):

using Vec3D = std::array<__m128, 3>;
__m128 scalar_product(Vec3D a, Vec3D b) {
  return _mm_add_ps(_mm_add_ps(_mm_mul_ps(a[0], b[0]), _mm_mul_ps(a[1], b[1])),
                    _mm_mul_ps(a[2], b[2]));
}

The above will neither scale to AVX, AVX-512, etc. nor is it portable to other SIMD ISAs.

Build Requirements

cmake >= 3.0

C++11 Compiler:

  • GCC >= 4.8.1
  • clang >= 3.4
  • ICC >= 18.0.5
  • Visual Studio 2015 (64-bit target)

Building and Installing Vc

  • After cloning, you need to initialize Vc's git submodules:
$ git submodule update --init
  • Create a build directory:
$ mkdir build
$ cd build
  • Call cmake with the relevant options:
$ cmake -DCMAKE_INSTALL_PREFIX=/opt/Vc -DBUILD_TESTING=OFF <srcdir>
  • Build and install:
$ make -j16
$ make install

Documentation

The documentation is generated via doxygen. You can build the documentation by running doxygen in the doc subdirectory. Alternatively, you can find nightly builds of the documentation at:

Publications

Work on integrating the functionality of Vc in the C++ standard library.

Communication

A channel on the freenode IRC network is reserved for discussions on Vc: ##vc on freenode (via SSL)

Feel free to use the GitHub issue tracker for questions. Alternatively, there's a mailing list for users of Vc.

License

Vc is released under the terms of the 3-clause BSD license.

Issues
  • Vc has bad performance with Intel C/C++ compiler on Linux

    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | 1.2.0 | Linux | GCC 5.3.0 | -O3 -march=native | | Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz |
    | 1.2.0 | Linux | ICC 16.0.2 | -O3 -march=native | | Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz |

    Vc has much worse performance when compiled with ICC than GCC, as shown in the example below. The test case is solving many quadratic equations, given coefficients a, b, c. The source code can be found at http://pastebin.com/hr6nPDmJ (quadratic.cc)

    Testcase

    Here is a session on my computer:

     ~ $ icpc -Wall -std=c++11 -O3 -march=native -DUSE_VC=1 quadratic.cc -lVc -o icc.out
     ~ $ g++  -Wall -std=c++11 -O3 -march=native -DUSE_VC=1 quadratic.cc -lVc -o gcc.out
     ~ $ gcc.out
                    optimized scalar:  57.000ms
                     AVX2 intrinsics:  12.000ms
                                 *Vc:  13.000ms*
     ~ $ icc.out
                    optimized scalar:  13.000ms
                     AVX2 intrinsics:  12.000ms
                                 *Vc:  49.000ms*
     ~ $ 
    

    Note that the Vc code is slower than the auto-vectorized scalar code that ICC generates. The AVX2 intrinsics code is provided for reference.

    Note: It used to be the case that even the scalar code would see its performance degraded, and while this is not true for this particular example, I still see places in which merely including Vc degrades performance of code that is not using Vc at all when using the Intel compiler, probably due to options changed in Vc headers.

    type: miscompilation 
    opened by amadio 31
  • tests: Avoid UB with abs of minimum integral value

    The most negative number doesn't have an absolute value that can be represented in the same type.
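    The problem can be illustrated with a short sketch. The helper name `unsigned_abs` is an assumption made for this example and is not taken from the Vc test suite; the actual change in the tests may look different. The idea is to compute the magnitude in the unsigned counterpart of the type, where the result is always representable and arithmetic wraps instead of overflowing.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Hypothetical helper: returns |x| as the unsigned counterpart of T.
// std::abs(std::numeric_limits<int>::min()) is undefined behavior because
// the result does not fit in int; the unsigned type can hold it.
template <typename T>
typename std::make_unsigned<T>::type unsigned_abs(T x) {
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "unsigned_abs expects a signed integral type");
    using U = typename std::make_unsigned<T>::type;
    // Conversion to U is well-defined (modulo 2^n); the subtraction wraps
    // instead of overflowing, and the final cast reduces modulo 2^n again.
    return x < 0 ? static_cast<U>(U(0) - static_cast<U>(x))
                 : static_cast<U>(x);
}
```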

    opened by hahnjo 20
  • [WIP] More SimdArray operators

    Implement the missing mathematical operators on SimdArray; see #75.

    type: enhancement 
    opened by chr-engwer 18
  • Implement gathers with AVX2 intrinsics

    Reference: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=gather&techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2,AVX,AVX2,FMA

    type: enhancement 
    opened by mattkretz 17
  • Incorrect values of Sin for large input values.

    Vc version / revision : master branch

    Operating System : Linux Ubuntu 18.04.1 LTS

    Compiler & Version : gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0

    Compiler Flags : As done for test file(trigonometric.cpp)

    Assembler & Version :

    CPU : Intel® Core™ i5-7200U CPU @ 2.50GHz × 4 (sse sse2 ssse3 sse4_1 sse4_2 avx avx2 supported)

    Large input values might be fed to certain Bessel integrals; that is one case where accurate computation of sin and cos for large arguments is needed.

    Testcase

    Modified the Sine test in trigonometric.cpp file to the following (Added an array of large values, computed and displayed sine value of them.):

    TEST_TYPES(V, testSin, (REAL_VECTORS, SIMD_REAL_ARRAY_LIST)) //{{{1
    {
        typedef typename V::EntryType T;
        UnitTest::setFuzzyness<float>(2);
        UnitTest::setFuzzyness<double>(1e7);
        Array<SincosReference<T> > reference = sincosReference<T>();
        for (size_t i = 0; i + V::Size - 1 < reference.size; i += V::Size) {
            V x, sref, y;
            for (size_t j = 0; j < V::Size; ++j) {
                x[j] = reference.data[i + j].x;
                sref[j] = reference.data[i + j].s;
                y[j] = 1.3e20;
            }
            std::cout << Vc::sin(y) << std::endl;
            FUZZY_COMPARE(Vc::sin(x), sref) << " x = " << x << ", i = " << i;
            FUZZY_COMPARE(Vc::sin(-x), -sref) << " x = " << x << ", i = " << i;
        }
    }
    

    Actual Results

    For trigonometric_sse, .._avx and .._avx2 :

    .  
    .  
    <-9.13625e+32 -9.13625e+32 -9.13625e+32 -9.13625e+32 | -0.279659>
    <-inf -inf -inf -inf | -0.838151>  
    <-9.13625e+32 -9.13625e+32 -9.13625e+32 -9.13625e+32>  
    <-inf -inf -inf -inf>  
    <-9.13625e+32 -9.13625e+32 -0.279659>  
    <-0.838151 -0.838151 -0.838151>  
    .  
    .  
    

    For trigonometric_scalar :

    [-0.279659]
    [-0.838151]
    <-0.279659 -0.279659 -0.279659>
    <-0.838151 -0.838151 -0.838151>
    <-0.279659>
    <-0.838151>
    

    Expected Results

    All values should be identical and lie within [-1, 1].

    A similar issue exists for cosine.
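    As a side note, here is a minimal sketch of why such inputs are hard and how the expected-results criterion could be checked. It uses only the standard library, not Vc; the helper names `spacing_above` and `is_valid_sine` are assumptions for illustration. Near 1.3e20 adjacent float values are much further apart than one period of the sine, so an implementation has to reduce the exactly represented argument carefully, yet the result must still be finite and within [-1, 1].

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: gap between x and the next larger representable float.
inline float spacing_above(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Hypothetical helper: the "expected results" criterion from the report,
// i.e. a sine value must be finite and lie within [-1, 1].
inline bool is_valid_sine(double s) {
    return std::isfinite(s) && s >= -1.0 && s <= 1.0;
}
```

    Applied to the output above, values such as -9.13625e+32 and -inf fail `is_valid_sine`, while the scalar results pass.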

    opened by ArifAhmed1995 15
  • Q: Why performance hit between C-style float_v arrays and simdize from std::vector<float>::iterator?

    0x0103a1 | Fedora 26 | gcc 7.3.1 | -O3 -mavx2 -mfma | ? Assembler | i7-7500U

    The inner kernel of my n-body solver accepts float_v objects and returns results via float_v* pointers. My test implementation stores the particle data in a series of C-style arrays of float_v, and iterates the kernel over those. The performance is insane: 63 GFlop/s on a tiny laptop, and >1TFlop/s on a 16-core i9.

    But to integrate this with my larger code, I need to be able to use data from std::vector<float> containers. Enter simdize. The routine that calls the same kernel as above looks sort of like this:

    using VectorF = std::vector<float, Vc::Allocator<float>>;
    
    void nbody(const int num,
               const VectorF& sx, ...
               const float tx[], ...
               float tax[], ...) {
    
        Vc::simdize<VectorF::const_iterator> sxit; ...
    
        // scalar over targets
        for (int i = 0; i < num; i++) {
            // spread this one target over a vector
            const Vc::float_v vtx = tx[i]; ...
            Vc::float_v vtax(0.0f); ...
            // vectorized over sources
            sxit = sx.begin(); ...
            for (int j = 0; j < num/Vc::float_v::Size; j++) {
                kernel(*sxit, ...
                       vtx, ...
                       &vtax, ...);
                ++sxit; ...
            }
            // reduce to scalar
            tax[i] = vtax.sum(); ...
        }
    }
    

    But the performance is about 1/4th that of the version which uses the native C-arrays of float_v (63 GFlop/s for float_v[] vs. 15 GFlop/s for simdize vs 5 GFlop/s for serial x86 instructions). I've confirmed that the std::vector are all aligned to 32-byte boundaries (or better), and that the final summation results are identical.

    My questions are:

    1. Should I expect this performance hit using std::vector and simdize?
    2. Alternatively, can I easily copy the data from my std::vector into a temporary float_v[] array? If so, how do I copy the data without manually iterating over each element (std::copy requires iterators and memcpy is dangerous)? The benefit of SIMD-izing my kernel should easily outweigh the cost of copying the data.
    type: question type: missed optimization 
    opened by markstock 14
  • Vector: add constructor taking an element reference

    Hi again!

    This MR fixes VcDevel/Vc#256. Tested with the following example under MSVC 19.29.30038.1:

    #include <Vc/Vc>
    #include <iostream>
    
    int main()
    {
    	Vc::float_v a;
    
    	a[2] = 4.0f;
    
    	Vc::float_v works = (float)a[2];
    	Vc::float_v fails = a[2]; // error: cannot convert Vc::float_v::reference to float
    
    	std::cout << works << fails << std::endl;
    
    	return 0;
    }
    
    ❯ .\build\Debug\main.exe
    [4, 4, 4, 4][4, 4, 4, 4]
    
    type: enhancement 
    opened by amyspark 13
  • Vc projects fail to build with clang-7

    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | 1.3.3 | Kubuntu 18.04 | clang-7 | - | - | - |

    Actual Results

    Projects including Vc fail to build at Vc/sse/intrinsics.h:617, with error

    /home/kiroma/Vc/Vc/scalar/../common/../sse/intrinsics.h:617:13: error: argument to '__builtin_ia32_vec_ext_v4sf' must be a constant integer
                _MM_EXTRACT_FLOAT(f, v, i);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~
    /usr/local/lib/clang/7.0.0/include/smmintrin.h:890:11: note: expanded from macro '_MM_EXTRACT_FLOAT'
      { (D) = __builtin_ia32_vec_ext_v4sf((__v4sf)(__m128)(X), (int)(N)); }
              ^                                                ~~~~~~~~
    

    Expected Results

    Projects successfully compile

    opened by kiroma 12
  • Do Vc vectors require initialization?

    While staring at some assembly I noticed that my c-style array declaration actually translates to code. It looks like its Vc vector variables get initialized. Can I prevent that?

    VTune output, code and assembly, is in the following gist. https://gist.github.com/DavidPfander-UniStuttgart/852de93d3a26ca2bbd2c295788d56dd8

    type: bug 
    opened by DavidPfander-UniStuttgart 11
  • Fixing msvc

    This PR proposes some changes to enable compilation using MSVC. It also adds a new algorithm simd_for_each_n.

    The main change is related to min() and max() which could be defined as macros. The change protects any use of either of those symbols from being macro expanded.
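    One common way to protect such names is to wrap them in parentheses, which stops function-like macro expansion because the name is no longer directly followed by an opening parenthesis. The sketch below illustrates that technique; it is an assumption that the PR uses this approach. The `min`/`max` macros are defined locally to simulate headers that define them, and `clamp_to_byte` and `lowest_int` are hypothetical example functions.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Simulate function-like macros as some platform headers define them.
#define min(a, b) (((a) < (b)) ? (a) : (b))
#define max(a, b) (((a) > (b)) ? (a) : (b))

// (std::min)(...) is not macro-expanded: the token `min` is followed by `)`,
// not `(`, so the preprocessor leaves it alone.
inline int clamp_to_byte(int x) {
    return (std::min)((std::max)(x, 0), 255);
}

// The same trick works for member functions such as numeric_limits::min().
inline int lowest_int() {
    return (std::numeric_limits<int>::min)();
}

// Remove the simulated macros so later code is unaffected.
#undef min
#undef max
```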

    opened by hkaiser 11
  • Visual Studio 22 doesn't like scatter implementation

    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | 1.4.2 | Windows | Visual Studio 22 | /permissive | | |

    Compilation fails at the inline asm on line 82 of scatterimplementation.h, even with the /permissive flag: Error C2760 syntax error: ':' was unexpected here; expected ')' ..\include\Vc\common\scatterimplementation.h 82

    I fixed this by replacing it with the commented-out implementation.

    opened by bmanga 0
  • Compilation failure with Clang-cl: iterators.h nextBit

    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | 1.4.2 | Windows | Clang 12 | | | |

    I hit line 168 of common/iterators.h:

            void nextBit()
            {
    #ifdef Vc_GNU_ASM
                bit = __builtin_ctzl(mask);
    #elif defined(Vc_MSVC)
                _BitScanForward(&bit, mask);
    #else
    #error "Not implemented yet. Please contact [email protected]"
    #endif
            }
    

    Apparently clang on Windows is neither Vc_GNU_ASM nor Vc_MSVC. Something like the following works for me:

            void nextBit()
            {
    #ifdef Vc_MSVC
                _BitScanForward(&bit, mask);
    #else
                bit = __builtin_ctzl(mask);
    #endif
            }
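
    For reference, a standalone sketch of the same dispatch idea, written outside of Vc. It keys on the compilers' predefined macros (`__GNUC__`, `__clang__`, `_MSC_VER`) instead of Vc's own `Vc_GNU_ASM` and `Vc_MSVC`, which is an assumption made so that the example compiles on its own; the function name `lowest_set_bit` is hypothetical. Clang on Windows defines `__clang__`, so it takes the builtin branch.

```cpp
#include <cassert>
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#endif

// Hypothetical helper: index of the lowest set bit; mask must be nonzero.
inline unsigned long lowest_set_bit(unsigned long mask) {
    assert(mask != 0);
#if defined(__GNUC__) || defined(__clang__)
    // GCC, Clang, and clang-cl all provide this builtin.
    return static_cast<unsigned long>(__builtin_ctzl(mask));
#elif defined(_MSC_VER)
    unsigned long bit;
    _BitScanForward(&bit, mask);
    return bit;
#else
    // Portable fallback: shift until the lowest bit is set.
    unsigned long bit = 0;
    while ((mask & 1ul) == 0) {
        mask >>= 1;
        ++bit;
    }
    return bit;
#endif
}
```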
    
    opened by bmanga 0
  • Add VS2022 to CI

    This PR adds Visual Studio 2022 to the CI.

    opened by bernhardmgruber 0
  • Vector is not trivially copyable

    If you change

    https://github.com/VcDevel/Vc/blob/b84dcd0a65d8dc5de6a2bd4d367882b3748f812c/Vc/common/simdarrayfwd.h#L49-L54

    to

        Vc_INTRINSIC Vector(const Vector &x) = default;
        Vc_INTRINSIC Vector &operator=(const Vector &x) = default;
    

    Vector, and in turn SimdArray, becomes trivially copyable. Currently we get a lot of warnings with gcc 11.2 if we use Vc types together with memcpy (which we can't change). I don't see any problems resulting from the proposed change.
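
    The effect of defaulting the copy operations can be shown with a small sketch that does not depend on Vc. The two structs are hypothetical stand-ins, not the actual Vector class: a user-provided copy constructor makes a type non-trivially copyable even if it only copies the members, while `= default` keeps it trivial.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in with user-provided copy operations.
struct UserProvidedCopy {
    float data[4];
    UserProvidedCopy() = default;
    UserProvidedCopy(const UserProvidedCopy& x) {
        for (int i = 0; i < 4; ++i) data[i] = x.data[i];
    }
    UserProvidedCopy& operator=(const UserProvidedCopy& x) {
        for (int i = 0; i < 4; ++i) data[i] = x.data[i];
        return *this;
    }
};

// Hypothetical stand-in with defaulted copy operations.
struct DefaultedCopy {
    float data[4];
    DefaultedCopy() = default;
    DefaultedCopy(const DefaultedCopy&) = default;
    DefaultedCopy& operator=(const DefaultedCopy&) = default;
};

static_assert(!std::is_trivially_copyable<UserProvidedCopy>::value,
              "user-provided copy operations are never trivial");
static_assert(std::is_trivially_copyable<DefaultedCopy>::value,
              "defaulted copy operations keep the type trivially copyable");

// memcpy is only well-defined for trivially copyable types.
inline DefaultedCopy copy_via_memcpy(const DefaultedCopy& src) {
    DefaultedCopy dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```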

    opened by krzikalla 0
  • CMakeLists.txt: use CMAKE_CURRENT_LIST_DIR for module path

    Fix #301

    type: enhancement 
    opened by htfy96 1
  • Use `CMAKE_CURRENT_LIST_DIR` instead of CMAKE_CURRENT_SOURCE_DIR in CMakeLists.txt

    Currently, this project uses:

    project(Vc VERSION "${CMAKE_MATCH_1}.${CMAKE_MATCH_2}.${CMAKE_MATCH_3}" LANGUAGES C CXX)
    list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
    

    to add the module directory into CMAKE_MODULE_PATH. However, it doesn't behave as expected when users try to embed this project into their own using CPM.cmake:

    CMake Error at /home/lz/.cache/CPM/vc/c43a073d69c63625768cfeb76b77ea806113471c/CMakeLists.txt:25 (include):
      include could not find requested file:
    
        VcMacros
    
    
    CMake Error at /home/lz/.cache/CPM/vc/c43a073d69c63625768cfeb76b77ea806113471c/CMakeLists.txt:26 (include):
      include could not find requested file:
    
        AddTargetProperty
    
    
    CMake Error at /home/lz/.cache/CPM/vc/c43a073d69c63625768cfeb76b77ea806113471c/CMakeLists.txt:27 (include):
      include could not find requested file:
    
        OptimizeForArchitecture
    
    
    CMake Error at /home/lz/.cache/CPM/vc/c43a073d69c63625768cfeb76b77ea806113471c/CMakeLists.txt:29 (vc_determine_compiler):
      Unknown CMake command "vc_determine_compiler"
    

    This is because CMAKE_CURRENT_SOURCE_DIR refers to the parent project's source dir when this project is embedded. This can be fixed by using CMAKE_CURRENT_LIST_DIR, which points to the correct directory.

    opened by htfy96 0
  • Unify all-bits-set mask creation

    Unify the diverging implementation of creating an __m256i with all bits set. We currently have diverging implementations for ICC/MSVC and other compilers. See: Originally posted by @amadio in https://github.com/VcDevel/Vc/pull/286#discussion_r700130444

    type: code cleanup 
    opened by bernhardmgruber 0
  • Regenerate Compiler Explorer header for 1.4.2 and onwards

    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | 1.4.2 | Compiler Explorer | N/A | N/A | N/A | N/A |

    Testcase

    https://godbolt.org/z/f76oj8316

    #include <https://raw.githubusercontent.com/VcDevel/Vc/1.4/godbolt/Vc>
    #include <limits>
    #include <string>
    
    int main() {
        const std::string test = Vc_VERSION_STRING;
        return 0;
    }
    
    

    Actual Results

    .L.str:
            .asciz  "1.4.1-dev"
    

    Expected Results

    .L.str:
            .asciz  "1.4.3-dev"
    
    type: bug 
    opened by amyspark 1
  • Add VS2019 to CI

    This PR adds Visual Studio 2019 to the CI.

    type: enhancement 
    opened by bernhardmgruber 20
  • Add icpx to CI

    WIP

    opened by bernhardmgruber 0
Releases(1.4.2)
  • 1.4.2(Jun 23, 2021)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.4.2 is a bugfix release most notably incorporating fixes for current compiler versions.

    User Changelog

    Resolved Issues

    | | description |
    | --- | --- |
    | optimization | Fixed a warning from cmake. (see #276) |
    | optimization | Added a missing include for GCC11. (see #275) |
    | optimization | Fixed a compilation error with MSVC. (see #272, #277) |
    | optimization | Fixed a corner case in the unit tests. (see #262) |
    | optimization | Fixed a warning in the unit tests. (see #261) |
    | optimization | Fixed a unit test compilation error with clang. (see #260) |
    | optimization | Removed the deprecated Vector<T, VectorAbi::Scalar>::reinterpretCast to fix a warning with GCC10. (see #254) |
    | optimization | Avoid potentially pessimizing std::move in return statements. (see #258) |
    | optimization | Avoid redefinition of bit_scan{forward,reverse} macros. (see #248) |
    | optimization | Improved performance of simdized random access containers. |
    | | Documentation and CI cleanup. (see #222, #223, #251, #259) |

  • 1.4.1(Nov 19, 2018)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.4.1 is a bugfix release most notably incorporating improved codegen on gathers with AVX2 available.

    User Changelog

    Resolved Issues

    | | description |
    | --- | --- |
    | optimization | Gather operations that required implicit multiplication of the index vector (because of gather into an array of structure) will now do a more efficient vector multiplication or, if possible, use a larger stride in the gather instruction to avoid the multiplication (see #214) |
    | bug fixed | Internal code cleanup (removal of dead code) to hit fewer compilation corner cases/errors. |
    | bug fixed | Buildsystem fixes for i686 and libmvec. |

  • 1.4.0(Oct 1, 2018)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.4.0 is a minor feature release with subtle changes in the interface and the start of an upgrade path to std::experimental::simd.

    User Changelog

    | | description |
    | --- | --- |
    | API break | Dropped all Intel MIC code. This was not AVX512, but the old KNC implementation. ICC 18 dropped support for the -mmic compile flag. Consequently, I removed the maintenance burden. |
    | API break | Vc::simdize<T, N> (with arithmetic T and suitable N) can now be an alias for Vc::Vector<T, Sse>, while AVX(2) is available. In Vc 1.3, this would use Vc::SimdArray<T, N> instead. |
    | API break | Fixed conversions from Vc::Vector<T> to Vc::SimdArray<U, Vc::Vector<T>::size()> to be implicit (as the documentation always said). |
    | API break API addition | Added Vc::simd<T, Abi> alias that resolves to the corresponding Vc::Vector or Vc::SimdArray type. Most importantly Vc::simd<T, Vc::simd_abi::fixed_size<N>> (alias: Vc::fixed_size_simd<T, N>) will give you (almost) Vc::SimdArray<T, N>. Note that this simd type does not implement the exact same interface as defined in the Parallelism TS v2. Most SimdArray operations return fixed_size_simd now, thus potentially breaking existing code (e.g. by breaking template argument deduction). |
    | API addition | Added load_interleaved and store_interleaved to be used with Vc::simdize<T> objects. This enables optimized loads from / stores to an AoS (array of structure) layout into/from structs of Vc vectors. This is currently only optimized for structures T where all data members have equal sizeof. |
    | API addition | Added a new constructor to simdized types: Vc::simdize<T>([](size_t n) { ... }) expects the lambda to return objects of type T, which will be placed at the corresponding index n in the resulting object. |
    | API break API addition | Vc::simd_for_each and Vc::simd_for_each_n will now vectorize any value_type via Vc::simdize. In Vc 1.3, only arithmetic types were vectorized, everything else used a fall-back to std::for_each. This fallback is removed in Vc 1.4. Consequently, the value_type must work with Vc::simdize. |
    | behavior change bug fixed | The sin, cos, and sincos functions now have increased precision at the cost of lower performance (with AVX2, you still get a speedup of ~8 over std::sin and std::cos). There's an option to use libmvec for Vc::sin and Vc::cos, which gives even higher performance for SSE4 and AVX2, but less performance for targets without SSE4 or just AVX. |
    | CMake feature | Modernize CMake to simplify usage of Vc in external projects via exported target. |
    | behavior change bug fixed | Binary operators are now stricter in the implicit conversions they allow. E.g. simd<int>(0) == 0l is now ill-formed because it would require a silent (potentially) narrowing conversion of the right-hand side. |

  • 1.3.3(Nov 27, 2017)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.3.3 is a minor bug fix release.

    User Changelog

    • Support for AVX2 gather instructions. Thanks to Kay F. Jahnke for the initial patch.
    • Shift optimizations
    • Preliminary support for compiling to non-x86 targets (uses only the Scalar ABI)
    • Resolve failing static assertions, moving the relevant tests to unit tests
    • Fixed is_simd_vector and is_simd_mask traits to consider the ElementType too. Thanks to Kay F. Jahnke.

  • 1.3.2(May 3, 2017)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.3.2 is a small bug fix release.

    User Changelog

    • Resolve warnings from GCC 6 about ignored attributes.
    • Support for Kaby Lake detection.

  • 1.3.1(Mar 9, 2017)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.3.1 contains bug fixes, enables swap on scalar subscripts, and resolves a licensing issue in the examples.

    User Changelog

    • swap(v[i], v[j]) did not compile. Vc 1.3.1 overloads the swap function and thus enables swapping scalars into/out of vector and mask objects.
    • The spline example has moved to the new Vc-examples-nonfree repository since it has a license that restricts redistribution.
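
    Why `swap(v[i], v[j])` needs its own overload can be sketched without Vc. When a subscript operator returns a proxy object by value rather than an lvalue reference, the generic `std::swap(T&, T&)` cannot bind to it. The classes below are hypothetical stand-ins, not Vc's smart reference implementation; the overload takes the proxies by value and exchanges the elements they refer to.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a vector type with proxy subscripting.
class Vec4 {
    std::array<float, 4> data_{};

public:
    // Proxy returned by operator[]; refers to one element of a Vec4.
    class Ref {
        Vec4* v_;
        std::size_t i_;

    public:
        Ref(Vec4* v, std::size_t i) : v_(v), i_(i) {}
        // Reading converts the proxy to the element value.
        operator float() const { return v_->data_[i_]; }
        // Writing stores into the referenced element.
        Ref& operator=(float x) {
            v_->data_[i_] = x;
            return *this;
        }
    };

    Ref operator[](std::size_t i) { return Ref(this, i); }
    float operator[](std::size_t i) const { return data_[i]; }
};

// Overload for proxies: they are rvalues, so take them by value.
inline void swap(Vec4::Ref a, Vec4::Ref b) {
    const float tmp = a;
    a = static_cast<float>(b);
    b = tmp;
}
```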

  • 1.3.0(Oct 27, 2016)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.3.0 contains API cleanups, bug fixes, important compiler-specific optimizations & workarounds, and finally supports MSVC again.

    User Changelog

    • 64-bit MS VisualStudio 2015 support. (See #119 for some of the gory details.)
    • ICC 17 support (#143).
    • GCC 6 support (#125).
    • Workarounds for bad ICC code-gen (#135). Now Vc not only works correctly when compiled with ICC, but also performs as well as (or better than) it does with GCC and Clang.
    • Safer and more restrictive subscripting on Vector and Mask. There is a minor source compatibility break involved, since Vector::operator[] returned lvalue references before Vc 1.3 and returns a smart reference (rvalue) now. This change reduces the chance of miscompilation & internal compiler errors and reduces the reliance on non-standard C++ extensions.
    • Support for x32 compilation (like x86_64 but with 32-bit pointers).
    • Added scatter interface to SimdArray (thanks to Kay Jahnke).
    • simd_cast properly works with ADL now (i.e. you don't have to write fully qualified Vc::simd_cast anymore).
    • Added simd_for_each_n (thanks to Hartmut Kaiser).

  • 1.2.0(Feb 25, 2016)

    Vc 1.2.0 contains API cleanups and bug fixes.

    User Changelog

    • Improved documentation, especially on SimdArray<T, N>
    • Rewritten Vector<T> and SimdArray<T, N> binary operators for more correctness and to follow the latest proposal to the C++ standards committee
    • Fixed trait that queries mutability of callable to simd_for_each
    • A few build system cleanups
    • Improved ICC support

  • 1.1.0(Dec 16, 2015)

    Vc 1.1.0 resolves several important issues that remained or were found in Vc 1.0.0.

    User Changelog

    • Significant restructuring of the documentation, fixing many issues where the documentation still presented Vc 0.7 API.
    • Implement all math functions supported by Vector for SimdArray.
    • Fix iif to work for builtin types.
    • Reintroduce structured gather and scatter functions/constructors to Vector to restore API compatibility with Vc 0.7. These functions are all marked as deprecated, suggesting to use the new subscript operators instead. This should ease porting applications from Vc 0.7.
    • Deprecate Vector::copySign, Vector::isNegative, and Vector::exponent. They are replaced by their non-member counterparts Vc::copysign, Vc::isnegative, and Vc::exponent.
    • Fix a few remaining license issues so that everything really says BSD now. This includes a replacement of Vc::array with a fork from libc++ instead of the previous code that was forked from libstdc++.
    • Resolve a few minor OS X compatibility issues.

  • 1.0.0(Nov 3, 2015)

    Changelog

    • AVX (u)int_v is now only one SSE width instead of the full AVX width.
    • AVX2 support added with doubled width for (u)int_v and (u)short_v.
    • Xeon Phi support (Knights Corner). This requires an Intel compiler.
    • Dropped the guarantee of (u)int_v::size() == float_v::size(). Therefore, the implicit conversion between int and float vectors present in Vc 0.x is ill-formed with 1.0.
    • New simd_cast<T> cast function. It allows arbitrary conversions from one or more Vectors/SimdArrays to one or more Vectors/SimdArrays.
    • New simdize<T> expression "vectorizes T". This is still somewhat experimental.
    • sfloat_v is gone in favor of a generic SimdArray<T, N> class template allowing you to build vector objects of arbitrary width. Thus, to get the old sfloat_v type back you'd write using sfloat_v = Vc::SimdArray<float, Vc::short_v::size()>;. Note that SimdArray is not meant to be used as a container, i.e. N should be "small".
    • Besides Vc::Vector<T> you can also now use microarchitecture specific types directly, such as Vc::SSE::Vector<T>. This enables you to use SSE vectors and AVX vectors in the same translation unit. You should prefer SimdArray<T, N> in most cases, though.
    • In Vc 0.x the Vector<T> class template was defined multiple times in different namespaces. Now it's a single class template Vc::Vector<T, Abi> with aliases in the implementation namespaces. This enables you to write a function such as template <typename T, typename Abi> void f(Vc::Vector<T, Abi> x), which matches any Vc Vector type, whether that's the Scalar implementation or an actual SIMD implementation.
    • Gather & scatter received a new interface that is a lot more intuitive and flexible. Use Vc::vector and/or Vc::array as alternatives to std::vector and std::array to get an additional subscript overload accepting Vc::Vector objects as argument to the subscript operator for gather & scatter.
    • Requires C++11. Thus, many older compiler versions and, at least for now, MSVC are not supported anymore.
    • Load and store functions default to unaligned memory access. The old default was aligned memory access.
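
The benefit of a single Vector<T, Abi> class template, as described in the list above, can be sketched with a toy model: one function template then matches every implementation, scalar or wide. The block below is a minimal standard C++ sketch under stated assumptions. The `sketch` namespace, the `ScalarAbi` and `WideAbi` tags, and `horizontal_sum` are hypothetical and do not reflect how Vc defines its ABI tags or stores its data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical ABI tags: each one only fixes the number of lanes.
struct ScalarAbi { static constexpr std::size_t width = 1; };
struct WideAbi   { static constexpr std::size_t width = 4; };

// Toy model of a single class template parameterized on element type and ABI.
template <typename T, typename Abi>
class Vector {
public:
    static constexpr std::size_t size() { return Abi::width; }

    // Broadcast one value into every lane.
    explicit Vector(T value) { data_.fill(value); }

    T operator[](std::size_t i) const {
        assert(i < size());
        return data_[i];
    }

private:
    std::array<T, Abi::width> data_;
};

// One template matches any Vector<T, Abi>, whatever the ABI.
template <typename T, typename Abi>
T horizontal_sum(const Vector<T, Abi>& x) {
    T sum = T();
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i];
    return sum;
}

}  // namespace sketch
```

In this model `horizontal_sum` accepts both `sketch::Vector<int, sketch::ScalarAbi>` and `sketch::Vector<int, sketch::WideAbi>` without a separate overload per implementation namespace.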

    Known issues

    • The documentation is incomplete and outdated.
  • 0.7.5(Aug 26, 2015)

Artistic creativity, accelerated with SIMD.

Link the YouTube video demonstration: https://www.youtube.com/watch?v=Bjwml32dxhU The compression algorithm does not work well on this colorful video,

Long Nguyen 17 Jul 20, 2021
SIMD (SSE) implementation of the infamous Fast Inverse Square Root algorithm from Quake III Arena.

simd_fastinvsqrt SIMD (SSE) implementation of the infamous Fast Inverse Square Root algorithm from Quake III Arena. Why Why not. How This video explai

Liam 4 Jun 12, 2021
linalg.h is a single header, public domain, short vector math library for C++

linalg.h linalg.h is a single header, public domain, short vector math library for C++. It is inspired by the syntax of popular shading and compute la

Sterling Orsten 686 Nov 26, 2021
A wrapper for intel SSE/AVX vector instructions

VMath A wrapper for intel SSE/AVX vector instructions This is just a toy thing to figure out what working with intrinsics is like. I tried to keep it

Dennis 6 Nov 25, 2021
The PULP Ara is a 64-bit Vector Unit, compatible with the RISC-V Vector Extension Version 0.9, working as a coprocessor to CORE-V's CVA6 core

Ara Ara is a vector unit working as a coprocessor for the CVA6 core. It supports the RISC-V Vector Extension, version 0.9. Dependencies Check DEPENDEN

null 78 Nov 14, 2021
C++ image processing and machine learning library using SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX (Altivec) and VSX (Power7), NEON for ARM.

Introduction The Simd Library is a free open source image processing and machine learning library, designed for C and C++ programmers. It provides man

Ihar Yermalayeu 1.4k Dec 7, 2021
P(R*_{3, 0, 1}) specialized SIMD Geometric Algebra Library

Klein Project Site Description Do you need to do any of the following? Quickly? Really quickly even? Projecting points onto lines, lines t

Jeremy Ong 545 Nov 25, 2021
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, NEON, AVX512)

Xtensor Stack 1.2k Dec 3, 2021
Portable header-only C++ low level SIMD library

libsimdpp libsimdpp is a portable header-only zero-overhead C++ low level SIMD library. The library presents a single interface over SIMD instruction

Povilas Kanapickas 975 Nov 21, 2021
Image Masking Project with SIMD

TP 2 - Computer Organization II Image Masking Project with SIMD Objective Two image masking functions must be implemented

null 1 Nov 1, 2021
std::find simd version

std::find simd version std::find doesn't use simd intrinsics. ( check https://gms.tf/stdfind-and-memchr-optimizations.html ) So i thought simd can mak

SungJinKang 15 Nov 29, 2021
The DirectX Tool Kit (aka DirectXTK) is a collection of helper classes for writing DirectX 11.x code in C++

DirectX Tool Kit for DirectX 11 http://go.microsoft.com/fwlink/?LinkId=248929 Copyright (c) Microsoft Corporation. All rights reserved. January 9, 202

Microsoft 1.9k Dec 5, 2021
A collection of generic reference counted data structures, tools to create compatible C style classes, and demo applications

The Offbrand library is a collection of reference counted generic data structures written in C for C. The library includes bash scripts to assist in t

Tyler Heck 81 Jul 22, 2021
Modding (hacking) il2cpp games by classes, methods, fields names.

ByNameModding Modding (hacking) il2cpp games by classes, methods, fields names. Status: Ready to use Why did I do it 1. In order not to update the off

null 47 Nov 30, 2021
Redefine java classes without Instrumentation or custom classloaders

DynamicNativeAgent The DynamicNativeAgent can redefine java classes using native library (without classloaders or Instrumentation) Usage AgentFactory.

whispered 4 Sep 9, 2021
dex-vm implementation, used to protect the classes.dex file

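nmmp runs Dalvik bytecode on a dex-vm to protect dex files and make decompilation harder. The project has two parts: nmm-protect is a pure Java project that transforms the dex, converts its methods and other data into C structs, processes the APK to generate a C project, compiles it into a .so, and outputs the processed APK. nmmvm is an Android project that contains the dex-vm implementation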

mao 147 Dec 3, 2021
The AudioUnitSDK contains a set of base classes as well as utility sources required for Audio Unit development.

The AudioUnitSDK contains a set of base classes as well as utility sources required for Audio Unit development.

Apple 58 Nov 29, 2021
(R) Efficient methods and operators for the sparse matrix classes in 'Matrix' (esp. CSR format or "RsparseMatrix")

MatrixExtra MatrixExtra is an R package which extends the sparse matrix and sparse vector types in the Matrix package, particularly the CSR or Rsparse

null 9 Nov 20, 2021
Library of useful C++ snippets and reusable classes I've created as I build out Arduino Uno and ESP32 projects.

Arduino Snippets Library of useful C++ snippets and reusable classes I've created as I build out Arduino Uno and ESP32 projects. Button A simple butto

Max Lynch 6 Jul 19, 2021
Helper C++ classes to quickly preintegrate IMU measurements between SLAM keyframes

mola-imu-preintegration Integrator of IMU angular velocity readings. This repository provides: IMUIntegrator and RotationIntegrator: C++ classes to in

The MOLA SLAM framework 10 Oct 12, 2021
A lightweight header-only C++11 library for quick and easy SQL querying with QtSql classes.

EasyQtSql EasyQtSql is a lightweight header-only C++11 library for quick and easy SQL querying with QtSql classes. Features: Header only C++11 library

null 31 Nov 30, 2021
A video game I created for one of my CS classes.

Eclipse This is a video game I created for one of my CS classes. The game will run on Mac or Linux. Requirements This game requires that Mednafen be in

null 2 Nov 20, 2021
Utility functions/classes that provide a nice way to communicate with an RP2040 Pico board

PicoScreenTerminal Utility functions/classes that provide a nice way to communicate with an RP2040 Pico board How to build First follow the

GuillaumeG. 4 Nov 15, 2021
VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP

VexCL VexCL is a vector expression template library for OpenCL/CUDA. It has been created for ease of GPGPU development with C++. VexCL strives to redu

Denis Demidov 648 Nov 21, 2021
2D Vector Graphics Engine Powered by a JIT Compiler

Blend2D 2D Vector Graphics Powered by a JIT Compiler. Official Home Page (blend2d.com) Official Repository (blend2d/blend2d) Public Chat Channel Zlib

Blend2D 959 Nov 24, 2021