SIMD Vector Classes for C++

Overview

You may be interested in switching to std-simd. Features present in Vc 1.4 and not present in std-simd will eventually turn into Vc 2.0, which then depends on std-simd.

Vc: portable, zero-overhead C++ types for explicitly data-parallel programming

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores.

Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.

The Vc library provides the missing link. Its types enable explicitly stating data-parallel operations on multiple values. The parallelism is therefore added via the type system. Competing approaches state the parallelism via new control structures and consequently new semantics inside the body of these control structures.

Vc is a free software library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets. Thus an application written with Vc can be compiled for:

  • AVX and AVX2
  • SSE2 up to SSE4.2 or SSE4a
  • Scalar
  • AVX-512 (Vc 2 development)
  • NEON (in development)
  • NVIDIA GPUs / CUDA (research)

After Intel dropped MIC support with ICC 18, Vc 1.4 also removes support for it.

Examples

Usage on Compiler Explorer

Scalar Product

Let's start from the code for calculating a 3D scalar product using builtin floats:

using Vec3D = std::array<float, 3>;
float scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Using Vc, we can easily vectorize the code using the float_v type:

using Vc::float_v;
using Vec3D = std::array<float_v, 3>;
float_v scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

The above will scale to 1, 4, 8, 16, etc. scalar products calculated in parallel, depending on the target hardware's capabilities.
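
To make the scaling claim concrete without depending on Vc itself, here is a minimal sketch in plain standard C++. `Pack<N>` is a hypothetical stand-in for `Vc::float_v` (it is not part of Vc and uses a scalar loop instead of SIMD instructions); the point is only that the body of `scalar_product` stays the same while the lane count `N` changes.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for Vc::float_v: N floats handled lane by lane.
template <std::size_t N>
struct Pack {
    std::array<float, N> lane{};
};

// Element-wise multiplication over all lanes.
template <std::size_t N>
Pack<N> operator*(const Pack<N>& a, const Pack<N>& b) {
    Pack<N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Element-wise addition over all lanes.
template <std::size_t N>
Pack<N> operator+(const Pack<N>& a, const Pack<N>& b) {
    Pack<N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

template <std::size_t N>
using Vec3D = std::array<Pack<N>, 3>;

// Same expression as the scalar version; computes N scalar products at once.
template <std::size_t N>
Pack<N> scalar_product(const Vec3D<N>& a, const Vec3D<N>& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

Calling `scalar_product` on a `Vec3D<4>` yields four independent results, one per lane; with `Vec3D<8>` the same source yields eight.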

For comparison, the same vectorization using Intel SSE intrinsics is more verbose and uses prefix notation (i.e. function calls):

using Vec3D = std::array<__m128, 3>;
__m128 scalar_product(Vec3D a, Vec3D b) {
  return _mm_add_ps(_mm_add_ps(_mm_mul_ps(a[0], b[0]), _mm_mul_ps(a[1], b[1])),
                    _mm_mul_ps(a[2], b[2]));
}

The above will neither scale to AVX, AVX-512, etc. nor is it portable to other SIMD ISAs.

Build Requirements

cmake >= 3.0

C++11 Compiler:

  • GCC >= 4.8.1
  • clang >= 3.4
  • ICC >= 18.0.5
  • Visual Studio 2015 (64-bit target)

Building and Installing Vc

  • After cloning, you need to initialize Vc's git submodules:
$ git submodule update --init
  • Create a build directory:
$ mkdir build
$ cd build
  • Call cmake with the relevant options:
$ cmake -DCMAKE_INSTALL_PREFIX=/opt/Vc -DBUILD_TESTING=OFF <srcdir>
  • Build and install:
$ make -j16
$ make install

Documentation

The documentation is generated via doxygen. You can build the documentation by running doxygen in the doc subdirectory. Alternatively, you can find nightly builds of the documentation at:

Publications

Work on integrating the functionality of Vc in the C++ standard library.

Communication

A channel on the freenode IRC network is reserved for discussions on Vc: ##vc on freenode (via SSL)

Feel free to use the GitHub issue tracker for questions. Alternatively, there's a mailing list for users of Vc.

License

Vc is released under the terms of the 3-clause BSD license.

Issues
  • Vc has bad performance with Intel C/C++ compiler on Linux

    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | 1.2.0 | Linux | GCC 5.3.0 | -O3 -march=native | | Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz |
    | 1.2.0 | Linux | ICC 16.0.2 | -O3 -march=native | | Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz |

    Vc performs much worse when compiled with ICC than with GCC, as the example below shows. The test case solves many quadratic equations, given coefficients a, b, c. The source code can be found at http://pastebin.com/hr6nPDmJ (quadratic.cc)

    Testcase

    Here is a session on my computer:

     ~ $ icpc -Wall -std=c++11 -O3 -march=native -DUSE_VC=1 quadratic.cc -lVc -o icc.out
     ~ $ g++  -Wall -std=c++11 -O3 -march=native -DUSE_VC=1 quadratic.cc -lVc -o gcc.out
     ~ $ gcc.out
                    optimized scalar:  57.000ms
                     AVX2 intrinsics:  12.000ms
                                 *Vc:  13.000ms*
     ~ $ icc.out
                    optimized scalar:  13.000ms
                     AVX2 intrinsics:  12.000ms
                                 *Vc:  49.000ms*
     ~ $ 
    

    Notice that with ICC the Vc code is slower than the auto-vectorized scalar code that ICC generates. The AVX2 intrinsics code is provided for reference.

    Note: It used to be the case that even the scalar code would see its performance degraded, and while this is not true for this particular example, I still see places in which merely including Vc degrades performance of code that is not using Vc at all when using the Intel compiler, probably due to options changed in Vc headers.

    type: miscompilation 
    opened by amadio 31
  • Incorrect values of Sin for large input values.

    Vc version / revision : master branch

    Operating System : Linux Ubuntu 18.04.1 LTS

    Compiler & Version : gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0

    Compiler Flags : As done for test file(trigonometric.cpp)

    Assembler & Version :

    CPU : Intel® Core™ i5-7200U CPU @ 2.50GHz × 4 (sse sse2 ssse3 sse4_1 sse4_2 avx avx2 supported)

    Large input values might be fed to certain Bessel integrals; that is one case where accurate computation of sin and cos for large arguments is needed.

    Testcase

    Modified the Sine test in trigonometric.cpp file to the following (Added an array of large values, computed and displayed sine value of them.):

    TEST_TYPES(V, testSin, (REAL_VECTORS, SIMD_REAL_ARRAY_LIST)) //{{{1
    {
        typedef typename V::EntryType T;
        UnitTest::setFuzzyness<float>(2);
        UnitTest::setFuzzyness<double>(1e7);
        Array<SincosReference<T> > reference = sincosReference<T>();
        for (size_t i = 0; i + V::Size - 1 < reference.size; i += V::Size) {
            V x, sref, y;
            for (size_t j = 0; j < V::Size; ++j) {
                x[j] = reference.data[i + j].x;
                sref[j] = reference.data[i + j].s;
                y[j] = 1.3e20;
            }
            std::cout << Vc::sin(y) << std::endl;
            FUZZY_COMPARE(Vc::sin(x), sref) << " x = " << x << ", i = " << i;
            FUZZY_COMPARE(Vc::sin(-x), -sref) << " x = " << x << ", i = " << i;
        }
    }
    

    Actual Results

    For trigonometric_sse, .._avx and .._avx2 :

    .  
    .  
    <-9.13625e+32 -9.13625e+32 -9.13625e+32 -9.13625e+32 | -0.279659>
    <-inf -inf -inf -inf | -0.838151>  
    <-9.13625e+32 -9.13625e+32 -9.13625e+32 -9.13625e+32>  
    <-inf -inf -inf -inf>  
    <-9.13625e+32 -9.13625e+32 -0.279659>  
    <-0.838151 -0.838151 -0.838151>  
    .  
    .  
    

    For trigonometric_scalar :

    [-0.279659]
    [-0.838151]
    <-0.279659 -0.279659 -0.279659>
    <-0.838151 -0.838151 -0.838151>
    <-0.279659>
    <-0.838151>
    

    Expected Results

    All values should be identical and lie within [-1, 1].

    A similar issue exists for cosine.
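
The report does not say where the out-of-range values come from. One plausible mechanism, shown here as an assumption and not as a description of Vc's implementation, is a polynomial approximation evaluated on an argument that was never folded back into a small interval. The sketch below uses a hypothetical degree-7 Taylor polynomial `sin_poly` and a simple `std::remainder`-based reduction; the helper names are made up for illustration.

```cpp
#include <cmath>

// Degree-7 Taylor polynomial of sin(x) around 0; only meaningful for small |x|.
double sin_poly(double x) {
    const double x2 = x * x;
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)));
}

// Fold x into [-pi, pi] before evaluating the polynomial.
// Assumption: a double-precision 2*pi is enough to keep the result bounded.
// It is NOT enough to make the result accurate for inputs as large as 1.3e20;
// that needs a higher-precision reduction (for example Payne-Hanek).
double sin_reduced(double x) {
    const double two_pi = 6.283185307179586;
    return sin_poly(std::remainder(x, two_pi));
}
```

Without reduction, `sin_poly(1.3e20)` is dominated by the x^7 term and ends up far outside [-1, 1]; with reduction the result stays bounded, although its accuracy for such large inputs still depends on how precisely the reduction is carried out.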

    opened by ArifAhmed1995 15
  • Q: Why performance hit between C-style float_v arrays and simdize from std::vector<float>::iterator?

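    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | 0x0103a1 | Fedora 26 | gcc 7.3.1 | -O3 -mavx2 -mfma | ? | i7-7500U |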

    The inner kernel of my n-body solver accepts float_v objects and returns results via float_v* pointers. My test implementation stores the particle data in a series of C-style arrays of float_v, and iterates the kernel over those. The performance is insane: 63 GFlop/s on a tiny laptop, and >1TFlop/s on a 16-core i9.

    But to integrate this with my larger code, I need to be able to use data from std::vector<float> containers. Enter simdize. The routine that calls the same kernel as above looks sort of like this:

    using VectorF = std::vector<float, Vc::Allocator<float>>;
    
    void nbody(const int num,
               const VectorF& sx, ...
               const float tx[], ...
               float tax[], ...) {
    
        Vc::simdize<VectorF::const_iterator> sxit; ...
    
        // scalar over targets
        for (int i = 0; i < num; i++) {
            // spread this one target over a vector
            const Vc::float_v vtx = tx[i]; ...
            Vc::float_v vtax(0.0f); ...
            // vectorized over sources
            sxit = sx.begin(); ...
            for (int j = 0; j < num/Vc::float_v::Size; j++) {
                kernel(*sxit, ...
                       vtx, ...
                       &vtax, ...);
                ++sxit; ...
            }
            // reduce to scalar
            tax[i] = vtax.sum(); ...
        }
    }
    

    But the performance is about 1/4th that of the version which uses the native C-arrays of float_v (63 GFlop/s for float_v[] vs. 15 GFlop/s for simdize vs 5 GFlop/s for serial x86 instructions). I've confirmed that the std::vector are all aligned to 32-byte boundaries (or better), and that the final summation results are identical.

    My questions are:

    1. Should I expect this performance hit using std::vector and simdize?
    2. Alternatively, can I easily copy the data from my std::vector into a temporary float_v[] array? If so, how do I copy the data without manually iterating over each element (std::copy requires iterators and memcpy is dangerous)? The benefit of SIMD-izing my kernel should easily outweigh the cost of copying the data.
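
The copy step asked about in the second question can be pictured with a standard-library-only sketch. This is not Vc code: `kLanes` and `Block` are hypothetical stand-ins for `Vc::float_v::Size` and `Vc::float_v`, and the zero padding of the last block assumes that zero-valued lanes are harmless for the kernel, which has to be checked for the actual n-body kernel.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical lane count; with Vc this role would be played by Vc::float_v::Size.
constexpr std::size_t kLanes = 8;
using Block = std::array<float, kLanes>;

// Copy a flat float sequence into kLanes-wide blocks, zero-padding the last one.
std::vector<Block> pack_blocks(const std::vector<float>& src) {
    const std::size_t n_blocks = (src.size() + kLanes - 1) / kLanes;
    std::vector<Block> out(n_blocks, Block{});
    for (std::size_t b = 0; b < n_blocks; ++b) {
        const std::size_t first = b * kLanes;
        const std::size_t count = std::min(kLanes, src.size() - first);
        std::copy_n(src.begin() + first, count, out[b].begin());
    }
    return out;
}
```

Unlike the `num/Vc::float_v::Size` loop bound in the routine above, this sketch also covers a trailing partial block, at the cost of the padding assumption.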
    type: question type: missed optimization 
    opened by markstock 14
  • Vector: add constructor taking an element reference

    Hi again!

    This MR fixes VcDevel/Vc#256. Tested with the following example under MSVC 19.29.30038.1:

    #include <Vc/Vc>
    #include <iostream>
    
    int main()
    {
    	Vc::float_v a;
    
    	a[2] = 4.0f;
    
    	Vc::float_v works = (float)a[2];
    	Vc::float_v fails = a[2]; // error: cannot convert Vc::float_v::reference to float
    
    	std::cout << works << fails << std::endl;
    
    	return 0;
    }
    
    ❯ .\build\Debug\main.exe
    [4, 4, 4, 4][4, 4, 4, 4]
    
    type: enhancement 
    opened by amyspark 13
  • Vc projects fail to build with clang-7

    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | 1.3.3 | Kubuntu 18.04 | clang-7 | - | - | - |

    Actual Results

    Projects including Vc fail to build at Vc/sse/intrinsics.h:617, with error

    /home/kiroma/Vc/Vc/scalar/../common/../sse/intrinsics.h:617:13: error: argument to '__builtin_ia32_vec_ext_v4sf' must be a constant integer
                _MM_EXTRACT_FLOAT(f, v, i);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~
    /usr/local/lib/clang/7.0.0/include/smmintrin.h:890:11: note: expanded from macro '_MM_EXTRACT_FLOAT'
      { (D) = __builtin_ia32_vec_ext_v4sf((__v4sf)(__m128)(X), (int)(N)); }
              ^                                                ~~~~~~~~
    

    Expected Results

    Projects successfully compile

    opened by kiroma 12
  • Do Vc vectors require initialization?

    While staring at some assembly, I noticed that my C-style array declaration actually translates to code. It looks like its Vc vector variables get initialized. Can I prevent that?

    VTune output, code and assembly, is in the following gist. https://gist.github.com/DavidPfander-UniStuttgart/852de93d3a26ca2bbd2c295788d56dd8

    type: bug 
    opened by DavidPfander-UniStuttgart 11
  • Fixing msvc

    This PR proposes some changes to enable compilation using MSVC. It also adds a new algorithm simd_for_each_n.

    The main change is related to min() and max() which could be defined as macros. The change protects any use of either of those symbols from being macro expanded.
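
The PR description does not spell out how the symbols are protected. One common technique, shown here as an assumption about the general approach and not as the PR's actual diff, is to parenthesize the function name so that a function-like macro of the same name cannot expand. The macros below imitate what `<windows.h>` defines when `NOMINMAX` is not set, and `clamp_to_range` is a hypothetical helper.

```cpp
#include <algorithm>

// Imitate the function-like macros that <windows.h> defines without NOMINMAX.
#define min(a, b) (((a) < (b)) ? (a) : (b))
#define max(a, b) (((a) > (b)) ? (a) : (b))

// A function-like macro only expands when its name is directly followed by
// '('. In "(std::min)(x, hi)" the name is followed by ')', so the macro is
// left alone and the real std::min is called.
int clamp_to_range(int x, int lo, int hi) {
    return (std::max)(lo, (std::min)(x, hi));
}

// Remove the simulated macros so that later code is unaffected.
#undef min
#undef max
```

The same trick applies to member functions and to `std::numeric_limits<T>::min`, written as `(std::numeric_limits<T>::min)()`.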

    opened by hkaiser 11
  • Non-commercial clause in spline example is incompatible with the BSD-3-Clause license

    The following files have a non-commercial clause which is incompatible with the project license:

    • examples/spline/spline.cpp
    • examples/spline/spline.h
    • examples/spline/spline2.h
    • examples/spline/spline3.h

    This came up during the license check for the openSUSE submission.

    opened by ismail 10
  • Vc fails with gcc 6

    Vc fails with gcc 6 like so:

    common/storage.h:189:21: error: flexible array member in union EntryType m[];

    I tested a hello world program like this:

    int main()
    {
        union { int a; int b[]; };
        return 0;
    }

    $ g++-5.3.0 -std=c++14 foo.cpp
    $

    $ g++-6.1.0 -std=c++14 foo.cpp
    foo.cpp: In function ‘int main()’:
    foo.cpp:3:26: error: flexible array member in union
    $

    type: bug 
    opened by axaith 8
  • Several math operations don't work on SimdArray

    While trying out SimdArray<T,N>, I immediately stumbled upon the fact that Vc::sqrt(.) does not work for SimdArray<T,N>. As the documentation is a bit scarce, I'm not sure whether I used it correctly, but I would have assumed that I can use these functions the same way as with the underlying Vector. There seems to be a mechanism to support arbitrary operations on SimdArray<T,N>, but again I was not able to figure it out from the current documentation.

    type: bug tag: release blocker 
    opened by chr-engwer 8
  • Span.h static asserts on MSVC 2017 with C++14/17 enabled.

    I have built and linked the library (version 1.4) through cmake, setting the premade flags. Note that the issue disappears when C++11 is explicitly set. The errors are:

    Severity	Code	Description	Project	File	Line	Suppression State
    Error	C2338	Can't have a span with an extent < 0	prs	c:\program files\vc\include\vc\common\span.h	143	
    Error	C2118	negative subscript	prs	c:\program files\vc\include\vc\common\span.h	167	
    Error	C2572	"Vc_1::Common::span<const std::byte,-1>::span": redefinition of default argument: parameter 1	prs	c:\program files\vc\include\vc\common\span.h	204	
    Error	C2382	"Vc_1::Common::span<const std::byte,-1>::span": redefinition; different exception specifications	prs	c:\program files\vc\include\vc\common\span.h	204	
    Error	C2572	"Vc_1::Common::span<std::byte,-1>::span": redefinition of default argument: parameter 1	prs	c:\program files\vc\include\vc\common\span.h	204	
    Error	C2382	"Vc_1::Common::span<std::byte,-1>::span": redefinition; different exception specifications	prs	c:\program files\vc\include\vc\common\span.h	204	
    
    opened by acdemiralp 7
  • Include vcpkg patch

    vcpkg contains the following patch when building/installing Vc:

    diff --git a/cmake/VcConfig.cmake.in b/cmake/VcConfig.cmake.in
    index 36de476..5cb0e5b 100644
    --- a/cmake/VcConfig.cmake.in
    +++ b/cmake/VcConfig.cmake.in
    @@ -4,7 +4,7 @@
     set_and_check(@[email protected]_INSTALL_DIR @[email protected])
     set_and_check(@[email protected]_INCLUDE_DIR @[email protected]/include)
     set_and_check(@[email protected]_LIB_DIR @[email protected]/[email protected][email protected])
    -set_and_check(@[email protected]_CMAKE_MODULES_DIR ${@[email protected]_LIB_DIR}/cmake/Vc)
    +set_and_check(@[email protected]_CMAKE_MODULES_DIR @[email protected]/share/vc)
     set(@[email protected]_VERSION_STRING "@[email protected]")
     
     ### Setup @[email protected] defaults
    @@ -20,7 +20,7 @@ list(APPEND @[email protected]_ALL_FLAGS ${@[email protected]_COMPILE_FLAGS})
     list(APPEND @[email protected]_ALL_FLAGS ${@[email protected]_ARCHITECTURE_FLAGS})
     
     ### Import targets
    -include("@[email protected]/@[email protected]/@[email protected]")
    +include(${@[email protected]_CMAKE_MODULES_DIR}/@[email protected])
     
     ### Define @[email protected]_LIBRARIES for backwards compatibility
     get_target_property(vc_lib_location @[email protected]::Vc INTERFACE_LOCATION)
    

    If the changes are correct, we should include the patch.

    opened by bernhardmgruber 0
  • icpx fails test memory_scalar with -O3

    With changes in PR: https://github.com/VcDevel/Vc/pull/292

    Testcase

    #include <Vc/Vc>

    int main() {
        using V = Vc::native_simd<short>;
        Vc::Memory<V, 53> m1, m2;
        m1.setZero();
        m2.setZero();
        m1 += 1;
        return m1 != m2;
    }
    

    Actual Results

    0

    Expected Results

    1

    Godbolt: https://godbolt.org/z/8heTYahbf

    The problem goes away if we use -O2 instead of -O3.

    opened by bernhardmgruber 0
  • Cast tests fail with g++11

    | Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU |
    | --- | --- | --- | --- | --- | --- |
    | branch 1.4 (45fbb882) | Ubuntu 21.10 | g++ 11.2 | | | AMD Ryzen 9 5950X 16-Core Processor |

    Testcase

    Configure with BUILD_TESTING=ON and BUILD_EXTRA_CAST_TESTS=ON, then build and run the tests.

    Actual Results

    The following tests FAILED:
    	354 - casts_Vc_FROM_N_17_Vc_TO_N_5_sse (Failed)
    	355 - casts_Vc_FROM_N_17_Vc_TO_N_5_avx (Failed)
    	356 - casts_Vc_FROM_N_17_Vc_TO_N_5_avx2 (Failed)
    	393 - casts_Vc_FROM_N_16_Vc_TO_N_8_scalar (Failed)
    

    Expected Results

    no errors

    opened by bernhardmgruber 0
  • Add SVE2 instructions to SIMD.

    Hello, I would like to try implementing SVE2 instructions in Vc. I would appreciate any comments and advice on this. Thank you. https://developer.arm.com/documentation/102340/0001/Introducing-SVE2

    opened by aserputov 3
Releases(1.4.3)
  • 1.4.3(May 20, 2022)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.4.3 is a bugfix release most notably incorporating fixes for current compiler versions.

    User Changelog

    Resolved Issues

    • Fix integer comparison warning by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/281
    • Fix vector call linking with trigonometric functions by @EricAtORS in https://github.com/VcDevel/Vc/pull/274
    • Allow consumption in subprojects by @Corristo in https://github.com/VcDevel/Vc/pull/271
    • Support GCC standard libraries which do not define __GLIBC_PREREQ by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/282
    • Vector: add index_type alias by @amyspark in https://github.com/VcDevel/Vc/pull/285
    • Add github actions CI inspired by travis/appveyor scripts by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/278
    • Require at least VS2015 by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/290
    • MSVC: specify Vc_VDECL explicitly on a few functions by @amyspark in https://github.com/VcDevel/Vc/pull/291
    • Treat GitHub Actions CI like travis/appveyor CI by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/297
    • Remove appveyor and travis CI by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/293
    • exp: adjust boundaries for single-precision floating point by @amyspark in https://github.com/VcDevel/Vc/pull/295
    • Retrieve MSVC version from cl.exe by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/298
    • Vector: add constructor taking an element reference by @amyspark in https://github.com/VcDevel/Vc/pull/286
    • Fix MaskBool initialization on SSE by @amyspark in https://github.com/VcDevel/Vc/pull/300
    • Update godbolt headers by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/305
    • Add Vc_INTRINSIC to fixed_size_simd operators by @bmanga in https://github.com/VcDevel/Vc/pull/309
    • Add nighly builds by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/314
    • Support AMD zen3 by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/317
    • Remove custom offsetof implementation by @hahnjo in https://github.com/VcDevel/Vc/pull/313
    • Add clang-12 to CI by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/315
    • Fix zen3 flags for icc and support icelake by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/321
    • Add VS2019 to CI by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/294
    • Update README by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/324
    • CMake: add x86-64 feature levels by @stephanlachnit in https://github.com/VcDevel/Vc/pull/326
    • Don't build tests by default by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/325
    • Add g++-11 to CI by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/318
    • Drop VS2017 by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/328
    • CMakeLists.txt: use CMAKE_CURRENT_LIST_DIR for module path by @htfy96 in https://github.com/VcDevel/Vc/pull/302
    • Minor fixes for Intel C/C++ compilers by @amadio in https://github.com/VcDevel/Vc/pull/330
    • Disable ctest submit by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/336
    • Make godbolt headers have all of Vc by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/333
    • Default to Vc_RECURSIVE_MEMORY 1 by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/334
    • Avoid assertion with MS STL in debug mode by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/337
    • Avoid Intel specific flags with icc by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/322
    • Fix out of bounds index when float_v is scalar by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/340
    • Run debug builds in CI by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/338
    • Revert "disable submitting to cdash" by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/342
    • Update README.md by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/341
    • Add maintenance mode warning to issue template by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/344
    • Fix warning on using TimeStampCounter uninitialized by @bernhardmgruber in https://github.com/VcDevel/Vc/pull/343

    New Contributors

    • @EricAtORS made their first contribution in https://github.com/VcDevel/Vc/pull/274
    • @Corristo made their first contribution in https://github.com/VcDevel/Vc/pull/271
    • @amyspark made their first contribution in https://github.com/VcDevel/Vc/pull/285
    • @bmanga made their first contribution in https://github.com/VcDevel/Vc/pull/309
    • @stephanlachnit made their first contribution in https://github.com/VcDevel/Vc/pull/326
    • @htfy96 made their first contribution in https://github.com/VcDevel/Vc/pull/302

    Full Changelog: https://github.com/VcDevel/Vc/compare/1.4.2...1.4.3

    Developer Changelog

    Source code(tar.gz)
    Source code(zip)
    Vc-1.4.3.tar.gz(625.38 KB)
    Vc-docs-1.4.3.tar.gz(911.43 KB)
  • 1.4.2(Jun 23, 2021)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.4.2 is a bugfix release most notably incorporating fixes for current compiler versions.

    User Changelog

    Resolved Issues

    | | description |
    | --- | --- |
    | optimization | Fixed a warning from cmake. (see #276) |
    | optimization | Added a missing include for GCC11. (see #275) |
    | optimization | Fixed a compilation error with MSVC. (see #272, #277) |
    | optimization | Fixed a corner case in the unit tests. (see #262) |
    | optimization | Fixed a warning in the unit tests. (see #261) |
    | optimization | Fixed a unit test compilation error with clang. (see #260) |
    | optimization | Removed the deprecated Vector<T, VectorAbi::Scalar>::reinterpretCast to fix a warning with GCC10. (see #254) |
    | optimization | Avoid potentially pessimizing std::move in return statements. (see #258) |
    | optimization | Avoid redefinition of bit_scan{forward,reverse} macros. (see #248) |
    | optimization | Improved performance of simdized random access containers. |
    | | Documentation and CI cleanup. (see #222, #223, #251, #259) |

    Developer Changelog

    Source code(tar.gz)
    Source code(zip)
    Vc-1.4.2.qch(1.52 MB)
    Vc-1.4.2.tar.gz(624.71 KB)
    Vc-docs-1.4.2.tar.gz(881.45 KB)
  • 1.4.1(Nov 19, 2018)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.4.1 is a bugfix release most notably incorporating improved codegen on gathers with AVX2 available.

    User Changelog

    Resolved Issues

    | | description |
    | --- | --- |
    | optimization | Gather operations that required implicit multiplication of the index vector (because of gather into an array of structure) will now do a more efficient vector multiplication or, if possible, use a larger stride in the gather instruction to avoid the multiplication (see #214) |
    | bug fixed | Internal code cleanup (removal of dead code) to hit fewer compilation corner cases/errors. |
    | bug fixed | Buildsystem fixes for i686 and libmvec. |

    Developer Changelog

    Source code(tar.gz)
    Source code(zip)
    Vc-1.4.1.qch(1.52 MB)
    Vc-1.4.1.tar.gz(623.82 KB)
    Vc-docs-1.4.1.tar.gz(837.04 KB)
  • 1.4.0(Oct 1, 2018)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.4.0 is a minor feature release with subtle changes in the interface and the start of an upgrade path to std::experimental::simd.

    User Changelog

    | | description |
    | --- | --- |
    | API break | Dropped all Intel MIC code. This was not AVX512, but the old KNC implementation. ICC 18 dropped support for the -mmic compile flag. Consequently, I removed the maintenance burden. |
    | API break | Vc::simdize<T, N> (with arithmetic T and suitable N) can now be an alias for Vc::Vector<T, Sse>, while AVX(2) is available. In Vc 1.3, this would use Vc::SimdArray<T, N> instead. |
    | API break | Fixed conversions from Vc::Vector<T> to Vc::SimdArray<U, Vc::Vector<T>::size()> to be implicit (as the documentation always said). |
    | API break, API addition | Added Vc::simd<T, Abi> alias that resolves to the corresponding Vc::Vector or Vc::SimdArray type. Most importantly Vc::simd<T, Vc::simd_abi::fixed_size<N>> (alias: Vc::fixed_size_simd<T, N>) will give you (almost) Vc::SimdArray<T, N>. Note that this simd type does not implement the exact same interface as defined in the Parallelism TS v2. Most SimdArray operations return fixed_size_simd now, thus potentially breaking existing code (e.g. by breaking template argument deduction). |
    | API addition | Added load_interleaved and store_interleaved to be used with Vc::simdize<T> objects. This enables optimized loads from / stores to an AoS (array of structure) layout into/from structs of Vc vectors. This is currently only optimized for structures T where all data members have equal sizeof. |
    | API addition | Added a new constructor to simdized types: Vc::simdize<T>([](size_t n) { ... }) expects the lambda to return objects of type T, which will be placed at the corresponding index n in the resulting object. |
    | API break, API addition | Vc::simd_for_each and Vc::simd_for_each_n will now vectorize any value_type via Vc::simdize. In Vc 1.3, only arithmetic types were vectorized, everything else used a fall-back to std::for_each. This fallback is removed in Vc 1.4. Consequently, the value_type must work with Vc::simdize. |
    | behavior change, bug fixed | The sin, cos, and sincos functions now have increased precision at the cost of lower performance (with AVX2, you still get a speedup of ~8 over std::sin and std::cos). There's an option to use libmvec for Vc::sin and Vc::cos, which gives even higher performance for SSE4 and AVX2, but less performance for targets without SSE4 or just AVX. |
    | CMake feature | Modernize CMake to simplify usage of Vc in external projects via exported target. |
    | behavior change, bug fixed | Binary operators are now stricter in the implicit conversions they allow. E.g. simd<int>(0) == 0l is now ill-formed. Because it would require a silent (potentially) narrowing conversion of the right hand side. |
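
The AoS to SoA conversion that the load_interleaved and store_interleaved entry refers to can be pictured with a scalar sketch in plain C++. Everything here is hypothetical and chosen for illustration: `Point`, `PointPack`, `kWidth`, and the two helper functions are not Vc names and do not reflect Vc's signatures, and the loops are not optimized in the way the changelog describes.

```cpp
#include <array>
#include <cstddef>

// AoS element: members of one point are adjacent in memory.
struct Point {
    float x, y, z;
};

// Hypothetical lane count standing in for the width of a SIMD vector.
constexpr std::size_t kWidth = 4;

// SoA view of kWidth points: one kWidth-wide array per data member.
struct PointPack {
    std::array<float, kWidth> x, y, z;
};

// Gather kWidth consecutive AoS points into the SoA layout.
PointPack load_deinterleaved(const Point* src) {
    PointPack p{};
    for (std::size_t i = 0; i < kWidth; ++i) {
        p.x[i] = src[i].x;
        p.y[i] = src[i].y;
        p.z[i] = src[i].z;
    }
    return p;
}

// Scatter the SoA layout back into kWidth consecutive AoS points.
void store_reinterleaved(const PointPack& p, Point* dst) {
    for (std::size_t i = 0; i < kWidth; ++i) {
        dst[i] = Point{p.x[i], p.y[i], p.z[i]};
    }
}
```

All three members of `Point` have the same size, which matches the case the changelog entry says is currently optimized.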

    Developer Changelog

    Source code(tar.gz)
    Source code(zip)
    Vc-1.4.0.qch(1.53 MB)
    Vc-1.4.0.tar.gz(613.60 KB)
    Vc-docs-1.4.0.tar.gz(839.51 KB)
  • 1.3.3(Nov 27, 2017)

    Vc is an open source library to ease explicit vectorization of C++ code. It has an intuitive API and provides portability between different compilers and compiler versions as well as portability between different vector instruction sets.

    Vc 1.3.3 is a minor bug fix release.

    User Changelog

    • Support for AVX2 gather instructions. Thanks to Kay F. Jahnke for the initial patch.
    • Shift optimizations
    • Preliminary support for compiling to non-x86 targets (uses only the Scalar ABI)
    • Resolve failing static assertions, moving the relevant tests to unit tests
    • Fixed is_simd_vector and is_simd_mask traits to consider the ElementType too. Thanks to Kay F. Jahnke.

    Developer Changelog

    Source code(tar.gz)
    Source code(zip)
    Vc-1.3.3.qch(1.25 MB)
    Vc-1.3.3.tar.gz(763.04 KB)
    Vc-docs-1.3.3.tar.gz(853.70 KB)
  • 1.3.2(May 3, 2017)

    Vc 1.3.2 is a small bug fix release.

    User Changelog

    • Resolve warnings from GCC 6 about ignored attributes.
    • Support for Kaby Lake detection.

  • 1.3.1 (Mar 9, 2017)

    Vc 1.3.1 contains bug fixes, enables swap on scalar subscripts, and resolves a licensing issue in the examples.

    User Changelog

    • swap(v[i], v[j]) did not compile. Vc 1.3.1 overloads the swap function and thus enables swapping scalars into/out of vector and mask objects.
    • The spline example has moved to the new Vc-examples-nonfree repository since it has a license that restricts redistribution.
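To illustrate why swap(v[i], v[j]) needs a dedicated overload when subscripting yields a proxy object instead of an lvalue reference, here is a small self-contained sketch. It is a toy model under stated assumptions, not Vc's implementation: ToyVector and its nested Ref type are hypothetical names. std::swap takes two lvalue references of the same type, so it cannot bind to the temporaries returned by operator[]; a swap overload taking the proxies by value, found through argument-dependent lookup, closes that gap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width vector whose non-const operator[] returns a proxy
// object (a temporary) instead of an lvalue reference.
template <typename T, std::size_t N>
class ToyVector {
    std::array<T, N> data_{};

public:
    class Ref {
        ToyVector& v_;
        std::size_t i_;

    public:
        Ref(ToyVector& v, std::size_t i) : v_(v), i_(i) {}

        // Read access: the proxy converts to the element value.
        operator T() const { return v_.data_[i_]; }

        // Write access: assigning a scalar stores it into the element.
        Ref& operator=(T value) {
            v_.data_[i_] = value;
            return *this;
        }

        // Swap two elements through their proxies (taken by value, so
        // temporaries are accepted).
        friend void swap(Ref a, Ref b) {
            T tmp = a;
            a = static_cast<T>(b);
            b = tmp;
        }

        // Swap an element with a plain scalar variable.
        friend void swap(Ref a, T& b) {
            T tmp = a;
            a = b;
            b = tmp;
        }
    };

    Ref operator[](std::size_t i) {
        assert(i < N);
        return Ref(*this, i);
    }

    T operator[](std::size_t i) const {
        assert(i < N);
        return data_[i];
    }
};
```

With this in place, an unqualified call such as swap(v[0], v[3]) compiles and exchanges the two elements, and swap(v[1], s) exchanges an element with a scalar variable s.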

  • 1.3.0 (Oct 27, 2016)

    Vc 1.3.0 contains API cleanups, bug fixes, important compiler-specific optimizations & workarounds, and finally supports MSVC again.

    User Changelog

    • 64-bit MS VisualStudio 2015 support. (See #119 for some of the gory details.)
    • ICC 17 support (#143).
    • GCC 6 support (#125).
    • Workarounds for bad ICC code-gen (#135). Now Vc not only works correctly when compiled with ICC, but also performs as well as (or better than) with GCC and Clang.
    • Safer and more restrictive subscripting on Vector and Mask. There is a minor source compatibility break involved, since Vector::operator[] returned lvalue references before Vc 1.3 and returns a smart reference (rvalue) now. This change reduces the chance of miscompilation & internal compiler errors and reduces the reliance on non-standard C++ extensions.
    • Support for x32 compilation (like x86_64 but with 32-bit pointers).
    • Added scatter interface to SimdArray (thanks to Kay Jahnke).
    • simd_cast properly works with ADL now (i.e. you don't have to write fully qualified Vc::simd_cast anymore).
    • Added simd_for_each_n (thanks to Hartmut Kaiser).

  • 1.2.0 (Feb 25, 2016)

    Vc 1.2.0 contains API cleanups and bug fixes.

    User Changelog

    • Improved documentation, especially on SimdArray<T, N>
    • Rewritten Vector<T> and SimdArray<T, N> binary operators for more correctness and to follow the latest proposal to the C++ standards committee
    • Fixed trait that queries mutability of callable to simd_for_each
    • A few build system cleanups
    • Improved ICC support

  • 1.1.0 (Dec 16, 2015)

    Vc 1.1.0 resolves several important issues that remained or were found in Vc 1.0.0.

    User Changelog

    • Significant restructuring of the documentation, fixing many issues where the documentation still presented Vc 0.7 API.
    • Implement all math functions supported by Vector for SimdArray.
    • Fix iif to work for builtin types.
    • Reintroduce structured gather and scatter functions/constructors to Vector to restore API compatibility with Vc 0.7. These functions are all marked as deprecated, with a suggestion to use the new subscript operators instead. This should ease porting applications from Vc 0.7.
    • Deprecate Vector::copySign, Vector::isNegative, and Vector::exponent. They are replaced by their non-member counterparts Vc::copysign, Vc::isnegative, and Vc::exponent.
    • Fix a few remaining license issues so that everything really says BSD now. This includes a replacement of Vc::array with a fork from libc++ instead of the previous code that was forked from libstdc++.
    • Resolve a few minor OS X compatibility issues.

  • 1.0.0 (Nov 3, 2015)

    Changelog

    • AVX (u)int_v is now only one SSE width instead of the full AVX width.
    • AVX2 support added with doubled width for (u)int_v and (u)short_v.
    • Xeon Phi support (Knights Corner). This requires an Intel compiler.
    • Dropped the guarantee of (u)int_v::size() == float_v::size(). Therefore, the implicit conversion between int and float vectors present in Vc 0.x is ill-formed with 1.0.
    • New simd_cast<T> cast function. It allows arbitrary conversions from one or more Vectors/SimdArrays to one or more Vectors/SimdArrays.
    • New simdize<T> expression "vectorizes T". This is still somewhat experimental.
    • sfloat_v is gone in favor of a generic SimdArray<T, N> class template allowing you to build vector objects of arbitrary width. Thus, to get the old sfloat_v type back you'd write using sfloat_v = Vc::SimdArray<float, Vc::short_v::size()>;. Note that SimdArray is not meant to be used as a container, i.e. N should be "small".
    • Besides Vc::Vector<T> you can also now use microarchitecture specific types directly, such as Vc::SSE::Vector<T>. This enables you to use SSE vectors and AVX vectors in the same translation unit. You should prefer SimdArray<T, N> in most cases, though.
    • In Vc 0.x the Vector<T> class template was defined multiple times in different namespaces. Now it's a single class template Vc::Vector<T, Abi> with aliases in the implementation namespaces. This enables you to write a function such as template <typename T, typename Abi> void f(Vc::Vector<T, Abi> x), which matches any Vc Vector type, whether that's the Scalar implementation or an actual SIMD implementation.
    • Gather & scatter received a new interface that is a lot more intuitive and flexible. Use Vc::vector and/or Vc::array as alternatives to std::vector and std::array to get an additional subscript overload accepting Vc::Vector objects as argument to the subscript operator for gather & scatter.
    • Requires C++11. Thus, many older compiler versions and, at least for now, MSVC are not supported anymore.
    • Load and store functions default to unaligned memory access. The old default was aligned memory access.
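The entry above about a single class template Vc::Vector<T, Abi> can be pictured with a small stand-alone model. The sketch below does not use Vc; the toy namespace, the abi::Scalar and abi::Wide4 tags, the lanes trait, and horizontal_sum are all hypothetical names made up for this illustration. It only shows the design idea: when the ABI is a template parameter of one class template, a single function template can accept every instantiation, whether scalar or wide.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace toy {

namespace abi {
struct Scalar {};  // one element per object
struct Wide4 {};   // hypothetical stand-in for a 4-lane SIMD ABI
}  // namespace abi

// Maps an (element type, ABI tag) pair to a lane count.
template <typename T, typename Abi>
struct lanes;
template <typename T>
struct lanes<T, abi::Scalar> {
    static constexpr std::size_t value = 1;
};
template <typename T>
struct lanes<T, abi::Wide4> {
    static constexpr std::size_t value = 4;
};

// A single class template parameterized on the ABI tag.
template <typename T, typename Abi = abi::Wide4>
class Vector {
    std::array<T, lanes<T, Abi>::value> data_{};

public:
    static constexpr std::size_t size() { return lanes<T, Abi>::value; }

    Vector() = default;

    // Broadcast one scalar into every lane.
    explicit Vector(T broadcast) { data_.fill(broadcast); }

    T operator[](std::size_t i) const {
        assert(i < size());
        return data_[i];
    }
};

// One function template that matches any Vector<T, Abi> instantiation.
template <typename T, typename Abi>
T horizontal_sum(const Vector<T, Abi>& x) {
    T total{};
    for (std::size_t i = 0; i < x.size(); ++i) {
        total += x[i];
    }
    return total;
}

}  // namespace toy
```

In this model, toy::horizontal_sum accepts both toy::Vector<float, toy::abi::Scalar> and toy::Vector<float> without separate overloads, which mirrors the template <typename T, typename Abi> void f(Vc::Vector<T, Abi> x) signature quoted in the changelog.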

    Known issues

    • The documentation is incomplete and outdated.
  • 0.7.5 (Aug 26, 2015)
