C++ image processing and machine learning library that uses SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX(Altivec) and VSX(Power7), NEON for ARM.

Overview

Introduction

The Simd Library is a free, open-source image processing and machine learning library designed for C and C++ programmers. It provides many useful high-performance algorithms for image processing, such as pixel format conversion, image scaling and filtration, extraction of statistical information from images, motion detection, object detection (HAAR and LBP classifier cascades) and classification, and neural networks.

The algorithms are optimized using different SIMD CPU extensions. In particular, the library supports the following CPU extensions: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX-512 for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC (big-endian), and NEON for ARM.

The Simd Library has a C API and also contains useful C++ classes and functions that facilitate access to the C API. The library supports dynamic and static linking, 32-bit and 64-bit Windows, Android and Linux, the MSVS, G++ and Clang compilers, and the MSVS project and CMake build systems.

Library folder structure

The Simd Library has the following folder structure:

  • simd/src/Simd/ - contains the source code of the library.
  • simd/src/Test/ - contains the test framework of the library.
  • simd/src/Use/ - contains usage examples of the library.
  • simd/prj/vs2013/ - contains project files of Microsoft Visual Studio 2013.
  • simd/prj/vs2015/ - contains project files of Microsoft Visual Studio 2015.
  • simd/prj/vs2017w/ - contains project files of Microsoft Visual Studio 2017 (for Windows).
  • simd/prj/vs2017a/ - contains project files of Microsoft Visual Studio 2017 (for Android).
  • simd/prj/vs2019/ - contains project files of Microsoft Visual Studio 2019.
  • simd/prj/cmd/ - contains additional scripts needed for building of the library in Windows.
  • simd/prj/cmake/ - contains files of CMake build systems.
  • simd/prj/sh/ - contains additional scripts needed for building of the library in Linux.
  • simd/prj/txt/ - contains text files needed for building of the library.
  • simd/data/cascade/ - contains OpenCV cascades (HAAR and LBP).
  • simd/data/image/ - contains image samples.
  • simd/data/network/ - contains examples of trained networks.
  • simd/docs/ - contains documentation of the library.

Building the library for Windows

To build the library and test application for 32-bit or 64-bit Windows, use Microsoft Visual Studio 2019 (or 2013/2015/2017). The project files for each version are in the corresponding simd/prj/vs*/ directory, for example:

simd/prj/vs2015/

By default the library is built as a DLL (Dynamic Link Library). You may also build it as a static library. To do this, change the Configuration Type property of the Simd project and uncomment #define SIMD_STATIC in the file:

simd/src/Simd/SimdConfig.h

You can also build the library with CMake and MinGW:

cd .\prj\cmake
cmake . -DSIMD_TOOLCHAIN="your_toolchain\bin\g++" -DSIMD_TARGET="x86_64" -DCMAKE_BUILD_TYPE="Release" -G "MinGW Makefiles"
mingw32-make

Building the library for Android

To build the library and test application for Android (x86, x64, ARM, ARM64), use Microsoft Visual Studio 2017. The project files are in the directory:

simd/prj/vs2017a/

By default the library is built as a shared object (.so dynamic library).

Building the library for Linux

To build the library and test application for 32-bit or 64-bit Linux, use the CMake build system. The CMake files are in the directory:

simd/prj/cmake/

The library can be built for x86/x64, PowerPC (64-bit, big-endian) and ARM (32/64-bit) platforms using the G++ or Clang compilers. With the native compiler (g++) for the current platform it is simple:

cd ./prj/cmake
cmake . -DSIMD_TOOLCHAIN="" -DSIMD_TARGET=""
make

To build the library for PowerPC (64-bit, big-endian) and ARM (32/64-bit) platforms you can also use a toolchain for cross-compilation. Here is an example for PowerPC (64-bit, big-endian):

cd ./prj/cmake
cmake . -DSIMD_TOOLCHAIN="/your_toolchain/usr/bin/powerpc-linux-gnu-g++" -DSIMD_TARGET="ppc64" -DCMAKE_BUILD_TYPE="Release"
make

For ARM (32 bit):

cd ./prj/cmake
cmake . -DSIMD_TOOLCHAIN="/your_toolchain/usr/bin/arm-linux-gnueabihf-g++" -DSIMD_TARGET="arm" -DCMAKE_BUILD_TYPE="Release"
make

And for ARM (64 bit):

cd ./prj/cmake
cmake . -DSIMD_TOOLCHAIN="/your_toolchain/usr/bin/aarch64-linux-gnu-g++" -DSIMD_TARGET="aarch64" -DCMAKE_BUILD_TYPE="Release"
make

As a result, the library and the test application will be built in the current directory.

Using the library

If you use the library from C code you must include:

#include "Simd/SimdLib.h"

And to use the library from C++ code you must include:

#include "Simd/SimdLib.hpp"

In order to use Simd::Detection you must include:

#include "Simd/SimdDetection.hpp"

In order to use Simd::Neural you must include:

#include "Simd/SimdNeural.hpp"

In order to use Simd::Motion you must include:

#include "Simd/SimdMotion.hpp"

Interaction with OpenCV

If you need to convert between Simd and OpenCV types, just define the macro SIMD_OPENCV_ENABLE before including the Simd headers:

#include <opencv2/core/core.hpp>
#define SIMD_OPENCV_ENABLE
#include "Simd/SimdLib.hpp"

You can then convert between the following types:

  • cv::Point, cv::Size <--> Simd::Point.
  • cv::Rect <--> Simd::Rectangle.
  • cv::Mat <--> Simd::View.

Test Framework

The test suite checks the correctness of the library and measures its performance. There is a set of tests for every function in the library API. Here is an example of running the test application:

./Test -m=a -tt=1 -fi=Sobel -ot=log.txt

The following parameters were used:

  • -m=a - an auto-checking mode that includes performance testing (only for a library built in Release mode). In this mode the different implementations of each function are compared with one another (for example, a scalar implementation and implementations using different SIMD instruction sets such as SSE2 and AVX2). The other modes are -m=c (creation of test data for cross-platform testing), -m=v (cross-platform testing using previously prepared test data) and -m=s (running of special tests).
  • -tt=1 - the number of test threads.
  • -fi=Sobel - an include filter. In this case only functions whose names contain the word 'Sobel' are tested. If you omit this parameter, full testing is performed. You can use several filters; a function name has to satisfy at least one of them.
  • -ot=log.txt - the name of the test report file (in text format). The test report is also printed to the console.

You can also use these parameters:

  • -help or -? - prints the help message.
  • -r=../.. - sets the project root directory.
  • -pa=1 - prints alignment statistics.
  • -c=512 - the number of channels in the test image for performance testing.
  • -h=1080 - the height of the test image for performance testing.
  • -w=1920 - the width of the test image for performance testing.
  • -oh=log.html - the name of the test report file (in HTML format).
  • -s=sample.avi - a video source (see the Simd::Motion test).
  • -o=output.avi - an annotated video output (see the Simd::Motion test).
  • -wt=1 - the number of threads used to parallelize algorithms.
  • -fe=Abs - an exclude filter to exclude some tests.
  • -mt=100 - the minimal test execution time (in milliseconds).
  • -lc=1 - litters the CPU cache between test runs.
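
The way include and exclude filters select tests can be sketched as a small substring matcher. This is only an illustration: ShouldRun is a hypothetical helper, not a function of the test framework, and the rule that an exclude match overrides an include match is an assumption that the text above does not state.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the Simd test framework): decides whether
// a test with the given name runs for a set of -fi / -fe filters.
bool ShouldRun(const std::string& name,
               const std::vector<std::string>& include,
               const std::vector<std::string>& exclude)
{
    // Assumption: a match with any exclude filter always wins.
    for (const std::string& filter : exclude)
        if (name.find(filter) != std::string::npos)
            return false;
    // Without include filters, full testing is performed.
    if (include.empty())
        return true;
    // Otherwise the name has to satisfy at least one include filter.
    for (const std::string& filter : include)
        if (name.find(filter) != std::string::npos)
            return true;
    return false;
}
```

Under these assumptions, -fi=Sobel -fe=Abs would select a test named SobelDx but skip one named SobelDxAbs.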
Comments
  • Test.exe crashes


    I'm looking forward to trying the SIMD library for video conversions, so I installed simd.4.2.73.zip, VS2017 and Windows SDK 8.1, loaded the vs2017w project file, selected x64 and Release, and hit Build. This ran successfully, albeit rather slowly (I have never used VS, so I have no reference), and it looks like all files were made. I have not actually used the DLL, but I did try Test.exe and noticed that it crashes after the 4th line. I also tried running "Test -m=a -tt=1 -f=Sobel -ot=log.txt", which gives me an error that Sobel is not found, and without that filter it also crashes after a few lines. I'm now left wondering if I made an error building the library. Windows 7/64.

    opened by mikeversteeg 26
  • Error in the code for the Habr article

    Hello! I would like to report an error in the code accompanying your article https://m.habr.com/ru/post/448436/, specifically in the base implementation of convolution. Depthwise convolution (with group > 1) is computed incorrectly: only the first channels contain values, while the remaining channels are filled with zeros.

    opened by GlebSBrykin 22
  • SimdYuv420pToBgr with different YUV & RGB image sizes


    Hi!

    Still loving this library, it is really impressive coding..

    I need to display YUV420P images as thumbnails on screen, which means they need to be converted to RGB and resized. Because this is HD video, speed is essential. Currently I first resize each of the YUV planes and then convert to RGB. However, this means an additional memory write of the (smaller) YUV image. This can be avoided by dropping the requirement that both images passed to SimdYuv420pToBgr must have the same size. Can this be added? As speed is important, a simple pixel drop can be used (although if it can be added efficiently, a basic (bi)linear interpolation would be great).

    Thanks for considering.

    opened by mikeversteeg 18
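
The "pixel drop" idea from this request can be sketched in scalar C++: every destination pixel reads its nearest source pixel directly from the YUV planes, so no intermediate resized YUV image is written. This is an illustration only, not library code. The function name and signature are hypothetical, and the integer BT.601 limited-range coefficients are an assumption about the colour conversion wanted here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Clamps an intermediate value to the 8-bit range.
static inline uint8_t Clamp8(int value)
{
    return static_cast<uint8_t>(std::min(255, std::max(0, value)));
}

// Hypothetical scalar helper (not Simd API): converts a YUV420P frame of size
// srcW x srcH to a BGR image of size dstW x dstH by nearest-pixel sampling.
// Assumption: BT.601 limited-range YUV, 8-bit fixed-point coefficients.
void Yuv420pToBgrNearest(const uint8_t* y, size_t yStride,
                         const uint8_t* u, size_t uStride,
                         const uint8_t* v, size_t vStride,
                         size_t srcW, size_t srcH,
                         uint8_t* bgr, size_t bgrStride,
                         size_t dstW, size_t dstH)
{
    for (size_t dy = 0; dy < dstH; ++dy)
    {
        const size_t sy = dy * srcH / dstH;           // nearest source row
        uint8_t* dst = bgr + dy * bgrStride;
        for (size_t dx = 0; dx < dstW; ++dx)
        {
            const size_t sx = dx * srcW / dstW;       // nearest source column
            const int c = int(y[sy * yStride + sx]) - 16;
            // Chroma planes are subsampled by 2 in both directions.
            const int d = int(u[(sy / 2) * uStride + sx / 2]) - 128;
            const int e = int(v[(sy / 2) * vStride + sx / 2]) - 128;
            dst[3 * dx + 0] = Clamp8((298 * c + 516 * d + 128) >> 8);           // B
            dst[3 * dx + 1] = Clamp8((298 * c - 100 * d - 208 * e + 128) >> 8); // G
            dst[3 * dx + 2] = Clamp8((298 * c + 409 * e + 128) >> 8);           // R
        }
    }
}
```

A vectorized version would need gather-like loads for the dropped pixels, which is why a scalar form is shown; bilinear interpolation would read four luma samples per output pixel instead of one.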
  • Premultiplication & vice versa


    Very happy with the added alpha support, your code is much faster than my "SIMD" attempts :)

    I would very much like a function to convert a BGRA bitmap to premultiplied and vice versa. This would complete the alpha support in Simd.

    PS: To convert a straight alpha color value bitmap to premultiplied format, multiply its R, G, and B values by A. To convert premultiplied to straight, divide R, G, and B by A. Keep in mind alpha can be 0, and values can never exceed 255.

    opened by mikeversteeg 16
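
The conversion described in the PS can be sketched in scalar C++. "Multiply by A" is taken here to mean multiplying by the normalized alpha A/255, which is an interpretation rather than something the request spells out. Both function names are hypothetical and are not Simd API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts straight-alpha BGRA pixels to premultiplied
// form in place. Each colour channel becomes round(channel * alpha / 255).
void PremultiplyBgra(uint8_t* bgra, size_t pixels)
{
    for (size_t i = 0; i < pixels; ++i, bgra += 4)
    {
        const unsigned a = bgra[3];
        for (int c = 0; c < 3; ++c)
            bgra[c] = static_cast<uint8_t>((bgra[c] * a + 127) / 255);
    }
}

// Hypothetical helper: converts premultiplied BGRA pixels back to straight
// alpha in place. Alpha 0 yields colour 0, and results are clamped to 255.
void UnpremultiplyBgra(uint8_t* bgra, size_t pixels)
{
    for (size_t i = 0; i < pixels; ++i, bgra += 4)
    {
        const unsigned a = bgra[3];
        for (int c = 0; c < 3; ++c)
        {
            if (a == 0)
            {
                bgra[c] = 0; // the colour is undefined for alpha 0; use black
                continue;
            }
            const unsigned v = (bgra[c] * 255u + a / 2) / a; // rounded division
            bgra[c] = static_cast<uint8_t>(v > 255 ? 255 : v);
        }
    }
}
```

The round trip is lossy for small alpha values because several straight colours map to the same premultiplied value.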
  • Clang build issues


    There are following issues reported by clang-msvc under Windows (64 bit):

    In file included from C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdMemory.h:29:
    In file included from C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdMath.h:29:
    C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdConst.h(94,36): error: excess elements in vector initializer
        const __m128i K_INV_ZERO = SIMD_MM_SET1_EPI8(0xFF);
        ^~~~~~~~~~~~~~~~~~~~~~~
    C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdInit.h(104,40): note: expanded from macro 'SIMD_MM_SET1_EPI8'
        {SIMD_AS_CHAR(a), SIMD_AS_CHAR(a), SIMD_AS_CHAR(a), SIMD_AS_CHAR(a),
        ^~~~~~~~~~~~~~~
    C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdInit.h(38,25): note: expanded from macro 'SIMD_AS_CHAR'
        #define SIMD_AS_CHAR(a) char(a)
        ^~~~~~~

    and also these:

    ..\CSimdImageResolutionReductorComp.cpp(141,15): error: always_inline function '_mm_mullo_epi32' requires target feature 'sse4.1', but would be inlined into function 'Calc4PixelValue' that is compiled without support for 'sse4.1'
        __m128i r2 = _mm_mullo_epi32(_mm_and_si128(loaded2, mask), c5);
        ^
    ..\CSimdImageResolutionReductorComp.cpp(142,15): error: always_inline function '_mm_mullo_epi32' requires target feature 'sse4.1', but would be inlined into function 'Calc4PixelValue' that is compiled without support for 'sse4.1'
        __m128i r3 = _mm_mullo_epi32(_mm_and_si128(loaded3, mask), c10);
        ^
    ..\CSimdImageResolutionReductorComp.cpp(143,15): error: always_inline function '_mm_mullo_epi32' requires target feature 'sse4.1', but would be inlined into function 'Calc4PixelValue' that is compiled without support for 'sse4.1'
        __m128i r4 = _mm_mullo_epi32(_mm_and_si128(loaded4, mask), c10);
        ^
    ..\CSimdImageResolutionReductorComp.cpp(144,15): error: always_inline function '_mm_mullo_epi32' requires target feature 'sse4.1', but would be inlined into function 'Calc4PixelValue' that is compiled without support for 'sse4.1'
        __m128i r5 = _mm_mullo_epi32(_mm_and_si128(loaded5, mask), c5);
        ^

    MSVC, Intel, GCC and MinGW are fine with this, though. What could be the reason, and is it possible to fix it for Clang?

    opened by ArsMasiuk 14
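
The second group of errors states its own cause: Clang refuses to inline an SSE4.1 intrinsic into a function that is compiled without SSE4.1 enabled. Two common remedies are compiling the file with -msse4.1, or enabling the feature for a single function with a target attribute. The sketch below shows the second option together with a runtime check. It is a minimal illustration under stated assumptions, not the fix applied in the library: Mul4, Mul4Sse41 and HAVE_SSE41_PATH are hypothetical names, and it assumes that __attribute__((target)) and __builtin_cpu_supports are available in the toolchain, which holds for GCC and Clang on x86 but was not verified for clang-msvc.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

// The target attribute enables SSE4.1 code generation for this function only,
// so _mm_mullo_epi32 can be inlined here without -msse4.1 on the command line.
__attribute__((target("sse4.1")))
static void Mul4Sse41(const int32_t* a, const int32_t* b, int32_t* dst)
{
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_mullo_epi32(va, vb));
}
#define HAVE_SSE41_PATH 1 // hypothetical macro used only in this sketch
#endif

// Multiplies four 32-bit integers element-wise, using SSE4.1 when the CPU
// reports support for it and a scalar loop otherwise.
void Mul4(const int32_t* a, const int32_t* b, int32_t* dst)
{
#ifdef HAVE_SSE41_PATH
    if (__builtin_cpu_supports("sse4.1"))
    {
        Mul4Sse41(a, b, dst);
        return;
    }
#endif
    for (int i = 0; i < 4; ++i)
        dst[i] = a[i] * b[i];
}
```

The runtime check matters because a function compiled for SSE4.1 raises an illegal instruction fault on a CPU that lacks it.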
  • Object Measurement


    The SIMD library is great, but I'm not sure how to do object measurement (e.g. finding the centre point of an object). Maybe you could just list the functions to use; this would be very helpful.

    opened by syberarall 13
  • Illegal instruction error


    Hi, I am building the library on my amd64 machine:

    $ cmake -G"MSYS Makefiles" -DCMAKE_BUILD_TYPE=Debug -DLIBRARY="STATIC" -DTARGET="x86_64" -DTOOLCHAIN="C:/msys64/mingw64/bin/g++" ../prj/cmake/

    $ gcc -v

    COLLECT_GCC=C:\msys64\mingw64\bin\gcc.exe
    COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.2.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../gcc-8.2.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=ada,c,lto,c++,objc,obj-c++,fortran --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev3, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
    Thread model: posix
    gcc version 8.2.0 (Rev3, Built by MSYS2 project)
    

    When I run the command "make" the build is successful. But when I try to run the "Test" binary, I get the error message "illegal instruction".

    (gdb) run
    Starting program: C:\msys64\home\user\test\Simd-4.2.70\build\Test.exe
    [New Thread 10452.0x1740]
    [New Thread 10452.0x2a54]
    [New Thread 10452.0x2e24]
    [New Thread 10452.0x27dc]
    SSE: Yes
    SSE2: Yes
    SSE3: Yes
    SSSE3: No
    SSE4.1: No
    SSE4.2: No
    AVX: No
    AVX2: No
    AVX-512F: No
    AVX-512BW: No
    PowerPC-Altivec: No
    PowerPC-VSX: No
    ARM-NEON: No
    MIPS-MSA: No
    
    Thread 1 received signal SIGILL, Illegal instruction.
    0x000000000046d86f in Test::TestPoint ()
        at C:/msys64/home/user/test/Simd-4.2.70/src/Test/TestCheckCpp.cpp:60
    60          {
    (gdb) bt full
    #0  0x000000000046d86f in Test::TestPoint ()
        at C:/msys64/home/user/test/Simd-4.2.70/src/Test/TestCheckCpp.cpp:60
            p = {x = 140730470038064, y = 1875954880}
            fp = {x = 3.6138301231727476e-316, y = 9.2664781905975763e-315}
    #1  0x000000000047a3be in Test::CheckCpp ()
        at C:/msys64/home/user/test/Simd-4.2.70/src/Test/TestCheckCpp.cpp:124
    No locals.
    #2  0x000000000041082d in main (argc=1, argv=0x45c1960)
        at C:/msys64/home/user/test/Simd-4.2.70/src/Test/Test.cpp:677
            options = {mode = 73138176, help = false,
              include = {<std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {
                  _M_impl = {<std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<__gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<No data fields>}, <No data fields>}, _M_start = 0x84070083,
                    _M_finish = 0x45c0000,
                    _M_end_of_storage = 0x7ffe5f4113fc <ntdll!RtlpNtSetValueKey+20604>}}, <No data fields>},
              exclude = {<std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {
                  _M_impl = {<std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<__gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<No data fields>}, <No data fields>}, _M_start = 0x800,
                    _M_finish = 0x45c0000,
                    _M_end_of_storage = 0x44724b0}}, <No data fields>}, text = {
                static npos = 18446744073709551615,
                _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
                  _M_p = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>}, _M_string_length = 1, {
                  _M_local_buf = "\000\b\000\000\000\000\000\000▒$G\004\000\000\000", _M_allocated_capacity = 2048}}, html = {
                static npos = 18446744073709551615,
                _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x0},
                _M_string_length = 73138176, {
                  _M_local_buf = "▒▒;_▒\177\000\000\000\000\\\004\000\000\000",
                  _M_allocated_capacity = 140730496159222}},
              testThreads = 140730495922279, workThreads = 0, printAlign = 184}
            groups = {<std::_Vector_base<Test::Group, std::allocator<Test::Group> >> = {
                _M_impl = {<std::allocator<Test::Group>> = {<__gnu_cxx::new_allocator<Test::Group>> = {<No data fields>}, <No data fields>}, _M_start = 0x1,
                  _M_finish = 0x7ffe5f3ce9c3 <ntdll!memset+122627>,
                  _M_end_of_storage = 0x44724a0}}, <No data fields>}
    
    opened by kudzurunner 12
  • Cmake fails with SimdVersion.h


    After fighting to make cmake work on MinGW, I have gradually come to understand how this all works, with this thread being the golden helper: https://github.com/glfw/glfw/issues/843#issuecomment-250316815

    While I can compile without MSYS using only the stock MinGW install, I'm interested in getting it to work with "MSYS Makefiles" or "Unix Makefiles". I always end up with: Skip updating of file ""C:/Users/Bonfire/Desktop/winVAI/Projects/suite/pkg/Simd/prj/cmake/../..\src\Simd\SimdVersion.h"" because there are not any changes.

    Everything else seems to work, make -j8 posts many compiled files, but it always fails at this one:

    /SimdLib.cpp: In function 'const char* SimdVersion()':
    C:/Users/Bonfire/Desktop/winVAI/Projects/suite/pkg/Simd/src/Simd/SimdLib.cpp:79:12: error: 'SIMD_VERSION' was not declared in this scope
        return SIMD_VERSION;
        ^~~~~~~~~~~~

    Is there a way to fix this?

    opened by ghost 12
  • Bug in SimdResizeMethodNearest resizer


    Calling SimdResizerRun with SimdResizeMethodNearest gives spill on the right of the resized image. E.g. if I have two YUV420P images of 1920*1080 and copy one to the first quadrant of the other (0, 0, 960, 540), there is spill to the right. See image. This does not happen with SimdResizeMethodArea or SimdResizeMethodBilinear (I did not try others), but (very!) unfortunately they are too slow. SimdLib v4.9.111. Windows 64.

    opened by mikeversteeg 11
  • build error View = Mat


    I have included Simd/SimdLib.hpp and defined SIMD_OPENCV_ENABLE, but when I build, I run into an error.

    #include <opencv2/core.hpp>
    #include <opencv2/highgui.hpp>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/videoio.hpp>
    #include <opencv2/video/tracking.hpp>
    #define SIMD_OPENCV_ENABLE
    #include "Simd/SimdLib.hpp"
    
    #include <thread>
    #include <atomic>
    #include "ShareSpace.h"
    #include "Controller.h"
    
            img.resource = camera->getFrame();
            if(!img.resource.empty())
            {
                // cv::resize(img.resource, img.resource, cv::Size(640,512));
                Simd::View viewsrc = img.resource;
                cv::Mat tmp= cv::Mat::zeros(cv::Size(640,512), img.resource.type());
                Simd::View viewdest = tmp;
                Simd::ResizeBilinear(viewsrc, viewdest);
                img.resource = tmp;
            }
    

    error

    error: Share/ThreadManager/ThreadManager.cpp:91:38: error: class template argument deduction failed:
       91 |             Simd::View viewsrc = img.resource;
          |                                      ^~~~~~~~
    Share/ThreadManager/ThreadManager.cpp:91:38: error: no matching function for call to ‘View(cv::Mat&)’
    In file included from /usr/include/Simd/SimdLib.hpp:27,
                     from Share/ThreadManager/ThreadManager.h:19,
                     from Share/ThreadManager/ThreadManager.cpp:10:
    /usr/include/Simd/SimdView.hpp:819:52: note: candidate: ‘template<template<class> class A> View(const Simd::Point<long int>&, Simd::View<A>::Format)-> Simd::View<A>’
      819 |     template <template<class> class A> SIMD_INLINE View<A>::View(const Point<ptrdiff_t> & size, Format f)
          |                                                    ^~~~~~~
    /usr/include/Simd/SimdView.hpp:819:52: note:   template argument deduction/substitution failed:
    Share/ThreadManager/ThreadManager.cpp:91:38: note:   candidate expects 2 arguments, 1 provided
       91 |             Simd::View viewsrc = img.resource;
          |                                      ^~~~~~~~
    In file included from /usr/include/Simd/SimdLib.hpp:27,
                     from Share/ThreadManager/ThreadManager.h:19,
                     from Share/ThreadManager/ThreadManager.cpp:10
    
    opened by wangzhankun 11
  • GaussianBlur usage and memory issue


    The following code:

    #include <iostream>
    
    #include "Simd/SimdLib.h"
    
    static void print(const unsigned char* img, int rows, int cols)
    {
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                std::cout << static_cast<unsigned>(img[i*cols + j]) << " ";
            }
            std::cout << std::endl;
        }
    }
    
    int main(int , char * [])
    {
        const int rows = 8, cols = 12;
        unsigned char img[rows*cols];
        unsigned char img_blur[rows*cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                img[i*cols + j] = static_cast<unsigned char>(i*cols + j);
            }
        }
    
        const float radius = 5.0f;
        void * funcPtr = SimdGaussianBlurInit(cols, rows, 1, &radius);
        SimdGaussianBlurRun(funcPtr, img, cols, img_blur, cols);
        SimdRelease(funcPtr);
    
        std::cout << "Original image:" << std::endl;
        print(img, rows, cols);
        std::cout << "\nGaussian blur:" << std::endl;
        print(img_blur, rows, cols);
    
        return EXIT_SUCCESS;
    }
    

    produces on my computer the following error:

    *** stack smashing detected ***: ./TestGaussianBlur terminated
    Abandon (core dumped)
    

    Output when running with Valgrind:

    ==22407== Memcheck, a memory error detector
    ==22407== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==22407== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
    ==22407== Command: ./TestGaussianBlur
    ==22407== 
    ==22407== Source and destination overlap in memcpy(0x8418b60, 0x8418b60, 48)
    ==22407==    at 0x730E674: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
    ==22407==    by 0x332B536: void Simd::Avx2::BlurImageAny<1>(Simd::BlurParam const&, Simd::Base::AlgDefault const&, unsigned char const*, unsigned long, unsigned char*, float*, unsigned char*, unsigned long) (SimdAvx2GaussianBlur.cpp:124)
    ==22407==    by 0x68AD52: Simd::Base::GaussianBlurDefault::Run(unsigned char const*, unsigned long, unsigned char*, unsigned long) (SimdBaseGaussianBlur.cpp:245)
    ==22407==    by 0x63D055: SimdGaussianBlurRun (SimdLib.cpp:2409)
    ==22407==    by 0x6386A9: main (TestGaussianBlur.cpp:78)
    ==22407== 
    Original image:
    0 1 2 3 4 5 6 7 8 9 10 11 
    12 13 14 15 16 17 18 19 20 21 22 23 
    24 25 26 27 28 29 30 31 32 33 34 35 
    36 37 38 39 40 41 42 43 44 45 46 47 
    48 49 50 51 52 53 54 55 56 57 58 59 
    60 61 62 63 64 65 66 67 68 69 70 71 
    72 73 74 75 76 77 78 79 80 81 82 83 
    84 85 86 87 88 89 90 91 92 93 94 95 
    
    Gaussian blur:
    ==22407== Use of uninitialised value of size 8
    ==22407==    at 0x7852BA3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x785309F: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x7860B65: std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x63856F: print(unsigned char const*, int, int) (TestGaussianBlur.cpp:59)
    ==22407==    by 0x63871F: main (TestGaussianBlur.cpp:84)
    ==22407== 
    ==22407== Conditional jump or move depends on uninitialised value(s)
    ==22407==    at 0x7852BB6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x785309F: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x7860B65: std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x63856F: print(unsigned char const*, int, int) (TestGaussianBlur.cpp:59)
    ==22407==    by 0x63871F: main (TestGaussianBlur.cpp:84)
    ==22407== 
    16 17 18 18 19 20 21 22 23 23 24 24 
    16 17 18 18 19 20 21 22 23 23 24 24 
    21 22 23 23 24 25 26 27 27 28 29 29 
    26 27 27 28 29 30 31 31 32 33 34 34 
    31 31 32 32 33 34 35 36 36 37 38 38 
    34 34 35 35 36 37 38 38 39 40 40 41 
    35 35 36 36 37 38 38 39 40 40 41 41 
    34 34 35 35 36 36 37 38 38 39 39 39 
    ==22407== Conditional jump or move depends on uninitialised value(s)
    ==22407==    at 0x638732: main (TestGaussianBlur.cpp:87)
    ==22407== 
    *** stack smashing detected ***: ./TestGaussianBlur terminated
    ==22407== 
    ==22407== Process terminating with default action of signal 6 (SIGABRT)
    ==22407==    at 0x806D438: raise (raise.c:54)
    ==22407==    by 0x806F039: abort (abort.c:89)
    ==22407==    by 0x80AF7F9: __libc_message (libc_fatal.c:175)
    ==22407==    by 0x815121B: __fortify_fail (fortify_fail.c:37)
    ==22407==    by 0x81511BF: __stack_chk_fail (stack_chk_fail.c:28)
    ==22407==    by 0x638738: main (TestGaussianBlur.cpp:87)
    ==22407== 
    ==22407== HEAP SUMMARY:
    ==22407==     in use at exit: 8,320 bytes in 8 blocks
    ==22407==   total heap usage: 18 allocs, 10 frees, 93,336 bytes allocated
    ==22407== 
    ==22407== LEAK SUMMARY:
    ==22407==    definitely lost: 0 bytes in 0 blocks
    ==22407==    indirectly lost: 0 bytes in 0 blocks
    ==22407==      possibly lost: 0 bytes in 0 blocks
    ==22407==    still reachable: 8,320 bytes in 8 blocks
    ==22407==         suppressed: 0 bytes in 0 blocks
    ==22407== Rerun with --leak-check=full to see details of leaked memory
    ==22407== 
    ==22407== For counts of detected and suppressed errors, rerun with: -v
    ==22407== Use --track-origins=yes to see where uninitialised values come from
    ==22407== ERROR SUMMARY: 386 errors from 4 contexts (suppressed: 0 from 0)
    Abandon (core dumped)
    

    What could be the issue in my code?


    In the SimdGaussianBlurInit() function, what is the relationship between radius and standard sigma value for Gaussian kernel?

    For instance with scipy.ndimage.gaussian_filter, it gives:

    img:
     [[ 0  1  2  3  4  5  6  7  8  9 10 11]
     [12 13 14 15 16 17 18 19 20 21 22 23]
     [24 25 26 27 28 29 30 31 32 33 34 35]
     [36 37 38 39 40 41 42 43 44 45 46 47]
     [48 49 50 51 52 53 54 55 56 57 58 59]
     [60 61 62 63 64 65 66 67 68 69 70 71]
     [72 73 74 75 76 77 78 79 80 81 82 83]
     [84 85 86 87 88 89 90 91 92 93 94 95]]
    img_blur (sigma=5.0):
     [[39 39 39 40 40 41 41 42 42 43 43 43]
     [40 40 40 41 41 42 42 43 43 44 44 44]
     [41 41 41 42 42 43 43 44 44 45 45 45]
     [43 43 43 44 44 45 45 46 46 47 47 47]
     [46 46 46 47 47 48 48 49 49 50 50 50]
     [48 48 48 49 49 50 50 51 51 52 52 52]
     [49 49 49 50 50 51 51 52 52 53 53 53]
     [50 50 50 51 51 52 52 53 53 54 54 54]]
    img_blur (sigma=5.0/2.0):
     [[19 19 20 21 22 23 23 24 25 26 27 27]
     [23 23 24 25 26 27 27 28 29 30 31 31]
     [29 29 30 31 32 33 33 34 35 36 37 37]
     [38 38 39 40 41 42 42 43 44 45 46 46]
     [47 47 48 49 50 51 51 52 53 54 55 55]
     [56 56 57 58 59 60 60 61 62 63 64 64]
     [62 62 63 64 65 66 66 67 68 69 70 70]
     [66 66 67 68 69 70 70 71 72 73 74 74]]
    

    Finally, why is the radius parameter in SimdGaussianBlurInit() a const float pointer? It looks like there is no need for a pointer here, unless it is meant to support different radius values for the X and Y axes?

    opened by s-trinh 10
  • Supporting multi-byte grayscale PNG images?


    Hi there,

    So I'm trying out the current framework. After discovering that Simd only supports the lossless PNG file format, I converted my images accordingly. My input PNGs have 16-bit grayscale pixels.

    So, the Framework's API is able to decode and read in my image file, but it has a funny output, according to gdb:

    $1 = {width = 1124, height = 1364, stride = 4512, format = Simd::View<Simd::Allocator>::Rgba32, data = 0x555569994a00 "", _owner = true}

    The stride and format in this output were unexpected. I was hoping the View would be set to ::Int16 and the stride would be 1124*2 = 2248. So, am I to understand that my 16-bit grayscale image is now RGB with an alpha channel?

    Is there a plan to support the pure, multi-byte grayscale pixel plane in the near-future?
    If not, I'd be happy to add this in.

    Thanks, Charles.

    opened by crm-mtz 0
  • Suppress the inner bounding boxes which are contained within a bigger bounding box.

    Hello, is there any parameter that can suppress the inner bounding boxes and display only the outer one? Also, is there any document that explains the algorithms used in the different parts of the implementation?

    Thank You.

    opened by azafar1991 0
  • SimdYuv420pSaveAsJpegToMemory


    I'm very impressed by SimdYuv420pSaveAsJpegToMemory, but there is a problem with the colour space conversion. If I go from YUV420P to JPEG and back, the images look washed out. Is there a colour space mismatch, or is this caused by JPEG compression?

    Also, please note that when choosing SimdPixelFormatNone I get BGR, not YUV420P.

    opened by mikeversteeg 1
  • SimdYuva444pToBgraV2


    I'm interested in SimdYuva444pToBgraV2. I currently create a reduced-size version of a YUV420P image by first converting it to YUV444P, resizing it and then using SimdYuv444pToBgraV2 to display it. However, if the original is YUVA420P, I currently don't see an efficient way to end up with a reduced-size BGRA image.

    Thanks.

    opened by mikeversteeg 6
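For reference, a conversion from planar YUVA 4:4:4 to BGRA amounts to a per-pixel colour transform with the alpha sample copied through unchanged. The sketch below uses the full-range BT.601 (JFIF) equations in floating point. It is an illustration only: it is not the library's implementation, and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 32-bit pixel in B, G, R, A byte order.
struct Bgra
{
    std::uint8_t b, g, r, a;
};

// Round to nearest and clamp to [0, 255].
inline std::uint8_t RoundToByte(double v)
{
    const long r = std::lround(v);
    return static_cast<std::uint8_t>(r < 0 ? 0 : (r > 255 ? 255 : r));
}

// Convert one full-range BT.601 Y, U (Cb), V (Cr) sample plus alpha to BGRA.
inline Bgra YuvaToBgra(std::uint8_t y, std::uint8_t u, std::uint8_t v, std::uint8_t a)
{
    const double cb = u - 128.0;
    const double cr = v - 128.0;
    Bgra p;
    p.b = RoundToByte(y + 1.772 * cb);
    p.g = RoundToByte(y - 0.344136 * cb - 0.714136 * cr);
    p.r = RoundToByte(y + 1.402 * cr);
    p.a = a; // alpha is passed through unchanged
    return p;
}
```

In a 4:4:4 layout every pixel has its own Y, U, V and A sample, so the function can be applied directly at each (x, y) position of the four planes.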
Releases(v5.2.121)
  • v5.2.121(Jan 3, 2023)

    Algorithms

    New features
    • SIMD_DEPRECATED macro.
    • The mark of function SimdSvmSumLinear as deprecated.
    • SSE4.1, AVX2, AVX-512BW optimizations of function SynetNormalizeLayerForward.
    • Enumeration SimdWarpAffineFlags.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class WarpAffineNearest.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class WarpAffineBilinear.
    • Multi-threaded optimizations of class WarpAffineNearest.
    • Multi-threaded optimizations of class WarpAffineBilinear.
    • Function Simd::WarpAffine.
    • Function Simd::Mean.
    • Function Simd::OtsuThreshold.
    • Function Simd::RecursiveBilateralFilter.
    • The mark of function SimdEdgeBackgroundGrowRangeSlow as deprecated.
    • The mark of function SimdEdgeBackgroundGrowRangeFast as deprecated.
    • The mark of function SimdEdgeBackgroundIncrementCount as deprecated.
    • The mark of function SimdEdgeBackgroundAdjustRange as deprecated.
    • The mark of function SimdEdgeBackgroundAdjustRangeMasked as deprecated.
    • The mark of function SimdEdgeBackgroundShiftRange as deprecated.
    • The mark of function SimdEdgeBackgroundShiftRangeMasked as deprecated.
    • The mark of function Simd::EdgeBackgroundGrowRangeSlow as deprecated.
    • The mark of function Simd::EdgeBackgroundGrowRangeFast as deprecated.
    • The mark of function Simd::EdgeBackgroundIncrementCount as deprecated.
    • The mark of function Simd::EdgeBackgroundAdjustRange as deprecated.
    • The mark of function Simd::EdgeBackgroundAdjustRangeMasked as deprecated.
    • The mark of function Simd::EdgeBackgroundShiftRange as deprecated.
    • The mark of function Simd::EdgeBackgroundShiftRangeMasked as deprecated.
    Bug fixing
    • Wrong assert in AVX-512BW optimizations of function BgrToRgb.
    • MSVS compiler bug (Windows, Arm64).
    • Error in function Simd::DrawLine.

    Test framework

    New features
    • Tests for verifying functionality of WarpAffine engine.
    • Special tests for verifying functionality of WarpAffine engine.

    Infrastructure

    New features
    • SIMD_OPENCV CMake option to test Simd with OpenCV support.

    Documentation

    Improving
    • Usage example in the description of function RecursiveBilateralFilterInit.
    simd.5.2.121.zip(5.45 MB)
  • v5.2.120(Dec 1, 2022)

    Algorithms

    New features
    • AVX2 optimizations of class RecursiveBilateralFilterFast.
    • Base implementation of function SynetNormalizeLayerForward.
    Bug fixing
    • Error in SSE4.1 optimizations of function SynetSetInput.
    • MSVS compiler warning in AMX optimizations of class SynetConvolution8iNhwcDirect.
    • MSVS compiler warning in AMX optimizations of class SynetMergedConvolution8iCdc.
    • MSVS compiler warning in AMX optimizations of class SynetMergedConvolution8iCd.
    • MSVS compiler warning in AMX optimizations of class SynetMergedConvolution8iDc.
    • Error in AVX and AVX2 optimizations of function SynetInnerProductLayerForward.
    • Using of SIMD_CPP_2011_ENABLE macro outside of library.

    Test framework

    New features
    • Tests for verifying functionality of function SynetNormalizeLayerForward.
    Removing
    • Data test for function Fill.
    • Data test for function FillFrame.
    • Data test for function FillBgra.
    • Data test for function FillBgr.
    • Data test for function FillPixel.
    • Data test for function Float32ToFloat16.
    • Data test for function Float16ToFloat32.
    • Data test for function SquaredDifferenceSum16f.
    • Data test for function CosineDistance16f.
    • Data test for function Float32ToUint8.
    • Data test for function Uint8ToFloat32.
    • Data test for function MeanFilter3x3.
    • Data test for function MedianFilterRhomb3x3.
    • Data test for function MedianFilterRhomb5x5.
    • Data test for function MedianFilterSquare3x3.
    • Data test for function MedianFilterSquare5x5.
    • Data test for function GaussianBlur3x3.
    • Data test for function AbsGradientSaturatedSum.
    • Data test for function LbpEstimate.
    • Data test for function NormalizeHistogram.
    • Data test for function SobelDx.
    • Data test for function SobelDxAbs.
    • Data test for function SobelDy.
    • Data test for function SobelDyAbs.
    • Data test for function ContourMetrics.
    • Data test for function Laplace.
    • Data test for function LaplaceAbs.
    • Data test for function Histogram.
    • Data test for function HistogramMasked.
    • Data test for function HistogramConditional.
    • Data test for function AbsSecondDerivativeHistogram.
    • Data test for function ChangeColors.
    • Data test for function HogDirectionHistograms.
    • Data test for function HogExtractFeatures.
    • Data test for function HogDeinterleave.
    • Data test for function HogFilterSeparable.
    • Data test for function HogLiteExtractFeatures.
    • Data test for function HogLiteFilterFeatures.
    • Data test for function HogLiteResizeFeatures.
    • Data test for function HogLiteCompressFeatures.
    • Data test for function HogLiteFilterSeparable.
    • Data test for function HogLiteFindMax7x7.
    • Data test for function HogLiteCreateMask.
    • Data test for function Integral.
    • Data test for function InterferenceIncrement.
    • Data test for function InterferenceIncrementMasked.
    • Data test for function InterferenceDecrement.
    • Data test for function InterferenceDecrementMasked.
    • Data test for function InterleaveUv.
    • Data test for function InterleaveBgr.
    • Data test for function InterleaveBgra.
    • Data test for function NeuralConvert.
    • Data test for function NeuralProductSum.
    • Data test for function NeuralAddVectorMultipliedByValue.
    • Data test for function NeuralAddVector.
    • Data test for function NeuralAddValue.
    • Data test for function NeuralRoughSigmoid.
    • Data test for function NeuralRoughSigmoid2.
    • Data test for function NeuralDerivativeSigmoid.
    • Data test for function NeuralRoughTanh.
    • Data test for function NeuralDerivativeTanh.
    • Data test for function NeuralDerivativeRelu.
    • Data test for function NeuralPow.
    • Data test for function NeuralUpdateWeights.
    • Data test for function NeuralAdaptiveGradientUpdate.
    • Data test for function NeuralPooling1x1Max3x3.
    • Data test for function NeuralPooling2x2Max2x2.
    • Data test for function NeuralPooling2x2Max3x3.
    • Data test for function NeuralAddConvolution2x2Forward.
    • Data test for function NeuralAddConvolution3x3Forward.
    • Data test for function NeuralAddConvolution4x4Forward.
    • Data test for function NeuralAddConvolution5x5Forward.
    • Data test for function NeuralAddConvolution2x2Backward.
    • Data test for function NeuralAddConvolution3x3Backward.
    • Data test for function NeuralAddConvolution4x4Backward.
    • Data test for function NeuralAddConvolution5x5Backward.
    • Data test for function NeuralAddConvolution2x2Sum.
    • Data test for function NeuralAddConvolution3x3Sum.
    • Data test for function NeuralAddConvolution4x4Sum.
    • Data test for function NeuralAddConvolution5x5Sum.
    • Data test for function NeuralConvolutionForward.
    • Data test for function OperationBinary8u.
    • Data test for function OperationBinary16i.
    • Data test for function VectorProduct.
    • Data test for function ReduceColor2x2.
    • Data test for function ReduceGray2x2.
    • Data test for function ReduceGray3x3.
    • Data test for function ReduceGray4x4.
    • Data test for function ReduceGray5x5.
    • Data test for function Reorder16bit.
    • Data test for function Reorder32bit.
    • Data test for function Reorder64bit.
    • Data test for function ResizeBilinear.
    • Data test for function SegmentationShrinkRegion.
    • Data test for function SegmentationFillSingleHoles.
    • Data test for function SegmentationChangeIndex.
    • Data test for function SegmentationPropagate2x2.
    • Data test for function ShiftBilinear.
    • Data test for function GetStatistic.
    • Data test for function GetMoments.
    • Data test for function GetRowSums.
    • Data test for function GetColSums.
    • Data test for function GetAbsDyRowSums.
    • Data test for function GetAbsDxColSums.
    • Data test for function ValueSum.
    • Data test for function SquareSum.
    • Data test for function SobelDxAbsSum.
    • Data test for function SobelDyAbsSum.
    • Data test for function LaplaceAbsSum.
    • Data test for function ValueSquareSum.
    • Data test for function CorrelationSum.
    • Data test for function StretchGray2x2.
    • Data test for function SvmSumLinear.
    • Data test for function SynetEltwiseLayerForward.
    • Data test for function TextureBoostedSaturatedGradient.
    • Data test for function TextureBoostedUv.
    • Data test for function TextureGetDifferenceSum.
    • Data test for function TexturePerformCompensation.
    • Data test for function Yuv444pToBgr.
    • Data test for function Yuv422pToBgr.
    • Data test for function Yuv420pToBgr.
    • Data test for function Yuv444pToHsl.
    • Data test for function Yuv444pToHsv.
    • Data test for function Yuv444pToHue.
    • Data test for function Yuv420pToHue.
    • Data test for function Yuv444pToBgra.
    • Data test for function Yuv422pToBgra.
    • Data test for function Yuv420pToBgra.
    • Data test infrastructure.

    Infrastructure

    New features
    • SIMD_RUNTIME CMake build option.
    simd.5.2.120.zip(5.40 MB)
  • v5.1.119(Nov 1, 2022)

    Algorithms

    New features
    • AMX optimizations of class SynetConvolution8iNhwcDirect.
    • AMX optimizations of class SynetMergedConvolution8iCdc.
    • AMX optimizations of class SynetMergedConvolution8iCd.
    • AMX optimizations of class SynetMergedConvolution8iDc.
    Improving
    • Optimization of using of memory buffer in class SynetConvolution8iNhwcDirect.
    Bug fixing
    • MSVS compiler bug (Windows, Arm64).
    Removing
    • AVX-512VNNI optimizations of function SetDepthwise in class SynetConvolution8iNhwcDirect (it was identical to the AVX-512BW version).

    Test framework

    Removing
    • Data test for function BayerToBgr.
    • Data test for function BayerToBgra.
    • Data test for function Bgr48pToBgra32.
    • Data test for function Binarization.
    • Data test for function AveragingBinarization.
    • Data test for function ConditionalCount8u.
    • Data test for function ConditionalCount16i.
    • Data test for function ConditionalSum.
    • Data test for function ConditionalSquareSum.
    • Data test for function ConditionalSquareGradientSum.
    • Data test for function ConditionalFill.
    • Data test for function Copy.
    • Data test for function CopyFrame.
    • Data test for function Crc32c.
    • Data test for function DeinterleaveUv.
    • Data test for function DeinterleaveBgr.
    • Data test for function DeinterleaveBgra.
    • Data test for function DetectionHaarDetect32fp.
    • Data test for function DetectionHaarDetect32fi.
    • Data test for function DetectionLbpDetect32fp.
    • Data test for function DetectionLbpDetect32fi.
    • Data test for function DetectionLbpDetect16ip.
    • Data test for function DetectionLbpDetect16ii.
    • Data test for function AbsDifferenceSum.
    • Data test for function AbsDifferenceSumMasked.
    • Data test for function AbsDifferenceSums3x3.
    • Data test for function AbsDifferenceSums3x3Masked.
    • Data test for function SquaredDifferenceSum.
    • Data test for function SquaredDifferenceSumMasked.
    • Data test for function SquaredDifferenceSum32f.
    • Data test for function SquaredDifferenceKahanSum32f.
    • Data test for function CosineDistance32f.
    • Data test for function AlphaBlending.
    • Data test for function AlphaFilling.
    • Data test for function EdgeBackgroundGrowRangeSlow.
    • Data test for function EdgeBackgroundGrowRangeFast.
    • Data test for function EdgeBackgroundIncrementCount.
    • Data test for function EdgeBackgroundAdjustRange.
    • Data test for function EdgeBackgroundAdjustRangeMasked.
    • Data test for function EdgeBackgroundShiftRange.
    • Data test for function EdgeBackgroundShiftRangeMasked.
    simd.5.1.119.zip(5.47 MB)
  • v5.1.118(Oct 4, 2022)

    Algorithms

    New features
    • Base implementation, SSE4.1 optimizations of RecursiveBilateralFilter engine.
    • Support of ARGB32 format in View.
    • Support of ARGB32 format in function AlphaPremultiply.
    • Support of ARGB32 format in function AlphaUnpremultiply.
    • AVX-512BW optimizations of function TileMatMul8u8i (AMX tile emulation).
    Improving
    • Base implementation of class ImagePngLoader.
    Bug fixing
    • Build errors on MacOS Arm64.
    • Clang compiler warning (-mfpu=neon -mfpu=neon-fp16).
    • Compiler errors (C++-98 specific).

    Test framework

    New features
    • Tests for verifying functionality of RecursiveBilateralFilter engine.
    Removing
    • Data test for function AbsDifference.
    • Data test for function AddFeatureDifference.
    • Data test for function BgraToBgr.
    • Data test for function BgraToGray.
    • Data test for function BgrToGray.
    • Data test for function BgrToHsl.
    • Data test for function BgrToHsv.
    • Data test for function GrayToBgr.
    • Data test for function Int16ToGray.
    • Data test for function BgrToBayer.
    • Data test for function BgraToBayer.
    • Data test for function BgrToBgra.
    • Data test for function GrayToBgra.
    • Data test for function BgraToYuv420p.
    • Data test for function BgraToYuv422p.
    • Data test for function BgraToYuv444p.
    • Data test for function BgrToYuv420p.
    • Data test for function BgrToYuv422p.
    • Data test for function BgrToYuv444p.
    • Data test for function BackgroundGrowRangeSlow.
    • Data test for function BackgroundGrowRangeFast.
    • Data test for function BackgroundIncrementCount.
    • Data test for function BackgroundAdjustRange.
    • Data test for function BackgroundAdjustRangeMasked.
    • Data test for function BackgroundShiftRange.
    • Data test for function BackgroundShiftRangeMasked.
    • Data test for function BackgroundInitMask.
    simd.5.1.118.zip(5.47 MB)
  • v5.1.117(Sep 1, 2022)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2 optimizations of function AlphaBlending2x.
    Bug fixing
    • Buffer overrun in Base implementation of function SynetFusedLayerForward9.
    • Buffer overrun in SSE4.1 optimization of class SynetScale8i.
    • Buffer overrun in AVX-512BW optimizations of class ResizerNearest.
    • Buffer overrun in AVX-512BW optimizations of class ResizerByteBilinear.
    • Buffer overrun in AVX-512BW optimizations of class ResizerByteBicubic.
    • Buffer overrun in AVX-512BW optimizations of class ResizerByteArea1x1.
    • Buffer overrun in AVX-512BW optimizations of class ResizerByteArea2x2.
    • Buffer overrun in AVX-512BW optimizations of function TransformImage.
    • Error in AVX-512BW optimizations of function Yuv420pToUyvy422.
    • Crash in std::unordered_map after calling some Simd functions (Simd did not clear MMX registers after use).
    • Error (possible negative output values) in AVX-512BW optimizations of function CosineDistancesMxNp16f.
    • Error (possible negative output values) in AVX-512BW optimizations of function CosineDistancesMxNa16f.
    • Error in AVX-512BW optimizations of function TransformImage.
    • Valgrind warning in OutputMemoryStream.
    • Memory leak in Base implementation of class ImagePngLoader.
    Replacing
    • Replace SSE2 optimizations to SSE4.1 for function SegmentationChangeIndex.
    • Replace SSE2 optimizations to SSE4.1 for function SegmentationFillSingleHoles.
    • Replace SSE2 optimizations to SSE4.1 for function SegmentationPropagate2x2.
    • Replace SSE2 optimizations to SSE4.1 for function ShiftBilinear.
    • Replace SSE2 optimizations to SSE4.1 for function SobelDx.
    • Replace SSE2 optimizations to SSE4.1 for function SobelDy.
    • Replace SSE2 optimizations to SSE4.1 for function ContourAnchors.
    • Replace SSE2 optimizations to SSE4.1 for function SquaredDifferenceSum.
    • Replace SSE2 optimizations to SSE4.1 for function SquaredDifferenceSumMasked.
    • Replace SSE2 optimizations to SSE4.1 for function SquaredDifferenceSum32f.
    • Replace SSE2 optimizations to SSE4.1 for function SquaredDifferenceKahanSum32f.
    • Replace SSE2 optimizations to SSE4.1 for function GetStatistic.
    • Replace SSE2 optimizations to SSE4.1 for function GetMoments.
    • Replace SSE2 optimizations to SSE4.1 for function GetObjectMoments.
    • Replace SSE2 optimizations to SSE4.1 for function GetRowSums.
    • Replace SSE2 optimizations to SSE4.1 for function GetColSums.
    • Replace SSE2 optimizations to SSE4.1 for function GetAbsDyRowSums.
    • Replace SSE2 optimizations to SSE4.1 for function GetAbsDxColSums.
    • Replace SSE2 optimizations to SSE4.1 for function ValueSum.
    • Replace SSE2 optimizations to SSE4.1 for function SquareSum.
    • Replace SSE2 optimizations to SSE4.1 for function ValueSquareSum.
    • Replace SSE2 optimizations to SSE4.1 for function CorrelationSum.
    • Replace SSE2 optimizations to SSE4.1 for function StretchGray2x2.
    • Replace SSE2 optimizations to SSE4.1 for function TextureBoostedSaturatedGradient.
    • Replace SSE2 optimizations to SSE4.1 for function TextureBoostedUv.
    • Replace SSE2 optimizations to SSE4.1 for function TextureGetDifferenceSum.
    • Replace SSE2 optimizations to SSE4.1 for function TexturePerformCompensation.
    • Replace SSE2 optimizations to SSE4.1 for function Yuv420pToHue.
    • Replace SSE2 optimizations to SSE4.1 for function Yuv444pToHue.
    • Replace SSE2 optimizations to SSE4.1 for function Yuva420pToBgra.
    • Replace SSE2 optimizations to SSE4.1 for function Yuv420pToBgra.
    • Replace SSE2 optimizations to SSE4.1 for function Yuv420pToBgraV2.
    • Replace SSE2 optimizations to SSE4.1 for function Yuv422pToBgra.
    • Replace SSE2 optimizations to SSE4.1 for function Yuv444pToBgra.
    • Replace SSE2 optimizations to SSE4.1 for function Yuv444pToBgraV2.
    • Replace SSE2 optimizations to SSE4.1 for function SynetPoolingAverage.
    • Replace SSE2 optimizations to SSE4.1 for function SynetScaleLayerForward.
    • Replace SSE2 optimizations to SSE4.1 for function SynetConvert32fTo8u.
    • Replace SSE2 optimizations to SSE4.1 for function SynetReorderImage.
    • Replace SSE2 optimizations to SSE4.1 for function SynetReorderFilter.
    • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward0.
    • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward1.
    • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward2.
    • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward3.
    • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward4.
    • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward8.
    • Replace SSE2 optimizations to SSE4.1 for function SynetFusedLayerForward9.
    • Replace SSE2 optimizations to SSE4.1 for function SynetDeconvolution32fInit.
    • Replace SSE2 optimizations to SSE4.1 for class SynetDeconvolution32fGemmNN.
    • Replace SSE2 optimizations to SSE4.1 for class SynetDeconvolution32fNhwcDirect2x2.
    • Replace SSE2 optimizations to SSE4.1 for function SynetConvolution32fInit.
    • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fDepthwiseDotProduct.
    • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fWinograd.
    • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fDirectNchw.
    • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fNhwcDirect.
    • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fDirectNhwc.
    • Replace SSE2 optimizations to SSE4.1 for class SynetConvolution32fGemmNN.
    • Replace SSE2 optimizations to SSE4.1 for class SynetMergedConvolution32fCdc.
    • Replace SSE2 optimizations to SSE4.1 for class SynetMergedConvolution32fCd.
    • Replace SSE2 optimizations to SSE4.1 for class SynetMergedConvolution32fDc.
    • Replace SSE2 optimizations to SSE4.1 for function SynetMergedConvolution32fInit.
    • Replace SSE2 optimizations to SSE4.1 for function SynetAddBias.
    • Replace SSE2 optimizations to SSE4.1 for function SynetEltwiseLayerForward.
    • Replace SSE2 optimizations to SSE4.1 for function SynetInnerProductLayerForward.
    • Replace SSE2 optimizations to SSE4.1 for function SynetLrnLayerCrossChannels.
    • Replace SSE2 optimizations to SSE4.1 for function SynetShuffleLayerForward.
    • Replace SSE2 optimizations to SSE4.1 for function SynetSoftmaxLayerForward.
    • Replace SSE2 optimizations to SSE4.1 for function SynetElu32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetHardSigmoid32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetHswish32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetMish32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetPreluLayerForward.
    • Replace SSE2 optimizations to SSE4.1 for function SynetRelu32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetRestrictRange32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetSigmoid32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetSoftplus32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetSwish32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetTanh32f.
    • Replace SSE2 optimizations to SSE4.1 for function SynetGemm32fNN.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x3Block1x4SetFilter.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x3Block1x4SetInput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x3Block1x4SetOutput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x5Block1x4SetFilter.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x5Block1x4SetInput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel1x5Block1x4SetOutput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block2x2SetFilter.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block2x2SetInput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block2x2SetOutput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block4x4SetFilter.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block4x4SetInput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel2x2Block4x4SetOutput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block2x2SetFilter.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block2x2SetInput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block2x2SetOutput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block3x3SetFilter.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block3x3SetInput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block3x3SetOutput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block4x4SetFilter.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block4x4SetInput.
    • Replace SSE2 optimizations to SSE4.1 for function WinogradKernel3x3Block4x4SetOutput.

    Test framework

    New features
    • Command line argument '-tr=' - number of test execution repeats.
    • Special test for function CosineDistancesMxNp16fSpecialTest.
    • Tests for verifying functionality of function AlphaBlending2x.
    Bug fixing
    • Uninitialized source array in VectorNormNp16fAutoTest.
    • Uninitialized source array in CosineDistancesMxNp16fAutoTest.
    • Memory leak in ImageLoadFromMemorySpecialTest.

    Infrastructure

    Removing
    • Project Sse2 for Microsoft Visual Studio 2022.
    • Project Sse2 for Microsoft Visual Studio 2019.
    simd.5.1.117.zip(5.14 MB)
  • v5.0.116(Aug 15, 2022)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Yuva444pToBgraV2.
    • Function SimdEmpty.
    • Checking of no man's land watermarks in function SimdFree.
    Improving
    • AVX-512BW optimizations of AMX tile emulation.
    • AMX optimizations of class SynetConvolution32fBf16Nhwc.
    • AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
    • AMX optimizations of class SynetMergedConvolution32fBf16Cd.
    Bug fixing
    • GCC linker error when SIMD_AMX_EMULATE macro is switched on.
    • Error in SSE4.1, AVX2, AVX-512BW, AMX optimizations of class SynetConvolution32fBf16Nhwc.
    • Wrong assert in SSE4.1 and AVX-512BW optimizations of class ResizerNearest.
    • Error in AVX optimizations of class SynetMergedConvolution32fCdc.
    • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
    • External buffer reading overflow in class SynetMergedConvolution32fBf16Cdc.
    • External buffer reading overflow in class SynetMergedConvolution32fBf16Cd.
    • External buffer reading overflow in class SynetMergedConvolution32fBf16Dc.
    • FP32 overflow in SSE2, AVX2, AVX-512BW, NEON optimizations of function Tanh.
    • Error in function Base::SynetConvolution32fGemmNN::ImgToCol.
    • Error in SSE4.1, AVX2 optimizations of class ResizerByteArea2x2.
    • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerNearest.
    • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteBilinear.
    • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteBicubic.
    • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteArea1x1.
    • Buffer overrun in SSE4.1, AVX2 optimizations of class ResizerByteArea2x2.
    Replacing
    • Replace SSE2 optimizations to SSE4.1 for function SvmSumLinear.
    • Replace SSE2 optimizations to SSE4.1 for function AbsDifference.
    • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSum.
    • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSumMasked.
    • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSums3x3.
    • Replace SSE2 optimizations to SSE4.1 for function AbsDifferenceSums3x3Masked.
    • Replace SSE2 optimizations to SSE4.1 for function AbsGradientSaturatedSum.
    • Replace SSE2 optimizations to SSE4.1 for function AddFeatureDifference.
    • Replace SSE2 optimizations to SSE4.1 for function AlphaBlending.
    • Replace SSE2 optimizations to SSE4.1 for function AlphaBlendingUniform.
    • Replace SSE2 optimizations to SSE4.1 for function AlphaFilling.
    • Replace SSE2 optimizations to SSE4.1 for function AlphaPremultiply.
    • Replace SSE2 optimizations to SSE4.1 for function BackgroundGrowRangeSlow.
    • Replace SSE2 optimizations to SSE4.1 for function BackgroundGrowRangeFast.
    • Replace SSE2 optimizations to SSE4.1 for function BackgroundIncrementCount.
    • Replace SSE2 optimizations to SSE4.1 for function BackgroundAdjustRange.
    • Replace SSE2 optimizations to SSE4.1 for function BackgroundAdjustRangeMasked.
    • Replace SSE2 optimizations to SSE4.1 for function BackgroundShiftRange.
    • Replace SSE2 optimizations to SSE4.1 for function BackgroundShiftRangeMasked.
    • Replace SSE2 optimizations to SSE4.1 for function BackgroundInitMask.
    • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundGrowRangeSlow.
    • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundGrowRangeFast.
    • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundIncrementCount.
    • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundAdjustRange.
    • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundAdjustRangeMasked.
    • Replace SSE2 optimizations to SSE4.1 for function EdgeBackgroundShiftRangeMasked.
    • Replace SSE2 optimizations to SSE4.1 for function BayerToBgra.
    • Replace SSE2 optimizations to SSE4.1 for function BgraToGray.
    • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv420p.
    • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv422p.
    • Replace SSE2 optimizations to SSE4.1 for function BgraToYuv444p.
    • Replace SSE2 optimizations to SSE4.1 for function BgraToYuva420p.
    • Replace SSE2 optimizations to SSE4.1 for function BgrToGray.
    • Replace SSE2 optimizations to SSE4.1 for function RgbaToGray.
    • Replace SSE2 optimizations to SSE4.1 for function Bgr48pToBgra32.
    • Replace SSE2 optimizations to SSE4.1 for function Binarization.
    • Replace SSE2 optimizations to SSE4.1 for function AveragingBinarization.
    • Replace SSE2 optimizations to SSE4.1 for function ConditionalCount8u.
    • Replace SSE2 optimizations to SSE4.1 for function ConditionalCount16i.
    • Replace SSE2 optimizations to SSE4.1 for function ConditionalSum.
    • Replace SSE2 optimizations to SSE4.1 for function ConditionalSquareSum.
    • Replace SSE2 optimizations to SSE4.1 for function ConditionalSquareGradientSum.
    • Replace SSE2 optimizations to SSE4.1 for function ConditionalFill.
    • Replace SSE2 optimizations to SSE4.1 for function DeinterleaveUv.
    • Replace SSE2 optimizations to SSE4.1 for function Fill32f.
    • Replace SSE2 optimizations to SSE4.1 for function FillBgr.
    • Replace SSE2 optimizations to SSE4.1 for function FillBgra.
    • Replace SSE2 optimizations to SSE4.1 for function FillPixel.
    • Replace SSE2 optimizations to SSE4.1 for function CosineDistance32f.
    • Replace SSE2 optimizations to SSE4.1 for function Float32ToUint8.
    • Replace SSE2 optimizations to SSE4.1 for function Uint8ToFloat32.
    • Replace SSE2 optimizations to SSE4.1 for function GaussianBlur3x3.
    • Replace SSE2 optimizations to SSE4.1 for function GrayToBgra.
    • Replace SSE2 optimizations to SSE4.1 for function AbsSecondDerivativeHistogram.
    • Replace SSE2 optimizations to SSE4.1 for function HistogramMasked.
    • Replace SSE2 optimizations to SSE4.1 for function HistogramConditional.
    • Replace SSE2 optimizations to SSE4.1 for function HogDirectionHistograms.
    • Replace SSE2 optimizations to SSE4.1 for function HogDeinterleave.
    • Replace SSE2 optimizations to SSE4.1 for function HogFilterSeparable.
    • Replace SSE2 optimizations to SSE4.1 for function Int16ToGray.
    • Replace SSE2 optimizations to SSE4.1 for function InterferenceIncrement.
    • Replace SSE2 optimizations to SSE4.1 for function InterferenceIncrementMasked.
    • Replace SSE2 optimizations to SSE4.1 for function InterferenceDecrement.
    • Replace SSE2 optimizations to SSE4.1 for function InterferenceDecrementMasked.
    • Replace SSE2 optimizations to SSE4.1 for function InterleaveUv.
    • Replace SSE2 optimizations to SSE4.1 for function Laplace.
    • Replace SSE2 optimizations to SSE4.1 for function LbpEstimate.
    • Replace SSE2 optimizations to SSE4.1 for function MeanFilter3x3.
    • Replace SSE2 optimizations to SSE4.1 for function MedianFilterRhomb3x3.
    • Replace SSE2 optimizations to SSE4.1 for function MedianFilterRhomb5x5.
    • Replace SSE2 optimizations to SSE4.1 for function MedianFilterSquare3x3.
    • Replace SSE2 optimizations to SSE4.1 for function MedianFilterSquare5x5.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Forward.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Forward.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Forward.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Forward.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Backward.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Backward.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Backward.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Backward.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution2x2Sum.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution3x3Sum.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution4x4Sum.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddConvolution5x5Sum.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAdaptiveGradientUpdate.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddVectorMultipliedByValue.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddVector.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralAddValue.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralConvert.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeRelu.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeSigmoid.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralDerivativeTanh.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling1x1Max3x3.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling2x2Max2x2.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralPooling2x2Max3x3.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralPow.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralProductSum.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralRoughSigmoid.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralRoughSigmoid2.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralRoughTanh.
    • Replace SSE2 optimizations to SSE4.1 for function NeuralUpdateWeights.
    • Replace SSE2 optimizations to SSE4.1 for function OperationBinary8u.
    • Replace SSE2 optimizations to SSE4.1 for function OperationBinary16i.
    • Replace SSE2 optimizations to SSE4.1 for function VectorProduct.
    • Replace SSE2 optimizations to SSE4.1 for function ReduceColor2x2.
    • Replace SSE2 optimizations to SSE4.1 for function ReduceGray2x2.
    • Replace SSE2 optimizations to SSE4.1 for function ReduceGray3x3.
    • Replace SSE2 optimizations to SSE4.1 for function ReduceGray4x4.
    • Replace SSE2 optimizations to SSE4.1 for function ReduceGray5x5.
    • Replace SSE2 optimizations to SSE4.1 for function Reorder16bit.
    • Replace SSE2 optimizations to SSE4.1 for function Reorder32bit.
    • Replace SSE2 optimizations to SSE4.1 for function Reorder64bit.
    • Replace SSE2 optimizations to SSE4.1 for function ResizeBilinear.
    • Replace SSE2 optimizations to SSE4.1 for function ResizerInit.
    • Replace SSE2 optimizations to SSE4.1 for class ResizerByteBilinear.
    • Replace SSE2 optimizations to SSE4.1 for class ResizerFloatBilinear.
    • Replace SSE2 optimizations to SSE4.1 for class ResizerByteArea1x1.

    Test framework

    New features
    • Tests for verifying functionality of function Yuva444pToBgraV2.

    Infrastructure

    New features
    • CMake parameter SIMD_AMX_EMULATE.

    Documentation

    Improving
    • Description of function SynetSetInput.
    simd.5.0.116.zip(5.16 MB)
  • v5.0.115(Jul 1, 2022)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cd.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Dc.
    • AVX-512BF16 extension support.
    • AVX-512BF16 optimizations of function Float32ToBFloat16.
    • AVX-512BF16, AMX optimizations of class SynetConvolution32fBf16Nhwc.
    • AMX extension support.
    • Support of 3D pooling in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetPoolingMax32f.
    Improving
    • AVX-512BW optimizations of function Fill32f.
    Renaming
    • Rename function SynetPoolingForwardAverage to SynetPoolingAverage.
    • Rename function SynetPoolingForwardMax32f to SynetPoolingMax32f.
    • Rename function SynetPoolingForwardMax8u to SynetPoolingMax8u.
    Replacing
    • Replace AVX-512F optimizations to AVX-512BW for function SvmSumLinear.
    • Replace AVX-512F optimizations to AVX-512BW for function Fill32f.
    • Replace AVX-512F optimizations to AVX-512BW for class ResizerNearest.
    • Replace AVX-512F optimizations to AVX-512BW for class ResizerFloatBilinear.
    • Replace AVX-512F optimizations to AVX-512BW for function SquaredDifferenceSum32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SquaredDifferenceKahanSum32f.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralConvolutionForward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Forward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Backward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Sum.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Forward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Backward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Sum.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Forward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Backward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Sum.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Forward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Backward.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Sum.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralProductSum.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAdaptiveGradientUpdate.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling1x1Max3x3.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling2x2Max2x2.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling2x2Max3x3.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralUpdateWeights.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddValue.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddVector.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralAddVectorMultipliedByValue.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughSigmoid.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughSigmoid2.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeSigmoid.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughTanh.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeTanh.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeRelu.
    • Replace AVX-512F optimizations to AVX-512BW for function NeuralPow.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fGemmNN.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fGemmNT.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fWinograd.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetDeconvolution32fGemmNN.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetDeconvolution32fNhwcDirect2x2.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetDeconvolution32fInit.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetInnerProduct32fGemm.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetInnerProduct32fProd.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetInnerProduct32fInit.
    • Replace AVX-512F optimizations to AVX-512BW for function ConvolutionBiasAndActivation.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetReorderImage.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetReorderFilter.
    • Replace AVX-512F optimizations to AVX-512BW for function Gemm32fNN.
    • Replace AVX-512F optimizations to AVX-512BW for function Gemm32fNT.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward0.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward1.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward2.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward3.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward4.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward8.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward9.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetFilter.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetInput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetOutput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetFilter.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetInput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetOutput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetFilter.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetInput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetOutput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetFilter.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetInput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetOutput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetFilter.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetInput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetOutput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetFilter.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetInput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetOutput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetFilter.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetInput.
    • Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetOutput.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetElu32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetHardSigmoid32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetHswish32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetMish32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetPreluLayerForward.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetRelu32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetRestrictRange32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetSigmoid32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetSoftplus32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetSwish32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetTanh32f.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetScaleLayerForward.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetPoolingAverage.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetAddBias.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetEltwiseLayerForward.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetInnerProductLayerForward.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetLrnLayerCrossChannels.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetShuffleLayerForward.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetSoftmaxLayerForward.
    • Replace AVX-512F optimizations to AVX-512BW for function SynetUnaryOperation32fLayerForward.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fDirectNchw.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fDirectNhwc.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fNhwcDirect.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetMergedConvolution32fCdc.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetMergedConvolution32fCd.
    • Replace AVX-512F optimizations to AVX-512BW for class SynetMergedConvolution32fDc.

    Infrastructure

    New features
    • Project Avx512bf16 for Microsoft Visual Studio 2022.
    • Project Avx512bf16 for Microsoft Visual Studio 2019.
    • Project Amx for Microsoft Visual Studio 2022.
    • Project Amx for Microsoft Visual Studio 2019.
    Removing
    • Project Avx512f for Microsoft Visual Studio 2022.
    • Project Avx512f for Microsoft Visual Studio 2019.
    simd.5.0.115.zip(5.18 MB)
  • v4.10.114(Jun 1, 2022)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToUyvy422.
    • AVX-512BW, NEON optimizations of function Uyvy422ToYuv420p.
    • AVX-512BW, NEON optimizations of function Uyvy422ToBgr.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Float32ToBFloat16.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function BFloat16ToFloat32.
    • Base implementation of class SynetConvolution32fBf16Gemm.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution32fBf16Nhwc.
    • Base implementation of class SynetMergedConvolution32fBf16.
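
    The BFloat16 conversions listed above can be illustrated with a scalar sketch. It is not taken from the library sources: the function names are hypothetical, and round-to-nearest-even is an assumption (the release notes do not state which rounding mode the library uses). BFloat16 keeps the upper 16 bits of an IEEE 754 single-precision value, so the conversion back is a plain shift.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: converts a 32-bit float to bfloat16 (its upper 16 bits),
// rounding to nearest, ties to even. The rounding mode is an assumption.
inline uint16_t Float32ToBFloat16Sketch(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    // NaN: truncate and force a mantissa bit so the result stays a NaN.
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return uint16_t((bits >> 16) | 0x0040u);
    // Add 0x7FFF plus the lowest kept bit: this rounds ties to even.
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return uint16_t((bits + rounding) >> 16);
}

// Hypothetical helper: converts bfloat16 back to a 32-bit float (exact).
inline float BFloat16ToFloat32Sketch(uint16_t value)
{
    uint32_t bits = uint32_t(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

    Every bfloat16 value converts to float exactly, so a round trip through both helpers returns the original 16-bit pattern for non-NaN inputs.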
    Removing
    • Remove external GEMM function parameter from function SynetConvolution32fInit.
    • Remove external GEMM function parameter from function SynetDeconvolution32fInit.

    Test framework

    New features
    • Tests for verifying functionality of function Yuv420pToUyvy422.
    • Tests for verifying functionality of function Float32ToBFloat16.
    • Tests for verifying functionality of function BFloat16ToFloat32.

    Infrastructure

    New features
    • Project files for Microsoft Visual Studio 2022.
    simd.4.10.114.zip(5.09 MB)
  • v4.9.113(May 4, 2022)

    Algorithms

    New features
    • SSE4.1, AVX2, AVX-512BW optimizations of class ResizerByteArea2x2.
    Improving
    • Base implementation of class ResizerByteArea1x1.
    Bug fixing
    • Error in Base implementation of class ResizerByteArea2x2.
    • Error in AVX optimizations of class SynetConvolution32fDirectNchw.
    Removing
    • SimdSynetCompatibilityFloatZero flag.

    Infrastructure

    New features
    • Git commit ID info in function SimdVersion.
    • Git branch name in function SimdVersion.
    simd.4.9.113.zip(5.01 MB)
  • v4.9.112(Apr 1, 2022)

    Algorithms

    New features
    • NEON optimizations of function Base64Encode.
    • NEON optimizations of ImageJpegSaver class.
    • NEON optimizations of function Yuv420pSaveAsJpegToMemory.
    • NEON optimizations of function Nv12SaveAsJpegToMemory.
    • Owner method in View structure.
    • Owner method in Frame structure.
    • Capture method in View structure.
    • Capture method in Frame structure.
    • Base implementation of class ResizerByteAreaReduced2x2.
    Bug fixing
    • MSVS compiler error in AVX-512BW optimizations of function Yuv420pToBgraV2.
    • Error in AVX2 optimizations of function BgraToRgb.
    • Error (aligned reading of unaligned memory) in SSE4.1, AVX2, AVX-512BW optimizations of function InterleaveBgra.
    • Error in function View::ToOcv.
    • Error in View copy constructor (from OpenCV Mat).

    Test framework

    Bug fixing
    • Wrong default ROOT_PATH for Linux.
    • Error in test SynetConvert32fTo8uAutoTest.
    • Special test ResizeYuv420pSpecialTest.
    simd.4.9.112.zip(5.00 MB)
  • v4.9.111(Mar 3, 2022)

    Algorithms

    New features
    • AVX2, AVX-512BW optimizations of ResizerByteBicubic class.
    • SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Base64Decode.
    • NEON optimizations of function SynetSwish32f.
    • Swish activation function to NEON optimizations of SynetConvolution32f framework.
    • Swish activation function to NEON optimizations of SynetDeconvolution32f framework.
    • Swish activation function to NEON optimizations of SynetMergedConvolution32f framework.
    • Swish activation function to NEON optimizations of SynetConvolution8i framework.
    • Swish activation function to NEON optimizations of SynetMergedConvolution8i framework.
    • NEON optimizations of function Yuv444pToBgraV2.
    • SSE2, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToBgraV2.
    Improving
    • SSE4.1 optimizations of ResizerByteBicubic class.
    Bug fixing
    • Compiler error in NEON optimizations of function AlphaUnpremultiply.
    • MSVS Compiler warnings in SSE4.1, AVX2, AVX-512BW optimizations of function TransformImage.
    simd.4.9.111.zip(4.99 MB)
  • v4.9.110(Mar 3, 2022)

    Algorithms

    New features
    • Base implementation, SSE4.1 optimizations of ResizerByteBicubic class.
    • Base implementation of function BgraToYuv444pV2.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Nv12SaveAsJpegToMemory.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Yuv420pSaveAsJpegToMemory.
    • Base implementation of function BgraToYuv420pV2.
    Bug fixing
    • Error in SSE4.1, AVX2, AVX-512BW optimizations of function BgraToRgba.
    • Error in SSE4.1, AVX2 optimizations of function BgraToBgr.
    • Error in SSE4.1, AVX2 optimizations of function BgraToRgb.
    • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function AlphaUnpremultiply.

    Test framework

    New features
    • Tests for verifying functionality of function BgraToYuv444pV2.
    • Tests for verifying functionality of function Nv12SaveAsJpegToMemory.
    • Tests for verifying functionality of function Yuv420pSaveAsJpegToMemory.
    • Tests for verifying functionality of function BgraToYuv420pV2.
    simd.4.9.110.zip(4.97 MB)
  • v4.9.109(Jan 3, 2022)

    Algorithms

    New features
    • Additional parameter in function Uyvy422ToBgr.
    • SSE4.1, AVX2 optimizations of function Uyvy422ToBgr.
    • Base implementation, SSE4.1, AVX2 optimizations of function Uyvy422ToYuv420p.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Base64Encode.
    • Base implementation of function Base64Decode.
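
    For readers unfamiliar with the encoding behind Base64Encode, here is a minimal scalar sketch of standard Base64 (RFC 4648 alphabet with '=' padding). The function name and signature are hypothetical and do not reproduce the library's API; the sketch only shows the 3-bytes-to-4-characters mapping that the SIMD versions accelerate.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes 'size' bytes into a padded Base64 string.
inline std::string Base64EncodeSketch(const uint8_t* src, size_t size)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string dst;
    dst.reserve((size + 2) / 3 * 4);
    size_t i = 0;
    // Full groups: 3 input bytes become 4 output characters.
    for (; i + 3 <= size; i += 3)
    {
        uint32_t v = (uint32_t(src[i]) << 16) | (uint32_t(src[i + 1]) << 8) | src[i + 2];
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += table[(v >> 6) & 63];
        dst += table[v & 63];
    }
    // Tail: 1 or 2 remaining bytes are padded with '='.
    if (size - i == 1)
    {
        uint32_t v = uint32_t(src[i]) << 16;
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += "==";
    }
    else if (size - i == 2)
    {
        uint32_t v = (uint32_t(src[i]) << 16) | (uint32_t(src[i + 1]) << 8);
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += table[(v >> 6) & 63];
        dst += '=';
    }
    return dst;
}
```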
    Improving
    • AVX2 optimizations of class ResizerNearest for Bgr24, Uv16.
    Renaming
    • Function UyvyToBgr to Uyvy422ToBgr.

    Test framework

    New features
    • Tests for verifying functionality of function Uyvy422ToYuv420p.
    • Tests for verifying functionality of function Base64Encode.
    • Tests for verifying functionality of function Base64Decode.

    Documentation

    Changes
    • Update developers list.
    simd.4.9.109.zip(4.95 MB)
  • v4.9.108(Dec 1, 2021)

    Algorithms

    New features
    • SSE4.1, AVX2, AVX-512F, AVX-512BW optimizations of class ResizerNearest.
    • Add SimdResizeMethodNearestPytorch to SimdResizeMethodType enumeration.
    • Add parameter BackgroundStatUpdateTime to Motion Detector.
    • MotionDetector performance optimization (case of falling star).
    • 16-bit UYVY image format in View.
    • Base implementation of function UyvyToBgr.
    • Base implementation, SSE2, AVX2, AVX-512F optimizations of function SynetSwish32f.
    • SimdConvolutionActivationSwish item of SimdConvolutionActivationType enumeration.
    • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetConvolution32f framework.
    • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetDeconvolution32f framework.
    • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetMergedConvolution32f framework.
    • Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8i framework.
    • Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
    • SimdYuvType enumeration.
    • Base implementation, SSE2, AVX2, AVX-512BW optimizations of function Yuv444pToBgraV2.
    • Function Simd::Resize supports images with 16-bit channel size.
    • Base implementation of function Yuv420pToBgraV2.
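
    The Swish activation added in this release is commonly defined as x * sigmoid(slope * x). The scalar sketch below illustrates that definition only; the function name and the 'slope' parameter are assumptions and are not copied from the library's API.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: applies Swish, dst[i] = src[i] * sigmoid(slope * src[i]),
// written as src[i] / (1 + exp(-slope * src[i])).
inline void SwishSketch(const float* src, size_t size, float slope, float* dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = src[i] / (1.0f + std::exp(-slope * src[i]));
}
```

    With slope = 1 the function maps 0 to 0 and approaches the identity for large positive inputs.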
    Improving
    • Refactoring of SimdResizeMethodType enumeration.
    Bug fixing
    • Stack corruption in function Simd::Avx2::JpegWriteBlockSubs.

    Test framework

    New features
    • Tests for verifying functionality of function UyvyToBgr.
    • Tests for verifying functionality of function SynetSwish32f.
    • Tests for verifying functionality of function Yuv444pToBgraV2.
    • Tests for verifying functionality of function Yuv420pToBgraV2.

    Infrastructure

    Bug fixing
    • Wrong compiler options in CMake.
    simd.4.9.108.zip(4.92 MB)
  • v4.9.107(Nov 1, 2021)

    Algorithms

    New features
    • Internal class Holder to replace std::unique_ptr for old compilers without support of C++11 standard.
    • SimdBayerLayoutType enumeration.
    • Base implementation of class ResizerNearest.
    Bug fixing
    • Compiler error when macro SIMD_SSE2_DISABLE is defined.
    • Compiler error when macro SIMD_NEON_DISABLE is defined.

    Infrastructure

    New features
    • SIMD_ROOT CMake parameter.
    simd.4.9.107.zip(4.90 MB)
  • v4.9.106(Oct 1, 2021)

    Algorithms

    New features
    • Base implementation, SSE2, AVX, AVX-512F, NEON optimizations of function SynetHardSigmoid32f.
    • SimdConvolutionActivationHardSigmoid item of SimdConvolutionActivationType enumeration.
    • HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetConvolution32f framework.
    • HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetDeconvolution32f framework.
    • HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetMergedConvolution32f framework.
    • NEON optimizations of SynetMergedConvolution32fDc class.
    • NEON optimizations of SynetMergedConvolution32fCd class.
    • NEON optimizations of SynetInnerProduct32fGemm class.
    • NEON optimizations of SynetInnerProduct32fProd class.
    • HardSigmoid activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, NEON optimizations of SynetConvolution8i framework.
    • HardSigmoid activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
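
    HardSigmoid is a piecewise-linear approximation of the sigmoid: a scaled and shifted input clamped to the range [0, 1]. The scalar sketch below shows that definition; the function name and the 'scale' and 'shift' parameters are assumptions made for illustration, not a copy of the library's API.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: dst[i] = clamp(src[i] * scale + shift, 0, 1).
inline void HardSigmoidSketch(const float* src, size_t size, float scale, float shift, float* dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = std::max(0.0f, std::min(src[i] * scale + shift, 1.0f));
}
```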
    Bug fixing
    • Compiler error in file SimdInit.h (Clang, Windows).
    Removing
    • Remove inclusion of SimdConfig.h from SimdLib.h.

    Test framework

    New features
    • Tests for verifying functionality of function SynetHardSigmoid32f.
    • '-pi' test parameter (to print internal performance statistics of Simd Library to console).
    simd.4.9.106.zip(4.90 MB)
  • v4.9.105(Sep 13, 2021)

    Algorithms

    New features
    • AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24 for Rotate180, TransposeRotate90).
    • Method Frame::Clone with region parameter.
    • Method View::Clone with region parameter.
    • AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • AVX-512BW optimizations of function TransformImage (case of Gray8, Uv16, Bgra32 for Rotate180, TransposeRotate90).
    • AVX-512BW optimizations of function TransformImage (case of Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • AVX-512BW optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • AVX-512BW optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function AlphaBlendingUniform.
    • AVX-512BW optimizations of function TransformImage (case of Bgr24 for Rotate180, TransposeRotate90, Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • Resize function (with size parameter).
    • Move constructor of View structure.
    • Move operator of View structure.
    • Clear method of Frame structure.
    • Swap method of Frame structure.
    • Move constructor of Frame structure.
    • Move operator of Frame structure.
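
    The TransformImage cases above differ only in how source coordinates map to destination coordinates. As an illustration, here is a scalar sketch of a 90-degree rotation for a tightly packed Gray8 image. The function name is hypothetical, and the counterclockwise direction is an assumption chosen for the example; check the library documentation for the exact meaning of each transform type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: rotates a packed 8-bit gray image by 90 degrees
// counterclockwise. The result has width 'height' and height 'width'.
inline std::vector<uint8_t> Rotate90Gray8Sketch(const std::vector<uint8_t>& src, size_t width, size_t height)
{
    assert(src.size() == width * height);
    std::vector<uint8_t> dst(src.size());
    const size_t dstWidth = height;
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            // Source pixel (x, y) lands at column y, row (width - 1 - x).
            dst[(width - 1 - x) * dstWidth + y] = src[y * width + x];
    return dst;
}
```

    For a 3x2 image with rows {1, 2, 3} and {4, 5, 6}, the sketch produces a 2x3 image with rows {3, 6}, {2, 5} and {1, 4}.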

    Test framework

    New features
    • Tests for verifying functionality of function AlphaBlendingUniform.
    simd.4.9.105.zip(4.89 MB)
  • v4.9.104(Aug 3, 2021)

    Algorithms

    New features
    • Rgba32 format in Frame structure.
    • Rgba32 format in Convert function (for frames).
    • SSE4.1 optimizations of function Float32ToFloat16.
    • SSE4.1 optimizations of function Float16ToFloat32.
    • AVX2 optimizations of function TransformImage (case of Bgra32 for Rotate180, TransposeRotate90).
    Improving
    • SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetConvolution32fNhwcDirect (case of fixed kernels).
    • Reduction of compilation time and binary size of class SynetConvolution32f.
    • Reduction of compilation time and binary size of class SynetDeconvolution32f.
    • Reduction of compilation time and binary size of class SynetMergedConvolution32f.
    • Reduction of compilation time and binary size of class SynetConvolution8i.
    • Reduction of compilation time and binary size of class SynetMergedConvolution8i.
    • SSE4.1 optimizations of function TransformImage (case of Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate180).
    • SSE4.1 optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • SSE4.1 optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    Bug fixing
    • Compiler error in file SimdAvx512bwResizer.cpp (GCC 5.4.0).
    • Compiler error in file SimdAvx512bwBgraToBgr.cpp (MSVS-2017).
    • Compiler error in file SimdInit.h (Clang, Windows).
    • Error in AVX2 and AVX-512BW optimizations of functions CosineDistancesMxNa16f and CosineDistancesMxNp16f (functions may return small negative values).
    • Error in function Base::DetectionLoadA (it threw an exception instead of returning NULL).
    • Error in SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetDeconvolution32fNhwcDirect2x2.
    Replacing
    • Replace SSE3 optimizations to SSE4.1 for function Gemm32fNT.
    • Replace SSE3 optimizations to SSE4.1 for function SynetConvolution32fInit.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution2x2Sum.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution3x3Sum.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution4x4Sum.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution5x5Sum.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralConvolutionForward.
    • Replace SSE4.2 optimizations to SSE4.1 for function Crc32c.
    • Replace SSSE3 optimizations to SSE4.1 for function AlphaBlending.
    • Replace SSSE3 optimizations to SSE4.1 for function AlphaFilling.
    • Replace SSSE3 optimizations to SSE4.1 for function AlphaPremultiply.
    • Replace SSSE3 optimizations to SSE4.1 for function BayerToBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToBayer.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToRgb.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToRgba.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToYuv420p.
    • Replace SSSE3 optimizations with SSE4.1 for function BgraToYuv422p.
    • Replace SSSE3 optimizations with SSE4.1 for function BgraToYuva420p.
    • Replace SSSE3 optimizations with SSE4.1 for function BgrToBayer.
    • Replace SSSE3 optimizations with SSE4.1 for function BgrToBgra.
    • Replace SSSE3 optimizations with SSE4.1 for function RgbToBgra.
    • Replace SSSE3 optimizations with SSE4.1 for function BgrToGray.
    • Replace SSSE3 optimizations with SSE4.1 for function RgbToGray.
    • Replace SSSE3 optimizations with SSE4.1 for function BgrToRgb.
    • Replace SSSE3 optimizations with SSE4.1 for function TransformImage.
    • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv420p.
    • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv422p.
    • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv444p.
    • Replace SSSE3 optimizations with SSE4.1 for function DeinterleaveBgr.
    • Replace SSSE3 optimizations with SSE4.1 for function DeinterleaveBgra.
    • Replace SSSE3 optimizations with SSE4.1 for function GaussianBlur3x3.
    • Replace SSSE3 optimizations with SSE4.1 for function GrayToBgr.
    • Replace SSSE3 optimizations with SSE4.1 for function InterleaveBgr.
    • Replace SSSE3 optimizations with SSE4.1 for function InterleaveBgra.
    • Replace SSSE3 optimizations with SSE4.1 for function Yuv420pToBgr.
    • Replace SSSE3 optimizations with SSE4.1 for function Yuv422pToBgr.
    • Replace SSSE3 optimizations with SSE4.1 for function Yuv444pToBgr.
    • Replace SSSE3 optimizations with SSE4.1 for function Yuv420pToRgb.
    • Replace SSSE3 optimizations with SSE4.1 for function Yuv422pToRgb.
    • Replace SSSE3 optimizations with SSE4.1 for function Yuv444pToRgb.
    • Replace SSSE3 optimizations with SSE4.1 for function Laplace.
    • Replace SSSE3 optimizations with SSE4.1 for function LaplaceAbs.
    • Replace SSSE3 optimizations with SSE4.1 for function LaplaceAbsSum.
    • Replace SSSE3 optimizations with SSE4.1 for function MeanFilter3x3.
    • Replace SSSE3 optimizations with SSE4.1 for function ReduceColor2x2.
    • Replace SSSE3 optimizations with SSE4.1 for function ReduceGray2x2.
    • Replace SSSE3 optimizations with SSE4.1 for function ReduceGray4x4.
    • Replace SSSE3 optimizations with SSE4.1 for function Reorder16bit.
    • Replace SSSE3 optimizations with SSE4.1 for function Reorder32bit.
    • Replace SSSE3 optimizations with SSE4.1 for function Reorder64bit.
    • Replace SSSE3 optimizations with SSE4.1 for function ResizeBilinear.
    • Replace SSSE3 optimizations with SSE4.1 for function SobelDx.
    • Replace SSSE3 optimizations with SSE4.1 for function SobelDxAbs.
    • Replace SSSE3 optimizations with SSE4.1 for function SobelDxAbsSum.
    • Replace SSSE3 optimizations with SSE4.1 for function SobelDy.
    • Replace SSSE3 optimizations with SSE4.1 for function SobelDyAbs.
    • Replace SSSE3 optimizations with SSE4.1 for function SobelDyAbsSum.
    • Replace SSSE3 optimizations with SSE4.1 for function ContourMetrics.
    • Replace SSSE3 optimizations with SSE4.1 for function ContourMetricsMasked.
    • Replace SSSE3 optimizations with SSE4.1 for function SquaredDifferenceSum.
    • Replace SSSE3 optimizations with SSE4.1 for function SquaredDifferenceSumMasked.
    • Replace SSSE3 optimizations with SSE4.1 for function TextureBoostedSaturatedGradient.
    • Replace SSSE3 optimizations with SSE4.1 for class ResizerByteBilinear.

    Tests

    New features
    • Colorized annotation in console logging.
    Improving
    • Performance report generation to text file.
    • Thread ID annotation in console logging.

    Infrastructure

    New features
    • SIMD_INT8_DEBUG cmake option.
    Removing
    • Separate support of SSE3 extension (it has been moved into SSE4.1).
    • Separate support of SSE4.2 extension (it has been moved into SSE4.1).
    • Separate support of SSSE3 extension (it has been moved into SSE4.1).
    Source code(tar.gz)
    Source code(zip)
    simd.4.9.104.zip(4.71 MB)
  • v4.8.103(Jul 1, 2021)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of class ResizerShortBilinear.
    • Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNa16f.
    • Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNp16f.
    • Parameter of ROI mask in Motion::Model.
    • SSE2, AVX-512BW and NEON optimizations of function AbsDifference.
    • NEON optimizations of function AlphaUnpremultiply.
    • NEON optimizations of function AlphaPremultiply.
    • NEON optimizations of function ValueSquareSums.
    Improving
    • Performance of SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.
    Bug fixing
    • Linker warning in file SimdImageLoad.h (MSVS).
    Replacing
    • Replace SSE optimizations with SSE2 for function SvmSumLinear.
    • Replace SSE optimizations with SSE2 for function Fill32f.
    • Replace SSE optimizations with SSE2 for function CosineDistance32f.
    • Replace SSE optimizations with SSE2 for function DifferenceSum32f.
    • Replace SSE optimizations with SSE2 for function SquaredDifferenceKahanSum32f.
    • Replace SSE optimizations with SSE2 for function HogDeinterleave.
    • Replace SSE optimizations with SSE2 for function HogFilterSeparable.
    • Replace SSE optimizations with SSE2 for class ResizerFloatBilinear.
    • Replace SSE optimizations with SSE2 for function NeuralAddVectorMultipliedByValue.
    • Replace SSE optimizations with SSE2 for function NeuralAddVector.
    • Replace SSE optimizations with SSE2 for function NeuralAdaptiveGradientUpdate.
    • Replace SSE optimizations with SSE2 for function NeuralDerivativeRelu.
    • Replace SSE optimizations with SSE2 for function NeuralDerivativeSigmoid.
    • Replace SSE optimizations with SSE2 for function NeuralDerivativeTanh.
    • Replace SSE optimizations with SSE2 for function NeuralRoughSigmoid.
    • Replace SSE optimizations with SSE2 for function NeuralRoughSigmoid2.
    • Replace SSE optimizations with SSE2 for function NeuralRoughTanh.
    • Replace SSE optimizations with SSE2 for function NeuralUpdateWeights.
    • Replace SSE optimizations with SSE2 for function NeuralPooling1x1Max3x3.
    • Replace SSE optimizations with SSE2 for function NeuralPooling2x2Max2x2.
    • Replace SSE optimizations with SSE2 for function NeuralPooling2x2Max3x3.
    • Replace SSE optimizations with SSE2 for function SynetPoolingForwardAverage.
    • Replace SSE optimizations with SSE2 for function SynetPoolingForwardMax32f.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Forward.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Forward.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Forward.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Forward.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Backward.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Backward.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Backward.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Backward.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Sum.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Sum.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Sum.
    • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Sum.
    • Replace SSE optimizations with SSE2 for function Gemm32fNN.
    • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward0.
    • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward1.
    • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward2.
    • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward3.
    • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward4.
    • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward8.
    • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward9.
    • Replace SSE optimizations with SSE2 for function SynetReorderImage.
    • Replace SSE optimizations with SSE2 for function SynetReorderFilter.
    • Replace SSE optimizations with SSE2 for function SynetAddBias.
    • Replace SSE optimizations with SSE2 for function SynetEltwiseLayerForward.
    • Replace SSE optimizations with SSE2 for function SynetInnerProductLayerForward.
    • Replace SSE optimizations with SSE2 for function SynetShuffleLayerForward.
    • Replace SSE optimizations with SSE2 for function SynetHswish32f.
    • Replace SSE optimizations with SSE2 for function SynetPreluLayerForward.
    • Replace SSE optimizations with SSE2 for function SynetRelu32f.
    • Replace SSE optimizations with SSE2 for function SynetRestrictRange32f.
    • Replace SSE optimizations with SSE2 for function SynetScaleLayerForward.
    • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetFilter.
    • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetInput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetOutput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetFilter.
    • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetInput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetOutput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetFilter.
    • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetInput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetOutput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetFilter.
    • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetInput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetOutput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetFilter.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetInput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetOutput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetFilter.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetInput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetOutput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetFilter.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetInput.
    • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetOutput.

    Tests

    New features
    • Tests to verify functionality of function VectorNormNa16f.
    • Tests to verify functionality of function VectorNormNp16f.

    Infrastructure

    Removing
    • Support of SSE extension.
    Source code(tar.gz)
    Source code(zip)
    simd.4.8.103.zip(4.69 MB)
  • v4.7.102(Jun 2, 2021)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function ValueSquareSums.
    Improving
    • Performance of AVX2, AVX-512F and NEON optimizations of SynetConvolution32fGemmNN class.
    • Performance of Neural::FullyConnectedLayer::Forward method.
    Bug fixing
    • Error in class SynetMergedConvolution32fDc (large weights case).
    • Compiler error in file SimdAvx2SynetConversion.cpp (MSVS-2015, Win32).
    • Error in SSSE3 optimization of ImageTransform function.
    • Compiler error in file SimdImageSaveJpeg.h (Clang, Mac mini).
    • Compiler warnings (Clang).
    • Error in function ImagePngLoader::ReadTransparency (test tbbn0g04.png).
    • Error in Base implementation, SSE4.1 optimization of class ImagePngLoader (test basn0g16.png).
    • Error in SSE4.1 optimization of class ImagePngLoader (test s02i3p01.png).

    Tests

    New features
    • Tests to verify functionality of function ValueSquareSums.
    Improving
    • Header of performance report table.
    Bug fixing
    • Compiler error in file TestFile.h (Clang, Mac mini).
    Source code(tar.gz)
    Source code(zip)
    simd.4.7.102.zip(5.56 MB)
  • v4.7.101(May 3, 2021)

    Algorithms

    New features
    • Parameter a in function DeinterleaveBgra can be NULL.
    • Simd::DeinterleaveBgra C++ wrapper.
    • Simd::DeinterleaveRgb C++ wrapper.
    • Simd::DeinterleaveRgba C++ wrappers.
    • Method View::Load (from memory).
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of ImageJpegSaver class.
    • Base implementation of ImageJpegLoader class.
    • Base implementation of ImagePngLoader class.
    • NEON optimizations of ImagePngSaver class.
    • SIMD_SYNET_DISABLE macro.
    • Base implementation, AVX2, AVX-512BW, NEON optimizations of function CosineDistancesMxNp16f.
    Bug fixing
    • Error in NEON optimizations of function CosineDistancesMxNa16f.

    Tests

    New features
    • Parameter '-ri' to set real image name at runtime.
    • Tests to verify functionality of function CosineDistancesMxNp16f.
    • Special tests for verifying functionality of function ImageLoadFromMemory.
    Bug fixing
    • Error in saving of output log.

    Infrastructure

    New features
    • Real images to test encoding/decoding algorithms.
    • SIMD_SYNET cmake option.
    • SIMD_HIDE cmake option.
    Removing
    • Project files of Microsoft Visual Studio 2017 (for Android).

    Documentation

    New features
    • Description of Cmake parameters.
    Source code(tar.gz)
    Source code(zip)
    simd.4.7.101.zip(5.30 MB)
  • v4.6.100(Apr 1, 2021)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of ImagePngSaver class.
    • SynetInnerProduct32f framework.
    • Base implementation, SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.
    • Base implementation, SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fProd class.
    • Rgba32 format in View structure.
    • Pixel::Rgba32 structure.
    • Simd::RgbToBgr C++ wrapper.
    • Simd::GrayToRgb C++ wrapper.
    • Simd::GrayToRgba C++ wrapper.
    • Simd::BgrToRgba C++ wrapper.
    • Simd::RgbaToRgb C++ wrapper.
    • Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function RgbaToGray.
    • Base implementation, SSSE3, AVX2, AVX-512BW, NEON optimizations of function BgraToRgba.
    • Simd::RgbToRgba C++ wrapper.
    • Simd::RgbaToBgra C++ wrapper.
    • Rgba32 format in Convert function.
    • Rgba32 format in function ImageSave.
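
To illustrate what the BgraToRgba and RgbaToGray conversions listed above do per pixel, here is a minimal scalar sketch. The function names `BgraToRgbaSketch` and `RgbaToGraySketch` are hypothetical and are not the library API. The gray weights (77, 150, 29, an 8-bit fixed-point approximation of 0.299, 0.587, 0.114) and the tightly packed 4-bytes-per-pixel layout are assumptions; the release notes do not state the exact coefficients or rounding the library uses.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference, not the Simd Library API.
// Swaps the first and third channel of every 4-byte pixel (BGRA -> RGBA).
inline void BgraToRgbaSketch(const uint8_t* bgra, size_t pixels, uint8_t* rgba)
{
    for (size_t i = 0; i < pixels; ++i, bgra += 4, rgba += 4)
    {
        rgba[0] = bgra[2]; // red
        rgba[1] = bgra[1]; // green
        rgba[2] = bgra[0]; // blue
        rgba[3] = bgra[3]; // alpha is copied unchanged
    }
}

// Converts RGBA pixels to 8-bit gray; alpha is ignored.
// Assumed weights: 77/256, 150/256, 29/256 for R, G, B, with rounding.
inline void RgbaToGraySketch(const uint8_t* rgba, size_t pixels, uint8_t* gray)
{
    for (size_t i = 0; i < pixels; ++i, rgba += 4)
        gray[i] = static_cast<uint8_t>((77 * rgba[0] + 150 * rgba[1] + 29 * rgba[2] + 128) >> 8);
}
```

The SIMD versions named in the list process many pixels per instruction; this sketch only shows the per-pixel arithmetic under the stated assumptions.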
    Improving
    • Reduce memory allocations in Simd::ContourDetector.
    Bug fixing
    • Assert in function Avx::SynetMergedConvolution32fCdc::SynetMergedConvolution32fCdc.
    • Assert in function Avx::SynetMergedConvolution32fCd::SynetMergedConvolution32fCd.
    • Assert in function Avx::SynetMergedConvolution32fDc::SynetMergedConvolution32fDc.
    • Freezes in function SynetConvolution32fNhwcDirect::OldReorderWeight (ARMv7 architecture).
    • Freezes in file SimdGemm.h (ARMv7 architecture).

    Tests

    New features
    • Tests for verifying functionality of SynetInnerProduct32f framework.
    • Performance report can use milliseconds or microseconds (chosen at runtime).
    • Special test to verify functionality of function Simd::Convert.
    • Tests to verify functionality of function RgbaToGray.
    • Tests to verify functionality of function BgraToRgba.
    Bug fixing
    • Crash in test BgrToRgbAutoTest.
    • Error in test of SynetMergedConvolution8i.

    Infrastructure

    Removing
    • Remove project files of Microsoft Visual Studio 2013.
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.100.zip(3.97 MB)
  • v4.6.99(Mar 1, 2021)

    Algorithms

    New features
    • SimdImageFileType enumeration.
    • ImageSaveToFile function.
    • ImageSaveToMemory function.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmTxtSaver class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmBinSaver class.
    • Change order of parameters in function BgrToRgb.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmBinSaver class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmTxtSaver class.
    • Additional parameters in function View::Save.
    • Method View::Release.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmTxtLoader class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmBinLoader class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmTxtLoader class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmBinLoader class.
    • Additional parameter in function View::Load.
    • Base implementation of Crc32 function.
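
For readers unfamiliar with the checksum behind the Crc32 entry above, a bitwise CRC-32 can be sketched as follows. This assumes the common reflected polynomial 0xEDB88320 (the variant used by PNG and zlib); the release notes do not say which polynomial or signature the library's Crc32 uses, and `Crc32Sketch` is a hypothetical name rather than the library API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reference routine, not the Simd Library API.
// Assumes the reflected CRC-32 polynomial 0xEDB88320 (PNG/zlib variant).
inline uint32_t Crc32Sketch(const void* src, size_t size)
{
    const uint8_t* data = static_cast<const uint8_t*>(src);
    uint32_t crc = 0xFFFFFFFFu; // initial register value
    for (size_t i = 0; i < size; ++i)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
        {
            // Shift right; XOR with the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu; // final inversion
}
```

A production implementation would normally use a lookup table or hardware instructions; the bit-by-bit loop is chosen here only because it is easy to check by hand.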
    Bug fixing
    • Crash in Simd::Detection on Python (use of std::unique_ptr).

    Tests

    New features
    • Possibility to write output video in UseFaceDetection.cpp example.
    • Test parameter '-o=' to write annotated output video.
    • Tests for verifying functionality of function ImageSaveToFile.
    • Tests for verifying functionality of function ImageSaveToMemory.
    • Tests for verifying functionality of function ImageLoadFromMemory.
    • Tests for verifying functionality of function Crc32.

    Documentation

    New features
    • Usage example in the description of Font.
    Bug fixing
    • Errors in Simd Library description.
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.99.zip(3.94 MB)
  • v4.6.98(Feb 1, 2021)

    Algorithms

    New features
    • Add parameter epsilon to GaussianBlur engine.
    • Add function SynetConvolution32fInfo.
    • Add function SynetConvolution8iInfo.
    • Add function SynetDeconvolution32fInfo.
    • Add function SynetMergedConvolution32fInfo.
    • Add function SynetMergedConvolution8iInfo.
    Improving
    • Performance of SynetConvolution8iNhwcDirect class (case of horizontal padding of small image).
    Renaming
    • GaussianBlur engine parameter from radius to sigma.
    Bug fixing
    • Error in GaussianBlur engine (case of small images).
    • Performance degradation of AVX-512VNNI optimization of SynetConvolution8i framework.
    • Performance degradation of AVX-512VNNI optimization of SynetMergedConvolution8i framework.
    • Error in GaussianBlur engine (wrong processing of last rows).
    • Error in trajectory averaging algorithm in Motion::Detector.

    Tests

    New features
    • Possibility to write output video in UseMotionDetector.cpp example.
    Bug fixing
    • Error in files: TestVideo.cpp, UseMotionDetector.cpp, UseFaceDetector.cpp (MSVS-2019, OpenCV enabled).

    Documentation

    Improving
    • Description of GaussianBlur engine.
    • Description of Motion::Detector.

    Infrastructure

    New features
    • Ocv.prop.default for Visual Studio 2019.
    Renaming
    • Cmake parameter from LIBRARY to SIMD_SHARED.
    • Cmake parameter from CHECK_VERSION to SIMD_GET_VERSION.
    • Cmake parameter from TOOLCHAIN to SIMD_TOOLCHAIN.
    • Cmake parameter from TARGET to SIMD_TARGET.
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.98.zip(3.88 MB)
  • v4.6.97(Jan 4, 2021)

    Algorithms

    New features
    • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetMish32f.
    • Support of Mish activation function in SynetConvolution32f framework.
    • Support of Mish activation function in SynetMergedConvolution32f framework.
    • Support of Mish activation function in SynetConvolution8i framework.
    • Support of Mish activation function in SynetMergedConvolution8i framework.
    • Support of Mish activation function in SynetDeconvolution32f framework.
    • Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of GaussianBlur engine.
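
As background for the Mish entries above: the Mish activation is commonly defined as mish(x) = x * tanh(ln(1 + e^x)). A minimal scalar sketch under that definition is shown below. `MishSketch` is a hypothetical name; the actual signature of SynetMish32f (including any extra parameters it may take) is not given in these notes and is not reproduced here.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference, not the Simd Library API.
// mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x).
inline float MishSketch(float x)
{
    return x * std::tanh(std::log1p(std::exp(x)));
}

// Applies the activation element-wise to an array of floats.
inline void MishSketch(const float* src, size_t size, float* dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = MishSketch(src[i]);
}
```

For large positive inputs the result approaches x, and for large negative inputs it approaches zero, which is why the function is often described as a smooth alternative to ReLU.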
    Improving
    • AVX-512F optimization of SynetConvolution32fNhwcDirect class.
    • AVX-512F optimization of SynetConvolution32fGemmNN class.
    • AVX-512F optimization of SynetConvolution32fWinograd class.
    • AVX-512F optimization of function Gemm32fNN.
    Bug fixing
    • Error in Base implementation of SynetMergedConvolution32f (type=CDC, add=1).
    • Error in function SimdAlignment.
    • Visual Studio 2017 compiler error in files SimdAvx512bwSynet.cpp, SimdAvx512bwSynetScale.cpp, SimdAvx512bwAlphaBlending.cpp.

    Test framework

    New features
    • Tests for verifying functionality of function SynetMish32f.
    • Tests for verifying functionality of GaussianBlur engine.
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.97.zip(3.87 MB)
  • v4.6.96(Dec 1, 2020)

    Algorithms

    New features
    • Base implementation of function AveragingBinarizationV2.
    • SSE4.1, AVX2, AVX-512BW optimizations of function AlphaUnpremultiply.
    Improving
    • SSE2, AVX2, AVX-512BW and NEON optimizations of function MedianFilterSquare5x5.
    • SSE2, AVX2, AVX-512F optimizations of function SynetSoftmaxLayerForward.
    • Reduced number of calls to function CpuSocketNumber at initialization of Simd.
    • Reduced number of calls to function CpuCoreNumber at initialization of Simd.
    • Reduced number of calls to function CheckBit at initialization of Simd.
    Bug fixing
    • Compilation error in file SimdNeonSynetConvolution8i.cpp.
    • Infinite loop in SynetConvolution32fNhwcDirect::OldReorderWeight (on Celeron CPU).
    • Crash in SimdRuntime.h (on Celeron CPU).
    • Crash in SimdGemm.h (on Celeron CPU).
    • Function SimdSynetSpecifyTensorFormat returns incorrect value.

    Test framework

    New features
    • Tests for verifying functionality of function AveragingBinarizationV2.
    • Parameter '-lc' to litter CPU cache between test runs.

    Infrastructure

    New features
    • MSVS projects can be used from external solution.
    Removing
    • Support of MSA (MIPS).
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.96.zip(3.85 MB)
  • v4.6.95(Nov 4, 2020)

    Algorithms

    New features
    • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iCdc class.
    • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iCd class.
    • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iDc class.
    • SSE4.1, AVX2, AVX-512BW optimizations of function SynetConvert8uTo32f.
    • Base implementation, SSE2, SSSE3, AVX2, AVX-512BW optimizations of function AlphaPremultiply.
    • Base implementation of function AlphaUnpremultiply.
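
The AlphaPremultiply and AlphaUnpremultiply entries above refer to converting between straight and premultiplied alpha. A minimal scalar sketch of both directions follows. The names `AlphaPremultiplySketch` and `AlphaUnpremultiplySketch` are hypothetical, and the 4-bytes-per-pixel layout with alpha in the last byte, the round-to-nearest division, and the choice to output zero color for zero alpha are all assumptions; the library's exact rounding may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar references, not the Simd Library API.
// Assumes 4 bytes per pixel with alpha stored in the last byte.

// Straight -> premultiplied: color = round(color * alpha / 255).
inline void AlphaPremultiplySketch(const uint8_t* src, size_t pixels, uint8_t* dst)
{
    for (size_t i = 0; i < pixels; ++i, src += 4, dst += 4)
    {
        const unsigned alpha = src[3];
        for (int c = 0; c < 3; ++c)
            dst[c] = static_cast<uint8_t>((src[c] * alpha + 127) / 255);
        dst[3] = src[3];
    }
}

// Premultiplied -> straight: color = round(color * 255 / alpha), clamped to 255.
// For alpha == 0 the color is undefined, so this sketch writes zero.
inline void AlphaUnpremultiplySketch(const uint8_t* src, size_t pixels, uint8_t* dst)
{
    for (size_t i = 0; i < pixels; ++i, src += 4, dst += 4)
    {
        const unsigned alpha = src[3];
        for (int c = 0; c < 3; ++c)
        {
            const unsigned value = alpha ? (src[c] * 255 + alpha / 2) / alpha : 0;
            dst[c] = static_cast<uint8_t>(std::min(value, 255u));
        }
        dst[3] = src[3];
    }
}
```

Because premultiplication discards precision at low alpha values, a round trip through both functions is only exact for fully opaque pixels.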
    Bug fixing
    • GCC v10 compilation error in file SimdGemm.h.
    • Error in IECompatible method of SynetMergedConvolution8i.

    Test framework

    New features
    • Tests for verifying functionality of function AlphaPremultiply.
    • Tests for verifying functionality of function AlphaUnpremultiply.

    Documentation

    Bug fixing
    • There are no references to C++ wrappers in description of API functions.
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.95.zip(3.83 MB)
  • v4.6.94(Oct 1, 2020)

    Algorithms

    New features
    • Base implementation of SynetMergedConvolution8i class.
    • Base implementation of function SynetConvert8uTo32f.
    • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iCdc class.
    • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iCd class.
    • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iDc class.
    Bug fixing
    • Performance degradation in class Convolution32fNhwcDirect (weights size >> L3 cache).
    • Performance degradation in class Convolution32fGemmNN (weights size >> L3 cache).

    Test framework

    New features
    • Tests for verifying functionality of SynetMergedConvolution8i class.
    • Tests for verifying functionality of function SynetConvert8uTo32f.

    Documentation

    Improving
    • Improve structuring of Synet documentation.
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.94.zip(3.72 MB)
  • v4.6.93(Sep 1, 2020)

    Algorithms

    New features
    • Full support of SimdConvolutionActivationType in SynetConvolution8i class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8iNhwcDepthwise class.
    • Extend class MergedConvolution32f (2 merged convolutions).
    • Base implementation, SSE2, AVX, AVX2, AVX-512F optimizations of MergedConvolution32fCd class.
    • Base implementation, SSE2, AVX, AVX2, AVX-512F optimizations of MergedConvolution32fDc class.
    Improving
    • Reduced compilation time and binary size of Simd Library.
    Renaming
    • Class MergedConvolution32f to MergedConvolution32fCdc.
    Bug fixing
    • Performance degradation in class Convolution32fNhwcDirect (dilation != 1).

    Test framework

    New features
    • Tests for verifying functionality of class MergedConvolution32f (2 merged convolutions).
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.93.zip(3.68 MB)
  • v4.6.92(Aug 3, 2020)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetAdd8i.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetInnerProduct8i.
    Improving
    • Reduced compilation time and binary size of Simd Library.
    Bug fixing
    • Error in SSE4.1, AVX2, AVX-512BW optimizations of SynetScale8i class (wrong alignment check).
    • Error in performance annotation of SynetConvolution8i class.
    • Compiler error in file SimdBaseSynetConvolution8i.cpp (for old compilers).
    • Compiler errors in files SimdAvx2Synet.cpp, SimdAvx2SynetScale.cpp (WIN32, MSVS).

    Test framework

    New features
    • Tests for verifying functionality of function SynetAdd8i.
    • Tests for verifying functionality of function SynetInnerProduct8i.
    Source code(tar.gz)
    Source code(zip)
    simd.4.6.92.zip(3.66 MB)