C++ image processing and machine learning library that uses SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX(Altivec) and VSX(Power7), NEON for ARM.

Overview

Introduction

The Simd Library is a free, open source image processing and machine learning library designed for C and C++ programmers. It provides many useful high-performance algorithms for image processing, such as pixel format conversion, image scaling and filtering, extraction of statistical information from images, motion detection, object detection (HAAR and LBP classifier cascades) and classification, and neural networks.

The algorithms are optimized using different SIMD CPU extensions. In particular, the library supports the following CPU extensions: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX-512 for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC (big-endian), NEON for ARM.

The Simd Library has a C API and also contains useful C++ classes and functions to facilitate access to the C API. The library supports dynamic and static linking, 32-bit and 64-bit Windows, Android and Linux, the MSVS, G++ and Clang compilers, and the MSVS project and CMake build systems.

Library folder structure

The Simd Library has the following folder structure:

  • simd/src/Simd/ - contains the source code of the library.
  • simd/src/Test/ - contains the test framework of the library.
  • simd/src/Use/ - contains usage examples of the library.
  • simd/prj/vs2013/ - contains project files of Microsoft Visual Studio 2013.
  • simd/prj/vs2015/ - contains project files of Microsoft Visual Studio 2015.
  • simd/prj/vs2017w/ - contains project files of Microsoft Visual Studio 2017 (for Windows).
  • simd/prj/vs2017a/ - contains project files of Microsoft Visual Studio 2017 (for Android).
  • simd/prj/vs2019/ - contains project files of Microsoft Visual Studio 2019.
  • simd/prj/cmd/ - contains additional scripts needed for building of the library in Windows.
  • simd/prj/cmake/ - contains files of CMake build systems.
  • simd/prj/sh/ - contains additional scripts needed for building of the library in Linux.
  • simd/prj/txt/ - contains text files needed for building of the library.
  • simd/data/cascade/ - contains OpenCV cascades (HAAR and LBP).
  • simd/data/image/ - contains image samples.
  • simd/data/network/ - contains examples of trained networks.
  • simd/docs/ - contains documentation of the library.

Building the library for Windows

To build the library and test application for Windows 32/64 you need Microsoft Visual Studio 2019 (or 2013/2015/2017). The project files for each supported version are in the corresponding directory under simd/prj/, for example:

simd/prj/vs2015/

By default the library is built as a DLL (Dynamic Link Library). You can also build it as a static library. To do this, change the corresponding property (Configuration Type) of the Simd project and uncomment #define SIMD_STATIC in the file:

simd/src/Simd/SimdConfig.h

You can also build the library with CMake and MinGW:

cd .\prj\cmake
cmake . -DSIMD_TOOLCHAIN="your_toolchain\bin\g++" -DSIMD_TARGET="x86_64" -DCMAKE_BUILD_TYPE="Release" -G "MinGW Makefiles"
mingw32-make

Building the library for Android

To build the library and test application for Android (x86, x64, ARM, ARM64) you need Microsoft Visual Studio 2017. The project files are in the directory:

simd/prj/vs2017a/

By default the library is built as a shared object (.so, a dynamic library).

Building the library for Linux

To build the library and test application for Linux 32/64 you need the CMake build system. The CMake files are located in the directory:

simd/prj/cmake/

The library can be built for x86/x64, PowerPC (64-bit, big-endian) and ARM (32/64-bit) platforms using the G++ or Clang compilers. Using the native compiler (g++) for the current platform is simple:

cd ./prj/cmake
cmake . -DSIMD_TOOLCHAIN="" -DSIMD_TARGET=""
make

To build the library for PowerPC (64-bit, big-endian) and ARM (32/64-bit) platforms you can also use a toolchain for cross compilation. Here is an example for PowerPC (64-bit, big-endian):

cd ./prj/cmake
cmake . -DSIMD_TOOLCHAIN="/your_toolchain/usr/bin/powerpc-linux-gnu-g++" -DSIMD_TARGET="ppc64" -DCMAKE_BUILD_TYPE="Release"
make

For ARM (32 bit):

cd ./prj/cmake
cmake . -DSIMD_TOOLCHAIN="/your_toolchain/usr/bin/arm-linux-gnueabihf-g++" -DSIMD_TARGET="arm" -DCMAKE_BUILD_TYPE="Release"
make

And for ARM (64 bit):

cd ./prj/cmake
cmake . -DSIMD_TOOLCHAIN="/your_toolchain/usr/bin/aarch64-linux-gnu-g++" -DSIMD_TARGET="aarch64" -DCMAKE_BUILD_TYPE="Release"
make

As a result, the library and the test application will be built in the current directory.

Using the library

To use the library from C code, include:

#include "Simd/SimdLib.h"

To use the library from C++ code, include:

#include "Simd/SimdLib.hpp"

In order to use Simd::Detection you must include:

#include "Simd/SimdDetection.hpp"

In order to use Simd::Neural you must include:

#include "Simd/SimdNeural.hpp"

In order to use Simd::Motion you must include:

#include "Simd/SimdMotion.hpp"

Interaction with OpenCV

If you need to convert between Simd and OpenCV types, just define the macro SIMD_OPENCV_ENABLE before including the Simd headers:

#include <opencv2/core/core.hpp>
#define SIMD_OPENCV_ENABLE
#include "Simd/SimdLib.hpp"

You can then convert between the following types:

  • cv::Point, cv::Size <--> Simd::Point.
  • cv::Rect <--> Simd::Rectangle.
  • cv::Mat <--> Simd::View.

Test Framework

The test suite checks the correctness of the library and also measures its performance. There is a set of tests for every function in the library API. Here is an example of running the test application:

./Test -m=a -tt=1 -fi=Sobel -ot=log.txt

The following parameters were used:

  • -m=a - an auto checking mode which includes performance testing (only for a library built in Release mode). In this case, different implementations of each function are compared with each other (for example, a scalar implementation and implementations using different SIMD instructions such as SSE2, AVX2 and others). It can also be -m=c (creation of test data for cross-platform testing), -m=v (cross-platform testing using previously prepared test data) and -m=s (running of special tests).
  • -tt=1 - the number of test threads.
  • -fi=Sobel - an include filter. In this case only functions whose names contain the word 'Sobel' will be tested. If you omit this parameter, full testing will be performed. You can use several filters; a function name has to satisfy at least one of them.
  • -ot=log.txt - the file name of the test report (in text format). The test report is also output to the console.

You can also use the following parameters:

  • -help or -? to print the help message.
  • -r=../.. to set the project root directory.
  • -pa=1 to print alignment statistics.
  • -c=512 to set the number of channels in the test image for performance testing.
  • -h=1080 to set the height of the test image for performance testing.
  • -w=1920 to set the width of the test image for performance testing.
  • -oh=log.html to set the file name of the test report (in HTML format).
  • -s=sample.avi to set a video source (see the Simd::Motion test).
  • -o=output.avi to set an annotated video output (see the Simd::Motion test).
  • -wt=1 to set the number of threads used to parallelize algorithms.
  • -fe=Abs to set an exclude filter that excludes some tests.
  • -mt=100 to set the minimal test execution time (in milliseconds).
  • -lc=1 to litter CPU cache between test runs.
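
The way the include (-fi) and exclude (-fe) filters combine can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the function name IsSelected is hypothetical, and treating a filter as a plain substring match is inferred from the description above, not taken from the test framework's source.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the Simd test framework): decides whether
// a test name passes the -fi (include) and -fe (exclude) filters.
// Assumption: a filter matches when it occurs as a substring of the name.
bool IsSelected(const std::string& name,
                const std::vector<std::string>& include,
                const std::vector<std::string>& exclude)
{
    // Any matching exclude filter rejects the test.
    for (const std::string& f : exclude)
        if (name.find(f) != std::string::npos)
            return false;

    // With no include filters, every remaining test is run.
    if (include.empty())
        return true;

    // Otherwise the name has to satisfy at least one include filter.
    for (const std::string& f : include)
        if (name.find(f) != std::string::npos)
            return true;
    return false;
}
```

With this sketch, a name such as "SobelDx" is selected by the include filter "Sobel", while a name containing "Abs" is rejected whenever "Abs" is given as an exclude filter.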

Issues

  • Test.exe crashes

    I'm looking forward to trying the SIMD library for video conversions, so I installed simd.4.2.73.zip, VS2017 and Windows SDK 8.1, loaded the vs2017w project file, selected x64 & Release and hit Build. This ran successfully, albeit rather slowly (I have never used VS, so I have no reference), and it looks like all files were made. I have not actually used the DLL but did try Test.exe and noticed it crashes after the 4th line. I also tried running "Test -m=a -tt=1 -f=Sobel -ot=log.txt", which gives me an error that Sobel is not found, and without that filter it also crashes after a few lines. I'm now left wondering if I made an error building the library. Windows 7/64.

    opened by mikeversteeg 26
  • Error in the code for the Habr article

    Hello! I would like to report an error in the code accompanying your article https://m.habr.com/ru/post/448436/, specifically in the base implementation of convolution. Depthwise convolution (with group > 1) is computed incorrectly: only the first channels contain values, while the subsequent channels are filled with zeros.
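
For reference, a naive grouped convolution can be sketched as below; depthwise convolution is the special case where group equals the number of input and output channels. This is not the article's code. The function name, the CHW memory layout, stride 1 and the absence of padding are all assumptions made to keep the example short. The point it illustrates is that each group must offset both the input channel index and the output channel index.

```cpp
#include <cstddef>
#include <vector>

// Naive grouped 2D convolution (no padding, stride 1), CHW layout.
// src:     srcC x srcH x srcW
// weight:  dstC x (srcC / group) x kernel x kernel
// returns: dstC x (srcH - kernel + 1) x (srcW - kernel + 1)
// Assumes group > 0 and that srcC and dstC are both divisible by group.
std::vector<float> GroupedConvolution(const std::vector<float>& src,
    std::size_t srcC, std::size_t srcH, std::size_t srcW,
    const std::vector<float>& weight, std::size_t dstC,
    std::size_t kernel, std::size_t group)
{
    const std::size_t dstH = srcH - kernel + 1, dstW = srcW - kernel + 1;
    const std::size_t srcPerGroup = srcC / group, dstPerGroup = dstC / group;
    std::vector<float> dst(dstC * dstH * dstW, 0.0f);
    for (std::size_t g = 0; g < group; ++g)
        for (std::size_t d = 0; d < dstPerGroup; ++d)
        {
            const std::size_t dc = g * dstPerGroup + d; // absolute output channel
            for (std::size_t s = 0; s < srcPerGroup; ++s)
            {
                const std::size_t sc = g * srcPerGroup + s; // absolute input channel
                const float* w = weight.data() + (dc * srcPerGroup + s) * kernel * kernel;
                for (std::size_t y = 0; y < dstH; ++y)
                    for (std::size_t x = 0; x < dstW; ++x)
                    {
                        float sum = 0.0f;
                        for (std::size_t ky = 0; ky < kernel; ++ky)
                            for (std::size_t kx = 0; kx < kernel; ++kx)
                                sum += src[(sc * srcH + y + ky) * srcW + x + kx] * w[ky * kernel + kx];
                        dst[(dc * dstH + y) * dstW + x] += sum;
                    }
            }
        }
    return dst;
}
```

If the per-group offset were omitted from the output channel index, only the first dstC / group channels would ever be written, which matches the symptom described in this report.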

    opened by GlebSBrykin 22
  • SimdYuv420pToBgr with different YUV & RGB image sizes

    Hi!

    Still loving this library, it is really impressive coding..

    I need to display YUV420P images as thumbnails on screen, which means they need to be converted to RGB and resized. Because this is HD video, speed is essential. Currently I first resize each of the YUV planes, and then convert to RGB. However, this means an additional memory write of the (smaller) YUV image. This can be avoided by dropping the requirement that both images passed to SimdYuv420pToBgr have the same size. Can this be added? As speed is important, a simple pixel drop can be used (although if it can be added efficiently, a basic (bi)linear interpolation would be great).
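
The requested combined operation (conversion plus pixel-drop downscaling in one pass) could look roughly like the scalar sketch below. It is not a Simd function: the name Yuv420pToBgrNearest and its signature are hypothetical, and the BT.601 limited-range coefficients are an assumption about the colour space of the video.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an int to the 0..255 range of an 8-bit channel.
static inline std::uint8_t Clamp8(int v)
{
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Hypothetical helper, not a Simd function: converts a YUV420P image to a
// packed BGR image of a different size, reading exactly one source pixel per
// destination pixel (pixel dropping, no interpolation).
// Assumption: BT.601 limited-range YUV.
void Yuv420pToBgrNearest(const std::uint8_t* y, std::size_t yStride,
    const std::uint8_t* u, std::size_t uStride,
    const std::uint8_t* v, std::size_t vStride,
    std::size_t srcW, std::size_t srcH,
    std::uint8_t* bgr, std::size_t bgrStride, std::size_t dstW, std::size_t dstH)
{
    for (std::size_t dy = 0; dy < dstH; ++dy)
    {
        const std::size_t sy = dy * srcH / dstH; // source row kept for this output row
        std::uint8_t* out = bgr + dy * bgrStride;
        for (std::size_t dx = 0; dx < dstW; ++dx)
        {
            const std::size_t sx = dx * srcW / dstW; // source column kept for this output column
            // Chroma planes are subsampled by 2 in both directions.
            const float Y = 1.164f * (y[sy * yStride + sx] - 16);
            const float U = float(u[(sy / 2) * uStride + sx / 2]) - 128.0f;
            const float V = float(v[(sy / 2) * vStride + sx / 2]) - 128.0f;
            out[3 * dx + 0] = Clamp8(int(std::lround(Y + 2.018f * U)));              // B
            out[3 * dx + 1] = Clamp8(int(std::lround(Y - 0.391f * U - 0.813f * V))); // G
            out[3 * dx + 2] = Clamp8(int(std::lround(Y + 1.596f * V)));              // R
        }
    }
}
```

Because only the sampled source pixels are read and no intermediate YUV image is written, this avoids the extra memory pass mentioned above. A vectorized version would need gather-style loads or a fixed integer scale factor.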

    Thanks for considering.

    opened by mikeversteeg 18
  • Premultiplication & vice versa

    Very happy with the added alpha support, your code is much faster than my "SIMD" attempts :)

    I would very much like a function to convert a BGRA bitmap to premultiplied and vice versa. This would complete the alpha support in Simd.

    PS: To convert a straight alpha color value bitmap to premultiplied format, multiply its R, G, and B values by A. To convert premultiplied to straight, divide R, G, and B by A. Keep in mind alpha can be 0, and values can never exceed 255.
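
The conversion described in the PS can be sketched per pixel as follows. The struct and function names are hypothetical and this is a scalar illustration, not library code. It assumes alpha is interpreted as the fraction A / 255 and that results are rounded to the nearest integer.

```cpp
#include <algorithm>
#include <cstdint>

// One BGRA pixel, 8 bits per channel.
struct Bgra { std::uint8_t b, g, r, a; };

// Straight -> premultiplied: scale each colour channel by a / 255 (rounded).
inline Bgra Premultiply(Bgra p)
{
    auto mul = [&](std::uint8_t c) {
        return static_cast<std::uint8_t>((c * p.a + 127) / 255);
    };
    return { mul(p.b), mul(p.g), mul(p.r), p.a };
}

// Premultiplied -> straight: divide each colour channel by a / 255,
// guarding against a == 0 and clamping the result to 255.
inline Bgra Unpremultiply(Bgra p)
{
    if (p.a == 0)
        return { 0, 0, 0, 0 }; // colour is undefined for fully transparent pixels
    auto div = [&](std::uint8_t c) {
        return static_cast<std::uint8_t>(std::min(255, (c * 255 + p.a / 2) / p.a));
    };
    return { div(p.b), div(p.g), div(p.r), p.a };
}
```

Note that the round trip is lossy for small alpha values, because premultiplication discards low-order bits of the colour channels.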

    opened by mikeversteeg 16
  • Clang build issues

    There are following issues reported by clang-msvc under Windows (64 bit):

    In file included from C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdMemory.h:29:
    In file included from C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdMath.h:29:
    C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdConst.h(94,36): error: excess elements in vector initializer
        const __m128i K_INV_ZERO = SIMD_MM_SET1_EPI8(0xFF);
                                   ^~~~~~~~~~~~~~~~~~~~~~~
    C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdInit.h(104,40): note: expanded from macro 'SIMD_MM_SET1_EPI8'
        {SIMD_AS_CHAR(a), SIMD_AS_CHAR(a), SIMD_AS_CHAR(a), SIMD_AS_CHAR(a),
        ^~~~~~~~~~~~~~~
    C:\Work\Test\End.Expert.V2.LLVM\Code\ExtLibs_Include\simd-4.8.103\Simd/SimdInit.h(38,25): note: expanded from macro 'SIMD_AS_CHAR'
        #define SIMD_AS_CHAR(a) char(a)
                                ^~~~~~~

    and also these:

    ..\CSimdImageResolutionReductorComp.cpp(141,15): error: always_inline function '_mm_mullo_epi32' requires target feature 'sse4.1', but would be inlined into function 'Calc4PixelValue' that is compiled without support for 'sse4.1'
        __m128i r2 = _mm_mullo_epi32(_mm_and_si128(loaded2, mask), c5);
                     ^
    ..\CSimdImageResolutionReductorComp.cpp(142,15): error: always_inline function '_mm_mullo_epi32' requires target feature 'sse4.1', but would be inlined into function 'Calc4PixelValue' that is compiled without support for 'sse4.1'
        __m128i r3 = _mm_mullo_epi32(_mm_and_si128(loaded3, mask), c10);
                     ^
    ..\CSimdImageResolutionReductorComp.cpp(143,15): error: always_inline function '_mm_mullo_epi32' requires target feature 'sse4.1', but would be inlined into function 'Calc4PixelValue' that is compiled without support for 'sse4.1'
        __m128i r4 = _mm_mullo_epi32(_mm_and_si128(loaded4, mask), c10);
                     ^
    ..\CSimdImageResolutionReductorComp.cpp(144,15): error: always_inline function '_mm_mullo_epi32' requires target feature 'sse4.1', but would be inlined into function 'Calc4PixelValue' that is compiled without support for 'sse4.1'
        __m128i r5 = _mm_mullo_epi32(_mm_and_si128(loaded5, mask), c5);
                     ^

    MSVC, Intel, GCC and MinGW are fine with this, though. What could be the reason, and is it possible to fix it for Clang?

    opened by ArsMasiuk 14
  • Object Measurement

    The SIMD library is great, but I'm not sure how to do object measurement (e.g. finding the centre point of an object). Maybe you could just list the functions to use. This would be very helpful.

    opened by syberarall 13
  • Illegal instruction error

    Hi, I am building library on my amd64 machine:

    $ cmake -G"MSYS Makefiles" -DCMAKE_BUILD_TYPE=Debug -DLIBRARY="STATIC" -DTARGET="x86_64" -DTOOLCHAIN="C:/msys64/mingw64/bin/g++" ../prj/cmake/

    $ gcc -v

    COLLECT_GCC=C:\msys64\mingw64\bin\gcc.exe
    COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.2.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../gcc-8.2.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=ada,c,lto,c++,objc,obj-c++,fortran --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev3, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
    Thread model: posix
    gcc version 8.2.0 (Rev3, Built by MSYS2 project)
    

    When I run the command "make" the build is successful. But when I try to run "Test" binary, I get an error message "illegal instruction".

    (gdb) run
    Starting program: C:\msys64\home\user\test\Simd-4.2.70\build\Test.exe
    [New Thread 10452.0x1740]
    [New Thread 10452.0x2a54]
    [New Thread 10452.0x2e24]
    [New Thread 10452.0x27dc]
    SSE: Yes
    SSE2: Yes
    SSE3: Yes
    SSSE3: No
    SSE4.1: No
    SSE4.2: No
    AVX: No
    AVX2: No
    AVX-512F: No
    AVX-512BW: No
    PowerPC-Altivec: No
    PowerPC-VSX: No
    ARM-NEON: No
    MIPS-MSA: No
    
    Thread 1 received signal SIGILL, Illegal instruction.
    0x000000000046d86f in Test::TestPoint ()
        at C:/msys64/home/user/test/Simd-4.2.70/src/Test/TestCheckCpp.cpp:60
    60          {
    (gdb) bt full
    #0  0x000000000046d86f in Test::TestPoint ()
        at C:/msys64/home/user/test/Simd-4.2.70/src/Test/TestCheckCpp.cpp:60
            p = {x = 140730470038064, y = 1875954880}
            fp = {x = 3.6138301231727476e-316, y = 9.2664781905975763e-315}
    #1  0x000000000047a3be in Test::CheckCpp ()
        at C:/msys64/home/user/test/Simd-4.2.70/src/Test/TestCheckCpp.cpp:124
    No locals.
    #2  0x000000000041082d in main (argc=1, argv=0x45c1960)
        at C:/msys64/home/user/test/Simd-4.2.70/src/Test/Test.cpp:677
            options = {mode = 73138176, help = false,
              include = {<std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {
                  _M_impl = {<std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<__gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<No data fields>}, <No data fields>}, _M_start = 0x84070083,
                    _M_finish = 0x45c0000,
                    _M_end_of_storage = 0x7ffe5f4113fc <ntdll!RtlpNtSetValueKey+20604>}}, <No data fields>},
              exclude = {<std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {
                  _M_impl = {<std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<__gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<No data fields>}, <No data fields>}, _M_start = 0x800,
                    _M_finish = 0x45c0000,
                    _M_end_of_storage = 0x44724b0}}, <No data fields>}, text = {
                static npos = 18446744073709551615,
                _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
                  _M_p = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>}, _M_string_length = 1, {
                  _M_local_buf = "\000\b\000\000\000\000\000\000▒$G\004\000\000\000", _M_allocated_capacity = 2048}}, html = {
                static npos = 18446744073709551615,
                _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x0},
                _M_string_length = 73138176, {
                  _M_local_buf = "▒▒;_▒\177\000\000\000\000\\\004\000\000\000",
                  _M_allocated_capacity = 140730496159222}},
              testThreads = 140730495922279, workThreads = 0, printAlign = 184}
            groups = {<std::_Vector_base<Test::Group, std::allocator<Test::Group> >> = {
                _M_impl = {<std::allocator<Test::Group>> = {<__gnu_cxx::new_allocator<Test::Group>> = {<No data fields>}, <No data fields>}, _M_start = 0x1,
                  _M_finish = 0x7ffe5f3ce9c3 <ntdll!memset+122627>,
                  _M_end_of_storage = 0x44724a0}}, <No data fields>}
    
    opened by kudzurunner 12
  • Cmake fails with SimdVersion.h

    After fighting through making cmake work on MinGw, I gradually come to understand how this all works, with this thread being the golden helper: https://github.com/glfw/glfw/issues/843#issuecomment-250316815

    Whilst I can compile without MSYS using only the stock MinGW install, I'm interested in getting it to work with "Msys Makefiles" or "Unix Makefiles". I always end up with:

    Skip updating of file ""C:/Users/Bonfire/Desktop/winVAI/Projects/suite/pkg/Simd/prj/cmake/../..\src\Simd\SimdVersion.h"" because there are not any changes.

    Everything else seems to work, make -j8 posts many compiled files, but it always fails at this one:

    /SimdLib.cpp: In function 'const char* SimdVersion()':
    C:/Users/Bonfire/Desktop/winVAI/Projects/suite/pkg/Simd/src/Simd/SimdLib.cpp:79:12: error: 'SIMD_VERSION' was not declared in this scope
        return SIMD_VERSION;
               ^~~~~~~~~~~~

    Is there a way to fix this?

    opened by ghost 12
  • Bug in SimdResizeMethodNearest resizer

    Calling SimdResizerRun with SimdResizeMethodNearest gives spill on the right of the resized image. E.g. if I have two YUV420P images of 1920*1080 and copy one to the first quadrant of the other (0, 0, 960, 540), there is spill to the right. See image. This does not happen with SimdResizeMethodArea or SimdResizeMethodBilinear (did not try others) but (very!) unfortunately they are too slow. SimdLib v4.9.111. Windows 64

    opened by mikeversteeg 11
  • build error View = Mat

    I have included Simd/SimdLib.hpp and defined SIMD_OPENCV_ENABLE, but when I build, I run into an error.

    #include <opencv2/core.hpp>
    #include <opencv2/highgui.hpp>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/videoio.hpp>
    #include <opencv2/video/tracking.hpp>
    #define SIMD_OPENCV_ENABLE
    #include "Simd/SimdLib.hpp"
    
    #include <thread>
    #include <atomic>
    #include "ShareSpace.h"
    #include "Controller.h"
    
            img.resource = camera->getFrame();
            if(!img.resource.empty())
            {
                // cv::resize(img.resource, img.resource, cv::Size(640,512));
                Simd::View viewsrc = img.resource;
                cv::Mat tmp= cv::Mat::zeros(cv::Size(640,512), img.resource.type());
                Simd::View viewdest = tmp;
                Simd::ResizeBilinear(viewsrc, viewdest);
                img.resource = tmp;
            }
    

    error

    error: Share/ThreadManager/ThreadManager.cpp:91:38: error: class template argument deduction failed:
       91 |             Simd::View viewsrc = img.resource;
          |                                      ^~~~~~~~
    Share/ThreadManager/ThreadManager.cpp:91:38: error: no matching function for call to ‘View(cv::Mat&)’
    In file included from /usr/include/Simd/SimdLib.hpp:27,
                     from Share/ThreadManager/ThreadManager.h:19,
                     from Share/ThreadManager/ThreadManager.cpp:10:
    /usr/include/Simd/SimdView.hpp:819:52: note: candidate: ‘template<template<class> class A> View(const Simd::Point<long int>&, Simd::View<A>::Format)-> Simd::View<A>’
      819 |     template <template<class> class A> SIMD_INLINE View<A>::View(const Point<ptrdiff_t> & size, Format f)
          |                                                    ^~~~~~~
    /usr/include/Simd/SimdView.hpp:819:52: note:   template argument deduction/substitution failed:
    Share/ThreadManager/ThreadManager.cpp:91:38: note:   candidate expects 2 arguments, 1 provided
       91 |             Simd::View viewsrc = img.resource;
          |                                      ^~~~~~~~
    In file included from /usr/include/Simd/SimdLib.hpp:27,
                     from Share/ThreadManager/ThreadManager.h:19,
                     from Share/ThreadManager/ThreadManager.cpp:10
    
    opened by wangzhankun 11
  • GaussianBlur usage and memory issue

    The following code:

    #include <iostream>
    
    #include "Simd/SimdLib.h"
    
    static void print(const unsigned char* img, int rows, int cols)
    {
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                std::cout << static_cast<unsigned>(img[i*cols + j]) << " ";
            }
            std::cout << std::endl;
        }
    }
    
    int main(int , char * [])
    {
        const int rows = 8, cols = 12;
        unsigned char img[rows*cols];
        unsigned char img_blur[rows*cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                img[i*cols + j] = static_cast<unsigned char>(i*cols + j);
            }
        }
    
        const float radius = 5.0f;
        void * funcPtr = SimdGaussianBlurInit(cols, rows, 1, &radius);
        SimdGaussianBlurRun(funcPtr, img, cols, img_blur, cols);
        SimdRelease(funcPtr);
    
        std::cout << "Original image:" << std::endl;
        print(img, rows, cols);
        std::cout << "\nGaussian blur:" << std::endl;
        print(img_blur, rows, cols);
    
        return EXIT_SUCCESS;
    }
    

    produces on my computer the following error:

    *** stack smashing detected ***: ./TestGaussianBlur terminated
    Abandon (core dumped)
    

    Output when running with Valgrind:

    ==22407== Memcheck, a memory error detector
    ==22407== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==22407== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
    ==22407== Command: ./TestGaussianBlur
    ==22407== 
    ==22407== Source and destination overlap in memcpy(0x8418b60, 0x8418b60, 48)
    ==22407==    at 0x730E674: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1034)
    ==22407==    by 0x332B536: void Simd::Avx2::BlurImageAny<1>(Simd::BlurParam const&, Simd::Base::AlgDefault const&, unsigned char const*, unsigned long, unsigned char*, float*, unsigned char*, unsigned long) (SimdAvx2GaussianBlur.cpp:124)
    ==22407==    by 0x68AD52: Simd::Base::GaussianBlurDefault::Run(unsigned char const*, unsigned long, unsigned char*, unsigned long) (SimdBaseGaussianBlur.cpp:245)
    ==22407==    by 0x63D055: SimdGaussianBlurRun (SimdLib.cpp:2409)
    ==22407==    by 0x6386A9: main (TestGaussianBlur.cpp:78)
    ==22407== 
    Original image:
    0 1 2 3 4 5 6 7 8 9 10 11 
    12 13 14 15 16 17 18 19 20 21 22 23 
    24 25 26 27 28 29 30 31 32 33 34 35 
    36 37 38 39 40 41 42 43 44 45 46 47 
    48 49 50 51 52 53 54 55 56 57 58 59 
    60 61 62 63 64 65 66 67 68 69 70 71 
    72 73 74 75 76 77 78 79 80 81 82 83 
    84 85 86 87 88 89 90 91 92 93 94 95 
    
    Gaussian blur:
    ==22407== Use of uninitialised value of size 8
    ==22407==    at 0x7852BA3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x785309F: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x7860B65: std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x63856F: print(unsigned char const*, int, int) (TestGaussianBlur.cpp:59)
    ==22407==    by 0x63871F: main (TestGaussianBlur.cpp:84)
    ==22407== 
    ==22407== Conditional jump or move depends on uninitialised value(s)
    ==22407==    at 0x7852BB6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x785309F: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x7860B65: std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
    ==22407==    by 0x63856F: print(unsigned char const*, int, int) (TestGaussianBlur.cpp:59)
    ==22407==    by 0x63871F: main (TestGaussianBlur.cpp:84)
    ==22407== 
    16 17 18 18 19 20 21 22 23 23 24 24 
    16 17 18 18 19 20 21 22 23 23 24 24 
    21 22 23 23 24 25 26 27 27 28 29 29 
    26 27 27 28 29 30 31 31 32 33 34 34 
    31 31 32 32 33 34 35 36 36 37 38 38 
    34 34 35 35 36 37 38 38 39 40 40 41 
    35 35 36 36 37 38 38 39 40 40 41 41 
    34 34 35 35 36 36 37 38 38 39 39 39 
    ==22407== Conditional jump or move depends on uninitialised value(s)
    ==22407==    at 0x638732: main (TestGaussianBlur.cpp:87)
    ==22407== 
    *** stack smashing detected ***: ./TestGaussianBlur terminated
    ==22407== 
    ==22407== Process terminating with default action of signal 6 (SIGABRT)
    ==22407==    at 0x806D438: raise (raise.c:54)
    ==22407==    by 0x806F039: abort (abort.c:89)
    ==22407==    by 0x80AF7F9: __libc_message (libc_fatal.c:175)
    ==22407==    by 0x815121B: __fortify_fail (fortify_fail.c:37)
    ==22407==    by 0x81511BF: __stack_chk_fail (stack_chk_fail.c:28)
    ==22407==    by 0x638738: main (TestGaussianBlur.cpp:87)
    ==22407== 
    ==22407== HEAP SUMMARY:
    ==22407==     in use at exit: 8,320 bytes in 8 blocks
    ==22407==   total heap usage: 18 allocs, 10 frees, 93,336 bytes allocated
    ==22407== 
    ==22407== LEAK SUMMARY:
    ==22407==    definitely lost: 0 bytes in 0 blocks
    ==22407==    indirectly lost: 0 bytes in 0 blocks
    ==22407==      possibly lost: 0 bytes in 0 blocks
    ==22407==    still reachable: 8,320 bytes in 8 blocks
    ==22407==         suppressed: 0 bytes in 0 blocks
    ==22407== Rerun with --leak-check=full to see details of leaked memory
    ==22407== 
    ==22407== For counts of detected and suppressed errors, rerun with: -v
    ==22407== Use --track-origins=yes to see where uninitialised values come from
    ==22407== ERROR SUMMARY: 386 errors from 4 contexts (suppressed: 0 from 0)
    Abandon (core dumped)
    

    What could be the issue in my code?


    In the SimdGaussianBlurInit() function, what is the relationship between radius and standard sigma value for Gaussian kernel?

    For instance with scipy.ndimage.gaussian_filter, it gives:

    img:
     [[ 0  1  2  3  4  5  6  7  8  9 10 11]
     [12 13 14 15 16 17 18 19 20 21 22 23]
     [24 25 26 27 28 29 30 31 32 33 34 35]
     [36 37 38 39 40 41 42 43 44 45 46 47]
     [48 49 50 51 52 53 54 55 56 57 58 59]
     [60 61 62 63 64 65 66 67 68 69 70 71]
     [72 73 74 75 76 77 78 79 80 81 82 83]
     [84 85 86 87 88 89 90 91 92 93 94 95]]
    img_blur (sigma=5.0):
     [[39 39 39 40 40 41 41 42 42 43 43 43]
     [40 40 40 41 41 42 42 43 43 44 44 44]
     [41 41 41 42 42 43 43 44 44 45 45 45]
     [43 43 43 44 44 45 45 46 46 47 47 47]
     [46 46 46 47 47 48 48 49 49 50 50 50]
     [48 48 48 49 49 50 50 51 51 52 52 52]
     [49 49 49 50 50 51 51 52 52 53 53 53]
     [50 50 50 51 51 52 52 53 53 54 54 54]]
    img_blur (sigma=5.0/2.0):
     [[19 19 20 21 22 23 23 24 25 26 27 27]
     [23 23 24 25 26 27 27 28 29 30 31 31]
     [29 29 30 31 32 33 33 34 35 36 37 37]
     [38 38 39 40 41 42 42 43 44 45 46 46]
     [47 47 48 49 50 51 51 52 53 54 55 55]
     [56 56 57 58 59 60 60 61 62 63 64 64]
     [62 62 63 64 65 66 66 67 68 69 70 70]
     [66 66 67 68 69 70 70 71 72 73 74 74]]
    

    Finally, why is the radius parameter in SimdGaussianBlurInit() a const float pointer? It looks like there is no need for a pointer here, unless it is meant to support different radius values for the X and Y axes?

    opened by s-trinh 10
  • Legacy removing

    Simd has been in development for about 12 years, and some of its components are no longer relevant. I think I can remove some functionality that has not been updated for many years. This list is not complete and may be updated:

    1. Support of PPC architecture. 
    2. Data tests (they were developed for PPC porting).
    3. AVX-512F optimizations (merge them to AVX-512BW optimizations). 
    4. SSE2 optimizations (merge them to SSE4.1 optimizations). 
    5. AVX optimizations (merge them to AVX2 optimizations). 
    6. EdgeBackground functions.
    7. SvmSumLinear function.
    8. Interference functions.
    
    opened by ermig1979 0
  • vc2019

    1>SimdLib.obj : error LNK2019: unresolved external symbol "void __cdecl Simd::Avx512bw::NeuralAddConvolution2x2Forward(float const *,unsigned __int64,unsigned __int64,unsigned __int64,float const *,float *,unsigned __int64)" ([email protected]@[email protected]@[email protected]) referenced in function SimdNeuralAddConvolution2x2Forward

    opened by pww1971 3
  • Alpha blend small BGRA image onto large YUV420P image

    Been struggling with this simple idea all weekend. I want a fast way to overlay a small BGRA image on a large YUV420P image. Finally figured out the best way to do this, apart from having a dedicated function, is to first use SimdBgraToYuva444pV2 to convert the BGRA and then call SimdAlphaBlending 3 times. However, SimdBgraToYuva444pV2 is missing. A less elegant way would be to use SimdBgraToYuva420p and use two alpha masks, but I am already foreseeing mask errors. Is it possible to add SimdBgraToYuva444pV2 (or alternatively extend SimdAlphaBlending)?
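
The per-plane blending step of the approach described above can be sketched in scalar C++ as follows. This is not SimdAlphaBlending itself: BlendPlane is a hypothetical helper, and the blending formula with rounding is an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not a Simd function: blends a w x h overlay plane onto
// a larger destination plane at offset (x0, y0), using a per-pixel 8-bit
// alpha plane of the same size as the overlay.
// dst = (src * alpha + dst * (255 - alpha)) / 255, rounded.
// The caller must ensure that the overlay fits inside the destination plane.
void BlendPlane(const std::uint8_t* src, std::size_t srcStride,
    const std::uint8_t* alpha, std::size_t alphaStride,
    std::size_t w, std::size_t h,
    std::uint8_t* dst, std::size_t dstStride, std::size_t x0, std::size_t y0)
{
    for (std::size_t y = 0; y < h; ++y)
    {
        const std::uint8_t* s = src + y * srcStride;
        const std::uint8_t* a = alpha + y * alphaStride;
        std::uint8_t* d = dst + (y0 + y) * dstStride + x0;
        for (std::size_t x = 0; x < w; ++x)
            d[x] = static_cast<std::uint8_t>((s[x] * a[x] + d[x] * (255 - a[x]) + 127) / 255);
    }
}
```

For a YUV420P destination this would be called once for the Y plane at full resolution, and once each for the U and V planes with half-resolution overlay and alpha planes at offset (x0 / 2, y0 / 2). That is where the two alpha masks mentioned above come from, and it assumes an even x0 and y0.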

    opened by mikeversteeg 3
  • Some functions in this library are slower than OpenCV 4.5.5

    Hi, first of all, thank you for your contribution. But when I used this library, I tested Simd::Resize(), Simd::BgrToGray and so on. Some functions are 4-6 times slower than OpenCV 4.5.5. (OS: Ubuntu 18.04, CPU: i7-10750H, 12 cores)

    Did I miss anything when I used it? Could you give an image processing use case where it is faster than OpenCV's function? If most of them are slower than OpenCV, what are the advantages of this library? I sincerely look forward to your answer.

    opened by SheepKeeper1990 5
  • Is this correct for IplImage

    long long getCurrentTimeMicro()
    {
    	return std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now().time_since_epoch()).count();
    }
    
    void InnerMixtureImage(IplImage* pSrc, IplImage* pDst, int xpos, int ypos)
    {
    	if (!pSrc || !pDst)
    		return;
    
    	if (pSrc->nChannels != 4 || pDst->nChannels != 4)
    		return;
    
    	int w = xpos + pSrc->width;
    	int h = ypos + pSrc->height;
    
    	if (w > pDst->width || h > pDst->height)
    	{
    		printf("<WARNING> %s: src width = %d, height = %d, dst width = %d, height = %d, pos x = %d, y = %d\r\n",
    			__FUNCTION__, pSrc->width, pSrc->height, pDst->width, pDst->height, xpos, ypos);
    		return;
    	}
    
    	int i, j;
    	for (j = 0; j < pSrc->height; ++j)
    	{
    
    		unsigned char* pucDst = (unsigned char*)pDst->imageData + (j + ypos) * pDst->widthStep + xpos * pDst->nChannels;
    		unsigned char* pucSrc = (unsigned char*)pSrc->imageData + j * pSrc->widthStep;
    
    		for (i = 0; i < pSrc->width; ++i)
    		{
    			unsigned char alpha = pucSrc[3];
    
    			if (alpha == 0)
    			{
    			}
    			else if (alpha == 255)
    			{
    				pucDst[0] = pucSrc[0];
    				pucDst[1] = pucSrc[1];
    				pucDst[2] = pucSrc[2];
    				pucDst[3] = pucSrc[3];
    			}
    			else 
    			{
    				pucDst[0] = (pucDst[0] * (255 - alpha) + pucSrc[0] * alpha) >> 8;
    				pucDst[1] = (pucDst[1] * (255 - alpha) + pucSrc[1] * alpha) >> 8;
    				pucDst[2] = (pucDst[2] * (255 - alpha) + pucSrc[2] * alpha) >> 8;
    				pucDst[3] = pucDst[3] > alpha ? pucDst[3] : alpha;
    			}
    			pucDst += 4;
    			pucSrc += 4;
    		}
    	}
    
    }
    
    
    int main()
    {
    	typedef Simd::View<Simd::Allocator> View;
    
        IplImage* pUpdate = cvLoadImage("update.png", CV_LOAD_IMAGE_UNCHANGED);
    	
    	int width = pUpdate->width;
    	int height = pUpdate->height;
    	int stride = pUpdate->widthStep;
    
    	IplImage* pImg = cvCreateImage(cvGetSize(pUpdate), 8, 4);
    	cvSet(pImg, cvScalar(51, 51, 51, 255));
    
        IplImage* pChannel1 = cvCreateImage(cvGetSize(pUpdate), 8, 1);
        IplImage* pChannel2 = cvCreateImage(cvGetSize(pUpdate), 8, 1);
        IplImage* pChannel3 = cvCreateImage(cvGetSize(pUpdate), 8, 1);
        IplImage* pChannel4 = cvCreateImage(cvGetSize(pUpdate), 8, 1);
    
        cvSplit(pUpdate, pChannel1, pChannel2, pChannel3, pChannel4);
    
    	View src(width, height, stride, View::Bgra32, pImg->imageData);
    
    	View dst(width, height, stride, View::Bgra32, pUpdate->imageData);
    
    	View alpha(pChannel4->width, pChannel4->height, pChannel4->widthStep, View::Gray8, pChannel4->imageData);
    
    	int64_t start = getCurrentTimeMicro();
    
    	Simd::AlphaBlending(dst, alpha, src);
    
    	int64_t end = getCurrentTimeMicro();
    
    	cout << "Simd::AlphaBlending elapsed time = " << (end - start) << " us " << endl;
    
    	start = getCurrentTimeMicro();
    
    	//InnerMixtureImage(pUpdate, pImg, 0, 0);
    
    	end = getCurrentTimeMicro();
    
    	cout << "InnerMixtureImage elapsed time = " << (end - start) << " us " << endl;
    
        cvShowImage("blend", pImg);
        cvWaitKey();
    }
    
    opened by HowToExpect 0
Releases(v4.10.114)
  • v4.10.114(Jun 1, 2022)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToUyvy422.
    • AVX-512BW, NEON optimizations of function Uyvy422ToYuv420p.
    • AVX-512BW, NEON optimizations of function Uyvy422ToBgr.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Float32ToBFloat16.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function BFloat16ToFloat32.
    • Base implementation of class SynetConvolution32fBf16Gemm.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution32fBf16Nhwc.
    • Base implementation of class SynetMergedConvolution32fBf16.
    Removing
    • Remove external GEMM function parameter from function SynetConvolution32fInit.
    • Remove external GEMM function parameter from function SynetDeconvolution32fInit.

    Test framework

    New features
    • Tests for verifying functionality of function Yuv420pToUyvy422.
    • Tests for verifying functionality of function Float32ToBFloat16.
    • Tests for verifying functionality of function BFloat16ToFloat32.

    Infrastructure

    New features
    • Project files for Microsoft Visual Studio 2022.
    simd.4.10.114.zip(5.09 MB)
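For readers unfamiliar with the format behind Float32ToBFloat16 and BFloat16ToFloat32: bfloat16 keeps the upper 16 bits of an IEEE 754 float32. The scalar sketch below illustrates the idea only. The function names are hypothetical, and the round-to-nearest-even step is an assumption, since the release notes do not state which rounding mode the library uses.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar sketch: float32 -> bfloat16 with round-to-nearest-even
// (the rounding mode is an assumption, not taken from the library).
inline uint16_t Float32ToBFloat16Sketch(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    // Keep NaN a NaN: force a mantissa bit that survives the truncation.
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return uint16_t((bits >> 16) | 0x0040u);
    // Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return uint16_t((bits + rounding) >> 16);
}

// bfloat16 -> float32 is exact: the 16 bits become the upper half of the float.
inline float BFloat16ToFloat32Sketch(uint16_t value)
{
    uint32_t bits = uint32_t(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```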
  • v4.9.113(May 4, 2022)

    Algorithms

    New features
    • SSE4.1, AVX2, AVX-512BW optimizations of class ResizerByteArea2x2.
    Improving
    • Base implementation of class ResizerByteArea1x1.
    Bug fixing
    • Error in Base implementation of class ResizerByteArea2x2.
    • Error in AVX optimizations of class SynetConvolution32fDirectNchw.
    Removing
    • SimdSynetCompatibilityFloatZero flag.

    Infrastructure

    New features
    • Git commit ID info in function SimdVersion.
    • Git branch name in function SimdVersion.
    simd.4.9.113.zip(5.01 MB)
  • v4.9.112(Apr 1, 2022)

    Algorithms

    New features
    • NEON optimizations of function Base64Encode.
    • NEON optimizations of ImageJpegSaver class.
    • NEON optimizations of function Yuv420pSaveAsJpegToMemory.
    • NEON optimizations of function Nv12SaveAsJpegToMemory.
    • Owner method in View structure.
    • Owner method in Frame structure.
    • Capture method in View structure.
    • Capture method in Frame structure.
    • Base implementation of class ResizerByteAreaReduced2x2.
    Bug fixing
    • MSVS compiler error in AVX-512BW optimizations of function Yuv420pToBgraV2.
    • Error in AVX2 optimizations of function BgraToRgb.
    • Error (aligned reading of unaligned memory) in SSE4.1, AVX2, AVX-512BW optimizations of function InterleaveBgra.
    • Error in function View::ToOcv.
    • Error in View copy constructor (from OpenCV Mat).

    Test framework

    Bug fixing
    • Wrong default ROOT_PATH for Linux.
    • Error in test SynetConvert32fTo8uAutoTest.
    • Special test ResizeYuv420pSpecialTest.
    simd.4.9.112.zip(5.00 MB)
  • v4.9.111(Mar 3, 2022)

    Algorithms

    New features
    • AVX2, AVX-512BW optimizations of ResizerByteBicubic class.
    • SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Base64Decode.
    • NEON optimizations of function SynetSwish32f.
    • Swish activation function to NEON optimizations of SynetConvolution32f framework.
    • Swish activation function to NEON optimizations of SynetDeconvolution32f framework.
    • Swish activation function to NEON optimizations of SynetMergedConvolution32f framework.
    • Swish activation function to NEON optimizations of SynetConvolution8i framework.
    • Swish activation function to NEON optimizations of SynetMergedConvolution8i framework.
    • NEON optimizations of function Yuv444pToBgraV2.
    • SSE2, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToBgraV2.
    Improving
    • SSE4.1 optimizations of ResizerByteBicubic class.
    Bug fixing
    • Compiler error in NEON optimizations of function AlphaUnpremultiply.
    • MSVS Compiler warnings in SSE4.1, AVX2, AVX-512BW optimizations of function TransformImage.
    simd.4.9.111.zip(4.99 MB)
  • v4.9.110(Mar 3, 2022)

    Algorithms

    New features
    • Base implementation, SSE4.1 optimizations of ResizerByteBicubic class.
    • Base implementation of function BgraToYuv444pV2.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Nv12SaveAsJpegToMemory.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Yuv420pSaveAsJpegToMemory.
    • Base implementation of function BgraToYuv420pV2.
    Bug fixing
    • Error in SSE4.1, AVX2, AVX-512BW optimizations of function BgraToRgba.
    • Error in SSE4.1, AVX2 optimizations of function BgraToBgr.
    • Error in SSE4.1, AVX2 optimizations of function BgraToRgb.
    • Error in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function AlphaUnpremultiply.

    Test framework

    New features
    • Tests for verifying functionality of function BgraToYuv444pV2.
    • Tests for verifying functionality of function Nv12SaveAsJpegToMemory.
    • Tests for verifying functionality of function Yuv420pSaveAsJpegToMemory.
    • Tests for verifying functionality of function BgraToYuv420pV2.
    simd.4.9.110.zip(4.97 MB)
  • v4.9.109(Jan 3, 2022)

    Algorithms

    New features
    • Parameter Uyvy422ToBgr to function.
    • SSE4.1, AVX2 optimizations of function Uyvy422ToBgr.
    • Base implementation, SSE4.1, AVX2 optimizations of function Uyvy422ToYuv420p.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Base64Encode.
    • Base implementation of function Base64Decode.
    Improving
    • AVX2 optimizations of class ResizerNearest for Bgr24, Uv16.
    Renaming
    • Function UyvyToBgr to Uyvy422ToBgr.

    Test framework

    New features
    • Tests for verifying functionality of function Uyvy422ToYuv420p.
    • Tests for verifying functionality of function Base64Encode.
    • Tests for verifying functionality of function Base64Decode.

    Documentation

    Changes
    • Update developers list.
    simd.4.9.109.zip(4.95 MB)
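As a point of reference for the Base64Encode function listed above, here is a minimal scalar Base64 encoder. It is a sketch, not the library's code: the name `Base64EncodeSketch` is hypothetical, and the standard RFC 4648 alphabet with `=` padding is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical scalar sketch of Base64 encoding (RFC 4648 alphabet, '=' padding).
std::string Base64EncodeSketch(const uint8_t* src, size_t size)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string dst;
    dst.reserve((size + 2) / 3 * 4);
    size_t i = 0;
    // Every full group of 3 input bytes becomes 4 output characters.
    for (; i + 3 <= size; i += 3)
    {
        uint32_t v = (uint32_t(src[i]) << 16) | (uint32_t(src[i + 1]) << 8) | src[i + 2];
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += table[(v >> 6) & 63];
        dst += table[v & 63];
    }
    // A tail of 1 or 2 bytes is zero-extended and padded with '='.
    if (size - i == 1)
    {
        uint32_t v = uint32_t(src[i]) << 16;
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += "==";
    }
    else if (size - i == 2)
    {
        uint32_t v = (uint32_t(src[i]) << 16) | (uint32_t(src[i + 1]) << 8);
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += table[(v >> 6) & 63];
        dst += '=';
    }
    return dst;
}
```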
  • v4.9.108(Dec 1, 2021)

    Algorithms

    New features
    • SSE4.1, AVX2, AVX-512F, AVX-512BW optimizations of class ResizerNearest.
    • Add SimdResizeMethodNearestPytorch to SimdResizeMethodType enumeration.
    • Add parameter BackgroundStatUpdateTime to Motion Detector.
    • MotionDetector performance optimization (case of falling star).
    • 16-bit UYVY image format in View.
    • Base implementation of function UyvyToBgr.
    • Base implementation, SSE2, AVX2, AVX-512F optimizations of function SynetSwish32f.
    • SimdConvolutionActivationSwish item of SimdConvolutionActivationType enumeration.
    • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetConvolution32f framework.
    • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetDeconvolution32f framework.
    • Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetMergedConvolution32f framework.
    • Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8i framework.
    • Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
    • SimdYuvType enumeration.
    • Base implementation, SSE2, AVX2, AVX-512BW optimizations of function Yuv444pToBgraV2.
    • Function Simd::Resize supports images with 16-bit channel size.
    • Base implementation of function Yuv420pToBgraV2.
    Improving
    • Refactoring of SimdResizeMethodType enumeration.
    Bug fixing
    • Stack corruption in function Simd::Avx2::JpegWriteBlockSubs.

    Test framework

    New features
    • Tests for verifying functionality of function UyvyToBgr.
    • Tests for verifying functionality of function SynetSwish32f.
    • Tests for verifying functionality of function Yuv444pToBgraV2.
    • Tests for verifying functionality of function Yuv420pToBgraV2.

    Infrastructure

    Bug fixing
    • Wrong compiler options in CMake.
    simd.4.9.108.zip(4.92 MB)
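The Swish activation added in this release is commonly defined as x * sigmoid(slope * x). The scalar sketch below shows that formula only. The name `SwishSketch` and its parameter list are assumptions for illustration and are not taken from the SynetSwish32f declaration.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical scalar sketch of Swish: dst = src * sigmoid(slope * src),
// written as src / (1 + exp(-slope * src)).
void SwishSketch(const float* src, size_t size, float slope, float* dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = src[i] / (1.0f + std::exp(-slope * src[i]));
}
```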
  • v4.9.107(Nov 1, 2021)

    Algorithms

    New features
    • Internal class Holder to replace std::unique_ptr for old compilers without support of C++11 standard.
    • SimdBayerLayoutType enumeration.
    • Base implementation of class ResizerNearest.
    Bug fixing
    • Compiler error when defined macro SIMD_SSE2_DISABLE.
    • Compiler error when defined macro SIMD_NEON_DISABLE.

    Infrastructure

    New features
    • SIMD_ROOT CMake parameter.
    simd.4.9.107.zip(4.90 MB)
  • v4.9.106(Oct 1, 2021)

    Algorithms

    New features
    • Base implementation, SSE2, AVX, AVX-512F, NEON optimizations of function SynetHardSigmoid32f.
    • SimdConvolutionActivationHardSigmoid item of SimdConvolutionActivationType enumeration.
    • HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetConvolution32f framework.
    • HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetDeconvolution32f framework.
    • HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetMergedConvolution32f framework.
    • NEON optimizations of SynetMergedConvolution32fDc class.
    • NEON optimizations of SynetMergedConvolution32fCd class.
    • NEON optimizations of SynetInnerProduct32fGemm class.
    • NEON optimizations of SynetInnerProduct32fProd class.
    • HardSigmoid activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, NEON optimizations of SynetConvolution8i framework.
    • HardSigmoid activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
    Bug fixing
    • Compiler error in file SimdInit.h (CLang, Windows).
    Removing
    • Remove including SimdConfig.h in SimdLib.h.

    Test framework

    New features
    • Tests for verifying functionality of function SynetHardSigmoid32f.
    • '-pi' test parameter (to print internal performance statistics of Simd Library to console).
    simd.4.9.106.zip(4.90 MB)
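HardSigmoid is a piecewise-linear approximation of the sigmoid, usually written as max(0, min(1, scale * x + shift)). The sketch below shows that formula in scalar form. The name `HardSigmoidSketch` and its parameters are assumptions for illustration, not the SynetHardSigmoid32f declaration.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical scalar sketch of HardSigmoid:
// dst = clamp(scale * src + shift, 0, 1).
void HardSigmoidSketch(const float* src, size_t size, float scale, float shift, float* dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = std::max(0.0f, std::min(1.0f, src[i] * scale + shift));
}
```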
  • v4.9.105(Sep 13, 2021)

    Algorithms

    New features
    • AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24 for Rotate180, TransposeRotate90).
    • Method Frame::Clone with region parameter.
    • Method View::Clone with region parameter.
    • AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • AVX-512BW optimizations of function TransformImage (case of Gray8, Uv16, Bgra32 for Rotate180, TransposeRotate90).
    • AVX-512BW optimizations of function TransformImage (case of Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • AVX-512BW optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • AVX-512BW optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function AlphaBlendingUniform.
    • AVX-512BW optimizations of function TransformImage (case of Bgr24 for Rotate180, TransposeRotate90, Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • Resize function (with size parameter).
    • Move constructor of View structure.
    • Move operator of View structure.
    • Clear method of Frame structure.
    • Swap method of Frame structure.
    • Move constructor of Frame structure.
    • Move operator of Frame structure.

    Tests

    New features
    • Tests for verifying functionality of function AlphaBlendingUniform.
    simd.4.9.105.zip(4.89 MB)
  • v4.9.104(Aug 3, 2021)

    Algorithms

    New features
    • Rgba32 format in Frame structure.
    • Rgba32 format in Convert function (for frames).
    • SSE4.1 optimizations of function Float32ToFloat16.
    • SSE4.1 optimizations of function Float16ToFloat32.
    • AVX2 optimizations of function TransformImage (case of Bgra32 for Rotate180, TransposeRotate90).
    Improving
    • SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetConvolution32fNhwcDirect (case of fixed kernels).
    • Reducing of compilation time and binaries size of class SynetConvolution32f.
    • Reducing of compilation time and binaries size of class SynetDeconvolution32f.
    • Reducing of compilation time and binaries size of class SynetMergedConvolution32f.
    • Reducing of compilation time and binaries size of class SynetConvolution8i.
    • Reducing of compilation time and binaries size of class SynetMergedConvolution8i.
    • SSE4.1 optimizations of function TransformImage (case of Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate180).
    • SSE4.1 optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    • SSE4.1 optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
    Bug fixing
    • Compiler error in file SimdAvx512bwResizer.cpp (GCC 5.4.0).
    • Compiler error in file SimdAvx512bwBgraToBgr.cpp (MSVS-2017).
    • Compiler error in file SimdInit.h (CLang, Windows).
    • Error in AVX2 and AVX-512BW optimizations of functions CosineDistancesMxNa16f and CosineDistancesMxNp16f (functions may return small negative values).
    • Error in function Base::DetectionLoadA (it generates exception instead of returns NULL).
    • Error in SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetDeconvolution32fNhwcDirect2x2.
    Replacing
    • Replace SSE3 optimizations to SSE4.1 for function Gemm32fNT.
    • Replace SSE3 optimizations to SSE4.1 for function SynetConvolution32fInit.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution2x2Sum.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution3x3Sum.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution4x4Sum.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution5x5Sum.
    • Replace SSE3 optimizations to SSE4.1 for function NeuralConvolutionForward.
    • Replace SSE4.2 optimizations to SSE4.1 for function Crc32c.
    • Replace SSSE3 optimizations to SSE4.1 for function AlphaBlending.
    • Replace SSSE3 optimizations to SSE4.1 for function AlphaFilling.
    • Replace SSSE3 optimizations to SSE4.1 for function AlphaPremultiply.
    • Replace SSSE3 optimizations to SSE4.1 for function BayerToBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToBayer.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToRgb.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToRgba.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToYuv420p.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToYuv422p.
    • Replace SSSE3 optimizations to SSE4.1 for function BgraToYuva420p.
    • Replace SSSE3 optimizations to SSE4.1 for function BgrToBayer.
    • Replace SSSE3 optimizations to SSE4.1 for function BgrToBgra.
    • Replace SSSE3 optimizations to SSE4.1 for function RgbToBgra.
    • Replace SSSE3 optimizations to SSE4.1 for function BgrToGray.
    • Replace SSSE3 optimizations to SSE4.1 for function RgbToGray.
    • Replace SSSE3 optimizations to SSE4.1 for function BgrToRgb.
    • Replace SSSE3 optimizations to SSE4.1 for function TransformImage.
    • Replace SSSE3 optimizations to SSE4.1 for function BgrToYuv420p.
    • Replace SSSE3 optimizations to SSE4.1 for function BgrToYuv422p.
    • Replace SSSE3 optimizations to SSE4.1 for function BgrToYuv444p.
    • Replace SSSE3 optimizations to SSE4.1 for function DeinterleaveBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function DeinterleaveBgra.
    • Replace SSSE3 optimizations to SSE4.1 for function GaussianBlur3x3.
    • Replace SSSE3 optimizations to SSE4.1 for function GrayToBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function InterleaveBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function InterleaveBgra.
    • Replace SSSE3 optimizations to SSE4.1 for function Yuv420pToBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function Yuv422pToBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function Yuv444pToBgr.
    • Replace SSSE3 optimizations to SSE4.1 for function Yuv420pToRgb.
    • Replace SSSE3 optimizations to SSE4.1 for function Yuv422pToRgb.
    • Replace SSSE3 optimizations to SSE4.1 for function Yuv444pToRgb.
    • Replace SSSE3 optimizations to SSE4.1 for function Laplace.
    • Replace SSSE3 optimizations to SSE4.1 for function LaplaceAbs.
    • Replace SSSE3 optimizations to SSE4.1 for function LaplaceAbsSum.
    • Replace SSSE3 optimizations to SSE4.1 for function MeanFilter3x3.
    • Replace SSSE3 optimizations to SSE4.1 for function ReduceColor2x2.
    • Replace SSSE3 optimizations to SSE4.1 for function ReduceGray2x2.
    • Replace SSSE3 optimizations to SSE4.1 for function ReduceGray4x4.
    • Replace SSSE3 optimizations to SSE4.1 for function Reorder16bit.
    • Replace SSSE3 optimizations to SSE4.1 for function Reorder32bit.
    • Replace SSSE3 optimizations to SSE4.1 for function Reorder64bit.
    • Replace SSSE3 optimizations to SSE4.1 for function ResizeBilinear.
    • Replace SSSE3 optimizations to SSE4.1 for function SobelDx.
    • Replace SSSE3 optimizations to SSE4.1 for function SobelDxAbs.
    • Replace SSSE3 optimizations to SSE4.1 for function SobelDxAbsSum.
    • Replace SSSE3 optimizations to SSE4.1 for function SobelDy.
    • Replace SSSE3 optimizations to SSE4.1 for function SobelDyAbs.
    • Replace SSSE3 optimizations to SSE4.1 for function SobelDyAbsSum.
    • Replace SSSE3 optimizations to SSE4.1 for function ContourMetrics.
    • Replace SSSE3 optimizations to SSE4.1 for function ContourMetricsMasked.
    • Replace SSSE3 optimizations to SSE4.1 for function SquaredDifferenceSum.
    • Replace SSSE3 optimizations to SSE4.1 for function SquaredDifferenceSumMasked.
    • Replace SSSE3 optimizations to SSE4.1 for function TextureBoostedSaturatedGradient.
    • Replace SSSE3 optimizations to SSE4.1 for class ResizerByteBilinear.

    Tests

    New features
    • Colorized annotation in console logging.
    Improving
    • Performance report generation to text file.
    • Thread ID annotation in console logging.

    Infrastructure

    New features
    • SIMD_INT8_DEBUG cmake option.
    Removing
    • Separate support of SSE3 extension (it has been moved into SSE4.1).
    • Separate support of SSE4.2 extension (it has been moved into SSE4.1).
    • Separate support of SSSE3 extension (it has been moved into SSE4.1).
    simd.4.9.104.zip(4.71 MB)
  • v4.8.103(Jul 1, 2021)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of class ResizerShortBilinear.
    • Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNa16f.
    • Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNp16f.
    • Parameter of ROI mask in Motion::Model.
    • SSE2, AVX-512BW and NEON optimizations of function AbsDifference.
    • NEON optimizations of function AlphaUnpremultiply.
    • NEON optimizations of function AlphaPremultiply.
    • NEON optimizations of function ValueSquareSums.
    Improving
    • Performance of SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.
    Bug fixing
    • Linker warning in file SimdImageLoad.h (MSVS).
    Replacing
    • Replace SSE optimizations to SSE2 for function SvmSumLinear.
    • Replace SSE optimizations to SSE2 for function Fill32f.
    • Replace SSE optimizations to SSE2 for function CosineDistance32f.
    • Replace SSE optimizations to SSE2 for function DifferenceSum32f.
    • Replace SSE optimizations to SSE2 for function SquaredDifferenceKahanSum32f.
    • Replace SSE optimizations to SSE2 for function HogDeinterleave.
    • Replace SSE optimizations to SSE2 for function HogFilterSeparable.
    • Replace SSE optimizations to SSE2 for class ResizerFloatBilinear.
    • Replace SSE optimizations to SSE2 for function NeuralAddVectorMultipliedByValue.
    • Replace SSE optimizations to SSE2 for function NeuralAddVector.
    • Replace SSE optimizations to SSE2 for function NeuralAdaptiveGradientUpdate.
    • Replace SSE optimizations to SSE2 for function NeuralDerivativeRelu.
    • Replace SSE optimizations to SSE2 for function NeuralDerivativeSigmoid.
    • Replace SSE optimizations to SSE2 for function NeuralDerivativeTanh.
    • Replace SSE optimizations to SSE2 for function NeuralRoughSigmoid.
    • Replace SSE optimizations to SSE2 for function NeuralRoughSigmoid2.
    • Replace SSE optimizations to SSE2 for function NeuralRoughTanh.
    • Replace SSE optimizations to SSE2 for function NeuralUpdateWeights.
    • Replace SSE optimizations to SSE2 for function NeuralPooling1x1Max3x3.
    • Replace SSE optimizations to SSE2 for function NeuralPooling2x2Max2x2.
    • Replace SSE optimizations to SSE2 for function NeuralPooling2x2Max3x3.
    • Replace SSE optimizations to SSE2 for function SynetPoolingForwardAverage.
    • Replace SSE optimizations to SSE2 for function SynetPoolingForwardMax32f.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution2x2Forward.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution3x3Forward.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution4x4Forward.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution5x5Forward.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution2x2Backward.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution3x3Backward.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution4x4Backward.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution5x5Backward.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution2x2Sum.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution3x3Sum.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution4x4Sum.
    • Replace SSE optimizations to SSE2 for function NeuralAddConvolution5x5Sum.
    • Replace SSE optimizations to SSE2 for function Gemm32fNN.
    • Replace SSE optimizations to SSE2 for function SynetFusedLayerForward0.
    • Replace SSE optimizations to SSE2 for function SynetFusedLayerForward1.
    • Replace SSE optimizations to SSE2 for function SynetFusedLayerForward2.
    • Replace SSE optimizations to SSE2 for function SynetFusedLayerForward3.
    • Replace SSE optimizations to SSE2 for function SynetFusedLayerForward4.
    • Replace SSE optimizations to SSE2 for function SynetFusedLayerForward8.
    • Replace SSE optimizations to SSE2 for function SynetFusedLayerForward9.
    • Replace SSE optimizations to SSE2 for function SynetReorderImage.
    • Replace SSE optimizations to SSE2 for function SynetReorderFilter.
    • Replace SSE optimizations to SSE2 for function SynetAddBias.
    • Replace SSE optimizations to SSE2 for function SynetEltwiseLayerForward.
    • Replace SSE optimizations to SSE2 for function SynetInnerProductLayerForward.
    • Replace SSE optimizations to SSE2 for function SynetShuffleLayerForward.
    • Replace SSE optimizations to SSE2 for function SynetHswish32f.
    • Replace SSE optimizations to SSE2 for function SynetPreluLayerForward.
    • Replace SSE optimizations to SSE2 for function SynetRelu32f.
    • Replace SSE optimizations to SSE2 for function SynetRestrictRange32f.
    • Replace SSE optimizations to SSE2 for function SynetScaleLayerForward.
    • Replace SSE optimizations to SSE2 for function WinogradKernel1x3Block1x4SetFilter.
    • Replace SSE optimizations to SSE2 for function WinogradKernel1x3Block1x4SetInput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel1x3Block1x4SetOutput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel1x5Block1x4SetFilter.
    • Replace SSE optimizations to SSE2 for function WinogradKernel1x5Block1x4SetInput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel1x5Block1x4SetOutput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block2x2SetFilter.
    • Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block2x2SetInput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block2x2SetOutput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block4x4SetFilter.
    • Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block4x4SetInput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block4x4SetOutput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block2x2SetFilter.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block2x2SetInput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block2x2SetOutput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block3x3SetFilter.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block3x3SetInput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block3x3SetOutput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block4x4SetFilter.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block4x4SetInput.
    • Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block4x4SetOutput.

    Tests

    New features
    • Tests for verifying functionality of function VectorNormNa16f.
    • Tests for verifying functionality of function VectorNormNp16f.

    Infrastructure

    Removing
    • Support of SSE extension.
    simd.4.8.103.zip(4.69 MB)
  • v4.7.102(Jun 2, 2021)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function ValueSquareSums.
    Improving
    • Performance of AVX2, AVX-512F and NEON optimizations of SynetConvolution32fGemmNN class.
    • Performance of Neural::FullyConnectedLayer::Forward method.
    Bug fixing
    • Error in class SynetMergedConvolution32fDc (large weights case).
    • Compiler error in file SimdAvx2SynetConversion.cpp (MSVS-2015, Win32).
    • Error in SSSE3 optimization of ImageTransform function.
    • Compiler error in file SimdImageSaveJpeg.h (Clang, Mac mini).
    • Compiler warnings (Clang).
    • Error in function ImagePngLoader::ReadTransparency (test tbbn0g04.png).
    • Error in Base implementation, SSE4.1 optimization of class ImagePngLoader (test basn0g16.png).
    • Error in SSE4.1 optimization of class ImagePngLoader (test s02i3p01.png).

    Tests

    New features
    • Tests for verifying functionality of function ValueSquareSums.
    Improving
    • Header of performance report table.
    Bug fixing
    • Compiler error in file TestFile.h (Clang, Mac mini).
    simd.4.7.102.zip(5.56 MB)
  • v4.7.101(May 3, 2021)

    Algorithms

    New features
    • Parameter a in function DeinterleaveBgra can be NULL.
    • Simd::DeinterleaveBgra C++ wrapper.
    • Simd::DeinterleaveRgb C++ wrapper.
    • Simd::DeinterleaveRgba C++ wrappers.
    • Method View::Load (from memory).
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of ImageJpegSaver class.
    • Base implementation of ImageJpegLoader class.
    • Base implementation of ImagePngLoader class.
    • NEON optimizations of ImagePngSaver class.
    • SIMD_SYNET_DISABLE macro.
    • Base implementation, AVX2, AVX-512BW, NEON optimizations of function CosineDistancesMxNp16f.
    Bug fixing
    • Error in NEON optimizations of function CosineDistancesMxNa16f.

    Tests

    New features
    • Parameter '-ri' to set real image name in runtime.
    • Tests to verify functionality of function CosineDistancesMxNp16f.
    • Special tests for verifying functionality of function ImageLoadFromMemory.
    Bug fixing
    • Error in saving of output log.

    Infrastructure

    New features
    • Real images to test encoding/decoding algorithms.
    • SIMD_SYNET cmake option.
    • SIMD_HIDE cmake option.
    Removing
    • Project files of Microsoft Visual Studio 2017 (for Android).

    Documentation

    New features
    • Description of Cmake parameters.
  • v4.6.100(Apr 1, 2021)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of ImagePngSaver class.
    • SynetInnerProduct32f framework.
    • Base implementation, SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.
    • Base implementation, SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fProd class.
    • Rgba32 format in View structure.
    • Pixel::Rgba32 structure.
    • Simd::RgbToBgr C++ wrapper.
    • Simd::GrayToRgb C++ wrapper.
    • Simd::GrayToRgba C++ wrapper.
    • Simd::BgrToRgba C++ wrapper.
    • Simd::RgbaToRgb C++ wrapper.
    • Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function RgbaToGray.
    • Base implementation, SSSE3, AVX2, AVX-512BW, NEON optimizations of function BgraToRgba.
    • Simd::RgbToRgba C++ wrapper.
    • Simd::RgbaToBgra C++ wrapper.
    • Rgba32 format in Convert function.
    • Rgba32 format in function ImageSave.
    Improving
    • Reduce memory allocations in Simd::ContourDetector.
    Bug fixing
    • Assert in function Avx::SynetMergedConvolution32fCdc::SynetMergedConvolution32fCdc.
    • Assert in function Avx::SynetMergedConvolution32fCd::SynetMergedConvolution32fCd.
    • Assert in function Avx::SynetMergedConvolution32fDc::SynetMergedConvolution32fDc.
    • Freezes in function SynetConvolution32fNhwcDirect::OldReorderWeight (ARMv7 architecture).
    • Freezes in file SimdGemm.h (ARMv7 architecture).

    Tests

    New features
    • Tests for verifying functionality of SynetInnerProduct32f framework.
    • Performance report uses milliseconds or microseconds (chosen at runtime).
    • Special test to verify functionality of function Simd::Convert.
    • Tests to verify functionality of function RgbaToGray.
    • Tests to verify functionality of function BgraToRgba.
    Bug fixing
    • Crash in test BgrToRgbAutoTest.
    • Error in test of SynetMergedConvolution8i.

    Infrastructure

    Removing
    • Remove project files of Microsoft Visual Studio 2013.
  • v4.6.99(Mar 1, 2021)

    Algorithms

    New features
    • SimdImageFileType enumeration.
    • ImageSaveToFile function.
    • ImageSaveToMemory function.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmTxtSaver class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmBinSaver class.
    • Change order of parameters in function BgrToRgb.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmBinSaver class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmTxtSaver class.
    • Additional parameters in function View::Save.
    • Method View::Release.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmTxtLoader class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmBinLoader class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmTxtLoader class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmBinLoader class.
    • Additional parameter in function View::Load.
    • Base implementation of Crc32 function.
    Bug fixing
    • Crash in Simd::Detection on Python (using of std::unique_ptr).
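
For readers unfamiliar with the checksum behind the new Crc32 function, the following is a minimal bitwise sketch of the standard CRC-32 (reflected polynomial 0xEDB88320, as used by PNG and zlib). It assumes the library computes this standard variant; the name Crc32Ref and its signature are hypothetical and are not the library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise reference CRC-32 (reflected polynomial 0xEDB88320).
// Hypothetical helper for illustration only; not the Simd Library signature.
uint32_t Crc32Ref(const void* data, size_t size)
{
    assert(data != nullptr || size == 0);
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    uint32_t crc = 0xFFFFFFFFu; // initial value
    for (size_t i = 0; i < size; ++i)
    {
        crc ^= bytes[i];
        for (int bit = 0; bit < 8; ++bit)
        {
            // If the lowest bit is set, shift and XOR with the polynomial; otherwise only shift.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc; // final inversion
}
```

A table-driven or SIMD version would produce the same values faster; the bitwise loop is chosen here only because it is the easiest form to check by hand.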

    Tests

    New features
    • Possibility to write output video in UseFaceDetection.cpp example.
    • Test parameter '-o=' to write annotated output video.
    • Tests for verifying functionality of function ImageSaveToFile.
    • Tests for verifying functionality of function ImageSaveToMemory.
    • Tests for verifying functionality of function ImageLoadFromMemory.
    • Tests for verifying functionality of function Crc32.

    Documentation

    New features
    • Usage example in the description of Font.
    Bug fixing
    • Errors in Simd Library description.
  • v4.6.98(Feb 1, 2021)

    Algorithms

    New features
    • Add parameter epsilon to GaussianBlur engine.
    • Add function SynetConvolution32fInfo.
    • Add function SynetConvolution8iInfo.
    • Add function SynetDeconvolution32fInfo.
    • Add function SynetMergedConvolution32fInfo.
    • Add function SynetMergedConvolution8iInfo.
    Improving
    • Performance of SynetConvolution8iNhwcDirect class (case of horizontal padding of small image).
    Renaming
    • GaussianBlur engine parameter from radius to sigma.
    Bug fixing
    • Error in GaussianBlur engine (case of small images).
    • Performance degradation of AVX-512VNNI optimization of SynetConvolution8i framework.
    • Performance degradation of AVX-512VNNI optimization of SynetMergedConvolution8i framework.
    • Error in GaussianBlur engine (wrong processing of last rows).
    • Error in trajectory averaging algorithm in Motion::Detector.

    Tests

    New features
    • Possibility to write output video in UseMotionDetector.cpp example.
    Bug fixing
    • Error in files: TestVideo.cpp, UseMotionDetector.cpp, UseFaceDetector.cpp (MSVS-2019, OpenCV enabled).

    Documentation

    Improving
    • Description of GaussianBlur engine.
    • Description of Motion::Detector.

    Infrastructure

    New features
    • Ocv.prop.default for Visual Studio 2019.
    Renaming
    • Cmake parameter from LIBRARY to SIMD_SHARED.
    • Cmake parameter from CHECK_VERSION to SIMD_GET_VERSION.
    • Cmake parameter from TOOLCHAIN to SIMD_TOOLCHAIN.
    • Cmake parameter from TARGET to SIMD_TARGET.
  • v4.6.97(Jan 4, 2021)

    Algorithms

    New features
    • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetMish32f.
    • Support of Mish activation function in SynetConvolution32f framework.
    • Support of Mish activation function in SynetMergedConvolution32f framework.
    • Support of Mish activation function in SynetConvolution8i framework.
    • Support of Mish activation function in SynetMergedConvolution8i framework.
    • Support of Mish activation function in SynetDeconvolution32f framework.
    • Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of GaussianBlur engine.
    Improving
    • AVX-512F optimization of SynetConvolution32fNhwcDirect class.
    • AVX-512F optimization of SynetConvolution32fGemmNN class.
    • AVX-512F optimization of SynetConvolution32fWinograd class.
    • AVX-512F optimization of function Gemm32fNN.
    Bug fixing
    • Error in Base implementation of SynetMergedConvolution32f (type=CDC, add=1).
    • Error in function SimdAlignment.
    • Visual Studio 2017 compiler error in files SimdAvx512bwSynet.cpp, SimdAvx512bwSynetScale.cpp, SimdAvx512bwAlphaBlending.cpp.
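
As a reminder of what the Mish activation added in this release computes, here is a scalar reference sketch: mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + exp(x)). The function names and the threshold guard below are assumptions made for illustration; they are not taken from the library's source and do not reproduce its SIMD code paths.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar reference: mish(x) = x * tanh(ln(1 + exp(x))).
// 'threshold' is a hypothetical guard: for large x, softplus(x) is
// approximately x, so exp() is skipped to avoid overflow.
inline float MishRef(float x, float threshold = 20.0f)
{
    float softplus = x > threshold ? x : std::log1p(std::exp(x));
    return x * std::tanh(softplus);
}

// Applies the reference Mish to an array (hypothetical helper).
void MishRefArray(const float* src, size_t size, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < size; ++i)
        dst[i] = MishRef(src[i]);
}
```

A scalar reference of this kind is mainly useful as a baseline when checking the output of a vectorized implementation within a tolerance.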

    Test framework

    New features
    • Tests for verifying functionality of function SynetMish32f.
    • Tests for verifying functionality of GaussianBlur engine.
  • v4.6.96(Dec 1, 2020)

    Algorithms

    New features
    • Base implementation of function AveragingBinarizationV2.
    • SSE4.1, AVX2, AVX-512BW optimizations of function AlphaUnpremultiply.
    Improving
    • SSE2, AVX2, AVX-512BW and NEON optimizations of function MedianFilterSquare5x5.
    • SSE2, AVX2, AVX-512F optimizations of function SynetSoftmaxLayerForward.
    • Reducing the number of calls to function CpuSocketNumber at initialization of Simd.
    • Reducing the number of calls to function CpuCoreNumber at initialization of Simd.
    • Reducing the number of calls to function CheckBit at initialization of Simd.
    Bug fixing
    • Compilation error in file SimdNeonSynetConvolution8i.cpp.
    • Infinite loop in SynetConvolution32fNhwcDirect::OldReorderWeight (on Celeron CPU).
    • Crash in SimdRuntime.h (on Celeron CPU).
    • Crash in SimdGemm.h (on Celeron CPU).
    • Function SimdSynetSpecifyTensorFormat returns incorrect value.

    Test framework

    New features
    • Tests for verifying functionality of function AveragingBinarizationV2.
    • Parameter '-lc' to litter CPU cache between test runs.

    Infrastructure

    New features
    • MSVS projects can be used from external solution.
    Removing
    • Support of MSA (MIPS).
  • v4.6.95(Nov 4, 2020)

    Algorithms

    New features
    • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iCdc class.
    • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iCd class.
    • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iDc class.
    • SSE4.1, AVX2, AVX-512BW optimizations of function SynetConvert8uTo32f.
    • Base implementation, SSE2, SSSE3, AVX2, AVX-512BW optimizations of function AlphaPremultiply.
    • Base implementation of function AlphaUnpremultiply.
    Bug fixing
    • GCC v10 compilation error in file SimdGemm.h.
    • Error in IECompatible method of SynetMergedConvolution8i.
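
The AlphaPremultiply and AlphaUnpremultiply functions listed above deal with premultiplied alpha, where each colour channel is scaled by alpha / 255. The sketch below shows that arithmetic for 8-bit BGRA pixels. The function names, the round-to-nearest rule and the handling of alpha == 0 are assumptions for illustration; the library's exact rounding may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// c' = round(c * a / 255); rounding to nearest is an assumption.
inline uint8_t Premultiply(uint8_t c, uint8_t a)
{
    return static_cast<uint8_t>((c * a + 127) / 255);
}

// c = round(c' * 255 / a), clamped to 255; alpha == 0 yields 0 (assumed convention).
inline uint8_t Unpremultiply(uint8_t c, uint8_t a)
{
    if (a == 0)
        return 0;
    int value = (c * 255 + a / 2) / a;
    return static_cast<uint8_t>(value > 255 ? 255 : value);
}

// Premultiplies a packed BGRA buffer in place; the alpha channel is left unchanged.
void PremultiplyBgra(uint8_t* bgra, size_t pixelCount)
{
    assert(bgra != nullptr || pixelCount == 0);
    for (size_t i = 0; i < pixelCount; ++i, bgra += 4)
    {
        bgra[0] = Premultiply(bgra[0], bgra[3]);
        bgra[1] = Premultiply(bgra[1], bgra[3]);
        bgra[2] = Premultiply(bgra[2], bgra[3]);
    }
}
```

Because premultiplication discards precision at low alpha, a round trip through Premultiply and Unpremultiply does not always restore the original channel value.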

    Test framework

    New features
    • Tests for verifying functionality of function AlphaPremultiply.
    • Tests for verifying functionality of function AlphaUnpremultiply.

    Documentation

    Bug fixing
    • There are no references to C++ wrappers in description of API functions.
  • v4.6.94(Oct 1, 2020)

    Algorithms

    New features
    • Base implementation of SynetMergedConvolution8i class.
    • Base implementation of function SynetConvert8uTo32f.
    • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iCdc class.
    • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iCd class.
    • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iDc class.
    Bug fixing
    • Performance degradation in class Convolution32fNhwcDirect (weights size >> L3 cache).
    • Performance degradation in class Convolution32fGemmNN (weights size >> L3 cache).

    Test framework

    New features
    • Tests for verifying functionality of SynetMergedConvolution8i class.
    • Tests for verifying functionality of function SynetConvert8uTo32f.

    Documentation

    Improving
    • Improve structuring of Synet documentation.
  • v4.6.93(Sep 1, 2020)

    Algorithms

    New features
    • Full support of SimdConvolutionActivationType in SynetConvolution8i class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8iNhwcDepthwise class.
    • Extend class MergedConvolution32f (2 merged convolutions).
    • Base implementation, SSE2, AVX, AVX2, AVX-512F optimizations of MergedConvolution32fCd class.
    • Base implementation, SSE2, AVX, AVX2, AVX-512F optimizations of MergedConvolution32fDc class.
    Improving
    • Reduction of compilation time and binary size of Simd Library.
    Renaming
    • Class MergedConvolution32f to MergedConvolution32fCdc.
    Bug fixing
    • Performance degradation in class Convolution32fNhwcDirect (dilation != 1).

    Test framework

    New features
    • Tests for verifying functionality of class MergedConvolution32f (2 merged convolutions).
  • v4.6.92(Aug 3, 2020)

    Algorithms

    New features
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetAdd8i.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetInnerProduct8i.
    Improving
    • Reduction of compilation time and binary size of Simd Library.
    Bug fixing
    • Error in SSE4.1, AVX2, AVX-512BW optimizations of SynetScale8i class (wrong alignment check).
    • Error in performance annotation of SynetConvolution8i class.
    • Compiler error in file SimdBaseSynetConvolution8i.cpp (for old compilers).
    • Compiler errors in files SimdAvx2Synet.cpp, SimdAvx2SynetScale.cpp (WIN32, MSVS).

    Test framework

    New features
    • Tests for verifying functionality of function SynetAdd8i.
    • Tests for verifying functionality of function SynetInnerProduct8i.
  • v4.6.91(Jul 1, 2020)

    Algorithms

    New features
    • Extend SimdSynetCompatibilityType enumeration.
    • Add support of SimdSynetCompatibility8iNarrowed to Base implementation, SSE2, AVX2, AVX-512BW and NEON optimizations of function SynetConvert32fTo8u.
    • Add support of SimdSynetCompatibility8iNarrowed to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
    • Add support of SimdConvolutionActivationPrelu to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
    • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of SynetScale8i class.
    Improving
    • Reduction in size of applications or shared libraries that use Simd as a static library.
    Bug fixing
    • Error in class SynetConvolution8i (batch > 1).

    Test framework

    New features
    • Tests for verifying functionality of SynetScale8i framework.
  • v4.6.90(Jun 3, 2020)

    Algorithms

    New features
    • Rgb24 format in Frame structure.
    • Rgb24 format in Convert function.
    • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function RgbToGray.
    • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function RgbToBgra.
    • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function BgraToRgb.
    • AVX2 optimization of function BgraToBgr.
    • Function LitterCpuCache.
    • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv444pToRgb.
    • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv422pToRgb.
    • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv420pToRgb.
    Improving
    • NEON optimization of function BgrToGray.
    Bug fixing
    • Error in class SynetConvolution8i (group != 1).
    • Wrong assert condition in SSE2, AVX, AVX2, AVX-512F and NEON optimization of class Convolution32fNhwcDirect.
    • Compiler error when SIMD_AVX2_DISABLE macro is uncommented.
    • Int32 overflow in function SynetConvolution8i::SetParams.

    Test framework

    New features
    • Tests for verifying functionality of function RgbToGray.
    • Tests for verifying functionality of function RgbToBgra.
    • Tests for verifying functionality of function BgraToRgb.
    • Tests for verifying functionality of function Yuv444pToRgb.
    • Tests for verifying functionality of function Yuv422pToRgb.
    • Tests for verifying functionality of function Yuv420pToRgb.
  • v4.6.89(May 4, 2020)

    Algorithms

    Bug fixing
    • Microsoft Visual Studio 2013 compiler errors in files: SimdSynetConvolution8i.h, SimdSse2SynetConvolution32f.cpp, SimdAvx2Reduce.cpp.
    • Buffer overrun in SSE4.1, AVX2, NEON optimizations of SynetConvolution8iNhwcDirect class.
    • Visual Studio 2017 internal compiler error in function Avx512f::ConvolutionBiasAndActivation (Win32/Release).
    • Compiler error in NEON optimization of class SynetConvolution8iNhwcDirect (ARM, 32-bit).
    • Error in AVX2 optimization of function SynetScaleLayerForward.
    • Error in base implementation of SquaredDifferenceKahanSum32f (Visual Studio 2017).
    • Error in AVX-512BW optimization of class SynetConvolution8iNhwcDirect (Visual Studio 2017/2019, Release).
    • Error in class SynetConvolution32fNhwcDirect (large parameters srcC and dstC).

    Test framework

    Bug fixing
    • Microsoft Visual Studio 2013 compiler errors in files: TestTensor.h, TestSynetActivation.cpp.
    • Test report is not generated if the output directory does not exist.
    • Error in test SynetConvert32fTo8uAutoTest.

    Infrastructure

    New features
    • Script to test Simd compiled with different version of Microsoft Visual Studio.
    • New structure of Microsoft Visual Studio 2019 project files.
    Removing
    • Remove project files of Microsoft Visual Studio 2012.
  • v4.6.88(Apr 1, 2020)

    Algorithms

    New features
    • AVX-512VNNI extension support.
    • AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
    • Base implementation and SSE4.1, AVX2, AVX-512BW and NEON optimizations of function SynetPoolingForwardMax8u.
    Renaming
    • SynetPoolingForwardMax to SynetPoolingForwardMax32f.
    Improving
    • SSE4.1 optimization of SynetConvolution8iNhwcDirect class.
    • SSE2, AVX, AVX2, AVX-512F and NEON optimizations of SynetConvolution32fNhwcDirect class.
    Bug fixing
    • Microsoft Visual Studio 2015 compiler error in function SynetConvert32fTo8u.
    • Degradation of performance of AVX2 code.
    • Microsoft Visual Studio compiler error in function Extract64i (32-bit mode).

    Test framework

    New features
    • Tests for verifying functionality of function SynetPoolingForwardMax8u.
  • v4.5.87(Mar 2, 2020)

    Algorithms

    New features
    • Add parameter for bitwise compatibility of function SynetScaleLayerForward with Inference Engine.
    • Add parameter 'type' to function SynetShuffleLayerForward.
    • Base implementation, SSE2, AVX2, AVX-512BW and NEON optimizations of function SynetConvert32fTo8u.
    • SimdSynetCompatibilityType enumeration.
    • Base implementation of SynetConvolution8iGemmNN class.
    • Base implementation and SSE4.1 optimization of SynetConvolution8iNhwcDirect class.
    Renaming
    • SimdSynetConvertImage to SimdSynetReorderImage.
    • SimdSynetConvertFilter to SimdSynetReorderFilter.

    Test framework

    New features
    • New command-line test parameter -c: the number of channels in the test image for performance testing.
    • New command-line test parameter -mt: the minimal test execution time (in milliseconds).
    • Tests for verifying functionality of SynetConvolution8i framework.
    • Tests for verifying functionality of function SynetConvert32fTo8u.

    Documentation

    Bug fixing
    • Error in description of method Detection::LoadStringXml.
  • v4.5.86(Feb 3, 2020)

    Algorithms

    New features
    • SimdResizeMethodInferenceEngineInterp method in Resizer framework.
    Improving
    • Performance of Convolution32f framework (NHWC format, kernel=3x3, stride=1x1, large H and W).
    • Performance of AVX-512F and NEON optimizations of function GemmPackA.
    • Performance of Convolution32f framework (NHWC format, GemmNN method).
    • Performance of SSE2, AVX, AVX2, AVX-512F and NEON optimizations of Convolution32f framework (NHWC format, NhwcDirect method, kernel=1x1).
    • Performance of AVX-512F optimization of MergedConvolution32f framework (input convolution).
    • Performance of AVX2 and AVX-512F optimizations of MergedConvolution32f framework (output convolution).
    • Performance of Convolution32f framework (stride > 1).
    • Performance of AVX-512F optimization of Gemm32fNN function (add 6x64 and 6x48 micro kernel).
    Bug fixing
    • Error in AVX-512F optimization of function WinogradKernel3x3Block2x2SetOutput (NCHW format).
    • Error in SSE, AVX, AVX-512F and NEON optimizations of function SynetPoolingForwardAverage (NHWC format).
    • Error in AVX-512F optimization of function SynetInnerProductLayerForward.
    • Error in AVX, AVX2 and AVX-512F optimizations of function Gemm32fNT.
    • Error in function WinogradKernel3x3Block4x4SetInput (padX != padY != padW != padH).
    • Error in debug FLOPS annotation of Deconvolution32f framework.
    • MergedConvolution32f framework doesn't work with stride == 3.
  • v4.5.85(Jan 3, 2020)

    Algorithms

    New features
    • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetUnaryOperation32fLayerForward.
    • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetSoftplus32f.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetFilter.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetInput.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetOutput.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetFilter.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetInput.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetOutput.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetFilter.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetInput.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetOutput.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetFilter.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetInput.
    • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetOutput.
    Improving
    • Performance of Convolution32f framework (NHWC format, kernel=1x1x1).
    • Performance of Convolution32f framework (NHWC format, kernel=2x2).
    • Performance of Convolution32f framework (NHWC format, kernel=1x3).
    • Performance of Convolution32f framework (NHWC format, kernel=1x5).
    Renaming
    • NeuralSigmoid to SynetSigmoid32f.
    • NeuralTanh to SynetTanh32f.
    • NeuralRelu to SynetRelu32f.
    • Winograd2x3SetFilter to WinogradKernel3x3Block2x2SetFilter.
    • Winograd2x3SetInput to WinogradKernel3x3Block2x2SetInput.
    • Winograd2x3SetOutput to WinogradKernel3x3Block2x2SetOutput.
    • Winograd3x3SetFilter to WinogradKernel3x3Block3x3SetFilter.
    • Winograd3x3SetInput to WinogradKernel3x3Block3x3SetInput.
    • Winograd3x3SetOutput to WinogradKernel3x3Block3x3SetOutput.
    • Winograd4x4SetFilter to WinogradKernel3x3Block4x4SetFilter.
    • Winograd4x4SetInput to WinogradKernel3x3Block4x4SetInput.
    • Winograd4x4SetOutput to WinogradKernel3x3Block4x4SetOutput.
    Bug fixing
    • Error in Convolution32f framework (kernel greater than input size, NHWC format).
    • Potential crash in ContourDetector.

    Test framework

    New features
    • Tests for verifying functionality of function SynetUnaryOperation32fLayerForward.
    • Tests for verifying functionality of function SynetSoftplus32f.
    • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetFilter.
    • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetInput.
    • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetOutput.
    • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetFilter.
    • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetInput.
    • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetOutput.
    • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetFilter.
    • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetInput.
    • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetOutput.
    • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetFilter.
    • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetInput.
    • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetOutput.