A toolkit for making real world machine learning and data analysis applications in C++

Overview

dlib C++ library Travis Status

Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. See http://dlib.net for the main project documentation and API reference.

Compiling dlib C++ example programs

Go into the examples folder and type:

mkdir build; cd build; cmake .. ; cmake --build .

That will build all the examples. If you have a CPU that supports AVX instructions then turn them on like this:

mkdir build; cd build; cmake .. -DUSE_AVX_INSTRUCTIONS=1; cmake --build .

Doing so will make some things run faster.

Finally, Visual Studio users should usually do everything in 64bit mode. By default Visual Studio is 32bit, both in its outputs and its own execution, so you have to explicitly tell it to use 64bits. Since it's not the 1990s anymore you probably want to use 64bits. Do that with a cmake invocation like this:

cmake .. -G "Visual Studio 14 2015 Win64" -T host=x64 

Compiling your own C++ programs that use dlib

The examples folder has a CMake tutorial that tells you what to do. There are also additional instructions on the dlib web site.

Alternatively, if you are using the vcpkg dependency manager you can download and install dlib with CMake integration in a single command:

vcpkg install dlib

Compiling dlib Python API

Before you can run the Python example programs you must compile dlib. Type:

python setup.py install

Running the unit test suite

Type the following to compile and run the dlib unit test suite:

cd dlib/test
mkdir build
cd build
cmake ..
cmake --build . --config Release
./dtest --runall

Note that on windows your compiler might put the test executable in a subfolder called Release. If that's the case then you have to go to that folder before running the test.

This library is licensed under the Boost Software License, which can be found in dlib/LICENSE.txt. The long and short of the license is that you can use dlib however you like, even in closed source commercial software.

dlib sponsors

This research is based in part upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA) under contract number 2014-14071600010. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, or the U.S. Government.

Issues
  • YOLO loss

    YOLO loss

    Hi, I've been spending the last few days trying to make a loss_yolo layer for dlib, in particular the loss presented in the YOLOv3 paper.

    I think I came up with a pretty straightforward implementation but, as of now, it still does not work.

    I wondered if you could have a look. I am quite confident the loss implementation is correct, however I think I might be making some assumptions about the dlib API when the loss layer takes several inputs from the network.

    I tried to make the loss similar to the loss_mmod in the way you set the options of the layer, etc.

    So, my question is, does this way of coding the loss in dlib make sense for multiple outputs? Or is dlib doing something I don't expect?

    There's also a simple example program that takes a path containing a training.xml file (like the one from the face or vehicle detection examples).

    Thanks in advance :)

    opened by arrufat 94
  • QUESTION : is yolov3 possible in DLIB

    QUESTION : is yolov3 possible in DLIB

    I am trying to define yolov3 using dlib's dnn module. I'm stuck with the darknet53 backbone, as I want it to output the outputs of the last three layers. So far i have this:

    using namespace dlib;
                        
    template <int outc, int kern, int stride, typename SUBNET> 
    using conv_block = leaky_relu<affine<con<outc,kern,kern,stride,stride,SUBNET>>>;
    
    template <int inc, typename SUBNET>
    using resblock = add_prev1<conv_block<inc,3,1,conv_block<inc/2,1,1,tag1<SUBNET>>>>;
    
    template<int nblocks, int outc, typename SUBNET>
    using conv_resblock = repeat<nblocks, resblock<outc,
                          conv_block<outc, 3, 2, SUBNET>>>;
    
    template<typename SUBNET>
    using darknet53 = tag3<conv_resblock<4, 1024,
                      tag2<conv_resblock<8, 512,
                      tag1<conv_resblock<8, 256,
                      conv_resblock<2, 128,
                      conv_resblock<1, 64,
                      conv_block<32, 3, SUBNET
                      >>>>>>>>>;
    

    Is it possible for darknet53 to output tag1, tag2 and tag3?

    opened by pfeatherstone 83
  • Optimize dlib for POWER8 VSX

    Optimize dlib for POWER8 VSX

    Enable and optimize support for POWER8 VSX SIMD instructions on PPC64LE Linux to dlib/simd.

    $$$$ Financial bounties available. Any reasonable suggested value will be seriously considered.

    I welcome contact / replies from developers in the dlib community who are interested to work on this project.

    paid-bounty 
    opened by edelsohn 79
  • Example of DCGAN

    Example of DCGAN

    Hi, I would like to contribute a DCGAN example to dlib.

    I have implemented a version of Pytorch DCGAN for C++.

    However, I would need some guidance with some things I don't know how to do. I am wondering on how I should proceed. Should I attach my current code here (around 150 lines), or make a pull request, even if the code is not able to learn anything? Maybe @edubois can help out, since he stated that he managed to make it work on https://github.com/davisking/dlib/issues/1261

    Thanks for your hard work on dlib.

    enhancement 
    opened by arrufat 67
  • Arbitrary sized FFTs using modified kissFFT as default backend and MKL otherwise

    Arbitrary sized FFTs using modified kissFFT as default backend and MKL otherwise

    This PR adds a C++ port of kissFFT as the default backend of dlib's fft routines. All the kiss code is inserted into the dlib namespace to avoid conflicts with user code that may or may not be using original (C) kissFFT. All MKL fft code is put into it's own separate header file. Now dlib/matrix_fft.h simply calls the right wrappers depending on a pre-processor macro (as before). I've removed the FFTW wrapper code as per @davisking's request as the fftw planner isn't thread safe. I've also removed the original fft backend code, again, as per @davisking's request.

    Note that this PR isn't quite ready yet. Need more unit tests. Not quite sure why MKL wrappers only worked for matrix<complex<double>> before. There's no reason why it couldn't work for matrix<complex<float>> if using DFTI_SINGLE instead of DFTI_DOUBLE. (maybe user code had to do a cast?) So need more units tests for MKL wrappers.

    Not quite happy with the number of copies at the moment. When passing const matrix_exp<EXP>& data as input, there are 2 copies: one to evaluate the matrix expression to matrix<typename EXP::type> and one to do either the in-place fft (copy is done in the implementation details of the fft functions) or out-of-place fft (copy is done explicitly in dlib/matrix_fft.h). For example, if you want to do the fft of std::vector<std::complex>, you have to use dlib::mat which returns a matrix expression then dlib::fft will evaluate that to dlib::matrix<std::complex<float>>. So there is a copy of std::vector to dlib::matrix that is unnecessary. Could add some function overloads to support std::vector. Don't know what's the best thing to do without dirtying the API. Could benefit from @davisking's advice.

    I think if we've gone this far, we should support real FFTs so the requirement that input should be complex vanishes. Output will be complex though. Both KISS and MKL support this so there shouldn't be a lot of leg work. More unit tests.

    opened by pfeatherstone 56
  • Add dnn self supervised learning example

    Add dnn self supervised learning example

    Hi, recently I've been interested in self-supervised learning (SSL) methods in deep learning. I noticed that dlib has support for unsupervised loss layers by just leaving out the training_label_type typedef and the truth iterator argument to compute_loss_value_and_gradient().

    However, most methods I've read about and tried are quite complicated (hard-negative mining, predictors, stop gradient, exponential moving average of the weights between both architectures) due to the need of breaking the symmetry between both branches to avoid collapse. But recently, I came across a simple method named Barlow Twins: Self-Supervised Learning via Redundancy Reduction. The idea is as follows:

    1. forward two augmented versions of an image
    2. compute the empirical cross-correlation matrix between both feature representations
    3. make that matrix as close to the identity as possible.

    That prevents collapse, since it's making each individual dimension of the feature vectors to focus on different tasks, and thus avoiding redundancies between dimensions (it needs a high dimension for it to work, since it relies on sparse representations).

    So far, I've implemented:

    input_rgb_image_pair

    It takes a std::vector<std::pair<matrix<rgb_pixel>, matrix<rgb_pixel>>> and puts the first elements of the pairs in the first half of the batch and the second elements of the batch in the second half of the batch. This allows computing batch normalization on each half efficiently (done on the loss layer).

    loss_barlow_twins

    I tried to follow the official paper as close as possible, and made use of the awesome Matrix Calculus site for the gradients (having element wise operations, (off-)diagonal terms, and summations was a bit tedious to do manually 😅)

    But… I am experiencing some difficulties:

    • if I use a dnn_trainer I get a segfault (but if I train manually, like in the code, it works: at least the loss goes down)
    • if I use a batch size larger than 64, I also get a segfault (but I have plenty of RAM/VRAM)

    So maybe I misunderstood something about how to implement the input layer or the unsupervised loss layer… If you could have a look at some point… You can just run the example by giving a path to a folder containing the CIFAR-10 dataset.

    It would be great to have an example of SSL on dlib.

    Thanks in advance (all the code is inside the example program, I will make a proper PR if we manage to get this work, and you are interested, since the method is fairly new)

    opened by arrufat 54
  • Semantic Segmentation Functionality

    Semantic Segmentation Functionality

    I'm interested in using dlib for semantic segmentation. I think the only necessary features would be:

    • Loss function - it doesn't look like loss_multiclass_log would work for this
    • "Unpooling" upsampling layer - there are a few ways to do this

    Is this something you're interested in supporting? I'm planning to add this functionality regardless, just wanted to check in.

    Regarding implementation, would the upsampling be something you would want added to the pooling class or its own upsample class?

    enhancement help wanted 
    opened by davidmascharka 53
  • Adding an install target to dlib's CMakeLists

    Adding an install target to dlib's CMakeLists

    This PR covers the first part of items discussed in #34, namely, the installation.

    A follow-up PR (hopefully tomorrow) should cover the 2nd part, namely, the generation and installation of dlibConfig.cmake

    opened by severin-lemaignan 51
  • Add support for fused convolutions

    Add support for fused convolutions

    I've been playing a bit with the idea of having fused convolutions (convolution + batch_norm) in dlib. I think the first step would be to move all the operations that are done by the affine_ layer into the convolution, that is, update the bias of the convolution and re-scale the filters.

    This PR adds some helper methods that allow doing this. The next step could be adding a new layer that can be constructed from an affine_ layer and it's a no-op, like the tag layers, or add a version of the affine layer that does nothing (just outputs its input, without copying or anything). How would you approach this?

    Finally, here's an example that uses a visitor to update the convolutions that are below an affine layer. It can be build from by putting the file into the examples folder and loading the pretrained resnet 50 from the dnn_introduction3_ex.cpp. If we manage to make something interesting out of it, maybe it would be interesting to have this visitor, too.

    #include "resnet.h"
    
    #include <dlib/dnn.h>
    #include <dlib/image_io.h>
    
    using namespace std;
    using namespace dlib;
    
    class visitor_fuse_convolutions
    {
        public:
        template <typename T> void fuse_convolutions(T&) const
        {
            // disable other layer types
        }
    
        // handle the standard case (convolutional layer followed by affine;
        template <long nf, long nr, long nc, int sy, int sx, int py, int px, typename U, typename E>
        void fuse_convolutions(add_layer<affine_, add_layer<con_<nf, nr, nc, sy, sx, py, px>, U>, E>& l)
        {
            // get the parameters from the affine layer as alias_tensor_instance
            auto gamma = l.layer_details().get_gamma();
            auto beta = l.layer_details().get_beta();
    
            // get the convolution below the affine layer and its paramaters
            auto& conv = l.subnet().layer_details();
            const long num_filters_out = conv.num_filters();
            const long num_rows = conv.nr();
            const long num_cols = conv.nc();
            tensor& params = conv.get_layer_params();
            // guess the number of input filters
            long num_filters_in;
            if (conv.bias_is_disabled())
                num_filters_in = params.size() / num_filters_out / num_rows / num_cols;
            else
                num_filters_in = (params.size() - num_filters_out) / num_filters_out / num_rows / num_cols;
    
            // set the new number of parameters for this convolution
            const size_t num_params = num_filters_in * num_filters_out * num_rows * num_cols + num_filters_out;
            alias_tensor filters(num_filters_out, num_filters_in, num_rows, num_cols);
            alias_tensor biases(1, num_filters_out);
            if (conv.bias_is_disabled())
            {
                conv.enable_bias();
                resizable_tensor new_params = params;
                new_params.set_size(num_params);
                biases(new_params, filters.size()) = 0;
                params = new_params;
            }
    
            // update the biases
            auto b = biases(params, filters.size());
            b+= mat(beta);
    
            // rescale the filters
            DLIB_CASSERT(filters.num_samples() == gamma.k());
            auto t = filters(params, 0);
            float* f = t.host();
            const float* g = gamma.host();
            for (long n = 0; n < filters.num_samples(); ++n)
            {
                for (long k = 0; k < filters.k(); ++k)
                {
                    for (long r = 0; r < filters.nr(); ++r)
                    {
                        for (long c = 0; c < filters.nc(); ++c)
                        {
                            f[tensor_index(t, n, k, r, c)] *= g[n];
                        }
                    }
                }
            }
    
            // reset the affine layer
            gamma = 1;
            beta = 0;
        }
    
        template <typename input_layer_type>
        void operator()(size_t , input_layer_type& l) const
        {
            // ignore other layers
        }
    
        template <typename T, typename U, typename E>
        void operator()(size_t , add_layer<T, U, E>& l)
        {
            fuse_convolutions(l);
        }
    };
    
    int main(const int argc, const char** argv)
    try
    {
        resnet::infer_50 net1, net2;
        std::vector<std::string> labels;
        deserialize("resnet50_1000_imagenet_classifier.dnn") >> net1 >> labels;
        net2 = net1;
        matrix<rgb_pixel> image;
        load_image(image, "elephant.jpg");
    
        const auto& label1 = labels[net1(image)];
        const auto& out1 = net1.subnet().get_output();
        resizable_tensor probs(out1);
        tt::softmax(probs, out1);
        cout << "pred1: " << label1 << " (" << max(mat(probs)) << ")" << endl;
    
    
        // fuse the convolutions in the network
        dlib::visit_layers_backwards(net2, visitor_fuse_convolutions());
        const auto& label2 = labels[net2(image)];
        const auto& out2 = net2.subnet().get_output();
        tt::softmax(probs, out2);
        cout << "pred2: " << label2 << " (" << max(mat(probs)) << ")" << endl;
    
        cout << "max abs difference: " << max(abs(mat(out1) - mat(out2))) << endl;
        DLIB_CASSERT(max(abs(mat(out1) - mat(out2))) < 1e-2);
    }
    catch (const exception& e)
    {
        cout << e.what() << endl;
        return EXIT_FAILURE;
    }
    

    output with this image (elephant.jpg): elephant

    pred1: African_elephant (0.962677)
    pred2: African_elephant (0.962623)
    max abs difference: 0.00436211
    

    UPDATE: make visitor more generic and show a results with a real image

    enhancement 
    opened by arrufat 50
  • Trying to compile dlib 19.20 with cuda 11 and cudnn 8, or cuda 10.1 and cudnn 7.6.4

    Trying to compile dlib 19.20 with cuda 11 and cudnn 8, or cuda 10.1 and cudnn 7.6.4

    Environment: Pop-Os (Ubuntu 20.04) Gcc 9 and 8 (tried both) Cmake 3.16.3 dlib 19.20.99 python 3.8.2 NVIDIA Quadro P600

    Expected Behavior

    Compiling with cuda..

    Current Behavior

    ..... -- Found cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so -- Building a CUDA test project to see if your compiler is compatible with CUDA... -- Checking if you have the right version of cuDNN installed. -- *** Found cuDNN, but it looks like the wrong version so dlib will not use it. *** -- *** Dlib requires cuDNN V5.0 OR GREATER. Since cuDNN is not found DLIB WILL NOT USE CUDA. .....

    • Where did you get dlib: git clone https://github.com/davisking/dlib.git

    I first tried downloading the latest (and suggested) version of cuda from Nvidia site, which was Cuda 11. then downloaded cuDNN for Cuda 11, which was 8.0.0 (runtime and dev .deb packets) installed them following Nvidia method.

    when compiling "cudnn_samples_v8" everything works, so I think installations went ok. but no way to get dlib compiled with Cuda.

    I've tried to uninstall cuda 11 and cudnn 8 and install cuda 10.1 and cudnn 7.6 (suggested for cuda 10.1) but the result is the same. Every time I erased the build directory, and compiled in 2 ways: as per dlib instructions, using: $ sudo python3 setup.py install or: $ cmake .. -DDLIB_USE_CUDA=1 -DUSE_AVX_INSTRUCTIONS=1

    Any suggestion? Thanks.

    inactive 
    opened by spiderdab 48
  • Arbitrary FFT sizes

    Arbitrary FFT sizes

    Feature suggestion: allow the fft routine to do arbitrary sized FFTs. Let FFTW, lapack, BLAS or whatever do the magic of choosing the correct kernels for the input

    enhancement 
    opened by pfeatherstone 47
  • Attempting to modernise the cmake build script

    Attempting to modernise the cmake build script

    This PR requires CMake 3.17 or higher. Maybe this is too new. If anything, this is a good exercise in trying to understand cmake in more detail. The key points are:

    • use the target_xxx() functions
    • don't need variables for needed libraries, includes or sources. You can just use cmake functions to set things on targets
    • use the built-in helpers for findings libraries like PNG, JPEG, GIF, SQLITE, even BLAS, LAPACK and CUDA It's not ready yet of course.
    opened by pfeatherstone 3
  •  want to contribute to creating a pipeline that converts Dlib deeplearning models to ONNX standard format.

    want to contribute to creating a pipeline that converts Dlib deeplearning models to ONNX standard format.

    Dlib machine learning platform is powerful and great. And thank you for allowing us to use these powerful platforms. As you know, these days, deep learning frameworks are standardized and changing with ONNX format. We want to contribute to creating a pipeline that converts Dlib deeplearning models to ONNX standard format. Please give us a lot of advice.

    opened by intellizd 5
  • extract_highdim_face_lbp_descriptors  LFW result

    extract_highdim_face_lbp_descriptors LFW result

    Hi, I want to thank you for your efforts. I tested extract_highdim_face_lbp_descriptors on LFW face dataset. However result is very bad. In the original paper (Blessing of Dimensionality: High-dimensional Feature and Its Efficient Compression for Face Verification) Table 1, verification accuracy is more than 93% , but it is 68 % from this function. where is the error?

    sample code:

    	cv_image<unsigned char> img(img1);  // where img1 is cv::Mat gray image 
    	array2d<unsigned char> img2;
    	assign_image(img2, img);
    	full_object_detection shape(faceRect, Points68);
    	std::vector<double>feats;
    	extract_highdim_face_lbp_descriptors(img2, shape, feats);
    

    also I displayed faceRect and points68 and all things go fine image

    Thanks in advance

    opened by HeshamAMH 0
  • How to train objects to detect the shape of an electronic chip?

    How to train objects to detect the shape of an electronic chip?

    Hello: I would like to use dlib training to detect this shape object 100 * 100 size target image, search the large image is 2448 * 2048 size, is there a similar example?

    opened by w136111526 6
C++ image processing and machine learning library with using of SIMD: SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX-512, VMX(Altivec) and VSX(Power7), NEON for ARM.

Introduction The Simd Library is a free open source image processing and machine learning library, designed for C and C++ programmers. It provides man

Ihar Yermalayeu 1.6k Aug 4, 2022
A computationally efficient and convenient toolkit of iterated Kalman filter.

IKFoM IKFoM (Iterated Kalman Filters on Manifolds) is a computationally efficient and convenient toolkit for deploying iterated Kalman filters on vari

HKU-Mars-Lab 182 Aug 5, 2022
The CImg Library is a small and open-source C++ toolkit for image processing

http://cimg.eu The CImg Library is a small and open-source C++ toolkit for image processing, designed with these properties in mind: CImg defines clas

David Tschumperlé 1.1k Aug 8, 2022
a generic C++ library for image analysis

VIGRA Computer Vision Library Copyright 1998-2013 by Ullrich Koethe This file is part of the VIGRA computer vision library. You may use,

Ullrich Koethe 368 Jun 25, 2022
Reading, writing, and processing images in a wide variety of file formats, using a format-agnostic API, aimed at VFX applications.

README for OpenImageIO Introduction The primary target audience for OIIO is VFX studios and developers of tools such as renderers, compositors, viewer

OpenImageIO 1.5k Jul 27, 2022
a Vulkan real-time rendering engine (Windows and Linux).

Luz Engine A Vulkan engine that I'm developing to study and implement modern rendering techniques used by AAA games. Videos on Youtube Features How to

Hadryan Salles 47 Aug 5, 2022
This is a C++17 deployment of deep-learning based image inpainting algorithm on Windows10, using Libtorch, Opencv and Qt.

This is a desktop software for image inpainting. It is a C++ deployment of image inpainting algorithm on Windows10, based on C++17 and implemented using vs2019.

null 4 May 13, 2022
Embed image data directly to HTML files.

compact_html Welcome! Embed image data directly to HTML files. Thanks: cpp-base64: Base64 encoding and decoding with c++. cpprestsdk: The C++ REST SDK

Frost Sigh 41 Jul 19, 2022
Video, Image and GIF upscale/enlarge(Super-Resolution) and Video frame interpolation. Achieved with Waifu2x, SRMD, RealSR, Anime4K, RIFE, CAIN, DAIN and ACNet.

Video, Image and GIF upscale/enlarge(Super-Resolution) and Video frame interpolation. Achieved with Waifu2x, SRMD, RealSR, Anime4K, RIFE, CAIN, DAIN and ACNet.

Aaron Feng 7.3k Aug 1, 2022
CGIF, A fast and lightweight GIF encoder that can create GIF animations and images

CGIF, a GIF encoder written in C A fast and lightweight GIF encoder that can create GIF animations and images. Summary of the main features: user-defi

Daniel Löbl 66 Jul 27, 2022
libspng is a C library for reading and writing PNG format files with a focus on security and ease of use.

libspng (simple png) is a C library for reading and writing Portable Network Graphics (PNG) format files with a focus on security and ease of use.

Randy 520 Aug 6, 2022
A crude untested example showing how to retrieve and display images from multiple cameras with OpenCV and wxWidgets.

About wxOpenCVCameras is a crude untested example of how to retrieve and display images from multiple cameras, using OpenCV to grab images from a came

PB 6 Jul 13, 2022
Generate Height map with Generator (OpenGL and imgui) and Construct Splat Map with generated height map using Algorithm

Generate Height map with Generator (OpenGL and imgui) and Construct Splat Map with generated height map using Algorithm(DPS, BFS, Gradient Descent ... etc) . At Renderer, with height map and blend map which are generated in front of this stage, render high quality terrain with OpenGL

Snowapril 35 Mar 22, 2022
PNG encoder and decoder in C and C++.

PNG encoder and decoder in C and C++, without dependencies

Lode Vandevenne 1.7k Aug 4, 2022
An image and texture viewer for tga, png, apng, exr, dds, gif, hdr, jpg, tif, ico, webp, and bmp files

An image and texture viewer for tga, png, apng, exr, dds, gif, hdr, jpg, tif, ico, webp, and bmp files. Uses Dear ImGui, OpenGL, and Tacent. Useful for game devs as it displays information like the presence of an alpha channel and querying specific pixels for their colour.

Tristan Grimmer 140 Aug 4, 2022
A bunch of functions and helpers classes for D3D12, and DXGI libraries

TypedD3D A bunch of functions and helpers classes for D3D12, and DXGI libraries Namespaces Helpers A bunch of helper functions that interface more wit

Renzy Alarcon 6 Jun 17, 2022
Tiny ISO-compliant C++ EXIF and XMP parsing library for JPEG.

TinyEXIF: Tiny ISO-compliant C++ EXIF and XMP parsing library for JPEG Introduction TinyEXIF is a tiny, lightweight C++ library for parsing the metada

cDc 78 Jul 31, 2022
Video++, a C++14 high performance video and image processing library.

Video++ Video++ is a video and image processing library taking advantage of the C++14 standard to ease the writing of fast video and image processing

Matthieu Garrigues 683 Jul 28, 2022