Header-only library for using Keras models in C++.





Use Keras models in C++ with ease



Would you like to build/train a model using Keras/Python? And would you like to run the prediction (forward pass) on your model in C++ without linking your application against TensorFlow? Then frugally-deep is exactly for you.


frugally-deep

  • is a small header-only library written in modern and pure C++.
  • is very easy to integrate and use.
  • depends only on FunctionalPlus, Eigen and json - also header-only libraries.
  • supports inference (model.predict) not only for sequential models but also for computational graphs with a more complex topology, created with the functional API.
  • re-implements a (small) subset of TensorFlow, i.e., the operations needed to support prediction.
  • results in a much smaller binary size than linking against TensorFlow.
  • works out-of-the-box also when compiled into a 32-bit executable. (Of course, 64 bit is fine too.)
  • utterly ignores even the most powerful GPU in your system and uses only one CPU core per prediction. ;-)
  • but is quite fast on one CPU core compared to TensorFlow, and you can run multiple predictions in parallel, thus utilizing as many CPUs as you like to improve the overall prediction throughput of your application/pipeline.

Supported layer types

Layer types typically used in image recognition/generation are supported, making many popular model architectures possible (see Performance section).

  • Add, Concatenate, Subtract, Multiply, Average, Maximum
  • AveragePooling1D/2D, GlobalAveragePooling1D/2D
  • Bidirectional, TimeDistributed, GRU, LSTM, CuDNNGRU, CuDNNLSTM
  • Conv1D/2D, SeparableConv2D, DepthwiseConv2D
  • Cropping1D/2D, ZeroPadding1D/2D
  • BatchNormalization, Dense, Flatten
  • Dropout, AlphaDropout, GaussianDropout, GaussianNoise
  • SpatialDropout1D, SpatialDropout2D, SpatialDropout3D
  • MaxPooling1D/2D, GlobalMaxPooling1D/2D
  • ELU, LeakyReLU, ReLU, SeLU, PReLU
  • Sigmoid, Softmax, Softplus, Tanh
  • UpSampling1D/2D
  • Reshape, Permute
  • Embedding

Also supported

  • multiple inputs and outputs
  • nested models
  • residual connections
  • shared layers
  • variable input shapes
  • arbitrarily complex model architectures / computational graphs
  • custom layers (by passing custom factory functions to load_model)

Currently not supported are the following:

ActivityRegularization, AveragePooling3D, Conv2DTranspose, Conv3D, ConvLSTM2D, Cropping3D, Dot, GRUCell, LocallyConnected1D, LocallyConnected2D, LSTMCell, Masking, MaxPooling3D, RepeatVector, RNN, SimpleRNN, SimpleRNNCell, StackedRNNCells, ThresholdedReLU, UpSampling3D, temporal models


  1. Use Keras/Python to build (model.compile(...)), train (model.fit(...)) and test (model.evaluate(...)) your model as usual. Then save it to a single HDF5 file using model.save('....h5', include_optimizer=False). The image_data_format in your model must be channels_last, which is the default when using the TensorFlow backend. Models created with a different image_data_format and other backends are not supported.

  2. Now convert it to the frugally-deep file format with keras_export/convert_model.py

  3. Finally load it in C++ (fdeep::load_model(...)) and use model.predict(...) to invoke a forward pass with your data.

The following minimal example shows the full workflow:

# create_model.py
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(4,))
x = Dense(5, activation='relu')(inputs)
predictions = Dense(3, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(loss='categorical_crossentropy', optimizer='nadam')

model.fit(
    np.asarray([[1, 2, 3, 4], [2, 3, 4, 5]]),
    np.asarray([[1, 0, 0], [0, 0, 1]]), epochs=10)

model.save('keras_model.h5', include_optimizer=False)

python3 keras_export/convert_model.py keras_model.h5 fdeep_model.json

// main.cpp
#include <fdeep/fdeep.hpp>
#include <iostream>
#include <vector>
int main()
{
    const auto model = fdeep::load_model("fdeep_model.json");
    const auto result = model.predict(
        {fdeep::tensor(fdeep::tensor_shape(static_cast<std::size_t>(4)),
        std::vector<float>{1, 2, 3, 4})});
    std::cout << fdeep::show_tensors(result) << std::endl;
}

When using convert_model.py a test case (input and corresponding output values) is generated automatically and saved along with your model. fdeep::load_model runs this test to make sure the results of a forward pass in frugally-deep are the same as in Keras.
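The idea behind this self-test can be pictured as an element-wise comparison within a tolerance. The following Python sketch only illustrates that idea: the function name `outputs_match`, the sample values and the tolerance of `1e-4` are assumptions made for the example, not the library's actual implementation or threshold.

```python
import math

def outputs_match(expected, actual, tol=1e-4):
    """Return True if both flat lists have equal length and agree within tol."""
    if len(expected) != len(actual):
        return False
    return all(math.isclose(e, a, abs_tol=tol, rel_tol=0.0)
               for e, a in zip(expected, actual))

keras_output = [0.1, 0.7, 0.2]           # stored reference values (made up here)
fdeep_output = [0.10001, 0.69999, 0.2]   # values from the C++ forward pass (made up here)
print(outputs_match(keras_output, fdeep_output))
```

An absolute tolerance is used because small floating-point differences between two implementations are expected even when both are correct.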

For more integration examples please have a look at the FAQ.


Below you can find the average durations of multiple consecutive forward passes for some popular models run on a single core of an Intel Core i5-6600 CPU @ 3.30GHz. frugally-deep and TensorFlow were compiled (GCC ver. 7.1) with g++ -O3 -march=native. The processes were started with CUDA_VISIBLE_DEVICES='' taskset --cpu-list 1 ... to disable the GPU and to only allow usage of one CPU. (See the Dockerfile used.)

| Model        | Keras + TF | frugally-deep |
|--------------|------------|---------------|
| DenseNet121  | 0.11 s     | 0.29 s        |
| DenseNet169  | 0.13 s     | 0.36 s        |
| DenseNet201  | 0.16 s     | 0.49 s        |
| InceptionV3  | 0.17 s     | 0.35 s        |
| MobileNet    | 0.06 s     | 0.20 s        |
| MobileNetV2  | 0.06 s     | 0.22 s        |
| NASNetLarge  | 1.38 s     | 4.83 s        |
| NASNetMobile | 0.14 s     | 0.40 s        |
| ResNet101    | 0.24 s     | 0.50 s        |
| ResNet101V2  | 0.21 s     | 0.47 s        |
| ResNet152    | 0.32 s     | 0.72 s        |
| ResNet152V2  | 0.30 s     | 0.69 s        |
| ResNet50     | 0.14 s     | 0.28 s        |
| ResNet50V2   | 0.12 s     | 0.25 s        |
| VGG16        | 0.41 s     | 0.63 s        |
| VGG19        | 0.52 s     | 0.76 s        |
| Xception     | 0.35 s     | 1.26 s        |
Requirements and Installation

  • A C++14-compatible compiler. Compilers from these versions on are fine: GCC 4.9, Clang 3.7 (libc++ 3.7) and Visual C++ 2015
  • Python 3.7 or higher
  • TensorFlow 2.4.0

Guides for different ways to install frugally-deep can be found in INSTALL.md.


See FAQ.md


The API of this library still might change in the future. If you have any suggestions, find errors or want to give general feedback/criticism, I'd love to hear from you. Of course, contributions are also very welcome.


Distributed under the MIT License. (See accompanying file LICENSE or at https://opensource.org/licenses/MIT)

  • Problem with results of siamese CNN using EfficientNet


    Hi there,

    First of all let me thank you for this fantastic library!

    Recently I got stuck on converting a siamese network that uses a functional model and the EfficientNetB0 architecture. I'm strictly following this repo for my development: https://github.com/sajadamouei/Person-Re-ID-with-light-weight-network. Since EfficientNetB0 uses FixedDropout and reduce layers that shrink the dimensionality (which requires multiplying tensors by a 1x1xDEPTH Conv), I had to implement them myself in the library. When I convert EfficientNetB0 on its own and load it in my C++ app, the output is EXACTLY as expected on both the Python and the C++ side, no problems there. However, when I try to create a siamese network out of them as presented here: https://github.com/sajadamouei/Person-Re-ID-with-light-weight-network/blob/master/model.py I get totally different results.

    In anticipation of your question: yes, I made super sure that the inputs to the network are EXACTLY the same on both sides, Python and C++. I've tried everything to fix this and concluded that there must be something wrong with either the way frugally-deep deals with functional models OR the converter itself. What I also noticed is that the tensors look completely different when they reach both Flatten layers in the architecture. Any ideas why this may be happening? Please look at the screenshots below to better understand the problem.

    [Screenshots: github_plane; Screenshot 2022-03-06 at 11 57 21]

    opened by pavel123 37
  • Hash value for json/net loaded?


    I think it would be handy for us to have a hash over a loaded model (so I could store, together with the results, some indication of how they were generated - particularly handy for encodings, which tend to be incompatible). I could simply calculate a hash over the file/string used to initialise the net, but since many files could potentially result in the same net it would be nicer if the net itself could provide such a hash. Is such a function implemented or, if not, do you see an easy way to get such a hash?
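
One way to approximate such a hash outside the library is to hash a canonical re-serialization of the model JSON, so that whitespace and key order in the file do not matter. This is only a sketch: `model_hash` is a hypothetical helper, not a frugally-deep function, and it does not detect every pair of files that describe the same net (for example, differently named but otherwise identical layers still hash differently).

```python
import hashlib
import json

def model_hash(model_json_text):
    """SHA-256 over a canonical re-serialization of the model JSON.

    Sorting keys and fixing separators makes the digest independent of
    whitespace and key order in the file.
    """
    parsed = json.loads(model_json_text)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = '{"name": "net", "layers": [1, 2]}'
b = '{ "layers": [1, 2],   "name": "net" }'
print(model_hash(a) == model_hash(b))
```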



    opened by utcke 36
  • How to convert model with "relu6" layer?

    My Keras model uses a "relu6" layer. How should I change convert_model.py to produce the json file? And are there any examples for adding a custom layer in fdeep::load_model?
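
For reference, relu6 is a ReLU whose output is capped at 6. The sketch below shows only the math on a single value; it says nothing about how convert_model.py or a custom layer factory would have to be changed.

```python
def relu6(x):
    """ReLU capped at 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)

print([relu6(v) for v in (-2.0, 0.5, 3.0, 7.5)])
```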

    Thank you very much!

    opened by binlbl 32
  • Using Eigen Unsupported modules to improve convolutions


    I noticed that Eigen 3.3 has unsupported modules, including modules for Tensors and gemm operations.


    I noticed you implement your own gemm operation in fdeep/convolution.hpp in function convolve_im2col. This could be improved by using gemm functions from the eigen unsupported modules.

    I ran a test by inferring the UNet model from pix2pix in frugally deep. It took 18s compared to a model converted from onnx and inferred in OpenCV which took 3s. I think this shows that convolutions in frugally could be improved.
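
For readers unfamiliar with the im2col approach mentioned above, here is a toy sketch of the idea: every kernel-sized patch is unfolded into a row, so the convolution becomes a matrix product that a gemm routine can execute. The sketch handles one channel, stride 1 and no padding; the names `im2col` and `conv2d_im2col` are hypothetical, and this is not frugally-deep's code. Like Keras, it computes a cross-correlation (the kernel is not flipped).

```python
def im2col(image, kh, kw):
    """Unfold each kh x kw patch of a 2D list into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    return [
        [image[y + dy][x + dx] for dy in range(kh) for dx in range(kw)]
        for y in range(h - kh + 1)
        for x in range(w - kw + 1)
    ]

def conv2d_im2col(image, kernel):
    """2D cross-correlation as a matrix-vector product over the unfolded patches."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - kw + 1
    products = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in im2col(image, kh, kw)]
    # Fold the flat result back into rows of the output image.
    return [products[i:i + out_w] for i in range(0, len(products), out_w)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_im2col(image, kernel))  # [[6, 8], [12, 14]]
```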


    opened by pfeatherstone 32
  • Slow-ish run time on MSVC



    First of all thank you for this great library! :-) I've got a fairly small model (18 layers) for real-time applications, mainly consisting of 5 blocks of Conv2D/ReLU/MaxPool2D, with input size 64x64x3. I'm unfortunately seeing some speed problems with fdeep. A forward pass takes around 11ms in Keras, and it's taking 60ms in fdeep. (I've measured by calling predict 100x in a for-loop and then averaging - a bit crude but should do the trick for this purpose.) I've compiled with the latest VS2017 15.5.5, Release mode, and default compiler flags (/O2). If I enable AVX2 and intrinsics, it goes down to 50ms, but that is still way too slow. (I've tried without im2col but it's even slower, by more than 10x.)
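
The measurement described above (call predict repeatedly, then average) can be sketched as follows. The helper `average_duration`, the warm-up count and the stand-in workload are assumptions for illustration; in a real measurement the stand-in would be replaced by the actual prediction call.

```python
import time

def average_duration(run_once, repetitions=100, warmup=3):
    """Average wall-clock seconds per call of run_once over `repetitions` calls."""
    for _ in range(warmup):       # warm-up calls are not timed
        run_once()
    start = time.perf_counter()
    for _ in range(repetitions):
        run_once()
    return (time.perf_counter() - start) / repetitions

def fake_forward_pass():
    # Stand-in workload; replace with the real prediction call.
    sum(i * i for i in range(1000))

print(f"{average_duration(fake_forward_pass) * 1000:.3f} ms per call")
```

Leaving the first few calls out of the timing avoids counting one-time costs such as cache warm-up.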

    I've run the VS profiler, but I'm not 100% sure I'm interpreting the results correctly. I think around 30%+5% of the total time is spent in Eigen's gebp and gemm functions, where we probably can't do much. Except maybe: I think I've seen you're using RowMajor storage for the Eigen matrices. Eigen is supposedly more optimised for its default, ColMajor storage. Would it be hard to change that in fdeep? Another 30% seems to be spent in convolve_im2col. But I'm not 100% sure where. I first thought it was the memcpy in eigen_mat_to_values but eigen_mat_to_values itself contains very few profiler samples only. There's also a lot of internal::transform and std::transform showing up in the profiler as well (internal::transform<ContainerOut>(reuse_t{}, f, std::forward<ContainerIn>(xs));) but I couldn't really figure out what the actual code is that this executes. I also saw that I think you pre-instantiate some convolution functions for common kernels. Most of my convolution kernels are 3x3, and it looks like you only instantiate n x m kernels for n and m equals 1 and 2. Could it help adding 3x3 there? So yea I'm really not sure about all of it. If indeed the majority of time is spent in Eigen's functions, then the RowMajor thing could indeed be a major problem.

    I'm happy to send you the model and an example input via email if you wanted to have a look.

    Here are some screenshots of the profiler: [three images, not preserved]

    Thank you very much!

    opened by patrikhuber 32
  • Input to model


    If I have an RGB image and I want to pass it to the model, what should I do?

    What I've done is flatten the input image into a vector of float: I appended the r, g, b values after each other to get just one vector called "input_vector".

    Then this is the next step:

             typedef fplus::shared_ref<std::vector<float>> shared_float_vec;
             shared_float_vec x(fplus::make_shared_ref<std::vector<float>>(std::move(input_vector)));
             const auto result = decision_model.predict({fdeep::tensor3(fdeep::shape3(3,60,60),x)});

    The output is then incorrect. What should I do, or what have I done wrong?
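
The README requires channels_last data, in which the values of one pixel sit next to each other (R, G, B, R, G, B, ...), whereas appending whole color planes one after another gives a channels-first layout. Whether that mismatch explains the incorrect output in this issue is an assumption, and the expected layout may differ between library versions. The sketch below only shows the reordering itself; `planar_to_interleaved` is a hypothetical helper.

```python
def planar_to_interleaved(planar, height, width, channels=3):
    """Convert [R..., G..., B...] (channels first) to [R, G, B, R, G, B, ...]."""
    plane = height * width
    if len(planar) != plane * channels:
        raise ValueError("unexpected number of values")
    return [planar[c * plane + p] for p in range(plane) for c in range(channels)]

# 1x2 image: pixel 0 = (1, 3, 5), pixel 1 = (2, 4, 6)
planar = [1, 2, 3, 4, 5, 6]
print(planar_to_interleaved(planar, height=1, width=2))  # [1, 3, 5, 2, 4, 6]
```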

    opened by rmmal 32
  •  lambda layer using tf.image


    I am using Lambda layer which includes this function to extract patches in image

    patch_one = tf.image.extract_glimpse(inputs[0], [26, 26], inputs[1][:, j, :], centered=False, normalized=False, noise='zero')

    Is it possible to implement this custom layer in your library and load model?

    opened by katmatus 30
  • Stop at the 'Loading json ...'


    Hi Tobias, thanks for this great library! I trained a ResNet50 network using Keras. I was able to convert the .h5 model to a .json. However, when I run the program as follows:

    #include <fdeep/fdeep.hpp>
    #include <opencv2/opencv.hpp>
    int main()
    {
        const cv::Mat image = cv::imread("Image_1_2.jpg");
        cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
        const auto model = fdeep::load_model("train7.json");
        // Use the correct scaling, i.e., low and high.
        const auto input = fdeep::tensor5_from_bytes(image.ptr(),
            static_cast<std::size_t>(image.rows),
            static_cast<std::size_t>(image.cols),
            static_cast<std::size_t>(image.channels()),
            0.0f, 1.0f);
        const auto result = model.predict_class({ input });
        std::cout << result << std::endl;
    }

    It is like the example in the FAQ ("How to use images loaded with OpenCV as input for a model?"), but it doesn't work with my Keras model. It just spent about 236 s loading the json and then stopped there. My CPU is a Core i5-3230M, which is not a good CPU. My model is used to classify 7 kinds of algae cells and uses transfer learning based on ResNet50.
    The Python program for training the model is as follows:

    import numpy as np
    import matplotlib.pyplot as plt
    import keras
    from keras.preprocessing import image
    from keras.preprocessing.image import ImageDataGenerator
    from keras.applications import ResNet50
    from keras.applications.resnet50 import preprocess_input
    from keras import Model, layers
    from keras.models import load_model
    input_path = "data/LvsRod/"
    train_datagen = ImageDataGenerator(
        rescale=1. / 255)
    train_generator = train_datagen.flow_from_directory(
        input_path + 'train',
        target_size=(224, 224))
    validation_datagen = ImageDataGenerator(
        rescale=1. / 255)
    validation_generator = validation_datagen.flow_from_directory(
        input_path + 'validation',
        target_size=(224, 224))
    conv_base = ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
    for layer in conv_base.layers:
        layer.trainable = False
    x = conv_base.output
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    predictions = layers.Dense(7, activation='softmax')(x)
    model = Model(conv_base.input, predictions)
    optimizer = keras.optimizers.SGD(lr=1e-4, momentum=0.9)
    history = model.fit_generator(generator=train_generator,
                                  steps_per_epoch=10,  # added in Kaggle
                                  validation_steps=10)  # added in Kaggle
    # save

    The h5model can download from this

    Extraction code:1od1

    Because the file is too big, so I cannot upload it here. I really want to know how to solve the problem.

    opened by callmefish 28
  • Bad performance.


    Hi Tobias,

    I am getting bad performance when using frugally-deep and wanted to ask you for some advice. Of course I've read the FAQ about performance, so I've got that covered.

    Here is what I've tested so far:

    | Environment | Description | Time |
    |-------------|:------------|-----:|
    | Python | Default settings (GPU ON) | 35ms |
    | Python | os.environ['CUDA_VISIBLE_DEVICES']='-1' | 45ms |
    | Python | NO GPU and tf.config.threading.set_intra_op_parallelism_threads(1) | 75ms |
    | Visual Studio 2017 | Default (Release -O2, whole program optimization) | 310ms |
    | Visual Studio 2017 | Compiled with AVX2 | 280ms |

    It is quite interesting that single switch (AVX2) gave me 10% boost! but it is still far, very far from what you have advocated.

    I did run a benchmark and here is what I've got:


    Any ideas? Could I send you my model and example code? (privately as this is for the job, I will be happy to support you if I get paid for the project :) ).

    opened by TrueWodzu 27
  • Cannot load InceptionV3 model


    So, I successfully loaded some models and predicted them.

    Yet, when I tried to load the InceptionV3 model, I got an error. There were no errors when I converted the model from 'h5' to 'json', but the code below does not work.


    The error I got


    opened by Terminou 25
  • Frugally LSTM Encoder-Decoder results different from Keras/Tensorflow LSTM Encoder-Decoder (missing support for initial_state)


    Hi @Dobiasd

    I have been working on the Encoder-Decoder model for Vehicle Path Forecasting since you added support for returned_states and show_tensor5 on LSTM-based models. The workflow of the project was described on this past issue. After some experiments, the LSTM-Based encoder and decoder models are not giving me any problem related to returned_states = True or show_tensor5, confirming frugally-deep fixes worked. However, I have been trying to replicate the results I obtained using the Keras/Tensorflow models without success.

    The frugally-deep fdeep_encoder_model_NT is returning the exact same encoder_hidden_state and encoder_cell_state states compared to its Tf + Keras counterparts using the encoder_model.hdf5. However, the fdeep_decoder_model_NT is not giving me the same decoder_hidden_state and decoder_cell_state output states (compared to the results using Tf + Keras encoder_model.hdf5) :(

    Specifically, I develop the decoder inference model using TF + Keras (please refer yourself to past comments in this issue to see the corresponding code), and then converted it from .hdf5 to .json, ready to be ported into the C++ application (same as with the encoder model). Validating the encoder states: image However, both frugally-deep decoder_hidden_state and decoder_cell_state differ from corresponding Keras-based decoder_hidden_state and decoder_cell_state: image Resulting, as expected, in a wrong bounding box prediction: image which does not match with the corresponding Keras Results: image I do not really know about what is happening with fdeep_decoder_model_NT, so I have various options in mind:

    • I have trained another model using LSTM instead CuDNNLSTM layers in order to check if the problem is with CuDNNLSTM layer implementation. However, the problem is still present when using other LSTM-based cells like CuDNNLSTM and LSTM. The fdeep_encoder_model works well but fdeep_decoder_model is still making wrong predictions (both at states returned and next bbox prediction).
    • Now I am working in the main.cpp file. Maybe the problem is inside my internal manipulation of fdeep::tensor5 and fdeep::tensor5s when feeding the data into the ported models. However, both models are working well, except that the decoder's model is making (inaccurate) predictions of future bounding boxes, but it did not crash in any step of the script execution.
    • I am puzzled about the following fact: At main.cpp the decoders predictions is made with the following command: auto decoder_outputs = decoder_model.predict({target_seq, encoder_states.at(0), encoder_states.at(1)});, where encoder_states.at(0) and encoder_states.at(1) represent h_enc and c_enc respectively. However, I tried by interchanging the encoder states at the input of the decoder prediction line like this: auto decoder_outputs = decoder_model.predict({target_seq, encoder_states.at(1), encoder_states.at(0)}); and obtaining the exact same predicted_next_box (even though I interchanged the input order of decoder_states at the prediction function).
    • Finally, apart from the wrong values of h_dec and c_dec returned by fdeep_decoder_model, I noticed both h_dec hidden states (from frugally AND Keras) are in the range [-1, 1], but that does not occur to c_dec hidden states. In Keras, c_dec have values from [-11, 11] but, in frugally, c_dec takes values from [-1, 1]. In addition, based on your suggestion about internal scaling causing this kind of issues, by inspecting the fdeep_encoder_model.json, there are some initializers parameters that are using Variance_Scaling parameter inside that maybe are the cause of errors at inference-time. I think maybe this at the root of the problem but I have no idea of how to get the correct h_enc and c_enc, both between the same ranges used in Keras and with the correct values as well.
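
The different value ranges of h and c reported for Keras are consistent with the standard LSTM equations: the hidden state is an output gate times tanh of the cell state, so it stays inside (-1, 1), while the cell state is a running sum that is never squashed. The scalar sketch below demonstrates this with made-up gate pre-activations; it is a textbook LSTM step, not frugally-deep's or Keras' code, and it does not by itself explain the discrepancy in this issue.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(c_prev, i_pre, f_pre, g_pre, o_pre):
    """One scalar LSTM step from gate pre-activations; returns (h, c)."""
    i, f, o = sigmoid(i_pre), sigmoid(f_pre), sigmoid(o_pre)
    g = math.tanh(g_pre)
    c = f * c_prev + i * g      # cell state: a running sum, not squashed
    h = o * math.tanh(c)        # hidden state: always inside (-1, 1)
    return h, c

c = 0.0
for _ in range(20):
    # Large pre-activations keep all gates open, so c grows by about 1 per step.
    h, c = lstm_step(c, i_pre=10.0, f_pre=10.0, g_pre=10.0, o_pre=10.0)
print(f"h = {h:.4f}, c = {c:.4f}")
```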

    Here is the main.cpp file I am running to test the results. Any comment or suggestion about the code would be welcomed!

    #include <fdeep/fdeep.hpp>
    #include <vector>
    #include <fstream>
    #include <iostream>
    int main()
    {
        // Loading the previously trained models
        const auto encoder_model = fdeep::load_model("fdeep_encoder_model_NT.json");
        std::cout << "Encoder Model Loaded!" << std::endl;
        const auto decoder_model = fdeep::load_model("fdeep_decoder_model_NT.json");
        std::cout << "Decoder Model Loaded!" << std::endl;
        // Batch_size = 1, num_timesteps = 10 and num_features = 4
        fdeep::shape5 in_traj_shape(1,1,1,10,4);
        // Loading a sample sequence trajectory into tensor5 data structure
        const std::vector<float> src_traj  = {1728, 715, 191, 221,
                                              1717, 710, 202, 215,
                                              1706, 704, 206, 198,
                                              1695, 700, 217, 196,
                                              1687, 696, 228, 183,
                                              1680, 689, 240, 181,
                                              1668, 668, 240, 198,
                                              1661, 668, 243, 194,
                                              1650, 664, 251, 189,
                                              1635, 660, 266, 181};
        // Input trajectory from vector to tensor5 data structure
        const fdeep::shared_float_vec shared_traj(fplus::make_shared_ref<fdeep::float_vec>(src_traj));
        const fdeep::tensor5 encoder_inputs(in_traj_shape, shared_traj);
        std::cout << "Trajectory #0!" << fdeep::show_tensor5(encoder_inputs) << std::endl;
        // Using loaded encoder model to predict encoder output states
        // Then encoder_states can be fed as input tensors into decoder_model
        const auto encoder_states = encoder_model.predict({encoder_inputs});
        // Printing for debugging purposes
        std::cout << "h_enc: "<< fdeep::show_tensor5(encoder_states.at(0)) << std::endl;
        std::cout << "c_enc: "<< fdeep::show_tensor5(encoder_states.at(1)) << std::endl;
        // Creating a SOS input sequence token to signal decoder model to start making predictions
        fdeep::shape5 bbox_shape(1,1,1,1,4);
        // Loading the SOS token into tensor5 data structure
        const std::vector<float> SOS_token  = {9999.0, 9999.0, 9999.0, 9999.0};
        const fdeep::shared_float_vec shared_SOS_token(fplus::make_shared_ref<fdeep::float_vec>(SOS_token));
        fdeep::tensor5 target_seq(bbox_shape, shared_SOS_token);
        // In Python we have: Prediction, h, c = decoder_model.predict([target_seq] + state)
        auto decoder_outputs = decoder_model.predict({target_seq, encoder_states.at(1), encoder_states.at(0)});
        // Printing for debugging purposes
        std::cout << "h_dec: "<< fdeep::show_tensor5(decoder_outputs.at(1)) << std::endl;
        std::cout << "c_dec: "<< fdeep::show_tensor5(decoder_outputs.at(2)) << std::endl;
        std::cout << "Predicted next bounding box!" << fdeep::show_tensor5(decoder_outputs.at(0)) << std::endl;
    }

    The fdeep_encoder_model_NT.json model imported into the C++ application is available to download and inspect from this past comment. The fdeep_decoder_model_NT.json can be downloaded from the following link: Decoder model: https://drive.google.com/open?id=1hwrjcnNfWaqQI0o8TmJKtfsAwj6zd9aq I would really appreciate any help with this issue. I am puzzled because the encoder model is working perfectly but the decoder model is not. Specifically, the results of the Keras and frugally-deep decoder models differ, giving me wrong output predictions that cannot be used at all.

    opened by MarlonCajamarca 25
  • Modify Unit Tests CmakeLists and INSTALL.md


    Modify the unit tests' CMakeLists.txt to let CMake detect Python instead of hard-coding "python3 xxxx", because not all users can run Python scripts with "python3", so the command that converts h5 to json may fail. I added find_package to detect Python, and I try pip.exe, pip3.exe etc. to check for TensorFlow using "pip show tensorflow" to make sure the user has installed it. The Python and TensorFlow requirements are written in INSTALL.md.

    opened by sirius-william 4
  • Thanks !


    Many thanks to the project author! My graduation project is a one-dimensional convolutional neural network. After training with Python's TensorFlow 2.10, I have been looking for ways to deploy the model in my Qt project. I have tried to compile TensorFlow C++ (compilation always fails), the TensorFlow C API (TensorFlow 2.10 is not supported), TensorRT (AMD graphics drivers are not supported) and OpenVINO (the network architecture I chose is not supported). By chance, I found this library on Google. It is easy to use and has few dependencies. It only requires header files. It perfectly solves my project's needs. Thank you!

    PS. The Python script part of the tests is executed in CMakeLists.txt using python3 xxxx. However, not all users can run Python scripts through the command 'python3'. It is recommended to find Python in CMakeLists.txt, or to let users specify the Python path. In addition, MinGW reports "Fatal error: can't write 286 bytes to section .text" when compiling the unit tests. It is recommended to add: target_compile_options(PROJECT_NAME PRIVATE $<$<CXX_COMPILER_ID:MSVC>:/bigobj> $<$<CXX_COMPILER_ID:GNU>:-Wa,-mbig-obj>) This problem also arises when the library is used in other projects.

    opened by sirius-william 2
  • Consider having different convolution implementations available and choosing the fastest one at runtime


    Different convolution implementations might perform differently depending on the convolution settings (input size/depth, kernel size/count) and depending on the hardware (mostly CPU/memory) used.

    Right now, for example, we have a special implementation used for 2D convolutions in case strides = (1, 1) (which is utilized not only by the Conv2D layer, but also by DepthwiseConv2D and SeparableConv2D).

    I wonder if it would make sense to provide a function to the user that, when called on a model, tries out different implementations and remembers which one performed best for future calls of model.predict. (Maybe in some settings, even a naive non-im2col convolution is the fastest one.)


    Pros:

    • potentially faster forward passes

    Cons:

    • increased code complexity
    • potentially wrong settings in case the background load on the user's machine varies too much during the evaluation
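
The proposal above can be sketched in a few lines: time every candidate on a sample input once, then remember the winner for later calls. Everything in this sketch is hypothetical (the helper `pick_fastest`, the two stand-in implementations and the repetition count); it illustrates the selection step only, not how it would be wired into model.predict.

```python
import time

def pick_fastest(implementations, sample_input, repetitions=5):
    """Time each candidate on sample_input and return the name of the fastest."""
    best_name, best_time = None, float("inf")
    for name, func in implementations.items():
        start = time.perf_counter()
        for _ in range(repetitions):
            func(sample_input)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

def sum_builtin(values):
    return sum(values)

def sum_loop(values):
    total = 0
    for v in values:
        total += v
    return total

candidates = {"builtin": sum_builtin, "loop": sum_loop}
chosen = pick_fastest(candidates, list(range(10000)))
print("using:", chosen)
```

As noted in the list of drawbacks, the result depends on the machine's load while the timing runs, so repeating the measurement is advisable.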
    opened by Dobiasd 0
  • Feature Suggestion: Support Transformer Models


    First off, I would like to say that this is a really great piece of work! I have been using it with LSTMs for time-series data and have found frugally-deep to be invaluable. I am starting to investigate Transformers in order to see how they stack up to LSTMs and it would be wonderful if support for Transformer models could be added. I am in the early stages of working with Transformers, but the specific layers that I currently do not see supported are: MultiHeadAttention and LayerNormalization.

    help wanted 
    opened by jonathan-lazzaro-nnl 11
  • Feature suggestion: Support ONNX models?


    How about supporting ONNX in frugally? You could have a protobuf importer for ONNX models or add a tool which converts ONNX to the JSON format you use? Just a thought. A header only ONNX inference engine would be very very useful.

    opened by pfeatherstone 24
Tobias Hermann
likes machine learning and functional programming.
percepnet implemented using Keras, still need to be optimized and tuned.

PercepNet (Still need to be tuned) Unofficial implementation of PercepNet : A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhanceme

cookcodes 33 Nov 17, 2022
ORB-SLAM3 is the first real-time SLAM library able to perform Visual, Visual-Inertial and Multi-Map SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fisheye lens models.

Just to test for my research, and I add coordinate transformation to evaluate the ORB_SLAM3. Only applied in research, and respect the authors' all work.

B.X.W 5 Jul 11, 2022
Low dependency(C++11 STL only), good portability, header-only, deep neural networks for embedded

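LKYDeepNN LKYDeepNN: a trainable deep neural network (DNN) library. Lightweight; the core depends only on the C++11 standard library, so it has low dependencies, ports easily, and is convenient to use on embedded systems. Class diagram. Comes with a training visualization demo program. The training visualization program uses OpenCV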

Lin Kao-Yuan 44 Nov 7, 2022
HackySAC is a C++ header only library for model estimation using RANSAC.

HackySAC HackySAC is a C++ header only library for model estimation using RANSAC. Available under the MIT license. Examples Minimal working example fo

Jonathan Broere 1 Oct 10, 2021
Spying on Microcontrollers using Current Sensing and embedded TinyML models

Welcome to CurrentSense-TinyML CurrentSense-TinyML is all about detecting microcontroller behaviour with current sensing and TinyML. Basically we are

Santander Security Research 71 Sep 17, 2022
Serve pytorch / torch models using Drogon

C++ Torch Server Serve torch models as rest-api using Drogon, example included for resnet18 model for Imagenet. Benchmarks show improvement of ~6-10x

null 16 Nov 27, 2022
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models

Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine DSSTNE (pronounced "Destiny") is an open source software library for training and deploying

Amazon Archives 4.4k Dec 30, 2022
Header-only C++/python library for fast approximate nearest neighbors

Hnswlib - fast approximate nearest neighbor search Header-only C++ HNSW implementation with python bindings. NEWS: Hnswlib is now 0.5.2. Bugfixes - th

null 2.3k Jan 1, 2023
A header-only C++ library for deep neural networks

MiniDNN MiniDNN is a C++ library that implements a number of popular deep neural network (DNN) models. It has a mini codebase but is fully functional

Yixuan Qiu 336 Dec 22, 2022
nanoflann: a C++11 header-only library for Nearest Neighbor (NN) search with KD-trees

nanoflann 1. About nanoflann is a C++11 header-only library for building KD-Trees of datasets with different topologies: R2, R3 (point clouds), SO(2)

Jose Luis Blanco-Claraco 1.7k Dec 25, 2022
Cranium - 🤖 A portable, header-only, artificial neural network library written in C99

Cranium is a portable, header-only, feedforward artificial neural network library written in vanilla C99. It supports fully-connected networks of arbi

Devin Soni 543 Dec 25, 2022
nanoPGO: A header-only library for Pose-Graph-Optimization in SE(2).

nanoPGO nanoPGO: A header-only library for Pose-Graph-Optimization in SE(2). 1. Description This repo is an implementation of 2D Pose Graph Optimizati

道锋 3 Jul 7, 2022
TensorRT implementation of RepVGG models from RepVGG: Making VGG-style ConvNets Great Again

RepVGG RepVGG models from "RepVGG: Making VGG-style ConvNets Great Again" https://arxiv.org/pdf/2101.03697.pdf For the Pytorch implementation, you can

weiwei zhou 69 Sep 10, 2022
Deploying Deep Learning Models in C++: BERT Language Model

This repository show the code to deploy a deep learning model serialized and running in C++ backend.

null 43 Nov 14, 2022
CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU executio

CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution. The goal is to provide comprehensive inference features and be the most efficient and cost-effective solution to deploy standard neural machine translation systems such as Transformer models.

OpenNMT 395 Jan 2, 2023
WeNet is to close the gap between research and production end-to-end (E2E) speech recognition models,

WeNet is to close the gap between research and production end-to-end (E2E) speech recognition models, to reduce the effort of productionizing E2E models, and to explore better E2E models for production.

null 2.7k Jan 5, 2023
A flexible, high-performance serving system for machine learning models

XGBoost Serving This is a fork of TensorFlow Serving, extended with the support for XGBoost, alphaFM and alphaFM_softmax frameworks. For more informat

iQIYI 128 Nov 18, 2022
Lite.AI 🚀🚀🌟 is a user-friendly C++ lib for awesome🔥🔥🔥 AI models based on onnxruntime, ncnn or mnn. YOLOX, YoloV5, YoloV4, DeepLabV3, ArcFace, CosFace, Colorization, SSD

Lite.AI 🚀🚀🌟 is a user-friendly C++ lib for awesome🔥🔥🔥 AI models based on onnxruntime, ncnn or mnn. YOLOX, YoloV5, YoloV4, DeepLabV3, ArcFace, CosFace, Colorization, SSD, etc.

Def++ 2.4k Jan 4, 2023