Header-only, dependency-free deep learning framework in C++14

Overview



Maintainers Wanted

The project may be abandoned since the maintainer(s) are looking to move on. If anyone is interested in continuing the project, let us know so that we can discuss next steps.

Please visit: https://groups.google.com/forum/#!forum/tiny-dnn-dev



tiny-dnn is a C++14 implementation of deep learning. It is suitable for deep learning on limited computational resources, embedded systems, and IoT devices.


Table of contents

Check out the documentation for more info.


Features

  • Reasonably fast, without GPU:
    • With TBB threading and SSE/AVX vectorization.
    • 98.8% accuracy on MNIST in 13 minutes of training (on a Core i7-3520M).
  • Portable & header-only:
    • Runs anywhere as long as you have a compiler which supports C++14.
    • Just include tiny_dnn.h and write your model in C++ (see the minimal sketch after this feature list). There is nothing to install.
  • Easy to integrate with real applications:
    • No output to stdout/stderr.
    • A constant throughput (simple parallelization model, no garbage collection).
    • Works without throwing an exception.
    • Can import Caffe models.
  • Simply implemented:
    • A good library for learning neural networks.
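
As a minimal sketch of the header-only workflow referenced above (the layer sizes and input values here are arbitrary toy choices), the following compiles with nothing but the single include:

#include "tiny_dnn/tiny_dnn.h"   // the only include needed; nothing to link against

int main() {
    using namespace tiny_dnn;
    using namespace tiny_dnn::layers;
    using namespace tiny_dnn::activation;

    // toy perceptron: 4 inputs -> 8 hidden units -> 3 outputs
    network<sequential> net;
    net << fc(4, 8) << relu() << fc(8, 3) << softmax();

    // forward pass on a dummy input (weights are randomly initialized)
    vec_t in = {0.1f, 0.2f, 0.3f, 0.4f};
    vec_t out = net.predict(in);
    return out.size() == 3 ? 0 : 1;
}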

Comparison with other libraries

Please see the wiki page.

Supported networks

layer-types

  • core
    • fully connected
    • dropout
    • linear operation
    • zero padding
    • power
  • convolution
    • convolutional
    • average pooling
    • max pooling
    • deconvolutional
    • average unpooling
    • max unpooling
  • normalization
    • contrast normalization (only forward pass)
    • batch normalization
  • split/merge
    • concat
    • slice
    • elementwise-add

activation functions

  • tanh
  • asinh
  • sigmoid
  • softmax
  • softplus
  • softsign
  • rectified linear (relu)
  • leaky relu
  • identity
  • scaled tanh
  • exponential linear units (elu)
  • scaled exponential linear units (selu)

loss functions

  • cross-entropy
  • mean squared error
  • mean absolute error
  • mean absolute error with epsilon range

optimization algorithms

  • stochastic gradient descent (with/without L2 normalization)
  • momentum and Nesterov momentum
  • adagrad
  • rmsprop
  • adam
  • adamax
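
Putting a few of the pieces above together, here is a hedged training sketch: the data vectors are placeholders, the loss is cross-entropy, and the optimizer is adam, whose learning rate is assumed to be exposed as the alpha member (as tiny-dnn's optimizers commonly do).

#include "tiny_dnn/tiny_dnn.h"
using namespace tiny_dnn;

// train_images / train_labels are placeholders for your own dataset
void train_with_adam(network<sequential> &net,
                     const std::vector<vec_t> &train_images,
                     const std::vector<label_t> &train_labels) {
    adam optimizer;
    optimizer.alpha = 0.001f;   // learning rate (assumed member, see note above)

    // cross-entropy loss, mini-batches of 32, 20 epochs
    net.train<cross_entropy>(optimizer, train_images, train_labels, 32, 20);
}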

Dependencies

Nothing. All you need is a C++14 compiler (gcc 4.9+, clang 3.6+ or VS 2015+).

Build

tiny-dnn is header-only, so there's nothing to build. If you want to build the sample programs or unit tests, install cmake and type the following commands:

cmake . -DBUILD_EXAMPLES=ON
make

Then change to the examples directory and run the executables.

If you would like to use an IDE such as Visual Studio or Xcode, you can also use cmake to generate the corresponding project files:

cmake . -G "Xcode"                  # Xcode project for macOS
cmake . -G "Visual Studio 14 2015"  # Visual Studio solution (pick your installed VS version)
cmake . -G "NMake Makefiles"        # NMake makefiles for the Visual Studio command prompt

Then open the generated .sln file in Visual Studio and build it (on Windows/MSVC), or type the make command (on Linux/macOS/Windows-MinGW).

Some cmake options are available:

| options | description | default | additional requirements to use |
| --- | --- | --- | --- |
| USE_TBB | Use Intel TBB for parallelization | OFF [1] | Intel TBB |
| USE_OMP | Use OpenMP for parallelization | OFF [1] | OpenMP-capable compiler |
| USE_SSE | Use Intel SSE instruction set | ON | Intel CPU which supports SSE |
| USE_AVX | Use Intel AVX instruction set | ON | Intel CPU which supports AVX |
| USE_AVX2 | Build tiny-dnn with AVX2 library support | OFF | Intel CPU which supports AVX2 |
| USE_NNPACK | Use NNPACK for convolution operation | OFF | Acceleration package for neural networks on multi-core CPUs |
| USE_OPENCL | Enable/disable OpenCL support (experimental) | OFF | The open standard for parallel programming of heterogeneous systems |
| USE_LIBDNN | Use Greentea LibDNN for convolution operation with GPU via OpenCL (experimental) | OFF | A universal convolution implementation supporting CUDA and OpenCL |
| USE_SERIALIZER | Enable model serialization | ON [2] | - |
| USE_DOUBLE | Use double precision computations instead of single precision | OFF | - |
| USE_ASAN | Use Address Sanitizer | OFF | clang or gcc compiler |
| USE_IMAGE_API | Enable Image API support | ON | - |
| USE_GEMMLOWP | Enable gemmlowp support | OFF | - |
| BUILD_TESTS | Build unit tests | OFF [3] | - |
| BUILD_EXAMPLES | Build example projects | OFF | - |
| BUILD_DOCS | Build documentation | OFF | Doxygen |
| PROFILE | Build with gprof profiling support | OFF | gprof |

[1] tiny-dnn uses the C++14 standard library for parallelization by default.

[2] If you don't use serialization, you can switch it off to speed up compilation.

[3] tiny-dnn uses Google Test as the default framework for unit tests. No pre-installation is required; it is downloaded automatically during CMake configuration.

For example, type the following command if you want to use Intel TBB and build the tests:

cmake -DUSE_TBB=ON -DBUILD_TESTS=ON .

Customize configurations

You can edit include/config.h to customize default behavior.
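
Alternatively, the relevant switches can be set as preprocessor macros before the include. A minimal sketch, assuming the macro names mirror the CMake options (e.g. USE_DOUBLE corresponds to CNN_USE_DOUBLE); check include/config.h for the authoritative list:

// CNN_USE_DOUBLE is an assumed macro name mirroring the USE_DOUBLE CMake option;
// other switches (CNN_USE_OMP, CNN_USE_TBB, CNN_USE_AVX, ...) follow the same pattern.
#define CNN_USE_DOUBLE            // compute in double precision (float_t becomes double)
#include "tiny_dnn/tiny_dnn.h"

static_assert(sizeof(tiny_dnn::float_t) == sizeof(double),
              "float_t should be double when CNN_USE_DOUBLE is defined");

int main() { return 0; }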

Examples

Construct convolutional neural networks

#include "tiny_dnn/tiny_dnn.h"
using namespace tiny_dnn;
using namespace tiny_dnn::activation;
using namespace tiny_dnn::layers;

void construct_cnn() {
    using namespace tiny_dnn;

    network<sequential> net;

    // add layers
    net << conv(32, 32, 5, 1, 6) << tanh()  // in:32x32x1, 5x5conv, 6fmaps
        << ave_pool(28, 28, 6, 2) << tanh() // in:28x28x6, 2x2pooling
        << fc(14 * 14 * 6, 120) << tanh()   // in:14x14x6, out:120
        << fc(120, 10);                     // in:120,     out:10

    assert(net.in_data_size() == 32 * 32);
    assert(net.out_data_size() == 10);

    // load MNIST dataset
    std::vector<label_t> train_labels;
    std::vector<vec_t> train_images;

    parse_mnist_labels("train-labels.idx1-ubyte", &train_labels);
    parse_mnist_images("train-images.idx3-ubyte", &train_images, -1.0, 1.0, 2, 2);

    // declare optimization algorithm
    adagrad optimizer;

    // train (50-epoch, 30-minibatch)
    net.train<mse, adagrad>(optimizer, train_images, train_labels, 30, 50);

    // save
    net.save("net");

    // load
    // network<sequential> net2;
    // net2.load("net");
}
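
Once training finishes, the network can be evaluated and used for inference. The lines below could go at the end of construct_cnn() after training; the MNIST test-set file names are assumptions matching the standard distribution:

    // evaluate on the MNIST test set and print a confusion matrix
    // (add #include <iostream> at the top of the file if it is not already pulled in)
    std::vector<label_t> test_labels;
    std::vector<vec_t> test_images;
    parse_mnist_labels("t10k-labels.idx1-ubyte", &test_labels);
    parse_mnist_images("t10k-images.idx3-ubyte", &test_images, -1.0, 1.0, 2, 2);
    net.test(test_images, test_labels).print_detail(std::cout);

    // single-image inference: the index of the largest output is the predicted digit
    vec_t out = net.predict(test_images[0]);
    size_t best = 0;
    for (size_t i = 1; i < out.size(); ++i)
        if (out[i] > out[best]) best = i;
    std::cout << "predicted digit: " << best << std::endl;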

Construct multi-layer perceptron (mlp)

#include "tiny_dnn/tiny_dnn.h"
using namespace tiny_dnn;
using namespace tiny_dnn::activation;
using namespace tiny_dnn::layers;

void construct_mlp() {
    network<sequential> net;

    net << fc(32 * 32, 300) << sigmoid() << fc(300, 10);

    assert(net.in_data_size() == 32 * 32);
    assert(net.out_data_size() == 10);
}

Another way to construct mlp

#include "tiny_dnn/tiny_dnn.h"
using namespace tiny_dnn;
using namespace tiny_dnn::activation;

void construct_mlp() {
    auto mynet = make_mlp<tanh>({ 32 * 32, 300, 10 });

    assert(mynet.in_data_size() == 32 * 32);
    assert(mynet.out_data_size() == 10);
}

For more samples, read examples/main.cpp or the MNIST example page.

Contributing

Since the deep learning community is growing rapidly, we'd love to get contributions from you to accelerate tiny-dnn's development! For a quick guide to contributing, take a look at the Contribution Documents.

References

[1] Y. Bengio, Practical Recommendations for Gradient-Based Training of Deep Architectures. arXiv:1206.5533v2, 2012

[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324, 1998.


License

The BSD 3-Clause License

Gitter rooms

We have gitter rooms for discussing new features and for Q&A. Feel free to join us!

developers https://gitter.im/tiny-dnn/developers
users https://gitter.im/tiny-dnn/users
Comments
  • use ThreadPool

    Hi, this PR changes the parallel_for function to use the ThreadPool library written by Jakob Progsch and Václav Zeman. https://github.com/progschj/ThreadPool

    This will improve execution speed on Linux.

    performance PR: Stalled PR: Needs Response Work in progress 
    opened by beru 56
  • quantization, bug fix in deconv and graph enet

    Task list:

    • [x] Basic functions and pass the tests on core quantization utilities.
    • [x] Quantized convolution layer and pass the tests for q_conv.
    • [x] Quantized deconvolution layer and pass the tests for q_deconv.
    • [x] Quantized fully connected layer and pass the tests for q_fully_connected.
    • [x] Quantized bias inside other kernels.
    • [x] Ensure acceptable accuracy on typical examples.
    • [x] Add low precision gemm from Google TF as a kernel for matmul.
    • [x] Resolve unnecessary quantize-dequantize procedure.
    • [x] Add backward propagation for quantization.

    Method:

    The method is the same as described in Pete's blog on quantization, and my base code is adapted from the TensorFlow quantization module.
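
    For readers who haven't seen that scheme, here is a rough illustrative sketch of min/max linear 8-bit quantization (illustrative only, not the PR's actual code):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // map floats in [min_v, max_v] onto 0..255 and back (illustrative only)
    std::vector<std::uint8_t> quantize(const std::vector<float> &x, float min_v, float max_v) {
        const float scale = (max_v - min_v) / 255.0f;
        std::vector<std::uint8_t> q(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            float level = std::round((x[i] - min_v) / scale);
            q[i] = static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, level)));
        }
        return q;
    }

    float dequantize(std::uint8_t q, float min_v, float max_v) {
        return min_v + static_cast<float>(q) * (max_v - min_v) / 255.0f;
    }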

    opened by wangyida 53
  • Add tests for dropout and different batch sizes, fix connected issues

    Reproduces the problem described in https://github.com/tiny-dnn/tiny-dnn/issues/540. No fix yet. As my tests show, the problem exists only with batch sizes >10.

    bug PR: Good to Merge 
    opened by Randl 46
  • Abstract Convolutional Layer

    First commit for device abstraction:

    • The convolution class has been abstracted
    • Added a PlantUML diagram to show the "big picture" of the current implementation.
    • Removed test warnings
    opened by edgarriba 39
  • Clang tooling

    I've rerun clang-format on the current code base. All tests pass on my PC.

    Command to run formatting is find -iname *.h -o -iname *.cpp | xargs clang-format-4.0 -style=file -i

    I suggest adding it as a git hook or something like that.

    PR: Good to Merge 
    opened by Randl 37
  • LibDNN integration

    @naibaf7 @nyanp @bhack @mtamburrano I open this ticket for LibDNN integration discussion.

    @naibaf7 What's the current status of LibDNN standalone? Recently, the initial backend architecture was merged. Please give it a shot and feel free to comment on the design or any consideration you think would be helpful for LibDNN or other optimizations. Thx!

    enhancement 
    opened by edgarriba 37
  • Move to an organization & renaming tiny-cnn

    We've decided to move tiny-cnn to an organization account to accelerate its development (discussed in #226).

    Since it is clear that we are expanding the scope of tiny-cnn from convolutional networks to general networks, the project name tiny-cnn seems a bit inaccurate now. I want to change the project name to a more appropriate one (if we agree), at the time of transferring the repository.

    In #226 we have these 3 proposals:

    • tiny-dnn (convolutional net -> deep net)
    • hornet (loose acronym of Header Only Network)
    • tiny-cnn (of course we can keep its name)

    Whichever we take, the project name doesn't affect the library API except for its namespace, and hyperlinks, forks, and pull requests will be correctly redirected to the new repository.

    Please feel free to give me your feedback if you have suggestions for the naming! We want to decide on the name and move to the new account by around 7/25.

    community 
    opened by nyanp 36
  • Tensor (reprise)

    Well, as usual, I made a mess with GIT, and apparently I can't easily push to @edgarriba's original PR.

    Today I reworked @edgarriba's #400 a bit: I changed the interface slightly, added lazy allocation and lazy movement of memory, renamed the accessors to host_ptr and host_at (to clarify that they work on host memory), and implemented generic functions for binary and unary host operations, element-wise and scalar, so it's now easier to implement functions such as add, sub, mul, div, exp, sqrt, ...

    I commented out linspace, if someone has a strong opinion about that, we can still get it back in.

    enhancement PR: Good to Merge 
    opened by pansk 29
  • Add AVX implementation for Global Average Pooling layer.

    Since the global average pooling layer calculates the average of all activations per channel, we pick up 8 contiguous floats at a time and keep performing vertical sums channel-wise. At the end, a net sum is accumulated by a horizontal sum. This is repeated for all channels of a layer.

    The current code falls back to the internal backend if NNPACK or another unsupported backend is chosen.
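
    A rough sketch of the vertical-then-horizontal summation described above (illustrative only, not the PR's actual kernel):

    #include <immintrin.h>
    #include <cstddef>

    // average of one channel: vertical 8-wide sums, then a horizontal reduction
    float channel_average(const float *x, std::size_t n) {
        __m256 acc = _mm256_setzero_ps();
        std::size_t i = 0;
        for (; i + 8 <= n; i += 8)
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));   // vertical sum
        alignas(32) float lanes[8];
        _mm256_store_ps(lanes, acc);
        float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3] +
                    lanes[4] + lanes[5] + lanes[6] + lanes[7];   // horizontal sum
        for (; i < n; ++i) sum += x[i];                          // scalar tail
        return sum / static_cast<float>(n);
    }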

    performance PR: Good to Merge high priority 
    opened by kdexd 28
  • Model Tensor structures

    I open this ticket to discuss modeling tensors as classes, as discussed in https://github.com/tiny-dnn/tiny-dnn/issues/235#issuecomment-239196739

    @pansk proposed having different structures depending on the nature of the data:

    In an email thread with @nyanp we were discussing having a structure that represents i/o tensors and another which represents parameters (weights, biases, convolution coefficients, ...). The first structure was supposed to "automatically" move between the CPU and the GPU when needed (e.g., when interleaving CPU and GPU layers, something which is probably still inefficient, but should be allowed for prototyping new, complex layers), while the second was conceived to be full-time resident on the GPU (for GPU backends) and was supposed to be downloaded/uploaded only for serialization/deserialization purposes (probably manually).

    enhancement performance 
    opened by edgarriba 26
  • add Tensor class

    Add the Tensor structure:

    • data held by std::vector<> with 64-byte alignment
    • three different data accessors: t.ptr<float_t>(), t.at<float_t>() and t[idx]
    • basic reshape() and resize() routines
    • basic toDevice() and fromDevice() routines
    • implement element-wise add, sub, mul, div
    opened by edgarriba 25
  • Bad Function Call with deconv layer.

    Any time I try to use a deconv layer in a network I'm hit with:

    terminate called after throwing an instance of 'std::bad_function_call'
      what():  bad_function_call
    Aborted (core dumped)

    I try the exact same setup with fully_connected_layers and it works just fine. Perhaps a bug?

    opened by TrevorBlythe 0
  • mse in loss_function.h

    I did a simple regression (MLP) with my 2155 data sets. The training seemed to complete successfully; however, when I call get_loss with mse, "d" does not seem to be divided by the total number of data sets, which is 2155.

    input_data: 2155 lines x 6 input elements
    output_data: 2155 lines x 1 output element

    double loss = net.get_loss<tiny_dnn::mse>(input_data, target_data);
    std::cout << "mse=" << loss << std::endl;

    ---- loss_function.h ----

    class mse {
     public:
      static float_t f(const vec_t &y, const vec_t &t) {
        assert(y.size() == t.size());
        float_t d{0.0};

        for (size_t i = 0; i < y.size(); ++i) d += (y[i] - t[i]) * (y[i] - t[i]);
        // [Ichi]: this calculation is right. I confirmed ("predicted value" - "target value")^2 with an Excel spreadsheet.

        return d / static_cast<float_t>(y.size());
        // [Ichi]: divided by one??? When I outputted "y.size()" with std::cout, it was "1". I'm not a skillful C++ programmer. I might have made a mistake.

      }
      ...

    opened by tak1000 0
  • How to use this library?

    I am pretty new to using C++ for deep neural networks. Could someone help with how to install this library? I have downloaded the zip and extracted it, but I can't seem to include the tiny_dnn.h file.

    I am using the Dev C++ editor. Could someone tell me how to add this particular library to the additional library of Dev C++?

    opened by Sammed98 2
  • Bug in average pooling.

    Not used very often, so I can see this oversight.

    I had a 20x1 data sample, and I used a 3x1 average pooling layer with 1x1 stride. The result should be an 18x1 output (since (20 - 3)/1 + 1 = 18), but instead I received a pooling_size_mismatch error. In this case, average_pooling_layer wanted the width and height of the data to be a multiple of the pooling window, which it is not in my case.

    However, due to the stride, the size was fine.

    opened by qedware 0
Releases (v1.0.0a3)
  • v1.0.0a3(Nov 29, 2016)

    We are now announcing v1.0.0a3. Thanks to all the great contributors! This release includes the following changes from v1.0.0a2:

    Bug fix

    • Convolutional layer with padding::same mode doesn't work #332 fixed by @nyanp
    • Segmentation fault at MinGW build #203 #281 fixed by @nyanp
    • NNPACK backend doesn't work #398 fixed by @azsane

    Improvements

    • Remove compiler warnings & improve CMakeLists #387 by @beru
    • Improve memory consumption #410 by @beru
    • Improve unit tests #408 by @Randl
    • Subtle speed optimization #419 by @beru
    • Refactor serialization type & size type #407, #422 by @Randl and @edgarriba
    • Improve compilation time by splitting serialization/deserialization #421 by @beru

    Docs&Comments

    • Add comments to layer class #424 by @edgarriba
    • Fix typo in comments #404 by @MikalaiDrabovich

    Toward v1.0.0

    The first version of the Tensor class has been merged into tiny-dnn (#411 #417 #418 by @pansk, @Randl and @edgarriba). It isn't integrated with tiny-dnn layers yet, but it's the starting point for GPU support in tiny-dnn.

  • v1.0.0a2(Nov 13, 2016)

    Bug Fixes

    • Fix SEGV errors on AVX Optimized code (#353) by @nyanp
    • Fix compiler error on msvc2013 (#320) by @nyanp
    • Fix AVX backend slowdown on convolutional layer (#322) by @nyanp
    • Fix throwing error when we load weights manually (#330) by @nyanp
    • Fix returning infinity in tan_h (#347) by @nyanp
    • Fix portability issues on serialization (#377) by @nyanp

    Features

    • Provides a compile option to disable serialization support to speed up compilation (#316) by @nyanp
    • Adds set_trainable method to freeze layers (#346) by @nyanp
    • Adds power layer to caffe converter by @goranrauker
    • double precision support (#332) by @nyanp
    • Provides pad_type and non-square input to pooling layers (#374) by @nyanp
    • Adds public predict method for vector of tensors (#396) by @reunanen
    • Adds Auto engine selection (#339) by @edgarriba
    • Adds basic image utilities, and remove OpenCV dependencies (#337) by @nyanp

    Others

    • Sync with latest NNPACK by @azsane
    • Improves compiler warnings around type-cast by @pansk @reunanen @edgarriba
    • Improves CMakelist by @syoyo @edgarriba @beru
    • Replaces picotest with gtest by @Randl
    • Adds "layer catalogue" into official documentation by @nyanp
    • Adds tests for GPU environment by @Randl
    • Adds cpplint.py by @edgarriba @Randl
    • Adds a document for building iOS app by @wangyida
    • Adds coverall checking by @edgarriba
    • Adds CI builds for Win32 by @nyanp
    • Updates & improves readme by @edgarriba @zhangqianhui
  • v1.0.0a(Sep 14, 2016)

    :tada: This release contains a major refactoring & many bugfixes. Thanks a lot to all the great contributors! :tada:

    This release is an alpha version. We need more help and feedback toward v1.0.0. Please submit bug reports as GitHub issues. Many thanks :)

    • Major updates

      These features are still experimental, so PRs and bug reports are very welcome!

      • Model serialization by @nyanp
    • Minor bug fix

      • Memory errors in cifar-10 example #295 #300 by @edgarriba
      • Fix max-pooling layer #271 by @nyanp
      • Fix concat-layer #301 by @Jiaolong
      • Suppress compiler warnings #297 by @syoyo
    • Other

      • A nice project logo by @KonfrareAlbert
      • Launch official documents at http://tiny-dnn.readthedocs.io/

    Some APIs have changed from v0.1.1:

    • changed its namespace from tiny_cnn to tiny_dnn
    • changed API header from tiny_cnn.h to tiny_dnn.h
  • v0.1.1(Jul 26, 2016)

    This release contains the following improvements:

    • New Layers
      • Batch Normalization
      • Deconvolution/Unpooling (@Wangyida)
      • Power
      • Slice
    • New Loss Functions
      • Absolute mean/Absolute mean with eps (@H4kor)
    • Minor Bug Fix
      • Compile error on MSVC2013 #218 #231
      • Correct the definition of MSE #232
      • Fix linker error due to duplicate symbols
      • Fix handling non-square input data in caffemodel #227
      • Fix data race in network::test #185

    Thank you very much to all committers for this release!

  • v0.1.0(Jun 6, 2016)

    This release contains a major refactoring of tiny-cnn's fundamental architecture and fixes many problems. We had the help of 20 committers for this release. Thanks!

    • Now we can handle non-sequential models as network<graph> #108 #153
    • Catch up with the latest format of Caffe's proto #162
    • Improve the default behaviour of weight re-initialization #136
    • Add more tests and documents #73
    • Remove dependency of OpenCV in MNIST example

    Some APIs have changed from the previous release; see the change list.
