
Overview

Forward - A library for high performance deep learning inference on NVIDIA GPUs




[中文版 (Chinese version)]

Forward

Forward is a library for high-performance deep learning inference on NVIDIA GPUs. It provides a well-designed scheme that directly parses TensorFlow/PyTorch/Keras models into high-performance engines based on TensorRT. Compared to using TensorRT directly, it is easier to use and easier to extend. So far, Forward supports not only mainstream deep learning models in the CV, NLP and recommendation fields, but also advanced models such as BERT, GAN, FaceSwap and StyleTransfer.

Features

  • Utilizes the TensorRT API and customized operators for high-performance deep learning inference.
  • Supports not only mainstream deep learning models in the CV, NLP and recommendation fields, but also advanced models such as BERT, GAN, FaceSwap and StyleTransfer.
  • Supports FLOAT/HALF/INT8 infer modes.
  • Easy to use: directly load TensorFlow (.pb), PyTorch (.pth) and Keras (.h5) models, then run inference with TensorRT.
  • Easy to extend: register customized layers by following add_support_op.md.
  • Provides C++ and Python interfaces.

Quick Start

Prerequisites

  • NVIDIA CUDA >= 10.0, cuDNN >= 7 (recommended: CUDA 10.2)
  • TensorRT >= 7.0.0.11 (recommended: TensorRT 7.2.1.6)
  • CMake >= 3.10.1
  • GCC >= 5.4.0, ld >= 2.26.1
  • (PyTorch) PyTorch == 1.3.1
  • (TensorFlow) TensorFlow == 1.15.0 (download TensorFlow 1.15.0 and unzip it to source/third_party/tensorflow/lib)
  • (Keras) HDF5
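
A quick way to sanity-check the Python-side prerequisites is to print the installed versions. This is only a minimal sketch; the exact versions required depend on which Forward variants you build:

# Minimal sketch: print the Python-side prerequisite versions before building.
import tensorflow as tf   # expected: 1.15.0 for Fwd-Tf / Fwd-Python-Tf
import torch              # expected: 1.3.1 here (later releases require >= 1.7.0)

print("TensorFlow:", tf.__version__)
print("PyTorch:", torch.__version__)
print("CUDA (seen by PyTorch):", torch.version.cuda)   # expected: >= 10.0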

Build with CMake

Generate Makefiles or a Visual Studio project (Windows) and build. Forward can be built for different frameworks, such as Fwd-Torch, Fwd-Python-Torch, Fwd-Tf, Fwd-Python-Tf, Fwd-Keras and Fwd-Python-Keras, which are controlled by CMake options. For example, Fwd-Python-Tf is built as below.

mkdir build
cd build

cmake ..  \
-DTensorRT_ROOT=/path/to/TensorRT \
-DENABLE_LOGGING=ON \
-DENABLE_PROFILING=ON \
-DENABLE_DYNAMIC_BATCH=ON \
-DBUILD_PYTHON_LIB=ON \
-DENABLE_TORCH=OFF \
-DENABLE_TENSORFLOW=ON \
-DENABLE_KERAS=OFF

make -j

CMake build arguments

  • TensorRT_ROOT [Required]: path to the TensorRT installation directory containing the libraries.
  • For more CMake options, refer to CMake Options.

Unit Test

After the project is built, unit_test can be run to verify that the build succeeded.

cd build/bin
./unit_test --gtest_filter=TestTfNodes.*

Use Forward-Cpp

Refer to the Demo for using Forward-Cpp on Linux.

Use Forward-Python

Refer to the Demo for using Forward-Python.
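
For orientation, below is a minimal sketch of a Forward-Python workflow based on the description above. The module, class and method names (forward.TfBuilder, set_mode, build, forward) are assumptions for illustration only; the Demo shows the actual interface.

# Hypothetical sketch of a Forward-Python session (names and signatures are
# assumptions for illustration; consult the Forward-Python demo for the real API).
import numpy as np
import forward  # Python module produced by building with BUILD_PYTHON_LIB=ON

builder = forward.TfBuilder()      # assumed builder for TensorFlow .pb models
builder.set_mode("float32")        # infer mode: FLOAT / HALF / INT8

# Dummy inputs used to parse the graph and build the TensorRT engine.
dummy_inputs = {"input_ids": np.ones([16, 48], dtype=np.int32)}
engine = builder.build("model.pb", dummy_inputs)

outputs = engine.forward(dummy_inputs)  # run inference with the built engine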

More Usages

Notice: the input names of a model can be viewed with model viewers such as Netron.
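
As an alternative to a GUI viewer, the input names of a TensorFlow frozen graph (.pb) can also be listed programmatically. A minimal sketch assuming a TensorFlow 1.x installation and a placeholder model path model.pb:

# Minimal sketch: list the input (Placeholder) node names of a frozen TensorFlow graph.
import tensorflow as tf  # TensorFlow 1.15 API (also available via tf.compat.v1 on 2.x)

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("model.pb", "rb") as f:  # "model.pb" is a placeholder path
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op == "Placeholder":
        print(node.name)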

FAQ

FAQ

Models & Operators

Models

Operators

Contribution

CONTRIBUTING

Contributors

Aster JIAN
Zexi YUAN
Ao LI
Paul LU
JettHu
Ryosuke1eep

Any form of contribution is welcome. The above contributors have been officially confirmed by Tencent.

We warmly welcome developers to contribute to Tencent open source, and we will offer incentives to acknowledge and thank them. Here we provide an official description of Tencent's open source contribution incentives. Specific contribution rules for each project are formulated by its project team, and developers can choose an appropriate project and participate according to the corresponding rules. The Tencent Project Management Committee reports regularly on qualified contributors, and awards are issued by the official contact.

License

Apache License v2.0

Releases (v2.0.2)
  • v2.0.2(Nov 30, 2021)

    ChangeLog

    Added

    • Support tf.math.top_k, tf.math.greater, tf.math.less, tf.math.rsqrt, tf.math.maximum, tf.math.minimum, tf.fill ops
    • Update test models and save scripts
    • Update docs

    Fixed

    • Fix std::round for float and half data types
    • Fix EmbeddingBag layer to avoid extending to undesired dimensions for tf.gather

    Changed

    • Explicitly call .contiguous() on tensor in convolution operation
    • Allow Select layer to handle const inputs
    • Remove macro control in FullyConnected layer
    • Remove outdated plugin files
  • v2.0.1(Nov 8, 2021)

    ChangeLog

    Fixed

    • Fix torch_resize_creator
    • Fix upsampling_trilinear_3d

    Added

    • Support torch::squeeze.
    • Add unit tests for upsampling_trilinear_3d

    Changed

    • Cancel broadcastdim in trt_element_wise_creator.h and add CHECK_EQ instead.
  • v2.0.0(Nov 4, 2021)

    ChangeLog

    Added

    • Support TensorRT 8 (compatible with TensorRT 7) [TRT]
    • Support c10::prim::NumToTensor op in TrtElementWiseDesc [TORCH]
    • Support c10::aten::upsample_trilinear3d and c10::aten::upsample_nearest3d ops in TrtResizeDesc [TORCH]
    • Support Select op type in TrtSelectDesc [TF]
    • Add torch.matmul op unit test [TORCH]
    • Update docs

    Changed

    • Integrate skipLayerNormPlugin into ForwardLayerNormPluginDynamic [TRT]
    • Switch Bert-related plugins to official TensorRT NvInfer Plugins [TRT]
    • Replace addScale with addScaleNd layer in TrtScaleDesc [TRT]
  • v1.3.3(Oct 11, 2021)

  • v1.3.2(Sep 2, 2021)

    ChangeLog

    Added

    • Update ONNX-related demo dependencies
    • Update Forward documents

    Fixed

    • Fix core dump issues when OnnxEngine is built in Python
  • v1.3.1(Aug 26, 2021)

  • v1.3.0(Aug 24, 2021)

  • v1.2.5(Aug 17, 2021)

  • v1.2.4(Jul 30, 2021)

  • v1.2.3(Jul 19, 2021)

  • v1.2.2(Jul 2, 2021)

  • v1.2.1(Jul 2, 2021)

    ChangeLog

    Added

    • Update fwd::DataType to support more data types and related usages

    Changed

    • TorchModulePlugin is controlled by ENABLE_TORCH_PLUGIN, which is OFF by default.

    Fixed

    • Fix torch_matmul not supporting constant inputs
    • Fix torch_cast not supporting constant inputs
    • Fix split not supporting negative dims
    • Refactor tf_matmul_creator.h and torch_matrix_multiply_creator.h
    • Fix skip_layer_norm's no_skip not working correctly with DYNAMIC_BATCH
    • Fix: for TorchModulePlugin, only clone required attributes instead of all attributes.
    • Fix: a Shuffle layer is appended to the Slice layer of Torch-Select
  • v1.2.0(Jun 15, 2021)

    ChangeLog

    Added

    • Support TorchModulePlugin: when some Torch nodes are not supported, they are encapsulated as a SubModule and wrapped as a TorchModulePlugin layer in the TensorRT engine.

    Fixed

    • Fixed namespace conflicts of BERT plugins: the BERT plugins, inherited from TensorRT's BERT plugins, shared the same global static variables as TensorRT's, which led to conflicts and even core dumps from freed pointers.

    Changed

    • Update Keras forward interface to fwd::Tensor
    • Update unit_tests.
    • Update the minimum required Torch version to 1.7.0.
  • v1.1.1(May 26, 2021)

  • v1.1.0(May 21, 2021)

    ChangeLog

    Added

    • Support the TensorFlow Reshape op
    • Fwd-Tf links to _pywrap_tensorflow_internal.so in Python-TensorFlow.

    Changed

    • Logging usage: use forward_log.conf to control logging behavior (see the logging usage in the README)
  • v1.0.2(May 17, 2021)

    ChangeLog

    Added

    • Nodes supported for Torch: aten::unsqueeze_
    • Nodes supported for Keras: Dropout, ReduceSum
    • CUDA_VERSION and CUDA_ARCHs check in CMake

    Changed

    • Fwd-Tf depends on PyWrap_Tensorflow.so instead of Tensorflow_C_Lib
    • BUILD_PYTHON_LIB=ON is only valid on the Linux platform
    • include files related to libtrt_fwd_engine.so
    • elementwise and reduce implementations
  • v1.0.1(Apr 8, 2021)

    ChangeLog

    Added

    • Demos for tutorial.
    • Support Torch >= 1.7.0

    Changed

    • Requirements for TensorRT, PyTorch, and Keras.
    • Use cudaMemcpyAsync in PrepareInputBuffer.
    • Change GLIBCXX_USE_CXX11_ABI usage in CMakeLists.txt.
  • v1.0.0(Mar 15, 2021)
