ResNet Implementation, Training, and Inference Using LibTorch C++ API

Overview

LibTorch C++ ResNet CIFAR Example

Introduction

ResNet implementation, training, and inference using LibTorch C++ API. Because there is no native implementation even for the simplest data augmentation and learning rate scheduler, the ResNet18 model accuracy on CIFAR10 dataset is only around 74% whereas the same ResNet18 model could achieve ~87% accuracy on the same dataset with some simple data augmentation and learning rate scheduler (87% accuracy is still low because the first 7x7 convolutional layer used in the original ResNet was optimized for ImageNet dataset rather than CIFAR10 dataset). The ResNet18 inference latency using LibTorch C++ API is ~3.0 ms per image, which is slightly faster than the inference latency ~3.5 ms per image using PyTorch Python API. However, it is still way too slow. In practice, we use highly optimized inference engines, such as TensorRT. The saved model from LibTorch C++ API cannot be used for PyTorch Python API and vice versa. LibTorch C++ API is not as rich as PyTorch Python API and its implementation really takes way too much time. The performance benefits that LibTorch C++ API brought is almost negligible over PyTorch Python API.

Taken together, it is not recommended to use LibTorch C++ API unless there are some special use cases.

Usages

Run Docker Container

$ docker pull nvcr.io/nvidia/pytorch:21.03-py3
$ docker run -it --rm --gpus all --ipc=host -v $(pwd):/mnt nvcr.io/nvidia/pytorch:21.03-py3

Download Dataset

$ cd /mnt/
$ mkdir -p dataset
$ cd dataset/
$ bash download-cifar10-binary.sh

Build Application

$ cmake -B build -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'`
$ cmake --build build --config Release

Run Application

$ cd build/src/
$ ./resnet-cifar-demo
You might also like...
Training and fine-tuning YOLOv4 Tiny on custom object detection dataset for Taiwanese traffic
Training and fine-tuning YOLOv4 Tiny on custom object detection dataset for Taiwanese traffic

Object Detection on Taiwanese Traffic using YOLOv4 Tiny Exploration of YOLOv4 Tiny on custom Taiwanese traffic dataset Trained and tested AlexeyAB's D

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

ONNX Runtime is a cross-platform inference and training machine-learning accelerator compatible with deep learning frameworks, PyTorch and TensorFlow/Keras, as well as classical machine learning libraries such as scikit-learn, and more.

Dorylus: Affordable, Scalable, and Accurate GNN Training

Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads This is Dorylus, a Scalable, Resource-eff

Reactive Light Training Module used in fitness for developing agility and reaction speed.

Hello to you , Thanks for taking interest in this project. Use case of this project is to help people that want to improve their agility and reactio

This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.
This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Fast Face Classification (F²C) This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicit

OpenEmbedding is an open source framework for Tensorflow distributed training acceleration.
OpenEmbedding is an open source framework for Tensorflow distributed training acceleration.

OpenEmbedding English version | 中文版 About OpenEmbedding is an open-source framework for TensorFlow distributed training acceleration. Nowadays, many m

A system to flag anomalous source code expressions by learning typical expressions from training data
A system to flag anomalous source code expressions by learning typical expressions from training data

A friendly request: Thanks for visiting control-flag GitHub repository! If you find control-flag useful, we would appreciate a note from you (to niran

Efficient training of deep recommenders on cloud.
Efficient training of deep recommenders on cloud.

HybridBackend Introduction HybridBackend is a training framework for deep recommenders which bridges the gap between evolving cloud infrastructure and

Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large scales

Fairring (FAIR + Herring): a faster all-reduce TL;DR: Using a variation on Amazon’s "Herring" technique, which leverages reduction servers, we can per

Owner
Lei Mao
Artificial Intelligence, Machine Learning, Computer Science. C++, CUDA, Python, CMake.
Lei Mao
LibtorchSegmentation - A c++ trainable semantic segmentation library based on libtorch (pytorch c++). Backbone: VGG, ResNet, ResNext. Architecture: FPN, U-Net, PAN, LinkNet, PSPNet, DeepLab-V3, DeepLab-V3+ by now.

English | 中文 C++ library with Neural Networks for Image Segmentation based on LibTorch. ⭐ Please give a star if this project helps you. ⭐ The main fea

null 309 Dec 29, 2022
Training and Evaluating Facial Classification Keras Models using the Tensorflow C API Implemented into a C++ Codebase.

CFace Training and Evaluating Facial Classification Keras Models using the Tensorflow C API Implemented into a C++ Codebase. Dependancies Tensorflow 2

null 7 Oct 18, 2022
This is a code repository for pytorch c++ (or libtorch) tutorial.

LibtorchTutorials English version 环境 win10 visual sutdio 2017 或者Qt4.11.0 Libtorch 1.7 Opencv4.5 配置 libtorch+Visual Studio和libtorch+QT分别记录libtorch在VS和Q

null 464 Jan 9, 2023
Support Yolov4/Yolov3/Centernet/Classify/Unet. use darknet/libtorch/pytorch to onnx to tensorrt

ONNX-TensorRT Yolov4/Yolov3/CenterNet/Classify/Unet Implementation Yolov4/Yolov3 centernet INTRODUCTION you have the trained model file from the darkn

null 172 Dec 29, 2022
C++ trainable detection library based on libtorch (or pytorch c++). Yolov4 tiny provided now.

C++ Library with Neural Networks for Object Detection Based on LibTorch. ?? Libtorch Tutorials ?? Visit Libtorch Tutorials Project if you want to know

null 62 Dec 29, 2022
The MOT implement by Solov2+DeepSORT with C++ (Libtorch, TensorRT).

Tracking-Solov2-Deepsort This project implement the Multi-Object-Tracking(MOT) base on SOLOv2 and DeepSORT with C++。 The instance segmentation model S

ChenJianqu 38 Dec 22, 2022
The optical flow algorithm RAFT implemented with C++(Libtorch+TensorRT)

RAFT_CPP Attention/注意 There are some bug here,output the wrong result 代码存在bug,估计出来的光流值不准确,解决中 Quick Start 0.Export RAFT onnx model 首先加载训练完成的模型权重: pars

ChenJianqu 21 Dec 29, 2022
Implementation of Univaraint Linear Regresion (Supervised Machine Learning) in c++. With a data set (training set) you can predict outcomes.

Linear-Regression Implementation of Univaraint Linear Regresion (Supervised Machine Learning) in c++. With a data set (training set) you can predict o

vincent laizer 1 Nov 3, 2021
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU.

Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode

NVIDIA Isaac ROS 62 Dec 14, 2022
HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-Through Rates (CTRs)

Merlin: HugeCTR HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-T

null 764 Jan 2, 2023