A project demonstrating how to train your own gesture recognition deep learning pipeline. We start with a pre-trained detection model, repurpose it for hand detection using Transfer Learning Toolkit 3.0, and use it together with the purpose-built gesture recognition model. Once trained, we deploy this model on NVIDIA® Jetson™ using the DeepStream SDK.

Overview

Using NVIDIA Pre-trained Models and Transfer Learning Toolkit 3.0 to Create Gesture-based Interactions with a Robot

In this project, we demonstrate how to train your own gesture recognition deep learning pipeline. We start with a pre-trained detection model, repurpose it for hand detection using Transfer Learning Toolkit 3.0, and use it together with the purpose-built gesture recognition model. Once trained, we deploy this model on NVIDIA® Jetson™.

Such a gesture recognition application can be deployed on a robot so that it understands human gestures and interacts with humans.

This demo is available as an on-demand webinar: https://info.nvidia.com/tlt3.0-gesture-based-interactions-with-a-robot-reg-page.html

Part 1. Training the object detection network

1. Environment setup

Prerequisites

  • Ubuntu 18.04 LTS
  • python >= 3.6.9, < 3.8.x
  • docker-ce >= 19.03.5
  • docker-API 1.40
  • nvidia-container-toolkit >= 1.3.0-1
  • nvidia-container-runtime >= 3.4.0-1
  • nvidia-docker2 >= 2.5.0-1
  • nvidia-driver >= 455.xx

You also need an NVIDIA GPU Cloud (NGC) account and API key (free to use). Once registered, open the setup page for further instructions.
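
Your NGC API key is also what lets Docker pull the TLT containers from the NGC registry. A minimal login sketch (nvcr.io is the NGC registry; the username is the literal string $oauthtoken and the password is your API key):

docker login nvcr.io
# Username: $oauthtoken
# Password: <paste your NGC API key>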

For hardware requirements check the documentation.

Virtual environment setup

Set up your python environment using python virtualenv and virtualenvwrapper.

pip3 install virtualenv
pip3 install virtualenvwrapper

Add the following lines to your shell startup file (.bashrc, .profile, etc.) to set the location where the virtual environments should live, the location of your development project directories, and the location of the script installed with this package:

export WORKON_HOME=$HOME/.virtualenvs
export PROJECT_HOME=$HOME/Devel
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export VIRTUALENVWRAPPER_VIRTUALENV=/home/USER_NAME/.local/bin/virtualenv
source ~/.local/bin/virtualenvwrapper.sh
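
After adding these lines, reload your startup file in the current shell (assuming you edited ~/.bashrc) so the virtualenvwrapper commands become available:

source ~/.bashrc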

Create a virtual environment.

mkvirtualenv tlt_gesture_demo

Activate your virtual environment.

workon tlt_gesture_demo

If you forget the virtualenv name, you can list all available environments by typing

workon

For more information on virtualenvwrapper, refer to the official page.

Transfer Learning Toolkit 3.0 setup

In TLT 3.0, we have created an abstraction above the container: you launch all your training jobs from the launcher, so there is no need to manually pull the appropriate container; tlt-launcher handles that for you. You can install the launcher using pip with the following commands.

pip3 install nvidia-pyindex
pip3 install nvidia-tlt
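
As a quick sanity check, you can ask the launcher for its help text; if the packages were installed correctly into your virtual environment, the tlt entry point should be available:

tlt --help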

You also need to install Jupyter Notebook to work with this demo.

pip install notebook

2. Preparing EgoHands dataset

To train our hand detection model, we used the publicly available EgoHands dataset, provided by the IU Computer Vision Lab at Indiana University. EgoHands contains 48 different videos of egocentric interactions with pixel-level ground-truth annotations for 4,800 frames and more than 15,000 hands.

To use it with the Transfer Learning Toolkit, the dataset has to be converted into KITTI format. For our example, we adapted the open-source conversion script by JK Jung.

Note that you need to make a small change to JK Jung's original script to make it compatible with the Transfer Learning Toolkit. Namely, in the function box_to_line(box), remove the score component by replacing the return statement with the following:

return ' '.join(['hand',
                     '0',
                     '0',
                     '0',
                     '{} {} {} {}'.format(*box),
                     '0 0 0',
                     '0 0 0',
                     '0'])

To convert the dataset, download prepare_egohands.py by JK Jung, apply the modification mentioned above, set the correct paths, and follow the instructions in egohands_dataset/kitti_conversion.ipynb in this project. In addition to calling the original conversion script, this notebook splits your dataset into training and testing sets, as required by the Transfer Learning Toolkit.

3. Training

As part of TLT 3.0 we provide a set of Jupyter Notebooks demonstrating training workflows for various models. The notebook for this demo can be found in the training_tlt directory of this repository.

Note: don't forget to activate your virtual environment in the same terminal prior to executing the notebook (refer to the setup information).

To execute it, after activating the virtual environment, simply navigate to the directory, start the notebook, and follow the instructions in the training_tlt/handdetect_training.ipynb notebook in your browser.

cd training_tlt
jupyter notebook
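
If your training machine is remote, one way to reach the notebook from your workstation's browser is to bind it to all interfaces (this assumes port 8888 is reachable on your network; adjust to your setup):

jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser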

Part 2. Deployment on NVIDIA® Jetson™ with DeepStream SDK

Now that you have trained a detection network, you are ready for deployment on the target edge device. In this demo, we show you how to deploy a model using the DeepStream SDK, a scalable, multi-platform framework for video analytics applications.

For this demo, we used a Jetson AGX Xavier (16 GB). We also tested it on a Jetson Xavier NX. Other Jetson devices should also be suitable, but they have not been tested.

Prerequisites

Note: these are prerequisites specific to Jetson deployment. If you want to repurpose our solution to run on a discrete GPU, check the DeepStream getting started page.

  • CUDA 10.2
  • cuDNN 8.0.0
  • TensorRT 7.1.0
  • JetPack >= 4.4

If you don't have DeepStream SDK installed with your JetPack version, follow the Jetson setup instructions from the DeepStream Quick Start Guide.

The demo shown in the webinar runs on JetPack 4.4 with DeepStream 5.0. We have also added support for JetPack 4.5 with DeepStream 5.1. Select the corresponding version by checking out the appropriate branch:

git checkout JetPack4.5-DeepStream5.1

for the JetPack 4.5 with DeepStream 5.1 combination, and

git checkout JetPack4.4-DeepStream5.0

for JetPack 4.4 with DeepStream 5.0.
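
If you have not yet cloned this repository onto the Jetson, do that first and then check out the branch matching your JetPack/DeepStream combination:

git clone https://github.com/NVIDIA-AI-IOT/gesture_recognition_tlt_deepstream.git
cd gesture_recognition_tlt_deepstream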

Acquiring gesture recognition model

You can download the GestureNet model either with wget:

wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/tlt_gesturenet/versions/deployable_v1.0/zip -O tlt_gesturenet_deployable_v1.0.zip

or via the NGC CLI:

ngc registry model download-version "nvidia/tlt_gesturenet:deployable_v1.0"

Note that since we are not retraining this model and are using it directly for deployment, we selected the deployable_v1.0 version.
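
If you used the wget method, the model arrives as a zip archive; unzip it to get to the .etlt file before conversion (the archive name below matches the -O argument used above):

unzip tlt_gesturenet_deployable_v1.0.zip -d tlt_gesturenet_deployable_v1.0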

Converting the models to TensorRT

In order to take advantage of the hardware and software accelerations on the target edge device, we want to convert our .etlt models into TensorRT engines. TensorRT is an inference accelerator that exploits the target hardware and also lets you reduce the arithmetic precision, which can further increase the model's throughput.

There are two ways to convert your model into a TensorRT engine: either directly via DeepStream, or by using the tlt-converter utility. We will show you both ways.

The trained detector model will be converted automatically by DeepStream during the first run. For subsequent runs, you can specify the path to the produced engine in the corresponding DeepStream config. We provide our DeepStream configs with this project.

Since the GestureNet model is quite new, the version of DeepStream we are using for this demo (5.0) does not support its conversion. However, you can convert it using the updated tlt-converter; download the version corresponding to your JetPack release.

See this page if you want to use tlt-converter with different hardware and software.

Once you have the tlt-converter installed on the Jetson, convert the GestureNet model using the following command:

./tlt-converter -k nvidia_tlt \
    -t fp16 \
    -p input_1,1x3x160x160,1x3x160x160,2x3x160x160 \
    -e /EXPORT/PATH/model.plan \
    /PATH/TO/YOUR/GESTURENET/model.etlt

Since we didn't change the model and are using it as is, the model key remains the same (nvidia_tlt), as specified on NGC.

Note that we are converting the model into FP16 format, as we don't have an INT8 calibration file for the deployable model.

Make sure to provide correct values for your model path as well as for the export path.

Preparing the models for deployment

There are two ways of deploying a model with the DeepStream SDK: the first relies on the TensorRT runtime and requires the model to be converted into a TensorRT engine; the second relies on Triton Inference Server. Triton Inference Server can be used as a stand-alone solution, but it can also be integrated into a DeepStream app. Such a setup allows high flexibility, since it accepts models in various formats that do not necessarily have to be converted into TensorRT format. In this demo, we show both deployment paths: we deploy our hand detector using the TensorRT runtime and the gesture recognition model using Triton Inference Server.

To deploy a model to DeepStream using the TensorRT runtime, you only need to make sure that the model is convertible to TensorRT, which means all layers and operations in the model are supported by TensorRT. You can check the TensorRT support matrix for further information on supported layers and operations.

All models trained with Transfer Learning Toolkit are supported by TensorRT.

Configuring GestureNet for Triton Inference Server

In order to deploy your model via Triton Inference Server, you need to prepare a model repository in a specified format. It should have the following structure:

└── trtis_model_repo
    └── hcgesture_tlt
        ├── 1
        │   └── model.plan
        └── config.pbtxt

where model.plan is the .plan file generated with the tlt-converter, and config.pbtxt has the following content:

name: "hcgesture_tlt"
platform: "tensorrt_plan"
max_batch_size: 1
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ 3, 160, 160 ]
  }
]
output [
  {
    name: "activation_18"
    data_type: TYPE_FP32
    dims: [ 6 ]
  }
]
dynamic_batching { }

To learn more about configuring Triton Inference Server repository, refer to the official documentation.
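
For reference, assembling this layout from the converter output could look like the following sketch (the source paths are illustrative; use wherever you exported model.plan and wrote config.pbtxt):

mkdir -p trtis_model_repo/hcgesture_tlt/1
cp /EXPORT/PATH/model.plan trtis_model_repo/hcgesture_tlt/1/model.plan
cp config.pbtxt trtis_model_repo/hcgesture_tlt/config.pbtxt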

Customizing the deepstream-app

The sample deepstream-app can be configured in a flexible way: as a primary detector, as a classifier, or even to cascade several models, for example a detector and a classifier. In that case, the detector passes cropped objects of interest to the classifier. This processing happens in a so-called DeepStream pipeline, each component of which takes advantage of the hardware components of Jetson devices. The pipeline for our application looks as follows:

DeepStream pipeline

The GestureNet model we are using was trained on images with a large margin around the region of interest (ROI), while the detector model we trained produces tight boxes around the objects of interest (hands, in our case). As a result, the crops initially passed to the classifier looked different from the representation the classifier had learned. There were two ways to solve this problem:

  • either retraining the classifier with a new dataset reflecting our setup,
  • or extending the cropped ROIs by a suitable margin.

As we wanted to use the GestureNet model as is, we chose the second path, which required modifying the original app. We implemented the following function to modify the metadata returned by the detector so that larger bounding boxes are cropped:

#define CLIP(a,min,max) (MAX(MIN(a, max), min))

int MARGIN = 200;

static void
modify_meta (GstBuffer * buf, AppCtx * appCtx)
{
  int frame_width;
  int frame_height;
  get_frame_width_and_height(&frame_width, &frame_height, buf);

  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  NvDsFrameMetaList *frame_meta_list = batch_meta->frame_meta_list;
  NvDsObjectMeta *object_meta;
  NvDsFrameMeta *frame_meta;
  NvDsObjectMetaList *obj_meta_list;
  while (frame_meta_list != NULL) {
     frame_meta = (NvDsFrameMeta *) frame_meta_list->data;
     obj_meta_list = frame_meta->obj_meta_list;
     while (obj_meta_list != NULL) {
       object_meta = (NvDsObjectMeta *) obj_meta_list->data;
       object_meta->rect_params.left = CLIP(object_meta->rect_params.left - MARGIN, 0, frame_width - 1);
       object_meta->rect_params.top = CLIP(object_meta->rect_params.top - MARGIN, 0, frame_height - 1);
       object_meta->rect_params.width = CLIP(object_meta->rect_params.left + object_meta->rect_params.width + MARGIN, 0, frame_width - 1);
       object_meta->rect_params.height = CLIP(object_meta->rect_params.top + object_meta->rect_params.height + MARGIN, 0, frame_height - 1);
       obj_meta_list = obj_meta_list->next;
     }
     frame_meta_list = frame_meta_list->next;
  }
}

In order to display the original bounding boxes, we implemented the following function, which restores the meta bounding boxes to their original size:

static void
restore_meta (GstBuffer * buf, AppCtx * appCtx)
{

  int frame_width;
  int frame_height;
  get_frame_width_and_height(&frame_width, &frame_height, buf);

  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  NvDsFrameMetaList *frame_meta_list = batch_meta->frame_meta_list;
  NvDsObjectMeta *object_meta;
  NvDsFrameMeta *frame_meta;
  NvDsObjectMetaList *obj_meta_list;
  while (frame_meta_list != NULL) {
     frame_meta = (NvDsFrameMeta *) frame_meta_list->data;
     obj_meta_list = frame_meta->obj_meta_list;
     while (obj_meta_list != NULL) {
       object_meta = (NvDsObjectMeta *) obj_meta_list->data;

       // restore the original detector bounding boxes for display (copy the values preserved in detector_bbox_info)
       object_meta->rect_params.left = object_meta->detector_bbox_info.org_bbox_coords.left;
       object_meta->rect_params.top = object_meta->detector_bbox_info.org_bbox_coords.top;
       object_meta->rect_params.width = object_meta->detector_bbox_info.org_bbox_coords.width;
       object_meta->rect_params.height = object_meta->detector_bbox_info.org_bbox_coords.height;

       obj_meta_list = obj_meta_list->next;
     }
     frame_meta_list = frame_meta_list->next;
  }
}

We also implemented the following helper function to get the frame width and height from the buffer:

static void
get_frame_width_and_height (int * frame_width, int * frame_height, GstBuffer * buf) {
    GstMapInfo map_info;
    memset(&map_info, 0, sizeof(map_info));
    if (!gst_buffer_map (buf, &map_info, GST_MAP_READ)){
      g_print("Error: Failed to map GST buffer");
    } else {
      NvBufSurface *surface = NULL;
      surface = (NvBufSurface *) map_info.data;
      *frame_width = surface->surfaceList[0].width;
      *frame_height = surface->surfaceList[0].height;
      gst_buffer_unmap(buf, &map_info);
    }
}

Building the application

To build the custom app, copy deployment_deepstream/deepstream-app-bbox to /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps.
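
Assuming you cloned this repository into your home directory on the Jetson, copying the sources could look like this:

sudo cp -r ~/gesture_recognition_tlt_deepstream/deployment_deepstream/deepstream-app-bbox \
   /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/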

Install the required dependencies:

sudo apt-get install libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev \
   libgstrtspserver-1.0-dev libx11-dev libjson-glib-dev

Build an executable:

cd /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-app-bbox
make

Configuring DeepStream pipeline

Before executing the app, you have to provide the configuration files. You can learn more about the configuration parameters in the official documentation.

You can find the configuration files for this demo under deployment_deepstream/egohands-deepstream-app-trtis/. In the same directory, you will also find the label files required by our models.

Finally, you need to make your models discoverable by our app. According to our configs, the directory structure under deployment_deepstream/egohands-deepstream-app-trtis/ for our model storage looks as follows:

├── tlt_models
│   ├── tlt_egohands_qat
│   │   ├── calibration_qat.bin
│   │   └── resnet34_detector_qat.etlt
└── trtis_model_repo
    └── hcgesture_tlt
        ├── 1
        │   └── model.plan
        └── config.pbtxt

You may notice that the file resnet34_detector_qat.etlt_b16_gpu0_int8.engine specified in the config config_infer_primary_peoplenet_qat.txt is missing from this setup. It will be generated during the first execution and used directly in subsequent runs.
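
After the first run, you can confirm that the serialized engine was written out (assuming DeepStream places it next to the .etlt model, as the engine file name in the config suggests); subsequent launches then deserialize it instead of rebuilding it:

ls deployment_deepstream/egohands-deepstream-app-trtis/tlt_models/tlt_egohands_qat/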

Executing the app

In general, the execution command looks as follows:

./deepstream-app-bbox -c <config-file>

In our particular case with the configs provided:

./deepstream-app-bbox -c source1_primary_detector_qat.txt

The app should be running now!
