autogen

Code generation for automatic differentiation with GPU support.

Overview

This library leverages CppAD and CppADCodeGen to trace C++ and Python code, and turns it into efficient CUDA or C code. At the same time, the Jacobian and Hessian code can be automatically generated through reverse-mode or forward-mode automatic differentiation. The generated code is compiled to a dynamic library which typically runs orders of magnitude faster than the original user code that was traced, while multiple calls to the forward or backward versions of the function can be parallelized through CUDA or OpenMP.
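To make the idea concrete, here is a minimal forward-mode AD sketch in plain Python using dual numbers. This is purely conceptual and is not autogen's API; it illustrates the kind of derivative computation that CppAD traces and autogen compiles to native code:

```python
# Minimal forward-mode AD via dual numbers -- a conceptual sketch of the
# derivative machinery that autogen/CppAD automate (not autogen's API).
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot  # function value and derivative part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate df/dx at x by seeding the derivative component with 1."""
    return f(Dual(x, 1.0)).dot

# f(x) = 3x^2 + 2x, so f'(x) = 6x + 2 and f'(2) = 14
print(derivative(lambda x: 3 * x * x + 2 * x, 2.0))  # -> 14.0
```

Forward mode propagates one derivative direction per pass; reverse mode (also supported by the library) computes all input sensitivities in a single backward sweep, which is why it is preferred for Jacobians of functions with many inputs.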

Requirements

The library requires CMake and a C++ compiler with stable support for C++17, for example GCC 9 or newer, Clang, or MSVC.

First, check out the git submodules via

git submodule update --init --recursive

Python

Note: Only Python 3.4 and newer are supported.

Install pybind11:

pip install pybind11

For development, install autogen via the following command:

pip install -e .

To explicitly specify a compatible C++17 compiler, set the CC and CXX environment variables:

CC=gcc-9 CXX=g++-9 pip install -e .

Features

The following features are available on the different operating systems:

AutoDiff Mode           UNIX                Windows
CppAD tracing           ✓                   ✓
CPU code generation     ✓ (GCC, Clang)      ✓ (MSVC, Clang)
CUDA code generation    ✓ (NVCC)            ✓ (NVCC)

Windows CPU Support

Compilation of CPU-bound code on Windows is currently available through Microsoft Visual C++ (MSVC) and Clang. Depending on the selected compiler/linker, make sure cl.exe and link.exe (for MSVC), or clang.exe, are available on the system path. It may be necessary to first load the build variables in the console session by running

"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
Issues
  • Refactor


    • Add codegen targets for CUDA, OpenMP and the original CppADCodeGen functionality (called "LegacyC")
    • The OpenMP and CUDA codegen targets can create CMake projects in which the given function input is written to a main.cpp file, which can be used to debug the generated code
    opened by eric-heiden 3
  • Save autogen files to dedicated folder


    The generated source files and compiled libraries etc. should be stored in some new folder (e.g. .autogen) to keep the current working directory cleaner for the user.

    opened by eric-heiden 1
  • Compile with MSVC


    Set up compiler class for MSVC that runs the command

    cl /LD model_srcs\*.c /link /out:model.dll
    

    Enable option for floating-point exceptions via /fp:except.

    opened by eric-heiden 1
  • Support for atomic functions


    Trace functions into separate code files to reduce the amount of generated code. This works for external C++ modules via pybind11 wrappers, as well as for Python functions directly.

    opened by eric-heiden 1
  • Disabled warnings on Linux/clang-12


    To build with -Wall -Wextra on Linux with clang-12, I needed to disable the following warnings:

    -Wno-unused-parameter
    -Wno-unused-variable
    -Wno-unused-function
    -Wno-unused-local-typedefs
    -Wno-sign-compare
    -Wno-inconsistent-missing-
    -Wno-overloaded-virtual
    

    I'm happy to sort them out myself and submit PRs but wanted to keep track somewhere.

    opened by dmillard 0
  • Make scalar type in generated code variable


    Currently the generated code uses double as the scalar type. Make it configurable, and not via a typedef that requires recompiling the entire autogen (Python) package.

    opened by eric-heiden 0
  • Speed up CUDA compilation via clang


    See https://llvm.org/docs/CompileCudaWithLLVM.html

    First load the build environment from C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat

    The following command runs on Windows (MSVC 2019) to compile generated CUDA code through Clang (tested on Clang 13 with CUDA 11.2):

    clang++ test_cuda_srcs\test_cuda.cu -shared -o cuda_test.dll --cuda-gpu-arch=sm_35  -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64"  -lcudart_static -lcudart -pthread --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2"
    
    opened by eric-heiden 1
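As a toy illustration of the scalar-type issue above, one direction is to make the scalar a code-generation parameter rather than a baked-in typedef, so that switching between double and float requires only re-emitting and recompiling the generated sources. The template and helper below are hypothetical and not autogen's actual generator:

```python
# Hypothetical sketch: emit the scalar type as a codegen parameter instead of
# a typedef that would force a rebuild of the whole autogen package.
C_TEMPLATE = """\
{scalar} model_forward({scalar} x) {{
    return {scalar_literal} * x * x;
}}
"""

def generate_source(scalar="double"):
    """Render a tiny C source file for the requested scalar type."""
    suffix = "f" if scalar == "float" else ""  # float literals need 'f'
    return C_TEMPLATE.format(scalar=scalar, scalar_literal="3.0" + suffix)

print(generate_source("float"))
```

The generated string for "float" declares `float model_forward(float x)` and uses a `3.0f` literal; the default emits the same function over double, so a scalar-type change never touches the Python package itself.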
Owner
Eric Heiden
PhD Student in Robotics and Robot Learning