124 Repositories
C++ (or C) GPU Libraries
Tiny C++ Software Renderer/Rasterizer
SoftGLRender Tiny C++ Software Renderer/Rasterizer. It implements the main GPU rendering pipeline; 3D models (GLTF) are loaded by assimp, and using GL
PaRSEC: the Parallel Runtime Scheduler and Execution Controller for micro-tasks on distributed heterogeneous systems.
PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to the cores and GPU accelerators, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.
A CUDA-accelerated cloth simulation engine based on Extended Position Based Dynamics (XPBD).
Velvet Velvet is a CUDA-accelerated cloth simulation engine based on Extended Position Based Dynamics (XPBD). Why another cloth simulator? There are a
SMAA is a very efficient GPU-based MLAA implementation (DX9, DX10, DX11 and OpenGL)
SMAA is a very efficient GPU-based MLAA implementation (DX9, DX10, DX11 and OpenGL), capable of handling subpixel features seamlessly, and featuring an improved and advanced pattern detection & handling mechanism.
Optimized GPU noise functions and utilities
Optimized GPU noise functions and utilities
Physically-based GPU and CPU ray-tracer emerging on a surface
etx-tracer Physically-based GPU and CPU ray-tracer emerging on a surface. Features Vertex Connection and Merging algorithm (CPU and GPU); Full-spectra
Ultralight is an ultra-fast, ultra-light, standards-compliant HTML renderer for applications and games.
Ultralight is an ultra-fast, ultra-light, standards-compliant HTML renderer for applications and games. It supports most modern HTML5, CSS, and JavaScript features while still remaining light in binary size and memory usage.
An OpenGL 4.3 / C++ 11 rendering engine oriented towards animation
aer-engine About An OpenGL 4.3 / C++ 11 rendering engine oriented towards animation. Features: Custom animation model format, SKMA, with a Blender exp
Get CPU & GPU temperatures and fan and battery statistics from your Mac.
macOS Hardware Stats Get CPU & GPU temperatures and fan and battery statistics from your Mac. This simple script will output a JSON array containing h
HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-Through Rates (CTRs)
Merlin: HugeCTR HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-T
A fast, practical GPU rasterizer for fonts and vector graphics
Pathfinder 3 Pathfinder 3 is a fast, practical, GPU-based rasterizer for fonts and vector graphics using OpenGL 3.0+, OpenGL ES 3.0+, WebGL 2, and Met
Basis Universal GPU Texture Codec
basis_universal Basis Universal Supercompressed GPU Texture Codec Basis Universal is a "supercompressed" GPU texture data interchange system that supp
GPU Texture Generator
Imogen GPU/CPU Texture Generator GPU Texture generator using dear imgui for UI. Not production ready and a bit messy but really fun to code. This is a
GPU Texture Baking Tool
fornos GPU Texture Baking Tool A fast and simple tool to bake your high-poly mesh details to textures. Bakers Height Position Normals Ambient Occlusio
A texture compression algorithm for sprite sheets that allows decompression on the GPU during rendering.
CRABBY A texture compression format for spritesheets Crabby TL;DR Crabby is a compressed texture format for spritesheets and flipbook animations. What
An open source iOS framework for GPU-based image and video processing
GPUImage Brad Larson http://www.sunsetlakesoftware.com @bradlarson Overview The GPUImage framework is a BSD-licensed iO
Converts common image formats (PNG, JPG, etc.) to GPU-native compressed (BCn, ETC, ASTC) in KTX containers.
Converts common image formats (PNG, JPG, etc.) to GPU-native compressed (BCn, ETC, ASTC) in KTX containers.
physically based path tracer on gpu
GPUPathtracer physically based path tracer on gpu Features Integrators (ambient occlusion, path tracing, light tracing, volumetric path tracing, bidirectional path t
Radeon Rays is a ray intersection acceleration library for hardware and software multiplatforms using CPU and GPU
RadeonRays 4.1 Summary RadeonRays is a ray intersection acceleration library. AMD developed RadeonRays to help developers make the most of GPU and to
Toy path tracer for my own learning purposes (CPU/GPU, C++/C#, Win/Mac/Wasm, DX11/Metal, also Unity)
Toy Path Tracer Toy path tracer for my own learning purposes, using various approaches/techs. Somewhat based on Peter Shirley's Ray Tracing in One Wee
The Forge Cross-Platform Rendering Framework PC Windows, Linux, Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2
The Forge is a cross-platform rendering framework supporting PC Windows 10 / 7 with DirectX 12 / Vulkan 1.1 with DirectX Ray Tracing API DirectX 11 Fa
3D GPUs Strange Attractors and Hypercomplex Fractals explorer - up to 256 Million particles in RealTime
glChAoS.P ⋅ wglChAoS.P - Ver 1.5.3 glChAoS.P / wglChAoS.P ⋅ opengl / webgl ⋅ Chaotic Attractors of Slight (dot) Particles RealTime 3D Strange Attracto
GPU cloth with OpenGL Compute Shaders
GPU cloth with OpenGL Compute Shaders This project in progress is a PBD cloth simulation accelerated and parallelized using OpenGL compute shaders. Fo
An open-source, low-code machine learning library in Python
An open-source, low-code machine learning library in Python 🚀 Version 2.3.6 out now! Check out the release notes here. Official • Docs • Install • Tu
NVIDIA GPUs htop like monitoring tool
NVTOP What is NVTOP? Nvtop stands for NVidia TOP, a (h)top like task monitor for NVIDIA GPUs. It can handle multiple GPUs and print information about
waifu2x converter ncnn version, runs fast on intel / amd / nvidia GPU with vulkan
waifu2x ncnn Vulkan ncnn implementation of waifu2x converter. Runs fast on Intel / AMD / Nvidia with Vulkan API. waifu2x-ncnn-vulkan uses ncnn project
Eclipse Deeplearning4J (DL4J) ecosystem is a set of projects intended to support all the needs of a JVM based deep learning application
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch; a modular and tiny C++ library for running math code; and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
Thrust is a C++ parallel programming library which resembles the C++ Standard Library.
Thrust: Code at the speed of light Thrust is a C++ parallel programming library which resembles the C++ Standard Library. Thrust's high-level interfac
Fast, gpu-based CSV parser
nvParse Parsing CSV files with GPU Parsing delimiter-separated files is a common task in data processing. The regular way of extracting the columns fr
Cross-platform GPU-oriented C++ application/game framework
Introduction neoGFX is a C++ app/game engine and development platform targeted at app and game developers that wish to leverage modern GPUs for perfor
ParallelComputingPlayground - Shows different programming techniques for parallel computing on CPU and GPU
ParallelComputingPlayground Shows different programming techniques for parallel computing on CPU and GPU. Purpose The idea here is to compute a Mandel
Thrust - The C++ parallel algorithms library.
Thrust: Code at the speed of light Thrust is a C++ parallel programming library which resembles the C++ Standard Library. Thrust's high-level interfac
Adorad - Fast, Expressive, & High-Performance Programming Language for those who dare
The Adorad Language Adorad | Documentation | Contributing | Compiler design Key Features of Adorad Simplicity: the language can be learned in less tha
RXMesh - A GPU Mesh Data Structure - SIGGRAPH 2021
RXMesh About RXMesh is a surface triangle mesh data structure and programming model for processing static meshes on the GPU. RXMesh aims at providing a
OptimizedMetaBall - 🔮GPU-based real-time raytracing rendering of transparent metaball
Optimized Raytracing MetaBall: Acceleration and Transparent 🔮 GPU-based real-time raytracing rendering of transparent metaball. (Project for CS337 Co
4eisa40 GPU computing : exploiting the GPU to execute advanced simulations
GPU-computing 4eisa40 GPU computing : exploiting the GPU to execute advanced simulations Activities Parallel programming Algorithms Image processing O
Bruteforce BitCoin Private keys WIF, Minikeys, Passphrases...
Fialka M-125 This is a modified version of LostCoins. Huge thanks to kanhavishva and to all developers whose code was used in Fialka M-125. Quick start Сon
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Project DeepSpeech DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Spee
Parallel programming for everyone.
Tutorial | Examples | Forum Documentation | Simplified Chinese Documentation | Contributor Guidelines Overview Taichi (太极) is a parallel programming language for high-performan
A library for applying rootless Adreno GPU driver modifications/replacements
Adreno Tools A library for applying rootless Adreno GPU driver modifications/replacements. Currently supports loading custom GPU drivers such as turni
A composable container for Adaptive ROS 2 Node computations. Select between FPGA, CPU or GPU at run-time.
adaptive_component A composable stateless container for Adaptive ROS 2 Node computations. Select between FPGA, CPU or GPU at run-time. Nodes using har
Efficient training of deep recommenders on cloud.
HybridBackend Introduction HybridBackend is a training framework for deep recommenders which bridges the gap between evolving cloud infrastructure and
Blazing-fast Expression Templates Library (ETL) with GPU support, in C++
Expression Templates Library (ETL) 1.3.0 ETL is a header only library for C++ that provides vector and matrix classes with support for Expression Temp
Lossy fixed-rate GPU-friendly image compression/decompression.
NotOkImageFormat Lossy fixed-rate GPU-friendly image compression/decompression. Supported profiles 16:1:1 2.8125 bpp yuv 4:1:1 3.75 bpp
A heterogeneous OpenCL implementation of AutoDock Vina
Vina-GPU A heterogeneous OpenCL implementation of AutoDock Vina Compiling and Running Note: at least one GPU card is required and make sure the versio
Raytracer implemented with CPU and GPU using CUDA
Raytracer This is a training project aimed at learning the ray tracing algorithm and practicing converting sequential CPU code into parallelized GPU code u
Golang bindings for Nvidia Datacenter GPU Manager (DCGM)
Bindings Golang bindings are provided for NVIDIA Data Center GPU Manager (DCGM). DCGM is a set of tools for managing and monitoring NVIDIA GPUs in clu
SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it simple and easy for physicists to implement lattice QCD formulas while still providing the best possible performance.
SIMULATeQCD a SImple MUlti-GPU LATtice code for QCD calculations SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it simple and easy for ph
NCNN implementation of Real-ESRGAN. Real-ESRGAN aims at developing Practical Algorithms for General Image Restoration.
Real-ESRGAN ncnn Vulkan This project is the ncnn implementation of Real-ESRGAN. Real-ESRGAN ncnn Vulkan heavily borrows from realsr-ncnn-vulkan. Many
A low-level, cross-platform GPU library
vgpu is a cross-platform, low-level GPU library. Features Support for Windows, Linux, macOS. Modern rendering using Vulkan and Direct3D12. Dependencies U
Move CS beacon to GPU memory when sleeping
Blog post Tested on Windows 21H1, Visual Studio 2019 (v142) and an NVIDIA GTX860M. GPUSleep GPUSleep moves the beacon image to GPU memory before the b
RISC-V RV32IMAFC + 80s-era SoC (bitmap + GPU, sprites, tilemaps)
A simple (no interrupts or exceptions/traps) RISC-V RV32IMAFC CPU with pseudo-SMT (dual-thread) capability. The display is similar to that of the 8-bit era machines, and the SoC also provides audio, SD card read support, UART, and PS/2 keyboard input.
Vulkan and other GPU API bugs I found.
GPU-my-list-of-bugs what is it - list of bugs I found writing shaders, mostly shader bugs. Maybe this is my code bug or/and shader bugs, but this code
Legion Low Level Rendering Interface provides a graphics API agnostic rendering interface with minimal CPU overhead and low level access to verbose GPU operations.
Legion-LLRI Legion-LLRI, or “Legion Low Level Rendering Interface” is a rendering API that aims to provide a graphics API agnostic approach to graphic
A lightweight 2D pose model that can be deployed on Linux/Windows/Android, supports CPU/GPU inference acceleration, and runs detection in real time on ordinary mobile phones.
A lightweight 2D pose model that can be deployed on Linux/Windows/Android, supports CPU/GPU inference acceleration, and runs detection in real time on ordinary mobile phones.
Monitoring Radeon GPU temperature on macOS
RadeonSensor - Kext and Gadget to show Radeon GPU temperature on macOS The kext is based on FakeSMCs RadeonMonitor to provide GPU temperature to a ded
Finite Field Operations on GPGPU
ff-gpu Finite Field Operations on GPGPU Background In recent times, I've been interested in Finite Field operations, so I decided to implement few fie
Docker files and scripts to set up and run VINS-FUSION-gpu on NVIDIA Jetson boards inside a Docker container.
jetson_vins_fusion_docker This repository provides Docker files and scripts to easily set up and run VINS-FUSION-gpu on NVIDIA Jetson boards inside a d
Code generation for automatic differentiation with GPU support.
Code generation for automatic differentiation with GPU support.
A Hydra-enabled GPU path tracer that supports MaterialX.
A Hydra-enabled GPU path tracer that supports MaterialX.
Brute force Mnemonic BIP39 Bip32 Bip44
BIP39 Experimental project BIP39/Bip32/Bip44. This is a modified version of LostCoins. The project needs the help of a programmer! We need to change the D
An efficient C++17 GPU numerical computing library with Python-like syntax
MatX - Matrix Primitives Library MatX is a modern C++ library for numerical computing on NVIDIA GPUs. Near-native performance can be achieved while us
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU.
Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode
GPU miner for TON
"Soft" Pull Request rules Thou shall not merge your own PRs, at least one person should review the PR and merge it (4-eyes rule) Thou shall make sure
Brute Force Bitcoin Private keys, Public keys
Rotor-Cuda This is a modified version of KeyHunt v1.7 by kanhavishva. A lot of gratitude to all the developers whose code has been used here. Feature
ArrayFire: a general purpose GPU library.
ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs,
Multi-backend implementation of SYCL for CPUs and GPUs
hipSYCL - a SYCL implementation for CPUs and GPUs hipSYCL is a modern SYCL implementation targeting CPUs and GPUs, with a focus on leveraging existing
A C++ GPU Computing Library for OpenCL
Boost.Compute Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API an
a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.
a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.
QMoM methods for fluid dynamics in C++, C, and OpenACC.
GPU-QBMMlib QMoM methods for fluid dynamics in C++, C, and OpenACC. Agenda Add more test cases from Marchisio Debug higher-dimensional algorithms Fami
GPU ray tracing framework using NVIDIA OptiX 7
GPU ray tracing framework using NVIDIA OptiX 7
Cooperative primitives for CUDA C++.
CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model
What I'm doing here is insane GPU driver prototype for @GreenteaOS
NjRAA Work-in-progress Driver Foundation [nee-jee-ray] What I'm doing here is a GPU driver for Linux as a prototype for future graphics stack of the @
Experiments using the RPI Zero GPU for FFT (1D and 2D)
RPI0_GPU_FFT Experiments using the RPI Zero GPU for FFT/IFFT 1D/2D For an input 4194304 (1D), the GPU was around 7X faster than np.fft.fft and np.fft.
This is an OpenGL cube demo program. It was made as a tech demo using PVR_PSP2 Driver layer GPU libraries.
OpenGL Cube Demo using PVR_PSP2 Driver layer GPU libraries This is an OpenGL cube demo program. It was made as a tech demo using PVR_PSP2 Driver layer
A General-purpose Parallel and Heterogeneous Task Programming System
Taskflow Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++ Why Taskflow? Taskflow is faster, more expressive, an
Performance Evaluation of a Parallel Image Enhancement Technique for Dark Images on Multithreaded CPU and GPU Architectures
Performance Evaluation of a Parallel Image Enhancement Technique for Dark Images on Multithreaded CPU and GPU Architectures Image processing is a rese
2D GPU renderer for dynamic UIs
vger vger is a vector graphics renderer which renders a limited set of primitives, but does so almost entirely on the GPU. Works on iOS and macOS. API
An easy-to-use image processing library accelerated with CUDA on GPU.
gpucv Have you used OpenCV on your CPU and wanted to run it on GPU? Did you try installing OpenCV and get frustrated with its installation? Fret not
Driver layer GPU libraries and tests for PSP2
PVR_PSP2 Driver layer GPU libraries and tests for PSP2 Currently this project includes: Common and PSP2-specific GPU driver headers. Extension library
Software ray tracer written from scratch in C that can run on CPU or GPU with emphasis on ease of use and trivial setup
A minimalist and platform-agnostic interactive/real-time raytracer. Strong emphasis on simplicity, ease of use and almost no setup to get started with
Video, Image and GIF upscale/enlarge(Super-Resolution) and Video frame interpolation. Achieved with Waifu2x, SRMD, RealSR, Anime4K, RIFE, CAIN, DAIN and ACNet.
Video, Image and GIF upscale/enlarge(Super-Resolution) and Video frame interpolation. Achieved with Waifu2x, SRMD, RealSR, Anime4K, RIFE, CAIN, DAIN and ACNet.
GPU 3D signed distance field generator, written with DirectX 11 compute shader
GPU SDF Generator GPU 3D signed distance field generator, written with DirectX 11 compute shader Building git clone --recursive https://github.com/Air
GPU Task Spooler - A SLURM alternative/job scheduler for a single simulation machine
GPU Task Spooler - A SLURM alternative/job scheduler for a single simulation machine
vkQuake is a Quake 1 port using Vulkan instead of OpenGL for rendering
Vulkan Quake port based on QuakeSpasm
CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution
CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models supporting both CPU and GPU execution. The goal is to provide comprehensive inference features and be the most efficient and cost-effective solution to deploy standard neural machine translation systems such as Transformer models.
XMRig is a high performance, open source, cross platform RandomX, KawPow, CryptoNight and AstroBWT unified CPU/GPU miner
XMRig is a high performance, open source, cross platform RandomX, KawPow, CryptoNight and AstroBWT unified CPU/GPU miner and RandomX benchmark. Official binaries are available for Windows, Linux, macOS and FreeBSD.
An extremely hacky VNC server for WebOS - Works by reading directly from the GPU's framebuffer.
webos-vncserver An extremely hacky VNC server for WebOS - Works by reading directly from the GPU's framebuffer. Requires root privileges.
a language for fast, portable data-parallel computation
Halide Halide is a programming language designed to make it easier to write high-performance image and array processing code on modern machines. Halid
monolish: MONOlithic Linear equation Solvers for Highly-parallel architecture
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture monolish is a linear equation solver library that monolithically fuses va
A library for high performance deep learning inference on NVIDIA GPUs.
Forward - A library for high performance deep learning inference on NVIDIA GPUs Forward - A library for high performance deep learning inference on NV
libcu++: The C++ Standard Library for Your Entire System
libcu++, the NVIDIA C++ Standard Library, is the C++ Standard Library for your entire system. It provides a heterogeneous implementation of the C++ Standard Library that can be used in and between CPU and GPU code.
Open3D: A Modern Library for 3D Data Processing
Open3D is an open-source library that supports rapid development of software that deals with 3D data. The Open3D frontend exposes a set of carefully selected data structures and algorithms in both C++ and Python. The backend is highly optimized and is set up for parallelization. We welcome contributions from the open-source community.
ThunderSVM: A Fast SVM Library on GPUs and CPUs
What's new We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. add scikit-learn interface, see here Overview The miss
ThunderGBM: Fast GBDTs and Random Forests on GPUs
Documentations | Installation | Parameters | Python (scikit-learn) interface What's new? ThunderGBM won 2019 Best Paper Award from IEEE Transactions o
Deep Learning API and Server in C++11 with support for Caffe, Caffe2, PyTorch, TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE
Open Source Deep Learning Server & API DeepDetect (https://www.deepdetect.com/) is a machine learning API and server written in C++11. It makes state
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree
🐸 Coqui STT is an open source Speech-to-Text toolkit which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers
Coqui STT ( 🐸 STT) is an open-source deep-learning toolkit for training and deploying speech-to-text models. 🐸 STT is battle tested in both producti