GPTPU: General-Purpose Computing on (Edge) Tensor Processing Units

Overview


Welcome to the repository of ESCAL @ UCR's GPTPU project! We aim to demonstrate the power of matrix processing units (MXUs), which are now ubiquitous across all types of computing platforms. This project uses Google's Edge TPU, a "relatively" open architecture that anyone can purchase and integrate into their own systems. In our preliminary results, we achieve a 2.46x speedup over a single high-end CPU core. For more information, see our arXiv paper https://arxiv.org/pdf/2107.05473.pdf or our upcoming SC21 paper.


Hardware installation

You will need either an M.2 Edge TPU (recommended), https://coral.ai/docs/m2/get-started/, or a USB Edge TPU accelerator installed in your system.

Once you have the Edge TPUs, please follow Google's documentation to install their drivers and toolchain before installing our GPTPU framework: https://coral.ai/docs/m2/get-started/#2-install-the-pcie-driver-and-edge-tpu-runtime

You may also refer to Section 3.1 of our arXiv paper to build a (much cheaper) multi-Edge-TPU machine, or purchase ASUS's 8x Edge TPU PCIe card: https://iot.asus.com/products/AI-accelerator/AI-Accelerator-PCIe-Card/

Install the GPTPU library (our contribution)

Compile all benchmarks

$ make 

Run all benchmarks

$ make run
// Each benchmark reports its RMSE and error rate, as described in the paper. Some benchmarks may exercise experimental features.

The gptpu library is precompiled as libgptpu.so and linked by the Makefile.

Compile the gptpu library

// Run Makefile_gptpu; this requires sudo permission.
// sc21 is simply a demo account without sudo permission.

Prerequisites

  1. tensorflow 1.13.1 (used by the Python-based model creation to generate the model template the first time, if it does not already exist)
  2. bazel 2.0.0
  3. cnpy (https://github.com/rogersce/cnpy)
  4. cmake
  5. python3
  6. numpy
  7. apex driver
  8. gasket driver
  9. cblas (for comparison only)

$ sudo apt-get install libopenblas-dev

Set LD_LIBRARY_PATH

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

Note about GEMM

GEMM is fundamental, and it was our very first benchmark. It also includes an exact mode as an experimental feature. The exact mode is still a work in progress; it uses a blocking algorithm with a block size of 256 to avoid uint8_t overflow. In this demo, we show the floating-point approximation result, which has a small RMSE and error rate, as reported in the paper.

Multi-tpu scheme

The GPTPU library allows multiple TPUs to be used for parallel computing. The following device initialization API

open_devices(int opening_order, int wanted_dev_cnt)

has two arguments, opening_order and wanted_dev_cnt:

  1. opening_order: 0 opens device(s) sequentially, starting from the first device (index 0); 1 opens device(s) sequentially, starting from a randomly chosen device. (You can extend this argument with more opening policies.)
  2. wanted_dev_cnt: the number of devices you want to open (capped at the number of devices available).

openctpu usage

Please refer to the example source code in ./src/openctpu.cc for details.

Releases: v0.99-alpha
Owner: Extreme Storage and Computer Architecture Lab (ESCAL @ UCR)