Benchmark framework for 3D integrated CIM accelerators running popular DNN inference workloads, supporting both monolithic and heterogeneous 3D integration

Overview

3D+NeuroSim V1.0

The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly available on a non-commercial basis. Copyright of the model is maintained by the developers, and the model is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International Public License.

🌟 This is the released 3D+NeuroSim V1.0 (June 1, 2021), and this version adds the following improvements to inference engine estimation:

1. Enabled electrical-thermal co-simulation of 3D integrated (monolithic and heterogeneous) CIM accelerators.
2. Validated against real silicon data.
3. Added synchronous and asynchronous modes.
4. Updated the technology file for FinFET.
5. Added a level shifter for eNVM.

👉 👉 👉 For Monolithic-3D, in "Param.cpp", to switch mode:

M3D = true;           // false: conventional 2D     // true: enable simulation for monolithic 3D integration
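
A minimal sketch of the corresponding "Param.cpp" settings for a monolithic 3D run is shown below. Treating M3D and H3D as mutually exclusive flags is an assumption made here for illustration; consult the user manual for the authoritative combination.

// Monolithic 3D mode (sketch; flag exclusivity is an assumption)
M3D = true;           // enable simulation for monolithic 3D integration
H3D = false;          // assumed: heterogeneous 3D left disabled in this mode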

👉 👉 👉 For Heterogeneous-3D, in "Param.cpp", to switch mode:

H3D = true;           // false: conventional 2D     // true: enable simulation for heterogeneous 3D integration

numMemTier = xxx;                 // user-defined number of memory tiers (on top of the logic tier)

deviceroadmapTop = xxx;           // device design options for top tiers (multi-tier memory arrays)
technodeTop = xxx;
featuresizeTop = xxx;

deviceroadmapBottom = xxx;        // device design options for bottom tier (other logic circuits)
technodeBottom = xxx;            
featuresizeBottom = xxx;

tsvPitch = xxx;                   // TSV pitch size
tsvRes = xxx;                     // TSV unit resistance
tsvCap = xxx;                     // TSV unit capacitance
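
As a concrete illustration, a heterogeneous 3D configuration might be filled in as in the sketch below. All numeric values are illustrative placeholders only, not recommended design points, and the units and valid option encodings for each parameter are assumptions here; take the authoritative ones from the comments in "Param.cpp" and the user manual.

// Heterogeneous 3D configuration (sketch; values are illustrative assumptions)
H3D = true;                       // enable heterogeneous 3D integration
numMemTier = 2;                   // e.g., two memory tiers stacked on the logic tier

deviceroadmapTop = 2;             // assumed encoding of the top-tier device option
technodeTop = 22;                 // e.g., memory tiers at a 22 nm node
featuresizeTop = 22e-9;           // feature size, assumed to be in meters

deviceroadmapBottom = 1;          // assumed encoding of the bottom-tier device option
technodeBottom = 7;               // e.g., logic tier at a 7 nm node
featuresizeBottom = 7e-9;         // feature size, assumed to be in meters

tsvPitch = 5e-6;                  // e.g., 5 um TSV pitch (unit assumed)
tsvRes = 0.1;                     // illustrative TSV unit resistance (unit assumed)
tsvCap = 20e-15;                  // illustrative TSV unit capacitance (unit assumed)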

🌟 This version has also added three default examples for quick start:

1. VGG8 on cifar10
   8-bit "WAGE" mode pretrained model is uploaded to './log/VGG8.pth'
2. DenseNet40 on cifar10
   8-bit "WAGE" mode pretrained model is uploaded to './log/DenseNet40.pth'
3. ResNet18 on imagenet
   "FP" mode pretrained model is loaded from 'https://download.pytorch.org/models/resnet18-5c106cde.pth'

👉 👉 👉 To quickly start inference estimation with the default models (skipping training):

python inference.py --dataset cifar10 --model VGG8 --mode WAGE
python inference.py --dataset cifar10 --model DenseNet40 --mode WAGE
python inference.py --dataset imagenet --model ResNet18 --mode FP

For estimation of on-chip training accelerators, please visit the released DNN+NeuroSim V2.1.

In the Pytorch/Tensorflow wrapper, users can define the network structure and the precision of synaptic weights and neural activations. With the integrated NeuroSim, which takes real traces from the wrapper, the framework supports a hierarchical organization from the device level through the circuit and chip levels up to the algorithm level, enabling instruction-accurate evaluation of both the accuracy and the hardware performance of inference.
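
For example, the bit precision of weights and activations can typically be passed to the wrapper on the command line. The flag names below (--wl_weight, --wl_activate) follow the DNN+NeuroSim wrapper convention and are an assumption here; check the argument parser in inference.py for the exact names:

python inference.py --dataset cifar10 --model VGG8 --mode WAGE --wl_weight 8 --wl_activate 8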

Developers: Xiaochen Peng 👭 Shanshi Huang 👭 Anni Lu.

This research is supported by the NSF CAREER award, the NSF/SRC E2CDA program, and ASCENT, one of the SRC/DARPA JUMP centers.

If you use the tool or adapt the tool in your work or publication, you are required to cite the following reference:

X. Peng, W. Chakraborty, A. Kaul, W. Shim, M. S. Bakir, S. Datta and S. Yu, "Benchmarking Monolithic 3D Integration for Compute-in-Memory Accelerators: Overcoming ADC Bottlenecks and Maintaining Scalability to 7nm or Beyond," IEEE International Electron Devices Meeting (IEDM), 2020.

If you have logistics questions or comments on the model, please contact 👨 Prof. Shimeng Yu, and if you have technical questions or comments, please contact 👩 Xiaochen Peng, 👩 Shanshi Huang, or 👩 Anni Lu.

File lists

  1. Manual: Documents/User Manual of 3D_NeuroSim_V1.0.pdf
  2. Framework for monolithic 3D integration: Monolithic3D/inference.py (to run Pytorch wrapper); Monolithic3D/NeuroSim (integrated NeuroSim core)
  3. Framework for heterogeneous 3D integration: Heterogeneous3D/inference.py (to run Pytorch wrapper); Heterogeneous3D/NeuroSim (integrated NeuroSim core)

Installation steps (Linux)

  1. Get the tool from GitHub:
git clone https://github.com/neurosim/3D_NeuroSim_V1.0.git
  2. Go to the folder for either monolithic or heterogeneous 3D integration:
cd Monolithic3D/
cd Heterogeneous3D/
  3. Train the network to get the model for inference (this step can be skipped by using the pretrained default models).
  4. Compile the NeuroSim codes:
make
  5. Run the Pytorch wrapper (integrated with NeuroSim).

For the usage of this tool, please refer to the manual.

References related to this tool

  1. X. Peng, W. Chakraborty, A. Kaul, W. Shim, M. S. Bakir, S. Datta and S. Yu, "Benchmarking Monolithic 3D Integration for Compute-in-Memory Accelerators: Overcoming ADC Bottlenecks and Maintaining Scalability to 7nm or Beyond," IEEE International Electron Devices Meeting (IEDM), 2020.
  2. X. Peng, S. Huang, Y. Luo, X. Sun and S. Yu, "DNN+NeuroSim: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators with Versatile Device Technologies," IEEE International Electron Devices Meeting (IEDM), 2019.
  3. X. Peng, R. Liu, S. Yu, "Optimizing Weight Mapping and Data Flow for Convolutional Neural Networks on RRAM Based Processing-in-Memory Architecture," IEEE International Symposium on Circuits and Systems (ISCAS), 2019.
  4. P.-Y. Chen, S. Yu, "Technological Benchmark of Analog Synaptic Devices for Neuro-Inspired Architectures," IEEE Design & Test, 2019.
  5. P.-Y. Chen, X. Peng, S. Yu, "NeuroSim: A Circuit-Level Macro Model for Benchmarking Neuro-Inspired Architectures in Online Learning," IEEE Trans. CAD, 2018.
  6. X. Sun, S. Yin, X. Peng, R. Liu, J.-S. Seo, S. Yu, "XNOR-RRAM: A Scalable and Parallel Resistive Synaptic Architecture for Binary Neural Networks," ACM/IEEE Design, Automation & Test in Europe Conference (DATE), 2018.
  7. P.-Y. Chen, X. Peng, S. Yu, "NeuroSim+: An Integrated Device-to-Algorithm Framework for Benchmarking Synaptic Devices and Array Architectures," IEEE International Electron Devices Meeting (IEDM), 2017.
  8. P.-Y. Chen, S. Yu, "Partition SRAM and RRAM Based Synaptic Arrays for Neuro-Inspired Computing," IEEE International Symposium on Circuits and Systems (ISCAS), 2016.
  9. P.-Y. Chen, D. Kadetotad, Z. Xu, A. Mohanty, B. Lin, J. Ye, S. Vrudhula, J.-S. Seo, Y. Cao, S. Yu, "Technology-Design Co-Optimization of Resistive Cross-Point Array for Accelerating Learning Algorithms on Chip," IEEE Design, Automation & Test in Europe (DATE), 2015.
  10. S. Wu, et al., "Training and Inference with Integers in Deep Neural Networks," arXiv: 1802.04680, 2018.
  11. github.com/boluoweifenda/WAGE
  12. github.com/stevenygd/WAGE.pytorch
  13. github.com/aaron-xichen/pytorch-playground