PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections

PET is the first DNN framework that optimizes tensor programs with partially equivalent transformations and automated corrections. PET discovers and applies program transformations that improve computation efficiency but only maintain partial functional equivalence. PET then automatically corrects results to restore full equivalence. We develop rigorous theoretical foundations to simplify equivalence examination and correction for partially equivalent transformations, and design an efficient search algorithm to quickly discover highly optimized programs by combining fully and partially equivalent optimizations at the tensor, operator, and graph levels. Our evaluation shows that PET outperforms existing systems by up to 2.5x, by unlocking previously missed opportunities from partially equivalent transformations.
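To make the idea concrete, here is a minimal pure-Python sketch of a partially equivalent transformation followed by a correction. It is an illustration only, not PET's implementation: the function names (`conv1d_same`, `reference`, `transformed`, `corrected`) are hypothetical, and the example assumes a zero-padded 1D convolution with an odd-length kernel. The transformation folds the batch dimension into the spatial dimension so that one long convolution replaces several short ones. The result is right everywhere except near row boundaries, where neighbouring rows leak into each other, so the correction step recomputes only those positions.

```python
def conv1d_same(x, w):
    """Zero-padded 'same' 1D correlation of signal x with an odd-length kernel w."""
    k = len(w)
    pad = k // 2
    xp = [0] * pad + list(x) + [0] * pad
    return [sum(xp[i + j] * w[j] for j in range(k)) for i in range(len(x))]


def reference(batch, w):
    """Original program: convolve every row of the batch independently."""
    return [conv1d_same(row, w) for row in batch]


def transformed(batch, w):
    """Partially equivalent program: concatenate all rows and convolve once.

    Positions near the seam between two rows see data from the neighbouring
    row instead of zero padding, so they can differ from the reference.
    """
    n = len(batch[0])
    flat = [v for row in batch for v in row]
    out = conv1d_same(flat, w)
    return [out[r * n:(r + 1) * n] for r in range(len(batch))]


def corrected(batch, w):
    """Transformed program plus a correction that recomputes only edge positions."""
    n = len(batch[0])
    k = len(w)
    pad = k // 2
    out = transformed(batch, w)
    # Only outputs within `pad` elements of a row edge can be wrong.
    edges = sorted(set(range(min(pad, n))) | set(range(max(n - pad, 0), n)))
    for r, row in enumerate(batch):
        xp = [0] * pad + list(row) + [0] * pad
        for i in edges:
            out[r][i] = sum(xp[i + j] * w[j] for j in range(k))
    return out


batch = [[1, 2, 3, 4], [5, 6, 7, 8]]
kernel = [1, 1, 1]
print(reference(batch, kernel))    # [[3, 6, 9, 7], [11, 18, 21, 15]]
print(transformed(batch, kernel))  # [[3, 6, 9, 12], [15, 18, 21, 15]]
print(corrected(batch, kernel))    # [[3, 6, 9, 7], [11, 18, 21, 15]]
```

In this toy case the transformed program is wrong at only two of the eight output positions, so the correction touches a small fraction of the result. Whether such a transformation pays off in practice depends on the cost of the correction relative to the gain from the faster kernel, which is what a search over candidate programs would have to weigh.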

Figure 1: End-to-end performance comparison between PET and existing frameworks. For each DNN, the numbers above the PET bars show the speedups over the best baseline. TASO does not support the 3D convolution operators in Resnet3D-18.

Install PET

See A.4 in README.pdf for instructions on installing PET from source.

Publication

Wang, Haojie, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, Liyan Zheng, Yuanzhi Li, Kaiyuan Rong, Yuanyong Chen, and Zhihao Jia. "PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections." In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pp. 37-54. 2021.

Contributors

Currently PET is maintained in a private repository. Updates will be synchronized to this repository periodically. Contributors of PET are listed as follows.

Contributors

Comments
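  • BERT fails to run

    I cloned this project, then built and ran it in an NVIDIA GPU environment.

    First I exported the BERT ONNX file with PET_DCU/benchmark/models/scripts/bert_onnx.py. Then, after building DCU, I ran the following under build:

    ./onnx_origin ../benchmark/models/scripts/mybert-new.onnx

    This tries to import the ONNX model into PET's Graph, at which point the following error occurs:

    onnx_origin: /home/PET_DCU/include/operator.h:1795: tpm::ReshapeOp::ReshapeOp(tpm::Tensor *, tpm::Tensor *): Assertion `input->size() == output->size()' failed.
    Aborted (core dumped)

    The failure is in the shape check of the Reshape op. I printed input->size() and output->size(), which are, respectively:

    589824 37748736

    So the output tensor's size is 64 times the input's, and 64 is the batch_size used when exporting the ONNX model. Could you explain what causes this error?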

    opened by BBuf 2
  • Usage Problem

    Hello, I'm trying to use PET, but when I configure it according to the usage document, the "onnx" folder under "build" is not generated automatically. What should I do? And how do I use PET from Python?

    opened by YFeather 0
  • Understanding the generator

    Hello,

    I am trying to understand the generator as described in the PET paper.

    To my understanding, the threshold parameter in the Generator::run() function defines the number of random test points that a legal mutant has to satisfy, as computed by the Generator::approx_equal() function.

    With the default threshold=0.7, a Matmul, for example, sometimes has 20 mutants and sometimes has 28. Is this randomness intended?

    If a legal mutant passes this test, how is the correction kernel generated? Is it generated by the Generator?

    Many Thanks,

    opened by hgl71964 0
Releases (v0.1.0)
Owner
PACMAN Group, Tsinghua University
Parallel Architecture & Compiler technology of Mobile, Accelerated, and Networked systems