Blazing-fast Expression Templates Library (ETL) with GPU support, in C++


Expression Templates Library (ETL) 1.3.0


ETL is a header-only library for C++ that provides vector and matrix classes with support for Expression Templates, so that operations on them are performed very efficiently.

At this time, the library supports compile-time-sized and runtime-sized matrices and vectors, with all element-wise operations implemented. It also supports 1D and 2D convolution, matrix multiplication (naive algorithm and Strassen) and FFT.

You can clone this repository directly to get all ETL features. I advise using it as a submodule of your current project, but you can install it anywhere you like. There are several branches you can choose from:

  • master: The main development branch
  • stable: The last stable version

You can also check out a fixed version by its tag, such as the tag 1.0.


The Reference Documentation is always available on the wiki. This document contains the most basic information that should be enough to get you started.

The library is header-only and does not need to be built at all: you just have to include its header files.

Most of the headers are not meant to be included directly inside a program. Here are the headers that are made to be included:

  • etl.hpp: Contains all the features of the library
  • etl_light.hpp: Contains the basic features of the library (no matrix multiplication, no convolution, no FFT)

You should always include one of these headers in your program. You should never include any other header from the library.

Data structures

Several data structures are available:

  • fast_matrix<T, Dim...>: A matrix of variadic size with elements of type T. This must be used when you know the size of the matrix at compile-time. The number of dimensions can be anything. The data is stored directly inside the matrix.
  • fast_dyn_matrix<T, Dim...>: Variant of fast_matrix where the data is stored on the heap.
  • dyn_matrix<T, D>: A matrix with elements of type T. The size of the matrix can be set at runtime. The matrix can have D dimensions.

There also exist typedefs for vectors:

  • fast_vector<T, Rows>
  • fast_dyn_vector<T, Rows>
  • dyn_vector<T>

Keep in mind that fast_matrix stores its values directly inside the object. It can therefore be very large and should rarely be placed on the stack. This also makes it very expensive to move and copy, which is why fast_dyn_matrix may be an interesting alternative.
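The storage trade-off can be illustrated without ETL. The sketch below uses two hypothetical stand-in types (inline_storage and heap_storage are illustration names, not ETL classes): one keeps its elements inside the object through std::array, the other keeps only a handle to heap memory through std::vector. It assumes nothing about how ETL lays out its own containers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only, not ETL types.

// In-object storage: the object itself is as large as its data.
template <typename T, std::size_t Rows, std::size_t Cols>
struct inline_storage {
    std::array<T, Rows * Cols> data{};
};

// Heap storage: the object holds only a small handle to the data.
template <typename T, std::size_t Rows, std::size_t Cols>
struct heap_storage {
    std::vector<T> data = std::vector<T>(Rows * Cols);
};

// A 256x256 matrix of double occupies at least 512 KiB when stored in-object...
static_assert(sizeof(inline_storage<double, 256, 256>) >= 256 * 256 * sizeof(double),
              "in-object storage grows with the dimensions");
// ...while the heap-backed object stays small, whatever the dimensions.
static_assert(sizeof(heap_storage<double, 256, 256>) == sizeof(std::vector<double>),
              "heap-backed storage only holds a handle");

// Moving a heap-backed matrix transfers the buffer instead of copying elements.
inline bool move_keeps_buffer() {
    heap_storage<double, 256, 256> a;
    const double* before = a.data.data();
    assert(before != nullptr);
    heap_storage<double, 256, 256> b = std::move(a);
    return b.data.data() == before;
}
```

Moving the in-object variant would instead copy all 65536 elements, which is the cost the paragraph above warns about.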

Element-wise operations

Classic element-wise operations can be applied to vectors and matrices as if they were scalars. Matrices and vectors can also be added, subtracted, divided, ... by scalars.

etl::dyn_vector<double> a{1.0,2.0,3.0};
etl::dyn_vector<double> b{3.0,2.0,1.0};
etl::dyn_vector<double> c;

c = 1.4 * (a + b) / b + b + a / 1.2;

The operations are lazy: nothing is computed until the expression is evaluated by being assigned to a data structure.
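To make the lazy evaluation concrete, here is a minimal expression-template sketch in standard C++. It is not ETL's implementation: the names sketch::vec, sketch::add_expr and sketch::additions are hypothetical, only addition is supported, and the counter exists purely to show when work actually happens.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

namespace sketch {

// Counts the element-wise additions that were really executed.
inline std::size_t& additions() {
    static std::size_t count = 0;
    return count;
}

// Node of the expression tree: stores references, computes nothing by itself.
template <typename L, typename R>
struct add_expr {
    const L& lhs;
    const R& rhs;

    double operator[](std::size_t i) const {
        ++additions();
        return lhs[i] + rhs[i];
    }
    std::size_t size() const { return lhs.size(); }
};

struct vec {
    std::vector<double> data;

    vec() = default;
    vec(std::initializer_list<double> init) : data(init) {}

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression is the only place where the loop runs.
    template <typename E>
    vec& operator=(const E& expr) {
        data.resize(expr.size());
        for (std::size_t i = 0; i < data.size(); ++i) {
            data[i] = expr[i];
        }
        return *this;
    }
};

// Building the expression only records the operands.
template <typename L, typename R>
add_expr<L, R> operator+(const L& lhs, const R& rhs) {
    assert(lhs.size() == rhs.size());
    return {lhs, rhs};
}

} // namespace sketch
```

With this sketch, writing c = a + b + a runs a single loop over the elements and creates no temporary vector. Storing an expression built from temporaries in a variable would leave dangling references, which is a known pitfall of this simple design.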

Unary operators

Several unary operators are available. Each operation is performed on every element of the vector or the matrix.

Available operators:

  • log
  • abs
  • sign
  • max/min
  • sigmoid
  • noise: Add standard normal noise to each element
  • logistic_noise: Add normal noise of mean zero and variance sigmoid(x) to each element
  • exp
  • softplus
  • bernoulli
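As an illustration of what "performed on every element" means, the following sketch applies two of the listed functions with the standard library. The formulas are the textbook definitions of sigmoid and softplus, assumed (not verified) to match ETL's operators, and apply_elementwise is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Textbook definitions, assumed to correspond to the operators listed above.
inline double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
inline double softplus(double x) { return std::log1p(std::exp(x)); }

// Hypothetical helper: writes f(in[i]) into out[i] for every element.
template <typename F>
void apply_elementwise(const std::vector<double>& in, std::vector<double>& out, F f) {
    assert(in.size() == out.size());
    std::transform(in.begin(), in.end(), out.begin(), f);
}
```

A call such as apply_elementwise(a, c, sigmoid) plays the role of an assignment like c = sigmoid(a) on plain std::vector data.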

Several transformations are also available:

  • hflip: Flip the vector or the matrix horizontally
  • vflip: Flip the vector or the matrix vertically
  • fflip: Flip the vector or the matrix horizontally and vertically. It is the equivalent of hflip(vflip(x))
  • sub: Return a sub part of the matrix in which the first dimension is fixed to a given value. It works with matrices of any dimension.
  • dim/row/col: Return a vector representing a sub part of a matrix (a row or a column)
  • reshape: Interpret a vector as a matrix
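The relation between the three flips can be checked on a small row-major buffer. The sketch below is independent of ETL: sketch::grid is a hypothetical type, and it assumes that a horizontal flip reverses the order of the columns while a vertical flip reverses the order of the rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical row-major matrix used only for this illustration.
struct grid {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> v;  // rows * cols values, row-major
};

// Assumed convention: a horizontal flip reverses the order of the columns.
inline grid hflip(grid g) {
    assert(g.v.size() == g.rows * g.cols);
    for (std::size_t r = 0; r < g.rows; ++r) {
        int* row = g.v.data() + r * g.cols;
        std::reverse(row, row + g.cols);
    }
    return g;
}

// Assumed convention: a vertical flip reverses the order of the rows.
inline grid vflip(grid g) {
    assert(g.v.size() == g.rows * g.cols);
    for (std::size_t r = 0; r < g.rows / 2; ++r) {
        int* top = g.v.data() + r * g.cols;
        int* bottom = g.v.data() + (g.rows - 1 - r) * g.cols;
        std::swap_ranges(top, top + g.cols, bottom);
    }
    return g;
}

// Flipping in both directions amounts to reversing the flat row-major buffer.
inline grid fflip(grid g) {
    std::reverse(g.v.begin(), g.v.end());
    return g;
}

} // namespace sketch
```

Whichever of the two conventions is used for "horizontal", the composition hflip(vflip(x)) gives the same result, a 180-degree rotation.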


Several reduction functions are available:

  • sum: Return the sum of a vector or matrix
  • mean: Return the mean of a vector or matrix
  • dot: Return the dot product of two vectors or matrices
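For reference, the three reductions correspond to the following computations, written here with the standard library on plain std::vector data. The functions in namespace sketch are hypothetical equivalents, not the ETL functions themselves.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

namespace sketch {

// Sum of all elements.
inline double sum(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// Arithmetic mean: the sum divided by the number of elements.
inline double mean(const std::vector<double>& v) {
    assert(!v.empty());
    return sum(v) / static_cast<double>(v.size());
}

// Dot product: the sum of the element-wise products.
inline double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

} // namespace sketch
```

For the vectors {1, 2, 3} and {3, 2, 1} used earlier, the sum of the first is 6, its mean is 2, and the dot product of the two is 10.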


The header convolution.hpp provides several convolution operations both in 1D (vector) and 2D (matrix).

The header mutiplication.hpp provides the matrix multiplication operation. mmul is the naive algorithm (ijk), while strassen_mmul implements the Strassen algorithm.

It is possible to pass an expression rather than a data structure to functions. Keep in mind that expressions are lazy: if you pass a + b to a matrix multiplication, an addition is run each time an element is accessed, which is rarely efficient.
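The cost of passing a lazy expression to a multiplication can be measured with a small standalone sketch. All names below (sketch::matrix, sketch::lazy_sum, sketch::mmul, sketch::materialize) are hypothetical and only mimic the behaviour described above with a naive ijk loop; they are not ETL's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical square row-major matrix, not an ETL type.
struct matrix {
    std::size_t n;          // the matrix is n x n
    std::vector<double> v;  // n * n values, row-major
    double operator()(std::size_t i, std::size_t j) const { return v[i * n + j]; }
};

// Lazy element-wise sum that counts how often an element is recomputed.
struct lazy_sum {
    const matrix& a;
    const matrix& b;
    std::size_t& evaluations;
    double operator()(std::size_t i, std::size_t j) const {
        ++evaluations;
        return a(i, j) + b(i, j);
    }
};

// Naive ijk multiplication that reads its operands through operator().
template <typename L, typename R>
matrix mmul(const L& lhs, const R& rhs, std::size_t n) {
    matrix c{n, std::vector<double>(n * n, 0.0)};
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t k = 0; k < n; ++k) {
                c.v[i * n + j] += lhs(i, k) * rhs(k, j);
            }
        }
    }
    return c;
}

// Evaluates the lazy sum once into a temporary matrix.
inline matrix materialize(const lazy_sum& e, std::size_t n) {
    assert(e.a.n == n && e.b.n == n);
    matrix t{n, std::vector<double>(n * n)};
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            t.v[i * n + j] = e(i, j);
        }
    }
    return t;
}

} // namespace sketch
```

In this sketch, multiplying with the lazy operand performs n^3 additions, whereas materializing the sum first performs n^2 additions and then multiplies plain matrices. Evaluating the expression into a temporary before the call is therefore usually the better choice.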


It is also possible to generate sequences of data and perform operations on them.

For now, two generators are available:

  • normal_generator: Generates real numbers following a normal distribution
  • sequence_generator(c=0): Generates numbers in sequence from c

All sequences are considered to have infinite size, therefore, they can be used to initialize or modify any containers or expressions.
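The idea of an unbounded sequence whose length is decided by the destination can be sketched as follows. The type sketch::sequence and the helpers fill_from and first_values are hypothetical names for this illustration and do not reproduce ETL's generator interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical generator yielding start, start + 1, start + 2, ... without end.
struct sequence {
    double next;
    explicit sequence(double start = 0.0) : next(start) {}
    double operator()() { return next++; }
};

// The container decides how many values are drawn from the generator.
template <typename Container, typename Generator>
void fill_from(Container& c, Generator gen) {
    std::generate(c.begin(), c.end(), gen);
}

// Convenience wrapper: the first `count` values of the sequence.
inline std::vector<double> first_values(std::size_t count, double start) {
    std::vector<double> out(count);
    fill_from(out, sequence(start));
    assert(out.empty() || out.front() == start);
    return out;
}

} // namespace sketch
```

Because the generator never runs out of values, the same call works for a container of any size.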


This library is completely header-only, so there is no need to build it.

However, this library makes extensive use of C++17 and some features of C++20, therefore, a recent compiler is necessary to use it. This library is currently tested on the following compilers:

  • GCC 9.3.0 and greater

If compilation does not work with one of the listed compilers, or produces warnings, please open an issue on GitHub and I'll do my best to fix it.

The library has never been tested on Windows.

The folders include and lib/include must be included with the -I option.

There are no link-time dependencies.

If you have problems compiling this library, I'd be glad to help, but I do not guarantee that this will work on every compiler. I strongly expect it to not build under Visual Studio.


This library is distributed under the terms of the MIT license; see the LICENSE file for details.

  • Expression constructor not working


    Great library. 👍

    However I encountered some issues with the example on README page. The code below does not compile:

    etl::dyn_vector<double> a({1.0,2.0,3.0});
    etl::dyn_vector<double> b({3.0,2.0,1.0});
    etl::dyn_vector<double> c(1.4 * (a + b) / b + b + a / 1.2);

    However this works:

    etl::dyn_vector<double> a({1.0,2.0,3.0});
    etl::dyn_vector<double> b({3.0,2.0,1.0});
    etl::dyn_vector<double> c;
    c = 1.4 * (a + b) / b + b + a / 1.2;

    I haven't investigated the code carefully, but it seems that the assignment operator is well defined but not the expression constructor. Since the example is taken from README, I suppose this is not by design. Any thoughts?

    opened by yixuan 3
  • Benchmarks?


    Out of curiosity, have you compared performance between ETL and something like Eigen or Blaze? I know at least Eigen can link against BLAS as well.

    I'm looking for a matrix library to use and came across yours.


    opened by dnbaker 3
  • GPU implementation of the Mean Squared Error loss function


    This PR adds support for the Mean Squared Error loss function on GPU. Other PRs: dll: egblas:

    opened by ghost 2
  • Add Tensor Core support for convolutions


    This PR activates Tensor Cores for all convolution types in ETL. Activating this on GPUs that have no Tensor Cores results in worse performance (at least with a NVIDIA GTX 1650), so this should be activated on-demand. Though Tensor Cores appear to work with these changes, the performance remains the same as with them disabled (on a NVIDIA V100 GPU), so more testing is needed.

    opened by ghost 0
  • cpp_utils missing


    The library seems to be missing some parts:

    /usr/local/include/etl/std.hpp:21:32: fatal error: cpp_utils/compat.hpp: No such file or directory

    The headers included from "/etl/std.hpp" are not provided:

    ....
    // cpp_utils
    #include "cpp_utils/compat.hpp"
    #include "cpp_utils/tmp.hpp"
    #include "cpp_utils/likely.hpp"
    #include "cpp_utils/assert.hpp"
    #include "cpp_utils/parallel.hpp"

    Investigating the source tree, I found no files in the git directory.

    Method of installation:

    git clone
    cp -R etl /usr/local/lib

    opened by steven-varga 5
  • 1.2.1(Jan 9, 2018)

    • Feature Support for embeddings and embedding gradients
    • Feature Support for merging matrices together
    • Feature Support for bias_batch_var_2d
    • Feature Support for dropout masks
    • Feature Support for normalization
    • Performance Vectorize hyperbolic functions
    • Performance Advanced GPU pattern detection
    • Performance Asynchronous GPU computation
    • GPU Support for uniform and normal random generators
    • GPU Support for shuffle operations
    • Bug Fix fast_dyn_matrix with bool
    • Bug Fix possible stack overflow with fast matrix and aliasing
    • Bug Correctly handle aliasing in assignable (sub_view for instance)
    • Bug Fix small compilation bug with sub_matrix
    • Bug Fix CPU/GPU consistency bug with iterators
    • Bug Fix bug with GPU convolution flipping
  • 1.2(Oct 1, 2017)

    • Feature GPU support for basic expressions (such as c = 1.0 * b + d + e - 1.0)
    • Feature GPU Support for unary and binary operators
    • Feature Support for convolutions for matrices of different data types
    • Feature Support for log2 / log10
    • Feature Selection of algorithms by default
    • Feature Support for categorical cross entropy loss and error
    • Feature Improve support for complex numbers and etl::complex
    • Performance Improved performance of using parallel BLAS
    • Misc Full cleanup of the traits
    • Misc Use of variable templates (C++14) for the traits
    • Misc Improved support for clang
    • Misc Reduced compilation time for non-tests / non-benchmark code
    • Misc Reduce durations of the tests
    • Misc Preliminary C++17 if constexpr support
    • Bug Fix bug in the GEMM kernel for CM = CM * CM
    • Bug Vectorization bug for binary operations with different data types
    • Bug GPU memory was not correctly handled when std::move is used
  • 1.1(Aug 9, 2017)

    • Performance Better dispatching for alignment
    • Performance Much faster multiplications between matrices of different major
    • Performance Highly improved performance of multiplications with transpose
    • Performance Vectorization of signed integer operations
    • Performance Faster CPU convolutions
    • Performance Better parallelization of convolutions
    • Performance Much better GEMM/GEMV/GEVM kernels (when BLAS not available)
    • Performance Reduced overhead for 3D/4D matrices access by indices
    • Performance Use of non-temporal stores for large matrices
    • Performance Forced alignment of matrices
    • Performance Force basic padding of vectors
    • Performance Better thread reuse
    • Performance Faster dot product
    • Performance Faster batched outer product
    • Performance Better usage of FMA
    • Performance SSE/AVX double-precision exponentiation
    • Performance Much faster Pooling for various dimensions
    • Feature Sub matrices in 2D, 3D and 4D
    • Feature Helpers for Machine Learning
    • Feature Comparison operators and functions equal, not_equal, almost_equal
    • Feature Logical operators for boolean containers
    • Feature Shuffle and noise can now operate on custom random engines
    • Feature Pooling with stride is now supported
    • Feature Custom fast and dyn matrices support
    • Feature Matrices and vectors slices view
    • Feature Deeper pooling support
    • Feature bias_add (2D and 4D) (Machine Learning)
    • Feature bias_batch_mean (2D and 4D) (Machine Learning)
    • Feature Transposed convolution
    • GPU Better usage of contexts
    • GPU Pooling and Upsample support
    • GPU batch_outer support
    • GPU sigmoid and RELU and derivatives
    • GPU Memory pool handling
    • GPU Avoid a lot of temporaries
    • Misc Reduced duplications in the code base
    • Misc Simplifications of the iterators to DMA expressions
    • Misc Faster compilation of the test cases
    • Misc Generalized SSE/AVX versions into VEC versions
    • Misc Reviewed completely temporary expressions
    • Bug Lots of small fixes
    • Bug Transpose on GPU was not working on column major matrix
    • Bug 4D Pooling
    • Bug Q/R Decomposition
  • 1.0(Aug 9, 2017)

    Initial version (was rolling released before) with the following main features:

    • Smart Expression Templates
    • Matrix and vector (runtime-sized and compile-time-sized)
    • Simple element-wise operations
    • Reductions (sum, mean, max, ...)
    • Unary operations (sigmoid, log, exp, abs, ...)
    • Matrix multiplication
    • Convolution (1D and 2D and higher variations)
    • Max Pooling
    • Fast Fourier Transform
    • Use of SSE/AVX to speed up operations
    • Use of BLAS/MKL/CUBLAS/CUFFT/CUDNN libraries to speed up operations
    • Symmetric matrix adapter (experimental)
    • Sparse matrix (experimental)
Baptiste Wicht
Ph.D. in computer science (Deep Learning), fan of C++, performance optimizations, compiler theory, operating systems development.