Enoki: structured vectorization and differentiation on modern processor architectures

Introduction

Enoki is a C++ template library that enables automatic transformations of numerical code, for instance to create a "wide" vectorized variant of an algorithm that runs on the CPU or GPU, or to compute gradients via transparent forward/reverse-mode automatic differentiation.

The core parts of the library are implemented as a set of header files with no dependencies other than a sufficiently C++17-capable compiler (GCC >= 8.2, Clang >= 7.0, Visual Studio >= 2017). Enoki code reduces to efficient SIMD instructions available on modern CPUs and GPUs—in particular, Enoki supports:

  • Intel: AVX512, AVX2, AVX, and SSE4.2,
  • ARM: NEON/VFPV4 on armv7-a, Advanced SIMD on 64-bit armv8-a,
  • NVIDIA: CUDA via a Parallel Thread Execution (PTX) just-in-time compiler,
  • Fallback: a scalar fallback mode ensures that programs still run even if none of the above are available.

Deploying a program on top of Enoki usually serves three goals:

  1. Enoki ships with a convenient library of special functions and data structures that facilitate implementation of numerical code (vectors, matrices, complex numbers, quaternions, etc.).

  2. Programs built using these can be instantiated as wide versions that process many arguments at once (either on the CPU or the GPU).

    Enoki is also structured in the sense that it handles complex programs with custom data structures, lambda functions, loadable modules, virtual method calls, and many other modern C++ features.

  3. If derivatives are desired (e.g. for stochastic gradient descent), Enoki performs transparent forward or reverse-mode automatic differentiation of the entire program.

Finally, Enoki can do all of the above simultaneously: if desired, it can compile the same source code to multiple different implementations (e.g. scalar, AVX512, and CUDA+autodiff).

Motivation

The development of this library was prompted by the author's frustration with the current vectorization landscape:

  1. Auto-vectorization in state-of-the-art compilers is inherently local. A computation whose call graph spans separate compilation units (e.g. multiple shared libraries) simply can't be vectorized.

  2. Data structures must be converted into a Structure of Arrays (SoA) layout to be eligible for vectorization.

    SoA layout

    This is analogous to performing a matrix transpose of an application's entire memory layout—an intrusive change that is likely to touch almost every line of code.

  3. Parts of the application likely have to be rewritten using intrinsic instructions, which is going to look something like this:

    intrinsics

    Intrinsics-heavy code is challenging to read and modify once written, and it is inherently non-portable. CUDA provides a nice language environment for programming GPUs but does nothing to help with the other requirements (vectorization on CPUs, automatic differentiation).

  4. Vectorized transcendental functions (exp, cos, erf, ...) are not widely available. Intel, AMD, and CUDA provide proprietary implementations, but many compilers don't include them by default.

  5. It is desirable to retain both scalar and vector versions of an algorithm, but ensuring their consistency throughout the development cycle becomes a maintenance nightmare.

  6. Domain-specific languages (DSLs) for vectorization such as ISPC address many of the above issues but assume that the main computation underlying an application can be condensed into a compact kernel that is implementable using the limited language subset of the DSL (e.g. plain C in the case of ISPC).

    This is not the case for complex applications, where the "kernel" may be spread out over many separate modules involving high-level language features such as functional or object-oriented programming.

What Enoki does differently

Enoki addresses these issues and provides a complete solution for vectorizing and differentiating modern C++ applications with nontrivial control flow and data structures, dynamic memory allocation, virtual method calls, and vector calls across module boundaries. It has the following design goals:

  1. Unobtrusive. Only minor modifications are necessary to convert existing C++ code into its Enoki-vectorized equivalent, which remains readable and maintainable.

  2. No code duplication. It is generally desirable to provide both scalar and vectorized versions of an API, e.g. for debugging, and to preserve compatibility with legacy code. Enoki code extensively relies on class and function templates to achieve this goal without any code duplication—the same code template can be leveraged to create scalar, CPU SIMD, and GPU implementations, and each variant can provide gradients via automatic differentiation if desired.

  3. Custom data structures. Enoki can also vectorize custom data structures. All the hard work (e.g. conversion to SoA format) is handled by the C++17 type system.

  4. Function calls. Vectorized calls to functions in other compilation units (e.g. a dynamically loaded plugin) are possible. Enoki can even vectorize method or virtual method calls (e.g. instance->my_function(arg1, arg2, ...); when instance turns out to be an array containing many different instances).

  5. Mathematical library. Enoki includes an extensive mathematical support library with complex numbers, matrices, quaternions, and related operations (determinants, matrix inversion, etc.). A set of transcendental and special functions supports real, complex, and quaternion-valued arguments in single and double precision using polynomial or rational polynomial approximations, generally with an average error of <1/2 ULP on their full domain. These include exponentials, logarithms, and trigonometric and hyperbolic functions, as well as their inverses. Enoki also provides real-valued versions of error function variants, Bessel functions, the Gamma function, and various elliptic integrals.

    Transcendentals

    Importantly, all of this functionality is realized using the abstractions of Enoki, which means that it transparently composes with vectorization, the JIT compiler for generating CUDA kernels, automatic differentiation, etc.

  6. Portability. When creating vectorized CPU code, Enoki supports arbitrary array sizes that don't necessarily match what the underlying hardware supports (e.g. 16 x single precision on a machine whose SSE vector units natively support only 4 x single precision operands). The library uses template metaprogramming techniques to efficiently map array expressions onto the available hardware resources. This greatly simplifies development because it's enough to write a single implementation of a numerical algorithm that can then be deployed on any target architecture. There are non-vectorized fallbacks for everything, so programs will run even on unsupported architectures (albeit without the performance benefits of vectorization).

  7. Modular architecture. Enoki is split into two major components: the front-end provides various high-level array operations, while the back-end provides the basic ingredients that are needed to realize these operations using the SIMD instruction set(s) supported by the target architecture.

    The CPU vector back-ends, for example, make heavy use of SIMD intrinsics to ensure that compilers generate efficient machine code. The intrinsics are contained in separate back-end header files (e.g. array_avx.h for AVX intrinsics), which provide rudimentary arithmetic and bit-level operations. Fancier operations (e.g. atan2) use the back-ends as an abstract interface to the hardware, which means that it's simple to support other instruction sets such as a hypothetical future AVX1024 or even an entirely different architecture (e.g. a DSP chip) by just adding a new back-end.

  8. License. Enoki is available under a non-viral open source license (3-clause BSD).

Cloning

Enoki depends on two other repositories (pybind11 and cub) that are required when using certain optional features, specifically differentiable GPU arrays with Python bindings.

To fetch the entire project including these dependencies, clone the project using the --recursive flag as follows:

$ git clone --recursive https://github.com/mitsuba-renderer/enoki

Documentation

An extensive set of tutorials and reference documentation is available at readthedocs.org.

About

This project was created by Wenzel Jakob. It is named after Enokitake, a type of mushroom with many long and parallel stalks reminiscent of data flow in vectorized arithmetic.

Enoki is the numerical foundation of version 2 of the Mitsuba renderer, though it is significantly more general and should be a trusty tool for a variety of simulation and optimization problems.

When using Enoki in academic projects, please cite

@misc{Enoki,
   author = {Wenzel Jakob},
   year = {2019},
   note = {https://github.com/mitsuba-renderer/enoki},
   title = {Enoki: structured vectorization and differentiation on modern processor architectures}
}
Issues
  • How to pybind FloatC and FloatD

    I am rendering a gradient image using an Enoki CUDA array. Is there any suggestion on how to expose the C++ CUDA arrays FloatC and FloatD (or vectors of them) to Python so that I can call backward in Python for optimization? I didn't see a binding for that in enoki/python.h.

    opened by andyyankai 17
  • Enoki PTX linker error

    Hey all!

    First of all thank you very much for publishing/releasing mitsuba2! I wanted to start experimenting with inverse rendering and tried multiple platforms (Google Colab and my own hardware), but I keep facing the exact same issue everywhere:

    import mitsuba                                                                              
    mitsuba.set_variant('gpu_autodiff_rgb') 
    
    # The C++ type associated with 'Float' is enoki::DiffArray<enoki::CUDAArray<float>> 
    from mitsuba.core import Float 
    import enoki as ek 
    
    # Initialize a dynamic CUDA floating point array with some values 
    x = Float([1, 2, 3])                                                                        
    # Tell Enoki that we'll later be interested in gradients of 
    # an as-of-yet unspecified objective function with respect to 'x' 
    ek.set_requires_gradient(x) 
    
    # Example objective function: sum of squares 
    y = ek.hsum(x * x) 
    

    PTX linker error: ptxas fatal : SM version specified by .target is higher than default SM version assumed
    cuda_check(): driver API error = 0400 "CUDA_ERROR_INVALID_HANDLE" in ../ext/enoki/src/cuda/jit.cu:253.

    I've tried different GPUs; the results are:

    | GPU          | Driver version | CUDA version | Result | Compute capability |
    | ------------ | -------------- | ------------ | ------ | ------------------ |
    | GeForce 940M | 440.64         | 10.0.130     | Fails  | 5.0                |
    | K80          | 418.67         | 10.0.130     | Fails  | 3.7                |
    | Tesla P4     | 418.67         | 10.0.130     | WORKS  | 6.1                |
    | P100         | 418.67         | 10.0.130     | Fails  | 6.0                |

    -> The weird thing is that the issue does not occur on a Tesla P4 but it does on all the others

    Does anyone have an idea what can cause this and how I can fix it?

    Thanks a lot! Pieterjan

    opened by PidgeyBE 12
  • Confusion regarding mask types

    I wrote a simple test to try enoki. However, I am unable to perform simple comparison operations due to type differences. The documentation states that the return type of operator< and neq is mask_t<Array>. However, the types of the result1 and result2 variables in the following code are different.

    import enoki as ek
    
    def myfunc(arr1, arr2):
      result1 = ek.dot(arr1, arr1) < 0
      result2 = ek.neq(arr2, 0)
      print(type(result1), type(result2))
      return result1, result2
    
    def test_scalar():
      from enoki import scalar
      arr1 = scalar.Vector1f([1])
      arr2 = scalar.Vector1f([2])
      res = myfunc(arr1, arr2)
    
    
    def test_cuda():
      from enoki import cuda
      arr1 = cuda.Vector1f([1])
      arr2 = cuda.Vector1f([2])
      res = myfunc(arr1, arr2)
    
    
    if __name__ == '__main__':
      test_scalar()
      test_cuda()
    

    Output in scalar mode is shown below. According to my understanding, this is because the output of the dot operation is converted to py::float. Is there a way to perform the comparison without explicitly casting to bool in this case?

    <class 'bool'> <class 'enoki.scalar.Vector1m'>
    

    Output in CUDA mode is shown below. The difference between these types is unclear to me. Can you kindly give more details?

    <class 'enoki.cuda.Mask'> <class 'enoki.cuda.Vector1m'>
    
    opened by arpit15 10
  • Dynamic Complex Arrays

    What's the correct way to work with dynamic arrays of complex numbers?

    AFAICS there are two possible ways: Complex<DynamicArray<FloatP>> and DynamicArray<Complex<FloatP>>. The first seems to work somehow, but unfortunately it is not possible to use the map function with it. The second version, on the other hand, allows me to map existing memory but fails with most other functions.

    Thanks in advance!

    opened by gergol 6
  • The behavior of enoki::hsum does not correspond to the documentation

    The documentation writes:

    template<typename Array>
    value_t<Array> hsum(Array value)
    

    However, hsum(DiffArray<CUDAArray<float>> value) still returns DiffArray<CUDAArray<float>>, not value_t<DiffArray<CUDAArray<float>>>, which is float. Example code:

    DiffArray<CUDAArray<float>> dc = 1.f;
    value_t<DiffArray<CUDAArray<float>>> vdc = hsum(dc);
    

    compiler complains:

    error: no viable conversion from 'enoki::DiffArray<CUDAArray<float>>' to 'value_t<DiffArray<CUDAArray<float>>>' (aka 'float')
            value_t<DiffArray<CUDAArray<float>>> vdc = hsum(dc);
                                                 ^     ~~~~~~~~
    

    Possible relevant function: in include/enoki/autodiff.h, line 1052:

    DiffArray hsum_() const {
            if constexpr (is_mask_v<Type> || std::is_pointer_v<Scalar>) {
                fail_unsupported("hsum_");
            } else {
                Index index_new = 0;
                if constexpr (Enabled)
                    index_new = tape()->append("hsum", 1, m_index, 1.f);
    
    	    std::cout << "here" << std::endl;
    
                return DiffArray::create(index_new, hsum(m_value));
            }
        }
    

    which returns a DiffArray. detach(DiffArray<CUDAArray<float>> value) returns CUDAArray<float>, but hsum(CUDAArray<float> value) still returns CUDAArray<float>, not value_t<CUDAArray<float>>, which is float:

    error: no viable conversion from 'enoki::CUDAArray<float>' to 'value_t<CUDAArray<float>>' (aka 'float')
            value_t<CUDAArray<float>> vc = hsum(detach(dc));
                                      ^    ~~~~~~~~~~~~~~~~
    

    Possible relevant function: in include/enoki/cuda.h, line 639:

    CUDAArray hsum_() const {
            size_t n = size();
            if (n == 1) {
                return *this;
            } else {
                eval();
                Value *result = cuda_hsum(n, (const Value *) cuda_var_ptr(m_index));
                return CUDAArray::from_index_(cuda_var_register(Type, 1, result, true));
            }
        }
    

    which returns a CUDAArray.

    This problem should also be present for other relevant functions such as hmax() and hprod().

    opened by RiverIntheSky 5
  • Does enoki support array indexing?

    For example, given A = [1, 2, 3, 4, 5] and I = [0, 1, 1, 2], then A[I] = [A[0], A[1], A[1], A[2]] = [1, 2, 2, 3]

    or

        FloatD some = {2.3f, 3.4f, 4.5f, 5.6f, 6.7f, 7.8f};
        IntC index = {1,1,3,3,5};
    
        FloatD check = some[index];
    
        value of check should be: [3.4f, 3.4f, 5.6f, 5.6f, 7.8f]
    
    opened by andyyankai 5
  • memcpy from one blob to another using enoki

    Consider this simple code:

    void bar(const char* src, int src_size, char* dst, int dst_size) { 
      assert(src_size == dst_size);
    
      for (int i = 0; i < src_size; ++i) { 
        *dst++ = *src++;
      } 
    }
    

    This code generates the following assembly (only the loop part is shown here):

      40d7c8:       c5 fe 6f 04 07          vmovdqu ymm0,YMMWORD PTR [rdi+rax*1]
      40d7cd:       c5 fe 7f 04 02          vmovdqu YMMWORD PTR [rdx+rax*1],ymm0
      40d7d2:       48 83 c0 20             add    rax,0x20
      40d7d6:       48 39 c8                cmp    rax,rcx
      40d7d9:       75 ed                   jne    40d7c8 <bar(char const*, int, char*, int)+0x28>
    

    gcc is smart enough to vectorize this loop and copy chunks of 32 bytes.

    Now consider this code written with enoki:

    void foo(const char* src, int src_size, char* dst, int dst_size) {
      using Array = enoki::Array<int, 8>;
    
      auto es = enoki::DynamicArray<Array>::map(src, src_size);
      auto ed = enoki::DynamicArray<Array>::map(dst, dst_size);
      for (int i = 0; i < (int)es.packets(); ++i) {
        const auto& pkt = es.packet(i);
        auto& dst_pkt = ed.packet(i);
        dst_pkt = pkt;
      }
    }
    

    This code generated this assembly:

      40d850:       c5 fd 6f 04 07          vmovdqa ymm0,YMMWORD PTR [rdi+rax*1]
      40d855:       c5 fd 7f 04 02          vmovdqa YMMWORD PTR [rdx+rax*1],ymm0
      40d85a:       48 83 c0 20             add    rax,0x20
      40d85e:       48 39 c8                cmp    rax,rcx
      40d861:       75 ed                   jne    40d850 <foo(char const*, int, char*, int)+0x20>
    

    So it's almost the same code (except for the aligned reads).

    Now I wanted to change the code to use two ymm registers to unroll this loop further, so I changed the Array in the above code to

    using Array = enoki::Array<int, 16>;
    

    The assembly generated with the 16-element array is this:

      40d850:       c5 f9 6f 04 07          vmovdqa xmm0,XMMWORD PTR [rdi+rax*1]
      40d855:       c5 f8 29 04 02          vmovaps XMMWORD PTR [rdx+rax*1],xmm0
      40d85a:       c5 f9 6f 4c 07 10       vmovdqa xmm1,XMMWORD PTR [rdi+rax*1+0x10]
      40d860:       c5 f8 29 4c 02 10       vmovaps XMMWORD PTR [rdx+rax*1+0x10],xmm1
      40d866:       c5 f9 6f 54 07 20       vmovdqa xmm2,XMMWORD PTR [rdi+rax*1+0x20]
      40d86c:       c5 f8 29 54 02 20       vmovaps XMMWORD PTR [rdx+rax*1+0x20],xmm2
      40d872:       c5 f9 6f 5c 07 30       vmovdqa xmm3,XMMWORD PTR [rdi+rax*1+0x30]
      40d878:       c5 f8 29 5c 02 30       vmovaps XMMWORD PTR [rdx+rax*1+0x30],xmm3
      40d87e:       48 83 c0 40             add    rax,0x40
      40d882:       48 39 c8                cmp    rax,rcx
      40d885:       75 c9                   jne    40d850 <foo(char const*, int, char*, int)+0x20>
    

    So instead of using two ymm registers, it uses 4 xmm registers. I find this quite odd. Do you have any idea why enoki did that?

    opened by skgbanga 5
  • `replace_scalar_t` with pointer type (for `slice_ptr`)

    This use-case comes when taking the slice_ptr of a struct containing a pointer-typed field. These tests used to compile and pass as of 59b4565.

    I'm not sure whether the new tests represent the desired behavior, but they identify the following regression:

    1. Declare a struct Test containing a pointer field with using Ptr = replace_scalar_t<T, Test *>;
    2. Take slice_ptr(test)
    3. This will instantiate Test<float *>. But because of the changed behavior of replace_scalar_t, now Ptr = float * instead of float **.
    4. Compilation error (tested on Linux with clang version 7.0.1 (tags/RELEASE_701/rc3 348918))

    • [x] Identify issue
    • [x] Run git bisect. First bad commit: df53c27c317542b67214e4e39accf751ace26e86
    • [ ] Provide fix or workaround
    opened by merlinND 5
  • Gather: test for pointer types

    I wrote this test case while trying to track down a bug (unrelated to Enoki); since it's there I figured it was worth proposing to merge it :) I think it's complementary to the existing test cases for gather.

    @wjakob, let me know if you would like to keep it, I'll try to fix the Windows build if you do.

    opened by merlinND 5
  • `reinterpret_array` for pointer types

    Requires #6.

    This is intended to cover a common use-case in Mitsuba2:

    using ObjectPtr = Array<Object *, PacketSize>;
    using EmitterPtr = Array<Emitter *, PacketSize>;
    void my_function(ObjectPtr objects) {
        auto emitters = enoki::reinterpret_array<EmitterPtr>(objects);
        // (...)
    }
    

    The new test is currently failing, due to (it seems) a weird behavior in mask equality check, which I will investigate soon.

    opened by merlinND 5
  • Documentation: propose small improvements

    Hi Wenzel,

    This PR contains a few tiny changes to the documentation. Below, I will list some remarks that I'm less sure about.

    Otherwise, the documentation is great to read! Every time there's a complex or unexpected notion, all explanations are given with just the right amount of detail. The figures are also really neat.

    Best, Merlin

    opened by merlinND 5
  • Enoki does not generate fma instruction for fmadd with Array<float, 1> and Clang

    Working example here: https://godbolt.org/z/rPbr91Gfb

    Consider the following code:

    #include <enoki/array.h>
    
    template<class VecT>
    void fma_foo(
        void* x,
        void* y,
        void* z,
        void* target
    ) {
        VecT res = enoki::fmadd(enoki::load<VecT>(x), enoki::load<VecT>(y), enoki::load<VecT>(z));
    
        enoki::store(target, res);
    }
    

    The following two instantiations work as expected:

    //Correct
    template
    void fma_foo<float>(
        void* x,
        void* y,
        void* z,
        void* target
    );
    
    //Correct
    template
    void fma_foo<enoki::Array<float, 8>>(
        void* x,
        void* y,
        void* z,
        void* target
    );
    

    and generate a vfmadd132.. instruction

    This one does not:

    //Uh Oh
    template
    void fma_foo<enoki::Array<float, 1>>(
        void* x,
        void* y,
        void* z,
        void* target
    );
    

    instead of fma, it generates

    vmulss  xmm0, xmm0, dword ptr [rsi]
    vaddss  xmm0, xmm0, dword ptr [rdx]
    

    This issue is present on all versions of Clang (9+), but does not seem to exist with GCC.

    Can you fix this on your side or should I file a bug with Clang?

    opened by robinchrist 0
  • How to use binary_search overloads in python.

    Basically, I want to do this kind of computation:

    x = Float.linspace(0, 1, 100)
    
    def f(i : UInt32) -> Mask:
        return ek.gather(x, i) > 0.5
    
    ek.binary_search(0, 100, f)
    

    With Float and UInt32 being cuda arrays. But I get the following error: Unable to cast Python instance of type <class 'enoki.cuda_autodiff.Mask'> to C++ type 'bool' It seems like pybind doesn't choose the correct overloaded method. Am I missing something?

    opened by LeonardEyer 0
  • Enoki cannot build on VS2019 16.10

    16.9 works fine. Just posting here in case anyone has any ideas. It seems the recent Visual Studio update has trouble building the CUDA-related parts.

    opened by andyyankai 11
  • Fix documentation build errors for Sphinx > 4.0

    Sphinx 4.0 removed the add_stylesheet() API, which was deprecated in Sphinx 1.8 in favor of add_css_file(). Switched to the newer API, should work in Sphinx >= 1.8.

    Also, fixed build errors resulting from two missing line folds and a small build warning.

    opened by lplatz-gh 0
  • Fix documentation of python bindings

    The documentation on the python bindings was outdated:

    • enoki requires a C++17 capable compiler to build. The given CMakeLists.txt file only tested for C++14 compatibility
    • the vectorize_wrapper function was moved to the dynamic.h header file in 7b566930, so this header has to be imported by the example, too.
    opened by lplatz-gh 0
  • Let user override ENOKI_NATIVE_FLAGS and ENOKI_ARCH_FLAGS

    PR as a suggestion. I got an illegal instruction signal (SIGILL) on import mitsuba when building on a different system with a more recent CPU than my own computer. The cause is -march=native. I didn't find any way to override this flag without changing Enoki's CMakeLists.txt. I tried to change as little code as possible to allow overriding ENOKI_NATIVE_FLAGS and ENOKI_ARCH_FLAGS at the CMake command line to fix the SIGILL.

    opened by mlamarre 0