A thin, highly portable C++ intermediate representation for dense loop-based computation.

loop_tool

loop_tool is an experimental, lightweight, and highly portable linear algebra toolkit.

Install

pip install loop_tool_py
python -c 'import loop_tool_py as lt; print(lt.backends())'
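
The same check can be wrapped in a small script that degrades gracefully when the package is not installed. This is a minimal sketch: `available_backends` is a hypothetical helper name, and the only loop_tool call it relies on is the `backends()` function used in the command above.

```python
import importlib


def available_backends(module_name="loop_tool_py"):
    """Return the backends reported by the module, or [] if it is not installed."""
    try:
        lt = importlib.import_module(module_name)
    except ImportError:
        # Package missing or failed to build: report no backends.
        return []
    # backends() is the same call used in the install check.
    return list(lt.backends())


print(available_backends())
```

An empty list here means the import failed, which is worth checking before debugging anything backend-specific.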

Tutorial

A Python notebook tutorial can be found here: https://github.com/facebookresearch/loop_tool/blob/main/tutorial.ipynb

Build C++ API from source

To build the C++ API from source, clone this repo and use cmake:

git clone https://github.com/facebookresearch/loop_tool.git
cd loop_tool
mkdir -p build; cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

Python

To build the Python bindings from source, install pybind11 and run setup.py:

pip install pybind11 # or conda
python setup.py install

Run

If you have CUDA, check out the demo bench.py file:

python test/bench.py

This will sweep a couple of configurations for a simple pointwise addition. The sweep is driven entirely from Python (~100k runs per benchmark) and should find a schedule that hits ~70% of peak bandwidth regardless of GPU.
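
To make the "fraction of peak bandwidth" figure concrete, here is a rough sketch of the arithmetic for a pointwise addition. It assumes float32 elements and counts two reads plus one write per element; the element count, runtime, and peak bandwidth below are made-up illustration values, not measurements from bench.py.

```python
def pointwise_add_bandwidth(n_elements, seconds, bytes_per_element=4):
    """Effective bandwidth in bytes/s for c[i] = a[i] + b[i]."""
    # Two input reads plus one output write per element.
    bytes_moved = 3 * n_elements * bytes_per_element
    return bytes_moved / seconds


def fraction_of_peak(achieved_bytes_per_s, peak_bytes_per_s):
    """Ratio of achieved bandwidth to the device's peak bandwidth."""
    return achieved_bytes_per_s / peak_bytes_per_s


# Hypothetical numbers: 2**24 float32 elements added in 0.4 ms
# on a device with 700 GB/s peak memory bandwidth.
achieved = pointwise_add_bandwidth(2**24, 0.4e-3)
print(f"{achieved / 1e9:.1f} GB/s, {fraction_of_peak(achieved, 700e9):.0%} of peak")
```

Because a pointwise addition does almost no arithmetic per byte, memory bandwidth rather than FLOPs is the natural yardstick for this benchmark.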

Extra builds/tests

JavaScript

To build a JavaScript target, specify your emcc directory to cmake and rebuild. This will create two extra files (loop_tool.js and loop_tool.wasm).

EMCC_DIR=$(dirname "$(which emcc)") cmake ..
make -j$(nproc)

Tests

After building, run either the C++ test binary or the language-specific tests:

./build/loop_tool_test
PYTHONPATH=build python test/test.py
NODE_PATH=build node test/test.js

License

loop_tool is MIT licensed, as found in the LICENSE file.

Issues
  • Bug with Loop Nest

    Loop Nest fails to evaluate the following example:

    for m_1678 in 64 : L0
    for n_1680 in 16 : L1
    for k_1679 in 256 : L2
    for m_1678' in 4 : L3
    for n_1680' in 4 : L4
    for n_1680'' in 4 : L5
    %2[m_1678, k_1679, n_1680] <- multiply(%0, %1)
    %3[m_1678, n_1680] <- add(%2)
    for m_1678' in 4 : L8
    for n_1680 in 16 : L9
    for n_1680' in 16 : L10
    %4[m_1678, n_1680] <- write(%3)
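
One quick sanity check on a dump like the one above is to multiply the loop extents for each variable and compare the product with the original dimension (256 here). The sketch below copies the extents by hand from the dump; the grouping into a compute nest and a write nest, with both sharing the outer m loop of 64, is an assumption about the nesting, and the helper name is hypothetical.

```python
from math import prod

# Loop extents copied from the dump, grouped by the variable they iterate.
# Assumption: the write nest (L8-L10) sits under the outer m loop (L0).
compute_nest = {"m": [64, 4], "n": [16, 4, 4], "k": [256]}
write_nest = {"m": [64, 4], "n": [16, 16]}


def covered_extent(extents):
    """Total iterations for one variable, assuming the loops divide evenly."""
    return prod(extents)


for name, nest in (("compute", compute_nest), ("write", write_nest)):
    for var, extents in nest.items():
        print(name, var, extents, "->", covered_extent(extents))
```

This only checks that every variable is fully covered; it says nothing about whether a backend can vectorize the resulting innermost loops.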

    Reproducer:

    import loop_tool as lt
    import numpy as np
    import pdb
    def mm(A, B):
        s = lt.SymbolGenerator()
        C = A.to(s.m, s.k) * B.to(s.k, s.n)
        return C.sum(s.k)
    
    m, n, k = 256, 256, 256
    A = lt.Tensor(m, k).set(np.random.randn(m, k))
    B = lt.Tensor(k, n).set(np.random.randn(k, n))
    
    s = lt.SymbolGenerator()
    C = mm(A, B)
    
    # Parenthesized chain so the inline comment does not break line continuation.
    loop_tree = (
        C.loop_tree.split(0, 4)
        .swap_loops(1, 2)
        .swap_loops(2, 3)  # <<< this seems problematic
        .swap_loops(2, 1)
        .split(1, 16)
        .swap_loops(2, 3)
        .swap_loops(3, 4)
        .split(4, 4)
        .split(9, 16)
    )
    
    C.set(loop_tree)
    with open("data/mm256.txt", "w") as f:
        f.write(C.ir.serialize())
    
    pdb.set_trace()
    loop_tree.eval()
    

    Reward Flops: reset

    /home/dejang/loop_tool_env/loop_tool_service/service_py/env/loop_tool_env.py(169)get_flops_loop_nest()
    -> with lt.Backend("loop_nest"):
    (Pdb) n
    /home/dejang/loop_tool_env/loop_tool_service/service_py/env/loop_tool_env.py(170)get_flops_loop_nest()
    -> mean_runtime = self.tensor.loop_tree.eval()
    (Pdb) n
    RuntimeError: o.second == 1 || (o.second % vector_size == 0) failed
    file: /home/dejang/loop_tool/extern/loop_nest/include/dabun/x86/loop_nest.hpp
    line: (2219)
    /home/dejang/loop_tool_env/loop_tool_service/service_py/env/loop_tool_env.py(170)get_flops_loop_nest()

    opened by dejangrubisic 1
  • Assertion Failures in README.md Code

    I tried to run the code from the README.md, but some of the assertions fail.

    Asserts that fail:

    • assert np.allclose(W.numpy(), np.sum(X.numpy() * Z.numpy()))
    • assert np.allclose(new_W.numpy(), np.sum(new_X.numpy() * new_Z.numpy()))

    When I print the variables, the values change between calls, so the failures could be related to that. Setting np.random.seed(0) didn't help.

    Running:

    print(f"{W.numpy()=}, {np.sum(X.numpy() * Z.numpy())=}")
    print(f"{W.numpy()=}, {np.sum(X.numpy() * Z.numpy())=}")
    

    Before the first failed assert prints:

    W.numpy()=array(109.48644, dtype=float32), np.sum(X.numpy() * Z.numpy())=3.8519936e+17
    W.numpy()=array(3.8519936e+17, dtype=float32), np.sum(X.numpy() * Z.numpy())=-9.629987e+34
    /home/tcoard/w/fun/lin_alg/new.py:30: RuntimeWarning: overflow encountered in multiply
      assert np.allclose(W.numpy(), np.sum(X.numpy() * Z.numpy()))
    

    System details:

    • numpy==1.22.2
    • loop-tool==0.1.0
    • CPU Architecture: x86_64

    Let me know if there are any other details that would be helpful.

    opened by tcoard 1
  • The video in the latest blog post is not showing correctly

    @bwasti Sorry for reaching out here, but I couldn't find another way to get in touch.

    It seems that none of the embedded mp4s are being previewed inline.

    opened by Omar-Elrefaei 1
  • Issues while installing loop_tool_py  in Mac m1

    I couldn't install loop_tool_py on Apple Silicon:

    pip3 install loop_tool_py
    Collecting loop_tool_py
      Using cached loop_tool_py-0.0.5.tar.gz (44 kB)
      Preparing metadata (setup.py) ... done
    Collecting pybind11
      Using cached pybind11-2.8.0-py2.py3-none-any.whl (207 kB)
    Building wheels for collected packages: loop-tool-py
      Building wheel for loop-tool-py (setup.py) ... error
      ERROR: Command errored out with exit status 1:
       command: /Users/reshinthadithyan/miniforge3/envs/compiler_gym/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py'"'"'; __file__='"'"'/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-wheel-tf10xzyn
           cwd: /private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/
      Complete output (45 lines):
      running bdist_wheel
      running build
      running build_ext
      CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
      CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
      CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
      -- Configuring incomplete, errors occurred!
      See also "/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/build/temp.macosx-11.0-arm64-3.9/CMakeFiles/CMakeOutput.log".
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py", line 105, in <module>
          setup(
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/site-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 299, in run
          self.run_command('build')
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
          _build_ext.run(self)
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/build_ext.py", line 340, in run
          self.build_extensions()
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/build_ext.py", line 449, in build_extensions
          self._build_extensions_serial()
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/build_ext.py", line 474, in _build_extensions_serial
          self.build_extension(ext)
        File "/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py", line 95, in build_extension
          subprocess.check_call(
        File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/build/lib.macosx-11.0-arm64-3.9/', '-DPYTHON_EXECUTABLE=/Users/reshinthadithyan/miniforge3/envs/compiler_gym/bin/python3.9', '-DCMAKE_BUILD_TYPE=Release', '-Dpybind11_DIR=/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/.eggs/pybind11-2.8.0-py3.9.egg/pybind11/share/cmake/pybind11', '-DCMAKE_BUILD_WITH_INSTALL_RPATH=TRUE', '[email protected]_path', '-GNinja']' returned non-zero exit status 1.
      ----------------------------------------
      ERROR: Failed building wheel for loop-tool-py
      Running setup.py clean for loop-tool-py
    Failed to build loop-tool-py
    Installing collected packages: pybind11, loop-tool-py
        Running setup.py install for loop-tool-py ... error
        ERROR: Command errored out with exit status 1:
         command: /Users/reshinthadithyan/miniforge3/envs/compiler_gym/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py'"'"'; __file__='"'"'/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-record-y6o8d6tp/install-record.txt --single-version-externally-managed --compile --install-headers /Users/reshinthadithyan/miniforge3/envs/compiler_gym/include/python3.9/loop-tool-py
             cwd: /private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/
        Complete output (47 lines):
        running install
        running build
        running build_ext
        CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
        CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
        CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
        -- Configuring incomplete, errors occurred!
        See also "/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/build/temp.macosx-11.0-arm64-3.9/CMakeFiles/CMakeOutput.log".
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py", line 105, in <module>
            setup(
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/site-packages/setuptools/__init__.py", line 153, in setup
            return distutils.core.setup(**attrs)
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/core.py", line 148, in setup
            dist.run_commands()
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/dist.py", line 966, in run_commands
            self.run_command(cmd)
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/dist.py", line 985, in run_command
            cmd_obj.run()
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/site-packages/setuptools/command/install.py", line 61, in run
            return orig.install.run(self)
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/install.py", line 546, in run
            self.run_command('build')
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/dist.py", line 985, in run_command
            cmd_obj.run()
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/build.py", line 135, in run
            self.run_command(cmd_name)
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/dist.py", line 985, in run_command
            cmd_obj.run()
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
            _build_ext.run(self)
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/build_ext.py", line 340, in run
            self.build_extensions()
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/build_ext.py", line 449, in build_extensions
            self._build_extensions_serial()
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/distutils/command/build_ext.py", line 474, in _build_extensions_serial
            self.build_extension(ext)
          File "/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py", line 95, in build_extension
            subprocess.check_call(
          File "/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/subprocess.py", line 373, in check_call
            raise CalledProcessError(retcode, cmd)
        subprocess.CalledProcessError: Command '['cmake', '/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/build/lib.macosx-11.0-arm64-3.9/', '-DPYTHON_EXECUTABLE=/Users/reshinthadithyan/miniforge3/envs/compiler_gym/bin/python3.9', '-DCMAKE_BUILD_TYPE=Release', '-Dpybind11_DIR=/Users/reshinthadithyan/miniforge3/envs/compiler_gym/lib/python3.9/site-packages/pybind11/share/cmake/pybind11', '-DCMAKE_BUILD_WITH_INSTALL_RPATH=TRUE', '[email protected]_path', '-GNinja']' returned non-zero exit status 1.
        ----------------------------------------
    ERROR: Command errored out with exit status 1: /Users/reshinthadithyan/miniforge3/envs/compiler_gym/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py'"'"'; __file__='"'"'/private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-install-a37iha1m/loop-tool-py_85150e6f71984c46b08f1ad9c3e142b4/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/0s/v3y1q9xx1sg7n6bch4210qrr0000gn/T/pip-record-y6o8d6tp/install-record.txt --single-version-externally-managed --compile --install-headers /Users/reshinthadithyan/miniforge3/envs/compiler_gym/include/python3.9/loop-tool-py Check the logs for full command output.
    

    Footnote

    I was able to get the C++ bindings working.
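
The log above shows CMake failing because it could not find Ninja and because no C or C++ compiler was set. A small preflight check along these lines can confirm which tools are visible on PATH before retrying the install. This is a sketch using only the standard library; the helper name and the list of compiler names are assumptions, and passing the check does not guarantee that the build succeeds.

```python
import shutil


def missing_build_tools(required=("cmake", "ninja"),
                        compilers=("c++", "clang++", "g++")):
    """Return the names of build tools that are not found on PATH."""
    missing = [tool for tool in required if shutil.which(tool) is None]
    # Any one C++ compiler is enough; report a single entry if none is found.
    if not any(shutil.which(cc) for cc in compilers):
        missing.append("C++ compiler")
    return missing


print(missing_build_tools() or "all build tools found")
```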

    opened by reshinthadithyan 1
  • is there any plan to open source backend loop-nest?

    Hi, I recently read the LoopStack paper and think it is great work. I found loop_tool, which is the front end. Is there any plan to open-source the loop-nest backend? I would like to add some new backends (for DLAs (Deep Learning Accelerators) and NPUs) to LoopStack. Is loop_tool alone enough for that, or is the loop-nest backend needed?

    Thanks, Chunying

    opened by lyuchuny3 0
  • Bug with copy_input both loop_nest and loop_tool

    I was able to save a partially optimized loop_tree, so the agent just needs to label it properly.

    Both LoopTool and LoopNest fail to generate code for the copy input operation. Here is an example and a reproducer. This is what the loop tree looks like:

    for m_1677 in 64 : L0
    for n_1679 in 16 : L1
    for k_1678 in 256 : L2
    for m_1677' in 4 : L3
    %5[m_1677, k_1678] <- copy(%0)
    for n_1679' in 4 : L5
    for n_1679'' in 4 : L6
    %6[k_1678, n_1679] <- copy(%1)
    for m_1677' in 4 : L8
    for n_1679' in 4 : L9
    for n_1679'' in 4 : L10
    %2[m_1677, k_1678, n_1679] <- multiply(%5, %6)
    %3[m_1677, n_1679] <- add(%2)
    for m_1677' in 4 : L13
    for n_1679 in 64 : L14
    for n_1679' in 4 : L15
    %4[m_1677, n_1679] <- write(%3)

    Reproducer:

    import loop_tool as lt
    import numpy as np
    import pdb
    def mm(A, B):
        s = lt.SymbolGenerator()
        C = A.to(s.m, s.k) * B.to(s.k, s.n)
        return C.sum(s.k)
    
    m, n, k = 256, 256, 256
    A = lt.Tensor(m, k).set(np.random.randn(m, k))
    B = lt.Tensor(k, n).set(np.random.randn(k, n))
    
    s = lt.SymbolGenerator()
    C = mm(A, B)
    
    loop_tree = C.loop_tree.split(0, 4)\
                  .swap_loops(1, 2)\
                  .swap_loops(2, 3)\
                  .swap_loops(2, 1)\
                  .split(1, 16)\
                  .swap_loops(2, 3)\
                  .swap_loops(3, 4)\
                  .copy_input(5, 0)\
                  .try_swap(5, 4)\
                  .split(5, 4)\
                  .copy_input(7, 1)\
                  .decrease_reuse(7)\
                  .decrease_reuse(7)\
                  .decrease_reuse(7)\
                  .split(14, 4)
    
    C.set(loop_tree)
    with open("data/mm256.txt", "w") as f:
        f.write(C.ir.serialize())
    
    pdb.set_trace()
    

    Loop Nest fails on:

    with lt.Backend("loop_nest"):
        mean_runtime = self.tensor.loop_tree.eval()

    RuntimeError: assertion: fma_nest failed @ /home/dejang/loop_tool/src/backends/cpu/loop_nest.cpp:30

    Loop Tool fails on:

    mean_runtime = self.tensor.loop_tree.eval()

    error assertion: 0 failed @ /Users/dejang/Desktop/work/loop_tool/src/backends/cpu/cpp.cpp:228 can't emit code for copy

    bug high_priority 
    opened by dejangrubisic 0
  • Bug with loop_tool

    I was trying to reproduce the Python steps that lead to the configuration shown here: Image

    import loop_tool as lt
    import numpy as np
    import pdb

    def mm(A, B):
        s = lt.SymbolGenerator()
        C = A.to(s.m, s.k) * B.to(s.k, s.n)
        return C.sum(s.k)
    
    m, n, k = 256, 256, 256
    A = lt.Tensor(m, k).set(np.random.randn(m, k))
    B = lt.Tensor(k, n).set(np.random.randn(k, n))
    
    s = lt.SymbolGenerator()
    C = mm(A, B)
    
    loop_tree = C.loop_tree.split(0, 4)\
                  .swap_loops(1, 2)\
                  .swap_loops(2, 3)\
                  .swap_loops(2, 1)\
                  .split(1, 16)\
                  .swap_loops(2, 3)\
                  .swap_loops(3, 4)\
                  .copy_input(5, 0)\
                  .try_swap(5, 4)\
                  .split(5, 4)\
                  .copy_input(7, 1)\
                  .decrease_reuse(7)\
                  .decrease_reuse(7)\
                  .decrease_reuse(7)\
                  .split(14, 4)
    
    C.set(loop_tree)
    with open("data/mm256.txt", "w") as f:
        f.write(C.ir.serialize())
    
    pdb.set_trace()
    

    Which results in

    Image

    After I apply the previous actions, L14 (for n_1679 in 64 : L14) should iterate over 16, not 64.

    bug 
    opened by dejangrubisic 0
  • Port script described in LoopStack paper directly to new codebase

    Section 6.3 of the paper describes a scripted tuner that produces state-of-the-art performance across a variety of workloads. That tuner has not yet been ported to the new codebase.
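
For orientation only, the general shape of a scripted search can be sketched as "measure every candidate, keep the fastest". This is not the tuner from the paper, whose details are not reproduced here; `pick_fastest` and `fake_runtime` are hypothetical names, and the cost function is a stand-in for actually timing a schedule.

```python
def pick_fastest(candidates, measure):
    """Return (best_candidate, best_cost) by measuring every candidate once."""
    best, best_cost = None, float("inf")
    for cand in candidates:
        cost = measure(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


# Stand-in cost model for illustration only; a real script would
# time the schedule on the target backend instead.
def fake_runtime(split_size):
    return abs(split_size - 16) + 1.0


print(pick_fastest([2, 4, 8, 16, 32], fake_runtime))
```

A real tuner would typically also prune the search space and repeat measurements to reduce noise; exhaustive search is only practical for small candidate sets.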

    opened by bwasti 0
  • Add support for PyTorch in LoopTool

    Currently, NumPy is integrated through py::array_t (a pybind11 builtin). This allows users to create LoopTool native tensors from NumPy ndarrays.

    https://github.com/facebookresearch/loop_tool/blob/main/src/frontends/python.cpp#L51-L69

    Notably, there is an optional "copy" flag that allows the user to instead reference the underlying NumPy data directly, saving on reads/writes.

    Both of these features can be adopted for PyTorch Tensors.
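
The copy-versus-alias trade-off can be illustrated with Python's buffer protocol, which NumPy arrays also implement. The sketch below uses only the standard library (`array` and `memoryview`) as stand-ins; it does not call loop_tool, NumPy, or PyTorch, and the variable names are purely illustrative.

```python
from array import array

source = array("f", [1.0, 2.0, 3.0])

# Zero-copy: the memoryview aliases the same buffer as `source`.
shared = memoryview(source)
# Copy: an independent array with its own storage.
copied = array("f", source)

# A write through the original is visible in the alias but not in the copy.
source[0] = 42.0
print(shared[0], copied[0])
```

Aliasing avoids the extra read and write of a copy, at the cost that the two objects must agree on who owns the memory and how long it stays alive.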

    opened by bwasti 0
Owner
Facebook Research
📽 Highly optimized 2D|3D math library, also known as OpenGL Mathematics (glm) for `C`

Highly optimized 2D|3D math library, also known as OpenGL Mathematics (glm) for `C`. cglm provides lot of utils to help math operations to be fast and quick to write. It is community friendly, feel free to bring any issues, bugs you faced.

Recep Aslantas 1.4k Aug 4, 2022
LibTomMath is a free open source portable number theoretic multiple-precision integer library written entirely in C.

libtommath This is the git repository for LibTomMath, a free open source portable number theoretic multiple-precision integer (MPI) library written en

libtom 523 Jul 30, 2022
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

OpenBLAS Travis CI: AppVeyor: Drone CI: Introduction OpenBLAS is an optimized BLAS (Basic Linear Algebra Subprograms) library based on GotoBLAS2 1.13

Zhang Xianyi 4.7k Aug 1, 2022
Thorin is a compiler intermediate representation.

Introduction Thorin is a compiler intermediate representation. Building git clone --recurse-submodules git@github.com:AnyDSL/thorin2.git cd thorin2 mk

null 13 Jul 20, 2022
The DirectX Shader Compiler project includes a compiler and related tools used to compile High-Level Shader Language (HLSL) programs into DirectX Intermediate Language (DXIL) representation

DirectX Shader Compiler The DirectX Shader Compiler project includes a compiler and related tools used to compile High-Level Shader Language (HLSL) pr

Microsoft 2.3k Aug 7, 2022
a language for fast, portable data-parallel computation

Halide Halide is a programming language designed to make it easier to write high-performance image and array processing code on modern machines. Halid

Halide 5k Aug 5, 2022
🌱Light and powerful C++ web framework for highly scalable and resource-efficient web application. It's zero-dependency and easy-portable.

Oat++ News Hey, meet the new oatpp version 1.2.5! See the changelog for details. Check out the new oatpp ORM - read more here. Oat++ is a modern Web F

Oat++ 5.6k Jul 31, 2022
A 3D DNN-based Metric Semantic Dense Mapping pipeline and a Visual Inertial SLAM system

MSDM-SLAM This repository represents a 3D DNN-based Metric Semantic Dense Mapping pipeline and a Visual Inertial SLAM system that can be run on a grou

ITMO Biomechatronics and Energy Efficient Robotics Laboratory 11 Jul 23, 2022
Thin C++-flavored wrappers for the CUDA Runtime API

cuda-api-wrappers: Thin C++-flavored wrappers for the CUDA runtime API Branch Build Status: Master | Development: nVIDIA's Runtime API for CUDA is int

Eyal Rozenberg 504 Jul 30, 2022
the thin c++ game engine

CI Community Support toy is a thin and modular c++ game engine. it aims to provide the thinnest and simplest stack of technology for making games dire

Hugo Amnov 1.5k Aug 2, 2022
Low Level Graphics Library (LLGL) is a thin abstraction layer for the modern graphics APIs OpenGL, Direct3D, Vulkan, and Metal

Low Level Graphics Library (LLGL) Documentation NOTE: This repository receives bug fixes only, but no major updates. Pull requests may still be accept

Lukas Hermanns 1.4k Aug 1, 2022
Enabling services on your device 70 Jul 31, 2022
A Script to thin Universal Apps on macOS quickly

UBThinner A Script to thin Universal Apps on macOS quickly. It traverses through the given folder recursively, identifies any universal binaries and t

Arm 2 Dec 26, 2021
This is a thin c-api wrapper programmatically generated for the excellent C++ immediate mode gui Dear ImGui.

cimgui This is a thin c-api wrapper programmatically generated for the excellent C++ immediate mode gui Dear ImGui. All imgui.h functions are programm

Victor Bombi 22 Jul 5, 2021
Real-Time Intermediate Flow Estimation for Video Frame Interpolation filter for VapourSynth

Description RIFE filter for VapourSynth, based on rife-ncnn-vulkan. Usage rife.RIFE(clip clip[, int model=0, int gpu_id=auto, int gpu_thread=2, bint t

Home Of VapourSynth Evolution 57 Aug 1, 2022
Qt 6 Core Intermediate with C++ on Udemy

Qt 6 Core Intermediate with C++ on Udemy

Bryan Cairns 41 Jul 24, 2022
All type of codes(Beginner, Intermediate and Advance) feel free to add your codes to this repo !

Hello everyone, Welcome to Basic_codes ?? All type of codes (Beginner, Intermediate and Advance) feel free to add your codes to this repo! ?? ?? You w

Nikhil Verma 2 Oct 15, 2021
This Repository Aims To Help Beginners with their first successful pull request and Know How to do open source contributions Also For Intermediate and Advance level contributors as well.

Hacktoberfest_2021 This Repository Aims To Help Beginners with their first successful pull request and Know How to do open source contributions Also F

Rishu Rajan 15 Jan 9, 2022
LibreSSL Portable itself. This includes the build scaffold and compatibility layer that builds portable LibreSSL from the OpenBSD source code.

LibreSSL Portable itself. This includes the build scaffold and compatibility layer that builds portable LibreSSL from the OpenBSD source code.

OpenBSD LibreSSL Portable 1.1k Jul 29, 2022
Faiss is a library for efficient similarity search and clustering of dense vectors.

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy.

Facebook Research 17.6k Aug 7, 2022
A modern, C++20-native, single-file header-only dense 2D matrix library.

A modern, C++20-native, single-file header-only dense 2D matrix library. Contents Example usage creating matrices basic operations row, col, size, sha

feng wang 49 Aug 1, 2022
Generate dense random crosswords

CrosswordGenerator crossword_gen is a program written in C allowing to generate random crosswords from a list of words. The following parameters are e

null 2 Oct 31, 2021
Direct LiDAR Odometry: Fast Localization with Dense Point Clouds

Direct LiDAR Odometry: Fast Localization with Dense Point Clouds DLO is a lightweight and computationally-efficient frontend LiDAR odometry solution w

VECTR at UCLA 288 Aug 4, 2022
An FPGA accelerator for general-purpose Sparse-Matrix Dense-Matrix Multiplication (SpMM).

Sextans Sextans is an accelerator for general-purpose Sparse-Matrix Dense-Matrix Multiplication (SpMM). One exciting feature is that we only need to p

linghao.song 23 Jul 22, 2022
Tandem - [CoRL 21'] TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo Lukas Koestler1*    Nan Yang1,2*,†    Niclas Zeller2,3    Daniel Cremers1

TUM Computer Vision Group 655 Aug 3, 2022
Dense Depth Estimation from Multiple 360-degree Images Using Virtual Depth

Dense Depth Estimation from Multiple 360-degree Images Using Virtual Depth [Project] [Paper] [arXiv] This is the official code of our APIN 2022 paper

null 6 Jun 11, 2022
Radar SLAM: yeti radar odometry + ScanContext-based Loop Closing

navtech-radar-slam Radar SLAM: yeti radar odometry + ScanContext-based Loop Closing What is Navtech-Radar-SLAM? In this repository, a (minimal) SLAM p

Giseop Kim 74 Jul 8, 2022
Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as task graphs that are scheduled concurrently and asynchronously on both CPUs and GPUs.

Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as tasks in a graph structure, where edges represent task dependencies

null 23 May 11, 2022