BladeDISC - BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

BladeDISC Introduction

Overview

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads and one of the key components of Alibaba's PAI-Blade. BladeDISC provides general, transparent, and easy-to-use performance optimization for TensorFlow/PyTorch workloads on GPGPU and CPU backends. The architecture natively supports dynamic shape workloads, with careful attention to performance in both static and dynamic shape scenarios. It also supports multiple flexible deployment solutions, including Plugin Mode inside the TensorFlow/PyTorch runtime and Standalone Mode for AOT standalone execution. The project is based on MLIR and is closely related to the mlir-hlo project.

Refer to our website for more information, including the setup tutorial, developer guide, demo examples, and documentation for developers.

Features and Roadmap

Frontend Framework Support Matrix

            TensorFlow [1]   PyTorch [2]
Inference   Yes              Yes
Training    Yes [3]          Ongoing

[1] TensorFlow 1.12, 1.15, 2.4 & 2.5 are supported and fully verified. For other versions, some slight adaptation work might be needed.

[2] 1.6.0 <= PyTorch version < 1.9.0 has been fully verified.

[3] Although supported, there is still much room for improvement in op coverage for training workloads.

Backend Support Matrix

            Memory Intensive Part   Compute Intensive Part     End-to-End Usability
Nvidia GPU  Yes                     Yes                        Yes
AMD GPU     Ongoing                 Ongoing                    No
Hygon DCU   Yes                     Yes                        Yes
X86         Yes                     Not open-sourced yet [1]   No

[1] The compute-intensive part of the X86 backend is already supported in the internal version. The code decoupling is ongoing and it will be open-sourced soon, as will the end-to-end usability.

Deployment Solutions

  • Plugin Mode - BladeDISC works as a plugin of TensorFlow or PyTorch. Only the supported ops are clustered and compiled; the unsupported ones are executed by the original TensorFlow or PyTorch runtime. We recommend this mode to most users for its transparency and ease of use.

  • Standalone Mode - In Standalone Mode, the input workload is compiled into a binary that can be executed by itself, i.e., it does not rely on a TensorFlow or PyTorch runtime. In this mode all ops must be supported.

Numbers of Typical Workloads

Evaluated on a set of typical machine learning workloads used in production, DISC shows up to a 3x speedup compared with TensorFlow/PyTorch.

Advantage in Dynamic Shape Workloads

Specifically, for the BERT-large inference on T4 that we provide in the examples, static-shape compiler optimization (XLA) shows severe performance degradation due to its compilation overhead, while DISC delivers a 1.75x speedup over TensorFlow.

                   TensorFlow   XLA       DISC
Latency            1.78 s       41.69 s   1.02 s
Speedup (vs. TF)   1X           -         1.75X

API QuickView

For TensorFlow Users

Only two lines of code need to be added to a native TensorFlow program, as shown below:

import numpy as np
import tensorflow as tf

## enable BladeDISC on TensorFlow program
import tensorflow_blade_disc as disc
disc.enable()

## construct TensorFlow Graph and run it
g = tf.Graph()
with g.as_default():
    ...
    with tf.Session() as sess:
        sess.run(...)
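
As a slightly more complete sketch (assuming a TF1-style graph-mode program; the toy graph below is illustrative, and on TF 2.x the tf.compat.v1 equivalents would be needed):

import numpy as np
import tensorflow as tf

import tensorflow_blade_disc as disc
disc.enable()  # let BladeDISC cluster and compile the supported ops

g = tf.Graph()
with g.as_default():
    # a toy graph with a dynamic batch dimension
    x = tf.placeholder(tf.float32, shape=[None, 256], name="x")
    w = tf.get_variable("w", shape=[256, 128])
    y = tf.nn.relu(tf.matmul(x, w))

    with tf.Session(graph=g) as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(y, feed_dict={x: np.random.rand(8, 256).astype(np.float32)})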

For more information, please refer to QuickStart for TensorFlow Users.

For PyTorch Users

PyTorch users only need the following few lines of code to enable BladeDISC:

import torch
import torch.nn as nn
import torch_blade
# construct PyTorch Module
class MyModule(nn.Module):
    ...

module = MyModule()

with torch.no_grad():
    # blade_module is the optimized module by BladeDISC
    blade_module = torch_blade.optimize(module, allow_tracing=True, model_inputs=(x, y))

# run the optimized module
blade_module(x, y)

torch_blade.optimize accepts an nn.Module object and returns the optimized module. A fuller runnable sketch follows; for more information, please refer to Quickstart for PyTorch Users.
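
Filling in the elided parts with an illustrative toy module and example inputs (the module, shapes, and input tensors below are assumptions; the torch_blade.optimize call is as above):

import torch
import torch.nn as nn
import torch_blade

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(256, 128)

    def forward(self, x, y):
        return torch.relu(self.linear(x)) + y

module = MyModule().eval()   # optimize for inference
x = torch.randn(8, 256)
y = torch.randn(8, 128)

with torch.no_grad():
    # blade_module is the module optimized by BladeDISC
    blade_module = torch_blade.optimize(module, allow_tracing=True, model_inputs=(x, y))

# run the optimized module
out = blade_module(x, y)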

Setup and Examples

Publications

Tutorials and Documents for Developers

How to Contribute

FAQ

Roadmap with mlir-hlo Project

BladeDISC is closely related to the mlir-hlo project. Some of the building blocks, including the MHLO op definitions, TF-to-MHLO conversions, and some general-purpose passes, have been upstreamed to the mlir-hlo repository. We'll continue to work in close cooperation with the mlir-hlo project over the long term.

Contact Us

DingTalk

Issues
  • Install from source without docker

    Hi, we are trying to use Spack to build and install BladeDISC without docker; however, we are facing some problems.

    1. Spack installs TensorFlow from source, and protobuf is installed separately, so the detection logic in FindTensorflow.cmake is broken.
    2. The Bazel version in the bundled TensorFlow is >4.2.2, while the supported TF 2.4/2.5 requires Bazel 3.7.2.

    Can you give us some instructions on how to build BladeDISC outside docker?

    opened by asesidaa 12
  • Support fusing "isSplat" constants

    The problem is observed in swin-transformer when PyTorch is running AMP (automatic mixed precision).

    "lmhlo.constant"(%275) {value = dense<0.000000e+00> : tensor<64x784x768xf32>} : (memref<64x784x768xf32, "gpu">) -> ()
    
    …
    
    "lmhlo.fusion"() ( {
          …
          "lmhlo.multiply"(%1741, %275, %1742) {disc.device = "gpu"} : (memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">) -> ()
          …
          "lmhlo.terminator"() : () -> ()
     }) {disc.device = "gpu", disc.fusion.name = "main_kLoop_reshape__37_1_2"} : () -> ()
    
    "lmhlo.fusion"() ( {
          …
          "lmhlo.multiply"(%1888, %275, %1889) {disc.device = "gpu"} : (memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">) -> ()
          …
          "lmhlo.terminator"() : () -> ()
     }) {disc.device = "gpu", disc.fusion.name = "main_kLoop_reshape__37_1_3"} : () -> ()
    

    In general, the "splat" constants outside of a fusion kernel might cause severe performance issues. In swin-transformer, the performance degradation can be very severe. Please be aware that there might be multiple kernels consuming the splat constant.

    Solution 1: mark "splat" constants as fusible in the fusion pass, and add an additional fusion stage that allows duplicating the producer according to some form of rules, like the FusionMerger in XLA.

    Solution 2: add an additional FuseSplatConstPass after the regular fusion pass that specifically duplicates and fuses the splat const into fusion kernels.

    Both solutions also need to support fusion codegen for "splat" constants. Solution 2 can be regarded as a reduced version of solution 1, which cannot handle cases such as:

    "lmhlo.constant"(%272) {value = dense<0.000000e+00> : tensor<64x784x768xf32>} : (memref<64x784x768xf32, "gpu">) -> ()
    "lmhlo.add"(%272, %273, %275)  {disc.device = "gpu"} : (memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">) -> ()
    
    …
    
    "lmhlo.fusion"() ( {
          …
          "lmhlo.multiply"(%1741, %275, %1742) {disc.device = "gpu"} : (memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">) -> ()
          …
          "lmhlo.terminator"() : () -> ()
     }) {disc.device = "gpu", disc.fusion.name = "main_kLoop_reshape__37_1_2"} : () -> ()
    
    "lmhlo.fusion"() ( {
          …
          "lmhlo.multiply"(%1888, %275, %1889) {disc.device = "gpu"} : (memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">, memref<64x784x768xf32, "gpu">) -> ()
          …
          "lmhlo.terminator"() : () -> ()
     }) {disc.device = "gpu", disc.fusion.name = "main_kLoop_reshape__37_1_3"} : () -> ()
    
    opened by linearhit 10
  • [Debug]How to verify compiled cluster results against original tf subgraph?

    Hi there! I am now trying to use BladeDISC to optimize my TensorFlow model. I have successfully gotten my model compiled by DISC; however, when checking the optimized model output against the original TF version, I found a big difference. So I think there might be something wrong with some of the DISC-compiled clusters, which I am trying to figure out. After spending some time diving into the code, I found a flag called tao_launch_enable_check. According to the code comments, when this flag is ON, it should check the compiled cluster result against the original TF subgraph. But I could not find the implementation for this flag. So I am wondering whether this logic has been implemented in the codebase. If not, could you please give some advice on implementing it? I would be glad to contribute to this feature if it is within my ability. Thanks a lot! @linearhit
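
    Pending such a built-in check, a rough manual workaround is to run the original TF graph and the DISC-optimized one on identical inputs and compare the outputs numerically. The helper below is only a hypothetical sketch (the session and tensor handles are whatever is already used to run both versions), not the tao_launch_enable_check implementation:

    import numpy as np

    def compare_outputs(ref_sess, disc_sess, output_tensors, feed_dict, rtol=1e-3, atol=1e-3):
        # run identical inputs through the original session and the DISC-optimized one,
        # then compare every output tensor numerically
        ref_out = ref_sess.run(output_tensors, feed_dict=feed_dict)
        disc_out = disc_sess.run(output_tensors, feed_dict=feed_dict)
        for ref, opt in zip(ref_out, disc_out):
            np.testing.assert_allclose(ref, opt, rtol=rtol, atol=atol)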

    opened by EddieBurning 6
  • Support gemm pre-packing via onednn/acl on aarch64

    ACL NEGEMM itself supports weight pre-packing by default. Without this PR, we create a new dnnl::matmul primitive for each matmul call, which in turn creates a new NEGEMM object inside the primitive on aarch64. We cannot reuse the pre-packed weight from a previous matmul call, even for the same matmul configuration, since the NEGEMM object is destroyed after each matmul call. This PR tries to cache the matmul primitive across matmul calls so that the underlying pre-packed weight can be reused for compatible matmul calls.

    opened by wyzero 5
  • [WIP] [to split into smaller PRs] A large set of GPU codegen optimization.

    1. Loop unroll and interleave.
    2. Row-reduce ops interleave in kStitch fusion.
    3. Schedule selection optimization that adapts to the target GPU architecture.
    4. Flags to enable CUDA fast-math.
    5. Other detailed refinements of the codegen schedule for kStitch fusion.
    enhancement 
    opened by JamesTheZ 4
  • Unify TorchBlade and TorchDISC

    The TorchDISC project for training is under heavy development. It will reuse many of the converter passes that already exist in TorchBlade. Unifying the TorchBlade and TorchDISC code and design will benefit the long-term maintenance of the project. This issue is created to track the related items.

    training TorchDisc 
    opened by fortianyou 4
  • [bugfix] make Mnist training converge to baseline

    fixed #215 fixed #212

    This PR makes MNIST training converge to the baseline with TorchDisc, and also adds a unit test to make sure the baseline model (MNIST) remains correct in the future.

    opened by Yancey1989 4
  • [Tensorflow] [to #195] package tensorflow wheel and deploy a runtime docker

    @qiuxiafei @Yancey1989 @linearhit I think we need to discuss whether we should maintain a separate runtime docker image for tensorflow-blade or install tensorflow-blade into our current TF runtime images. Since the CI action for tensorflow-blade currently does not build DISC for TensorFlow, maybe a separate docker image is good for now.

    opened by Orion34-lanbo 4
  • [PoC] TorchDisc: accelerating PyTorch training via LTC + BladeDISC

    Background

    BladeDISC is an end-to-end compiler that supports dynamic shape features, and dynamic shapes are widely used in training scenarios. This issue describes how to improve PyTorch training performance with DISC based on the LazyTensorCore (LTC) mechanism.

    feature branch: https://github.com/alibaba/BladeDISC/tree/features/torch_disc_devel

    Design Overview

    1. According to LTC, a MARK API should be called manually at the end of each iteration to sync and execute a Graph on a physical device.
    2. Lowering to TorchScript: LTC uses TorchScript as the backend engine (ref TSBackendImpl); we can use it to lower Lazy IR to TorchScript IR.
    3. Cluster DISC SubGraph: cluster the DISC-supported nodes into sub-graphs.
    4. DISC Compilation Stage: a. MHLO conversion, DISC uses MLIR::mhlo as the front-end, so we should convert TorchScript IR to MHLO before compilation; b. compiling to an executable program, call the DISC entry function to compile the MHLO IR into an executable file (a dynamic library file); c. DISC execution, call DISC RAL to execute the executable program with the input Tensors.
    5. TorchScript Execution: finally call torch::jit::GraphExecutor to execute the TorchScript IR and return the result Tensors. (A rough user-side sketch of this flow is given below.)
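
    As a rough user-side sketch of one training iteration under this flow (the step_mark name follows the API mentioned in the next section; the toy module, the "lazy" device handling, and the torch_disc import name are illustrative assumptions, not the final interface):

    import torch
    import torch.nn as nn
    import torch_disc  # hypothetical Python wrapper over the _torch_disc.so library described below

    model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).to("lazy")
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(10):
        data = torch.randn(64, 784).to("lazy")
        target = torch.randint(0, 10, (64,)).to("lazy")
        optimizer.zero_grad()
        loss = loss_fn(model(data), target)
        loss.backward()
        optimizer.step()
        # end-of-iteration sync: lower the accumulated lazy IR to TorchScript,
        # cluster and compile DISC sub-graphs, then execute the graph (steps 1-5 above)
        torch_disc.step_mark()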

    Implement and TODO Actions

    To implement the above features, we should build a Pybind library _torch_disc.so to expose a step_mark API along with some important C++ functions. The TODO actions are as follows:

    • [x] setup a building environment to build _torch_disc.so with Torch LTC, Mhlo Builder, and DISC. #158
    • [x] cluster DISC nodes into sub-graph (maybe implement cluster algorithms with a fake function). #173
    • [x] compile the DISC sub-graph and register the DISC engine #188
    • [x] demonstrate TorchDISC with MNIST training. #207 #230

    Reference

    1. PyTorch LazyTensor branch: https://github.com/pytorch/pytorch/tree/lazy_tensor_staging/lazy_tensor_core
    2. PyTorch/XLA backend example: https://github.com/pytorch/xla/tree/asuhan/xla_ltc_plugin
    feature training 
    opened by Yancey1989 4
  • Is there a version of TF15 that can be used out of box?

    Hi there! After reading the paper and running the demo you provided, I am excited by the speedup DISC has achieved compared to other frameworks on various dynamic deep learning workloads. Great work, I must say! The easy-to-use API and impressive performance improvements make us want to give it a try. But as far as I know, the demo provided is based on TF 2.4, while our production environment is TF 1.15. I am currently trying to adapt TF 1.15 to the DISC compiler and have encountered lots of compilation issues due to incompatible build tools (TF 2.4 uses Bazel 4.0 while TF 1.15 uses Bazel 0.24.1). According to the docs, TF 1.15 is also supported, so my question is: is there an out-of-the-box TF 1.15 version that we can use directly? If the answer is no, is there any documentation on the adaptation that needs to be made?
    Any advice would be much appreciated!

    opened by EddieBurning 4
  • Legacy install failure

    Hi, when trying to build from source I encounter the following issue. This is in the step when running cd pytorch_blade && bash ./ci_build/build_pytorch_blade.sh:

    Building wheels for collected packages: onnx Building wheel for onnx (setup.py) ... error error: subprocess-exited-with-error

    × python setup.py bdist_wheel did not run successfully. │ exit code: 1 ╰─> [229 lines of output] fatal: not a git repository (or any of the parent directories): .git /usr/lib/python3/dist-packages/pkg_resources/init.py:116: PkgResourcesDeprecationWarning: 0.1.43ubuntu1 is an invalid version and will not be supported in a future release warnings.warn( /usr/lib/python3/dist-packages/pkg_resources/init.py:116: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release warnings.warn( /usr/lib/python3/dist-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer. warnings.warn( running bdist_wheel running build running build_py running create_version running cmake_build Using cmake args: ['/home/metcon/.local/bin/cmake', '-DPYTHON_INCLUDE_DIR=/usr/include/python3.10', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DBUILD_ONNX_PYTHON=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DONNX_NAMESPACE=onnx', '-DPY_EXT_SUFFIX=.cpython-310-x86_64-linux-gnu.so', '-DCMAKE_BUILD_TYPE=Release', '-DONNX_ML=1', '/tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273'] -- The C compiler identification is GNU 11.2.0 -- The CXX compiler identification is GNU 11.2.0 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/cc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Found PythonInterp: /usr/bin/python3 (found version "3.10.4") -- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.10.so (found version "3.10.4") -- Found Protobuf: /usr/lib/x86_64-linux-gnu/libprotobuf.a (found version "3.12.4") Generated: /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-ml.proto Generated: /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-operators-ml.proto Generated: /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-data.proto CMake Warning at CMakeLists.txt:451 (find_package): By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "pybind11", but CMake did not find one.

        Could not find a package configuration file provided by "pybind11"
        (requested version 2.2) with any of the following names:
      
          pybind11Config.cmake
          pybind11-config.cmake
      
        Add the installation prefix of "pybind11" to CMAKE_PREFIX_PATH or set
        "pybind11_DIR" to a directory containing one of the above files.  If
        "pybind11" provides a separate development package or SDK, be sure it has
        been installed.
      
      
      --
      -- ******** Summary ********
      --   CMake version             : 3.22.4
      --   CMake command             : /home/metcon/.local/lib/python3.10/site-packages/cmake/data/bin/cmake
      --   System                    : Linux
      --   C++ compiler              : /usr/bin/c++
      --   C++ compiler version      : 11.2.0
      --   CXX flags                 :  -Wnon-virtual-dtor
      --   Build type                : Release
      --   Compile definitions       : __STDC_FORMAT_MACROS
      --   CMAKE_PREFIX_PATH         :
      --   CMAKE_INSTALL_PREFIX      : /usr/local
      --   CMAKE_MODULE_PATH         :
      --
      --   ONNX version              : 1.11.0
      --   ONNX NAMESPACE            : onnx
      --   ONNX_USE_LITE_PROTO       : OFF
      --   USE_PROTOBUF_SHARED_LIBS  : OFF
      --   Protobuf_USE_STATIC_LIBS  : ON
      --   ONNX_DISABLE_EXCEPTIONS   : OFF
      --   ONNX_WERROR               : OFF
      --   ONNX_BUILD_TESTS          : OFF
      --   ONNX_BUILD_BENCHMARKS     : OFF
      --   ONNXIFI_DUMMY_BACKEND     : OFF
      --   ONNXIFI_ENABLE_EXT        : OFF
      --
      --   Protobuf compiler         : /usr/bin/protoc
      --   Protobuf includes         : /usr/include
      --   Protobuf libraries        : /usr/lib/x86_64-linux-gnu/libprotobuf.a
      --   BUILD_ONNX_PYTHON         : ON
      --     Python version        :
      --     Python executable     : /usr/bin/python3
      --     Python includes       : /usr/include/python3.10
      -- Configuring done
      -- Generating done
      -- Build files have been written to: /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build
      [  1%] Running gen_proto.py on onnx/onnx.in.proto
      [  2%] Building C object CMakeFiles/onnxifi_dummy.dir/onnx/onnxifi_dummy.c.o
      [  4%] Building C object CMakeFiles/onnxifi_loader.dir/onnx/onnxifi_loader.c.o
      Processing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/onnx.in.proto
      Writing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-ml.proto
      Writing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-ml.proto3
      generating /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx_pb.py
      [  5%] Running C++ protocol buffer compiler on /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-ml.proto
      [  7%] Linking C static library libonnxifi_loader.a
      [  8%] Linking C shared library libonnxifi_dummy.so
      [  8%] Built target onnxifi_loader
      [  9%] Building C object CMakeFiles/onnxifi_wrapper.dir/onnx/onnxifi_wrapper.c.o
      [  9%] Built target onnxifi_dummy
      [ 11%] Linking C shared module libonnxifi.so
      Writing mypy to onnx/onnx_ml_pb2.pyi
      [ 11%] Built target onnxifi_wrapper
      [ 11%] Built target gen_onnx_proto
      [ 12%] Running gen_proto.py on onnx/onnx-operators.in.proto
      [ 14%] Running gen_proto.py on onnx/onnx-data.in.proto
      Processing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/onnx-data.in.proto
      Writing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-data.proto
      Writing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-data.proto3
      generating /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx_data_pb.py
      [ 15%] Running C++ protocol buffer compiler on /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-data.proto
      Processing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/onnx-operators.in.proto
      Writing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-operators-ml.proto
      Writing /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-operators-ml.proto3
      generating /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx_operators_pb.py
      [ 16%] Running C++ protocol buffer compiler on /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-operators-ml.proto
      Writing mypy to onnx/onnx_data_pb2.pyi
      Writing mypy to onnx/onnx_operators_ml_pb2.pyi
      [ 16%] Built target gen_onnx_data_proto
      [ 16%] Built target gen_onnx_operators_proto
      [ 18%] Building CXX object CMakeFiles/onnx_proto.dir/onnx/onnx-ml.pb.cc.o
      [ 19%] Building CXX object CMakeFiles/onnx_proto.dir/onnx/onnx-operators-ml.pb.cc.o
      [ 21%] Building CXX object CMakeFiles/onnx_proto.dir/onnx/onnx-data.pb.cc.o
      [ 22%] Linking CXX static library libonnx_proto.a
      [ 30%] Built target onnx_proto
      [ 32%] Building CXX object CMakeFiles/onnx.dir/onnx/checker.cc.o
      [ 33%] Building CXX object CMakeFiles/onnx.dir/onnx/common/assertions.cc.o
      [ 36%] Building CXX object CMakeFiles/onnx.dir/onnx/common/path.cc.o
      [ 39%] Building CXX object CMakeFiles/onnx.dir/onnx/common/model_helpers.cc.o
      [ 36%] Building CXX object CMakeFiles/onnx.dir/onnx/common/interned_strings.cc.o
      [ 39%] Building CXX object CMakeFiles/onnx.dir/onnx/common/ir_pb_converter.cc.o
      [ 42%] Building CXX object CMakeFiles/onnx.dir/onnx/common/status.cc.o
      [ 42%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/attr_proto_util.cc.o
      [ 43%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/controlflow/old.cc.o
      [ 45%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/function.cc.o
      [ 46%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/generator/defs.cc.o
      [ 47%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/controlflow/defs.cc.o
      [ 49%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/data_type_utils.cc.o
      [ 50%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/generator/old.cc.o
      [ 52%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/logical/defs.cc.o
      [ 53%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/logical/old.cc.o
      [ 54%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/math/defs.cc.o
      [ 56%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/math/old.cc.o
      [ 57%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/nn/defs.cc.o
      [ 59%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/nn/old.cc.o
      [ 60%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/object_detection/defs.cc.o
      [ 61%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/object_detection/old.cc.o
      [ 63%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/optional/defs.cc.o
      [ 64%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/parser.cc.o
      [ 66%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/printer.cc.o
      [ 67%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/quantization/defs.cc.o
      [ 69%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/quantization/old.cc.o
      [ 70%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/reduction/defs.cc.o
      In file included from /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/common/ir_pb_converter.h:10,
                       from /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/common/ir_pb_converter.cc:8:
      In constructor ‘onnx::Dimension::Dimension(onnx::Dimension&&)’,
          inlined from ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = onnx::Dimension; _Args = {onnx::Dimension}; _Tp = onnx::Dimension]’ at /usr/include/c++/11/ext/new_allocator.h:162:4,
          inlined from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*, _Args&& ...) [with _Up = onnx::Dimension; _Args = {onnx::Dimension}; _Tp = onnx::Dimension]’ at /usr/include/c++/11/bits/alloc_traits.h:516:17,
          inlined from ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {onnx::Dimension}; _Tp = onnx::Dimension; _Alloc = std::allocator<onnx::Dimension>]’ at /usr/include/c++/11/bits/vector.tcc:115:30,
          inlined from ‘void std::vector<_Tp, _Alloc>::push_back(std::vector<_Tp, _Alloc>::value_type&&) [with _Tp = onnx::Dimension; _Alloc = std::allocator<onnx::Dimension>]’ at /usr/include/c++/11/bits/stl_vector.h:1204:21,
          inlined from ‘std::vector<onnx::Dimension> onnx::tensorShapeProtoToDimensions(const onnx::TensorShapeProto&)’ at /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/common/ir_pb_converter.cc:201:21:
      /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/common/ir.h:74:8: warning: ‘<unnamed>.onnx::Dimension::dim’ may be used uninitialized [-Wmaybe-uninitialized]
         74 | struct Dimension final {
            |        ^~~~~~~~~
      /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/common/ir_pb_converter.cc: In function ‘std::vector<onnx::Dimension> onnx::tensorShapeProtoToDimensions(const onnx::TensorShapeProto&)’:
      /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/onnx/common/ir_pb_converter.cc:201:32: note: ‘<anonymous>’ declared here
        201 |       dims.push_back(Dimension());
            |                                ^
      [ 71%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/reduction/old.cc.o
      [ 73%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/rnn/defs.cc.o
      [ 74%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/rnn/old.cc.o
      [ 76%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/schema.cc.o
      [ 77%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/sequence/defs.cc.o
      [ 78%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/shape_inference.cc.o
      [ 80%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/tensor/defs.cc.o
      [ 81%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/tensor/old.cc.o
      [ 83%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/tensor/utils.cc.o
      [ 84%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/tensor_proto_util.cc.o
      [ 85%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/tensor_util.cc.o
      [ 87%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/traditionalml/defs.cc.o
      [ 88%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/traditionalml/old.cc.o
      [ 90%] Building CXX object CMakeFiles/onnx.dir/onnx/defs/training/defs.cc.o
      [ 91%] Building CXX object CMakeFiles/onnx.dir/onnx/onnxifi_utils.cc.o
      [ 92%] Building CXX object CMakeFiles/onnx.dir/onnx/shape_inference/implementation.cc.o
      [ 94%] Building CXX object CMakeFiles/onnx.dir/onnx/version_converter/convert.cc.o
      [ 95%] Building CXX object CMakeFiles/onnx.dir/onnx/version_converter/helper.cc.o
      [ 97%] Linking CXX static library libonnx.a
      [ 97%] Built target onnx
      [ 98%] Building CXX object CMakeFiles/onnx_cpp2py_export.dir/onnx/cpp2py_export.cc.o
      [100%] Linking CXX shared module onnx_cpp2py_export.cpython-310-x86_64-linux-gnu.so
      /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libprotobuf.a(arena.o): relocation R_X86_64_TPOFF32 against hidden symbol `_ZN6google8protobuf8internal9ArenaImpl13thread_cache_E' can not be used when making a shared object
      /usr/bin/ld: failed to set dynamic section sizes: bad value
      collect2: error: ld returned 1 exit status
      gmake[2]: *** [CMakeFiles/onnx_cpp2py_export.dir/build.make:101: onnx_cpp2py_export.cpython-310-x86_64-linux-gnu.so] Error 1
      gmake[1]: *** [CMakeFiles/Makefile2:229: CMakeFiles/onnx_cpp2py_export.dir/all] Error 2
      gmake: *** [Makefile:136: all] Error 2
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/setup.py", line 336, in <module>
          setuptools.setup(
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
          self.run_command('build')
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/setup.py", line 232, in run
          self.run_command('cmake_build')
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/setup.py", line 226, in run
          subprocess.check_call(build_args)
        File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/home/metcon/.local/bin/cmake', '--build', '.', '--', '-j', '16']' returned non-zero exit status 2.
      [end of output]
    

    note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed building wheel for onnx Running setup.py clean for onnx Failed to build onnx Installing collected packages: onnx Running setup.py install for onnx ... error error: subprocess-exited-with-error

    × Running setup.py install for onnx did not run successfully. │ exit code: 1 ╰─> [130 lines of output] fatal: not a git repository (or any of the parent directories): .git /usr/lib/python3/dist-packages/pkg_resources/init.py:116: PkgResourcesDeprecationWarning: 0.1.43ubuntu1 is an invalid version and will not be supported in a future release warnings.warn( /usr/lib/python3/dist-packages/pkg_resources/init.py:116: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release warnings.warn( /usr/lib/python3/dist-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer. warnings.warn( running install /usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. warnings.warn( running build running build_py running create_version running cmake_build Using cmake args: ['/home/metcon/.local/bin/cmake', '-DPYTHON_INCLUDE_DIR=/usr/include/python3.10', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DBUILD_ONNX_PYTHON=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DONNX_NAMESPACE=onnx', '-DPY_EXT_SUFFIX=.cpython-310-x86_64-linux-gnu.so', '-DCMAKE_BUILD_TYPE=Release', '-DONNX_ML=1', '/tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273'] Generated: /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-ml.proto Generated: /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-operators-ml.proto Generated: /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build/onnx/onnx-data.proto CMake Warning at CMakeLists.txt:451 (find_package): By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "pybind11", but CMake did not find one.

        Could not find a package configuration file provided by "pybind11"
        (requested version 2.2) with any of the following names:
      
          pybind11Config.cmake
          pybind11-config.cmake
      
        Add the installation prefix of "pybind11" to CMAKE_PREFIX_PATH or set
        "pybind11_DIR" to a directory containing one of the above files.  If
        "pybind11" provides a separate development package or SDK, be sure it has
        been installed.
      
      
      --
      -- ******** Summary ********
      --   CMake version             : 3.22.4
      --   CMake command             : /home/metcon/.local/lib/python3.10/site-packages/cmake/data/bin/cmake
      --   System                    : Linux
      --   C++ compiler              : /usr/bin/c++
      --   C++ compiler version      : 11.2.0
      --   CXX flags                 :  -Wnon-virtual-dtor
      --   Build type                : Release
      --   Compile definitions       : __STDC_FORMAT_MACROS
      --   CMAKE_PREFIX_PATH         :
      --   CMAKE_INSTALL_PREFIX      : /usr/local
      --   CMAKE_MODULE_PATH         :
      --
      --   ONNX version              : 1.11.0
      --   ONNX NAMESPACE            : onnx
      --   ONNX_USE_LITE_PROTO       : OFF
      --   USE_PROTOBUF_SHARED_LIBS  : OFF
      --   Protobuf_USE_STATIC_LIBS  : ON
      --   ONNX_DISABLE_EXCEPTIONS   : OFF
      --   ONNX_WERROR               : OFF
      --   ONNX_BUILD_TESTS          : OFF
      --   ONNX_BUILD_BENCHMARKS     : OFF
      --   ONNXIFI_DUMMY_BACKEND     : OFF
      --   ONNXIFI_ENABLE_EXT        : OFF
      --
      --   Protobuf compiler         : /usr/bin/protoc
      --   Protobuf includes         : /usr/include
      --   Protobuf libraries        : /usr/lib/x86_64-linux-gnu/libprotobuf.a
      --   BUILD_ONNX_PYTHON         : ON
      --     Python version        :
      --     Python executable     : /usr/bin/python3
      --     Python includes       : /usr/include/python3.10
      -- Configuring done
      -- Generating done
      -- Build files have been written to: /tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/.setuptools-cmake-build
      [  2%] Built target gen_onnx_proto
      Consolidate compiler generated dependencies of target onnxifi_loader
      Consolidate compiler generated dependencies of target onnxifi_dummy
      [  5%] Built target onnxifi_loader
      [  9%] Built target gen_onnx_operators_proto
      [ 11%] Built target onnxifi_dummy
      [ 14%] Built target gen_onnx_data_proto
      Consolidate compiler generated dependencies of target onnxifi_wrapper
      [ 16%] Built target onnxifi_wrapper
      Consolidate compiler generated dependencies of target onnx_proto
      [ 30%] Built target onnx_proto
      Consolidate compiler generated dependencies of target onnx
      [ 97%] Built target onnx
      Consolidate compiler generated dependencies of target onnx_cpp2py_export
      [ 98%] Linking CXX shared module onnx_cpp2py_export.cpython-310-x86_64-linux-gnu.so
      /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libprotobuf.a(arena.o): relocation R_X86_64_TPOFF32 against hidden symbol `_ZN6google8protobuf8internal9ArenaImpl13thread_cache_E' can not be used when making a shared object
      /usr/bin/ld: failed to set dynamic section sizes: bad value
      collect2: error: ld returned 1 exit status
      gmake[2]: *** [CMakeFiles/onnx_cpp2py_export.dir/build.make:101: onnx_cpp2py_export.cpython-310-x86_64-linux-gnu.so] Error 1
      gmake[1]: *** [CMakeFiles/Makefile2:229: CMakeFiles/onnx_cpp2py_export.dir/all] Error 2
      gmake: *** [Makefile:136: all] Error 2
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/setup.py", line 336, in <module>
          setuptools.setup(
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 68, in run
          return orig.install.run(self)
        File "/usr/lib/python3.10/distutils/command/install.py", line 619, in run
          self.run_command('build')
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/setup.py", line 232, in run
          self.run_command('cmake_build')
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-7ierxixb/onnx_92eaee845b1b4effb635515dd28b5273/setup.py", line 226, in run
          subprocess.check_call(build_args)
        File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/home/metcon/.local/bin/cmake', '--build', '.', '--', '-j', '16']' returned non-zero exit status 2.
      [end of output]
    

    note: This error originates from a subprocess, and is likely not a problem with pip. error: legacy-install-failure

    opened by justlike-prog 3
  • [code refactor] merge torch-disc and torch-blade CI job

    This PR removed the independent torch-disc CI and added a new job named TORCH-LATEST, which tests against the latest PyTorch codebase; we may update the PyTorch commit frequently.

    opened by Yancey1989 0
  • [TorchMlir] PyTorch version compatible

    TorchMlir follows the latest PyTorch version, but BladeDISC CI runs some older versions, e.g. 1.7.1, 1.8.1, and 1.10; the ATen symbols may be incompatible between the latest PyTorch and the older versions, e.g. https://github.com/pytorch/pytorch/blob/v1.12.0-rc6/aten/src/ATen/native/native_functions.yaml#L3989 and https://github.com/pytorch/pytorch/blob/v1.8.1/aten/src/ATen/native/native_functions.yaml#L3273

    As discussed with @fortianyou offline, the TorchMlir converter should target the latest PyTorch version at the current stage, and the version compatibility issue will be fixed later.

    need discussion 
    opened by Yancey1989 0
Releases (v0.2.0)
  • v0.2.0 (May 11, 2022)

    Release 0.2.0

    Performance Optimization

    GPU stitch fusion

    Make use of GPU shared memory to fuse reduce operators with their consumers into one kernel. This helps to accommodate complex memory-intensive computations (e.g., LayerNorm, Softmax) in a single kernel, reducing off-chip memory traffic and the overhead of kernel scheduling and launching. It implements part of the functionality described in the AStitch paper. It is currently being refactored to enhance robustness and is therefore not enabled by default; users of BladeDISC can enable it by setting the environment variable DISC_ENABLE_STITCH=true.
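
    For example, one way to opt in from Python is to set the variable before the model is optimized (exporting DISC_ENABLE_STITCH=true in the shell works as well; setting it before the Blade modules are imported is an assumption made here for safety):

    import os

    # opt in to GPU stitch fusion (off by default while the refactoring is in progress)
    os.environ["DISC_ENABLE_STITCH"] = "true"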

    Note that we had already released the CPU stitch optimization when we open-sourced the BladeDISC project; it is enabled by default. Refer to the materials for more information about the CPU stitch technique.

    GEMM merging

    Support two types of GEMM merging optimization: one merges two GEMMs sharing the same operand into a single GEMM, and the other merges two GEMMs with the same shapes into a batched GEMM. GEMM merging helps to increase hardware utilization and to reduce kernel launch overhead.
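
    As a purely numerical sketch of the equivalence that the first kind of merging relies on (two GEMMs sharing the left-hand operand become one GEMM over concatenated weights, split afterwards; the shapes below are arbitrary and this is not the compiler pass itself):

    import numpy as np

    x = np.random.rand(32, 64).astype(np.float32)    # operand shared by both GEMMs
    w0 = np.random.rand(64, 128).astype(np.float32)
    w1 = np.random.rand(64, 256).astype(np.float32)

    # two separate GEMMs
    y0, y1 = x @ w0, x @ w1

    # one merged GEMM over the concatenated weights, then split the result
    merged = x @ np.concatenate([w0, w1], axis=1)
    np.testing.assert_allclose(merged[:, :128], y0, rtol=1e-4, atol=1e-5)
    np.testing.assert_allclose(merged[:, 128:], y1, rtol=1e-4, atol=1e-5)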

    CPU GEMM/Convolution weight pre-packing optimization

    Support weight pre-packing optimization for convolution (calling onednn library) and GEMM (calling mkl/onednn/acl libraries) operations.

    Convolution layout optimization and transpose elimination

    Support transforming the layout of the convolution operator into the friendliest format on the specific device (i.e., either CPU or GPU). Most of the introduced transpose operators can then be eliminated in a subsequent transpose-simplifier pass.

    Other optimizations

    • Optimize the schedule selection strategy for the reduce operator on GPU to enhance thread-level parallelism.
    • Algebraic simplification for operators like power.
    • Support fusing splat constant operators with their consumers, reducing memory access overhead. Refer to the issue.

    Function Enhancement

    CPU end-to-end optimization

    Support end-to-end optimization for X86 and AArch64 CPUs.

    TorchBlade/TensorFlowBlade clustering and optimizing with TensorRT

    According to the operators supported by TensorRT, cluster sub-graphs and apply TensorRT optimization for both TensorFlow and PyTorch models.

    Accelerating PyTorch Training

    Release a PoC version for accelerating PyTorch training via DISC + Lazy Tensor Core; refer to the related issue and design doc.

    Shape analysis and simplifier enhancement

    Enhance the shape equality analysis according to dimension values. Add the function to analyze the collapse and expand relationships between dimensions, which helps to identify the dimension mapping between the input and output values of the reshape operator. This is the basic functionality needed to support GPU stitch fusion.

    Codegen support for int8 datatype

    Support int8 datatype for the code generation of memory-intensive operators (e.g., element-wise, reduce operators).

    Toolchain Support and Process Optimization

    Replay tool

    Support dumping clusters and the corresponding input data, based on which developers can replay the execution. This is helpful for debugging and tuning. Refer to the issue.

    CI optimization

    Enhance the CI process of the BladeDISC repo, which helps people from the community contribute to BladeDISC more conveniently and efficiently.

    TorchBlade bazel build

    Migrate TorchBlade's compilation toolchain from the original CMake to bazel, enhancing maintainability.

    Other

    Example preparation

    Prepare a set of commonly used models as examples for BladeDISC, and compare the performance of BladeDISC with TensorRT, XLA, and ONNX Runtime (ORT) on these examples.

    Community TF rebase

    Rebase BladeDISC's TensorFlow codebase onto the newest community code.

    Code maintenance

    Continuous bug fixing and code refactoring.

    Source code(tar.gz)
    Source code(zip)