ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Overview

ONNX Runtime is a cross-platform inference and training machine-learning accelerator. It is compatible with deep learning frameworks such as PyTorch and TensorFlow/Keras, as well as classical machine-learning libraries such as scikit-learn.

ONNX Runtime uses the portable ONNX computation graph format, backed by execution providers optimized for operating systems, drivers and hardware.

Common use cases for ONNX Runtime:

  • Improve inference performance for a wide variety of ML models
  • Reduce time and cost of training large models
  • Train in Python but deploy into a C#/C++/Java app
  • Run with optimized performance on different hardware and operating systems
  • Support models created in several different frameworks
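
As a concrete sketch of the "train in Python, deploy into a C#/C++/Java app" flow above (hedged: the model and file name are placeholders, not part of this README), a PyTorch model is exported to the portable ONNX format, and the resulting file can then be loaded from any ONNX Runtime language binding:

    import torch

    model = torch.nn.Linear(4, 2)        # stand-in for a trained model
    dummy_input = torch.randn(1, 4)      # example input that fixes graph shapes
    torch.onnx.export(model, dummy_input, "model.onnx",
                      input_names=["x"], output_names=["y"])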

ONNX Runtime's inference APIs have been stable and production-ready since the 1.0 release in October 2019, and can enable faster customer experiences and lower costs.
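
For instance, a minimal Python inference session looks like the following (a hedged sketch; "model.onnx" and the input shape are placeholder assumptions):

    import numpy as np
    import onnxruntime as ort

    print(ort.get_available_providers())  # execution providers in this build
    session = ort.InferenceSession("model.onnx",
                                   providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    x = np.random.rand(1, 4).astype(np.float32)
    outputs = session.run(None, {input_name: x})  # None fetches all outputs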

The ONNX Runtime training feature was introduced in May 2020 as a preview. It accelerates PyTorch training of transformer models on multi-node NVIDIA GPUs. Additional updates for this feature are coming soon.
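
In the preview, the PyTorch integration is typically a one-line wrap of an existing module. The sketch below is hedged: the torch_ort package name and ORTModule API come from the training preview and may differ across releases.

    import torch
    from torch_ort import ORTModule  # preview package; API may change

    model = torch.nn.Linear(768, 768)        # stand-in for a transformer block
    model = ORTModule(model)                 # forward/backward run through ONNX Runtime
    loss = model(torch.randn(4, 768)).sum()
    loss.backward()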

Get Started

http://onnxruntime.ai/

Build Pipeline Status

[Build status badges: Windows, Linux, Mac, Android, and iOS pipelines across CPU, GPU, and EP configurations]

Data/Telemetry

This project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub Issue.

For general discussion or questions, please use GitHub Discussions.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

License

This project is licensed under the MIT License.

Comments
  • Openvino ep 2021.4 v3.3

    Openvino ep 2021.4 v3.3

    Changes enabled in the OpenVINO EP for IO Buffer Optimization; also enables the Auto Plugin feature.

    Motivation and Context

    • The change was required to enable IO Buffer Optimization.
    • The change was required to enable the Auto Plugin and to fix the Multi and Hetero flows.
    • The change adds an ONNX Runtime API to get the device location for an ORT Value tensor.
    opened by sfatimar 79
  • Java API for onnxruntime

    Java API for onnxruntime

    Description: This pull request provides a Java 8 API using JNI. It has unit tests ported from the v0.5.0 release of the C# API; I'll work on porting the new tests from the master branch over the next few weeks. I assume there will be some design & naming discussion on this PR, so we can have that while I work on the unit tests.

    Currently it builds using a separate Gradle project which I've tested on Mac & Linux. The build process involves running gradle clean build -x test; gradle build, as the combination of a JNI and Java project in Gradle 5 isn't properly supported. I could do with some help integrating it into the CMake build system, as I've not used CMake much before. Integrating it into CMake will make it simpler to set the appropriate provider compilation flags and fix the oddities in the build (as CMake has all the information necessary).

    opened by Craigacp 75
  • Support CUDA Graph

    Support CUDA Graph

    Description

    This PR adds support for CUDA Graphs. This feature can significantly reduce the CPU overhead of calling CUDA APIs by submitting the entire graph to the GPU with a single call to cudaGraphLaunch.

    Motivation and Context

    • Why is this change required? What problem does it solve? This feature is very helpful for reducing model latency, especially for online inference, where this CPU overhead is a bottleneck. For example, it reduces the 95th-percentile latency of a transformer-based online inference model (with 148 million parameters) from 4.3 ms to 2.1 ms.
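
    As a rough illustration of how a CUDA Graph-enabled session is driven from Python (a hedged sketch: the model path is a placeholder, and the enable_cuda_graph option name is taken from later ONNX Runtime releases rather than confirmed from this PR; capture also requires static shapes and I/O binding so device addresses stay fixed between runs):

    import numpy as np
    import onnxruntime as ort

    # Ask the CUDA EP to capture kernel launches once and replay them later.
    sess = ort.InferenceSession(
        "model.onnx",
        providers=[("CUDAExecutionProvider", {"enable_cuda_graph": "1"})])

    io = sess.io_binding()                   # fixed device buffers
    x = ort.OrtValue.ortvalue_from_numpy(
        np.zeros((1, 128), dtype=np.float32), "cuda", 0)
    io.bind_ortvalue_input(sess.get_inputs()[0].name, x)
    io.bind_output(sess.get_outputs()[0].name, "cuda")
    sess.run_with_iobinding(io)              # first run captures, later runs replay
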
    opened by feihugis 72
  • Resolve Optim Params Issues

    Resolve Optim Params Issues

    • Includes a test of Optimizer Parameter Groups for the ONNX BERT Model (3 variations)
    • Resolves the issue of not passing default hyperparameters for parameters not in a group
    • Resolves the issue of sending 'lambda_coef' instead of 'lambda' to the backend
    • Resolves the issue of sending lr to the backend as a hyperparameter
    opened by rayankrish 68
  • Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli

    Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli

    Description: Extends Gist memory compression to support additional compression formats and the new priority execution order, among other upgrades:

    • New Feature: GistPack1 compression. It compresses from float32/bool to 1 bit. It is used for lossless compression for dropout and relu nodes.
    • New Feature: GistPack8 compression. It compresses from 32 bits/16 bits to 8 bits. It is used for lossy compression for any operator.
    • New Feature: GistPackMsfp15 compression. It compresses 8 (or tile-size) values, each 32 bits wide, to 8 (or tile-size) values, each 7 bits wide (sign and mantissa), plus a single shared 8-bit exponent. It is used for lossy compression for any operator.
    • New Feature: GistPack16 compression. It compresses from 32 bits to 16 bits. It is used for lossy compression for any operator.
    • We also upgraded the Gist rule to support different operators. We created a generic Gist rule that only needs a pattern map, whose key is the target operator and whose value is the destination operator (e.g. PATTERN_MAP[Softmax] = {"SoftmaxGrad"}). The rule is operator-agnostic, which makes Gist robust for supporting new operators in the future.
    • New test for Priority execution order for nested compression.
    • Gist upgrade to support priority execution order to trigger encoder (compression) and decoder (decompression) accordingly.
    • Gist CLI: --use_gist, --op <which operator is being targeted, e.g. Softmax is op 1> --gist_compr <GistPack1|GistPack8|GistPack16|GistPackMsfp15>
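
    To make the GistPack1 format above concrete, here is a hedged numpy sketch of the idea (an illustration only, not ORT's kernel): relu's backward pass consumes just a 0/1 mask of the stashed activation, so a float32 stash can be packed losslessly into 1 bit per element.

    import numpy as np

    x = np.random.randn(1024).astype(np.float32)   # stashed relu input
    mask = x > 0                                   # all that relu-grad needs
    packed = np.packbits(mask)                     # 1024 bits -> 128 bytes
    restored = np.unpackbits(packed).astype(bool)[:x.size]
    assert np.array_equal(mask, restored)          # lossless round trip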

    Motivation and Context

    • Why is this change required? What problem does it solve? It fixes and improves the Gist optimizer rule by changing Gist operators to handle one input and one output, without the need for an early encoder input or a late decoder output. It also adds new compression formats (Pack1, Pack8).
    training 
    opened by fninaparavecino 61
  • Amdmigraphx fix build error

    Amdmigraphx fix build error

    Description: Fixes build errors related to the EP API changes.

    Motivation and Context

    1. The ORT EP infrastructure changed to use a shared library, and the EP APIs changed; AMD MIGraphX needs corresponding changes to work as an EP.
    2. Added a few operators that AMD MIGraphX implemented recently.
    • Why is this change required? What problem does it solve? See the explanation above.

    • If it fixes an open issue, please link to the issue here. No

    opened by scxiao 60
  • Python MacOS arm64 release binaries

    Python MacOS arm64 release binaries

    Describe the bug

    ONNX Runtime does not install using pip on M1.

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 11.2.1
    • ONNX Runtime installed from (source or binary): pip
    • Python version: 3.9.1

    To Reproduce

    ~: uname -v
    Darwin Kernel Version 20.3.0: Thu Jan 21 00:06:51 PST 2021; root:xnu-7195.81.3~1/RELEASE_ARM64_T8101
    ~: which python3
    /opt/homebrew/bin/python3
    ~: which pip
    /opt/homebrew/bin/pip
    ~: python3 --version
    Python 3.9.1
    ~: pip install onnxruntime
    ERROR: Could not find a version that satisfies the requirement onnxruntime
    ERROR: No matching distribution found for onnxruntime
    
    feature request 
    opened by lutzroeder 59
  • [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader

    [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader

    Description:

    Refactors the native library loading in Java to allow CUDA to be loaded on demand, fixing #7044. Then expands the shared provider library loading to DNNL, OpenVINO, TensorRT, fixing #6553.

    Added a flag to the native library loading to allow users to supply a directory which contains all the native libraries, fixing #8003. This is also the only way to make the shared library providers load from a different place than the jar, as the individual library path specification conflicts with the way that the ONNX Runtime native code loads the shared library providers.

    I also slightly refactored the Java cmake bits, and added the --console=plain flag to the gradle executions to stop gradle writing over cmake's output.

    Motivation and Context

    • Why is this change required? What problem does it solve? Re-enables DNNL, OpenVINO and TensorRT in Java by allowing them to be packaged in the jar and dynamically loaded in the same way CUDA is.
    • If it fixes an open issue, please link to the issue here. Fixes #6553. Fixes #7044. Fixes #8003.
    opened by Craigacp 54
  • Jetson Xavier - building from source

    Jetson Xavier - building from source

    1. I tried the solution proposed here: `../build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu 2020-02-14 14:34:50,960 Build [INFO] - Build started 2020-02-14 14:34:50,960 Build [DEBUG] - Running subprocess in '/code/onnxruntime' ['git', 'submodule', 'sync', '--recursive'] Synchronizing submodule url for 'cmake/external/DNNLibrary' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/flatbuffers' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/glog' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/pybind11' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/cub' Synchronizing submodule url for 'cmake/external/date' Synchronizing submodule url for 'cmake/external/eigen' Synchronizing submodule url for 'cmake/external/gemmlowp' Synchronizing submodule url for 'cmake/external/googletest' Synchronizing submodule url for 'cmake/external/grpc' Synchronizing submodule url for 'cmake/external/grpc/third_party/abseil-cpp' Synchronizing submodule url for 'cmake/external/grpc/third_party/benchmark' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/libFuzzer' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/re2' Synchronizing submodule url for 'cmake/external/grpc/third_party/boringssl' Synchronizing submodule url for 'cmake/external/grpc/third_party/boringssl-with-bazel' Synchronizing submodule url for 'cmake/external/grpc/third_party/cares/cares' Synchronizing submodule url for 'cmake/external/grpc/third_party/data-plane-api' Synchronizing submodule url for 'cmake/external/grpc/third_party/gflags' Synchronizing submodule url for 'cmake/external/grpc/third_party/gflags/doc' Synchronizing submodule url for 'cmake/external/grpc/third_party/googleapis' Synchronizing submodule url for 'cmake/external/grpc/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/libcxx' Synchronizing submodule url for 'cmake/external/grpc/third_party/libcxxabi' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/protoc-gen-validate' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb' Synchronizing submodule url for 
'cmake/external/grpc/third_party/upb/third_party/protobuf' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/zlib' Synchronizing submodule url for 'cmake/external/mimalloc' Synchronizing submodule url for 'cmake/external/nsync' Synchronizing submodule url for 'cmake/external/onnx' Synchronizing submodule url for 'cmake/external/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/onnx-tensorrt' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/protobuf' Synchronizing submodule url for 'cmake/external/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/re2' Synchronizing submodule url for 'cmake/external/spdlog' Synchronizing submodule url for 'cmake/external/tvm' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/HalideIR' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/dlpack' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/dmlc-core' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/rang' Synchronizing submodule url for 'cmake/external/wil' 2020-02-14 14:34:52,305 Build [DEBUG] - Running subprocess in '/code/onnxruntime' ['git', 'submodule', 'update', '--init', '--recursive'] 2020-02-14 14:34:54,502 Build [INFO] - Generating CMake build tree 2020-02-14 14:34:54,504 Build [DEBUG] - Running subprocess in '/code/onnxruntime/build/Linux/Release' ['/usr/local/bin/cmake', '/code/onnxruntime/cmake', '-Donnxruntime_RUN_ONNX_TESTS=OFF', '-Donnxruntime_GENERATE_TEST_REPORTS=ON', '-Donnxruntime_DEV_MODE=OFF', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-Donnxruntime_USE_CUDA=ON', '-Donnxruntime_USE_NSYNC=OFF', '-Donnxruntime_CUDNN_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_USE_AUTOML=OFF', '-Donnxruntime_CUDA_HOME=/usr/local/cuda', '-Donnxruntime_USE_JEMALLOC=OFF', '-Donnxruntime_USE_MIMALLOC=OFF', '-Donnxruntime_ENABLE_PYTHON=ON', '-Donnxruntime_BUILD_CSHARP=OFF', '-Donnxruntime_BUILD_SHARED_LIB=OFF', '-Donnxruntime_USE_EIGEN_FOR_BLAS=ON', '-Donnxruntime_USE_OPENBLAS=OFF', '-Donnxruntime_USE_MKLDNN=OFF', '-Donnxruntime_USE_MKLML=OFF', '-Donnxruntime_USE_GEMMLOWP=OFF', '-Donnxruntime_USE_NGRAPH=OFF', '-Donnxruntime_USE_OPENVINO=OFF', '-Donnxruntime_USE_OPENVINO_BINARY=OFF', '-Donnxruntime_USE_OPENVINO_SOURCE=OFF', '-Donnxruntime_USE_OPENVINO_MYRIAD=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP16=OFF', '-Donnxruntime_USE_OPENVINO_CPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_VAD_M=OFF', '-Donnxruntime_USE_OPENVINO_VAD_F=OFF', '-Donnxruntime_USE_NNAPI=OFF', '-Donnxruntime_USE_OPENMP=ON', '-Donnxruntime_USE_TVM=OFF', '-Donnxruntime_USE_LLVM=OFF', '-Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF', 
'-Donnxruntime_USE_BRAINSLICE=OFF', '-Donnxruntime_USE_NUPHAR=OFF', '-Donnxruntime_USE_EIGEN_THREADPOOL=OFF', '-Donnxruntime_USE_TENSORRT=ON', '-Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_CROSS_COMPILING=OFF', '-Donnxruntime_BUILD_SERVER=OFF', '-Donnxruntime_BUILD_x86=OFF', '-Donnxruntime_USE_FULL_PROTOBUF=ON', '-Donnxruntime_DISABLE_CONTRIB_OPS=OFF', '-Donnxruntime_MSVC_STATIC_RUNTIME=OFF', '-Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF', '-Donnxruntime_USE_DML=OFF', '-DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs', '-Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF', '-DCMAKE_BUILD_TYPE=Release'] Use gtest from submodule -- Found PythonInterp: /usr/bin/python3 (found version "3.6.9") -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.6.9", minimum required is "3.5") Use protobuf from submodule -- The CUDA compiler identification is NVIDIA 10.0.326 -- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc - broken CMake Error at /usr/local/share/cmake-3.17/Modules/CMakeTestCUDACompiler.cmake:46 (message): The CUDA compiler

      "/usr/local/cuda-10.0/bin/nvcc"

    is not able to compile a simple test program.

    It fails with the following output:

    Change Dir: /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/make cmTC_bb43d/fast && /usr/bin/make -f CMakeFiles/cmTC_bb43d.dir/build.make CMakeFiles/cmTC_bb43d.dir/build
    make[1]: Entering directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_bb43d.dir/main.cu.o
    /usr/local/cuda-10.0/bin/nvcc    -cudart shared  -Xcompiler=-fPIE   -x cu -c /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_bb43d.dir/main.cu.o
    Linking CUDA executable cmTC_bb43d
    /usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_bb43d.dir/link.txt --verbose=1
    /usr/bin/g++   CMakeFiles/cmTC_bb43d.dir/main.cu.o -o cmTC_bb43d  -lcudadevrt -lcudart_static  -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib/stubs" -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib" -lcudadevrt -lcudart
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverEntrypoints()':
    :(.text+0x23488): undefined reference to `dlsym'
    :(.text+0x234b0): undefined reference to `dlsym'
    :(.text+0x234d4): undefined reference to `dlsym'
    :(.text+0x234f8): undefined reference to `dlsym'
    :(.text+0x2351c): undefined reference to `dlsym'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o)::(.text+0x23540): more undefined references to `dlsym' follow
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::loadDriverInternal()':
    :(.text+0x288cc): undefined reference to `dlopen'
    :(.text+0x28904): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::__loadDriverInternalUtil()':
    :(.text+0x289e0): undefined reference to `dlopen'
    :(.text+0x28a14): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverInternal()':
    :(.text+0x2b664): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInit()':
    :(.text+0x5c7bc): undefined reference to `dlerror'
    :(.text+0x5c7c8): undefined reference to `dlopen'
    :(.text+0x5c7dc): undefined reference to `dlsym'
    :(.text+0x5c7e4): undefined reference to `dlerror'
    :(.text+0x5c7f4): undefined reference to `dlclose'
    :(.text+0x5c838): undefined reference to `dlerror'
    :(.text+0x5c844): undefined reference to `dlopen'
    :(.text+0x5c858): undefined reference to `dlsym'
    :(.text+0x5c860): undefined reference to `dlerror'
    :(.text+0x5c870): undefined reference to `dlclose'
    :(.text+0x5c8b4): undefined reference to `dlerror'
    :(.text+0x5c8c0): undefined reference to `dlopen'
    :(.text+0x5c8d4): undefined reference to `dlsym'
    :(.text+0x5c8dc): undefined reference to `dlerror'
    :(.text+0x5c8ec): undefined reference to `dlclose'
    :(.text+0x5c930): undefined reference to `dlerror'
    :(.text+0x5c93c): undefined reference to `dlopen'
    :(.text+0x5c950): undefined reference to `dlsym'
    :(.text+0x5c958): undefined reference to `dlerror'
    :(.text+0x5c968): undefined reference to `dlclose'
    :(.text+0x5c9a0): undefined reference to `dlerror'
    :(.text+0x5c9ac): undefined reference to `dlopen'
    :(.text+0x5c9c0): undefined reference to `dlsym'
    :(.text+0x5c9c8): undefined reference to `dlerror'
    :(.text+0x5c9d8): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreCreate(sem_t*, int)':
    :(.text+0x5d910): undefined reference to `sem_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreDestroy(sem_t*)':
    :(.text+0x5d92c): undefined reference to `sem_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreWait(sem_t*, unsigned int)':
    :(.text+0x5da10): undefined reference to `sem_timedwait'
    :(.text+0x5da48): undefined reference to `sem_wait'
    :(.text+0x5da60): undefined reference to `sem_trywait'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreSignal(sem_t*)':
    :(.text+0x5dab0): undefined reference to `sem_post'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosVirtualReserveInRangeBug1778973WARInit()':
    :(.text+0x5f448): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5f464): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5f474): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5f484): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5f4a4): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosPosixInit()':
    :(.text+0x5f4f0): undefined reference to `dlerror'
    :(.text+0x5f4fc): undefined reference to `dlopen'
    :(.text+0x5f510): undefined reference to `dlsym'
    :(.text+0x5f518): undefined reference to `dlerror'
    :(.text+0x5f528): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosVirtualReserveInRange(unsigned long, void*, void*, unsigned long)':
    :(.text+0x5f768): undefined reference to `pthread_once'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibrary(char const*)':
    :(.text+0x5fc8c): undefined reference to `dlerror'
    :(.text+0x5fca0): undefined reference to `dlopen'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibraryUnsafe(char const*)':
    :(.text+0x5fcb4): undefined reference to `dlerror'
    :(.text+0x5fcc8): undefined reference to `dlopen'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosFreeLibrary(void*)':
    :(.text+0x5fcd4): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosGetProcAddress(void*, char const*)':
    :(.text+0x5fce8): undefined reference to `dlsym'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsAlloc(void (*)(void*))':
    :(.text+0x5fdec): undefined reference to `pthread_key_create'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsFree(unsigned int)':
    :(.text+0x5fe10): undefined reference to `pthread_key_delete'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsGetValue(unsigned int)':
    :(.text+0x5fe18): undefined reference to `pthread_getspecific'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsSetValue(unsigned int, void*)':
    :(.text+0x5fe28): undefined reference to `pthread_setspecific'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSectionWithSharedFlag(pthread_mutex_t*, int)':
    :(.text+0x5fef4): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5ff14): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5ff24): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5ff34): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5ff50): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSection(pthread_mutex_t*)':
    :(.text+0x5ff70): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5ff8c): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5ff9c): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5ffac): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5ffc8): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSectionShared(pthread_mutex_t*)':
    :(.text+0x5ffe8): undefined reference to `pthread_mutexattr_init'
    :(.text+0x60004): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x60014): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x60024): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x60040): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryEnterCriticalSection(pthread_mutex_t*)':
    :(.text+0x60058): undefined reference to `pthread_mutex_trylock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitRWLockEx(void**, void*, unsigned long)':
    :(.text+0x600b4): undefined reference to `pthread_rwlockattr_init'
    :(.text+0x600c4): undefined reference to `pthread_rwlockattr_setpshared'
    :(.text+0x600d4): undefined reference to `pthread_rwlock_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitRWLock(void**)':
    :(.text+0x60114): undefined reference to `pthread_rwlockattr_init'
    :(.text+0x60144): undefined reference to `pthread_rwlockattr_setpshared'
    :(.text+0x60154): undefined reference to `pthread_rwlock_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosAcquireReaderLock(void**)':
    :(.text+0x60164): undefined reference to `pthread_rwlock_rdlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosAcquireWriterLock(void**)':
    :(.text+0x6016c): undefined reference to `pthread_rwlock_wrlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryAcquireReaderLock(void**)':
    :(.text+0x6017c): undefined reference to `pthread_rwlock_tryrdlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryAcquireWriterLock(void**)':
    :(.text+0x601a4): undefined reference to `pthread_rwlock_trywrlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosReleaseReaderLock(void**)':
    :(.text+0x601c4): undefined reference to `pthread_rwlock_unlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosReleaseWriterLock(void**)':
    :(.text+0x601cc): undefined reference to `pthread_rwlock_unlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosDestroyRWLockEx(void**)':
    :(.text+0x601d4): undefined reference to `pthread_rwlock_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosDestroyRWLock(void**)':
    :(.text+0x601ec): undefined reference to `pthread_rwlock_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosOnce(int*, void (*)())':
    :(.text+0x60210): undefined reference to `pthread_once'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreateWithSharedFlag(pthread_cond_t*, int)':
    :(.text+0x60250): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreate(pthread_cond_t*)':
    :(.text+0x602b0): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreateShared(pthread_cond_t*)':
    :(.text+0x60310): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadCreateWithName(cudart::CUOSthread_st**, int (*)(void*), void*, char const*)':
    :(.text+0x60564): undefined reference to `pthread_create'
    :(.text+0x60578): undefined reference to `pthread_setname_np'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadCreate(cudart::CUOSthread_st**, int (*)(void*), void*)':
    :(.text+0x60640): undefined reference to `pthread_create'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadJoin(cudart::CUOSthread_st*, int*)':
    :(.text+0x606a8): undefined reference to `pthread_join'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadDetach(cudart::CUOSthread_st*)':
    :(.text+0x60708): undefined reference to `pthread_detach'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosHasThreadExited(cudart::CUOSthread_st*)':
    :(.text+0x60758): undefined reference to `pthread_kill'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmCreateNamedEx(void*, char const*, unsigned long, cudart::cuosShmInfoEx_st**)':
    :(.text+0x60ee0): undefined reference to `shm_unlink'
    :(.text+0x60ef8): undefined reference to `shm_open'
    :(.text+0x60f98): undefined reference to `shm_unlink'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmOpenNamedEx(void*, char const*, unsigned long, cudart::cuosShmInfoEx_st**)':
    :(.text+0x61124): undefined reference to `shm_open'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmCloseEx(cudart::cuosShmInfoEx_st*, unsigned int, unsigned int)':
    :(.text+0x61370): undefined reference to `shm_unlink'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSetThreadName(cudart::CUOSthread_st*, char const*)':
    :(.text+0x62294): undefined reference to `pthread_setname_np'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(int, sockaddr*, unsigned int*, int)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED2Ev[_ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(int*, int)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFiPiiEED2Ev[_ZN15CUOSdlsymLoaderIPFiPiiEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(unsigned long, unsigned long, unsigned long const*)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFimmPKmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPKmEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(unsigned long, unsigned long, unsigned long*)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFimmPmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPmEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)()>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFivEED2Ev[_ZN15CUOSdlsymLoaderIPFivEED5Ev]+0x18): undefined reference to `dlclose'
    collect2: error: ld returned 1 exit status
    CMakeFiles/cmTC_bb43d.dir/build.make:103: recipe for target 'cmTC_bb43d' failed
    make[1]: *** [cmTC_bb43d] Error 1
    make[1]: Leaving directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
    Makefile:138: recipe for target 'cmTC_bb43d/fast' failed
    make: *** [cmTC_bb43d/fast] Error 2
    

    CMake will not be able to correctly generate this project. Call Stack (most recent call first): CMakeLists.txt:715 (enable_language)

    -- Configuring incomplete, errors occurred! See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeOutput.log". See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeError.log". Traceback (most recent call last): File "/code/onnxruntime/tools/ci_build/build.py", line 1043, in sys.exit(main()) File "/code/onnxruntime/tools/ci_build/build.py", line 972, in main args, cmake_extra_args) File "/code/onnxruntime/tools/ci_build/build.py", line 422, in generate_build_tree run_subprocess(cmake_args + ["-DCMAKE_BUILD_TYPE={}".format(config)], cwd=config_build_dir) File "/code/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell) File "/usr/lib/python3.6/subprocess.py", line 438, in run output=stdout, stderr=stderr) subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '/code/onnxruntime/cmake', '-Donnxruntime_RUN_ONNX_TESTS=OFF', '-Donnxruntime_GENERATE_TEST_REPORTS=ON', '-Donnxruntime_DEV_MODE=OFF', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-Donnxruntime_USE_CUDA=ON', '-Donnxruntime_USE_NSYNC=OFF', '-Donnxruntime_CUDNN_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_USE_AUTOML=OFF', '-Donnxruntime_CUDA_HOME=/usr/local/cuda', '-Donnxruntime_USE_JEMALLOC=OFF', '-Donnxruntime_USE_MIMALLOC=OFF', '-Donnxruntime_ENABLE_PYTHON=ON', '-Donnxruntime_BUILD_CSHARP=OFF', '-Donnxruntime_BUILD_SHARED_LIB=OFF', '-Donnxruntime_USE_EIGEN_FOR_BLAS=ON', '-Donnxruntime_USE_OPENBLAS=OFF', '-Donnxruntime_USE_MKLDNN=OFF', '-Donnxruntime_USE_MKLML=OFF', '-Donnxruntime_USE_GEMMLOWP=OFF', '-Donnxruntime_USE_NGRAPH=OFF', '-Donnxruntime_USE_OPENVINO=OFF', '-Donnxruntime_USE_OPENVINO_BINARY=OFF', '-Donnxruntime_USE_OPENVINO_SOURCE=OFF', '-Donnxruntime_USE_OPENVINO_MYRIAD=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP16=OFF', '-Donnxruntime_USE_OPENVINO_CPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_VAD_M=OFF', '-Donnxruntime_USE_OPENVINO_VAD_F=OFF', '-Donnxruntime_USE_NNAPI=OFF', '-Donnxruntime_USE_OPENMP=ON', '-Donnxruntime_USE_TVM=OFF', '-Donnxruntime_USE_LLVM=OFF', '-Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF', '-Donnxruntime_USE_BRAINSLICE=OFF', '-Donnxruntime_USE_NUPHAR=OFF', '-Donnxruntime_USE_EIGEN_THREADPOOL=OFF', '-Donnxruntime_USE_TENSORRT=ON', '-Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_CROSS_COMPILING=OFF', '-Donnxruntime_BUILD_SERVER=OFF', '-Donnxruntime_BUILD_x86=OFF', '-Donnxruntime_USE_FULL_PROTOBUF=ON', '-Donnxruntime_DISABLE_CONTRIB_OPS=OFF', '-Donnxruntime_MSVC_STATIC_RUNTIME=OFF', '-Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF', '-Donnxruntime_USE_DML=OFF', '-DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs', '-Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1. `

    opened by AndreV84 52
  • Java build system enhancements

    Java build system enhancements

    Description: I cleaned up the CMake files to delegate building and testing to a Gradle build system. Gradle manages Java builds, testing, dependencies, and packaging better than CMake. Additionally, all of the Java build output files are now in a single output directory. I added a README which explains which JAR files and build outputs are generated. I relaxed the requirements in the JNI loader in OnnxRuntime.java to allow for flexible loading of the library (i.e. from the OS libpath, or however else the developer wants to load it); a developer doing that would not use the onnxruntime-lib.jar. I attempted to build out an Azure pipeline for the Java build (based on the Python build); I do not know how to test that, so I assume the maintainers will have to add it. I added the spotless plugin to ensure formatting compliance; I selected a stock formatting profile (Google's) and reformatted the project consistently. All changes to Java files are formatting-only, except OnnxRuntime.java. Not in this PR: I am playing around with Gradle's Maven publishing mechanism.

    Motivation and Context

    • Why is this change required? What problem does it solve?

    Per my original issue below, I wanted to improve the Java packaging and maintainability. I feel these changes will eventually make way for proper maven distribution.

    • If it fixes an open issue, please link to the issue here.

    This is relevant towards https://github.com/microsoft/onnxruntime/issues/2675

    opened by yuzawa-san 52
  • [TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP

    [TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP

    Renames all mentions of STVM to TVM in the ONNX Runtime codebase.

    Corrections were done after PR#10241 and PR#10211 were merged.

    Notes:

    1. The Nuphar and TVM EP builds were separated in the build and cmake files for convenient further work.
    2. A new key is used to build the TVM EP with CUDA, to avoid confusion with the CUDA EP. Adding the new key is motivated by the following issues with --use_cuda: using the --use_cuda flag means that, when building the project, you also build the CUDA EP. This is not the right approach if we only want to work with the TVM EP. Building the CUDA EP is a separate flow that requires its own nuances and dependencies, which are redundant if we only need the TVM EP. At the build.py script level, there are checks for the --cuda_home and --cudnn_home flags, which are not needed for TVM. Also, setting the --use_cuda flag turns on a lot of additional logic when running build.py that is not needed for the TVM EP. In order to disable this additional logic, it would be necessary to extend conditions like if args.use_cuda: to if args.use_cuda and not args.use_tvm in many places. This is not conceptually correct, since these changes for the TVM EP would need to be made in methods that are specific to the CUDA EP. Additionally, there are problems using the TVM EP via PYTHONPATH and the wheel package, because setup.py only supports one EP at a time. If we set the --use_cuda and --use_tvm flags, then only one wheel package, for the CUDA EP, will be built, because of how the logic for working with providers in setup.py is arranged. Also, the condition for the _ld_preload.py extension for TVM EP will not be fulfilled, with the help of which the
    3. If input shapes are unset and dynamic, an ONNX Runtime error is raised instead of a warning and automatic inference of the dynamic dimensions as 1.
    4. There are two independent structures with the same name (TVMFuncState) in the TVM EP internal code and in the tests for the TVM EP prepared by NUPHAR.

    Hello @xadupre, can you recheck our updates after your PR?

    opened by vvchernov 50
  • [Training] pytorch exporter deleted symbolic_registry  while ort still uses it

    [Training] pytorch exporter deleted symbolic_registry while ort still uses it

    Describe the issue

    This PyTorch PR https://github.com/pytorch/pytorch/pull/84382 removed torch.onnx.symbolic_registry, while ORT still uses it.

    @justinchuby, could you give some suggestions on what kind of change ORT needs to make in response to this torch change?

    To reproduce

    The following Python code fails to run:

    import torch.onnx.symbolic_registry as sym_registry

    Urgency

    No response

    ONNX Runtime Installation

    Released Package

    ONNX Runtime Version or Commit ID

    ort nightly

    PyTorch Version

    torch nightly

    Execution Provider

    CUDA

    Execution Provider Library Version

    No response

    training 
    opened by zhijxu-MS 0
  • [Web] Can't create session loading large models

    [Web] Can't create session loading large models

    Describe the issue

    Hello,

    I'm trying to load a UNet model (> 2GB), which means I have a model.onnx file as well as a lot of .weight and .bias files.

    Using onnxruntime-web, I'm attempting to load it into a Node/Javascript environment using the following code:

    const ort = require('onnxruntime-web');
    try {
        const session = await ort.InferenceSession.create('./unet/model.onnx')
    } catch (e) {
        console.log(e);
    }
    

    When I do this, I'm faced with this warning:

    Deserialize tensor onnx::MatMul_12479 failed.tensorprotoutils.cc:640 TensorProtoToTensor External initializer: onnx::MatMul_12479 offset: 0 size to read: 409600 given file_length: 32 are out of bounds or can not be read in full.
    

    And then this error:

    Error: Can't create a session.
    

    Using this same method and code I've been able to load smaller models < 2GB without errors and warnings.

    Is there anything I can do to investigate why I'm running into this issue?
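
    One hedged way to investigate from Python (a sketch using the onnx package rather than onnxruntime-web; the paths are the ones from this report) is to load the graph without its weights, then resolve the external tensors explicitly, which surfaces offset/size mismatches like the one in the warning:

    import onnx
    from onnx.external_data_helper import load_external_data_for_model

    # Load the structure only, then pull in the external .weight/.bias files.
    model = onnx.load("./unet/model.onnx", load_external_data=False)
    load_external_data_for_model(model, base_dir="./unet")
    onnx.checker.check_model("./unet/model.onnx")  # path form supports >2GB models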

    To reproduce

    UNet model can be accessed here: https://drive.google.com/drive/folders/1l65XlDrYYE7m1EUPdEZ6SDGWjrlIljJu?usp=sharing

    You can use this code (copied and pasted from above):

    const ort = require('onnxruntime-web');
    try {
        const session = await ort.InferenceSession.create('./unet/model.onnx')
    } catch (e) {
        console.log(e);
    }
    

    Urgency

    It's urgent for a stealth project.

    ONNX Runtime Installation

    Released Package

    ONNX Runtime Version or Commit ID

    1.8.0 (installed onnxruntime-web through npm)

    Execution Provider

    Other / Unknown

    platform:web 
    opened by edhyah 0
  • Chenta/avoid thread local

    Chenta/avoid thread local

    Description: The thread-local implementation of the stream pool introduces complicated issues during shutdown. To avoid that, this PR re-implements the stream pool at the session level, so we have better control over the stream pool's lifetime.

    opened by souptc 0
  • Update CUDA version to 11.6 and refactor python packaging pipeline

    Update CUDA version to 11.6 and refactor python packaging pipeline

    Description:

    1. Update the CUDA version from 11.4 to 11.6.
    2. Update the Manylinux version.
    3. Upgrade the GCC version from 10 to 11 for most x86_64 pipelines. (CentOS 7 ARM64 doesn't have GCC 11 yet.)
    4. Refactor the Python packaging pipeline: (a) split the Linux GPU build job into two parts, build and test, so that the build part doesn't need a GPU machine; (b) make the Linux GPU and Linux CPU build jobs more similar, sharing the same bash script and YAML file.
    5. Temporarily disable Attention_Mask1D_Fp16_B2_FusedNoPadding because it is causing one of our packaging pipelines to fail. I have created an ADO task for this.

    Motivation and Context

    • Why is this change required? What problem does it solve?
    • If it fixes an open issue, please link to the issue here.
    opened by snnn 1
  • Support BFloat16 ?

    Support BFloat16 ?

    Describe the issue

    When will the BFloat16 tensor type be supported?

    To reproduce

    Urgency

    No response

    Platform

    Linux

    OS Version

    Ubuntu 18.04

    ONNX Runtime Installation

    Released Package

    ONNX Runtime Version or Commit ID

    1.12.1

    ONNX Runtime API

    Python

    Architecture

    X64

    Execution Provider

    CUDA

    Execution Provider Library Version

    No response

    opened by ildoonet 0