VeriSilicon Tensor Interface Module for OpenVX

Overview

TIM-VX - Tensor Interface Module for OpenVX

TIM-VX is a software integration module provided by VeriSilicon to facilitate the deployment of neural networks on OpenVX-enabled ML accelerators. It serves as the backend binding for runtime frameworks such as Android NN, TensorFlow Lite, MLIR, TVM, and more.

Main Features

  • Over 130 internal operators with rich format support for both quantized and floating-point data
  • Simplified binding API calls to create Tensors and Operations (see the sketch below)
  • Dynamic graph construction with support for shape inference
  • Built-in custom layer extensions
  • A set of utility functions for debugging
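
As a rough illustration of the binding API, the sketch below builds and runs a one-operation (ReLU) graph with the TIM-VX C++ interface. It is a minimal sketch only: the header paths, op names, and shapes are assumptions based on recent releases and may vary, so consult the LeNet sample in this repository for a complete, up-to-date model.

// Minimal sketch of a one-op TIM-VX graph (header paths are assumptions).
#include <vector>

#include "tim/vx/context.h"
#include "tim/vx/graph.h"
#include "tim/vx/ops/activations.h"
#include "tim/vx/tensor.h"

int main() {
  auto context = tim::vx::Context::Create();
  auto graph = context->CreateGraph();

  // Describe a small float tensor shape shared by the input and the output.
  tim::vx::ShapeType shape({4, 4});
  tim::vx::TensorSpec input_spec(tim::vx::DataType::FLOAT32, shape,
                                 tim::vx::TensorAttribute::INPUT);
  tim::vx::TensorSpec output_spec(tim::vx::DataType::FLOAT32, shape,
                                  tim::vx::TensorAttribute::OUTPUT);
  auto input = graph->CreateTensor(input_spec);
  auto output = graph->CreateTensor(output_spec);

  // Create a ReLU operation and bind its input/output tensors.
  auto relu = graph->CreateOperation<tim::vx::ops::Relu>();
  (*relu).BindInput(input).BindOutput(output);

  // Compile for the target device, copy data in, run, and copy the result back.
  if (!graph->Compile()) return -1;
  std::vector<float> in_data(4 * 4, -0.5f), out_data(4 * 4);
  input->CopyDataToTensor(in_data.data(), in_data.size() * sizeof(float));
  if (!graph->Run()) return -1;
  output->CopyDataFromTensor(out_data.data());
  return 0;
}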

Roadmap

The roadmap for TIM-VX will be updated here in the future.

Get started

Build and Run

TIM-VX uses the Bazel build system by default. Install Bazel first to get started.

TIM-VX must be compiled and linked against the VeriSilicon OpenVX SDK, which provides the required header files and pre-compiled libraries. A default linux-x86_64 SDK containing a PC simulation environment is included; platform-specific SDKs can be obtained from the respective SoC vendors.

To build TIM-VX:

bazel build libtim-vx.so

To run the LeNet sample:

# set VIVANTE_SDK_DIR for runtime compilation environment
export VIVANTE_SDK_DIR=`pwd`/prebuilt-sdk/x86_64_linux

bazel build //samples/lenet:lenet_asymu8_cc
bazel run //samples/lenet:lenet_asymu8_cc

Get familiar with the OpenVX spec

To develop with TIM-VX, you first need to get familiar with the OpenVX API and the OpenVX NN Extension API. Please head over to Khronos to read the specs.

Comments
  • Multiple downstream outputs bug

    Multiple downstream outputs bug

    Sorry for making duplicates. I'm not sure whether the bug is in vx-delegate or in TIM-VX. https://github.com/VeriSilicon/tflite-vx-delegate/issues/32

    Hi, I think I found a bug in the vx-delegate runtime.

    setup: A311D + Android 9 + TensorFlow Lite with vx-delegate

    Model: a detector that outputs multiple things: bounding boxes, landmarks, probability scores, and feature vectors.

    Problem: the landmark outputs are garbage. How the model produces landmarks:

    input image -> backbone -> FPN -> Conv layers that produces features (OUTPUT 1) -> Conv layers that produces landmarks (OUTPUT 2)

    So, if I have two outputs that are downstream one after another, the second output is not calculated and I get garbage. The problem occurs only with the INT8 graph; if I use the FP32 graph, it works fine with vx-delegate.

    On x86 with standard TFLite (and xnnpack) everything works fine with both INT8 and FP32 graphs.

    Update: being downstream is not actually important; even if the landmarks branch of the graph has only the landmarks as outputs, I still get garbage. I don't know why, but the landmarks part of the graph is not calculated on the NPU.

    What could be the problem? Thanks.

    opened by bkovalenkocomp 31
  • Is SpatialTransformer on rv1126 supported?

    Is SpatialTransformer on rv1126 supported?

    I am trying to run the SpatialTransformer (MXNet) operator on RV1126 using Tengine, but the status of the "SpatialTransformer" operator in TIM-VX is "InternalOnly" and there is no TIM-VX API implementation. So I can't find a way to add SpatialTransformer NPU support to Tengine, and I wondered:

    1. Is the SpatialTransformer (MXNet) operator supported on the RV1126 NPU?
    2. If it is supported, how can I add it to Tengine?
    opened by sky-fun 26
  • Segmentation fault

    Segmentation fault

    Hi, I have an A311D board running Ubuntu (from Khadas). My end goal is to compile TFLite with vx-delegate support. What is the best way to do this?

    Also, I'm trying to compile TIM-VX (libtim-vx.so). It seems the Bazel build is broken, so I tried CMake. CMake works fine: it compiles and links all targets, but when I try to run the unit tests (under gdb), I get a segfault from *** _LoadStates() in libGAL.so. What causes this?

    opened by bkovalenkocomp 17
  • TVM RPC test failed with message: "PLS isn't existed"

    TVM RPC test failed with message: "PLS isn't existed"

    I don't know if it's the right place to ask questions about your TVM fork, but I cannot raise issues in that repo.

    I followed the guide from README.VSI.md to build TVM (on host, using x86_64_linux simulation drivers provided here) and TVM runtime (on target, using vendor-provided VIP NPU drivers), and ran the tests in test_vsi_npu, but I got these results:

    logs from TVM C++ RPC tool:

    VsiNpuModule::LoadFromBinary
    LoadFromBinary: nbg size = 593344
    LoadFromBinary: input size = 1
    LoadFromBinary: output size = 1
    VsiNpuModule : DeSerializeTensorSpec
    VsiNpuModule : DeSerializeTensorSpec2
    VsiNpuModule : DeSerializeTensorSpec
    VsiNpuModule : DeSerializeTensorSpec2
    [22:31:35] /home/nullko/Documents/tvm-vsi_npu/apps/cpp_rpc/rpc_env.cc:130: Load module from /home/ubuntu/workspace/rpc/model.so ...
    VsiNpuModule::GetFunction: _lookup_linked_param
    VsiNpuModule::GetFunction: return early
    VsiNpuModule::GetFunction: _lookup_linked_param
    VsiNpuModule::GetFunction: return early
    VsiNpuModule::GetFunction: _lookup_linked_param
    VsiNpuModule::GetFunction: return early
    VsiNpuModule::GetFunction: _lookup_linked_param
    VsiNpuModule::GetFunction: return early
    VsiNpuModule::GetFunction: tvmgen_default_vsi_npu_0
    [     1] PLS isn't existed
    E [compute_node:379]Create node[0] NBG fail
    Process Graph: 0 ms or 0 us
    

    It seems that TVM is able to compile the NBG on the host, but the target runtime cannot execute it. I wonder what caused the "PLS isn't existed" issue. Is it because I didn't set some environment variables on the target platform?

    Or maybe your TVM fork is still under development and not ready to be used yet?

    opened by Goose-Bomb 14
  • Handle tensor double free error on x86_64 simulator driver

    Handle tensor double free error on x86_64 simulator driver

    I am trying to use vsi_nn_AddTensorFromHandle to create a tensor from a buffer allocated by cv::Mat. Every time the program exits, a double-free error is reported by OpenCV. It seems the passed buffer is freed by the OpenVX driver when the context is deinitialized (the driver is not supposed to free the buffer, since it's a handle), so when OpenCV tries to free the same buffer, it causes a double free.

    I only encounter this problem with the x86_64 simulator driver; when the program runs on the target device using the vendor-provided NPU driver, everything works well.

    Here is a short program to reproduce this error:

    #include <iostream>
    #include <opencv2/core.hpp>
    #include <vsi_nn_pub.h>
    
    int main(int argc, char* argv[]) {
        vsi_status err = VSI_SUCCESS;
    
        auto matIn = cv::Mat(4, 4, CV_32F);
        auto matOut = cv::Mat(4, 4, CV_32F);
    
        cv::randu(matIn, -1.0F, 1.0F);
        matOut.setTo(0.0F);
    
        auto context = vsi_nn_CreateContext();
        auto graph = vsi_nn_CreateGraph(context, 2, 1);
        vsi_nn_SetGraphInputs(graph, nullptr, 1);
        vsi_nn_SetGraphOutputs(graph, nullptr, 1);
    
        vsi_nn_tensor_attr_t attr = {};
        attr.dtype.fmt = VSI_NN_DIM_FMT_NCHW;
        attr.dim_num = 4;
        attr.size[0] = 4;
        attr.size[1] = 4;
        attr.size[2] = 1;
        attr.size[3] = 1;
        attr.dtype.vx_type = VSI_NN_TYPE_FLOAT32;
        attr.dtype.qnt_type = VSI_NN_QNT_TYPE_NONE;
        attr.is_const = 0;
        attr.vtl = 0;
    
        auto tensorIn = vsi_nn_AddTensorFromHandle(
            graph, VSI_NN_TENSOR_ID_AUTO, &attr, matIn.data);
        auto tensorOut = vsi_nn_AddTensorFromHandle(
            graph, VSI_NN_TENSOR_ID_AUTO, &attr, matOut.data);
    
        auto nodeReLU = vsi_nn_AddNode(graph, VSI_NN_OP_RELU, 1, 1, nullptr);
        nodeReLU->uid = 100;
        nodeReLU->input.tensors[0] = tensorIn;
        nodeReLU->output.tensors[0] = tensorOut;
    
        graph->input.tensors[0] = tensorIn;
        graph->output.tensors[0] = tensorOut;
    
        err = vsi_nn_SetupGraph(graph, vx_false_e);
        err = vsi_nn_VerifyGraph(graph);
        err = vsi_nn_rnn_RunGraph(graph);
    
        std::cout << "[Input]\n" << matIn << std::endl;
        std::cout << "[Output]\n" << matOut << std::endl;
    
        vsi_nn_ReleaseGraph(&graph);
        vsi_nn_ReleaseContext(&context);
    
        return err;
    }
    

    callstack:

    raise (raise:49)
    abort (abort:60)
    __libc_message (__libc_message:173)
    malloc_printerr (malloc_printerr:0)
    _int_free (_int_free:455)
    __libc_free (__libc_free:28)
    cv::StdMatAllocator::deallocate(cv::UMatData*) const (cv::StdMatAllocator::deallocate(cv::UMatData*) const:17)
    cv::Mat::~Mat() (cv::Mat::~Mat():26)
    main (/home/nullko/Documents/tim-vx/samples/ncc/test_handle_tensor.cpp:54)
    __libc_start_main (__libc_start_main:53)
    _start (_start:13)
    
    opened by Goose-Bomb 13
  • vsi_npu tvm compilation issue

    vsi_npu tvm compilation issue

    I am trying to cross-compile a PyTorch model for i.MX8 with vsi_npu using the code below, but I am getting the errors noted below. I have also attached working code for normal Linux targets.

    compilation code:

    mod, params = relay.frontend.from_pytorch(scripted_model, [(input_name, input_shape)])
    target_string = "llvm  -mtriple=aarch64-linux-gnu"              # imx8 host-triple
    kwargs = {"cc": "aarch64-linux-gnu-gcc", 'fcompile': False}
    disabled_passes = ["AlterOpLayout"]                                     # same error with None
    with tvm.transform.PassContext(opt_level=3, disabled_pass=disabled_passes):
            mod = vsi_npu.partition_for_vsi_npu(mod, params)           # runs fine
            lib = relay.build(mod, target_string, params=params)        # error #
    lib.export_library(build_dir / 'deploy.so',  **kwargs)
    

    error at compile step: (relay taken from quantized pytorch trace)

    This is important----> name_node.value() == tvmgen_default_vsi_npu_593
    GraphMakerImpl::Create
    TensorMakerImpl::InferCall: qnn.quantize
    E [GetMapedTensor:140]Tensor has not beed inserted in tensor map.
    python: /code/tim-vx-lib/TIM-VX/src/tim/transform/layout_inference.cc:141: std::shared_ptr<tim::vx::Tensor> tim::transform::layout_inference_impl::LayoutInferContext::GetMapedTensor(const std::shared_ptr<tim::vx::Tensor>&) const: Assertion `false' failed.
    Aborted (core dumped)
    

    error at compile step: (relay taken from simple pytorch trace)

    This is important----> name_node.value() == tvmgen_default_vsi_npu_57
    GraphMakerImpl::Create
    TensorMakerImpl::InferCall: nn.conv2d
    TensorMakerImpl::InferCall: add
    TensorMakerImpl::InferCall: image.resize2d
    TensorMakerImpl::InferCall: add
    TensorMakerImpl::InferCall: image.resize2d
    TensorMakerImpl::InferCall: add
    TensorMakerImpl::InferCall: image.resize2d
    W [vsi_nn_SortGraphNode:1378]Unprocessed node 7
    W [vsi_nn_SetupGraph:706]Sort graph nodes failure.
    Fatal error: compile to binary failed
    

    Working Code without vsi_npu: (quantized/normal/tuned/notune all cases)

    mod, params = relay.frontend.from_pytorch(scripted_model, [(input_name, input_shape)])
    target_string = "llvm -mtriple=x86_64-linux-gnu"              # linux host-triple
    with tvm.transform.PassContext(opt_level=3, disabled_pass=None):
            lib = relay.build(mod, target_string, params=params)      # no error #
    lib.export_library(build_dir / 'deploy.so')
    

    I installed using the CMake instructions from https://github.com/VeriSilicon/tvm/blob/vsi_npu/README.VSI.md. Please guide me on the error.

    opened by sidphbot 12
  • How to use NNAPI for NPU?

    How to use NNAPI for NPU?

    Hi, I've learned that it's possible to use the TensorFlow Lite NNAPI delegate on a VIM3 Android device.

    I have the /system/lib/libneuralnetworks.so file in my Android 9 OS. How can I make sure the NPU is used? I benchmarked my model and it seems the NPU is not used during TFLite 8-bit inference, because the speed is 10x slower and there is no difference between per-channel and per-tensor quantized models.

    Also, in dmesg after I run my benchmark:

    [ 4907.441064] type=1400 audit(1293888104.720:419): avc: denied { read } for pid=7157 
    comm="benchmar" path="/data/local/tmp/build/model/model.tflite" 
    dev="mmcblk0p20" ino=261748 scontext=u:r:hal_neuralnetworks_default:s0 tcontext=u:object_r:shell_data_file:s0 
    tclass=file permissive=1
    
    android 
    opened by bkovalenkocomp 12
  • Does TIM-VX utilize the NPU of the VIM3 (A311D)?

    Does TIM-VX utilize the NPU of the VIM3 (A311D)?

    Hello, I'm going to use TIM-VX with the VIM3 development board. I have some questions because it's my first time dealing with applications related to ovxlib and the NPU. I'm going to use TIM-VX for research purposes; the goal of the study is to utilize the NPU effectively. I wonder:

    1. Can the NPU of the VIM3 be used through TIM-VX?
    2. If it can be used, can all layers defined in TIM-VX run on the NPU?
    3. Is there any method to check at runtime that I am actually using the NPU?
    4. How much CPU does TIM-VX itself use? TIM-VX seems like a very interesting subject. Thank you for reading!
    opened by janoslim 12
  • Minimal tflite with vx-delegate example

    Minimal tflite with vx-delegate example

    Hi, could you provide a minimal example of TFLite inference with the vx-delegate applied? I'm not sure how to create the vx-delegate and apply it to a TFLite model. Thanks!
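
    For reference, applying an external delegate such as the vx-delegate through the TFLite C++ API generally follows the sketch below. This is a hedged sketch only: the delegate library name libvx_delegate.so, the model path, and the build setup are assumptions, and the vx-delegate project may document a different recommended integration path.

    // Sketch: loading an external delegate (assumed to be libvx_delegate.so)
    // and applying it to a TFLite interpreter.
    #include <memory>

    #include "tensorflow/lite/delegates/external/external_delegate.h"
    #include "tensorflow/lite/interpreter.h"
    #include "tensorflow/lite/kernels/register.h"
    #include "tensorflow/lite/model_builder.h"

    int main() {
      auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
      tflite::ops::builtin::BuiltinOpResolver resolver;
      std::unique_ptr<tflite::Interpreter> interpreter;
      tflite::InterpreterBuilder(*model, resolver)(&interpreter);

      // Load the delegate shared library and hand the graph over to it.
      auto options = TfLiteExternalDelegateOptionsDefault("libvx_delegate.so");
      TfLiteDelegate* delegate = TfLiteExternalDelegateCreate(&options);
      interpreter->ModifyGraphWithDelegate(delegate);

      interpreter->AllocateTensors();
      // ... fill interpreter->typed_input_tensor<uint8_t>(0), then run ...
      interpreter->Invoke();
      // ... read interpreter->typed_output_tensor<uint8_t>(0) ...

      interpreter.reset();  // destroy the interpreter before the delegate
      TfLiteExternalDelegateDelete(delegate);
      return 0;
    }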

    opened by bkovalenkocomp 11
  • Is there a way to deal with graph verification errors?

    Is there a way to deal with graph verification errors?

    Hello, I am using TIM-VX for the TensorFlow Lite delegate and Tengine, and I encountered a graph verification failure. What can I try to do about this verification error? What should I check? Thank you!

    opened by janoslim 11
  • feat(tensor): support external buffer when creating input/output tensors

    feat(tensor): support external buffer when creating input/output tensors

    Intent: Up to now, TIM-VX allocates the necessary tensor buffers in host memory, while vsi_nn_AddTensorFromHandle does accept a non-null data argument, so this PR enables a new usage: reusing externally owned data buffers for input and output tensors.

    This PR extends and replaces #297. I've run a test using a yolov4-tiny-uint8 model (from Tengine) on both x86_64_linux (simulator) and aarch64 (hardware), and the test passed stably.

    API changes

    Add such public APIs:

    1. virtual bool Tensor::FlushCacheForHandle() = 0;
    2. virtual bool Tensor::InvalidateCacheForHandle() = 0;
    3. virtual void* Tensor::map(bool invalidate_cpu_cache = false) = 0;
    4. virtual void Tensor::unmap() = 0;
    5. virtual std::shared_ptr<Tensor> Graph::CreateIOTensor(const TensorSpec& spec, void* data = nullptr) = 0;

    And add:

    1. corresponding member functions in TensorImpl and GraphImpl classes
    2. TensorImpl::TensorImpl(Graph* graph, const TensorSpec& spec, void* data = nullptr);

    Also, TensorImpl::data_ was redefined from const void* to void*, to indicate that it may act as a cache area and be updated by Tensor::RefillCacheFromHandle and Tensor::map(true).
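
    As a hedged illustration of the proposed API (a sketch, not code from the PR; it assumes an existing graph and TensorSpec objects named graph, input_spec, and output_spec, and the buffer sizes are arbitrary), reusing external buffers might look like this:

    // Sketch: wrapping externally owned buffers as graph I/O tensors.
    std::vector<uint8_t> in_buf(1 * 224 * 224 * 3);   // externally owned input
    std::vector<uint8_t> out_buf(1 * 1000);           // externally owned output

    auto input = graph->CreateIOTensor(input_spec, in_buf.data());
    auto output = graph->CreateIOTensor(output_spec, out_buf.data());

    // ... fill in_buf with quantized image data ...
    input->FlushCacheForHandle();        // make CPU writes visible to the device
    graph->Run();
    output->InvalidateCacheForHandle();  // make device writes visible to the CPU
    // out_buf now holds the results without an extra copy.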

    feature request 
    opened by gdh1995 9
  • Valgrind Reporting Many Warnings With Graphs and Contexts

    Valgrind Reporting Many Warnings With Graphs and Contexts

    Running Valgrind on our software, we found many errors coming from TIM-VX with regard to the graph and context instances. To verify whether this was an issue with our software or something happening internally, we ran it against the lenet example provided in this repo and saw the same output. I've attached the Valgrind log here. It's hard for me to tell whether this is a TIM-VX issue or an OpenVX issue (or potentially an issue on our side), so I'm hoping this log can help figure out what may be happening.

    The trend I see in the log is that it happens on all TIM-VX functions: creating, initializing, validating (compile), executing (run), and destroying.

    The command we ran to get this output: valgrind --tool=memcheck --leak-check=full --error-limit=no --log-file="{filename}" ./{program executable filename}

    valgrindOutput3.txt

    opened by lhawana 0
  • quantized binary classifier fails validation

    quantized binary classifier fails validation

    While debugging my own quantized models created with TIM-VX, I came across the issue of binary classifiers failing the validation step. This happened across several of my models even though everything was set up properly. Further, my models worked fine with a non-TIM-VX implementation.

    I am able to reproduce the issue seen in my models with the lenet example provided in this repo by doing the following:

    1. Run example - note that it works.
    2. Modify the following shapes (change 10's to 3's) so it becomes: tim::vx::ShapeType fc4_weight_shape({500, 3}); tim::vx::ShapeType fc4_bias_shape({3}); tim::vx::ShapeType output_shape({3, 1});
    3. Save and run example - note that it works.
    4. Repeat step 2 but change the 3's to 2's.
    5. Save and run example - note that it does not pass the validation step (graph->Compile()).

    I have tested this with models with varying numbers of classes, and it only has a problem with 2-class classifiers. I noted there was a similar comment in issue 167, but it looks like that issue was resolved without discussing this part further.

    The output error codes I get are:

    D [operator():134]vsi_nn_SetupGraph Returned 0
    D [operator():140]vsi_nn_VerifyGraph Returned -1

    Some notes:

    1. If I remove the softmax layer, the model passes validation.
    2. If I leave all layers in but run a non-quantized version of the model, the model passes validation.
    3. If I leave the network as is (all layers and quantized), the model doesn't pass validation.
    opened by lhawana 0
  • PLT section missing in galcore

    PLT section missing in galcore

    Hello, I am trying to use galcore.ko from the libs in release https://github.com/VeriSilicon/TIM-VX/releases/tag/v1.1.42, but I get the following output:

    /alexrak/vim3l_aarch64/lib # insmod galcore.ko
    insmod: can't insert 'galcore.ko': invalid module format
    /alexrak/vim3l_aarch64/lib # dmesg
    [1128771.137591] galcore: module PLT section(s) missing
    [1128771.161401] galcore: module PLT section(s) missing
    

    Can you help me figure out the cause of the "PLT section(s) missing" error?

    opened by kventinel 4
  • Fixed unreasonable type of parameter in broadcast

    Fixed unreasonable type of parameter in broadcast

    Changed the type of shape from int to uint; the reason is described in https://github.com/VeriSilicon/TIM-VX/issues/376. Added a unit test for negative dimensions.

    Type: Code Improvement
    Signed-off-by: Feiyue Chen [email protected]

    opened by chenfeiyue-cfy 0
Releases(v1.1.57)
  • v1.1.57(Nov 2, 2022)

    What's Changed

    • Added broadcast layout infernece by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/438
    • Added cases for reduce sum by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/441
    • Rename RoiAlign & RoiPool by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/446
    • add maxpoolwithargmax2 and maxpoolgrad by @MercuryChen in https://github.com/VeriSilicon/TIM-VX/pull/444
    • Fixed quantize param in reduce_sum by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/445
    • update nbg format version by @zhengzhouheng in https://github.com/VeriSilicon/TIM-VX/pull/440
    • update Operators.md by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/447
    • Add ut configuration for cl only device by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/448
    • Fixed param compute bug for lrn by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/451
    • Fix the build error for clang when export TIM_VX_ENABLE_PLATFORM=ON by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/453
    • add readme for ovxlib_bin_build.sh by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/452
    • Added div int32 unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/455
    • Mapped GRUCell & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/457
    • include Topk op's header file by @MercuryChen in https://github.com/VeriSilicon/TIM-VX/pull/460
    • Set graph attributes when compile graph to binary by @xuke537 in https://github.com/VeriSilicon/TIM-VX/pull/459
    • Mapped bidirectional lstm & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/461
    • Update Version to 1.1.50 by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/462
    • Modified Div_int unit test golden by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/466
    • Modified bidirectional_sequence_lstm golden accuracy by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/469
    • Mapped unidirectional gru & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/470
    • Feat: disable maxpoolwithargmax2 if no low-level support by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/471
    • disabled two not supported cases by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/473
    • Fix error in feature compatible guard by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/472
    • Added conv3d unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/468
    • disabled two Div cases by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/476
    • Update OpenCV usage link by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/477
    • Disabled a conv3d case by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/479
    • fixed some errs on gcc12 by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/483
    • supported int16 dfp quantization & added conv2d unit test by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/478
    • Replace name direct_map_op with builtin_op by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/481
    • added Mod op & Mod unit test by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/484
    • added sign & softsign by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/486
    • added Rcp op & modified test_utils by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/487
    • added MaxPool3d op by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/490
    • added cumsum op & added OnBindInputPostProc func by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/489
    • Supported composed layout infer & added unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/488
    • Added two reduce layout infer unittest by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/491
    • Fixed bug when input's index is not 0 by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/493
    • Added two cases in strided_slice by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/494
    • added transpose_test from https://github.com/VeriSilicon/TIM-VX/issue… by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/495
    • fixed bug when broadcast dimensions is negative by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/496
    • Fixed tensorflow version in CI by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/499
    • Update internal & prebuilt-sdk for 22Q3 release by @chenfeiyue-cfy in https://github.com/VeriSilicon/TIM-VX/pull/500

    New Contributors

    • @MercuryChen made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/444

    Full Changelog: https://github.com/VeriSilicon/TIM-VX/compare/v1.1.50...v1.1.57

    Source code(tar.gz)
    Source code(zip)
  • v1.1.50(Jul 25, 2022)

    What's Changed

    • Added param "step" for slice & added unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/352
    • Fixed compiler fail for elu by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/358
    • update ovxlib virtual_device patch by @lileiigithub in https://github.com/VeriSilicon/TIM-VX/pull/357
    • Supported specifying alpha and beta by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/356
    • Fixed layout inference bug for stride_slice by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/329
    • refine tim_internal.cmake for ovxlib vip by @lileiigithub in https://github.com/VeriSilicon/TIM-VX/pull/360
    • Added unit test for maxpool by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/361
    • Suported specifying CRD_mode & DCR_mode in depthtospace by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/362
    • Support specifying pad_mode in pad by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/355
    • add BroadcastInDim to internal expand_broadcast op by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/364
    • Added selu & celu & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/366
    • Add Broadcast op by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/365
    • Update operator support plan by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/367
    • Fixed pad layout inference bug by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/370
    • CI enhancement - enable benchmark_model and samples by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/372
    • rename CopyTensorToData to CopyDataFromTensor to align name of tim::v… by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/373
    • add macro VSI_EXPAND_BROADCAST_ENABLE_DIMENSIONS for ovxlib compatibi… by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/374
    • add test demo for multi_device by @lileiigithub in https://github.com/VeriSilicon/TIM-VX/pull/371
    • Fix ci crash by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/380
    • fix bug of param num in custom op by @zhengzhouheng in https://github.com/VeriSilicon/TIM-VX/pull/385
    • Added topk & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/384
    • Added Ceil & unit test by @MESeraph in https://github.com/VeriSilicon/TIM-VX/pull/381
    • Fixed layout inference bug for stack by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/375
    • add macro VSI_EXPAND_BROADCAST_ENABLE_DIMENSIONS for unit test compat… by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/386
    • fix gather_element operation input num issue by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/388
    • Added gather_elements & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/363
    • add GetElementNum/GetElementByteSize/GetByteSize for TensorSpec by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/393
    • Fixed no-output if transpose is last op and can be optimized by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/395
    • Fix build issue by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/397
    • feat(tensor): support external buffer when creating input/output tensors by @gdh1995 in https://github.com/VeriSilicon/TIM-VX/pull/389
    • Mapped roi_align & added unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/402
    • modify GatherElements by @MESeraph in https://github.com/VeriSilicon/TIM-VX/pull/406
    • Added unidirectional lstm layout inference by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/392
    • Mapped roi_pool & added unit test by @MESeraph in https://github.com/VeriSilicon/TIM-VX/pull/404
    • Update tensorflow to v2.9.0 in ci by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/403
    • add reshape unit test by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/416
    • remove redefinition of TIM_VX_ENABLE_CUSTOM_OP by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/417
    • Added grouped conv2d layout inference by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/419
    • disabled two failed case by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/422
    • Enable SetRoundingPolicy by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/426
    • Disabled 3 failed case by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/428
    • Fixed transpose layout inference bug by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/430
    • Added batch dims in gather by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/435
    • Update internal for 22Q2 release by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/432

    New Contributors

    • @MESeraph made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/381

    Full Changelog: https://github.com/VeriSilicon/TIM-VX/compare/v1.1.42...v1.1.50

    Source code(tar.gz)
    Source code(zip)
  • v1.1.42(Apr 12, 2022)

    What's Changed

    • add alpha & beta parameters for HardSigmoid by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/265
    • Support that op's all inputs in layout inference are constant by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/264
    • Disable fast mode of graph by @onepick in https://github.com/VeriSilicon/TIM-VX/pull/267
    • Update component diagram and README.md by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/269
    • Support NPU access large memory > 4G by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/280
    • Fix build error with gcc 6.2.0 by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/282
    • Enabled bulding with buildroot toolchain. by @SHagerGEL in https://github.com/VeriSilicon/TIM-VX/pull/281
    • [New API] Add compile_option support - relax_mode by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/285
    • fix compile error in g++5.4 by @yingshengBD in https://github.com/VeriSilicon/TIM-VX/pull/286
    • Install headers to place defined by CMAKE_INSTALL_INCLUDEDIR variable by @robert-kalmar in https://github.com/VeriSilicon/TIM-VX/pull/291
    • enable no bias in FC layout inference by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/294
    • Fixed pad bug for grouped_conv1d by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/292
    • Added unit test for stack by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/298
    • Relax tolerance for div_uint8 case by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/303
    • Update reshape to reshape2 by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/310
    • add custom base op and tests by @zhengzhouheng in https://github.com/VeriSilicon/TIM-VX/pull/315
    • Added unit test for batch2space and space2batch by @xuke537 in https://github.com/VeriSilicon/TIM-VX/pull/321
    • fix some comments of Mish and LRN layer by @gdh1995 in https://github.com/VeriSilicon/TIM-VX/pull/322
    • Add document for customized operator by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/323
    • Fix build warn/error with clang by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/326
    • Refine customized op support by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/327
    • Add ArgMax/ArgMin unit tests by @xuke537 in https://github.com/VeriSilicon/TIM-VX/pull/333
    • add cmake option of custom op support by @zhengzhouheng in https://github.com/VeriSilicon/TIM-VX/pull/335
    • OpenCV offical announcement with TIM-VX support by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/341
    • Add layout inference & layout test for stack by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/337
    • support multi virtual devices by @lileiigithub in https://github.com/VeriSilicon/TIM-VX/pull/331
    • Support specifying alpha in elu by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/354

    New Contributors

    • @SHagerGEL made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/281
    • @yingshengBD made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/286
    • @gdh1995 made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/322

    Full Changelog: https://github.com/VeriSilicon/TIM-VX/compare/v1.1.37...v1.1.42

    Source code(tar.gz)
    Source code(zip)
    aarch64_A311D_6.4.10.2.md5sum.txt(61 bytes)
    aarch64_A311D_6.4.10.2.tgz(13.11 MB)
    aarch64_S905D3_6.4.10.2.md5sum.txt(62 bytes)
    aarch64_S905D3_6.4.10.2.tgz(13.12 MB)
  • v1.1.34.fix(Oct 8, 2021)

  • v1.1.32(Jul 13, 2021)

    Update to v1.1.32

    • Add new layer support: Moments, Matmul, SpatialTransformer
    • 100+ Unit Test cases
    • Add multi-thread and benchmark model examples
    • Bug fixes
    Source code(tar.gz)
    Source code(zip)
  • v1.1.30.3(Jun 8, 2021)

    Add support for layout inference
    Add various unit tests for CI
    Add new op support for:

    • GroupedConv2d
    • ScatterND
    • Unstack
    • Linear
    • UnMaxpool2d
    • MaxpoolWithArgmax
    • LogSoftmax
    • Resize1d
    • FloorDiv
    • DeConv1d
    • Conv1d
    Source code(tar.gz)
    Source code(zip)
  • v1.1.30.2(Apr 6, 2021)

    • Add support for S905D3 SoC (aka VIM3L)
    • Add support for Mish, SoftRelu and HardSigmoid activation Layers
    • Add support for Select Layer
    • Fix a bug in Multiply Layer

    Source code(tar.gz)
    Source code(zip)
  • v1.1.30(Feb 26, 2021)

    02/2021 Update

    • Add support for Deconv2d
    • Add support for NBG (Network Binary Graph)
    • Fix Average Pooling implementation in TIM
    • Various Internal Op update and bug fixes
    Source code(tar.gz)
    Source code(zip)
Owner
VeriSilicon, INC.
A leading Silicon Platform as a Service company

Metal-cpp is a low-overhead C++ interface for Metal that helps developers add Metal functionality to graphics apps, games, and game engines that are written in C++.
Бранимир Караџић 164 Dec 31, 2022

NVRHI (NVIDIA Rendering Hardware Interface) is a library that implements a common abstraction layer over multiple graphics APIs
NVIDIA GameWorks 445 Jan 3, 2023

A modern, feature-rich single header C++ interface system for GLFW
Vortex 3 Dec 27, 2021

Polyscope is a C++/Python viewer and user interface for 3D data such as meshes and point clouds. It allows you to register your data and quickly generate informative and beautiful visualizations, either programmatically or via a dynamic GUI.
Nicholas Sharp 1.3k Dec 30, 2022

Dear ImGui is a bloat-free graphical user interface library for C++
Douglas McCloskey 6 Oct 27, 2020

This is a fast module to probing an area in a 2d plane for physic objects
Strauji 8 Feb 14, 2022

kaun is a replacement for löve's built-in love.graphics module intended for 3D graphics
Joel Schumacher 4 Apr 5, 2021

This module is a simple, lightweight and flexible way to generate QR codes in Godot
Ben Armstrong 18 Oct 20, 2022

Xerus - A general purpose library for numerical calculations with higher order tensors, Tensor-Train Decompositions / Matrix Product States and other Tensor Networks
null 18 Apr 20, 2021

Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models
Amazon Archives 4.4k Dec 30, 2022

Demagnetization tensor of non-equidistant magnetic layers
magnum.af 1 Dec 3, 2021

GPTPU: General-Purpose Computing on (Edge) Tensor Processing Units
Extreme Storage and Computer Architecture Lab 34 Dec 23, 2022

PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
Jiarui Fang 8 Feb 12, 2022

Code accompanying our SIGGRAPH 2021 Technical Communications paper "Transition Motion Tensor: A Data-Driven Approach for Versatile and Controllable Agents in Physically Simulated Environments"
null 10 Apr 21, 2022

Yet another tensor library in C++. It allows direct access to its underlying data buffer, and serializes in JSON. Built on top of zax json parser, C++ structures having tensor members can also be JSON-serialized and deserialized, allowing one to save and load the state of a highly hierarchical object.
Tamas Levente Kis 2 Dec 15, 2022

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
PACMAN Group, Tsinghua University 88 Dec 29, 2022

Legion Low Level Rendering Interface provides a graphics API agnostic rendering interface with minimal CPU overhead and low level access to verbose GPU operations.
Rythe Interactive 25 Dec 6, 2022

This project shows how to interface Nokia 5110 LCD with Esp32 module to show current prices of any cryptocurrency like Bitcoin, Dogecoin, etc
Aniket Katkar 20 Jun 16, 2022

AVR-based frequency counter module with I2C interface.
DoWiD 1 Feb 26, 2022

CLI11 is a command line parser for C++11 and beyond that provides a rich feature set with a simple and intuitive interface.
null 2.4k Dec 30, 2022