VeriSilicon Tensor Interface Module for OpenVX

Overview

TIM-VX - Tensor Interface Module for OpenVX

TIM-VX is a software integration module provided by VeriSilicon to facilitate the deployment of neural networks on OpenVX-enabled ML accelerators. It serves as the backend binding for runtime frameworks such as Android NN, TensorFlow Lite, MLIR, TVM, and more.

Main Features

  • Over 130 internal operators with rich format support for both quantized and floating-point data
  • Simplified binding API calls to create tensors and operations (see the sketch after this list)
  • Dynamic graph construction with shape inference support
  • Built-in custom layer extensions
  • A set of utility functions for debugging
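
A minimal sketch of the binding API (illustrative only, not an official sample; it assumes the tim::vx headers exercised by the unit tests shown later on this page, plus a Relu op declared in tim/vx/ops/activations.h):

#include <vector>

#include "tim/vx/context.h"
#include "tim/vx/graph.h"
#include "tim/vx/ops/activations.h"
#include "tim/vx/types.h"

int main() {
  auto ctx = tim::vx::Context::Create();
  auto graph = ctx->CreateGraph();

  // One 4-element float tensor in, same shape out.
  tim::vx::ShapeType shape({4, 1});
  tim::vx::TensorSpec in_spec(tim::vx::DataType::FLOAT32, shape,
                              tim::vx::TensorAttribute::INPUT);
  tim::vx::TensorSpec out_spec(tim::vx::DataType::FLOAT32, shape,
                               tim::vx::TensorAttribute::OUTPUT);
  auto input = graph->CreateTensor(in_spec);
  auto output = graph->CreateTensor(out_spec);

  // Bind a single ReLU operation between the two tensors.
  auto relu = graph->CreateOperation<tim::vx::ops::Relu>();
  (*relu).BindInput(input).BindOutput(output);

  if (!graph->Compile()) return -1;

  std::vector<float> in_data = {-1.0f, 0.0f, 0.5f, 2.0f};
  input->CopyDataToTensor(in_data.data(), in_data.size() * sizeof(float));
  if (!graph->Run()) return -1;

  std::vector<float> out_data(in_data.size());
  output->CopyDataFromTensor(out_data.data());  // expect {0, 0, 0.5, 2}
  return 0;
}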

Roadmap

The TIM-VX roadmap will be published here in the future.

Get started

Build and Run

TIM-VX uses the Bazel build system by default. Install Bazel first to get started.

TIM-VX needs to be compiled and linked against the VeriSilicon OpenVX SDK, which provides the required header files and precompiled libraries. A default linux-x86_64 SDK containing the PC simulation environment is provided; platform-specific SDKs can be obtained from the respective SoC vendors.

To build TIM-VX

bazel build libtim-vx.so
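
To link your own application against the resulting library, a command along these lines should work (the include and library paths here are assumptions; adjust them to your checkout):

g++ -std=c++14 my_app.cc -Iinclude -Lbazel-bin -ltim-vx -o my_app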

To run the LeNet sample

# set VIVANTE_SDK_DIR for runtime compilation environment
export VIVANTE_SDK_DIR=`pwd`/prebuilt-sdk/x86_64_linux

bazel build //samples/lenet:lenet_asymu8_cc
bazel run //samples/lenet:lenet_asymu8_cc

Get familiar with OpenVX spec

To develop with TIM-VX, you first need to get familiar with the OpenVX API and the OpenVX NN Extension API. Please head over to Khronos to read the specs.

Issues
  • Multiple downstream outputs bug

    Sorry for making a duplicate; I'm not sure whether the bug is in vx-delegate or in TIM-VX. https://github.com/VeriSilicon/tflite-vx-delegate/issues/32

    Hi, I think I found a bug in the vx-delegate runtime.

    setup: A311D + Android 9 + TensorFlow Lite with vx-delegate

    Model: a detector that outputs multiple things: bounding boxes, landmarks, probability scores, and feature vectors.

    Problem: the landmark outputs are garbage. How the model computes landmarks:

    input image -> backbone -> FPN -> Conv layers that produces features (OUTPUT 1) -> Conv layers that produces landmarks (OUTPUT 2)

    So if I have two outputs that are downstream one after another, the second output is not calculated and I get garbage output. The problem occurs only with the INT8 graph; if I use the FP32 graph, it works fine with vx-delegate.

    On x86 with standard TFLite (and XNNPACK), everything works fine with both INT8 and FP32 graphs.

    Update: the downstream arrangement is not the issue; even if the landmarks branch of the graph has only the landmarks as outputs, I get garbage. I don't know why, but the landmarks part of the graph is not computed on the NPU.

    What could be the problem? Thanks.

    opened by bkovalenkocomp 31
  • Is SpatialTransformer on rv1126 supported?

    I tried to run the SpatialTransformer (MXNet) operator on an RV1126 using Tengine, but the status of the "SpatialTransformer" operator in TIM-VX is "InternalOnly", with no TIM-VX API implementation. So I can't find a way to add SpatialTransformer NPU support to Tengine. I wonder:

    1. Is the SpatialTransformer (MXNet) operator supported on the RV1126 NPU?
    2. If it is supported, how can I add it to Tengine?
    opened by sky-fun 26
  • Segmentation fault

    Hi, I have an A311D board running Ubuntu (from Khadas). My end goal is to compile TFLite with vx-delegate support. What is the best way to do this?

    Also, I'm trying to compile TIM-VX (libtim-vx.so); the Bazel build seems broken, so I tried CMake. CMake works fine: it compiles and links all targets, but when I run the unit tests (under gdb), I get a segfault from _LoadStates() in libGAL.so. What causes this?

    opened by bkovalenkocomp 17
  • TVM RPC test failed with message: "PLS isn't existed"

    I don't know if this is the right place to ask questions about your TVM fork, but I cannot raise issues in that repo.

    I followed the guide in README.VSI.md to build TVM (on the host, using the x86_64_linux simulation drivers provided here) and the TVM runtime (on the target, using the vendor-provided VIP NPU drivers), then ran the tests in test_vsi_npu, but I got these results:

    logs from TVM C++ RPC tool:

    VsiNpuModule::LoadFromBinary
    LoadFromBinary: nbg size = 593344
    LoadFromBinary: input size = 1
    LoadFromBinary: output size = 1
    VsiNpuModule : DeSerializeTensorSpec
    VsiNpuModule : DeSerializeTensorSpec2
    VsiNpuModule : DeSerializeTensorSpec
    VsiNpuModule : DeSerializeTensorSpec2
    [22:31:35] /home/nullko/Documents/tvm-vsi_npu/apps/cpp_rpc/rpc_env.cc:130: Load module from /home/ubuntu/workspace/rpc/model.so ...
    VsiNpuModule::GetFunction: _lookup_linked_param
    VsiNpuModule::GetFunction: return early
    VsiNpuModule::GetFunction: _lookup_linked_param
    VsiNpuModule::GetFunction: return early
    VsiNpuModule::GetFunction: _lookup_linked_param
    VsiNpuModule::GetFunction: return early
    VsiNpuModule::GetFunction: _lookup_linked_param
    VsiNpuModule::GetFunction: return early
    VsiNpuModule::GetFunction: tvmgen_default_vsi_npu_0
    [     1] PLS isn't existed
    E [compute_node:379]Create node[0] NBG fail
    Process Graph: 0 ms or 0 us
    

    It seems that TVM is able to compile the NBG on the host, but the target runtime cannot execute it. I wonder what causes the "PLS isn't existed" error. Is it because I didn't set some environment variable on the target platform?

    Or maybe your TVM fork is still under development and not ready to be used yet?

    opened by Goose-Bomb 14
  • Handle tensor double free error on x86_64 simulator driver

    I tried to use vsi_nn_AddTensorFromHandle to create a tensor from a buffer allocated by cv::Mat. Every time the program exits, OpenCV reports a double-free error. It seems the passed buffer is freed by the OpenVX driver when the context is deinitialized (the driver is not supposed to free the buffer, since it is a handle), so when OpenCV tries to free the same buffer, a double free occurs.

    I only encountered this problem with the x86_64 simulator driver; when the program runs on the target device using the vendor-provided NPU driver, everything works well.

    Here is a short program to reproduce this error:

    #include <iostream>
    #include <opencv2/core.hpp>
    #include <vsi_nn_pub.h>
    
    int main(int argc, char* argv[]) {
        vsi_status err = VSI_SUCCESS;
    
        auto matIn = cv::Mat(4, 4, CV_32F);
        auto matOut = cv::Mat(4, 4, CV_32F);
    
        cv::randu(matIn, -1.0F, 1.0F);
        matOut.setTo(0.0F);
    
        auto context = vsi_nn_CreateContext();
        auto graph = vsi_nn_CreateGraph(context, 2, 1);
        vsi_nn_SetGraphInputs(graph, nullptr, 1);
        vsi_nn_SetGraphOutputs(graph, nullptr, 1);
    
        vsi_nn_tensor_attr_t attr = {};
        attr.dtype.fmt = VSI_NN_DIM_FMT_NCHW;
        attr.dim_num = 4;
        attr.size[0] = 4;
        attr.size[1] = 4;
        attr.size[2] = 1;
        attr.size[3] = 1;
        attr.dtype.vx_type = VSI_NN_TYPE_FLOAT32;
        attr.dtype.qnt_type = VSI_NN_QNT_TYPE_NONE;
        attr.is_const = 0;
        attr.vtl = 0;
    
        auto tensorIn = vsi_nn_AddTensorFromHandle(
            graph, VSI_NN_TENSOR_ID_AUTO, &attr, matIn.data);
        auto tensorOut = vsi_nn_AddTensorFromHandle(
            graph, VSI_NN_TENSOR_ID_AUTO, &attr, matOut.data);
    
        auto nodeReLU = vsi_nn_AddNode(graph, VSI_NN_OP_RELU, 1, 1, nullptr);
        nodeReLU->uid = 100;
        nodeReLU->input.tensors[0] = tensorIn;
        nodeReLU->output.tensors[0] = tensorOut;
    
        graph->input.tensors[0] = tensorIn;
        graph->output.tensors[0] = tensorOut;
    
        err = vsi_nn_SetupGraph(graph, vx_false_e);
        err = vsi_nn_VerifyGraph(graph);
        err = vsi_nn_rnn_RunGraph(graph);
    
        std::cout << "[Input]\n" << matIn << std::endl;
        std::cout << "[Output]\n" << matOut << std::endl;
    
        vsi_nn_ReleaseGraph(&graph);
        vsi_nn_ReleaseContext(&context);
    
        return err;
    }
    

    callstack:

    raise (raise:49)
    abort (abort:60)
    __libc_message (__libc_message:173)
    malloc_printerr (malloc_printerr:0)
    _int_free (_int_free:455)
    __libc_free (__libc_free:28)
    cv::StdMatAllocator::deallocate(cv::UMatData*) const (cv::StdMatAllocator::deallocate(cv::UMatData*) const:17)
    cv::Mat::~Mat() (cv::Mat::~Mat():26)
    main (/home/nullko/Documents/tim-vx/samples/ncc/test_handle_tensor.cpp:54)
    __libc_start_main (__libc_start_main:53)
    _start (_start:13)
    
    opened by Goose-Bomb 13
  • How to use NNAPI for NPU?

    Hi, I've learned that it's possible to use the TensorFlow Lite NNAPI delegate on a VIM3 Android device.

    I have the /system/lib/libneuralnetworks.so file in my Android 9 OS. How can I make sure the NPU is used? I benchmarked my model, and it seems the NPU is not used during TFLite 8-bit inference, because the speed is 10x slower and there is no difference between per-channel and per-tensor quantized models.

    Also, dmesg shows this after I run my benchmark:

    [ 4907.441064] type=1400 audit(1293888104.720:419): avc: denied { read } for pid=7157 
    comm="benchmar" path="/data/local/tmp/build/model/model.tflite" 
    dev="mmcblk0p20" ino=261748 scontext=u:r:hal_neuralnetworks_default:s0 tcontext=u:object_r:shell_data_file:s0 
    tclass=file permissive=1
    
    android 
    opened by bkovalenkocomp 12
  • Does TIM-VX utilize the NPU of VIM3 (A311D)?

    Hello, I'm going to use TIM-VX with the VIM3 development board. I have some questions, because this is my first time dealing with applications related to ovxlib and the NPU. I'm going to use TIM-VX for research purposes; the goal of the study is to utilize the NPU effectively. I wonder:

    1. Can the NPU of the VIM3 be used through TIM-VX?
    2. If it can be used, can all layers defined in TIM-VX run on the NPU?
    3. Is there any way to check at runtime that the NPU is being utilized?
    4. How much CPU does TIM-VX itself use? TIM-VX seems like a very interesting subject. Thank you for reading!
    opened by janoslim 12
  • Minimal tflite with vx-delegate example

    Hi, could you provide a minimal example of TFLite inference with the vx-delegate applied? I'm not sure how to create and apply the vx-delegate to a TFLite model. Thanks!

    opened by bkovalenkocomp 11
  • Is there a way to deal with graph verification errors?

    Hello, I am using TIM-VX as a TensorFlow Lite delegate and with Tengine, and I encountered a graph verification failure. What can I try when verification fails, and what should I check for this error? Thank you!

    opened by janoslim 11
  • feat(tensor): support external buffer when creating input/output tensors

    Intent: up to now, TIM-VX allocates the necessary tensor buffers in host memory, while vsi_nn_AddTensorFromHandle does accept a non-null data argument. This PR therefore enables a new usage: reusing externally allocated data buffers for input and output tensors.

    This PR extends and replaces #297. I've run a test using a yolov4-tiny-uint8 model (from Tengine) on both x86_64_linux (simulator) and aarch64 (hardware), and the test passed stably.

    API changes

    Add the following public APIs:

    1. virtual bool Tensor::FlushCacheForHandle() = 0;
    2. virtual bool Tensor::InvalidateCacheForHandle() = 0;
    3. virtual void* Tensor::map(bool invalidate_cpu_cache = false) = 0;
    4. virtual void Tensor::unmap() = 0;
    5. virtual std::shared_ptr<Tensor> Graph::CreateIOTensor(const TensorSpec& spec, void* data = nullptr) = 0;

    And add:

    1. corresponding member functions in TensorImpl and GraphImpl classes
    2. TensorImpl::TensorImpl(Graph* graph, const TensorSpec& spec, void* data = nullptr);

    Also, TensorImpl::data_ is redefined from const void * to void *, indicating that it may serve as a cache area and be updated by Tensor::RefillCacheFromHandle and Tensor::map(true).
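
    As a hedged usage sketch of the new APIs (the signatures come from the list above; the buffer sizes, names, and the exact cache-maintenance ordering are my assumptions, not documented semantics):

    #include <memory>
    #include <vector>
    #include "tim/vx/graph.h"
    #include "tim/vx/tensor.h"

    // Hypothetical helper: in_spec/out_spec and the graph's operations are
    // assumed to be set up by the usual code shown elsewhere on this page.
    void RunWithExternalBuffers(const std::shared_ptr<tim::vx::Graph>& graph,
                                const tim::vx::TensorSpec& in_spec,
                                const tim::vx::TensorSpec& out_spec) {
        // Externally owned buffers, e.g. the data pointer of a cv::Mat.
        std::vector<float> in_buf(16), out_buf(16);

        // Wrap the external buffers instead of letting TIM-VX allocate
        // host memory for the I/O tensors.
        auto input = graph->CreateIOTensor(in_spec, in_buf.data());
        auto output = graph->CreateIOTensor(out_spec, out_buf.data());

        // ... bind operations to the tensors, then graph->Compile() ...

        // After the CPU writes into in_buf, flush so the device sees the data.
        input->FlushCacheForHandle();
        graph->Run();
        // Invalidate the CPU cache before reading results from out_buf.
        output->InvalidateCacheForHandle();
    }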

    feature request 
    opened by gdh1995 9
  • [QST] custom op : Some question about kernel resource input data type

    Hi, I have some questions about custom op kernel resources.

    1. Can I use __global float4 *inputA or __global float *inputA instead of __read_only image2d_t inputA? If so, how should I read the data from inputA (inputA[0]?)
    2. Are the __read_only image3d_t and __read_only image1d_t data types supported?
    question 
    opened by zhnin 8
  • Import question about tvm_vsi version!!!

    Hi,

    1. Does the TVM vsi_npu version support full INT8 quantization on the A311D NPU?
    2. I mean: the input is just an ONNX model; could we convert it with TVM and then run the resulting model on the A311D NPU?
    3. Is per-layer or per-channel quantization supported now? If not, when will it be? Thanks!
    4. Is there any detailed document or web link about how to run a CNN model with TVM vsi_npu?

    BR

    opened by 2050airobert 0
  • roialign compute error

    I have an ONNX RoiAlign unit test case that passes on onnxruntime. (The file is too big, so I put it on your FTP server, path: to_verisilicon/shuliu/roialign.zip)

    I added a unit test for RoiAlign based on the ONNX case (the roialign folder needs to be placed under host_build/src/tim): roi_align_test.cc

    /****************************************************************************
    *
    *    Copyright (c) 2022 Vivante Corporation
    *
    *    Permission is hereby granted, free of charge, to any person obtaining a
    *    copy of this software and associated documentation files (the "Software"),
    *    to deal in the Software without restriction, including without limitation
    *    the rights to use, copy, modify, merge, publish, distribute, sublicense,
    *    and/or sell copies of the Software, and to permit persons to whom the
    *    Software is furnished to do so, subject to the following conditions:
    *
    *    The above copyright notice and this permission notice shall be included in
    *    all copies or substantial portions of the Software.
    *
    *    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    *    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    *    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    *    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    *    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    *    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    *    DEALINGS IN THE SOFTWARE.
    *
    *****************************************************************************/
    #include "tim/vx/ops/roi_align.h"
    
    #include "gtest/gtest.h"
    #include "test_utils.h"
    #include "tim/vx/context.h"
    #include "tim/vx/graph.h"
    #include "tim/vx/types.h"
    #include <iostream>
    #include <fstream>
    #include <numeric>
    #include <vector>
    
    TEST(ROI_Align, shape_25_25_256_1_float32) {
        auto ctx = tim::vx::Context::Create();
        auto graph = ctx->CreateGraph();
    
        uint32_t height = 25;
        uint32_t width = 25;
        uint32_t channels = 256;
        uint32_t batch = 1;
        uint32_t num_rois = 1000;
        uint32_t depth = channels;
    
        int32_t out_height = 7;
        int32_t out_width = 7;
        float height_ratio = 0.3125f;
        float width_ratio = 0.3125f;
        int32_t height_sample_num = 2;
        int32_t width_sample_num = 2;
    
        tim::vx::ShapeType input_shape({width, height, channels, batch});  //whcn
        tim::vx::ShapeType regions_shape({4, num_rois});
        tim::vx::ShapeType batch_index_shape({num_rois});
        tim::vx::ShapeType output_shape(
            {(uint32_t)out_width, (uint32_t)out_height, depth, num_rois});
    
        tim::vx::TensorSpec input_spec(tim::vx::DataType::FLOAT32, input_shape,
                                        tim::vx::TensorAttribute::INPUT);
        tim::vx::TensorSpec regions_spec(tim::vx::DataType::FLOAT32, regions_shape,
                                        tim::vx::TensorAttribute::INPUT);
        tim::vx::TensorSpec batch_index_spec(tim::vx::DataType::INT32,
                                            batch_index_shape,
                                            tim::vx::TensorAttribute::INPUT);
        tim::vx::TensorSpec output_spec(tim::vx::DataType::FLOAT32, output_shape,
                                        tim::vx::TensorAttribute::OUTPUT);
    
        auto input_count = std::accumulate(input_shape.begin(),
                                                input_shape.end(), 1, std::multiplies<int64_t>());
    
        float *input_data = new float[input_count];
        std::ifstream in("roialign/test_data_set_0/input_0.bin", std::ios::in | std::ios::binary);
        if(in.is_open())
        {
            in.read((char *)input_data, input_count*sizeof(float));
            std::cout<<"input data:"<<std::endl;
            for(int i=0; i<10; i++)
            {
                std::cout<<input_data[i]<<" ";
            }
            std::cout<<std::endl;
            for(int i=0; i<10; i++)
            {
                std::cout<<input_data[input_count-10+i]<<" ";
            }
            std::cout<<std::endl;
            in.close();
        }
        else
        {
            std::cout<<"open input_0.bin fail!"<<std::endl;
        }
    
        auto regions_count = std::accumulate(regions_shape.begin(),
                                                regions_shape.end(), 1, std::multiplies<int64_t>());
        float *regions_data = new float[regions_count];
        std::ifstream in1("roialign/test_data_set_0/input_1.bin", std::ios::in | std::ios::binary);
        if(in1.is_open())
        {
            in1.read((char *)regions_data, regions_count*sizeof(float));
            std::cout<<"regions data:"<<std::endl;
            for(int i=0; i<10; i++)
            {
                std::cout<<regions_data[i]<<" ";
            }
            std::cout<<std::endl;
            for(int i=0; i<10; i++)
            {
                std::cout<<regions_data[regions_count-10+i]<<" ";
            }
            std::cout<<std::endl;
            in1.close();
        }
        else
        {
            std::cout<<"open input_1.bin fail!"<<std::endl;
        }
    
    
        auto batch_index_count = std::accumulate(batch_index_shape.begin(),
                                                batch_index_shape.end(), 1, std::multiplies<int64_t>());
        int64_t *batch_index_data = new int64_t[batch_index_count];
        std::ifstream in2("roialign/test_data_set_0/input_2.bin", std::ios::in | std::ios::binary);
        if(in2.is_open())
        {
            in2.read((char *)batch_index_data, batch_index_count*sizeof(int64_t));
            std::cout<<"batch_index data:"<<std::endl;
            for(int i=0; i<10; i++)
            {
                std::cout<<batch_index_data[i]<<" ";
            }
            std::cout<<std::endl;
            for(int i=0; i<10; i++)
            {
                std::cout<<batch_index_data[batch_index_count-10+i]<<" ";
            }
            std::cout<<std::endl;
            in2.close();
        }
        else
        {
            std::cout<<"open input_2.bin fail!"<<std::endl;
        }
    
    
        auto out_count = std::accumulate(output_shape.begin(),
                                            output_shape.end(), 1, std::multiplies<int64_t>());
        float *golden = new float[out_count];
        std::vector<float> golden_float(out_count);
        std::ifstream out("roialign/test_data_set_0/output_0.bin", std::ios::in | std::ios::binary);
        if(out.is_open())
        {
            out.read((char *)golden, out_count*sizeof(float));
            for(auto i=0; i<out_count; i++)
            {   
                golden_float[i] = golden[i];
            }
            std::cout<<"output data:"<<std::endl;
            for(int i=0; i<10; i++)
            {
                std::cout<<golden[i]<<" ";
            }
            std::cout<<std::endl;
            for(int i=0; i<10; i++)
            {
                std::cout<<golden[out_count-10+i]<<" ";
            }
            std::cout<<std::endl;
            out.close();
        }
        else
        {
            std::cout<<"open output_0.bin fail!"<<std::endl;
        }
    
    
    
        auto input_tensor = graph->CreateTensor(input_spec);
        auto regions_tensor = graph->CreateTensor(regions_spec);
        auto batch_index_tensor =
            graph->CreateTensor(batch_index_spec);
        auto output_tensor = graph->CreateTensor(output_spec);
    
        auto roi_align = graph->CreateOperation<tim::vx::ops::ROI_Align>(
            out_height, out_width, height_ratio, width_ratio, height_sample_num,
            width_sample_num);
        (*roi_align)
            .BindInput(input_tensor)
            .BindInput(regions_tensor)
            .BindInput(batch_index_tensor)
            .BindOutput(output_tensor);
    
        EXPECT_TRUE(graph->Compile());
    
        input_tensor->CopyDataToTensor(input_data, input_count * sizeof(float));
        regions_tensor->CopyDataToTensor(regions_data, regions_count * sizeof(float));
        // The reference data stores batch indices as int64, while the tensor
        // spec is INT32, so convert element-by-element before uploading.
        std::vector<int32_t> batch_index_i32(batch_index_data,
                                             batch_index_data + batch_index_count);
        batch_index_tensor->CopyDataToTensor(batch_index_i32.data(),
                                             batch_index_count * sizeof(int32_t));
    
        EXPECT_TRUE(graph->Run());
    
        std::vector<float> output(num_rois * out_height * out_width * depth);
        EXPECT_TRUE(output_tensor->CopyDataFromTensor(output.data()));
        EXPECT_EQ(golden_float, output);
    }
    
    

    After running, the result is inconsistent with the expected output.

    opened by MESeraph 4
  • Improve TIM-VX to support PyTorch models, as RKNN does?

    Hi,

    1. As we see, there are frameworks and tools that can transform a PyTorch model into an NPU model, just like RKNN. Could we (or anyone else) provide the tools or code to complete this conversion?
    2. Quantization is tedious work, yet QAT is very useful for us when working on the A311D NPU. Is there any way to directly support PyTorch QAT (quantization-aware training)?
    3. Do we have benchmark statistics for all the CNN frameworks available on the A311D NPU? Which one is the most efficient, how fast are they, and what utilization ratio do they achieve?
    4. Could we support arbitrary subgraph splitting? Which approach should we choose for graph splitting and layer/graph-level acceleration?

    BR

    opened by 2050airobert 1
  • Useless TVM with a311d NPU?

    Hello,

    1. Is there any example of how to use TVM with TIM-VX on the A311D NPU?
    2. Could the fork at https://github.com/VeriSilicon/tvm accelerate execution of TIM-VX models, or provide anything else useful on the A311D NPU?
    3. Is there any example showing the power or usefulness of the VeriSilicon/tvm project with the A311D NPU?

    BR

    opened by 2050airobert 1
Releases
  • v1.1.50(Jul 25, 2022)

    What's Changed

    • Added param "step" for slice & added unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/352
    • Fixed compiler fail for elu by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/358
    • update ovxlib virtual_device patch by @lileiigithub in https://github.com/VeriSilicon/TIM-VX/pull/357
    • Supported specifying alpha and beta by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/356
    • Fixed layout inference bug for stride_slice by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/329
    • refine tim_internal.cmake for ovxlib vip by @lileiigithub in https://github.com/VeriSilicon/TIM-VX/pull/360
    • Added unit test for maxpool by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/361
    • Suported specifying CRD_mode & DCR_mode in depthtospace by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/362
    • Support specifying pad_mode in pad by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/355
    • add BroadcastInDim to internal expand_broadcast op by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/364
    • Added selu & celu & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/366
    • Add Broadcast op by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/365
    • Update operator support plan by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/367
    • Fixed pad layout inference bug by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/370
    • CI enhancement - enable benchmark_model and samples by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/372
    • rename CopyTensorToData to CopyDataFromTensor to align name of tim::v… by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/373
    • add macro VSI_EXPAND_BROADCAST_ENABLE_DIMENSIONS for ovxlib compatibi… by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/374
    • add test demo for multi_device by @lileiigithub in https://github.com/VeriSilicon/TIM-VX/pull/371
    • Fix ci crash by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/380
    • fix bug of param num in custom op by @zhengzhouheng in https://github.com/VeriSilicon/TIM-VX/pull/385
    • Added topk & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/384
    • Added Ceil & unit test by @MESeraph in https://github.com/VeriSilicon/TIM-VX/pull/381
    • Fixed layout inference bug for stack by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/375
    • add macro VSI_EXPAND_BROADCAST_ENABLE_DIMENSIONS for unit test compat… by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/386
    • fix gather_element operation input num issue by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/388
    • Added gather_elements & unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/363
    • add GetElementNum/GetElementByteSize/GetByteSize for TensorSpec by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/393
    • Fixed no-output if transpose is last op and can be optimized by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/395
    • Fix build issue by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/397
    • feat(tensor): support external buffer when creating input/output tensors by @gdh1995 in https://github.com/VeriSilicon/TIM-VX/pull/389
    • Mapped roi_align & added unit test by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/402
    • modify GatherElements by @MESeraph in https://github.com/VeriSilicon/TIM-VX/pull/406
    • Added unidirectional lstm layout inference by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/392
    • Mapped roi_pool & added unit test by @MESeraph in https://github.com/VeriSilicon/TIM-VX/pull/404
    • Update tensorflow to v2.9.0 in ci by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/403
    • add reshape unit test by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/416
    • remove redefinition of TIM_VX_ENABLE_CUSTOM_OP by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/417
    • Added grouped conv2d layout inference by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/419
    • disabled two failed case by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/422
    • Enable SetRoundingPolicy by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/426
    • Disabled 3 failed case by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/428
    • Fixed transpose layout inference bug by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/430
    • Added batch dims in gather by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/435
    • Update internal for 22Q2 release by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/432

    New Contributors

    • @MESeraph made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/381

    Full Changelog: https://github.com/VeriSilicon/TIM-VX/compare/v1.1.42...v1.1.50

    Source code(tar.gz)
    Source code(zip)
  • v1.1.42(Apr 12, 2022)

    What's Changed

    • add alpha & beta parameters for HardSigmoid by @antkillerfarm in https://github.com/VeriSilicon/TIM-VX/pull/265
    • Support that op's all inputs in layout inference are constant by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/264
    • Disable fast mode of graph by @onepick in https://github.com/VeriSilicon/TIM-VX/pull/267
    • Update component diagram and README.md by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/269
    • Support NPU access large memory > 4G by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/280
    • Fix build error with gcc 6.2.0 by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/282
    • Enabled bulding with buildroot toolchain. by @SHagerGEL in https://github.com/VeriSilicon/TIM-VX/pull/281
    • [New API] Add compile_option support - relax_mode by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/285
    • fix compile error in g++5.4 by @yingshengBD in https://github.com/VeriSilicon/TIM-VX/pull/286
    • Install headers to place defined by CMAKE_INSTALL_INCLUDEDIR variable by @robert-kalmar in https://github.com/VeriSilicon/TIM-VX/pull/291
    • enable no bias in FC layout inference by @liyuenan2333 in https://github.com/VeriSilicon/TIM-VX/pull/294
    • Fixed pad bug for grouped_conv1d by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/292
    • Added unit test for stack by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/298
    • Relax tolerance for div_uint8 case by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/303
    • Update reshape to reshape2 by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/310
    • add custom base op and tests by @zhengzhouheng in https://github.com/VeriSilicon/TIM-VX/pull/315
    • Added unit test for batch2space and space2batch by @xuke537 in https://github.com/VeriSilicon/TIM-VX/pull/321
    • fix some comments of Mish and LRN layer by @gdh1995 in https://github.com/VeriSilicon/TIM-VX/pull/322
    • Add document for customized operator by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/323
    • Fix build warn/error with clang by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/326
    • Refine customized op support by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/327
    • Add ArgMax/ArgMin unit tests by @xuke537 in https://github.com/VeriSilicon/TIM-VX/pull/333
    • add cmake option of custom op support by @zhengzhouheng in https://github.com/VeriSilicon/TIM-VX/pull/335
    • OpenCV offical announcement with TIM-VX support by @sunshinemyson in https://github.com/VeriSilicon/TIM-VX/pull/341
    • Add layout inference & layout test for stack by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/337
    • support multi virtual devices by @lileiigithub in https://github.com/VeriSilicon/TIM-VX/pull/331
    • Support specifying alpha in elu by @chxin66 in https://github.com/VeriSilicon/TIM-VX/pull/354

    New Contributors

    • @SHagerGEL made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/281
    • @yingshengBD made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/286
    • @gdh1995 made their first contribution in https://github.com/VeriSilicon/TIM-VX/pull/322

    Full Changelog: https://github.com/VeriSilicon/TIM-VX/compare/v1.1.37...v1.1.42

    Source code(tar.gz)
    Source code(zip)
    aarch64_A311D_6.4.10.2.md5sum.txt(61 bytes)
    aarch64_A311D_6.4.10.2.tgz(13.11 MB)
    aarch64_S905D3_6.4.10.2.md5sum.txt(62 bytes)
    aarch64_S905D3_6.4.10.2.tgz(13.12 MB)
  • v1.1.34.fix(Oct 8, 2021)

  • v1.1.32(Jul 13, 2021)

    Update to v1.1.32

    • Add new layer support: Moments, Matmul, SpatialTransformer
    • 100+ Unit Test cases
    • Add multi-thread and benchmark model examples
    • Bug fixes
    Source code(tar.gz)
    Source code(zip)
  • v1.1.30.3(Jun 8, 2021)

    • Add support for layout inference
    • Add various unit tests for CI
    • Add new OP support for:

    • GroupedConv2d
    • ScatterND
    • Unstack
    • Linear
    • UnMaxpool2d
    • MaxpoolWithArgmax
    • LogSoftmax
    • Resize1d
    • FloorDiv
    • DeConv1d
    • Conv1d
    Source code(tar.gz)
    Source code(zip)
  • v1.1.30.2(Apr 6, 2021)

    • Add support for the S905D3 SoC (aka VIM3L)
    • Add support for Mish, SoftRelu and HardSigmoid activation layers
    • Add support for the Select layer
    • Fix a bug in the Multiply layer

    Source code(tar.gz)
    Source code(zip)
  • v1.1.30(Feb 26, 2021)

    02/2021 Update

    • Add support for Deconv2d
    • Add support for NBG (Network Binary Graph)
    • Fix Average Pooling implementation in TIM
    • Various Internal Op update and bug fixes
    Source code(tar.gz)
    Source code(zip)
Owner
VeriSilicon, INC.
A leading Silicon Platform as a Service company