Pure C ONNX runtime with zero dependencies for embedded devices

Overview

🤖 cONNXr C ONNX Runtime

An ONNX runtime written in pure C99 with zero dependencies, focused on embedded devices. Run inference on your machine learning models no matter which framework you trained them with and no matter which device you use. This is the perfect way to go on old hardware that doesn't support fancy modern C or C++.

📗 Documentation

Documentation about the project, how to collaborate, the architecture and much more is available here.

🎓 Introduction

This repo contains a pure C99 runtime to run inference on onnx models. You can train your model with your favourite framework (tensorflow, keras, sklearn) and, once trained, export it to a .onnx file that will be used to run inference. This makes the library totally framework agnostic: no matter how you train your model, this repo will run it using the common interface that onnx provides. The runtime was designed for embedded devices, which might not be able to compile newer C++ versions. No GPUs nor HW accelerators, just pure single-threaded C99 code, compatible with almost any embedded device. Dealing with old hardware? This might also be for you.

This project can also be useful if you are working with bare metal hardware that has dedicated accelerators. If that is the case, you might find it useful to reuse the architecture and replace the specific operators with your own.

Note that this project is in a very early stage, so it's not even close to being production ready. Developers are needed, so feel free to get in touch or contribute with a pull request. You can also have a look at the open issues if you want to contribute, especially the ones labelled for beginners. See the contributing section.

🖥 Out of the box examples

Some very well known models are supported out of the box. Just compile the command line tool as follows and call it with two parameters: first the ONNX model, and second the input to run inference on. Note that the input has to be a .pb file. If you have your own model and it's not working, it's probably because it uses an operator that we haven't implemented yet, so feel free to open an issue and we will be happy to help.

make all

CUnit

You may need to install CUnit to be able to build the test suite.

For example, on Ubuntu you can execute

sudo apt-get install libcunit1 libcunit1-doc libcunit1-dev

MNIST

build/connxr test/mnist/model.onnx test/mnist/test_data_set_0/input_0.pb

tiny YOLO v2

build/connxr test/tiny_yolov2/Model.onnx test/tiny_yolov2/test_data_set_0/input_0.pb

super resolution

build/connxr test/super_resolution/super_resolution.onnx test/super_resolution/test_data_set_0/input_0.pb

mobilenet v2

build/connxr test/mobilenetv2-1.0/mobilenetv2-1.0.onnx test/mobilenetv2-1.0/test_data_set_0/input_0.pb

TODO:

Example

If you want to use cONNXr as part of your code, you can either include all the source files in your project and compile them, or link it as a static library; the latter is not supported yet.

#include <stdio.h>
/* plus the cONNXr headers that declare Onnx__ModelProto, Onnx__TensorProto,
   openOnnxFile, openTensorProtoFile, resolve and inference */

int main(void)
{
  /* Open your onnx model */
  Onnx__ModelProto *model = openOnnxFile("model.onnx");

  /* Create your input tensor or load a protocol buffer one */
  Onnx__TensorProto *inp0 = openTensorProtoFile("input0.pb");

  /* Set the input name so it matches the model's first graph input */
  inp0->name = model->graph->input[0]->name;

  /* Create the array of inputs to the model */
  Onnx__TensorProto *inputs[] = { inp0 };

  /* Resolve all inputs and operators */
  resolve(model, inputs, 1);

  /* Run inference on your input */
  Onnx__TensorProto **output = inference(model, inputs, 1);

  /* Print the model output; here we assume the first returned tensor is the graph output */
  for (size_t i = 0; i < output[0]->n_float_data; i++) {
    printf("float_data[%zu] = %f\n", i, output[0]->float_data[i]);
  }

  return 0;
}

🏷 Related Projects

Other C/C++ related projects: onnxruntime, darknet, uTensor, nnom, ELL, plaidML, deepC, onnc

Limitations

  • Only a few basic operators are implemented, so a model that contains an unimplemented operator will fail.
  • Each operator can work with many data types (double, float, int16, int32, ...), but only a few of them are implemented.
  • The reference implementation is float, so you might run into trouble with other types (see the sketch below).
  • As a general note, this project is a proof of concept/prototype, so bear that in mind.
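
As a minimal illustration of that last point, a caller can guard against non-float tensors before reading float_data. This is only a sketch, assuming the protobuf-c generated header name (onnx.pb-c.h) and the enum name protobuf-c produces for ONNX's FLOAT type:

#include "onnx.pb-c.h"  /* protobuf-c generated ONNX definitions (assumed header name) */

/* Returns 1 only for tensors whose elements are stored in float_data. */
static int tensor_is_float(const Onnx__TensorProto *t)
{
    return t != NULL && t->data_type == ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT;
}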

📌 Disclaimer

This project is not associated in any way with ONNX, and it is not an official solution nor officially supported by ONNX. It is just an application built on top of the .onnx format that aims to help people who want to run inference on devices that are not supported by the official runtimes. Use at your own risk.

📗 License

MIT License

Comments
  • Operator generation from onnx

    • onnx as submodule
      • [x] frozen at the latest version
      • [x] wrote Makefile handling onnx build
    • generator script
      • [x] operator header
      • [x] operator doxygen documentation
      • [x] resolve function for operator type
      • [x] global operator stub
      • [x] resolve function for operator name
      • [x] stub for each type combination
      • [x] fallback to installed onnx instead of self-compiled
      • [x] operator subset
      • [x] domain subset
      • [x] version subset
      • [x] old operators
      • [x] ~type subsets~
        • removed, makes no sense to limit this in the generation step
      • [x] return enum
      • [x] domain specific header directory
      • [x] domain in operator name
      • [x] version in operator name
      • [x] domain specific implementation directory
      • [x] domain specific header filename
      • [x] domain specific implementation filename
      • [x] ~switch to replace weakref with hardcoded stub~
        • replaced alias with weak function symbol
      • [x] ~generate headerfiles which include whole domains~
        • mitigated with sets, no need to include whole domains

    Questions

    • how to manage generated files?
      • should be generated with main Makefile, implies building onnx
      • [x] simply add the generated files and update when onnx submodule is updated
        • my choice [alrevuelta: agree. Where do you plan to add the call to the python script?]
    • how to handle the different types?
      • [x] convert current stub into switch for type specific implementations and stub these type implementations [alrevuelta: My choice also. As a random idea, would be nice to have the possibility of not including all the types in the binary. I.e. if I want to run inference on a model that is using only the double implementation, maybe there is a way to "skip" the other types and have a smaller binary, which might be useful in some constrained devices. Just an idea though.]
        • allows partial implementations
        • stub can check if right types were supplied
        • how to resolve different types between inputs?
        • if possible, my choice
      • single implementation handles all types
    • should the operator return value be a custom enum? [alrevuelta: Will we sacrifice a lot of performance? Your suggestion seems fine to me; ENOSYS and ENOMEM are important, but I don't really understand EDOM or ERANGE. If the onnx is correct we should never run into overflow issues, right?]
      • [x] yes, we have cases errno is not sufficient
      • no, we can do everything with errno
        • documents which return values are expected
    • should we respect onnx domains?
      • currently 3 existing domains: '', 'ai.onnx.ml', 'ai.onnx.training'
      • may produce name conflicts in the future
      • [x] rename operators to reflect domains
        • my choice [alrevuelta: Yep, totally agree. I would both rename and restructure dirs]
      • [x] restructure dirs to reflect domains
        • would allow generating domain libraries, my choice
      • [x] rename files to reflect domains
        • not necessary if we split up dirs
    • should we integrate 'old' operators?
      • onnx redefined some operators
      • produces naming conflicts
      • what if our implementations become 'old'?
      • [x] rename operators to reflect version they were introduced
        • my choice [alrevuelta: Agree. Let's make this project as flexible and maintainable as possible]
    opened by nopeslide 33
  • Added elu and identity operators

    Hi, I'm using a modified version of cONNXr for a project, and I needed elu and identity operators. So I added them, and I thought you all might want those as well. I did not include the tests that are normally performed on operators, but it's a straightforward implementation so I'm not worried about it being wrong. I'll add some tests if I have spare time this week.
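
    For reference, the ELU math itself is small. Here is a minimal sketch (not the PR's actual code, and assuming the protobuf-c generated onnx.pb-c.h header used by this repo) that applies it elementwise to a float tensor, with ONNX's default alpha of 1.0 passed in explicitly:

    #include <math.h>        /* expf; remember to link with -lm */
    #include "onnx.pb-c.h"   /* Onnx__TensorProto (assumed header name) */

    /* ELU(x) = x if x > 0, alpha * (exp(x) - 1) otherwise. Assumes y was
       already allocated with the same number of elements as x. */
    static void elu_float(const Onnx__TensorProto *x, Onnx__TensorProto *y, float alpha)
    {
        for (size_t i = 0; i < x->n_float_data; i++) {
            float v = x->float_data[i];
            y->float_data[i] = (v > 0.0f) ? v : alpha * (expf(v) - 1.0f);
        }
    }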

    opened by mdhimes 18
  • Problem with generated resolvers (1/2)

    There is a problem with the autogenerated resolvers (the ones that map a given operator to its function, e.g. argmax to argmax__float).

    Let's use resolve_operator__onnx__argmax__12 as an example. This function returns a given function depending on the type that is used (e.g. operator__onnx__argmax__12__T_tensor_float). The problem is that if one of the functions is not implemented, the linker of course can't find the symbol and reports an error.

    This was introduced in #22 and worked around by commenting out the types that are not implemented, but it should be properly fixed, since in most cases we won't implement all types (float, int, ...) for a given operator.

    Can this be solved with weakrefs, so that if a symbol is not found it automatically falls back to an empty operator stub?
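
    A hedged sketch of that weakref idea (hypothetical names, an assumed operator signature taking a node context, and GCC/Clang-specific weak symbols): declare the type-specific implementation as a weak symbol and fall back to a stub when it was not linked in:

    #include <stddef.h>

    /* Hypothetical types just for this sketch. */
    typedef struct node_context node_context;
    typedef int (*operator_executer)(node_context *ctx);

    /* Weak declaration: resolves to NULL if no float implementation is linked. */
    extern int operator__onnx__argmax__12__T_tensor_float(node_context *ctx)
        __attribute__((weak));

    /* Empty stub returned for unimplemented type combinations. */
    static int operator_stub(node_context *ctx)
    {
        (void)ctx;
        return -1;
    }

    /* Hypothetical, simplified resolver for the float case. */
    operator_executer resolve_argmax_12_float(void)
    {
        if (operator__onnx__argmax__12__T_tensor_float != NULL)
            return operator__onnx__argmax__12__T_tensor_float;
        return operator_stub;
    }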

    @nopeslide

    opened by alrevuelta 14
  • Crawl previous onnx tests

    This PR introduces a new way of testing the operators, relying both on the default onnx test cases and on the possibility of easily writing custom test cases. Testing is therefore divided in two:

    • Official onnx tests: These are the tests that onnx provides. Note that onnx does not keep track of opset versions, so a given release only contains test cases for that release; if an operator changes, its test is overridden by the new one. This PR addresses that by crawling all onnx releases and organising the test cases by domain and version. Note that there are ongoing conversations with onnx regarding this topic.

    • Custom tests: On the other hand, this PR introduces the possibility of writing custom tests in Python (as onnx does) and generating a model with a set of inputs and expected outputs. See the examples in node_custom_case. To generate the expected output, onnx runtime is used, so only the input needs to be provided. With this there is no excuse: a new test can be written in seconds, and its model can be generated by running a simple script.

    The tests are organised in two folders with the same structure:

    • node_official_data, which contains all the official onnx tests, organized by domain and version. Note that this folder can be generated by the get_official_tests.py script, which makes it easy to keep in sync when a new onnx version is released.
    • node_custom_data, which contains the custom tests written by us. These tests are generated from the Python files inside node_custom_case; the script that generates the models is generate_custom_tests.py.

    Both folders share the same file structure (see the image in the PR).

    opened by alrevuelta 13
  • Compile with -Werror

    This is just a quick "get to know cONNXr"-commit. No functional changes.

    Use CFLAGS when compiling connxr & runtest. Fix compiler warnings: unused variables and print format errors.

    opened by kraiskil 12
  • Add onnx as a Python dependency

    Add onnx as a Python dependency using the "requirements.txt" file and include "pip install -r requirements.txt" in the Makefile. This ensures that every time the Python generator script is called it runs with the version specified in the requirements.txt file. No need to have external dependencies in the third_party folder and no need to compile external code. Less stuff to maintain.

    The --onnx option is removed from the Python script and the third_party dependency is also removed. From now on the dependency lives in the requirements file, currently set to 1.7.0.

    opened by alrevuelta 11
  • Overcomplicating stuff?

    Related to the recently merged PR #10 and ongoing work in #11 @nopeslide

    I think we have to stop for a moment and reconsider some of the things that we are doing. Are they worth it?

    Recap of what we've done

    • Both of us liked the idea of being able to access the inputs and attributes with inputs->X or attributes->kernel_shape. This is really convenient and, since the values are preresolved, we don't waste time searching for the tensors/attributes (I don't really know whether that time matters much, though).

    • To achieve the previous point we have to autogenerate a lot of code: all these operator specific contexts, all these new structures and stuff on top. I think it is starting to accumulate. Also, as we discussed, we would need even more generated code to resolve the i/o/attributes, because we need some context specific information (see the discussion).

    • Based on this I think we need to reconsider the solution. The trade-off is quite clear, I would say: a friendly way of accessing the inputs and attributes at the cost of increasing complexity, or a less friendly but much simpler way of accessing them. I am a very pragmatic person, and I think the second option is better.

    My new approach

    • We already have a nice structure that we have neglected: _Onnx__NodeProto. It contains all the information that we need to run an operator. Well, we don't have the TensorProtos, but maybe we can build something on top.

    We already have this:

    struct  _Onnx__NodeProto
    {
      ProtobufCMessage base;
      size_t n_input;
      char **input;
      size_t n_output;
      char **output;
      char *name;
      char *op_type;
      char *domain;
      size_t n_attribute;
      Onnx__AttributeProto **attribute;
      char *doc_string;
    };
    

    We can use it to build this:

    struct node_context
    {
        Onnx__NodeProto     *onnx_node;          /* onnx node proto, as it is */
        Onnx__TensorProto  **inputs;             /* resolved inputs, matching the ones in the node proto */
        Onnx__TensorProto  **outputs;            /* same for the outputs */
        operator_executer    resolved_operator;  /* resolved operator that runs on that node */
    };
    
    • So we can keep the initial idea of resolving the operators before running inference, so we already know which function to call for each node.

    • We will have to search among the inputs/outputs/attributes by name, but this is usually a rather low number (3-5), so I don't think we will lose much performance (see the lookup sketch below). Some operators run convolutions, which are at least O(n^4) and are really the bottleneck here.

    • We can use this node_context as a common interface for all the operators. Since there is no specific context for each operator, we don't have to cast anything. Way simpler.

    • I have the feeling that we are wrapping a wrapper that wraps a wrapper almost recursively, onion-like. We have lots of levels and repeated variables. I don't think it's needed.
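
    As a rough illustration of that lookup, here is a sketch with a hypothetical helper name, building on the node_context struct proposed above:

    #include <string.h>
    #include "onnx.pb-c.h"   /* Onnx__TensorProto (assumed header name) */

    /* Linear search of a node's resolved inputs by name. n_input is usually
       3-5, so this stays negligible next to the operator itself. */
    static Onnx__TensorProto *find_input(struct node_context *ctx, const char *name)
    {
        for (size_t i = 0; i < ctx->onnx_node->n_input; i++) {
            if (ctx->inputs[i] && ctx->inputs[i]->name &&
                strcmp(ctx->inputs[i]->name, name) == 0) {
                return ctx->inputs[i];
            }
        }
        return NULL;
    }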

    Of course, would love to hear your insights.

    opened by alrevuelta 9
  • Add traces with different levels

    Currently there are 3 macros to trace information, TRACE_LEVELXX (see trace.h). However, they are not really used and there is a random mix of normal prints and these macros.

    Task:

    • Use the different TRACE_LEVELXX macros according to the relevance of what is being traced: level 0 for important information, level 1 for more detailed stuff and level 2 for very detailed traces.
    • Replace the existing prints.
    enhancement good first issue 
    opened by alrevuelta 8
  • Makefile: Link to math lib

    Hi, I think the "-lm" should be added in order to use the math lib functions (sqrtf etc).

    P.S. I only ran "make build_cli"; I could not run "make build" yet because I'm currently missing the CUnit framework.

    opened by ilou89 6
  • build fail on linux and mac

    gcc -o build/src/trace.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/trace.c
    gcc -o build/src/utils.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/utils.c
    gcc -o build/src/test/test_utils.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/test/test_utils.c
    gcc -shared -o build/libconnxr.so -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb -std=c99 -Wall -g3 -gdwarf -O2 -fpic -L/home/linuxbrew/.linuxbrew/opt/[email protected]/lib -g -lcunit -lm
    find build/src/ -type f
    /usr/bin/ld: cannot find -lcunit
    collect2: error: ld returned 1 exit status
    Makefile:105: recipe for target 'build/sharedlib' failed
    make: *** [build/sharedlib] Error 1

    uname -a

    Linux faith 5.4.0-53-generic #59~18.04.1-Ubuntu SMP Wed Oct 21 12:14:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

    cat /etc/os-release
    NAME="Ubuntu"
    VERSION="18.04.5 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.5 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic

    opened by iamfaith 5
  • generation finalisation

    • renamed check structure to info structure
    • added info struct to operator sets
    • added info&resolver for operator stub
    • added a dryrun option (run everything but do not touch the filesystem)
    • added a Template base class for code generation
    • refactored all code generation onto this new base class
    • info struct can be used by non-generated generic check functions
    • added these generic check functions
    • removed all check generation

    TODO

    • [x] #25
      • output constraints are ignored
    • [x] #21
      • removed list and made info optional
    opened by nopeslide 5
  • How to convert the model input to a .pb file?

    I have a reinforcement learning model in ONNX format. The input to the model in Python code is a NumPy array. For example, it could be np.zeros((1, observation_size)).astype(np.float32) where observation_size = 4. How can I convert this input to a .pb file? Next, I want to run: 'build/connxr my_model.onnx my_input_0.pb' Thank you.

    opened by AnatoliyZabrovskiy 0
  • example use other input

    In cONNXr/examples/example1/example.c:

    Onnx__TensorProto *inp0set0 = openTensorProtoFile("../test/mnist/test_data_set_0/input_0.pb");
    Onnx__TensorProto *out0set0 = openTensorProtoFile("../test/mnist/test_data_set_0/output_0.pb");

    For the tests this works, because the input image can be saved as a .pb file and read back.

    I want to port cONNXr to an MCU, where the chip only provides data_buf[high][wide][channel]. If I use cONNXr, I have to convert that buffer to a .pb file, and modifying the data interface is very painful! For the model itself I like [ xxd -i xxx.onnx ], which puts the model into a .c/.h file that can be read directly; that is great.

    How can I put data[][][] directly into an Onnx__TensorProto, or is there another interface for the model input?
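
    One way this could look (a hedged sketch, assuming the protobuf-c generated onnx.pb-c.h header and its onnx__tensor_proto__init function; the dimension order below is only an example and must match what the model expects):

    #include <stdint.h>
    #include "onnx.pb-c.h"   /* Onnx__TensorProto, onnx__tensor_proto__init */

    /* Wraps a flat HWC float buffer in a tensor without copying the data. */
    static void wrap_buffer(Onnx__TensorProto *t, float *data_buf,
                            int64_t high, int64_t wide, int64_t channel,
                            char *input_name)
    {
        /* static storage so the tensor can keep pointing at it
           (one tensor at a time in this sketch) */
        static int64_t dims[4];

        onnx__tensor_proto__init(t);
        t->name = input_name;                 /* must match the graph input name */
        t->data_type = ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT;
        dims[0] = 1; dims[1] = high; dims[2] = wide; dims[3] = channel;
        t->n_dims = 4;
        t->dims = dims;
        t->n_float_data = (size_t)(high * wide * channel);
        t->float_data = data_buf;             /* points at the existing buffer */
    }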

    opened by lzlwakeup 1
  • Add operator can be written as a macro without the type

    #define tensorAdd(type, o_C, i_A, i_B)                                   \
    do {                                                                     \
        if (!tensorCheckBroadcasting(i_A, i_B)) {                            \
            TRACE_LEVEL0("invalid broadcasting");                            \
            exit(EXIT_FAILURE);                                              \
        } else {                                                             \
            int *subscript = malloc((o_C)->n_dims * sizeof(int));            \
            for (int i = 0; i < (o_C)->n_##type##_data; i++) {               \
                tensorIdxToSubscript(o_C, subscript, i);                     \
                (o_C)->type##_data[i] =                                      \
                    (i_A)->type##_data[tensorSubscriptToIdx(i_A, subscript)] \
                  + (i_B)->type##_data[tensorSubscriptToIdx(i_B, subscript)];\
            }                                                                \
            free(subscript);                                                 \
        }                                                                    \
    } while (0)

    opened by ChenHuaYou 1
  • modify src/inference.c so that resolve() runs once and does not depend on the inputs

    void resolve(Onnx__ModelProto *model)
    {
    TRACE_ENTRY(1);
    /* Resolving operators and input/outputs. Has to be moved outside of inference */

    TRACE_FATAL(0, model->graph->n_node > MAX_NUM_OF_NODES, "The number of nodes of the model is greater than the hardcoded one");
    model->graph->inputs = malloc(sizeof(Onnx__TensorProto **) * model->graph->n_input);
    
    for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++){
        //printf("node: %s\n",NODE[nodeIdx]->name);
        // Allocate memory for future outputs and set the name
        model->graph->node[nodeIdx]->outputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_output);
        model->graph->node[nodeIdx]->inputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_input);
        for (int i = 0; i < model->graph->node[nodeIdx]->n_output; i++){
            //printf("output: %s\n",NODE[nodeIdx]->output[i]);
            model->graph->node[nodeIdx]->outputs[i] = malloc(sizeof(Onnx__TensorProto));
            init_tensor_proto(model->graph->node[nodeIdx]->outputs[i]);
            model->graph->node[nodeIdx]->outputs[i]->name = strdup(model->graph->node[nodeIdx]->output[i]);
            bool fuck = true;
            // match from model->graph->output
            for(int j=0; j<model->graph->n_output; j++){
                //printf("grap_output: %s\n", model->graph->output[j]->name);
                if(!strcmp(model->graph->output[j]->name,model->graph->node[nodeIdx]->outputs[i]->name)){
                    fuck = false;
                    model->graph->node[nodeIdx]->outputs[i]->n_dims = model->graph->output[j]->type->tensor_type->shape->n_dim;
                    model->graph->node[nodeIdx]->outputs[i]->dims = malloc(sizeof(int64_t *)*model->graph->node[nodeIdx]->outputs[i]->n_dims);
                    for(int k=0; k<model->graph->node[nodeIdx]->outputs[i]->n_dims; k++){
                        model->graph->node[nodeIdx]->outputs[i]->dims[k] = model->graph->output[j]->type->tensor_type->shape->dim[k]->dim_value;
                        model->graph->node[nodeIdx]->outputs[i]->data_type = model->graph->output[j]->type->tensor_type->elem_type;
                    }
                }
            }
            // match from model->graph->value_info
            for(int j=0; j<model->graph->n_value_info; j++){
                //printf("valueinfo: %s\n", model->graph->value_info[j]->name);
                if(!strcmp(model->graph->value_info[j]->name,model->graph->node[nodeIdx]->outputs[i]->name)){
                    fuck = false;
                    model->graph->node[nodeIdx]->outputs[i]->n_dims = model->graph->value_info[j]->type->tensor_type->shape->n_dim;
                    model->graph->node[nodeIdx]->outputs[i]->dims = malloc(sizeof(int64_t *)*model->graph->node[nodeIdx]->outputs[i]->n_dims);
                    for(int k=0; k<model->graph->node[nodeIdx]->outputs[i]->n_dims; k++){
                        model->graph->node[nodeIdx]->outputs[i]->dims[k] = model->graph->value_info[j]->type->tensor_type->shape->dim[k]->dim_value;
                        model->graph->node[nodeIdx]->outputs[i]->data_type = model->graph->value_info[j]->type->tensor_type->elem_type;
                    }
                }
            }
    
            // TODO This is unset at this point but set afterward inside each
            // function. However there is a problem because some node output
            // is some node else input. Hence if the type is unset it can't
            // be resolved. Hardcoded to FLOAT but this is a HUGE TODO
            //model->graph->node[nodeIdx]->outputs[i]->data_type = 1;
        }
    
        // connectNodes
        for (int i = 0; i < model->graph->node[nodeIdx]->n_input; i++)
        {
            connectNodes(model, nodeIdx, i);
            if (model->graph->node[nodeIdx]->inputs[i] && model->graph->node[nodeIdx]->inputs[i]->has_raw_data){
                /* If the tensor has raw data, deserialize it */
                TRACE(1, true, "input %s has raw data", model->graph->node[nodeIdx]->input[i]);
                // TODO: Not tested. Crashing but currently not needed
                convertRawDataOfTensorProto(model->graph->node[nodeIdx]->inputs[i]);
            }
        }
    
        /*** Prototyping ***/
        // Check model->opset_import->has_version must be True
        // More than 1 opset can be imported. Iterate n_opset_import
        // model->opset_import[0]->version
        // TODO Hackish temporal solution. Use opset 12.
        size_t version = 12;
        operator_preparer prepare = operator_set_find_preparer(model->graph->node[nodeIdx]->op_type, version);
        TRACE_FATAL(0, !prepare, "No prepare function could be found for operator '%s' version '%zu'", model->graph->node[nodeIdx]->op_type, version);
        prepare(model->graph->node[nodeIdx]);
        //printf("prepare\n");
        checkNode(model->graph->node[nodeIdx]);
    }
    TRACE_EXIT(1);
    

    }

    Onnx__TensorProto** inference(Onnx__ModelProto *model, Onnx__TensorProto **inputs)
    {
    if (!model->resolved){
        resolve(model);
    }
    int n_bind = 0;
    for (int i = 0; i < model->graph->n_input; i++){
        for (int j = 0; inputs[j]; j++){
            printf("compare input %s <=> %s \n", model->graph->input[i]->name, inputs[j]->name);
            if (!strcmp(model->graph->input[i]->name, inputs[j]->name)){
                *model->graph->inputs[i] = inputs[j];
                n_bind++;
            }
        }
    }
    TRACE_ENTRY(1);
    TRACE(1, true, "The graph has nodes=%zu", model->graph->n_node);

    /* Run inference */
    for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++)
    {
        TRACE(0, true, "Running node %d, operator=%s", nodeIdx, model->graph->node[nodeIdx]->op_type);
        model->graph->node[nodeIdx]->executer(model->graph->node[nodeIdx]);
    }
    
    // TODO
    TRACE_EXIT(1);
    //freeContext(all_context, model);
    return model->graph->node[model->graph->n_node-1]->outputs;
    

    }

    opened by ChenHuaYou 1
  • using macro functions to reduce the execute_operator_***.c files to a single C source file

    Using macro functions would reduce the execute_operator_***.c files to a single C source file, because they differ only in data type and version; the algorithm is almost the same.

    opened by ChenHuaYou 1
  • src/inference.c line 29

    src/inference.c line 29 should be: all_context[nodeIdx].inputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_input);

    opened by ChenHuaYou 1