MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

Overview
MACE


Documentation | FAQ | Release Notes | Roadmap | MACE Model Zoo | Demo | Join Us | Chinese

Mobile AI Compute Engine (or MACE for short) is a deep learning inference framework optimized for mobile heterogeneous computing on Android, iOS, Linux and Windows devices. The design focuses on the following targets:

  • Performance
    • Runtime is optimized with NEON, OpenCL and Hexagon, and the Winograd algorithm is introduced to speed up convolution operations. Initialization is also optimized to be faster.
  • Power consumption
    • Chip-dependent power options, such as big.LITTLE scheduling and Adreno GPU hints, are included as advanced APIs.
  • Responsiveness
    • A UI responsiveness guarantee is sometimes obligatory when running a model. Mechanisms such as automatically breaking OpenCL kernels into small units are introduced to allow better preemption for the UI rendering task.
  • Memory usage and library footprint
    • Graph-level memory allocation optimization and buffer reuse are supported. The core library keeps external dependencies to a minimum to keep the library footprint small.
  • Model protection
    • Model protection has been the highest priority since the beginning of the design. Various techniques are used, such as converting models to C++ code and obfuscating literals.
  • Platform coverage
    • Good coverage of recent Qualcomm, MediaTek, Pinecone and other ARM based chips. CPU runtime supports Android, iOS and Linux.
  • Rich model formats support
    • TensorFlow, Caffe and ONNX model formats are supported.
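To illustrate why the Winograd algorithm in the Performance bullet saves work, here is the textbook 1-D Winograd F(2,3) transform: it produces two outputs of a 3-tap filter with 4 multiplications in the element-wise stage instead of the 6 a direct computation needs. This is a minimal sketch of the general technique, not MACE's NEON or OpenCL kernels (which operate on 2-D tiles); the function names are hypothetical.

```python
def direct_conv_2x3(d, g):
    """Two outputs of a 1-D 3-tap correlation computed directly: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """The same two outputs with Winograd F(2,3): 4 multiplications per output tile."""
    # Filter transform: depends only on the weights, so it can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by the 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]


print(winograd_f23([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]))  # [4.5, 6.0]
```

Because the transforms reorder floating-point additions, Winograd results can differ from direct convolution in the last few bits even though the two are mathematically equal.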

Getting Started

Performance

MACE Model Zoo contains several common neural networks and models, which are built daily against a list of mobile phones. The benchmark results can be found on the CI result page (choose the latest passed pipeline and click the release step to see the benchmark results). For comparisons with other frameworks, see the MobileAIBench project.

Communication

  • GitHub issues: bug reports, usage issues, feature requests
  • Slack: mace-users.slack.com
  • QQ group: 756046893

Contributing

Any kind of contribution is welcome. For bug reports and feature requests, please feel free to open an issue. For code contributions, it is strongly suggested that you open an issue for discussion first. For more details, please refer to the contribution guide.

License

Apache License 2.0.

Acknowledgement

MACE depends on several open source projects located in the third_party directory. In particular, we learned a lot from the following projects during development:

Finally, we also thank the Qualcomm, Pinecone and MediaTek engineering teams for their help.

Join Us

We are hiring.

Issues
  • caffe to mace

    Before you open an issue, please make sure you have tried the following steps:

    1. Make sure your environment is the same as the one described in https://mace.readthedocs.io/en/latest/installation/env_requirement.html.
    2. Have you read the documentation for your use case?
    3. Check whether your issue appears in HOW-TO-DEBUG or FAQ.
    4. The form below must be filled in.

    System information

    • OS Platform and Distribution: Ubuntu 16.04
    • NDK version: r16
    • GCC version: 5.4.0
    • MACE version (Use the command: git describe --long --tags):
    • Python version: 2.7
    • Bazel version: 0.15.0

    Model deploy file (*.yml)

    library_name: cpmCaffe
    target_abis: [armeabi-v7a]
    model_graph_format: file
    model_data_format: file
    models:
      openposecaffe: # model tag, which will be used in model loading and must be specific.
        platform: caffe
        # supports local paths, http:// and https://
        model_file_path: /media/long/data/android/PoseEstimationForMobile/caffemodel/new.prototxt 
        weight_file_path: /media/long/data/android/PoseEstimationForMobile/caffemodel/new.caffemodel
        # sha256_checksum of your model's graph and data files.
        # get the sha256_checksum: sha256sum path/to/your/file
        model_sha256_checksum: 9b335cbfb0acefe8d64237bda267180ec2fdfb24005386c99f01daa4bacf25e6
        weight_sha256_checksum: 6b382bcf12e75b51e635d166b909301c499debfefa4cb1ebf8f9a7c2ee1137cc
        # define your model's interface
        # if there are multiple inputs or outputs, write them as below:
        subgraphs:
          - input_tensors:
              - data
            input_shapes:
              - 1,128,128,3
            output_tensors:
              - detection_out
            output_shapes:
              - 1,1,1,6
        runtime: cpu+gpu
        winograd: 0

    Describe the problem

    A clear and concise description of what the bug is.

    To Reproduce

    Steps to reproduce the problem:

    python tools/converter.py convert --config=/media/long/data/android/PoseEstimationForMobile/release/mace_ymls/cpmCaffe.yml

    Error information / logs

    Please include the full log and/or traceback here.

    Screenshot of the error: https://user-images.githubusercontent.com/30176962/56037277-77861e80-5d61-11e9-92e0-45fea5a972d1.png (see the picture on the website).

    Additional context

    Add any other context about the problem here, e.g., what you have modified about the code.
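The deployment file above pins each model file with a `sha256_checksum` and suggests `sha256sum path/to/your/file` to compute it. Where `sha256sum` may not be available (for example on some Windows or macOS setups), a small Python equivalent built on the standard `hashlib` module produces the same hex digest. This is a minimal sketch; the function name is hypothetical and not part of MACE's tools.

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks so large .pb/.caffemodel files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Same "<digest>  <path>" layout that sha256sum prints.
    for path in sys.argv[1:]:
        print(f"{sha256_of_file(path)}  {path}")
```

Paste the printed digest into `model_sha256_checksum` (graph file) or `weight_sha256_checksum` (Caffe weights).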
    
    opened by chenloveheimei 41
  • When building the Android library, can multiple models be compiled at the same time?

    In mobilenet.yml in the root directory of the Android Demo, I added inception-v3, mobilenet-v2 and resnet-v2-50. I want to compile multiple models in one go; is this supported?

    library_name: mobilenet
    target_abis: [armeabi-v7a]
    model_graph_format: code
    model_data_format: code
    models:
      mobilenet_v1:
        platform: tensorflow
        model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
        model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
        subgraphs:
          - input_tensors:
              - input
            input_shapes:
              - 1,224,224,3
            output_tensors:
              - MobilenetV1/Predictions/Reshape_1
            output_shapes:
              - 1,1001
        runtime: cpu+gpu
        limit_opencl_kernel_time: 0
        nnlib_graph_mode: 0
        obfuscate: 0
        winograd: 0
    
      mobilenet_v2:
        platform: tensorflow
        model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v2/mobilenet-v2-1.0.pb
        model_sha256_checksum: 369f9a5f38f3c15b4311c1c84c032ce868da9f371b5f78c13d3ea3c537389bb4
        subgraphs:
          - input_tensors:
              - input
            input_shapes:
              - 1,224,224,3
            output_tensors:
              - MobilenetV2/Predictions/Reshape_1
            output_shapes:
              - 1,1001
        runtime: cpu+gpu
        limit_opencl_kernel_time: 0
        nnlib_graph_mode: 0
        obfuscate: 0
        winograd: 0
    
      inception_v3:
        platform: tensorflow
        model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/inception-v3/inception-v3.pb
        model_sha256_checksum: 632bb664547c846e7b4e44fe6a2ff9e35df277eb6440b80d9b559302baab1b8d
        subgraphs:
          - input_tensors:
              - input
            input_shapes:
              - 1,299,299,3
            output_tensors:
              - InceptionV3/Predictions/Reshape_1
            output_shapes:
              - 1,1001
        runtime: cpu+gpu
        limit_opencl_kernel_time: 0
        nnlib_graph_mode: 0
        obfuscate: 1
        winograd: 0
    
      resnet_v2_50:
        platform: tensorflow
        model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/resnet-v2-50/resnet-v2-50.pb
        model_sha256_checksum: 713c1a5cbe23d8113f4c013107c8296210b24720b365799023e033c1aa4f9360
        subgraphs:
          - input_tensors:
              - input
            output_tensors:
              - resnet_v2_50/predictions/Reshape_1
            input_shapes:
              - 1,299,299,3
            output_shapes:
              - 1,1001
        runtime: cpu+gpu
        limit_opencl_kernel_time: 0
        nnlib_graph_mode: 0
        obfuscate: 0
        winograd: 0
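The deployment file above lists four models under one `models:` mapping, keyed by model tag, each with its own `subgraphs`. The hypothetical helper below (not part of MACE) sketches a pre-flight check over such a file once it has been loaded into a Python dict, for example by a YAML parser. It assumes that each subgraph must list one shape per input and output tensor and that every shape is a comma-separated list of positive integers; `tools/converter.py` performs its own validation, which may be stricter.

```python
def parse_shape(shape):
    """Parse a shape written as '1,224,224,3' into a list of ints."""
    return [int(dim) for dim in str(shape).split(",")]


def check_deployment(models):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for tag, cfg in models.items():
        for i, subgraph in enumerate(cfg.get("subgraphs", [])):
            for kind in ("input", "output"):
                tensors = subgraph.get(kind + "_tensors", [])
                shapes = subgraph.get(kind + "_shapes", [])
                if len(tensors) != len(shapes):
                    problems.append(
                        f"{tag}: subgraph {i} has {len(tensors)} {kind} tensors "
                        f"but {len(shapes)} {kind} shapes")
                for shape in shapes:
                    try:
                        dims = parse_shape(shape)
                    except ValueError:
                        problems.append(f"{tag}: cannot parse {kind} shape {shape!r}")
                        continue
                    if any(d <= 0 for d in dims):
                        problems.append(f"{tag}: non-positive dimension in {kind} shape {shape!r}")
    return problems


# Two of the models from the deployment file above, as a YAML parser would load them.
models = {
    "mobilenet_v1": {
        "platform": "tensorflow",
        "subgraphs": [{
            "input_tensors": ["input"], "input_shapes": ["1,224,224,3"],
            "output_tensors": ["MobilenetV1/Predictions/Reshape_1"], "output_shapes": ["1,1001"],
        }],
    },
    "resnet_v2_50": {
        "platform": "tensorflow",
        "subgraphs": [{
            "input_tensors": ["input"], "input_shapes": ["1,299,299,3"],
            "output_tensors": ["resnet_v2_50/predictions/Reshape_1"], "output_shapes": ["1,1001"],
        }],
    },
}
print(check_deployment(models) or "no problems found")
```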
    
    opened by yeyupiaoling 34
  • Run example on RK3399 with GPU Fall back to CPU

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16

    • NDK version(e.g., 15c): 17

    • GCC version(if compiling for host, e.g., 5.4.0):

    • MACE version (Use the command: git describe --long --tags): master

    • Python version(2.7): 2.7

    • Bazel version (e.g., 0.13.0): 13.0

    Model deploy file (*.yml)

    ......
    # The name of library
    library_name: libunitest
    target_abis: [arm64-v8a]
    model_graph_format: file
    model_data_format: file
    models:
      teacher: # model tag, which will be used in model loading and must be specific.
        platform: tensorflow
        # path to your tensorflow model's pb file. Supports local paths, http:// and https://
        model_file_path: /home/administrator/Code/share/mace/new_mace/sphereface-sphere_64_bggr_cosface_box_0-975000_features.pb
        # sha256_checksum of your model's pb file.
        # use this command to get the sha256_checksum: sha256sum path/to/your/pb/file
        model_sha256_checksum: f3bdc80bba9160b3344fbaefafff80ff1fc72ca533116b62f780baf258668cb8
        # define your model's interface
        # if there are multiple inputs or outputs, write them as below:
        subgraphs:
          - input_tensors:
              - input
        #     - input1
            input_shapes:
              - 1,64,64,4
        #     - 1,224,224,3
            output_tensors:
              - Resface/Bottleneck/BatchNorm/FusedBatchNorm
        #      - output1
            output_shapes:
              - 1,1,1,512
        #      - 1,1001
        # cpu, gpu or cpu+gpu
        runtime: gpu
        obfuscate: 1
        winograd: 4
    
    
    

    Describe the problem

    I am running a MACE performance test on the RK3399 GPU:

    python3 tools/converter.py convert --config=/home/administrator/Code/share/mace/new_mace/FA_models_3.yml

    python3 tools/converter.py run --config=/home/administrator/Code/share/mace/new_mace/FA_models_3.yml --round=10

    However, it failed to run on the GPU.

    Error information / logs

    python3 tools/converter.py run --config=/home/administrator/Code/share/mace/new_mace/FA_models_3.yml --round=100

    * Build //mace/tools/validation:mace_run_static with ABI arm64-v8a
    WARNING: ignoring http_proxy in environment.
    WARNING: The major revision of the Android NDK referenced by android_ndk_repository rule 'androidndk' is 17. The major revisions supported by Bazel are [10, 11, 12, 13, 14, 15, 16]. Bazel will attempt to treat the NDK as if it was r16. This may cause compilation and linkage problems. Please download a supported NDK version.
    INFO: Analysed target //mace/tools/validation:mace_run_static (1 packages loaded).
    INFO: Found 1 target...
    Target //mace/tools/validation:mace_run_static up-to-date:
      bazel-bin/mace/tools/validation/mace_run_static
    INFO: Elapsed time: 1.407s, Critical Path: 0.03s
    INFO: 0 processes.
    INFO: Build completed successfully, 1 total action
    ('build', '//mace/tools/validation:mace_run_static', '--config', 'android', '--cpu=arm64-v8a', '--define', 'neon=true', '--define', 'openmp=false', '--define', 'opencl=true', '--define', 'quantize=false', '--define', 'hexagon=false', '--define', 'hta=false', '--define', 'apu=false', '--config', 'optimization', '--config', 'symbol_hidden')
    Build done!

              Run model teacher on Firefly-RK3399

    Generate input file: build/libunitest/_tmp/teacher/3e2b51b8ce4724bea30161fb3c936745/Firefly-RK3399_rk3399/arm64-v8a/model_input_input
    Generate input file done.
    * Run 'teacher' with round=100, restart_round=1, tuning=False, out_of_range_check=False, omp_num_threads=(-1,), cpu_affinity_policy=(1,), gpu_perf_hint=(3,), gpu_priority_hint=(3,)
    Push build/libunitest/_tmp/teacher/3e2b51b8ce4724bea30161fb3c936745/Firefly-RK3399_rk3399/arm64-v8a/model_input_input to /data/local/tmp/mace_run
    Push build/libunitest/model/teacher.data to /data/local/tmp/mace_run
    Push build/libunitest/model/teacher.pb to /data/local/tmp/mace_run/teacher.pb
    Push build/libunitest/_tmp/arm64-v8a/mace_run_static to /data/local/tmp/mace_run
    Push /tmp/cmd_file-teacher-1568037556.1706104 to /data/local/tmp/mace_run/cmd_file-teacher-1568037556.1706104
    I mace/tools/validation/mace_run.cc:462] model name: teacher
    I mace/tools/validation/mace_run.cc:463] mace version: v0.11.0-rc0-110-g03362fa
    I mace/tools/validation/mace_run.cc:464] input node: input
    I mace/tools/validation/mace_run.cc:465] input shape: 1,64,64,4
    I mace/tools/validation/mace_run.cc:466] output node: Resface/Bottleneck/BatchNorm/FusedBatchNorm
    I mace/tools/validation/mace_run.cc:467] output shape: 1,1,1,512
    I mace/tools/validation/mace_run.cc:468] input_file: /data/local/tmp/mace_run/model_input
    I mace/tools/validation/mace_run.cc:469] output_file: /data/local/tmp/mace_run/model_out
    I mace/tools/validation/mace_run.cc:470] model_data_file: /data/local/tmp/mace_run/teacher.data
    I mace/tools/validation/mace_run.cc:471] model_file: /data/local/tmp/mace_run/teacher.pb
    I mace/tools/validation/mace_run.cc:472] device: GPU
    I mace/tools/validation/mace_run.cc:473] round: 100
    I mace/tools/validation/mace_run.cc:474] restart_round: 1
    I mace/tools/validation/mace_run.cc:475] gpu_perf_hint: 3
    I mace/tools/validation/mace_run.cc:476] gpu_priority_hint: 3
    I mace/tools/validation/mace_run.cc:477] omp_num_threads: -1
    I mace/tools/validation/mace_run.cc:478] cpu_affinity_policy: 1
    I mace/libmace/mace.cc:430] Creating MaceEngine, MACE version: v0.11.0-rc0-110-g03362fa
    I mace/libmace/mace.cc:469] Initializing MaceEngine
    I mace/libmace/mace.cc:602] Destroying MaceEngine
    I mace/tools/validation/mace_run.cc:519] restart round 0
    E ./mace/utils/tuner.h:170] Failed to read tuned param file: /data/local/tmp/mace_run/libunitest_tuned_opencl_parameter.Firefly-RK3399.rk3399.bin
    I mace/libmace/mace.cc:875] Create MaceEngine from model graph proto and weights data
    I mace/libmace/mace.cc:430] Creating MaceEngine, MACE version: v0.11.0-rc0-110-g03362fa
    E mace/core/kv_storage.cc:109] Failed to read kv store file: /data/local/tmp/mace_run/interior//mace_cl_compiled_program.bin
    W mace/core/runtime/opencl/opencl_runtime.cc:382] Load OpenCL cached compiled kernel file failed. Please make sure the storage directory exist and you have Write&Read permission
    I mace/libmace/mace.cc:469] Initializing MaceEngine
    I mace/core/net_def_adapter.cc:348] Op d019f091 fall back to CPU
    I mace/core/net_def_adapter.cc:348] Op 59ce7a42 fall back to CPU
    I mace/tools/validation/mace_run.cc:268] Create Mace Engine latency: 1891.87 ms
    I mace/tools/validation/mace_run.cc:275] Total init latency: 1892.24 ms
    acc in=65536 output_size =2048
    I mace/tools/validation/mace_run.cc:326] Warm up run
    I mace/tools/validation/mace_run.cc:362] 1st warm up run latency: 2851.67 ms
    I mace/tools/validation/mace_run.cc:369] Run model
    I mace/tools/validation/mace_run.cc:420] Average latency: 785.491 ms
    ========================================================
         capability(CPU)        init      warmup     run_avg
    ========================================================
    time          47.533    1892.243    2851.674     785.491
    I mace/tools/validation/mace_run.cc:442] Write output file /data/local/tmp/mace_run/model_out_Resface_Bottleneck_BatchNorm_FusedBatchNorm with size 2048 done.
    I mace/libmace/mace.cc:602] Destroying MaceEngine
    Running finished!

    Elapse time: 1.647197 minutes.

    * Package libs for libunitest
    Start packaging 'libunitest' libs into build/libunitest/libmace_libunitest.tar.gz
    build/libunitest/model/
    build/libunitest/model/teacher.pb
    build/libunitest/model/gpu/
    build/libunitest/model/teacher.data
    Packaging Done!
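The log above reports two ops falling back to the CPU (their names look hashed, likely because the deployment file sets `obfuscate: 1`). When comparing many runs, a small script can pull those lines and the average latency out of a saved `mace_run` log. This is a hypothetical helper keyed to the exact message formats shown above; other MACE versions may word these messages differently.

```python
import re
import sys

# Patterns copied from the log lines above; adjust them if your MACE version words them differently.
FALLBACK_RE = re.compile(r"Op (\S+) fall back to CPU")
AVG_LATENCY_RE = re.compile(r"Average latency: ([0-9.]+) ms")


def summarize_run_log(text):
    """Return (names of ops that fell back to CPU, average latency in ms or None)."""
    fallback_ops = FALLBACK_RE.findall(text)
    match = AVG_LATENCY_RE.search(text)
    avg_ms = float(match.group(1)) if match else None
    return fallback_ops, avg_ms


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            ops, avg = summarize_run_log(f.read())
        print(f"{path}: {len(ops)} op(s) fell back to CPU: {', '.join(ops) or '-'}")
        print(f"{path}: average latency: {avg} ms")
```

The patterns do not depend on line breaks, so they also work on logs that were pasted as one long line.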
    opened by dimitryn 25
  • WINOGRAD questions

    Before you open an issue, please make sure you have tried the following steps:

    1. Make sure your environment is the same as the one described in https://mace.readthedocs.io/en/latest/installation/env_requirement.html.
    2. Have you read the documentation for your use case?
    3. Check whether your issue appears in HOW-TO-DEBUG or FAQ.
    4. The form below must be filled in.

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
    • NDK version(e.g., 15c): 18b
    • GCC version(if compiling for host, e.g., 5.4.0): 5.4.0
    • MACE version (Use the command: git describe --long --tags): 0.13.0
    • Python version(2.7): 3.6
    • Bazel version (e.g., 0.13.0): 0.16.0
    • CMake version: 3.16.0

    Model deploy file (*.yml)

    # The name of library
    library_name: FD
    target_abis: [arm64-v8a]
    model_graph_format: file
    model_data_format: file
    models:
      RF: # model tag, which will be used in model loading and must be specific.
        platform: caffe
        # path to your tensorflow model's pb file. Supports local paths, http:// and https://
        model_file_path: /models/model.prototxt
        weight_file_path: /models/model.caffemodel
        # sha256_checksum of your model's pb file.
        # use this command to get the sha256_checksum --> sha256sum path/to/your/pb/file
        model_sha256_checksum: 98f9b69a085e7d8f40704ac6b2fedae0fda876fff4658509dde3d74d883a9684
        weight_sha256_checksum: a9f5d4dfe944315511c6070e8556790409ae0f0bd9005c5db66b4fdd5c38b716
        subgraphs:
          - input_tensors:
              - data
            input_shapes:
              - 1,3,112,112
            input_data_formats:
              - NCHW
            output_tensors:
              - fc1
            output_shapes:
              - 1,1,1,512
        obfuscate: 0
        runtime: gpu
        winograd: 4
    

    Describe the problem

    • On the kirin710 device, winograd: 4 is slower than winograd: 2. Why does this happen?
    • On kirin710, the model output differs from the output obtained when running the same model on other devices (e.g. sdm845 or msmnile). Is the Winograd algorithm device-agnostic?
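Outputs from two devices can be compared with similarity metrics rather than exact equality; MACE's `--validate` step reports a `similarity` and an `sqnr` value per output, as in the validation log further down this page. The sketch below uses common definitions (cosine similarity, and signal power over difference power as a plain ratio rather than dB); these are assumptions, not code copied from MACE. It also assumes the `model_out_*` files are raw float32 buffers, which matches the logged sizes (a 1,1,1,512 output written with size 2048 bytes).

```python
import math
from array import array


def load_float32(path):
    """Read a raw float32 output file (native byte order) into a list of floats."""
    data = array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data.tolist()


def cosine_similarity(a, b):
    """Cosine of the angle between two flattened outputs (1.0 means the same direction)."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sqnr(reference, actual):
    """Reference signal power divided by the power of the difference (linear ratio, not dB)."""
    if len(reference) != len(actual):
        raise ValueError("outputs must have the same number of elements")
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, actual))
    return math.inf if noise == 0 else signal / noise
```

For example, load the kirin710 and sdm845 outputs with `load_float32` and pass them to both metrics: a similarity very close to 1.0 with a high SQNR points to rounding-level differences rather than a wrong result.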

    To Reproduce

    Steps to reproduce the problem:

    1. cd /path/to/mace
    2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
    3. python tools/converter.py run --validate --disable_tuning --config_file=/path/to/your/model_deployment_file
    

    Error information / logs

    Please include the full log and/or traceback here.

    Additional context

    Model to reproduce the issue can be found here

    opened by gasgallo 24
  • How do I specify a local cross-compiler (arm-linux)?

    Before you open an issue, please make sure you have tried the following steps:

    1. Make sure your environment is the same as the one described in https://mace.readthedocs.io/en/latest/installation/env_requirement.html.
    2. Have you read the documentation for your use case?
    3. The form below must be filled in.

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    • NDK version(e.g., 15c):
    • GCC version(if compiling for host, e.g., 5.4.0):
    • MACE version (Use the command: git describe --long --tags):
    • Python version(2.7):
    • Bazel version (e.g., 0.13.0):

    Model deploy file (*.yml)

    ......
    

    Describe the problem

    A clear and concise description of what the bug is.

    To Reproduce

    Steps to reproduce the problem:

    1. cd /path/to/mace
    2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
    

    Error information / logs

    Please include the full log and/or traceback here.

    LOGs
    

    Additional context

    Add any other context about the problem here, e.g., what you have modified about the code.

    opened by leeburt 24
  • [Question] Compiled model size increases 4-5 times from v0.11.0-rc0 to v0.12.0

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu16.04 (MACE Docker image)
    • NDK version(e.g., 15c): 18b
    • GCC version(if compiling for host, e.g., 5.4.0): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
    • MACE version (Use the command: git describe --long --tags): 0.12.0 (commit: 90d4b860d573fae8bcdcd986884861e18eec18c9)
    • Python version(2.7): 3.6
    • Bazel version (e.g., 0.13.0): [0.16.0]
    • CMake version: 3.16.0

    Model deploy file (*.yml)

    ......
    

    Describe the problem

    I have a question about the size of a compiled model library. The size of a model library compiled with v0.12.0 increases significantly, by 4x-5x. Two Caffe models and three TensorFlow models are compiled into the library. The table below shows the size of the libraries compiled with v0.11.0-rc0 and v0.12.0.

    | Target ABI  | Library size (v0.11.0-rc0) | Library size (v0.12.0) |
    |:-----------:|:--------------------------:|:----------------------:|
    | armeabi-v7a | 94M                        | 387M                   |
    | arm64-v8a   | 96M                        | 504M                   |

    Is this expected in v0.12.0?

    P.S. The compiled libraries work fine.
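A quick check of the table confirms the "4x-5x" figure. This sketch simply divides the reported sizes (taken as megabytes from the "M" suffix in the table); the names are illustrative.

```python
# Library sizes in MB from the table above: (v0.11.0-rc0, v0.12.0).
SIZES_MB = {"armeabi-v7a": (94, 387), "arm64-v8a": (96, 504)}


def growth_factors(sizes):
    """Ratio of new size to old size per ABI, rounded to two decimals."""
    return {abi: round(new / old, 2) for abi, (old, new) in sizes.items()}


print(growth_factors(SIZES_MB))  # {'armeabi-v7a': 4.12, 'arm64-v8a': 5.25}
```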

    opened by mexeniz 22
  • WIP: Fix reshape GPU image for NCHW model format

    Following issue https://github.com/XiaoMi/mace/issues/595 and specifically @lu229 suggestion https://github.com/XiaoMi/mace/issues/595#issuecomment-599338955, I've implemented a basic data layout transformation to get correct results from Reshape op on GPU when using a model with NCHW data format.

    Using the model and yml file from https://github.com/XiaoMi/mace/issues/595#issuecomment-593762744, now validation is successful.
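Why a data layout transformation is needed can be seen on a tiny tensor: a Reshape written for an NCHW model assumes NCHW element order, but if the runtime holds the same tensor in an NHWC-style layout (assumed here for the GPU path), reinterpreting that buffer directly yields a different element sequence. The sketch below is a simplification on flat Python lists, not the PR's OpenCL code: it reorders an NHWC buffer to NCHW before reshaping. The function names are hypothetical.

```python
def nhwc_to_nchw(flat, n, h, w, c):
    """Reorder a row-major NHWC buffer into row-major NCHW order."""
    if len(flat) != n * h * w * c:
        raise ValueError("buffer size does not match the NHWC shape")
    out = []
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    out.append(flat[((ni * h + hi) * w + wi) * c + ci])
    return out


def reshape_nchw_semantics(flat_nhwc, nhwc_shape, new_shape):
    """Reshape as an NCHW model expects: reorder to NCHW first, then reinterpret the buffer."""
    n, h, w, c = nhwc_shape
    total = 1
    for dim in new_shape:
        total *= dim
    if total != n * h * w * c:
        raise ValueError("reshape must preserve the number of elements")
    # In row-major order a reshape only reinterprets the flat buffer, so the
    # NCHW-ordered list is already the reshaped tensor's contents.
    return nhwc_to_nchw(flat_nhwc, n, h, w, c)


# One image, 1x2 spatial positions, 3 channels; channel k holds the values [2k, 2k + 1].
nhwc = [0, 2, 4, 1, 3, 5]
print(reshape_nchw_semantics(nhwc, (1, 1, 2, 3), (1, 6)))  # [0, 1, 2, 3, 4, 5]
```

Reshaping the NHWC buffer directly to (1, 6) would instead give [0, 2, 4, 1, 3, 5], interleaving the channels, which is the kind of mismatch the validation step would flag.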

    /mace# python tools/converter.py run --config /models/model/model_net3.yml --validate
    CMD> bazel build //mace/proto:mace_py
    WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
    Loading:
    Loading: 0 packages loaded
    Analyzing: target //mace/proto:mace_py (6 packages loaded)
    INFO: Analysed target //mace/proto:mace_py (17 packages loaded).
    INFO: Found 1 target...
    [0 / 7] [-----] BazelWorkspaceStatusAction stable-status.txt
    Target //mace/proto:mace_py up-to-date:
    bazel-genfiles/mace/proto/mace_pb2.py
    INFO: Elapsed time: 2.364s, Critical Path: 0.06s
    INFO: 0 processes.
    INFO: Build completed successfully, 1 total action
    INFO: Build completed successfully, 1 total action
    
    CMD> cp -f bazel-genfiles/mace/proto/mace_pb2.py tools/python/py_proto
    
    CMD> bazel build //third_party/caffe:caffe_py
    WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
    Loading:
    Loading: 0 packages loaded
    Analyzing: target //third_party/caffe:caffe_py (6 packages loaded)
    INFO: Analysed target //third_party/caffe:caffe_py (17 packages loaded).
    INFO: Found 1 target...
    [0 / 1] [-----] BazelWorkspaceStatusAction stable-status.txt
    Target //third_party/caffe:caffe_py up-to-date:
    bazel-genfiles/third_party/caffe/caffe_pb2.py
    INFO: Elapsed time: 2.337s, Critical Path: 0.05s
    INFO: 0 processes.
    INFO: Build completed successfully, 1 total action
    INFO: Build completed successfully, 1 total action
    
    CMD> cp -f bazel-genfiles/third_party/caffe/caffe_pb2.py tools/python/py_proto
    
    * Build //mace/tools:mace_run_static with ABI arm64-v8a
    WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
    WARNING: The major revision of the Android NDK referenced by android_ndk_repository rule 'androidndk' is 19. The major revisions supported by Bazel are [10, 11, 12, 13, 14, 15, 16]. Bazel will attempt to treat the NDK as if it was r16. This may cause compilation and linkage problems. Please download a supported NDK version.
    INFO: Analysed target //mace/tools:mace_run_static (32 packages loaded).
    INFO: Found 1 target...
    Target //mace/tools:mace_run_static up-to-date:
      bazel-bin/mace/tools/mace_run_static
    INFO: Elapsed time: 11.826s, Critical Path: 0.50s
    INFO: 0 processes.
    INFO: Build completed successfully, 1 total action
    ('build', '//mace/tools:mace_run_static', '--config', 'android', '--cpu=arm64-v8a', '--define', 'neon=true', '--define', 'openmp=false', '--define', 'opencl=true', '--define', 'quantize=false', '--define', 'hexagon=false', '--define', 'hta=false', '--define', 'apu=false', '--config', 'optimization', '--config', 'symbol_hidden')
    Build done!
    
    ***********************************************
              Run model model on MI9          
    ***********************************************
    
    Generate input file:  build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a/model_input_data
    Generate input file done.
    * Run 'model' with round=1, restart_round=1, tuning=False, out_of_range_check=False, omp_num_threads=(-1,), cpu_affinity_policy=(1,), gpu_perf_hint=(3,), gpu_priority_hint=(3,)
    Push build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a/model_input_data to /data/local/tmp/mace_run
    Push build/model/model/model.data to /data/local/tmp/mace_run
    Push build/model/model/model.pb to /data/local/tmp/mace_run/model.pb
    Push build/model/_tmp/arm64-v8a/mace_run_static to /data/local/tmp/mace_run
    Push /tmp/cmd_file-model-1584697819.6052978 to /data/local/tmp/mace_run/cmd_file-model-1584697819.6052978
    I mace/tools/mace_run.cc:527] model name: model
    I mace/tools/mace_run.cc:528] mace version: v0.12.0-0-ga610d50
    I mace/tools/mace_run.cc:529] input node: data
    I mace/tools/mace_run.cc:530] input shape: 1,3,160,160
    I mace/tools/mace_run.cc:531] output node: face_rpn_cls_prob_reshape_stride32,face_rpn_bbox_pred_stride32,face_rpn_landmark_pred_stride32,face_rpn_cls_prob_reshape_stride16,face_rpn_bbox_pred_stride16,face_rpn_landmark_pred_stride16,face_rpn_cls_prob_reshape_stride8,face_rpn_bbox_pred_stride8,face_rpn_landmark_pred_stride8
    I mace/tools/mace_run.cc:532] output shape: 1,4,5,5:1,8,5,5:1,20,5,5:1,4,10,10:1,8,10,10:1,20,10,10:1,4,20,20:1,8,20,20:1,20,20,20
    I mace/tools/mace_run.cc:533] input_file: /data/local/tmp/mace_run/model_input
    I mace/tools/mace_run.cc:534] output_file: /data/local/tmp/mace_run/model_out
    I mace/tools/mace_run.cc:535] input dir:
    I mace/tools/mace_run.cc:536] output dir:
    I mace/tools/mace_run.cc:537] model_data_file: /data/local/tmp/mace_run/model.data
    I mace/tools/mace_run.cc:538] model_file: /data/local/tmp/mace_run/model.pb
    I mace/tools/mace_run.cc:539] device: GPU
    I mace/tools/mace_run.cc:540] round: 1
    I mace/tools/mace_run.cc:541] restart_round: 1
    I mace/tools/mace_run.cc:542] gpu_perf_hint: 3
    I mace/tools/mace_run.cc:543] gpu_priority_hint: 3
    I mace/tools/mace_run.cc:544] omp_num_threads: -1
    I mace/tools/mace_run.cc:545] cpu_affinity_policy: 1
    I mace/tools/mace_run.cc:548] limit_opencl_kernel_time: 0
    I mace/tools/mace_run.cc:553] opencl_queue_window_size: 0
    I mace/libmace/mace.cc:464] Creating MaceEngine, MACE version: v0.12.0-0-ga610d50
    I mace/libmace/mace.cc:503] Initializing MaceEngine
    I mace/libmace/mace.cc:636] Destroying MaceEngine
    I mace/tools/mace_run.cc:596] restart round 0
    W ./mace/utils/tuner.h:201] Failed to read tuned param file: /data/local/tmp/mace_run/model_tuned_opencl_parameter.MI9.msmnile.bin
    I mace/libmace/mace.cc:911] Create MaceEngine from model graph proto and weights data
    I mace/libmace/mace.cc:464] Creating MaceEngine, MACE version: v0.12.0-0-ga610d50
    W mace/core/kv_storage.cc:109] Failed to read kv store file: /data/local/tmp/mace_run/interior//mace_cl_compiled_program.bin
    W mace/core/runtime/opencl/opencl_runtime.cc:382] Load OpenCL cached compiled kernel file failed. Please make sure the storage directory exist and you have Write&Read permission
    I mace/libmace/mace.cc:503] Initializing MaceEngine
    I mace/tools/mace_run.cc:269] Create Mace Engine latency: 884.83 ms
    I mace/tools/mace_run.cc:276] Total init latency: 884.993 ms
    I mace/tools/mace_run.cc:370] Warm up run
    I mace/tools/mace_run.cc:406] 1st warm up run latency: 1360.07 ms
    I mace/tools/mace_run.cc:414] Run model
    I mace/tools/mace_run.cc:476] Average latency: 11.777 ms
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride32 with size 400 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride32 with size 800 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride32 with size 2000 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride16 with size 1600 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride16 with size 3200 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride16 with size 8000 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride8 with size 6400 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride8 with size 12800 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride8 with size 32000 done.
    ========================================================
         capability(CPU)        init      warmup     run_avg
    ========================================================
    time          19.788     884.993    1360.074      11.777
    I mace/libmace/mace.cc:636] Destroying MaceEngine
    Running finished!
    
    * Validate with caffe
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride32 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride32 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride32 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride16 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride16 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride16 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride8 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride8 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride8 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/MI9_msmnile/arm64-v8a
    face_rpn_cls_prob_reshape_stride32 MACE VS CAFFE similarity: 0.9999999920146434 , sqnr: 12416689.129010014 , pixel_accuracy: 0.8
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_bbox_pred_stride32 MACE VS CAFFE similarity: 0.9999543587571056 , sqnr: 10949.429334335218 , pixel_accuracy: 1.0
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_landmark_pred_stride32 MACE VS CAFFE similarity: 0.9999657541037306 , sqnr: 13611.449222524909 , pixel_accuracy: 1.0
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_cls_prob_reshape_stride16 MACE VS CAFFE similarity: 0.9999999900582555 , sqnr: 13540919.254218182 , pixel_accuracy: 0.775
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_bbox_pred_stride16 MACE VS CAFFE similarity: 0.9999490543979106 , sqnr: 9740.480377276486 , pixel_accuracy: 0.9625
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_landmark_pred_stride16 MACE VS CAFFE similarity: 0.9999721485332628 , sqnr: 17715.55990542613 , pixel_accuracy: 0.975
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_cls_prob_reshape_stride8 MACE VS CAFFE similarity: 0.9999999887438001 , sqnr: 13026576.736919338 , pixel_accuracy: 0.75
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_bbox_pred_stride8 MACE VS CAFFE similarity: 0.9998798908366177 , sqnr: 3858.796852940529 , pixel_accuracy: 0.95625
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_landmark_pred_stride8 MACE VS CAFFE similarity: 0.9999325048718496 , sqnr: 7404.160247931779 , pixel_accuracy: 0.97
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    Validation done!
    
    Dana service is not available.
    Elapse time: 0.439396 minutes.
    * Build //mace/tools:mace_run_static with ABI arm64-v8a
    WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
    WARNING: The major revision of the Android NDK referenced by android_ndk_repository rule 'androidndk' is 19. The major revisions supported by Bazel are [10, 11, 12, 13, 14, 15, 16]. Bazel will attempt to treat the NDK as if it was r16. This may cause compilation and linkage problems. Please download a supported NDK version.
    INFO: Analysed target //mace/tools:mace_run_static (32 packages loaded).
    INFO: Found 1 target...
    Target //mace/tools:mace_run_static up-to-date:
      bazel-bin/mace/tools/mace_run_static
    INFO: Elapsed time: 12.009s, Critical Path: 0.50s
    INFO: 0 processes.
    INFO: Build completed successfully, 1 total action
    ('build', '//mace/tools:mace_run_static', '--config', 'android', '--cpu=arm64-v8a', '--define', 'neon=true', '--define', 'openmp=false', '--define', 'opencl=true', '--define', 'quantize=false', '--define', 'hexagon=false', '--define', 'hta=false', '--define', 'apu=false', '--config', 'optimization', '--config', 'symbol_hidden')
    Build done!
    
    **************************************************
              Run model model on POCOF1          
    **************************************************
    
    Generate input file:  build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a/model_input_data
    Generate input file done.
    * Run 'model' with round=1, restart_round=1, tuning=False, out_of_range_check=False, omp_num_threads=(-1,), cpu_affinity_policy=(1,), gpu_perf_hint=(3,), gpu_priority_hint=(3,)
    Push build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a/model_input_data to /data/local/tmp/mace_run
    Push build/model/model/model.data to /data/local/tmp/mace_run
    Push build/model/model/model.pb to /data/local/tmp/mace_run/model.pb
    Push build/model/_tmp/arm64-v8a/mace_run_static to /data/local/tmp/mace_run
    Push /tmp/cmd_file-model-1584697859.1764498 to /data/local/tmp/mace_run/cmd_file-model-1584697859.1764498
    I mace/tools/mace_run.cc:527] model name: model
    I mace/tools/mace_run.cc:528] mace version: v0.12.0-0-ga610d50
    I mace/tools/mace_run.cc:529] input node: data
    I mace/tools/mace_run.cc:530] input shape: 1,3,160,160
    I mace/tools/mace_run.cc:531] output node: face_rpn_cls_prob_reshape_stride32,face_rpn_bbox_pred_stride32,face_rpn_landmark_pred_stride32,face_rpn_cls_prob_reshape_stride16,face_rpn_bbox_pred_stride16,face_rpn_landmark_pred_stride16,face_rpn_cls_prob_reshape_stride8,face_rpn_bbox_pred_stride8,face_rpn_landmark_pred_stride8
    I mace/tools/mace_run.cc:532] output shape: 1,4,5,5:1,8,5,5:1,20,5,5:1,4,10,10:1,8,10,10:1,20,10,10:1,4,20,20:1,8,20,20:1,20,20,20
    I mace/tools/mace_run.cc:533] input_file: /data/local/tmp/mace_run/model_input
    I mace/tools/mace_run.cc:534] output_file: /data/local/tmp/mace_run/model_out
    I mace/tools/mace_run.cc:535] input dir:
    I mace/tools/mace_run.cc:536] output dir:
    I mace/tools/mace_run.cc:537] model_data_file: /data/local/tmp/mace_run/model.data
    I mace/tools/mace_run.cc:538] model_file: /data/local/tmp/mace_run/model.pb
    I mace/tools/mace_run.cc:539] device: GPU
    I mace/tools/mace_run.cc:540] round: 1
    I mace/tools/mace_run.cc:541] restart_round: 1
    I mace/tools/mace_run.cc:542] gpu_perf_hint: 3
    I mace/tools/mace_run.cc:543] gpu_priority_hint: 3
    I mace/tools/mace_run.cc:544] omp_num_threads: -1
    I mace/tools/mace_run.cc:545] cpu_affinity_policy: 1
    I mace/tools/mace_run.cc:548] limit_opencl_kernel_time: 0
    I mace/tools/mace_run.cc:553] opencl_queue_window_size: 0
    I mace/libmace/mace.cc:464] Creating MaceEngine, MACE version: v0.12.0-0-ga610d50
    I mace/libmace/mace.cc:503] Initializing MaceEngine
    I mace/libmace/mace.cc:636] Destroying MaceEngine
    I mace/tools/mace_run.cc:596] restart round 0
    W ./mace/utils/tuner.h:201] Failed to read tuned param file: /data/local/tmp/mace_run/model_tuned_opencl_parameter.POCOF1.sdm845.bin
    I mace/libmace/mace.cc:911] Create MaceEngine from model graph proto and weights data
    I mace/libmace/mace.cc:464] Creating MaceEngine, MACE version: v0.12.0-0-ga610d50
    W mace/core/kv_storage.cc:109] Failed to read kv store file: /data/local/tmp/mace_run/interior//mace_cl_compiled_program.bin
    W mace/core/runtime/opencl/opencl_runtime.cc:382] Load OpenCL cached compiled kernel file failed. Please make sure the storage directory exist and you have Write&Read permission
    I mace/libmace/mace.cc:503] Initializing MaceEngine
    I mace/tools/mace_run.cc:269] Create Mace Engine latency: 1204.28 ms
    I mace/tools/mace_run.cc:276] Total init latency: 1204.47 ms
    I mace/tools/mace_run.cc:370] Warm up run
    I mace/tools/mace_run.cc:406] 1st warm up run latency: 1874.9 ms
    I mace/tools/mace_run.cc:414] Run model
    I mace/tools/mace_run.cc:476] Average latency: 12.42 ms
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride32 with size 400 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride32 with size 800 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride32 with size 2000 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride16 with size 1600 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride16 with size 3200 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride16 with size 8000 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride8 with size 6400 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride8 with size 12800 done.
    I mace/tools/mace_run.cc:491] Write output file /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride8 with size 32000 done.
    ========================================================
         capability(CPU)        init      warmup     run_avg
    ========================================================
    time          21.267    1204.469    1874.897      12.420
    I mace/libmace/mace.cc:636] Destroying MaceEngine
    Running finished!
    
    * Validate with caffe
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride32 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride32 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride32 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride16 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride16 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride16 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride8 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride8 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride8 to build/model/_tmp/model/ff8ca0d5edb943c867518c604a0c575d/POCOF1_sdm845/arm64-v8a
    face_rpn_cls_prob_reshape_stride32 MACE VS CAFFE similarity: 0.9999999923376746 , sqnr: 12761169.735631926 , pixel_accuracy: 0.75
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_bbox_pred_stride32 MACE VS CAFFE similarity: 0.9999559880321396 , sqnr: 11251.16938898572 , pixel_accuracy: 1.0
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_landmark_pred_stride32 MACE VS CAFFE similarity: 0.9999370739215658 , sqnr: 7828.408858784128 , pixel_accuracy: 0.98
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_cls_prob_reshape_stride16 MACE VS CAFFE similarity: 0.9999999903230272 , sqnr: 13382509.571867507 , pixel_accuracy: 0.75
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_bbox_pred_stride16 MACE VS CAFFE similarity: 0.9999496464497996 , sqnr: 9698.99300525273 , pixel_accuracy: 0.9625
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_landmark_pred_stride16 MACE VS CAFFE similarity: 0.9999721620823648 , sqnr: 17802.002994858874 , pixel_accuracy: 0.99
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_cls_prob_reshape_stride8 MACE VS CAFFE similarity: 0.9999999887525579 , sqnr: 12572009.393454924 , pixel_accuracy: 0.6875
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_bbox_pred_stride8 MACE VS CAFFE similarity: 0.999877265839963 , sqnr: 3695.563582568223 , pixel_accuracy: 0.91875
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    face_rpn_landmark_pred_stride8 MACE VS CAFFE similarity: 0.9999272805682087 , sqnr: 6862.242260144659 , pixel_accuracy: 0.95
    ******************************************
              Similarity Test Passed          
    ******************************************
    
    Validation done!
    
    Dana service is not available.
    Elapse time: 0.444397 minutes.
    * Package libs for model
    Start packaging 'model' libs into build/model/libmace_model.tar.gz
    build/model/model/
    build/model/model/gpu/
    build/model/model/model.data
    build/model/model/model.pb
    Packaging Done!
    
    --------------------------------------------------------------
                               Library                            
    --------------------------------------------------------------
    |          key           |               value               |
    ==============================================================
    | MACE Model package Path| build/model/libmace_model.tar.gz|
    --------------------------------------------------------------
    

    The code isn't complete yet. I'm not very familiar with the MACE project classes, and I don't know how to determine whether a model requires this data layout transformation. Specifically, how can I get the model data format (NCHW or NHWC) from within mace/ops/opencl/image/reshape.cc, so that the transformation is applied only when needed?

    One additional point concerns performance: adding this data layout transformation naturally costs some time during inference. Is there a better, more efficient way to implement it?

    @lu229 I hope this helps. Please tag anyone else who might be interested in contributing.
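    To make the question concrete: Caffe and ONNX define Reshape on NCHW tensors, so if the GPU image path holds the same tensor in NHWC order (as the need for this transformation suggests), reshaping the NHWC buffer directly regroups the wrong elements. Below is a minimal pure-Python sketch of the transpose → reshape → transpose approach on flat buffers. All function names are hypothetical and this is not MACE's OpenCL kernel. The two full copies per call are a likely source of the extra inference time mentioned above. `layout_transform_needed` is our own shortcut, not a MACE API: when C == 1 or H × W == 1 for both shapes, the two layouts share one flat order and the copies can be skipped.

```python
def _numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n


def _nchw_index(n, c, h, w, C, H, W):
    return ((n * C + c) * H + h) * W + w


def _nhwc_index(n, c, h, w, C, H, W):
    return ((n * H + h) * W + w) * C + c


def _coords(shape):
    N, C, H, W = shape
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    yield n, c, h, w


def nchw_to_nhwc(data, shape):
    """Reorder a flat NCHW buffer into NHWC order. shape is (N, C, H, W)."""
    _, C, H, W = shape
    out = [None] * len(data)
    for n, c, h, w in _coords(shape):
        out[_nhwc_index(n, c, h, w, C, H, W)] = data[_nchw_index(n, c, h, w, C, H, W)]
    return out


def nhwc_to_nchw(data, shape):
    """Inverse of nchw_to_nhwc. shape is still given as (N, C, H, W)."""
    _, C, H, W = shape
    out = [None] * len(data)
    for n, c, h, w in _coords(shape):
        out[_nchw_index(n, c, h, w, C, H, W)] = data[_nhwc_index(n, c, h, w, C, H, W)]
    return out


def layout_transform_needed(in_shape, out_shape):
    """NCHW and NHWC share one flat order when C == 1 or H * W == 1."""
    def same_order(shape):
        _, C, H, W = shape
        return C == 1 or H * W == 1
    return not (same_order(in_shape) and same_order(out_shape))


def reshape_nhwc(data, in_shape, out_shape):
    """Reshape an NHWC buffer with the NCHW semantics of the source model.

    in_shape and out_shape are (N, C, H, W) as written in the Caffe/ONNX graph.
    """
    if not (len(data) == _numel(in_shape) == _numel(out_shape)):
        raise ValueError("reshape must preserve the number of elements")
    if not layout_transform_needed(in_shape, out_shape):
        return list(data)  # flat orders coincide, no round trip through NCHW needed
    nchw = nhwc_to_nchw(data, in_shape)
    # In a row-major NCHW buffer a reshape moves nothing; only the shape changes.
    return nchw_to_nhwc(nchw, out_shape)
```

    For example, an NCHW tensor of shape (1, 2, 1, 2) holding [0, 1, 2, 3] is stored as [0, 2, 1, 3] in NHWC; reshaping it to (1, 4, 1, 1) must yield [0, 1, 2, 3], whereas reinterpreting the NHWC buffer as-is would keep [0, 2, 1, 3].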

    opened by gasgallo 22
  • Segmentation fault for DepthwiseConv2d INT8 (CAFFE)

    Before you open an issue, please make sure you have tried the following steps:

    1. Make sure your environment matches the requirements at https://mace.readthedocs.io/en/latest/installation/env_requirement.html.
    2. Have you read the documentation for your use case?
    3. Check whether your issue appears in HOW-TO-DEBUG or FAQ.
    4. The form below must be filled in.

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
    • NDK version(e.g., 15c): 18b
    • GCC version(if compiling for host, e.g., 5.4.0): 5.4.0
    • MACE version (Use the command: git describe --long --tags): 0.11.0-rc0
    • Python version(2.7): 2.7
    • Bazel version (e.g., 0.13.0): 0.16.0

    Model deploy file (*.yml)

    # The name of library
    library_name: model
    target_abis: [arm64-v8a]
    model_graph_format: file
    model_data_format: file
    models:
      sp: # model tag, used when loading the model; must be unique.
        platform: caffe
        # paths to your caffe model's prototxt and caffemodel files. Supports local paths, http:// and https://
        model_file_path: /models/sp/model-nofc.prototxt
        weight_file_path: /models/sp/model-nofc.caffemodel
        # sha256 checksums of your prototxt and caffemodel files.
        # get each checksum with: sha256sum path/to/your/file
        model_sha256_checksum: 54479f5ec821884f5bfcc03cb1f4558275541c6e80d9f33f65cc58562fffe91b 
        weight_sha256_checksum: e9599be0e9d5a5f08b85f9b98d2a76b55463ecb6820efc3bcdbc3ea0050f62a0 
        subgraphs:
          - input_tensors:
              - data
            input_shapes:
              - 1,3,112,112
            input_data_formats:
              - NCHW
            output_tensors:
              - fc1bn
            output_shapes:
              - 1,1,1,512
        obfuscate: 0
        quantize: 1
        quantize_range_file: /mace/overall_range
        runtime: cpu # cpu, gpu or cpu+gpu or dsp
        winograd: 0
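    The checksum comments in this config rely on the `sha256sum` command-line tool. Where it isn't available, Python's standard `hashlib` produces the same hex digest. The helper name `sha256_of_file` is our own, not part of MACE's tools.

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large .caffemodel files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Same "digest  path" output format as sha256sum.
    for p in sys.argv[1:]:
        print("{}  {}".format(sha256_of_file(p), p))
```

    Passing /models/sp/model-nofc.prototxt and /models/sp/model-nofc.caffemodel as arguments prints the values to paste into `model_sha256_checksum` and `weight_sha256_checksum`.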
    

    Describe the problem

    A segmentation fault occurs when running the quantized (INT8) DepthwiseConv2d op.

    To Reproduce

    Steps to reproduce the problem:

    1. cd /path/to/mace
    2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
    3. python tools/converter.py run --config_file=/path/to/your/model_deployment_file
    

    Error information / logs

    Full log and traceback: https://gist.github.com/gasgallo/619eb23800d7caf46e6e97ed23bfc38a

    Additional context

    The model runs fine without quantization.
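    With `quantize: 1`, the per-tensor ranges in `quantize_range_file` are what the float tensors are mapped into 8-bit integers with. When narrowing down a crash like this one, it can help to check what scale and zero point a given range produces. Below is a minimal sketch of the common asymmetric uint8 scheme, an assumption on our part. MACE's exact rounding and range nudging may differ, and the helper names are hypothetical.

```python
def quant_params(range_min, range_max, num_bits=8):
    """Scale and zero point for asymmetric unsigned quantization of [min, max]."""
    qmin, qmax = 0, (1 << num_bits) - 1
    # Extend the range to include 0.0 so that zero (e.g. padding) is exactly representable.
    range_min = min(range_min, 0.0)
    range_max = max(range_max, 0.0)
    if range_max == range_min:
        # Only possible for an all-zero range; avoid a zero scale.
        return 1.0, qmin
    scale = (range_max - range_min) / float(qmax - qmin)
    zero_point = int(round(qmin - range_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to the nearest code, clamped to [0, 2**num_bits - 1]."""
    q = int(round(x / scale)) + zero_point
    return max(0, min((1 << num_bits) - 1, q))


def dequantize(q, scale, zero_point):
    """Map a code back to the float it represents."""
    return (q - zero_point) * scale
```

    A range recorded as exactly [0, 0] would produce a zero scale without the guard above, and any value outside the recorded range is clamped, which is why ranges collected from unrepresentative inputs can distort quantized results.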

    opened by gasgallo 22
  • Error converting a model with onnx_converter.py

    When converting an ONNX model to MACE, I get the error "global name 'data_type' is not defined" at line 197 of onnx_converter.py ("to": lambda x: data_type.onnx2tf(x)) in the following code. How can I fix this?

    onnx_attr_translator = {
        "axis": lambda x: int(x),
        "axes": lambda x: [int(a) for a in x],
        "dtype": lambda x: data_type.onnx2tf(x),
        "keepdims": lambda x: bool(x),
        "to": lambda x: data_type.onnx2tf(x),
    }
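    The message means `data_type` is not bound in the module's global namespace when the lambda runs. A lambda resolves global names at call time, so onnx_converter.py imports and builds `onnx_attr_translator` without complaint. The error only surfaces when the converter translates an attribute routed through `data_type`, such as the `to` attribute of an ONNX Cast node. The fix is to make sure the module that provides `onnx2tf` is imported under that name at the top of onnx_converter.py. Which module that is depends on the MACE version, so check the imports in your checkout. The sketch below reproduces both cases. `data_type` here is a hypothetical stand-in built from the ONNX TensorProto data-type enum, not MACE's own module.

```python
import types

# ONNX TensorProto.DataType enum values mapped to TensorFlow dtype names.
_ONNX_TO_TF = {
    1: "float32",   # FLOAT
    2: "uint8",     # UINT8
    3: "int8",      # INT8
    5: "int16",     # INT16
    6: "int32",     # INT32
    7: "int64",     # INT64
    9: "bool",      # BOOL
    10: "float16",  # FLOAT16
    11: "float64",  # DOUBLE
}

# Hypothetical stand-in for the module onnx_converter.py refers to as `data_type`.
# In the real converter this name must be imported at module level.
data_type = types.SimpleNamespace(onnx2tf=lambda x: _ONNX_TO_TF[int(x)])

onnx_attr_translator = {
    "axis": lambda x: int(x),
    "axes": lambda x: [int(a) for a in x],
    "dtype": lambda x: data_type.onnx2tf(x),
    "keepdims": lambda x: bool(x),
    "to": lambda x: data_type.onnx2tf(x),
}


def broken_translator():
    # `undefined_module` is never defined. Building the dict still succeeds,
    # because the name is only looked up when the lambda is called.
    return {"to": lambda x: undefined_module.onnx2tf(x)}  # noqa: F821
```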

    opened by Wuqiman 22
  • Caffe model validation fails on MACE v0.12.0 due to low similarity

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04 (MACE Docker image)
    • NDK version(e.g., 15c): 18b
    • GCC version(if compiling for host, e.g., 5.4.0): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
    • MACE version (Use the command: git describe --long --tags): 0.12.0
    • Python version(2.7): 3.6
    • Bazel version (e.g., 0.13.0): [0.16.0]
    • CMake version: 3.16.0

    Model deploy file (*.yml)

    # The name of library
    library_name: libretinanet
    target_abis: [arm64-v8a]
    model_graph_format: code
    model_data_format: code
    models:
      retinanet:
        platform: caffe
        model_file_path: /models/retinanet/retinanet3.prototxt
        weight_file_path: /models/retinanet/retinanet3.caffemodel
        model_sha256_checksum: 638e05fc466737c3b8fc36261adaaff40cbd4de5a8c72a46b37f2b00f01180e1
        weight_sha256_checksum: 6222910a773c693c23b4765baba4ed8427e9f3c11781060918e6282a297a7437
        subgraphs:
          - input_tensors:
              - data
            input_shapes:
              - 1,3,320,320
            input_data_formats:
              - NCHW
            output_tensors:
              - face_rpn_cls_prob_reshape_stride32
              - face_rpn_bbox_pred_stride32
              - face_rpn_landmark_pred_stride32
              - face_rpn_cls_prob_reshape_stride16
              - face_rpn_bbox_pred_stride16
              - face_rpn_landmark_pred_stride16
              - face_rpn_cls_prob_reshape_stride8
              - face_rpn_bbox_pred_stride8
              - face_rpn_landmark_pred_stride8
            output_shapes:
              - 1,4,10,10
              - 1,8,10,10
              - 1,20,10,10
              - 1,4,20,20
              - 1,8,20,20
              - 1,20,20,20
              - 1,4,40,40
              - 1,8,40,40
              - 1,20,40,40
            output_data_formats:
              - NCHW
              - NCHW
              - NCHW
              - NCHW
              - NCHW
              - NCHW
              - NCHW
              - NCHW
              - NCHW
        obfuscate: 0
        runtime: gpu
        winograd: 4
    

    Describe the problem

    With MACE v0.12.0, the output of the converted model differs significantly from the Caffe output, so validation fails. I also ran unit tests in Android Studio and saw the same output difference. With MACE v0.11.0-rc1, however, the output is correct and validation passes.

    ========================================================
         capability(CPU)        init      warmup     run_avg
    ========================================================
    time           7.484     877.331    1323.142      16.371
    I mace/libmace/mace.cc:636] Destroying MaceEngine
    Running finished!
    
    * Validate with caffe
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride32 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride32 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride32 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride16 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride16 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride16 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride8 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride8 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride8 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
    Traceback (most recent call last):
      File "/mace/validate.py", line 459, in <module>
    face_rpn_cls_prob_reshape_stride32 MACE VS CAFFE similarity: 0.7051022200187764 , sqnr: 1.988659806883121 , pixel_accuracy: 0.4
        FLAGS.log_file)
      File "/mace/validate.py", line 371, in validate
        validation_threshold, log_file)
      File "/mace/validate.py", line 262, in validate_caffe_model
        value, validation_threshold, log_file)
      File "/mace/validate.py", line 113, in compare_output
        "", common.StringFormatter.block("Similarity Test Failed"))
    TypeError: summary() takes exactly 1 argument (2 given)
    Traceback (most recent call last):
      File "tools/converter.py", line 1151, in <module>
        flags.func(flags)
      File "tools/converter.py", line 938, in run_mace
        device.run_specify_abi(flags, configs, target_abi)
      File "/mace/tools/device.py", line 782, in run_specify_abi
        log_file=log_file,
      File "/mace/tools/sh_commands.py", line 756, in validate_model
        _fg=True)
      File "/root/.pyenv/versions/3.6.3/lib/python3.6/site-packages/sh.py", line 1413, in __call__
        raise exc
    sh.ErrorReturnCode_1: 
    
      RAN: /usr/bin/docker exec mace_caffe_lastest_validator python -u /mace/validate.py --platform=caffe --model_file=/mace/retinanet3.prototxt --weight_file=/mace/retinanet3.caffemodel --input_file=/mace/model_input --mace_out_file=/mace/model_out --device_type=GPU --input_node=data --output_node=face_rpn_cls_prob_reshape_stride32,face_rpn_bbox_pred_stride32,face_rpn_landmark_pred_stride32,face_rpn_cls_prob_reshape_stride16,face_rpn_bbox_pred_stride16,face_rpn_landmark_pred_stride16,face_rpn_cls_prob_reshape_stride8,face_rpn_bbox_pred_stride8,face_rpn_landmark_pred_stride8 --input_shape=1,3,320,320 --output_shape=1,4,10,10:1,8,10,10:1,20,10,10:1,4,20,20:1,8,20,20:1,20,20,20:1,4,40,40:1,8,40,40:1,20,40,40 --input_data_format=NCHW --output_data_format=NCHW,NCHW,NCHW,NCHW,NCHW,NCHW,NCHW,NCHW,NCHW --validation_threshold=0.995000 --input_data_type=float32 --backend=tensorflow --validation_outputs_data= --log_file=
    
      STDOUT:
    
    
      STDERR:
    
    

    To Reproduce

    Steps to reproduce the problem:

    1. cd /path/to/mace
    2. python tools/converter.py convert --config_file=/models/retinanet/retinanet3.yml
    3. python tools/converter.py run --validate --config_file=/models/retinanet/retinanet3.yml
    

    Error information / logs

    See the gist "MACE v0.12.0 Error log - validation failed" for the full conversion and validation log.
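    The validation step in the log above reports three numbers per output tensor and fails when similarity drops below `--validation_threshold` (0.995 in the `RAN:` command). The trailing `TypeError: summary() takes exactly 1 argument (2 given)` is raised while validate.py reports the failed similarity test, so the underlying problem is the 0.705 similarity, not the traceback itself. Below is a minimal sketch of the first two metrics. It assumes `similarity` is the cosine similarity of the flattened outputs and `sqnr` is the linear ratio of signal power to error power, which is consistent with values in the millions for near-identical outputs. It is not MACE's validate.py, and `pixel_accuracy` is omitted because its definition isn't shown here.

```python
import math


def cosine_similarity(expected, actual):
    """Cosine of the angle between two flattened output tensors."""
    dot = sum(e * a for e, a in zip(expected, actual))
    norm_e = math.sqrt(sum(e * e for e in expected))
    norm_a = math.sqrt(sum(a * a for a in actual))
    if norm_e == 0.0 or norm_a == 0.0:
        return 0.0
    return dot / (norm_e * norm_a)


def sqnr(expected, actual):
    """Signal power divided by error power (a linear ratio, not decibels)."""
    signal = sum(e * e for e in expected)
    noise = sum((e - a) ** 2 for e, a in zip(expected, actual))
    return float("inf") if noise == 0.0 else signal / noise


def similarity_test_passes(expected, actual, threshold=0.995):
    """Mirror of the pass/fail decision printed as 'Similarity Test Passed/Failed'."""
    return cosine_similarity(expected, actual) >= threshold
```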

    Additional context

    For MACE v0.12.0, I applied the workaround from the last answer in https://github.com/XiaoMi/mace/issues/560:

    diff --git a/tools/python/transform/transformer.py b/tools/python/transform/transformer.py
    index bb9154f..bbf14b4 100644
    --- a/tools/python/transform/transformer.py
    +++ b/tools/python/transform/transformer.py
    @@ -1353,7 +1353,7 @@ class Transformer(base_converter.ConverterInterface):
             visited = set()
             sorted_nodes = []
     
    -        output_nodes = self._option.check_nodes.keys()
    +        output_nodes = list(self._option.check_nodes.keys())
             if not self._quantize_activation_info:
                 output_nodes.extend(self._option.output_nodes)
             for output_node in output_nodes:
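    The workaround is needed because this environment runs Python 3.6. There, `dict.keys()` returns a view object without an `extend()` method, so the original line raises `AttributeError` as soon as `output_nodes.extend(...)` runs. Wrapping the view in `list()` restores the Python 2 behavior. A minimal illustration follows. The function names and sample dict are hypothetical; only the two `output_nodes = ...` lines come from the diff.

```python
def collect_output_nodes_old(check_nodes, extra_outputs):
    # Original line: fine on Python 2, where keys() returns a new list.
    output_nodes = check_nodes.keys()
    output_nodes.extend(extra_outputs)  # dict_keys has no extend() on Python 3
    return output_nodes


def collect_output_nodes_fixed(check_nodes, extra_outputs):
    # Patched line: copy the keys into a real list before extending it.
    output_nodes = list(check_nodes.keys())
    output_nodes.extend(extra_outputs)
    return output_nodes
```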
    
    opened by mexeniz 21
  • yolo3 output is incorrect

    Before you open an issue, please make sure you have tried the following steps:

    1. Make sure your environment matches the requirements at https://mace.readthedocs.io/en/latest/installation/env_requirement.html.
    2. Have you read the documentation for your use case?
    3. Check whether your issue appears in HOW-TO-DEBUG or FAQ.
    4. The form below must be filled in.

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS 7
    • NDK version(e.g., 15c): 17b
    • GCC version(if compiling for host, e.g., 5.4.0):
    • MACE version (Use the command: git describe --long --tags): v0.11.0-rc0-0-g2d650b6
    • Python version(2.7): 2.7
    • Bazel version (e.g., 0.13.0): 0.13.0

    Model deploy file (*.yml)

    library_name: yolo-v3
    target_abis: [armeabi-v7a, arm64-v8a]
    model_graph_format: code
    model_data_format: code
    models:
      yolo_v3:
        platform: tensorflow
        model_file_path: /MACE/mace-models/yolo-v3/frozen_darknet_yolov3_model.pb
        model_sha256_checksum: 90d96d1e07340bce8e250dd66bbcb05e2965bad7e22d0f875aa65d97e11113f1
        subgraphs:
          - input_tensors:
              - inputs:0
            input_shapes:
              - 1,416,416,3
            output_tensors:
              - detector/yolo-v3/Conv_6/BiasAdd:0
              - detector/yolo-v3/Conv_14/BiasAdd:0
              - detector/yolo-v3/Conv_22/BiasAdd:0
            output_shapes:
              - 1,13,13,42
              - 1,26,26,42
              - 1,52,52,42
        runtime: cpu+gpu
        limit_opencl_kernel_time: 0
        nnlib_graph_mode: 0
        obfuscate: 0
        winograd: 0

    Describe the problem

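    Question 1: The model runs on the GPU, but on the CPU it always crashes.
    Question 2: Is there an example of parsing the yolo3 output?
    Question 3: After running the model on the GPU, the output data all look like this:
    NaN NaN 0.0 0.0 -4.25229E-39 1.8E-44 -1.4E-45 0.0 0.0 -1.4E-45 0.0 0.0 -1.4E-45 0.0 0.0 -1.4E-45 0.0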
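    For question 2, here is a minimal decoding sketch for one output head, written in Python rather than the demo's Java. It assumes each `Conv_*/BiasAdd:0` output is a raw YOLOv3 detection head in NHWC order, that each cell's channels are grouped per anchor as (tx, ty, tw, th, objectness, class scores), and that the standard YOLOv3 anchors apply. All of these, and the function names, are assumptions to check against the model. Under that layout, 42 channels per cell = 3 × (5 + 9), i.e. 9 classes, while an 80-class COCO head would have 3 × (5 + 80) = 255 channels. The class count is therefore worth comparing with the 80-entry label list the demo loads in the log below. Non-maximum suppression is left out.

```python
import math

# Standard YOLOv3 anchors in input pixels (assumed; use your model's own).
ANCHORS_13 = [(116, 90), (156, 198), (373, 326)]
ANCHORS_26 = [(30, 61), (62, 45), (59, 119)]
ANCHORS_52 = [(10, 13), (16, 30), (33, 23)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_yolo_head(flat, grid_h, grid_w, anchors, num_classes,
                     input_w=416, input_h=416, score_threshold=0.5):
    """Decode one raw YOLOv3 output (batch 1, NHWC, row-major) into detections.

    Returns ((center_x, center_y, width, height), class_id, score) tuples with
    box values normalized to the input image size. No non-maximum suppression.
    """
    group = 5 + num_classes
    channels = len(anchors) * group
    if len(flat) != grid_h * grid_w * channels:
        raise ValueError("output size does not match grid, anchors and classes")
    detections = []
    for cy in range(grid_h):
        for cx in range(grid_w):
            base = (cy * grid_w + cx) * channels
            for a, (anchor_w, anchor_h) in enumerate(anchors):
                p = flat[base + a * group: base + (a + 1) * group]
                tx, ty, tw, th, tobj = p[:5]
                # Box center: sigmoid offset inside the cell, normalized by grid size.
                bx = (sigmoid(tx) + cx) / grid_w
                by = (sigmoid(ty) + cy) / grid_h
                # Box size: anchor scaled by exp(), normalized by input size.
                bw = anchor_w * math.exp(tw) / input_w
                bh = anchor_h * math.exp(th) / input_h
                class_probs = [sigmoid(s) for s in p[5:]]
                best = max(range(num_classes), key=lambda k: class_probs[k])
                score = sigmoid(tobj) * class_probs[best]
                if score >= score_threshold:
                    detections.append(((bx, by, bw, bh), best, score))
    return detections
```

    For the 13 × 13 head this would be called as `decode_yolo_head(output, 13, 13, ANCHORS_13, 9)`, and likewise with `ANCHORS_26` and `ANCHORS_52` for the other two outputs, before merging the results and running NMS.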

    To Reproduce

    Steps to reproduce the problem:

    1. cd /path/to/mace
    2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
    

    Error information / logs

    Log:
    06-21 15:21:21.144 29261-29261/? I/art: Late-enabling -Xcheck:jni
    06-21 15:21:21.174 29261-29267/? I/art: Debugger is no longer active
    06-21 15:21:21.204 29261-29261/? W/ActivityThread: Application com.xiaomi.mace.demo is waiting for the debugger on port 8100...
    06-21 15:21:21.204 29261-29261/? I/System.out: Sending WAIT chunk
    06-21 15:21:25.494 29261-29267/com.xiaomi.mace.demo I/art: Debugger is active
    06-21 15:21:25.684 29261-29261/com.xiaomi.mace.demo I/System.out: Debugger has connected
    06-21 15:21:25.684 29261-29261/com.xiaomi.mace.demo I/System.out: waiting for debugger to settle...
    06-21 15:21:25.884 29261-29261/com.xiaomi.mace.demo I/System.out: waiting for debugger to settle...
    06-21 15:21:26.084 29261-29261/com.xiaomi.mace.demo I/System.out: waiting for debugger to settle...
    06-21 15:21:26.284 29261-29261/com.xiaomi.mace.demo I/System.out: waiting for debugger to settle...
    06-21 15:21:26.494 29261-29261/com.xiaomi.mace.demo I/System.out: waiting for debugger to settle...
    06-21 15:21:26.694 29261-29261/com.xiaomi.mace.demo I/System.out: waiting for debugger to settle...
    06-21 15:21:26.894 29261-29261/com.xiaomi.mace.demo I/System.out: waiting for debugger to settle...
    06-21 15:21:27.094 29261-29261/com.xiaomi.mace.demo I/System.out: waiting for debugger to settle...
    06-21 15:21:27.294 29261-29261/com.xiaomi.mace.demo I/System.out: debugger has settled (1451)
    06-21 15:21:27.904 29261-29261/com.xiaomi.mace.demo I/InstantRun: starting instant run server: is main process
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = person
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = bicycle
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = car
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = motorbike
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = aeroplane
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = bus
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = train
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = truck
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = boat
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = traffic light
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = fire hydrant
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = stop sign
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = parking meter
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = bench
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = bird
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = cat
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = dog
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = horse
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = sheep
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = cow
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = elephant
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = bear
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = zebra
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = giraffe
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = backpack
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = umbrella
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = handbag
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = tie
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = suitcase
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = frisbee
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = skis
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = snowboard
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = sports ball
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = kite
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = baseball bat
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = baseball glove
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = skateboard
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = surfboard
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = tennis racket
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = bottle
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = wine glass
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = cup
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = fork
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = knife
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = spoon
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = bowl
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = banana
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = apple
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = sandwich
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = orange
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = broccoli
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = carrot
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = hot dog
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = pizza
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = donut
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = cake
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = chair
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = sofa
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = pottedplant
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = bed
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = diningtable
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = toilet
    06-21 15:21:27.964 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = tvmonitor
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = laptop
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = mouse
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = remote
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = keyboard
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = cell phone
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = microwave
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = oven
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = toaster
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = sink
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = refrigerator
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = book
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = clock
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = vase
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = scissors
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = teddy bear
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = hair drier
    06-21 15:21:27.974 29261-29313/com.xiaomi.mace.demo D/labelCache: readLine = toothbrush
    06-21 15:21:27.994 29261-29261/com.xiaomi.mace.demo D/PhoneWindow: DEBUG_SYSTEMUI:origin statusbar style
    06-21 15:21:27.994 29261-29261/com.xiaomi.mace.demo D/PhoneWindow: DEBUG_SYSTEMUI:windowDrawsFlag set
    06-21 15:21:27.994 29261-29261/com.xiaomi.mace.demo D/PhoneWindow: DEBUG_SYSTEMUI:IconColor=1
    06-21 15:21:27.994 29261-29261/com.xiaomi.mace.demo D/PhoneWindow: DEBUG_SYSTEMUI:StatusBarColor final set ff303f9f
    06-21 15:21:28.064 29261-29314/com.xiaomi.mace.demo W/linker: /data/app/com.xiaomi.mace.demo-1/lib/arm/libmace_mobile_jni.so: unused DT entry: type 0x1d arg 0x1e0aa
    06-21 15:21:28.474 29261-29314/com.xiaomi.mace.demo I/APPModel: maceMobilenetCreateGPUContext result = 0
    06-21 15:21:28.474 29261-29314/com.xiaomi.mace.demo I/image_classify attrs: gpu perf: 3, priority: 3
    06-21 15:21:28.474 29261-29314/com.xiaomi.mace.demo I/image_classify attrs: device: 2
    06-21 15:21:28.494 29261-29314/com.xiaomi.mace.demo E/MACE: env.cc:86 failed to open 
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 06-21 15:21:28.494 29261-29314/com.xiaomi.mace.demo I/MACE: env.cc:97 LinuxBaseEnv::GetCPUMaxFreq(max_freqs) failed with error: Runtime error 06-21 15:21:28.494 29261-29314/com.xiaomi.mace.demo E/MACE: thread_pool.cc:140 Fail to get cpu max frequencies 06-21 15:21:28.494 29261-29314/com.xiaomi.mace.demo E/MACE: thread_pool.cc:79 CPU core is empty 06-21 15:21:28.494 29261-29314/com.xiaomi.mace.demo I/MACE: mace.cc:431 Creating MaceEngine, MACE version: v0.11.0-rc1-0-gd9406dd 06-21 15:21:28.494 29261-29314/com.xiaomi.mace.demo E/MACE: env.cc:86 failed to open /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 06-21 15:21:28.494 29261-29314/com.xiaomi.mace.demo I/MACE: env.cc:97 LinuxBaseEnv::GetCPUMaxFreq(max_freqs) failed with error: Runtime error 06-21 15:21:28.494 29261-29314/com.xiaomi.mace.demo I/MACE: cpu_runtime.cc:99 GetCPUMaxFreq(&cpu_max_freqs) failed with error: Runtime error 06-21 15:21:28.504 29261-29315/com.xiaomi.mace.demo D/OpenGLRenderer: Use EGL_SWAP_BEHAVIOR_PRESERVED: true 06-21 15:21:29.714 29261-29315/com.xiaomi.mace.demo I/Adreno: QUALCOMM build : 31f65f7, Ia3ef73d9d4 Build Date : 12/16/16 OpenGL ES Shader Compiler Version: XE031.08.00.00 Local Branch : Remote Branch : Remote Branch : Reconstruct Branch : 06-21 15:21:30.014 29261-29315/com.xiaomi.mace.demo I/OpenGLRenderer: Initialized EGL, version 1.4 06-21 15:21:30.304 29261-29261/com.xiaomi.mace.demo I/Choreographer: Skipped 68 frames! The application may be doing too much work on its main thread. 
06-21 15:21:30.664 29261-29314/com.xiaomi.mace.demo I/MACE: mace.cc:470 Initializing MaceEngine 06-21 15:21:31.064 29261-29314/com.xiaomi.mace.demo E/MACE: env.cc:86 failed to open /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 06-21 15:21:31.064 29261-29314/com.xiaomi.mace.demo I/MACE: env.cc:97 LinuxBaseEnv::GetCPUMaxFreq(max_freqs) failed with error: Runtime error 06-21 15:21:31.064 29261-29314/com.xiaomi.mace.demo I/MACE: cpu_runtime.cc:99 GetCPUMaxFreq(&cpu_max_freqs) failed with error: Runtime error 06-21 15:21:31.064 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/transpose fall back to CPU 06-21 15:21:31.064 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/truediv fall back to CPU 06-21 15:21:31.094 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/yolo-v3/transpose_1 fall back to CPU 06-21 15:21:31.094 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/yolo-v3/ResizeNearestNeighbor fall back to CPU 06-21 15:21:31.094 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/yolo-v3/transpose_2 fall back to CPU 06-21 15:21:31.094 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/yolo-v3/concat_3 fall back to CPU 06-21 15:21:31.094 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/yolo-v3/transpose_4 fall back to CPU 06-21 15:21:31.094 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/yolo-v3/ResizeNearestNeighbor_1 fall back to CPU 06-21 15:21:31.094 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/yolo-v3/transpose_5 fall back to CPU 06-21 15:21:31.094 29261-29314/com.xiaomi.mace.demo I/MACE: net_def_adapter.cc:348 Op detector/yolo-v3/concat_7 fall back to CPU 06-21 15:21:31.104 29261-29314/com.xiaomi.mace.demo E/MACE: env.cc:86 failed to open /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 06-21 15:21:31.104 
29261-29314/com.xiaomi.mace.demo I/MACE: env.cc:97 LinuxBaseEnv::GetCPUMaxFreq(max_freqs) failed with error: Runtime error 06-21 15:21:31.104 29261-29314/com.xiaomi.mace.demo I/MACE: cpu_runtime.cc:99 GetCPUMaxFreq(&cpu_max_freqs) failed with error: Runtime error 06-21 15:21:31.534 29261-29314/com.xiaomi.mace.demo I/image_classify attrs: create result: Success: Success: Success,input:1,output:3 06-21 15:21:31.534 29261-29314/com.xiaomi.mace.demo I/APPModel: maceMobilenetCreateEngine result = 0 06-21 15:21:34.584 29261-29261/com.xiaomi.mace.demo D/PhoneWindow: DEBUG_SYSTEMUI:origin statusbar style 06-21 15:21:34.584 29261-29261/com.xiaomi.mace.demo D/PhoneWindow: DEBUG_SYSTEMUI:IconColor=1 06-21 15:21:34.584 29261-29261/com.xiaomi.mace.demo D/PhoneWindow: DEBUG_SYSTEMUI:StatusBarColor final set ff303f9f 06-21 15:21:34.594 29261-29261/com.xiaomi.mace.demo D/yuan: onVisibilityChanged----android.widget.ListView{ede1ff3 VFED.VC.. ......I. 0,0-0,0 #7f070047 app:id/list_menu} 06-21 15:21:35.624 29261-29314/com.xiaomi.mace.demo I/image_classify attrs: device: 0 06-21 15:21:35.634 29261-29314/com.xiaomi.mace.demo E/MACE: env.cc:86 failed to open /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 06-21 15:21:35.634 29261-29314/com.xiaomi.mace.demo I/MACE: env.cc:97 LinuxBaseEnv::GetCPUMaxFreq(max_freqs) failed with error: Runtime error 06-21 15:21:35.634 29261-29314/com.xiaomi.mace.demo E/MACE: thread_pool.cc:140 Fail to get cpu max frequencies 06-21 15:21:35.634 29261-29314/com.xiaomi.mace.demo E/MACE: thread_pool.cc:79 CPU core is empty 06-21 15:21:35.634 29261-29314/com.xiaomi.mace.demo I/MACE: mace.cc:431 Creating MaceEngine, MACE version: v0.11.0-rc1-0-gd9406dd 06-21 15:21:35.634 29261-29314/com.xiaomi.mace.demo E/MACE: env.cc:86 failed to open /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 06-21 15:21:35.634 29261-29314/com.xiaomi.mace.demo I/MACE: env.cc:97 LinuxBaseEnv::GetCPUMaxFreq(max_freqs) failed with error: Runtime error 06-21 15:21:35.634 
29261-29314/com.xiaomi.mace.demo I/MACE: cpu_runtime.cc:99 GetCPUMaxFreq(&cpu_max_freqs) failed with error: Runtime error 06-21 15:21:35.634 29261-29314/com.xiaomi.mace.demo I/MACE: mace.cc:603 Destroying MaceEngine 06-21 15:21:35.764 29261-29314/com.xiaomi.mace.demo I/MACE: mace.cc:470 Initializing MaceEngine 06-21 15:21:36.834 29261-29315/com.xiaomi.mace.demo D/OpenGLRenderer: endAllStagingAnimators on 0x80018598 (RippleDrawable) with handle 0x800fbc68 06-21 15:21:36.964 29261-29314/com.xiaomi.mace.demo E/MACE: env.cc:86 failed to open /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 06-21 15:21:36.964 29261-29314/com.xiaomi.mace.demo I/MACE: env.cc:97 LinuxBaseEnv::GetCPUMaxFreq(max_freqs) failed with error: Runtime error 06-21 15:21:36.964 29261-29314/com.xiaomi.mace.demo I/MACE: cpu_runtime.cc:99 GetCPUMaxFreq(&cpu_max_freqs) failed with error: Runtime error 06-21 15:21:37.004 29261-29314/com.xiaomi.mace.demo E/MACE: env.cc:86 failed to open /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 06-21 15:21:37.004 29261-29314/com.xiaomi.mace.demo I/MACE: env.cc:97 LinuxBaseEnv::GetCPUMaxFreq(max_freqs) failed with error: Runtime error 06-21 15:21:37.004 29261-29314/com.xiaomi.mace.demo I/MACE: cpu_runtime.cc:99 GetCPUMaxFreq(&cpu_max_freqs) failed with error: Runtime error 06-21 15:21:37.664 29261-29314/com.xiaomi.mace.demo I/image_classify attrs: create result: Success: Success: Success,input:1,output:3 06-21 15:21:37.664 29261-29314/com.xiaomi.mace.demo I/APPModel: maceMobilenetCreateEngine result = 0 06-21 15:21:41.474 29261-29261/com.xiaomi.mace.demo I/Choreographer: Skipped 104 frames! The application may be doing too much work on its main thread. 
06-21 15:21:41.474 29261-29314/com.xiaomi.mace.demo I/image_classify: output:detector/yolo-v3/Conv_6/BiasAdd:0,size:1 06-21 15:21:41.474 29261-29314/com.xiaomi.mace.demo I/image_classify: output:detector/yolo-v3/Conv_14/BiasAdd:0,size:2 06-21 15:21:41.474 29261-29314/com.xiaomi.mace.demo I/image_classify: output:detector/yolo-v3/Conv_22/BiasAdd:0,size:3 06-21 15:21:41.484 29261-29315/com.xiaomi.mace.demo I/Adreno: QUALCOMM build : 31f65f7, Ia3ef73d9d4 Build Date : 12/16/16 OpenGL ES Shader Compiler Version: XE031.08.00.00 Local Branch : Remote Branch : Remote Branch : Reconstruct Branch : 06-21 15:21:41.794 29261-29315/com.xiaomi.mace.demo I/OpenGLRenderer: Initialized EGL, version 1.4 bt
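The log above repeatedly shows MACE failing to open /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq, after which the thread pool reports "CPU core is empty". As an illustration of what that step involves, the sketch below reads the per-core cpufreq nodes and skips any core whose node cannot be opened instead of failing outright. It is not MACE's env.cc logic; the function name and the `root` parameter (added so it can be exercised against a temporary directory) are hypothetical.

```python
import glob
import os
from typing import Dict


def read_cpu_max_freqs(root: str = "/sys/devices/system/cpu") -> Dict[int, int]:
    """Return {cpu_index: max_freq_khz} for every core whose cpufreq node is readable."""
    freqs: Dict[int, int] = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not sibling entries such as "cpufreq" or "cpuidle".
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", "cpuinfo_max_freq")
    for path in glob.glob(pattern):
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))  # e.g. "cpu0"
        try:
            with open(path) as f:
                freqs[int(cpu_dir[3:])] = int(f.read().strip())
        except (OSError, ValueError):
            # Node cannot be opened (e.g. permission denied) or has unexpected content: skip it.
            continue
    return freqs


if __name__ == "__main__":
    print(read_cpu_max_freqs())  # {} when no cpufreq node is readable
```

Returning an empty or partial map lets a caller choose its own fallback (for example, treating all cores as equal) rather than ending up with no usable cores.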

    CPU crash info:
    A/libc: Fatal signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xc8300000 in tid 20169 (jniThread), pid 20098 (iaomi.mace.demo)
    std::__ndk1::__function::__func<mace::ops::arm::fp32::Conv2dGeneral::Compute(mace::OpContext const*, mace::Tensor const*, mace::Tensor const*, mace::Tensor*)::$_0, std::__ndk1::allocator<mace::ops::arm::fp32::Conv2dGeneral::Compute(mace::OpContext const*, mace::Tensor const*, mace::Tensor const*, mace::Tensor*)::$_0>, void (long long, long long, long long, long long, long long, long long)>::operator()(long long&&, long long&&, long long&&, long long&&, long long&&, long long&&) 0x000000009aa0df22
    mace::utils::ThreadPool::Compute2D(std::__ndk1::function<void (long long, long long, long long, long long, long long, long long)> const&, long long, long long, long long, long long, long long, long long, long long, long long, int) 0x000000009aa57ae8
    mace::ops::arm::fp32::Conv2dGeneral::Compute(mace::OpContext const*, mace::Tensor const*, mace::Tensor const*, mace::Tensor*) 0x000000009aa0d890
    mace::ops::Conv2dOp<(mace::DeviceType)0, float>::Run(mace::OpContext*) 0x000000009a9ba16a
    mace::SerialNet::Run(mace::RunMetadata*) 0x000000009aa42642
    mace::MaceEngine::Impl::Run(std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, mace::MaceTensor, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, mace::MaceTensor> > > const&, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, mace::MaceTensor, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, mace::MaceTensor> > >*, mace::RunMetadata*) 0x000000009a8febd4
    mace::MaceEngine::Run(std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, mace::MaceTensor, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, mace::MaceTensor> > > const&, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, mace::MaceTensor, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, mace::MaceTensor> > >*) 0x000000009a8ff16c
    ::Java_com_xiaomi_mace_JniMaceUtils_maceMobilenetClassify(JNIEnv *, jclass, jfloatArray) image_classify.cc:345
    art_quick_generic_jni_trampoline 0x00000000b3d67d1a
    art_quick_invoke_stub_internal 0x00000000b3d63622
    art_quick_invoke_static_stub 0x00000000b406a628
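The crash happens in the CPU Conv2dGeneral kernel, reached from the JNI entry point maceMobilenetClassify(JNIEnv *, jclass, jfloatArray), while the earlier log shows the engine being created for a yolo-v3 graph. The trace alone does not show the cause. One thing worth ruling out (an assumption, not a diagnosis) is that the float array handed to the engine has a different element count than the model's input shape, since a kernel reading past a short buffer can fault this way. A minimal check, with a hypothetical helper name and example NHWC shapes:

```python
from math import prod
from typing import Sequence


def check_input_length(shape: Sequence[int], buffer_len: int) -> None:
    """Raise if a flat float buffer cannot back a tensor of the given shape."""
    expected = prod(shape)  # total element count, e.g. N * H * W * C
    if buffer_len != expected:
        raise ValueError(
            f"input buffer holds {buffer_len} floats, but shape {list(shape)} needs {expected}"
        )


# Example values only: a 1x224x224x3 (NHWC) input needs 224 * 224 * 3 = 150528 floats.
check_input_length((1, 224, 224, 3), 224 * 224 * 3)
```

Running the same comparison on the app side, before the array crosses JNI, turns a native segfault into a readable error.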

    opened by gaopeipeiok 21
  • onnx convert_conv2d problems

    onnx convert_conv2d problems

    When converting an ONNX model to a MACE model, one Conv op failed to convert with the following error:

    Traceback (most recent call last):
      File "tools/converter.py", line 1334, in <module>
        flags.func(flags)
      File "tools/converter.py", line 926, in convert_func
        convert.convert(configs, MODEL_CODEGEN_DIR, flags.enable_micro)
      File "tools/python/convert.py", line 89, in convert
        net_def_with_Data = convert_net(net_name, net_conf, enable_micro)
      File "tools/python/convert.py", line 235, in convert_net
        output_graph_def, converter_info = converter.run()
      File "tools/python/transform/onnx_converter.py", line 565, in run
        self.convert_ops(graph_def)
      File "tools/python/transform/onnx_converter.py", line 649, in convert_ops
        self._op_converters[node.op_type](node)
      File "tools/python/transform/onnx_converter.py", line 937, in convert_conv2d
        filter_shape = self._graph_shapes_dict[node.inputs[1]]
    KeyError: '819'

    The op prints as follows:

    {
      input: "540"
      input: "819"
      input: "820"
      output: "818"
      name: "Conv_12"
      op_type: "Conv"
      attribute { name: "dilations" ints: 1 ints: 1 type: INTS }
      attribute { name: "group" i: 32 type: INT }
      attribute { name: "kernel_shape" ints: 3 ints: 3 type: INTS }
      attribute { name: "pads" ints: 1 ints: 1 ints: 1 ints: 1 type: INTS }
      attribute { name: "strides" ints: 2 ints: 2 type: INTS }
    }

    Following the code, the failing line tries to read the shape from self._graph_shapes_dict, which is built from the tensors in the graph. But 819 is actually the weight parameter W, so it does not appear in self._graph_shapes_dict, which causes the error. Is this expected?
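As the issue notes, input 819 is the Conv weight W. ONNX stores weights as graph initializers, and since ONNX IR version 4 initializers are not required to also be listed in graph.input, so a shape table built only from graph tensors can miss them. A minimal sketch of a lookup that falls back to initializer dimensions follows; the helper and its arguments are hypothetical, and this is neither MACE's converter code nor its actual fix.

```python
from typing import Dict, List, Sequence


def lookup_shape(name: str,
                 graph_shapes: Dict[str, Sequence[int]],
                 initializer_dims: Dict[str, Sequence[int]]) -> List[int]:
    """Return a tensor's shape, falling back to constant initializers such as Conv weights."""
    if name in graph_shapes:
        return list(graph_shapes[name])
    if name in initializer_dims:
        return list(initializer_dims[name])
    raise KeyError(f"no shape recorded for tensor {name!r}")


# With the onnx package, the fallback table could be built as:
#   initializer_dims = {t.name: list(t.dims) for t in model.graph.initializer}
# Shapes below are example values. An ONNX Conv weight is laid out as
# (out_channels, in_channels / group, kH, kW), so group=32 with a 3x3 kernel on a
# 32-channel input would give (32, 1, 3, 3).
shapes = {"540": [1, 32, 112, 112]}
weights = {"819": (32, 1, 3, 3)}
print(lookup_shape("819", shapes, weights))  # [32, 1, 3, 3]
```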

    opened by realdartagnan 1
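  • How to directly run a model compiled by MACE

    How to directly run a model compiled by MACE

    Doesn't the benchmark have a C++ interface?

    Running the model fails with: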

    I /mace_other/mace/mace/libmace/mace.cc:503] Initializing MaceEngine
    F /mace_other/mace/mace/core/operator.h:189] Check failed: idx < inputs_.size() 
    F /mace_other/mace/mace/core/operator.h:189] backtrace:
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x7806499418 
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x780649bfd8 
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x780649bf84 
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x780649c254 
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x780649c33c 
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x780634b76c 
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x78063d0ec0 
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x780645a1c0 
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x7806343b58 _ZN4mace10MaceEngine4Impl3RunERKNSt6__ndk13mapINS2_12basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEENS_10MaceTensorENS2_4lessIS9_EENS7_INS2_4pairIKS9_SA_EEEEEEPSH_PNS_11RunMetadataE
    F /mace_other/mace/mace/core/operator.h:189]  pc 0x61139239a4 
    
    
    opened by Tasfa 1
Releases(v1.1.1)
  • v1.1.1(Jan 13, 2022)

    Feature:

    1. Support ION buffers on APU v4 and support float input
    2. Sign libhexagon_nn_skel.so automatically
    3. Remove the op module when neither CPU nor GPU is used
    4. Support boost and preference hints for APU
    5. Support building mace_run for APU with no device connected
    6. Add DSP SoC ID 450
    7. Support fake warmup for OpenCL to speed up GPU warmup
    8. Add the QNN backend and update the QNN library
    9. Add special models to CI and a Micro runtime_load_model example
    10. Support OpenCL 3.0
    11. Support MTK ION mode
    12. Support dma_buf_heap
    13. Remove fallbacks caused by Reshape
    14. Add run validation for MACE-Micro
    15. Add MACE-Micro runtime load model interface
    16. Update MTK APU lib

    Operator:

    1. Support sigmoid uint8 mode
    2. Support DepthToSpace, SpaceToDepth, ReduceSum and DetectionOutput operators
    3. Support depthwise_deconv2d host configuration
    4. Add more ops supported by the Keras converter
    5. Support InstanceNorm operator and fold InstanceNorm from TensorFlow
    6. Support depth_to_space CRD mode
    7. Support DSP ops: leaky relu, reshape
    8. Support HTP ops: depthwise_deconv, leaky_relu
    9. Support Keras ops: subtract, multiply
    10. Support op: HardSigmoid

    Performance:

    1. Optimize CPU pooling and softmax performance
    2. Optimize Softmax on GPU and support GPU Reduce on the channel dimension

    Other

    1. Fix some compatibility and stability bugs
    2. Fix some documentation errors
    3. Fix some converter bugs
  • v1.0.4(Mar 18, 2021)

  • v1.0.3(Mar 3, 2021)

    1. Support i/o data types such as fp16, bf16.
    2. Fix building error on APU runtime.
    3. Support hexagon memory usage statistics.
    4. Fix CMake building error for static library.
  • v1.0.2(Jan 12, 2021)

  • v1.0.1(Dec 23, 2020)

  • v1.0.0(Nov 4, 2020)

    Release Note

    The following are the highlights in this release:

    Support Quantization For MACE Micro

    At the beginning of this year, we released MACE Micro to fully support ultra-low-power inference scenarios on mobile phones and IoT devices. In this version, we add quantization support to MACE Micro and integrate CMSIS5 to better support Cortex-M chips.

    Support More Model Formats

    We find that more and more R&D engineers are using the PyTorch framework to train their models. In previous versions, MACE converted PyTorch models by using the ONNX format as a bridge. To serve PyTorch developers better, this version supports direct conversion of PyTorch models, which improves model inference performance. We also cooperated with MEGVII to support its MegEngine model format: if you trained your models with the MegEngine framework, you can now use MACE to deploy them on mobile phones or IoT devices.

    Support More Data Precision

    Armv8.2 provides half-precision floating-point data-processing instructions. In this version we support fp16 computation with these Armv8.2 fp16 instructions, which increases inference speed by roughly 40% for models such as mobilenet-v1. bfloat16 (Brain Floating Point) is a 16-bit floating-point format; we also support bfloat16 precision in this version, which increases inference speed by roughly 40% for models such as mobilenet-v1/2 on some low-end chips.
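bfloat16 keeps float32's sign bit and 8-bit exponent but only 7 mantissa bits, so converting a float32 value to bf16 amounts to keeping its upper 16 bits, usually with round-to-nearest-even. The sketch below shows that conversion in plain Python with `struct`; it is a generic illustration of the format, not MACE's conversion routine, and the helper names are hypothetical.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Convert a float32 value to a 16-bit bfloat16 pattern, rounding to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # NaN: keep it a quiet NaN instead of rounding
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties round to the even upper half
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen bfloat16 back to float32 by zero-filling the 16 dropped mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


pi_bf16 = float_to_bf16_bits(3.14159265)
print(hex(pi_bf16), bf16_bits_to_float(pi_bf16))  # 0x4049 3.140625
```

Because the exponent field is unchanged, bf16 covers the same range as float32 and only loses precision, which is why a conversion can be this cheap.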

    Others

    In this version, we also add the following features:

    1. Support more operators, such as GroupNorm, ExtractImagePatches, Elu, etc.
    2. Optimize the performance of the framework and operators, such as the Reduce operator.
    3. Support dynamic filter of conv2d/deconv2d.
    4. Integrate MediaTek APU support on mt6873, mt6885, and mt6853.

    Acknowledgement

    Thanks to the following guys who contributed code that makes MACE better.

    @ZhangZhijing1, who contributed the bf16 code which was then committed by someone else. @yungchienhsu, @Yi-Kai-Chen, @Eric-YK-Chen, @yzchen, @gasgallo, @lq, @huahang, @elswork, @LovelyBuggies, @freewym.

    Attachment

    libmace-v1.0.0.tar.gz: Prebuilt MACE library using NDK-19c, which contains armeabi-v7a, arm64-v8a, arm_linux and linux-x86-64 libraries.

    libmace-v1.0.0.tar.gz(122.92 MB)
  • v0.13.0(Apr 3, 2020)

    Release Note

    The following are the highlights in this release:

    Support for Mace Micro

    Compared with mobile devices such as phones, microcontrollers are small, low-power computing devices that are often embedded in hardware needing only basic computation, such as household appliances and IoT devices. Billions of microcontrollers are produced every year. MACE adds microcontroller support to fully cover ultra-low-power inference scenarios on mobile phones and IoT devices. MACE's microcontroller engine does not rely on any OS, heap memory allocation, C++ library or other third-party library except the math library.
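Running without heap allocation means tensor memory has to be laid out ahead of time. One common way to do this, shown here as a generic sketch and not as MACE Micro's actual planner, is to give each intermediate tensor an offset in a single fixed arena so that tensors whose lifetimes overlap never share bytes, while tensors that are never alive at the same time can reuse the same region. The tensor records (size in bytes, first and last op index that uses the tensor) are hypothetical inputs.

```python
from typing import Dict, List, Tuple

# Hypothetical tensor record: name -> (size_in_bytes, first_op_index, last_op_index)
Tensors = Dict[str, Tuple[int, int, int]]


def plan_arena(tensors: Tensors) -> Tuple[Dict[str, int], int]:
    """Greedy-by-size offset planning into one fixed arena; returns (offsets, arena_size)."""
    placed: List[Tuple[int, int, int, int]] = []  # (offset, size, first, last)
    offsets: Dict[str, int] = {}
    # Place large tensors first; they are the hardest to fit later.
    for name, (size, first, last) in sorted(tensors.items(), key=lambda kv: -kv[1][0]):
        # Byte ranges of already-placed tensors whose lifetimes overlap this one.
        busy = sorted((o, o + s) for o, s, f, l in placed if not (l < first or last < f))
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # fits in the gap before this busy range
            offset = max(offset, end)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    arena_size = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, arena_size


# "a" and "b" are never alive together, so they share offset 0; "c" overlaps both.
print(plan_arena({"a": (100, 0, 1), "b": (100, 2, 3), "c": (50, 1, 2)}))
# ({'a': 0, 'b': 0, 'c': 100}, 150)
```

The arena size is known before the model runs, so it can be a statically allocated buffer.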

    Further Support For Quantization

    MACE supports two quantization mechanisms: quantization-aware training and post-training quantization. In this version, the two can be mixed. Furthermore, we support the Armv8.2 dot-product instructions for CPU quantization.
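Both mechanisms end up mapping float values to 8-bit integers. A common scheme is asymmetric (affine) uint8 quantization, in which a real value x is represented as x ≈ scale · (q − zero_point). The sketch below computes scale and zero point from a min/max range (as post-training calibration would observe it) and round-trips a few values. It is a generic illustration, and MACE's implementation may differ in rounding and range handling; all function names are hypothetical.

```python
from typing import List, Tuple


def choose_qparams(min_val: float, max_val: float, num_bits: int = 8) -> Tuple[float, int]:
    """Pick scale/zero_point so [min_val, max_val] maps onto [0, 2**num_bits - 1]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    qmin, qmax = 0, (1 << num_bits) - 1
    if max_val == min_val:  # all-zero tensor
        return 1.0, 0
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(xs: List[float], scale: float, zero_point: int, qmax: int = 255) -> List[int]:
    return [max(0, min(qmax, int(round(x / scale)) + zero_point)) for x in xs]


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    return [(q - zero_point) * scale for q in qs]


scale, zp = choose_qparams(-1.0, 1.0)
values = [-1.0, -0.25, 0.0, 0.5, 1.0]
restored = dequantize(quantize(values, scale, zp), scale, zp)
# Each value comes back within one quantization step, and 0.0 comes back exactly.
print(max(abs(a - b) for a, b in zip(values, restored)) <= scale, restored[2] == 0.0)
# True True
```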

    Performance Optimization

    MACE continues to optimize performance. This time, we add ION buffer support for Qualcomm SoCs, which greatly improves the inference performance of models that need to switch between GPU and CPU. We also optimize the performance of operators such as ResizeNearestNeighbor and Deconv.

    Others

    In this version, we support many new operators, including BatchMatMulV2 and Select for TensorFlow, and Deconv2d, Strided-Slice and Sigmoid for Hexagon DSP, and we fix some bugs in validation and tuning.

    Acknowledgement

    Thanks to the following guys who contributed code that makes MACE better: gasgallo

    Attachment

    libmace-v0.13.0.tar.gz: Prebuilt MACE library using NDK-19c, which contains armeabi-v7a, arm64-v8a, arm_linux and linux-x86-64 libraries.

    libmace-v0.13.0.tar.gz(301.33 MB)
  • v0.12.0(Nov 17, 2019)

    Release Note

    The following are the highlights in this release:

    Performance Optimization

    We found that missing op implementations on devices (GPU, Hexagon DSP, etc.) led to inefficient model execution, because the memory synchronization between the device and the CPU consumed a lot of time. So we added and enhanced some operators on the GPU (reshape, lpnorm, mvnorm, etc.) and on the Hexagon DSP (s2d, d2s, sub, etc.) to improve the efficiency of model execution.
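The overhead described above grows with the number of times execution switches device. As a rough illustration, under the simplifying assumption of a purely linear op sequence (real graphs branch), a single op without a GPU implementation in the middle of a GPU chain costs two synchronizations. The op names and the helper are hypothetical.

```python
from typing import List, Tuple


def device_transfers(ops: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (producer, consumer) op pairs where consecutive ops run on different devices.

    `ops` is a linear sequence of (op_name, device) in execution order. Each pair returned
    means the intermediate tensor must be synchronized between the two devices.
    """
    return [(prev_name, cur_name)
            for (prev_name, prev_dev), (cur_name, cur_dev) in zip(ops, ops[1:])
            if prev_dev != cur_dev]


ops = [("conv_1", "GPU"), ("transpose", "CPU"), ("conv_2", "GPU"), ("conv_3", "GPU")]
print(len(device_transfers(ops)))  # 2: GPU->CPU before transpose, CPU->GPU after it
```

Implementing `transpose` on the GPU would bring the count to zero, which is the motivation for adding the device-side operators listed above.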

    Further Support For Speech Recognition

    In the last version, we supported the Kaldi framework. At Xiaomi we did a lot of work to support speech recognition models, including support for Flatten, Upsample and other ONNX operators, as well as some bug fixes.

    CMake Support

    MACE continues to optimize its compilation tools. This time, we support CMake builds. Because CMake builds use ccache for acceleration, they are much faster than the original Bazel builds. Related Docs: https://mace.readthedocs.io/en/latest/user_guide/basic_usage_cmake.html

    Others

    In this version, we added performance-regression detection with dana, and a “gpu_queue_window” parameter to the YAML file to solve the UI stutter caused by GPU task execution. Related Docs: https://mace.readthedocs.io/en/latest/faq.html

    Acknowledgement

    Thanks to the following guys who contributed code that makes MACE better.

    yungchienhsu, gasgallo, albu, yunikkk

  • v0.11.0-rc1(May 30, 2019)

  • v0.11.0-rc0(May 15, 2019)

    Improvements

    1. Support the Kaldi framework.
    2. Support iOS and OS X.
    3. Support the HTA device from Qualcomm.
    4. Support the APU device from MTK.
    5. Add a new thread pool to replace OpenMP.
    6. New strategy to support mixed use of CPU and GPU.
    7. Support many new ops and fix bugs.

    Incompatible Changes

    None

    New APIs

    1. Add a new CreateEngineFromProto API.
    2. MaceTensor supports data types (float and int32).

    Acknowledgement

    Thanks to the following guys who contributed code that makes MACE better.

    yungchienhsu, gigadeplex, hanton, idstein, herbakamil.

    Attachment

    libmace.zip: Prebuilt MACE library using NDK-17b, which contains armeabi-v7a, arm64-v8a, arm_linux and linux-x86-64 libraries.

    libmace.zip(60.55 MB)
  • v0.10.0(Jan 4, 2019)

  • v0.9.0(Aug 2, 2018)

Owner
Xiaomi
An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit

DREAMPlaceFPGA An Open-Source Analytical Placer for Large Scale Heterogeneous FPGAs using Deep-Learning Toolkit. This work leverages the open-source A

Rachel Selina Rajarathnam 15 Jul 20, 2022
Benchmark framework of 3D integrated CIM accelerators for popular DNN inference, support both monolithic and heterogeneous 3D integration

3D+NeuroSim V1.0 The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly av

NeuroSim 10 Dec 21, 2021
Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models

Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine DSSTNE (pronounced "Destiny") is an open source software library for training and deploying

Amazon Archives 4.4k Jul 30, 2022
TFCC is a C++ deep learning inference framework.

TFCC is a C++ deep learning inference framework.

Tencent 110 Jul 16, 2022
KSAI Lite is a deep learning inference framework of kingsoft, based on tensorflow lite

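KSAI Lite English | 简体中文 KSAI Lite is a lightweight, flexible, high-performance and easily extensible deep learning inference framework built on TensorFlow Lite, designed to support multiple hardware platforms including mobile, embedded and server. KSAI Lite is currently used in Kingsoft Office's internal business and is gradually supporting Kingsoft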

null 76 Aug 11, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20k Aug 5, 2022
Forward - A library for high performance deep learning inference on NVIDIA GPUs

a library for high performance deep learning inference on NVIDIA GPUs.

Tencent 123 Mar 17, 2021
A library for high performance deep learning inference on NVIDIA GPUs.

Forward - A library for high performance deep learning inference on NVIDIA GPUs Forward - A library for high performance deep learning inference on NV

Tencent 502 Jul 31, 2022
PPLNN is a high-performance deep-learning inference engine for efficient AI inferencing.

PPLNN, which is short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing.

null 847 Aug 6, 2022
Helper Class for Deep Learning Inference Frameworks: TensorFlow Lite, TensorRT, OpenCV, ncnn, MNN, SNPE, Arm NN, NNAbla

InferenceHelper This is a helper class for deep learning frameworks especially for inference This class provides an interface to use various deep lear

iwatake 154 Aug 1, 2022
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

TensorRT Open Source Software This repository contains the Open Source Software (OSS) components of NVIDIA TensorRT. Included are the sources for Tens

NVIDIA Corporation 5.7k Aug 8, 2022
Benchmark framework of compute-in-memory based accelerators for deep neural network (inference engine focused)

DNN+NeuroSim V1.3 The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly a

NeuroSim 23 Aug 5, 2022
VNOpenAI 23 Jul 31, 2022
Mobile Robot Programming Toolkit (MRPT) provides C++ libraries aimed at researchers in mobile robotics and computer vision

The MRPT project 1. Introduction Mobile Robot Programming Toolkit (MRPT) provides C++ libraries aimed at researchers in mobile robotics and computer v

MRPT 1.5k Aug 4, 2022
SHARK - High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters

SHARK Communication Channels GitHub issues: Feature requests, bugs etc Nod.ai SHARK Discord server: Real time discussions with the nod.ai team and oth

nod.ai 37 Aug 3, 2022
A lightweight 2D Pose model can be deployed on Linux/Window/Android, supports CPU/GPU inference acceleration, and can be detected in real time on ordinary mobile phones.

A lightweight 2D Pose model can be deployed on Linux/Window/Android, supports CPU/GPU inference acceleration, and can be detected in real time on ordinary mobile phones.

JinquanPan 49 Aug 1, 2022
A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms.

iNeural A library for creating Artificial Neural Networks, for use in Machine Learning and Deep Learning algorithms. What is a Neural Network? Work on

Fatih Küçükkarakurt 5 Apr 5, 2022
Simple inference deep head pose ncnn version

ncnn-deep-head-pose Simple implement inference deep head pose ncnn version with high performance and optimized resource. This project based on deep-he

Đỗ Công Minh 11 Jun 13, 2022