A hierarchical parameter server framework based on MXNet. GeoMX also implements multiple communication-efficient strategies.

Overview

Introduction

GeoMX is a MXNet-based two-layer parameter server framework, aiming at integrating data knowledge that owned by multiple independent parties in a privacy-preserving way (i.e. no need to transfer raw data), by training a shared deep learning model collaboratively in a decentralized and distributed manner.

Unlike other distributed deep learning software framworks and the emerging Federated Learning technologies which are based on single-layer parameter server architecture, GeoMX applies two-layer architecture to reduce communication cost between parties.

GeoMX allows parties to train the deep learning model on their own data and clusters in a distributed mannar locally. Parties only need to upload the locally aggregated model gradients (or model updates) to the central party to perform global aggregation, model updating and model synchronization.

To mitigate the communication bottleneck between the central party and participating parties, GeoMX implements multiple communication-efficient strategies, such as BSC, DGT, TSEngine and P3, boosting the model training.

BSC, fully named as Bilateral Sparse Compression, reduces the size of gradient tensors during Local Server's push and pull progress. DGT, a contribution-aware differential gradient transmission protocol, fully named as Differential Gradient Transmission , transfers gradients in multi-channels with different reliability and priority according to their contribution to model convergence. TSEngine, an adaptive communication scheduler for efficient communication overlay of the parameter server system in DML-WANs, dynamically schedules the communication logic over the parameter server and workers based on the active network perception. P3 overlaps parameter synchronization with computation in order to improve the training performance.

Furthermore, GeoMX supports:

  1. 4 communication algorithms, including fully-synchronous algorithm, mix-synchronous algorithm, HierFAVG-styled synchronous algorithm and DC-ASGD asynchronous algorithm.

  2. (6 categories) 25 machine learning algorithms, including 9 gradient descent algorithms, 7 ensemble learning algorithms, 3 support vector algorithms, 2 MapReduce algorithms, 2 online learning algorithms, 2 incremental learning algorithms.

Installation

Build from source

clone the GeoMX project

git clone https://github.com/INET-RC/GeoMX.git
cd GeoMX

Install a Math Library and OpenCV

# e.g. OpenBLAS
sudo apt-get install -y libopenblas-dev
sudo apt-get install -y libopencv-dev

Build core shared library

There is a configuration file for make, make/config.mk compilation options.

If building on CPU and using OpenBLAS:

USE_OPENCV=1
USE_BLAS=openblas

If building on GPU and you want OpenCV and OpenBLAS:

USE_OPENCV=1
USE_BLAS=openblas
USE_CUDA=1
USE_CUDA_PATH=/usr/local/cuda

Visit usage-example for other compilation options. You can edit it and then run make -j$(nproc)

Building from source creates a library called libmxnet.so in the lib folder in your project root.

You may also want to add the shared library to your LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$PWD/lib

Install Python bindings

Navigate to the root of the GeoMX folder then run the following:

$ cd python
$ pip install -e .

Note that the -e flag is optional. It is equivalent to --editable and means that if you edit the source files, these changes will be reflected in the package installed.

Docker

It's highly recommended to run GeoMX in a Docker container. To build such a Docker image, use the provided Dockerfile.

docker build -t GeoMX -f Dockerfile .

GPU Backend

Please make sure NVIDIA Container Toolkit is installed. Here are some install for Ubuntu users:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

restart the Docker daemon to complete the installation:

sudo systemctl restart docker

Then edit daemon.json to set the default runtime:

$ vim /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Finally, restart the Docker daemon:

sudo systemctl restart docker

Documentation

  • Deployment: How to run GeoMX on multiple machines with Docker.
  • Configurations: How to use GeoMX's communication-efficient strategies.
  • System Design: How and why GeoMX and its modifications make better.
  • Examples: Understandable demos of GeoMX's communication-efficient strategies, communication algorithms and machine learning algorithms.

Communication

Contributors and Institutions

Contributors:

  • Intelligent Network and Application Research Center¹
  • DataMiningLab¹

Institutions:

  1. University of Electronic Science and Technology of China

Copyright and License

GeoMX is provided under the Apache-2.0 license.

You might also like...
Faiss is a library for efficient similarity search and clustering of dense vectors.

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy.

This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.
This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Fast Face Classification (F²C) This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicit

Deploy SCRFD, an efficient high accuracy face detection approach, in your web browser with ncnn and webassembly

ncnn-webassembly-scrfd open https://nihui.github.io/ncnn-webassembly-scrfd and enjoy build and deploy Install emscripten

An Efficient Implementation of Analytic Mesh Algorithm for 3D Iso-surface Extraction from Neural Networks
An Efficient Implementation of Analytic Mesh Algorithm for 3D Iso-surface Extraction from Neural Networks

AnalyticMesh Analytic Marching is an exact meshing solution from neural networks. Compared to standard methods, it completely avoids geometric and top

PPLNN is a high-performance deep-learning inference engine for efficient AI inferencing.
PPLNN is a high-performance deep-learning inference engine for efficient AI inferencing.

PPLNN, which is short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing.

An Efficient Implementation of Analytic Mesh Algorithm for 3D Iso-surface Extraction from Neural Networks
An Efficient Implementation of Analytic Mesh Algorithm for 3D Iso-surface Extraction from Neural Networks

AnalyticMesh Analytic Marching is an exact meshing solution from neural networks. Compared to standard methods, it completely avoids geometric and top

Triton - a language and compiler for writing highly efficient custom Deep-Learning primitives.
Triton - a language and compiler for writing highly efficient custom Deep-Learning primitives.

Triton - a language and compiler for writing highly efficient custom Deep-Learning primitives.

LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism

LineFS repository for the Artifact Evaluation of SOSP 2021 1. System requirements (Tested environment) 1.1. Hardware requirements 1.1.1. Host machine

Owner
null
HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-Through Rates (CTRs)

Merlin: HugeCTR HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-T

null 698 Sep 15, 2022
TengineGst is a streaming media analytics framework, based on GStreamer multimedia framework, for creating varied complex media analytics pipelines.

TengineGst is a streaming media analytics framework, based on GStreamer multimedia framework, for creating varied complex media analytics pipelines. It ensures pipeline interoperability and provides optimized media, and inference operations using Tengine Toolkit Inference Engine backend, across varied architecture - CPU, iGPU and VPU.

OAID 64 May 30, 2022
Provide sample code of efficient operator implementation based on the Cambrian Machine Learning Unit (MLU) .

Cambricon CNNL-Example CNNL-Example 提供基于寒武纪机器学习单元(Machine Learning Unit,MLU)开发高性能算子、C 接口封装的示例代码。 依赖条件 操作系统: 目前只支持 Ubuntu 16.04 x86_64 寒武纪 MLU SDK: 编译和

Cambricon Technologies 1 Mar 7, 2022
Implementations of Multiple View Geometry in Computer Vision and some extended algorithms.

MVGPlus Implementations of Multiple View Geometry in Computer Vision and some extended algorithms. Implementations Template-based RANSAC 2D Line estim

Chenyu 6 Apr 7, 2022
Cartographer is a system that provides real-time simultaneous localization and mapping (SLAM) in 2D and 3D across multiple platforms and sensor configurations.

Cartographer Purpose Cartographer is a system that provides real-time simultaneous localization and mapping (SLAM) in 2D and 3D across multiple platfo

Cartographer 6.2k Sep 24, 2022
open Multiple View Geometry library. Basis for 3D computer vision and Structure from Motion.

OpenMVG (open Multiple View Geometry) License Documentation Continuous Integration (Linux/MacOs/Windows) Build Code Quality Chat Wiki local/docker bui

openMVG 4.4k Sep 22, 2022
Dense Depth Estimation from Multiple 360-degree Images Using Virtual Depth

Dense Depth Estimation from Multiple 360-degree Images Using Virtual Depth [Project] [Paper] [arXiv] This is the official code of our APIN 2022 paper

null 6 Aug 16, 2022
Implement yolov5 with Tensorrt C++ api, and integrate batchedNMSPlugin. A Python wrapper is also provided.

yolov5 Original codes from tensorrtx. I modified the yololayer and integrated batchedNMSPlugin. A yolov5s.wts is provided for fast demo. How to genera

weiwei zhou 42 Aug 24, 2022
BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser)

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser)

Brown Laboratory for Linguistic Information Processing 214 Aug 15, 2022
Toy path tracer for my own learning purposes (CPU/GPU, C++/C#, Win/Mac/Wasm, DX11/Metal, also Unity)

Toy Path Tracer Toy path tracer for my own learning purposes, using various approaches/techs. Somewhat based on Peter Shirley's Ray Tracing in One Wee

Aras Pranckevičius 913 Sep 19, 2022