SecMML: Secure MPC (Multi-Party Computation) Machine Learning Framework

Overview


[Figure: SecMML architecture]

Introduction

SecMML, a branch of FudanMPL (Multi-Party Computation + Machine Learning), is a scalable and efficient MPC framework for training machine learning models, based on the BGW protocol. It generalizes to application scenarios with three or more parties in the semi-honest setting; support for the malicious setting is still a TODO. Currently, SecMML supports several popular machine learning models, including linear regression, logistic regression, BP neural networks and LSTM neural networks.

Application scenario

SecMML targets two practical scenarios:

  1. As the following figure shows, several companies each hold their own dataset and want to train a better model on the union of their datasets without revealing the plaintext of their data. First, each company distributes its data to the other parties via secret sharing, so that every party holds a share of the entire dataset. Then the companies, each acting as a party, train the model collaboratively. The framework supports an arbitrary number of participants (three or more) training models on the combined dataset they hold.

    [Figure: application scenario]

  2. A large number of individual data owners do not want their private data to be known by others, while Internet companies want to use this distributed data to obtain better models. The companies first designate several servers, which must be independent of each other, to perform the computation. All data owners then send their data to these servers in secret-shared form. The servers collaboratively train the model on these data, and the trained model is finally revealed to the data owners. The framework scales in both dimensions: it supports any number of data owners, and any number of servers (three or more) can be selected as computing parties.

Repository Structure

  • core/: Core libraries of MPL: the fundamental matrix library, the math operator library and the Player library. Some math computations are compiled into a library (libcore_lib.so).

  • machine_learning/: Machine learning algorithms: neural networks, linear regression and logistic regression.

  • datasets/mnist/: Training dataset (MNIST).

  • util/: Data IO and network IO package. The network layer is implemented with sockets and is compatible with both Windows and Ubuntu.

  • Constant.h: Some constants and general functions in SecMML. Note that, on Windows, the macro UNIX_PLATFORM should not be defined, so that the Winsock library is used instead (on Ubuntu it must be defined; see Running).

  • CMakeLists.txt: Defines the build rules for the project. Note that, for Windows users, the line target_link_libraries(SMMLF ws2_32) must be uncommented.

Running

The following steps use training a linear regression model among three parties as an example:

  • Clone the SecMML git repository by running:

    git clone https://github.com/SMMLF/MPL-Public.git

  • Set the number of parties M to 3 in Constant.h (M can be any number >= 3):

    #define M 3

  • Specify the platform:

    • If Ubuntu (in Constant.h):

        #ifndef UNIX_PLATFORM
        #define UNIX_PLATFORM
        #endif

    • If Windows (in CMakeLists.txt):

        Add target_link_libraries(SMMLF ws2_32) to the file.
      
  • Choose the machine learning model (in main.cpp):

    • Linear Regression Model: bp->linear_graph();
    • Logistic Regression Model: bp->logistic_graph();
    • Three-layer Model: bp->graph();
    • LSTM Model
  • Compile the executable file:

    • cd SecMML
    • cmake .
    • make
  • Start three processes and enter a party index in each:

    • ./MPL
    • At the prompt "Please enter party index:", enter 0, 1, ..., M-1 (here 0, 1 and 2), one index per process, in order.

Help

For any questions, please contact [email protected].

Contributors

Faculty: Prof. Weili Han

Students: Haoqi Wu (Graduate Student), Zifeng Jiang (Graduate Student), Wenqiang Ruan (Ph.D. Candidate), Lushan Song (Ph.D. Candidate), Dingyi Tang (Post Graduate Student)

Issues
  • Problem encountered during make

    I ran into the following problem when running make. How should I fix it?

        collect2: error: ld returned 1 exit status
        CMakeFiles/MPL.dir/build.make:353: recipe for target 'MPL' failed
        make[2]: *** [MPL] Error 1
        CMakeFiles/Makefile2:114: recipe for target 'CMakeFiles/MPL.dir/all' failed
        make[1]: *** [CMakeFiles/MPL.dir/all] Error 2
        Makefile:90: recipe for target 'all' failed
        make: *** [all] Error 2

    opened by xsk0206 7
  • A Python script for downloading the MNIST dataset and converting it to .csv has been added

    A Python script for downloading the MNIST dataset and converting it to .csv has been added. The README files have been updated to explain how to use download.py in SecMML/datasets/mnist, and the README in the Test directory instructs users to modify the constant.json file so that the tests run correctly.

    opened by SkyTu 0
  • Replace SWIG with pybind11

    1. Replace SWIG with pybind11 to provide interfaces for Python.
    2. Move global variables (node_type, globalRound) to Constant.h and Constant.cpp.
    3. main.cpp: Move local variables (tel, ips, ports) into the main function.
    opened by shadowqwq 0
  • Model configuration optimization and performance optimization

    1. Define more macros to specify the model parameters and to control offline/online and local/distributed tests.
    2. Use multi-threading; optimize the Mul_Const and LTZ protocols.
    opened by llCurious 0
  • [LTZ]: update the secret shared version

    MathOp.cpp: Update the implementation of LTZ (less than zero), changing it from the plaintext version to a secret-shared version. Concretely, it uses Div2m instead of Reveal.

    opened by llCurious 0