SecMML, a branch of FudanMPL (Multi-Party Computation + Machine Learning), is a scalable and efficient secure multi-party computation (MPC) framework for training machine learning models, built on the BGW protocol. It applies to scenarios in which three or more parties train a model jointly, in both the semi-honest and malicious (todo) settings. Currently, SecMML supports several popular machine learning models: linear regression, logistic regression, BP neural networks, and LSTM neural networks.
SecMML targets two practical scenarios:
As the following figure shows, several companies each hold their own data set and want to train a better model on the union of their data sets without sharing the plaintext. First, each company distributes its data to the other parties in secret-shared form, so that every party holds a share of the entire data set. Then the companies, each acting as a party, train the model collaboratively. The framework extends to an arbitrary number of participants (three or more) training models on the combined data they hold.
A large number of individual data owners do not want their private data to be known by others, while Internet companies want to use this distributed data to obtain better models. These companies may first designate several servers to perform the computation; these servers must be independent of each other. All data owners then send their data to these servers in secret-shared form. The servers collaboratively train the model on this data, and the trained model is finally revealed to the data owners. The framework scales in two ways: it supports any number of data owners, and any number of servers can be selected as computing parties.
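Both scenarios rely on distributing data "in a secret sharing manner". The sketch below illustrates the general idea with Shamir secret sharing, the scheme that BGW-style protocols are usually built on. It is a minimal illustration under stated assumptions, not SecMML's actual code: the prime `kPrime` and the helpers `share`, `reconstruct`, and `add_shares` are hypothetical names, and `std::mt19937_64` stands in for a cryptographically secure random source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Illustrative field modulus (the Mersenne prime 2^31 - 1); an assumption, not SecMML's value.
constexpr std::uint64_t kPrime = 2147483647ULL;

// Modular multiplication; reduced operands are < 2^31, so the product fits in 64 bits.
std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return (a % kPrime) * (b % kPrime) % kPrime;
}

// Modular exponentiation by repeated squaring.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t result = 1;
    base %= kPrime;
    while (exp > 0) {
        if (exp & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        exp >>= 1;
    }
    return result;
}

// Modular inverse via Fermat's little theorem (valid because kPrime is prime).
std::uint64_t inv_mod(std::uint64_t a) { return pow_mod(a, kPrime - 2); }

// Split `secret` into `parties` shares with a random polynomial f of degree
// `threshold` such that f(0) = secret. Party i (0-based) receives f(i + 1).
// NOTE: std::mt19937_64 is NOT cryptographically secure; it only keeps the sketch short.
std::vector<std::uint64_t> share(std::uint64_t secret, int parties, int threshold,
                                 std::mt19937_64& rng) {
    assert(threshold >= 0 && threshold < parties);
    std::uniform_int_distribution<std::uint64_t> dist(0, kPrime - 1);
    std::vector<std::uint64_t> coeffs(static_cast<std::size_t>(threshold) + 1);
    coeffs[0] = secret % kPrime;
    for (int k = 1; k <= threshold; ++k) coeffs[k] = dist(rng);

    std::vector<std::uint64_t> shares(static_cast<std::size_t>(parties));
    for (int i = 0; i < parties; ++i) {
        const std::uint64_t x = static_cast<std::uint64_t>(i) + 1;
        std::uint64_t y = 0;
        // Horner evaluation of f(x) modulo kPrime.
        for (int k = threshold; k >= 0; --k) y = (mul_mod(y, x) + coeffs[k]) % kPrime;
        shares[i] = y;
    }
    return shares;
}

// Recover f(0) from the shares of all parties by Lagrange interpolation at x = 0.
std::uint64_t reconstruct(const std::vector<std::uint64_t>& shares) {
    const int n = static_cast<int>(shares.size());
    std::uint64_t secret = 0;
    for (int i = 0; i < n; ++i) {
        const std::uint64_t xi = static_cast<std::uint64_t>(i) + 1;
        std::uint64_t num = 1, den = 1;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            const std::uint64_t xj = static_cast<std::uint64_t>(j) + 1;
            num = mul_mod(num, xj);                         // factor (0 - xj), sign folded into den
            den = mul_mod(den, (xj + kPrime - xi) % kPrime); // factor (xj - xi)
        }
        secret = (secret + mul_mod(shares[i], mul_mod(num, inv_mod(den)))) % kPrime;
    }
    return secret;
}

// Addition of two shared values is local: each party adds its own two shares.
std::vector<std::uint64_t> add_shares(const std::vector<std::uint64_t>& a,
                                      const std::vector<std::uint64_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint64_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = (a[i] + b[i]) % kPrime;
    return out;
}
```

For example, with three parties and threshold 1, calling `share(20, 3, 1, rng)` and `share(22, 3, 1, rng)`, adding the results with `add_shares`, and passing the sum to `reconstruct` recovers the sum of the two inputs, while no single share reveals either input.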
core/: Core libraries in MPL: the fundamental matrix library, the math operator library, and the Player library. Some math computations are compiled as a library (libcore_lib.so).
machine_learning/: Machine learning algorithms: neural networks, linear regression and logistic regression.
datesets/mnist/: Training dataset.
util/: Data IO and network IO package. Networking is implemented with sockets and works on both Windows and Ubuntu.
Constant.h: Some constants and general functions in SecMML. Note that, for Windows users, the macro `UNIX_PLATFORM` should be defined to use the
CMakeLists.txt: Defines the compile rules for the project. Note that, for Windows users, the line `target_link_libraries(SMMLF ws2_32)` should be uncommented.
The following steps take training a linear regression model among three parties as an example:
Clone the SecMML git repository by running:
git clone https://github.com/SMMLF/MPL-Public.git
Set the number of parties to 3 in `Constant.h` (note that M can be any number >= 3):
#define M 3
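One common reason for a lower bound of three parties in BGW-style protocols is the honest-majority requirement: with degree-t Shamir sharing, the semi-honest protocol tolerates t corrupted parties only when t < M/2. The sketch below works through that arithmetic. It is an illustration under that assumption, not a statement about how SecMML chooses its threshold; `kParties` and `max_threshold` are hypothetical names.

```cpp
// Stand-in for the M macro from Constant.h (assumption: same value as in the step above).
constexpr int kParties = 3;

// Largest t with 2t + 1 <= parties, i.e. t < parties / 2.
constexpr int max_threshold(int parties) { return (parties - 1) / 2; }

// With fewer than three parties the largest admissible threshold is 0,
// which means a single party's share would already determine the secret.
static_assert(kParties >= 3, "an honest-majority setting needs at least three parties");
static_assert(max_threshold(kParties) >= 1, "threshold must hide the secret from any single party");
```

Under this assumption, three or four parties tolerate one corrupted party and five parties tolerate two.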
Specify the platform:
If Ubuntu, define the `UNIX_PLATFORM` macro described under `Constant.h` above:
`#ifndef UNIX_PLATFORM` `#define UNIX_PLATFORM` `#endif`
If Windows, edit the `CMakeLists.txt` rule described above:
Add `target_link_libraries(SMMLF ws2_32)` to the file.
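To show why both a macro and a link rule are involved, here is a minimal sketch of how a platform macro typically selects the socket API in portable C++ code. It assumes a layout similar to, but not copied from, SecMML's network code: `socket_handle`, `close_socket`, and `platform_name` are hypothetical names, and the Windows branch is the one that needs the `ws2_32` library at link time.

```cpp
#include <string>

// Same guard as in the step above; a Windows build would leave the macro undefined.
#ifndef UNIX_PLATFORM
#define UNIX_PLATFORM
#endif

#ifdef UNIX_PLATFORM
#include <sys/socket.h>  // POSIX socket API
#include <unistd.h>      // close()
using socket_handle = int;
// POSIX sockets are plain file descriptors and are released with close().
inline void close_socket(socket_handle s) { close(s); }
#else
#include <winsock2.h>    // Winsock API; requires linking against ws2_32
using socket_handle = SOCKET;
// Winsock sockets are released with closesocket().
inline void close_socket(socket_handle s) { closesocket(s); }
#endif

// Reports which branch was compiled in (hypothetical helper for illustration).
inline std::string platform_name() {
#ifdef UNIX_PLATFORM
    return "unix";
#else
    return "windows";
#endif
}
```

Keeping the platform difference behind one type alias and one wrapper function means the rest of the networking code can stay identical on both systems.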
Choose the machine learning model:
- Linear Regression Model: bp->linear_graph();
- Logistic Regression Model: bp->logistic_graph();
- Three-layer Model: bp->graph();
Compile the executable file:
Start three processes and input the party index, respectively:
Please enter party index:
- Enter 0, 1, ..., M-1 for each process in order (0, 1, 2 when M is 3).
For any questions, please contact [email protected].
Faculty: Prof. Weili Han
Students: Haoqi Wu (Graduate Student), Zifeng Jiang (Graduate Student), Wenqiang Ruan (Ph.D Candidate), Lushan Song (Ph.D Candidate), Dingyi Tang (Post Graduate Student)