ML++ - A library created to revitalize C++ as a machine learning front end

Overview

Machine learning is a vast and exciting discipline, garnering attention from specialists in many fields. Unfortunately, for C++ programmers and enthusiasts, there appears to be a lack of support in the field of machine learning. This library was written to fill that void and give C++ a true foothold in the ML sphere. The intent is for it to act as a crossroads between low-level developers and machine learning engineers.

Installation

Begin by downloading the header files for the ML++ library. You can do this by cloning the repository and extracting the MLPP directory within it:

git clone https://github.com/novak-99/MLPP

Next, execute the "buildSO.sh" shell script:

sudo ./buildSO.sh

After doing so, keep the ML++ source files in a local directory and include them in this fashion:

#include "MLPP/Stat/Stat.hpp" // Including the ML++ statistics module. 

int main(){
...
}

Finally, once you have finished creating your project, compile it using g++:

g++ main.cpp /usr/local/lib/MLPP.so --std=c++17

Usage

Please note that ML++ uses the std::vector<double> data type for emulating vectors, and the std::vector<std::vector<double>> data type for emulating matrices.
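For instance, a toy dataset of three two-feature examples and its corresponding outputs would be declared in this representation as follows (the values and helper names here are purely illustrative):

```cpp
#include <vector>

// Build a toy 3x2 input matrix in ML++'s representation: a matrix is a
// vector of rows, where each inner vector is one training example.
std::vector<std::vector<double>> makeInputSet(){
    return {{1, 2}, {3, 4}, {5, 6}};
}

// One output value per training example.
std::vector<double> makeOutputSet(){
    return {3, 7, 11};
}
```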

Begin by including the respective header file of your choice.

#include "MLPP/LinReg/LinReg.hpp"

Next, instantiate an object of the class. Don't forget to pass the input set and output set as parameters.

LinReg model(inputSet, outputSet);

Afterwards, call the optimizer that you would like to use. For iterative optimizers such as gradient descent, pass in the learning rate, the number of epochs, and whether or not to utilize the UI panel.

model.gradientDescent(0.001, 1000, 0);

Great, you are now ready to test! To test a single instance, utilize the following function:

model.modelTest(testSetInstance);

This will return the model's prediction for that single example.

To test an entire test set, use the following function:

model.modelSetTest(testSet);

The result will be the model's predictions for the entire dataset.
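Putting the steps above together, a complete toy program might look like the following sketch (the dataset values are made up for illustration; the API calls are those shown above):

```cpp
#include "MLPP/LinReg/LinReg.hpp" // ML++ linear regression module
#include <vector>
#include <iostream>

int main(){
    // Toy dataset: y = 2x.
    std::vector<std::vector<double>> inputSet = {{1}, {2}, {3}, {4}};
    std::vector<double> outputSet = {2, 4, 6, 8};

    LinReg model(inputSet, outputSet);
    model.gradientDescent(0.001, 1000, 0); // learning rate, epochs, UI panel off

    // Predictions for the whole set.
    std::vector<double> predictions = model.modelSetTest(inputSet);
    for(double y : predictions){
        std::cout << y << std::endl;
    }
}
```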

Contents of the Library

  1. Regression
    1. Linear Regression
    2. Logistic Regression
    3. Softmax Regression
    4. Exponential Regression
    5. Probit Regression
    6. CLogLog Regression
    7. Tanh Regression
  2. Deep, Dynamically Sized Neural Networks
    1. Possible Activation Functions
      • Linear
      • Sigmoid
      • Softmax
      • Swish
      • Mish
      • SinC
      • Softplus
      • Softsign
      • CLogLog
      • Logit
      • Gaussian CDF
      • RELU
      • GELU
      • Sign
      • Unit Step
      • Sinh
      • Cosh
      • Tanh
      • Csch
      • Sech
      • Coth
      • Arsinh
      • Arcosh
      • Artanh
      • Arcsch
      • Arsech
      • Arcoth
    2. Possible Optimization Algorithms
      • Batch Gradient Descent
      • Mini-Batch Gradient Descent
      • Stochastic Gradient Descent
      • Gradient Descent with Momentum
      • Nesterov Accelerated Gradient
      • Adagrad Optimizer
      • Adadelta Optimizer
      • Adam Optimizer
      • Adamax Optimizer
      • Nadam Optimizer
      • AMSGrad Optimizer
      • 2nd Order Newton-Raphson Optimizer*
      • Normal Equation*

      *Only available for linear regression
    3. Possible Loss Functions
      • MSE
      • RMSE
      • MAE
      • MBE
      • Log Loss
      • Cross Entropy
      • Hinge Loss
    4. Possible Regularization Methods
      • Lasso
      • Ridge
      • ElasticNet
    5. Possible Weight Initialization Methods
      • Uniform
      • Xavier Normal
      • Xavier Uniform
      • He Normal
      • He Uniform
      • LeCun Normal
      • LeCun Uniform
    6. Possible Learning Rate Schedulers
      • Time Based
      • Epoch Based
      • Step Based
      • Exponential
  3. Prebuilt Neural Networks
    1. Multilayer Perceptron
    2. Autoencoder
    3. Softmax Network
  4. Generative Modeling
    1. Tabular Generative Adversarial Networks
  5. Natural Language Processing
    1. Word2Vec (Continuous Bag of Words, Skip-Gram)
    2. Stemming
    3. Bag of Words
    4. TFIDF
    5. Tokenization
    6. Auxiliary Text Processing Functions
  6. Computer Vision
    1. The Convolution Operation
    2. Max, Min, Average Pooling
    3. Global Max, Min, Average Pooling
    4. Prebuilt Feature Detectors
      • Horizontal/Vertical Prewitt Filter
      • Horizontal/Vertical Sobel Filter
      • Horizontal/Vertical Scharr Filter
      • Horizontal/Vertical Roberts Filter
      • Gaussian Filter
      • Harris Corner Detector
  7. Principal Component Analysis
  8. Naive Bayes Classifiers
    1. Multinomial Naive Bayes
    2. Bernoulli Naive Bayes
    3. Gaussian Naive Bayes
  9. Support Vector Classification
    1. Primal Formulation (Hinge Loss Objective)
    2. Dual Formulation (Via Lagrangian Multipliers)
  10. K-Means
  11. k-Nearest Neighbors
  12. Outlier Finder (Using z-scores)
  13. Matrix Decompositions
    1. Singular Value Decomposition (SVD)
    2. Cholesky Decomposition
      • Positive Definiteness Checker
    3. QR Decomposition
  14. Numerical Analysis
    1. Numerical Differentiation
      • Univariate Functions
      • Multivariate Functions
    2. Jacobian Vector Calculator
    3. Hessian Matrix Calculator
    4. Function approximator
      • Constant Approximation
      • Linear Approximation
      • Quadratic Approximation
      • Cubic Approximation
    5. Differential Equation Solvers
      • Euler's Method
      • Growth Method
  15. Mathematical Transforms
    1. Discrete Cosine Transform
  16. Linear Algebra Module
  17. Statistics Module
  18. Data Processing Module
    1. Setting and Printing Datasets
    2. Feature Scaling
    3. Mean Normalization
    4. One Hot Representation
    5. Reverse One Hot Representation
    6. Supported Color Space Conversions
      • RGB to Grayscale
      • RGB to HSV
      • RGB to YCbCr
      • RGB to XYZ
      • XYZ to RGB
  19. Utilities
    1. TP, FP, TN, FN function
    2. Precision
    3. Recall
    4. Accuracy
    5. F1 score

What's in the Works?

ML++, like most frameworks, is dynamic and constantly changing. This is especially important in the world of ML, as new algorithms and techniques are developed day by day. Here are a few things currently in development for ML++:

- Convolutional Neural Networks

- Kernels for SVMs

- Support Vector Regression

Citations

Various materials helped me along the way while creating ML++, and I would like to credit several of them here. An article by TutorialsPoint was a big help when implementing the determinant of a matrix, and an article by GeeksForGeeks was very helpful for taking the adjoint and inverse of a matrix.

Issues
  • preformance_function error?

    double Utilities::performance(std::vector<double> y_hat, std::vector<double> outputSet){
        double correct = 0;
        for(int i = 0; i < y_hat.size(); i++){
            if(std::round(y_hat[i]) == outputSet[i]){
                correct++;
            }
        }
        return correct/y_hat.size();
    }
    

    problem:std::round(y_hat[i]) == outputSet[i]???
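If the concern is the exact floating-point comparison, a tolerance-based variant avoids relying on labels being exactly representable doubles. This is a sketch written independently of the library; the function name is illustrative, not the library's own:

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Fraction of predictions whose rounded value matches the label,
// comparing within a small tolerance instead of with exact equality.
double accuracy(const std::vector<double>& y_hat,
                const std::vector<double>& outputSet){
    double correct = 0;
    for(std::size_t i = 0; i < y_hat.size(); i++){
        if(std::fabs(std::round(y_hat[i]) - outputSet[i]) < 1e-9){
            correct++;
        }
    }
    return correct / y_hat.size();
}
```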

    opened by algorithmconquer 1
  • logit function error

    return std::log(z / (1 - z)); this is my PR #10

    https://github.com/novak-99/MLPP/blob/aac9bd6479a26e092f16cd46e58d790da2bdba8a/MLPP/Activation/Activation.cpp#L224
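For context, the corrected logit (inverse sigmoid) referenced by the PR can be written as a standalone sketch, independent of the library's Activation class:

```cpp
#include <cmath>

// Logit (inverse sigmoid): maps a probability z in (0, 1) to the real line.
double logit(double z){
    return std::log(z / (1 - z));
}
```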

    opened by SmartAI 1
  • Hyperbolic Activations

    In the README, it looks like you've (planned on) implemented a lot of hyperbolic functions as activation functions.

    If you don't mind me asking, are there specific cases where such functions with a diverging gradient (such as Cosh and Sinh) that you've implemented would be helpful? I am very curious, I haven't seen these used before. Thanks!

    opened by towermitten 1
  • Softmax Optimization

    https://github.com/novak-99/MLPP/blob/4ebcc0a1e3866e54d61933dcdbe4d5ce902ff6a7/MLPP/Activation/Activation.cpp#L52-L64

    Here is one specific example of code optimization: The softmax function here will give the correct answer but as it is currently written it is recalculating the same sum z.size() times. It would be much better to calculate the sum outside of the loop once and then reuse that value inside the loop without recalculating it.

    If we wanted to optimize even more we could look at the use of the exp() function. Even with the above fix the exponential of each element in z is being calculated twice (once for the sum and once for the final output element calculation). Assuming memory allocations and accesses are faster than the exp() function it would be better to make an intermediary array of the exponential values and then access that array to calculate the sum and then also to calculate values of a.

    The first optimization here will make a massive difference, so I think it is definitely worth keeping things like this in mind. The second one will have much less of an impact and goes a bit more into the weeds, so I would not worry too much about optimizations like that during the initial coding - I mention it here just to give a more complete idea of what kind of optimizations are possible even in a very simple function.
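Combining both suggestions, an optimized softmax might look like the following sketch (not the library's actual Activation code): each element is exponentiated exactly once into a scratch array, the sum is computed once, and the same array is then normalized in place.

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Softmax with a single pass of exp(): exponentiate once, sum once, normalize.
std::vector<double> softmax(const std::vector<double>& z){
    std::vector<double> expZ(z.size());
    double sum = 0;
    for(std::size_t i = 0; i < z.size(); i++){
        expZ[i] = std::exp(z[i]);
        sum += expZ[i];
    }
    for(std::size_t i = 0; i < z.size(); i++){
        expZ[i] /= sum;
    }
    return expZ;
}
```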

    opened by mdeib 1
  • Is MLPP reinventing the wheel? What would it be used for?

    Great work! Very interesting!

    In the README, you say that MLPP serves to revitalize C++ as a machine learning front-end. How does MLPP separate itself from the Pytorch C++ API? If you don't mind me asking, why not build wrappers around the already open-source and highly optimized Pytorch C++ code?

    Thanks!

    opened by towermitten 4
  • introduce CMAKE build

    Hi Marc,

    I pulled your project into my CLion IDE (IntelliJ), looks very interesting! great work there.. are you really 16 years old!!!

    I have introduced a cmake file that would allow more portable and modern build approach and easy IDE agnostic integration. I am attaching it here. just drop it next to buildSo.sh. the cmake will configure two targets ( shared lib "mlpp" and mlpp_runner for your main.cpp that will link against the shared library). I also upped the C++ support to 20. CMakeLists.txt

    opened by bilaleluneis 0
  • Optimizing matrix multiplication

    Impressive work!

    You should swap the two inner loops here: https://github.com/novak-99/MLPP/blob/2a21d259997e44d80552b5c5842f05d4eae1d62a/MLPP/LinAlg/LinAlg.cpp#L80-L86

    That is:

     for(int i = 0; i < A.size(); i++){ 
         for(int k = 0; k < B.size(); k++){ 
             for(int j = 0; j < B[0].size(); j++){ 
                 C[i][j] += A[i][k] * B[k][j]; 
             } 
         } 
     } 
    

    It won't change the result, but it should speed up the multiplication. Explanations: https://viralinstruction.com/posts/hardware/#15f5c31a-8aef-11eb-3f19-cf0a4e456e7a

    Also, std::vector<std::vector<>> is not the best way to store a matrix: https://stackoverflow.com/a/55478808
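To illustrate the flat-storage point, the same k-before-j loop order over a single contiguous row-major array could look like this (the function and its signature are illustrative, not part of MLPP): the innermost j loop then walks both B and C contiguously in memory.

```cpp
#include <vector>
#include <cstddef>

// Row-major matmul over flat arrays: C (n x p) = A (n x m) * B (m x p).
std::vector<double> matmul(const std::vector<double>& A,
                           const std::vector<double>& B,
                           std::size_t n, std::size_t m, std::size_t p){
    std::vector<double> C(n * p, 0.0);
    for(std::size_t i = 0; i < n; i++){
        for(std::size_t k = 0; k < m; k++){
            double a = A[i * m + k]; // fixed while j sweeps a row of B and C
            for(std::size_t j = 0; j < p; j++){
                C[i * p + j] += a * B[k * p + j];
            }
        }
    }
    return C;
}
```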

    opened by Jonas1312 3
Releases (v1.0.2)
  • v1.0.2 (Feb 13, 2022)

  • v1.0.1 (Jan 29, 2022)

  • v1.0.0 (Jan 22, 2022)

    First official release. Includes deep neural networks, special optimizers, computer vision algorithms, regression, PCA, linear algebra, statistics, and more.

Owner
marc
16 year old aspiring machine learning engineer. C++ enthusiast and enjoyer of differential equations.