mlpack: a scalable C++ machine learning library

Overview

mlpack: a fast, flexible machine learning library

Download: current stable version (3.4.2)

mlpack is an intuitive, fast, and flexible C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers. In addition to its powerful C++ interface, mlpack also provides command-line programs, Python bindings, Julia bindings, Go bindings and R bindings.

mlpack uses an open governance model and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


0. Contents

  1. Introduction
  2. Citation details
  3. Dependencies
  4. Building mlpack from source
  5. Running mlpack programs
  6. Using mlpack from Python
  7. Further documentation
  8. Bug reporting

1. Introduction

The mlpack website can be found at https://www.mlpack.org and it contains numerous tutorials and extensive documentation. This README serves as a guide for what mlpack is, how to install it, how to run it, and where to find more documentation. Consult the website for further information.

2. Citation details

If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):

@article{mlpack2018,
    title     = {mlpack 3: a fast, flexible machine learning library},
    author    = {Curtin, Ryan R. and Edel, Marcus and Lozhnikov, Mikhail and
                 Mentekidis, Yannis and Ghaisas, Sumedh and Zhang,
                 Shangtong},
    journal   = {Journal of Open Source Software},
    volume    = {3},
    issue     = {26},
    pages     = {726},
    year      = {2018},
    doi       = {10.21105/joss.00726},
    url       = {https://doi.org/10.21105/joss.00726}
}

Citations are beneficial for the growth and improvement of mlpack.

3. Dependencies

mlpack has the following dependencies:

  Armadillo      >= 8.400.0
  Boost (math_c99, spirit) >= 1.58.0
  CMake          >= 3.2.2
  ensmallen      >= 2.10.0
  cereal         >= 1.1.2

All of those should be available in your distribution's package manager. If not, you will have to compile each of them by hand. See the documentation for each of those packages for more information.

If you would like to use or build the mlpack Python bindings, make sure that the following Python packages are installed:

  setuptools
  cython >= 0.24
  numpy
  pandas >= 0.15.0

If you would like to build the Julia bindings, make sure that Julia >= 1.3.0 is installed.

If you would like to build the Go bindings, make sure that Go >= 1.11.0 is installed with this package:

 Gonum

If you would like to build the R bindings, make sure that R >= 4.0 is installed with these R packages:

 Rcpp >= 0.12.12
 RcppArmadillo >= 0.8.400.0
 RcppEnsmallen >= 0.2.10.0
 BH >= 1.58
 roxygen2

If the STB library headers are available, image loading support will be compiled.

If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.

4. Building mlpack from source

This document discusses how to build mlpack from source. These build directions will work for any Linux-like shell environment (for example Ubuntu, macOS, FreeBSD etc). However, mlpack is in the repositories of many Linux distributions and so it may be easier to use the package manager for your system. For example, on Ubuntu, you can install the mlpack library and command-line executables (e.g. mlpack_pca, mlpack_kmeans etc.) with the following command:

$ sudo apt-get install libmlpack-dev mlpack-bin

On Fedora or Red Hat (EPEL):

$ sudo dnf install mlpack-devel mlpack-bin

Note: Older Ubuntu versions may not have the most recent version of mlpack available; for instance, at the time of this writing, Ubuntu 16.04 only has mlpack 2.0.1 available. Options include upgrading your Ubuntu version, finding a PPA or other non-official sources, or installing with a manual build.

There are some useful pages to consult in addition to this section:

mlpack uses CMake as a build system and allows several flexible build configuration options. You can consult any of the CMake tutorials for further documentation, but this tutorial should be enough to get mlpack built and installed.

First, unpack the mlpack source and change into the unpacked directory. Here we use mlpack-x.y.z where x.y.z is the version.

$ tar -xzf mlpack-x.y.z.tar.gz
$ cd mlpack-x.y.z

Then, make a build directory. The directory can have any name, but 'build' is sufficient.

$ mkdir build
$ cd build

The next step is to run CMake to configure the project. Running CMake is equivalent to running ./configure with autotools. If you run CMake with no options, it will configure the project to build with no debugging symbols and no profiling information:

$ cmake ../

Options can be specified to compile with debugging information and profiling information:

$ cmake -D DEBUG=ON -D PROFILE=ON ../

Options are specified with the -D flag. The allowed options include:

DEBUG=(ON/OFF): compile with debugging symbols
PROFILE=(ON/OFF): compile with profiling symbols
ARMA_EXTRA_DEBUG=(ON/OFF): compile with extra Armadillo debugging symbols
BOOST_ROOT=(/path/to/boost/): path to root of boost installation
ARMADILLO_INCLUDE_DIR=(/path/to/armadillo/include/): path to Armadillo headers
ARMADILLO_LIBRARY=(/path/to/armadillo/libarmadillo.so): Armadillo library
BUILD_CLI_EXECUTABLES=(ON/OFF): whether or not to build command-line programs
BUILD_PYTHON_BINDINGS=(ON/OFF): whether or not to build Python bindings
PYTHON_EXECUTABLE=(/path/to/python_version): Path to specific Python executable
PYTHON_INSTALL_PREFIX=(/path/to/python/): Path to root of Python installation
BUILD_JULIA_BINDINGS=(ON/OFF): whether or not to build Julia bindings
JULIA_EXECUTABLE=(/path/to/julia): Path to specific Julia executable
BUILD_GO_BINDINGS=(ON/OFF): whether or not to build Go bindings
GO_EXECUTABLE=(/path/to/go): Path to specific Go executable
BUILD_GO_SHLIB=(ON/OFF): whether or not to build shared libraries required by Go bindings
BUILD_R_BINDINGS=(ON/OFF): whether or not to build R bindings
R_EXECUTABLE=(/path/to/R): Path to specific R executable
BUILD_TESTS=(ON/OFF): whether or not to build tests
BUILD_SHARED_LIBS=(ON/OFF): compile shared libraries as opposed to
   static libraries
DISABLE_DOWNLOADS=(ON/OFF): whether to disable all downloads during build
DOWNLOAD_ENSMALLEN=(ON/OFF): If ensmallen is not found, download it
ENSMALLEN_INCLUDE_DIR=(/path/to/ensmallen/include): path to include directory
   for ensmallen
DOWNLOAD_STB_IMAGE=(ON/OFF): If STB is not found, download it
STB_IMAGE_INCLUDE_DIR=(/path/to/stb/include): path to include directory for
   STB image library
USE_OPENMP=(ON/OFF): whether or not to use OpenMP if available
BUILD_DOCS=(ON/OFF): build Doxygen documentation, if Doxygen is available
   (default ON)

Other tools can also be used to configure CMake, but those are not documented here. See this section of the build guide for more details, including a full list of options, and their default values.
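As an illustration of how several -D options combine into one configure command, the following Python sketch assembles the argument list from a dictionary. The helper cmake_command is hypothetical (it is not part of mlpack or CMake), and only option names from the list above are used.

```python
import shlex


def cmake_command(options, source_dir="../"):
    """Build a `cmake` argument list from a mapping of option names to values.

    Hypothetical helper for illustration; booleans are rendered as ON/OFF.
    """
    args = ["cmake"]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        # Each option becomes a separate "-D NAME=VALUE" pair.
        args += ["-D", f"{name}={value}"]
    args.append(source_dir)
    return args


args = cmake_command({
    "DEBUG": True,
    "BUILD_PYTHON_BINDINGS": False,
    "BUILD_TESTS": False,
})
print(shlex.join(args))
```

The resulting list can be passed to subprocess.run(args, cwd="build", check=True) from the source directory, which mirrors running the command by hand inside the build directory.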

By default, command-line programs will be built, and if the Python dependencies (Cython, setuptools, numpy, pandas) are available, then Python bindings will also be built. OpenMP will be used for parallelization when possible by default.

Once CMake is configured, building the library is as simple as typing 'make'. This will build all library components as well as 'mlpack_test'.

$ make

If you do not want to build everything in the library, individual components of the build can be specified:

$ make mlpack_pca mlpack_knn mlpack_kfn

If the build fails and you cannot figure out why, register an account on Github and submit an issue. The mlpack developers will quickly help you figure it out:

mlpack on Github

Alternately, mlpack help can be found in IRC at #mlpack on chat.freenode.net.

If you wish to install mlpack to /usr/local/include/mlpack/, /usr/local/lib/, and /usr/local/bin/, make sure you have root privileges (or write permissions to those three directories), and simply type

$ make install

You can now run the executables by name. You can link against mlpack with -lmlpack, and the mlpack headers are found in /usr/local/include/mlpack/. If Python bindings were built, you can access them with the mlpack package in Python.

If running the programs (e.g. $ mlpack_knn -h) gives an error of the form

error while loading shared libraries: libmlpack.so.2: cannot open shared object file: No such file or directory

then be sure that the runtime linker is searching the directory where libmlpack.so was installed (probably /usr/local/lib/ unless you set it manually). One way to do this, on Linux, is to ensure that the LD_LIBRARY_PATH environment variable has the directory that contains libmlpack.so. Using bash, this can be set easily:

export LD_LIBRARY_PATH="/usr/local/lib/:$LD_LIBRARY_PATH"

(or whatever directory libmlpack.so is installed in.)
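To check quickly whether a directory is already on the search path, a small Python sketch such as the following can help. The function name on_library_path is an assumption made for illustration. Note that the runtime linker also consults other sources (such as the ldconfig cache), so a False result does not by itself mean the library cannot be found.

```python
import os


def on_library_path(directory, env_value):
    """Return True if `directory` appears in a colon-separated search path."""
    wanted = os.path.normpath(directory)
    # Skip empty entries, which appear when the variable is unset or has
    # a leading or trailing colon.
    entries = [entry for entry in env_value.split(":") if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


current = os.environ.get("LD_LIBRARY_PATH", "")
print(on_library_path("/usr/local/lib/", current))
```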

5. Running mlpack programs

After building mlpack, the executables will reside in build/bin/. You can call them from there, or you can install the library and (depending on system settings) they should be added to your PATH and you can call them directly. The documentation below assumes the executables are in your PATH.

Consider the 'mlpack_knn' program, which finds the k nearest neighbors in a reference dataset of all the points in a query set. That is, we have a query and a reference dataset. For each point in the query dataset, we wish to know the k points in the reference dataset which are closest to the given query point.

Alternately, if the query and reference datasets are the same, the problem can be stated more simply: for each point in the dataset, we wish to know the k nearest points to that point.
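To make the problem statement concrete, here is a minimal pure-Python sketch of brute-force k-nearest-neighbor search. It is not how mlpack_knn is implemented. The function name brute_force_knn, the list-of-tuples data layout, and the choice to skip a point as its own neighbor when only one dataset is given are all assumptions made for illustration.

```python
import math


def brute_force_knn(reference, query=None, k=1):
    """Find the k nearest reference points for every query point.

    Illustrative sketch only: compares every query point with every
    reference point using Euclidean distance.
    """
    same_set = query is None
    if same_set:
        query = reference

    neighbors, distances = [], []
    for qi, q in enumerate(query):
        candidates = []
        for ri, r in enumerate(reference):
            if same_set and ri == qi:
                continue  # assumption: a point is not its own neighbor
            candidates.append((math.dist(q, r), ri))
        candidates.sort()  # closest first
        top = candidates[:k]
        neighbors.append([index for _, index in top])
        distances.append([dist for dist, _ in top])
    return {"neighbors": neighbors, "distances": distances}


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
result = brute_force_knn(points, k=2)
print(result["neighbors"][0], result["distances"][0])
```

This approach costs one distance computation per (query, reference) pair, which is fine for a handful of points but is only meant to show what the inputs and outputs of the search look like.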

Each mlpack program has extensive help documentation which details what the method does, what each of the parameters is, and how to use them:

$ mlpack_knn --help

Running mlpack_knn on one dataset (that is, the query and reference datasets are the same) and finding the 5 nearest neighbors is very simple:

$ mlpack_knn -r dataset.csv -n neighbors_out.csv -d distances_out.csv -k 5 -v

The -v (--verbose) flag is optional; it gives informational output. It is not unique to mlpack_knn but is available in all mlpack programs. Verbose output also gives timing output at the end of the program, which can be very useful.

6. Using mlpack from Python

If mlpack is installed to the system, then the mlpack Python bindings should be automatically in your PYTHONPATH, and importing mlpack functionality into Python should be very simple:

>>> from mlpack import knn

Accessing help is easy:

>>> help(knn)

The API is similar to the command-line programs. So, running knn() (k-nearest-neighbor search) on the numpy matrix dataset and finding the 5 nearest neighbors is very simple:

>>> output = knn(reference=dataset, k=5, verbose=True)

This will store the output neighbors in output['neighbors'] and the output distances in output['distances']. Other mlpack bindings function similarly, and the input/output parameters exactly match those of the command-line programs.

7. Further documentation

The documentation given here is only a fraction of the available documentation for mlpack. If doxygen is installed, you can type make doc to build the documentation locally. Alternately, up-to-date documentation is available for older versions of mlpack:

8. Bug reporting

(see also mlpack help)

If you find a bug in mlpack or have any problems, numerous routes are available for help.

Github is used for bug tracking, and can be found at https://github.com/mlpack/mlpack/. It is easy to register an account and file a bug there, and the mlpack development team will try to quickly resolve your issue.

In addition, mailing lists are available. The mlpack discussion list is available at

mlpack discussion list

and the git commit list is available at

commit list

Lastly, the IRC channel #mlpack on Freenode can be used to get help.

Issues
  • [GSoC] Augmented RNN models - benchmarking framework

    This PR is part of my GSoC project "Augmented RNNs". Implemented:

    • class CopyTask for evaluating models on the sequence copy problem, showcasing benchmarking framework;
    • unit test for it (a simple non-ML model that is hardcoded to copy the sequence required number of times is expected to ace the CopyTask).
    opened by sidorov-ks 102
  • Swap boost::variant with vtable.

    I updated the abstract class and also updated the Linear layer as an example. There are various layers we have to update, so if anybody would like to work on some of the layers listed below, comment on the PR. Unfortunately, I can't grant commit permission for a specific branch. So, to get your changes in, fork the repository as usual, create a new feature branch, and make the changes; but instead of opening another PR, just post the link to the branch here and I will cherry-pick the commit.

    Steps:

    1. Inherit the Layer class: each layer should implement the functions that are relevant for its layer-specific computations and inherit the rest from the base class.
    2. Rename InputDataType to InputType and OutputDataType to OutputType, to make the interface more consistent with the rest of the codebase, rename the type for the input and output data.
    3. Use InputType and OutputType instead of arma::mat or arma::Mat<eT>, to make the layer work with the abstract class we have to follow the interface accordingly.
    4. Provide default layer type to hide some of the template functionalities that could be confusing for users that aren’t familiar with templates. So instead of using Linear<> all the time, a user can just use Linear. This is a result of https://github.com/mlpack/mlpack/issues/2524#issuecomment-664776530.
    5. Update the tests to use the updated interface.

    Example: For an example, check out the Linear layer.

    Here is a list of layers we have to update:

    • [x] adaptive_max_pooling.hpp - @Aakash-kaushik
    • [x] adaptive_mean_pooling.hpp - @Aakash-kaushik
    • [x] add.hpp - @Aakash-kaushik
    • [x] add_merge.hpp - @Aakash-kaushik
    • [x] alpha_dropout.hpp - @Aakash-kaushik
    • [x] atrous_convolution.hpp - @Aakash-kaushik
    • [x] batch_norm.hpp - @Aakash-kaushik
    • [x] base_layer.hpp - @mrityunjay-tripathi
    • [x] bilinear_interpolation.hpp - @mrityunjay-tripathi
    • [x] c_relu.hpp - @zoq
    • [x] celu.hpp - @zoq
    • [x] concat.hpp - @mrityunjay-tripathi
    • [ ] concat_performance.hpp - @hello-fri-end
    • [x] concatenate.hpp - @mrityunjay-tripathi
    • [x] constant.hpp - @zoq
    • [x] convolution.hpp - @mrityunjay-tripathi
    • [x] dropconnect.hpp - @zoq
    • [x] dropout.hpp - @zoq
    • [x] elu.hpp - @zoq
    • [x] fast_lstm.hpp - @mrityunjay-tripathi
    • [x] flexible_relu.hpp - @zoq
    • [x] glimpse.hpp - @mrityunjay-tripathi
    • [ ] gru.hpp - @zoq
    • [x] hard_tanh.hpp - @zoq
    • [x] hardshrink.hpp - @zoq
    • [x] highway.hpp - @mrityunjay-tripathi
    • [x] join.hpp - @mrityunjay-tripathi
    • [x] layer_norm.hpp - @mrityunjay-tripathi
    • [x] leaky_relu.hpp - @zoq
    • [x] linear.hpp - @zoq
    • [x] linear3d.hpp - @mrityunjay-tripathi
    • [x] linear_no_bias.hpp - @zoq
    • [x] log_softmax.hpp - @zoq
    • [x] lookup.hpp - @mrityunjay-tripathi
    • [ ] lstm.hpp - @zoq
    • [x] max_pooling.hpp - @mrityunjay-tripathi
    • [x] mean_pooling.hpp - @mrityunjay-tripathi
    • [ ] minibatch_discrimination.hpp - @hello-fri-end
    • [x] multihead_attention.hpp - @mrityunjay-tripathi
    • [x] multiply_constant.hpp - @zoq
    • [x] multiply_merge.hpp - @mrityunjay-tripathi
    • [x] noisylinear.hpp - @zoq
    • [x] padding.hpp - @mrityunjay-tripathi
    • [x] parametric_relu.hpp - @zoq
    • [x] positional_encoding.hpp - @mrityunjay-tripathi
    • [x] radial_basis_function.hpp - @hello-fri-end
    • [ ] recurrent.hpp - @kaushal07wick
    • [ ] recurrent_attention.hpp - @kaushal07wick
    • [x] reinforce_normal.hpp - @mrityunjay-tripathi
    • [x] reparametrization.hpp - @mrityunjay-tripathi
    • [x] select.hpp - @mrityunjay-tripathi
    • [x] sequential.hpp - @mrityunjay-tripathi
    • [x] softmax.hpp - @zoq
    • [x] softmin.hpp - @zoq
    • [x] softshrink.hpp - @zoq
    • [x] spatial_dropout.hpp - @zoq
    • [x] subview.hpp - @mrityunjay-tripathi
    • [x] transposed_convolution.hpp - @mrityunjay-tripathi
    • [x] virtual_batch_norm.hpp - @mrityunjay-tripathi
    • [x] vr_class_reward.hpp - @mrityunjay-tripathi
    • [x] weight_norm.hpp - @mrityunjay-tripathi

    I left the base layer since I'm not sure yet if it makes sense to implement them as an independent class.


    Building upon the work from @Aakash-kaushik we can get a first impression of the advantage of using boost::visitor compared with a virtual inheritance approach (#2647)

    Note we stripped basically everything out except the FFN class, the linear layer, the FlexibleReLU layer, and the LogSoftMax layer, which enables us to get a first impression of what timings we can expect from a virtual inheritance approach.

    I tested two scenarios, but used the same network for each:

    FFN<> model;
    model.Add<Linear<> >(trainData.n_rows, 128);
    model.Add<FlexibleReLU<> >();
    model.Add<Linear<> >(128, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 256);
    model.Add<Linear<> >(256, 512);
    model.Add<Linear<> >(512, 2048);
    model.Add<Linear<> >(2048, 512);
    model.Add<Linear<> >(512, 8);
    model.Add<Linear<> >(8, 3);
    model.Add<LogSoftMax<> >();
    

    Scenario - 1

    batch-size: 1 iterations: 10000 trials: 10

    vtable - DEBUG=ON

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 494.485s
    elapsed time: 503.777s
    elapsed time: 496.802s
    elapsed time: 499.928s
    elapsed time: 502.504s
    elapsed time: 495.735s
    elapsed time: 495.745s
    elapsed time: 505.284s
    elapsed time: 495.32s
    elapsed time: 495.209s
    --------------------------------------
    elapsed time averaged(10): 498.479s
    

    boost::variant - DEBUG=ON

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 496.419s
    elapsed time: 495.27s
    elapsed time: 494.769s
    elapsed time: 494.922s
    elapsed time: 497.729s
    elapsed time: 497.464s
    elapsed time: 498.024s
    elapsed time: 501.722s
    elapsed time: 500.59s
    elapsed time: 497.925s
    --------------------------------------
    elapsed time averaged (10): 497.483s
    

    vtable - DEBUG=OFF

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 199.713s
    elapsed time: 205.177s
    elapsed time: 200.135s
    elapsed time: 200.179s
    elapsed time: 205.792s
    elapsed time: 198.293s
    elapsed time: 198.535s
    elapsed time: 206.635s
    elapsed time: 198.263s
    elapsed time: 198.521s
    --------------------------------------
    elapsed time averaged(10): 201.124s
    

    boost::variant - DEBUG=OFF

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 198.645s
    elapsed time: 194.854s
    elapsed time: 194.748s
    elapsed time: 194.983s
    elapsed time: 197.42s
    elapsed time: 196.864s
    elapsed time: 197.454s
    elapsed time: 204.318s
    elapsed time: 201.076s
    elapsed time: 200.549s
    --------------------------------------
    elapsed time averaged (10): 198.091s
    

    Scenario - 2

    batch-size: 32 iterations: 10000 trials: 10

    vtable - DEBUG=ON

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 70.4116s
    elapsed time: 70.5631s
    elapsed time: 70.682s
    elapsed time: 70.5635s
    elapsed time: 71.2245s
    elapsed time: 71.1649s
    elapsed time: 71.4714s
    elapsed time: 71.2688s
    elapsed time: 71.3348s
    elapsed time: 71.3406s
    --------------------------------------
    elapsed time averaged(10): 71.0025s
    

    boost::variant - DEBUG=ON

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 70.3247s
    elapsed time: 70.5059s
    elapsed time: 70.5368s
    elapsed time: 70.5208s
    elapsed time: 70.4539s
    elapsed time: 70.788s
    elapsed time: 70.7692s
    elapsed time: 70.9473s
    elapsed time: 70.9146s
    elapsed time: 70.7278s
    --------------------------------------
    elapsed time averaged (10): 70.6489s
    

    vtable - DEBUG=OFF

    mlpack version: mlpack git-aa6d2b1aa
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 59.7968s
    elapsed time: 59.4626s
    elapsed time: 59.9147s
    elapsed time: 59.9682s
    elapsed time: 60.5511s
    elapsed time: 60.2109s
    elapsed time: 60.7782s
    elapsed time: 60.4981s
    elapsed time: 60.719s
    elapsed time: 60.7632s
    --------------------------------------
    elapsed time averaged(10): 60.2663s
    

    boost::variant - DEBUG=OFF

    mlpack version: mlpack git-4d01fe5e9
    armadillo version: 9.200.7 (Carpe Noctem)
    Filters: FFVanillaNetworkTest
    elapsed time: 60.8466s
    elapsed time: 61.0629s
    elapsed time: 61.1269s
    elapsed time: 60.7426s
    elapsed time: 60.8178s
    elapsed time: 60.7287s
    elapsed time: 60.864s
    elapsed time: 60.8982s
    elapsed time: 60.9232s
    elapsed time: 60.8519s
    --------------------------------------
    elapsed time averaged (10): 60.8863s
    

    Looking at the timings, boost::variant doesn't provide the speedup I thought it would. On top of that, the little speedup we would gain with boost::variant is marginal in comparison to the actual calculation.
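As a rough check of this conclusion, the averaged timings reported above can be compared directly. The following Python sketch computes the relative difference between the two approaches for each configuration; the numbers are copied from the benchmark output, while the dictionary layout and the function name relative_difference are assumptions made for illustration.

```python
# Averaged timings in seconds, copied from the benchmark output above.
averages = {
    ("batch 1", "DEBUG=ON"): {"vtable": 498.479, "variant": 497.483},
    ("batch 1", "DEBUG=OFF"): {"vtable": 201.124, "variant": 198.091},
    ("batch 32", "DEBUG=ON"): {"vtable": 71.0025, "variant": 70.6489},
    ("batch 32", "DEBUG=OFF"): {"vtable": 60.2663, "variant": 60.8863},
}


def relative_difference(vtable, variant):
    """Percent by which vtable is slower (positive) or faster (negative)."""
    return 100.0 * (vtable - variant) / variant


differences = {
    key: relative_difference(**timings) for key, timings in averages.items()
}
for (scenario, build), pct in differences.items():
    print(f"{scenario:8s} {build:9s} vtable vs boost::variant: {pct:+.2f}%")
```

With these averages, the gap stays below 2% in all four configurations, and in the batch-size 32, DEBUG=OFF case the vtable build is the faster of the two.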

    help wanted c: methods update dependencies 
    opened by zoq 98
  • Adding All Loss Functions

    Hello, I was going through the loss functions and put together a list of loss functions that aren't implemented yet. I found these by looking at PyTorch and TensorFlow; kindly refer to them for more information. The list is:

    1. HingeEmbedding Loss (taken by me)
    2. CosineEmbedding Loss (taken up by @kartikdutt18)
    3. MultiLabelMargin Loss
    4. TripletMargin Loss
    5. L1 Loss
    6. BCE Loss

    This might not be a complete list. I will update it as I find more. I hope this is OK with the community. Kindly feel free to take up any of the idle loss functions here. Thank you. :)

    help wanted good first issue s: stale c: methods 
    opened by ojhalakshya 93
  • ARMADILLO_INCLUDE_DIR-NOTFOUND/armadillo_bits/config.hpp not found

    -- The C compiler identification is GNU 4.8.1
    -- The CXX compiler identification is GNU 4.8.1
    -- Check for working C compiler: /usr/usc/gnu/gcc/4.8.1/bin/gcc
    -- Check for working C compiler: /usr/usc/gnu/gcc/4.8.1/bin/gcc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: /usr/usc/gnu/gcc/4.8.1/bin/g++
    -- Check for working CXX compiler: /usr/usc/gnu/gcc/4.8.1/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Checking for C++11 compiler
    -- Checking for C++11 compiler - available
    -- Looking for backtrace
    -- Looking for backtrace - found
    -- backtrace facility detected in default set of libraries
    -- Found Backtrace: /usr/include
    CMake Error at CMake/FindArmadillo.cmake:327 (message):
      ARMADILLO_INCLUDE_DIR-NOTFOUND/armadillo_bits/config.hpp not found! Cannot determine what to link against.
    Call Stack (most recent call first):
      CMakeLists.txt:113 (find_package)

    How can I solve this problem? Thanks a lot.

    t: question s: answered 
    opened by acgtun 83
  • Implementation of SPSA optimizer

    As of now, I have just created the basic files necessary to implement the optimizer for the sake of creating the PR... I'll push the code in the subsequent commits :v:

    opened by Rajiv2605 79
  • Adapting armadillo's parser for mlpack(Removing Boost Dependencies)

    For background knowledge, look at these

    Sample code to use the feature

    #include <fstream>
    #include <iostream>
    #include <mlpack/core.hpp>

    int main()
    {
      arma::Mat<double> data;
      std::fstream file;

      // Open the CSV file and load its contents into `data`.
      file.open("data.csv");
      mlpack::data::load_data<double>(data, arma::csv_ascii, file);
      data.raw_print();

      return 0;
    }
    
    c: core update dependencies 
    opened by heisenbuug 74
  • Addition of Essential Metrics Only.

    This is a good first issue and will help new contributors get familiar with the codebase. Also, this issue doesn't aim to add all metrics to mlpack, since each metric would have to be maintained; it aims to add metrics that I either find essential (or have used a couple of times) or that are very common. The list of metrics that can be added includes:

    1. IoU and meanIoU (Taken up by me)
    2. SSIM (Useful when you augment data and need to ensure that you don't augment it to such an extent that it becomes irrelevant. I used this on medical scans where there was heavy bias, as a metric to find the right augmentation parameters for oversampling [set augmentation parameters s.t. (average SSIM) > threshold], to automate the process a bit). I think @ojhalakshya is working on it.

    Other interesting metrics would be:

    1. r metric
    2. Top K Accuracy metric
    3. ~RMSE (Already implemented)~
    4. [Maybe, Not really sure about this.] Sparse Top K Accuracy

    In case some of these are already implemented, please forgive my ignorance. Also, anyone who starts working on them, please check the following:

    1. Has it been implemented?
    2. Is there a PR open for this?
    3. Is this taken up by someone?

    This is especially necessary for functions like RMSE, r metric. Sorry for increasing workload of members and contributors, I think at least some of them will be nice additions. Thanks.

    t: feature request help wanted good first issue s: stale c: methods 
    opened by kartikdutt18 70
  • Resolve Comments in Go Bindings(#1492) and Add Markdown Documentation

    Hi @rcurtin, I have tried to resolve some of the comments in PR #1492 and also added Markdown documentation for the Go bindings.

    DONE:

    • [x] Build a fully working Go binding using make go.
    • [x] Configure CMake with cmake ../, which would find Go using FindGo.cmake.
    • [x] Add Markdown Documentation for Go Bindings.
    • [x] Resolve underscores to camelcase
    • [x] Tried to avoid unnecessary copies.
    • [x] Resolve output in arma_util.cpp that was going out of scope.
    • [x] Removing unnecessary inputOptions and outputOptions.
    • [x] Resolve documentation for multiple outputs.
    • [x] Add some getter and setter methods for Umat, Urow and Ucol
    • [x] Add tests for Umat, Urow and Ucol
    • [x] Resolve Style issues(lines less than 80 characters) in go_binding_test.go
    • [x] Add vector of strings and int parameter type and added their tests.
    • [x] Add matrix with dataset info parameter type.
    s: keep open c: automatic bindings t: added feature 
    opened by Yashwants19 68
  • Algorithm yet to be implemented

    Hi there, I am interested in implementing an algorithm or a feature in mlpack which hasn't been implemented yet. It would be great if you could suggest any :smile:

    opened by Rajiv2605 61
  • Build scripts for Python bindings are not correct [Windows]

    Issue description

    Attempting to build python bindings on Windows using Visual Studio 2017 fails due to several issues:

    1. When using the flag BUILD_PYTHON_BINDINGS, CMake still shows a warning about not building python bindings, even though the bindings will be generated (not a roadblock).
    2. When the flag BUILD_PYTHON_BINDINGS is ON, the library will be built statically by default. I presume the python bindings require mlpack as a DLL? In that case, -DBUILD_SHARED_LIBS=ON must be enforced.
    3. Line 106 of setup.py refers to an invalid path. E.g. package_dir={ '': 'C:/mlpack/build/src/mlpack/bindings/python/' } This path ends in a slash which is not valid in a python package. What is more, I believe this path should be relative. If so, it should be replaced by: package_dir={ '': '.' },
    4. The linker expects to find mlpack and boost libraries in C:\mlpack\build\lib but this directory doesn't exist as a result of an mlpack build. Therefore, the directory needs to be manually created and populated with the following libraries: boost_serialization.lib, libboost_program_options-vc141-mt-1_65_1.lib, libboost_serialization-vc141-mt-1_65_1.lib, mlpack.dll, mlpack.lib
    5. After fixing issues 1 to 4, build will be successful. However, the resulting python package will fail to import mlpack with the following error: ImportError: cannot import name 'test_python_binding' from 'mlpack.test_python_binding' (C:\mlpack\build\src\mlpack\bindings\python\mlpack\test_python_binding.cp37-win_amd64.pyd)

    Your environment

    • version of mlpack: master branch April 19 (3.0.5)
    • operating system: windows 10 64 bits
    • compiler: MSVC 14.1
    • version of dependencies (Boost/Armadillo): boost 1.65.1, armadillo-9.300.2, OpenBLAS.0.2.14.1
    • any other environment information you think is relevant: miniconda3 (python 3.7.1)

    Steps to reproduce

    1. Clone master branch
    2. Run cmake including the flags: -DBUILD_PYTHON_BINDINGS=ON -DBUILD_SHARED_LIBS=ON
    3. Open solution with Visual Studio 2017 and build

    Expected behavior

    To successfully build python bindings AND the egg package to work (be able to import mlpack in python)

    Actual behavior

    Build failures (when workarounds are applied, the resulting package doesn't work)

    s: fixed t: bug report c: build system 
    opened by gmanlan 60
  • Add tests for command-line and Python bindings

    This is a great issue for anyone looking to get involved with mlpack development. No huge amount of C++ or machine learning knowledge is necessary; in order to write a test, essentially all you need to know is how the particular binding you are writing a test for works, and you can use the existing test code as boilerplate for the new tests. So I think this is a good place to get started.

    If you don't know about the automatic bindings system, you can read about it at http://www.mlpack.org/docs/mlpack-git/doxygen/bindings.html .

    In essence, the problem is that we have all of these command-line programs and Python bindings like mlpack_knn, mlpack_sparse_coding, and so forth, but these are not rigorously tested. Thanks to the automatic binding system, we now have an easy way to test these, which allows us to avoid all kinds of little bugs that have happened in the past where a command-line program might access the name of a variable incorrectly or something like this. However, the tests have not been written yet. So the purpose of this issue is to ensure that these tests actually do get written.

    So, for example, let's take a look at a test we already have for the PCA binding. This is from src/mlpack/tests/pca_test.cpp.

    /**
     * Check that we can't specify an invalid new dimensionality.
     */
    BOOST_AUTO_TEST_CASE(PCATooHighNewDimensionalityTest)
    { 
      arma::mat x = arma::randu<arma::mat>(5, 5);
      
      SetInputParam("input", std::move(x));
      SetInputParam("new_dimensionality", (int) 7); // Invalid.
    
      Log::Fatal.ignoreInput = true;
      BOOST_REQUIRE_THROW(mlpackMain(), std::runtime_error);
      Log::Fatal.ignoreInput = false;
    }
    

    This is a pretty simple test, which just checks that a user can't ask for a new dimensionality that is too high, and this is the kind of test we're looking for here. We don't need to check if the exact output of the algorithm is correct---we are already doing that in the other tests we have for the algorithms. Instead we want to test the input/output of these bindings.

    So, if you'd like to write a test, the basic procedure is this:

    • Pick an uncompleted binding from the list below (I am trying to keep this up to date):

      • [x] adaboost (src/mlpack/methods/adaboost/adaboost_main.cpp): AdaBoost
      • [x] approx_kfn (src/mlpack/methods/approx_kfn/approx_kfn_main.cpp): approximate k-furthest-neighbor search
      • [x] cf (src/mlpack/methods/cf/cf_main.cpp): collaborative filtering
      • [x] dbscan (src/mlpack/methods/dbscan/dbscan_main.cpp): DBSCAN clustering
      • [x] decision_tree (src/mlpack/methods/decision_tree/decision_tree_main.cpp): decision trees
      • [x] det (src/mlpack/methods/det/det_main.cpp): density estimation trees
      • [x] emst (src/mlpack/methods/emst/emst_main.cpp): Euclidean minimum spanning tree calculation
      • [x] fastmks (src/mlpack/methods/fastmks/fastmks_main.cpp): fast max-kernel search (#1356, needs someone to finish the work; you can use that PR as a starting point)
      • [x] gmm_generate (src/mlpack/methods/gmm/gmm_generate_main.cpp): generate samples from a GMM
      • [x] gmm_probability (src/mlpack/methods/gmm/gmm_probability_main.cpp): estimate the probability of a sample coming from a GMM
      • [x] gmm_train (src/mlpack/methods/gmm/gmm_train_main.cpp): train a GMM on data
      • [x] hmm_generate (src/mlpack/methods/hmm/hmm_generate_main.cpp): generate samples from an HMM
      • [x] hmm_loglik (src/mlpack/methods/hmm/hmm_loglik_main.cpp): find the log-likelihood of a sequence coming from an HMM
      • [x] hmm_train (src/mlpack/methods/hmm/hmm_train_main.cpp): train an HMM on data
      • [x] hmm_viterbi (src/mlpack/methods/hmm/hmm_viterbi_main.cpp): predict the hidden state sequence with an HMM
      • [x] hoeffding_tree (src/mlpack/methods/hoeffding_trees/hoeffding_tree_main.cpp): streaming decision trees
      • [x] kernel_pca (src/mlpack/methods/kernel_pca/kernel_pca_main.cpp): kernel PCA (#1600)
      • [x] kfn (src/mlpack/methods/neighbor_search/kfn_main.cpp): k-furthest neighbors search (#1263)
      • [x] kmeans (src/mlpack/methods/kmeans/kmeans_main.cpp): k-means clustering
      • [x] knn (src/mlpack/methods/neighbor_search/knn_main.cpp): k-nearest neighbors search (#1263)
      • [x] linear_regression (src/mlpack/methods/linear_regression/linear_regression_main.cpp): linear regression
      • [x] local_coordinate_coding (src/mlpack/methods/local_coordinate_coding/lcc_main.cpp): local coordinate coding
      • [x] logistic_regression (src/mlpack/methods/logistic_regression/logistic_regression_main.cpp): logistic regression
      • [x] lsh (src/mlpack/methods/lsh/lsh_main.cpp): locality-sensitive hashing
      • [x] mean_shift (src/mlpack/methods/mean_shift/mean_shift_main.cpp): mean shift clustering
      • [x] nbc (src/mlpack/methods/naive_bayes/nbc_main.cpp): naive Bayes classifier
      • [x] nca (src/mlpack/methods/nca/nca_main.cpp): neighborhood components analysis (#1369)
      • [x] nmf (src/mlpack/methods/nmf/nmf_main.cpp): non-negative matrix factorization
      • [x] pca (src/mlpack/methods/pca/pca_main.cpp): principal components analysis
      • [x] perceptron (src/mlpack/methods/perceptron/perceptron_main.cpp): perceptrons
      • [x] preprocess_binarize (src/mlpack/methods/preprocess/preprocess_binarize_main.cpp): binarize data
      • [x] preprocess_describe (src/mlpack/methods/preprocess/preprocess_describe_main.cpp): describe a dataset (#1337)
      • [x] preprocess_imputer (src/mlpack/methods/preprocess/preprocess_imputer_main.cpp): impute values into a dataset
      • [x] preprocess_split (src/mlpack/methods/preprocess/preprocess_split_main.cpp): split a dataset into a training and test set
      • [x] radical (src/mlpack/methods/radical/radical_main.cpp): an independent components analysis technique
      • [x] random_forest (src/mlpack/methods/random_forest/random_forest_main.cpp): random forests for classification
      • [x] range_search (src/mlpack/methods/range_search/range_search_main.cpp): range search
      • [x] softmax_regression (src/mlpack/methods/softmax_regression/softmax_regression_main.cpp): softmax regression
      • [x] sparse_coding (src/mlpack/methods/sparse_coding/sparse_coding_main.cpp): sparse coding
    • Use and understand the options for the binding that you have chosen, either from the command-line or from Python.

    • Based on some of the other examples in src/mlpack/tests/main_tests/, add a new file that tests the binding you chose. Focus on tests like:

      • Ensure that errors are given when the user inputs invalid parameters.
      • Make sure that saved models can be reused in future invocations of the binding.
      • Make sure that output matrices have the expected shape.
      • Don't write tests to check the exact correctness of the algorithm---we already have those, so there is no need. :)
    • When you have good coverage of the binding, go ahead and open a PR and we can review it and get it merged.
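
The kinds of checks listed above can be sketched without mlpack at all. The following is a minimal, self-contained illustration and not the mlpack test harness: `Matrix`, `RunToyPca`, and `ThrowsRuntimeError` are hypothetical stand-ins for a matrix type, a binding's entry point, and the job that BOOST_REQUIRE_THROW does in the real test. The toy function only validates its parameters and returns an output of the expected shape.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a dense matrix: rows x cols of doubles.
// Rows are treated as dimensions and columns as points.
struct Matrix
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;

  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) { }
};

// Hypothetical stand-in for a binding entry point.  It rejects a new
// dimensionality that is non-positive or larger than the input
// dimensionality.  It does no real computation; the output is only shaped
// the way a dimensionality reduction result would be.
Matrix RunToyPca(const Matrix& input, const int newDimensionality)
{
  if (newDimensionality <= 0 ||
      static_cast<std::size_t>(newDimensionality) > input.rows)
  {
    throw std::runtime_error("invalid new dimensionality");
  }

  return Matrix(static_cast<std::size_t>(newDimensionality), input.cols);
}

// Returns true if the call throws std::runtime_error and false if it
// completes normally.
bool ThrowsRuntimeError(const Matrix& input, const int newDimensionality)
{
  try
  {
    RunToyPca(input, newDimensionality);
  }
  catch (const std::runtime_error&)
  {
    return true;
  }
  return false;
}
```

With a 5 x 5 input, asking for 7 dimensions should throw, while asking for 3 should succeed and give a 3 x 5 output. A real binding test would make the same two kinds of assertion (invalid input is rejected, output has the expected shape) against the actual binding.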

    In addition, there's no need to ask whether or not you can work on a particular test. If no PR is open for the given binding and there is no test already in the repository, you are free to work on it and open a PR for it. I'll try to keep the list above up to date.

    s: fixed t: feature request help wanted good first issue s: keep open c: binding c: testing 
    opened by rcurtin 58
  • Make mlpack completely header-only

    This PR does the last step---removes libmlpack.so entirely. This will probably require some adaptation downstream in the examples and models repositories, but, that should be pretty easy (just don't link against libmlpack.so anymore).

    Most of this PR is CMake reconfiguration and simplification: now that there is no libmlpack.so, there's a lot less that we have to do.

    A shortlist of notable modifications here:

    • libmlpack.so is gone, and so linking for bindings and tests is now a little bit simpler.

    • The arma_config_check.hpp file, which made sure that the same compilation settings used in libmlpack.so were used when mlpack was included, is no longer necessary, and so all related CMake infrastructure has been removed.

    • The pkgconfig generator is modified so it no longer includes -lmlpack in the linker command.

    • mlpack_export.hpp is no longer needed---so everything related to that is now gone.

    • Documentation is updated to reflect that mlpack is now header-only (maybe it could be updated in more places---I would be interested in people's comments on where to do that).

    • Finally, there is now only one source file that ever changes as a result of CMake: src/mlpack/util/gitversion.hpp. This is already generated directly into src/, and not into build/include/, so there is no compelling reason to make the first step of every build to copy every mlpack header into build/include/. Therefore, I removed the mlpack_headers target, and now there is no more step to copy all of the headers. This should accelerate builds, I hope, or at least remove some of the tedium... the only "downside" is that users used to including the build/include/ directory (if they build mlpack without installing, for instance, like I often do), will need to just include src/ instead---a minor change.

    c: build system c: core t: added feature 
    opened by rcurtin 1
  • Fix DET cross-validation loop

    Thanks to @SuvarshaChennareddy for pointing this out in #3238!

    In fe7d1039 (that commit is from 2012!), I made some changes to the density estimation tree cross-validation code. When I did that, I inadvertently changed the loop that maximized negative error to a loop that maximized error---that's backwards! So, I went through the original code before that commit, compared it with the changes in the commit, and this revealed the issue. I inverted the computation of thisError, and renamed it more accurately to thisNegError.

    (I also changed some +=s to = for clarity, since those values were only ever assigned once.)
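
To make the direction of the comparison concrete, here is a small sketch. It is not the DET cross-validation code: `BestByNegError` and `BestByError` are hypothetical helpers that only show why the sign matters. Picking the candidate with the largest negative error is the same as picking the one with the smallest error, whereas maximizing the error itself picks the worst candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: return the index of the candidate whose negative
// error is largest, i.e. the candidate with the smallest error.
std::size_t BestByNegError(const std::vector<double>& errors)
{
  assert(!errors.empty());

  std::size_t best = 0;
  double bestNegError = -std::numeric_limits<double>::max();
  for (std::size_t i = 0; i < errors.size(); ++i)
  {
    const double thisNegError = -errors[i];
    if (thisNegError > bestNegError)
    {
      bestNegError = thisNegError;
      best = i;
    }
  }
  return best;
}

// The backwards variant: maximizing the error itself, which selects the
// candidate with the largest error.
std::size_t BestByError(const std::vector<double>& errors)
{
  assert(!errors.empty());

  std::size_t best = 0;
  double bestError = -std::numeric_limits<double>::max();
  for (std::size_t i = 0; i < errors.size(); ++i)
  {
    const double thisError = errors[i];
    if (thisError > bestError)
    {
      bestError = thisError;
      best = i;
    }
  }
  return best;
}
```

For the made-up errors {0.4, 0.1, 0.9}, the first helper returns index 1 and the backwards variant returns index 2.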

    c: methods t: bugfix 
    opened by rcurtin 0
  • [WIP] Proximal Policy Optimization

    • [x] Added basic skeleton
    • [ ] Write Update(), SelectAction(), Episode()
    • [ ] Add epsilon in training_config.hpp
    • [ ] Implement tests
    • [ ] Implement continuous environment (if required)
    opened by eshaanagarwal 0
Releases(3.4.2)
  • 3.4.2(Oct 28, 2020)

    Released Oct. 28, 2020.

    • Added Mean Absolute Percentage Error.
    • Added Softmin activation function as layer in ann/layer.
    • Fix spurious ARMA_64BIT_WORD compilation warnings on 32-bit systems (#2665).
  • 3.4.1(Sep 7, 2020)

    Released Sep. 7, 2020.

    • Fix incorrect parsing of required matrix/model parameters for command-line bindings (#2600).

    • Add manual type specification support to data::Load() and data::Save() (#2084, #2135, #2602).

    • Remove use of internal Armadillo functionality (#2596, #2601, #2602).

  • 3.4.0(Sep 1, 2020)

    Released Sept. 1st, 2020.

    • Issue warnings when metrics produce NaNs in KFoldCV (#2595).

    • Added bindings for R during Google Summer of Code (#2556).

    • Added common striptype function for all bindings (#2556).

    • Refactored common utility function of bindings to bindings/util (#2556).

    • Renamed InformationGain to HoeffdingInformationGain in methods/hoeffding_trees/information_gain.hpp (#2556).

    • Added macro for changing stream of printing and warnings/errors (#2556).

    • Added Spatial Dropout layer (#2564).

    • Force CMake to show error when it didn't find Python/modules (#2568).

    • Refactor ProgramInfo() to separate out all the different information (#2558).

    • Add bindings for one-hot encoding (#2325).

    • Added Soft Actor-Critic to RL methods (#2487).

    • Added Categorical DQN to q_networks (#2454).

    • Added N-step DQN to q_networks (#2461).

    • Add Silhouette Score metric and Pairwise Distances (#2406).

    • Add Go bindings for some missed models (#2460).

    • Replace boost program_options dependency with CLI11 (#2459).

    • Additional functionality for the ARFF loader (#2486); use case sensitive categories (#2516).

    • Add bayesian_linear_regression binding for the command-line, Python, Julia, and Go. Also called "Bayesian Ridge", this is equivalent to a version of linear regression where the regularization parameter is automatically tuned (#2030).

    • Fix defeatist search for spill tree traversals (#2566, #1269).

    • Fix incremental training of logistic regression models (#2560).

    • Change default configuration of BUILD_PYTHON_BINDINGS to OFF (#2575).

  • 3.3.2(Jun 18, 2020)

    Released June 18, 2020.

    • Added Noisy DQN to q_networks (#2446).

    • Add [preview release of] Go bindings (#1884).

    • Added Dueling DQN to q_networks, Noisy linear layer to ann/layer and Empty loss to ann/loss_functions (#2414).

    • Store the action and add an accessor method for it in q_learning (#2413).

    • Added accessor methods for ANN layers (#2321).

    • Addition of Elliot activation function (#2268).

    • Add adaptive max pooling and adaptive mean pooling layers (#2195).

    • Add parameter to avoid shuffling of data in preprocess_split (#2293).

    • Add MatType parameter to LSHSearch, allowing sparse matrices to be used for search (#2395).

    • Documentation fixes to resolve Doxygen warnings and issues (#2400).

    • Add Load and Save of Sparse Matrix (#2344).

    • Add Intersection over Union (IoU) metric for bounding boxes (#2402).

    • Add Non Maximal Suppression (NMS) metric for bounding boxes (#2410).

    • Fix no_intercept and probability computation for linear SVM bindings (#2419).

    • Fix incorrect neighbors for k > 1 searches in approx_kfn binding, for the QDAFN algorithm (#2448).

    • Add RBF layer in ann module to make RBFN architecture (#2261).

  • 3.3.1(Apr 30, 2020)

    Released April 29th, 2020.

    • Minor Julia and Python documentation fixes (#2373).

    • Updated terminal state and fixed bugs for Pendulum environment (#2354, #2369).

    • Added EliSH activation function (#2323).

    • Add L1 Loss function (#2203).

    • Pass CMAKE_CXX_FLAGS (compilation options) correctly to Python build (#2367).

    • Expose ensmallen Callbacks for sparseautoencoder (#2198).

    • Bugfix for LARS class causing invalid read (#2374).

    • Add serialization support from Julia; use mlpack.serialize() and mlpack.deserialize() to save and load from IOBuffers.

  • 3.3.0(Apr 7, 2020)

    Released April 7th, 2020.

    • Templated return type of Forward function of loss functions (#2339).

    • Added R2 Score regression metric (#2323).

    • Added mean squared logarithmic error loss function for neural networks (#2210).

    • Added mean bias loss function for neural networks (#2210).

    • The DecisionStump class has been marked deprecated; use the DecisionTree class with NoRecursion=true or use ID3DecisionStump instead (#2099).

    • Added probabilities_file parameter to get the probabilities matrix of AdaBoost classifier (#2050).

    • Fix STB header search paths (#2104).

    • Add DISABLE_DOWNLOADS CMake configuration option (#2104).

    • Add padding layer in TransposedConvolutionLayer (#2082).

    • Fix pkgconfig generation on non-Linux systems (#2101).

    • Use log-space to represent HMM initial state and transition probabilities (#2081).

    • Add functions to access parameters of Convolution and AtrousConvolution layers (#1985).

    • Add ComputeError function to LARS regression and change the Train function to return the computed error (#2139).

    • Add Julia bindings (#1949). Build settings can be controlled with the BUILD_JULIA_BINDINGS=(ON/OFF) and JULIA_EXECUTABLE=/path/to/julia CMake parameters.

    • CMake fix for finding STB include directory (#2145).

    • Add bindings for loading and saving images (#2019); mlpack_image_converter from the command-line, mlpack.image_converter() from Python.

    • Add normalization support for CF binding (#2136).

    • Add Mish activation function (#2158).

    • Update init_rules in AMF to allow users to merge two initialization rules (#2151).

    • Add GELU activation function (#2183).

    • Better error handling of eigendecompositions and Cholesky decompositions (#2088, #1840).

    • Add LiSHT activation function (#2182).

    • Add Valid and Same Padding for Transposed Convolution layer (#2163).

    • Add CELU activation function (#2191).

    • Add Log-Hyperbolic-Cosine Loss function (#2207).

    • Change neural network types to avoid unnecessary use of rvalue references (#2259).

    • Bump minimum Boost version to 1.58 (#2305).

    • Refactor STB support so HAS_STB macro is not needed when compiling against mlpack (#2312).

    • Add Hard Shrink Activation Function (#2186).

    • Add Soft Shrink Activation Function (#2174).

    • Add Hinge Embedding Loss Function (#2229).

    • Add Cosine Embedding Loss Function (#2209).

    • Add Margin Ranking Loss Function (#2264).

    • Bugfix for incorrect parameter vector sizes in logistic regression and softmax regression (#2359).
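
The log-space representation of HMM probabilities mentioned in the list above can be illustrated with a short sketch. A common motivation for working in log-space is that very small probabilities underflow to zero in ordinary floating point, while their logarithms remain representable. The helper below is hypothetical (it is not taken from the mlpack HMM code) and shows how two log-probabilities can be added without leaving log-space.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: compute log(exp(x) + exp(y)) without ever forming
// exp(x) or exp(y) directly.  The larger argument is factored out so that
// the remaining exponential is at most 1.
double LogSumExp(const double x, const double y)
{
  const double hi = std::max(x, y);

  // Both inputs represent probability zero; the sum is also zero.
  if (hi == -std::numeric_limits<double>::infinity())
    return hi;

  const double lo = std::min(x, y);
  return hi + std::log1p(std::exp(lo - hi));
}
```

For example, adding two log-probabilities of -1000 gives -1000 + log(2), even though exp(-1000) itself underflows to zero in double precision.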

  • 3.2.1(Nov 26, 2019)

    Released Oct. 1, 2019. (But I forgot to release it on GitHub; sorry about that.)

    • Enforce CMake version check for ensmallen (#2032).
    • Fix CMake check for Armadillo version (#2029).
    • Better handling of when STB is not installed (#2033).
    • Fix Naive Bayes classifier computations in high dimensions (#2022).
  • 3.2.0(Sep 26, 2019)

    Released Sept. 25, 2019.

    • Fix occasionally-failing RADICAL test (#1924).

    • Fix gcc 9 OpenMP compilation issue (#1970).

    • Added support for loading and saving of images (#1903).

    • Add Multiple Pole Balancing Environment (#1901, #1951).

    • Added functionality for scaling of data (#1876); see the command-line binding mlpack_preprocess_scale or Python binding preprocess_scale().

    • Add new parameter maximum_depth to decision tree and random forest bindings (#1916).

    • Fix prediction output of softmax regression when test set accuracy is calculated (#1922).

    • Pendulum environment now checks for termination. All RL environments now have an option to terminate after a set number of time steps (no limit by default) (#1941).

    • Add support for probabilistic KDE (kernel density estimation) error bounds when using the Gaussian kernel (#1934).

    • Fix negative distances for cover tree computation (#1979).

    • Fix cover tree building when all pairwise distances are 0 (#1986).

    • Improve KDE pruning by reclaiming unused error tolerance (#1954, #1984).

    • Optimizations for sparse matrix accesses in z-score normalization for CF (#1989).

    • Add kmeans_max_iterations option to GMM training binding gmm_train_main.

    • Bump minimum Armadillo version to 8.400.0 due to ensmallen dependency requirement (#2015).

  • mlpack-3.1.1(May 27, 2019)

    Released May 26, 2019.

    • Fix random forest bug for numerical-only data (#1887).
    • Significant speedups for random forest (#1887).
    • Random forest now has minimum_gain_split and subspace_dim parameters (#1887).
    • Decision tree parameter print_training_error deprecated in favor of print_training_accuracy.
    • output option changed to predictions for the adaboost and perceptron bindings. Old options are now deprecated and will be preserved until mlpack 4.0.0 (#1882).
    • Concatenated ReLU layer (#1843).
    • Accelerate NormalizeLabels function using hashing instead of linear search (see src/mlpack/core/data/normalize_labels_impl.hpp) (#1780).
    • Add ConfusionMatrix() function for checking performance of classifiers (#1798).
    • Install ensmallen headers when it is downloaded during build (#1900).
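
The idea behind the NormalizeLabels speedup in the list above (a hash lookup per label instead of a linear search through the labels seen so far) can be sketched as follows. This is an illustration under stated assumptions, not mlpack's implementation: `NormalizedLabels` and `NormalizeLabelsSketch` are hypothetical names, and labels are taken to be plain ints.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical result type for the sketch.
struct NormalizedLabels
{
  std::vector<std::size_t> labels;  // Each input label mapped into [0, k).
  std::vector<int> mapping;         // mapping[i] is the original label for i.
};

// Map arbitrary integer labels to 0, 1, 2, ... in order of first appearance.
// Each lookup is an expected O(1) hash probe, so the whole pass is expected
// O(n), compared to O(n * k) when every label is searched for linearly in
// the list of k labels seen so far.
NormalizedLabels NormalizeLabelsSketch(const std::vector<int>& labelsIn)
{
  NormalizedLabels result;
  result.labels.reserve(labelsIn.size());

  std::unordered_map<int, std::size_t> seen;
  for (const int label : labelsIn)
  {
    auto it = seen.find(label);
    if (it == seen.end())
    {
      // First time this label appears: assign it the next free index.
      it = seen.emplace(label, result.mapping.size()).first;
      result.mapping.push_back(label);
    }
    result.labels.push_back(it->second);
  }

  assert(result.labels.size() == labelsIn.size());
  return result;
}
```

For the input {7, 3, 7, 9, 3}, the normalized labels are {0, 1, 0, 2, 1} and the mapping back to the original labels is {7, 3, 9}.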
  • mlpack-3.1.0(Apr 26, 2019)

    Released April 25, 2019.

    • Add DiagonalGaussianDistribution and DiagonalGMM classes to speed up the diagonal covariance computation and deprecate DiagonalConstraint (#1666).

    • Add kernel density estimation (KDE) implementation with bindings to other languages (#1301).

    • Where relevant, all models with a Train() method now return a double value representing the goodness of fit (i.e. final objective value, error, etc.) (#1678).

    • Add implementation for linear support vector machine (see src/mlpack/methods/linear_svm).

    • Change DBSCAN to use PointSelectionPolicy and add OrderedPointSelection (#1625).

    • Residual block support (#1594).

    • Bidirectional RNN (#1626).

    • Dice loss layer (#1674, #1714) and hard sigmoid layer (#1776).

    • output option changed to predictions and output_probabilities to probabilities for Naive Bayes binding (mlpack_nbc/nbc()). Old options are now deprecated and will be preserved until mlpack 4.0.0 (#1616).

    • Add support for Diagonal GMMs to HMM code (#1658, #1666). This can provide large speedup when a diagonal GMM is acceptable as an emission probability distribution.

    • Python binding improvements: check parameter type (#1717), avoid copying Pandas dataframes (#1711), handle Pandas Series objects (#1700).

  • mlpack-3.0.4(Nov 13, 2018)

    Released November 13, 2018.

    • Bump minimum CMake version to 3.3.2.
    • CMake fixes for Ninja generator by Marc Espie (#1550, #1537, #1523).
    • More efficient linear regression implementation (#1500).
    • Serialization fixes for neural networks (#1508, #1535).
    • Mean shift now allows single-point clusters (#1536).
  • mlpack-3.0.3(Jul 29, 2018)

    Released July 27th, 2018.

    • Fix Visual Studio compilation issue (#1443).
    • Allow running local_coordinate_coding binding with no initial_dictionary parameter when input_model is not specified (#1457).
    • Make use of OpenMP optional via the CMake USE_OPENMP configuration variable (#1474).
    • Accelerate FNN training by 20-30% by avoiding redundant calculations (#1467).
    • Fix math::RandomSeed() usage in tests (#1462, #1440).
    • Generate better Python setup.py with documentation (#1460).
  • mlpack-3.0.2(Jun 9, 2018)

    Released June 8th, 2018.

    • Documentation generation fixes for Python bindings (#1421).
    • Fix build error for man pages if command-line bindings are not being built (#1424).
    • Add shuffle parameter and Shuffle() method to KFoldCV (#1412). This will shuffle the data when the object is constructed, or when Shuffle() is called.
    • Added neural network layers: AtrousConvolution (#1390), Embedding (#1401), and LayerNorm (layer normalization) (#1389).
    • Add Pendulum environment for reinforcement learning (#1388) and update Mountain Car environment (#1394).
  • mlpack-3.0.1(May 11, 2018)

    Released May 10th, 2018.

    • Fix intermittently failing tests (#1387).
    • Add Big-Batch SGD (BBSGD) optimizer in src/mlpack/core/optimizers/bigbatch_sgd (#1131).
    • Fix simple compiler warnings (#1380, #1373).
    • Simplify NeighborSearch constructor and Train() overloads (#1378).
    • Add warning for OpenMP setting differences (#1358/#1382). When mlpack is compiled with OpenMP but another application linking against mlpack is not (or vice versa), a compilation warning will now be issued.
    • Restructured loss functions in src/mlpack/methods/ann/ (#1365).
    • Add environments for reinforcement learning tests (#1368, #1370, #1329).
    • Allow single outputs for multiple timestep inputs for recurrent neural networks (#1348).
    • Neural networks: add He and LeCun normal initializations (#1342), add FReLU and SELU activation functions (#1346, #1341), add alpha-dropout (#1349).
  • mlpack-3.0.0(Mar 31, 2018)

    Released March 30th, 2018.

    • Speed and memory improvements for DBSCAN. --single_mode can now be used for situations where previously RAM usage was too high.
    • Bump minimum required version of Armadillo to 6.500.0.
    • Add automatically generated Python bindings. These have the same interface as the command-line programs.
    • Add deep learning infrastructure in src/mlpack/methods/ann/.
    • Add reinforcement learning infrastructure in src/mlpack/methods/reinforcement_learning/.
    • Add optimizers: AdaGrad, CMAES, CNE, FrankWolfe, GradientDescent, GridSearch, IQN, Katyusha, LineSearch, ParallelSGD, SARAH, SCD, SGDR, SMORMS3, SPALeRA, SVRG.
    • Add hyperparameter tuning infrastructure and cross-validation infrastructure in src/mlpack/core/cv/ and src/mlpack/core/hpt/.
    • Fix bug in mean shift.
    • Add random forests (see src/mlpack/methods/random_forest).
    • Numerous other bugfixes and testing improvements.
    • Add randomized Krylov SVD and Block Krylov SVD.
  • mlpack-2.2.5(Aug 26, 2017)

  • mlpack-2.2.4(Jul 19, 2017)

    Released July 18th, 2017.

    • Speed and memory improvements for DBSCAN. --single_mode can now be used for situations where previously RAM usage was too high.
    • Fix bug in CF causing incorrect recommendations.
  • mlpack-2.2.3(May 24, 2017)

  • mlpack-2.2.2(May 5, 2017)

    Released May 4th, 2017.

    • Install backwards-compatibility mlpack_allknn and mlpack_allkfn programs; note they are deprecated and will be removed in mlpack 3.0.0 (#992).
    • Fix RStarTree bug that surfaced on OS X only (#964).
    • Small fixes for MiniBatchSGD and SGD and tests.
  • mlpack-2.2.1(Apr 13, 2017)

  • mlpack-2.2.0(Mar 21, 2017)

    Released Mar. 21st, 2017.

    • Bugfix for mlpack_knn program (#816).
    • Add decision tree implementation in methods/decision_tree/. This is very similar to a C4.5 tree learner.
    • Add DBSCAN implementation in methods/dbscan/.
    • Add support for multidimensional discrete distributions (#810, #830).
    • Better output for Log::Debug/Log::Info/Log::Warn/Log::Fatal for Armadillo objects (#895, #928).
    • Refactor categorical CSV loading with boost::spirit for faster loading (#681).
  • mlpack-2.1.1(Dec 22, 2016)

    Released Dec. 22nd, 2016.

    • HMMs now use random initialization; this should fix some convergence issues (#828).
    • HMMs now initialize emissions according to the distribution of observations (#833).
    • Minor fix for formatted output (#814).
    • Fix DecisionStump to properly work with any input type.
  • mlpack-2.1.0(Oct 31, 2016)

    Released Oct. 31st, 2016.

    • Fixed CoverTree to properly handle single-point datasets.
    • Fixed a bug in CosineTree (and thus QUIC-SVD) that caused split failures for some datasets (#717).
    • Added mlpack_preprocess_describe program, which can be used to print statistics on a given dataset (#742).
    • Fix prioritized recursion for k-furthest-neighbor search (mlpack_kfn and the KFN class), leading to orders-of-magnitude speedups in some cases.
    • Bump minimum required version of Armadillo to 4.200.0.
    • Added simple Gradient Descent optimizer, found in src/mlpack/core/optimizers/gradient_descent/ (#792).
    • Added approximate furthest neighbor search algorithms QDAFN and DrusillaSelect in src/mlpack/methods/approx_kfn/, with command-line program mlpack_approx_kfn.
  • mlpack-2.0.3(Jul 21, 2016)

    Released July 21st, 2016.

    • Standardize some parameter names for programs (old names are kept for backward compatibility, but warnings will now be issued).
    • RectangleTree optimizations (#721).
    • Fix memory leak in NeighborSearch (#731).
    • Documentation fix for k-means tutorial (#730).
    • Fix TreeTraits for BallTree (#727).
    • Fix incorrect parameter checks for some command-line programs.
    • Fix error in HMM training with probabilities for each point (#636).
  • mlpack-2.0.2(Jun 20, 2016)

    Released June 20th, 2016.

    • Added the function LSHSearch::Projections(), which returns an arma::cube with each projection table in a slice (#663). Instead of Projection(i), you should now use Projections().slice(i).
    • A new constructor has been added to LSHSearch that creates objects using projection tables provided in an arma::cube (#663).
    • LSHSearch projection tables refactored for speed (#675).
    • Handle zero-variance dimensions in DET (#515).
    • Add MiniBatchSGD optimizer (src/mlpack/core/optimizers/minibatch_sgd/) and allow its use in mlpack_logistic_regression and mlpack_nca programs.
    • Add better backtrace support from Grzegorz Krajewski for Log::Fatal messages when compiled with debugging and profiling symbols. This requires libbfd and libdl to be present during compilation.
    • CosineTree test fix from Mikhail Lozhnikov (#358).
    • Fixed HMM initial state estimation (#600).
    • Changed versioning macros __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH to MLPACK_VERSION_MAJOR, MLPACK_VERSION_MINOR, and MLPACK_VERSION_PATCH. The old names will remain in place until mlpack 3.0.0.
    • Renamed mlpack_allknn, mlpack_allkfn, and mlpack_allkrann to mlpack_knn, mlpack_kfn, and mlpack_krann. The mlpack_allknn, mlpack_allkfn, and mlpack_allkrann programs will remain as copies until mlpack 3.0.0.
    • Add --random_initialization option to mlpack_hmm_train, for use when no labels are provided.
    • Add --kill_empty_clusters option to mlpack_kmeans and KillEmptyClusters policy for the KMeans class (#595, #596).
  • mlpack-2.0.1(Mar 3, 2016)

    Released Feb. 4th, 2016.

    • Fix CMake to properly detect when MKL is being used with Armadillo.
    • Minor parameter handling fixes to mlpack_logistic_regression (#504, #505).
    • Properly install arma_config.hpp.
    • Memory handling fixes for Hoeffding tree code.
    • Add functions that allow changing training-time parameters to HoeffdingTree class.
    • Fix infinite loop in sparse coding test.
    • Documentation spelling fixes (#501).
    • Properly handle covariances for Gaussians with large condition number (#496), preventing GMMs from filling with NaNs during training (and also HMMs that use GMMs).
    • CMake fixes for finding LAPACK and BLAS as Armadillo dependencies when ATLAS is used.
    • CMake fix for projects using mlpack's CMake configuration from elsewhere (#512).
  • mlpack-2.0.0(Dec 24, 2015)

    Released Dec. 23rd, 2015.

    • Removed overclustering support from k-means because it is not well-tested, may be buggy, and is (I think) unused. If this was support you were using, open a bug or get in touch with us; it would not be hard for us to reimplement it.
    • Refactored KMeans to allow different types of Lloyd iterations.
    • Added implementations of k-means: Elkan's algorithm, Hamerly's algorithm, Pelleg-Moore's algorithm, and the DTNN (dual-tree nearest neighbor) algorithm.
    • Significant acceleration of LRSDP via the use of accu(a % b) instead of trace(a * b).
    • Added MatrixCompletion class (matrix_completion), which performs nuclear norm minimization to fill unknown values of an input matrix.
    • No more dependence on Boost.Random; now we use C++11 STL random support.
    • Add softmax regression, contributed by Siddharth Agrawal and QiaoAn Chen.
    • Changed NeighborSearch, RangeSearch, FastMKS, LSH, and RASearch API; these classes now take the query sets in the Search() method, instead of in the constructor.
    • Use OpenMP, if available. For now OpenMP support is only available in the DET training code.
    • Add support for predicting new test point values to LARS and the command-line 'lars' program.
    • Add serialization support for Perceptron and LogisticRegression.
    • Refactor SoftmaxRegression to predict into an arma::Row<size_t> object, and add a softmax_regression program.
    • Refactor LSH to allow loading and saving of models.
    • ToString() is removed entirely (#487).
    • Add --input_model_file and --output_model_file options to appropriate machine learning algorithms.
    • Rename all executables to start with an "mlpack" prefix (#229).
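
The identity behind the LRSDP acceleration in the list above can be checked by hand. trace(a * b) is the sum over i and j of a(i, j) * b(j, i), so when b is symmetric it equals the sum of the element-wise product, which is what Armadillo's accu(a % b) computes. The sketch below uses plain std::vector rather than Armadillo; the function names are hypothetical, and it assumes square matrices of equal size with b symmetric.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// trace(a * b) computed the expensive way: form the full product, which
// costs O(n^3), and then sum its diagonal.
double TraceOfProduct(const Mat& a, const Mat& b)
{
  const std::size_t n = a.size();
  assert(b.size() == n);

  Mat product(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k)
        product[i][j] += a[i][k] * b[k][j];

  double trace = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    trace += product[i][i];
  return trace;
}

// Sum of the element-wise product, which costs only O(n^2).  This is the
// plain C++ analogue of accu(a % b).
double SumOfElementwiseProduct(const Mat& a, const Mat& b)
{
  const std::size_t n = a.size();
  assert(b.size() == n);

  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      sum += a[i][j] * b[i][j];
  return sum;
}
```

For a = [[1, 2], [3, 4]] and the symmetric b = [[5, 6], [6, 7]], both functions return 63. If b is not symmetric, the two quantities generally differ, so the shortcut only applies when symmetry is guaranteed.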
  • mlpack-1.0.12(Jan 7, 2015)

  • mlpack-1.0.9(Dec 22, 2014)

    Released July 28th, 2014.

    • GMM initialization is now safer and provides a working GMM when constructed with only the dimensionality and number of Gaussians (#314).
    • Check for division by 0 in Forward-Backward algorithm in HMMs (#314).
    • Fixed implementation of Viterbi algorithm in HMM::Predict() (#316).
    • Significant speedups for dual-tree algorithms using the cover tree (#243, #329) including a faster implementation of FastMKS.
    • CF (collaborative filtering) now expects users and items to be zero-indexed, not one-indexed (#324).
    • CF::GetRecommendations() API change: now requires the number of recommendations as the first parameter. The number of users in the local neighborhood should be specified with CF::NumUsersForSimilarity().
    • Removed incorrect PeriodicHRectBound (#30).
    • Refactor LRSDP into LRSDP class and standalone function to be optimized (#318).
    • Fix for centering in kernel PCA (#355).
    • Added simulated annealing (SA) optimizer, contributed by Zhihao Lou.
    • HMMs now support initial state probabilities; these can be set in the constructor, trained, or set manually with HMM::Initial() (#315).
    • Added Nyström method for kernel matrix approximation by Marcus Edel.
    • Kernel PCA now supports using the Nyström method for approximation.
    • Ball trees now work with dual-tree algorithms, via the BallBound<> bound structure (#320); fixed by Yash Vadalia.
    • The NMF class is now AMF<>, and supports far more types of factorizations, by Sumedh Ghaisas.
    • A QUIC-SVD implementation has returned, written by Siddharth Agrawal and based on older code from Mudit Gupta.
    • Added perceptron and decision stump by Udit Saxena (these are weak learners for an eventual AdaBoost class).
    • Sparse autoencoder added by Siddharth Agrawal.
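
    Two of the HMM items above (the Viterbi fix and the new initial state probabilities) concern the same computation: finding the most likely hidden state sequence for a set of observations. The following is a minimal, self-contained sketch of Viterbi decoding with explicit initial probabilities. It is not mlpack's HMM::Predict() implementation; the function name, the Table alias, and the convention that transition[i][j] is the probability of moving from state i to state j are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Table = std::vector<std::vector<double>>;

// Hypothetical helper: returns the most likely state sequence.
//   initial[s]       : probability of starting in state s
//   transition[i][j] : probability of moving from state i to state j
//   emission[s][o]   : probability of emitting symbol o in state s
std::vector<std::size_t> ViterbiPath(const std::vector<double>& initial,
                                     const Table& transition,
                                     const Table& emission,
                                     const std::vector<std::size_t>& observations)
{
  const std::size_t numStates = initial.size();
  const std::size_t length = observations.size();
  assert(numStates > 0);
  assert(transition.size() == numStates && emission.size() == numStates);
  if (length == 0)
    return {};

  // Work in log space so that long sequences do not underflow.
  Table logProb(length, std::vector<double>(numStates, 0.0));
  std::vector<std::vector<std::size_t>> back(
      length, std::vector<std::size_t>(numStates, 0));

  for (std::size_t s = 0; s < numStates; ++s)
    logProb[0][s] = std::log(initial[s]) + std::log(emission[s][observations[0]]);

  for (std::size_t t = 1; t < length; ++t)
  {
    for (std::size_t s = 0; s < numStates; ++s)
    {
      // Find the best predecessor state for state s at time t.
      std::size_t best = 0;
      double bestScore = logProb[t - 1][0] + std::log(transition[0][s]);
      for (std::size_t p = 1; p < numStates; ++p)
      {
        const double score = logProb[t - 1][p] + std::log(transition[p][s]);
        if (score > bestScore)
        {
          bestScore = score;
          best = p;
        }
      }
      logProb[t][s] = bestScore + std::log(emission[s][observations[t]]);
      back[t][s] = best;
    }
  }

  // Pick the best final state, then follow the back-pointers.
  std::size_t last = 0;
  for (std::size_t s = 1; s < numStates; ++s)
    if (logProb[length - 1][s] > logProb[length - 1][last])
      last = s;

  std::vector<std::size_t> path(length);
  path[length - 1] = last;
  for (std::size_t t = length - 1; t > 0; --t)
    path[t - 1] = back[t][path[t]];
  return path;
}
```

    Without the initial vector, the first time step would have to assume some fixed starting distribution; passing it explicitly is what lets the decoded path reflect a non-uniform start.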
  • mlpack-1.0.8(Dec 22, 2014)

    Released January 6th, 2014.

    • Memory leak in NeighborSearch index-mapping code fixed.
    • GMMs can be trained using the existing model as a starting point by specifying an additional boolean parameter to GMM::Estimate().
    • Logistic regression implementation added in methods/logistic_regression.
    • Version information is now obtainable via mlpack::util::GetVersion() or the __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH macros.
    • Fix typos in allkfn and allkrann output.