A C++ header-only library of statistical distribution functions.

Overview

StatsLib   Mentioned in Awesome Cpp Build Status Coverage Status Codacy Badge License Documentation Status

StatsLib is a templated C++ library of statistical distribution functions, featuring unique compile-time computing capabilities and seamless integration with several popular linear algebra libraries.

Features:

  • A header-only library of probability density functions, cumulative distribution functions, quantile functions, and random sampling methods.
  • Functions are written in C++11 constexpr format, enabling the library to operate as both a compile-time and run-time computation engine.
  • Designed with a simple R-like syntax.
  • Optional vector-matrix functionality with wrappers to support:
  • Matrix-based operations are parallelizable with OpenMP.
  • Released under a permissive, non-GPL license.

Contents:

Distributions

Functions to compute the cdf, pdf, quantile, as well as random sampling methods, are available for the following distributions:

  • Bernoulli
  • Beta
  • Binomial
  • Cauchy
  • Chi-squared
  • Exponential
  • F
  • Gamma
  • Inverse-Gamma
  • Laplace
  • Logistic
  • Log-Normal
  • Normal (Gaussian)
  • Poisson
  • Student's t
  • Uniform
  • Weibull

In addition, pdf and random sampling functions are available for several multivariate distributions:

  • inverse-Wishart
  • Multivariate Normal
  • Wishart

Installation and Dependencies

StatsLib is a header-only library. Simply add the header files to your project using

#include "stats.hpp"

The only dependency is the latest version of GCEM and a C++11 compatible compiler.

Documentation

Full documentation is available online:

Documentation Status

A PDF version of the documentation is available here.

Jupyter Notebook

You can test the library online using an interactive Jupyter notebook:

Binder

Options

The following options should be declared before including the StatsLib header files.

  • For inline-only functionality (i.e., no constexpr specifiers):
#define STATS_GO_INLINE
  • OpenMP functionality is enabled by default if the _OPENMP macro is detected (e.g., by invoking -fopenmp with GCC or Clang). To explicitly enable OpenMP features use:
#define STATS_USE_OPENMP
  • To disable OpenMP functionality:
#define STATS_DONT_USE_OPENMP
  • To use StatsLib with Armadillo, Blaze or Eigen:
#define STATS_ENABLE_ARMA_WRAPPERS
#define STATS_ENABLE_BLAZE_WRAPPERS
#define STATS_ENABLE_EIGEN_WRAPPERS
  • To enable wrappers for std::vector:
#define STATS_ENABLE_STDVEC_WRAPPERS

Syntax and Examples

Functions are called using an R-like syntax. Some general rules:

  • density functions: stats::d*. For example, the Normal (Gaussian) density is called using
stats::dnorm(<value>,<mean parameter>,<standard deviation>);
  • cumulative distribution functions: stats::p*. For example, the Gamma CDF is called using
stats::pgamma(<value>,<shape parameter>,<scale parameter>);
  • quantile functions: stats::q*. For example, the Beta quantile is called using
stats::qbeta(<value>,<a parameter>,<b parameter>);
  • random sampling: stats::r*. For example, to generate a single draw from the Logistic distribution:
stats::rlogis(<location parameter>,<scale parameter>,<seed value or random number engine>);

All of these functions have matrix-based equivalents using Armadillo, Blaze, and Eigen dense matrices.

  • The pdf, cdf, and quantile functions can take matrix-valued arguments. For example,
// Using Armadillo:
arma::mat norm_pdf_vals = stats::dnorm(arma::ones(10,20),1.0,2.0);
  • The randomization functions (r*) can output random matrices of arbitrary size. For example, For example, the following code will generate a 100-by-50 matrix of iid draws from a Gamma(3,2) distribution:
// Armadillo:
arma::mat gamma_rvs = stats::rgamma<arma::mat>(100,50,3.0,2.0);
// Blaze:
blaze::DynamicMatrix<double> gamma_rvs = stats::rgamma<blaze::DynamicMatrix<double>>(100,50,3.0,2.0);
// Eigen:
Eigen::MatrixXd gamma_rvs = stats::rgamma<Eigen::MatrixXd>(100,50,3.0,2.0);
  • All matrix-based operations are parallelizable with OpenMP. For GCC and Clang compilers, simply include the -fopenmp option during compilation.

Seeding Values

Random number seeding is available in two forms: seed values and random number engines.

  • Seed values are passed as unsigned integers. For example, to generate a draw from a normal distribution N(1,2) with seed value 1776:
stats::rnorm(1,2,1776);
  • Random engines in StatsLib use the 64-bit Mersenne-Twister generator (std::mt19937_64) and are passed by reference. Example:
std::mt19937_64 engine(1776);
stats::rnorm(1,2,engine);

Examples

More examples with code:

// evaluate the normal PDF at x = 1, mu = 0, sigma = 1
double dval_1 = stats::dnorm(1.0,0.0,1.0);
 
// evaluate the normal PDF at x = 1, mu = 0, sigma = 1, and return the log value
double dval_2 = stats::dnorm(1.0,0.0,1.0,true);
 
// evaluate the normal CDF at x = 1, mu = 0, sigma = 1
double pval = stats::pnorm(1.0,0.0,1.0);
 
// evaluate the Laplacian quantile at p = 0.1, mu = 0, sigma = 1
double qval = stats::qlaplace(0.1,0.0,1.0);

// draw from a t-distribution dof = 30
double rval = stats::rt(30);

// matrix output
arma::mat beta_rvs = stats::rbeta<arma::mat>(100,100,3.0,2.0);
// matrix input
arma::mat beta_cdf_vals = stats::pbeta(beta_rvs,3.0,2.0);

Compile-time Computing Capabilities

StatsLib is designed to operate equally well as a compile-time computation engine. Compile-time computation allows the compiler to replace function calls (e.g., dnorm(0,0,1)) with static values in the source code. That is, functions are evaluated during the compilation process, rather than at run-time. This capability is made possible due to the templated constexpr design of the library and can be verified by inspecting the assembly code generated by the compiler.

The compile-time features are enabled using the constexpr specifier. The example below computes the pdf, cdf, and quantile function of the Laplace distribution.

#include "stats.hpp"

int main()
{
    
    constexpr double dens_1  = stats::dlaplace(1.0,1.0,2.0); // answer = 0.25
    constexpr double prob_1  = stats::plaplace(1.0,1.0,2.0); // answer = 0.5
    constexpr double quant_1 = stats::qlaplace(0.1,1.0,2.0); // answer = -2.218875...

    return 0;
}

Assembly code generated by Clang without any optimization:

LCPI0_0:
	.quad	-4611193153885729483    ## double -2.2188758248682015
LCPI0_1:
	.quad	4602678819172646912     ## double 0.5
LCPI0_2:
	.quad	4598175219545276417     ## double 0.25000000000000006
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_main
	.p2align	4, 0x90
_main:                                  ## @main
	push	rbp
	mov	rbp, rsp
	xor	eax, eax
	movsd	xmm0, qword ptr [rip + LCPI0_0] ## xmm0 = mem[0],zero
	movsd	xmm1, qword ptr [rip + LCPI0_1] ## xmm1 = mem[0],zero
	movsd	xmm2, qword ptr [rip + LCPI0_2] ## xmm2 = mem[0],zero
	mov	dword ptr [rbp - 4], 0
	movsd	qword ptr [rbp - 16], xmm2
	movsd	qword ptr [rbp - 24], xmm1
	movsd	qword ptr [rbp - 32], xmm0
	pop	rbp
	ret

Author

Keith O'Hara

License

Apache Version 2

Comments
  • Documentation. Samples.

    Documentation. Samples.

    Thanks for sharing your work!

    It would be interesting to document the library and provide examples. At least list supported distribution functions and features.

    Thanks again!

    DJuego

    opened by DJuego 5
  • passing vectors to qnorm()

    passing vectors to qnorm()

    It is not obvious to me what the syntax is for passing a vector of probabilities to stats::qnorm(). How would I replicate this R code.

    a = qnorm(p= c(0.01, 0.20, 0.50, 0.80, 0.99))
    

    I was able to pass single values to stats::qnorm, but got errors when I would attempt to pass a vector.

    opened by alexhallam 3
  • Compilation problem with Eigen

    Compilation problem with Eigen

    Hi! First of all, I'd like to thank for the great library! I'm playing around with the library and got following issue. My sample code is:

    #include <iostream>
    #include "stats.hpp"
    #include <Eigen/Dense>
    
    #define STATS_USE_EIGEN
    
    int main() {
    
        Eigen::MatrixXd gamma_rvs = stats::rgamma<Eigen::MatrixXd>(100,50,3.0,2.0);
        std::cout << gamma_rvs << std::endl;
    
        return 0;
    }
    

    And compilation gave me:

    /usr/local/bin/cmake --build /Users/alex/Development/stats_tests/cmake-build-debug --target stats_tests -- -j 4 Scanning dependencies of target stats_tests [ 50%] Building CXX object CMakeFiles/stats_tests.dir/main.cpp.o In file included from /Users/alex/Development/stats_tests/include/rand/rand.hpp:27:0, from /Users/alex/Development/stats_tests/include/stats.hpp:37, from /Users/alex/Development/stats_tests/main.cpp:2: /Users/alex/Development/stats_tests/include/rand/rgamma.hpp:42:4: warning: inline function 'mT stats::rgamma(stats::uint_t, stats::uint_t, eT, eT) [with mT = Eigen::Matrix<double, -1, -1>; eT = double; stats::uint_t = unsigned int]' used but never defined mT rgamma(const uint_t n, const uint_t k, const eT shape_par, const eT scale_par); ^~~~~~ [100%] Linking CXX executable stats_tests Undefined symbols for architecture x86_64: "Eigen::Matrix<double, -1, -1, 0, -1, -1> stats::rgamma<Eigen::Matrix<double, -1, -1, 0, -1, -1>, double>(unsigned int, unsigned int, double, double)", referenced from: _main in main.cpp.o ld: symbol(s) not found for architecture x86_64 collect2: error: ld returned 1 exit status make[3]: *** [stats_tests] Error 1 make[2]: *** [CMakeFiles/stats_tests.dir/all] Error 2 make[1]: *** [CMakeFiles/stats_tests.dir/rule] Error 2 make: *** [stats_tests] Error 2

    Could you please help me to figure out what's going on. Thanks! Alex

    opened by Kyrish 3
  • safely compute log of determinant with Cholesky decomposition

    safely compute log of determinant with Cholesky decomposition

    I have changed the code to avoid direct computation of determinant of the covariance matrix in multi-variate normal density function. It's safe to compute log of the determinant using Cholesky decomposition of the covariance matrix to avoid underflow of the determinant. Reference: https://blogs.sas.com/content/iml/2012/10/31/compute-the-log-determinant-of-a-matrix.html

    opened by riturajkaushik 3
  • compiling with MSVC

    compiling with MSVC

    Hi Keith, Thanks for sharing your library.

    Changing the define at the bottom of statslib_options.hpp, from: #define __stats_pointer_settings__ __restrict__

    to: #define __stats_pointer_settings__ __restrict

    will make the library compatible with MSVC as well as GCC/Clang.

    Thanks again, Rafael.

    opened by rlatowicz 3
  • qpois gives incorrect values when rate different from 10

    qpois gives incorrect values when rate different from 10

    Hello,

    Thanks for your work on this project. I'm trying to use the qpois function with different rates. The output is correct when rate=10 in the test cases, but if say the rate is 1135, the function returns all 66. See the test output below:

    *** qpois: begin tests. ***

    qpois(0.00): 66.00000. Success = 0 qpois(0.00): 66.00000. Success = 0 qpois(0.00): 66.00000. Success = 0 qpois(0.01): 66.00000. Success = 0 qpois(0.07): 66.00000. Success = 0 qpois(0.33): 66.00000. Success = 0 qpois(0.58): 66.00000. Success = 0 qpois(0.92): 66.00000. Success = 0 qpois(1.00): 66.00000. Success = 0

    Thank you for any guidance on correcting this. Let me know if you need help debugging.

    opened by grizant 3
  • Segmentation fault using qbeta on gcc 7.3

    Segmentation fault using qbeta on gcc 7.3

    The following program produces a segmentation fault for me:

    #include "include/stats.hpp"
    #include <iostream>
    
    main() {
            std::cout << stats::qbeta(0.0022, 45.5, 80.5) << std::endl;
    }
    

    (Compile with e.g. g++ example.cc)

    Valgrind reports that the fault is caused by a stack overflow in gcem::sqrt_recur. As a comparison, R gives the following:

    > qbeta(0.0022, 45.5, 80.5)
    [1] 0.2461071
    
    opened by kristomu 3
  • Please allow specification of the random number generator implementation

    Please allow specification of the random number generator implementation

    I'm working on a project where I need to keep backwards compatibility with GSL's behaviour in an application but use code under an Apache-2.0 license. I need to use the mt19937 with a particular seed rather than the mt19937_64 rng. Currently I believe the only way of doing this in the stats library is editing the stats_options.hpp file to use mt19937, set my own seed, and then ensuring I pass in a smaller type (i.e. floats rather than double) to functions I call to ensure the random number generator is used the same way, and generates the same sequence of numbers for testing. This works for most functions (although some, deep down, use larger types than I need and thus generates a difference sequence - but this isn't common). This does mean forcing downstream users to fetch and edit this options.hpp file, which will likely cause confusion.

    If you currently have a way of overriding the random number generator without editing the stats_options.hpp file that'd be great to understand. If not, please let me know, and I'll add in a new option define, test it, and contribute it back as a PR. Please let me know how you'd like me to proceed.

    It's a great library. I hope to use the OpenMP functionality soon too. Thanks for the hard work!

    opened by adamfowleruk 2
  • Add a global stats::random_engine_t.

    Add a global stats::random_engine_t.

    This commit enables setting an explicit random number seed globally. This is useful when using the vectorized wrappers which do not otherwise support setting a seed. This commit is tested only with Blaze.

    There are many possible ways add support for seeding the vector wrappers, and this pattern seemed the simplest to me. It is not the first pattern attempted. I do not really expect you to merge this as is, but maybe just think about and describe the correct, idiomatic way to possibly support this feature. I don't mind actually doing the work on it, if you give me the guidance of how you want it done. It would also be better if all the distributions were ported to the favored pattern; this commit only experimented with rnorm and rinvwish.

    opened by JohnGalbraith 2
  • Wrong results

    Wrong results

    We've been using the chi-squared distribution function, and it seems to give us wrong answers. We've compared with the libgsl implementation, and it seems that for many cases, pchisq is just wrong.

    For example, test_statistic = 10605, extra_parameters = 9

    double p_value1 = 1.0 - gsl_cdf_chisq_P (test_statistic, (double)extra_parameters); double p_value2 = gsl_cdf_chisq_Q (test_statistic, (double)extra_parameters); double p_value = 1.0 - stats::pchisq(test_statistic, extra_parameters, false);

    p_value is 0, but p_value1 = 1, which is the correct answer. Any ideas what's going on? Are we using it wrong?

    Edit: It seems like values lower than 0.05 are often wrong.

    opened by laxris 2
  • Typos

    Typos

    Thanks for providing this package! In the middle of wrapping up a python binding (https://github.com/aforren1/pystatslib), I noticed these few function names didn't match, and the rt implementation was recursively calling itself rather than rchisq, leading to segfaults. Tests pass locally (Travis is pointed at the master branch?).

    opened by aforren1 2
  • qinvgamma returns nan

    qinvgamma returns nan

    For some inputs (apparently close to 1), qinvgamma is returning nan. Example:

    stats::qinvgamma(0.999551553841898, 2.0, 1.0)
    

    Probably the result of overflow?

    opened by j-faria 1
Releases(v3.2.0)
Owner
Keith O'Hara
Ph.D. (NYU), Econometrics and Quantitative Economics. Econometrician @aws
Keith O'Hara
A C++ header only library for decomposition of spectra into a sum of response functions whose weights are positive definite.

DecompLib A C++ header only library for decomposition of spectra into a sum of response functions whose weights are positive definite. Introduction Bu

James T. Matta 2 Jan 22, 2022
RcppFastFloat: Rcpp Bindings for the fastfloat C++ Header-Only Library

Converting ascii text into (floating-point) numeric values is a very common problem. The fast_float header-only C++ library by Daniel Lemire does this very well, and very fast at up to or over to 1 gigabyte per second as described in more detail in a recent arXiv paper.

Dirk Eddelbuettel 18 Nov 15, 2022
Header only FFT library

dj_fft: Header-only FFT library Details This repository provides a header-only library to compute fourier transforms in 1D, 2D, and 3D. Its goal is to

Jonathan Dupuy 133 Nov 23, 2022
C++ header-only fixed-point math library

fpm A C++ header-only fixed-point math library. "fpm" stands for "fixed-point math". It is designed to serve as a drop-in replacement for floating-poi

Mike Lankamp 378 Dec 5, 2022
C++ header-only library with methods to efficiently encode/decode Morton codes in/from 2D/3D coordinates

Libmorton v0.2.7 Libmorton is a C++ header-only library with methods to efficiently encode/decode 64, 32 and 16-bit Morton codes and coordinates, in 2

Jeroen Baert 480 Nov 27, 2022
Extremely simple yet powerful header-only C++ plotting library built on the popular matplotlib

matplotlib-cpp Welcome to matplotlib-cpp, possibly the simplest C++ plotting library. It is built to resemble the plotting API used by Matlab and matp

Benno Evers 3.5k Nov 30, 2022
A modern, C++20-native, single-file header-only dense 2D matrix library.

A modern, C++20-native, single-file header-only dense 2D matrix library. Contents Example usage creating matrices basic operations row, col, size, sha

feng wang 58 Nov 26, 2022
A header-only C++ library for large scale eigenvalue problems

NOTE: Spectra 1.0.0 is released, with a lot of API-breaking changes. Please see the migration guide for a smooth transition to the new version. NOTE:

Yixuan Qiu 600 Nov 26, 2022
libmpc++ is a C++ header-only library to solve linear and non-linear MPC

libmpc++ libmpc++ is a C++ library to solve linear and non-linear MPC. The library is written in modern C++17 and it is tested to work on Linux, macOS

Nicola Piccinelli 45 Nov 24, 2022
Header-only C++11 library to handle physical measures

cpp-measures Header-only C++11 library to handle physical measures License: This project is released under the Mozilla Public License 2.0. Purpose Thi

Carlo Milanesi 20 Jun 28, 2018
Header only, single file, simple and efficient C++ library to compute the signed distance function to a triangle mesh

TriangleMeshDistance Header only, single file, simple and efficient C++11 library to compute the signed distance function to a triangle mesh. The dist

Interactive Computer Graphics 100 Nov 12, 2022
A work-in-progress C++20/23 header-only maths library for game development, embedded, kernel and general-purpose that works in constant context.

kMath /kmæθ/ A work-in-progress general-purpose C++20/23 header-only maths library that works in constant context Abstract The kMath Project aims to p

The λ Project 13 Sep 5, 2022
A matrix header-only library, uses graphs internally, helpful when your matrix is part of a simulation where it needs to grow many times (or auto expand)

GraphMat Header-only Library Matrix implemented as a graph, specially for the use case when it should be auto expanding at custom rate, specially in s

Aditya Gupta 3 Oct 25, 2021
Kraken is an open-source modern math library that comes with a fast-fixed matrix class and math-related functions.

Kraken ?? Table of Contents Introduction Requirement Contents Installation Introduction Kraken is a modern math library written in a way that gives ac

yahya mohammed 24 Nov 30, 2022
A play around of mathematical functions to draw interesting objects to the screen.

LibDragonN64 Color Graphics Test A play around of mathematical functions to draw interesting objects to the screen. Compile Script (Windows only) In V

null 1 Dec 11, 2021
linalg.h is a single header, public domain, short vector math library for C++

linalg.h linalg.h is a single header, public domain, short vector math library for C++. It is inspired by the syntax of popular shading and compute la

Sterling Orsten 754 Nov 24, 2022
MIRACL Cryptographic SDK: Multiprecision Integer and Rational Arithmetic Cryptographic Library is a C software library that is widely regarded by developers as the gold standard open source SDK for elliptic curve cryptography (ECC).

MIRACL What is MIRACL? Multiprecision Integer and Rational Arithmetic Cryptographic Library – the MIRACL Crypto SDK – is a C software library that is

MIRACL 517 Nov 22, 2022
P(R*_{3, 0, 1}) specialized SIMD Geometric Algebra Library

Klein ?? ?? Project Site ?? ?? Description Do you need to do any of the following? Quickly? Really quickly even? Projecting points onto lines, lines t

Jeremy Ong 628 Nov 18, 2022
LibTomMath is a free open source portable number theoretic multiple-precision integer library written entirely in C.

libtommath This is the git repository for LibTomMath, a free open source portable number theoretic multiple-precision integer (MPI) library written en

libtom 540 Nov 23, 2022