Bolt is an algorithm for compressing vectors of real-valued data and running mathematical operations directly on the compressed representations.

Overview

Bolt

Bolt is an algorithm for compressing vectors of real-valued data and running mathematical operations directly on the compressed representations.

If you have a large collection of mostly-dense vectors and can tolerate lossy compression, Bolt can probably save you 10-200x space and compute time.

Bolt also has theoretical guarantees bounding the errors in its approximations.

EDIT: this repo now also features the source code for MADDNESS, our shiny new algorithm for approximate matrix multiplication. MADDNESS has no Python wrapper yet, and is referred to as "mithral" in the source code. Name changed because apparently I'm the only who gets Lord of the Rings references. MADDNESS runs ridiculously fast and, under reasonable assumptions, requires zero multiply-adds. Realistically, it'll be most useful for speeding up neural net inference on CPUs, but it'll take another couple papers to get it there; we need to generalize it to convolution and write the CUDA kernels to allow GPU training.

Installing

Python

  $ brew install swig  # for wrapping C++; use apt-get, yum, etc, if not OS X
  $ pip install numpy  # bolt installation needs numpy already present
  $ git clone https://github.com/dblalock/bolt.git
  $ cd bolt && python setup.py install
  $ pytest tests/  # optionally, run the tests

If you run into any problems, please don't hesitate to mention it in the Python build problems issue.

C++

Install Bazel, Google's open-source build system. Then

  $ git clone https://github.com/dblalock/bolt.git
  $ cd bolt/cpp && bazel run :main

The bazel run command will build the project and run the tests and benchmarks.

If you want to integrate Bolt with another C++ project, include cpp/src/include/public.hpp and add the remaining files under cpp/src to your builds. You should let me know if you're interested in doing such an integration because I'm hoping to see Bolt become part of many libraries and thus would be happy to help you.

Notes

Bolt currently only supports machines with AVX2 instructions, which basically means x86 machines from fall 2013 or later. Contributions for ARM support are welcome. Also note that the Bolt Python wrapper is currently configured to require Clang, since GCC apparently runs into issues.

How does it work?

Bolt is based on vector quantization. For details, see the Bolt paper or slides.

Benchmarks

Bolt includes a thorough set of speed and accuracy benchmarks. See the experiments/ directory. This is also what you want if you want to reproduce the results in the paper.

Note that all of the timing results use the raw C++ implementation. At present, the Python wrapper is slightly slower due to Python overhead. If you're interested in having a full-speed wrapper, let me know and I'll allocate time to making this happen.

Basic usage

X, queries = some N x D array, some iterable of length D arrays

# these are approximately equal (though the latter are shifted and scaled)
enc = bolt.Encoder(reduction='dot').fit(X)
[np.dot(X, q) for q in queries]
[enc.transform(q) for q in queries]

# same for these
enc = bolt.Encoder(reduction='l2').fit(X)
[np.sum((X - q) * (X - q), axis=1) for q in queries]
[enc.transform(q) for q in queries]

# but enc.transform() is 10x faster or more

Example: Matrix-vector multiplies

.97 # massive space savings print(X.nbytes) # 1777 rows * 64 cols * 8B = 909KB print(enc.nbytes) # 1777 * 2B = 3.55KB # massive time savings (~10x here, but often >100x on larger # datasets with less Python overhead; see the paper) t_np = timeit.Timer( lambda: [np.dot(X, q) for q in Q]).timeit(5) # ~9ms t_bolt = timeit.Timer( lambda: [enc.transform(q) for q in Q]).timeit(5) # ~800us print "Numpy / BLAS time, Bolt time: {:.3f}ms, {:.3f}ms".format( t_np * 1000, t_bolt * 1000) # can get output without offset/scaling if needed dots_bolt = [enc.transform(q, unquantize=True) for q in Q] ">
import bolt
import numpy as np
from scipy.stats import pearsonr as corr
from sklearn.datasets import load_digits
import timeit

# for simplicity, use the sklearn digits dataset; we'll split
# it into a matrix X and a set of queries Q
X, _ = load_digits(return_X_y=True)
nqueries = 20
X, Q = X[:-nqueries], X[-nqueries:]

enc = bolt.Encoder(reduction='dot', accuracy='lowest') # can tweak acc vs speed
enc.fit(X)

dot_corrs = np.empty(nqueries)
for i, q in enumerate(Q):
    dots_true = np.dot(X, q)
    dots_bolt = enc.transform(q)
    dot_corrs[i] = corr(dots_true, dots_bolt)[0]

# dot products closely preserved despite compression
print "dot product correlation: {} +/- {}".format(
    np.mean(dot_corrs), np.std(dot_corrs))  # > .97

# massive space savings
print(X.nbytes)  # 1777 rows * 64 cols * 8B = 909KB
print(enc.nbytes)  # 1777 * 2B = 3.55KB

# massive time savings (~10x here, but often >100x on larger
# datasets with less Python overhead; see the paper)
t_np = timeit.Timer(
    lambda: [np.dot(X, q) for q in Q]).timeit(5)        # ~9ms
t_bolt = timeit.Timer(
    lambda: [enc.transform(q) for q in Q]).timeit(5)    # ~800us
print "Numpy / BLAS time, Bolt time: {:.3f}ms, {:.3f}ms".format(
    t_np * 1000, t_bolt * 1000)

# can get output without offset/scaling if needed
dots_bolt = [enc.transform(q, unquantize=True) for q in Q]

Example: K-Nearest Neighbor / Maximum Inner Product Search

# search using squared Euclidean distances
# (still using the Digits dataset from above)
enc = bolt.Encoder('l2', accuracy='high').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q]  # knn for each query

# search using dot product (maximum inner product search)
enc = bolt.Encoder('dot', accuracy='medium').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q]  # knn for each query

Miscellaneous

Bolt stands for "Based On Lookup Tables". Feel free to use this exciting fact at parties.

Comments
  • Python unit test is broken?

    Python unit test is broken?

    OS: MacOS Monterey 12.5 (Intel chip) Python: 3.10.5

    
    ❯ pytest tests
    ============================================================================== test session starts ===============================================================================
    platform darwin -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
    rootdir: /Users/xiao/development/github.com/XiaoConstantine/bolt-1
    collected 4 items
    
    tests/test_encoder.py ..F.                                                                                                                                                 [100%]
    
    ==================================================================================== FAILURES ====================================================================================
    ________________________________________________________________________________ test_unquantize _________________________________________________________________________________
    
        def test_unquantize():
            X, Q = _load_digits_X_Q(nqueries=20)
    >       enc = bolt.Encoder('dot', accuracy='high').fit(X)
    
    tests/test_encoder.py:151:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../../dblalock/bolt/venv/lib/python3.10/site-packages/pybolt-0.1.4-py3.10-macosx-11-x86_64.egg/bolt/bolt_api.py:466: in fit
        centroids = _learn_centroids(X, ncentroids=ncentroids,
    ../../dblalock/bolt/venv/lib/python3.10/site-packages/pybolt-0.1.4-py3.10-macosx-11-x86_64.egg/bolt/bolt_api.py:142: in _learn_centroids
        centroids, labels = kmeans(X_in, ncentroids)
    ../../dblalock/bolt/venv/lib/python3.10/site-packages/pybolt-0.1.4-py3.10-macosx-11-x86_64.egg/bolt/bolt_api.py:106: in kmeans
        seeds = kmc2.kmc2(X, k).astype(np.float32)
    kmc2.pyx:97: in kmc2.kmc2
        ???
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    >   ???
    E   ValueError: probabilities contain NaN
    
    mtrand.pyx:935: ValueError
    
    opened by XiaoConstantine 5
  • Update README with additional steps for python build

    Update README with additional steps for python build

    The existing steps on the main README seem a bit outdated(for python at least)

    This PR is mainly moving existing information around to avoid frictions to get started with using python api

    opened by XiaoConstantine 0
  • Old AVX test fails due to GPF/sigsegv

    Old AVX test fails due to GPF/sigsegv

    Summary

    The avx sgemm test fails at n=14 on my system, apparently due to a general protection fault reading unaligned memory.

    Problem Information

    _mm256_load_ps raises a general protection fault if an address is passed to it that is not aligned to 32 bytes. Documentation: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_load_ps&ig_expand=4279

    System information

    $ cat .git/refs/heads/master
    e7726a4c165cc45ac117e9eabd8761013a26640e
    $ uname -a
    Linux DESKTOP-E3P7TC0 3.10.0-1160.62.1.el7.x86_64 #1 SMP Wed Mar 23 09:04:02 UTC 2022 x86_64 GNU/Linux
    $ cat /etc/system-release
    Red Hat Enterprise Linux Workstation release 7.9 (Maipo)
    

    Diagnostic Information

    $ gdb cpp/bolt-build/bolt
    (gdb) run
    Starting program: /shared/src/bolt/cpp/bolt-build/bolt 
    brute force sgemm test, n = 1...
    brute force sgemm test, n = 2...
    brute force sgemm test, n = 5...
    brute force sgemm test, n = 14...
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
    874       return *(__m256 *)__P;
    (gdb) up
    #1  (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1, 
        out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
    394                             sums[mm] = _mm256_load_ps(out_ptr);
    (gdb) p out_ptr
    $1 = (float *) 0x886cf8
    (gdb) bt
    #0  0x00000000005a1af9 in _mm256_load_ps (__P=0x886cf8) at /usr/local/lib/gcc/x86_64-pc-linux-gnu/9.4.1/include/avxintrin.h:874
    #1  (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2> (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0, add_to_output=false, A_col_stride=14, B_col_stride=1, 
        out_col_stride=14, nrows_per_chunk=512) at /home/user/src/bolt/cpp/src/utils/avx_utils.hpp:394
    #2  0x000000000059ff9e in sgemm_colmajor (A=0x8878e0, B=0x875d80, N=14, D=1, M=2, out=0x886cc0) at /home/user/src/bolt/cpp/src/utils/avx_utils.cpp:18
    #3  0x00000000005ee949 in _test_sgemm_colmajor<-1, -1> (N=14, D=1, M=2, simple_entries=false) at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:54
    #4  0x00000000005ec3f5 in ____C_A_T_C_H____T_E_S_T____100 () at /home/user/src/bolt/cpp/test/test_avx_utils.cpp:155
    #5  0x00000000005bb56e in Catch::FreeFunctionTestCase::invoke (this=0x86ef90) at /home/user/src/bolt/cpp/test/external/catch.hpp:5507
    #6  0x00000000005aa337 in Catch::TestCase::invoke (this=0x889280) at /home/user/src/bolt/cpp/test/external/catch.hpp:6389
    #7  0x00000000005b972b in Catch::RunContext::runCurrentTest (this=0x7fffffffd560, redirectedCout="", redirectedCerr="") at /home/user/src/bolt/cpp/test/external/catch.hpp:5131
    #8  0x00000000005b8737 in Catch::RunContext::runTest (this=0x7fffffffd560, testCase=...) at /home/user/src/bolt/cpp/test/external/catch.hpp:5001
    #9  0x00000000005ba095 in Catch::Runner::runTests (this=0x7fffffffd810) at /home/user/src/bolt/cpp/test/external/catch.hpp:5275
    #10 0x00000000005bae1b in Catch::Session::run (this=0x7fffffffdb10) at /home/user/src/bolt/cpp/test/external/catch.hpp:5395
    #11 0x00000000005bace8 in Catch::Session::run (this=0x7fffffffdb10, argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/external/catch.hpp:5378
    #12 0x00000000005ae8bb in main (argc=1, argv=0x7fffffffdce8) at /home/user/src/bolt/cpp/test/main.cpp:22
    
    $ valgrind cpp/bolt-build/bolt
    ==5087== Memcheck, a memory error detector                                                     
    ==5087== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.                       
    ==5087== Using Valgrind-3.18.0.GIT and LibVEX; rerun with -h for copyright info                
    ==5087== Command: cpp/bolt-build/bolt                                                          
    ==5087==                                                                                       
    brute force sgemm test, n = 1...                                                               
    brute force sgemm test, n = 2...                                                               
    brute force sgemm test, n = 5...                                                                                                                                                              
    brute force sgemm test, n = 14...                                                                                                                                                             
    ==5087==                                                                                                                                                                                      
    ==5087== Process terminating with default action of signal 11 (SIGSEGV)                                                                                                                       
    ==5087==  General Protection Fault                                                                                                                                                            
    ==5087==    at 0x5A1AF9: _mm256_load_ps (avxintrin.h:874)                                                                                                                                     
    ==5087==    by 0x5A1AF9: void (anonymous namespace)::sgemm_colmajor_narrow_padded<1, 2>(float const*, float const*, int, int, int, float*, bool, int, int, int, int) (avx_utils.hpp:394)      
    ==5087==    by 0x59FF9D: sgemm_colmajor(float const*, float const*, int, int, int, float*) (avx_utils.cpp:18)                                                                                 
    ==5087==    by 0x5EE948: void _test_sgemm_colmajor<-1, -1>(int, int, int, bool) (test_avx_utils.cpp:54)                                                                                       
    ==5087==    by 0x5EC3F4: ____C_A_T_C_H____T_E_S_T____100() (test_avx_utils.cpp:155)                                                                                                           
    ==5087==    by 0x5BB56D: Catch::FreeFunctionTestCase::invoke() const (catch.hpp:5507)                                                                                                         
    ==5087==    by 0x5AA336: Catch::TestCase::invoke() const (catch.hpp:6389)                                                                                                                     
    ==5087==    by 0x5B972A: Catch::RunContext::runCurrentTest(std::string&, std::string&) (catch.hpp:5131)
    ==5087==    by 0x5B8736: Catch::RunContext::runTest(Catch::TestCase const&) (catch.hpp:5001)
    ==5087==    by 0x5BA094: Catch::Runner::runTests() (catch.hpp:5275)
    ==5087==    by 0x5BAE1A: Catch::Session::run() (catch.hpp:5395)
    ==5087==    by 0x5BACE7: Catch::Session::run(int, char* const*) (catch.hpp:5378)
    ==5087==    by 0x5AE8BA: main (main.cpp:22)
    

    Work to Resolve

    I pursued this just a little bit before changing tasks. Following the control flow, it looked to me like the unaligned memory arose from passing a matrix with a nonaligned stride greater than 8, to sgemm_colmajor. I've come up with the below so far, but have not yet tested and debugged it and make frequent mistakes so likely something is wrong. The patch is pasted here from a tmux pane and then hand edited, so may need manual application.

    diff --git a/cpp/test/test_avx_utils.cpp b/cpp/test/test_avx_utils.cpp
    index 4ec4e00..34c318a 100644
    --- a/cpp/test/test_avx_utils.cpp
    +++ b/cpp/test/test_avx_utils.cpp
    @@ -44,6 +44,8 @@ void _test_sgemm_colmajor(int N, int D, int M, bool simple_entries=false) {
             B.setRandom();
         }
         C = (C.array() + -999).matrix();  // value we won't accidentally get
    +    int aligned_rows = C.rows() - (C.rows() % (-32 / int(sizeof(float))));
    +    C.resize(aligned_rows, C.cols());
    
    opened by xloem 0
  • setup.py does not work on osx (`error: command swig failed with exit status 1`)

    setup.py does not work on osx (`error: command swig failed with exit status 1`)

    Hello,

    I've seen the other thread about installation but did not find an answer to my problem.

    I can install swig and numpy but when I do python3 setup.py install I get the following

    ....
    running install
    running bdist_egg
    running egg_info
    writing pybolt.egg-info/PKG-INFO
    writing dependency_links to pybolt.egg-info/dependency_links.txt
    writing requirements to pybolt.egg-info/requires.txt
    writing top-level names to pybolt.egg-info/top_level.txt
    reading manifest file 'pybolt.egg-info/SOURCES.txt'
    writing manifest file 'pybolt.egg-info/SOURCES.txt'
    installing library code to build/bdist.macosx-10.9-x86_64/egg
    running install_lib
    running build_py
    creating build/lib.macosx-10.9-x86_64-3.7
    creating build/lib.macosx-10.9-x86_64-3.7/bolt
    copying python/bolt/bolt.py -> build/lib.macosx-10.9-x86_64-3.7/bolt
    copying python/bolt/bolt_api.py -> build/lib.macosx-10.9-x86_64-3.7/bolt
    copying python/bolt/__init__.py -> build/lib.macosx-10.9-x86_64-3.7/bolt
    copying python/bolt/native.i -> build/lib.macosx-10.9-x86_64-3.7/bolt
    running build_ext
    building '_bolt' extension
    swigging python/bolt/native.i to python/bolt/native_wrap.c
    swig -python -o python/bolt/native_wrap.c python/bolt/native.i
    python/bolt/config.i:50: Warning 301: class keyword used, but not in C++ mode.
    python/bolt/config.i:51: Error: Syntax error - possibly a missing semicolon.
    error: command 'swig' failed with exit status 1
    

    It seems swig is not installed but I can do swig -version and I get

    
    SWIG Version 4.0.2
    
    Compiled with clang++ [x86_64-apple-darwin21.1.0]
    
    Configured options: +pcre
    
    Please see http://www.swig.org for reporting bugs and further information
    
    opened by davidbp 0
  • mithral C++ parameter optimization

    mithral C++ parameter optimization

    Does the mithral C++ codebase provide functions for parameter optimization? I have problems finding it and executing run_matmul() in the struct mithral_amm_task returns a wrong output matrix, since the optimization parameters are set randomly. Thank you!

    opened by fjrdev 16
Owner
Machine learning PhD Student at MIT. I build fast machine learning algorithms.
null
A play around of mathematical functions to draw interesting objects to the screen.

LibDragonN64 Color Graphics Test A play around of mathematical functions to draw interesting objects to the screen. Compile Script (Windows only) In V

null 1 Dec 11, 2021
Some out-of-core matrix operations via HDF5.

hdfmat Version: 0.2-2 License: BSD 2-Clause Author: Drew Schmidt This package provides some out-of-core matrix operations via HDF5. The main purpose o

Drew Schmidt 5 Jan 9, 2022
A Code Base for Matrix operations in C++

SimpM A Code Base for Matrix operations in C++ Dependencies: GNU Bignum Library: https://gmplib.org Needed to be installed on you computer. Check your

null 3 Oct 27, 2022
A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

Davis E. King 11.6k Dec 31, 2022
An updated, 1.2.1 version of Equalizer APO running with internal double precision processing (64 bit)

EqualizerAPO - 64bit port This repo contains an updated, 1.2.1 64-bit port of EqualizerAPO - system wide equalizer for Windows. The port here is inspi

FireKahuna 41 Dec 4, 2022
Library for nonconvex constrained optimization using the augmented Lagrangian method and the matrix-free PANOC algorithm.

alpaqa Alpaqa is an efficient implementation of the Augmented Lagrangian method for general nonlinear programming problems, which uses the first-order

OPTEC 21 Dec 9, 2022
Seidel's Algorithm: Linear-Complexity Linear Programming for Small-Dimensional Variables

SDLP Seidel's Algorithm: Linear-Complexity Linear Programming (LP) for Small-Dimensions About This solver is super efficient for small-dimensional LP

ZJU FAST Lab 43 Dec 19, 2022
SIMD (SSE) implementation of the infamous Fast Inverse Square Root algorithm from Quake III Arena.

simd_fastinvsqrt SIMD (SSE) implementation of the infamous Fast Inverse Square Root algorithm from Quake III Arena. Why Why not. How This video explai

Liam 7 Oct 4, 2022
Optimized implementations of the Number Theoretic Transform (NTT) algorithm for the ring R/(X^N + 1) where N=2^m.

optimized-number-theoretic-transform-implementations This sample code package is an implementation of the Number Theoretic Transform (NTT) algorithm f

International Business Machines 12 Nov 14, 2022
A simple C++ complex & real matrix library, with matrix inversion, left division and determinant calculation

NaiveMatrixLib 帆帆的简易矩阵计算库 A simple C++ stdlib-based complex & real matrix library, with matrix inversion, left division (A\b) and determinant calculat

FerryYoungFan 50 Dec 28, 2022
Jing-Kalk is a beautifully designed calculator that conforms to the JingOS style and Integrating the interactive experience of pad and PC.

Jing-Kalk Jing-Kalk is based on Kalk gitlab. Jing-Kalk is a beautifully designed calculator that conforms to the JingOS style and Integrating the inte

JingOS 42 Nov 26, 2022
A C library for statistical and scientific computing

Apophenia is an open statistical library for working with data sets and statistical or simulation models. It provides functions on the same level as t

null 186 Sep 11, 2022
MIRACL Cryptographic SDK: Multiprecision Integer and Rational Arithmetic Cryptographic Library is a C software library that is widely regarded by developers as the gold standard open source SDK for elliptic curve cryptography (ECC).

MIRACL What is MIRACL? Multiprecision Integer and Rational Arithmetic Cryptographic Library – the MIRACL Crypto SDK – is a C software library that is

MIRACL 527 Jan 7, 2023
a lean linear math library, aimed at graphics programming. Supports vec3, vec4, mat4x4 and quaternions

linmath.h -- A small library for linear math as required for computer graphics linmath.h provides the most used types required for programming compute

datenwolf 729 Jan 9, 2023
tiny recursive descent expression parser, compiler, and evaluation engine for math expressions

TinyExpr TinyExpr is a very small recursive descent parser and evaluation engine for math expressions. It's handy when you want to add the ability to

Lewis Van Winkle 1.2k Jan 6, 2023
C++ tensors with broadcasting and lazy computing

Multi-dimensional arrays with broadcasting and lazy computing. Introduction xtensor is a C++ library meant for numerical analysis with multi-dimension

Xtensor Stack 2.8k Jan 8, 2023
nml is a simple matrix and linear algebra library written in standard C.

nml is a simple matrix and linear algebra library written in standard C.

Andrei Ciobanu 45 Dec 9, 2022
libmpc++ is a C++ header-only library to solve linear and non-linear MPC

libmpc++ libmpc++ is a C++ library to solve linear and non-linear MPC. The library is written in modern C++17 and it is tested to work on Linux, macOS

Nicola Piccinelli 46 Dec 20, 2022
A lightweight, minimal and customisable maths library for C99

Small Maths Library A lightweight, minimal and customisable maths library for C99, generated by Lua. Generating Requires Lua 5.3. lua sml.lua Generat

null 5 May 6, 2022