A c++ toolbox of locality-sensitive hashing (LSH), provides several popular LSH algorithms, also support python and matlab.

Related tags

Containers LSHBOX
Overview

LSHBOX-0.9

A C++ Toolbox of Locality-Sensitive Hashing for Large Scale Image Retrieval, Also Support Python and MATLAB.


Change Log

  • 2015.07.04

A new LSH method, K-means Based Double-Bit Quantization for Hashing (KDBQ), was added into LSHBOX-0.9 on July 4th, 2015. We implement KDBQ by C++ but also provide MATLAB interface. And the Python interface will be added into LSHBOX-0.9 later. Other files related to KDBQ have been updated synchronously.

  • 2015.06.04

A new LSH method, Double-Bit Quantization Hashing (DBQ), was added into LSHBOX-0.9 on June 4th, 2015. We implement DBQ by C++ but also provide MATLAB interface. And the Python interface will be added into LSHBOX-0.9 later. Other files related to DBQ have been updated synchronously.

Chapter 1 - Introduction

Locality-Sensitive Hashing (LSH) is an efficient method for large scale image retrieval, and it achieves great performance in approximate nearest neighborhood searching.

LSHBOX is a simple but robust C++ toolbox that provides several LSH algrithms, in addition, it can be integrated into Python and MATLAB languages. The following LSH algrithms have been implemented in LSHBOX, they are:

  • LSH Based on Random Bits Sampling
  • Random Hyperplane Hashing
  • LSH Based on Thresholding
  • LSH Based on p-Stable Distributions
  • Spectral Hashing (SH)
  • Iterative Quantization (ITQ)
  • Double-Bit Quantization Hashing (DBQ)
  • K-means Based Double-Bit Quantization Hashing (KDBQ)

There are two repositories for compilation and performance tests, they are:

In addition, File-Based-ITQ is an File Based ITQ example for LSHBOX.

Part of the code depends on the C++11, So I think your compiler should support this feature. We tested LSHBOX with VS2010 in Windows 7/8 32bit/64bit and with g++ in Linux, Mac test will be done in the future. We hope that there are more people that join in the test or contribute more algrithms.

Please feel free to contact us [[email protected], [email protected] or [email protected]] if you have any questions.

Chapter 2 - Compilation

LSHBOX is written by C++. And it also can be easily used in many contexts through the Python and MATLAB bindings provided with this toolbox.

LSHBOX is simple and easy to use. If you want to integrate LSHBOX into you application, it don't need compile. You only need to add the include directory or modify the program search path, then you can use this library directly in C, C++, Python or MATLAB.

If you want to test or contribute, CMAKE, a cross-platform, open-source build system, is usded to build some tools for the purpose. CMake can be downloaded from CMake' website.

In some cases, if you want or need to compile it by yourself with Python and MATLAB, please delete the comment of the last two lines in file CMakeLists.txt, and you will find the compiling progress of python must rely on Boost library or some part of this library. For more detailed information, you can view the document ./python/README.md.

During compilation, create a new directory named build in the main directory, then choose a appropriate compiler and switch to the build directory, finally, execute the following command according to your machine:

  • Windows
cmake -DCMAKE_BUILD_TYPE=Release .. -G"NMake Makefiles"
nmake
  • Linux & OS X
cmake ..
make

Chapter 3 - Usage

This chapter contains small examples of how to use the LSHBOX library from different programming languages (C++, Python and MATLAB).

For C++

/**
 * @file itqlsh_test.cpp
 *
 * @brief Example of using Iterative Quantization LSH index for L2 distance.
 */
#include <lshbox.h>
int main(int argc, char const *argv[])
{
    if (argc != 4)
    {
        std::cerr << "Usage: ./itqlsh_test data_file lsh_file benchmark_file" << std::endl;
        return -1;
    }
    std::cout << "Example of using Iterative Quantization" << std::endl << std::endl;
    typedef float DATATYPE;
    std::cout << "LOADING DATA ..." << std::endl;
    lshbox::timer timer;
    lshbox::Matrix<DATATYPE> data(argv[1]);
    std::cout << "LOAD TIME: " << timer.elapsed() << "s." << std::endl;
    std::cout << "CONSTRUCTING INDEX ..." << std::endl;
    timer.restart();
    std::string file(argv[2]);
    bool use_index = false;
    lshbox::itqLsh<DATATYPE> mylsh;
    if (use_index)
    {
        mylsh.load(file);
    }
    else
    {
        lshbox::itqLsh<DATATYPE>::Parameter param;
        param.M = 521;
        param.L = 5;
        param.D = data.getDim();
        param.N = 8;
        param.S = 100;
        param.I = 50;
        mylsh.reset(param);
        mylsh.train(data);
        mylsh.hash(data);
    }
    mylsh.save(file);
    std::cout << "CONSTRUCTING TIME: " << timer.elapsed() << "s." << std::endl;
    std::cout << "LOADING BENCHMARK ..." << std::endl;
    timer.restart();
    lshbox::Matrix<DATATYPE>::Accessor accessor(data);
    lshbox::Metric<DATATYPE> metric(data.getDim(), L1_DIST);
    lshbox::Benchmark bench;
    std::string benchmark(argv[3]);
    bench.load(benchmark);
    unsigned K = bench.getK();
    lshbox::Scanner<lshbox::Matrix<DATATYPE>::Accessor> scanner(
        accessor,
        metric,
        K
    );
    std::cout << "LOADING TIME: " << timer.elapsed() << "s." << std::endl;
    std::cout << "RUNING QUERY ..." << std::endl;
    timer.restart();
    lshbox::Stat cost, recall;
    lshbox::progress_display pd(bench.getQ());
    for (unsigned i = 0; i != bench.getQ(); ++i)
    {
        mylsh.query(data[bench.getQuery(i)], scanner);
        recall << bench.getAnswer(i).recall(scanner.topk());
        cost << float(scanner.cnt()) / float(data.getSize());
        ++pd;
    }
    std::cout << "MEAN QUERY TIME: " << timer.elapsed() / bench.getQ() << "s." << std::endl;
    std::cout << "RECALL   : " << recall.getAvg() << " +/- " << recall.getStd() << std::endl;
    std::cout << "COST     : " << cost.getAvg() << " +/- " << cost.getStd() << std::endl;

    // mylsh.query(data[0], scanner);
    // std::vector<std::pair<float, unsigned> > res = scanner.topk().getTopk();
    // for (std::vector<std::pair<float, unsigned> >::iterator it = res.begin(); it != res.end(); ++it)
    // {
    //     std::cout << it->second << ": " << it->first << std::endl;
    // }
    // std::cout << "DISTANCE COMPARISON TIMES: " << scanner.cnt() << std::endl;
}

You can get the sample dataset audio.data from http://www.cs.princeton.edu/cass/audio.tar.gz, if the link is invalid, you can also get it from LSHBOX-sample-data.

FOR EXAMPLE, YOU CAN RUN THE FOLLOWING CODE IN COMMAND LINE AFTER BUILD ALL THE TOOLS:

> create_benchmark audio.data audio.ben 200 50
> itqlsh_test audio.data audio.itq audio.ben
NOTE1:

In our project, the format of the input file (such as audio.data, which is in float data type) is a binary file but not a text file, because binary file has many advantages. In LSHBOX/tools/create_test_data.cpp, we create a binary file with unsigned data type, from the process, you will find that the binary file is organized as the following format:

{Bytes of the data type} {The size of the vectors} {The dimension of the vectors} {All of the binary vector, arranged in turn}

For your application, you should also transform your dataset into this binary format.

NOTE2:

In addition, the dataset should be zero-centered, IT IS VERY IMPORTANT!

For Python

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# pylshbox_example.py
import pylshbox
import numpy as np
print 'prepare test data'
float_mat = np.random.randn(100000, 192)
float_query = float_mat[0]
unsigned_mat = np.uint32(float_mat * 5)
unsigned_query = unsigned_mat[0]
print ''
print 'Test rbsLsh'
rbs_mat = pylshbox.rbslsh()
rbs_mat.init_mat(unsigned_mat.tolist(), '', 521, 5, 20, 5)
result = rbs_mat.query(unsigned_query.tolist(), 1)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
print ''
print 'Test rhpLsh'
rhp_mat = pylshbox.rhplsh()
rhp_mat.init_mat(float_mat.tolist(), '', 521, 5, 6)
result = rhp_mat.query(float_query.tolist(), 2, 10)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
print ''
print 'Test thLsh'
th_mat = pylshbox.thlsh()
th_mat.init_mat(float_mat.tolist(), '', 521, 5, 12)
result = th_mat.query(float_query.tolist(), 2, 10)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
print ''
print 'Test psdlsh with param.T = 1'
psdL1_mat = pylshbox.psdlsh()
psdL1_mat.init_mat(float_mat.tolist(), '', 521, 5, 1, 5)
result = psdL1_mat.query(float_query.tolist(), 2, 10)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
print ''
print 'Test psdlsh with param.T = 2'
psdL2_mat = pylshbox.psdlsh()
psdL2_mat.init_mat(float_mat.tolist(), '', 521, 5, 2, 0.5)
result = psdL2_mat.query(float_query.tolist(), 2, 10)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
print ''
print 'Test shLsh'
sh_mat = pylshbox.shlsh()
sh_mat.init_mat(float_mat.tolist(), '', 521, 5, 4, 100)
result = sh_mat.query(float_query.tolist(), 2, 10)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
print ''
print 'Test itqLsh'
itq_mat = pylshbox.itqlsh()
itq_mat.init_mat(float_mat.tolist(), '', 521, 5, 8, 100, 50)
result = itq_mat.query(float_query.tolist(), 2, 10)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
NOTE:

In Windows, the py module name is pylshbox, but in linux, it will be libpylshbox.

For MATLAB

% matlab_example.m
disp('prepare test data ...')
dataset = randn(128,100000);
dataset = dataset - repmat(mean(dataset), size(dataset, 1), 1);
testset = dataset(:,1:10);
disp('ok')
input('Test rhplsh, Press any key to continue ...')
param_rhp.M = 521;
param_rhp.L = 5;
param_rhp.N = 6;
[indices, dists] = rhplsh(dataset, testset, param_rhp, '', 2, 10)
input('Test thlsh, Press any key to continue ...')
param_th.M = 521;
param_th.L = 5;
param_th.N = 12;
[indices, dists] = thlsh(dataset, testset, param_th, '', 2, 10)
input('Test psdlsh with param_psdL1.T = 1, Press any key to continue ...')
param_psdL1.M = 521;
param_psdL1.L = 5;
param_psdL1.T = 1;
param_psdL1.W = 5;
[indices, dists] = psdlsh(dataset, testset, param_psdL1, '', 1, 10)
input('Test psdlsh with param_psdL2.T = 2, Press any key to continue ...')
param_psdL2.M = 521;
param_psdL2.L = 5;
param_psdL2.T = 2;
param_psdL2.W = 0.5;
[indices, dists] = psdlsh(dataset, testset, param_psdL2, '', 2, 10)
input('Test shlsh, Press any key to continue ...')
param_sh.M = 521;
param_sh.L = 5;
param_sh.N = 4;
param_sh.S = 100;
[indices, dists] = shlsh(dataset, testset, param_sh, '', 2, 10)
disp('Test itqlsh, Press any key to continue.')
param_itq.M = 521;
param_itq.L = 5;
param_itq.N = 8;
param_itq.S = 100;
param_itq.I = 50;
[indices, dists] = itqlsh(dataset, testset, param_itq, '', 2, 10)
disp('Test dbqlsh, Press any key to continue.')
param_dbq.M = 521;
param_dbq.L = 5;
param_dbq.N = 4;
param_dbq.I = 5;
[indices, dists] = dbqlsh(dataset, testset, param_dbq, '', 2, 10)
disp('Test kdbqlsh, Press any key to continue.')
param_kdbq.M = 521;
param_kdbq.L = 5;
param_kdbq.N = 4;
param_kdbq.I = 50;
[indices, dists] = kdbqlsh(dataset, testset, param_kdbq, '', 2, 10)

Have you ever find the empty string used in the Python and MATLAB code? In fact, they can be used to save the index through pass a file name. Like the following, you will find the next query speed faster than the first, because there is no re-indexing.

In Python

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# pylshbox_example2.py
import pylshbox
import numpy as np
import time
print 'prepare test data'
float_mat = np.random.randn(100000, 192)
float_query = float_mat[0]
print ''
print 'Test itqLsh'
print ''
print 'First time, need to constructing index.'  # About 7s.
start = time.time()
itq_mat = pylshbox.itqlsh()
itq_mat.init_mat(float_mat.tolist(), 'pyitq.lsh', 521, 5, 8, 100, 50)
result = itq_mat.query(float_query.tolist(), 2, 10)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
print 'Elapsed time is %f seconds.' % (time.time() - start)
print ''
print 'Second time, no need to re-indexing.'  # About 3s.
start = time.time()
itq_mat2 = pylshbox.itqlsh()
itq_mat2.init_mat(float_mat.tolist(), 'pyitq.lsh')
result = itq_mat2.query(float_query.tolist(), 2, 10)
indices, dists = result[0], result[1]
for i in range(len(indices)):
    print indices[i], '\t', dists[i]
print 'Elapsed time is %f seconds.' % (time.time() - start)

In MATLAB

% matlab_example2.m
dataset = randn(128,500000);
testset = dataset(:,1:10);
disp('Test itqlsh')
param_itq.M = 521;
param_itq.L = 5;
param_itq.N = 8;
param_itq.S = 100;
param_itq.I = 50;
disp('First time, need to constructing index') % About 13s.
tic;
[indices, dists] = itqlsh(dataset, testset, param_itq, 'itq.lsh', 2, 10);
toc;
disp('Second time, no need to re-indexing') % About 0.5s.
tic;
[indices, dists] = itqlsh(dataset, testset, param_itq, 'itq.lsh', 2, 10);
toc;

Chapter 4 - Algorithm

LSHBOX is based on many approximate nearest neighbor schemes, and the following is a brief description of each algorithm and its parameters.

4.1 - Locality-Sensitive Hashing Scheme Based on Random Bits Sampling

Reference
P. Indyk and R. Motwani. Approximate Nearest Neighbor - Towards Removing the Curse of Dimensionality. In Proceedings of the 30th Symposium on Theory of Computing, 1998, pp. 604-613.

A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), 1999.
Parameters
struct Parameter
{
	/// Hash table size
	unsigned M;
	/// Number of hash tables
	unsigned L;
	/// Dimension of the vector
	unsigned D;
	/// Binary code bytes
	unsigned N;
	/// The Difference between upper and lower bound of each dimension
	unsigned C;
};
Implementation
#include <lshbox/lsh/rbslsh.h>

According to the second assumption in the paper, all coordinates of points in P are positive integer. Although we can convert all coordinates to integers by multiplying them by a suitably large number and rounding to the nearest integer, but I think it is very fussy, What's more, it often gets criticized for using too much memory when in a larger range of data. Therefore, it is recommended to use other algorithm.

4.2 - Locality-Sensitive Hashing Scheme Based on Random Hyperplane

Reference
Charikar, M. S. 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of the Thiry-Fourth Annual ACM Symposium on theory of Computing (Montreal, Quebec, Canada, May 19 - 21, 2002). STOC '02. ACM, New York, NY, 380-388. DOI= http://doi.acm.org/10.1145/509907.509965
Parameters
struct Parameter
{
	/// Hash table size
	unsigned M;
	/// Number of hash tables
	unsigned L;
	/// Dimension of the vector
	unsigned D;
	/// Binary code bytes
	unsigned N;
};
Implementation
#include <lshbox/lsh/rhplsh.h>

4.3 - Locality-Sensitive Hashing Scheme Based on Thresholding

Reference
Zhe Wang, Wei Dong, William Josephson, Qin Lv, Moses Charikar, Kai Li. Sizing Sketches: A Rank-Based Analysis for Similarity Search. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems . San Diego, CA, USA. June 2007.

Qin Lv, Moses Charikar, Kai Li. Image Similarity Search with Compact Data Structures. In Proceedings of ACM 13th Conference on Information and Knowledge Management (CIKM), Washington D.C., USA. November 2004.
Parameters
struct Parameter
{
	/// Hash table size
	unsigned M;
	/// Number of hash tables
	unsigned L;
	/// Dimension of the vector
	unsigned D;
	/// Binary code bytes
	unsigned N;
	/// Upper bound of each dimension
	float Max;
	/// Lower bound of each dimension
	float Min;
};
Implementation
#include <lshbox/lsh/thlsh.h>

4.4 - Locality-Sensitive Hashing Scheme Based on p-Stable Distributions

Reference
Mayur Datar , Nicole Immorlica , Piotr Indyk , Vahab S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the twentieth annual symposium on Computational geometry, June 08-11, 2004, Brooklyn, New York, USA.
Parameters
struct Parameter
{
	/// Hash table size
	unsigned M;
	/// Number of hash tables
	unsigned L;
	/// Dimension of the vector
	unsigned D;
	/// Index mode, you can choose 1(CAUCHY) or 2(GAUSSIAN)
	unsigned T;
	/// Window size
	float W;
};
Implementation
#include <lshbox/lsh/psdlsh.h>

4.5 - Spectral Hashing

Reference
Y. Weiss, A. Torralba, R. Fergus. Spectral Hashing. Advances in Neural Information Processing Systems, 2008.
Parameters
struct Parameter
{
	/// Hash table size
	unsigned M;
	/// Number of hash tables
	unsigned L;
	/// Dimension of the vector
	unsigned D;
	/// Binary code bytes
	unsigned N;
	/// Size of vectors in train
	unsigned S;
};
Implementation
#include <lshbox/lsh/shlsh.h>

4.6 - Iterative Quantization

Reference
Gong Y, Lazebnik S, Gordo A, et al. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2013, 35(12): 2916-2929.
Parameters
struct Parameter
{
	/// Hash table size
	unsigned M;
	/// Number of hash tables
	unsigned L;
	/// Dimension of the vector
	unsigned D;
	/// Binary code bytes
	unsigned N;
	/// Size of vectors in train
	unsigned S;
	/// Training iterations
	unsigned I;
};
Implementation
#include <lshbox/lsh/itqlsh.h>

4.7 - Double-Bit Quantization Hashing

Reference
Kong W, Li W. Double-Bit Quantization for Hashing. In AAAI, 2012.

Gong Y, Lazebnik S, Gordo A, et al. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2013, 35(12): 2916-2929.
Parameters
struct Parameter
{
	/// Hash table size
	unsigned M;
	/// Number of hash tables
	unsigned L;
	/// Dimension of the vector
	unsigned D;
	/// Number of projection dimensions,corresponding to 2*N binary code bytes for each vector
	unsigned N;
	/// Training iterations
	unsigned I;
};
Implementation
#include <lshbox/lsh/dbqlsh.h>

4.8 - K-means Based Double-Bit Quantization Hashing

Reference
Zhu, H. K-means based double-bit quantization for hashing.Computational Intelligence for Multimedia, Signal and Vision Processing (CIMSIVP),2014 IEEE Symposium on (pp.1-5). IEEE.
Parameters
struct Parameter
{
	/// Hash table size
	unsigned M;
	/// Number of hash tables
	unsigned L;
	/// Dimension of the vector
	unsigned D;
	/// Number of projection dimensions,corresponding to 2*N binary code bytes for each vector
	unsigned N;
	/// Training iterations
	unsigned I;
};
Implementation
#include <lshbox/lsh/kdbqlsh.h>

According to the test, Double-Bit Quantization Hashing and Iterative Quantization performance very good and are superior to other schemes. Iterative Quantization can get high query accuracy with minimum cost while Double-Bit Quantization Hashing can achieve better query accuracy.

Issues
  • Python module fails to compile

    Python module fails to compile

    I'm attempting to compile LSHBOX with python. The compilation completes when using only the C++ module. I have the boost include headers in the path, using gcc on ubuntu 14.04. I get the following error

    /home/ubuntu/LSHBOX/python/pylshbox.cpp: In member function ‘boost::python::list lshbox::pyRbsLsh::query(boost::python::list&, unsigned int, unsigned int)’:
    /home/ubuntu/LSHBOX/python/pylshbox.cpp:138:79: error: conversion from ‘std::vector<std::pair<float, unsigned int>, std::allocator<std::pair<float, unsigned int> > >’ to non-scalar type ‘std::vector<std::pair<unsigned int, float> >’ requested
         std::vector<std::pair<unsigned, float> > tmp = scanner.topk().getTopk();
    

    The error is repeated several times for the other algorithms. I've been able to compile this in the past, maybe related to some of the recent changes ? Best.

    opened by yhenon 3
  • Python import error in Linux

    Python import error in Linux

    Hi, I have compiled the code but I couldn't find the libpylshbox.so binary in my build directory. Can you explain the Linux compilation process in detail.

    Thank you very much the great tool :+1:

    Regards, Maggie

    opened by chubbymaggie 3
  • Compile Errors in Ubuntu14.04

    Compile Errors in Ubuntu14.04

    [ 10%] Building CXX object tools/CMakeFiles/create_test_data.dir/create_test_data.cpp.o In file included from /home/lili/workspace/LSHBOX/include/lshbox.h:44:0, from /home/lili/workspace/LSHBOX/tools/create_test_data.cpp:29: /home/lili/workspace/LSHBOX/include/lshbox/kdbqlsh.h: In member function ‘void Hi, Thanks a lot for contributing this LSHBOX! Currently there are the following compiling errors in Ubuntu14.04. Any suggestions ? Thanks! lshbox::kdbqLsh::Cluster(unsigned int, Eigen::MatrixXf)’: /home/lili/workspace/LSHBOX/include/lshbox/kdbqlsh.h:273:19: error: ‘FLT_MAX’ was not declared in this scope float minVar = FLT_MAX; ^ make[2]: *** [tools/CMakeFiles/create_test_data.dir/create_test_data.cpp.o] Error 1 make[1]: *** [tools/CMakeFiles/create_test_data.dir/all] Error 2 make: *** [all] Error 2

    opened by LiliMeng 2
  • Typo in example in README.md

    Typo in example in README.md

    There seems to be a typo in the provided example in the README.md file at line 51:

        for (int I = 0; i != Q; ++i)
    

    Should be

        for (int i = 0; i != Q; ++i)
    

    Thank you for the work on the library.

    opened by yhenon 2
  • bug in kdbqlsh.h

    bug in kdbqlsh.h

    In function void lshbox::kdbqLsh<DATATYPE>::DataProjectoin()

    The two matrix multiplication iteration blocks seems wrong. I guess the code here are designed to do matrix multiplications, but mess up the array subscripts.

    for (unsigned i = 0; i != X.rows(); ++i)
            {
                for (unsigned q = 0; q != param.N; ++q)
                {
                    float sum = 0.0;
                    for (unsigned j = 0; j != X.cols(); ++j)
                    {
                        sum += X(i, j) * pcsAll[k][q][j];//bug here
                    }
                    prj(i, q) = sum;
                }
            }
    
    opened by duming 1
  • kdbqlsh.h   save and load function has bug.

    kdbqlsh.h save and load function has bug.

    It seems that the save and load function in kdbqlsh.h is a copy of the same functions in dbqlsh. u0, u1, and u2 are missing from these two functions.
    If you save the index and then load it back, the index won't work.

    opened by duming 1
  • Some question about K means double bit quantization hashing

    Some question about K means double bit quantization hashing

    @primetang I noticed that in your code kdbqlsh.h Instead of using double bit codes such as {11,01,00} to quantitate the feature vector, which proposed in "K-MEANS BASED DOUBLE-BIT QUANTIZATION FOR HASHING Hao Zhu". You are using the sum of a random array to generate the hash value. I looked at several other references you provide in the README file, but I can't find the source of this algorithm.

    Could you please explain the algorithm that you are currently using to me. Thank you.

    Your code:

                if (label == 0)
                {
                    sum += rndArray[k][2 * i + 1];
                }
                if (label == 2)
                {
                    sum += rndArray[k][2 * i];
                }
    
    opened by duming 1
  • Python Import error in linux

    Python Import error in linux

    I'm having troubles to import this in linux. Trying to run the examples without any change yield the error:

    Traceback (most recent call last): File "pylshbox-example.py", line 4, in import pylshbox ImportError: No module named pylshbox

    for any of the 3 versions x86, x64, mingw.

    I tried recompiling but it only generates a pyd which is for windows (afaik).

    Any idea on what could I be doing wrong? Thx in advance.

    opened by TieSKey 1
  • Fix broken headings in Markdown files

    Fix broken headings in Markdown files

    GitHub changed the way Markdown headings are parsed, so this change fixes it.

    See bryant1410/readmesfix for more information.

    Tackles bryant1410/readmesfix#1

    opened by bryant1410 0
  • error in matlab

    error in matlab

    Hello, I wanna use your code, but i have a simple problem and i appreciate if you help me, i have this matrix that attached as '2', as you see these numbers are so close together, so when i run your code matlab is going to close(as you see in image 2 that attached). can you help me please? best regards. 2 1

    opened by mahsagof 1
  • about psdlsh.cpp and create_test_data.cpp

    about psdlsh.cpp and create_test_data.cpp

    when I use your LSHBOX, i meet some problem. link https://github.com/RSIA-LIESMARS-WHU/LSHBOX#for-python i use create_test_data.cpp to create the test file, and use psdlsh.cpp to get the hash val, but there are some abnormal thing: RECALL : 1 +/- 0 COST : 1 +/- 0 and all the data[i]'s hash val is 0.

    opened by zmypla 0
  • matlab functions not working

    matlab functions not working

    Hi

    I executed "cmake .." and make on linux for compilation with MATLAB. For this I had uncommented the corresponding line within CMakeLists.txt. However, when I use the sample code, it gives the error

    Undefined function or variable 'rhplsh'.

    Do we need to compile/add anything else as well ?

    opened by siddharthsrivastava 0
  • Mexa files

    Mexa files

    Hello, The mex files are windows only. I suggest to to create for linux (mexa) and mac (mexmaci64) as well. I tried to create a mex file for linux, however the I get the error:

    mex rhplsh_search.cpp /LSHBOX-master/matlab/Linux/rhplsh_search.cpp:24:20: fatal error: lshbox.h: No such file or directory compilation terminated.

    opened by areeberg 1
Owner
remote sensing image analyzing group in Wuhan University
null
C++ implementation of a fast hash map and hash set using robin hood hashing

A C++ implementation of a fast hash map and hash set using robin hood hashing The robin-map library is a C++ implementation of a fast hash map and has

Thibaut Goetghebuer-Planchon 791 Aug 1, 2022
Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

➵ robin_hood unordered map & set robin_hood::unordered_map and robin_hood::unordered_set is a platform independent replacement for std::unordered_map

Martin Ankerl 1.2k Jul 30, 2022
Prime Number Projects in C#/C++/Python

Primes | A Software Drag Race Source code to Dave's Garage video benchmarking the same prime number sieve in Python, C#, and C++. Community contributi

null 76 Aug 6, 2022
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of bil

Giorgio Vinciguerra 615 Aug 7, 2022
A drop-in replacement for std::list with 293% faster insertion, 57% faster erasure, 17% faster iteration and 77% faster sorting on average. 20-24% speed increase in use-case testing.

plf_list A drop-in replacement for std::list with (on average): 293% faster insertion 57% faster erasure 17% faster iteration 77% faster sorting 70% f

Matt Bentley 116 Jul 25, 2022
ring-span lite - A C++yy-like ring_span type for C++98, C++11 and later in a single-file header-only library

ring-span lite: A circular buffer view for C++98 and later Contents Example usage In a nutshell Dependencies Installation Synopsis Reported to work wi

Martin Moene 122 May 23, 2022
CyberVal is a paste of a internal Valorant Cheat which has been used by several providers like LeagueHell, Enduty and several other pasted chairs.

CyberVal CyberVal is a paste of a internal Valorant Cheat which has been used by several providers like LeagueHell, Enduty and several other pasted ch

zeroday 25 Jul 31, 2022
ParaMonte: Plain Powerful Parallel Monte Carlo and MCMC Library for Python, MATLAB, Fortran, C++, C.

Overview | Installation | Dependencies | Parallelism | Examples | Acknowledgments | License | Authors ParaMonte: Plain Powerful Parallel Monte Carlo L

Computational Data Science Lab 171 Aug 1, 2022
Python Inference Script is a Python package that enables developers to author machine learning workflows in Python and deploy without Python.

Python Inference Script(PyIS) Python Inference Script is a Python package that enables developers to author machine learning workflows in Python and d

Microsoft 10 Feb 23, 2022
HashLibPlus is a recommended C++11 hashing library that provides a fluent interface for computing hashes and checksums of strings, files, streams, bytearrays and untyped data to mention but a few.

HashLibPlus HashLibPlus is a recommended C++11 hashing library that provides a fluent interface for computing hashes and checksums of strings, files,

Telepati 7 Apr 11, 2022
Demonstrates basic advantages of integrating the Data Distribution Service (DDS) and Time-Sensitive Networking (TSN) Ethernet.

ROS2-DDS-TSN integration demo This repository demonstrates basic advantages of integrating the Data Distribution Service (DDS) and Time-Sensitive Netw

NXP 45 Jul 29, 2022
Several algorithms and data structures implemented in C++ by me (credited to others where necessary).

Algorithms This repository contains my implementations of several algorithms and data structures in C++ (credited to others where necessary). It has i

Petar Veličković 575 Aug 3, 2022
BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser) See http://pypi.python.org/pypi/bllipparser/ for Python module.

BLLIP Reranking Parser Copyright Mark Johnson, Eugene Charniak, 24th November 2005 --- August 2006 We request acknowledgement in any publications that

Brown Laboratory for Linguistic Information Processing 212 Jul 19, 2022
Several translations of segment trees for CMU's 15-451 (Algorithms).

segtrees Several translations of segment trees for CMU's 15-451 (Algorithms). Feel free to contact me at [email protected] for questions/comment

Abigale Kim 20 Jun 17, 2022
Implementation of popular uncertain frequent subgraph mining algorithms

UFreS Algorithms for mining frequent subgraph patterns from Uncertain Graph Databases. The Datasets Folder contains the datasets used in our empirical

Riddho Ridwanul Haque 2 Nov 3, 2021
LSH/Hypercube kNN and KMeans++ Clustering on polygonic curves and time series

kNN-and-Clustering-on-Time-Series-and-Curves LSH/Hypercube kNN and KMeans++ Clustering on polygonic curves and time series In this project we will exp

Aristi_Papastavrou 14 Apr 6, 2022
Benchmark framework of 3D integrated CIM accelerators for popular DNN inference, support both monolithic and heterogeneous 3D integration

3D+NeuroSim V1.0 The DNN+NeuroSim framework was developed by Prof. Shimeng Yu's group (Georgia Institute of Technology). The model is made publicly av

NeuroSim 10 Dec 21, 2021
Python envelope for the popular C library libjpeg for handling JPEG files.

jpeglib Python envelope for the popular C library libjpeg for handling JPEG files. libjpeg offers full control over compression and decompression and

Martin Benes 3 Aug 4, 2022
C++ Empirical Transfer Function Estimate (ETFE) Similar to MATLAB tfestimate, pwelch, and cpsd

ETFE.hpp emulates MATLAB's tfestimate, pwelch, and cpsd functions. It calculates the experimental transfer function estimate between input x and output y txy, the power spectral densities pxx and pyy, and the cross spectral density pxy. By default, it behaves exactly as MATLAB's functions, and similarly can be provided with specified windows, overlap, and FFT sizes. The output has been tested to precisely match that of MATLAB's (see matlab\test.m).

Evan Pezent 30 Jun 9, 2022
MATLAB and C++ implementations of sideslip angle estimators

sideslip-angle-vehicle-estimation MATLAB and C++ implementations of sideslip angle estimators Factor graph sideslip angle estimator Papers: "A Factor

Libraries for Multibody Dynamics Simulation (MBDS) 9 Jan 20, 2022
LSH is a simple implementation of a shell in C

It demonstrates the basics of how a shell works. That is: read, parse, fork, exec, and wait. Since its purpose is demonstration (not feature completeness or even fitness for casual use), it has many limitations

Stephen Brennan 1.2k Aug 10, 2022