Faiss is a library for efficient similarity search and clustering of dense vectors.

Overview

Faiss

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.

News

See CHANGELOG.md for detailed information about latest features.

Introduction

Faiss contains several methods for similarity search. It assumes that the instances are represented as vectors and are identified by an integer, and that the vectors can be compared with L2 (Euclidean) distances or dot products. Vectors that are similar to a query vector are those that have the lowest L2 distance or the highest dot product with the query vector. It also supports cosine similarity, since this is a dot product on normalized vectors.
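
The relationship between the three comparisons can be checked on a toy example. The following is a minimal pure-Python sketch, not Faiss code: the helper names (dot, l2_normalize, cosine, squared_l2) and the sample data are assumptions made for illustration. It shows that, once vectors are normalized, ranking by highest dot product gives the cosine ranking, and that ranking by lowest L2 distance gives the same order.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Return v scaled to unit L2 norm (v must be non-zero)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity computed directly from its definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical data: integer ids mapped to vectors.
query = [1.0, 2.0, 2.0]
database = {0: [2.0, 4.0, 4.0], 1: [2.0, -1.0, 0.0], 2: [0.0, 3.0, 4.0]}

# Normalize once, then rank by dot product (highest first).
q_unit = l2_normalize(query)
db_unit = {i: l2_normalize(v) for i, v in database.items()}
by_dot = sorted(db_unit, key=lambda i: dot(q_unit, db_unit[i]), reverse=True)

# On unit vectors, ||a - b||^2 = 2 - 2 * dot(a, b), so the L2 ranking matches.
by_l2 = sorted(db_unit, key=lambda i: squared_l2(q_unit, db_unit[i]))
```

In this sketch the normalization is done once per vector, so each later comparison costs a single dot product.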

Most of the methods, like those based on binary vectors and compact quantization codes, use only a compressed representation of the vectors and do not require keeping the original vectors. This generally comes at the cost of a less precise search, but these methods can scale to billions of vectors in main memory on a single server.
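
To make the memory/precision trade-off concrete, here is a minimal sketch of a generic 8-bit scalar quantizer in pure Python. It is an illustration only and much simpler than the binary and quantization codes mentioned above; the function names (train_range, encode, decode) and the data are hypothetical, and the sketch assumes the training vectors do not all share a single value.

```python
def train_range(vectors):
    """'Training' step: learn the global min and max over all components."""
    values = [x for v in vectors for x in v]
    return min(values), max(values)

def encode(v, lo, hi):
    """Map each float component to one byte in 0..255 (assumes hi > lo)."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, round((x - lo) * scale))) for x in v)

def decode(code, lo, hi):
    """Reconstruct an approximation of the original vector from its code."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in code]

vectors = [[0.0, 0.5, 1.0, 0.25], [0.9, 0.1, 0.3, 0.7]]
lo, hi = train_range(vectors)

# Only the codes need to be stored: 4 bytes per vector here,
# versus 16 bytes for the same vector held as float32.
codes = [encode(v, lo, hi) for v in vectors]
approx = [decode(c, lo, hi) for c in codes]

# The reconstruction is inexact; the error is at most half a quantization step.
max_error = max(
    abs(x - y) for v, a in zip(vectors, approx) for x, y in zip(v, a)
)
```

The original vectors can be discarded after encoding, which is where the memory saving comes from; distances computed on the decoded vectors are approximate.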

The GPU implementation can accept input from either CPU or GPU memory. On a server with GPUs, the GPU indexes can be used as a drop-in replacement for the CPU indexes (e.g., replace IndexFlatL2 with GpuIndexFlatL2), and copies to/from GPU memory are handled automatically. Results will be faster, however, if both input and output remain resident on the GPU. Both single- and multi-GPU usage is supported.

Building

The library is mostly implemented in C++, with optional GPU support provided via CUDA, and an optional Python interface. The CPU version requires a BLAS library. It compiles with a Makefile and can be packaged in a docker image. See INSTALL.md for details.

How Faiss works

Faiss is built around an index type that stores a set of vectors, and provides a function to search in them with L2 and/or dot product vector comparison. Some index types are simple baselines, such as exact search. Most of the available indexing structures correspond to various trade-offs with respect to

  • search time
  • search quality
  • memory used per index vector
  • training time
  • need for external data for unsupervised training
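
The exact-search baseline sits at one extreme of these trade-offs: no training, perfect search quality, full memory per vector, and a search time that grows linearly with the number of stored vectors. A minimal pure-Python sketch of that idea follows. The class FlatL2Sketch is hypothetical and is not how Faiss implements its flat indexes; it only mirrors the add-then-search usage pattern.

```python
import heapq

class FlatL2Sketch:
    """Toy exact-search index: stores vectors as-is and scans all of them."""

    def __init__(self, d):
        self.d = d          # vector dimension
        self.vectors = []   # position in this list is the integer id

    def add(self, xs):
        """Append vectors; ids are assigned sequentially."""
        for x in xs:
            if len(x) != self.d:
                raise ValueError("expected dimension %d" % self.d)
            self.vectors.append(list(x))

    def search(self, query, k):
        """Return (distances, ids) of the k stored vectors nearest to query."""
        scored = (
            (sum((a - b) ** 2 for a, b in zip(query, v)), i)
            for i, v in enumerate(self.vectors)
        )
        top = heapq.nsmallest(k, scored)  # smallest squared L2 distance first
        return [dist for dist, _ in top], [i for _, i in top]

index = FlatL2Sketch(d=2)
index.add([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
distances, ids = index.search([0.9, 0.1], k=2)
```

Every query touches every stored vector, which is why the other index types give up some accuracy or require training in exchange for speed or memory.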

The optional GPU implementation provides what is likely (as of March 2017) the fastest exact and approximate (compressed-domain) nearest neighbor search implementation for high-dimensional vectors, fastest Lloyd's k-means, and fastest small k-selection algorithm known. The implementation is detailed here.
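
For readers unfamiliar with Lloyd's k-means, the algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a plain CPU illustration in pure Python, not the GPU implementation referred to above; the function name lloyd_kmeans, the choice of the first k points as initial centroids, and the fixed iteration count are assumptions made to keep the example deterministic.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_kmeans(points, k, iterations=10):
    """Run a fixed number of Lloyd iterations; return (centroids, assignment)."""
    # Assumption for this sketch: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_l2(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, assignment = lloyd_kmeans(points, k=2)
```

The assignment step is itself a nearest-neighbor search with k=1, which is why fast search and fast k-means go together.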

Full documentation of Faiss

The following are entry points for documentation:

Authors

The main authors of Faiss are:

Reference

Reference to cite when you use Faiss in a research paper:

@article{JDH17,
  title={Billion-scale similarity search with GPUs},
  author={Johnson, Jeff and Douze, Matthijs and J{\'e}gou, Herv{\'e}},
  journal={arXiv preprint arXiv:1702.08734},
  year={2017}
}

Join the Faiss community

For public discussion of Faiss or for questions, there is a Facebook group at https://www.facebook.com/groups/faissusers/

We monitor the issues page of the repository. You can report bugs, ask questions, etc.

License

Faiss is MIT-licensed.

Comments
  • faiss::gpu::runMatrixMult failure

    The full log:

    Faiss assertion err == CUBLAS_STATUS_SUCCESS failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with T = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at utils/MatrixMult.cu:141
    Aborted (core dumped)

    I have successfully run demo_ivfpq_indexing_gpu, so I think Faiss was installed successfully.

    bug cant-repro 
    opened by hellolovetiger 36
  • No module named '_swigfaiss' for conda install

    Summary

    Platform

    OS: macOS 10.13.4

    Faiss version:

    Faiss compilation options:

    Running on :

    • [ ] CPU

    Reproduction instructions

    I installed with

    conda install faiss-cpu -c pytorch
    

    and got a No module named '_swigfaiss' error. I went into the faiss directory and tried to import again, but got the same error message. The troubleshooting section says this error is caused by Faiss not being compiled. Since I installed with conda, I suppose that is not the case?

    bug install 
    opened by hsiaoma 29
  • make py: fatal error: Python.h: No such file or directory

    I am also facing the same issue. I did the following steps:

    1. Cloned FAISS
    2. Updated makefile.inc with the Anaconda Python path and installed the necessary dependencies (libopenblas-dev, python-numpy, python-dev)
    3. Ran make (after this step I cannot find any _swigfaiss.so file anywhere)
    4. Ran make py, which gave the following error:

    $ make py
    g++ -I. -fPIC -m64 -Wall -g -O3 -msse4 -mpopcnt -fopenmp -Wno-sign-compare -std=c++11 -fopenmp -g -fPIC -fopenmp -I~/anaconda2/envs/faissenv/include/python2.7/ -I~/anaconda2/envs/faissenv/lib/python2.7/site-packages/numpy/core/include -shared -o python/_swigfaiss.so python/swigfaiss_wrap.cxx libfaiss.a /usr/lib/libopenblas.so.0
    python/swigfaiss_wrap.cxx:154:21: fatal error: Python.h: No such file or directory
    compilation terminated.
    Makefile:84: recipe for target 'python/_swigfaiss.so' failed
    make: *** [python/_swigfaiss.so] Error 1

    I am able to run the C++ implementation; only the Python wrapper is failing. Let me know what I am setting wrong. Since _swigfaiss.so is not generated, what went wrong during make?

    Originally posted by @Mahanteshambi in https://github.com/facebookresearch/faiss/issues/336#issuecomment-365565492

    question cant-repro install 
    opened by daisy-belle 24
  • Faiss import error when run in virtualenv by using own built Faiss-python

    Summary

    I built faiss-core and faiss-python myself. I installed the Python package into my local virtual env and tried to import faiss, but got an error. I checked the egg file, and it does have _swigfaiss.so inside. I also checked the conda swigfaiss.py, and it still uses the old swig_import_helper. I am not sure whether the error is caused by that helper having been removed when python/swigfaiss.py is generated with swig, as in this commit:

    https://github.com/facebookresearch/faiss/commit/7f5b22b0fff0882ce4afd93ce54cc2833a224909#diff-8cf6167d58ce775a08acafcfe6f40966

    $ ls faiss-1.5.2-py3.6/faiss
    __init__.py	__pycache__	_swigfaiss.so	swigfaiss.py
    

    Platform

    OS: centos 7

    Faiss version: 1.5.2

    Faiss compilation options:

     ./configure  --prefix=/usr --without-cuda --with-blas=/usr/lib64/libblas.so.3 --with-lapack=/usr/lib64/liblapack.so.3
    make
    sudo make install
    make py
    cd ~ && rm -rf env && python3 -m venv env
    source env/bin/activate
    cd ~/faiss && sudo make -C python install
    

    Running on:

    • [X] CPU
    • [ ] GPU

    Interface:

    • [ ] C++
    • [X] Python

    Reproduction instructions

    $ python
    Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)  [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import faiss
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/midas/env/lib/python3.6/site-packages/faiss-1.5.2-py3.6.egg/faiss/__init__.py", line 18, in <module>
      File "/home/midas/env/lib/python3.6/site-packages/faiss-1.5.2-py3.6.egg/faiss/swigfaiss.py", line 13, in <module>
    ImportError: cannot import name '_swigfaiss'
    
    install 
    opened by billyean 23
  • PyTorch tensor / Faiss index interoperability

    Summary: This diff allows for native usage of PyTorch tensors for Faiss indexes on both CPU and GPU. It is currently only implemented in this diff for things that inherit from faiss.Index, which covers the non-binary indices, and it patches the same functions on faiss.Index that were also covered by __init__.py for numpy interoperability.

    There must be uniformity among the inputs: if any array input is a Torch tensor, then all array inputs must be Torch tensors. Similarly, if any array input is a numpy ndarray, then all array inputs must be numpy ndarrays.

    If faiss.contrib.torch_utils is imported, it ensures that import faiss has already been performed to patch all of the functions using the base __init__.py numpy wrappers, and then patches the following functions again:

    add
    add_with_ids
    assign
    train
    search
    remove_ids
    reconstruct
    reconstruct_n
    range_search
    update_vectors
    search_and_reconstruct
    sa_encode
    sa_decode
    

    to allow usage of PyTorch CPU tensors, and additionally PyTorch GPU tensors if the index being used is on the GPU.

    numpy functionality is still available when faiss.contrib.torch_utils is imported; we pass through to the original patched numpy function when we detect numpy inputs.

    In addition, to allow for better (asynchronous) GPU usage without requiring the CPU to be involved, all of these functions which construct tensors/arrays for output now take optional arguments for storage (numpy or torch.Tensor) to be provided that will contain the output data. range_search is the only exception to this, as the size of the output data is indeterminate. The eventual GPU implementation will likely require the user to provide a maximum cap on the output size, and allow that to be passed instead. If the optional pre-allocated output values are presented by the user, they are used; otherwise, new return ndarray / Tensors are constructed as before and used for the return. If this feature were not provided on the GPU, then every execution would be completely serial as we would depend upon the CPU to allocate GPU memory before every operation. Instead, now this can function much like NN graph execution on the GPU, assuming that all of the data requirements are pre-allocated, so the execution will run at the full speed of the GPU and not be stalled sequentially launching kernels.

    This diff also exposes the GpuResources shared_ptr object owned by a GPU index. This is required for pytorch GPU so that we can perform proper stream ordering in Faiss with respect to the current pytorch stream. So, Faiss indices now perform more or less as any NN operation in Torch does.

    Note, however, that a Faiss index has its own setting on current device, and if the pytorch GPU tensor inputs are resident on a different device than what the Faiss index expects, a cross-device copy will be initiated. I may choose to make this an error in the future and require matching device to device.

    This diff also found a bug when passing GPU data directly to train() for GpuIndexIVFFlat and GpuIndexIVFScalarQuantizer, as I guess we never tested passing GPU data directly to these functions before. GpuIndexIVFPQ was doing the right thing however.

    The assign function is now also implemented on the GPU as well, and is now marked const to be in line with the search function.

    Also added better checking of non-contiguous inputs for both Torch tensors and numpy ndarrays.

    Updated the knn_gpu function with a base implementation always present that allows for usage of numpy arrays, which is overridden when torch_utils is imported to allow torch usage. This supports row/column major layout, float32/float16 data and int64/int32 indices for both numpy and torch.

    Reviewed By: mdouze

    Differential Revision: D24299400

    CLA Signed fb-exported 
    opened by wickedfoo 21
  • GPU issue when installing from conda

    Summary

    I installed Faiss from conda (GPU version).

    I got ImportError: No module named 'swigfaiss'. Could you help me out? Did I forget anything?

    Platform

    OS: Ubuntu

    Faiss version:

    Faiss compilation options:

    Running on :

    • [ ] CPU
    • [x] GPU

    Reproduction instructions

    image

    GPU install 
    opened by hminle 20
  • Speedup exhaustive_L2sqr_blas for AVX2, ARM NEON and AVX512

    Summary: Add a fused kernel for exhaustive_L2sqr_blas() call that combines a computation of dot product and the search for the nearest centroid. As a result, no temporary dot product values are written and read in RAM.

    Significantly speeds up the training of PQx[1] indices for low-dimensional PQ vectors ( 1, 2, 4, 8 ), and the effect is higher for higher values of [1]. AVX512 provides additional overloads for dimensionality of 12 and 16.

    The speedup is also beneficial for higher values of pq.cp.max_points_per_centroid (which is 256 by default).

    Speeds up IVFPQ training as well.

    AVX512 kernel is not enabled, but I've seen it speeding up the training TWICE versus AVX2 version. So, please feel free to use it by enabling AVX512 manually.

    Differential Revision: D41166766

    CLA Signed fb-exported 
    opened by alexanderguzhva 18
  • Does Faiss support searching from Disk?

    I checked this issue [#552] and also this demo file. But the demo file is not about searching from disk; it shows how to save a trained index and load it into memory for searching. Does Faiss really support searching from disk? If it does, could you let me know what I can refer to in order to do it?

    question 
    opened by sam3oh5 18
  • _swigfaiss_avx2.so may not be loaded properly in conda

    Summary

    When I install faiss via conda, IndexPQFastScan is slower than IndexPQ. It seems that AVX2 is not activated properly because _swigfaiss_avx2.so is not loaded correctly.

    Platform

    OS: Ubuntu 20.04 on AWS EC2. (ami-0e039c7d64008bd84, c5.large)

    Faiss version: faiss-cpu 1.7.0 (pytorch/linux-64::faiss-cpu-1.7.0-py3.8_h2a577fa_0_cpu)

    Installed from: conda install -c pytorch faiss-cpu

    Faiss compilation options:

    Running on:

    • [x] CPU
    • [ ] GPU

    Interface:

    • [ ] C++
    • [x] Python

    Reproduction instructions

    I found that IndexPQFastScan is slower than IndexPQ for faiss 1.7.0 installed from conda. Here is the benchmark code.

    import faiss
    import numpy as np
    import time
    
    np.random.seed(123)
    D = 128
    N = 1000
    X = np.random.random((N, D)).astype(np.float32)
    M = 64
    nbits = 4
    
    pq = faiss.IndexPQ(D, M, nbits)
    pq.train(X)
    pq.add(X)
    
    pq_fast = faiss.IndexPQFastScan(D, M, nbits)
    pq_fast.train(X)
    pq_fast.add(X)
    
    t0 = time.time()
    d1, ids1 = pq.search(x=X[:3], k=5)
    t1 = time.time()
    print(f"pq: {(t1 - t0) * 1000} msec")
    
    t0 = time.time()
    d2, ids2 = pq_fast.search(x=X[:3], k=5)
    t1 = time.time()
    print(f"pq_fast: {(t1 - t0) * 1000} msec")
    
    assert np.allclose(ids1, ids2)
    

    The result is:

    pq: 0.4680156707763672 msec
    pq_fast: 1.6791820526123047 msec
    

    After investigating, the cause seems to be that _swigfaiss_avx2.so is not loaded correctly. If I rename _swigfaiss_avx2.so to _swigfaiss.so, the above code works as expected:

    cd ~/anaconda/lib/python3.8/site-packages/faiss/
    mv _swigfaiss.so _swigfaiss.so.bk
    mv _swigfaiss_avx2.so _swigfaiss.so
    

    Then the benchmark results in:

    pq: 0.8258819580078125 msec
    pq_fast: 0.07104873657226562 msec
    

    Here, IndexPQFastScan becomes much faster.

    The root cause seems to be that swigfaiss.py is somehow exactly the same as swigfaiss_avx2.py.

    diff swigfaiss.py swigfaiss_avx2.py     # same
    

    If I understand correctly, swigfaiss_avx2.py must load _swigfaiss_avx2.so. But currently swigfaiss_avx2.py is the same as swigfaiss.py and loads _swigfaiss.so.

    install 
    opened by matsui528 16
  • Indexing 1B vectors by creating smaller indexes on batches and merging them

    Need guidance...

    We'll have an application where we will stream a set of vectors (on the order of a billion). We cannot wait until we collect all the vectors to train an index (you recommend IMI at this scale). We are thinking of building indexes for smaller batches of vectors... once we have a batch ready, we could train the index from a sample, create an index for the batch and in the end merge all the indexes. I understand only IVF supports merging of indexes, wanted your thoughts on this approach.

    Thanks

    question GPU 
    opened by mvss80 16
  • CUDA 9 issue: results of GPU Index are not right?

    1. The result of the GPU index is not the same as the CPU result, even though both use the same dataset and the same index type

    import numpy as np
    d = 64                           # dimension
    nb = 100000                      # database size
    nq = 10000                       # nb of queries
    np.random.seed(1234)             # make reproducible
    xb = np.random.random((nb, d)).astype('float32')
    xb[:, 0] += np.arange(nb) / 1000.
    xq = np.random.random((nq, d)).astype('float32')
    xq[:, 0] += np.arange(nq) / 1000.
    #=================================================================
    import faiss                   # make faiss available
    index = faiss.IndexFlatL2(d)   # build the index
    index.add(xb)                  # add vectors to the index
    k = 4                          # we want to see 4 nearest neighbors
    D, I = index.search(xq, k)     # actual search
    print(I[-5:])                  # neighbors of the 5 last queries
    print(D[-5:])
    
    del index, D, I
    #=================================================================
    print("=================")
    index = faiss.IndexFlatL2(d)   # build the index
    res = faiss.StandardGpuResources()
    index = faiss.index_cpu_to_gpu(res, 0, index)
    index.add(xb)                  # add vectors to the index
    k = 4                          # we want to see 4 nearest neighbors
    D, I = index.search(xq, k)     # actual search
    print(I[-5:])                  # neighbors of the 5 last queries
    print(D[-5:])
    
    del index, D, I
    
    exit(1)
    

    The result is

    [[ 9900 10500  9309  9831]
     [11055 10895 10812 11321]
     [11353 11103 10164  9787]
     [10571 10664 10632  9638]
     [ 9628  9554 10036  9582]]
    [[ 6.53157043  6.97875977  7.00392151  7.01379395]
     [ 4.33526611  5.23693848  5.31942749  5.70327759]
     [ 6.07269287  6.57675171  6.61395264  6.7322998 ]
     [ 6.63751221  6.64874268  6.85787964  7.00964355]
     [ 6.21836853  6.45251465  6.54876709  6.58129883]]
    =================
    number of GPUs: 1
    [[10500 10500  9831  9831]
     [10895 10895 10812 11321]
     [11103 11103  9787  9787]
     [10632 10632  9638  9638]
     [ 9628  9554  9582  9582]]
    [[ 6.53156281  6.97874451  7.00393677  7.01376343]
     [ 4.33531189  5.23696899  5.31942749  5.70326233]
     [ 6.07269287  6.57672119  6.61393738  6.73226929]
     [ 6.63748169  6.64871216  6.85783386  7.00959778]
     [ 6.21837616  6.45251465  6.54875183  6.58128357]]
    

    The results of the GPU index and the CPU index are not the same.

    2. Duplicate items in the GPU result

    As the results above show, the GPU result contains duplicate ids with different distances, like [10500 10500 9831 9831].

    Could someone tell me what the problem is and how to fix it? THX!

    bug GPU 
    opened by DrLai12club 16
  • GpuIndexFlatL2 doesn't produce distances for the last 8 queries

    Platform

    OS: Windows 10

    Faiss version: 1.7.3

    Installed from: Compiled using Visual Studio 17 2022

    Faiss compilation options: Using MKL 2202.2.1

    Cuda version: 12.0.0

    GPU: GTX 1060

    Running on:

    • [X] CPU
    • [X] GPU

    Interface:

    • [X] C++
    • [ ] Python

    Reproduction instructions

    Using the test file linked below, faiss makes a CPU index and a GPU index. Then performs a query search on the first 1000 vectors from a 100000 vector database. Code copied directly from 1-Flat for the CPU portion, and 4-GPU for the GPU portion.

    Consistently, the last 8 vectors from the distance matrix are all 0's. Whether querying 1000 elements, or 10000 elements, it's only the last 8 elements.

    6-GPU-CPU.zip

    Output of the program is as follows:

    Building data
    Make index
    is_trained = true
    ntotal = 100000
    I (5 first results)=
        0   723   254   152   403    92   368  1129   673   571
        1   995   136   183   223   555   880   671     5    68
        2   312   253    29   124   148   112   718   713   260
        3   983   467    88   786   327   326   684   367  1053
        4   403   112   643   430   679   142   733   119   382
    I (10 last results)=
      990   962  2284   863  1133  1683  1463  2339  1730  2228
      991  1026   995   540  1396   365  1348  1271  1861   975
      992   257   163   135  1489  1315   878  1017   219   777
      993  1331   210  1362   286   444  1329   608  1191   986
      994   155   134   631   469  1044   388  1042   766  1561
      995   511     1   664   991  1800   689    37   634   631
      996   770  1043   827  1264  1310  1828  1504  1535   876
      997  1288   920   742  1432   840  1174  1337  1041  1113
      998   689  1044   810  1229  2199  1448  2112  1888  1442
      999  1722   901  1161  1044  1251   505  1310   791   308
    D (10 last results)=
          0 6.46885 6.56971 6.80382 7.19488 7.25274 7.44602 7.56737 7.75592  7.8215
          0 5.75124 5.96521 6.00626 6.17735  6.6787 6.74106 6.87712 6.89094 6.89425
          0 5.82659 6.08222 6.16805 6.19852 6.25793 6.56962 6.60474 6.71429 6.72893
          0 6.79663 6.83468  6.9018 6.90929 7.06563 7.07221 7.15147 7.18442 7.20781
          0 6.02754 6.53414 6.62136 6.73151 6.83076 6.85785 6.86768 6.87643 6.89012
          0 5.52238 5.78548 5.80803 5.96521 5.97704 6.12522  6.1321 6.18419 6.51028
          0 5.73736 6.25742 6.38132 6.43517 6.63315 6.70425 6.81538 6.84794  6.8531
          0 6.59953 6.84864 7.11777 7.33908 7.38752 7.39641 7.48399 7.52819 7.60603
          0 5.54166 5.68894 5.72082 5.98355 6.49582 6.52649  6.5502 6.66038 6.66049
          0 6.26311 6.37093 6.39842 6.62256 6.73258 6.82148 6.83769 6.84539 6.91491
    is_trained = true
    ntotal = 100000
    I (5 first results)=
        0   723   254   152   403    92   368  1129   673   571
        1   995   136   183   223   555   880   671     5    68
        2   312   253    29   124   148   112   718   713   260
        3   983   467    88   786   327   326   684   367  1053
        4   403   112   643   430   679   142   733   119   382
    I (10 last results)=
      990   962  2284   863  1133  1683  1463  2339  1730  2228
      991  1026   995   540  1396   365  1348  1271  1861   975
      992   257   163   135  1489  1315   878  1017   219   777
      993  1331   210  1362   286   444  1329   608  1191   986
      994   155   134   631   469  1044   388  1042   766  1561
      995   511     1   664   991  1800   689    37   634   631
      996   770  1043   827  1264  1310  1828  1504  1535   876
      997  1288   920   742  1432   840  1174  1337  1041  1113
      998   689  1044   810  1229  2199  1448  2112  1888  1442
      999  1722   901  1161  1044  1251   505  1310   791   308
    D (10 last results)=
    7.62939e-06 6.46885 6.56971 6.80381 7.19488 7.25273 7.44602 7.56738 7.75592  7.8215
          0 5.75124 5.96521 6.00626 6.17735 6.67871 6.74106 6.87711 6.89094 6.89426
          0       0       0       0       0       0       0       0       0       0
          0       0       0       0       0       0       0       0       0       0
          0       0       0       0       0       0       0       0       0       0
          0       0       0       0       0       0       0       0       0       0
          0       0       0       0       0       0       0       0       0       0
          0       0       0       0       0       0       0       0       0       0
          0       0       0       0       0       0       0       0       0       0
          0       0       0       0       0       0       0       0       0       0
    
    cant-repro GPU 
    opened by JulianThijssen 1
  • error in cpu_to_gpu_multiple for 30 million data

    Summary

    I am using the code from bench_gpu_1bn.py, slightly modified to use cosine similarity as the distance. In the method 'train_coarse_quantizer' I changed the index to FlatIP as follows:

    index = faiss.index_cpu_to_gpu_multiple( vres, vdev, faiss.IndexFlatIP(d))

    Also, in the method 'prepare_trained_index' I made the following change:

    idx_model = faiss.IndexIVFFlat(coarse_quantizer, d, ncent, faiss.METRIC_INNER_PRODUCT)

    and I am using the provided 'altadd' flag which is using 'compute_populated_index_2' method underneath. Everything was working fine for up to 20 million data and we are getting the expected results. But the following happens when we try to input 30 million data. It fails while executing the following:

    index = faiss.index_cpu_to_gpu_multiple( vres, vdev, indexall, co)

    (this is from the method 'get_populated_index')

    and the error is shown as :

    IndexShards shard 0 select modulo 8 = 0
    Faiss assertion 'accu_n == ntotal' failed in virtual void faiss::IndexIVF::copy_subset_to(faiss::IndexIVF&, int, faiss::Index::idx_t, faiss::Index::idx_t) const at /root/miniconda3/conda-bld/faiss-pkg_1641228905850/work/faiss/IndexIVF.cpp:1083

    I have checked the input data with numpy.isnan() and numpy.isinf() methods and no 'nan' or 'inf' found in the input data.

    Platform

    OS: Linux, Nvidia DGX Servers A100,40GB.

    Faiss version: installed via conda (latest pytorch gpu version)

    Running on:

    • [ ] GPU

    Interface:

    • [ ] Python
    cant-repro 
    opened by Das-Aritra 2
  • bug: Intel MKL ERROR: Parameter 3 was incorrect on entry to SGEMM .  crash on win10

    Summary

    When I run the official demo demo_weighted_kmean, the following error occurs in the call faiss::knn_L2sqr(xq, xb, d, nq, nb, k, unused.data(), gt.data()). How can I fix it? Thank you very much!

    [0.000 s] samping dataset of 128 dim vectors, Q 100 B 10000 T 10000
    
    Intel MKL ERROR: Parameter 3 was incorrect on entry to SGEMM .
    [0.021 s] compute ground truth, k=10
    
    Intel MKL ERROR: Parameter 3 was incorrect on entry to SGEMM .
    
    Intel MKL ERROR: Parameter 3 was incorrect on entry to SGEMM .
    
    Intel MKL ERROR: Parameter 3 was incorrect on entry to SGEMM .
    
    Intel MKL ERROR: Parameter 3 was incorrect on entry to SGEMM .
    

    MKL 2023.0.0 faiss 1.7.3

    OS: <Windows 10>

    Running on:

    • [x] CPU
    • [ ] GPU

    Interface:

    • [x] C++
    • [ ] Python
    install 
    opened by scd8340 4
  • Cannot specify link libraries for target

    Cannot specify link libraries for target "CUDA::nvptxcompiler_static" when build from source.

    Summary

    When I build Faiss from source, I get the error below. How can I fix it?

    Platform

    OS: Ubuntu 20.04, running in Docker: nvcr.io/nvidia/tensorrt:21.08-py3

    CUDA on the host OS: 12.0; CUDA inside Docker: 11.4

    Faiss version: 4ee67aefc21859e02388f32f4876a0cbd21cabd

    Installed from: compiled from source

    Faiss compilation options: OpenBLAS, configured with: cmake -B build .

    Running on:

    • [ ] CPU
    • [x] GPU

    Interface:

    • [ ] C++
    • [x] Python

    Reproduction instructions

    Inside the Docker container, install OpenBLAS:

    apt-get install libopenblas-dev
    

    clone and run cmd in cloned folder:

    cmake -B build .
    

    Logs:

    -- Could NOT find MKL (missing: MKL_LIBRARIES) 
    -- Looking for sgemm_
    -- Looking for sgemm_ - found
    -- Found BLAS: /usr/lib/x86_64-linux-gnu/libopenblas.so  
    -- Looking for cheev_
    -- Looking for cheev_ - found
    -- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so;-lpthread;-lm;-ldl  
    -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.4.100") 
    CMake Error at /usr/local/lib/python3.8/dist-packages/cmake/data/share/cmake-3.25/Modules/FindCUDAToolkit.cmake:1063 (target_link_libraries):
      Cannot specify link libraries for target "CUDA::nvptxcompiler_static" which
      is not built by this project.
    Call Stack (most recent call first):
      faiss/gpu/CMakeLists.txt:180 (find_package)
    
    
    -- Configuring incomplete, errors occurred!
    See also "/opt/faiss/build/CMakeFiles/CMakeOutput.log".
    See also "/opt/faiss/build/CMakeFiles/CMakeError.log".
    

    In /opt/faiss/build/CMakeFiles/CMakeError.log:

    Performing C++ SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
    Change Dir: /opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-MxbOpg
    
    Run Build Command(s):/usr/bin/make -f Makefile cmTC_d9ea7/fast && /usr/bin/make  -f CMakeFiles/cmTC_d9ea7.dir/build.make CMakeFiles/cmTC_d9ea7.dir/build
    make[1]: Entering directory '/opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-MxbOpg'
    Building CXX object CMakeFiles/cmTC_d9ea7.dir/src.cxx.o
    /usr/bin/c++ -DCMAKE_HAVE_LIBC_PTHREAD  -std=gnu++11 -o CMakeFiles/cmTC_d9ea7.dir/src.cxx.o -c /opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-MxbOpg/src.cxx
    Linking CXX executable cmTC_d9ea7
    /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_d9ea7.dir/link.txt --verbose=1
    /usr/bin/c++ CMakeFiles/cmTC_d9ea7.dir/src.cxx.o -o cmTC_d9ea7
    /usr/bin/ld: CMakeFiles/cmTC_d9ea7.dir/src.cxx.o: in function `main':
    src.cxx:(.text+0x46): undefined reference to `pthread_create'
    /usr/bin/ld: src.cxx:(.text+0x52): undefined reference to `pthread_detach'
    /usr/bin/ld: src.cxx:(.text+0x5e): undefined reference to `pthread_cancel'
    /usr/bin/ld: src.cxx:(.text+0x6f): undefined reference to `pthread_join'
    collect2: error: ld returned 1 exit status
    make[1]: *** [CMakeFiles/cmTC_d9ea7.dir/build.make:99: cmTC_d9ea7] Error 1
    make[1]: Leaving directory '/opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-MxbOpg'
    make: *** [Makefile:127: cmTC_d9ea7/fast] Error 2
    
    
    Source file was:
    #include <pthread.h>
    
    static void* test_func(void* data)
    {
      return data;
    }
    
    int main(void)
    {
      pthread_t thread;
      pthread_create(&thread, NULL, test_func, NULL);
      pthread_detach(thread);
      pthread_cancel(thread);
      pthread_join(thread, NULL);
      pthread_atfork(NULL, NULL, NULL);
      pthread_exit(NULL);
    
      return 0;
    }
    
    Determining if the function pthread_create exists in the pthreads failed with the following output:
    Change Dir: /opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-RCuhqj
    
    Run Build Command(s):/usr/bin/make -f Makefile cmTC_c4543/fast && /usr/bin/make  -f CMakeFiles/cmTC_c4543.dir/build.make CMakeFiles/cmTC_c4543.dir/build
    make[1]: Entering directory '/opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-RCuhqj'
    Building CXX object CMakeFiles/cmTC_c4543.dir/CheckFunctionExists.cxx.o
    /usr/bin/c++   -DCHECK_FUNCTION_EXISTS=pthread_create -std=gnu++11 -o CMakeFiles/cmTC_c4543.dir/CheckFunctionExists.cxx.o -c /opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-RCuhqj/CheckFunctionExists.cxx
    Linking CXX executable cmTC_c4543
    /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_c4543.dir/link.txt --verbose=1
    /usr/bin/c++  -DCHECK_FUNCTION_EXISTS=pthread_create CMakeFiles/cmTC_c4543.dir/CheckFunctionExists.cxx.o -o cmTC_c4543  -lpthreads
    /usr/bin/ld: cannot find -lpthreads
    collect2: error: ld returned 1 exit status
    make[1]: *** [CMakeFiles/cmTC_c4543.dir/build.make:99: cmTC_c4543] Error 1
    make[1]: Leaving directory '/opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-RCuhqj'
    make: *** [Makefile:127: cmTC_c4543/fast] Error 2
    
    
    
    Determining if the function sgemm_ exists failed with the following output:
    Change Dir: /opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-7EpCNh
    
    Run Build Command(s):/usr/bin/make -f Makefile cmTC_33a0f/fast && /usr/bin/make  -f CMakeFiles/cmTC_33a0f.dir/build.make CMakeFiles/cmTC_33a0f.dir/build
    make[1]: Entering directory '/opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-7EpCNh'
    Building CXX object CMakeFiles/cmTC_33a0f.dir/CheckFunctionExists.cxx.o
    /usr/bin/c++   -DCHECK_FUNCTION_EXISTS=sgemm_ -std=gnu++11 -o CMakeFiles/cmTC_33a0f.dir/CheckFunctionExists.cxx.o -c /opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-7EpCNh/CheckFunctionExists.cxx
    Linking CXX executable cmTC_33a0f
    /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_33a0f.dir/link.txt --verbose=1
    /usr/bin/c++  -DCHECK_FUNCTION_EXISTS=sgemm_ CMakeFiles/cmTC_33a0f.dir/CheckFunctionExists.cxx.o -o cmTC_33a0f
    /usr/bin/ld: CMakeFiles/cmTC_33a0f.dir/CheckFunctionExists.cxx.o: in function `main':
    CheckFunctionExists.cxx:(.text+0x14): undefined reference to `sgemm_'
    collect2: error: ld returned 1 exit status
    make[1]: *** [CMakeFiles/cmTC_33a0f.dir/build.make:99: cmTC_33a0f] Error 1
    make[1]: Leaving directory '/opt/faiss/build/CMakeFiles/CMakeScratch/TryCompile-7EpCNh'
    make: *** [Makefile:127: cmTC_33a0f/fast] Error 2
                                                      
    
    install 
    opened by NguyenVanThanhHust 0
  • benchmark for hybrid CPU / GPU search

    Summary: A few experiments with the new GPU support for hybrid search and the precomputed centroids functionality.

    See results here:

    https://docs.google.com/document/d/1d3YuV8uN7hut6aOATCOMx8Ut-QEl_oRnJdPgDBRF1QA/edit#bookmark=id.8qjs2u1ddus4

    Differential Revision: D41301032

    CLA Signed fb-exported 
    opened by mdouze 1
  • segmentation fault for standalone OPQMatrix training with dimension 768

    Summary

    When running an OPQMatrix training for even a single iteration, I encounter a segmentation fault (or an invalid pointer error from free()) when the matrix dimension goes up from 512 to 768.

    Platform

    • Ubuntu 18.04.6 LTS (Bionic Beaver)

    • Python 3.7.12

    • 4 CPUs only in a jupyterhub environment.

    Faiss version:

    faiss-gpu==1.7.2

    Installed from:

    pip install faiss-gpu==1.7.2

    Faiss compilation options:

    N/A

    Running on:

    • [x] CPU
    • [ ] GPU

    Interface:

    • [ ] C++
    • [x] Python

    Reproduction instructions

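    import faiss
    import numpy as np
    # d: vector dimension, m: number of sub-quantizers, num_examples: 256 training vectors
    d, m, num_examples = 768, 32, 1<<8
    opq = faiss.OPQMatrix(d, m)
    opq.niter = 1  # a single training iteration is enough to reproduce the problem
    rng = np.random.default_rng()
    xb = rng.standard_normal((num_examples, d), dtype=np.float32)
    opq.train(xb)  # the segmentation fault is reported during training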
    
    install 
    opened by yunjiangster 1
Releases(v1.7.3)
  • v1.7.3(Nov 30, 2022)

    Added

    • Sparse k-means routines and moved the generic kmeans to contrib
    • FlatDistanceComputer for all FlatCodes indexes
    • Support for fast accumulation of 4-bit LSQ and RQ
    • Product additive quantization support
    • Support per-query search parameters for many indexes + filtering by ids
    • write_VectorTransform and read_VectorTransform were added to the public API (by @AbdelrahmanElmeniawy)
    • Support for IDMap2 in index_factory by adding "IDMap2" to prefix or suffix of the input String (by @AbdelrahmanElmeniawy)
    • Support for merging all IndexFlatCodes descendants (by @AbdelrahmanElmeniawy)
    • Remove and merge features for IndexFastScan (by @AbdelrahmanElmeniawy)
    • Performance improvements: 1) specialized the AVX2 pieces of code speeding up certain hotspots, 2) specialized kernels for vector codecs (this can be found in faiss/cppcontrib)

    Fixed

    • Fixed memory leak in OnDiskInvertedLists::do_mmap when the file is not closed (by @AbdelrahmanElmeniawy)
    • LSH correctly throws error for metric types other than METRIC_L2 (by @AbdelrahmanElmeniawy)
  • v1.7.2(Jan 10, 2022)

    ADDED

    • Support LSQ on GPU (by @KinglittleQ)
    • Support for exact 1D kmeans (by @KinglittleQ)
    • LUT-based search for additive quantizers
    • Autogenerated Python docstrings from Doxygen comments

    CHANGED

    • Cleanup of index_factory parsing
  • v1.6.4(Oct 22, 2020)

    Features

    • Arbitrary dimensions per sub-quantizer now allowed for GpuIndexIVFPQ.
    • Brute-force kNN on GPU (bfKnn) now accepts int32 indices.
    • Faiss CPU now supports Windows. Conda packages are available from the nightly channel.
  • v1.5.3(Jun 24, 2019)

    Bugfixes:

    • slow scanning of inverted lists (#836).

    Features:

    • add basic support for 6 new metrics in CPU IndexFlat and IndexHNSW (#848);
    • add support for IndexIDMap/IndexIDMap2 with binary indexes (#780).

    Misc:

    • throw python exception for OOM (#758);
    • make DistanceComputer available for all random access indexes;
    • gradually moving from long to int64_t for portability.
  • v1.5.2(May 30, 2019)

    The license was changed from BSD+Patents to MIT.

    Changelog:

    • propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas;
    • support for searching several inverted lists in parallel (parallel_mode != 0);
    • better support for PQ codes where nbit != 8 or 16;
    • IVFSpectralHash implementation: spectral hash codes inside an IVF;
    • 6-bit per component scalar quantizer (4 and 8 bit were already supported);
    • combinations of inverted lists: HStackInvertedLists and VStackInvertedLists;
    • configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch);
    • more test and demo code compatible with Python 3 (print with parentheses);
    • refactored benchmark code: data loading is now in a single file.
  • v1.5.1(May 30, 2019)

    Changelog:

    • a MatrixStats object, which reports useful statistics about a dataset;
    • an option to round coordinates during k-means optimization;
    • an alternative option for search in HNSW;
    • moved stats() and imbalance_factor() from IndexIVF to InvertedLists object;
    • range search is now available for IVFScalarQuantizer;
    • support for direct uint8 codec in ScalarQuantizer;
    • renamed IndexProxy to IndexReplicas;
    • better support for PQ code assignment with external index;
    • support for IMI2x16 (4B virtual centroids!);
    • support for k = 2048 search on GPU (instead of 1024);
    • most CUDA mem alloc failures now throw exceptions instead of terminating on an assertion;
    • support for renaming an ondisk invertedlists;
    • interrupt computations with interrupt signal (ctrl-C) in python;
    • simplified build system (with --with-cuda/--with-cuda-arch options);
    • updated example Dockerfile;
    • conda packages now depend on the cudatoolkit packages, which fixes some interferences with pytorch. Consequently, faiss-gpu should now be installed by conda install -c pytorch faiss-gpu cudatoolkit=10.0.
  • v1.5.0(May 30, 2019)

  • v1.4.0(Aug 31, 2018)

    Faiss 1.4.0

    Features:

    • automatic tracking of C++ references in Python
    • non-Intel platforms supported -- some functions optimized for ARM
    • override nprobe for concurrent searches
    • support for floating-point quantizers in binary indexes

    Bug fixes:

    • no more segfaults in python (I know it's the same as the first feature but it's important!)
    • fix GpuIndexIVFFlat issues for float32 with 64 / 128 dims
    • fix sharding of flat indexes on GPU with index_cpu_to_gpu_multiple

    The Python interface of Faiss closely mimics the C++ interface. This means that all C++ functions, objects, fields and methods are visible and accessible in Python. This is done thanks to SWIG, which automatically generates Python classes from the C++ headers. The downside is that this low-level access means that there is no automatic tracking of C++ references in Python. For example:

    index = IndexIVFFlat(IndexFlatL2(10), 10, 100) 
    

    would crash. Python does not know that the IndexFlatL2 is referenced by the IndexIVFFlat, so the garbage collector deallocates the IndexFlatL2 while the IndexIVFFlat still references it. In Faiss 1.4.0, we added code to all such constructors that adds a Python-level reference to the object and prevents deallocation. With this upgrade, there should be no more crashes in pure Python; if you still encounter one, please report it right away as an issue.
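
    As a rough illustration of the reference-keeping idea described above, here is a pure-Python sketch that does not use Faiss at all. Quantizer, RawIndex and TrackedIndex are hypothetical stand-ins, and the referenced_objects attribute name is an assumption made for this example. A weak reference plays the role of the C++ pointer that Python's garbage collector cannot see.

```python
import gc
import weakref


class Quantizer:
    """Hypothetical stand-in for a sub-index such as IndexFlatL2."""

    def __init__(self, d):
        self.d = d


class RawIndex:
    """Mimics a wrapper whose native side points at the quantizer.

    Only a weak reference is kept, so Python is free to deallocate the
    quantizer, just as it would for a pointer held purely on the C++ side.
    """

    def __init__(self, quantizer, d, nlist):
        self._quantizer_ref = weakref.ref(quantizer)
        self.d = d
        self.nlist = nlist

    def quantizer_alive(self):
        return self._quantizer_ref() is not None


class TrackedIndex(RawIndex):
    """Same wrapper, plus a Python-level reference that prevents deallocation."""

    def __init__(self, quantizer, d, nlist):
        super().__init__(quantizer, d, nlist)
        # Assumed attribute name: holding the object here keeps it alive
        # for as long as the index itself is alive.
        self.referenced_objects = [quantizer]


# Both indexes receive a temporary quantizer, as in the one-liner above.
raw = RawIndex(Quantizer(10), 10, 100)
tracked = TrackedIndex(Quantizer(10), 10, 100)
gc.collect()  # make collection explicit on non-refcounting interpreters

print(raw.quantizer_alive(), tracked.quantizer_alive())
```

    On CPython the untracked quantizer should already be gone once the constructor returns, while the tracked one survives. In real Faiss the first case corresponds to a dangling C++ pointer rather than a clean False.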

    Faiss was developed on 64-bit x86 platforms, Linux and Mac OS. There were quite a few locations in the code that shamelessly assumed that they were compiled with SSE support. Faiss 1.4.0 is portable to other hardware: it has pure C++ code for all operations, and SSE/AVX is only enabled if the appropriate macros are set. This was tested on an ARM platform, and a few operations were also optimized with ARM SIMD instructions (in utils_simd.cpp).

    To compile on a non-x86 platform, you will need to provide a BLAS library (OpenBLAS works for aarch64) and remove x86-specific flags from the makefile.inc (manually for now). Faiss is not portable to compilers other than g++/clang, though.

    Search-time parameters such as nprobe for IndexIVF are set in the index object. What if you want to perform concurrent searches from several threads with different search parameters? This was not possible so far. Now there is an IVFSearchParameters object that can override the parameters set at the object level. See tests/test_params_override.cpp.
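
    The override pattern itself can be sketched without Faiss. ToyIVFIndex and ToySearchParams below are hypothetical stand-ins, not the Faiss classes, and the "search" only pretends to visit inverted lists. The point is that each thread passes its own nprobe with the call, so the shared object-level value is never modified.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySearchParams:
    """Hypothetical per-call override, playing the role of IVFSearchParameters."""

    nprobe: int


class ToyIVFIndex:
    """Hypothetical stand-in for an IVF index (not the Faiss implementation)."""

    def __init__(self, nlist: int, nprobe: int = 1):
        self.nlist = nlist
        self.nprobe = nprobe  # object-level default shared by all threads

    def search(self, query: int, params: Optional[ToySearchParams] = None) -> List[int]:
        # Use the per-call override when given; never write to self.nprobe.
        nprobe = params.nprobe if params is not None else self.nprobe
        # Pretend to visit `nprobe` inverted lists, starting at the query's list.
        return [(query + i) % self.nlist for i in range(nprobe)]


index = ToyIVFIndex(nlist=16, nprobe=1)
results = {}


def worker(name: str, nprobe: int) -> None:
    # Each thread searches the same index with its own parameters.
    results[name] = index.search(3, ToySearchParams(nprobe=nprobe))


threads = [threading.Thread(target=worker, args=(f"t{n}", n)) for n in (2, 4, 8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name in sorted(results):
    print(name, results[name])
print("object-level nprobe is still", index.nprobe)
```

    Because the override travels with the call rather than living in the index, no locking around nprobe is needed in this sketch.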

    Faiss' support for binary indexes is recent, and not so many index types are supported. To work around this, we added IndexBinaryFromFloat, a binary index that wraps around any floating-point index. This makes it possible, for example, to use an IndexHNSW as a quantizer for an IndexBinaryIVF. See tests/test_index_binary_from_float.py

    We also fixed a few bugs that correspond to github issues.

  • v1.3.0(Jul 12, 2018)

    Features:

    • Support for binary indexes (IndexBinaryFlat, IndexBinaryIVF)
    • Support fp16 encoding in scalar quantizer
    • Support for deduplication in IndexIVFFlat
    • Support for index serialization

    Bugs:

    • Fix MMAP bug for normal indexes
    • Fix propagation of io_flags in read func
    • Fix k-selection for CUDA 9
    • Fix race condition in OnDiskInvertedLists
  • v1.2.1(Mar 1, 2018)

Owner
Facebook Research
SOINN / clustering / unsupervised clustering / fast

lfs 12 Aug 4, 2022
Open-source vector similarity search for Postgres

Andrew Kane 712 Jan 7, 2023
faiss serving :)

faiss-server faiss-server provides gRPC services for similarity search using faiss. It is written in C++ and now supports only CPU environments. In

null 111 Dec 9, 2022
Fast and robust template matching with majority neighbour similarity and annulus projection transformation

A-MNS_TemplateMatching This is the official code for the PatternRecognition2020 paper: Fast and robust template matching with majority neighbour simil

Layjuns 22 Dec 30, 2022
TheMathU Similarity Index App will accept a mathematical problem as user input and return a list of similar problems that have memorandums.

Technologies MathU Similarity Index - Segmentation Cult The MathU Similarity Index App accepts a mathematical problem as user input and returns a list

COS 301 - 2022 7 Nov 2, 2022
Super paramagnetic Clustering - Marcelo Blatt, Shai Wiseman, and Eytan Domany (1996)

SPC: Super Paramagnetic Clustering Documentation The file README.PDF includes: installation instructions, example runs, file formats and parameter def

null 2 Aug 30, 2022
Tandem - [CoRL 21'] TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo Lukas Koestler1*    Nan Yang1,2*,†    Niclas Zeller2,3    Daniel Cremers1

TUM Computer Vision Group 742 Dec 31, 2022
A 3D DNN-based Metric Semantic Dense Mapping pipeline and a Visual Inertial SLAM system

MSDM-SLAM This repository represents a 3D DNN-based Metric Semantic Dense Mapping pipeline and a Visual Inertial SLAM system that can be run on a grou

ITMO Biomechatronics and Energy Efficient Robotics Laboratory 11 Jul 23, 2022
Dense Depth Estimation from Multiple 360-degree Images Using Virtual Depth

Dense Depth Estimation from Multiple 360-degree Images Using Virtual Depth [Project] [Paper] [arXiv] This is the official code of our APIN 2022 paper

null 8 Nov 7, 2022
nanoflann: a C++11 header-only library for Nearest Neighbor (NN) search with KD-trees

nanoflann 1. About nanoflann is a C++11 header-only library for building KD-Trees of datasets with different topologies: R2, R3 (point clouds), SO(2)

Jose Luis Blanco-Claraco 1.7k Dec 25, 2022
This code accompanies the paper "Human-Level Performance in No-Press Diplomacy via Equilibrium Search".

Diplomacy SearchBot This code accompanies the paper "Human-Level Performance in No-Press Diplomacy via Equilibrium Search". A very brief orientation:

Facebook Research 34 Dec 20, 2022
Ncnn version demo of [CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search (ncnn) The official implementation by pytorch: ht

null 34 Dec 26, 2022
An efficient C++17 GPU numerical computing library with Python-like syntax

MatX - Matrix Primitives Library MatX is a modern C++ library for numerical computing on NVIDIA GPUs. Near-native performance can be achieved while us

NVIDIA Corporation 625 Jan 1, 2023
Deploy SCRFD, an efficient high accuracy face detection approach, in your web browser with ncnn and webassembly

ncnn-webassembly-scrfd open https://nihui.github.io/ncnn-webassembly-scrfd and enjoy build and deploy Install emscripten

null 42 Nov 16, 2022
Triton - a language and compiler for writing highly efficient custom Deep-Learning primitives.

OpenAI 4.6k Dec 26, 2022
A hierarchical parameter server framework based on MXNet. GeoMX also implements multiple communication-efficient strategies.

Introduction GeoMX is an MXNet-based two-layer parameter server framework, aiming at integrating data knowledge that is owned by multiple independent part

null 86 Oct 21, 2022
This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Fast Face Classification (F²C) This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicit

null 33 Jun 27, 2021
An Efficient Implementation of Analytic Mesh Algorithm for 3D Iso-surface Extraction from Neural Networks

AnalyticMesh Analytic Marching is an exact meshing solution from neural networks. Compared to standard methods, it completely avoids geometric and top

Jiabao Lei 45 Dec 21, 2022