Open-source vector similarity search for Postgres

pgvector

CREATE TABLE table (column vector(3));
CREATE INDEX ON table USING ivfflat (column);
SELECT * FROM table ORDER BY column <-> '[1,2,3]' LIMIT 5;

Supports L2 distance, inner product, and cosine distance

Installation

Compile and install the extension (supports Postgres 9.6+)

git clone --branch v0.1.0 https://github.com/ankane/pgvector.git
cd pgvector
make
make install # may need sudo

Then load it in databases where you want to use it

CREATE EXTENSION vector;

Getting Started

Create a vector column with 3 dimensions (replace table and column with non-reserved names)

CREATE TABLE table (column vector(3));

Insert values

INSERT INTO table VALUES ('[1,2,3]'), ('[4,5,6]');

Get the nearest neighbor by L2 distance

SELECT * FROM table ORDER BY column <-> '[3,1,2]' LIMIT 1;

Also supports inner product (<#>) and cosine distance (<=>)

Note: <#> returns the negative inner product since Postgres only supports ASC order index scans on operators
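
To make the three measures concrete, here is a minimal Python sketch of what each operator computes. It is an illustration only, not the extension's C implementation, and the function names are hypothetical stand-ins chosen for readability. It reuses the vectors from the examples above and shows why negating the inner product lets an ascending sort return the largest inner product first.

```python
import math

def l2_distance(a, b):
    """Euclidean distance, the quantity <-> orders by."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """What <#> returns: the inner product with its sign flipped."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine similarity, the quantity <=> orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

rows = [[1, 2, 3], [4, 5, 6]]
query = [3, 1, 2]

# ORDER BY ... ASC LIMIT 1 corresponds to taking the minimum.
nearest_l2 = min(rows, key=lambda v: l2_distance(v, query))
nearest_ip = min(rows, key=lambda v: negative_inner_product(v, query))

print(nearest_l2)  # smallest Euclidean distance to the query
print(nearest_ip)  # largest inner product with the query
```

With these rows the two measures disagree: [1,2,3] is closest by L2 distance, while [4,5,6] has the larger inner product (29 versus 11) and therefore the smaller negative inner product.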

Indexing

Speed up queries with an approximate index. Add an index for each distance function you want to use.

L2 distance

CREATE INDEX ON table USING ivfflat (column);

Inner product

CREATE INDEX ON table USING ivfflat (column vector_ip_ops);

Cosine distance

CREATE INDEX ON table USING ivfflat (column vector_cosine_ops);

Indexes should be created after the table has data for optimal clustering. Also, unlike typical indexes, which only affect performance, an approximate index can change the results a query returns.

Index Options

Specify the number of inverted lists (100 by default)

CREATE INDEX ON table USING ivfflat (column) WITH (lists = 100);

Query Options

Specify the number of probes (1 by default)

SET ivfflat.probes = 1;

A higher value improves recall at the cost of speed.

Use SET LOCAL inside a transaction to set it for a single query

BEGIN;
SET LOCAL ivfflat.probes = 1;
SELECT ...
COMMIT;
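
The trade-off between lists and probes can be pictured with a toy inverted-file index. This is a minimal Python sketch under stated assumptions, not pgvector's implementation: the centroids are hard-coded rather than learned from the data, and build_lists and search are hypothetical helper names.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_lists(vectors, centroids):
    """Assign each vector to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        lists[nearest].append(v)
    return lists

def search(query, centroids, lists, probes=1, limit=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in lists[i]]
    return sorted(candidates, key=lambda v: l2(v, query))[:limit]

# Assumption: two fixed centroids stand in for the result of clustering.
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(1.0, 0.0), (4.9, 0.0), (6.0, 0.0), (9.0, 0.0)]
lists = build_lists(vectors, centroids)
query = (5.2, 0.0)

approx = search(query, centroids, lists, probes=1)  # scans one list
exact = search(query, centroids, lists, probes=2)   # scans every list

print(approx)
print(exact)
```

Here the query falls just inside the second list, so a single probe returns (6.0, 0.0) and misses the true nearest neighbor (4.9, 0.0), which sits in the first list. Probing both lists finds it, at the cost of scanning every vector.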

Reference

Vector Type

Each vector takes 4 * dimensions + 8 bytes of storage. Each element is a float, and all elements must be finite (no NaN, Infinity or -Infinity). Vectors can have up to 1024 dimensions.
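
As a worked example of the storage formula and the constraints above, here is a short Python sketch. The helper names are hypothetical and the checks only mirror the rules stated in this section; they are not the extension's own validation code.

```python
import math

MAX_DIMENSIONS = 1024  # limit stated above

def vector_storage_bytes(dimensions):
    """4 bytes per element plus 8 bytes of overhead."""
    if not 1 <= dimensions <= MAX_DIMENSIONS:
        raise ValueError("dimensions must be between 1 and 1024")
    return 4 * dimensions + 8

def is_valid_vector(values):
    """All elements must be finite: no NaN, Infinity or -Infinity."""
    return all(math.isfinite(x) for x in values)

print(vector_storage_bytes(3))     # 4 * 3 + 8
print(vector_storage_bytes(1024))  # largest supported vector
print(is_valid_vector([1.0, float("nan"), 3.0]))
```

A vector(3) column therefore stores 20 bytes per value, and the largest supported vector stores 4104 bytes.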

Vector Operators

Operator  Description
+         element-wise addition
-         element-wise subtraction
<->       Euclidean distance
<#>       negative inner product
<=>       cosine distance

Vector Functions

Function                         Description
cosine_distance(vector, vector)  cosine distance
inner_product(vector, vector)    inner product
l2_distance(vector, vector)      Euclidean distance
vector_dims(vector)              number of dimensions
vector_norm(vector)              Euclidean norm
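
The element-wise operators and the norm function relate to each other in a simple way: the Euclidean distance between two vectors equals the Euclidean norm of their difference. The Python sketch below mirrors a few of the entries above to show this. The Python functions are illustrative stand-ins that borrow the SQL names (add and subtract are hypothetical names for + and -); they are not bindings to the extension.

```python
import math

def vector_dims(v):
    return len(v)

def vector_norm(v):
    return math.sqrt(sum(x * x for x in v))

def _check_dims(a, b):
    if vector_dims(a) != vector_dims(b):
        raise ValueError("different vector dimensions")

def add(a, b):
    _check_dims(a, b)
    return [x + y for x, y in zip(a, b)]

def subtract(a, b):
    _check_dims(a, b)
    return [x - y for x, y in zip(a, b)]

def l2_distance(a, b):
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1, 2, 3]
b = [4, 6, 3]

difference = subtract(a, b)
print(add(a, b))
print(difference)
print(vector_norm(difference), l2_distance(a, b))  # the two values agree
```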

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/pgvector.git
cd pgvector
make
make install

To run all tests:

make installcheck        # regression tests
make prove_installcheck  # TAP tests

To run single tests:

make installcheck REGRESS=vector                  # regression test
make prove_installcheck PROVE_TESTS=t/001_wal.pl  # TAP test

Directories

  • expected - expected output for regression tests
  • sql - regression tests
  • t - TAP tests

Comments
  • bug? - unexpected data beyond EOF in block

    While using pgvector on a table with frequent updates / inserts on Postgres 14 on macOS on Intel, I've been encountering this error frequently on UPDATES:

    psycopg2.errors.InternalError_: unexpected data beyond EOF in block 1638807 of relation base/24349058/41425278
    HINT:  This has been seen to occur with buggy kernels; consider updating your system.
    

    Looking through the PostgreSQL mailing list for this error, most posts pertain to Linux kernels from around the 2010s and don't seem applicable.

    I've run VACUUM FULL on the table a few times, and I've also completely dumped the table with pg_dump, dropped it, and recreated it. The table is ~340 GiB and there is also a 13 GiB IVFFlat index on one of the vector(768) columns.

    Wondering if there might be a bug in how large vectors are stored.

    My table, notably, contains columns of types:

    • vector(768)
    • vector(768)[]
    • character varying[]
    • character varying

    And each row is easily around 2 or 3 MiB.

    opened by mmisiewicz 10
  • Ideas

    Plan

    • [ ] Use pairing heap for index scan for performance - stages branch
    • [ ] Use mini-batch k-means for index creation for reduced memory - minibatch branch
    • [ ] Add support for product quantization (in-progress)

    Ideas

    • [ ] Use tuplesort_set_bound for performance - bound branch (not needed w/ pairing heap)
    • [ ] Add functions to view lists and/or pages like pageinspect (require superuser)

    On-hold

    • [ ] Add support for parallel index scans (planner gets cost estimate but doesn't use) - parallel-index-scan branch
    • [ ] Change return type of distance functions from float8 to float4 for performance (maybe, needs benchmarking)
    opened by ankane 10
  • add vector/scalar math operations; add sum,avg aggregate functions

    Thanks for the excellent library!

    I had need for scalar math functions and aggregate functions while using your code, so I've added them into the repo (along with appropriate test cases).

    The main thing I'm not sure about is how you want the code in sql/vector.sql formatted. I've added all my additions at the bottom since that was easiest for me, but I'm not sure if you want them spread throughout the file in different spots. Also, I had to manually add code to both sql/vector.sql and sql/vector--0.1.7--0.1.8.sql for it to work. This seems very error prone. Is there no way to rebuild sql/vector.sql automatically from all the version upgrade files?

    opened by mikeizbicki 9
  • Use blis for vector operations

    This is very tentative, needs benchmarks, documentation, etc. But it should speed up many operations. I was curious whether this is something you want to explore. Regards

    opened by maparent 8
  • pgvector vs FAISS

    update: Upgrading to v0.1.1 and building with PG_CFLAGS=-ffast-math make reduced the query time to 2.2s! Big speed jump, but still 1.7x slower than the FAISS / Python service.


    I imported 792010 rows of 512d image vectors (~5GB), i.e. not random data, and ran a test [1] to find the 4 closest vectors to an exact vector in the dataset.

    Searching with:

    • 1.279357709s - FAISS python web service (using json and IndexFlatL2) (with 791963 vectors [2]).
    • 11.381s - Searching (l2_distance) with pgvector extension (with 792010 rows).

    Hardware:

    MacBook Pro (15-inch, 2018)
    2.6 GHz 6-Core Intel Core i7
    16 GB 2400 MHz DDR4
    

    Importing took 11.381 seconds with the COPY command from a CSV file with each row being the vector.

    Any ideas why pgvector would be so much slower? The test environments for the two tools were significantly different, to FAISS's disadvantage, but FAISS was still much quicker.

    [1] Not a "scientific" test. I had other programs running on the machine when running this test. Mileage may vary.
    [2] The slight difference is because the FAISS vector import filters out duplicate vectors.

    opened by KevinColemanInc 7
  • Using this extension on Windows OS

    Hi, I am using this extension in a Windows-based project with PostgreSQL 14. I tried to compile the extension with both MSVC 2019 and GCC (mingw-w64). The DLL compiled with GCC does not work: PostgreSQL crashes as soon as the vector interface is touched. The DLL compiled with MSVC 2019 works as long as no index is created, but the PostgreSQL service process crashes immediately when creating an index on a vector column. Does anyone have experience using this extension on Windows?

    opened by cdtaichen 6
  • Error during indexing large data

    I'm experiencing an unknown error while creating indexes on a large table. Creating an index on a smaller chunk works well, but a larger chunk throws an error. Could you please suggest where to dig to solve this issue?

    osm=# CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops) WHERE id < 10;
    CREATE INDEX
    osm=# CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops) WHERE id < 1000;
    CREATE INDEX
    osm=# CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops) WHERE id < 10000000;
    CREATE INDEX
    osm=# CREATE INDEX ON emb_planet_osm_nodes USING ivfflat (embedding vector_l2_ops) WHERE id < 1000000000;
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !?>
    

    What information should I provide to debug the problem?

    opened by vkhizanov 6
  • Issue when using L2 distance

    I have an issue when using the L2 distance operator. No matter what I compare against, the query does not respect the LIMIT clause and always returns just one record, and always the same one.

    Here is an example query:

    SELECT * FROM public.connected c ORDER BY factors <-> '[0.05613021180033684,-0.046761274337768555,-0.04585694149136543]' LIMIT 5;

    I have no problems with cosine (<=>) and inner product (<#>), they work as expected.

    opened by atanastrpceski 6
  • Clarification about accuracy of partially indexed dataset

    Hi, thanks for this wonderful extension, great work!

    So, as far as I understand, users are supposed to create the index on a vector column after data has been inserted. I need to regularly insert, update, and delete vectors in that column, so I set up a periodic job that runs REINDEX CONCURRENTLY on a timer to keep search queries fast.

    The question is: I might run into a scenario where 100% of the data has been indexed, then 50% of that data is removed and 50% new data is added BEFORE the reindex happens. That would mean 50% indexed data and 50% non-indexed data. How does the algorithm work in this situation?

    opened by levchik 6
  • bug:  free page can not use again

    ivfflatbulkdelete() {
        ....
        /* Set to first free page */
        if (!BlockNumberIsValid(insertPage))
            insertPage = searchPage;
        ...
    }
    

    Here only the first free page is set; other deleted pages can never be used again, which causes the index to grow larger and larger.

    Is there any solution? Thanks.

    opened by yjhjstz 6
  • Server crash when inserting with ivfflat index with high number of clusters

    Hello, thank you for this amazing extension!

    However, we have encountered some server crashes when inserting rows into a table with an index whose lists parameter is greater than around 6500.

    Reproduction steps:

    CREATE TABLE embed (id integer NOT NULL, vec vector(384) NOT NULL);
    CREATE INDEX ON embed USING ivfflat (vec vector_cosine_ops) WITH (lists = 10000);
    

    Then, to trigger the crash, insert some rows (as few as 1k did it for us almost every time).

    Server logs

    TRAP: FailedAssertion("((PageHeader) (page))->pd_special >= SizeOfPageHeaderData", File: "/usr/include/postgresql/server/storage/bufpage.h", Line: 317, PID: 54277)
    postgres: root root [local] COPY(ExceptionalCondition+0xab)[0x5644680a2c2d]
    /usr/lib/postgresql/vector.so(+0x54b1)[0x7f6309e364b1]
    /usr/lib/postgresql/vector.so(+0x585c)[0x7f6309e3685c]
    /usr/lib/postgresql/vector.so(ivfflatinsert+0xd1)[0x7f6309e36abe]
    postgres: root root [local] COPY(index_insert+0x9b)[0x564467bf9a70]
    postgres: root root [local] COPY(ExecInsertIndexTuples+0x1dd)[0x564467da2c76]
    postgres: root root [local] COPY(+0x24e970)[0x564467d0b970]
    postgres: root root [local] COPY(+0x24ebec)[0x564467d0bbec]
    postgres: root root [local] COPY(CopyFrom+0xa64)[0x564467d0c889]
    postgres: root root [local] COPY(DoCopy+0x41f)[0x564467d0afa9]
    postgres: root root [local] COPY(standard_ProcessUtility+0x4a0)[0x564467f6c0b3]
    postgres: root root [local] COPY(ProcessUtility+0xdb)[0x564467f6c7bc]
    postgres: root root [local] COPY(+0x4acaf6)[0x564467f69af6]
    postgres: root root [local] COPY(+0x4acdd6)[0x564467f69dd6]
    postgres: root root [local] COPY(PortalRun+0x1c9)[0x564467f6a1a3]
    postgres: root root [local] COPY(+0x4a9084)[0x564467f66084]
    postgres: root root [local] COPY(PostgresMain+0x83f)[0x564467f682a4]
    postgres: root root [local] COPY(+0x408e46)[0x564467ec5e46]
    postgres: root root [local] COPY(+0x40b318)[0x564467ec8318]
    postgres: root root [local] COPY(+0x40b565)[0x564467ec8565]
    postgres: root root [local] COPY(PostmasterMain+0x1183)[0x564467ec9b9e]
    postgres: root root [local] COPY(main+0x214)[0x564467e0aa52]
    /usr/lib/libc.so.6(+0x23290)[0x7f63154e7290]
    /usr/lib/libc.so.6(__libc_start_main+0x8a)[0x7f63154e734a]
    postgres: root root [local] COPY(_start+0x25)[0x564467b7d045]
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  reaping dead processes
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  server process (PID 54277) was terminated by signal 6: Aborted
    2022-10-26 08:30:59.409 UTC [53734] DETAIL:  Failed process was running: COPY embed (id, vec) FROM STDIN WITH (FORMAT BINARY)
    2022-10-26 08:30:59.409 UTC [53734] LOG:  server process (PID 54277) was terminated by signal 6: Aborted
    2022-10-26 08:30:59.409 UTC [53734] DETAIL:  Failed process was running: COPY embed (id, vec) FROM STDIN WITH (FORMAT BINARY)
    2022-10-26 08:30:59.409 UTC [53734] LOG:  terminating any other active server processes
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  sending SIGQUIT to process 53741
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  sending SIGQUIT to process 53765
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  sending SIGQUIT to process 53737
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  sending SIGQUIT to process 53736
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  sending SIGQUIT to process 53738
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  sending SIGQUIT to process 53739
    2022-10-26 08:30:59.409 UTC [53734] DEBUG:  sending SIGQUIT to process 53740
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  writing stats file "pg_stat/global.stat"
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  writing stats file "pg_stat/db_16385.stat"
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  removing temporary stats file "pg_stat_tmp/db_16385.stat"
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  writing stats file "pg_stat/db_13780.stat"
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  removing temporary stats file "pg_stat_tmp/db_13780.stat"
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  writing stats file "pg_stat/db_0.stat"
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  removing temporary stats file "pg_stat_tmp/db_0.stat"
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
    2022-10-26 08:30:59.410 UTC [53740] DEBUG:  proc_exit(-1): 0 callbacks to make
    2022-10-26 08:30:59.414 UTC [53734] DEBUG:  reaping dead processes
    2022-10-26 08:30:59.417 UTC [53734] DEBUG:  reaping dead processes
    2022-10-26 08:30:59.417 UTC [53734] DEBUG:  server process (PID 53765) exited with exit code 2
    2022-10-26 08:30:59.417 UTC [53734] DETAIL:  Failed process was running: create index on embed using ivfflat (vec vector_cosine_ops) with (lists = 10000);
    2022-10-26 08:30:59.417 UTC [53734] DEBUG:  reaping dead processes
    2022-10-26 08:30:59.417 UTC [53734] DEBUG:  reaping dead processes
    2022-10-26 08:30:59.417 UTC [53734] DEBUG:  reaping dead processes
    2022-10-26 08:30:59.417 UTC [53734] LOG:  all server processes terminated; reinitializing
    

    GDB stack trace

    Program terminated with signal SIGABRT, Aborted.
    #0  0x00007f631554c64c in ?? () from /usr/lib/libc.so.6
    (gdb) bt
    #0  0x00007f631554c64c in ?? () from /usr/lib/libc.so.6
    #1  0x00007f63154fc958 in raise () from /usr/lib/libc.so.6
    #2  0x00007f63154e653d in abort () from /usr/lib/libc.so.6
    #3  0x00005644680a2c4f in ExceptionalCondition ([email protected]=0x7f6309e3b280 "((PageHeader) (page))->pd_special >= SizeOfPageHeaderData",
        [email protected]=0x7f6309e3b0cb "FailedAssertion", [email protected]=0x7f6309e3b218 "/usr/include/postgresql/server/storage/bufpage.h", [email protected]=317) at assert.c:69
    #4  0x00007f6309e364b1 in PageValidateSpecialPointer ([email protected]=0x564469f59a78 "") at /usr/include/postgresql/server/storage/bufpage.h:317
    #5  0x00007f6309e3685c in InsertTuple ([email protected]=0x7f6309e40fc8, [email protected]=0x564469e47ac0, [email protected]=0x7f6309efb728, [email protected]=0x7fff8198cfb0) at src/ivfinsert.c:90
    #6  0x00007f6309e36abe in ivfflatinsert (index=0x7f6309e40fc8, values=0x7fff8198d0f0, isnull=<optimized out>, heap_tid=0x564469e48c38, heap=0x7f6309efb728, checkUnique=<optimized out>, indexUnchanged=false,
        indexInfo=0x564469c661e8) at src/ivfinsert.c:167
    #7  0x0000564467bf9a70 in index_insert ([email protected]=0x7f6309e40fc8, [email protected]=0x7fff8198d0f0, [email protected]=0x7fff8198d0d0, [email protected]=0x564469e48c38,
        [email protected]=0x7f6309efb728, [email protected]=UNIQUE_CHECK_NO, indexUnchanged=false, indexInfo=0x564469c661e8) at indexam.c:193
    #8  0x0000564467da2c76 in ExecInsertIndexTuples ([email protected]=0x564469c65ed8, slot=0x564469e48c08, [email protected]=0x564469c675a0, [email protected]=false,
        [email protected]=false, [email protected]=0x0, arbiterIndexes=0x0) at execIndexing.c:411
    #9  0x0000564467d0b970 in CopyMultiInsertBufferFlush ([email protected]=0x7fff8198d370, buffer=0x564469c57910) at copyfrom.c:344
    #10 0x0000564467d0bbec in CopyMultiInsertInfoFlush ([email protected]=0x7fff8198d370, [email protected]=0x564469c65ed8) at copyfrom.c:426
    #11 0x0000564467d0c889 in CopyFrom ([email protected]=0x564469c65ca0) at copyfrom.c:1067
    #12 0x0000564467d0afa9 in DoCopy ([email protected]=0x564469c844c0, [email protected]=0x564469b706a0, stmt_location=0, stmt_len=0, [email protected]=0x7fff8198d490) at copy.c:299
    #13 0x0000564467f6c0b3 in standard_ProcessUtility (pstmt=0x564469b71298, queryString=0x564469b6fa10 "COPY embed (id, vec) FROM STDIN WITH (FORMAT BINARY)", readOnlyTree=<optimized out>,
        context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x564469b71388, qc=0x7fff8198d780) at utility.c:739
    #14 0x0000564467f6c7bc in ProcessUtility ([email protected]=0x564469b71298, queryString=<optimized out>, readOnlyTree=<optimized out>, [email protected]=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>,
        queryEnv=<optimized out>, dest=0x564469b71388, qc=0x7fff8198d780) at utility.c:527
    #15 0x0000564467f69af6 in PortalRunUtility ([email protected]=0x564469bd92f0, [email protected]=0x564469b71298, [email protected]=true, [email protected]=false,
        [email protected]=0x564469b71388, [email protected]=0x7fff8198d780) at pquery.c:1155
    #16 0x0000564467f69dd6 in PortalRunMulti ([email protected]=0x564469bd92f0, [email protected]=true, [email protected]=false, [email protected]=0x564469b71388,
        [email protected]=0x564469b71388, [email protected]=0x7fff8198d780) at pquery.c:1312
    #17 0x0000564467f6a1a3 in PortalRun ([email protected]=0x564469bd92f0, [email protected]=9223372036854775807, [email protected]=true, [email protected]=true, [email protected]=0x564469b71388,
        [email protected]=0x564469b71388, qc=0x7fff8198d780) at pquery.c:788
    #18 0x0000564467f66084 in exec_simple_query ([email protected]=0x564469b6fa10 "COPY embed (id, vec) FROM STDIN WITH (FORMAT BINARY)") at postgres.c:1213
    #19 0x0000564467f682a4 in PostgresMain ([email protected]=1, [email protected]=0x7fff8198d980, dbname=<optimized out>, username=<optimized out>) at postgres.c:4496
    #20 0x0000564467ec5e46 in BackendRun ([email protected]=0x564469b97410) at postmaster.c:4530
    #21 0x0000564467ec8318 in BackendStartup ([email protected]try=0x564469b97410) at postmaster.c:4252
    #22 0x0000564467ec8565 in ServerLoop () at postmaster.c:1745
    #23 0x0000564467ec9b9e in PostmasterMain ([email protected]=5, [email protected]=0x564469b691b0) at postmaster.c:1417
    #24 0x0000564467e0aa52 in main (argc=5, argv=0x564469b691b0) at main.c:209
    

    Versions:

    PostgreSQL 14.5, pgvector v0.3.0 (379a760)

    opened by ArthurMelin 5
  • Building on Windows

    Continuation of #37

    @cdtaichen Re https://github.com/pgvector/pgvector/issues/37#issuecomment-1270980094, for GCC, I think the commands will probably be similar to Linux, which is:

    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/ivfbuild.o src/ivfbuild.c
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/ivfflat.o src/ivfflat.c
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/ivfinsert.o src/ivfinsert.c
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/ivfkmeans.o src/ivfkmeans.c
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/ivfscan.o src/ivfscan.c
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/ivfutils.o src/ivfutils.c
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/ivfvacuum.o src/ivfvacuum.c
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -I. -I./ -I/usr/include/postgresql/14/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/vector.o src/vector.c
    gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -march=native -ftree-vectorize -fassociative-math -fno-signed-zeros -fno-trapping-math -fPIC -shared -o vector.so src/ivfbuild.o src/ivfflat.o src/ivfinsert.o src/ivfkmeans.o src/ivfscan.o src/ivfutils.o src/ivfvacuum.o src/vector.o -L/usr/lib/x86_64-linux-gnu  -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -L/usr/lib/llvm-10/lib  -Wl,--as-needed 
    

    Also, outstanding questions from #37:

    1. What does the Postgres server log say about the crashes?
    2. Does the windows branch work for MSVC?
    opened by ankane 4
Owner
Andrew Kane