How to store high-dimensional (N > 100) vectors and index for fast lookup by cosine similarity?
7 votes · 2 answers · 2759 views
I am trying to store vectors for word/doc embeddings in a PostgreSQL table, and want to be able to quickly pull the N rows with highest cosine similarity to a given query vector. The vectors I'm working with are `numpy.array`s of floats with length *100 <= L <= 1000*.
I looked into the `cube` module for similarity search, but it is limited to vectors with *<= 100* dimensions. The embeddings I am using result in vectors that are 100 dimensions at *minimum* and often much higher (depending on settings when training word2vec/doc2vec models).
What is the most efficient way to store high-dimensional vectors (numpy float arrays) in Postgres, and perform fast lookup based on cosine similarity (or other vector similarity metrics)?
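For reference, a brute-force baseline (no index) of the lookup described above could look like this sketch: pull the stored vectors into a numpy matrix and rank rows by cosine similarity against the query. The function names `cosine_similarity` and `top_n` are hypothetical, not from any library:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D float vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_n(query: np.ndarray, vectors: np.ndarray, n: int = 5) -> np.ndarray:
    """Indices of the n rows of `vectors` most similar to `query`.

    `vectors` is an (M, L) matrix, one embedding per row.
    """
    # Normalize rows and query once, so the similarity of every row
    # reduces to a single matrix-vector product.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q
    # argsort ascending, then reverse for highest-similarity-first.
    return np.argsort(sims)[::-1][:n]
```

This is O(M·L) per query, so it only answers the "store and compute" half of the question; an actual index (approximate nearest-neighbor search) is what makes the lookup fast at scale.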
Asked by J. Taylor (379 rep), Feb 22, 2019, 12:29 AM
Last activity: Apr 5, 2023, 02:54 PM