How to store high-dimensional (N > 100) vectors and index for fast lookup by cosine similarity?
7 votes · 2 answers · 2759 views
I am trying to store vectors for word/doc embeddings in a PostgreSQL table, and want to be able to quickly pull the N rows with highest cosine similarity to a given query vector. The vectors I'm working with are `numpy.array`s of floats with length *100 <= L <= 1000*.
I looked into the `cube` module for similarity search, but it is limited to vectors with *<= 100* dimensions. The embeddings I am using result in vectors that are 100 dimensions at *minimum* and often much higher (depending on settings when training word2vec/doc2vec models).
What is the most efficient way to store high-dimensional vectors (numpy float arrays) in Postgres, and perform fast lookup based on cosine similarity (or other vector similarity metrics)?
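For reference, a brute-force baseline (no index) of the lookup described above could look like this sketch: pull the stored vectors into a numpy matrix and rank rows by cosine similarity against the query. The function names `cosine_similarity` and `top_n` are hypothetical, not from any library:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D float vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_n(query: np.ndarray, vectors: np.ndarray, n: int = 5) -> np.ndarray:
    """Indices of the n rows of `vectors` most similar to `query`.

    `vectors` is an (M, L) matrix, one embedding per row.
    """
    # Normalize rows and query once, so the similarity of every row
    # reduces to a single matrix-vector product.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q
    # argsort ascending, then reverse for highest-similarity-first.
    return np.argsort(sims)[::-1][:n]
```

This is O(M·L) per query, so it only answers the "store and compute" half of the question; an actual index (approximate nearest-neighbor search) is what makes the lookup fast at scale.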
Asked by J. Taylor (379 rep), Feb 22, 2019, 12:29 AM
Last activity: Apr 5, 2023, 02:54 PM