How to store high-dimensional (N > 100) vectors and index for fast lookup by cosine similarity?

7 votes
2 answers
2759 views
I am trying to store word/doc embedding vectors in a PostgreSQL table, and want to be able to quickly pull the N rows with the highest cosine similarity to a given query vector. The vectors I'm working with are numpy arrays of floats with length *100 <= L <= 1000*.

I looked into the cube module for similarity search, but it is limited to vectors with *<= 100* dimensions. The embeddings I am using produce vectors that are 100-dimensional at *minimum* and often much higher (depending on settings when training word2vec/doc2vec models).

What is the most efficient way to store high-dimensional vectors (numpy float arrays) in Postgres, and perform fast lookup based on cosine similarity (or other vector similarity metrics)?
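For reference, the lookup being asked about can be sketched in plain numpy (function and variable names here are illustrative, not from any Postgres extension): compute cosine similarity between the query and every stored row, then take the N best indices. This is the brute-force baseline that any database-side index would be compared against.

```python
import numpy as np

def cosine_similarity(query, matrix):
    """Cosine similarity of `query` against each row of `matrix`."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # dot product of every row with the query, divided by the norms
    dots = matrix @ query
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    return dots / norms

def top_n(query, matrix, n=5):
    """Indices of the n rows most similar to the query, best first."""
    sims = cosine_similarity(query, matrix)
    return np.argsort(sims)[::-1][:n]
```

A brute-force scan like this is O(rows × dimensions) per query, which is exactly why the question is about indexing: it works for thousands of rows but not for millions.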
Asked by J. Taylor (379 rep)
Feb 22, 2019, 12:29 AM
Last activity: Apr 5, 2023, 02:54 PM