Julie Tibshirani created LUCENE-10146:
-----------------------------------------

             Summary: Add VectorSimilarityFunction.COSINE
                 Key: LUCENE-10146
                 URL: https://issues.apache.org/jira/browse/LUCENE-10146
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Julie Tibshirani


To perform ANN search with cosine similarity, users are expected to normalize 
the document and query vectors to unit length, then use 
{{VectorSimilarityFunction.DOT_PRODUCT}}. I think it would be good to also 
support cosine similarity directly through {{VectorSimilarityFunction.COSINE}}. 
This would allow users to perform ANN based on cosine similarity, while 
retaining access to the original vectors through {{VectorValues}}. That way 
they can use the original vectors in a reranking step or return them to the 
application for further processing.
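
As a rough illustration of the trade-off (a standalone sketch, not Lucene 
code: the class {{CosineSketch}} and its methods are hypothetical names made 
up for this example), the dot product of unit-normalized vectors equals the 
cosine of the originals, but the normalized copies no longer carry the 
original magnitudes:

```java
/** Hypothetical helper, for illustration only; not part of Lucene. */
public final class CosineSketch {

    private CosineSketch() {}

    /** Plain dot product of two vectors of equal length. */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Returns a unit-length copy of v. Zero vectors are not handled here. */
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dotProduct(v, v));
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    /** Cosine similarity computed directly on the original vectors. */
    static float cosine(float[] a, float[] b) {
        double norms = Math.sqrt((double) dotProduct(a, a) * dotProduct(b, b));
        return (float) (dotProduct(a, b) / norms);
    }
}
```

With a document vector (3, 4) and a query vector (4, 3), both 
{{cosine(doc, query)}} and {{dotProduct(normalize(doc), normalize(query))}} 
come out at about 0.96. Only the first approach leaves the stored vector as 
(3, 4) rather than (0.6, 0.8).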

It looks like nmslib and hnswlib support cosine similarity. On the other hand, 
FAISS does not offer a dedicated cosine metric and suggests users normalize 
the vectors and use dot product to perform cosine similarity 
(https://github.com/facebookresearch/faiss/issues/95). To me, adding this one 
additional similarity is worth it in terms of what it lets users accomplish.




