Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Kent Fitch
My testing shows Lucene's HNSW in a very positive light. The ability to perform blended searches (vector/semantic and text) is valuable, even with high-quality embeddings, and helps when the searcher's intent is to search for specific words or phrases (such as a name, or exact concepts) which get

Re: Reindexing leaving behind 0 live doc segments

2023-08-31 Thread Rahul Goswami
Stefan, Mike, Appreciate your responses! I spent some time analyzing your inputs and going further down the rabbit hole. Stefan, I looked at the IndexRearranger code you referenced where it tries to drop the segment. I see that it eventually gets handled via IndexFileDeleter.checkpoint() through

Re: Reindexing leaving behind 0 live doc segments

2023-08-31 Thread Michael McCandless
Hi Rahul, Please do not pursue Approach 2 :) ReadersAndUpdates.release is not something the application should be calling. This path can only lead to pain. It sounds to me like something in Solr is holding an old reader (maybe the last commit point, or reader prior to the refresh after you

Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael McCandless
Thanks Michael, very interesting! I of course agree that Lucene is all you need, heh ;) Jimmy Lin also tweeted about the strength of Lucene's HNSW: https://twitter.com/lintool/status/1681333664431460353?s=20 Mike McCandless http://blog.mikemccandless.com On Thu, Aug 31, 2023 at 3:31 AM

Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael Wechner
Hi all, you might be interested in this paper / article: https://arxiv.org/abs/2308.14963 Thanks, Michael