Re: Disconnectedness in HNSW graphs in Lucene

2023-08-24 Thread Benjamin Trent
> Can I create a GitHub issue for this and continue updating there? I think that would be great. If y'all are suffering from this and discovered an issue, others are unknowingly having the same issue. Having tools to discover it and fix it will make Lucene better. Hopefully we find a bug and can a

Re: Disconnectedness in HNSW graphs in Lucene

2023-08-24 Thread Nitiraj Rathore
Thanks, Benjamin, for the reply and for confirming that connectedness can be an issue (btw, I am the same Nitiraj, just using my apache.org email id from now on to communicate). I will do some more experiments to reproduce the issue and look at connectedness across the whole graph, not just with the entry point. Bu
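
For readers following along, checking connectedness "across the graph and not just with the entry point" can be read as counting connected components of one layer. Below is a minimal sketch of that idea, not the tooling used in this thread. It assumes the layer's links have already been exported into a plain dict mapping node id to neighbor ids (a hypothetical format, not a Lucene API), and it treats links as undirected, so it counts weakly connected components with a union-find structure.

```python
def count_components(neighbors):
    """Count weakly connected components of a layer.

    `neighbors` is a hypothetical export format: dict of node id -> list of
    neighbor ids. Link direction is ignored.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for node, links in neighbors.items():
        find(node)  # make sure isolated nodes are counted too
        for other in links:
            root_other = find(other)
            root_node = find(node)
            parent[root_node] = root_other  # merge the two groups

    return len({find(n) for n in list(parent)})


# Toy layer: {0, 1}, {2, 3} and the isolated node 4.
toy_layer = {0: [1], 1: [], 2: [3], 3: [2], 4: []}
print(count_components(toy_layer))
```

A count above 1 means that some nodes cannot reach each other at all on that layer, whichever node is chosen as the entry point.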

Re: Disconnectedness in HNSW graphs in Lucene

2023-08-23 Thread Benjamin Trent
Nitiraj, Good experimentation! Connectedness within layers is indeed important. The algorithm itself should ensure connectedness of disjoint NSWs as it mutually connects nodes (selected over diversity). However, if the data is extremely clustered, this can cause connectedness to drop (few densely
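
As an illustration of what "connectedness within layers" means in practice, here is a minimal sketch that finds the nodes of one layer that cannot be reached from the entry point. It is not Lucene code. The dict-of-lists layer format and the function names are assumptions made for this example, and links are followed in their stored direction, the way a graph search would traverse them.

```python
from collections import deque


def reachable_from(neighbors, entry):
    """Breadth-first search: set of nodes reachable from `entry` via directed links."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unreachable_nodes(neighbors, entry):
    """Nodes present in the layer that a search starting at `entry` can never visit."""
    all_nodes = set(neighbors)
    for links in neighbors.values():
        all_nodes.update(links)
    return all_nodes - reachable_from(neighbors, entry)


# Toy layer (hypothetical data): nodes 3 and 4 link only to each other.
toy_layer = {0: [1, 2], 1: [0], 2: [0], 3: [4], 4: [3]}
print(sorted(unreachable_nodes(toy_layer, entry=0)))  # prints [3, 4]
```

Any node in the returned set can never be returned by a search that starts from that entry point on this layer, which is the symptom discussed in this thread.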

Disconnectedness in HNSW graphs in Lucene

2023-08-23 Thread Nitiraj Singh Rathore
Hi Lucene developers, I work for Amazon Retail Product search and we are using Lucene KNN for semantic search of products. We index product embeddings (vectors) into Lucene (HNSW graph) and search them by generating a query embedding at runtime. The product embeddings also receive regular updates an