Re: Search results consistency with vector search

Dave Thu, 18 Dec 2025 08:50:07 -0800

Personally I would index to one server and replicate it out to the search
servers on a short interval. NRT here is effectively just replicating the index
as often as needed. This would provide consistent results and would not require
SolrCloud at all.
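
A minimal sketch of forcing that replication by hand, assuming Solr's user-managed (leader/follower) replication handler is already configured on each search server. The host names and the core name `products` are hypothetical; only the `command=fetchindex` request is part of Solr's ReplicationHandler.

```python
from urllib.parse import urlencode


def fetchindex_url(follower_base_url: str, core: str) -> str:
    """Build the ReplicationHandler URL asking a follower to pull the index."""
    query = urlencode({"command": "fetchindex", "wt": "json"})
    return f"{follower_base_url.rstrip('/')}/solr/{core}/replication?{query}"


# Hypothetical search servers; replace with your own hosts.
followers = ["http://search1:8983", "http://search2:8983"]
urls = [fetchindex_url(base, "products") for base in followers]
for url in urls:
    print(url)
```

In normal operation the followers poll the leader on their configured interval, so a call like this is only needed to force a fetch outside that schedule.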


> On Dec 18, 2025, at 11:18, Andrey Ukhanov (BLOOMBERG/ 919 3RD A) 
> <[email protected]> wrote:
> 
> You can try tuning "autoCommit" (and "autoSoftCommit") to make the segment 
> fetching more frequent. Depending on what values those are currently set to, 
> it could help. But as with any change, best to test.
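
As a rough illustration of that tuning, the sketch below builds a Config API request body that overrides the two commit intervals. The interval values are arbitrary placeholders rather than recommendations, and the helper name is hypothetical; test against your own indexing load.

```python
import json


def commit_settings_payload(hard_commit_ms: int, soft_commit_ms: int) -> str:
    """Build a Config API body that sets the hard and soft auto-commit intervals."""
    body = {
        "set-property": {
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }
    return json.dumps(body)


# Placeholder values: hard commit every 15 s, soft commit every 2 s.
payload = commit_settings_payload(hard_commit_ms=15000, soft_commit_ms=2000)
print(payload)
```

The resulting JSON would be POSTed to the collection's `/config` endpoint with `Content-Type: application/json`.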
> 
> From: [email protected] At: 12/17/25 17:04:16 UTC-5:00 To: [email protected]
> Cc: [email protected]
> Subject: Re: Search results consistency with vector search
> 
> Thanks, Andrey, for your suggestion. We do need to support near-real-time
> searches, and we have frequent index updates, so I believe TLOG replicas
> can't be used. Is there a way for TLOG replicas to support near-real-time
> searches, for example by tuning commit intervals?
> 
> Regards,
> Rajeswari
> 
> On 12/17/25, 1:02 PM, "Andrey Ukhanov (BLOOMBERG/ 919 3RD A)"
> <[email protected]> wrote:
> 
> 
> In a SolrCloud cluster with multiple NRT replicas, the leader receives the
> updates and distributes them to the non-leader replicas. The important, and
> relevant, aspect to highlight is that each NRT replica then builds and manages
> its segments individually, so segment structure diverges across replicas.
> The HNSW graph is built per segment. Since segment structure differs across
> replicas, this can lead to the behavior you are observing, where relevance
> results differ across replicas.
> To ensure consistency of results across replicas, segment structure needs to
> be the same. There are a few ways to accomplish this:
> 1) Use TLOG (and PULL if applicable) replication. Unlike NRT, TLOG ensures
> that segment structure is the same across replicas. More on that here -
> https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html#types-of-replicas
> 2) If your index is static (doesn't change very often), you can explore using
> the optimize command or re-creating replicas from the leader. More on that here -
> https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-update-handlers.html#commit-and-optimize-during-updates
> Personally I would recommend option 1.
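
For option 1, a minimal sketch of creating a collection whose replicas are all TLOG, via the Collections API `CREATE` action. The base URL, collection name, and replica counts are hypothetical; `nrtReplicas`, `tlogReplicas`, and `pullReplicas` are the Collections API parameters that select the replica types.

```python
from urllib.parse import urlencode


def create_tlog_collection_url(solr_base_url: str, name: str, num_shards: int,
                               tlog_replicas: int, pull_replicas: int = 0) -> str:
    """Build a Collections API CREATE URL that uses TLOG (and optional PULL) replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,  # no NRT replicas, so segments are copied from the leader
        "tlogReplicas": tlog_replicas,
        "pullReplicas": pull_replicas,
    }
    return f"{solr_base_url.rstrip('/')}/solr/admin/collections?{urlencode(params)}"


# Hypothetical collection: one shard with three TLOG replicas.
url = create_tlog_collection_url("http://localhost:8983", "vectors_tlog", 1, 3)
print(url)
```

An existing NRT collection would need its replicas replaced (or the data reindexed into a new collection) for this to take effect.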
> 
> 
> From: [email protected] At: 12/17/25 14:56:04 UTC-5:00 To: [email protected]
> Cc: [email protected]
> Subject: Re: Search results consistency with vector search
> 
> 
> Thanks for your follow-up. We are using NRT replicas.
> 
> 
> On 12/17/25, 11:01 AM, "Andrey Ukhanov (BLOOMBERG/ 919 3RD A)"
> <[email protected]> wrote:
> 
> 
> 
> Hi Rajeswari, what replication model are you using in Solr? NRT or TLOG/PULL?
> 
> 
> From: [email protected] At: 12/17/25 13:59:48 UTC-5:00 To: [email protected]
> Cc: [email protected]
> Subject: Search results consistency with vector search
> 
> 
> Hi All,
> 
> 
> I noticed that the vector search results for the same query are different each
> time. Both the ordering and the records returned differ depending on which
> replica the query hits.
> 
> 
> All the replicas have the same documents, and all of them have the same
> embeddings. With the vector similarity parser with minReturn=0.8 and
> minTraverse=0.8, the numFound for a specific query varies from 111 to 8,
> which is a huge variation.
> 
> 
> We are using Solr 9.9 and Lucene version 9.12.2. I believe this behavior is
> due to approximate HNSW construction in each replica.
> 
> 
> I tried minTraverse as 0.75 instead of 0.8. This fetches more records
> (somewhere in the 800s) and the variation in numFound is smaller, but the
> ordering of the records, and even the records themselves, still differ each
> time. Is this expected? What can be done to get consistent results each time?
> Please share your experiences.
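
One way to confirm that the variation comes from individual replicas is to send the same query to each replica core directly with `distrib=false` and compare `numFound`. The sketch below only builds the request URLs; the field name `embedding`, the short query vector, and the replica core URLs are hypothetical stand-ins.

```python
from urllib.parse import urlencode


def vector_similarity_query(field, vector, min_return, min_traverse):
    """Build a vectorSimilarity query string for the given field and thresholds."""
    vec = "[" + ", ".join(str(x) for x in vector) + "]"
    return "{!vectorSimilarity f=%s minReturn=%s minTraverse=%s}%s" % (
        field, min_return, min_traverse, vec)


def replica_select_url(replica_core_url, q):
    """Build a /select URL that is answered by this replica only (distrib=false)."""
    params = {"q": q, "distrib": "false", "rows": 0}
    return f"{replica_core_url.rstrip('/')}/select?{urlencode(params)}"


q = vector_similarity_query("embedding", [0.1, 0.2, 0.3], 0.8, 0.8)
# Hypothetical replica cores of the same shard.
replicas = [
    "http://solr1:8983/solr/docs_shard1_replica_n1",
    "http://solr2:8983/solr/docs_shard1_replica_n2",
]
urls = [replica_select_url(core, q) for core in replicas]
for url in urls:
    print(url)
```

If `numFound` differs between these URLs for an identical query, the difference lies in the per-replica index rather than in request routing.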
> 
> 
> Thanks,
> Rajeswari
> 
>
