Re: Search results consistency with vector search

Andrey Ukhanov (BLOOMBERG/ 919 3RD A) Thu, 18 Dec 2025 08:17:59 -0800

You can try tuning "autoCommit" (and "autoSoftCommit") to make the segment 
fetching more frequent. Depending on what values those are currently set to, it 
could help. But as with any change, best to test.
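
As a rough sketch of the tuning suggested above, the commit intervals can be changed through Solr's Config API with a "set-property" command. The base URL, the collection name, and the interval values here are illustrative assumptions, not recommendations from this thread; test any values against your own update load.

```python
import json
import urllib.request


def build_commit_payload(hard_commit_ms: int, soft_commit_ms: int) -> dict:
    """Build a Config API 'set-property' body for the two commit intervals."""
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }


def apply_commit_settings(base_url: str, collection: str, payload: dict) -> int:
    """POST the payload to the collection's Config API; returns the HTTP status.

    base_url and collection are placeholders for your own cluster.
    """
    request = urllib.request.Request(
        f"{base_url}/solr/{collection}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example values only (5 s hard commit, 1 s soft commit).
payload = build_commit_payload(hard_commit_ms=5000, soft_commit_ms=1000)
print(json.dumps(payload, indent=2))
# To apply against a live cluster (hypothetical host and collection):
# apply_commit_settings("http://localhost:8983", "my_collection", payload)
```

Only the payload is built and printed here; the network call is left commented out so the sketch can be run without a cluster.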

From: [email protected] At: 12/17/25 17:04:16 UTC-5:00
To: [email protected]
Cc: [email protected]
Subject: Re: Search results consistency with vector search

Thanks, Andrey, for your suggestion. We need to support near-real-time
searches and we have frequent index updates, so I believe TLOG replicas
can't be used. Is there a way for TLOG to support near-real-time searches,
for example by tuning commit intervals?

Regards,
Rajeswari

On 12/17/25, 1:02 PM, "Andrey Ukhanov (BLOOMBERG/ 919 3RD A)"
<[email protected]> wrote:

In a Solr cloud with multiple NRT replicas the leader node will receive the 
updates and distribute them to non-leader replicas. The important, and 
relevant, aspect to highlight is that each NRT replica will then build/manage 
segments individually. That means segment structure across replicas diverges. 
The HNSW graph is built per segment. Since the segment structure differs across
replicas, the graphs differ too, which can lead to the behavior you are
observing, where results differ across replicas.
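
One way to observe the divergence described above is to send the same query to each replica core directly with distrib=false and compare the returned ids. The sketch below only builds the request parameters and scores the overlap between two id lists. The field name "vector", the sample ids, and the helper names are hypothetical, and fetching the results from each core is left to your own client.

```python
def build_replica_query_params(field, vector, min_return, min_traverse, rows=10):
    """Build query params for a single core; distrib=false keeps the query local."""
    vector_text = ", ".join(str(x) for x in vector)
    query = (
        f"{{!vectorSimilarity f={field} minReturn={min_return} "
        f"minTraverse={min_traverse}}}[{vector_text}]"
    )
    return {"q": query, "fl": "id", "rows": rows, "distrib": "false"}


def jaccard_overlap(ids_a, ids_b):
    """Share of ids common to both result lists (1.0 means identical sets)."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical field name and query vector.
params = build_replica_query_params("vector", [0.1, 0.2, 0.3], 0.8, 0.8)
print(params["q"])

# Made-up id lists standing in for the responses of two replica cores.
replica_1_ids = ["doc1", "doc2", "doc3"]
replica_2_ids = ["doc2", "doc3", "doc4"]
print(jaccard_overlap(replica_1_ids, replica_2_ids))
```

An overlap well below 1.0 for the same query on replicas holding the same documents is consistent with the per-segment graph differences described above.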
To ensure consistency of results across replicas, segment structure needs to be 
the same. There are a few ways to accomplish this:
1) Use TLOG (and PULL if applicable) replication. Unlike NRT, TLOG ensures that 
segment structure is the same across replicas. More on that here - 
https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html#types-of-replicas
2) If your index is static (doesn't change very often), you can explore using 
the optimize command or re-creating replicas from the leader. More on that here - 
https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-update-handlers.html#commit-and-optimize-during-updates
Personally I would recommend option 1.
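
For option 1, a minimal sketch of a Collections API CREATE request that asks for TLOG replicas instead of NRT ones. The host, collection name, and shard and replica counts are placeholder assumptions; the function only builds the URL and does not contact a cluster.

```python
from urllib.parse import urlencode


def build_create_collection_url(base_url, name, num_shards, tlog_replicas,
                                pull_replicas=0):
    """Build a Collections API CREATE URL that uses TLOG (and optional PULL) replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,  # no NRT replicas, so followers copy segments from the leader
        "tlogReplicas": tlog_replicas,
        "pullReplicas": pull_replicas,
    }
    return f"{base_url}/solr/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name.
url = build_create_collection_url("http://localhost:8983", "vectors_tlog",
                                  num_shards=1, tlog_replicas=2)
print(url)
```

An existing NRT collection would need its replicas replaced or the data reindexed into a collection created this way; that migration is outside this sketch.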

From: [email protected] At: 12/17/25 14:56:04 UTC-5:00
To: [email protected]
Cc: [email protected]
Subject: Re: Search results consistency with vector search

Thanks for your follow-up. We are using NRT replicas.

On 12/17/25, 11:01 AM, "Andrey Ukhanov (BLOOMBERG/ 919 3RD A)"
<[email protected]> wrote:

Hi Rajeswari, what replication model are you using in Solr? NRT or TLOG/PULL?

From: [email protected] At: 12/17/25 13:59:48 UTC-5:00
To: [email protected]
Cc: [email protected]
Subject: Search results consistency with vector search

Hi All,

I noticed that the vector search results for the same query are different each
time. Both the ordering and the records returned differ depending on which
replica the query hits.

All the replicas have the same documents and the same embeddings. With the
vector similarity parser with minReturn=0.8, minTraverse=0.8, the numFound
for a specific query varies from 111 to 8, which is a huge variation.

We are using Solr 9.9 and Lucene version 9.12.2. I believe this behavior is due
to approximate HNSW construction in each replica.

I tried minTraverse as 0.75 instead of 0.8. This fetches more records
(somewhere in the 800s) and the variation in numFound is smaller, but the
ordering of the records, and even the records themselves, still differ each time.
Is this expected? What can be done to get consistent results each time?
Please share your experiences.

Thanks,
Rajeswari
