Hi,

In our Solr 7.4 cluster, we have noticed that some replicas of some of our
collections are out of sync: the non-leader replica has more records than
the leader.
As a result, successive queries against the same collection return
different record counts. Issuing a commit does not help.
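
For anyone who wants to confirm the mismatch, here is a minimal Python
sketch that counts documents on one core at a time by sending
distrib=false to that core's select handler. The base URL and the core
names are assumptions for illustration, not values from our cluster.

```python
import json
import urllib.parse
import urllib.request


def core_count_url(base_url, core):
    """Build a select URL that counts documents on a single core only."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "rows": 0,            # only the count is needed, not the documents
        "distrib": "false",   # do not fan the query out to other replicas
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def num_found(response_body):
    """Extract numFound from a Solr JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


def fetch_count(base_url, core, timeout=10):
    """Query one core directly and return its local document count."""
    url = core_count_url(base_url, core)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return num_found(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical node URLs and core names; adjust to the real cluster.
    replicas = {
        "http://host1:8983/solr": "synctest_shard1_replica_n1",
        "http://host2:8983/solr": "synctest_shard1_replica_n2",
    }
    for base, core in replicas.items():
        print(core, fetch_count(base, core))
```

If the two numbers differ after a hard commit, the replicas really have
diverged and the difference is not just a visibility issue.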

I'm able to reproduce the issue using the steps given below:

   1. Create a collection with 1 shard and replicationFactor 2
   2. Ingest 10k records into the collection
   3. Shut down the node hosting replica 2
   4. Ingest another 10k records into the collection
   5. Shut down replica 1
   6. Start replica 2 and wait until it becomes leader
   7. Ingest 20k records on replica 2
   8. Shut down replica 2
   9. Start replica 1 and wait until it becomes leader, or use the
   FORCELEADER action of the Collections API
   10. Start replica 2
   11. Replica 2 now has 30k records and replica 1 has 20k records, and
   they never sync
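
The Collections API calls behind steps 1, 6 and 9 can be sketched in
Python as below. This is only an outline: the collection name
"synctest", the shard name "shard1" and the base URL are assumptions,
and find_leader is a hypothetical helper that reads the leader flag from
a CLUSTERSTATUS response.

```python
import urllib.parse


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urllib.parse.urlencode({"action": action, **params, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def find_leader(cluster_status, collection, shard):
    """Return the core name of the shard leader, or None if there is none.

    cluster_status is the parsed JSON body of a CLUSTERSTATUS call.
    """
    replicas = (cluster_status["cluster"]["collections"][collection]
                ["shards"][shard]["replicas"])
    for replica in replicas.values():
        # Solr reports the leader flag as the string "true".
        if replica.get("leader") == "true":
            return replica["core"]
    return None


BASE = "http://localhost:8983/solr"  # assumed node address

# Step 1: one shard, two replicas.
create_url = collections_api_url(
    BASE, "CREATE", name="synctest", numShards=1, replicationFactor=2)

# Steps 6 and 9: poll this until the restarted replica shows up as leader.
status_url = collections_api_url(BASE, "CLUSTERSTATUS", collection="synctest")

# Step 9 alternative: force a leader election for the shard.
force_leader_url = collections_api_url(
    BASE, "FORCELEADER", collection="synctest", shard="shard1")

if __name__ == "__main__":
    for url in (create_url, status_url, force_leader_url):
        print(url)
```

Stopping and starting the nodes (steps 3, 5, 6, 8, 9 and 10) is done
outside the API, so it is not shown here.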

I tried the same steps with TLOG replicas. In that case both replicas
ended up with 20k records and were in sync, but 10k records were lost.

Is there any way to sync the replicas? I am looking for a lightweight
solution that doesn't require re-creating the index.

Regards,
Anshuman
