Anshum Gupta created SOLR-8619:
----------------------------------

             Summary: A new replica should not become leader when all current replicas are down as it leads to data loss
                 Key: SOLR-8619
                 URL: https://issues.apache.org/jira/browse/SOLR-8619
             Project: Solr
          Issue Type: Bug
            Reporter: Anshum Gupta


Here's what I'm talking about:
* Start a 2 node solrcloud cluster
* Create a 1 shard/1 replica collection
* Add documents
* Shut down the node that hosts the only active replica of the shard
* Call ADDREPLICA for the shard/collection so that Solr attempts to add a new 
replica on the other node
* Solr waits for a while before this replica becomes an active leader.
* Index a few new docs
* Bring up the old node
* The old replica comes up with its old index and then syncs from the new 
leader, so it ends up containing only the docs from the new leader.
All old documents are lost in this case.
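The reproduction steps above can be illustrated with a small, simplified model. This is only a sketch, not Solr code: the Replica class, the replica names, and the doc IDs are hypothetical, and recovery is modelled as the old replica replacing its index with a copy of the leader's, as described in the last step.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical, simplified replica: a name, liveness flag and indexed doc IDs."""
    name: str
    up: bool = True
    docs: set = field(default_factory=set)


def simulate_solr_8619():
    # 1 shard/1 replica collection on node A, with some documents indexed
    old = Replica("replica_on_node_a")
    old.docs |= {"doc1", "doc2", "doc3"}
    docs_before = set(old.docs)

    # Shut down node A: the shard now has no live replica
    old.up = False

    # ADDREPLICA on node B: with no live replica to sync from, the new
    # replica starts with an empty index and eventually becomes leader
    new = Replica("replica_on_node_b")
    leader = new

    # Index a few new docs through the new leader
    leader.docs |= {"doc4", "doc5"}

    # Bring node A back: it syncs from the current leader (simplification:
    # its index is replaced by a copy of the leader's index)
    old.up = True
    old.docs = set(leader.docs)

    lost = docs_before - old.docs
    return old, new, lost


old, new, lost = simulate_solr_8619()
print(sorted(old.docs))  # ['doc4', 'doc5']
print(sorted(lost))      # ['doc1', 'doc2', 'doc3']
```

Every document indexed before node A went down ends up missing from both replicas.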

Here are a few approaches that might fix this:
1. Reject an ADDREPLICA call if all current replicas for the shard are down. 
Since the new replica cannot sync from anyone, it doesn't make sense for this 
replica to even come up.
2. A replica shouldn't become active/leader unless it was either the last 
known leader or active before it went into the recovering state, or there are 
no other replicas in the clusterstate.
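The two proposed checks could be expressed roughly as below. This is a hedged sketch of the rules as stated, not Solr's actual implementation: ReplicaInfo, should_reject_add_replica and may_become_leader are hypothetical names, and the flags for "last known leader" and "active before recovery" are assumed to be available from the cluster state. The choice to allow ADDREPLICA on a shard with no replicas at all is also an assumption (there is no data to lose in that case).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaInfo:
    """Hypothetical view of a replica as recorded in the cluster state."""
    name: str
    state: str                                # e.g. "active", "recovering", "down"
    was_last_leader: bool = False             # last known leader of the shard
    was_active_before_recovery: bool = False  # active before entering recovery


def should_reject_add_replica(current_replicas):
    """Proposal 1: reject ADDREPLICA when every existing replica of the shard is down.

    Assumption: a shard with no replicas at all is not rejected.
    """
    return bool(current_replicas) and all(r.state == "down" for r in current_replicas)


def may_become_leader(candidate, other_replicas):
    """Proposal 2: allow active/leader only if the candidate was the last known
    leader or was active before recovering, unless no other replicas exist."""
    if not other_replicas:
        return True
    return candidate.was_last_leader or candidate.was_active_before_recovery


old = ReplicaInfo("replica_on_node_a", "down", was_last_leader=True)
new = ReplicaInfo("replica_on_node_b", "recovering")
print(should_reject_add_replica([old]))  # True
print(may_become_leader(new, [old]))     # False
print(may_become_leader(old, [new]))     # True
```

In the reproduction scenario, either check on its own would stop the empty replica from becoming leader over the old replica's data.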

This might very well be related to SOLR-8173, but we should add a check to 
ADDREPLICA as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
