Please, please, please do _not_ try to use core discovery to add new replicas by manually creating cores and hand-editing core.properties.
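[For context, the hand-editing in question means writing the core.properties file that core discovery reads at startup. A minimal sketch is below; the values are hypothetical, and a duplicate name, a misspelled key, or the wrong shard here is exactly the kind of mistake the Collections API prevents:

  # core.properties -- read by core discovery when the node starts
  # (values are illustrative, not taken from the thread)
  name=project
  collection=project
  shard=shard1
  coreNodeName=core_node2
]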
bq: and my deployment tools create an empty core on newly provisioned machines.

This is a really bad idea (as you have discovered). Basically, your deployment tools have to do everything right to get this to "play nice" with SolrCloud. Your core names can't conflict. You have to spell all the parameters in core.properties right. Etc. There are endless places to go wrong. And it is all done for you (and tested with unit tests) via the Collections API.

Assuming that in your scenario you started machine2 before machine1, how would Solr have any clue that machine1 would _ever_ come back up? It'll do the best it can and try to elect a leader, but there's only one machine to choose from... and it's sorely out of date.

Absolutely use the Collections API to add replicas to running SolrCloud clusters. And adding a replica via the Collections API _will_ use core discovery, in that it causes a core.properties file to be written on the node in question, populated with all the necessary parameters; the new core then initiates a sync from the (running) leader, puts itself into the query rotation automatically when the sync is done, etc. All without you
1> having to figure all this out yourself
2> taking the collection offline
(a sketch of the ADDREPLICA call appears at the end of this message)

Best,
Erick

On Tue, May 26, 2015 at 2:46 PM, Michael Roberts <mrobe...@tableau.com> wrote:
> Hi,
>
> I have a SolrCloud setup, running 4.10.3. The setup consists of several
> cores, each with a single shard, and initially each shard has a single
> replica (so, basically, one machine). I am using core discovery, and my
> deployment tools create an empty core on newly provisioned machines.
>
> The scenario I am testing is: Machine 1 is running and my application is
> writing to Solr. At some point, I stop Machine 1 and reconfigure my
> application to add Machine 2. Both machines are then started.
>
> What I would expect to happen at this point is that Machine 2 cannot
> become leader because it is behind compared to Machine 1, and Machine 2
> would then restore from Machine 1.
>
> However, looking at the logs, I am seeing Machine 2 become elected leader
> and fail the PeerSync:
>
> 2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to
> continue.
> 2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader -
> try and sync
> 2015-05-24 17:20:25.997 -0700 (,,,) coreZkRegister-1-thread-4 : INFO
> org.apache.solr.update.PeerSync - PeerSync: core=project
> url=http://10.32.132.64:11000/solr START
> replicas=[http://jchar-1:11000/solr/project/] nUpdates=100
> 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO
> org.apache.solr.update.PeerSync - PeerSync: core=project
> url=http://10.32.132.64:11000/solr DONE. We have no versions. sync failed.
> 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we
> have no versions - we can't sync in that case - we were active before, so
> become leader anyway
> 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO
> org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader:
> http://10.32.132.64:11000/solr/project/ shard1
>
> What is the expected behavior here? What's the best practice for adding a
> new replica? Should I have the SolrCloud cluster running and add it via
> the Collections API, or can I continue to use core discovery?
>
> Thanks.
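[For concreteness, the Collections API call recommended above is a single HTTP request. A sketch, assuming the collection "project" and shard "shard1" from the thread; the requesting host machine1:11000 and the target node name machine2:11000_solr are placeholders to substitute with your own:

  curl 'http://machine1:11000/solr/admin/collections?action=ADDREPLICA&collection=project&shard=shard1&node=machine2:11000_solr'

On success, Solr writes the core.properties on the target node, the new core syncs from the leader, and it joins the query rotation once the sync finishes.]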