Note: it seems that Solr's current replication logic relies on the master having persistent disks. See
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java#L615
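
To make the difference between the two snippets quoted in this thread concrete, here is a minimal sketch of just the two conditions. It is a standalone model, not Solr code: the class and method names are made up for illustration, and it assumes a regular poll runs with forceReplication set to false, as the original message describes.

```java
// Hypothetical, self-contained model of the two checks quoted in this thread.
// It uses no Solr classes; all names here are illustrative only.
final class ReplicaClearDecision {

    /** 6.5-style check: the replica clears its index only when replication was forced. */
    static boolean clearsIn65(long masterVersion, boolean forceReplication, long localGeneration) {
        return masterVersion == 0L && forceReplication && localGeneration != 0L;
    }

    /** 7.5-style check: forceReplication is no longer part of the condition. */
    static boolean clearsIn75(long masterVersion, long localGeneration) {
        return masterVersion == 0L && localGeneration != 0L;
    }

    public static void main(String[] args) {
        long emptyMasterVersion = 0L; // freshly rebuilt master reporting an empty index
        long localGeneration = 42L;   // replica still holds a populated index

        // Assumption: a regular poll passes forceReplication = false.
        System.out.println("6.5 clears on poll: "
                + clearsIn65(emptyMasterVersion, false, localGeneration));
        System.out.println("7.5 clears on poll: "
                + clearsIn75(emptyMasterVersion, localGeneration));
    }
}
```

Under these assumptions, the only input that separates the two versions is the forceReplication flag: once it is dropped from the condition, an empty master is enough to make a populated replica clear itself.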
On Tue, Jun 25, 2019 at 3:16 PM Mikhail Khludnev <m...@apache.org> wrote:

> Hello, Patrick.
> Can <str name="replicateAfter">commit</str> help you?
>
> On Tue, Jun 25, 2019 at 12:55 AM Patrick Bordelon <
> patrick.borde...@coxautoinc.com> wrote:
>
>> Hi,
>>
>> We recently upgraded to SOLR 7.5 in AWS; we had previously been running
>> SOLR 6.5. In our current configuration, our applications are split into a
>> single-instance primary environment and a multi-instance replica
>> environment, each behind its own load balancer.
>>
>> Until recently we were able to reload the primary without the replicas
>> updating until there was a full index. However, after upgrading to 7.5 we
>> started noticing that, after terminating and rebuilding a primary
>> instance, the associated replicas would all start showing 0 documents in
>> all indexes. After some research we believe we've tracked the issue down
>> to SOLR-11293.
>>
>> SOLR-11293 changes
>> <https://issues.apache.org/jira/browse/SOLR-11293?focusedCommentId=16182379&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16182379>
>>
>> This fix changed the check the replication handler performs before
>> updating a replica when the primary has an empty index, whether that is
>> from deleting the old index or from terminating the instance.
>>
>> This is the code as it was in the 6.5 replication handler:
>>
>> if (latestVersion == 0L) {
>>   if (forceReplication && commit.getGeneration() != 0) {
>>     // since we won't get the files for an empty index,
>>     // we just clear ours and commit
>>     RefCounted<IndexWriter> iw =
>>         solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>>     try {
>>       iw.get().deleteAll();
>>     } finally {
>>       iw.decref();
>>     }
>>     SolrQueryRequest req = new LocalSolrQueryRequest(solrCore,
>>         new ModifiableSolrParams());
>>     solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
>>   }
>>
>> Without forced replication, the replica won't perform the deleteAll
>> operation and will keep its old index until a new index version is
>> created.
>>
>> However, in 7.5 the code was changed to this:
>>
>> if (latestVersion == 0L) {
>>   if (commit.getGeneration() != 0) {
>>     // since we won't get the files for an empty index,
>>     // we just clear ours and commit
>>     log.info("New index in Master. Deleting mine...");
>>     RefCounted<IndexWriter> iw =
>>         solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>>     try {
>>       iw.get().deleteAll();
>>     } finally {
>>       iw.decref();
>>     }
>>     assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
>>     if (skipCommitOnMasterVersionZero) {
>>       openNewSearcherAndUpdateCommitPoint();
>>     } else {
>>       SolrQueryRequest req = new LocalSolrQueryRequest(solrCore,
>>           new ModifiableSolrParams());
>>       solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
>>     }
>>   }
>>
>> With the forceReplication check removed, we believe the replica always
>> deletes its index when it detects that a new version 0 index has been
>> created.
>>
>> This is a problem, as we can't afford for active replicas to have 0
>> documents on them in the event of a failure of the primary.
>> Since we can't control the termination of AWS instances, any primary
>> outage has a chance of jeopardizing the replicas' viability.
>>
>> Is there a way to restore this functionality in the current or a future
>> release? We are willing to upgrade to a later version, including the
>> latest, if it will help resolve this problem.
>>
>> If you suggest we use a load balancer health check to prevent this, we
>> already do. However, the load balancer type we are using (application)
>> has a feature that allows access through it when all instances under it
>> are failing. This bypasses our health check and still allows the replicas
>> to poll the primary even when it's not fully loaded. We can't change load
>> balancer types, as we rely on other features of this one that we can't
>> give up currently.
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

--
Sincerely yours
Mikhail Khludnev