Hi,

We recently upgraded to Solr 7.5 in AWS; we had previously been running Solr
6.5. In our current configuration each application is split into a
single-instance primary environment and a multi-instance replica
environment, with each environment behind its own load balancer.

Until recently we've been able to reload the primary without the replicas
updating until it had a full index again. However, after we upgraded to 7.5
we started noticing that, after terminating and rebuilding a primary
instance, the associated replicas would all start showing 0 documents in
every index. After some research we believe we've tracked the issue down to
SOLR-11293.

SOLR-11293 changes
<https://issues.apache.org/jira/browse/SOLR-11293?focusedCommentId=16182379&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16182379>
  

This fix changed the check the replication handler performs before updating
a replica when the primary has an empty index, whether the index is empty
because the old index was deleted or because the instance was terminated.

This is the code as it was in the 6.5 replication handler:

      if (latestVersion == 0L) {
        if (forceReplication && commit.getGeneration() != 0) {
          // since we won't get the files for an empty index,
          // we just clear ours and commit
          RefCounted<IndexWriter> iw = solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
          try {
            iw.get().deleteAll();
          } finally {
            iw.decref();
          }
          SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new ModifiableSolrParams());
          solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
        }
                
                
Without forced replication the replica won't perform the deleteAll operation
and will keep its old index until a new index version is created.

However, in 7.5 the code was changed to this:

      if (latestVersion == 0L) {
        if (commit.getGeneration() != 0) {
          // since we won't get the files for an empty index,
          // we just clear ours and commit
          log.info("New index in Master. Deleting mine...");
          RefCounted<IndexWriter> iw = solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
          try {
            iw.get().deleteAll();
          } finally {
            iw.decref();
          }
          assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
          if (skipCommitOnMasterVersionZero) {
            openNewSearcherAndUpdateCommitPoint();
          } else {
            SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new ModifiableSolrParams());
            solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
          }
        }
                
With the removal of the forceReplication check, we believe the replica now
always deletes its index whenever it detects a version 0 index on the
primary.
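
To make the behavioural difference concrete, here is a minimal sketch (our
own illustration, not Solr source; the class and method names are ours) that
models the two guard conditions from the snippets above as pure functions:

      // Minimal sketch (not Solr source): models the empty-primary guard in
      // 6.5 and 7.5 so the difference in behaviour is easy to see.
      public class EmptyPrimaryGuard {

        // 6.5: the replica only clears its index when replication was forced.
        static boolean clearsIndex65(long latestVersion, long localGeneration,
                                     boolean forceReplication) {
          return latestVersion == 0L && forceReplication && localGeneration != 0;
        }

        // 7.5: the forceReplication guard is gone, so any poll that sees
        // version 0 on the primary clears the replica.
        static boolean clearsIndex75(long latestVersion, long localGeneration) {
          return latestVersion == 0L && localGeneration != 0;
        }

        public static void main(String[] args) {
          long latestVersion = 0L;    // a freshly rebuilt primary reports version 0
          long localGeneration = 42L; // the replica still has a populated index

          // 6.5: a normal (unforced) poll leaves the replica's documents alone.
          System.out.println("6.5 clears: "
              + clearsIndex65(latestVersion, localGeneration, false)); // false

          // 7.5: the same poll wipes the replica down to 0 documents.
          System.out.println("7.5 clears: "
              + clearsIndex75(latestVersion, localGeneration)); // true
        }
      }

As far as we can tell from the 7.5 snippet, the skipCommitOnMasterVersionZero
flag only controls what happens after the deleteAll (open a new searcher vs.
issue a commit), not whether the deleteAll runs at all.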

This is a problem, as we can't afford to have active replicas end up with 0
documents in the event of a primary failure. Since we can't control when AWS
terminates instances, any primary outage now has a chance of jeopardizing
the replicas' viability.

Is there a way to restore this behavior in a current or future release? We
are willing to upgrade to a later version, including the latest, if that
will help resolve this problem.

In case you suggest using a load balancer health check to prevent this: we
already do. However, the load balancer type we are using (an AWS Application
Load Balancer) still routes requests when all of the instances behind it are
failing their health checks. This bypasses our health check and lets the
replicas poll the primary even when it's not fully loaded. We can't change
load balancer types, as we rely on other features of this one that we can't
give up right now.
                



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
