Note: it seems that Solr's current replication logic relies on the master's
disks being persistent.
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java#L615
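
Also, if I read the 7.5 code you quoted correctly, the commit after
deleteAll is gated by skipCommitOnMasterVersionZero. A sketch of a
slave-side config that sets it — this is an assumption on my part that the
flag is read from the slave section of the replication handler, and the
masterUrl/pollInterval values are placeholders, so please verify against
your version:

```xml
<!-- Hypothetical slave-side sketch; verify the flag name and placement
     against your Solr version's ReplicationHandler before relying on it. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- placeholder URL for the primary -->
    <str name="masterUrl">http://primary-host:8983/solr/corename</str>
    <str name="pollInterval">00:00:60</str>
    <!-- assumption: when the master reports index version 0, skip the
         commit so the slave keeps serving its last commit point -->
    <str name="skipCommitOnMasterVersionZero">true</str>
  </lst>
</requestHandler>
```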


On Tue, Jun 25, 2019 at 3:16 PM Mikhail Khludnev <m...@apache.org> wrote:

> Hello, Patrick.
> Can <str name="replicateAfter">commit</str> help you?
>
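> For reference, a sketch of where that setting lives, assuming the usual
> master-section layout of solrconfig.xml (the confFiles value is just an
> example):

```xml
<!-- Sketch only: master-side replication handler with replicateAfter=commit,
     so slaves only see a new index version after an explicit commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```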
> On Tue, Jun 25, 2019 at 12:55 AM Patrick Bordelon <
> patrick.borde...@coxautoinc.com> wrote:
>
>> Hi,
>>
>> We recently upgraded to SOLR 7.5 in AWS, we had previously been running
>> SOLR
>> 6.5. In our current configuration we have our applications broken into a
>> single instance primary environment and a multi-instance replica
>> environment
>> separated behind a load balancer for each environment.
>>
>> Until recently we've been able to reload the primary without the replicas
>> updating until there was a full index. However when we upgraded to 7.5 we
>> started noticing that after terminating and rebuilding a primary instance
>> that the associated replicas would all start showing 0 documents in all
>> indexes. After some research we believe we've tracked down the issue.
>> SOLR-11293.
>>
>> SOLR-11293 changes
>> <
>> https://issues.apache.org/jira/browse/SOLR-11293?focusedCommentId=16182379&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16182379>
>>
>>
>> This fix changed the way the replication handler checks before updating a
>> replica when the primary has an empty index. Whether it's from deleting
>> the
>> old index or from terminating the instance.
>>
>> This is the code as it was in 6.5 replication handler
>>
>>       if (latestVersion == 0L) {
>>         if (forceReplication && commit.getGeneration() != 0) {
>>           // since we won't get the files for an empty index,
>>           // we just clear ours and commit
>>           RefCounted<IndexWriter> iw = solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>>           try {
>>             iw.get().deleteAll();
>>           } finally {
>>             iw.decref();
>>           }
>>           SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new ModifiableSolrParams());
>>           solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
>>         }
>>
>>
>> Without forced replication, the replica won't perform the deleteAll
>> operation and will keep its old index until a new index version is
>> created.
>>
>> However in 7.5 the code was changed to this.
>>
>>       if (latestVersion == 0L) {
>>         if (commit.getGeneration() != 0) {
>>           // since we won't get the files for an empty index,
>>           // we just clear ours and commit
>>           log.info("New index in Master. Deleting mine...");
>>           RefCounted<IndexWriter> iw = solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>>           try {
>>             iw.get().deleteAll();
>>           } finally {
>>             iw.decref();
>>           }
>>           assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
>>           if (skipCommitOnMasterVersionZero) {
>>             openNewSearcherAndUpdateCommitPoint();
>>           } else {
>>             SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new ModifiableSolrParams());
>>             solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
>>           }
>>         }
>>
>> With the forceReplication check removed, we believe the replica always
>> deletes its index when it detects that a new version-0 index has been
>> created on the primary.
>>
>> This is a problem: we can't afford to have active replicas end up with 0
>> documents in the event of a primary failure. Since we can't control when
>> AWS terminates instances, any primary outage risks wiping out the
>> replicas.
>>
>> Is there a way to restore this behavior in a current or future release?
>> We are willing to upgrade to a later version, including the latest, if
>> that would resolve the problem.
>>
>> If you're going to suggest a load balancer health check to prevent this:
>> we already use one. However, the load balancer type we're using
>> (application) allows traffic through when all instances behind it are
>> failing. This bypasses our health check and lets the replicas poll the
>> primary even when it isn't fully loaded. We can't switch load balancer
>> types, as we depend on other features of this one.
>>
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev
