[ https://issues.apache.org/jira/browse/KUDU-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin resolved KUDU-2800.
---------------------------------
    Fix Version/s: 1.11.0
                   1.11.1
       Resolution: Invalid

As it turned out, the system already handles a replica that is being 
bootstrapped in a consistent way.  In particular:

* If a tablet replica takes a long time to bootstrap and no updates happen 
during that time, the replica is not evicted from the tablet Raft configuration 
and joins the quorum after the bootstrapping process is finished (even if the 
replica was booting up longer than specified by the 
{{--follower_unavailable_considered_failed_sec}} flag).
* If a tablet replica takes a long time to bootstrap and a lot of updates 
happen during that time, the replica is replaced if it falls behind the WAL GC 
threshold.  The replica eventually finishes the bootstrap process and finds out 
that it has been evicted from the tablet Raft configuration.
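
The two outcomes above can be illustrated with a simplified model, assuming that eviction of a long-bootstrapping replica hinges only on whether the next operation it needs is still present in the leader's WAL.  This is a sketch, not Kudu's implementation: {{LeaderWal}}, {{FallsBehindWalGcThreshold}} and {{OutcomeAfterLongBootstrap}} are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of the leader replica's WAL.
// These names are illustrative only; they are not Kudu's types.
struct LeaderWal {
  int64_t min_retained_index;  // oldest op index still present after WAL GC
  int64_t last_index;          // newest op index appended by the leader
};

// True if the next op the follower needs has already been garbage-collected
// from the leader's WAL, so the follower cannot catch up from the log.
bool FallsBehindWalGcThreshold(const LeaderWal& wal,
                               int64_t follower_last_index) {
  assert(wal.min_retained_index <= wal.last_index + 1);
  const int64_t next_needed = follower_last_index + 1;
  return next_needed < wal.min_retained_index;
}

enum class BootstrapOutcome { kRejoinsQuorum, kEvictedAndReplaced };

// The time spent bootstrapping is deliberately not an input here: under this
// model only the leader's WAL GC progress decides the outcome.
BootstrapOutcome OutcomeAfterLongBootstrap(const LeaderWal& wal,
                                           int64_t follower_last_index) {
  return FallsBehindWalGcThreshold(wal, follower_last_index)
             ? BootstrapOutcome::kEvictedAndReplaced
             : BootstrapOutcome::kRejoinsQuorum;
}
```

For example, with no updates the leader's WAL might still retain ops 1 through 100, so a follower stopped at op 100 rejoins the quorum.  If heavy writes and WAL GC leave only ops 5000 through 9000, the same follower is replaced.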

New tests have been added in [this 
commit|https://github.com/apache/kudu/commit/90ec2715171bbf3fca294d5b1d230f951c71e75c] 
to verify that the current behavior is consistent across various scenarios of 
bootstrapping a tablet replica.

> Avoid 'unintended' re-replication of long-bootstrapping tablet replicas
> -----------------------------------------------------------------------
>
>                 Key: KUDU-2800
>                 URL: https://issues.apache.org/jira/browse/KUDU-2800
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus, tserver
>    Affects Versions: 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.9.1, 1.10.0
>            Reporter: Alexey Serbin
>            Assignee: Vladimir Verjovkin
>            Priority: Major
>              Labels: newbie
>             Fix For: 1.11.1, 1.11.0
>
>
> As implemented in
> https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576
>  , the logic for tracking 'health' of tablet replicas cannot differentiate 
> between bootstrapping and failed replicas.
> As a result, if a tablet replica bootstraps for longer than the 
> interval specified by the {{--follower_unavailable_considered_failed_sec}} 
> run-time flag, the system may start re-replicating the tablet replica 
> elsewhere.
> One option might be sending a specific error with {{ConsensusResponsePB}} in 
> response to a Raft message sent by a leader replica, possibly adding extra 
> information on the current progress of the replica bootstrap process.  As 
> long as such a bootstrapping follower replica isn't falling behind the 
> leader's WAL GC threshold, the leader replica will not evict it.  But if the 
> bootstrapping follower replica falls behind the WAL GC threshold, the leader 
> replica will evict it and the system will start re-replicating it elsewhere.  
> When the number of Raft transactions for a tablet is low, this approach would 
> allow for longer bootstrapping times of tablet replicas.  That might be 
> especially beneficial when a tablet server with IO-heavy tablet replicas is 
> restarted and there aren't many incoming updates/inserts for the tablets 
> hosted by that tablet server.
> However, the approach above requires the Raft consensus object for a 
> bootstrapping replica to be at least partially functional, so it entails 
> reading at least some information about a replica from the on-disk consensus 
> metadata before the tablet server properly bootstraps the tablet replica.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
