Re: replace dead node vs remove node

Jonathan Haddad Thu, 22 Mar 2018 20:35:59 -0700

Under normal circumstances this is not true.

Take a look at org.apache.cassandra.service.StorageProxy#performWrite: it
grabs both the natural endpoints and the pending endpoints (new nodes).
They're eventually passed through to
org.apache.cassandra.locator.AbstractReplicationStrategy#getWriteResponseHandler,
which keeps track of both the current endpoints and the pending ones.
Later, it gets to the actual work:


performer.apply(mutation,
                Iterables.concat(naturalEndpoints, pendingEndpoints),
                responseHandler,
                localDataCenter,
                consistency_level);

The signature of this method is:

public interface WritePerformer
{
    public void apply(IMutation mutation,
                      Iterable<InetAddressAndPort> targets,
                      AbstractWriteResponseHandler<IMutation> responseHandler,
                      String localDataCenter,
                      ConsistencyLevel consistencyLevel) throws OverloadedException;
}

Notice the targets?  That's the list of all current owners and pending
owners.  The list is a concatenation of the natural endpoints and the
pending ones.
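
To make the concatenation concrete, here is a minimal standalone sketch. It is not Cassandra code: the class name WriteTargetsSketch, the writeTargets helper, and the use of plain strings for endpoints are all made up for illustration, and it uses java.util lists instead of Guava's Iterables.concat.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteTargetsSketch {
    /**
     * Hypothetical helper: returns the natural endpoints followed by the
     * pending endpoints, the same ordering Iterables.concat would produce.
     */
    public static List<String> writeTargets(List<String> naturalEndpoints,
                                            List<String> pendingEndpoints) {
        List<String> targets = new ArrayList<>(naturalEndpoints);
        targets.addAll(pendingEndpoints);
        return targets;
    }

    public static void main(String[] args) {
        // Three existing replicas plus one node that is still bootstrapping.
        List<String> natural = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        List<String> pending = List.of("10.0.0.4");
        // The mutation would be sent to every address in this list.
        System.out.println(writeTargets(natural, pending));
    }
}
```

The point is only that the bootstrapping node appears in the target list alongside the existing replicas.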

Pending owners are tracked in org.apache.cassandra.locator.TokenMetadata:

// this is a cache of the calculation from {tokenToEndpointMap, bootstrapTokens, leavingEndpoints}
private final ConcurrentMap<String, PendingRangeMaps> pendingRanges = new ConcurrentHashMap<String, PendingRangeMaps>();
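
As a rough illustration of how a keyspace-keyed cache like the pendingRanges map above can be consulted, here is a simplified sketch. It is an assumption-heavy toy: PendingRangesSketch, setPending and pendingEndpointsFor are hypothetical names, and it maps each keyspace straight to a list of endpoint strings, whereas the real PendingRangeMaps tracks pending endpoints per token range.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PendingRangesSketch {
    // keyspace name -> endpoints that currently have pending ranges
    // (hypothetical simplification; no token ranges are modeled here)
    private final ConcurrentMap<String, List<String>> pendingByKeyspace =
            new ConcurrentHashMap<>();

    /** Records the pending endpoints for a keyspace, replacing any previous entry. */
    public void setPending(String keyspace, List<String> endpoints) {
        pendingByKeyspace.put(keyspace, List.copyOf(endpoints));
    }

    /** Returns the pending endpoints for a keyspace, or an empty list if none are known. */
    public List<String> pendingEndpointsFor(String keyspace) {
        return pendingByKeyspace.getOrDefault(keyspace, List.of());
    }
}
```

A write path built on this toy would look up the pending endpoints for the mutation's keyspace and append them to the natural endpoints before sending.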


TL;DR: mutations are sent to nodes being bootstrapped.

Jon


On Thu, Mar 22, 2018 at 8:09 PM Peng Xiao <2535...@qq.com> wrote:

> Hi Anthony,
>
> There is a problem with replacing a dead node as per the blog: if the
> replacement process takes longer than max_hint_window_in_ms, we must run
> repair to make the replaced node consistent again, since it missed ongoing
> writes during bootstrapping. But for a large cluster, repair is a painful
> process.
>
> Thanks,
> Peng Xiao
>
>
>
> ------------------ Original Message ------------------
> *From:* "Anthony Grasso"<anthony.gra...@gmail.com>;
> *Sent:* Thursday, March 22, 2018, 7:13 PM
> *To:* "user"<user@cassandra.apache.org>;
> *Subject:* Re: replace dead node vs remove node
>
> Hi Peng,
>
> Depending on the hardware failure you can do one of two things:
>
> 1. If the disks are intact and uncorrupted you could just use the disks
> with the current data on them in the new node. Even if the IP address
> changes for the new node that is fine. In that case all you need to do is
> run repair on the new node. The repair will fix any writes the node missed
> while it was down. This process is similar to the scenario in this blog
> post:
> http://thelastpickle.com/blog/2018/02/21/replace-node-without-bootstrapping.html
>
> 2. If the disks are inaccessible or corrupted, then use the method
> described in the blog post you linked to. The operation is similar to
> bootstrapping a new node. There is no need to perform any other remove or
> join operation on the failed or new nodes. As per the blog post, you
> definitely want to run repair on the new node as soon as it joins the
> cluster. In this case, the data on the failed node is effectively lost
> and replaced with data from other nodes in the cluster.
>
> Hope this helps.
>
> Regards,
> Anthony
>
>
> On Thu, 22 Mar 2018 at 20:52, Peng Xiao <2535...@qq.com> wrote:
>
>> Dear All,
>>
>> When a node fails with hardware errors, it will be in DN status in the
>> cluster. If we are not able to handle this error within three hours (the max
>> hints window), we will lose data, right? We would have to run repair to
>> restore consistency.
>> And as per
>> https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
>> we can replace this dead node. Is that the same as bootstrapping a new node?
>> Does that mean we don't need to remove the node and rejoin?
>> Could anyone please advise?
>>
>> Thanks,
>> Peng Xiao
>>
>>
>>
>>
>>
