Subrange repair of only the neighbors is sufficient.

Break the range covering the dead node into ~100 splits and repair those splits 
individually, in sequence. You don't have to repair the whole range all at once.
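
A minimal sketch of the splitting step, assuming the Murmur3 partitioner (integer tokens) and a non-wrapping token range whose start and end you have already looked up, for example with `nodetool describering`. The function names, the keyspace name and the token values are hypothetical placeholders; the script only prints `nodetool repair -st ... -et ...` commands so they can be reviewed and then run one after another.

```python
def split_token_range(start, end, splits=100):
    """Split the token range (start, end] into `splits` contiguous subranges.

    Assumption: the range does not wrap around the ring (start < end).
    """
    if start >= end:
        raise ValueError("this sketch only handles non-wrapping ranges (start < end)")
    width = end - start
    if splits < 1 or splits > width:
        raise ValueError("splits must be between 1 and the width of the range")
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [start + (width * i) // splits for i in range(splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, start, end, splits=100):
    """Build one subrange repair command per split, in ring order."""
    return [
        f"nodetool repair -st {st} -et {et} {keyspace}"
        for st, et in split_token_range(start, end, splits)
    ]


if __name__ == "__main__":
    # Hypothetical keyspace and token range; substitute the ranges that the
    # dead node owned in your cluster.
    for cmd in repair_commands("my_keyspace", 1000000000000000000, 2000000000000000000):
        print(cmd)
```

Running the commands strictly in sequence keeps the load of each repair session small, and a failed split can be retried on its own without redoing the rest of the range.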



-- 
Jeff Jirsa


> On Mar 22, 2018, at 8:08 PM, Peng Xiao <2535...@qq.com> wrote:
> 
> Hi Anthony,
> 
> There is a problem with replacing a dead node as per the blog: if the 
> replacement process takes longer than max_hint_window_in_ms, we must run 
> repair to make the replaced node consistent again, since it missed ongoing 
> writes during bootstrapping. But for a large cluster, repair is a painful 
> process.
>  
> Thanks,
> Peng Xiao
> 
> 
> 
> ------------------ Original Message ------------------
> From: "Anthony Grasso"<anthony.gra...@gmail.com>;
> Sent: Thursday, March 22, 2018, 7:13 PM
> To: "user"<user@cassandra.apache.org>;
> Subject: Re: replace dead node vs remove node
> 
> Hi Peng,
> 
> Depending on the hardware failure you can do one of two things:
> 
> 1. If the disks are intact and uncorrupted you could just use the disks with 
> the current data on them in the new node. Even if the IP address changes for 
> the new node that is fine. In that case all you need to do is run repair on 
> the new node. The repair will fix any writes the node missed while it was 
> down. This process is similar to the scenario in this blog post: 
> http://thelastpickle.com/blog/2018/02/21/replace-node-without-bootstrapping.html
> 
> 2. If the disks are inaccessible or corrupted, then use the method 
> described in the blog post you linked to. The operation is similar to 
> bootstrapping a new node. There is no need to perform any other remove or 
> join operation on the failed or new nodes. As per the blog post, you 
> definitely want to run repair on the new node as soon as it joins the 
> cluster. In this case, the data on the failed node is effectively lost 
> and replaced with data from other nodes in the cluster.
> 
> Hope this helps.
> 
> Regards,
> Anthony
> 
> 
>> On Thu, 22 Mar 2018 at 20:52, Peng Xiao <2535...@qq.com> wrote:
>> Dear All,
>> 
>> When a node fails with hardware errors, it will be in DN status in the 
>> cluster. If we are not able to handle this error within three hours (the max 
>> hint window), we will lose data, right? We have to run repair to restore 
>> consistency.
>> As per 
>> https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html, 
>> we can replace this dead node. Is that the same as bootstrapping a new node? 
>> Does that mean we don't need to remove the node and rejoin?
>> Could anyone please advise?
>> 
>> Thanks,
>> Peng Xiao
>> 
>>  
>> 
>> 
