Re: Recover lost node from backup or evict/re-add?

Jeff Jirsa Thu, 13 Jun 2019 06:16:45 -0700


> On Jun 13, 2019, at 2:52 AM, Oleksandr Shulgin <oleksandr.shul...@zalando.de> 
> wrote:
> 
>> On Wed, Jun 12, 2019 at 4:02 PM Jeff Jirsa <jji...@gmail.com> wrote:
> 
>> To avoid violating consistency guarantees, you have to repair the replicas 
>> while the lost node is down.
> 
> How do you suggest triggering it? Potentially, replicas of the primary range 
> for the down node are all over the local DC, so I would go with triggering a 
> full cluster repair with Cassandra Reaper. But isn't it going to fail 
> because of the down node?


I'm not sure there's an easy and obvious path here; this is something TLP (The 
Last Pickle) may want to enhance Reaper to help with.

You have to specify the ranges with -st/-et, and you have to restrict the repair 
to the live replicas with -hosts so that the down host is left out. With vnodes, 
you're right that this may mean a large number of ranges spread all over the ring.
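A minimal sketch of building those commands, assuming SimpleStrategy-style placement (a range's replicas are the owner of its end token plus the next distinct nodes clockwise) and a hypothetical `ring` dict mapping each token to a node address. NetworkTopologyStrategy with racks places replicas differently, so in practice the per-range endpoints are better taken from `nodetool describering <keyspace>`. `replica_ranges` and `repair_commands` are illustrative names, and passing `-hosts` once per host is an assumption; check `nodetool help repair` for your version.

```python
# Sketch: list every token range a lost node replicates, plus the live
# replicas to repair it with. Assumes SimpleStrategy-style placement.

def replica_ranges(ring, rf, lost):
    """Return [(start, end, live_replicas)] for each range `lost` replicates.

    ring: dict of token -> node address (hypothetical input format).
    A range is (previous_token, token]; the first range wraps around the
    ring, so its start is greater than its end.
    """
    tokens = sorted(ring)
    n = len(tokens)
    result = []
    for i, end in enumerate(tokens):
        start = tokens[i - 1]  # tokens[-1] when i == 0: the wrapping range
        replicas = []
        j = i
        # Walk clockwise from the owning token, collecting distinct nodes.
        while len(replicas) < rf and j < i + n:
            node = ring[tokens[j % n]]
            if node not in replicas:
                replicas.append(node)
            j += 1
        if lost in replicas:
            live = [r for r in replicas if r != lost]
            result.append((start, end, live))
    return result


def repair_commands(ring, rf, lost, keyspace):
    """Build one `nodetool repair` argv per affected range (illustrative)."""
    commands = []
    for start, end, live in replica_ranges(ring, rf, lost):
        # -full forces a full (non-incremental) repair of just this range.
        cmd = ["nodetool", "repair", "-full", "-st", str(start), "-et", str(end)]
        for host in live:
            cmd += ["-hosts", host]  # assumption: one -hosts flag per host
        cmd.append(keyspace)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Toy ring: two vnode tokens per node, RF=3, node "c" is lost.
    ring = {-900: "a", -500: "b", -100: "c", 200: "a", 600: "d", 800: "c"}
    for cmd in repair_commands(ring, 3, "c", "my_keyspace"):
        print(" ".join(cmd))
```

With vnodes the `ring` dict simply holds many tokens per node, which is why the command list grows so quickly. Each command would presumably be run on one of the live hosts it lists.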

There's a patch proposed (maybe committed in 4.0) that makes this a non-issue by 
allowing bootstrap to stream the repaired data from one replica and the 
unrepaired data from all replicas (the unrepaired set is probably very small if 
you run incremental repair regularly), which accomplishes the same thing.
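A toy model of that streaming plan, to show why it is equivalent to repairing first: data already marked repaired should be identical on every replica, so one source is enough, while unrepaired data may differ between replicas and therefore has to come from all of them. This is not the patch's code; `bootstrap_stream_plan` and its input format are hypothetical, and the real source selection is likely more involved.

```python
# Toy model of the bootstrap streaming idea (not the actual patch):
# repaired data is assumed consistent across replicas, unrepaired data is not.

def bootstrap_stream_plan(ranges):
    """ranges: dict of (start, end) -> list of live replicas (hypothetical)."""
    plan = {}
    for token_range, replicas in ranges.items():
        if not replicas:
            raise ValueError(f"no live replica for range {token_range}")
        plan[token_range] = {
            "repaired_from": replicas[0],        # any single replica suffices
            "unrepaired_from": list(replicas),   # every live replica
        }
    return plan


if __name__ == "__main__":
    plan = bootstrap_stream_plan({(0, 100): ["b", "d"], (100, 200): ["d", "a"]})
    for token_range, sources in plan.items():
        print(token_range, sources)
```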

> 
> It is also documented (I believe) that one should repair the node after it 
> finishes the "replace address" procedure.  So should one repair before and 
> after?

You do not need to repair after the bootstrap if you repair beforehand. If the 
docs say you must, they're wrong. The joining host receives writes during 
bootstrap, and the number of acknowledgements a write's consistency level 
requires is raised during bootstrap to account for the joining host.
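As a rough illustration of that last point, a simplified model, assuming the coordinator adds one required acknowledgement per pending (joining) replica of the token on top of what the consistency level normally needs. `write_block_for` is a hypothetical name, per-DC levels such as LOCAL_QUORUM are ignored, and this is not Cassandra's code.

```python
# Simplified model of write acknowledgements while a node is joining:
# the consistency level's normal requirement plus one ack per pending replica.

def write_block_for(consistency, rf, pending=0):
    """Acks required for a write; `pending` = joining replicas for the token."""
    base = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }[consistency]
    return base + pending


if __name__ == "__main__":
    # RF=3, QUORUM: 2 acks normally, 3 while one node is bootstrapping.
    print(write_block_for("QUORUM", 3), write_block_for("QUORUM", 3, pending=1))
```

The extra acknowledgement means that even if the joining node is one of the replicas that acknowledged a QUORUM write, a quorum of the existing replicas still has it, so a QUORUM read against the existing replicas overlaps with the write.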
