Can anyone help me with join_ring and address my concerns?

Thanks
Anuj 
 
  On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote: 
   Hi,
I need to understand the use case of join_ring=false in the case of node outages.
As per https://issues.apache.org/jira/browse/CASSANDRA-6961, you would want
join_ring=false when you have to repair a node before bringing it back after a
considerable outage. The problem I see with join_ring=false is that, unlike
autobootstrap, the node will NOT accept writes while you are running repair on
it. If a node was down for 5 hours and you bring it back with join_ring=false,
repair it for 7 hours and then make it join the ring, it will STILL have missed
writes, because for the whole time repair was running (7 hrs), writes only went
to the other replicas. So, if you want reads served by the restored node at
CL ONE to return consistent data after the node has joined, you won't get that,
because writes were missed while the node was being repaired. And if you use
read/write CL=QUORUM, you would get the desired consistency anyway, even if you
bring the node back without join_ring=false. So, how would join_ring=false
provide any additional consistency in this case?
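To make the timing argument concrete, here is a toy Python sketch. The rules are my simplifying assumptions, not Cassandra internals: time is counted in whole hours from the start of the outage; while started with join_ring=false the node gets neither writes nor hints (the premise of the question); repair brings the node up to date with everything written before repair starts and nothing after; and on a normal restart, hints cover only the first `hint_window` hours of the outage. `missed_hours` is a hypothetical helper, not a Cassandra API.

```python
# Toy model of the write gap; hours are counted from the start of the outage.
# All rules below are illustrative assumptions, not Cassandra internals.

def missed_hours(outage, repair, hint_window, join_ring_false):
    """Hypothetical helper: hours whose writes the node still lacks
    at the moment it starts serving reads."""
    if join_ring_false:
        # Assumed: repair (hours outage .. outage+repair) covers every write
        # made before it started; the node gets no writes or hints meanwhile.
        return list(range(outage, outage + repair))
    # Assumed: a normal restart serves reads at once; hints replay the first
    # `hint_window` hours of the outage, the rest stays missing until a repair.
    return list(range(hint_window, outage))


# The example from the question: down 5 h, repaired for 7 h, 3 h hint window.
print(missed_hours(5, 7, 3, join_ring_false=True))   # [5, 6, 7, 8, 9, 10, 11]
print(missed_hours(5, 7, 3, join_ring_false=False))  # [3, 4]
```

Under these assumptions the node that waited for repair still lacks hours 5 to 11 when it finally joins, which is exactly the gap the question is about.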
I can see join_ring=false being useful only when I am restoring from a snapshot
or bootstrapping, and there are dropped mutations in my cluster that were not
fixed by hinted handoff.
For example: 3 nodes A, B, C working at read/write CL QUORUM. Hinted handoff
window = 3 hrs.
10 AM: Snapshot taken on all 3 nodes.
11 AM: Node B goes down for 4 hours.
3 PM: Node B comes up, but its data is not repaired. So, 1 hr of dropped
mutations (2-3 PM) is not fixed via hinted handoff.
5 PM: Node A crashes.
6 PM: Node A is restored from the 10 AM snapshot, started with join_ring=false,
repaired and then joined to the cluster.
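The same timeline as a toy Python model, to check which QUORUM (2 of 3) replica pairs can miss the 2-3 PM writes. Assumptions, for illustration only: each hour slot from 10 AM to 5 PM (slots 10 to 16, where slot 14 = 2-3 PM) carries one write; a replica either holds a slot or it doesn't; B receives hints only for the first 3 hours of its outage; the 10 AM snapshot contains none of these slots; and "repair" is reduced to A taking the union of B's and C's data, a simplification of what Cassandra's repair actually does. `b_slots`, `replicas` and `quorum_misses` are hypothetical helpers.

```python
from itertools import combinations

SLOTS = set(range(10, 17))   # hourly write slots, 10 AM .. 5 PM; slot 14 = 2-3 PM
HINT_WINDOW = 3              # hours, as in the example


def b_slots():
    """Slots node B holds: down 11 AM - 3 PM, hints only for the first 3 hours."""
    down_from, down_to = 11, 15
    hinted = set(range(down_from, down_from + HINT_WINDOW))
    return {h for h in SLOTS if not down_from <= h < down_to} | hinted


def replicas(repair_a):
    """Data held by each replica at 6 PM, before A serves any reads."""
    b, c = b_slots(), set(SLOTS)            # C never went down; B never repaired
    a = set()                               # restored from the 10 AM snapshot
    if repair_a:
        a = b | c                           # toy repair: union of the peers' data
    return {"A": a, "B": b, "C": c}


def quorum_misses(reps, slot):
    """QUORUM (2 of 3) replica pairs that together lack `slot`."""
    return [pair for pair in combinations(sorted(reps), 2)
            if not any(slot in reps[r] for r in pair)]


print(quorum_misses(replicas(repair_a=False), 14))  # [('A', 'B')]
print(quorum_misses(replicas(repair_a=True), 14))   # []
```

In this model only the pair that excludes C misses the 2-3 PM writes, and repairing A before it serves reads closes that hole.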
In the snapshot restore example above, the updates from 2-3 PM were outside the
hinted handoff window of 3 hours, so node B won't get them. Node A's data for
2-3 PM is already lost. So, the 2-3 PM updates exist on only one replica, node C.
Since the minimum consistency needed is QUORUM (2 of 3 replicas), a read served
by A and B alone would miss them, so join_ring=false would help here: A can be
repaired before it starts serving reads. But this is a very specific use case.
Thanks
Anuj
  
