Re: Confusion regarding the terms "replica" and "replication factor"

David Fischer Wed, 30 May 2012 12:57:54 -0700

Thanks!

My misunderstanding was that the snitch names are broken up as DC1:RAC1,
and strategy_options takes only the first part (the data center) of the
snitch names?
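To make the DC:RACK split concrete, here is a minimal Python sketch. It assumes PropertyFileSnitch-style entries of the form `ip=DC:RACK` (the addresses are made up, and the helper names `parse_topology`, `datacenters` and `check_strategy_options` are hypothetical). It keeps only the data center half of each entry and checks that every strategy_options key names a known data center. The rack half is never referenced by strategy_options; NetworkTopologyStrategy does use racks to spread replicas within a DC, which this sketch ignores.

```python
# Hypothetical node addresses; the "ip=DC:RACK" layout mirrors
# PropertyFileSnitch's cassandra-topology.properties entries.
TOPOLOGY = """
10.0.1.1=DC1:RAC1
10.0.1.2=DC1:RAC2
10.0.2.1=DC2:RAC1
10.0.3.1=DC3:RAC1
"""

def parse_topology(text):
    """Map each node address to a (datacenter, rack) pair."""
    nodes = {}
    for line in text.strip().splitlines():
        address, location = line.split("=", 1)
        dc, rack = location.split(":", 1)
        nodes[address.strip()] = (dc.strip(), rack.strip())
    return nodes

def datacenters(nodes):
    """strategy_options is keyed by the DC part only, so collect those."""
    return {dc for dc, _rack in nodes.values()}

def check_strategy_options(options, nodes):
    """Return any strategy_options keys that do not name a known DC."""
    return sorted(set(options) - datacenters(nodes))

nodes = parse_topology(TOPOLOGY)
print(sorted(datacenters(nodes)))                                     # ['DC1', 'DC2', 'DC3']
print(check_strategy_options({"DC1": 1, "DC2": 1, "DC3": 1}, nodes))  # []
print(check_strategy_options({"DC1:RAC1": 1}, nodes))                 # ['DC1:RAC1']
```

A full `DC1:RAC1` string used as a key is reported as unknown, because only the data center name is meaningful there.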




On Wed, May 30, 2012 at 12:14 PM, Jeff Williams
<je...@wherethebitsroam.com> wrote:
> First, note that replication is done at the row level, not at the node level.
>
> That line should look more like:
>
> placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {DC1: 1, DC2: 1, DC3: 1}
>
> This means that each row will have one copy in each DC, and within each DC
> its placement will be determined by the partitioner, so it could be on any
> of the nodes in that DC.
>
> So, don't think of it as nodes replicating, but rather as each DC storing
> a copy of each row.
>
> Also, replication is not related to the seed nodes. Seed nodes allow the
> nodes to find each other initially, but are not special otherwise; any node
> can be used as a seed node.
>
> So if you had a strategy like:
>
> placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {DC1: 3, DC2: 2, DC3: 1}
>
> Each row would exist on 3 of the 4 nodes in DC1, 2 of the 4 nodes in DC2,
> and on one of the nodes in DC3. Again, placement within each DC is
> determined by the partitioner, based on the row key.
>
> Jeff
>
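Jeff's description of NetworkTopologyStrategy can be sketched in Python as a walk around a single token ring, starting at the node that owns the key's token and handing nodes to each DC named in strategy_options until that DC has its replica count. This is a simplified model under stated assumptions: the toy 0..99 token space, the node names and the function `nts_replicas` are hypothetical, and racks are ignored even though the real strategy also tries to spread replicas across racks.

```python
import bisect

# Hypothetical ring: 4 nodes per DC on a toy 0..99 token space. Each DC's
# tokens are evenly spaced and shifted by one so every token is unique.
RING = sorted(
    (i * 25 + offset, f"{dc}-node{i + 1}", dc)
    for offset, dc in enumerate(["DC1", "DC2", "DC3"])
    for i in range(4)
)

def nts_replicas(ring, key_token, strategy_options):
    """Walk clockwise from the node owning key_token, giving each DC named in
    strategy_options nodes until it reaches its count. DCs not named get none.
    Racks are ignored; real NTS also tries to spread replicas across racks."""
    tokens = [token for token, _node, _dc in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    replicas = {dc: [] for dc in strategy_options}
    for step in range(len(ring)):
        _token, node, dc = ring[(start + step) % len(ring)]
        if dc in replicas and len(replicas[dc]) < strategy_options[dc]:
            replicas[dc].append(node)
        if all(len(replicas[d]) == n for d, n in strategy_options.items()):
            break
    return replicas

# Jeff's second example: {DC1: 3, DC2: 2, DC3: 1}
placement = nts_replicas(RING, 10, {"DC1": 3, "DC2": 2, "DC3": 1})
print({dc: len(nodes) for dc, nodes in placement.items()})
# → {'DC1': 3, 'DC2': 2, 'DC3': 1}

# The original question: DC3 exists but is not listed. Token 2 is owned by
# DC3-node1, yet only DC1 and DC2 receive replicas (4 in total, not 5).
placement = nts_replicas(RING, 2, {"DC1": 2, "DC2": 2})
print(sum(len(nodes) for nodes in placement.values()))  # → 4
```

The second call models the scenario in the first message of the thread: a DC that is missing from strategy_options never receives a replica, even when one of its nodes owns the key's range.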
> On May 29, 2012, at 11:25 PM, David Fischer wrote:
>
>> OK, now I am confused :)
>>
>> OK, if I have the following:
>> placement_strategy = 'NetworkTopologyStrategy'  and strategy_options =
>> {DC1:R1,DC2:R1,DC3:R1 }
>>
>> does this mean that in each of my datacenters I will have one full
>> replica that can also be a seed node?
>> If I have 3 nodes in addition to the DC replicas, then with normal token
>> calculations a key can be in any datacenter, plus on each of the
>> replicas, right?
>> It will show 12 nodes total in its ring.
>>
>> On Thu, May 24, 2012 at 2:39 AM, aaron morton <aa...@thelastpickle.com> 
>> wrote:
>>> This is partly historical. NTS (as it is now) has not always existed and 
>>> was not always the default. In days gone by used to be a fella could run a 
>>> mighty fine key-value store using just a Simple Replication Strategy.
>>>
>>> A different way to visualise it is a single ring with a Z axis for the 
>>> DCs. When you look at the ring from the top you can see all the nodes. 
>>> When you look at it from the side you can see the nodes are on levels that 
>>> correspond to their DC. Simple Strategy looks at the ring from the top. NTS 
>>> works through the layers of the ring.
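Aaron's "view from the top" can be sketched for SimpleStrategy in Python: the first replica is the node owning the key's range, and the remaining replicas are simply the next nodes clockwise. DC labels are carried along but never consulted. The six-node ring, its small tokens and the function name `simple_strategy_replicas` are hypothetical; a real cluster uses the full token range.

```python
import bisect

def simple_strategy_replicas(ring, key_token, replication_factor):
    """ring: list of (token, node, dc) sorted by token. DC labels are ignored;
    the first replica owns the key's range, the rest are the next nodes
    clockwise, wrapping past the highest token."""
    tokens = [token for token, _node, _dc in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical six-node ring spread over two DCs (tokens are illustrative).
ring = [
    (0, "a", "DC1"), (10, "b", "DC2"), (20, "c", "DC1"),
    (30, "d", "DC2"), (40, "e", "DC1"), (50, "f", "DC2"),
]
print(simple_strategy_replicas(ring, 25, 3))  # ['d', 'e', 'f']
print(simple_strategy_replicas(ring, 55, 2))  # ['a', 'b']  (wraps around)
```

Here the replicas land in whichever DCs happen to come next on the ring, which is exactly what a DC-aware strategy avoids.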
>>>
>>>> If the hierarchy is Cluster ->
>>>> DataCenter -> Node, why exactly do we need globally unique node tokens
>>>> even though nodes are at the lowest level in the hierarchy?
>>> Nodes having a DC is a feature of *some* snitches and is utilised by
>>> *some* of the replication strategies (and by the messaging system for
>>> network efficiency). For background, the mapping from row tokens to nodes
>>> is based on http://en.wikipedia.org/wiki/Consistent_hashing
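The consistent-hashing mapping Aaron links to can be sketched in Python. The token function is modelled on how RandomPartitioner is generally described (MD5 of the row key, read as a signed 128-bit integer, absolute value, giving a token in 0..2^127); treat that detail as an assumption. The four-node layout, the key `b"user:42"` and the helper names are hypothetical.

```python
import bisect
import hashlib

def random_partitioner_token(key: bytes) -> int:
    """Assumed RandomPartitioner-style token: MD5 of the row key read as a
    signed big-endian integer, then abs(), giving a value in 0..2**127."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

def owner(node_tokens, token):
    """Consistent hashing: the owner is the first node whose token is >= the
    key's token, wrapping to the lowest token past the end of the ring."""
    ordered = sorted(node_tokens.items(), key=lambda item: item[1])
    tokens = [t for _node, t in ordered]
    index = bisect.bisect_left(tokens, token) % len(ordered)
    return ordered[index][0]

# Hypothetical four-node ring with evenly spaced tokens.
nodes = {f"node{i}": i * 2**127 // 4 for i in range(4)}
token = random_partitioner_token(b"user:42")
print(token, owner(nodes, token))
```

Because only the node tokens decide ownership, adding or removing one node moves just the keys in its neighbouring range, which is the property consistent hashing is chosen for.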
>>>
>>> Hope that helps.
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 24/05/2012, at 1:07 AM, java jalwa wrote:
>>>
>>>> Thanks Aaron. That makes things clear.
>>>> So I guess the 0 - 2^127 range for tokens corresponds to a cluster-level,
>>>> top-level ring, and then you add some logic on top of that with
>>>> NTS to logically segment that range into sub-rings as per the notion
>>>> of data clusters defined in NTS. What's the advantage of having a
>>>> single top-level ring? Intuitively it seems like each replication
>>>> group could have a separate ring so that the same tokens can be
>>>> assigned to nodes in different DCs. If the hierarchy is Cluster ->
>>>> DataCenter -> Node, why exactly do we need globally unique node tokens
>>>> even though nodes are at the lowest level in the hierarchy?
>>>>
>>>> Thanks again.
>>>>
>>>>
>>>> On Wed, May 23, 2012 at 3:14 AM, aaron morton <aa...@thelastpickle.com> 
>>>> wrote:
>>>>>> Now if a row key hash is mapped to a range owned by a node in DC3,
>>>>>> will the node in DC3 still store the key as determined by the
>>>>>> partitioner and then walk the ring and store 2 replicas each in DC1
>>>>>> and DC2?
>>>>> No, only nodes in the DCs specified in the NTS configuration will be
>>>>> replicas.
>>>>>
>>>>>> Or will the co-ordinator node be aware of the replica placement
>>>>>> strategy, and override the partitioner's decision and walk the ring
>>>>>> until it first encounters a node in DC1 or DC2, and then place the
>>>>>> remaining replicas?
>>>>> NTS considers each DC to have its own ring. This can make token
>>>>> selection in a multi-DC environment confusing at times. There is
>>>>> something in the DataStax (DS) docs about it.
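One way to reconcile "each DC has its own ring" with globally unique tokens is to space each DC's tokens evenly as if that DC were a cluster on its own, then shift each DC by a small offset so no two nodes share a token. The Python sketch below assumes that convention with an offset equal to the DC's position in the list; the node counts, names and the function `multi_dc_tokens` are hypothetical.

```python
RING_SIZE = 2**127  # RandomPartitioner token space

def multi_dc_tokens(nodes_per_dc):
    """Give each DC its own evenly spaced tokens, then shift each DC by a small
    offset (its position in the dict) so no two nodes share a token."""
    tokens = {}
    for offset, (dc, count) in enumerate(nodes_per_dc.items()):
        for i in range(count):
            tokens[f"{dc}-node{i + 1}"] = i * RING_SIZE // count + offset
    return tokens

tokens = multi_dc_tokens({"DC1": 4, "DC2": 4, "DC3": 2})
for node, token in sorted(tokens.items(), key=lambda item: item[1]):
    print(f"{node:10} {token}")
```

Each DC still gets an even share of its own per-DC ring, while the tiny offsets keep every token unique on the single cluster-wide ring.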
>>>>>
>>>>> Cheers
>>>>>
>>>>> On 23/05/2012, at 3:16 PM, java jalwa wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I am a bit confused regarding the terms "replica" and
>>>>>> "replication factor". Assume that I am using RandomPartitioner and
>>>>>> NetworkTopologyStrategy for replica placement.
>>>>>> From what I understand, with RandomPartitioner, a row key will
>>>>>> always be hashed and stored on the node that owns the range to
>>>>>> which the key is mapped.
>>>>>> The example at
>>>>>> http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy
>>>>>> talks about having 2 data centers and a replication
>>>>>> factor of 4 with 2 replicas in each datacenter, so the strategy is
>>>>>> configured as DC1:2 and DC2:2. Now suppose I add another datacenter,
>>>>>> DC3, and do not change the NetworkTopologyStrategy configuration.
>>>>>> Now if a row key hash is mapped to a range owned by a node in DC3,
>>>>>> will the node in DC3 still store the key as determined by the
>>>>>> partitioner and then walk the ring and store 2 replicas each in DC1
>>>>>> and DC2? Will that mean that I will then have 5 replicas in the
>>>>>> cluster and not 4? Or will the co-ordinator node be aware of the
>>>>>> replica placement strategy, and override the partitioner's decision
>>>>>> and walk the ring until it first encounters a node in DC1 or DC2,
>>>>>> and then place the remaining replicas?
>>>>>>
>>>>>> Thanks.
>>>>>
>>>
>
