You can avoid the confusion by using the term natural endpoints. For
example, with a replication factor of 3, the natural endpoints for key
x are node1, node2, and node11.
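If you want to see the natural endpoints for a particular key yourself,
nodetool can report them. A hypothetical invocation (the keyspace name
MyKeyspace and column family MyCF are placeholders):

    nodetool -h localhost getendpoints MyKeyspace MyCF x

This prints the addresses of the replica nodes that own row key x,
i.e. exactly the natural endpoint list the replication strategy
computes.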
The snitch does use both the datacenter and the rack, but almost all
deployments use a single rack per DC: when you have more than one rack
in a data center, NTS has some logic to spread the replicas between
racks, and most people do not want this behavior.

On Wed, May 30, 2012 at 3:57 PM, David Fischer <fischer....@gmail.com> wrote:
> Thanks!
>
> My misunderstanding was that the snitch names are broken up as
> DC1:RAC1, and that strategy_options takes only the first part of the
> snitch names?
>
> On Wed, May 30, 2012 at 12:14 PM, Jeff Williams
> <je...@wherethebitsroam.com> wrote:
>> First, note that replication is done at the row level, not at the
>> node level.
>>
>> That line should look more like:
>>
>> placement_strategy = 'NetworkTopologyStrategy' and
>> strategy_options = {DC1: 1, DC2: 1, DC3: 1}
>>
>> This means that each row will have one copy in each DC, and within
>> each DC its placement will be according to the partitioner, so it
>> could be on any of the nodes in each DC.
>>
>> So, don't think of it as nodes replicating, but rather as how nodes
>> should store a copy of each row in each DC.
>>
>> Also, replication is not related to the seed nodes. Seed nodes allow
>> the nodes to find each other initially, but are not special
>> otherwise - any node can be used as a seed node.
>>
>> So if you had a strategy like:
>>
>> placement_strategy = 'NetworkTopologyStrategy' and
>> strategy_options = {DC1: 3, DC2: 2, DC3: 1}
>>
>> each row would exist on 3 of the 4 nodes in DC1, on 2 of the 4 nodes
>> in DC2, and on one of the nodes in DC3. Again, the placement within
>> each DC is up to the partitioner, based on the row key.
>>
>> Jeff
>>
>> On May 29, 2012, at 11:25 PM, David Fischer wrote:
>>
>>> Ok, now I am confused :)
>>>
>>> If I have the following:
>>>
>>> placement_strategy = 'NetworkTopologyStrategy' and
>>> strategy_options = {DC1:R1, DC2:R1, DC3:R1}
>>>
>>> does this mean that in each of my datacenters I will have one full
>>> replica, which can also be a seed node?
>>> If I have 3 nodes in addition to the DC replicas, then with normal
>>> token calculations a key can be in any datacenter plus on each of
>>> the replicas, right?
>>> It will show 12 nodes total in its ring.
>>>
>>> On Thu, May 24, 2012 at 2:39 AM, aaron morton
>>> <aa...@thelastpickle.com> wrote:
>>>> This is partly historical. NTS (as it is now) has not always
>>>> existed and was not always the default. In days gone by, used to
>>>> be a fella could run a mighty fine key-value store using just a
>>>> Simple Replication Strategy.
>>>>
>>>> A different way to visualise it is a single ring with a Z axis for
>>>> the DCs. When you look at the ring from the top you can see all
>>>> the nodes. When you look at it from the side you can see the nodes
>>>> are on levels that correspond to their DC. SimpleStrategy looks at
>>>> the ring from the top. NTS works through the layers of the ring.
>>>>
>>>>> If the hierarchy is Cluster -> DataCenter -> Node, why exactly do
>>>>> we need globally unique node tokens even though nodes are at the
>>>>> lowest level in the hierarchy?
>>>>
>>>> Nodes having a DC is a feature of *some* snitches, and it is
>>>> utilised by *some* of the replication strategies (and by the
>>>> messaging system for network efficiency). For background, the
>>>> mapping from row tokens to nodes is based on
>>>> http://en.wikipedia.org/wiki/Consistent_hashing
>>>>
>>>> Hope that helps.
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
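To make the "single ring with a Z axis" picture above concrete, here is
a toy Python sketch of the two strategies discussed in this thread. It
is not Cassandra's actual code; the node names, tokens, and DC layout
are invented for illustration:

    # Toy model of replica placement on a single consistent-hash ring.
    import bisect
    from hashlib import md5

    # (token, node, dc) entries sorted by token, mimicking
    # RandomPartitioner's 0..2**127 token space, with six nodes
    # alternating across three DCs.
    RING = [
        (0 * 2**127 // 6, "node1", "DC1"),
        (1 * 2**127 // 6, "node2", "DC2"),
        (2 * 2**127 // 6, "node3", "DC3"),
        (3 * 2**127 // 6, "node4", "DC1"),
        (4 * 2**127 // 6, "node5", "DC2"),
        (5 * 2**127 // 6, "node6", "DC3"),
    ]

    def token_for(row_key: bytes) -> int:
        # RandomPartitioner derives the token from an MD5 hash of the key.
        return int(md5(row_key).hexdigest(), 16) % 2**127

    def walk_ring(token: int):
        # Walk clockwise from the first node whose token >= the key's token.
        tokens = [t for t, _, _ in RING]
        start = bisect.bisect_left(tokens, token) % len(RING)
        for i in range(len(RING)):
            yield RING[(start + i) % len(RING)]

    def simple_strategy(token: int, rf: int):
        # SimpleStrategy: first RF nodes clockwise, blind to DCs
        # ("looking at the ring from the top").
        return [node for _, node, _ in walk_ring(token)][:rf]

    def network_topology_strategy(token: int, rf_per_dc: dict):
        # NTS (racks ignored): the same walk, but replicas are counted
        # per DC - which is why each DC behaves like its own ring.
        placed, wanted = [], dict(rf_per_dc)
        for _, node, dc in walk_ring(token):
            if wanted.get(dc, 0) > 0:
                placed.append(node)
                wanted[dc] -= 1
        return placed

    t = token_for(b"x")
    print(simple_strategy(t, 3))
    print(network_topology_strategy(t, {"DC1": 1, "DC2": 1, "DC3": 1}))

Run against key "x", the SimpleStrategy call returns the first three
nodes clockwise regardless of DC, while the NTS call walks the same
ring but keeps a per-DC count, which is why NTS behaves as if each
datacenter had its own ring.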
>>>> On 24/05/2012, at 1:07 AM, java jalwa wrote:
>>>>
>>>>> Thanks Aaron. That makes things clear.
>>>>> So I guess the 0 - 2^127 token range corresponds to a
>>>>> cluster-level, top-level ring, and NTS then adds some logic on
>>>>> top of that to logically segment the range into sub-rings, as per
>>>>> the data centers defined in the NTS options. What's the advantage
>>>>> of having a single top-level ring? Intuitively it seems like each
>>>>> replication group could have a separate ring, so that the same
>>>>> tokens could be assigned to nodes in different DCs. If the
>>>>> hierarchy is Cluster -> DataCenter -> Node, why exactly do we
>>>>> need globally unique node tokens even though nodes are at the
>>>>> lowest level in the hierarchy?
>>>>>
>>>>> Thanks again.
>>>>>
>>>>> On Wed, May 23, 2012 at 3:14 AM, aaron morton
>>>>> <aa...@thelastpickle.com> wrote:
>>>>>>> Now if a row key hash is mapped to a range owned by a node in
>>>>>>> DC3, will the node in DC3 still store the key as determined by
>>>>>>> the partitioner and then walk the ring and store 2 replicas
>>>>>>> each in DC1 and DC2?
>>>>>>
>>>>>> No, only nodes in the DCs specified in the NTS configuration
>>>>>> will be replicas.
>>>>>>
>>>>>>> Or will the co-ordinator node be aware of the replica placement
>>>>>>> strategy, override the partitioner's decision, and walk the
>>>>>>> ring until it first encounters a node in DC1 or DC2, and then
>>>>>>> place the remaining replicas?
>>>>>>
>>>>>> NTS considers each DC to have its own ring. This can make token
>>>>>> selection in a multi-DC environment confusing at times. There is
>>>>>> something in the DataStax docs about it.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Developer
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On 23/05/2012, at 3:16 PM, java jalwa wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> I am a bit confused by the terms "replica" and "replication
>>>>>>> factor". Assume that I am using RandomPartitioner and
>>>>>>> NetworkTopologyStrategy for replica placement.
>>>>>>> From what I understand, with RandomPartitioner a row key will
>>>>>>> always be hashed and stored on the node that owns the range to
>>>>>>> which the key is mapped.
>>>>>>> http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy
>>>>>>> The example there talks about having 2 data centers and a
>>>>>>> replication factor of 4, with 2 replicas in each datacenter, so
>>>>>>> the strategy is configured as DC1:2 and DC2:2. Now suppose I
>>>>>>> add another datacenter, DC3, and do not change the
>>>>>>> NetworkTopologyStrategy options.
>>>>>>> Now if a row key hash is mapped to a range owned by a node in
>>>>>>> DC3, will the node in DC3 still store the key as determined by
>>>>>>> the partitioner and then walk the ring and store 2 replicas
>>>>>>> each in DC1 and DC2? Will that mean that I will then have 5
>>>>>>> replicas in the cluster and not 4? Or will the co-ordinator
>>>>>>> node be aware of the replica placement strategy, override the
>>>>>>> partitioner's decision, and walk the ring until it first
>>>>>>> encounters a node in DC1 or DC2, and then place the remaining
>>>>>>> replicas?
>>>>>>>
>>>>>>> Thanks.
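Tying the thread together: the answer to the original question is that
DC3 gets replicas only if the strategy options say so. Assuming a
keyspace named MyKeyspace (a placeholder), adding DC3 would look
something like this in cassandra-cli, mirroring the snippets quoted
above:

    update keyspace MyKeyspace
      with placement_strategy = 'NetworkTopologyStrategy'
      and strategy_options = {DC1: 2, DC2: 2, DC3: 2};

After changing the options, run nodetool repair so existing rows are
streamed to their new replicas. Without the DC3 entry, nodes in DC3
can still coordinate and forward requests, but they store no replicas
for that keyspace.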