You can avoid the confusion by using the term natural endpoints. For
example, with a replication factor of 3, the natural endpoints for key
x are node1, node2, and node11.
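If you want to see the natural endpoints for a particular key yourself,
nodetool can report them. A hypothetical invocation (the keyspace name
MyKeyspace and column family MyCF are placeholders):

    nodetool -h localhost getendpoints MyKeyspace MyCF x

This prints the addresses of the replica nodes that own row key x,
i.e. exactly the natural endpoint list the replication strategy
computes.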
The snitch does use both the datacenter and the rack, but almost all
deployments use a single rack per DC: when you have more than one rack
in a data center, NTS has some logic to spread the replicas between
racks, and most people do not want this behavior.

On Wed, May 30, 2012 at 3:57 PM, David Fischer <fischer....@gmail.com> wrote:
> Thanks!
>
> My misunderstanding was that the snitch names are broken up as
> DC1:RAC1, and that strategy_options takes only the first part of the
> snitch names?
>
> On Wed, May 30, 2012 at 12:14 PM, Jeff Williams
> <je...@wherethebitsroam.com> wrote:
>> First, note that replication is done at the row level, not at the
>> node level.
>>
>> That line should look more like:
>>
>> placement_strategy = 'NetworkTopologyStrategy' and
>> strategy_options = {DC1: 1, DC2: 1, DC3: 1}
>>
>> This means that each row will have one copy in each DC, and within
>> each DC its placement will be according to the partitioner, so it
>> could be on any of the nodes in each DC.
>>
>> So, don't think of it as nodes replicating, but rather as how nodes
>> should store a copy of each row in each DC.
>>
>> Also, replication is not related to the seed nodes. Seed nodes allow
>> the nodes to find each other initially, but are not special
>> otherwise - any node can be used as a seed node.
>>
>> So if you had a strategy like:
>>
>> placement_strategy = 'NetworkTopologyStrategy' and
>> strategy_options = {DC1: 3, DC2: 2, DC3: 1}
>>
>> each row would exist on 3 of the 4 nodes in DC1, on 2 of the 4 nodes
>> in DC2, and on one of the nodes in DC3. Again, the placement within
>> each DC is up to the partitioner, based on the row key.
>>
>> Jeff
>>
>> On May 29, 2012, at 11:25 PM, David Fischer wrote:
>>
>>> Ok, now I am confused :)
>>>
>>> If I have the following:
>>>
>>> placement_strategy = 'NetworkTopologyStrategy' and
>>> strategy_options = {DC1:R1, DC2:R1, DC3:R1}
>>>
>>> does this mean that in each of my datacenters I will have one full
>>> replica, which can also be a seed node?
>>> If I have 3 nodes in addition to the DC replicas, then with normal
>>> token calculations a key can be in any datacenter plus on each of
>>> the replicas, right?
>>> It will show 12 nodes total in its ring.
>>>
>>> On Thu, May 24, 2012 at 2:39 AM, aaron morton
>>> <aa...@thelastpickle.com> wrote:
>>>> This is partly historical. NTS (as it is now) has not always
>>>> existed and was not always the default. In days gone by, used to
>>>> be a fella could run a mighty fine key-value store using just a
>>>> Simple Replication Strategy.
>>>>
>>>> A different way to visualise it is a single ring with a Z axis for
>>>> the DCs. When you look at the ring from the top you can see all
>>>> the nodes. When you look at it from the side you can see the nodes
>>>> are on levels that correspond to their DC. SimpleStrategy looks at
>>>> the ring from the top. NTS works through the layers of the ring.
>>>>
>>>>> If the hierarchy is Cluster -> DataCenter -> Node, why exactly do
>>>>> we need globally unique node tokens even though nodes are at the
>>>>> lowest level in the hierarchy?
>>>>
>>>> Nodes having a DC is a feature of *some* snitches, and it is
>>>> utilised by *some* of the replication strategies (and by the
>>>> messaging system for network efficiency). For background, the
>>>> mapping from row tokens to nodes is based on
>>>> http://en.wikipedia.org/wiki/Consistent_hashing
>>>>
>>>> Hope that helps.
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
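To make the "single ring with a Z axis" picture above concrete, here is
a toy Python sketch of the two strategies discussed in this thread. It
is not Cassandra's actual code; the node names, tokens, and DC layout
are invented for illustration:

    # Toy model of replica placement on a single consistent-hash ring.
    import bisect
    from hashlib import md5

    # (token, node, dc) entries sorted by token, mimicking
    # RandomPartitioner's 0..2**127 token space, with six nodes
    # alternating across three DCs.
    RING = [
        (0 * 2**127 // 6, "node1", "DC1"),
        (1 * 2**127 // 6, "node2", "DC2"),
        (2 * 2**127 // 6, "node3", "DC3"),
        (3 * 2**127 // 6, "node4", "DC1"),
        (4 * 2**127 // 6, "node5", "DC2"),
        (5 * 2**127 // 6, "node6", "DC3"),
    ]

    def token_for(row_key: bytes) -> int:
        # RandomPartitioner derives the token from an MD5 hash of the key.
        return int(md5(row_key).hexdigest(), 16) % 2**127

    def walk_ring(token: int):
        # Walk clockwise from the first node whose token >= the key's token.
        tokens = [t for t, _, _ in RING]
        start = bisect.bisect_left(tokens, token) % len(RING)
        for i in range(len(RING)):
            yield RING[(start + i) % len(RING)]

    def simple_strategy(token: int, rf: int):
        # SimpleStrategy: first RF nodes clockwise, blind to DCs
        # ("looking at the ring from the top").
        return [node for _, node, _ in walk_ring(token)][:rf]

    def network_topology_strategy(token: int, rf_per_dc: dict):
        # NTS (racks ignored): the same walk, but replicas are counted
        # per DC - which is why each DC behaves like its own ring.
        placed, wanted = [], dict(rf_per_dc)
        for _, node, dc in walk_ring(token):
            if wanted.get(dc, 0) > 0:
                placed.append(node)
                wanted[dc] -= 1
        return placed

    t = token_for(b"x")
    print(simple_strategy(t, 3))
    print(network_topology_strategy(t, {"DC1": 1, "DC2": 1, "DC3": 1}))

Run against key "x", the SimpleStrategy call returns the first three
nodes clockwise regardless of DC, while the NTS call walks the same
ring but keeps a per-DC count, which is why NTS behaves as if each
datacenter had its own ring.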
>>>> On 24/05/2012, at 1:07 AM, java jalwa wrote:
>>>>
>>>>> Thanks Aaron. That makes things clear.
>>>>> So I guess the 0 - 2^127 token range corresponds to a
>>>>> cluster-level, top-level ring, and NTS then adds some logic on
>>>>> top of that to logically segment the range into sub-rings, as per
>>>>> the data centers defined in the NTS options. What's the advantage
>>>>> of having a single top-level ring? Intuitively it seems like each
>>>>> replication group could have a separate ring, so that the same
>>>>> tokens could be assigned to nodes in different DCs. If the
>>>>> hierarchy is Cluster -> DataCenter -> Node, why exactly do we
>>>>> need globally unique node tokens even though nodes are at the
>>>>> lowest level in the hierarchy?
>>>>>
>>>>> Thanks again.
>>>>>
>>>>> On Wed, May 23, 2012 at 3:14 AM, aaron morton
>>>>> <aa...@thelastpickle.com> wrote:
>>>>>>> Now if a row key hash is mapped to a range owned by a node in
>>>>>>> DC3, will the node in DC3 still store the key as determined by
>>>>>>> the partitioner and then walk the ring and store 2 replicas
>>>>>>> each in DC1 and DC2?
>>>>>>
>>>>>> No, only nodes in the DCs specified in the NTS configuration
>>>>>> will be replicas.
>>>>>>
>>>>>>> Or will the co-ordinator node be aware of the replica placement
>>>>>>> strategy, override the partitioner's decision, and walk the
>>>>>>> ring until it first encounters a node in DC1 or DC2, and then
>>>>>>> place the remaining replicas?
>>>>>>
>>>>>> NTS considers each DC to have its own ring. This can make token
>>>>>> selection in a multi-DC environment confusing at times. There is
>>>>>> something in the DataStax docs about it.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Developer
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On 23/05/2012, at 3:16 PM, java jalwa wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> I am a bit confused by the terms "replica" and "replication
>>>>>>> factor". Assume that I am using RandomPartitioner and
>>>>>>> NetworkTopologyStrategy for replica placement.
>>>>>>> From what I understand, with RandomPartitioner a row key will
>>>>>>> always be hashed and stored on the node that owns the range to
>>>>>>> which the key is mapped.
>>>>>>> http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy
>>>>>>> The example there talks about having 2 data centers and a
>>>>>>> replication factor of 4, with 2 replicas in each datacenter, so
>>>>>>> the strategy is configured as DC1:2 and DC2:2. Now suppose I
>>>>>>> add another datacenter, DC3, and do not change the
>>>>>>> NetworkTopologyStrategy options.
>>>>>>> Now if a row key hash is mapped to a range owned by a node in
>>>>>>> DC3, will the node in DC3 still store the key as determined by
>>>>>>> the partitioner and then walk the ring and store 2 replicas
>>>>>>> each in DC1 and DC2? Will that mean that I will then have 5
>>>>>>> replicas in the cluster and not 4? Or will the co-ordinator
>>>>>>> node be aware of the replica placement strategy, override the
>>>>>>> partitioner's decision, and walk the ring until it first
>>>>>>> encounters a node in DC1 or DC2, and then place the remaining
>>>>>>> replicas?
>>>>>>>
>>>>>>> Thanks.
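Tying the thread together: the answer to the original question is that
DC3 gets replicas only if the strategy options say so. Assuming a
keyspace named MyKeyspace (a placeholder), adding DC3 would look
something like this in cassandra-cli, mirroring the snippets quoted
above:

    update keyspace MyKeyspace
      with placement_strategy = 'NetworkTopologyStrategy'
      and strategy_options = {DC1: 2, DC2: 2, DC3: 2};

After changing the options, run nodetool repair so existing rows are
streamed to their new replicas. Without the DC3 entry, nodes in DC3
can still coordinate and forward requests, but they store no replicas
for that keyspace.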