reconsider rack awareness
-------------------------

                 Key: CASSANDRA-3810
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3810
             Project: Cassandra
          Issue Type: Task
            Reporter: Peter Schuller
            Assignee: Peter Schuller
            Priority: Minor


We believed we wanted to be rack aware because we want to ensure that losing a 
rack only affects a single replica of any given row key.

When using rack awareness, the first problem you encounter if you aren't 
careful is that rack-aware replica selection induces hotspots. Using the format 
{{rackname-nodename}}, consider a part of the ring that looks like this:

{code}
...
r1-n1
r1-n2
r1-n3
r2-n1
r3-n1
r4-n1
...
{code}

Due to the rack awareness, {{r2-n1}} will be the second replica for all data 
whose primary replica is on {{r1-n1}}, {{r1-n2}} or {{r1-n3}}, since replica 
selection for each of them is forced to skip over nodes in the same rack.
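
To make the skipping concrete, here is a minimal Python sketch. It is a simplified model of rack-aware selection (walk the ring clockwise, skip racks already represented), not Cassandra's actual strategy code. The function name {{rack_aware_replicas}} is hypothetical, and treating the ring fragment above as a complete six-node ring with an RF of 2 is an assumption made only for illustration.

```python
from collections import Counter

def rack_aware_replicas(ring, start, rf):
    """Simplified model: walk clockwise from `start`, skipping nodes whose
    rack already holds a replica; fall back to skipped nodes if needed."""
    replicas, skipped, racks_used = [], [], set()
    for step in range(len(ring)):
        if len(replicas) == rf:
            break
        rack, node = ring[(start + step) % len(ring)]
        if rack in racks_used:
            skipped.append((rack, node))
        else:
            racks_used.add(rack)
            replicas.append((rack, node))
    # Fewer racks than rf: fill up with the skipped nodes in ring order.
    replicas.extend(skipped[: rf - len(replicas)])
    return replicas

# The ring fragment from the example, treated as a whole ring (assumption).
ring = [("r1", "n1"), ("r1", "n2"), ("r1", "n3"),
        ("r2", "n1"), ("r3", "n1"), ("r4", "n1")]

# Count how often each node is chosen as the second replica with rf=2.
second_replica_count = Counter()
for start in range(len(ring)):
    rack, node = rack_aware_replicas(ring, start, rf=2)[1]
    second_replica_count[f"{rack}-{node}"] += 1

for name, count in sorted(second_replica_count.items()):
    print(name, count)
```

In this model {{r2-n1}} is the second replica for three primary ranges, while {{r1-n2}} and {{r1-n3}} are never a second replica for anything.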

The way we end up allocating nodes in a cluster is to satisfy this criterion:

* For any node in rack {{r}}, in a cluster with a replication factor of {{rf}}, 
there must be no other node in {{r}} within {{rf-1}} steps in the ring in either 
direction.

Any violation of this criterion implies the induction of hotspots due to rack 
awareness.
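
The rule can be checked mechanically. The following Python sketch is one possible formulation, not an existing tool: {{criterion_violations}} is a hypothetical name, and the ring is assumed to be given as a list of {{(rack, node)}} pairs in ring order. Looking only forward from each position is enough, because a pair that is too close in one direction is also seen from the other node's side.

```python
def criterion_violations(ring, rf):
    """Return (i, j) index pairs of same-rack nodes that sit within
    rf-1 steps of each other on the ring (wrapping around the end)."""
    n = len(ring)
    violations = []
    for i in range(n):
        for step in range(1, rf):
            j = (i + step) % n
            if j != i and ring[i][0] == ring[j][0]:
                violations.append((i, j))
    return violations

# The clumped fragment from the example, treated as a whole ring (assumption).
clumped = [("r1", "n1"), ("r1", "n2"), ("r1", "n3"),
           ("r2", "n1"), ("r3", "n1"), ("r4", "n1")]

# A hypothetical ring in which racks are strictly interleaved.
interleaved = [("r1", "n1"), ("r2", "n1"), ("r3", "n1"),
               ("r1", "n2"), ("r2", "n2"), ("r3", "n2")]

print(criterion_violations(clumped, rf=2))      # adjacent r1 nodes are flagged
print(criterion_violations(interleaved, rf=3))  # no violations
```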

The realization that I had a few days ago, however, is that *rack awareness is 
not actually changing replica placement* when using this ring topology. In 
other words, *the way you have to use* rack awareness is to construct the ring 
such that *the rack awareness is a NOOP*.
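
The NOOP claim can be illustrated with a small Python sketch that compares plain "next {{rf}} nodes on the ring" placement with a simplified rack-aware placement. Both functions are hypothetical models rather than Cassandra's implementation, and the two rings are made-up examples. When the criterion holds, any {{rf}} consecutive nodes are already in distinct racks, so nothing is ever skipped.

```python
def simple_replicas(ring, start, rf):
    """Rack-unaware placement: the next rf nodes clockwise from `start`."""
    return [ring[(start + step) % len(ring)] for step in range(rf)]

def rack_aware_replicas(ring, start, rf):
    """Simplified rack-aware placement: skip racks already holding a replica."""
    replicas, skipped, racks_used = [], [], set()
    for step in range(len(ring)):
        if len(replicas) == rf:
            break
        rack, node = ring[(start + step) % len(ring)]
        if rack in racks_used:
            skipped.append((rack, node))
        else:
            racks_used.add(rack)
            replicas.append((rack, node))
    replicas.extend(skipped[: rf - len(replicas)])
    return replicas

def rack_awareness_is_noop(ring, rf):
    """True if both placements agree for every primary position."""
    return all(simple_replicas(ring, s, rf) == rack_aware_replicas(ring, s, rf)
               for s in range(len(ring)))

# Hypothetical ring that satisfies the criterion for rf=3.
interleaved = [("r1", "n1"), ("r2", "n1"), ("r3", "n1"),
               ("r1", "n2"), ("r2", "n2"), ("r3", "n2")]

# The clumped fragment from the example, treated as a whole ring (assumption).
clumped = [("r1", "n1"), ("r1", "n2"), ("r1", "n3"),
           ("r2", "n1"), ("r3", "n1"), ("r4", "n1")]

print(rack_awareness_is_noop(interleaved, rf=3))
print(rack_awareness_is_noop(clumped, rf=2))
```

On the interleaved ring the two placements agree everywhere; on the clumped ring they differ, and that difference is exactly where the hotspot comes from.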

So, questions:

* Is there any non-hotspot inducing use-case where rack awareness can be used 
("used" in the sense that it actually changes the placement relative to 
non-awareness) effectively without satisfying the criteria above?
* Is it misleading and counter-productive to teach people (via documentation 
for example) to rely on rack awareness in their rings instead of just giving 
them the rule above for ring topology?
* Would it be a better service to the user to provide an easy way to *ensure* 
that the ring topology adheres to this criterion (such as refusing to bootstrap 
a new node that would violate it when rack awareness is requested, and taking 
it into consideration on automatic token selection (does anyone use that?)), 
than to "silently" generate hotspots by altering the replication strategy? (The 
"silence" problem is magnified by the fact that {{nodetool ring}} doesn't 
reflect this, so the user must take into account both the RF *and* the racks 
when interpreting {{nodetool ring}} output.)

FWIW, internally we just go with the criterion outlined above, and we have a 
separate tool which prints the *actual* ownership percentage of each node in 
the ring (based on the thrift {{describe_ring}} call). Any ring whose node 
placement violates the criterion is effectively a buggy/misconfigured ring, so 
only in the event of mistakes are we "using" the rack awareness (using the 
definition of "use" above).
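
For illustration only, the kind of calculation such a tool performs might look like the Python sketch below. This is not the internal tool: it does not call {{describe_ring}}, it takes a hand-written list of {{(token, rack, node)}} entries instead, and the function name, the tiny token space of 600 and the evenly spaced tokens are all assumptions. Replica selection uses the same simplified rack-aware model (skip racks that already hold a replica).

```python
from fractions import Fraction

def effective_ownership(ring, rf, token_space):
    """ring: list of (token, rack, node) sorted by token.
    Returns the fraction of the token space each node holds a replica of."""
    n = len(ring)
    owned = {f"{rack}-{node}": Fraction(0) for _, rack, node in ring}
    for i in range(n):
        # Primary range of node i is (previous token, own token], with wrap.
        width = Fraction((ring[i][0] - ring[i - 1][0]) % token_space, token_space)
        chosen, skipped, racks_used = [], [], set()
        for step in range(n):
            if len(chosen) == rf:
                break
            _, rack, node = ring[(i + step) % n]
            if rack in racks_used:
                skipped.append(f"{rack}-{node}")
            else:
                racks_used.add(rack)
                chosen.append(f"{rack}-{node}")
        chosen.extend(skipped[: rf - len(chosen)])
        for name in chosen:
            owned[name] += width
    return owned

# The example fragment as a whole ring with evenly spaced tokens (assumption).
ring = [(0, "r1", "n1"), (100, "r1", "n2"), (200, "r1", "n3"),
        (300, "r2", "n1"), (400, "r3", "n1"), (500, "r4", "n1")]

ownership = effective_ownership(ring, rf=2, token_space=600)
for name, share in ownership.items():
    print(f"{name}: {float(share) * 100:.1f}%")
```

Under these assumptions every node has the same primary range, yet {{r2-n1}} ends up holding a replica of two thirds of the data while {{r1-n2}} and {{r1-n3}} hold one sixth each.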

