As far as I know it's not possible to leave replication factor undefined - if 
you do then Cassandra will default to RF=1 with SimpleStrategy.

The topology is local to each node, so unless all your nodes have the same 
topology file, each of them can have a different idea about the topology of 
the cluster.

I'm not sure what you're trying to achieve here, so I'll give an example.

Say you have two datacenters, DC1 and DC2. It's perfectly possible for nodes in 
DC1 to have a topology file that only mentions DC1 nodes and nodes in DC2 to 
have a topology file that only mentions DC2 nodes. You can then define one 
keyspace with strategy options DC1: 3 and another with DC2: 3 and this should 
work fine.
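
As a rough sketch of that split, the two topology files might look something 
like the following. The IP addresses and rack names are invented purely for 
illustration, and the default line is the one suggested in the earlier reply 
quoted below; treat it as an assumption rather than a tested setup.

```properties
# --- cassandra-topology.properties on the DC1 nodes ---
# (hypothetical addresses; only DC1 nodes are listed)
10.1.0.1=DC1:RAC1
10.1.0.2=DC1:RAC1
10.1.0.3=DC1:RAC2
# anything not listed above lands in a DC/rack that no keyspace refers to
default=unknown:unknown

# --- cassandra-topology.properties on the DC2 nodes ---
# (hypothetical addresses; only DC2 nodes are listed)
10.2.0.1=DC2:RAC1
10.2.0.2=DC2:RAC1
10.2.0.3=DC2:RAC2
default=unknown:unknown
```

These are two separate files shown together; each node would carry only the 
section for its own datacenter.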

However, if you had a keyspace with strategy options DC1: 3, DC2: 3 then you 
would AFAIK never be able to write to that keyspace, because none of the 
nodes knows enough about the topology; each can address either DC1 or DC2, 
but not both.

If there were a third type of node that had topology defined for both DC1 and 
DC2 then these nodes would then be able to update the DC1+DC2 keyspace, even 
though DC1-only and DC2-only nodes would not.

So if there is a clear segregation in your data then splitting the topology may 
be OK, but if not then you will likely find that you can't update the keyspace 
unless a node has sufficient knowledge of the topology.

Depending on your use case a simpler alternative may be to just run two 
clusters instead of trying to define the shape of a single one through topology 
definitions. I think what you're talking about here is on the edge of what 
Cassandra is designed to do; it works best when all nodes are uniform and have 
the same understanding about the cluster.

Richard


From: Bill Au [mailto:bill.w...@gmail.com]
Sent: 19 April 2012 19:58
To: user@cassandra.apache.org
Subject: Re: default required in cassandra-topology.properties?

I had thought that the topology file is used for replica placement only, such 
that for the token range that the unknown node is responsible for, data is 
still read and written there.  It just won't be replicated since replication 
factor is not defined.

Bill
On Thu, Apr 19, 2012 at 1:18 PM, Richard Lowe <richard.l...@arkivum.com> wrote:
Yes it is possible. Put the following as the last line of your topology file:

default=unknown:unknown

So long as you don't have any DC or rack with this name your local node will 
not be able to address any nodes that aren't explicitly given in its topology 
file.
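
For illustration, a complete file along these lines might look roughly like 
this. The addresses and the DC/rack names are made up; only the 
address=DC:rack layout and the default line reflect what is described above.

```properties
# cassandra-topology.properties (sketch with hypothetical addresses)
# nodes this node should be able to address, as <ip>=<datacenter>:<rack>
192.168.1.100=DC1:RAC1
192.168.1.101=DC1:RAC1
192.168.1.102=DC1:RAC2

# every other node falls into a DC/rack name that nothing else uses
default=unknown:unknown
```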

However, bear in mind that, whilst Cassandra won't use the replication factor 
to store data on these 'unknown' nodes, their tokens may mean that the 
'natural' home for a row is a node that is not addressable. This can create 
holes in your dataset, and data can appear to 'disappear' because a row's 
token maps it to a particular node but the coordinator can't contact that 
node to get at the data.

Careful use of replication factor and NetworkTopologyStrategy can help with 
this, but you should make sure that a node really doesn't need to contact the 
unknown nodes before marking them as such.


Richard


From: Bill Au [mailto:bill.w...@gmail.com]
Sent: 19 April 2012 17:16
To: user@cassandra.apache.org
Subject: default required in cassandra-topology.properties?

All the examples of cassandra-topology.properties that I have seen have a 
default entry assigning unknown nodes to a specific data center and rack.  Is 
it possible to have Cassandra ignore unknown nodes for the purpose of 
replication?

Bill
