> No. I am not entirely sure where the confusion comes from, so I will
> just try to summarize things from scratch in a brief manner.
>
> Any piece of data you store in Cassandra is going to be in a
> particular row, which has a row key.
>
> That row will have a "replica set" in the Cassandra cluster. For RF=3,
> that replica set contains three nodes. The replica set is the set of
> nodes that are responsible for keeping data for a row.
>
> In other words, with RF=3 the replica set for each possible row key
> contains 3 nodes, so there will be 3 copies of the data in total.
>
> All the consistency levels always refer to nodes *in the replica set*.
> For example, CL.ALL requires that all nodes *in the replica set*
> respond. CL.QUORUM requires that a majority of all nodes *in the
> replica set* respond.
>
> From the perspective of a given node in the cluster, assuming for the
> example RF=3, it will contain data for its own token range as well as
> data for two other token ranges.
>
> To re-iterate another point: The choice of consistency level *never*
> affects *which* nodes are responsible for a given row key, nor does it
> affect which nodes will eventually receive writes. It *only* affects
> how many nodes must respond before the operation (read or write) is
> considered successful.
>
> Does that make it clearer?

What you just described is how I thought I understood Cassandra to
work, even if I didn't exactly convey it in my original posts.
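
To check my reading of the consistency levels, here is a minimal
sketch of how many replica responses each level needs for a given RF.
The function name is my own and is not part of any Cassandra API; it
only restates the rules quoted above.

```python
def required_responses(consistency_level, replication_factor):
    """Replica responses needed before an operation counts as successful."""
    required = {
        "ONE": 1,
        # QUORUM is a strict majority of the replica set, not of the cluster.
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return required[consistency_level]


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_responses(level, 3))
```

Note that the cluster size never appears: only RF matters.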

But if this is the case, then my original concern still seems valid.

* Given a 7 node cluster, and RF=3.
* Given only 1 node is online. This node owns a specific token range,
and also may be a member of a replica set for a given row.

I understand that CL.ONE means the read operation will block until at
least one -replica- responds. If this node is not a replica, what
happens?
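
To make the scenario concrete, here is a toy model of it. It assumes
the nodes are numbered 0 to 6 around the ring and that the replicas
for a row sit on consecutive nodes starting at the node that owns the
row's token. Both the numbering and the helper names are my own
simplifications, not Cassandra code.

```python
NUM_NODES = 7
RF = 3


def replica_set(first_replica, rf=RF, num_nodes=NUM_NODES):
    """Nodes holding a row whose token is owned by node `first_replica`."""
    return [(first_replica + i) % num_nodes for i in range(rf)]


def live_replicas(first_replica, online, rf=RF, num_nodes=NUM_NODES):
    """Members of the row's replica set that are currently online."""
    return [n for n in replica_set(first_replica, rf, num_nodes) if n in online]


online = {0}  # only node 0 is up
print(live_replicas(5, online))  # replica set is [5, 6, 0]
print(live_replicas(2, online))  # replica set is [2, 3, 4]
```

In this model the first row still has one live replica, while the
second has none, which is the case I am asking about.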

For write operations, I could use CL.ANY. It just requires that it is
written to 1 node, not necessarily 1 replica. But CL.ANY is not
supported for reading. So if no replicas for the data are online, the
data is not available, correct? This is why I want to maximize the
number of replicas, to maximize availability such that even if N-1
nodes are offline, there is no meaningful difference in operation.
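
As a rough illustration of why I want a higher RF, the sketch below
counts how many token ranges still have at least one live replica when
a single node is up. It assumes 7 equally sized ranges and replicas on
consecutive nodes around the ring; the function name and the node
numbering are hypothetical.

```python
from fractions import Fraction


def readable_fraction(num_nodes, rf, online):
    """Fraction of token ranges with at least one online replica."""
    readable = 0
    for first in range(num_nodes):
        # Replica set for the range owned by node `first`.
        replicas = {(first + i) % num_nodes for i in range(rf)}
        if replicas & online:
            readable += 1
    return Fraction(readable, num_nodes)


for rf in (1, 3, 5, 7):
    print(rf, readable_fraction(7, rf, online={0}))
```

Under these assumptions, RF=3 leaves 3 of the 7 ranges readable with
one node up, and only RF=7 keeps every range readable.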
