> No. I am not entirely sure from where the confusion comes, so I will
> just try to summarize things from scratch in a brief manner.
>
> Any piece of data you store in Cassandra is going to be in a
> particular row, which has a row key.
>
> That row will have a "replica set" in the Cassandra cluster. For RF=3,
> that replica set contains three nodes. The replica set is the set of
> nodes that are responsible for keeping data for a row.
>
> In other words, with RF=3, thus a replica set containing 3 nodes for
> each possible row key, there will be 3 copies of the data in total.
>
> All the consistency levels always refer to nodes *in the replica set*.
> For example, CL.ALL requires that all nodes *in the replica set*
> respond. CL.QUORUM requires that a majority of all nodes *in the
> replica set* respond.
>
> From the perspective of a given node in the cluster, assuming for the
> example RF=3, it will contain data for its own token range as well as
> data for two other token ranges.
>
> To re-iterate another point: The choice of consistency level *never*
> affects *which* nodes are responsible for a given row key, nor does it
> affect which nodes will eventually receive writes. It *only* affects
> how many nodes must respond before the operation (read or write) is
> considered successful.
>
> Does that make it clearer?

What you just described is how I understood Cassandra to work, even if
I didn't convey it exactly in my original posts. But if this is the
case, then my original concern still seems valid.

* Given a 7 node cluster, and RF=3.
* Given only 1 node is online.

This node owns a specific token range, and may also be a member of the
replica set for a given row. I understand that CL.ONE means the read
operation will block until at least one -replica- responds. If this
node is not a replica, what happens?

For write operations, I could use CL.ANY. It just requires that the
write reach 1 node, not necessarily 1 replica. But CL.ANY is not
supported for reading. So if no replicas for the data are online, the
data is not available, correct?

This is why I want to maximize the number of replicas: to maximize
availability, such that even if N-1 nodes are offline, there is no
meaningful difference in operation.
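
To make the scenario concrete, here is a rough Python sketch of the
arithmetic. It is a toy model, not Cassandra code. It assumes a simple
ring placement in which the replica set for a token range is the node
owning that range plus the next RF-1 nodes around the ring, and that
every node owns one equally sized range. The function names
(replica_set, required_responses, can_read, readable_fraction) are
made up for this illustration.

```python
def replica_set(ring_size, rf, primary):
    """Nodes holding rows in `primary`'s token range.

    Assumption: replicas are the primary plus the next rf-1 nodes
    clockwise on the ring (a simple, rack-unaware placement).
    """
    return [(primary + i) % ring_size for i in range(rf)]


def required_responses(cl, rf):
    """How many replicas must respond for a read at consistency level `cl`."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1  # strict majority of the replica set
    if cl == "ALL":
        return rf
    raise ValueError("unsupported consistency level: %s" % cl)


def can_read(cl, rf, ring_size, primary, online):
    """True if enough replicas for `primary`'s range are in `online`."""
    live = [n for n in replica_set(ring_size, rf, primary) if n in online]
    return len(live) >= required_responses(cl, rf)


def readable_fraction(cl, rf, ring_size, online):
    """Fraction of token ranges that can be read at `cl`."""
    ok = sum(can_read(cl, rf, ring_size, p, online) for p in range(ring_size))
    return ok / ring_size


if __name__ == "__main__":
    only_node_0 = {0}
    print(readable_fraction("ONE", 3, 7, only_node_0))  # RF=3
    print(readable_fraction("ONE", 7, 7, only_node_0))  # RF=7
```

Under those assumptions, with 7 nodes, RF=3 and only node 0 online,
node 0 is a replica for just three of the seven token ranges, so a
CL.ONE read can be satisfied for rows in those three ranges only. With
RF=7 every range has node 0 in its replica set, which is the effect I
am after.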