Subject says it all. I want to be able to randomly distribute a large set of 
records but keep them clustered in one wide row per node.

As an example, let's say I’ve got a collection of about 1 million records, each 
with a unique id. If I just go ahead and set the primary key (and therefore the 
partition key) as the unique id, I’ll get very good random distribution across 
my server cluster. However, each record will be its own row. I’d like to have 
each record belong to one large wide row (per server node) so I can have them 
sorted or clustered on some other column.

If I have, say, 5 nodes in my cluster, I could randomly assign a value from 1 
to 5 at the time of creation and set the partition key to this value. But this 
becomes troublesome if I add or remove nodes. What I effectively want is to 
partition on the unique id of the record modulo N (id % N, where N is the 
number of nodes).
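
To make the id % N idea concrete, here is a rough client-side sketch in 
Python. It is only an illustration of what I mean, not something Cassandra 
provides: the function name bucket_for, the sample ids, and the use of CRC32 
to turn a non-numeric id into an integer are all my own assumptions.

```python
import zlib


def bucket_for(record_id: str, num_buckets: int) -> int:
    """Map a record id to a bucket number in 1..num_buckets (hypothetical helper)."""
    # CRC32 is stable across runs and machines, unlike Python's built-in
    # hash() for strings, so the same id always lands in the same bucket.
    digest = zlib.crc32(record_id.encode("utf-8"))
    return digest % num_buckets + 1


# Illustrative ids only; real ids would come from the application.
ids = [f"record-{i}" for i in range(1000)]

# With N = 5, every record gets a bucket from 1 to 5, which would be
# used as the partition key.
buckets_n5 = {rid: bucket_for(rid, 5) for rid in ids}

# The trouble with tying N to the node count: after adding a sixth node,
# many records map to a different bucket than before.
buckets_n6 = {rid: bucket_for(rid, 6) for rid in ids}
moved = sum(1 for rid in ids if buckets_n5[rid] != buckets_n6[rid])
print(f"{moved} of {len(ids)} records change bucket when N goes from 5 to 6")
```

The second half shows the problem I mentioned: changing N changes the bucket 
for most records, so existing rows would no longer sit where the formula says 
they should.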

I have to imagine there’s a mechanism in Cassandra to simply randomize the 
partitioning without even using a key (and then clustering on some column).

Thanks for any help.
