Hi,
There is a 12-node cluster, still stuck on 1.0.8.
All nodes in the cluster ring are balanced.
Using random partitioner.
All CFs use compression.
Data size on nodes varies from 40G to 75G.
This variance is not due to the bigger nodes having more uncompacted
SSTables than the others.
The biggest CFs mostly have the exact same row keys and just store
different data, so the data for a given key should end up on the same
node for each of these CFs.
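(Just to spell out the assumption behind that, here is a rough sketch of
how I understand RandomPartitioner key placement; the key below is made
up, and I'm glossing over the exact token range:)

import hashlib

def token(row_key: bytes) -> int:
    # RandomPartitioner derives the token from MD5 of the raw key,
    # so it depends only on the key, never on the CF.
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")

# Same key -> same token -> same node, for every CF storing that key.
print(token(b"user:12345"))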
The key estimate for each of these biggest CFs on the nodes with the
larger data size is almost twice the estimate on the nodes with the
smallest data size, i.e. proportional to the data size on the node.
These CFs have about 50-100 million rows per node.
I can't understand how it is statistically possible that, with the
random partitioner, some nodes end up with 2x more keys than others
when there are 50-100 million keys per node.
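Here is the quick back-of-the-envelope simulation I used to sanity-check
that intuition (all assumptions mine: 12 equal token ranges, uniform
MD5 hashing, made-up key strings and sample size):

import collections, hashlib

NODES = 12
SAMPLE = 1_000_000  # sampled keys; the real cluster has 50-100M per node

counts = collections.Counter()
for i in range(SAMPLE):
    # model: uniform 128-bit MD5 tokens split into 12 equal ranges
    tok = int.from_bytes(hashlib.md5(b"key:%d" % i).digest(), "big")
    counts[tok * NODES >> 128] += 1

spread = max(counts.values()) / min(counts.values())
print("max/min keys per range: %.4f" % spread)  # comes out very close to 1.0
# Analytically, the relative stddev of keys per range is sqrt((1-p)/(N*p))
# with p = 1/12; at 50M keys per node that is ~0.01%, so a 2x skew in key
# counts can't plausibly come from the hashing itself.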
Any ideas how this is possible?
Anything else I can check?
tnx
Alex