What is the performance gain of clustering columns

eugene miretsky Tue, 03 Oct 2017 08:31:05 -0700

Hi,

Clustering columns are used to order the data within a partition. However,
since the data is split across SSTables, rows are ordered by clustering key
only within each SSTable. Cassandra still needs to check all SSTables and
merge the data if it is found in several of them. The only scenario where
I can imagine a big performance gain is super-wide partitions, where each
partition lives in a single SSTable (e.g. time series data, where the
partition keys are time buckets).
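
To make the concern concrete, here is a minimal Python sketch of the merge I
have in mind. It is a toy model, not Cassandra's actual read path: the names
(read_partition, sstable_a, sstable_b) are made up for illustration, and
conflict resolution is simplified to "newest write timestamp wins" per row.

```python
import heapq

def read_partition(sstables):
    """Merge one partition's rows from several SSTables.

    Each SSTable is modeled as a list of (clustering_key, write_ts, value)
    tuples, already sorted by clustering_key. This is an assumption of the
    toy model, not Cassandra's on-disk format.
    """
    # k-way merge: every input is sorted, so the output is sorted too.
    merged = heapq.merge(*sstables, key=lambda row: row[0])
    result = []
    for ck, ts, value in merged:
        if result and result[-1][0] == ck:
            # Same clustering key seen in another SSTable: keep the newer write.
            if ts > result[-1][1]:
                result[-1] = (ck, ts, value)
        else:
            result.append((ck, ts, value))
    return result

# Two hypothetical SSTables holding rows of the same partition.
sstable_a = [(1, 100, "a1"), (3, 100, "a3"), (5, 100, "a5")]
sstable_b = [(2, 200, "b2"), (3, 200, "b3")]

rows = read_partition([sstable_a, sstable_b])
```

Even though each input is sorted, every SSTable that holds part of the
partition still has to be consulted. If the partition lives in a single
SSTable, the merge step degenerates to a plain sequential read.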


Has anybody benchmarked this, and could you share the data model you
used?

Cheers,
Eugene
