Hi,

Clustering columns are used to order the data within a partition. However, since the data is split across SSTables, rows are ordered by clustering key only within each SSTable. Cassandra still has to check every SSTable that may contain the partition, and merge the rows if they are found in several SSTables. The only scenario where I can imagine a big performance gain is very wide partitions, where each partition lives in a single SSTable (for example time series data, where the partition keys are time buckets).
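To make the merge step concrete, here is a rough Python sketch of how I picture it. It is a toy model under my own assumptions, not Cassandra's actual read path: `Row`, `merge_partition`, and the integer write timestamps are names made up for illustration, and each SSTable is modelled as a plain list already sorted by clustering key.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Row(NamedTuple):
    # Hypothetical row model: one clustering key, a write timestamp, a value.
    clustering_key: int
    write_ts: int
    value: str


def merge_partition(sstables: Iterable[List[Row]]) -> Iterator[Row]:
    """Merge per-SSTable row lists for one partition into a single stream.

    Each input list is assumed to be sorted by clustering key already.
    When the same clustering key appears in several SSTables, only the
    row with the newest write timestamp is kept.
    """
    # k-way merge of the sorted runs, ordered by clustering key.
    merged = heapq.merge(*sstables, key=lambda r: r.clustering_key)
    current = None
    for row in merged:
        if current is not None and row.clustering_key != current.clustering_key:
            yield current
            current = None
        if current is None or row.write_ts > current.write_ts:
            current = row
    if current is not None:
        yield current


# The same partition spread over two SSTables; key 3 was overwritten later.
sstable_a = [Row(1, 10, "a1"), Row(3, 10, "a3")]
sstable_b = [Row(2, 20, "b2"), Row(3, 20, "b3")]
result = list(merge_partition([sstable_a, sstable_b]))
```

Even in this toy version, every SSTable that holds part of the partition has to be opened and walked, which is why I suspect the ordering only pays off when a partition sits in one SSTable.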
Has anybody done benchmarks on that, and can you share the data model you used?

Cheers,
Eugene