There is an effort underway to support wider rows: https://issues.apache.org/jira/browse/CASSANDRA-9754
This won't help you now, though. Even with that improvement, you may still need a better data model, since large-scale scanning and filtering is always a very bad idea with Cassandra.

The data modeling methodology for Cassandra dictates that queries drive the data model and that each form of query requires a separate table (a "query table"). Materialized views can automate that process in a lot of cases, but in any case it does sound as if some of your queries require additional tables.

As a general proposition, Cassandra should not be used for heavy filtering. Query tables with the filtering criteria baked into the PK are the way to go.

-- Jack Krupansky

On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com> wrote:

> Hi,
>
> We have sensor input that creates very wide rows, and operations on these
> rows have started to time out regularly. We have been trying to find a
> way to divide wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a
> time-based entry, timeShardId, to the partition key that consists of the
> year and week of year during which the reading was taken. This works for
> a number of queries, but for scanning all the readings against a
> particular sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination, so we would have to write to an
> additional table to track the valid timeShardId values. We suspect this
> would create tombstone accumulation problems given the number of updates
> required to the same row, so we haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT when trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow a WHERE clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
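
For reference, the year-and-week timeShardId described in the quoted message could be computed and enumerated roughly as follows. This is only a sketch under stated assumptions: the thread does not give the actual encoding, so the `year * 100 + week` integer format, the use of ISO week numbering, and the helper names `time_shard_id` and `shard_ids_between` are all hypothetical. It also assumes a lower and upper time bound can be chosen for the sensor (for example, a known install date and the current time), which the original poster says is not tracked today.

```python
from datetime import date, datetime, timedelta
from typing import List, Union

DateLike = Union[date, datetime]

def time_shard_id(ts: DateLike) -> int:
    """Encode ISO year and ISO week as one integer (assumed format).

    Example: 2016-03-10 falls in ISO week 10 of 2016 and maps to 201610.
    """
    iso_year, iso_week, _ = ts.isocalendar()
    return iso_year * 100 + iso_week

def shard_ids_between(first: datetime, last: datetime) -> List[int]:
    """List every weekly shard id from the week of `first` through the week of `last`."""
    if last < first:
        return []
    # Start at the Monday of the first week and step one week at a time,
    # comparing calendar dates so the time of day cannot skip the final week.
    cursor = first.date() - timedelta(days=first.isocalendar()[2] - 1)
    end = last.date()
    shards = []
    while cursor <= end:
        shards.append(time_shard_id(cursor))
        cursor += timedelta(weeks=1)
    return shards
```

Each id in the returned list could then be bound, together with sensorUnitId and sensorId, into one single-partition query, so that no SELECT DISTINCT over all partition keys is needed. Weeks in which the sensor produced no readings would simply return no rows.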