Re: Strategy for dividing wide rows beyond just adding to the partition key

Jack Krupansky Thu, 10 Mar 2016 07:43:06 -0800

There is an effort underway to support wider rows:
https://issues.apache.org/jira/browse/CASSANDRA-9754


This won't help you now, though. Even with that improvement you may still
need a better data model, since large-scale scanning/filtering is
always a very bad idea with Cassandra.

The data modeling methodology for Cassandra dictates that queries drive the
data model and that each form of query requires a separate table (a "query
table"). Materialized views can automate that process for a lot of cases,
but in any case it does sound as if some of your queries do require
additional tables.

As a general proposition, Cassandra should not be used for heavy filtering.
Query tables with the filtering criteria baked into the primary key (PK)
are the way to go.
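
To make the "criteria baked into the PK" idea concrete for the year/week
timeShardId described in the quoted message, here is a minimal Python sketch,
not a definitive implementation. It assumes the shard id is encoded as
ISO year * 100 + ISO week (the original message only says "year and week of
year"), and it assumes the earliest and latest reading dates for a sensor are
known or can be bounded, for example by a deployment date. The function names
are hypothetical. Given those bounds, a scan of one sensorUnitId/sensorId
pair becomes a bounded series of single-partition queries, one per shard id.

```python
from datetime import date, timedelta
from typing import List


def time_shard_id(day: date) -> int:
    """Return a year/week shard id, e.g. 201610 for ISO week 10 of 2016.

    Assumption: ISO-8601 week numbering and a year*100+week encoding.
    """
    iso_year, iso_week, _ = day.isocalendar()
    return iso_year * 100 + iso_week


def shard_ids_between(first: date, last: date) -> List[int]:
    """List every weekly shard id from the week of `first` to the week of `last`."""
    if last < first:
        raise ValueError("last must not be earlier than first")
    # Start from the Monday of the first week so each step lands in a new ISO week.
    monday = first - timedelta(days=first.isoweekday() - 1)
    ids = []
    while monday <= last:
        ids.append(time_shard_id(monday))
        monday += timedelta(weeks=1)
    return ids
```

Each id returned by shard_ids_between would be combined with the
sensorUnitId and sensorId to form one complete partition key, so no
partition-key scan is needed to find the shards.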


-- Jack Krupansky

On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.ka...@ymail.com> wrote:

> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to time out regularly. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>
