Big partitions are an anti-pattern. Here is why:

First, Cassandra is not an analytic datastore. Sure, it has some UDFs and
aggregate UDFs, but the true purpose of the data store is to satisfy point
reads. Operations have strict timeouts:

# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 5000

# How long the coordinator should wait for seq or index scans to complete
range_request_timeout_in_ms: 10000

This means you need to be able to satisfy the operation in 5 seconds. That
is not only the "think time" for one server: if you are doing a quorum
read, the operation has to complete on 2 or more servers and the results
have to be compared. Beyond these cutoffs are thread pools, which fill up
and start dropping requests once full.
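
To put rough numbers on the quorum point, here is a minimal sketch in plain
Python (not Cassandra code). The function names and the per-replica
latencies are made up for illustration, and it ignores digest comparison,
read repair and speculative retries. It only shows that the coordinator
waits for the slowest replica it still needs:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer a QUORUM read: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def coordinator_wait_ms(replica_latencies_ms, required):
    """Time until `required` replicas have answered (the required-th fastest)."""
    if required > len(replica_latencies_ms):
        raise ValueError("not enough replicas to satisfy the request")
    return sorted(replica_latencies_ms)[required - 1]


# Value of read_request_timeout_in_ms from the settings quoted above.
READ_REQUEST_TIMEOUT_MS = 5000

# Hypothetical per-replica times for one read of a large partition, RF = 3.
latencies = [1200, 5600, 4800]
wait = coordinator_wait_ms(latencies, quorum(3))
timed_out = wait > READ_REQUEST_TIMEOUT_MS
```

With these made-up numbers the read squeaks in under the timeout, but only
because the second-fastest replica did. As the partition grows, every
replica's time grows with it.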

Something has to give, either functionality or physics, particularly the
physics of aggregating an ever-growing data set across N replicas in less
than 5 seconds. How many 2 ms point reads will be blocked by 50 ms queries?
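
As a back-of-the-envelope answer, assume reads are served by a fixed pool of
threads and each query holds one thread for its whole duration. That is a
simplification, and the pool size and query rates below are hypothetical.
Only the 2 ms and 50 ms figures come from the question above:

```python
def point_reads_displaced(slow_query_ms: float, point_read_ms: float) -> int:
    """Point reads one thread could have served while a slow query held it."""
    return int(slow_query_ms // point_read_ms)


def max_point_reads_per_sec(threads, point_read_ms,
                            slow_queries_per_sec, slow_query_ms):
    """Point-read capacity left after slow queries take their thread time."""
    total_ms = threads * 1000                   # thread-milliseconds per second
    busy_ms = slow_queries_per_sec * slow_query_ms
    free_ms = max(total_ms - busy_ms, 0)
    return free_ms / point_read_ms


displaced = point_reads_displaced(50, 2)                # per slow query
baseline = max_point_reads_per_sec(32, 2, 0, 50)        # no slow queries
with_slow = max_point_reads_per_sec(32, 2, 100, 50)     # 100 slow queries/s
```

Under these assumptions each 50 ms query costs 25 point reads, and 100 such
queries per second take 2,500 point reads per second out of the pool.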

I do not see the technical limitations of big partitions on disk as the
only hurdle to climb here.

On Fri, Oct 28, 2016 at 10:39 AM, Alexander Dejanovski <> wrote:

> Hi Eric,
> that would be by
> Michael Kjellman and by
> Robert Stupp.
> If you haven't seen it yet, Robert's summit talk on big partitions is
> totally worth it :
> Video :
> Slides :
> partitions-robert-stupp-datastax-cassandra-summit-2016
> Cheers,
> On Fri, Oct 28, 2016 at 4:09 PM Eric Evans <>
> wrote:
>> On Thu, Oct 27, 2016 at 4:13 PM, Alexander Dejanovski
>> <> wrote:
>> > A few patches are pushing the limits of partition sizes so we may soon
>> > be more comfortable with big partitions.
>> You don't happen to have Jira links to these handy, do you?
>> --
>> Eric Evans
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
> Consultant
> Apache Cassandra Consulting
