Maybe it makes sense to describe what you're trying to accomplish in more 
detail.

A common bucketing approach is to partition along the lines of year, month, 
day, hour, minute, etc., and then use a timeuuid as a clustering column.
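
As a rough illustration of the bucketing idea, here is a minimal Python sketch 
that derives a time bucket string for the partition key and pairs it with a 
version 1 (time-based) UUID for the clustering column. The function names, the 
bucket string format, and the choice of UTC are my own assumptions for the 
example, not anything Cassandra prescribes.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical bucket formats; pick the granularity that keeps partitions
# at a reasonable size for your write rate.
_BUCKET_FORMATS = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d-%H",
    "minute": "%Y-%m-%d-%H-%M",
}


def time_bucket(ts: datetime, granularity: str = "hour") -> str:
    """Return the bucket string (partition key) for a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime(_BUCKET_FORMATS[granularity])


def new_row_key(granularity: str = "hour"):
    """Return (bucket, timeuuid) for a row written right now.

    uuid.uuid1() is a time-based UUID, which is what a timeuuid column holds.
    The bucket and the UUID are generated a moment apart, so right at a bucket
    boundary they can disagree slightly; this sketch ignores that.
    """
    now = datetime.now(timezone.utc)
    return time_bucket(now, granularity), uuid.uuid1()
```

A reader would then query one bucket at a time and page within it on the 
timeuuid column.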

Depending upon the semantics of the transport protocol you plan on utilizing, 
either the client code could keep track of pagination, or the app server 
could, if you utilized some type of request/reply/ack flow.  You could keep 
sequence numbers for each client, and begin streaming data to them, or allow 
them to query, upon reconnect.
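
To make the sequence-number idea concrete, here is a small in-memory Python 
sketch of what the app server side might look like. The class name 
AckCursorTracker, its methods, and the use of a plain list as the event source 
are all hypothetical; a real server would read from the bucketed table and 
persist the cursors somewhere durable.

```python
from collections import defaultdict


class AckCursorTracker:
    """Tracks, per client, the next sequence number that has not been acked."""

    def __init__(self, events):
        # Ordered events; the list index doubles as the sequence number.
        self._events = list(events)
        # client_id -> next sequence number to send (starts at 0).
        self._cursor = defaultdict(int)

    def next_page(self, client_id, limit=100):
        """Return (start_seq, page). Repeats the same page until it is acked."""
        start = self._cursor[client_id]
        return start, self._events[start:start + limit]

    def ack(self, client_id, start_seq, count):
        """Advance the cursor only if the ack matches the current position.

        Stale or duplicate acks (e.g. after a reconnect) are ignored.
        """
        if start_seq == self._cursor[client_id]:
            self._cursor[client_id] = start_seq + count
```

On reconnect the client simply asks for its next page again, and the server 
resumes from the last acknowledged position.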

But again, more details of the use case might prove useful.


> On Jun 7, 2014, at 1:53 PM, Kevin Burton <> wrote:
> Another way around this is to have a separate table storing the number of 
> buckets.
> This way if you have too few buckets, you can just increase them in the 
> future.
> Of course, the older data will still have too few buckets :-(
>> On Sat, Jun 7, 2014 at 11:09 AM, Kevin Burton <> wrote:
>>> On Sat, Jun 7, 2014 at 10:41 AM, Colin Clark <> wrote:
>>> It's an anti-pattern and there are better ways to do this.
>> Entirely possible :)
>> It would be nice to have a document with a bunch of common Cassandra design 
>> patterns.
>> I've been trying to track down a pattern for this, and a lot of it is 
>> pieced together from different places and individual blog posts, so one has 
>> to reverse engineer it.
>>> I have implemented the paging algorithm you've described using wide rows 
>>> and bucketing.  This approach is a more efficient utilization of 
>>> Cassandra's built-in wholesome goodness.
>> So.. I assume the general pattern is to:
>> create a bucket.. you create like 2^16 buckets, this is your partition key.  
>> Then you place a timestamp next to the bucket in a primary key.
>> So essentially:
>> primary key( bucket, timestamp )… 
>> .. so to read from this bucket you essentially execute: 
>> select * from foo where bucket = 100 and timestamp > 12345790 limit 10000;
>>> Also, I wouldn't let a huge number of clients connect directly to the 
>>> cluster to do this; put some type of app server in between to handle the 
>>> comms and fan out.  You'll get better utilization of resources and less 
>>> overhead, in addition to flexibility in which data center you're utilizing 
>>> to serve requests. 
>> this is interesting… since the partition is the bucket, you could make some 
>> poor decisions based on the number of buckets.
>> For example, 
>> if you use 2^64 buckets, the number of items in each bucket is going to be 
>> rather small.  So you're going to have tons of queries each fetching 0-1 row 
>> (if you have a small amount of data).
>> But if you use very FEW buckets.. say 5, but you have a cluster of 1000 
>> nodes, then you will have 5 of these buckets on 5 nodes, and the rest of the 
>> nodes without any data.
>> Hm..
>> the byte ordered partitioner solves this problem because I can just pick a 
>> fixed number of buckets and then this is the primary key prefix and the data 
>> in a bucket can be split up across machines based on any arbitrary split 
>> even in the middle of a 'bucket' …
>> -- 
>> Founder/CEO
>> Location: San Francisco, CA
>> Skype: burtonator
>> blog:
>> … or check out my Google+ profile
>> War is peace. Freedom is slavery. Ignorance is strength. Corporations are 
>> people.
