I’m working on moving a bunch of counters out of our relational database to Cassandra. For the most part, Cassandra is a very nice fit, except for one feature on our website. We manage a time series of view counts for each document, and display a list of the most popular documents in the last seven days. This seems like a pretty strong anti-pattern for Cassandra, but also seems like something a lot of people would want to do. If you’re keeping counters, its pretty likely that you’d want to know which ones have the highest counts.
Here’s what I came up with to implement this feature. Create a counter table with primary key (doc_id, day) and a single counter. Whenever a document is viewed, increment the counter for the document for today and the previous six days. Sometime after midnight each day, compile the counters into a table with primary key (day, count, doc_id) and no additional columns. For each partition in the counter table, I would sum up the counters, delete any counters that are over a week old, and put the sum into the second table with day = today. When I query the table, i would ask for data where day = yesterday. During the compilation process, I would delete old partitions. In theory I’d only need two partitions. One that is being built, and one for querying. I’d be interested to hear critiques on this strategy, as well as hearing how other people have implemented a "most-popular" feature using Cassandra counters. Robert