Hi Andy,

There will be tens of millions of uids, each with hundreds of someids, being accessed each day.
Hi Przemek,

We currently use counter column families, but they are some of our slowest (they are also some of our biggest, so the counter type might not be the issue).

We have a strong need for a cross-DC solution. We could use Redis and handle the replication ourselves, but we are hoping not to have to do that.

Regarding tweaking the compaction thresholds: do you mean increasing/decreasing the min/max_compaction_thresholds? I guess decreasing both values would result in more compaction, so fewer SSTable reads, so faster reads (at the cost of heavier CPU/disk usage)?

We will always need all of a uid's someids at once, so adding someid to the partition key is not an option at this time.

Thanks,
Chris

From: Przemek Maciolek [mailto:pmacio...@gmail.com]
Sent: 05 December 2013 16:04
To: user@cassandra.apache.org
Subject: Re: Counters question - is there a better way to count

Some big systems have been built on Cassandra's counters (such as Rainbird: http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011) and they seem to be doing a great job. If you are concerned about performance, then maybe a memory-based store (such as Redis) will better suit your case (as long as it fits in memory, but considering the data model, I guess it might work).

If you are going to stick with Cassandra, then tweaking the compaction thresholds can make a visible difference in read performance, at least from what I have seen. You can also consider changing the PRIMARY KEY to ((uid, someid), time) - this makes the partition key uid+someid, rather than just uid. Depending on the access pattern, it might help.

On Thu, Dec 5, 2013 at 4:44 PM, Christopher Wirt <chris.w...@struq.com> wrote:

I want to build a really simple column family which counts the occurrences of a single event X. Once we reach Y occurrences of X, the counter resets to 0.

The obvious way to do this is with a counter CF:

    CREATE TABLE xcounter1 (
        uid uuid,
        someid int,
        count counter,
        PRIMARY KEY (uid, someid)
    );

This is how I've always done it in the past, but I've been told to avoid counters for various reasons: performance, consistency, etc. I'm not too bothered about 100% absolute consistency, but read performance is certainly a big concern.

So, to avoid using counters, I was thinking I could do something like this:

    CREATE TABLE xcounter2 (
        uid uuid,
        someid int,
        time timeuuid,
        PRIMARY KEY (uid, someid, time)
    );

Then retrieve all events and count them in memory, and delete all (uid, someid) records once I hit Y.

Or I could:

    CREATE TABLE xcounter3 (
        uid uuid,
        someid int,
        time timeuuid,
        ycount int,
        PRIMARY KEY (uid, someid, time)
    );

Insert a ycount on each occurrence of the event, only retrieve the last ycount inserted when reading, then delete all records once I hit the magic Y value.

Anyone have any interesting thoughts or insight into what is likely to give me the best read performance? There will be hundreds of someids per uid. Reads will be 5-10x the writes.

Thanks,
Chris
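A minimal cqlsh sketch of how the xcounter1 counter table above would be exercised (the uuid and the decrement value are illustrative). Note that counter columns can only be incremented or decremented, never set, so the reset-at-Y logic has to live on the client:

    -- one increment per occurrence of event X
    UPDATE xcounter1 SET count = count + 1
    WHERE uid = 62c36092-82a1-3a00-93d1-46196ee77204 AND someid = 42;

    -- read the current count
    SELECT count FROM xcounter1
    WHERE uid = 62c36092-82a1-3a00-93d1-46196ee77204 AND someid = 42;

    -- "reset": once a read shows count >= Y, decrement by the value just read
    -- (racy under concurrent writers; deleting a counter row and later
    -- re-incrementing it is not safe in Cassandra)
    UPDATE xcounter1 SET count = count - 7
    WHERE uid = 62c36092-82a1-3a00-93d1-46196ee77204 AND someid = 42;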
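For the xcounter3 variant, a sketch of the write/read/delete cycle under the schema above (values again illustrative). Reversing the clustering order in the SELECT makes LIMIT 1 return the newest row, i.e. the last ycount written:

    -- record the running count alongside each event
    INSERT INTO xcounter3 (uid, someid, time, ycount)
    VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 42, now(), 7);

    -- latest ycount only: reverse the clustering order and take one row
    SELECT ycount FROM xcounter3
    WHERE uid = 62c36092-82a1-3a00-93d1-46196ee77204 AND someid = 42
    ORDER BY someid DESC, time DESC
    LIMIT 1;

    -- once ycount reaches Y, drop the whole (uid, someid) series
    DELETE FROM xcounter3
    WHERE uid = 62c36092-82a1-3a00-93d1-46196ee77204 AND someid = 42;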
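And a sketch of the compaction-threshold tweak discussed above, assuming size-tiered compaction (the values are illustrative): lower thresholds mean minor compactions trigger with fewer SSTables, which tends to trade extra background CPU/disk I/O for fewer SSTables touched per read.

    -- min_threshold: number of similar-sized SSTables needed to trigger
    -- a minor compaction; max_threshold: cap on SSTables merged at once
    ALTER TABLE xcounter1 WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': 2,
        'max_threshold': 8
    };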