You could store the key -> score pairs in Cassandra, pull out the full partition, and repopulate the cache in redis with the top N, for whatever N you need. I'd only read the Cassandra values directly in order to repopulate the cache.
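Here is a minimal Python sketch of that repopulation path. It makes several assumptions. A caller-supplied function reads the Cassandra partition into a plain dict of key -> score; this is a hypothetical stand-in for a full-partition SELECT. An in-process dict plays the role of the redis cache. The trigger for repopulating, expiry after `ttl_seconds`, is also an assumption and is not specified above. `TopNCache` and its parameters are illustrative names only.

```python
import heapq
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for reading one whole partition (all key -> score rows)
# from Cassandra; no driver calls are made in this sketch.
PartitionReader = Callable[[str], Dict[str, int]]
Entry = Tuple[str, int]  # (key, score)


class TopNCache:
    """Serves the top-N entries of each set from a local dict (standing in for
    redis) and reads the backing store only when an entry must be repopulated."""

    def __init__(self, read_partition: PartitionReader, n: int,
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read_partition = read_partition
        self._n = n
        self._ttl = ttl_seconds  # assumed repopulation trigger
        self._clock = clock
        # set_id -> (expires_at, top-N entries ordered by score, highest first)
        self._cache: Dict[str, Tuple[float, List[Entry]]] = {}
        self.store_reads = 0  # number of full-partition reads performed

    def top(self, set_id: str) -> List[Entry]:
        now = self._clock()
        cached = self._cache.get(set_id)
        if cached is not None and cached[0] > now:
            return list(cached[1])  # cache hit: backing store not touched
        return self._repopulate(set_id, now)

    def _repopulate(self, set_id: str, now: float) -> List[Entry]:
        self.store_reads += 1
        partition = self._read_partition(set_id)  # pull the full partition
        # Highest score first; ties broken by key so the order is deterministic.
        top_n = heapq.nsmallest(self._n, partition.items(),
                                key=lambda kv: (-kv[1], kv[0]))
        self._cache[set_id] = (now + self._ttl, top_n)
        return list(top_n)
```

Between repopulations, reads never touch the backing store. That is the point of reading the Cassandra values only to refill the cache.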
I wouldn't try to store the score -> key values; the perf will be a nightmare.

On Tue, Jan 17, 2017 at 8:47 AM Mike Torra <mto...@demandware.com> wrote:

> Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are
> indeed what we use today.
>
> Caching the resulting 'sorted sets' in redis is exactly what I plan to do.
> There will be tens of thousands of these sorted sets, each generally with
> <10k items (with maybe a few exceptions going a bit over that). The reason
> to periodically calculate the set and store it in cassandra is to avoid
> having the client do that work, when the client only really cares about the
> top 100 or so items at any given time. Being truly "real time" is not
> critical for us, but it is a selling point to be as up to date as possible.
>
> I'd like to understand the performance issue of frequently updating these
> sets. I understand that every time I 'regenerate' the sorted set, any rows
> that change will create a tombstone. For example, if "item_1" is in first
> place and "item_2" is in second place, and they switch on the next update,
> that would be two tombstones. Do you think this will be a big enough
> problem that it is worth doing the sorting work client side, on demand, and
> just trying to eat the performance hit there? My thought was to make a
> tradeoff by using more cassandra disk space (i.e. precalculating all sets)
> in exchange for faster reads when requests that need this data actually
> come in.
>
> From: Benjamin Roth <benjamin.r...@jaumo.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Saturday, January 14, 2017 at 1:25 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: implementing a 'sorted set' on top of cassandra
>
> Mike mentioned "increment" in his initial post. That made me think of a
> case with increments and fetching a top list by a counter, like
> https://redis.io/commands/zincrby
> https://redis.io/commands/zrangebyscore
>
> 1. Cassandra is absolutely not made to sort by a counter (or by a
>    non-counter, incrementing numeric value), but it is made to store
>    counters. In this case a partition could be seen as a set.
> 2. I thought of using CS for persistence and, depending on app
>    requirements like real-time and set size, still using redis as a
>    read cache.
>
> 2017-01-14 18:45 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:
>
> Sorted sets don't have a requirement of incrementing / decrementing.
> They're commonly used for things like leaderboards where the values are
> arbitrary.
>
> In Redis they are implemented with 2 data structures for efficient lookups
> of either key or value. No getting around that as far as I know.
>
> In Cassandra they would require using the score as a clustering column in
> order to select the top N scores (and paginate). That means a tombstone
> whenever the value for a key in the set changes. In sets with high rates of
> change that means a lot of tombstones and thus terrible performance.
>
> On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Sorting on an "incremented" numeric value has always been a nightmare to
> do properly in C*.
>
> Either use the counter type, but then no sorting is possible, since a
> counter cannot be used as the type of a clustering column (which is what
> allows sorting).
>
> Or use a simple numeric type on a clustering column, but then incrementing
> the value *concurrently* and *safely* is prohibitive (SELECT to fetch the
> current value + UPDATE ... IF value = <old_value>) + retry.
>
> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
> Whether your proposed solution is crazy depends on your needs :)
> It sounds like you can live with non-realtime data, so it is OK to cache
> it. Why precompute the results if you only need 5% of them? Why not use
> redis as a cache with expiring sorted sets that are filled on demand from
> cassandra partitions with counters?
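The tombstone cost Jonathan describes can be made concrete with a small in-memory model. It is written in Python, `ScoreClusteredSet` and its methods are hypothetical names, and nothing here talks to Cassandra. The model assumes one partition per set, with rows clustered by score descending, plus a key -> score lookup so the old row can be found. This loosely mirrors the two structures he mentions. A clustering column cannot be updated in place, so each score change is modelled as a delete of the old row (a tombstone in Cassandra) followed by an insert.

```python
import bisect
from typing import Dict, List, Tuple


class ScoreClusteredSet:
    """Hypothetical model of one partition clustered by (score DESC, key),
    plus a key -> score index. Counts the row deletes (tombstones) that
    score changes would cause; it does not talk to Cassandra."""

    def __init__(self) -> None:
        # Rows stored as (-score, key) so ascending order means score DESC.
        self._rows: List[Tuple[int, str]] = []
        self._score_of: Dict[str, int] = {}
        self.tombstones = 0

    def set_score(self, key: str, score: int) -> None:
        old = self._score_of.get(key)
        if old == score:
            return  # unchanged: nothing to write
        if old is not None:
            # The clustering column can't be updated in place:
            # delete the old row (this is where the tombstone comes from)...
            self._rows.remove((-old, key))
            self.tombstones += 1
        # ...then insert a row at the new clustering position.
        bisect.insort(self._rows, (-score, key))
        self._score_of[key] = score

    def top(self, n: int) -> List[Tuple[str, int]]:
        # Reading the first n rows is cheap because they are already ordered.
        return [(key, -neg_score) for neg_score, key in self._rows[:n]]
```

In Mike's example, item_1 and item_2 swap places because both scores changed, and the model counts two tombstones. Every further score change adds one more. The concurrent read-then-conditional-update loop DuyHai mentions is not modelled.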
> So redis has much less to do and can scale much better. And you are not
> limited to keeping all data in RAM, as cache data is volatile and can be
> evicted on demand.
> Whether this is effective also depends on the size of your sets. CS won't
> be able to sort them by score for you, so you will have to load the
> complete set into redis for caching and/or do the sorting in your app on
> demand. This certainly won't work out well for sets with millions of
> entries.
>
> 2017-01-13 23:14 GMT+01:00 Mike Torra <mto...@demandware.com>:
>
> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
> 1. Use a counter CF to store the values I want to sort by
> 2. Periodically read in all key/values in the counter CF and sort in
>    the client application (~every five minutes or so)
> 3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
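Mike's original three steps can be sketched in Python as follows. The sketch rests on these assumptions:

- `collections.Counter` stands in for the counter CF.
- The `(rank, key, score)` tuples stand in for the rows written to the second CF.
- `increment`, `regenerate_ranking` and `changed_rows` are hypothetical helpers.
- No Cassandra client is used.

`changed_rows` counts the ranked rows that differ between two runs, i.e. the rows a regeneration would rewrite.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]  # (rank, key, score): hypothetical shape of the ordered CF


def increment(counters: Counter, key: str, by: int = 1) -> None:
    # Step 1: the hot write path only bumps a counter
    # (analogous to a CQL counter UPDATE ... SET c = c + 1).
    counters[key] += by


def regenerate_ranking(counters: Counter, top_n: int) -> List[Row]:
    # Step 2: read every key/value and sort client side, highest score first,
    # ties broken by key so reruns are deterministic.
    ordered = sorted(counters.items(), key=lambda kv: (-kv[1], kv[0]))
    # Step 3: keep only the rows readers care about, numbered from rank 1.
    return [(rank, key, score)
            for rank, (key, score) in enumerate(ordered[:top_n], start=1)]


def changed_rows(old: List[Row], new: List[Row]) -> int:
    # Ranked rows whose (key, score) differs between two runs.
    old_by_rank: Dict[int, Tuple[str, int]] = {r: (k, s) for r, k, s in old}
    return sum(1 for r, k, s in new if old_by_rank.get(r) != (k, s))
```

Each run reads every key in the counter partition. That is workable at the <10k items per set Mike mentions, but as Benjamin notes, it would not work out well for sets with millions of entries.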