On Thu, Aug 18, 2011 at 10:36 AM, Stephen Henderson <stephen.hender...@cognitivematch.com> wrote:
> Thanks Ed/Aaron, that really helped a lot.
>
> Just to clarify on the question of writes (sorry, I worded that badly) - do write operations insert rows into the cache on all nodes in the replica set, or does the cache only get populated on reads?
>
> Aaron – in terms of scale, our ultimate goal is to achieve 99% of reads under 5ms (ideally <1ms) at up to 20,000 operations a second (split 60/40 read/write) and up to 2 billion keys. That's the 12-18 month plan at least; short-term we'll be more like 1,000 ops/sec and 10 million keys, which I think Cassandra could cope with comfortably. We're currently working out what the row size will be, but we're hoping to be under 2KB max. Consistency isn't massively important. Our use case is a user-profile store for serving optimised advert content with quite tight restrictions on response time, so we have, say, 10ms to gather as much data about a user as possible before we have to make a decision on which creative to serve. If we can read a profile from the store in this time we can serve a personalised ad with a higher chance of engagement, so low latency is a key requirement.
>
> Edward – thanks for the link to the presentation slides. A bit off-topic, but have you ever looked at CouchBase (previously "membase")? It's basically memcached with persistence, fault-tolerance and online scaling. It's the main alternative platform we're considering for this project and on paper it sounds perfect, though we have a few concerns about it (mainly the lack of an active community, another NoSQL platform to learn, and general uncertainty over the upcoming 2.0 release). We're hoping to do some stress-test comparisons between the two in the near future and I'll try to post the results if they're not too company-specific.
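(For what it's worth, the target numbers quoted above can be sanity-checked with some back-of-envelope arithmetic. The replication factor of 3 below is an assumption for illustration, not something stated in the thread.)

```python
# Rough capacity sketch for the workload described above.
# Assumption (not from the thread): replication factor 3, and that
# "2 billion keys at 2 KB max" means worst-case raw data volume.
KEYS = 2_000_000_000
ROW_BYTES = 2 * 1024           # 2 KB max per row
OPS_PER_SEC = 20_000
READ_FRACTION = 0.6            # 60/40 read/write split
REPLICATION_FACTOR = 3         # hypothetical RF

raw_tb = KEYS * ROW_BYTES / 1024**4
replicated_tb = raw_tb * REPLICATION_FACTOR
reads_per_sec = OPS_PER_SEC * READ_FRACTION
writes_per_sec = OPS_PER_SEC - reads_per_sec

print(f"raw data:   {raw_tb:.1f} TB")        # ~3.7 TB before replication
print(f"with RF=3:  {replicated_tb:.1f} TB") # ~11.2 TB on disk
print(f"reads/sec:  {reads_per_sec:.0f}")    # 12000
print(f"writes/sec: {writes_per_sec:.0f}")   # 8000
```

At those volumes the full dataset clearly cannot live in row cache, which makes the working-set size the number that actually matters.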
> Thanks again,
> Stephen
>
> *From:* Edward Capriolo [mailto:edlinuxg...@gmail.com]
> *Sent:* 18 August 2011 14:14
> *To:* user@cassandra.apache.org
> *Subject:* Re: A few questions on row caching and read consistency ONE
>
> On Thu, Aug 18, 2011 at 5:01 AM, Stephen Henderson <stephen.hender...@cognitivematch.com> wrote:
>
> Hi,
>
> We're currently in the planning stage of a new project which needs a low-latency, persistent key/value store with a roughly 60:40 read/write split. We're trying to establish whether Cassandra is a good fit for this and, in particular, what the hardware requirements would be to have the majority of rows cached in memory (other NoSQL platforms like Couchbase/Membase seem like a more natural fit, but we're already reasonably familiar with Cassandra and would rather stick with what we know if it can work).
>
> If anyone could help answer/clarify the following questions it would be a great help (all assume that row caching is enabled for the column family).
>
> Q. If we're using read consistency ONE, does the read request get sent to all nodes in the replica set with the first to reply returned (i.e. all replica nodes will then have that row in their cache), OR does the request only get sent to a single node in the replica set? If it's the latter, would the same node generally be used for all requests to the same key, or would it always be a random node in the replica set? (i.e. if we have multiple reads for one key in quick succession, would this entail potentially multiple disk lookups until all nodes in the set have been hit?)
>
> Q. Related to the above, if only one node receives the request, would the client (Hector in this case) know which node to send the request to directly, or would there potentially be one extra network hop involved (client -> random node -> node with key)?
>
> Q. Is it possible to do a warm cache load of the most recently accessed keys on node startup, or would we have to do this with a client app?
>
> Q. With write consistency ANY, is it correct that following a write request all nodes in the replica set will end up with that row in their cache, as well as on disk, once they receive the write? i.e. total cache size is (cache_memory_per_node * num_nodes) / num_replicas.
>
> Q. If the cluster only has a single column family, random partitioning and no secondary indexes, is there a good metric for estimating how much heap space we would need to leave aside for everything that isn't the row cache? Would it be proportional to the row-cache size or fairly constant?
>
> Thanks,
> Stephen
>
> Stephen Henderson - Lead Developer (Onsite), Cognitive Match
> stephen.hender...@cognitivematch.com | http://www.cognitivematch.com
>
> I did a small presentation on this topic a while back:
> http://www.edwardcapriolo.com/roller/edwardcapriolo/resource/memcache.odp
>
> 1.
> a) All reads go to all replica nodes, even those at READ.ONE, UNLESS you lower the read_repair_chance for the column family.
> b) Reads could hit random nodes or the same node, unless you configure the dynamic snitch to pin the request to a single node. This is described in cassandra.yaml.
>
> 2. Neither Hector nor any other client that I know of routes requests to the proper nodes based on topology. No information I know of has proven this matters.
>
> 3. Cassandra allows you to save your caches so your node will start up warm (saving a large row cache is hard; a large key cache is easy).
>
> 4. WRITE.ANY would not change how caching works.
>
> 5. There are some calculations out there based on the size of rows. One of the newer features of Cassandra is that it now automatically resizes the row cache under memory pressure. You still have to feel it out, but you no longer have to worry as much about setting it too high.
>
> One more note. You have mentioned the row cache, which is awesome if you can utilize it correctly and your use case is a perfect fit, but key cache + page cache can serve very fast reads as well.
>
> Thank you,
> Edward

Wait, Membase is Couchbase? I thought it was NorthScale? (I can not keep up.) It seems to have coordinators or masters:
http://www.slideshare.net/tim.lossen.de/an-introduction-to-membase

Any solution where all the read/write traffic travels through a master I do not believe to be scalable. Other solutions that use a master for coordinator election, but read or write directly to the nodes, are "more" scalable but more fragile.

Q. Why does every scalable architecture except Cassandra seem to have master nodes? :)

It is not in YCSB, so it is hard to say how fast it is or how well it performs.
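(For reference, the dynamic-snitch pinning Edward mentions in 1(b) and the saved-caches mechanism behind his point 3 are both configured in cassandra.yaml. The excerpt below is a hypothetical sketch using 0.8/1.0-era option names; verify against the yaml shipped with your version. Note that read_repair_chance and the per-CF cache save periods are set on the column family itself, not in this file.)

```yaml
# Hypothetical cassandra.yaml excerpt -- option names from the
# 0.8/1.0-era default file; check your version's bundled yaml.

# A non-zero badness threshold keeps reads pinned to the same replica
# until its measured latency is this fraction worse than the others,
# which improves that replica's row/key cache hit rate (answer 1b).
dynamic_snitch: true
dynamic_snitch_badness_threshold: 0.1

# Directory where key/row caches are periodically saved so a restarted
# node can come back up warm (answer 3).
saved_caches_directory: /var/lib/cassandra/saved_caches
```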
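(The formula Stephen gives in his write-consistency question, (cache_memory_per_node * num_nodes) / num_replicas, is worth making concrete. A minimal sketch with hypothetical numbers:)

```python
# Total amount of *distinct* data a cluster can hold in its row caches:
# each row is cached on every replica that holds it, so per-node cache
# memory is effectively divided by the replication factor.
# The node counts and sizes below are hypothetical.
def total_effective_cache_gb(cache_per_node_gb: float,
                             num_nodes: int,
                             num_replicas: int) -> float:
    return cache_per_node_gb * num_nodes / num_replicas

# 12 nodes with 8 GB of row cache each at RF=3:
print(total_effective_cache_gb(8, 12, 3))  # -> 32.0 GB of distinct rows
```

The flip side is the point Stephen raises: adding replicas improves availability but shrinks the distinct working set the cluster can keep hot.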