>
> If you had one big cache, wouldn't it be the case that it's mostly
> populated with frequently accessed rows, and less populated with rarely
> accessed rows?
>

Yes.

> In fact, wouldn't one big cache dynamically and automatically give you
> exactly what you want? If you try to partition the same amount of memory
> manually, by guesswork, among many tables, aren't you always going to do a
> worse job?
>

Suppose you have one CF that users interact with constantly, and another
CF that's only read periodically by a batch process.  The batch process
tends to access most or all of the rows, and the CF is too large to cache
in its entirety.  Normally, you would dedicate cache space to the first CF,
since anything with human interaction tends to have good temporal locality
and you want to keep latencies there low.  Caching the second CF, on the
other hand, provides little to no real benefit.  If you combine these two
CFs under one cache, every time your batch process runs, rows from the
second CF will populate the cache and evict rows from the first CF, even
though having the batch rows in the cache provides little benefit to you.
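
To make the eviction effect concrete, here is a rough toy simulation.  It
is only a sketch and not how Cassandra's caches are implemented: it assumes
a plain LRU policy, and the LRUCache class, the simulate() function, the
CF names, and all of the sizes (100 cache slots, 80 hot rows, 1000 batch
rows) are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU row cache (illustrative only, not Cassandra's cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()

    def get(self, key):
        """Return True on a hit; on a miss, load the row and return False."""
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            return True
        self.rows[key] = True  # pretend we read the row from disk
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict least recently used row
        return False


def simulate(shared):
    """Hit rate on the interactive CF right after a batch run.

    All sizes below are hypothetical.
    """
    hot_keys = [("users", i) for i in range(80)]     # interactive CF
    batch_keys = [("batch", i) for i in range(1000)]  # batch-scanned CF

    user_cache = LRUCache(100)
    # shared=True: both CFs go through one cache.
    # shared=False: the batch CF is simply not cached.
    batch_cache = user_cache if shared else None

    for key in hot_keys:  # warm the cache with interactive traffic
        user_cache.get(key)

    for key in batch_keys:  # the periodic batch process scans its CF
        if batch_cache is not None:
            batch_cache.get(key)

    hits = sum(user_cache.get(key) for key in hot_keys)
    return hits / len(hot_keys)


if __name__ == "__main__":
    print("one shared cache:   ", simulate(shared=True))
    print("per-CF cache config:", simulate(shared=False))
```

Under these assumptions, the shared cache has none of the interactive rows
left after the batch scan, while the per-CF setup keeps all of them.  A
real workload would be less clear-cut, but the direction is the same.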

As another example, if you mix a CF with wide rows and a CF with small rows,
you no longer have the option of using a row cache, even if it makes great
sense for the small-row CF data.

Knowing your data and access patterns gives you a real advantage over a
single one-size-fits-all cache when it comes to caching effectively.

-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library
