On 7/12/2011 9:02 PM, Peter Schuller wrote:
Thanks Peter, but... hmmm, are you saying that even after a cache miss which
results in a disk read and blocks being moved to the ssd, that by the next
cache miss for the same data and subsequent same file blocks, that the ssd
is unlikely to have those same blocks present anymore?
I am saying that regardless of whether the cache is memory, ssd, a
combination of both, or anything else, most workloads tend to be
subject to diminishing returns. Doubling cache from 5 GB to 10 GB
might get you from 10% to 50% cache hit ratio, but doubling again to
20 GB might get you to 60% and doubling to 40 GB to 65% (to use some
completely arbitrary numbers for demonstration purposes).

The reason a cache can be more effective than the ratio of its size
vs. the total data set, is that there is a hotspot/working set that is
smaller than the total data set. If you have completely random access
this won't be the case, and a cache of size n% of total size will
give you a n% cache hit ratio.
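
To make the random-access case concrete, here is a minimal sketch that simulates it. It assumes an LRU eviction policy and equally sized items, neither of which is stated above; the function name and parameters are made up for illustration.

```python
import random
from collections import OrderedDict


def simulate_lru_hit_ratio(num_items, cache_size, num_accesses, seed=0):
    """Estimate the hit ratio of an LRU cache under uniformly random access.

    Assumptions (illustrative only): every item has the same size, every
    item is equally likely to be requested, and eviction is strict LRU.
    """
    rng = random.Random(seed)
    cache = OrderedDict()           # keys ordered from least to most recently used
    warmup = num_accesses // 10     # skip the cold-start phase when counting
    hits = 0
    counted = 0
    for i in range(num_accesses):
        key = rng.randrange(num_items)
        hit = key in cache
        if hit:
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least recently used key
        if i >= warmup:
            counted += 1
            hits += hit
    return hits / counted


if __name__ == "__main__":
    for size in (100, 250, 500):
        ratio = simulate_lru_hit_ratio(1000, size, 200_000)
        print(f"cache = {size / 10:.0f}% of data -> hit ratio ~ {ratio:.3f}")
```

With no hot spot, the measured hit ratio should stay close to the cache's share of the data set, so doubling the cache only doubles the hit ratio.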

But for most workloads, you have a hotter working set so you get more
bang for the buck when caching. For example, if 99% of all accesses
are accessing 10% of the data, then a cache that is the size of 10% of
the data gets you 99% cache hit ratio. But clearly no matter how much
more cache you ever add, you will never ever cache more than 100% of
reads so in this (artificial arbitrary) scenario, once you're caching
10% of your data the cost of caching the final small percent of
accesses might be 10 times that of the original cache.
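
The arithmetic in that artificial scenario can be sketched as a small model. It assumes an ideal cache that always holds the hottest data first, and that accesses are spread evenly within the hot set and within the cold set; both are simplifications, and the function and parameter names are hypothetical.

```python
def hit_ratio(cache_fraction, hot_fraction=0.10, hot_access_share=0.99):
    """Hit ratio for a two-tier workload with an ideal cache.

    cache_fraction   -- cache size as a fraction of the total data set
    hot_fraction     -- fraction of the data that forms the hot set
    hot_access_share -- fraction of all accesses that go to the hot set

    Assumption: the cache fills with hot data first, and accesses are
    uniform within each tier.
    """
    if not 0.0 <= cache_fraction <= 1.0:
        raise ValueError("cache_fraction must be between 0 and 1")
    if cache_fraction <= hot_fraction:
        # Only part of the hot set fits in the cache.
        return hot_access_share * cache_fraction / hot_fraction
    # The whole hot set fits; the remainder of the cache holds cold data.
    cold_cached = (cache_fraction - hot_fraction) / (1.0 - hot_fraction)
    return hot_access_share + (1.0 - hot_access_share) * cold_cached


if __name__ == "__main__":
    for fraction in (0.05, 0.10, 0.20, 0.40, 1.00):
        print(f"cache = {fraction:4.0%} of data -> hit ratio {hit_ratio(fraction):.4f}")
```

In this model the first 10% of cache buys 99% of the hits, and every further doubling adds only a fraction of a percentage point.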

I did a quick Google search and didn't find a good piece describing this
more rigorously, but hopefully the above is helpful. Some related reading
might be http://en.wikipedia.org/wiki/Long_Tail


Of course. Thanks for the clarification. On the positive side, this flashcache and other solutions like it could be beneficial for all disk i/o on the system. Writes will always benefit. Reads, only if they are read again before being pushed out by other reads. I wonder if it would help to "prime" the ssd by reading in (and discarding) the top 25% (250/1000GB) of the usual hot data.

aj
