On Mon, 09-05-2011 at 17:58 +0200, Peter Schuller wrote:
> > I have a few sstables with around 500 million keys, and memory usage has
> > grown a lot, I suppose because of the indexes. These sstables are
> > comprised of skinny rows, but a lot of them. Would tuning index interval
> > make the memory usage go down? And what would the performance hit be?
> 
> Assuming no row caching, and assuming you're talking about heap usage
> and not the virtual size of the process in top, the primary two things
> that will grow with row count are (1) bloom filters for sstables and
> (2) the sampled index keys. Bloom filters are of a certain size to
> achieve a sufficiently small false positive rate. That target rate
> could be increased to allow smaller bloom filters, but that is not
> exposed as a configuration option and would require code changes.
> 

No row cache and no key cache. I've tried with both, but the keys being
read are constantly changing, and I didn't see hit ratios beyond 0.8 %.

That reminds me, my false positive ratio is stuck at 1.0, so I guess
bloom filters aren't doing a lot for me.
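
For a rough sense of how much heap the target false positive rate could
cost, here is a minimal sketch using the textbook formula for an
optimally sized Bloom filter. This is an assumption for illustration
only: it is not necessarily how Cassandra sizes its filters, and the
function names are hypothetical.

```python
import math


def bloom_bits_per_key(false_positive_rate: float) -> float:
    """Bits per key for an optimally configured Bloom filter.

    Textbook formula: m/n = -ln(p) / (ln 2)^2.
    Assumption: Cassandra's real sizing may differ from this.
    """
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def bloom_size_mib(num_keys: int, false_positive_rate: float) -> float:
    """Approximate filter size in MiB for num_keys at the given rate."""
    bits = num_keys * bloom_bits_per_key(false_positive_rate)
    return bits / 8 / (1024 ** 2)


if __name__ == "__main__":
    keys = 500_000_000  # row count mentioned earlier in this thread
    for p in (0.001, 0.01, 0.1):
        print(f"target rate {p}: ~{bloom_size_mib(keys, p):.0f} MiB")
```

Under this formula the filter shrinks only logarithmically as the target
rate is relaxed, so a looser rate saves memory but not dramatically.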

> For key sampling, the primary performance penalty should be CPU and
> maybe some disk. On average, when looking up a key in an sstable index
> file, you'll read sample interval/2 entries and deserialize them
> before finding the one you're after. Increasing sampling interval will
> thus increase the amount of deserialization taking place, as well as
> make the average range of data span additional pages on disk. The
> impact on disk is difficult to judge and likely depends a lot on i/o
> scheduling and other details.
> 

So the only thing I can do is test it and see how it goes. To make the
change effective, do I need to do anything beyond changing the value in
cassandra.yaml and restarting the node? I'll try 256 first and see
what happens.
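
To put numbers on the trade-off described above, here is a small
back-of-envelope sketch. It assumes one in-memory sample per
index-interval keys and an average scan of interval/2 index entries per
lookup, as described in the quoted reply. The value 128 is included for
comparison on the assumption that it is the default, and the function
names are hypothetical.

```python
def sampled_index_entries(num_keys: int, index_interval: int) -> int:
    """Number of index samples held in memory for one sstable,
    assuming one sample is kept per index_interval keys."""
    return num_keys // index_interval


def avg_entries_scanned(index_interval: int) -> float:
    """Average index entries read and deserialized per lookup,
    following the interval/2 estimate from the quoted reply."""
    return index_interval / 2


if __name__ == "__main__":
    keys = 500_000_000  # row count mentioned earlier in this thread
    # 128 is assumed here to be the default interval; 256 is the value
    # I plan to try first.
    for interval in (128, 256, 512):
        samples = sampled_index_entries(keys, interval)
        scanned = avg_entries_scanned(interval)
        print(f"interval {interval}: {samples} samples in memory, "
              f"~{scanned:.0f} entries scanned per lookup")
```

Doubling the interval halves the number of samples held in memory and
doubles the average number of entries scanned per lookup; the actual
heap saved depends on key size and per-sample overhead, which this
sketch does not model.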
