On Mon, 09-05-2011 at 17:58 +0200, Peter Schuller wrote:
> > I have a few sstables with around 500 million keys, and memory usage has
> > grown a lot, I suppose because of the indexes. These sstables are
> > comprised of skinny rows, but a lot of them. Would tuning index interval
> > make the memory usage go down? And what would the performance hit be?
>
> Assuming no row caching, and assuming you're talking about heap usage
> and not the virtual size of the process in top, the primary two things
> that will grow with row count are (1) bloom filters for sstables and
> (2) the sampled index keys. Bloom filters are of a certain size to
> achieve a sufficiently small false positive rate. That target rate
> could be increased to allow smaller bloom filters, but that is not
> exposed as a configuration option and would require code changes.
>
No row cache and no key cache. I've tried with both, but the keys being
read are constantly changing, and I didn't see hit ratios beyond 0.8%.
That reminds me, my false positive ratio is stuck at 1.0, so I guess
bloom filters aren't doing a lot for me.

> For key sampling, the primary performance penalty should be CPU and
> maybe some disk. On average, when looking up a key in an sstable index
> file, you'll read sample interval/2 entries and deserialize them
> before finding the one you're after. Increasing sampling interval will
> thus increase the amount of deserialization taking place, as well as
> make the average range of data span additional pages on disk. The
> impact on disk is difficult to judge and likely depends a lot on i/o
> scheduling and other details.
>

So the only thing I can do is test it and see how it goes. To make the
change effective, do I need to do anything beyond changing the value in
cassandra.yaml and restarting the node? I'll try 256 first and see what
happens.
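
For what it's worth, here is a rough back-of-the-envelope sketch in Python of the trade-off discussed above: how many index samples are held in memory for a given interval, and how many index entries a lookup scans on average (interval/2, as Peter describes). The function names and the 64 bytes-per-sample figure are my own assumptions for illustration, not measured Cassandra values, and the baseline interval of 128 is an assumption as well. The bloom filter function uses the textbook sizing formula and is not necessarily how Cassandra sizes its filters.

```python
import math


def sampled_index_entries(total_keys: int, index_interval: int) -> int:
    """Number of index samples kept in memory: one per index_interval keys."""
    return math.ceil(total_keys / index_interval)


def sample_memory_bytes(total_keys: int, index_interval: int,
                        bytes_per_sample: int) -> int:
    """Rough heap estimate for the samples.

    bytes_per_sample is a hypothetical per-entry cost (key bytes plus
    object overhead); measure it on a real node rather than trusting it.
    """
    return sampled_index_entries(total_keys, index_interval) * bytes_per_sample


def avg_entries_scanned(index_interval: int) -> float:
    """Average index entries read and deserialized per lookup (interval / 2)."""
    return index_interval / 2


def bloom_filter_bits(total_keys: int, false_positive_rate: float) -> int:
    """Textbook optimal bloom filter size: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(
        -total_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    )


if __name__ == "__main__":
    keys = 500_000_000        # key count mentioned in the thread
    assumed_sample_size = 64  # hypothetical bytes per sampled entry
    for interval in (128, 256):
        samples = sampled_index_entries(keys, interval)
        mem_mb = sample_memory_bytes(keys, interval, assumed_sample_size) / 1e6
        scanned = avg_entries_scanned(interval)
        print(f"interval={interval}: {samples} samples, "
              f"~{mem_mb:.0f} MB, ~{scanned:.0f} entries scanned per lookup")
```

Under these assumptions, doubling the interval halves the number of in-memory samples and doubles the average number of entries scanned per lookup, so the memory saving is paid for in CPU and possibly disk on every read.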