>>> It's my understanding then for this use case that bloom filters are of
>>> little importance and that i can
>>
Yes. AFAIK there is only one position seek (that will use the bloom filter) at the start of a get_range_slice request. After that the iterators step over the rows in the -Data files.
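The difference between a range slice and per-key reads can be illustrated with a toy model. This is a hypothetical, heavily simplified stand-in for an SSTable, not Cassandra's read path. `ToyBloomFilter` and `ToySSTable` are made-up names. A point read checks the Bloom filter once per key. A range slice positions once and then walks the sorted rows. In this toy the start position is found by binary search alone, so the filter is not consulted for the slice at all. Either way, its cost is per request rather than per row.

```python
import bisect
import hashlib


class ToyBloomFilter:
    """Minimal Bloom filter: k hashed positions in an m-slot bit array (illustrative only)."""

    def __init__(self, m_bits, k=3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits)
        self.probes = 0  # number of membership checks performed

    def _positions(self, key):
        # Derive k positions by salting MD5 with the hash index.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "maybe present".
        self.probes += 1
        return all(self.bits[pos] for pos in self._positions(key))


class ToySSTable:
    """Immutable rows sorted by key plus a Bloom filter, loosely modelled on an SSTable."""

    def __init__(self, rows):
        # rows: mapping of key -> value
        self.keys = sorted(rows)
        self.rows = dict(rows)
        self.bloom = ToyBloomFilter(m_bits=10 * max(len(self.keys), 1))
        for key in self.keys:
            self.bloom.add(key)
        self.seeks = 0  # number of positioned lookups into the sorted keys

    def get(self, key):
        # Point read: one filter check per key, and a seek only on "maybe".
        if not self.bloom.might_contain(key):
            return None
        self.seeks += 1
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.rows[key]
        return None

    def range_slice(self, start_key, count):
        # Range read: one seek to the first key >= start_key, then sequential iteration.
        self.seeks += 1
        i = bisect.bisect_left(self.keys, start_key)
        return [(k, self.rows[k]) for k in self.keys[i:i + count]]


if __name__ == "__main__":
    table = ToySSTable({f"key{i:05d}": f"value{i}" for i in range(20000)})

    batch = table.range_slice("key00000", 16000)  # one 16k-row slice
    print(len(batch), table.bloom.probes, table.seeks)  # 16000 0 1

    for i in range(16000):  # the same rows fetched key by key
        table.get(f"key{i:05d}")
    print(table.bloom.probes, table.seeks)  # 16000 16001
```

The Bloom filter only answers "is this key possibly here?", so it pays off when each request names a key. A sequential scan never asks that question per row.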
For the same reason, caches may be considered a little less useful: after the initial seek the rows are read sequentially rather than looked up one key at a time.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/03/2012, at 12:44 PM, Mick Semb Wever wrote:

> On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote:
>> Are you doing RF=1?
>
> That is correct. So are your calculations then :-)
>
>
>>> very small, <1k. Data from this cf is only read via hadoop jobs in batch
>>> reads of 16k rows at a time.
>> [snip]
>>> It's my understanding then for this use case that bloom filters are of
>>> little importance and that i can
>>
>> Depends. I'm not familiar enough with how the hadoop integration works
>> so someone else will have to comment, but if your hadoop jobs are just
>> performing normal reads of keys via thrift and the keys they are
>> grabbing are not in token order, those reads would be effectively
>> random and bloom filters should still be highly relevant to the amount
>> of I/O operations you need to perform.
>
> They are thrift get_range_slice reads of 16k rows per request.
> Hadoop reads are based on tokens, but in my use case the keys are also
> ordered and this cluster is using BOP.
>
> ~mck
>
> --
> "Living on Earth is expensive, but it does include a free trip around
> the sun every year." Unknown
>
> | http://github.com/finn-no | http://tech.finn.no |
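Peter's caveat about keys "not in token order" and Mick's answer (ordered keys on BOP) hinge on how a partitioner turns a key into a token. The sketch below approximates the two partitioners as an illustration only; it is not Cassandra's code. `bop_token`, `random_partitioner_token` and `in_token_order` are hypothetical helpers. With ByteOrderedPartitioner the token is the key bytes, so reading keys in key order is also reading them in token order. With RandomPartitioner the token comes from an MD5 hash of the key, so key order and token order diverge. In that case, reading keys in key order is effectively random on disk, which is the scenario Peter describes.

```python
import hashlib


def bop_token(key: bytes) -> bytes:
    # ByteOrderedPartitioner-style: the token is the raw key bytes,
    # so token order equals key (byte) order.
    return key


def random_partitioner_token(key: bytes) -> int:
    # RandomPartitioner-style (approximation): the MD5 digest of the key, read as a
    # signed big-endian integer and made non-negative.
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def in_token_order(keys, token_fn):
    """True if visiting `keys` in the given order also visits them in token order."""
    tokens = [token_fn(k) for k in keys]
    return tokens == sorted(tokens)


if __name__ == "__main__":
    keys = [f"row{i:05d}".encode() for i in range(1000)]  # keys already in key order
    print(in_token_order(keys, bop_token))                 # True
    print(in_token_order(keys, random_partitioner_token))  # MD5 tokens scatter the keys
```

This only models token order. A get_range_slice over a token range is sequential under either partitioner. The partitioner matters when a client fetches individual keys in its own (key) order.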