>>> It's my understanding then for this use case that bloom filters are of
>>> little importance and that i can
>> 
Yes.
AFAIK there is only one position seek (which will use the bloom filter) at the
start of a get_range_slice request. After that the iterators step over the rows
in the -Data files.
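
As a rough illustration of the difference, here is a toy Python model. It is
not Cassandra code: the ToySSTable class, its counters and the key format are
all made up for this sketch, and 16000 just stands in for the "16k rows" per
request mentioned earlier in the thread. It only counts how many position
seeks (the step where a bloom filter could help) each access pattern needs.

```python
import bisect


class ToySSTable:
    """Hypothetical stand-in for one SSTable: sorted row keys plus counters."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.seeks = 0             # position seeks, where a bloom filter could matter
        self.sequential_steps = 0  # rows stepped over after the initial seek

    def get(self, key):
        """Point read: every call needs its own position seek."""
        self.seeks += 1
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def range_slice(self, start_key, count):
        """Range read: one seek to the start key, then iterate in order."""
        self.seeks += 1
        i = bisect.bisect_left(self.keys, start_key)
        rows = self.keys[i:i + count]
        self.sequential_steps += len(rows)
        return rows


def demo():
    keys = ["row%06d" % n for n in range(100000)]

    # One range request for 16000 rows in key order.
    scan = ToySSTable(keys)
    rows = scan.range_slice("row000000", 16000)

    # The same rows fetched one key at a time.
    point = ToySSTable(keys)
    for key in rows:
        point.get(key)

    return scan.seeks, point.seeks, len(rows)


if __name__ == "__main__":
    print(demo())
```

In this model the range read does a single seek for the whole batch, while
the key-by-key reads do one seek per row, which is the case where bloom
filters earn their keep.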

For the same reason caches may be considered a little less useful.  

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/03/2012, at 12:44 PM, Mick Semb Wever wrote:

> On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote:
>> Are you doing RF=1? 
> 
> That is correct. So are your calculations then :-)
> 
> 
>>> very small, <1k. Data from this cf is only read via hadoop jobs in batch
>>> reads of 16k rows at a time.
>> [snip]
>>> It's my understanding then for this use case that bloom filters are of
>>> little importance and that i can
>> 
>> Depends. I'm not familiar enough with how the hadoop integration works
>> so someone else will have to comment, but if your hadoop jobs are just
>> performing normal reads of keys via thrift and the keys they are
>> grabbing are not in token order, those reads would be effectively
>> random and bloom filters should still be highly relevant to the amount
>> of I/O operations you need to perform. 
> 
> They are thrift get_range_slice reads of 16k rows per request.
> Hadoop reads are based on tokens, but in my use case the keys are also
> ordered and this cluster is using BOP.
> 
> ~mck
> 
> -- 
> "Living on Earth is expensive, but it does include a free trip around
> the sun every year." Unknown 
> 
> | http://github.com/finn-no | http://tech.finn.no |
