Re: Bloom filter

Per Steffensen Wed, 30 Jul 2014 00:28:34 -0700

On 30/07/14 08:55, jim ferenczi wrote:

Hi Per,
First of all, the BloomFilter implementation in Lucene is not exactly a
bloom filter. It uses only one hash function and you cannot set the false
positive ratio beforehand. ElasticSearch has its own bloom filter
implementation (using a "guava like" BloomFilter); you should take a look at
their implementation if you really need this feature.

Yes, I am looking into what Lucene can do and how to use it through Solr. If it does not fit our needs I will enhance it, potentially with inspiration from the ES implementation. Thanks.
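
For illustration only, here is a minimal sketch of what a filter with several hash functions and a false positive ratio chosen up front could look like. It is not the Lucene or ElasticSearch code; the class name SketchBloomFilter, its methods, and the choice of hash are all assumptions made for this example. The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative sketch; hypothetical class, not the Lucene or ElasticSearch implementation. */
final class SketchBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SketchBloomFilter(long expectedItems, double falsePositiveRate) {
        if (expectedItems <= 0 || falsePositiveRate <= 0.0 || falsePositiveRate >= 1.0) {
            throw new IllegalArgumentException("need expectedItems > 0 and 0 < rate < 1");
        }
        // Bits needed for the target rate: m = -n * ln(p) / (ln 2)^2
        double m = -expectedItems * Math.log(falsePositiveRate) / (Math.log(2) * Math.log(2));
        this.numBits = (int) Math.max(64, Math.min(Integer.MAX_VALUE, Math.ceil(m)));
        // Number of hash functions: k = (m / n) * ln 2
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedItems * Math.log(2)));
        this.bits = new BitSet(numBits);
    }

    int numBits() { return numBits; }

    int numHashes() { return numHashes; }

    void add(String key) {
        long h = hash64(key);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /** false means "definitely not added"; true means "possibly added". */
    boolean mightContain(String key) {
        long h = hash64(key);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th bit position from two 32-bit halves.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // FNV-1a over the UTF-8 bytes, followed by a mixing step (assumed hash choice).
    private static long hash64(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }
}
```

With these formulas, 1000 expected ids at a 1% target rate need roughly 9.6 bits per id and 7 hash functions.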

What is your use case? If your index fits in RAM the bloom filter won't
help (and it may have a negative impact if you have a lot of segments). In
fact the only use case where the bloom filter can help is when your term
dictionary does not fit in RAM which is rarely the case.

We have so many documents that it will never fit in memory. We use optimistic locking (our own implementation) to do correct concurrent assembly of documents and to do duplicate control. This requires a lot of finding docs by their id, and most of the time the document is not there, but to be sure we need to check both the transaction log (UpdateLog) and the actual index. We would like to use a Bloom filter to quickly tell that a document with a particular id is NOT present.
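
To make the intended lookup path concrete, here is a rough sketch of the check order described above: ask the filter first, and only fall through to the transaction log and the index when the filter says the id might be present. Everything here is hypothetical (the class DocPresenceChecker and the three predicates are stand-ins, not Solr or Lucene APIs), and it assumes every id is added to the filter before the document becomes visible.

```java
import java.util.function.Predicate;

/** Hypothetical sketch of a filter-guarded id lookup; not Solr's actual code path. */
final class DocPresenceChecker {
    private final Predicate<String> mightContain; // Bloom filter: false means definitely absent
    private final Predicate<String> inUpdateLog;  // stand-in for the transaction log lookup
    private final Predicate<String> inIndex;      // stand-in for the index lookup
    private long expensiveLookups = 0;

    DocPresenceChecker(Predicate<String> mightContain,
                       Predicate<String> inUpdateLog,
                       Predicate<String> inIndex) {
        this.mightContain = mightContain;
        this.inUpdateLog = inUpdateLog;
        this.inIndex = inIndex;
    }

    boolean exists(String id) {
        // A negative answer from the filter is exact, so the common
        // "document is not there" case returns without touching log or index.
        if (!mightContain.test(id)) {
            return false;
        }
        // A positive answer may be a false positive, so confirm it.
        expensiveLookups++;
        return inUpdateLog.test(id) || inIndex.test(id);
    }

    /** Number of calls that had to go past the filter. */
    long expensiveLookups() {
        return expensiveLookups;
    }
}
```

One caveat with this approach: a plain Bloom filter cannot remove entries, so ids of deleted documents keep answering "maybe present" until the filter is rebuilt. That costs extra lookups but never produces a wrong "not present" answer.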


Regards,
Jim

Regards, Per Steffensen
