I opened https://issues.apache.org/jira/browse/SOLR-6301
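
For anyone reading this thread later, the negative-lookup idea Per describes
below can be sketched roughly as follows. This is only an illustration in
Python under stated assumptions: it is not the Lucene BloomFilter code, and it
is not what SOLR-6301 proposes. The names BloomFilter, lookup and slow_lookup
are hypothetical, and deriving the k bit positions from one SHA-256 digest by
double hashing is an assumed choice, made to show the "more than one hash
function" point Jim raises.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; k bit positions are derived by double hashing."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Split one digest into two 64-bit values and combine them as
        # h1 + i * h2 to get num_hashes positions (assumed scheme).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def lookup(doc_id, bloom, slow_lookup):
    """Skip the expensive lookup when the filter rules the id out."""
    if not bloom.might_contain(doc_id):
        return None  # no need to consult the transaction log or the index
    return slow_lookup(doc_id)  # may still miss: false positives are possible
```

The filter never gives a false negative, so a False answer is enough to skip
the transaction log and index checks; a True answer still requires the real
lookup.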


On Wed, Jul 30, 2014 at 1:35 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Per,
>
> There's LUCENE-5675, which added a new postings format for IDs. Trying
> it out in Solr is on my todo list, but maybe you can get to it before me.
>
> https://issues.apache.org/jira/browse/LUCENE-5675
>
>
> On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen <st...@designware.dk>
> wrote:
>
>> On 30/07/14 08:55, jim ferenczi wrote:
>>
>>> Hi Per,
>>> First of all, the BloomFilter implementation in Lucene is not exactly a
>>> Bloom filter: it uses only one hash function, and you cannot set the
>>> false-positive ratio beforehand. ElasticSearch has its own Bloom filter
>>> implementation (a "guava like" BloomFilter); you should take a look at
>>> it if you really need this feature.
>>>
>> Yes, I am looking into what Lucene can do and how to use it through Solr.
>> If it does not fit our needs I will enhance it, potentially with
>> inspiration from the ES implementation. Thanks.
>>
>>> What is your use case? If your index fits in RAM, the Bloom filter won't
>>> help (and it may have a negative impact if you have a lot of segments).
>>> In fact, the only use case where the Bloom filter can help is when your
>>> term dictionary does not fit in RAM, which is rarely the case.
>>>
>> We have so many documents that it will never fit in memory. We use
>> optimistic locking (our own implementation) to do correct concurrent
>> assembly of documents and to do duplicate control. This requires a lot of
>> lookups of docs by their id, and most of the time the document is not
>> there, but to be sure we need to check both the transaction log and the
>> actual index (UpdateLog). We would like to use a Bloom filter to quickly
>> tell that a document with a particular id is NOT present.
>>
>>>
>>> Regards,
>>> Jim
>>>
>> Regards, Per Steffensen
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.
