I opened https://issues.apache.org/jira/browse/SOLR-6301
On Wed, Jul 30, 2014 at 1:35 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> Hi Per,
>
> There's LUCENE-5675 which has added a new postings format for IDs. Trying
> it out in Solr is in my todo list but maybe you can get to it before me.
>
> https://issues.apache.org/jira/browse/LUCENE-5675
>
>
> On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen <st...@designware.dk>
> wrote:
>
>> On 30/07/14 08:55, jim ferenczi wrote:
>>
>>> Hi Per,
>>> First of all the BloomFilter implementation in Lucene is not exactly a
>>> bloom filter. It uses only one hash function and you cannot set the false
>>> positive ratio beforehand. ElasticSearch has its own bloom filter
>>> implementation (using "guava like" BloomFilter), you should take a look at
>>> their implementation if you really need this feature.
>>>
>> Yes, I am looking into what Lucene can do and how to use it through Solr.
>> If it does not fit our needs I will enhance it - potentially with
>> inspiration from ES implementation. Thanks
>>
>>> What is your use-case ? If your index fits in RAM the bloom filter won't
>>> help (and it may have a negative impact if you have a lot of segments). In
>>> fact the only use case where the bloom filter can help is when your term
>>> dictionary does not fit in RAM which is rarely the case.
>>>
>> We have so many documents that it will never fit in memory. We use
>> optimistic locking (our own implementation) to do correct concurrent
>> assembly of documents and to do duplicate control. This require a lot of
>> finding docs from their id, and most of the time the document is not there,
>> but to be sure we need to check both transactionlog and the actual index
>> (UpdateLog). We would like to use Bloom Filter to quickly tell that a
>> document with a particular id is NOT present.
>>
>>>
>>> Regards,
>>> Jim
>>>
>> Regards, Per Steffensen
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

--
Regards,
Shalin Shekhar Mangar.
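
For readers following the thread, here is a minimal sketch of the negative-lookup idea Per describes: a Bloom filter over document ids that can answer "definitely not present" without touching the transaction log or the index. It is an illustration only, not the Lucene or Elasticsearch implementation. The class name `IdBloomFilter`, its parameters, and the choice of FNV-1a with double hashing are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical illustration of a Bloom filter over document ids. */
public class IdBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public IdBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** 64-bit FNV-1a hash (chosen here only for simplicity). */
    private static long fnv1a(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    /** Derives the i-th bit position from two halves of one 64-bit hash. */
    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Records an id by setting all of its bit positions. */
    public void add(String id) {
        long hash = fnv1a(id.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(hash, i));
        }
    }

    /**
     * Returns false only if the id was never added (no false negatives).
     * Returns true if the id was added, or on a false positive.
     */
    public boolean mightContain(String id) {
        long hash = fnv1a(id.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(hash, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In a use case like Per's, a caller would consult `mightContain(id)` first and fall back to the transaction log and index lookup only when it returns true. Unlike the single-hash variant Jim mentions, the number of bits and hash functions here can be sized up front for a target false-positive ratio.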