BloomFilter-s with Lucene

2009-01-30 Thread Andrzej Bialecki
Hi all, I've been using BloomFilters for various tasks, and I can't shake the feeling that they could be of some use in Lucene internals, to speed up various membership tests, especially if we look for 100% correct negatives, and we can accept a small rate of false positives. For example,

Re: BloomFilter-s with Lucene

2009-01-30 Thread markharw00d
Andrzej Bialecki wrote: Funny, I was having vague thoughts about this today too, having been concerned about some of the big arrays that can end up in a typical Lucene app. Aside from providing space-efficient lookups, another application for BloomFilters is in similarity measures e.g. ANDing

Re: BloomFilter-s with Lucene

2009-01-30 Thread Andrzej Bialecki
markharw00d wrote: Andrzej Bialecki wrote: Funny, I was having vague thoughts about this today too, having been concerned about some of the big arrays that can end up in a typical Lucene app. Aside from providing space-efficient lookups, another application for BloomFilters is in similarity

Re: BloomFilter-s with Lucene

2009-01-30 Thread pdecrem
. -Original Message- From: Andrzej Bialecki a...@getopt.org Date: Fri, 30 Jan 2009 21:42:13 To: java-dev@lucene.apache.org Subject: Re: BloomFilter-s with Lucene markharw00d wrote: Andrzej Bialecki wrote: Funny, I was having vague thoughts about this today too having been concerned

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
array access, if positive do full work with huge switch statement. - Original Message From: Andrzej Bialecki a...@getopt.org To: java-dev@lucene.apache.org Sent: Friday, 30 January, 2009 21:42:13 Subject: Re: BloomFilter-s with Lucene markharw00d wrote: Andrzej Bialecki

Re: BloomFilter-s with Lucene

2009-01-30 Thread Andi Vajda
On Fri, 30 Jan 2009, eks dev wrote: I have used them for speeding up huge switch clauses in charset normalization (eg lowercase and accent-plain form mapping). Big number of accented characters (this causes big switch statement) that appear seldom in corpus (big majority being not accented).
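
The pre-check described in the quoted message could look roughly like the Java sketch below. It is not the poster's code: the class name `AccentFolder`, the filter size, the two hash functions, and the short list of mapped characters are all assumptions made for illustration. The idea is that a character that fails the Bloom-filter test is guaranteed not to be in the switch, so the common unaccented case returns after a couple of bit lookups, while a false positive simply falls through to the `default` branch.

```java
import java.util.BitSet;

/** Hypothetical sketch: a Bloom-filter pre-check in front of a large switch. */
public class AccentFolder {
    // Filter size is an arbitrary choice for this sketch (1024 bits).
    private static final int BITS = 1 << 10;
    private static final BitSet FILTER = new BitSet(BITS);

    // Illustrative subset of accented characters handled by the switch below.
    private static final char[] MAPPED = {
        '\u00e0', '\u00e1', '\u00e2',   // a with grave, acute, circumflex
        '\u00e8', '\u00e9', '\u00ea',   // e with grave, acute, circumflex
        '\u00e7'                        // c with cedilla
    };

    static {
        // Populate the filter once with every character the switch handles.
        for (char c : MAPPED) {
            FILTER.set(hash1(c));
            FILTER.set(hash2(c));
        }
    }

    // Two simple multiplicative hashes; each keeps the top 10 bits (0..1023).
    private static int hash1(char c) { return (c * 0x9E3779B1) >>> 22; }
    private static int hash2(char c) { return (c * 0x85EBCA6B) >>> 22; }

    /** False means "definitely not mapped"; true means "possibly mapped". */
    static boolean mightBeMapped(char c) {
        return FILTER.get(hash1(c)) && FILTER.get(hash2(c));
    }

    /** Maps an accented character to its plain form; others pass through. */
    static char fold(char c) {
        if (!mightBeMapped(c)) {
            return c; // fast path: the large switch is skipped entirely
        }
        switch (c) {
            case '\u00e0': case '\u00e1': case '\u00e2': return 'a';
            case '\u00e8': case '\u00e9': case '\u00ea': return 'e';
            case '\u00e7': return 'c';
            default: return c; // false positive from the filter
        }
    }

    public static void main(String[] args) {
        System.out.println(fold('\u00e9'));
        System.out.println(fold('x'));
    }
}
```

Whether this beats a plain switch depends on how the JVM compiles the switch and on how rare the mapped characters are, so it would need measuring on a real corpus.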

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
...@osafoundation.org To: java-dev@lucene.apache.org Sent: Friday, 30 January, 2009 23:02:15 Subject: Re: BloomFilter-s with Lucene On Fri, 30 Jan 2009, eks dev wrote: I have used them for speeding up huge switch clauses in charset normalization (eg lowercase and accent-plain form mapping). Big