Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Michael McCandless
In your approach, roughly how many filters do you have cached? It seems like it could be quite a few (one for each color, one for each type, etc)? You might be able to modify the new (on Lucene trunk) FieldCacheRangeFilter to achieve this same filtering without actually having to

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Jason Rutherglen
Hi M.S., Do you think it would be cool to have some faceting built into Lucene at some point? -J On Tue, Dec 9, 2008 at 10:11 PM, Michael Stoppelman [EMAIL PROTECTED] wrote: Yeah looks similar to what we've implemented for ourselves (although I haven't looked at the implementation). We've got

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Tim Sturge
Mike, Mike, I have an implementation of FieldCacheTermsFilter (which uses field cache to filter for a predefined set of terms) around if either of you are interested. It is faster than materializing the filter roughly when the filter matches more than 1% of the documents. So it's not better for

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Michael McCandless
It'd be great to get this into Lucene. Does FieldCacheTermsFilter let you specify a set of arbitrary terms to filter for, like TermsFilter in contrib/queries? And it's space/time efficient once FieldCache is populated? Mike Tim Sturge wrote: Mike, Mike, I have an implementation of

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Tim Sturge
Yes (mostly). It turns those terms into an OpenBitSet on the term array. Then it does a fastGet() in the next() and skipTo() loops to see if the term for that document is in the set. The issue is that fastGet() is not as fast as the two inequalities in FCRF. I didn't directly benchmark FCTF
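The approach described above (a bit set over term ordinals, tested once per document inside next() and skipTo()) can be sketched roughly as follows. This is only an illustration, not the LUCENE-1487 code: the class name, the `lookup`/`order` arrays standing in for a populated field cache, and the use of java.util.BitSet in place of OpenBitSet are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class OrdinalTermsFilterSketch {
    // Hypothetical stand-in for a field cache: order[doc] is the ordinal
    // of that document's term in the sorted lookup array.
    private final int[] order;
    // One bit per term ordinal; set if the term is in the requested set.
    private final BitSet allowedOrdinals = new BitSet();
    private int doc = -1;

    OrdinalTermsFilterSketch(String[] lookup, int[] order, String... terms) {
        this.order = order;
        for (String term : terms) {
            int ord = Arrays.binarySearch(lookup, term);
            if (ord >= 0) {
                allowedOrdinals.set(ord); // terms not in the index are ignored
            }
        }
    }

    // Advance to the first matching document at or after target.
    boolean skipTo(int target) {
        for (doc = target; doc < order.length; doc++) {
            if (allowedOrdinals.get(order[doc])) { // the per-document bit test
                return true;
            }
        }
        return false;
    }

    boolean next() {
        return skipTo(doc + 1);
    }

    int doc() {
        return doc;
    }

    // Convenience helper for the example: collect every matching doc id.
    static List<Integer> matches(String[] lookup, int[] order, String... terms) {
        OrdinalTermsFilterSketch it = new OrdinalTermsFilterSketch(lookup, order, terms);
        List<Integer> hits = new ArrayList<>();
        while (it.next()) {
            hits.add(it.doc());
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] lookup = {"blue", "green", "red"}; // sorted distinct terms
        int[] order = {2, 0, 1, 0, 2};              // term ordinal per document
        System.out.println(matches(lookup, order, "blue", "red"));
    }
}
```

Nothing is materialized per filter beyond the small ordinal bit set; the cost is one array read and one bit test per document visited, which is the per-document overhead being compared against the two inequalities of a range check.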

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Tim Sturge
It's LUCENE-1487. Tim On 12/10/08 1:13 PM, Tim Sturge [EMAIL PROTECTED] wrote: Yes (mostly). It turns those terms into an OpenBitSet on the term array. Then it does a fastGet() in the next() and skipTo() loops to see if the term for that document is in the set. The issue is that

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-09 Thread Michael McCandless
This use case sounds a lot like faceted navigation, which Solr provides. Mike Michael Stoppelman wrote: Hi all, I'm working on upgrading to Lucene 2.4.0 from 2.3.2 and was trying to integrate the new DocIdSet changes since o.a.l.search.Filter#bits() method is now deprecated. For our app

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-09 Thread Michael Stoppelman
Yeah looks similar to what we've implemented for ourselves (although I haven't looked at the implementation). We've got quite a custom version of Lucene at this point. Using Solr at this point really isn't a viable option, but thanks for pointing this out. M On Tue, Dec 9, 2008 at 1:47 AM,

Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-08 Thread Michael Stoppelman
Hi all, I'm working on upgrading to Lucene 2.4.0 from 2.3.2 and was trying to integrate the new DocIdSet changes since o.a.l.search.Filter#bits() method is now deprecated. For our app we actually heavily rely on bits from the Filter to do post-query filtering (I explain why below). For example,

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-08 Thread Paul Elschot
Michael, The change from BitSet to DocIdSetIterator implies that you'll need to choose an underlying data structure yourself. A minimal approach would be to use DocIdBitSet around BitSet, but there are better ways. For your application you might consider replacing Java's BitSet with Lucene's
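To make the "wrapper around BitSet" idea concrete, here is a minimal sketch of exposing an existing java.util.BitSet through next()/skipTo()/doc() style iteration. It is not Lucene's DocIdBitSet; the class name and method shapes are hypothetical and only mirror the iterator style discussed in this thread.

```java
import java.util.BitSet;

public class BitSetDocIteratorSketch {
    private final BitSet bits;
    private int doc = -1;
    private boolean exhausted = false;

    BitSetDocIteratorSketch(BitSet bits) {
        this.bits = bits;
    }

    // Move to the next set bit after the current document.
    boolean next() {
        return skipTo(doc + 1);
    }

    // Move to the first set bit at or after target.
    boolean skipTo(int target) {
        if (exhausted) {
            return false;
        }
        int found = bits.nextSetBit(target);
        if (found < 0) {
            exhausted = true; // stay exhausted on further calls
            return false;
        }
        doc = found;
        return true;
    }

    // Current document id; only meaningful after next()/skipTo() returned true.
    int doc() {
        return doc;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(1);
        bits.set(4);
        bits.set(9);
        BitSetDocIteratorSketch it = new BitSetDocIteratorSketch(bits);
        while (it.next()) {
            System.out.println(it.doc());
        }
    }
}
```

Code that previously read the bits directly can keep its BitSet and hand out an iterator like this one where a document-id iterator is expected; whether a different underlying structure would serve better depends on how sparse the sets are.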