Boosting results

2008-11-06 Thread Scott Smith
I'm interested in comments on the following problem. I have a set of documents. They fall into 3 categories. Call these categories A, B, and C. Each document has an indexed, non-tokenized field called "category" which contains A, B, or C (they are mutually exclusive categories). All

Re: Boosting results

2008-11-06 Thread Erick Erickson
It seems to me that the easiest thing would be to fire two queries and then just concatenate the results:
    category:A AND body:fred
    category:B AND body:fred
If you really, really didn't want to fire two queries, you could create filters on category A and category B and make a couple of passes thr

Re: Boosting results

2008-11-07 Thread Erick Erickson
Duh, sorting. I absolutely love it when I overlook the obvious.
[EMAIL PROTECTED]
On Fri, Nov 7, 2008 at 4:58 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Couldn't you just do a single Query that sorts first by category and second
> by relevance?
>
> Mike
>
> Erick Erickson wrote

Re: Boosting results

2008-11-07 Thread Michael McCandless
Couldn't you just do a single Query that sorts first by category and second by relevance?
Mike
Erick Erickson wrote:
> It seems to me that the easiest thing would be to fire two queries and then just concatenate the results:
>     category:A AND body:fred
>     category:B AND body:fred
> If you really,
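The ordering proposed here (category first, relevance second) can be illustrated without any Lucene classes. The following is only a sketch in plain Java: `Hit` and `CategoryThenScore` are hypothetical names made up for this illustration, and the comparator stands in for whatever sort specification the search library would actually be given.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a search hit; this is not a Lucene class.
final class Hit {
    final int docId;
    final String category; // "A", "B", or "C"
    final float score;     // relevance score of the hit

    Hit(int docId, String category, float score) {
        this.docId = docId;
        this.category = category;
        this.score = score;
    }
}

final class CategoryThenScore {
    // Primary key: category, ascending. Secondary key: score, descending,
    // so the most relevant hit comes first within each category.
    static final Comparator<Hit> ORDER =
        Comparator.comparing((Hit h) -> h.category)
                  .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    // Returns a sorted copy and leaves the input list untouched.
    static List<Hit> sorted(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(ORDER);
        return copy;
    }
}
```

With this ordering every category A hit precedes every category B hit regardless of score, which matches the result of concatenating two separate queries.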

Re: Boosting results

2008-11-07 Thread Matthew DeLoria
This actually brings up an interesting question, and something I have been curious about. In this case, does it make more sense to do Boosting by Category, or to do sorting? From what I understand, Lucene sorting involves putting the relevant fields into memory, and then executing a sort. Is this

RE: Boosting results

2008-11-07 Thread Scott Smith
n multiple fields and "score" (aka relevancy) is one of the pseudo fields. That'll work. Thanks.
Scott
-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Friday, November 07, 2008 5:59 AM
To: java-user@lucene.apache.org
Subject: Re: Boosting results
duuu

Re: Boosting results

2008-11-07 Thread Michael McCandless
This is a good point. Sorting populates the field cache (internal to Lucene) for that field, meaning it loads all values for all docs and holds them in memory. This makes the first query slow and consumes RAM in proportion to how large your index is. Whereas boosting should be able t

Re: Boosting results

2008-11-07 Thread Peter Keegan
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 07, 2008 5:59 AM
> To: java-user@lucene.apache.org
> Subject: Re: Boosting results
>
> Duh, sorting. I absolutely love it when I overlook the obvious.
>
> [EMAIL PROTECTED]
>
> On Fri, Nov 7

Re: Boosting results

2008-11-10 Thread Mark Miller
Michael McCandless wrote:
> But: it's slow to load a field for the first time. LUCENE-1231 (column-stride fields) aims to greatly speed up the load time.
Test it out though. In some recent testing I was doing it was *way* faster than I thought it would be based on what I had been reading. Of c

Re: Boosting results

2008-11-10 Thread Michael McCandless
Well... the FieldCache API is documented here (for 2.4.0):
http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/search/FieldCache.html
For example, you can load ints like this:
    FieldCache.DEFAULT.getInts(reader, "myfield");
This returns an array mapping docID --> int va
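Once an array indexed by docID is available, counting how many matching documents carry each value is a single pass. The sketch below assumes that per-value counting (the "facet counting" mentioned in the replies) is the goal. It uses an ordinary `int[]` supplied by the caller in place of an array loaded from an index, and `FacetCounts` is a hypothetical name, not part of Lucene.

```java
import java.util.Map;
import java.util.TreeMap;

final class FacetCounts {
    // valuesByDocId[docId] is the field value of that document, i.e. the
    // docID --> int shape described in the message. In this sketch it is a
    // plain array passed in by the caller (an assumption for illustration).
    // matchingDocIds lists the docIDs that matched some query.
    static Map<Integer, Integer> count(int[] valuesByDocId, int[] matchingDocIds) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int docId : matchingDocIds) {
            // Add 1 to the running count for this document's field value.
            counts.merge(valuesByDocId[docId], 1, Integer::sum);
        }
        return counts;
    }
}
```

The array lookup is O(1) per hit, which is why holding the values in memory makes this kind of counting cheap after the initial load.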

Re: Boosting results

2008-11-10 Thread Stefan Trcek
On Friday 07 November 2008 18:46:17 Michael McCandless wrote:
> Sorting populates the field cache (internal to Lucene) for that
> field, meaning it loads all values for all docs and holds them in
> memory. This makes the first query slow and consumes RAM in
> proportion to how large your ind

Re: Boosting results

2008-11-10 Thread Stefan Trcek
On Monday 10 November 2008 13:55:31 Michael McCandless wrote:
> Finally, you might want to instead look at Solr, which provides facet
> counting out of the box, rather than roll your own...
Doooh - new API, but its facet counting sounds good. Any starting points for moving from plain Lucene to

Re: Boosting results

2008-11-10 Thread Erik Hatcher
On Nov 10, 2008, at 2:42 PM, Stefan Trcek wrote:
> On Monday 10 November 2008 13:55:31 Michael McCandless wrote:
>> Finally, you might want to instead look at Solr, which provides facet counting out of the box, rather than roll your own...
> Doooh - new API, but its facet counting sounds good. An

Re: Boosting results

2008-11-11 Thread Erik Hatcher
On Nov 11, 2008, at 8:32 AM, Stefan Trcek wrote:
> On Tuesday 11 November 2008 02:18:39 Erik Hatcher wrote:
>> The integration won't be too painful... the main thing is that Solr requires* some configuration files, literally on the filesystem, in order to fire up and be happy. And you'll need to

Re: Boosting results

2008-11-11 Thread Stefan Trcek
On Monday 10 November 2008 14:58:15 Mark Miller wrote:
> > But: it's slow to load a field for the first time. LUCENE-1231
> > (column-stride fields) aims to greatly speed up the load time.
>
> Test it out though. In some recent testing I was doing it was *way*
> faster than I thought it would be b

Re: Boosting results

2008-11-11 Thread Stefan Trcek
On Tuesday 11 November 2008 02:18:39 Erik Hatcher wrote:
> The integration won't be too painful... the main thing is that Solr
> requires* some configuration files, literally on the filesystem, in
> order to fire up and be happy. And you'll need to craft Solr's
> schema.xml to jibe with how you

boosting results with a field from the index

2006-01-03 Thread Klaus Hubert
Hi and a Happy New Year! I created a Lucene index with 2 fields (text and importance). The text field contains the real text, and importance is a field where I manually assign a number between 1 and 5 to the related document. When I search the index I find the documents with the highest relevancy weig

Re: boosting results with a field from the index

2006-01-03 Thread Grant Ingersoll
Hi Klaus, You might want to just set the boost value of the Document using your importance number; Lucene will then factor that in automatically when scoring. See the Document#setBoost javadoc for info. You could also sort on the field, I think, so that the more important docs come to the t

Re: boosting results with a field from the index

2006-01-03 Thread Yonik Seeley
Take a look at FunctionQuery: http://issues.apache.org/jira/browse/LUCENE-446
It can do relevancy+importance, but not relevancy*importance with the provided classes. It shouldn't be too hard to do the multiplication though. You could also boost the field or document at index time. That gives you
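The difference between the two combinations is easy to see in isolation. This is a plain Java sketch of the arithmetic only. It does not use FunctionQuery, and `ImportanceScoring` and its method names are hypothetical, chosen for illustration.

```java
final class ImportanceScoring {
    // Additive combination: relevancy + importance.
    static float additive(float relevancy, int importance) {
        return relevancy + importance;
    }

    // Multiplicative combination: relevancy * importance.
    static float multiplicative(float relevancy, int importance) {
        return relevancy * importance;
    }
}
```

The choice can change the ranking. Take a document X with relevancy 2.0 and importance 2, and a document Y with relevancy 0.5 and importance 5: the additive form scores them 4.0 and 5.5, so Y ranks first, while the multiplicative form scores them 4.0 and 2.5, so X ranks first. With a 1 to 5 importance scale, the additive form can let importance dominate whenever relevancy scores are small.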

RE: boosting results with a field from the index

2006-01-03 Thread Klaus Hubert
Sent: Tuesday, January 03, 2006 5:26 PM
To: java-user@lucene.apache.org
Subject: Re: boosting results with a field from the index
Hi Klaus, You might want to just set the boost value of the Document using your importance number; Lucene will then factor that in automatically when scoring. Se