PhraseQuery with term positions

2010-01-19 Thread Avi Rosenschein
Hi, I am using PhraseQuery with explicitly set term positions and slop=0, in order to skip stop words. The field in my index is indexed with TermVector positions. When I do a query with stop words skipped, for example "internet for research" (translated into PhraseQuery: "internet ? research"), I
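A PhraseQuery with explicit positions and slop=0 matches only when each query term occurs at exactly its stated offset from the first term. The sketch below mimics that check in plain JDK Java rather than Lucene. It assumes the analyzer dropped "for" but still advanced the position counter, so "research" sits two positions after "internet". The class and method names are hypothetical. In Lucene 3.x the real query would be built with `PhraseQuery.add(Term, int position)` and `setSlop(0)`.

```java
import java.util.*;

public class PositionalPhraseSketch {
    // Hypothetical helper: maps each indexed term to the positions where it occurs.
    public static Map<String, Set<Integer>> index(List<String> tokens, Set<String> stopWords) {
        Map<String, Set<Integer>> positions = new HashMap<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            String t = tokens.get(pos).toLowerCase(Locale.ROOT);
            // Stop words are not indexed, but the position counter still advances,
            // leaving a gap the way a position increment > 1 would.
            if (stopWords.contains(t)) continue;
            positions.computeIfAbsent(t, k -> new HashSet<>()).add(pos);
        }
        return positions;
    }

    // Exact (slop = 0) match: terms[i] must occur at origin + offsets[i] for some origin.
    public static boolean matches(Map<String, Set<Integer>> doc, String[] terms, int[] offsets) {
        for (int start : doc.getOrDefault(terms[0], Collections.emptySet())) {
            int origin = start - offsets[0];
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = doc.getOrDefault(terms[i], Collections.emptySet()).contains(origin + offsets[i]);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> doc = index(
            Arrays.asList("Using", "the", "internet", "for", "research"), Set.of("for", "the", "of"));
        // "internet ? research": research expected two positions after internet.
        System.out.println(matches(doc, new String[]{"internet", "research"}, new int[]{0, 2})); // true
        System.out.println(matches(doc, new String[]{"internet", "research"}, new int[]{0, 1})); // false
    }
}
```

If the index had been built without preserving stop-word positions, "research" would sit directly after "internet" and only the offsets {0, 1} would match, which is one way the explicit-position query can silently miss.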

Re: PhraseQuery with term positions

2010-01-19 Thread Avi Rosenschein
upon your needs. > > Erick > > On Tue, Jan 19, 2010 at 6:50 AM, Avi Rosenschein >wrote: > > > Hi, > > > > I am using PhraseQuery with explicitly set term positions and slop=0, in > > order to skip stop words. The field in my index is indexed with > T

Re: If you could have one feature in Lucene...

2010-02-24 Thread Avi Rosenschein
On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll wrote: > What would it be? > For scoring to take into account the non-analyzed token stream. That is, if a field is analyzed (stemmed, lowercased, maybe even stop words removed), that is fine for indexing. But tokens in the query matching the orig
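One common workaround for rewarding matches on the original token stream, offered here as an assumption rather than built-in Lucene behaviour, is to index the text twice: once analyzed and once with the original tokens. A query token that also matches the original surface form (including case) then gets extra weight. The toy stemmer (strip one trailing "s") and the boost value below are hypothetical.

```java
import java.util.*;

public class UnstemmedBoostSketch {
    // Hypothetical toy stemmer: lowercases and strips one trailing "s".
    public static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
    }

    // 1 point per analyzed (stemmed) match, plus exactBoost if the original token matches verbatim.
    public static double score(List<String> docTokens, List<String> queryTokens, double exactBoost) {
        Set<String> original = new HashSet<>(docTokens);
        Set<String> stemmed = new HashSet<>();
        for (String t : docTokens) stemmed.add(stem(t));
        double score = 0;
        for (String q : queryTokens) {
            if (stemmed.contains(stem(q))) {
                score += 1.0;
                if (original.contains(q)) score += exactBoost;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> query = List.of("Windows");
        System.out.println(score(List.of("Windows", "update"), query, 0.5));   // 1.5
        System.out.println(score(List.of("window", "cleaning"), query, 0.5));  // 1.0
    }
}
```

Both documents match after stemming, but only the one containing the exact form "Windows" gets the bonus.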

Re: boosts for unstemmed matches (was Re: If you could have one feature in Lucene...)

2010-02-24 Thread Avi Rosenschein
On Wed, Feb 24, 2010 at 11:20 PM, Aaron Lav wrote: > On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote: > > On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll >wrote: > > > > > What would it be? > > > > > > > For scoring to

Re: If you could have one feature in Lucene...

2010-02-25 Thread Avi Rosenschein
> > Similarity can only be set per index, but I want to adjust scoring > behaviour at a field level, to facilitate this could we pass make field name > available to all score methods. > Currently it is only passed to some such as lengthNorm() but not others > such as tf() > > +1 -- Avi
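The quoted request is for `tf()` to receive the field name, as `lengthNorm()` already does. The sketch below shows what that would enable, using a hypothetical class rather than Lucene's `Similarity` API: a binary tf for a short `title` field, and the classic sqrt(freq) tf for every other field. Later Lucene versions (4.0 onward) addressed per-field scoring with `PerFieldSimilarityWrapper`.

```java
import java.util.*;
import java.util.function.DoubleUnaryOperator;

public class PerFieldTfSketch {
    // Hypothetical field-aware tf: each field may use its own term-frequency curve.
    private final Map<String, DoubleUnaryOperator> tfByField = new HashMap<>();
    private final DoubleUnaryOperator defaultTf = Math::sqrt; // classic Lucene tf(freq) = sqrt(freq)

    public PerFieldTfSketch() {
        // Titles are short: a term either appears or not, so repeats add nothing.
        tfByField.put("title", freq -> freq > 0 ? 1.0 : 0.0);
    }

    public double tf(String field, double freq) {
        return tfByField.getOrDefault(field, defaultTf).applyAsDouble(freq);
    }

    public static void main(String[] args) {
        PerFieldTfSketch sim = new PerFieldTfSketch();
        System.out.println(sim.tf("title", 4)); // 1.0
        System.out.println(sim.tf("body", 4));  // 2.0
    }
}
```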

Re: Relevancy Practices

2010-04-30 Thread Avi Rosenschein
On Thu, Apr 29, 2010 at 5:59 PM, Mark Bennett wrote: > Hi Grant, > > You're welcome to use any of my slides (Dave's got them), with attribution > of course. > > BUT > > Have you considered a section something like "why the hell do you think > Relevancy tweaking is gonna save you!?!?" > Basi

Re: Relevancy Practices

2010-05-02 Thread Avi Rosenschein
On 4/30/10, Grant Ingersoll wrote: > > On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: >> Also, tuning the algorithms to the users can be very important. For >> instance, we have found that in a basic search functionality, the default >> query parser operator OR works
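To make the OR/AND distinction concrete: with OR as the default operator, a document matching any query term is returned, while with AND every term must be present. The toy sketch below works over in-memory term sets and is not Lucene's QueryParser, whose default can be switched with `setDefaultOperator`. The class, enum, and data are hypothetical.

```java
import java.util.*;

public class DefaultOperatorSketch {
    public enum Operator { OR, AND }

    // Returns the ids of documents that satisfy the query under the given default operator.
    public static List<Integer> search(List<Set<String>> docs, List<String> terms, Operator op) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            Set<String> doc = docs.get(id);
            boolean match = op == Operator.AND
                ? doc.containsAll(terms)                  // every term required
                : terms.stream().anyMatch(doc::contains); // any term suffices
            if (match) hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("lucene", "scoring"),
            Set.of("lucene", "index"),
            Set.of("solr", "scoring"));
        List<String> q = List.of("lucene", "scoring");
        System.out.println(search(docs, q, Operator.OR));  // [0, 1, 2]
        System.out.println(search(docs, q, Operator.AND)); // [0]
    }
}
```

OR favours recall and relies on scoring to rank the best match first; AND favours precision but can return nothing for longer queries.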

Re: Relevancy Practices

2010-05-05 Thread Avi Rosenschein
On Wed, May 5, 2010 at 5:08 PM, Grant Ingersoll wrote: > > On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote: > > > On 4/30/10, Grant Ingersoll wrote: > >> > >> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: > >>> Also, tuning the algor

Re: BM 25 scoring with lucene

2011-03-08 Thread Avi Rosenschein
The LUCENE-2091.patch file from the jira entry is essentially what we are using. It should work fine. -- Avi 2011/3/2 Gérard Dupont > Hi, > > On 2 March 2011 07:50, Lahiru Samarakoon wrote: > >> Hi All, >> >> Do you have any BM 25 scoring implementation which can be used with >> Lucene? >> > >
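For reference alongside the patch, here is a standalone sketch of the per-term BM25 weight with the usual defaults k1 = 1.2 and b = 0.75. It uses the idf variant log(1 + (N - n + 0.5) / (n + 0.5)), which Lucene's later `BM25Similarity` (4.0 onward) also uses. It is not the code from LUCENE-2091, and the class and method names are hypothetical.

```java
public class Bm25Sketch {
    public static final double K1 = 1.2;  // term-frequency saturation
    public static final double B = 0.75;  // strength of document-length normalization

    // idf variant that stays positive even for very common terms.
    public static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 contribution of one term to one document's score.
    public static double score(double tf, double docLen, double avgDocLen, long docCount, long docFreq) {
        double norm = K1 * (1 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * (tf * (K1 + 1)) / (tf + norm);
    }

    public static void main(String[] args) {
        // 1000 documents, term present in 10 of them, document of average length 100.
        for (int tf : new int[]{1, 2, 5, 20}) {
            System.out.printf("tf=%d score=%.3f%n", tf, score(tf, 100, 100, 1000, 10));
        }
    }
}
```

Unlike the classic sqrt(tf) weighting, the tf factor saturates: the score approaches idf * (k1 + 1) as tf grows, and shorter-than-average documents score higher for the same tf.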

Re: tokenizing text using language analyzer but preserving stopwords if possible

2011-12-07 Thread Avi Rosenschein
On Wed, Dec 7, 2011 at 00:41, Ilya Zavorin wrote: > I need to implement a "quick and dirty" or "poor man's" translation of a > foreign language document by looking up each word in a dictionary and > replacing it with the English translation. So what I need is to tokenize > the original foreign te
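For word-by-word dictionary lookup, one option is to skip the language analyzer's stop filter entirely: tokenize on runs of Unicode letters, keep every token, and pass the punctuation and whitespace between tokens through unchanged. The sketch below assumes that regex tokenization and a tiny hypothetical dictionary. Words with no entry are left as they are, and stop words are looked up like any other word.

```java
import java.util.*;
import java.util.regex.*;

public class DictionaryTranslateSketch {
    // Runs of Unicode letters (plus apostrophes) count as words; everything else is copied as-is.
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");

    public static String translate(String text, Map<String, String> dict) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // whitespace/punctuation before this word
            String word = m.group();
            // Lowercase only for the lookup; unknown words fall back to the original token.
            out.append(dict.getOrDefault(word.toLowerCase(Locale.ROOT), word));
            last = m.end();
        }
        out.append(text.substring(last));      // trailing punctuation, if any
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dict = Map.of("le", "the", "chat", "cat", "noir", "black", "et", "and");
        System.out.println(translate("Le chat et le chien.", dict)); // the cat and the chien.
    }
}
```

If stemming is still wanted for the lookup, the analyzer could be run per token for the dictionary key only, while the original token is kept as the fallback.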