Re: SpanNearQuery - inOrder parameter

2011-05-19 Thread Doron Cohen
Hi Greg, I created http://issues.apache.org/jira/browse/LUCENE-3120 for this problem and attached there a more general test, based on your test case, that exposes it. I am not yet sure this is indeed a problem to be fixed in span queries (see more there in JIRA) but at

RE: SpanNearQuery - inOrder parameter

2011-05-19 Thread Gregory Tarr
Doron, We let our users decide whether or not to force the order, so in effect they pass in "inOrder". To work around this I would have to detect a repeated term and change the parameter accordingly, which I'd rather not do. Thanks, Greg
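The workaround Greg describes can be sketched without Lucene itself. This is a minimal, hypothetical helper (the names `hasRepeatedTerm` and `effectiveInOrder` are illustrative only, not Lucene API), assuming, as the thread implies, that the problem only shows up when inOrder is false and a clause term is repeated, so forcing inOrder to true in that case sidesteps it:

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper for the workaround discussed above: if the clause
 * terms contain a repeated term, override the user's inOrder choice and
 * force an ordered span query. Not part of Lucene.
 */
public class InOrderWorkaround {

    /** Returns true if any term appears more than once in the clause list. */
    static boolean hasRepeatedTerm(String[] terms) {
        Set<String> seen = new HashSet<>();
        for (String t : terms) {
            if (!seen.add(t)) { // add() returns false when t was already present
                return true;
            }
        }
        return false;
    }

    /** The inOrder value that would actually be used for the span query. */
    static boolean effectiveInOrder(String[] terms, boolean userInOrder) {
        return userInOrder || hasRepeatedTerm(terms);
    }

    public static void main(String[] args) {
        String[] repeated = {"john", "smith", "john"};
        String[] distinct = {"john", "smith"};
        System.out.println(effectiveInOrder(repeated, false)); // forced to ordered
        System.out.println(effectiveInOrder(distinct, false)); // user's choice kept
    }
}
```

In Lucene 3.x the resulting value would be passed as the third argument of `new SpanNearQuery(clauses, slop, inOrder)`. The cost, as Greg points out, is that a user who asked for an unordered match silently gets an ordered one whenever a term is repeated.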

Re: SpanNearQuery - inOrder parameter

2011-05-19 Thread Doron Cohen
Hi Greg, On Thu, May 19, 2011 at 12:26 PM, Gregory Tarr wrote:
> We let our users decide whether they want to force the order or not, so in effect they pass in "inOrder". I would have to detect a repeated term and change the parameter as a result of that in order to workround this - I'd r

Re: Ranking docs with all terms higher

2011-05-19 Thread Michael McCandless
I believe Lucene already does this with the 'coord' factor in BooleanQuery, which is on by default (i.e., if you just call "new BooleanQuery()"). That is, your doc c will get a coord factor of 1.0, doc b gets 0.666..., and doc a gets 0.333.... That said, if the term freq is high enough (i.e., doc a has nacho 4 times)
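As a minimal sketch, assuming coord is simply the fraction of query clauses a document matches (overlap / maxOverlap, which is how the classic DefaultSimilarity computes it), the figures Mike quotes for a three-term query follow directly:

```java
/**
 * Sketch of the BooleanQuery coord factor:
 * coord = (number of query clauses the doc matches) / (total clauses).
 */
public class CoordSketch {

    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        int queryTerms = 3; // a three-term query such as "nacho foo bar"
        System.out.println("doc c (3 of 3): " + coord(3, queryTerms));
        System.out.println("doc b (2 of 3): " + coord(2, queryTerms));
        System.out.println("doc a (1 of 3): " + coord(1, queryTerms));
    }
}
```

Because coord multiplies the whole document score, a document matching only one term needs a much larger per-term contribution (for example from a high term frequency) to overtake one that matches all terms.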

Re: Ranking docs with all terms higher

2011-05-19 Thread Ian Lea
A little test shows that Mike is correct and Lucene does already do this.
With norms (default):
  nacho foo bar, score=0.8660254
  foo bar bar, score=0.46461558
  nacho nacho nacho nacho, score=0.19245009
Without norms:
  nacho foo bar, score=1.7320508
  foo bar bar, score=0.92923117
  nacho nacho nacho
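To see where the without-norms numbers come from, here is a simplified recomputation. It assumes the classic (Lucene 3.x era) DefaultSimilarity formulas as I understand them: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of idf^2), coord = matched / queryTerms, and all boosts equal to 1. It is a sketch, not Lucene's scorer, and it ignores norms entirely:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified flat-OR-query scoring with norms disabled (a sketch, not Lucene). */
public class ScoreSketch {

    // Classic idf: 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1 + Math.log(numDocs / (double) (docFreq + 1));
    }

    static double score(String[] query, List<String[]> corpus, String[] doc) {
        // idf per query term, from document frequencies over the tiny corpus
        double[] idfs = new double[query.length];
        double sumOfSquaredWeights = 0;
        for (int i = 0; i < query.length; i++) {
            int df = 0;
            for (String[] d : corpus) {
                if (Arrays.asList(d).contains(query[i])) df++;
            }
            idfs[i] = idf(df, corpus.size());
            sumOfSquaredWeights += idfs[i] * idfs[i];
        }
        double queryNorm = 1 / Math.sqrt(sumOfSquaredWeights);

        // Raw term frequencies in the document being scored
        Map<String, Integer> freq = new HashMap<>();
        for (String t : doc) freq.merge(t, 1, Integer::sum);

        double sum = 0;
        int matched = 0;
        for (int i = 0; i < query.length; i++) {
            Integer f = freq.get(query[i]);
            if (f == null) continue; // term absent: contributes nothing
            matched++;
            sum += Math.sqrt(f) * idfs[i] * idfs[i] * queryNorm; // tf * idf^2 * queryNorm
        }
        double coord = matched / (double) query.length;
        return coord * sum;
    }

    public static void main(String[] args) {
        String[] query = {"nacho", "foo", "bar"};
        List<String[]> corpus = Arrays.asList(
                "nacho foo bar".split(" "),
                "foo bar bar".split(" "),
                "nacho nacho nacho nacho".split(" "));
        for (String[] d : corpus) {
            System.out.printf("%s, score=%.7f%n", String.join(" ", d), score(query, corpus, d));
        }
    }
}
```

Under these assumptions the first two results match Ian's without-norms scores up to float rounding. His with-norms scores are each half of the corresponding without-norms value, which is consistent with Lucene's lossy one-byte norm encoding presumably storing 1/sqrt(3) ≈ 0.577 as 0.5 (and 1/sqrt(4) is exactly 0.5).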

Re: Ranking docs with all terms higher

2011-05-19 Thread mark harwood
Of course IDF is a factor too, meaning a match on a single term that is rare in the overall index may be worth more than a match on two different terms that are common in the index. As Ian suggests, a custom Similarity implementation can be used to tune this out.
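A small numeric illustration of Mark's point, assuming the classic idf formula idf = 1 + ln(numDocs / (docFreq + 1)) and that each matched term contributes roughly tf * idf^2 with tf = 1 and every other factor (coord, norms, queryNorm) held equal. The document frequencies are made up for illustration. The flat-idf comparison mimics what a custom Similarity returning a constant idf would do; in Lucene 3.x that would mean overriding `idf(int docFreq, int numDocs)` in a DefaultSimilarity subclass:

```java
/**
 * Compares one rare term against two common terms under the classic idf,
 * and again with idf flattened to a constant. Document frequencies are
 * hypothetical; coord, norms and queryNorm are ignored.
 */
public class IdfSketch {

    static double idf(long docFreq, long numDocs) {
        return 1 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Sum of idf^2 over the matched terms (tf = 1 for each). */
    static double weight(long numDocs, long... docFreqs) {
        double w = 0;
        for (long df : docFreqs) {
            double i = idf(df, numDocs);
            w += i * i;
        }
        return w;
    }

    /** Same sum with idf fixed at 1, so every matched term counts equally. */
    static double flatWeight(long... docFreqs) {
        return docFreqs.length;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;
        double rare = weight(numDocs, 9L);                 // one term found in ~10 docs
        double common = weight(numDocs, 99_999L, 99_999L); // two terms in ~100k docs each
        System.out.println("one rare term:    " + rare);
        System.out.println("two common terms: " + common);
        System.out.println("flat idf: " + flatWeight(9L) + " vs " + flatWeight(99_999L, 99_999L));
    }
}
```

With real idf the single rare term wins by a wide margin; with a flat idf the document matching two terms wins, which is the behaviour the original poster was after.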

Re: Please help me with a basic question...

2011-05-19 Thread Rich Heimann
Thanks Paul, I do not know what duplicates are in this case, and it is the denominator of the TF that bothers me more than the numerator (if that is in fact what you are suggesting). What have been the effects of ignoring the IDF? When is it appropriate? It would seem that by doing so ra

Re: Please help me with a basic question...

2011-05-19 Thread Doron Cohen
Hi Rich, If I understand correctly, you are concerned that short documents are preferred too much over long ones; is this really the case? To understand what is going on, it would help to look at the Explanation of the score for, say, two result documents: one that you think is ranked too low, and one tha
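To make the short-versus-long concern concrete, here is a sketch assuming the classic length normalisation lengthNorm = 1 / sqrt(numTerms) combined with tf = sqrt(freq). Norm byte-encoding, idf, coord and boosts are ignored (assumed equal for both documents), and the document sizes are hypothetical:

```java
/**
 * Per-term contribution tf * lengthNorm for two hypothetical documents,
 * using tf = sqrt(freq) and lengthNorm = 1 / sqrt(numTerms).
 */
public class LengthNormSketch {

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    static double termScore(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        double shortDoc = termScore(1, 10);   // term once in a 10-term doc
        double longDoc = termScore(10, 1000); // term 10 times in a 1000-term doc
        System.out.println("short doc: " + shortDoc);
        System.out.println("long doc:  " + longDoc);
    }
}
```

The short document scores about three times higher despite containing the term only once, which is the kind of effect an Explanation would reveal. In Lucene 3.x, `IndexSearcher.explain(query, docId)` returns that Explanation for a single document, broken down into factors like these.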