Re: highlight - scoring fragments with more of the same token

2006-09-26 Thread Chris Hostetter
: TF is not a factor in fragment scores because I found it's typically more useful to look for fragments containing a strong mix of the query terms - not merely repetitions of the same term. The idea is that the choice of scorer is pluggable if you don't like the default behaviour. Taking a "coor

Re: highlight - scoring fragments with more of the same token

2006-09-26 Thread markharw00d
> I was somewhat surprised to find that highlighting scoring simply counts how many unique query terms appear in the fragment. Guess was expecting a

See QueryScorer(Query query, IndexReader reader, String fieldName) constructor - this will factor IDF into weighting for terms. Query boosts are aut

Re: highlight - scoring fragments with more of the same token

2006-09-26 Thread Doron Cohen
markharw00d <[EMAIL PROTECTED]> wrote on 26/09/2006 00:11:12:
> If you were to score repeated terms then I suspect it would have to be done so that the repetitions didn't score as highly as the first occurrence - otherwise f2 could be selected as a better fragment than f3 for the query q1 in

Re: highlight - scoring fragments with more of the same token

2006-09-26 Thread markharw00d
If you were to score repeated terms then I suspect it would have to be done so that the repetitions didn't score as highly as the first occurrence - otherwise f2 could be selected as a better fragment than f3 for the query q1 in your example. Repetitions of a term in a fragment could be scored a
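The idea of repetitions scoring less than a first occurrence can be sketched as below. This is only an illustration, not the highlighter's code: the snippet above is cut off before it says how repetitions could be scored, so the halving rule, the class name DampedRepeatScoreSketch and the whitespace tokenisation are all assumptions. With halving, the contributions of one term sum to less than 2, so repeats alone never outscore a fragment that matches a second distinct term.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: NOT part of the Lucene highlighter API.
final class DampedRepeatScoreSketch {

    // Assumed rule: the k-th occurrence of a query term adds 0.5^(k-1),
    // i.e. 1.0 for the first hit, 0.5 for the second, 0.25 for the third...
    static double score(String fragment, String query) {
        Set<String> queryTerms =
                new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        Map<String, Integer> seen = new HashMap<>();
        double total = 0.0;
        for (String token : fragment.trim().split("\\s+")) {
            if (!queryTerms.contains(token)) {
                continue; // not a query term, contributes nothing
            }
            // merge returns the updated count; subtract 1 to get prior hits
            int previousHits = seen.merge(token, 1, Integer::sum) - 1;
            total += Math.pow(0.5, previousHits);
        }
        return total;
    }
}
```

Under these assumptions, for q1 = "11 22" the fragment f2 = "aa 11 bb 11 cc" scores 1.5 and f3 = "aa 11 bb 22 cc" scores 2.0, so f3 is still preferred, while f2 now ranks above f1 = "aa 11 bb 33 cc" (1.0).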

highlight - scoring fragments with more of the same token

2006-09-25 Thread Doron Cohen
This question was raised in the user's list - http://www.nabble.com/highlighting-tf2322109.html

Assume three fragments and two queries:

f1 = aa 11 bb 33 cc
f2 = aa 11 bb 11 cc
f3 = aa 11 bb 22 cc

q1 = 11 22
q2 = 11

Now we call highlighter.getBestFragment(q);

For q1, f3 is re
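To make the example concrete, here is a minimal standalone sketch of the behaviour described in this thread, where a fragment's score is simply the number of distinct query terms it contains. It does not call the Lucene highlighter; the class name UniqueTermScoreSketch and the whitespace tokenisation are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: NOT part of the Lucene highlighter API.
final class UniqueTermScoreSketch {

    // Counts how many distinct query terms occur in the fragment.
    // Repeated occurrences of the same term add nothing.
    static int score(String fragment, String query) {
        Set<String> queryTerms =
                new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        Set<String> matched = new HashSet<>();
        for (String token : fragment.trim().split("\\s+")) {
            if (queryTerms.contains(token)) {
                matched.add(token);
            }
        }
        return matched.size();
    }
}
```

With this scoring, q1 gives f1 = 1, f2 = 1, f3 = 2, so f3 stands out. For q2 all three fragments score 1, so the second "11" in f2 does not distinguish it from f1 or f3.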