Thanks, Upayavira, for the quick reply.

What you said is true, but your example is descriptive in nature.
Does the same argument apply when a field's length can't exceed some
small threshold (say, 10 terms), as in the case of "title"? Titles are
generally short.

Consider the following examples (showing only the title field):

1) solr
2) Introduction to solr 3.4
3) solr big data

Search query: q=solr

Wouldn't it be better to give all three documents the same textual
score? They could then be re-ranked on some other feature, such as year
of publication.

Would it be a good idea to switch off fieldNorm for "title"?
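A minimal Python sketch of this proposal, assuming Lucene's classic TF-IDF similarity (tf = sqrt(freq), length norm = 1/sqrt(number of terms)) and a plain whitespace tokenizer; idf, boosts and the lossy one-byte encoding Lucene applies to norms are ignored. The helper names `text_score` and `rank` and the publication years are hypothetical, made up only for illustration. Setting `use_norms=False` stands in for disabling norms on the field, which in Solr is done with omitNorms="true" on the field or field type.

```python
import math

# Assumptions (illustrative only): classic Lucene TF-IDF similarity, where the
# length norm is 1/sqrt(number of terms); titles are tokenized by a simple
# whitespace split; idf, boosts and norm encoding are ignored.
titles = ["solr", "Introduction to solr 3.4", "solr big data"]

# Hypothetical publication years, invented only to show the re-ranking step.
years = {"solr": 2010, "Introduction to solr 3.4": 2011, "solr big data": 2013}


def text_score(title: str, term: str, use_norms: bool) -> float:
    """Hypothetical textual score: sqrt(tf), optionally times the length norm."""
    tokens = title.lower().split()
    score = math.sqrt(tokens.count(term))  # tf = sqrt(freq)
    if use_norms:
        score *= 1.0 / math.sqrt(len(tokens))  # lengthNorm = 1/sqrt(numTerms)
    return score


def rank(term: str, use_norms: bool):
    """Order by textual score, then by newest (hypothetical) year as tie-breaker."""
    return sorted(titles, key=lambda t: (-text_score(t, term, use_norms), -years[t]))


print(rank("solr", use_norms=True))
print(rank("solr", use_norms=False))
```

With norms on, the one-word title "solr" ranks first and the four-term title last; with norms off, all three titles get the same textual score and the year alone decides the order, newest first.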

Regards,
Karan Jindal

On Mon, Oct 14, 2013 at 1:14 PM, Upayavira <u...@odoko.co.uk> wrote:

> You search for the word "jack". Which of these three field values best
> matches?
>
> 1) Jack is great.
> 2) Billy was a young man. Billy studied well and lived well. Jack
> didn't. Billy went travelling and had a great time.
> 3) Billy didn't actually like Jack. Jack could at times be difficult.
> Jack would get angry at the smallest things.
>
> Setting algorithms aside, I'd say #1 is a good match, as is #3, but #2
> isn't so good.
>
> So, now looking at how field length normalisation can play into this, we
> can note that:
>
> 1) term frequency for "Jack" is 1. Field length is 3.
> 2) term frequency for "Jack" is 1. Field length is 21.
> 3) term frequency for "Jack" is 3. Field length is 19.
>
> If you only take term frequency into account, #1 and #2 would be equal
> matches, and #3 would be a better match than #1.
>
> However, if you include the field length in your calculations, you can
> reach a better approximation to our original proposition, that #1 and #3
> are better matches.
>
> In short, all else being equal, the longer the field, the lower the
> score. For longer fields with high term frequencies, the high term
> frequency can counteract the effect of the longer field, giving a
> (hopefully) similar score to a shorter field with fewer term occurrences.
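The arithmetic above can be checked with a small Python sketch. It assumes Lucene's classic TF-IDF similarity (DefaultSimilarity, the default in Solr 4.x), where tf = sqrt(freq) and the length norm is 1/sqrt(field length), and it leaves out idf, boosts, queryNorm and the lossy one-byte encoding of norms, so the numbers are only indicative. The `score` helper is hypothetical.

```python
import math

# Assumption: classic Lucene TF-IDF similarity, tf = sqrt(freq) and
# lengthNorm = 1/sqrt(field length). idf, boosts, queryNorm and the
# one-byte norm encoding are deliberately left out.


def score(freq: int, length: int, use_norms: bool) -> float:
    """Hypothetical per-field score for a single query term."""
    s = math.sqrt(freq)  # term frequency factor
    if use_norms:
        s /= math.sqrt(length)  # field length normalisation
    return s


# (term frequency of "jack", field length in terms) for examples #1..#3
examples = {1: (1, 3), 2: (1, 21), 3: (3, 19)}

for n, (freq, length) in examples.items():
    print(f"#{n}: tf only = {score(freq, length, False):.3f}, "
          f"with norms = {score(freq, length, True):.3f}")
```

With tf alone, #1 and #2 tie at 1.000 and #3 leads at 1.732. With the length norm applied, the scores become roughly 0.577, 0.218 and 0.397, so #1 and #3 come out ahead of #2, matching the intuition above.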
>
> Upayavira
>
> On Mon, Oct 14, 2013, at 08:33 AM, Karan jindal wrote:
> > Hi all,
> >
> > I have a general question about fieldNorm.
> > Is it advisable to use fieldNorm (which effectively gives more weight
> > to shorter fields)?
> > Is there a standard set of criteria for deciding whether to turn
> > fieldNorm on or off?
> >
> > *My use case:*
> > I have user-generated data with two primary searchable fields,
> > "title" and "description", plus some other filter flags.
> >
> > It is up to the user whether "title" is short or long.
> > What would be best in this case?
> >
> > Regards,
> > Karan Jindal
>
