On Monday 14 March 2011 17:27:05 Jonathan Rochkind wrote:
> Aha.  Yeah, I've read the documentation several times, but still find
> myself confused.
> 
> But do I understand this right now:
> 
> If I do omitNorms="true", but still leave "term freq and positions" in
> default case (i.e., NOT omitTermFreqAndPositions="true") ... then a
> document with more occurrences of a search term will still score higher,
> but it'll just be a factor of the raw number of times it occurs, and not
> the percentage of the total field it covers -- that is, N occurrences in
> a short field value will be scored exactly the same as N occurrences in a
> different document with a longer field value.

Yes, if you omitNorms but still use TF (which you do) then (without
considering other score-influencing parameters) documents with the same number
of occurrences will have the same score.

In debugQuery you'll always see tf=1 if you use omitTermFreqAndPositions. If 
you use omitNorms you'll always see a norm of 1.
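
As a rough sketch of the two switches discussed above, assuming the classic
Lucene default similarity (tf = sqrt(freq), length norm = 1/sqrt(number of
terms in the field)): the function and parameter names below are hypothetical
and only illustrate the arithmetic. IDF, boosts, query normalization and the
lossy one-byte encoding of norms are deliberately left out.

```python
import math


def tf(freq, omit_tf=False):
    """Term-frequency factor; hypothetical helper, not a Solr or Lucene API."""
    if freq <= 0:
        return 0.0
    # With omitTermFreqAndPositions="true" any match counts once (tf=1).
    # Otherwise the classic default similarity uses the square root of the count.
    return 1.0 if omit_tf else math.sqrt(freq)


def length_norm(num_terms, omit_norms=False):
    """Field-length factor; with omitNorms="true" it is always 1."""
    return 1.0 if omit_norms else 1.0 / math.sqrt(num_terms)


def field_score(freq, num_terms, omit_norms=False, omit_tf=False):
    """Only the tf * norm part of the score; all other factors are ignored."""
    return tf(freq, omit_tf) * length_norm(num_terms, omit_norms)


# Five matches in a 10-term field versus five matches in a 1000-term field.
short_doc = field_score(5, 10, omit_norms=True)
long_doc = field_score(5, 1000, omit_norms=True)
print(short_doc == long_doc)  # same score once norms are omitted
```

Note that with TF enabled the contribution grows with the square root of the
occurrence count in this similarity, not with the raw count itself.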

> 
> Phew, this stuff is hard for me to talk about clearly. If that made any
> sense, do I have it right?  If so, that's exactly what I want to try
> out, excellent.
> 
> On 3/14/2011 10:48 AM, Markus Jelsma wrote:
> > You can use omitNorms="true" for any given field. Length normalization
> > will be disabled and index-time boosting will not be available any more.
> > 
> > Term frequencies can also be disabled by setting
> > omitTermFreqAndPositions="true" for any given field. Omitting TF can be
> > very useful if you need an easy way to prevent spam documents from
> > hijacking the score (if you sort on score, of course).
> > 
> > http://wiki.apache.org/solr/SchemaXml
> > 
> > On Monday 14 March 2011 15:39:47 Jonathan Rochkind wrote:
> >> On 3/13/2011 6:24 PM, Ahmet Arslan wrote:
> >>> http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm
> >>> 
> >>> I can see that the one with 5 matches is longer than the other. Shorter
> >>> documents are favored in Solr/Lucene by the length normalization factor.
> >> 
> >> Is there any easy way to turn this off for a given field?  That is, I
> >> think, to still have the IDF be used, but not the TF. Maybe that's it.
> >> But anyway, to turn off document length normalization, but only for a
> >> certain field?
> >> 
> >> I'm not sure if that's what useNorms does, or if useNorms does _more_
> >> than this, including some things I wouldn't want, or if there is some
> >> other parameter that would do this instead?
> >> 
> >> Thanks for any advice,
> >> 
> >> Jonathan

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
