Aha. Yeah, I've read the documentation several times, but still find myself confused.

But do I understand this correctly now:

If I set omitNorms="true", but leave term frequencies and positions at their default (i.e., NOT omitTermFreqAndPositions="true"), then a document with more occurrences of a search term will still score higher, but only as a factor of the raw number of times the term occurs, not of the percentage of the total field it covers. That is, N occurrences in a short field value will be scored exactly the same as N occurrences in a different document with a longer field value.

Phew, this stuff is hard for me to talk about clearly. If that made any sense, do I have it right? If so, that's exactly what I want to try out, excellent.
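
To make the question above concrete, here is a rough Python sketch of the two per-field factors involved. It assumes Lucene's default similarity, where tf is the square root of the term frequency and the length norm is 1/sqrt(number of terms in the field). The function names are made up for illustration, and idf, boosts, coord and queryNorm are left out entirely, so this is not the real scoring code. Note that under this assumption the factor grows with the square root of the count rather than the raw count.

```python
import math


def tf(freq):
    # Assumed default: tf grows with the square root of the term frequency.
    return math.sqrt(freq)


def length_norm(num_terms):
    # Assumed default: shorter fields get a larger norm.
    return 1.0 / math.sqrt(num_terms)


def field_score(freq, num_terms, omit_norms):
    # Hypothetical helper: with norms omitted, the norm is treated as 1.0,
    # so field length no longer influences the result.
    norm = 1.0 if omit_norms else length_norm(num_terms)
    return tf(freq) * norm


# Same number of matches in a short field and in a long field.
short_doc = field_score(freq=5, num_terms=20, omit_norms=True)
long_doc = field_score(freq=5, num_terms=2000, omit_norms=True)
print(short_doc == long_doc)

# With norms enabled, the shorter field wins.
print(field_score(5, 20, False) > field_score(5, 2000, False))
```

If this sketch matches what Lucene does, then with norms omitted the two documents tie on this factor, while more occurrences still score higher than fewer.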

On 3/14/2011 10:48 AM, Markus Jelsma wrote:
You can use omitNorms="true" for any given field. Length normalization will be
disabled and index-time boosting will not be available any more.

Term frequencies can also be disabled by setting
omitTermFreqAndPositions="true" for any given field. Omitting TF can be very
useful if you need an easy way to prevent spam documents from hijacking the
score (if you sort on score, of course).
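
As a rough illustration of the spam point above, the Python sketch below assumes that omitting term frequencies makes every matching document count as a single occurrence, and that the default tf is the square root of the frequency. The helper name is hypothetical, and the rest of the score (idf, norms, boosts) is ignored.

```python
import math


def tf_factor(freq, omit_tf):
    # Hypothetical helper, not a Lucene API.
    if freq == 0:
        return 0.0  # no match, no contribution
    # Assumption: with term frequencies omitted, any match counts once.
    effective_freq = 1 if omit_tf else freq
    return math.sqrt(effective_freq)


# A normal document and a keyword-stuffed one.
normal = tf_factor(3, omit_tf=True)
stuffed = tf_factor(300, omit_tf=True)
print(normal == stuffed)

# With term frequencies kept, repetition pays off.
print(tf_factor(300, omit_tf=False) > tf_factor(3, omit_tf=False))
```

Under these assumptions, repeating a term 300 times gains nothing over mentioning it once. Since positions are dropped along with the frequencies, phrase queries on such a field should not be expected to work.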

http://wiki.apache.org/solr/SchemaXml

On Monday 14 March 2011 15:39:47 Jonathan Rochkind wrote:
On 3/13/2011 6:24 PM, Ahmet Arslan wrote:
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm

I can see that the one with 5 matches is longer than the other. The length
normalization factor in Solr/Lucene favors shorter documents.
Is there any easy way to turn this off for a given field?  That is, I
think, to still have the IDF be used, but not the TF. Maybe that's it.
But anyway, is there a way to turn off document length normalization, but
only for a certain field?

I'm not sure if that's what useNorms does, or if useNorms does _more_
than this, including some things I wouldn't want, or if there is some
other parameter that would do this instead?

Thanks for any advice,

Jonathan
