Hi,
I have a question about the current Lucene scoring algorithm. In this scoring
algorithm, the term frequency factor is calculated as the square root of the
number of occurrences of the term, as described in
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_tf
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote:
However, what exactly is the advantage of using square root instead
of log?
Speaking anecdotally, I wouldn't say there's an advantage. There's a
predictable effect: very long documents are rewarded, since the square
root damps high term frequencies less strongly than a log would.
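
To make the difference in damping concrete, here is a small sketch that puts the two curves side by side. The square-root version follows the tf formula linked above; the log version (1 + ln(freq)) is only one common choice that I am assuming for comparison, not something Lucene ships as its default. The class and method names are made up for this example.

```java
public class TfDamping {
    // Square-root damping, as in the tf formula linked above.
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    // Hypothetical logarithmic alternative (an assumption for comparison):
    // 1 + ln(freq) for freq > 0, and 0 when the term does not occur.
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        // Print both factors for increasingly large term frequencies.
        int[] freqs = {1, 10, 100, 1000};
        for (int f : freqs) {
            System.out.printf("freq=%d sqrt=%.2f log=%.2f%n",
                    f, sqrtTf(f), logTf(f));
        }
    }
}
```

Both factors start at 1 for a single occurrence, but the square root keeps growing much faster as the count goes up, which is why a long document that repeats a term many times gains more under sqrt than it would under log.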