Lucene scoring: Term frequency normalisation

2006-12-12 Thread Karl Koch
Hi, I have a question about the current Lucene scoring algorithm. In this scoring algorithm, the term frequency is calculated by taking the square root of the number of occurrences of the term, as described in http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_tf

Re: Lucene scoring: Term frequency normalisation

2006-12-12 Thread Marvin Humphrey
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote:

> However, what exactly is the advantage of using square root instead of log?

Speaking anecdotally, I wouldn't say there's an advantage. There's a predictable effect: very long documents are rewarded, since the damping factor is not as strong.
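
To make the difference in damping concrete, here is a small Python sketch (not Lucene code) that compares the two functions on raw term counts. The function names are hypothetical, and the 1 + ln(freq) form is only one common logarithmic variant, assumed here for illustration; the thread does not say which log formula was meant.

```python
import math


def tf_sqrt(freq):
    """Square-root damping: tf = sqrt(freq), as in the formula linked above."""
    return math.sqrt(freq)


def tf_log(freq):
    """Logarithmic damping. The 1 + ln(freq) form is an assumption for
    illustration; a zero count is mapped to 0."""
    return 1.0 + math.log(freq) if freq > 0 else 0.0


def compare(freqs=(1, 10, 100, 1000)):
    """Return (freq, sqrt-damped tf, log-damped tf) for each raw count."""
    return [(f, tf_sqrt(f), tf_log(f)) for f in freqs]


if __name__ == "__main__":
    for freq, s, l in compare():
        print(f"freq={freq:>5}  sqrt={s:7.2f}  log={l:6.2f}")
```

For small counts the two are close, but as the count grows the square root keeps climbing much faster than the logarithm, which is the weaker damping described above. This looks at the tf factor in isolation and ignores the other factors in the score.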