Hi Ariya,

DefaultSimilarity does not use the raw term frequency; instead, it uses the
square root of the raw term frequency.
If you want to see raw term frequency information in the explain output, I
suggest experimenting with
org.apache.lucene.search.similarities.SimilarityBase and its subclasses.
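To make the square-root point concrete, here is a minimal sketch. It assumes the Lucene 4.x/5.x DefaultSimilarity tf formula (tf = sqrt(freq)) and re-implements it locally rather than calling Lucene, so the class name TfDemo is hypothetical. It shows why a frequency of 2 is reported as about 1.414 in the explain output, not 2.

```java
// Hypothetical stand-alone illustration: it re-implements the tf formula
// instead of calling org.apache.lucene.search.similarities.DefaultSimilarity.
public class TfDemo {

    // DefaultSimilarity-style tf: square root of the raw frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        for (float freq : new float[] {1f, 2f, 4f}) {
            // Raw frequency next to the dampened value used in scoring.
            System.out.println("freq=" + freq + " -> tf=" + tf(freq));
        }
    }
}
```

So with IDF fixed at 1, a document with freq=2 still contributes only sqrt(2) through tf, and a raw count of 4 is needed before tf reaches 2.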

ahmet

On Thursday, May 21, 2015 3:59 PM, ariya bala <ariya...@gmail.com> wrote:
Hi,

I am puzzled by the term frequency behaviour of the DefaultSimilarity
implementation.
I have suppressed IDF by setting it to 1, so the TF-IDF score should in turn
reflect the term frequency alone.

Below are my observations.
The rows coloured red were expected to give a hit count (term frequency) of 2,
but gave 1.
*Is this a bug, or is this the intended behaviour?*

Search Query: AAA BBB
Parsed Query: PhraseQuery(Contents:"aaa bbb"~5000)
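Because the parsed query is a sloppy PhraseQuery (~5000), the "frequency" in the explain output is not a plain count of matches. The sketch below assumes the Lucene 4.x/5.x DefaultSimilarity behaviour where each sloppy match contributes sloppyFreq(matchLength) = 1 / (matchLength + 1), and the sum is then passed through tf = sqrt(freq). The match lengths are supplied by hand (finding them in a real document is the job of Lucene's SloppyPhraseScorer and is not modelled). As I understand Lucene's sloppy matching, terms in query order ("AAA BBB") give match length 0, while the reversed order ("BBB AAA") costs a match length of 2. The class name SloppyFreqDemo and the {0, 2} two-match case are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Sketch of how a sloppy phrase frequency accumulates under
// DefaultSimilarity-style scoring (assumed Lucene 4.x/5.x formulas).
public class SloppyFreqDemo {

    // Each sloppy match contributes 1 / (matchLength + 1) instead of 1.
    static float sloppyFreq(int matchLength) {
        return 1.0f / (matchLength + 1);
    }

    // Sum the per-match contributions to get the phrase frequency.
    static float phraseFreq(int... matchLengths) {
        float freq = 0f;
        for (int len : matchLengths) {
            freq += sloppyFreq(len);
        }
        return freq;
    }

    // tf is the square root of that (possibly fractional) frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // {0}: one in-order match; {2}: one reversed match;
        // {0, 2}: a hypothetical document with one of each.
        int[][] cases = { {0}, {2}, {0, 2} };
        for (int[] c : cases) {
            float freq = phraseFreq(c);
            System.out.println(String.format(Locale.ROOT,
                "matchLengths=%s freq=%.4f tf=%.4f",
                Arrays.toString(c), freq, tf(freq)));
        }
    }
}
```

Under these assumptions, a document with two matches, one of them reversed, ends up with freq of about 1.333 and tf of about 1.155, which is well below the 2 one might expect from counting hits.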

Document | Content     | Slop TF slop0 TF slop2 TF
1        | AAA BBB     | -10121
2        | BBB AAA     | -10-21
3        | AAA AAA BBB | -10121
4        | AAA BBB AAA | -20122
5        | BBB AAA AAA | -10-21
6        | AAA BBB BBB | -10121
7        | BBB AAA BBB | -10121
8        | BBB BBB AAA | -10-21

*Am I missing something?*


Cheers
*Ariya*
