I have the following setup:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="description" type="text" indexed="true" stored="true" multiValued="false" omitNorms="true" />
I index my corpus, and I can see that tf is computed as usual. In this document the term occurs 14 times in this field:

4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440) [DefaultSimilarity], result of:
  4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
    0.14165252 = queryWeight, product of:
      10.0 = boost
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      0.0016648936 = queryNorm
    31.834784 = fieldWeight in 440, product of:
      3.7416575 = tf(freq=14.0), with freq of:
        14.0 = termFreq=14.0
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = fieldNorm(doc=440)

Then I modify my schema:

<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="com.customsolr.NoTfSimilarityFactory"/>
</fieldType>

I just want to disable term frequencies greater than 1, so that a term is either present or not:

import org.apache.lucene.search.similarities.DefaultSimilarity;

// Scores every matching term as if it occurred exactly once.
public class NoTfSimilarity extends DefaultSimilarity {
  @Override
  public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
  }
}

But I still see tf=14 in the explain output of my query:

723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
  723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
    85.08203 = queryWeight, product of:
      10.0 = boost
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = queryNorm
    8.5082035 = fieldWeight in 440, product of:
      1.0 = tf(freq=14.0), with freq of:
        14.0 = termFreq=14.0
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = fieldNorm(doc=440)

Does anyone see what I am missing? I am on Solr 4.0.

Thanks,
Xavier
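To make it easier to see where each number in the two explain trees comes from, here is a small stand-alone Java sketch that recomputes both scores. It assumes the Lucene 4.x DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and copies queryNorm straight from the explain output, because queryNorm depends on the other clauses of the full query, which are not shown. The class and method names (`ExplainArithmetic`, `defaultTf`, `noTf`, `idf`, `score`) are hypothetical and not part of the Solr or Lucene API.

```java
// A plain-Java re-computation of the two explain trees in the question.
// Assumption: the formulas mirror Lucene 4.x DefaultSimilarity; queryNorm is
// taken from the explain output rather than derived from the query.
public class ExplainArithmetic {

    // DefaultSimilarity-style tf: square root of the raw term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // The custom NoTfSimilarity tf from the question: 1 if present, else 0.
    static float noTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // DefaultSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1)).
    static float idf(long docFreq, long maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    // score = queryWeight * fieldWeight
    //       = (boost * idf * queryNorm) * (tf * idf * fieldNorm)
    static float score(float tf, float idf, float boost, float queryNorm, float fieldNorm) {
        float queryWeight = boost * idf * queryNorm;
        float fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        float freq = 14.0f;          // termFreq of "galaxy" in doc 440
        float idf = idf(30, 56511);  // docFreq=30, maxDocs=56511
        float boost = 10.0f;         // description:galaxy^10.0
        float fieldNorm = 1.0f;      // explain shows fieldNorm = 1.0 (omitNorms="true")

        // First explain: DefaultSimilarity tf, queryNorm = 0.0016648936
        float before = score(defaultTf(freq), idf, boost, 0.0016648936f, fieldNorm);
        // Second explain: NoTfSimilarity tf, queryNorm = 1.0
        float after = score(noTf(freq), idf, boost, 1.0f, fieldNorm);

        System.out.println("idf    = " + idf);
        System.out.println("before = " + before);
        System.out.println("after  = " + after);
    }
}
```

Running it should give scores matching the two explain trees up to float rounding (about 4.5095 and 723.895). In this model the raw frequency (14.0) is the argument passed to `tf()`, and the explain line `1.0 = tf(freq=14.0)` prints that input alongside the returned value. Comparing the trees also shows that queryNorm changes from 0.0016648936 to 1.0, which accounts for most of the jump in the final score.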