Maximum Term Frequency and Minimum Document Length

Jonah Schwartz Wed, 04 Feb 2009 16:29:54 -0800

We want to configure Solr so that fields are indexed with a maximum term
frequency and a minimum document length. If a term appears more than N times
in a field, it will be treated as having appeared only N times. If a
document is shorter than M terms, it will be treated as having exactly M
terms. We have done this in the past in raw Lucene by writing a Similarity
class like this:


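import org.apache.lucene.search.DefaultSimilarity;

public class LimitingSimilarity extends DefaultSimilarity {
   // M and N from the description above; these values are placeholders only.
   private final int minNumTerms = 10;
   private final float maxTermFrequency = 5f;

   // Treat fields shorter than minNumTerms as having exactly minNumTerms terms.
   public float lengthNorm(String fieldName, int numTerms) {
       return super.lengthNorm(fieldName, Math.max(minNumTerms, numTerms));
   }

   // Cap the raw term frequency at maxTermFrequency before scoring.
   public float tf(float freq) {
       return super.tf(Math.min(maxTermFrequency, freq));
   }
}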
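
To make the intended effect concrete, here is a rough standalone sketch that
does not depend on Lucene. It assumes the formulas DefaultSimilarity uses
(tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms)) and ignores the lossy
encoding Lucene applies when it stores norms. The class name LimitingSketch
and the values chosen for N and M are hypothetical, picked only so the
arithmetic is easy to follow.

```java
// Standalone sketch: mirrors the DefaultSimilarity formulas
// (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms)) without using Lucene.
class LimitingSketch {
    // Term-frequency factor with the raw frequency capped at maxTermFrequency.
    static float tf(float freq, float maxTermFrequency) {
        return (float) Math.sqrt(Math.min(maxTermFrequency, freq));
    }

    // Length norm with the term count raised to at least minNumTerms.
    static float lengthNorm(int numTerms, int minNumTerms) {
        return (float) (1.0 / Math.sqrt(Math.max(minNumTerms, numTerms)));
    }

    public static void main(String[] args) {
        float maxTermFrequency = 4f; // hypothetical N
        int minNumTerms = 16;        // hypothetical M

        // A term seen 100 times scores as if it were seen 4 times.
        System.out.println(tf(100f, maxTermFrequency));  // prints 2.0

        // A 3-term field is normalized as if it had 16 terms.
        System.out.println(lengthNorm(3, minNumTerms));  // prints 0.25
    }
}
```

Values inside the limits pass through unchanged: with the same settings, a
frequency of 1 still gives 1.0 and a 64-term field still gives 0.125.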


Is there a better way to do this within Solr configuration files?

Thanks,
Jonah
