Re: Dealing with keyword stuffing

Chris Hostetter Wed, 27 Jul 2011 20:02:20 -0700

: Presumably, they are doing this by increasing tf (term frequency),
: i.e., by repeating keywords multiple times. If so, you can use a custom
: similarity class that caps term frequency, and/or ensures that the scoring
: increases less than linearly with tf. Please see


in particular, using something like SweetSpotSimilarity tuned to know what 
values make sense for "good" content in your domain can be useful because 
it can actually penalize documents that are too short/long or have term 
freqs that are outside of a reasonable expected range.
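
to make the idea concrete, here is a rough sketch of the two knobs being 
discussed: a tf factor that grows less than linearly and stops growing past 
a cap, and a length norm that is flat across an expected range of document 
lengths and drops off outside it. This is plain Python for illustration 
only, not Lucene's code or API; the function names and the defaults (cap, 
lo, hi, steepness) are made up and would have to be tuned to your data.

```python
import math


def capped_sqrt_tf(freq, cap=10):
    """Sublinear, capped term-frequency factor.

    Occurrences beyond `cap` (a hypothetical threshold) add nothing,
    so repeating a keyword hundreds of times stops paying off.
    """
    return math.sqrt(min(freq, cap))


def plateau_length_norm(num_terms, lo=100, hi=1000, steepness=0.5):
    """Length norm that is 1.0 for lengths in [lo, hi] and decays outside.

    `lo`, `hi` and `steepness` are illustrative values, not recommendations.
    """
    # Zero inside [lo, hi]; twice the distance to the nearest bound outside.
    distance = abs(num_terms - lo) + abs(num_terms - hi) - (hi - lo)
    return 1.0 / math.sqrt(steepness * distance + 1.0)
```

with these made-up defaults, a term repeated 1000 times gets the same tf 
factor as one repeated 10 times, and a 50-term or 5000-term field gets a 
lower norm than a 500-term one.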

FWIW though: that's really just a generic answer to a generic question.  
the better you understand your data, the better you can configure solr for 
it -- and that goes equally for the advice people can give you about how 
to configure solr.  you haven't given any information about the nature of 
your data: the types of documents, the authoritative source, the fields 
involved, where/how/when people edit this data, who is keyword spamming, 
etc.; or how you want to use it: what types of queries you need to 
support, what your users' objectives are, etc.  That makes it impossible 
for anyone to suggest anything but the most general answer: "customize" 
your Similarity.

-Hoss
