Cool, so I used SweetSpotSimilarity with the default parameters and I see
some improvement. However, some of the 'stuffed' documents still come up
in the results, so I feel that SweetSpotSimilarity alone is not enough.
Going through
http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
On Thu, Jul 28, 2011 at 3:48 PM, Pranav Prakash wrote:
[...]
> I am not sure how to use SweetSpotSimilarity. I am googling on this, but
> any useful insights are so much appreciated.
Replace the existing DefaultSimilarity class in schema.xml (look towards
the bottom of the file) with the SweetSpo
On Thu, Jul 28, 2011 at 08:31, Chris Hostetter wrote:
>
> : Presumably, they are doing this by increasing tf (term frequency),
> : i.e., by repeating keywords multiple times. If so, you can use a custom
> : similarity class that caps term frequency, and/or ensures that the scoring
> : increases
: Presumably, they are doing this by increasing tf (term frequency),
: i.e., by repeating keywords multiple times. If so, you can use a custom
: similarity class that caps term frequency, and/or ensures that the scoring
: increases less than linearly with tf. Please see
in particular, using someth
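
To make the "caps term frequency" and "less than linearly with tf" ideas above concrete, here is a rough numeric sketch. It is plain Python arithmetic, not Lucene or Solr code. The square-root curve is the shape DefaultSimilarity uses for tf; the function names and the cap of 5 are made up for illustration and are not taken from any message in this thread.

```python
import math


def linear_tf(freq):
    """Naive scoring: every repetition of a keyword adds the same amount."""
    return float(freq)


def default_tf(freq):
    """Square-root curve, the shape of DefaultSimilarity's tf factor."""
    return math.sqrt(freq)


def capped_tf(freq, cap=5):
    """Hypothetical variant: occurrences beyond `cap` add nothing.

    The cap value is an assumption chosen only for this example.
    """
    return math.sqrt(min(freq, cap))


if __name__ == "__main__":
    # Compare how each curve rewards a document that repeats a keyword.
    print("freq  linear  sqrt   capped")
    for freq in (1, 5, 25, 100):
        print(f"{freq:>4}  {linear_tf(freq):>6.1f}  "
              f"{default_tf(freq):>5.2f}  {capped_tf(freq):>6.2f}")
```

With the square-root curve a stuffed document still gains from every extra repetition, just more slowly, which may be one reason stuffed documents can keep surfacing. With a cap, repetitions past the threshold stop helping at all.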
On Wed, Jul 27, 2011 at 7:15 PM, Pranav Prakash wrote:
> I guess most of you have already handled keyword stuffing, and many of
> you might still be handling it. Here is my scenario. We have a huge index
> containing about 6m docs. (Not sure if that is huge :-) And every document
> contains title, des