Re: Dealing with keyword stuffing

2011-07-29 Thread Pranav Prakash
Cool. So I used SweetSpotSimilarity with default params and I see some improvements. However, I can still see some of the 'stuffed' documents coming up in the results. I feel that SweetSpotSimilarity alone is not enough. Going through http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf

Re: Dealing with keyword stuffing

2011-07-28 Thread Gora Mohanty
On Thu, Jul 28, 2011 at 3:48 PM, Pranav Prakash wrote:
[...]
> I am not sure how to use SweetSpotSimilarity. I am googling on this, but
> any useful insights are so much appreciated.

Replace the existing DefaultSimilarity class in schema.xml (look towards the bottom of the file) with the SweetSpo

Re: Dealing with keyword stuffing

2011-07-28 Thread Pranav Prakash
On Thu, Jul 28, 2011 at 08:31, Chris Hostetter wrote:
>
> : Presumably, they are doing this by increasing tf (term frequency),
> : i.e., by repeating keywords multiple times. If so, you can use a custom
> : similarity class that caps term frequency, and/or ensures that the scoring
> : increases

Re: Dealing with keyword stuffing

2011-07-27 Thread Chris Hostetter
: Presumably, they are doing this by increasing tf (term frequency),
: i.e., by repeating keywords multiple times. If so, you can use a custom
: similarity class that caps term frequency, and/or ensures that the scoring
: increases less than linearly with tf. Please see in particular, using someth
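
To make the quoted idea concrete, here is a minimal sketch (not Lucene or Solr code) of how the term-frequency part of a score grows with the number of repeats under three schemes: linear, sublinear (square root), and capped. The function names and the cap of 5 are assumptions picked purely for illustration.

```python
import math


def linear_tf(freq: int) -> float:
    """Score grows in direct proportion to repeats, so stuffing pays off fully."""
    return float(freq)


def sqrt_tf(freq: int) -> float:
    """Sublinear growth: each extra repeat adds less than the one before."""
    return math.sqrt(freq)


def capped_tf(freq: int, cap: int = 5) -> float:
    """Hypothetical cap: repeats beyond `cap` add nothing to the score."""
    return math.sqrt(min(freq, cap))


if __name__ == "__main__":
    # Compare how the three schemes reward a term repeated more and more often.
    for freq in (1, 5, 25, 100):
        print(freq, linear_tf(freq), round(sqrt_tf(freq), 2), round(capped_tf(freq), 2))
```

With a cap in place, a document that repeats a keyword 100 times gets the same tf contribution as one that uses it 5 times, which is the effect a tf-capping similarity class is after.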

Re: Dealing with keyword stuffing

2011-07-27 Thread Gora Mohanty
On Wed, Jul 27, 2011 at 7:15 PM, Pranav Prakash wrote:
> I guess most of you have already handled and many of you might still be
> handling keyword stuffing. Here is my scenario. We have a huge index
> containing about 6m docs. (Not sure if that is huge :-) And every document
> contains title, des