Re: Dealing with keyword stuffing

Pranav Prakash Fri, 29 Jul 2011 05:37:37 -0700

Cool. I tried SweetSpotSimilarity with the default parameters and I see some
improvement. However, some of the 'stuffed' documents still come up in the
results, so I feel that SweetSpotSimilarity alone is not enough. Going through
http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf, I gather
that there are other things that need fine-tuning too: pivoted length
normalization and term frequency normalization.
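
To make sure I understand the two ideas, here is a rough sketch of how I read
them, in plain Java with no Lucene dependency. The class name, method names,
and the slope/pivot values are my own assumptions for illustration, and the
formulas are the textbook pivoted-normalization and average-tf forms, not
necessarily the exact ones used in the paper or in any Lucene class.

```java
// Hypothetical sketch; none of these names come from Lucene or Solr.
final class NormalizationSketch {

    // Pivoted length normalization: 1 / ((1 - slope) * pivot + slope * docLength).
    // slope = 1 gives plain 1/docLength; slope = 0 ignores length entirely.
    // The pivot is typically the average document length in the collection.
    static double pivotedLengthNorm(int docLength, double pivot, double slope) {
        return 1.0 / ((1.0 - slope) * pivot + slope * docLength);
    }

    // Term frequency normalized by the average term frequency of the document:
    // (1 + ln(tf)) / (1 + ln(avgTf)). The logarithm dampens repeated terms,
    // so a term stuffed 50 times counts far less than 50 times as much.
    static double normalizedTf(int tf, double avgTf) {
        if (tf <= 0) {
            return 0.0;
        }
        return (1.0 + Math.log(tf)) / (1.0 + Math.log(avgTf));
    }

    public static void main(String[] args) {
        double pivot = 100.0; // assumed average document length
        double slope = 0.25;  // assumed tuning value
        for (int len : new int[] {50, 100, 400}) {
            System.out.println("length " + len + " -> norm "
                    + pivotedLengthNorm(len, pivot, slope));
        }
        for (int tf : new int[] {1, 5, 50}) {
            System.out.println("tf " + tf + " -> normalized "
                    + normalizedTf(tf, 1.5));
        }
    }
}
```

If I do write a custom Similarity, I assume logic along these lines would go
into its term frequency and length normalization methods, with the slope and
pivot tuned against our own data.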

Should I create a custom Similarity class that overrides the default
behavior? I guess that should help me get more relevant results. Where
should I start? Please do not assume that I know the less obvious things; I
am still learning! :-)

*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>

On Thu, Jul 28, 2011 at 17:03, Gora Mohanty <g...@mimirtech.com> wrote:

> On Thu, Jul 28, 2011 at 3:48 PM, Pranav Prakash <pra...@gmail.com> wrote:
> [...]
> > I am not sure how to use SweetSpotSimilarity. I am googling on this, but
> > any useful insights are so much appreciated.
>
> Replace the existing DefaultSimilarity class in schema.xml (look towards
> the bottom of the file) with the SweetSpotSimilarity class, e.g., have a
> line like:
>  <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
>
> Regards,
> Gora
>
