Cool. I tried SweetSpotSimilarity with the default parameters and I do see some improvement. However, some of the 'stuffed' documents still come up in the results, so I feel that SweetSpotSimilarity alone is not enough. Going through http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf I figured out that there are other things, pivoted length normalization and term frequency normalization, that need fine-tuning too.
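To check my own understanding of those two ideas, here is a rough sketch in plain Java with no Lucene classes involved. The class name, method names and the slope value are all made up by me for illustration, and the formulas are the textbook forms as I understand them (the paper's exact variants may differ), so please correct me if this is off.

```java
/**
 * Sketch only: plain Java, no Lucene dependency.
 * All names here are hypothetical and chosen for illustration.
 */
final class PivotedNormSketch {

    /**
     * Pivoted length normalization factor, in the common form
     * (1 - slope) + slope * (docLength / avgDocLength).
     * A document of exactly average length gets a factor of 1.0;
     * longer documents get a larger factor (and so a lower final weight).
     */
    static double pivotedLengthFactor(double docLength, double avgDocLength, double slope) {
        return (1.0 - slope) + slope * (docLength / avgDocLength);
    }

    /**
     * Term frequency normalized by the average term frequency inside the
     * same document: (1 + ln tf) / (1 + ln avgTf).
     * Assumes avgTfInDoc >= 1, which holds for any non-empty document.
     */
    static double normalizedTf(int tf, double avgTfInDoc) {
        if (tf <= 0) {
            return 0.0; // term does not occur in the document
        }
        return (1.0 + Math.log(tf)) / (1.0 + Math.log(avgTfInDoc));
    }

    /** Combined per-term weight: normalized tf divided by the length factor. */
    static double termWeight(int tf, double avgTfInDoc,
                             double docLength, double avgDocLength, double slope) {
        return normalizedTf(tf, avgTfInDoc)
                / pivotedLengthFactor(docLength, avgDocLength, slope);
    }

    public static void main(String[] args) {
        double slope = 0.25; // assumed value, would need tuning on real data

        // A "stuffed" document: long, with every term repeated many times.
        double stuffed = termWeight(50, 25.0, 2000, 500, slope);
        // An ordinary document: average length, term appears twice.
        double ordinary = termWeight(2, 1.0, 500, 500, slope);

        System.out.println("stuffed  = " + stuffed);
        System.out.println("ordinary = " + ordinary);
    }
}
```

If I have this right, a document that repeats every term many times gains little from its high raw tf, because the average tf in that document is high as well, and its extra length pushes the weight down further.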
Should I create a custom Similarity class that overrides the default behavior? I guess that should help me get more relevant results. Where should I start? Please do not assume I know the less obvious things; I am still learning! :-)

*Pranav Prakash*
"temet nosce"
Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> | Google <http://www.google.com/profiles/pranny>

On Thu, Jul 28, 2011 at 17:03, Gora Mohanty <g...@mimirtech.com> wrote:
> On Thu, Jul 28, 2011 at 3:48 PM, Pranav Prakash <pra...@gmail.com> wrote:
> [...]
> > I am not sure how to use SweetSpotSimilarity. I am googling on this, but
> > any useful insights are so much appreciated.
>
> Replace the existing DefaultSimilarity class in schema.xml (look towards
> the bottom of the file) with the SweetSpotSimilarity class, e.g., have a
> line like:
> <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/>
>
> Regards,
> Gora