There are two ways to "tweak" MLT: adjust its parameters (such as minimum
term frequency), or use stop words when indexing.

I'd suggest you try those as a means to improve quality!
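
As a rough sketch of both options (not tested against your setup): the
field type name "text_mlt" is made up, I'm assuming a stopwords.txt in
your core's conf directory, and the parameter values are only starting
points to experiment with. The handler and field names come from your
config below.

```xml
<!-- schema.xml fragment: hypothetical field type that drops stop words
     at index time; "text_mlt" and stopwords.txt are assumptions -->
<fieldType name="text_mlt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml fragment: illustrative MLT defaults, values are guesses -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">Objective,Summary</str>
    <!-- ignore terms occurring fewer than this many times in the source doc -->
    <str name="mlt.mintf">1</str>
    <!-- ignore terms occurring in fewer than this many documents -->
    <str name="mlt.mindf">2</str>
    <!-- ignore words shorter than this many characters -->
    <str name="mlt.minwl">4</str>
    <!-- maximum number of terms in the generated query -->
    <str name="mlt.maxqt">25</str>
  </lst>
</requestHandler>
```

For the stop words to take effect, Objective and Summary would need to
use a field type like this one, and you'd have to re-index. Also bear in
mind that mlt.mindf discards terms found in fewer than that many
documents, so with 10 documents and mlt.mindf=5 I'd expect mostly common
words to be left.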

Upayavira

On Tue, Jul 14, 2015, at 09:28 AM, Zheng Lin Edwin Yeo wrote:
> Thanks for your advice. I've indexed more content and it's working
> better now. The entire index is no longer returned every time.
> 
> However, I found that longer documents tend to get a higher score than
> shorter documents, even though the shorter documents are supposed to be
> a better match (more similar) to the query. Is it because words like
> "and", "the", etc. increase the score of the longer documents?
> 
> Is there any way to configure this so that shorter documents get a
> higher score when they are a better match, or will simply indexing more
> content solve this problem?
> 
> 
> Regards,
> Edwin
> 
> 
> 
> On 14 July 2015 at 15:40, Upayavira <u...@odoko.co.uk> wrote:
> 
> > Look at your "interesting terms". If your index is too small, it will
> > consider words like "and", "the", etc to be "interesting" and form a
> > part of the query, thus returning your entire index, which doesn't help.
> >
> > Effectively what MLT does is attempt to pick the 25 (configurable) best
> > terms in the source document and form a Lucene query based upon them.
> > It takes the frequency of the terms in your index and in the document
> > into account when scoring the terms (much like TF/IDF). For this to
> > really work, you need a reasonable amount of content.
> >
> > Upayavira
> >
> > On Tue, Jul 14, 2015, at 07:40 AM, Zheng Lin Edwin Yeo wrote:
> > > Hi,
> > >
> > > I'm using Solr 5.2.1 and I'm trying to implement the MoreLikeThis
> > > feature in Solr.
> > >
> > > But the results that I've been getting from MoreLikeThis have not been
> > > accurate so far. All the documents in the collection are returned in
> > > the "response" section, even though the documents are not similar to
> > > my query.
> > >
> > > For example, if I have 10 records in the collection, 1 will be under the
> > > "match" section, while the other 9 will be under the "response" section,
> > > even though only 1 or 2 are related to the one under the "match"
> > > section.
> > >
> > > Below is my configuration in solrconfig.xml:
> > >
> > > <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" >
> > >   <lst name="defaults">
> > >     <str name="echoParams">explicit</str>
> > >     <str name="wt">json</str>
> > >     <str name="indent">true</str>
> > >     <str name="defType">edismax</str>
> > >     <str name="fl">id, score</str>
> > >     <str name="mlt.qf">
> > >       Objective^20.0 Summary^10.0
> > >     </str>
> > >
> > >     <str name="df">Summary</str>
> > >     <str name="mlt.fl">Objective,Summary</str>
> > >     <str name="mlt.mintf">2</str>
> > >     <str name="mlt.mindf">5</str>
> > >     <str name="mlt.maxqt">10</str>
> > >     <str name="mlt.count">10</str>
> > >     <str name="mlt.boost">true</str>
> > >     <str name="mlt.interestingTerms">details</str>
> > >   </lst>
> > > </requestHandler>
> > >
> > >
> > > Regards,
> > > Edwin
> >
