There are two ways to "tweak" MLT: use the parameters (such as minimum term frequency), or use stop words when indexing.
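
For the stop-word route, here is a minimal sketch of a schema field type that drops stop words at index time. The type name text_mlt and the stopwords.txt file are assumptions for illustration; Objective and Summary are the fields from the configuration quoted further down, and they would need to be reindexed after the change.

```xml
<!-- Hypothetical field type: "text_mlt" is an illustrative name -->
<fieldType name="text_mlt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Removes the words listed in stopwords.txt (e.g. "and", "the") -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed field definitions for the fields listed in mlt.fl -->
<field name="Objective" type="text_mlt" indexed="true" stored="true"/>
<field name="Summary" type="text_mlt" indexed="true" stored="true"/>
```

With stop words gone from the index, they can no longer show up among the "interesting terms" that MLT picks.
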
I'd suggest you try those as a means to improve quality!

Upayavira

On Tue, Jul 14, 2015, at 09:28 AM, Zheng Lin Edwin Yeo wrote:
> Thanks for your advice. I've indexed more content in and it's working
> better now. Not all the index will be returned everytime now.
>
> However, I found that the longer documents will tend to have a higher
> score than those shorter documents, even though the shorter documents is
> suppose to have a better match (more similar) to the query than the
> longer documents. Is it because of words like "and", "the", etc that
> causes the score of the longer documents to increase?
>
> Is there anyway to configure this so that I can get the shorter documents
> to have a higher score if they are of better match, or is it just more
> indexes will solve this problem?
>
>
> Regards,
> Edwin
>
>
> On 14 July 2015 at 15:40, Upayavira <u...@odoko.co.uk> wrote:
>
> > Look at your "interesting terms". If your index is too small, it will
> > consider words like "and", "the", etc to be "interesting" and form a
> > part of the query, thus returning your entire index, which doesn't help.
> >
> > Effectively what MLT does is attempt to pick the 25 (configurable) best
> > terms in the source document and forms a Lucene query based upon them.
> > It takes the frequency of the terms in your index and in the document
> > into account when scoring the terms (much like TF/IDF). For this to
> > really work, you need a reasonable amount of content.
> >
> > Upayavira
> >
> > On Tue, Jul 14, 2015, at 07:40 AM, Zheng Lin Edwin Yeo wrote:
> > > Hi,
> > >
> > > I'm using Solr 5.2.1 and I'm trying to implement MoreLikeThis feature
> > > in Solr.
> > >
> > > But the results that I've been getting for the MoreLikeThis has not
> > > been accurate so far. I've been getting the entire documents in the
> > > collection returned in the "response" section even though the
> > > documents has no similar match to my query.
> > >
> > > For example, if I have 10 records in the collections, 1 will be under
> > > the "match" section, while the other 9 will be under the "response"
> > > section, even though there's only 1 or 2 that's related to the one
> > > under the "match" section.
> > >
> > > Below is my configuration in solrconfig.xml:
> > >
> > > <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
> > >   <lst name="defaults">
> > >     <str name="echoParams">explicit</str>
> > >     <str name="wt">json</str>
> > >     <str name="indent">true</str>
> > >     <str name="defType">edismax</str>
> > >     <str name="fl">id, score</str>
> > >     <str name="mlt.qf">
> > >       Objective^20.0 Summary^10.0
> > >     </str>
> > >
> > >     <str name="df">Summary</str>
> > >     <str name="mlt.fl">Objective,Summary</str>
> > >     <str name="mlt.mintf">2</str>
> > >     <str name="mlt.mindf">5</str>
> > >     <str name="mlt.maxqt">10</str>
> > >     <str name="mlt.count">10</str>
> > >     <str name="mlt.boost">true</str>
> > >     <str name="mlt.interestingTerms">details</str>
> > >   </lst>
> > > </requestHandler>
> > >
> > >
> > > Regards,
> > > Edwin
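
For the parameter route, one thing that may be worth checking in the configuration quoted above: mlt.mindf is 5, so in a collection of only 10 documents a term has to occur in half of them before it is considered at all, which tends to favour words like "and" and "the". Below is a sketch of adjusted defaults for the same handler. The values are assumptions to experiment with, not recommendations, and mlt.minwl is a parameter that the original configuration does not set.

```xml
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="fl">id, score</str>
    <str name="mlt.fl">Objective,Summary</str>
    <str name="mlt.qf">Objective^20.0 Summary^10.0</str>
    <!-- Term must occur at least twice in the source document -->
    <str name="mlt.mintf">2</str>
    <!-- Assumed value: lowered from 5 so rarer terms can qualify in a small index -->
    <str name="mlt.mindf">2</str>
    <!-- Assumed value: ignore words shorter than 4 characters ("and", "the") -->
    <str name="mlt.minwl">4</str>
    <str name="mlt.maxqt">10</str>
    <str name="mlt.count">10</str>
    <str name="mlt.boost">true</str>
    <!-- Keep this on while tuning to see which terms MLT actually picks -->
    <str name="mlt.interestingTerms">details</str>
  </lst>
</requestHandler>
```

After each change, the "interestingTerms" section of the response shows whether the chosen terms now look meaningful.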