Alessandro,

I'd suggest you review the code of the MoreLikeThisHandler. It is a
little knotty, but it would be worth your while understanding what is
going on there.

Basically, there are three phases:

phase #1: parse the source document into a list of terms (skipped if
term vectors are enabled and the source doc is in the index)
phase #2: calculate a score for each of these terms and select the n
highest scoring ones (default 25)
phase #3: build and execute a boolean query using these n terms

Phase #2 uses a TF/IDF-like approach to calculate the scores for those
"interesting terms".
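
To make the three phases concrete, here is a rough sketch in Python. It
is not the handler's actual code: the whitespace tokenizer, the smoothed
IDF formula, and the names interesting_terms, boolean_query, doc_freq
and num_docs are all assumptions made for illustration.

```python
import math
from collections import Counter


def interesting_terms(source_text, doc_freq, num_docs, max_terms=25):
    """Phases 1 and 2: tokenize the source text, score each term, keep the top ones."""
    # Phase 1: naive whitespace tokenization stands in for the field's analyzer.
    tf = Counter(source_text.lower().split())

    scored = []
    for term, freq in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term is unknown to the index, so it cannot match anything
        # Assumed smoothed IDF; the real handler may compute this differently.
        idf = 1.0 + math.log(num_docs / (df + 1.0))
        scored.append((freq * idf, term))

    # Phase 2: highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]


def boolean_query(field, terms):
    """Phase 3: join the selected terms as optional (OR) clauses on one field."""
    return " OR ".join(f"{field}:{term}" for term in terms)
```

Terms that are frequent in the source document but rare in the index
float to the top, which is why very common words rarely end up in the
generated query.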

Once you understand what MLT is doing, you will probably not find it so
hard to create your own version which is better suited to your own
use-case.

Of course, this would probably be better constructed as a QueryParser
rather than a request handler, but that's a detail.
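
As a starting point for the numeric side, here is a small sketch of a
distance-based boost in the spirit of the div(1,dist(2,...)) function
you proposed. The names numeric_boost and combined_score, the epsilon
term, and the additive combination with the text score are my
assumptions, not anything MLT does today. I added epsilon because a
plain 1/distance is undefined when the candidate has exactly the same
values as the seed document.

```python
import math


def numeric_boost(seed, candidate, epsilon=1.0):
    """Boost in (0, 1/epsilon] that grows as candidate values approach the seed's."""
    # Euclidean distance between the two documents' numeric field values.
    distance = math.sqrt(sum((s - c) ** 2 for s, c in zip(seed, candidate)))
    # epsilon keeps the result finite when the distance is zero.
    return 1.0 / (epsilon + distance)


def combined_score(text_score, seed, candidate, weight=1.0):
    """Add the weighted numeric boost to the text (TF/IDF-style) score."""
    return text_score + weight * numeric_boost(seed, candidate)
```

For the hotel example, seed and candidate would each be a pair such as
(stars, trip_advisor_rating). You would probably want to scale the
fields to comparable ranges first, otherwise the field with the wider
range dominates the distance.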

Upayavira

On Fri, Sep 25, 2015, at 11:08 AM, Alessandro Benedetti wrote:
> Hi guys,
> I was just investigating how to include numeric fields in the MLT
> calculations.
> 
> As we know, we currently build a smart Lucene query based on the input
> document (the one to find similar documents for) and run this query
> to obtain the similar docs.
> Because MLT is currently built on TF/IDF, it is mainly intended for
> textual fields.
> What if we want to include a numeric factor in the similarity
> calculation?
> 
> e.g.
> Solr Document ( Hotel)
> mlt.fl=description,stars,trip_advisor_rating
> 
> To find the similarity based not only on the description, but also on the
> numeric fields (stars and rating).
> 
> My first thought is to add support for boosting functions.
> This way we are more flexible and can add as many functions as we
> want.
> 
> For example, adding:
> bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))
> 
> Other kinds of functions could be applied as well.
> What do you think? Do you have any alternative ideas?
> 
> Cheers
> -- 
> --------------------------
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England
