Hi guys,
I was just investigating how to include numeric fields in the
MLT (MoreLikeThis) calculations.

As we know, we currently build a smart Lucene query based on the input
document (the one we want to find similar documents for) and run this query
to obtain the similar docs.
Because MLT is currently built on TF/IDF, it is mainly designed for
textual fields.
What if we want to include a numeric factor in the similarity calculation?
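
To make the TF/IDF point concrete, here is a rough Python sketch of how
"interesting terms" could be picked from a seed document. It is not Lucene's
actual implementation: the function name, the whitespace tokenizer and the
smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def interesting_terms(seed_text, corpus, max_terms=3):
    """Rank the seed document's terms by a simple TF-IDF score.

    Hypothetical helper: whitespace tokenization and a smoothed IDF,
    not the formula Lucene uses internally.
    """
    n_docs = len(corpus)

    # Document frequency: in how many corpus documents each term occurs.
    df = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()))

    # Term frequency inside the seed document.
    tf = Counter(seed_text.lower().split())

    scores = {}
    for term, freq in tf.items():
        idf = math.log((n_docs + 1) / (df.get(term, 0) + 1)) + 1.0
        scores[term] = freq * idf

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:max_terms]


corpus = [
    "quiet hotel near the beach",
    "hotel in the city centre",
    "hotel with a pool near the beach",
]
top_terms = interesting_terms("quiet hotel near the beach", corpus)
```

With this kind of scoring, a numeric value such as 4 is just another token
that either matches exactly or does not, so a 4-star hotel is no closer to a
5-star one than to a 1-star one.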

e.g.
Solr Document (Hotel)
mlt.fl=description,stars,trip_advisor_rating

The goal is to compute the similarity based not only on the description,
but also on the numeric fields (stars and rating).
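
As a small illustration, the request parameters for such a call could be
assembled like this. The seed document id and the helper name are made up for
the example; only mlt.fl and its value come from the example above.

```python
from urllib.parse import urlencode, parse_qs


def build_mlt_params(seed_id, fields):
    """Build the query string for a hypothetical MLT request.

    seed_id and this helper are assumptions for illustration only.
    """
    params = {
        "q": "id:" + seed_id,        # selects the seed document
        "mlt.fl": ",".join(fields),  # fields used for similarity
    }
    return urlencode(params)


query_string = build_mlt_params(
    "hotel-42",  # hypothetical document id
    ["description", "stars", "trip_advisor_rating"],
)
```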

My first thought is to add support for boost functions.
That way we are more flexible and can add as many functions as we want.

For example, adding:
bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))
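
To show what this boost would evaluate to, here is a minimal Python sketch of
the same arithmetic: the inverse of the Euclidean distance between the seed
document's numeric values and a candidate's. The function names and sample
values are invented for illustration. Note that the plain 1/distance form is
undefined when both documents have identical values, so the sketch also
includes a smoothed variant, 1/(1 + distance), which is my own assumption and
not part of the proposal above.

```python
import math


def euclidean_distance(seed, candidate):
    """Distance between two equal-length tuples of numeric field values."""
    return math.sqrt(sum((s - c) ** 2 for s, c in zip(seed, candidate)))


def inverse_distance_boost(seed, candidate):
    """Mirror of div(1, dist(2, ...)); infinite when the points coincide."""
    distance = euclidean_distance(seed, candidate)
    if distance == 0:
        return math.inf  # the raw formula would divide by zero here
    return 1.0 / distance


def smoothed_boost(seed, candidate):
    """Assumed alternative: 1 / (1 + distance), always in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(seed, candidate))


# Hypothetical values: (stars, trip_advisor_rating)
seed_hotel = (4, 4.5)
close_hotel = (4, 4.0)
far_hotel = (2, 3.0)

close_boost = inverse_distance_boost(seed_hotel, close_hotel)
far_boost = inverse_distance_boost(seed_hotel, far_hotel)
```

The closer hotel gets the larger boost. Since the fields may be on different
scales, some normalization would probably be needed before combining them.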

Other kinds of functions can be applied as well.
What do you think? Do you have any alternative ideas?

Cheers
-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
