Hi guys, I was just investigating how to include numeric fields in the MLT calculations.
As we know, MLT currently builds a smart Lucene query from the input document (the one we want to find similar documents for) and runs that query to retrieve the similar docs. Because MLT is currently based on TF/IDF, it is mainly designed for textual fields.

What if we want to include a numeric factor in the similarity calculation? For example:

Solr Document (Hotel)
mlt.fl=description,stars,trip_advisor_rating

The goal is to find similarity based not only on the description, but also on the numeric fields (stars and rating).

My first thought is to add support for boosting functions. That way we are more flexible and can add as many functions as we want. For example, adding:

bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))

Other kinds of functions could be applied as well.

What do you think? Do you have any alternative ideas?

Cheers

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
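
P.S. To make the boosting-function idea a bit more concrete, here is a rough Python sketch of what the proposed function would compute. It is only an illustration under assumptions: the helper names (euclidean_boost, build_bf) are made up for this example, the seed values are invented, and bf support in MLT is the thing being proposed, not something that exists today. Returning infinity when the distance is zero is also just a choice made for the sketch; a real implementation would have to decide how to handle a candidate whose numeric values are identical to the seed's.

```python
import math


def euclidean_boost(seed, candidate):
    """Mirror div(1, dist(2, seed..., candidate...)): 1 / Euclidean distance.

    Assumption for this sketch: an exact numeric match (distance 0)
    yields an infinite boost.
    """
    distance = math.dist(seed, candidate)
    return math.inf if distance == 0 else 1.0 / distance


def build_bf(seed_values, field_names):
    """Build the function string: seed constants first, then field names.

    Hypothetical helper; the argument order follows the example in the mail.
    """
    args = ",".join([str(v) for v in seed_values] + list(field_names))
    return f"div(1,dist(2,{args}))"


# Seed hotel: 4 stars, rating 4.5 (invented values).
seed = (4, 4.5)
fields = ("stars", "trip_advisor_rating")

print(build_bf(seed, fields))
print(euclidean_boost(seed, (4, 3.5)))   # candidate one rating point away
print(euclidean_boost(seed, (2, 2.5)))   # candidate further away, smaller boost
```

The closer a candidate's stars and rating are to the seed document's, the larger the boost, so numerically similar hotels would be pushed up among the textually similar ones.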