So let me answer point by point:
1) Similarity is misleading here if you interpret it as a probabilistic
measure.
Given a query, there is no such thing as the "Ideal Document". With both
TF-IDF and BM25 (which solves the problem better) you are scoring the
document. The higher the score, the higher the relevance
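
To make the point about scores concrete, here is a minimal sketch of a textbook BM25 scorer in Python. It is not Lucene's actual implementation; the toy corpus, the function name bm25_scores and the parameter values k1=1.2, b=0.75 are assumptions chosen for illustration. It only shows that the score is an unbounded relevance value, not a probability or a percentage.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against query_terms with a textbook BM25 formula."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many docs each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by doc length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "solr more like this handler returns similar documents".split(),
    "solr handler configuration".split(),
    "cooking pasta at home".split(),
]
query = docs[0]  # use the first document's own terms as the query
scores = bm25_scores(query, docs)
for doc, score in zip(docs, scores):
    print(round(score, 3), " ".join(doc))
```

In this sketch the first document scores well above 1 against its own terms, so the value cannot be read as a probability, and the ratio between two scores depends on idf and length normalization rather than on any fixed notion of "percent similar".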
Thanks for the reply, Alessandro.
Can you please elaborate on the point "a document which has a score 50% of
the original doc score, it doesn't mean it is 50% similar"? I did not
understand this for two reasons:
1. In the end, we are calculating a similarity score between documents when
we are solv
Hi,
I have been personally working a lot with MoreLikeThis and I am close to
contributing a refactor of that module (mostly to break up the monolithic
giant facade class).
First of all, the MoreLikeThis handler will return the original document
(not scored) + the similar documents (scored).
The
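
As a rough illustration of the handler behavior described in the message above, the following Python sketch only builds a request URL for a MoreLikeThis handler; it does not contact a server. It assumes the handler is registered at /mlt in solrconfig.xml and that the unique key field is called id. The host, core name, document id, field names and the helper build_mlt_url are hypothetical.

```python
from urllib.parse import urlencode

def build_mlt_url(base_url, doc_id, fields, rows=5):
    """Build a MoreLikeThis handler URL (assumes the handler lives at /mlt)."""
    params = {
        "q": f"id:{doc_id}",            # selects the original (source) document
        "mlt.fl": ",".join(fields),     # fields used to extract interesting terms
        "mlt.match.include": "true",    # also return the original document
        "fl": "id,score",               # ask for the score of the similar documents
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/mlt?{urlencode(params)}"

# Hypothetical host, core and document id.
url = build_mlt_url("http://localhost:8983/solr/mycore", "doc42", ["title", "body"])
print(url)
```

With a request shaped like this, the original document would be expected in its own section of the response, separate from the scored list of similar documents, which is why it should not be expected to show up as the top scored hit.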
Hi,
I am using MoreLikeThis handler to get related documents for a given
document. To determine if I am getting good results or not, here is what I
do:
The same original document should be returned as a top match.
If it is not, then there is some problem with the relevancy.
Then, as same input