Note that the dot product in the vector space world is heavily associated with the
concept of the correlation coefficient in statistics:
"A correlation coefficient is a number between -1 and 1 which measures the degree to
which two variables are linearly related. If there is perfect linear relationshi
A colleague of mine just remarked that the indexing problem for
geographical retrieval is a solved problem. One algorithm is specified
in this book, Machine Learning by Tom Mitchell:
http://www.amazon.com/exec/obidos/ASIN/0070428077/qid=1099244886/sr=2-1/ref=pd_ka_b_2_1/102-6518692-8636163
This
I for one would love to have this functionality, i.e. would use it
immediately if available and efficient. It seems the biggest problem is
how you are going to index the information. If you store and index the
latitude and longitude for a geographically-positioned document, and
then want to find
Hello,
I'm new here, so first of all I'd like to say hello to everyone.
So, hi there...
I just spent two days trying to get Lucene to handle "geographically
constricted" searches for our website. (Check out www.localharvest.org)
I got close, but no cigar. (It works, but it is very slow.)
We need
Good point on the irrelevance of the non-query-hyperspace document
directions to the hyperplane distance. These other coordinates do
affect the angle to the query vector, but not the distance to the
query-orthogonal hyperplane.
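
A minimal sketch of that point, in plain Python rather than Lucene code. The vectors and the helper names hyperplane_distance and cosine are made up for illustration, and the hyperplane is assumed to pass through the origin. Two documents agree on the query's dimensions and differ only in dimensions the query does not use: the distance to the query-orthogonal hyperplane stays the same, while the cosine of the angle to the query drops.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def hyperplane_distance(doc, query):
    """Distance from doc to the hyperplane through the origin orthogonal to query."""
    return abs(dot(doc, query)) / norm(query)

def cosine(doc, query):
    """Cosine of the angle between doc and query."""
    return dot(doc, query) / (norm(doc) * norm(query))

# Hypothetical term weights; the query only uses the first two dimensions.
query = [1.0, 2.0, 0.0, 0.0]
doc_a = [3.0, 1.0, 0.0, 0.0]
doc_b = [3.0, 1.0, 4.0, 5.0]  # same as doc_a on the query dimensions

print(hyperplane_distance(doc_a, query), hyperplane_distance(doc_b, query))
print(cosine(doc_a, query), cosine(doc_b, query))
```

The first line prints the same distance twice; the second shows a smaller cosine for doc_b, because its extra coordinates lengthen the document vector without changing its dot product with the query.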
My problem with the units actually arose from the tf's and especially
Exactly, which is what I've proposed. As pointed out in my analysis,
the boost-weighted normalization will not change the order of the
results currently computed, just the magnitudes of the final scores.
Chuck
> -----Original Message-----
> From: Christoph Goller [mailto:[EMAIL PROTECTED]
Chuck Williams wrote:
Addendum: I forgot probably the most important point. The current
normalization in Hits changes the final score so that it is not the
distance to the query-orthogonal hyperplane. This normalization renders
the final score ambiguous, and more confused. It's ambiguous since the
normalization may
Chuck Williams wrote:
That's an interesting point that helps to better analyze the situation.
It seems to me the units are arbitrary and so the distance in this case
is not very meaningful. I don't believe Lucene actually uses the
document vector -- it uses the orthogonal projection of the document
vector into the hype
It seems that the attached jpeg got deleted somehow.
I looked at the scoring mechanism more closely again. Some of you may
remember that there was a discussion about this recently. There was
especially some argument about the theoretical justification of
the current scoring algorithm. Chuck proposed that at least from
a theoretical perspective it wou