RE: About Hit Scoring

2004-10-31 Thread Joaquin Delgado
Note that the dot product in the vector space world is heavily associated with the concept of the correlation coefficient in statistics: "A correlation coefficient is a number between -1 and 1 which measures the degree to which two variables are linearly related. If there is perfect linear relationshi
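
To make the connection concrete, here is a minimal sketch in plain Python (not Lucene code; the function names are made up for illustration). It relies on the standard identity that the Pearson correlation coefficient of two variables equals the cosine of the angle between their mean-centered vectors, and the cosine is just a dot product divided by the two vector lengths.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between u and v (dot product over the two norms)."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def pearson(u, v):
    """Pearson correlation: the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    return cosine(centered_u, centered_v)

if __name__ == "__main__":
    print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly linear, increasing
    print(pearson([1, 2, 3], [3, 2, 1]))  # perfectly linear, decreasing
```

Without the centering step the cosine of non-negative term-weight vectors stays between 0 and 1, which is one reason the analogy with correlation is loose rather than exact.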

RE: GIS

2004-10-31 Thread Chuck Williams
A colleague of mine just remarked that the indexing problem for geographical retrieval is a solved problem. One algorithm is specified in this book, Machine Learning by Tom Mitchell: http://www.amazon.com/exec/obidos/ASIN/0070428077/qid=1099244886/sr=2-1/ref=pd_ka_b_2_1/102-6518692-8636163 This

RE: GIS

2004-10-31 Thread Chuck Williams
I for one would love to have this functionality, i.e. would use it immediately if available and efficient. It seems the biggest problem is how you are going to index the information. If you store and index the latitude and longitude for a geographically-positioned document, and then want to find
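
One common way to attack the latitude/longitude lookup, sketched here in plain Python rather than against any Lucene API, is to prefilter on a bounding box (two range checks, which an index can answer cheaply) and then confirm each candidate with an exact great-circle distance. Everything below is an assumption for illustration: the function names, the `(name, lat, lon)` tuple layout, and the restriction to search circles that neither contain a pole nor cross the 180-degree meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def bounding_box(lat, lon, radius_km):
    """Box (lat_lo, lat_hi, lon_lo, lon_hi) that contains the search circle.

    Assumes the circle contains no pole and does not cross the antimeridian.
    """
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

def within_radius(docs, lat, lon, radius_km):
    """Names of docs within radius_km; docs are (name, lat, lon) tuples."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounding_box(lat, lon, radius_km)
    hits = []
    for name, doc_lat, doc_lon in docs:
        # Cheap range test first; this is the part an index could serve.
        if lat_lo <= doc_lat <= lat_hi and lon_lo <= doc_lon <= lon_hi:
            # Exact test second, only for the survivors.
            if haversine_km(lat, lon, doc_lat, doc_lon) <= radius_km:
                hits.append(name)
    return hits

if __name__ == "__main__":
    docs = [("oakland", 37.8044, -122.2712), ("los_angeles", 34.0522, -118.2437)]
    print(within_radius(docs, 37.7749, -122.4194, 50))
```

In an index, the two range checks would become range queries over suitably encoded latitude and longitude fields, so the expensive distance formula only runs on the few documents that fall inside the box.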

GIS

2004-10-31 Thread Guillermo Payet
Hello, I'm new here, so first of all I'd like to say hello to everyone. So, hi there... I just spent two days trying to get Lucene to handle "geographically constricted" searches for our website. (Check out www.localharvest.org) I got close, but no cigar. (it works, but is very slow) We need

RE: About Hit Scoring

2004-10-31 Thread Chuck Williams
Good point on the irrelevance of the non-query-hyperspace document directions to the hyperplane distance. These other coordinates do affect the angle to the query vector, but not the distance to the query-orthogonal hyperplane. My problem with the units actually arose from the tf's and especially
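
A small numeric sketch of that first point, in plain Python with made-up vectors and hypothetical function names: the distance from a document vector d to the hyperplane orthogonal to the query q is d·q / |q|, so coordinates of d in directions where q is zero cannot change it, while they do lengthen d and therefore lower the cosine.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def hyperplane_distance(doc, query):
    """Distance from doc to the hyperplane through the origin orthogonal to query."""
    return dot(doc, query) / math.sqrt(dot(query, query))

def cosine(doc, query):
    """Cosine of the angle between doc and query."""
    return dot(doc, query) / math.sqrt(dot(doc, doc) * dot(query, query))

if __name__ == "__main__":
    query = [1.0, 1.0, 0.0]   # the query only uses the first two terms
    doc_a = [2.0, 1.0, 0.0]
    doc_b = [2.0, 1.0, 5.0]   # same as doc_a plus weight on a non-query term

    # Same distance to the query-orthogonal hyperplane...
    print(hyperplane_distance(doc_a, query), hyperplane_distance(doc_b, query))
    # ...but a different angle to the query vector.
    print(cosine(doc_a, query), cosine(doc_b, query))
```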

RE: About Hit Scoring

2004-10-31 Thread Chuck Williams
Exactly, which is what I've proposed. As pointed out in my analysis, the boost-weighted normalization will not change the order of the results currently computed, just the magnitudes of the final scores. Chuck
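
The order-preservation claim can be checked with a toy sketch. It assumes (this is an assumption for illustration, not a description of Lucene's code) that the normalization divides every hit's raw score by one positive constant that depends only on the query. The document ids, scores, and function names are made up.

```python
def normalize(scores, norm):
    """Divide every raw score by the same positive, query-level constant."""
    if norm <= 0:
        raise ValueError("normalization factor must be positive")
    return {doc_id: score / norm for doc_id, score in scores.items()}

def ranking(scores):
    """Document ids ordered from highest to lowest score."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

if __name__ == "__main__":
    raw = {"doc1": 4.2, "doc2": 0.7, "doc3": 2.9}
    scaled = normalize(raw, 3.5)
    print(ranking(raw))     # order by raw score
    print(ranking(scaled))  # same order, smaller magnitudes
```

Dividing by a positive constant is a monotonic transformation, so the ranking is untouched and only the magnitudes change; a factor that varied per document would not have this property.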

Re: About Hit Scoring

2004-10-31 Thread Christoph Goller
Chuck Williams wrote: Addendum: I forgot probably the most important point. The current normalization in Hits changes the final score so that it is not the distance to the query-orthogonal hyperplane. This normalization renders the final score ambiguous, and more confused. It's ambiguous sinc

Re: About Hit Scoring

2004-10-31 Thread Christoph Goller
Chuck Williams wrote: That's an interesting point that helps to better analyze the situation. It seems to me the units are arbitrary and so the distance in this case is not very meaningful. I don't believe Lucene actually uses the document vector -- it uses the orthogonal projection of the docum

RE: About Hit Scoring

2004-10-31 Thread Chuck Williams
Addendum: I forgot probably the most important point. The current normalization in Hits changes the final score so that it is not the distance to the query-orthogonal hyperplane. This normalization renders the final score ambiguous, and more confused. It's ambiguous since the normalization may

RE: About Hit Scoring

2004-10-31 Thread Chuck Williams
That's an interesting point that helps to better analyze the situation. It seems to me the units are arbitrary and so the distance in this case is not very meaningful. I don't believe Lucene actually uses the document vector -- it uses the orthogonal projection of the document vector into the hype

About Hit Scoring

2004-10-31 Thread Christoph Goller
It seems that the attached jpeg got deleted somehow.

About Hit Scoring

2004-10-31 Thread Christoph Goller
I looked at the scoring mechanism more closely again. Some of you may remember that there was a discussion about this recently. There was especially some argument about the theoretical justification of the current scoring algorithm. Chuck proposed that at least from a theoretical perspective it wou