RE: About Hit Scoring

2004-10-31 Joaquin Delgado
Note that the dot product in the vector space world is closely associated with the concept of the correlation coefficient in statistics: "A correlation coefficient is a number between -1 and 1 which measures the degree to which two variables are linearly related. If there is perfect linear relationship
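The link can be made concrete. Pearson's correlation coefficient is the cosine (the dot product of the unit-length vectors) of the two variables after each has been mean-centered. The following is a minimal sketch in plain Python; the helper names are illustrative and do not come from Lucene or any other library.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """Dot product of the unit-length versions of u and v."""
    return dot(u, v) / (norm(u) * norm(v))

def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centered vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # perfect positive linear relationship with x
z = [8.0, 6.0, 4.0, 2.0]   # perfect negative linear relationship with x

print(round(pearson(x, y), 6))  # 1.0
print(round(pearson(x, z), 6))  # -1.0
print(round(cosine(x, z), 3))   # 0.667 (uncentered cosine stays positive)
```

The last line shows where the analogy stops. Without centering, the raw cosine of `x` and `z` is still positive even though the variables are perfectly negatively correlated.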

RE: About Hit Scoring

2004-10-31 Chuck Williams
Christoph Goller [mailto:[EMAIL PROTECTED] > Sent: Sunday, October 31, 2004 8:55 AM > To: Lucene Developers List > Subject: Re: About Hit Scoring > > Chuck Williams wrote: > > That's an interesting point that helps to better analyze the situation. >

RE: About Hit Scoring

2004-10-31 Chuck Williams
[EMAIL PROTECTED] > Sent: Sunday, October 31, 2004 9:02 AM > To: Lucene Developers List > Subject: Re: About Hit Scoring > > Chuck Williams wrote: > > Addendum: I forgot what is probably the most important point. The current > > normalization in Hits changes

Re: About Hit Scoring

2004-10-31 Christoph Goller
Chuck Williams wrote: Addendum: I forgot what is probably the most important point. The current normalization in Hits changes the final score so that it is no longer the distance to the query-orthogonal hyperplane. This normalization renders the final score ambiguous and more confusing. It's ambiguous since
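The complaint is that rescaling by the top hit throws away the geometric meaning of the raw score. The sketch below assumes the normalization works like the one in Lucene 1.x `Hits`: if the best raw score exceeds 1.0, every score is divided by it, and otherwise the scores are left alone. The function `normalize_like_hits` is hypothetical and is not Lucene's API. It shows that different raw scores can map to the same normalized score depending on the rest of the result set, which is the ambiguity described above.

```python
def normalize_like_hits(raw_scores):
    """Hypothetical re-implementation (assumption, not Lucene code):
    divide every score by the top score, but only when the top score
    exceeds 1.0; otherwise return the scores unchanged."""
    top = max(raw_scores)
    scale = 1.0 / top if top > 1.0 else 1.0
    return [s * scale for s in raw_scores]

# Raw scores for three queries, e.g. distances to the query-orthogonal
# hyperplane. The values are made up for illustration.
query_a = [4.0, 2.0, 1.0]
query_b = [2.0, 1.0, 0.5]
query_c = [0.8, 0.4, 0.2]

print(normalize_like_hits(query_a))  # [1.0, 0.5, 0.25]
print(normalize_like_hits(query_b))  # [1.0, 0.5, 0.25]
print(normalize_like_hits(query_c))  # [0.8, 0.4, 0.2]
```

Queries A and B become indistinguishable after normalization even though their raw scores differ by a factor of two. Query C, whose best raw score is the weakest of the three, keeps its raw values because its top score never exceeds 1.0.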

Re: About Hit Scoring

2004-10-31 Christoph Goller
Chuck Williams wrote: That's an interesting point that helps to better analyze the situation. It seems to me the units are arbitrary and so the distance in this case is not very meaningful. I don't believe Lucene actually uses the document vector -- it uses the orthogonal projection of the document

RE: About Hit Scoring

2004-10-31 Chuck Williams
Chuck Williams [mailto:[EMAIL PROTECTED] > Sent: Sunday, October 31, 2004 8:13 AM > To: Lucene Developers List > Subject: RE: About Hit Scoring > > That's an interesting point that helps to better analyze the situation. > It seems to me the units are arbitrary and so the

RE: About Hit Scoring

2004-10-31 Chuck Williams
That's an interesting point that helps to better analyze the situation. It seems to me the units are arbitrary and so the distance in this case is not very meaningful. I don't believe Lucene actually uses the document vector -- it uses the orthogonal projection of the document vector into the hype
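The geometry in this exchange can be checked numerically. The sketch below assumes plain term-weight vectors with no Lucene-specific weighting. It reads the projection as an orthogonal projection of the document onto the coordinate subspace of the query's terms; that reading is an assumption, since the sentence above is cut off. The vectors and function names are hypothetical. The sketch shows three things. The projection leaves the dot product with the query unchanged. The signed distance to the query-orthogonal hyperplane {x : x·q = 0} is d·q/|q|. Only the cosine changes, because the projection shortens the document vector.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def project_onto_query_terms(d, q):
    """Hypothetical helper: orthogonal projection of d onto the coordinate
    subspace of the terms that occur in q (dimensions where q != 0)."""
    return [di if qi != 0 else 0.0 for di, qi in zip(d, q)]

def distance_to_query_orthogonal_hyperplane(d, q):
    """Signed distance from d to the hyperplane {x : x . q = 0}; this also
    equals the length of d's projection onto the query direction."""
    return dot(d, q) / norm(q)

# Hypothetical 4-term vocabulary; the query uses only the first two terms.
q = [1.0, 1.0, 0.0, 0.0]
d = [3.0, 1.0, 2.0, 2.0]
p = project_onto_query_terms(d, q)

print(p)                                  # [3.0, 1.0, 0.0, 0.0]
print(dot(d, q), dot(p, q))               # 4.0 4.0 (dot product unchanged)
print(round(distance_to_query_orthogonal_hyperplane(d, q), 4))  # 2.8284
print(round(dot(d, q) / (norm(d) * norm(q)), 4))  # 0.6667 (cosine, full d)
print(round(dot(p, q) / (norm(p) * norm(q)), 4))  # 0.8944 (cosine, projected)
```

The distance to the hyperplane depends only on d·q and |q|, so it is the same for `d` and `p`. The cosine, which divides by the document length as well, is not.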