Floating point precision is not an issue with any of these metrics, since the
counts you are dealing with are never large enough for the numerical error
(relative accuracy of roughly 10^-7 for float, 10^-16 for double) to outweigh
the statistical uncertainty (roughly sqrt(number of observations)).
A much larger problem …
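Ted's back-of-the-envelope comparison is easy to check numerically. This is a sketch I'm adding (not code from the thread), using stdlib only; the float32 epsilon is hard-coded as 2^-23:

```python
import math
import sys

FLOAT32_EPS = 2.0 ** -23              # relative accuracy of single precision (~1.2e-7)
FLOAT64_EPS = sys.float_info.epsilon  # relative accuracy of double precision (~2.2e-16)

for n in (1e3, 1e6, 1e9):
    stat = math.sqrt(n)       # Poisson-style statistical uncertainty of a count n
    err32 = n * FLOAT32_EPS   # absolute rounding error of storing n at single precision
    err64 = n * FLOAT64_EPS   # same at double precision
    print(f"n={n:.0e}  noise={stat:.1e}  float_err={err32:.1e}  double_err={err64:.1e}")
```

Rounding only overtakes the statistical noise when n exceeds 1/eps^2, which is about 10^14 even for float, far beyond any realistic count.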
One thing I found very irritating when using cosine, or any score in the range
[0, 1], is that sometimes two distinct items have very small distances when
you inspect them. I am always worried that float precision is not enough to
capture the small detail that makes the difference of accept or …
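That worry is easy to demonstrate. A sketch I'm adding (stdlib only): a true distance of ~1e-11 between two near-duplicate items survives in double precision but rounds to exactly zero in single precision.

```python
import struct

def to_float32(x):
    # round a Python float (double) to IEEE single precision, stdlib only
    return struct.unpack("f", struct.pack("f", x))[0]

true_distance = 1.25e-11     # e.g. 1 - cos for an angle of ~5e-6 radians

sim64 = 1.0 - true_distance  # double keeps the detail
sim32 = to_float32(sim64)    # single precision rounds the similarity to 1.0

print(1.0 - sim64)           # tiny but nonzero
print(1.0 - sim32)           # exactly 0.0: the two distinct items look identical
```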
As distance goes, I prefer either angle in the 0 to pi range or Euclidean
distance in the range 0 to 2. You are correct that it is weird that most
things are at distance pi/2 or 1, but that is the price of living on an
n-sphere.
For similarity, the only thing that really matters is that 0 is real…
On Sat, Dec 26, 2009 at 2:47 PM, Ted Dunning wrote:
> One minor additional point is that you might want to use (1-cos)/2 in order
> to get a result in [0,1].
>
For distance, yeah, this can be fine, but for vectors which can have negative
components, I don't like doing that with similarity (where …
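The two distances Jake prefers both fall out of the cosine directly for unit-length vectors: the angle arccos(cos) lies in [0, pi], and the Euclidean (chord) distance sqrt(2 - 2*cos) lies in [0, 2]. A small sketch I'm adding to illustrate:

```python
import math

def angle_distance(cos_sim):
    """Angle in [0, pi] from a cosine similarity in [-1, 1]."""
    return math.acos(max(-1.0, min(1.0, cos_sim)))  # clamp guards against rounding

def chord_distance(cos_sim):
    """Euclidean distance between unit vectors: sqrt(2 - 2*cos), in [0, 2]."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * cos_sim))

print(angle_distance(1.0),  chord_distance(1.0))   # 0.0 and 0.0: same direction
print(angle_distance(0.0),  chord_distance(0.0))   # pi/2 and sqrt(2): orthogonal
print(angle_distance(-1.0), chord_distance(-1.0))  # pi and 2.0: antiparallel
```

The "most things are at distance pi/2 or 1" observation shows up here too: random high-dimensional vectors are nearly orthogonal, so they pile up around the orthogonal values.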
One minor additional point is that you might want to use (1-cos)/2 in order
to get a result in [0,1].
On Sat, Dec 26, 2009 at 1:32 PM, Jake Mannix wrote:
> On Sat, Dec 26, 2009 at 12:18 PM, Ted Dunning wrote:
>
> > These are fine as distance measures. It is also common to use
> > sqrt(1-cos^2) which is more like an angle, but 1-cos is good enough for
> > almost anything. …
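Ted's (1-cos)/2 rescaling can be checked in a couple of lines; a sketch I'm adding with a hypothetical helper name:

```python
def scaled_cosine_distance(cos_sim):
    """Map cosine similarity in [-1, 1] onto a distance in [0, 1] via (1 - cos)/2."""
    return (1.0 - cos_sim) / 2.0

print(scaled_cosine_distance(1.0))   # 0.0: identical direction
print(scaled_cosine_distance(0.0))   # 0.5: orthogonal
print(scaled_cosine_distance(-1.0))  # 1.0: antiparallel
```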
On Sat, Dec 26, 2009 at 12:18 PM, Ted Dunning wrote:
> These are fine as distance measures. It is also common to use
> sqrt(1-cos^2)
> which is more like an angle, but 1-cos is good enough for almost anything.
>
> With normal text, btw, all of the coordinates are positive so the largest
> possible angle is pi/2 (cos = 0, sin = 1).
These are fine as distance measures. It is also common to use sqrt(1-cos^2)
which is more like an angle, but 1-cos is good enough for almost anything.
With normal text, btw, all of the coordinates are positive so the largest
possible angle is pi/2 (cos = 0, sin = 1).
On Sat, Dec 26, 2009 at 10:5…
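Both of Ted's points check out numerically: with nonnegative coordinates every dot product is >= 0, so the cosine never goes negative and the angle never exceeds pi/2; and sqrt(1-cos^2) is just sin(theta), which tracks the angle for small theta. A sketch I'm adding (the helper name is mine, not Mahout's):

```python
import math
import random

def cosine_similarity(a, b):
    # plain cosine similarity; a hypothetical helper, not the Mahout class
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Nonnegative coordinates (like term counts) force the dot product >= 0,
# so cos >= 0 and acos(cos) <= pi/2.
random.seed(0)
for _ in range(1000):
    a = [random.random() for _ in range(5)]
    b = [random.random() for _ in range(5)]
    cos = cosine_similarity(a, b)
    assert cos >= 0.0, "impossible with nonnegative coordinates"
    # sqrt(1 - cos^2) equals sin(theta)
    sin = math.sqrt(max(0.0, 1.0 - cos * cos))
    assert abs(sin - math.sin(math.acos(min(1.0, cos)))) < 1e-9
```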
The antiparallel concept doesn't come up in text data, where all the weights
are positive. Think about it: you really can't have a document where the word
apple occurs -3 times. But if you consider data which actually has negative
weights (I also haven't encountered any such), then the measure is subject to
interpretation …
Sorry, misfire! I've usually tried to maximize similarity, without ever
using abs, even on text. Antiparallel is dissimilar, no?
On Dec 26, 2009 11:12 AM, "Jake Mannix" wrote:
> I've never treated text any differently, and …
> > On Dec 26, 2009 10:54 AM, "Robin Anil" wrote:
> > I ran Cosine and …
I've never treated text any differently, and …
On Dec 26, 2009 10:54 AM, "Robin Anil" wrote:
> I ran Cosine and Tanimoto distance measures (d = 1 - similarity measure) on
> the following vector pairs:
> (-1, -1) and (3, 3) Cosine: 2.0
> Tanimoto: 1.2307692307692308
> (1, 1) and (3, 3) Cosine: 0.0
> Tanimoto: 0.5714285714285714 …
I ran Cosine and Tanimoto distance measures (d = 1 - similarity measure) on
the following vector pairs:
(-1, -1) and (3, 3) Cosine: 2.0
Tanimoto: 1.2307692307692308
(1, 1) and (3, 3) Cosine: 0.0
Tanimoto: 0.5714285714285714
(1, 8) and (8, 1) Cosine: 0.7538461538461538
Tanimoto: 0.8596491228070…
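Robin's numbers can be reproduced with the usual formulas. A sketch I'm adding, assuming Tanimoto (extended Jaccard) similarity dot/(|a|^2 + |b|^2 - dot) and distance d = 1 - similarity in both cases; the helper names are mine, not the Mahout classes:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def tanimoto_distance(a, b):
    # Tanimoto (extended Jaccard) similarity: dot / (|a|^2 + |b|^2 - dot)
    dot = sum(x * y for x, y in zip(a, b))
    na2 = sum(x * x for x in a)
    nb2 = sum(x * x for x in b)
    return 1.0 - dot / (na2 + nb2 - dot)

for a, b in [((-1, -1), (3, 3)), ((1, 1), (3, 3)), ((1, 8), (8, 1))]:
    print(a, b, "Cosine:", cosine_distance(a, b),
          "Tanimoto:", tanimoto_distance(a, b))
```

Working one pair by hand: for (-1, -1) and (3, 3), dot = -6, |a|^2 = 2, |b|^2 = 18, so Tanimoto similarity = -6/26 and distance = 1 + 6/26 = 1.2307692307692308, matching the output above.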