Re: Can anybody explain the distance method in SquaredEuclideanDistanceMeasure?

Ted Dunning Fri, 04 Nov 2011 09:36:30 -0700

2011/11/4 Grant Ingersoll <[email protected]>

> > Hi All  I'm reading the code of SquaredEuclideanDistanceMeasure, the
> "distance(double centroidLengthSquare, Vector centroid, Vector v)" method
> confused me a lot, i don't know why we choose this expression
> "centroidLengthSquare - 2 * v.dot(centroid) + v.getLengthSquared()" to
> calculate the distance?
>
> Math is as Sebastian said, the purpose is that it can save some steps in
> the computation, for instance, see the KMeans implementation where you are
> often calculating the distance between some point and a centroid.  We have
> the centroidLengthSquare handy, so it saves a fair amount of work.



Grant is right, but he is soft-pedaling the savings.

c^2 is known ahead.  v is very, very sparse.  Thus c v is very sparse and
has the same sparsity pattern as v^2 (which can even be cached).   On the
other hand c-v is much less sparse.

The difference looks minor when you first write it down, but it can easily
be a factor of 1000 or more.

Re: Can anybody explain the distance method in SquaredEuclideanDistanceMeasure?

Reply via email to