I took the plunge and rendered a few plots in R showing how the
parameters of streaming-k-means evolve.
Here's the link [1].
[1] https://github.com/dfilimon/knn/wiki/skm-visualization
On Thu, Dec 6, 2012 at 2:01 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Still not that odd if several clusters
Yeah... very useful. Clearly the adaptive limit on the number of surrogate
points is much too restrictive.
On Fri, Dec 7, 2012 at 1:21 AM, Dan Filimon dangeorge.fili...@gmail.com wrote:
I took the plunge and rendered a few plots in R showing how the
parameters of streaming-k-means evolve.
Here's
Okay, please disregard the previous e-mail.
That hypothesis is toast; clustering works just fine with ball k-means.
So, the problem lies in streaming k-means somewhere.
On Thu, Dec 6, 2012 at 12:06 AM, Dan Filimon
dangeorge.fili...@gmail.com wrote:
Hi,
One of the most basic tests for
How many clusters are you talking about?
If you pick a modest number then streaming k-means should work well if it
has several times more surrogate points than there are clusters.
Also, typically a hyper-cube test works with very small cluster radius.
Try 0.1 or 0.01. Otherwise, your clusters
I wanted there to be 2^d clusters. I was wrong and didn't check: the
radius is in fact 0.01.
What's happening is that for 10 dimensions, I was expecting ~1024
clusters (or at least small distances), but StreamingKMeans fails
on both counts.
BallKMeans does in fact get the clusters.
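
For reference, the hypercube test described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `hypercube_samples`, the Gaussian noise model, and the per-vertex sample count are made up here and are not the actual Mahout test code.

```python
import itertools
import random


def hypercube_samples(d, per_vertex, radius, seed=0):
    """Sample points around every vertex of the unit hypercube in d dimensions.

    Assumption: "radius" is used as the standard deviation of isotropic
    Gaussian noise around each vertex; the real test may define it differently.
    """
    rng = random.Random(seed)
    vertices = list(itertools.product((0.0, 1.0), repeat=d))  # 2^d vertices
    points = []
    for vertex in vertices:
        for _ in range(per_vertex):
            points.append(tuple(rng.gauss(x, radius) for x in vertex))
    return vertices, points


vertices, points = hypercube_samples(d=10, per_vertex=5, radius=0.01)
print(len(vertices), len(points))  # 1024 5120
```

Neighbouring vertices are a distance of 1 apart, so with a radius of 0.01 a clustering that works should end up with roughly one centroid per vertex.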
So, yes,
In order to succeed here, SKM will need to have maxClusters set to 20,000
or so.
The maximum distance between clusters on a 10d hypercube is sqrt(10) = 3.1
or so. If three clusters get smashed together, then you have a threshold
of 1.4 or so.
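
The distances in this message are easy to check. Below is a small sketch, assuming the clusters sit on the vertices of the unit hypercube; the variable names are just for illustration. The 1.4 figure is close to sqrt(2), the distance between vertices that differ in two coordinates, although the message does not say how it was derived.

```python
import math

d = 10

# Two vertices of the unit hypercube that differ in k coordinates are
# sqrt(k) apart, so the possible inter-vertex distances are:
distances = [math.sqrt(k) for k in range(1, d + 1)]

num_vertices = 2 ** d
print(num_vertices)             # 1024
print(round(distances[0], 3))   # 1.0   (adjacent vertices)
print(round(distances[1], 3))   # 1.414 (vertices differing in two coordinates)
print(round(distances[-1], 3))  # 3.162 (opposite corners)
```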
On Thu, Dec 6, 2012 at 12:22 AM, Dan Filimon
Ahh... this may also be a problem.
You should get better results with a Brute searcher here, but a
ProjectionSearcher with lots of projections may work well.
On Thu, Dec 6, 2012 at 12:22 AM, Dan Filimon dangeorge.fili...@gmail.com wrote:
So, yes, it's probably a bug of some kind since I end up
But the weight referred to is the distance between a centroid and the
mean of a distribution (a cube vertex).
This should still be very small (also BallKMeans gets it right).
On Thu, Dec 6, 2012 at 1:32 AM, Ted Dunning ted.dunn...@gmail.com wrote:
In order to succeed here, SKM will need to have
Still not that odd if several clusters are getting squashed. This can
happen if the threshold grows too large or if the searcher is unable to
resolve the cube properly. By its nature, the cube is hard to reduce to a
smaller dimension.
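
A toy illustration of why the cube resists dimension reduction, assuming a single one-dimensional projection onto a random unit direction. This is not Mahout's projection search, just a sketch with made-up variable names: all 1024 vertices of the 10d cube land in an interval no longer than sqrt(10), so on average they sit closer together than the 0.01 cluster radius.

```python
import itertools
import math
import random

d = 10
rng = random.Random(42)

# A random unit direction (hypothetical stand-in for one projection).
raw = [rng.gauss(0.0, 1.0) for _ in range(d)]
norm = math.sqrt(sum(x * x for x in raw))
direction = [x / norm for x in raw]

# Project all 2^d hypercube vertices onto that single direction.
projections = sorted(
    sum(u * v for u, v in zip(direction, vertex))
    for vertex in itertools.product((0.0, 1.0), repeat=d)
)

span = projections[-1] - projections[0]
mean_gap = span / (len(projections) - 1)

# By Cauchy-Schwarz the span is at most sqrt(d), whatever the direction.
print(span <= math.sqrt(d) + 1e-9)  # True
print(mean_gap < 0.01)              # True
```

A single projection therefore cannot keep the vertices apart, which is consistent with the advice to use a brute-force searcher or many projections here.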
On Thu, Dec 6, 2012 at 12:36 AM, Dan Filimon