Re: Clustering points in a unit hypercube

2012-12-06 Thread Dan Filimon
I took the plunge and rendered a few plots in R showing how the parameters of streaming k-means evolve. Here's the link [1]. [1] https://github.com/dfilimon/knn/wiki/skm-visualization On Thu, Dec 6, 2012 at 2:01 AM, Ted Dunning ted.dunn...@gmail.com wrote: Still not that odd if several clusters

Re: Clustering points in a unit hypercube

2012-12-06 Thread Ted Dunning
Yeah... very useful. Clearly the adaptive limit on the number of surrogate points is much too restrictive. On Fri, Dec 7, 2012 at 1:21 AM, Dan Filimon dangeorge.fili...@gmail.com wrote: I took the plunge and rendered a few plots in R showing how the parameters of streaming k-means evolve. Here's

Re: Clustering points in a unit hypercube

2012-12-05 Thread Dan Filimon
Okay, please disregard the previous e-mail. That hypothesis is toast; clustering works just fine with ball k-means. So, the problem lies in streaming k-means somewhere. On Thu, Dec 6, 2012 at 12:06 AM, Dan Filimon dangeorge.fili...@gmail.com wrote: Hi, One of the most basic tests for

Re: Clustering points in a unit hypercube

2012-12-05 Thread Ted Dunning
How many clusters are you talking about? If you pick a modest number then streaming k-means should work well if it has several times more surrogate points than there are clusters. Also, typically a hyper-cube test works with very small cluster radius. Try 0.1 or 0.01. Otherwise, your clusters

Re: Clustering points in a unit hypercube

2012-12-05 Thread Dan Filimon
I wanted there to be 2^d clusters. I was wrong and didn't check: the radius is in fact 0.01. What's happening is that for 10 dimensions, I was expecting ~1024 clusters (or at least small distances), but StreamingKMeans fails on both counts. BallKMeans does in fact get the clusters. So, yes,
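For readers who want to reproduce the test data described here, the following is a minimal sketch in Python (not the Mahout test code). It assumes that "radius" means the standard deviation of Gaussian noise added around each vertex; the function names and the `points_per_vertex` parameter are hypothetical.

```python
import itertools
import random


def hypercube_vertices(d):
    """Return all 2**d vertices of the unit hypercube as tuples of 0.0/1.0."""
    return [tuple(float(b) for b in bits)
            for bits in itertools.product((0, 1), repeat=d)]


def sample_around_vertices(d, points_per_vertex, radius, seed=0):
    """Draw points around every vertex.

    Assumption: 'radius' is used as the standard deviation of independent
    Gaussian noise on each coordinate.
    """
    rng = random.Random(seed)
    data = []
    for vertex in hypercube_vertices(d):
        for _ in range(points_per_vertex):
            data.append(tuple(v + rng.gauss(0.0, radius) for v in vertex))
    return data


# 10 dimensions gives 2**10 = 1024 vertices, one expected cluster per vertex.
points = sample_around_vertices(d=10, points_per_vertex=5, radius=0.01)
```

With a radius of 0.01 and neighbouring vertices at distance 1, the clusters are far apart relative to their spread, which is why a correct clustering is expected to recover all 1024 of them.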

Re: Clustering points in a unit hypercube

2012-12-05 Thread Ted Dunning
In order to succeed here, SKM will need to have maxClusters set to 20,000 or so. The maximum distance between clusters on a 10d hypercube is sqrt(10) ≈ 3.16. If three clusters get smashed together, then you have a threshold of 1.4 or so. On Thu, Dec 6, 2012 at 12:22 AM, Dan Filimon
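The distance arithmetic above can be checked with a short Python sketch. It relies only on the fact that two vertices of the unit hypercube that differ in k coordinates are sqrt(k) apart; the helper name is hypothetical.

```python
import math


def vertex_distance(k):
    """Euclidean distance between two unit-hypercube vertices
    that differ in exactly k coordinates."""
    return math.sqrt(k)


d = 10
# Possible inter-vertex distances in d dimensions, keyed by k.
distances = {k: round(vertex_distance(k), 3) for k in range(1, d + 1)}
```

The smallest inter-vertex distance is 1 (k = 1) and the largest is the main diagonal, sqrt(10) (k = 10).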

Re: Clustering points in a unit hypercube

2012-12-05 Thread Ted Dunning
Ahh... this may also be a problem. You should get better results with a Brute searcher here, but a ProjectionSearcher with lots of projections may work well. On Thu, Dec 6, 2012 at 12:22 AM, Dan Filimon dangeorge.fili...@gmail.com wrote: So, yes, it's probably a bug of some kind since I end up

Re: Clustering points in a unit hypercube

2012-12-05 Thread Dan Filimon
But the weight referred to is the distance between a centroid and the mean of a distribution (a cube vertex). This should still be very small (also BallKMeans gets it right). On Thu, Dec 6, 2012 at 1:32 AM, Ted Dunning ted.dunn...@gmail.com wrote: In order to succeed here, SKM will need to have

Re: Clustering points in a unit hypercube

2012-12-05 Thread Ted Dunning
Still not that odd if several clusters are getting squashed. This can happen if the threshold increases too high or if the searcher is unable to resolve the cube properly. By its nature, the cube is hard to reduce to a smaller dimension. On Thu, Dec 6, 2012 at 12:36 AM, Dan Filimon