I have tried it, and an unnamed large customer of ours has tried it with good results. That isn't much of a track record yet, but it is encouraging.
All of this use so far has been as part of k-nearest-neighbor work, so there isn't a comparison for pure clustering. Also, this work is all at 10-50 dimensions, so you will likely want to put a random orthogonal projection in front of this if you have very high dimension. Whether such a projection is required or desirable for text is another interesting question. My instinct is that it won't be necessary. A projection might change the speed, but I don't know whether it would be better or worse for very sparse inputs.

Sent from my iPhone

On May 14, 2012, at 10:22 PM, Jiaan Zeng <l.alle...@gmail.com> wrote:

> Hi all,
>
> Has anyone tried Dunning's large-scale k-means
> (https://github.com/tdunning/knn)? It looks pretty interesting.
>
> It looks like it does not have a working map-reduce version yet,
> although the doc states the implementation is straightforward. If
> anyone has tried that implementation, could you please share some
> performance numbers (e.g. size of data, running time, cluster
> quality)? I am curious how well this clustering algorithm does, since
> it is only an approximation of traditional k-means. Are there any
> error bounds?
>
> --
> Regards,
> Jiaan
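For anyone curious what "putting a random projection in front" might look like, here is a minimal, hypothetical Python sketch (not from the tdunning/knn repo). It uses a Gaussian random matrix, whose columns are only approximately orthogonal in high dimension, as a stand-in for a true random orthogonal projection; all names here are illustrative.

```python
# Hypothetical sketch: Johnson-Lindenstrauss-style random projection
# to reduce very high dimensional points before clustering.
# A Gaussian random matrix stands in for a random orthogonal one;
# in high dimension its columns are nearly orthogonal anyway.
import math
import random

def random_projection_matrix(d, k, seed=42):
    """Build a d x k matrix with entries drawn from N(0, 1/k).
    Projecting through it approximately preserves Euclidean distances."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0, 1) * scale for _ in range(k)] for _ in range(d)]

def project(x, matrix):
    """Map a d-dimensional point x down to k dimensions (x^T R)."""
    k = len(matrix[0])
    return [sum(x[i] * matrix[i][j] for i in range(len(x)))
            for j in range(k)]

# Example: reduce 1000-dimensional points to 50 dimensions,
# the range the streaming k-means work above was run at.
d, k = 1000, 50
R = random_projection_matrix(d, k)
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(d)]
y = project(x, R)  # y is a 50-dimensional point
```

For very sparse inputs (like text), a dense matrix like this densifies every point, which is part of why the speed trade-off mentioned above is unclear; sparse random projections exist for that case.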