I have tried it, and an unnamed large customer of ours has tried it with good results. That isn't much of a track record yet, but it is encouraging.
All of this use so far has been as part of k-nearest-neighbor work, so there isn't a comparison for pure clustering. Also, this work is all at 10-50 dimensions, so you will likely want to put a random orthogonal projection in front of this if you have very high dimension. Whether such a projection is required or desirable for text is another interesting question. My instinct is that it won't be necessary. A projection might change the speed, but I don't know whether it would be better or worse for very sparse inputs.

Sent from my iPhone

On May 14, 2012, at 10:22 PM, Jiaan Zeng <l.alle...@gmail.com> wrote:

> Hi all,
>
> Has anyone tried Dunning's large-scale k-means
> (https://github.com/tdunning/knn)? It looks pretty interesting.
>
> It looks like it does not have a working map-reduce version yet,
> although the doc states the implementation is straightforward. If
> anyone has tried that implementation, could you please share some
> performance numbers (e.g. size of data, running time, cluster
> quality)? I am curious how well this clustering algorithm does, since
> it is only an approximation of traditional k-means. Are there any
> error bounds?
>
> --
> Regards,
> Jiaan
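For anyone curious what "putting a random projection in front" might look like, here is a minimal, hypothetical Python sketch (not from the tdunning/knn repo). It uses a Gaussian random matrix, whose columns are only approximately orthogonal in high dimension, as a stand-in for a true random orthogonal projection; all names here are illustrative.

```python
# Hypothetical sketch: Johnson-Lindenstrauss-style random projection
# to reduce very high dimensional points before clustering.
# A Gaussian random matrix stands in for a random orthogonal one;
# in high dimension its columns are nearly orthogonal anyway.
import math
import random

def random_projection_matrix(d, k, seed=42):
    """Build a d x k matrix with entries drawn from N(0, 1/k).
    Projecting through it approximately preserves Euclidean distances."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0, 1) * scale for _ in range(k)] for _ in range(d)]

def project(x, matrix):
    """Map a d-dimensional point x down to k dimensions (x^T R)."""
    k = len(matrix[0])
    return [sum(x[i] * matrix[i][j] for i in range(len(x)))
            for j in range(k)]

# Example: reduce 1000-dimensional points to 50 dimensions,
# the range the streaming k-means work above was run at.
d, k = 1000, 50
R = random_projection_matrix(d, k)
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(d)]
y = project(x, R)  # y is a 50-dimensional point
```

For very sparse inputs (like text), a dense matrix like this densifies every point, which is part of why the speed trade-off mentioned above is unclear; sparse random projections exist for that case.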