Has anyone tried Dunning's large-scale k-means
(https://github.com/tdunning/knn)? It looks pretty interesting.

It looks like it does not have a working MapReduce version yet,
although the doc states that the implementation is straightforward. If
anyone has tried that implementation, could you please share some
performance numbers (e.g. size of data, running time, cluster
quality)? I am curious how well this clustering algorithm does, since
it is only an approximation of traditional k-means. Are there any
error bounds?


