The simplest scheme is to initialize a distributed matrix of the shape D := (0 | A), where A is your dataset and 0 is a single column holding each row's current centroid assignment, and to distribute the current centroid matrix C via a matrix broadcast (assuming there are few enough centers).
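To make the layout concrete, here is a rough sketch in plain Python. It is not Mahout code: the "blocks" are ordinary row slices standing in for DRM partitions, the centroid list is simply passed to each block as a stand-in for a broadcast, and the centroid update is written as a per-cluster sum and count rather than an actual aggregating transpose. All function names (`assign_block`, `recompute_centroids`, `kmeans`) are made up for illustration.

```python
# Plain-Python stand-in for the scheme; none of this is Mahout API.
# Each row of D is [assignment, x1, ..., xn]; "blocks" are row slices of D.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign_block(block, centroids):
    """mapBlock-style step: rewrite column 0 of each row with the nearest centroid index."""
    out = []
    for row in block:
        point = row[1:]
        nearest = min(range(len(centroids)),
                      key=lambda k: sq_dist(point, centroids[k]))
        out.append([nearest] + point)
    return out

def recompute_centroids(blocks, centroids):
    """Aggregate per-cluster sums and counts over all blocks, then divide."""
    k, n = len(centroids), len(centroids[0])
    sums = [[0.0] * n for _ in range(k)]
    counts = [0] * k
    for block in blocks:
        for row in block:
            c = int(row[0])
            counts[c] += 1
            for j, v in enumerate(row[1:]):
                sums[c][j] += v
    # Keep the old centroid if a cluster ended up empty.
    return [[s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
            for c in range(k)]

def kmeans(A, centroids, block_size=2, max_iter=20):
    D = [[0] + [float(v) for v in row] for row in A]  # D := (0 | A)
    blocks = [D[i:i + block_size] for i in range(0, len(D), block_size)]
    for _ in range(max_iter):
        # Every block sees the same centroids (the "broadcast" C).
        blocks = [assign_block(b, centroids) for b in blocks]
        new_centroids = recompute_centroids(blocks, centroids)
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    assignments = [row[0] for b in blocks for row in b]
    return assignments, centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0.0, 0.0], [10.0, 10.0]])` assigns the first two rows to cluster 0 and the last two to cluster 1. In a real Samsara job the per-block work would run inside mapBlock() and the centroid update would be a distributed aggregation.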
Then alternately run cluster assignment within the mapBlock() operator on D and recompute the new centroids C afterwards. The recomputation of centroids can be done via an aggregating transpose.

Of course, a better scheme would include pre-sketching (k-means||) and use of the triangle inequality during recomputations.

On Wed, Mar 29, 2017 at 8:30 AM, KHATWANI PARTH BHARAT <h2016...@pilani.bits-pilani.ac.in> wrote:

> Sir,
> I am trying to write the kmeans clustering algorithm using Mahout Samsara
> but i am bit confused
> about how to leverage Distributed Row Matrix for the same. Can anybody help
> me with same.
>
> Thanks
> Parth Khatwani