Thanks, Suneel. Could someone please explain a little about ClusteringPolicy and ClusterClassifier? And what are the benefits of using them with parallel K-Means?
Thank you so much,
Best regards.

> Date: Tue, 18 Mar 2014 04:35:14 -0700
> From: suneel_mar...@yahoo.com
> Subject: Re: Mahout parallel K-Means - algorithms analysis
> To: user@mahout.apache.org
>
> Canopy and KMeans run independently and do not call each other.
>
> For KMeans, the K value has to be specified when invoking KMeans.
>
> Typically you run Canopy first and then invoke KMeans with the appropriate
> K value as inferred from Canopy.
>
> On Tuesday, March 18, 2014 4:33 AM, hiroshi leon <hiroshi_8...@hotmail.com> wrote:
>
> Thank you Wei and Suneel,
>
> By the way, does anybody know whether Mahout's parallel K-means uses
> Canopy clustering at the beginning to generate the initial K in the K-Means
> driver class?
>
> Best regards,
>
> Hiroshi
>
> > Date: Mon, 17 Mar 2014 13:05:01 -0700
> > Subject: Re: Mahout parallel K-Means - algorithms analysis
> > From: weish...@gmail.com
> > To: user@mahout.apache.org
> > CC: ted.dunn...@gmail.com
> >
> > You could take a look at
> > org.apache.mahout.clustering.classify.ClusterClassificationMapper
> >
> > Enjoy,
> > Wei Shung
> >
> > On Sat, Mar 15, 2014 at 2:51 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
> >
> > > The clustering code is in CIMapper and CIReducer. Following the clustering,
> > > there is cluster classification, which is mapper-only.
> > >
> > > Not sure about the reference paper; this stuff has been around for a long
> > > time, but the documentation for k-means on mahout.apache.org should explain
> > > the approach.
> > >
> > > Sent from my iPhone
> > >
> > > > On Mar 15, 2014, at 5:36 PM, hiroshi leon <hiroshi_8...@hotmail.com> wrote:
> > > >
> > > > Hello Ted,
> > > >
> > > > Thank you so much for your reply. The program I was checking is the
> > > > KMeansDriver class with the run function, the buildCluster function in
> > > > the same class, and then the ClusterIterator class with the iterateMR
> > > > function.
> > > >
> > > > I would like to know where I can check the code that is implemented
> > > > for the mapper and the reducer. Is it in CIMapper.class and CIReducer.class?
> > > >
> > > > Is there a research paper or pseudo-code that Mahout's parallel
> > > > K-means was based on?
> > > >
> > > > Thank you so much and have a nice day.
> > > >
> > > > Best regards
> > > >
> > > >> From: ted.dunn...@gmail.com
> > > >> Date: Sat, 15 Mar 2014 13:56:56 -0700
> > > >> Subject: Re: Mahout parallel K-Means - algorithms analysis
> > > >> To: user@mahout.apache.org
> > > >>
> > > >> We would love to help.
> > > >>
> > > >> Can you say which program and which classes you are looking at?
> > > >>
> > > >> On Sat, Mar 15, 2014 at 12:58 PM, hiroshi leon <hiroshi_8...@hotmail.com> wrote:
> > > >>
> > > >>> To whom it may concern,
> > > >>>
> > > >>> Hello, I have been checking the MapReduce k-means algorithm in Mahout
> > > >>> 0.9, and I would like to know where I can check the code for what
> > > >>> happens inside the map function and the reducer.
> > > >>>
> > > >>> I was debugging in NetBeans and was not able to find exactly what is
> > > >>> implemented in the Map and Reduce functions...
> > > >>>
> > > >>> The reason I am doing this is that I would like to know exactly what
> > > >>> is implemented in Mahout 0.9, in order to see which parts were
> > > >>> optimized in the K-Means MapReduce algorithm.
> > > >>>
> > > >>> Do you know which research paper Mahout's K-means was based on, or
> > > >>> where I can read the pseudo-code?
> > > >>>
> > > >>> Thank you so much!
> > > >>>
> > > >>> Best regards!
> > > >>>
> > > >>> Hiroshi
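The workflow Suneel describes can be illustrated with a small in-memory Java sketch. The workflow has three steps: run Canopy first to infer K, then run KMeans as a mapper/reducer iteration, then run a map-only classification pass. This is not Mahout's code. The class `KMeansSketch` and all of its methods are hypothetical, and the "splits" stand in for Hadoop input splits. The sketch also assumes Euclidean distance throughout. Taking each canopy's center as the mean of its T1 members is a simplifying choice made here. Where Mahout actually performs these steps (CIMapper, CIReducer, ClusterClassificationMapper) should be checked against the 0.9 source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of canopy -> k-means -> classification.
// Not Mahout's implementation; names and structure are assumptions.
public class KMeansSketch {

    // Euclidean distance between two points of equal dimension (assumed metric).
    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // Index of the centroid nearest to p (hard assignment, first index wins ties).
    static int nearest(double[] p, List<double[]> centroids) {
        int best = 0;
        for (int k = 1; k < centroids.size(); k++) {
            if (distance(p, centroids.get(k)) < distance(p, centroids.get(best))) best = k;
        }
        return best;
    }

    // Canopy pass with thresholds t1 > t2 > 0; the number of centers returned is a candidate K.
    static List<double[]> canopyCenters(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] seed = remaining.get(0);
            double[] sum = new double[seed.length];
            int members = 0;
            List<double[]> keep = new ArrayList<>();
            for (double[] p : remaining) {
                double d = distance(seed, p);
                if (d < t1) { // loosely bound: contributes to this canopy's center
                    for (int i = 0; i < sum.length; i++) sum[i] += p[i];
                    members++;
                }
                if (d >= t2) keep.add(p); // only points outside t2 may seed later canopies
            }
            for (int i = 0; i < sum.length; i++) sum[i] /= members;
            centers.add(sum);
            remaining = keep;
        }
        return centers;
    }

    // "Map" over one split: assign each point to its nearest centroid and emit
    // per-cluster partial sums; the last slot of each array holds the point count.
    static Map<Integer, double[]> mapSplit(List<double[]> split, List<double[]> centroids) {
        Map<Integer, double[]> partials = new HashMap<>();
        for (double[] p : split) {
            int k = nearest(p, centroids);
            double[] acc = partials.computeIfAbsent(k, key -> new double[p.length + 1]);
            for (int i = 0; i < p.length; i++) acc[i] += p[i];
            acc[p.length] += 1;
        }
        return partials;
    }

    // "Reduce": merge partial sums per cluster and divide by the count.
    // A cluster that received no points keeps its previous centroid.
    static List<double[]> reduce(List<Map<Integer, double[]>> mapOutputs, List<double[]> old) {
        List<double[]> next = new ArrayList<>();
        for (int k = 0; k < old.size(); k++) {
            int dim = old.get(k).length;
            double[] total = new double[dim + 1];
            for (Map<Integer, double[]> out : mapOutputs) {
                double[] part = out.get(k);
                if (part != null) {
                    for (int i = 0; i <= dim; i++) total[i] += part[i];
                }
            }
            if (total[dim] == 0) { next.add(old.get(k).clone()); continue; }
            double[] c = new double[dim];
            for (int i = 0; i < dim; i++) c[i] = total[i] / total[dim];
            next.add(c);
        }
        return next;
    }

    // Driver loop: one map/reduce pass per iteration until no centroid moves more than delta.
    static List<double[]> kMeans(List<List<double[]>> splits, List<double[]> seeds, double delta, int maxIter) {
        List<double[]> centroids = seeds;
        for (int it = 0; it < maxIter; it++) {
            List<Map<Integer, double[]>> outs = new ArrayList<>();
            for (List<double[]> split : splits) outs.add(mapSplit(split, centroids));
            List<double[]> next = reduce(outs, centroids);
            double shift = 0;
            for (int k = 0; k < next.size(); k++) shift = Math.max(shift, distance(next.get(k), centroids.get(k)));
            centroids = next;
            if (shift <= delta) break;
        }
        return centroids;
    }

    // Classification: a map-only pass labelling each point with its nearest final centroid.
    static int[] classify(List<double[]> points, List<double[]> centroids) {
        int[] labels = new int[points.size()];
        for (int i = 0; i < labels.length; i++) labels[i] = nearest(points.get(i), centroids);
        return labels;
    }
}
```

Suppose the input is two tight groups of points, one near (0, 0) and one near (10, 10), with T1 = 3 and T2 = 2. In that case the canopy pass yields two centers, and K = 2 is what would be handed to k-means. In the map step, the sketch accumulates sums rather than emitting every point, so each split sends one small record per cluster to the reducer. Whether Mahout's CIMapper aggregates in exactly this way is not confirmed here.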