Re: Mahout parallel K-Means - algorithms analysis

Suneel Marthi Tue, 18 Mar 2014 04:36:55 -0700

Canopy and KMeans run independently and do not call each other.

For KMeans, the K value has to be specified when invoking KMeans.


Typically you run Canopy first and then invoke KMeans with the appropriate
K value as inferred from Canopy.
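
To illustrate that workflow, here is a minimal, self-contained sketch in plain
Java. It is not Mahout code and does not use the Mahout API: the class name
CanopyThenKMeans, the methods canopyCenters and kMeans, the threshold t2 and
the sample points are all hypothetical, chosen only for illustration. The
canopy pass is simplified (it uses only the tight threshold and keeps the
seed point as the canopy center), and the number of canopies it finds is then
used as K for an ordinary k-means run.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not Mahout's implementation or API.
public class CanopyThenKMeans {

    // Euclidean distance between two points of equal dimension.
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Simplified canopy pass: a point starts a new canopy unless it lies
    // within the tight threshold t2 of an existing canopy center.
    public static List<double[]> canopyCenters(List<double[]> points, double t2) {
        List<double[]> centers = new ArrayList<>();
        for (double[] p : points) {
            boolean covered = false;
            for (double[] c : centers) {
                if (distance(p, c) < t2) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                centers.add(p.clone());
            }
        }
        return centers;
    }

    // Index of the centroid closest to point p.
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(p, centroids[c]) < distance(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Plain k-means seeded with the given centers; K is seeds.size().
    public static double[][] kMeans(List<double[]> points, List<double[]> seeds,
                                    int maxIterations) {
        int k = seeds.size();
        int dim = seeds.get(0).length;
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++) {
            centroids[i] = seeds.get(i).clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: add each point to its nearest centroid's sum.
            for (double[] p : points) {
                int best = nearest(p, centroids);
                counts[best]++;
                for (int d = 0; d < dim; d++) {
                    sums[best][d] += p[d];
                }
            }
            // Update step: move each non-empty centroid to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue;
                }
                for (int d = 0; d < dim; d++) {
                    double updated = sums[c][d] / counts[c];
                    if (Math.abs(updated - centroids[c][d]) > 1e-9) {
                        changed = true;
                    }
                    centroids[c][d] = updated;
                }
            }
            if (!changed) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {1.0, 1.0});
        points.add(new double[] {1.2, 0.8});
        points.add(new double[] {9.0, 9.0});
        points.add(new double[] {8.8, 9.2});
        List<double[]> seeds = canopyCenters(points, 3.0);
        System.out.println("K inferred from canopies: " + seeds.size());
        double[][] centroids = kMeans(points, seeds, 10);
        for (double[] c : centroids) {
            System.out.println(c[0] + ", " + c[1]);
        }
    }
}
```

The two steps stay separate here for the same reason as in the note above:
the canopy pass only produces a value for K (and, optionally, seed centers),
and the k-means routine has to be given that value explicitly.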

On Tuesday, March 18, 2014 4:33 AM, hiroshi leon <hiroshi_8...@hotmail.com> 
wrote:
 
Thank you Wei and Suneel, 

By the way, does anybody know whether Mahout's parallel K-Means uses
Canopy clustering at the start to generate the initial K in the K-Means
driver class?

Best regards,

Hiroshi

> Date: Mon, 17 Mar 2014 13:05:01 -0700
> Subject: Re: Mahout parallel K-Means - algorithms analysis
> From: weish...@gmail.com
> To: user@mahout.apache.org
> CC: ted.dunn...@gmail.com
> 
> You could take a look at
> org.apache.mahout.clustering.classify.ClusterClassificationMapper
> 
> Enjoy,
> Wei Shung
> 
> 
> On Sat, Mar 15, 2014 at 2:51 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
> 
> > The clustering code is in CIMapper and CIReducer. Following the clustering,
> > there is cluster classification, which is mapper-only.
> >
> > Not sure about the reference paper; this stuff has been around for a long
> > time, but the documentation for k-means on mahout.apache.org should explain
> > the approach.
> >
> > Sent from my iPhone
> >
> > > On Mar 15, 2014, at 5:36 PM, hiroshi leon <hiroshi_8...@hotmail.com>
> > > wrote:
> > >
> > > Hello Ted,
> > >
> > > > Thank you so much for your reply. The program that I was checking is the
> > > > KMeansDriver class with its run function and the buildClusters function
> > > > in the same class, followed by the ClusterIterator class with the
> > > > iterateMR function.
> > > >
> > > > I would like to know where I can find the code that is implemented for
> > > > the mapper and the reducer. Is it in CIMapper.class and CIReducer.class?
> > > >
> > > > Is there a research paper or pseudo-code on which Mahout's parallel
> > > > K-means was based?
> > >
> > > Thank you so much and have a nice day.
> > >
> > > Best regards
> > >
> > >
> > >> From: ted.dunn...@gmail.com
> > >> Date: Sat, 15 Mar 2014 13:56:56 -0700
> > >> Subject: Re: Mahout parallel K-Means - algorithms analysis
> > >> To: user@mahout.apache.org
> > >>
> > >> We would love to help.
> > >>
> > >> Can you say which program and which classes you are looking at?
> > >>
> > >>
> > >> On Sat, Mar 15, 2014 at 12:58 PM, hiroshi leon
> > >> <hiroshi_8...@hotmail.com> wrote:
> > >>
> > >>> To whom it may concern,
> > >>>
> > >>> Hello, I have been studying the MapReduce k-means algorithm in Mahout
> > >>> 0.9, and I would like to know where I can find the code that runs
> > >>> inside the map function and in the reducer.
> > >>>
> > >>> I was debugging with NetBeans and was not able to find what exactly is
> > >>> implemented in the Map and Reduce functions...
> > >>>
> > >>> I am doing this because I would like to know exactly what is
> > >>> implemented in Mahout 0.9, in order to see which parts of the K-Means
> > >>> MapReduce algorithm were optimized.
> > >>>
> > >>> Do you know which research paper the Mahout K-means was based on, or
> > >>> where I can read the pseudo-code?
> > >>>
> > >>>
> > >>>
> > >>> Thank you so much!
> > >>>
> > >>>
> > >>>
> > >>> Best regards!
> > >>>
> > >>> Hiroshi
> > >
> >
