Stuti, I'm answering your questions in order:

1. From MLlib's KMeans.scala
   (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L159)
   you can see that clustering stops when we have reached *maxIterations* or
   there are no more *activeRuns*. KMeans is executed *runs* times in
   parallel, and the best clustering found over all *runs* is returned. For
   each run, the algorithm stops if:
   - the number of iterations reaches *maxIterations*, or
   - every cluster center moved less than *epsilon* in the last iteration.
   So in your example, a run whose centers all moved less than 0.1 after 2
   iterations stops there and does not go on to 10. "We stop iterating one
   run" means that only that run ends; any other runs continue until they
   meet one of the two conditions themselves.

2. I can't find the Mahout source code that refers to the "Convergence
   Threshold", but I suspect that threshold and MLlib's *epsilon* are the
   same concept. There is no concept of parallel runs in Mahout.
   Ref: https://mahout.apache.org/users/clustering/k-means-clustering.html

3. To set MLlib's KMeans to use an *epsilon* of 0.1 and then train the
   model, you can do the following:

       new KMeans()
         .setK(k)
         .setMaxIterations(maxIterations)
         .setRuns(runs)
         .setInitializationMode(initializationMode)
         .setEpsilon(0.1)
         .run(data)

Enjoy,
Long Pham
Software Engineer at Adatao, Inc.
longp...@adatao.com

On May 15, 2014 7:29 PM, "Stuti Awasthi" <stutiawas...@hcl.com> wrote:

> Hi All,
>
> Any ideas on this ??
>
> Thanks
> Stuti Awasthi
>
> *From:* Stuti Awasthi
> *Sent:* Wednesday, May 14, 2014 6:20 PM
> *To:* user@spark.apache.org
> *Subject:* Understanding epsilon in KMeans
>
> Hi All,
>
> I wanted to understand the functionality of epsilon in KMeans in Spark
> MLlib.
>
> As per documentation :
> distance threshold within which we've consider centers to have
> converged. If all centers move less than this *Euclidean* distance, we
> stop iterating one run.
>
> Now I have assumed that if centers are moving less than epsilon value then
> Clustering Stops but then what does it mean by "we stop iterating one run"..
>
> Now suppose I have given maxIterations=10 and epsilon = 0.1 and assume
> that after only 2 iterations the epsilon condition is met, i.e.
> now centers are moving only less than 0.1..
>
> Now what happens ?? The whole 10 iterations are completed OR the
> Clustering stops ??
>
> My 2nd query is in Mahout, there is a configuration param : "Convergence
> Threshold (cd)" which states : "in an iteration, the centroids don't move
> more than this distance, no further iterations are done and clustering
> stops."
>
> So is epsilon and cd similar ??
>
> 3rd query :
> How to pass epsilon as a configurable param. KMeans.train() does not
> provide the way but in code I can see "setEpsilon" as method. SO if I want
> to pass the parameter as epsilon=0.1 , how may I do that..
>
> Pardon my ignorance
>
> Thanks
> Stuti Awasthi
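
P.S. In case it helps to see the stopping rule from answer 1 spelled out, here is a minimal, Spark-free sketch of a single run. It is not MLlib's actual implementation (MLlib distributes the work over an RDD and tracks several runs at once); the names EpsilonSketch, runOnce and squaredDistance are made up for illustration, and I am assuming the "moved less than epsilon" check is applied to the Euclidean distance between the old and new position of each center.

```scala
// Hypothetical sketch of the per-run stopping rule; not MLlib code.
object EpsilonSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Runs Lloyd's iterations for ONE run.
  // Returns the final centers and the number of iterations actually executed.
  def runOnce(points: Seq[Point],
              initial: Seq[Point],
              maxIterations: Int,
              epsilon: Double): (Seq[Point], Int) = {
    var centers: Seq[Point] = initial
    var iteration = 0
    var converged = false

    // Stop when maxIterations is reached OR every center moved less than epsilon.
    while (iteration < maxIterations && !converged) {
      // Group the points by the index of their nearest center.
      val assigned = points.groupBy { p =>
        centers.indices.minBy(i => squaredDistance(p, centers(i)))
      }

      // New center = mean of its points; an empty cluster keeps its old center.
      val updated: Seq[Point] = centers.indices.map { i =>
        assigned.get(i) match {
          case Some(ps) =>
            ps.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
              .map(_ / ps.size)
          case None => centers(i)
        }
      }

      // Compare squared distances against epsilon squared to avoid a square root.
      converged = centers.zip(updated).forall { case (oldC, newC) =>
        squaredDistance(oldC, newC) < epsilon * epsilon
      }

      centers = updated
      iteration += 1
    }
    (centers, iteration)
  }
}
```

With the one-dimensional points 0, 1, 10, 11, initial centers 0 and 10, maxIterations = 10 and epsilon = 0.1, this sketch stops after 2 iterations with centers 0.5 and 10.5: the centers move by 0.5 in the first iteration and not at all in the second, so the epsilon condition ends the run well before the iteration limit.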