Yes, both run in parallel. Random is a baseline initialization that
picks centers uniformly at random, so it may miss small clusters.
k-means++ improves on random initialization by weighting points that
are far away from the current candidate centers. You can view
k-means|| as a more scalable version of k-means++. We don't provide
k-means++ as an initialization mode, but we use it as part of
k-means||. Please check the papers for more details.
details. -Xiangrui

On Wed, Jul 16, 2014 at 10:27 PM, amin mohebbi <aminn_...@yahoo.com> wrote:
> Thank you for the response - can we say that both implementations
> compute the centroids in parallel? I mean, in both cases will the data
> and code be sent to the workers, and the results collected and passed
> back to the driver? And why do we have three types of initialization in MLlib?
> Initialization:
> • random
> • k-means++
> • k-means||
>
>
> Best Regards
>
> .......................................................
>
> Amin Mohebbi
>
> PhD candidate in Software Engineering
>  at University of Malaysia
>
> H/P : +60 18 2040 017
>
>
>
> E-Mail : tp025...@ex.apiit.edu.my
>
>               amin_...@me.com
>
>
> On Thursday, July 17, 2014 11:57 AM, Xiangrui Meng <men...@gmail.com> wrote:
>
>
> kmeans.py contains a naive implementation of k-means in Python, serving
> as an example of how to use PySpark. Please use MLlib's implementation
> in practice. There is a JIRA for making this clear:
> https://issues.apache.org/jira/browse/SPARK-2434
>
> -Xiangrui
>
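
For comparison, the example file implements the algorithm directly with
RDD operations. A simplified sketch of that style of naive k-means
(illustrative only, not the actual kmeans.py; the toy data, seed, and
fixed iteration count are made up, and it assumes every cluster keeps
at least one point):

# Simplified sketch of a naive RDD-based k-means, in the spirit of
# examples/src/main/python/kmeans.py. Prefer MLlib's KMeans in practice.
import numpy as np
from pyspark import SparkContext

def closest_center(point, centers):
    # Index of the nearest current center to this point.
    return min(range(len(centers)),
               key=lambda i: np.sum((point - centers[i]) ** 2))

sc = SparkContext(appName="NaiveKMeans")
points = sc.parallelize([
    np.array([0.0, 0.0]), np.array([0.1, 0.1]),
    np.array([9.0, 9.0]), np.array([9.1, 9.1]),
]).cache()

k = 2
centers = points.takeSample(False, k, seed=42)  # random initialization

for _ in range(10):  # fixed number of Lloyd iterations for simplicity
    # Assign each point to its closest center, then average per cluster.
    assigned = points.map(lambda p: (closest_center(p, centers), (p, 1)))
    sums = assigned.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    centers = [s / n for _, (s, n) in sorted(sums.collect())]

print(centers)
sc.stop()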
> On Wed, Jul 16, 2014 at 8:16 PM, amin mohebbi <aminn_...@yahoo.com> wrote:
>> Can anyone explain to me what the difference is between the k-means in
>> MLlib and the k-means in examples/src/main/python/kmeans.py?
>>
>>
>> Best Regards
>>
>> .......................................................
>>
>> Amin Mohebbi
>>
>> PhD candidate in Software Engineering
>>  at University of Malaysia
>>
>> H/P : +60 18 2040 017
>>
>>
>>
>> E-Mail : tp025...@ex.apiit.edu.my
>>
>>              amin_...@me.com
>
>
