What are the other parameters? Are you just setting k=3? What about # of
runs? How many partitions do you have? How many cores does your machine
have?
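
For reference, computeCost is just the sum of squared Euclidean distances
from each point to its nearest cluster center, so its cost scales with
numPoints * k * dimension per pass. A minimal plain-Scala sketch of that
computation (hypothetical helper names, not Spark's actual implementation):

```scala
// Squared Euclidean distance between two dense vectors.
def squaredDistance(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => val d = x - y; d * d }.sum

// What computeCost does conceptually: for each point, take the squared
// distance to its closest center, then sum over all points (WSSSE).
def computeCost(points: Seq[Array[Double]],
                centers: Seq[Array[Double]]): Double =
  points.map(p => centers.map(c => squaredDistance(p, c)).min).sum
```

In Spark this runs as a map over the RDD followed by a sum, so if the
input RDD is not cached, the whole upstream lineage is recomputed during
the cost pass, which is a common cause of surprisingly long runtimes.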

Thanks,
Burak

On Mon, Jul 13, 2015 at 10:57 AM, Nirmal Fernando <nir...@wso2.com> wrote:

> Hi Burak,
>
> k = 3
> dimension = 785 features
> Spark 1.4
>
> On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz <brk...@gmail.com> wrote:
>
>> Hi,
>>
>> How are you running K-Means? What is your k? What is the dimension of
>> your dataset (columns)? Which Spark version are you using?
>>
>> Thanks,
>> Burak
>>
>> On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> For a fairly large dataset (30MB), KMeansModel.computeCost takes a
>>> long time (16+ minutes).
>>>
>>> Most of the time is spent in this task:
>>>
>>> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
>>> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>>>
>>> Can this be improved?
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>>>
>>>
>>>
>>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
>
>
