Can you call repartition(8) (or 16) on the data before KMeans, and also .cache() it?
Something like this (I'm assuming you are using Java):

```
// Spread the data over 8 partitions and keep it in memory across iterations.
JavaRDD<Vector> input = data.repartition(8).cache();
org.apache.spark.mllib.clustering.KMeans.train(input.rdd(), 3, 20);
```

On Mon, Jul 13, 2015 at 11:10 AM, Nirmal Fernando <nir...@wso2.com> wrote:

> I'm using:
>
> org.apache.spark.mllib.clustering.KMeans.train(data.rdd(), 3, 20);
>
> CPU cores: 8 (using the default Spark conf, though)
>
> On partitions, I'm not sure how to find that.
>
> On Mon, Jul 13, 2015 at 11:30 PM, Burak Yavuz <brk...@gmail.com> wrote:
>
>> What are the other parameters? Are you just setting k=3? What about # of
>> runs? How many partitions do you have? How many cores does your machine
>> have?
>>
>> Thanks,
>> Burak
>>
>> On Mon, Jul 13, 2015 at 10:57 AM, Nirmal Fernando <nir...@wso2.com>
>> wrote:
>>
>>> Hi Burak,
>>>
>>> k = 3
>>> dimension = 785 features
>>> Spark 1.4
>>>
>>> On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> How are you running K-Means? What is your k? What is the dimension of
>>>> your dataset (columns)? Which Spark version are you using?
>>>>
>>>> Thanks,
>>>> Burak
>>>>
>>>> On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nir...@wso2.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> For a fairly large dataset, 30MB, KMeansModel.computeCost takes a lot of
>>>>> time (16+ minutes).
>>>>>
>>>>> It spends most of that time in this task:
>>>>>
>>>>> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
>>>>> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>>>>>
>>>>> Can this be improved?
>>>>>
>>>>> --
>>>>>
>>>>> Thanks & regards,
>>>>> Nirmal
>>>>>
>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>> Mobile: +94715779733
>>>>> Blog: http://nirmalfdo.blogspot.com/
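
P.S. In case it helps to see what that stage is doing: as far as I know, computeCost returns the sum of squared distances from each point to its nearest cluster center, and the sum() in the stack trace is the action that forces the whole input to be evaluated. Below is a minimal plain-Java sketch of that computation. It is not Spark's actual code; the class name KMeansCostSketch and its methods are made up for illustration, and it works on in-memory arrays rather than an RDD.

```java
public class KMeansCostSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Sum, over all points, of the squared distance to the nearest center. */
    public static double computeCost(double[][] points, double[][] centers) {
        double total = 0.0;
        for (double[] point : points) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] center : centers) {
                best = Math.min(best, squaredDistance(point, center));
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: three 2-D points and two centers.
        double[][] points = {{0, 0}, {1, 0}, {10, 10}};
        double[][] centers = {{0, 0}, {10, 11}};
        System.out.println(computeCost(points, centers));
    }
}
```

The per-point work is k distance computations over the feature dimension, so with k = 3 and 785 features the arithmetic itself should be small for 30MB of data. That is why I would look at caching and the number of partitions first, although I can't be sure that is the cause without seeing the job.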