Those are interesting numbers. You haven't mentioned the dataset size in
your thread. This is a classic scalability-and-performance question,
assuming your baseline numbers are correct and everything on your cluster
is tuned properly.
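
If it helps, a quick way to report that (a rough sketch; the path is just a
placeholder, and I'm assuming a LIBSVM-format input as in
BinaryClassification.scala) is to print the example count and the number of
partitions before timing anything:

    import org.apache.spark.mllib.util.MLUtils

    // Assumes an existing SparkContext `sc`; the input path is hypothetical.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///path/to/data")
    println(s"examples = ${data.count()}, partitions = ${data.partitions.length}")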

Putting on my outsider's cap, there are several possible reasons for this,
and we need to look at all of these:
1. It could be the algorithmic cost of moving to a cluster.
2. It could be a scalability cost.
3. The cluster may not be tuned well (see the sketch after this list).
4. There may indeed be a problem/performance regression in the framework.
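
For point 3, a minimal sketch of what I mean (the app name, input path, and
partition count below are assumptions, not your actual values): when moving
from 12 local cores to 320 cores, the default partitioning of the input file
can leave most executors idle, so explicitly repartitioning and caching the
data before the timed runs is worth trying:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.util.MLUtils

    val conf = new SparkConf().setAppName("mllib-benchmark")      // hypothetical app name
    val sc = new SparkContext(conf)

    val raw = MLUtils.loadLibSVMFile(sc, "hdfs:///path/to/data")  // assumed path and format
    // Rule of thumb: 2-3 tasks per core, so 320 cores -> roughly 640-960 partitions.
    val data = raw.repartition(640).cache()
    data.count()  // materialize the cache so caching cost is excluded from the timed runs

Timing only after the cache is materialized, and comparing per-stage times in
the Spark UI, also helps separate data-loading and shuffle cost from the
actual training cost.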






On Tue, Sep 2, 2014 at 1:12 PM, SK <skrishna...@gmail.com> wrote:

> Num iterations: for LR and SVM, I am using the default value of 100. For
> all the other parameters I am also using the default values. I am pretty
> much reusing the code from BinaryClassification.scala. For Decision Tree,
> I don't see any parameter for the number of iterations in the example
> code, so I did not specify any. I am running each algorithm on my dataset
> 100 times and taking the average runtime.
>
> My dataset is very dense (hardly any zeros). The labels are 1 and 0.
>
> I did not explicitly specify the number of partitions. I did not see any
> code for this in the MLlib examples for BinaryClassification and
> DecisionTree.
>
> hardware:
> local: Intel Core i7 with 12 cores and 7.8 GB RAM, of which I am
> allocating 4 GB for executor memory. According to the application detail
> stats in the Spark UI, the total memory consumed is around 1.5 GB.
>
> cluster: 10 nodes with a total of 320 cores and 16 GB per node. According
> to the application detail stats in the Spark UI, the total memory consumed
> is around 95.5 GB.
>
