mllib performance on cluster

SK Tue, 02 Sep 2014 11:26:02 -0700

Hi,

I evaluated the runtime performance of some of the MLlib classification
algorithms on a local machine and a cluster with 10 nodes. I used standalone
mode and Spark 1.0.1 in both cases. Here are the results for the total
runtime:
                        Local      Cluster
Logistic regression     138 sec    336 sec
SVM                     138 sec    336 sec
Decision tree            50 sec    132 sec
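
As a quick check on the numbers above, the slowdown factor for each algorithm can be computed directly from the table. This is only a minimal arithmetic sketch: the names `runtimes`, `slowdown` and `slowdowns` are made up for illustration and are not part of Spark or MLlib.

```python
# Total runtimes in seconds, copied from the table above: (local, cluster).
runtimes = {
    "Logistic regression": (138, 336),
    "SVM": (138, 336),
    "Decision tree": (50, 132),
}


def slowdown(local_sec, cluster_sec):
    """Return how many times slower the cluster run was than the local run."""
    return cluster_sec / local_sec


# Slowdown factor per algorithm (hypothetical helper names, for illustration only).
slowdowns = {name: slowdown(*times) for name, times in runtimes.items()}

for name, factor in slowdowns.items():
    print(f"{name:<20} {factor:.2f}x slower on the cluster")
```

With these figures the ratio is between 2 and 3 for all three algorithms.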


My dataset is quite small and my programs are very similar to the MLlib
examples that are included in the Spark distribution. Why is the runtime on
the cluster significantly higher (roughly 2.5 times) than on the local
machine, even though the cluster has more memory and more nodes? Is it
because of the communication overhead on the cluster? I would like to know
whether there is something I should be doing to optimize performance on the
cluster, or whether others have been getting similar results.

thanks
   



