Hey, I was reading the Berkeley paper "Spark: Cluster Computing with Working
Sets" and came across a sentence which is bothering me. Currently I am
trying to run a Python script on Spark which executes a parallel k-means
... my problem is ...
after the algorithm finishes working with the dataset (ca. 50 s) it seems that
Spark needs the rest of the time (ca. 7 min) to collect all the data. The paper
from Berkeley mentions that Spark does not support parallel collection.
Is that really the case?
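
For reference, the end of my script follows basically this pattern (a
minimal self-contained sketch with placeholder data and names, not my
real job):

    from pyspark import SparkContext

    sc = SparkContext(appName="kmeans-collect-sketch")

    # toy stand-in for my real dataset
    points = sc.parallelize(
        [[float(i % 100), float(i % 7)] for i in range(100000)]
    ).cache()

    # ... the k-means iterations run here and finish fast (~50 s) ...

    # the final step is where almost all of the wall-clock time goes:
    # every record is shipped back to the single driver process
    result = points.collect()
    print(len(result))

    sc.stop()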

If I can make something run faster in Spark, please tell me how, since I
have another problem: Spark is not really responding to my configuration
changes. I ran over 25 tests varying spark.executor.memory, spark.task.cpus,
and spark.akka.threads, but nothing changed (configurations from 2-62 g of
RAM, 4-912 cpus, and 4-912 threads).
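
A typical run out of those 25 looked roughly like this (a sketch; the
values shown are just one point in the ranges I tried):

    from pyspark import SparkConf, SparkContext

    # one configuration out of the ~25 I tried; none changed the runtime
    conf = (SparkConf()
            .setAppName("kmeans-config-sketch")
            .set("spark.executor.memory", "32g")  # varied 2g-62g
            .set("spark.task.cpus", "4")          # varied 4-912
            .set("spark.akka.threads", "4"))      # varied 4-912
    sc = SparkContext(conf=conf)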

I also read that you cannot run more than one executor per node while Spark
is running in standalone mode. Do I really need to run Spark on YARN to get
more than one executor on a node? If so, does anyone have a tutorial on how
to install YARN and run Spark on top of it?
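
For what it is worth, this is how I point the script at the cluster today,
and my (possibly wrong) understanding of what would change on YARN; the
host name below is a placeholder:

    from pyspark import SparkConf

    # standalone mode, what I run now (placeholder master host):
    standalone_conf = SparkConf().setMaster("spark://my-master:7077")

    # on YARN my understanding is the master would instead be:
    yarn_conf = SparkConf().setMaster("yarn-client")

    # ...with the job launched through spark-submit, where (on YARN
    # only) --num-executors controls how many executors you get, so
    # more than one can land on a node
    print(standalone_conf.get("spark.master"), yarn_conf.get("spark.master"))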

Thank you for your help

best

makevnin


