A few more data points: my current theory is that Spark's piping
mechanism is considerably slower than running the C++ app directly on
the node.

I ran the C++ application directly on a node in the cluster and timed
the execution of various parts of the program: ~10 seconds for the
entire run, with ~6 seconds spent in one particular function and ~2
seconds in another.

I then ran it through Spark's piping mechanism and got ~180 seconds for
the entire run, with ~120 seconds for the 6-second function and ~24
seconds for the 2-second function. I was under the impression that
pipe() would simply run the C++ application on the remote node: is the
application supposed to run slower when executed via pipe()?
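For what it's worth, my understanding of pipe() (a sketch of the model, not Spark's actual implementation) is that each partition's elements are serialized to text, streamed line-by-line into the external process's stdin, and its stdout lines become the output RDD, so there is per-element text conversion and I/O on top of the process itself. A minimal Python emulation of that streaming model (the function name and `cat` example are mine, just for illustration):

```python
import subprocess

def pipe_partition(elements, command):
    """Emulate the pipe() model: stream each element as one line of text
    through an external command and collect its stdout lines."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    # Each element is converted to text and written as its own line,
    # which is where per-element serialization overhead comes from.
    input_text = "".join(str(e) + "\n" for e in elements)
    out, _ = proc.communicate(input_text)
    return out.splitlines()

# Piping a partition through `cat` returns the elements as strings.
print(pipe_partition([1, 2, 3], ["cat"]))  # → ['1', '2', '3']
```

If that model is right, the slowdown I'm seeing would be the cost of this text round-trip rather than the C++ code itself, but I'd appreciate confirmation.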



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Forcing-spark-to-send-exactly-one-element-to-each-worker-node-tp5605p5620.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.