Re: Forcing Spark to send exactly one element to each worker node

2014-05-12 Thread NevinLi158
A few more data points: my current theory is now that Spark's piping mechanism is considerably slower than just running the C++ app directly on the node. I ran the C++ application directly on a node in the cluster, timed the execution of various parts of the program, and got ~10 seconds to run
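
For reference, a minimal sketch of the pipe pattern under discussion, assuming an existing JavaSparkContext ctx; the binary path and command strings are placeholders, not the poster's original code:

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;

    // pipe() launches the external command once per partition and streams
    // each element of that partition to the process's stdin; every line the
    // process prints to stdout becomes an element of the result RDD
    JavaRDD<String> args = ctx.parallelize(Arrays.asList("job1", "job2"));
    JavaRDD<String> out = args.pipe("/path/to/cpp_app"); // hypothetical path
    out.collect(); // triggers execution

The per-partition process launch plus the stdin/stdout text round-trip is work the standalone run does not pay, which would account for part of the gap.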

Re: Forcing Spark to send exactly one element to each worker node

2014-05-12 Thread NevinLi158
I can't seem to get Spark to run the tasks in parallel. My Spark code is the following:

    //Create commands to be piped into a C++ program
    List<String> commandList = makeCommandList(Integer.parseInt(step.first()), 100);
    JavaRDD<String> commandListRDD = ctx.parallelize(commandList, commandList.size());
    //Run the C++
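
A plausible completion of the truncated last line, assuming the pipe-based setup described in the thread; the binary path is hypothetical, and makeCommandList is the poster's own helper:

    //Run the C++ program: each task streams its partition's commands
    //to the binary's stdin and gathers its stdout lines
    JavaRDD<String> results = commandListRDD.pipe("/path/to/cpp_app");
    List<String> output = results.collect(); // forces the jobs to run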

Re: Forcing Spark to send exactly one element to each worker node

2014-05-12 Thread NevinLi158
Fixed the problem as soon as I sent this out, sigh. Apparently you can do this by changing the number of slices to cut the dataset into: I thought that was identical to the number of partitions, but apparently not.
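
For anyone hitting the same issue, the slice count is the second argument to parallelize; a minimal sketch, assuming an existing JavaSparkContext ctx and a list of command strings:

    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;

    // asking for as many slices as elements gives each command its own
    // partition, so every pipe task handles exactly one command
    JavaRDD<String> rdd = ctx.parallelize(commandList, commandList.size());
    System.out.println(rdd.partitions().size()); // == commandList.size()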

Forcing Spark to send exactly one element to each worker node

2014-05-12 Thread NevinLi158
Hi all, I'm currently trying to use pipe to run C++ code on each worker node, and I have an RDD of essentially command-line arguments that I'm passing to each node. I want to send exactly one element to each node, but when I run my code, Spark ends up sending multiple elements to a node: is there a way to force Spark to send exactly one element to each worker node?
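
A sketch of the behavior being described, with illustrative names and sizes: with the default slice count, parallelize packs several elements into each partition, and pipe feeds every element of a partition to the same external process:

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;

    // e.g. 8 commands but a default of 2 slices -> 4 commands per task;
    // each pipe task writes its whole partition to one process's stdin
    JavaRDD<String> cmds = ctx.parallelize(
        Arrays.asList("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"));
    JavaRDD<String> out = cmds.pipe("/path/to/cpp_app"); // hypothetical path

Matching the slice count to the element count, as in the fix earlier in the thread, removes the packing.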