I am wondering about the same concept as the OP; did anyone ever get an answer to this question? I can't see that Spark has loops built in, except for looping over a dataset of existing/known size. So I often create a "dummy" ArrayList and pass it to parallelize() to control how many times Spark runs the function I supply:
// setup "dummy" ArrayList of size HOW_MANY_DARTS -- how many darts to throw List<Integer> throwsList = new ArrayList<Integer>(HOW_MANY_DARTS); JavaRDD<Integer> dataSet = jsc.parallelize(throwsList, SLICES); Then if I want to do this nested loop the OP was asking about, it seems like I am going to nest parallelize calls, with the outer parallelize being passed an ArrayList of length i and then inner parallelize getting a different ArrayList of length j? Is there not a more direct way to tell Spark to do something a fixed number of times? It seems like it's geared to datasets of existing length, like "iterate over the complete works of Shakespeare" which already has content of a specific length. In my case, for the simplest example, I am trying to run the PiAverage Spark example, which "throws darts" at a circle j times. Then I want to run that PiAverage i times and average the results to see if they really converge on pi. I would be happy to post more code but it seems like a more conceptual question I am asking? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-meet-nested-loop-on-pairRdd-tp21121p25718.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org