> "I am working on a spark application that requires the ability to run a function on each node in the cluster"

Use Apache Ignite instead of Spark. Trust me, it's awesome for this use case.
Regards,
Rabin Banerjee

On Jul 19, 2016 3:27 AM, "joshuata" <joshaspl...@gmail.com> wrote:

> I am working on a spark application that requires the ability to run a
> function on each node in the cluster. This is used to read data from a
> directory that is not globally accessible to the cluster. I have tried
> creating an RDD with n elements and n partitions so that it is evenly
> distributed among the n nodes, and then mapping a function over the RDD.
> However, the runtime makes no guarantees that each partition will be stored
> on a separate node. This means that the code will run multiple times on the
> same node while never running on another.
>
> I have looked through the documentation and source code for both RDDs and
> the scheduler, but I haven't found anything that will do what I need. Does
> anybody know of a solution I could use?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Execute-function-once-on-each-node-tp27351.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.