Deenar, dmpour is correct in that there's a many-to-many mapping between
executors and partitions (an executor can be assigned multiple partitions,
and a given partition can in principle move to a different executor).

I'm not sure why you seem to require this problem statement to be solved
with RDDs. It is fairly easy to have something executed once per JVM, using
the pattern I suggested. Is there some other requirement I have missed?

Sent while mobile. Pls excuse typos etc.
On Mar 27, 2014 9:06 AM, "dmpour23" <dmpou...@gmail.com> wrote:

> How exactly does rdd.mapPartitions get executed once in each VM?
>
> I am running mapPartitions, and the call function does not seem to execute
> the code:
>
> JavaPairRDD<String, String> twos = input.map(new
> Split()).sortByKey().partitionBy(new HashPartitioner(k));
> twos.values().saveAsTextFile(args[2]);
>
> JavaRDD<String> ls = twos.values().mapPartitions(new
> FlatMapFunction<Iterator<String>, String>() {
>     @Override
>     public Iterable<String> call(Iterator<String> arg0) throws Exception {
>         System.out.println("Usage should call my jar once: " + arg0);
>         return Lists.newArrayList();
>     }
> });
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Running-a-task-once-on-each-executor-tp3203p3353.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
