I'm trying to use Spark (Java) for an optimization algorithm that needs repeated server-node exchanges of information (the ADMM algorithm, for whoever is familiar). In each iteration, I need to update a set of values on the nodes and collect them on the server, which then updates its own set of values and passes these back to ALL nodes.
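To make the iteration structure concrete, here is a minimal single-machine sketch of the loop I have in mind, independent of Spark. The solvers optimizeX and optimizeZ are hypothetical stand-ins (a simple averaging consensus) for the real subproblem updates:

    import java.util.Arrays;

    // Sketch of the ADMM-style loop: each "node" n updates x_n given the
    // current Z, the driver collects the X values, then recomputes Z.
    public class AdmmLoopSketch {
        // Hypothetical per-node update: pull x toward the consensus value z.
        static double optimizeX(double x, double z) {
            return 0.5 * (x + z);
        }

        // Hypothetical server update: average the collected X values.
        static double optimizeZ(double[] xs) {
            return Arrays.stream(xs).average().orElse(0.0);
        }

        static double run(double[] xs, double z, int iters) {
            for (int i = 0; i < iters; i++) {
                // In Spark this inner loop is the foreach over dataRDD,
                // with Z either broadcast or captured in the closure.
                for (int n = 0; n < xs.length; n++) {
                    xs[n] = optimizeX(xs[n], z); // node-side update of X
                }
                z = optimizeZ(xs);               // server-side update of Z
            }
            return z;
        }

        public static void main(String[] args) {
            double[] xs = {0.0, 2.0, 4.0};
            System.out.println(run(xs, 0.0, 20)); // settles at the consensus value
        }
    }

The point is just that Z is recomputed on the driver every iteration, so whatever mechanism ships it to the nodes has to run once per iteration too.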
Say each node optimizes a variable X = {x1, x2, x3, ...} while the server optimizes a variable Z = {z1, z2, z3, ...}. I am currently using an Accumulable object to collect the updated X's from each node into an array maintained on the server. Each node requires a copy of Z to optimize X, and this value of Z will change on every iteration during optimization.

So, is there any computational advantage to broadcasting Z at each iteration over simply passing it as a parameter to each node? (Remember, Z changes on each iteration.) That is, which of the following snippets should I be implementing:

    for (int i = 0; i < iters; i++) {
        broadVar = sc.broadcast(Z);
        dataRDD.foreach(new VoidFunction<Data>() {
            public void call(Data d) {
                X = d.optimize(broadVar.value());
                accum.add(X);
            }
        });
        Z = optimize_Z(accum);
    }

*OR*

    for (int i = 0; i < iters; i++) {
        dataRDD.foreach(new VoidFunction<Data>() {
            public void call(Data d) {
                X = d.optimize(Z);
                accum.add(X);
            }
        });
        Z = optimize_Z(accum);
    }

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Repeated-Broadcasts-tp7977.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.