I'm trying to use Spark (Java) for an optimization algorithm that needs repeated server-node exchanges of information (the ADMM algorithm, for whoever is familiar). In each iteration, I need to update a set of values on the nodes and collect them on the server, which then updates its own set of values and passes these back to ALL nodes.
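To make the iteration structure concrete, here is a minimal single-machine sketch of the loop I have in mind, independent of Spark. The solvers optimizeX and optimizeZ are hypothetical stand-ins (a simple averaging consensus) for the real subproblem updates:

    import java.util.Arrays;

    // Sketch of the ADMM-style loop: each "node" n updates x_n given the
    // current Z, the driver collects the X values, then recomputes Z.
    public class AdmmLoopSketch {
        // Hypothetical per-node update: pull x toward the consensus value z.
        static double optimizeX(double x, double z) {
            return 0.5 * (x + z);
        }

        // Hypothetical server update: average the collected X values.
        static double optimizeZ(double[] xs) {
            return Arrays.stream(xs).average().orElse(0.0);
        }

        static double run(double[] xs, double z, int iters) {
            for (int i = 0; i < iters; i++) {
                // In Spark this inner loop is the foreach over dataRDD,
                // with Z either broadcast or captured in the closure.
                for (int n = 0; n < xs.length; n++) {
                    xs[n] = optimizeX(xs[n], z); // node-side update of X
                }
                z = optimizeZ(xs);               // server-side update of Z
            }
            return z;
        }

        public static void main(String[] args) {
            double[] xs = {0.0, 2.0, 4.0};
            System.out.println(run(xs, 0.0, 20)); // settles at the consensus value
        }
    }

The point is just that Z is recomputed on the driver every iteration, so whatever mechanism ships it to the nodes has to run once per iteration too.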
Say each node optimizes a variable X = {x1, x2, x3, ...} while the server optimizes a variable Z = {z1, z2, z3, ...}. I am currently using an Accumulable object to collect the updated X's from each node into an array maintained on the server. Each node requires a copy of Z to optimize X, and this value of Z will change on every iteration during optimization.

So, is there any computational advantage to broadcasting Z at each iteration over simply passing it as a parameter to each node? (Remember, Z changes on each iteration.) That is, which of the following snippets should I be implementing:

    for (int i = 0; i < iters; i++) {
        broadVar = sc.broadcast(Z);
        dataRDD.foreach(new VoidFunction<Data>() {
            public void call(Data d) {
                X = d.optimize(broadVar.value());
                accum.add(X);
            }
        });
        Z = optimize_Z(accum);
    }

*OR*

    for (int i = 0; i < iters; i++) {
        dataRDD.foreach(new VoidFunction<Data>() {
            public void call(Data d) {
                X = d.optimize(Z);
                accum.add(X);
            }
        });
        Z = optimize_Z(accum);
    }

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Repeated-Broadcasts-tp7977.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.