But when I create the broadcast variable outside the for loop, it works well
(leaving aside the memory issue you pointed out):
  var rdd1 = ...
  var rdd2 = ...
  var kv = ...
  var kvGlobal = sc.broadcast(kv)        // broadcast kv before the loop
  for (i <- 0 until n) {
    rdd1 = rdd2.map {
      case t => doSomething(t, kvGlobal.value)
    }.cache()
    var tmp = rdd1.reduceByKey().collect()
    kv = updateKV(tmp)                   // update kv for each iteration
    kvGlobal = sc.broadcast(kv)          // re-broadcast the updated kv
    rdd2 = rdd1
  }
  rdd2.saveAsTextFile()
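
On the memory issue: a minimal sketch of one possible way to keep old broadcasts from piling up, assuming it is acceptable to call unpersist() on the previous broadcast once the cached rdd1 has been materialized by collect(). The bodies of doSomething and updateKV, the sample data, and the local[2] master are hypothetical stand-ins and not the real job. Older Spark releases may additionally need `import org.apache.spark.SparkContext._` for reduceByKey.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

object BroadcastLoopSketch {
  // Hypothetical stand-in for doSomething: scale each value by the weight of its key.
  def doSomething(t: (String, Double), kv: Map[String, Double]): (String, Double) =
    (t._1, t._2 * kv.getOrElse(t._1, 1.0))

  // Hypothetical stand-in for updateKV: normalize the per-key sums into new weights.
  def updateKV(sums: Array[(String, Double)]): Map[String, Double] = {
    val total = sums.map(_._2).sum
    sums.map { case (k, v) => k -> v / total }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only for trying the pattern out.
    val conf = new SparkConf().setAppName("broadcast-loop-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    var rdd2: RDD[(String, Double)] =
      sc.parallelize(Seq("a" -> 1.0, "b" -> 2.0, "a" -> 3.0))
    var kv = Map("a" -> 1.0, "b" -> 1.0)
    var kvGlobal: Broadcast[Map[String, Double]] = sc.broadcast(kv)
    val n = 3

    for (_ <- 0 until n) {
      // Capture the current broadcast in a val so the closure does not
      // depend on later reassignments of the var.
      val bc = kvGlobal
      val rdd1 = rdd2.map(t => doSomething(t, bc.value)).cache()
      val tmp = rdd1.reduceByKey(_ + _).collect() // materializes rdd1
      kv = updateKV(tmp)
      // Drop executor copies of the old broadcast; it is re-sent on demand
      // if a lost partition ever has to be recomputed.
      bc.unpersist()
      kvGlobal = sc.broadcast(kv)
      rdd2 = rdd1
    }

    rdd2.collect().foreach(println)
    sc.stop()
  }
}
```

unpersist() is used here rather than destroy() because a destroyed broadcast cannot be used again, and the lineage of the cached RDDs still refers to the earlier broadcasts.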



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479p5497.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
