But when I move the broadcast variable outside the for loop, it works well (setting aside the memory issue you pointed out):

    var rdd1 = ...
    var rdd2 = ...
    var kv = ...
    var kvGlobal = sc.broadcast(kv)       // broadcast the initial kv
    for (i <- 0 until n) {
      rdd1 = rdd2.map {
        case t => doSomething(t, kvGlobal.value)
      }.cache()
      var tmp = rdd1.reduceByKey().collect()
      kv = updateKV(tmp)                  // update kv for each iteration
      kvGlobal = sc.broadcast(kv)         // re-broadcast the updated kv
      rdd2 = rdd1
    }
    rdd2.saveAsTextFile()
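As for the memory issue, here is a minimal sketch of the same loop that releases the executor-side copies of the previous broadcast once the new one has been created. It is only an illustration under stated assumptions: the data, the local master, and the bodies of doSomething and updateKV are hypothetical stand-ins, and it assumes a Spark version where pair-RDD operations such as reduceByKey are available without extra imports. Each iteration also captures its broadcast in a local val, so the closure refers to the broadcast it was built with rather than to the reassigned var.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastLoopSketch {
  // Hypothetical stand-in for doSomething: scale each value by the weight of its key.
  def doSomething(t: (String, Double), kv: Map[String, Double]): (String, Double) =
    (t._1, t._2 * kv.getOrElse(t._1, 1.0))

  // Hypothetical stand-in for updateKV: normalise the per-key sums into new weights.
  def updateKV(tmp: Array[(String, Double)]): Map[String, Double] = {
    val total = tmp.map(_._2).sum
    tmp.map { case (k, v) => k -> v / total }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions made for this sketch only.
    val conf = new SparkConf().setAppName("broadcast-loop-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val n = 3

    var rdd2: RDD[(String, Double)] =
      sc.parallelize(Seq("a" -> 1.0, "b" -> 2.0, "a" -> 3.0))
    var kv = Map("a" -> 1.0, "b" -> 1.0)
    var kvGlobal = sc.broadcast(kv)            // broadcast the initial kv

    for (_ <- 0 until n) {
      val current = kvGlobal                   // pin this iteration's broadcast in a val
      val rdd1 = rdd2.map(t => doSomething(t, current.value)).cache()
      val tmp = rdd1.reduceByKey(_ + _).collect()  // action: materialises and caches rdd1
      kv = updateKV(tmp)                       // update kv for each iteration
      kvGlobal = sc.broadcast(kv)              // broadcast the updated kv
      current.unpersist()                      // drop executor copies of the old broadcast
      rdd2 = rdd1
    }

    rdd2.collect().foreach(println)
    sc.stop()
  }
}
```

The sketch uses unpersist() rather than destroy() on purpose: rdd2 still has the old broadcast in its lineage, and an unpersisted broadcast can be sent to the executors again if a cached partition is lost and has to be recomputed, whereas a destroyed one cannot be used again.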
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479p5497.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.