Can someone look at my questions? Thanks again!
________________________________
From: Haopu Wang
Sent: June 12, 2016 16:40
To: user@spark.apache.org
Subject: Should I avoid "state" in a Spark application?

I have a Spark application whose structure is below:

    var ts: Long = 0L
    dstream1.foreachRDD { (x, time) =>
      ts = time.milliseconds
      x.do_something()...
    }
    ......
    process_data(dstream2, ts, ......)

I assume the foreachRDD function call can update the "ts" variable, which is then used in the Spark tasks of the "process_data" function.

From my test results on a standalone Spark cluster, it is working. But should I be concerned if I switch to YARN?

Also, I saw some articles recommending that state be avoided in Scala programming. Without the state variable, how could that be done?

Any comments or suggestions are appreciated.

Thanks,
Haopu
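
For reference, here is a minimal sketch of what a version without the shared "ts" variable might look like. It is only an illustration under stated assumptions: the element type String and the names StatelessTimestamp and processData are hypothetical stand-ins, and it assumes dstream1 and dstream2 belong to the same StreamingContext and are not windowed, so that both see the same batch time. It relies on the DStream.transform overload that also receives the batch Time, which makes the timestamp a local value for each batch instead of a driver-side var:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

object StatelessTimestamp {
  // Hypothetical stand-in for process_data: pairs every record with the
  // time of the batch it arrived in, without reading any shared variable.
  def processData(stream: DStream[String]): DStream[(Long, String)] =
    stream.transform { (rdd: RDD[String], time: Time) =>
      // Evaluated on the driver once per batch; this immutable local value
      // is what gets captured by the closure shipped to the executors.
      val ts: Long = time.milliseconds
      rdd.map(record => (ts, record))
    }
}
```

With something like this, the call would become StatelessTimestamp.processData(dstream2), and the foreachRDD on dstream1 would no longer need to assign to "ts". Because nothing mutable is shared between the two streams, the result should not depend on whether the cluster manager is standalone or YARN.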