RE: Should I avoid "state" in an Spark application?

Haopu Wang Sun, 12 Jun 2016 18:13:04 -0700

Can someone look at my questions? Thanks again!

________________________________

From: Haopu Wang 
Sent: June 12, 2016 16:40
To: user@spark.apache.org
Subject: Should I avoid "state" in an Spark application?

I have a Spark application whose structure is below:

    var ts: Long = 0L

    dstream1.foreachRDD {
        (x, time) => {
            // time is a Spark Streaming Time; store its millisecond value in the Long
            ts = time.milliseconds
            x.do_something()...
        }
    }

    ......

    process_data(dstream2, ts, ......)

I assume the foreachRDD function call can update the "ts" variable, which is 
then used in the Spark tasks of the "process_data" function.

In my tests on a standalone Spark cluster, it works. But should I be 
concerned if I switch to YARN?

I have also seen some articles that recommend avoiding state in Scala 
programming. Without the state variable, how could this be done?

Any comments or suggestions are appreciated.

Thanks,

Haopu
