When to use streaming state and when an external storage?

2016-01-06 Thread Rado Buranský
What are the pros and cons of state in Spark Streaming, and what is the general idea behind it? By state I mean state created by "mapWithState" (or updateStateByKey). When should it be used and when not? Is it a good idea to accumulate state in jobs that run continuously for years? Example: remembering the IP addresses of returning visitors.

Re: Does state survive application restart in StatefulNetworkWordCount?

2016-01-04 Thread Rado Buranský
, Tathagata Das wrote:
> It does get recovered if you restart from checkpoints. See the example
> RecoverableNetworkWordCount.scala
>
> On Sat, Jan 2, 2016 at 6:22 AM, Rado Buranský wrote:
>
>> I am trying to understand how state in Spark Streaming works in general.
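
The point about checkpoints can be illustrated with a small plain-Python sketch. It is an analogy only and does not use the Spark API: the JSON file stands in for a checkpoint directory, and `update_count`, `run`, and the `recover` flag are hypothetical names made up for this illustration. The update function has the same shape as one passed to updateStateByKey (new values for a key plus the previous state, returning the new state).

```python
import json
import os
import tempfile


def update_count(new_values, previous):
    """Fold one batch of values for a key into its previous state (None if unseen)."""
    return sum(new_values) + (previous or 0)


def run(batches, checkpoint_path, recover):
    """Process batches of words, optionally starting from the saved state.

    Assumption: a JSON file plays the role of the checkpoint; Spark's real
    checkpoints also store the DStream graph and other metadata.
    """
    state = {}
    if recover and os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    for batch in batches:
        # Group the batch by key, as the stateful operators do per batch.
        grouped = {}
        for word in batch:
            grouped.setdefault(word, []).append(1)
        for word, ones in grouped.items():
            state[word] = update_count(ones, state.get(word))
        # Persist the state after every batch (stand-in for a checkpoint).
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state


checkpoint = os.path.join(tempfile.mkdtemp(), "state.json")
first = run([["a", "b"], ["a"]], checkpoint, recover=True)  # first run, nothing to recover
second = run([["a"]], checkpoint, recover=True)             # restart from the checkpoint
fresh = run([["a"]], checkpoint, recover=False)             # restart ignoring the checkpoint
```

Here `second` continues from the counts accumulated by `first`, while `fresh` starts from empty state, which mirrors the difference between restarting from checkpoints and simply running the program again.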

Does state survive application restart in StatefulNetworkWordCount?

2016-01-02 Thread Rado Buranský
I am trying to understand how state in Spark Streaming works in general. If I run this example program twice, will the second run see the state from the first run? https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala It s