Hi,
I'm in the process of migrating our DStream jobs to structured streaming and
I'm looking for an advise on how to provide initial state for the
mapGroupsWithState, similarly to what DStream's mapWithState does:
stream.mapWithState(StateSpec.function(updateState
_
.
Is there an alternative method to initialize state?
InputQueueStream joined to window would seem to work, but InputQueueStream does
not allow checkpointing
Sent from Outlook Mail
From: Tathagata Das
Sent: Sunday, November 22, 2015 8:01 PM
To: Bryan
Cc: user
Subject: Re: Initial State
There is
There is a way. Please see the scala docs.
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions
The first version of updateStateByKey has the parameter "initialRDD"
On Fri, Nov 20, 2015 at 6:52 PM, Bryan wrote:
> All,
>
> Is there a wa
All,
Is there a way to introduce an initial RDD without doing updateStateByKey? I
have an initial set of counts, and the algorithm I am using requires that I
accumulate additional counts from streaming data, age off older counts, and
make some calculations on them. The accumulation of counts us
0, 2015 at 2:07 AM, Imran Alam wrote:
>
>> We have a streaming job that makes use of reduceByKeyAndWindow
>> <https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L334-L341>.
>> We want this t
streaming/dstream/PairDStreamFunctions.scala#L334-L341>.
> We want this to work with an initial state. The idea is to avoid losing
> state if the streaming job is restarted, also to take historical data into
> account for the windows. But reduceByKeyAndWindow doesn't accept any
>
We have a streaming job that makes use of reduceByKeyAndWindow
<https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L334-L341>.
We want this to work with an initial state. The idea is to avoid losing
state
In Spark Streaming, is there a way to initialize the state
of updateStateByKey before it starts processing RDDs? I noticed that there
is an overload of updateStateByKey that takes an initialRDD in the latest
sources (although not in the 1.2.0 release). Is there another way to do
this until this fea