Are you talking about reduceByKeyAndWindow with or without inverse reduce? TD
On Fri, Jul 10, 2015 at 2:07 AM, Imran Alam <im...@newscred.com> wrote: > We have a streaming job that makes use of reduceByKeyAndWindow > <https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L334-L341>. > We want this to work with an initial state. The idea is to avoid losing > state if the streaming job is restarted, also to take historical data into > account for the windows. But reduceByKeyAndWindow doesn't accept any > "initialRDD" parameter like updateStateByKey > <https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L435-L445> > does. > > The plan is to extend reduceByKeyAndWindow to accept an "initalRDDs" > parameter, so that the DStream starts with those RDDs as initial value of > "generatedRDD" rather than an empty map. But the "generatedRDD" is a > private variable, so I'm bit confused on how to proceed with the plan. > >