I'm updating the Broadcast between batches, but I've ended up doing it in a listener, thanks!
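For anyone finding this thread later: a minimal sketch of what a listener-based refresh might look like. Everything here is an assumption about the approach described above, not Amit's actual code — in particular `loadReferenceData()` is a hypothetical stand-in for however the new broadcast value is obtained:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Driver-side holder for a broadcast that is re-published between batches.
object RefData {
  @volatile private var instance: Broadcast[Map[String, String]] = _

  def get(sc: SparkContext): Broadcast[Map[String, String]] = {
    if (instance == null) refresh(sc)
    instance
  }

  def refresh(sc: SparkContext): Unit = synchronized {
    if (instance != null) instance.unpersist(blocking = false)
    instance = sc.broadcast(loadReferenceData())
  }

  // Hypothetical: load the fresh reference data from wherever it lives.
  private def loadReferenceData(): Map[String, String] = Map.empty
}

// Refresh the broadcast on the driver once each batch completes.
class RefreshListener(sc: SparkContext) extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit =
    RefData.refresh(sc)
}
```

Registered with `ssc.addStreamingListener(new RefreshListener(ssc.sparkContext))`; since the listener runs on the driver, it always has a live SparkContext, which sidesteps the stale-context problem after checkpoint recovery.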
On Wed, Feb 8, 2017 at 12:31 AM Tathagata Das <tathagata.das1...@gmail.com> wrote:

> broadcasts are not saved in checkpoints. so you have to save it externally
> yourself, and recover it before restarting the stream from checkpoints.
>
> On Tue, Feb 7, 2017 at 3:55 PM, Amit Sela <amitsel...@gmail.com> wrote:
>
> I know this approach; the only thing is, it relies on the transformation
> being an RDD transformation as well, so it can be applied via foreachRDD,
> using the rdd's context to avoid a stale context after recovery/resume.
> My question is how to avoid a stale context in a DStream-only transformation
> such as updateStateByKey / mapWithState?
>
> On Tue, Feb 7, 2017 at 9:19 PM Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>
> It's documented here:
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-broadcast-variables-and-checkpoints
>
> On Tue, Feb 7, 2017 at 8:12 AM, Amit Sela <amitsel...@gmail.com> wrote:
>
> Hi all,
>
> I was wondering if anyone has ever used a broadcast variable within
> an updateStateByKey op.? Using it is straightforward, but I was wondering
> how it'll work after resuming from a checkpoint (using the rdd.context()
> trick is not possible here)?
>
> Thanks,
> Amit
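For context, the pattern the linked guide section describes is a lazily instantiated singleton, roughly like the sketch below (an illustration, not the guide's exact code — the `Seq("a", "b", "c")` payload is a placeholder). It survives checkpoint recovery because after a restart the JVM field is null, so the broadcast is simply re-created from the live context the next time driver code asks for it:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Lazily instantiated broadcast singleton. After recovery from a checkpoint
// `instance` is null again, so getInstance re-broadcasts with the new,
// non-stale SparkContext instead of deserializing a dead one.
object WordBlacklist {
  @volatile private var instance: Broadcast[Seq[String]] = _

  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.broadcast(Seq("a", "b", "c")) // placeholder payload
        }
      }
    }
    instance
  }
}
```

As the thread notes, this works naturally inside foreachRDD (where `rdd.sparkContext` gives driver-side access each batch), which is exactly why it does not directly carry over to a DStream-only op like updateStateByKey, whose update function runs on the executors.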