Hi Gregory,

I ran into a similar issue when dealing with historical data. We chose a
Lambda architecture and worked out a use-case-specific hand-off protocol.
Unless the storage side can replay logs within a time range, streaming
application authors still need to do extra work to implement a batching
layer
<https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/examples.html>

What we learned is that backfilling historical log streams can be too
expensive/inefficient for a streaming framework to handle, since streaming
frameworks focus on optimizing for unbounded streams.
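To make the hand-off idea concrete, here is a minimal sketch (plain Python for illustration, not the Flink API; the function name `bootstrap_then_stream` is hypothetical): fold the bounded historic data into the aggregate, note the latest event timestamp as a cutoff, and when switching to the live stream drop anything at or before that cutoff so overlapping records are not counted twice.

```python
def bootstrap_then_stream(historic, live):
    """Sketch of a bootstrap-then-stream hand-off over (timestamp, value) pairs."""
    total = 0
    cutoff = float("-inf")
    # Batch phase: consume the bounded historic data set and track the
    # latest event timestamp seen, which becomes the hand-off cutoff.
    for ts, value in historic:
        total += value
        cutoff = max(cutoff, ts)
    # Streaming phase: only apply events strictly newer than the cutoff,
    # so records present in both sources are not double-counted.
    for ts, value in live:
        if ts > cutoff:
            total += value
    return total

historic = [(1, 10), (2, 20), (3, 30)]
live = [(2, 20), (3, 30), (4, 40)]   # overlaps the tail of the historic data
print(bootstrap_then_stream(historic, live))  # 100: overlap at ts 2 and 3 skipped
```

In practice the cutoff would have to come from the storage side (e.g. the last offset or partition covered by the S3 backfill), which is exactly the use-case-specific part of the protocol.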

Hope it helps.

Chen

On Thu, Jan 25, 2018 at 12:49 PM, Gregory Fee <g...@lyft.com> wrote:

> Hi group, I want to bootstrap some aggregates based on historic data in S3
> and then keep them updated based on a stream. To do this I was thinking of
> doing something like processing all of the historic data, doing a save
> point, then restoring my program from that save point but with a stream
> source instead. Does this seem like a reasonable approach or is there a
> better way to approach this functionality? There does not appear to be a
> straightforward way of doing it the way I was thinking so
> any advice would be appreciated.
>
> --
> *Gregory Fee*
> Engineer
> 425.830.4734 <+14258304734>
> [image: Lyft] <http://www.lyft.com>
>
