Hi Gregory, I had a similar issue when dealing with historical data. We chose a Lambda architecture and worked out a use-case-specific hand-off protocol. Unless the storage side can support replaying logs within a time range, streaming application authors still need to do extra work to implement a batch layer <https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/examples.html>
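The hand-off idea can be sketched outside of any framework: fold historical events up to a cutoff timestamp into the aggregate in a batch pass, then let the streaming pass update the same aggregate while dropping anything at or before the cutoff so nothing is double-counted. This is a minimal illustration, not Flink API; the event shape, the cutoff field, and the per-key sum are all assumptions.

```java
import java.util.*;

public class HandOffSketch {
    // A minimal event: key, event-time timestamp, value (illustrative shape).
    record Event(String key, long ts, long value) {}

    // Batch pass: fold all historical events with ts <= cutoff into the aggregate.
    static Map<String, Long> bootstrap(List<Event> historical, long cutoff) {
        Map<String, Long> agg = new HashMap<>();
        for (Event e : historical) {
            if (e.ts() <= cutoff) agg.merge(e.key(), e.value(), Long::sum);
        }
        return agg;
    }

    // Streaming pass: keep updating the same aggregate, but drop events at or
    // before the cutoff -- the batch pass has already counted those.
    static void applyStream(Map<String, Long> agg, Event e, long cutoff) {
        if (e.ts() > cutoff) agg.merge(e.key(), e.value(), Long::sum);
    }

    public static void main(String[] args) {
        long cutoff = 100;
        List<Event> history = List.of(new Event("a", 50, 1), new Event("a", 90, 2));
        Map<String, Long> agg = bootstrap(history, cutoff);     // a -> 3
        applyStream(agg, new Event("a", 90, 2), cutoff);        // replayed duplicate: ignored
        applyStream(agg, new Event("a", 150, 5), cutoff);       // live event: applied
        System.out.println(agg.get("a"));                       // prints 8
    }
}
```

The cutoff timestamp is the hand-off contract: the batch side owns everything up to it, the stream side everything after it, which is what makes replay overlap between the two sources safe.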
What we learned is that backfilling historical log streams may be too expensive or inefficient for a streaming framework to handle, since streaming frameworks focus on optimizing unbounded streams. Hope it helps.

Chen

On Thu, Jan 25, 2018 at 12:49 PM, Gregory Fee <g...@lyft.com> wrote:

> Hi group, I want to bootstrap some aggregates based on historic data in S3
> and then keep them updated based on a stream. To do this I was thinking of
> processing all of the historic data, taking a savepoint, then restoring my
> program from that savepoint but with a stream source instead. Does this
> seem like a reasonable approach, or is there a better way to achieve this
> functionality? There does not appear to be a straightforward way of doing
> it the way I was thinking, so any advice would be appreciated.
>
> --
> *Gregory Fee*
> Engineer
> 425.830.4734 <+14258304734>
> [image: Lyft] <http://www.lyft.com>