Hi Henkka,
You might want to consider implementing a dedicated job for state
bootstrapping that uses the same operator UUID and state names. That might
be easier than integrating the logic into your regular job.
I think you have to use the monitoring file source because AFAIK it won't
be possible
Hi,
With state bootstrapping I mean loading the state with initial data before
starting the actual job. For example, in our case I would like to load
information like registration date of our users (>5 years of data) so that
I can enrich our event data in streaming (5 days retention).
Before watc
Hi Henkka, If you want to customize the datastream text source for your
purpose. You can use a read counter, if the value of counter would not change
in a interval you can guess all the data has been read. Just a idea, you can
choose other solution. About creating a savepoint automatically on jo
Hi,
Thanks. Just to clarify, where would you then invoke the savepoint
creation? I basically need to know when all data is read, create a
savepoint and then exit. I think I could just as well use the
PROCESS_CONTINUOSLY, monitor the stream outside and then use rest api to
cancel with savepoint.
A
Hi Henkka,
The behavior of the text file source meets expectation. Flink will not keep
your source task thread when it exit from it's invoke method. That means
you should keep your source task alive. So to implement this, you should
customize a text file source (implement SourceFunction interface)