Re: Re: Bootstrapping the state

2018-07-23 Thread Fabian Hueske
Hi Henkka, You might want to consider implementing a dedicated job for state bootstrapping that uses the same operator UUID and state names. That might be easier than integrating the logic into your regular job. I think you have to use the monitoring file source because AFAIK it won't be possible

Re: Re: Bootstrapping the state

2018-07-22 Thread Henri Heiskanen
Hi, With state bootstrapping I mean loading the state with initial data before starting the actual job. For example, in our case I would like to load information like registration date of our users (>5 years of data) so that I can enrich our event data in streaming (5 days retention). Before watc

Re: Re: Bootstrapping the state

2018-07-20 Thread Vino yang
Hi Henkka, If you want to customize the datastream text source for your purpose. You can use a read counter, if the value of counter would not change in a interval you can guess all the data has been read. Just a idea, you can choose other solution. About creating a savepoint automatically on jo

Re: Bootstrapping the state

2018-07-20 Thread Henri Heiskanen
Hi, Thanks. Just to clarify, where would you then invoke the savepoint creation? I basically need to know when all data is read, create a savepoint and then exit. I think I could just as well use the PROCESS_CONTINUOSLY, monitor the stream outside and then use rest api to cancel with savepoint. A

Re: Bootstrapping the state

2018-07-19 Thread vino yang
Hi Henkka, The behavior of the text file source meets expectation. Flink will not keep your source task thread when it exit from it's invoke method. That means you should keep your source task alive. So to implement this, you should customize a text file source (implement SourceFunction interface)