Savepoints with bootstraping a datastream function

2021-07-05 Thread Rakshit Ramesh
I'm trying to bootstrap state into a KeyedProcessFunction equivalent that takes in a DataStream but I'm unable to find a reference for the same. I found this gist https://gist.github.com/alpinegizmo/ff3d2e748287853c88f21259830b29cf But it seems to only apply for DataSet. My usecase is to manually t

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Arvid Heise
Hi Rakshit, The example is valid. The state processor API is kinda working like a DataSet application but the state is meant to be read in DataStream. Please check out the SavepointWriterITCase [1] for a full example. There is no checkpoint/savepoint in DataSet applications. Checkpoints can be st

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Rakshit Ramesh
Yes I could understand restoring a savepoint to a datastream. What I couldn't figure out is to create a NewSavepoint for a datastream. What I understand is that NewSavepoints only take in Bootstrap transformation for Dataset Transform functions. About the checkpoints, does CheckpointConfig.Exter

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Arvid Heise
I don't quite understand your question. You use Savepoint API to create a savepoint with a batch job (that's why it's DataSet Transform currently). That savepoint can only be restored through a datastream application. Dataset applications cannot start from a savepoint. So I don't understand why yo

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Rakshit Ramesh
Sorry for being a little vague there. I want to create a Savepoint from a DataStream right before the job is finished or cancelled. What you have shown in the IT case is how a datastream can be bootstrapped with state that is formed formed by means of DataSet. My jobs are triggered by a scheduler p

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Arvid Heise
Hi Rakshit, It sounds to me as if you don't need the Savepoint API at all. You can (re)start all applications with the previous state (be it retained checkpoint or savepoint). You just need to provide the path to that in your application invocation [1] (every entry point has such a parameter, you

Re: Savepoints with bootstraping a datastream function

2021-07-07 Thread Rakshit Ramesh
Yes! I was only worried about the jobid changing and the checkpoint being un-referenceable. But since I can pass a path to the checkpoint that will not be an issue. Thanks a lot for your suggestions! On Thu, 8 Jul 2021 at 11:26, Arvid Heise wrote: > Hi Rakshit, > > It sounds to me as if you do

Re: Savepoints with bootstraping a datastream function

2021-07-10 Thread Rakshit Ramesh
Hi Arvid, Since I'm trying to save checkpoints for a bounded process the checkpoint isn't being written on time since the job finishes before that can happen. Looks like one major feature that would be required for this to work is discussed in FLIP-147 https://cwiki.apache.org/confluence/display/F

Re: Savepoints with bootstraping a datastream function

2021-07-13 Thread Arvid Heise
The only known workaround is to provide your own source(function) that doesn't finish until all of the source subtasks finish and a final checkpoint is completed. However, coordinating the sources with the old SourceFunction interface requires some external tool. FLIP-147 is targeted for 1.14 in A

Re: Savepoints with bootstraping a datastream function

2021-07-13 Thread Rakshit Ramesh
That's great news! Thanks. On Tue, 13 Jul 2021 at 14:17, Arvid Heise wrote: > The only known workaround is to provide your own source(function) that > doesn't finish until all of the source subtasks finish and a final > checkpoint is completed. However, coordinating the sources with the old > So

Re: Savepoints with bootstraping a datastream function

2021-09-17 Thread Rakshit Ramesh
Hi Arvid. I went through the code, confluence and jira on FLIP-147. I couldn't determine if it's possible to manually trigger a savepoint/checkpoint as I couldn't find any javadoc apis for the same. Also, would the setting "ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH" still create a checkpoint if my enti

Re: Re: Savepoints with bootstraping a datastream function

2021-09-17 Thread Yun Gao
--Original Mail -- Sender:Rakshit Ramesh Send Date:Fri Sep 17 17:20:40 2021 Recipients:Arvid Heise CC:user Subject:Re: Savepoints with bootstraping a datastream function Hi Arvid. I went through the code, confluence and jira on FLIP-147. I couldn't dete

Re: Re: Savepoints with bootstraping a datastream function

2021-09-20 Thread Rakshit Ramesh
t; *Send Date:*Fri Sep 17 17:20:40 2021 > *Recipients:*Arvid Heise > *CC:*user > *Subject:*Re: Savepoints with bootstraping a datastream function > >> Hi Arvid. >> I went through the code, confluence and jira on FLIP-147. >> I couldn't determine if it's possible