Hi Joern,

Many thanks for sharing the detailed scenarios! They are very inspiring.
If I understand correctly, could it be summarized as follows?

1. There is a batch job that first initializes the state. That state is then used in stream mode, and the stream pipeline is different from the batch job.
2. Currently this is implemented by extracting the state, writing it to a sink and loading it again on startup. This can be inconvenient because it requires additional development and may cause performance issues, since the state is large.

We will give this scenario some more thought.

Best,
Yun

------------------Original Mail ------------------
Sender: Joern Kottmann <kottm...@gmail.com>
Send Date: Tue Dec 7 16:58:03 2021
Recipients: Yun Gao <yungao...@aliyun.com>
CC: vtygoss <vtyg...@126.com>, Alexander Preuß <alexanderpre...@ververica.com>, user@flink.apache.org <user@flink.apache.org>
Subject: Re: Re: Re: how to run streaming process after batch process is completed?

Hello,

One of the applications Spire [1] uses Flink for is processing AIS [2] data collected by our satellites and from other sources. AIS transmits a ship's static and dynamic information, such as names, call signs or positions.

One of the challenges of processing AIS data is that there are no unique keys, since the MMSI or IMO can be spoofed or is sometimes shared between vessels. To deal with multiple vessels per MMSI, we use a Keyed Process Function that keeps state per detected vessel. The data about each vessel is stored in the state of the function and is hard to transfer out of the batch processing.

Batch processing really helps us collect data about a vessel and is therefore necessary before we can switch to stream mode. Since the state and the outputs are not the same, the state for stream mode can't be reconstructed by feeding the outputs back into the pipeline via some source. Therefore we need code in our batch job just to deal with extracting the state.
A vessel is usually emitted for each update received for it, but emitting it together with its entire state is undesirable in batch mode for performance reasons. Also, some vessels should never be emitted but still need to be restored. The pipeline has a couple of stateful functions, and the more we add, the harder it gets to restore the state.

Best,
Jörn