Iterator Data Sink

Mikhail Pryakhin Mon, 18 Mar 2019 07:10:38 -0700

Hello Flink community!

I've been using the "Iterator Data Sink"[1] approach to test the output of a 
streaming pipeline. The pipeline consists of a single ProcessFunction that 
emits some events to a side output. I'd like to collect both the primary and 
the side-output streams in my test, and I do so by calling 
DataStreamUtils#collect[2] on each of them. The problem is that 
DataStreamUtils#collect[2] itself calls StreamExecutionEnvironment#execute[3], 
which makes it impossible to collect output from both streams. 
The preferable behaviour would be for collect not to trigger pipeline 
execution and to leave that to the user. 
What do you think? I'd be happy to submit a PR.
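
To make the proposed behaviour concrete, here is a minimal plain-Java sketch of the idea: collecting only registers a buffer, and the caller triggers execution once. It is a toy model and does not use the Flink API. ToyPipeline, collectMain, collectSide and the even/odd routing rule are hypothetical names and assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeferredCollectSketch {

    /** Hypothetical stand-in for a pipeline; nothing here is Flink API. */
    static final class ToyPipeline {
        private final List<Integer> source;
        private final List<List<Integer>> mainBuffers = new ArrayList<>();
        private final List<List<Integer>> sideBuffers = new ArrayList<>();
        private int executions = 0;

        ToyPipeline(List<Integer> source) {
            this.source = source;
        }

        /** Registers a buffer for the primary output; does not run anything. */
        List<Integer> collectMain() {
            List<Integer> buffer = new ArrayList<>();
            mainBuffers.add(buffer);
            return buffer;
        }

        /** Registers a buffer for the side output; does not run anything. */
        List<Integer> collectSide() {
            List<Integer> buffer = new ArrayList<>();
            sideBuffers.add(buffer);
            return buffer;
        }

        /** Runs the pipeline once, filling every registered buffer. */
        void execute() {
            executions++;
            for (Integer value : source) {
                // Assumed routing rule: even values go to the primary output,
                // odd values go to the side output.
                List<List<Integer>> targets = (value % 2 == 0) ? mainBuffers : sideBuffers;
                for (List<Integer> buffer : targets) {
                    buffer.add(value);
                }
            }
        }

        int executions() {
            return executions;
        }
    }

    public static void main(String[] args) {
        ToyPipeline pipeline = new ToyPipeline(Arrays.asList(1, 2, 3, 4));
        // Both outputs are registered before anything runs.
        List<Integer> primary = pipeline.collectMain();
        List<Integer> side = pipeline.collectSide();
        // The caller, not collect, decides when to execute.
        pipeline.execute();
        System.out.println("primary=" + primary + " side=" + side
                + " executions=" + pipeline.executions());
    }
}
```

In this model a single execute() call fills both buffers, which is the behaviour I would like a test to be able to rely on.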


[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#iterator-data-sink
[2]https://github.com/apache/flink/blob/e07fc39d4bb15dabdedb2eb80b862646de32d82c/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamUtils.java#L85
[3]https://github.com/apache/flink/blob/e07fc39d4bb15dabdedb2eb80b862646de32d82c/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamUtils.java#L158

Kind Regards,
Mike Pryakhin

