Hi Vijay,
one comment to add: performance might suffer with multiple map() calls.
For safety reasons, records between chained operators are serialized and
deserialized so that they cannot influence each other. If all functions
in a pipeline are guaranteed not to modify incoming objects, you can
enable object reuse mode (see enableObjectReuse) [1].
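A minimal sketch of switching this on (assuming a standard
StreamExecutionEnvironment setup; only safe when no user function in the
chain mutates or holds on to its input records):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ObjectReuseExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Skip the defensive serialize/deserialize step between chained
        // operators; records are then passed along by reference.
        env.getConfig().enableObjectReuse();
    }
}
```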
Please correct me if I'm wrong.
Regards,
Timo
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/execution_configuration.html
On 07.09.20 18:02, Dawid Wysakowicz wrote:
Hi,
You can see the execution plan via
StreamExecutionEnvironment#getExecutionPlan(). You can visualize it
in [1]. You can also submit your job and check the execution plan in the Web UI.
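A sketch of obtaining the plan (getExecutionPlan() returns a JSON string
describing the job graph, which you can paste into the visualizer; the
tiny pipeline here is just a placeholder):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PlanExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1, 2, 3)
           .map(new MapFunction<Integer, Integer>() {
               @Override
               public Integer map(Integer v) {
                   return v * 2;
               }
           })
           .print();
        // JSON representation of the job graph; paste into [1].
        System.out.println(env.getExecutionPlan());
    }
}
```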
As for which option is preferred, that is quite subjective. As long as
both maps in option b) are chained, there will not be much difference in
how the two options behave. Executing map(2) will be just a method call.
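That point can be illustrated outside Flink: once two map operators are
chained into the same task, running them is equivalent to composing the
two functions and calling the result once per record (a plain-Java
sketch with made-up stand-in functions, not Flink code):

```java
import java.util.function.Function;

public class ChainingSketch {
    public static void main(String[] args) {
        Function<Integer, Integer> convert = x -> x + 1;    // stand-in for map(1)
        Function<Integer, String> toAvro = x -> "rec:" + x; // stand-in for map(2)

        // Option a) one map doing both steps.
        Function<Integer, String> single = x -> toAvro.apply(convert.apply(x));
        // Option b) two chained maps: chaining reduces this to the same
        // thing, i.e. a direct method call per record.
        Function<Integer, String> chained = convert.andThen(toAvro);

        System.out.println(single.apply(41));  // rec:42
        System.out.println(chained.apply(41)); // rec:42
    }
}
```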
You could also think about completely separating the tasks:
convertedFields = datastream.map(this::convertFields);
convertedFields.addSink(... /* write to DB */);
convertedFields.map(... /* convert to Avro */);
Best,
Dawid
[1] https://flink.apache.org/visualizer/
On 05/09/2020 01:13, Vijayendra Yadav wrote:
Hi Team,
I have a generic Question.
Let's say I have 2 Actions to be taken on Flink DATASTREAM (Kafka).
1) Convert some data fields and write them to an external database
2) Transform the converted fields from #1 into a different record
format, say Avro
Here are two approaches that are possible:
a) One map function doing both actions #1 and #2:
datastream.map(1)
b) Two maps doing separate actions #1 and #2:
datastream.map(1).map(2)
Which one would you prefer?
Is there a way I can see the execution plan?
Regards,
Vijay