Re: Dataset using several count operator in the same environment

Timo Walther Wed, 29 Nov 2017 07:24:33 -0800

Hi Ebru,

the count() operator is a very simple utility function that calls execute() internally. If you want to have a more complex pipeline, you can take a look at how our WordCount [0] example works. The general concept is to emit a 1 for every record and sum the ones in parallel. If you need an overall count, you need to set the parallelism of the last operator to 1 (operator(xxx).setParallelism(1)), but this means that the last operator is not executed in parallel anymore.
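
As a rough illustration of the "emit a 1 for every record and sum the ones" idea, here is a minimal sketch in plain Java using parallel streams. It is not Flink code and does not use the DataSet API; the class name CountSketch and the method countByOnes are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the counting pattern only; this is NOT Flink code.
// CountSketch and countByOnes are hypothetical names chosen for this sketch.
class CountSketch {

    // "Map" step: emit 1L for every record.
    // "Reduce" step: sum the ones; the parallel stream combines partial sums.
    static <T> long countByOnes(List<T> records) {
        return records.parallelStream()
                .mapToLong(record -> 1L)
                .sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(countByOnes(words));
    }
}
```

In a Flink job the same shape would be expressed with a map followed by a sum or reduce, as the WordCount example [0] does per word.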

It might also make sense to take a look at Flink's Table & SQL API [1], which makes such operations easier.


Hope that helps.

Regards,
Timo

[0] https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/index.html



On 11/29/17 at 3:26 PM, ebru wrote:

Hi all,

We are trying to use more than one count operator on a DataSet, but it executes
the first count and skips the other operations. We also call env.execute().
How can we solve this problem?

-Ebru
