Robert, you can find the pipeline of this particular test here:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py
The documentation for running this kind of test, including how to set up a
Flink cluster, is on CWIKI:
https://cwiki.apache.org/con
The Jenkins jobs for the Flink load tests:
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy
The test's documentation also describes how to run it on each runner:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_te
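For illustration, an invocation might look roughly like the sketch below
(the flags and values are my assumptions for the "2GB, 100 byte records,
10 iterations" configuration; the module docstring and the CWIKI page
have the authoritative command):

    # Hypothetical command; 20M records x 100 bytes each (10-byte keys,
    # 90-byte values) is roughly the 2GB configuration.
    python -m apache_beam.testing.load_tests.pardo_test \
        --runner=PortableRunner \
        --job_endpoint=localhost:8099 \
        --environment_type=DOCKER \
        --iterations=10 \
        --input_options='{"num_records": 20000000, "key_size": 10, "value_size": 90}'
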
Yes, it is possible that this had an influence: Reads are now all
implemented as SDFs, and Creates involve a reshuffle to better
redistribute data. Still, this much of a change is quite surprising. Where
is the pipeline for, say, "Python | ParDo | 2GB, 100 byte records, 10
iterations | Batch", and how does one run it?
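
To make the Create change concrete, here is a minimal sketch using
standard Beam Python APIs (an explicit beam.Reshuffle() stands in for the
reshuffle that Create now performs internally; this is an illustration,
not the load test code):

    import apache_beam as beam

    with beam.Pipeline() as p:
        _ = (
            p
            # Create is now executed as an SDF-based read under the hood.
            | 'Create' >> beam.Create(range(1000))
            # Reshuffle breaks fusion and redistributes elements across
            # workers; this extra shuffle is a cost the old Create
            # translation did not incur.
            | 'Redistribute' >> beam.Reshuffle()
            | 'Process' >> beam.Map(lambda x: x))
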
Hi all,
We have a couple of Python load tests running on Flink in which we are
testing the performance of ParDo, GroupByKey, CoGroupByKey and Combine
operations.
Recently, I've discovered that the runtime of all those tests rose
significantly. It happened between the 6th and 7th of December (t