Re: Performance drops in Python PortableRunner tests

2020-01-02 Thread Kamil Wasilewski
Robert, you can find the pipeline for this particular test here: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_tests/pardo_test.py . The documentation for running this kind of test, including how to set up a Flink cluster, is on CWIKI: https://cwiki.apache.org/con

Re: Performance drops in Python PortableRunner tests

2019-12-20 Thread Pablo Estrada
The Jenkins jobs for the Flink load tests: https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy . The documentation for the test explains how to run it on each runner: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/load_te

Re: Performance drops in Python PortableRunner tests

2019-12-20 Thread Robert Bradshaw
Yes, it is possible that this had an influence: Reads are now all implemented as SDFs, and Creates involve a reshuffle to better redistribute data. Still, a change this large is quite surprising. Where is the pipeline for, say, "Python | ParDo | 2GB, 100 byte records, 10 iterations | Batch" and how does

Performance drops in Python PortableRunner tests

2019-12-20 Thread Kamil Wasilewski
Hi all, we have a couple of Python load tests running on Flink in which we test the performance of ParDo, GroupByKey, CoGroupByKey and Combine operations. Recently, I discovered that the runtime of all those tests rose significantly. It happened between the 6th and 7th of December (t