When it comes down to the actual runtime, what really matters is the plan optimization and the operator implementation and shuffling. You might be interested in this blog post: https://flink.apache.org/2022/05/06/exploring-the-thread-mode-in-pyflink/, which benchmarks the latter using a common JSON processing scenario with UDFs written in Java, in Python under thread mode, and in Python under process mode.
Best,
Zhanghao Chen
________________________________
From: Niklas Wilcke
Sent: Monday, April 15, 2024 15:17
To: user
Subject: Pyflink Performance and Benchmark

Hi Flink Community,

I wanted to reach out to you to get some input about Pyflink performance. Are there any resources available about Pyflink benchmarks and maybe a comparison with the Java API? I wasn't able to find something valuable, but maybe I missed something?

I am aware that benchmarking in this case is really dependent and that a general statement is difficult. I'm rather looking for numbers to get a first impression or maybe a framework to do some benchmarking on my own.

Any help is highly appreciated. Thank you!

Kind regards,
Niklas