Re: PyFlink performance and deployment issues

Xingbo Huang Thu, 08 Jul 2021 02:01:41 -0700

Hi Wouter,

The JIRA is https://issues.apache.org/jira/browse/FLINK-23309. `bundle
time` is from the perspective of your e2e latency. Regarding the `bundle
size`, generally larger value will provide better throughput, but it should
not be set too large, which may cause no output to be seen downstream for a
long time and the pressure will be too great during checkpoint.


Best,
Xingbo

Wouter Zorgdrager <zorgdrag...@gmail.com> 于2021年7月8日周四 下午4:32写道：

> Hi Xingbo, all,
>
> That is good to know, thank you. Is there any Jira issue I can track? I'm
> curious to follow this progress! Do you have any recommendations with
> regard to these two configuration values, to get somewhat reasonable
> performance?
>
> Thanks a lot!
> Wouter
>
> On Thu, 8 Jul 2021 at 10:26, Xingbo Huang <hxbks...@gmail.com> wrote:
>
>> Hi Wouter,
>>
>> In fact, our users have encountered the same problem. Whenever the
>> `bundle size` or `bundle time` is reached, the data in the buffer needs to
>> be sent from the jvm to the pvm, and then waits for the pym to be processed
>> and sent back to the jvm to send all the results to the downstream
>> operator, which leads to a large delay, especially when it is a small size
>> event as small messages are hard to be processed in pipeline.
>>
>> I have been solving this problem recently and I plan to make this
>> optimization to release-1.14.
>>
>> Best,
>> Xingbo
>>
>> Wouter Zorgdrager <zorgdrag...@gmail.com> 于2021年7月8日周四 下午3:41写道：
>>
>>> Hi Dian, all,
>>>
>>>  I will come back to the other points asap. However, I’m still confused
>>> about this performance. Is this what I can expect in PyFlink in terms of
>>> performance? ~ 1000ms latency for single events? I also had a very simple
>>> setup where I send 1000 events to Kafka per second and response
>>> times/latencies was around 15 seconds for single events. I understand there
>>> is some Python/JVM overhead but since Flink is so performant, I would
>>> expect much better numbers. In the current situation, PyFlink would just be
>>> unusable if you care about latency. Is this something that you expect to be
>>> improved in the future?
>>>
>>> I will verify how this works out for Beam in a remote environment.
>>>
>>> Thanks again!
>>> Wouter
>>>
>>>
>>> On Thu, 8 Jul 2021 at 08:28, Dian Fu <dian0511...@gmail.com> wrote:
>>>
>>>> Hi Wouter,
>>>>
>>>> 1) Regarding the performance difference between Beam and PyFlink, I
>>>> guess it’s because you are using an in-memory runner when running it
>>>> locally in Beam. In that case, the code path is totally differently
>>>> compared to running in a remote cluster.
>>>> 2) Regarding to `flink run`, I’m surprising that it’s running locally.
>>>> Could you submit a java job with similar commands to see how it runs?
>>>> 3) Regarding to `flink run-application`, could you share the exception
>>>> stack?
>>>>
>>>> Regards,
>>>> Dian
>>>>
>>>> 2021年7月6日 下午4:58，Wouter Zorgdrager <zorgdrag...@gmail.com> 写道：
>>>>
>>>> uses
>>>>
>>>>
>>>>

Re: PyFlink performance and deployment issues

Reply via email to