Re: Support minibatch for TopNFunction

2024-04-10 Roman Boyko
Finally, I have published the POC of minibatch for the TopN function [1]. It covers all implementations of TopN functions because it buffers records before emitting them to the collector inside AbstractTopNFunction. To demonstrate the performance improvement, I used Nexmark q19, which was enhanced
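A minimal sketch of the buffering idea, assuming a simple size-based flush: output records are held in a buffer instead of going straight to the collector, and are forwarded when the minibatch ends. `BufferedCollector`, `finish_bundle` and `max_size` are hypothetical names chosen for illustration. They are not Flink APIs or code from the POC. Records are modelled as `(row_kind, row)` tuples using Flink's +I/-U/+U/-D notation.

```python
from typing import Callable, List, Tuple

# A changelog record: (row_kind, row), where row_kind is one of
# "+I" (insert), "-U" (update-before), "+U" (update-after), "-D" (delete).
Record = Tuple[str, tuple]


class BufferedCollector:
    """Hypothetical wrapper that holds output records until the minibatch ends.

    Records are appended to an in-memory buffer rather than forwarded
    immediately, and are passed downstream in one go by finish_bundle().
    """

    def __init__(self, downstream: Callable[[Record], None], max_size: int) -> None:
        self._downstream = downstream
        self._max_size = max_size
        self._buffer: List[Record] = []

    def collect(self, record: Record) -> None:
        self._buffer.append(record)
        # Assumed size-based trigger: flush as soon as the buffer is full.
        if len(self._buffer) >= self._max_size:
            self.finish_bundle()

    def finish_bundle(self) -> None:
        # Forward buffered records in arrival order, then start a new batch.
        for record in self._buffer:
            self._downstream(record)
        self._buffer.clear()
```

Buffering alone does not shrink the output. What it provides is a window in which records of the same minibatch can be folded before anything reaches downstream operators.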

Re: Support minibatch for TopNFunction

2024-03-27 Roman Boyko
Hi Xushuai! Thank you for your reply! 1. Yes, you are absolutely right: we can't fold the records inside the output buffer if the current record being emitted has an accumulate type (+I or +U). Only retraction records (-U or -D, whether produced by the current TopN function or received by
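The folding rule in point 1 can be sketched as follows. Because the message is cut off, the exact rule is an assumption here: an accumulate record (+I/+U) is always appended, while a retraction (-U/-D) cancels the most recent buffered accumulate record carrying the same row. `append_with_folding` is a hypothetical helper, not part of the POC.

```python
from typing import List, Tuple

Record = Tuple[str, tuple]  # (row_kind, row)

ACCUMULATE = {"+I", "+U"}
RETRACT = {"-U", "-D"}


def append_with_folding(buffer: List[Record], record: Record) -> None:
    """Append a record to the output buffer, folding it where possible.

    Assumed rule (sketch only):
    - an accumulate record (+I/+U) is always appended, since it cannot
      cancel anything already buffered;
    - a retraction (-U/-D) removes the latest buffered accumulate record for
      the same row, because downstream would otherwise see the row added and
      then withdrawn within one minibatch;
    - a retraction with no match targets already-emitted state and is kept.
    Row kinds of surviving records are not rewritten; this assumes downstream
    treats +I and +U alike as accumulate messages.
    """
    kind, row = record
    if kind in RETRACT:
        # Search backwards so the most recent matching record is cancelled.
        for i in range(len(buffer) - 1, -1, -1):
            buffered_kind, buffered_row = buffer[i]
            if buffered_kind in ACCUMULATE and buffered_row == row:
                del buffer[i]
                return
    buffer.append(record)
```

For example, the sequence +I(k, 1), -U(k, 1), +U(k, 2) inside one minibatch folds down to a single +U(k, 2).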

Re: Support minibatch for TopNFunction

2024-03-27 shuai xu
Hi Roman, thanks for your proposal. I think this is an interesting idea, and it might be useful when there are operators downstream of the TopN. After reading your doc, I have some questions about your proposal. 1. From the input-output perspective, only the accumulated data seems to be sent

Re: Support minibatch for TopNFunction

2024-03-25 Roman Boyko
Hi Ron, thank you so much for your reply! 1. I added the description to the Motivation part of my document [1]. 2. I propose injecting this functionality into AbstractTopNFunction, so it will work for all its implementations. It doesn't depend on the implementation (whether it would be AppendOnlyTopNFunctio
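Point 2 can be pictured as a template-method layout: the base class owns the output path, so every subclass inherits the buffering without changes. This is a hypothetical Python analogue. `AbstractTopNSketch`, `emit`, `finish_bundle` and `ToyTop1` are illustrative names, not Flink classes, and the real AbstractTopNFunction is a Java class.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional, Tuple

Record = Tuple[str, tuple]  # (row_kind, row)


class AbstractTopNSketch(ABC):
    """Hypothetical base class that decides how output records are delivered.

    Subclasses implement process_element() and only ever call self.emit();
    whether a record goes downstream immediately or into the minibatch
    buffer is decided here, once, for all subclasses.
    """

    def __init__(self, downstream: Callable[[Record], None], minibatch: bool) -> None:
        self._downstream = downstream
        self._minibatch = minibatch
        self._buffer: List[Record] = []

    def emit(self, record: Record) -> None:
        if self._minibatch:
            self._buffer.append(record)
        else:
            self._downstream(record)

    def finish_bundle(self) -> None:
        for record in self._buffer:
            self._downstream(record)
        self._buffer.clear()

    @abstractmethod
    def process_element(self, row: tuple) -> None: ...


class ToyTop1(AbstractTopNSketch):
    """Toy subclass: tracks the row with the highest score (row[1])."""

    def __init__(self, downstream: Callable[[Record], None], minibatch: bool) -> None:
        super().__init__(downstream, minibatch)
        self._top: Optional[tuple] = None

    def process_element(self, row: tuple) -> None:
        if self._top is None:
            self._top = row
            self.emit(("+I", row))
        elif row[1] > self._top[1]:
            # Replace the current leader with an update pair.
            self.emit(("-U", self._top))
            self.emit(("+U", row))
            self._top = row
```

Because `ToyTop1` never touches the buffer directly, switching minibatch on or off changes only when records arrive downstream, not the subclass code.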

Re: Support minibatch for TopNFunction

2024-03-25 Ron liu
Hi Roman, thanks for your proposal. I intuitively feel that this optimization would be very useful for reducing message amplification in TopN operators. After briefly looking at your Google Doc, I have the following questions: 1. Could you describe in detail the principle of so

Re: Support minibatch for TopNFunction

2024-03-23 Roman Boyko
Hi Flink Community, I have tried to describe my idea about minibatch for TopNFunction in this doc: https://docs.google.com/document/d/1YPHwxKfiGSUOUOa6bc68fIJHO_UojTwZEC29VVEa-Uk/edit?usp=sharing. Looking forward to your feedback, thank you! On Tue, 19 Mar 2024 at 12:24, Roman Boyko wrote: > Hello F

Support minibatch for TopNFunction

2024-03-18 Roman Boyko
Hello Flink Community, the same record amplification problem described in FLIP-415: Introduce a new join operator to support minibatch [1] exists for most implementations of AbstractTopNFunction, especially when the rank is included in the output. For example, when calculating Top100 with ra
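To make the amplification concrete, here is a hypothetical illustration under an assumed emission scheme. When the rank is part of the output, every rank position whose row changes yields a -U/+U pair, a newly filled position yields +I, and a vacated one yields -D. `ranked` and `rank_changelog` are illustrative helpers, not Flink code, and Flink's exact message sequence may differ.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int]  # (id, score)


def ranked(rows: List[Row], n: int) -> List[Row]:
    """Return the top-n rows ordered by descending score (rank 1 first)."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


def rank_changelog(before: List[Row], after: List[Row]) -> List[Tuple[str, Row, int]]:
    """Position-wise changelog between two Top-N results with the rank emitted.

    Assumed scheme (sketch): a rank whose row changed emits -U for the old
    row and +U for the new row at that rank; a newly filled rank emits +I
    and a vacated rank emits -D. Unchanged ranks emit nothing.
    """
    out: List[Tuple[str, Row, int]] = []
    for rank in range(1, max(len(before), len(after)) + 1):
        old: Optional[Row] = before[rank - 1] if rank <= len(before) else None
        new: Optional[Row] = after[rank - 1] if rank <= len(after) else None
        if old == new:
            continue
        if old is not None and new is not None:
            out.append(("-U", old, rank))
            out.append(("+U", new, rank))
        elif new is not None:
            out.append(("+I", new, rank))
        elif old is not None:
            out.append(("-D", old, rank))
    return out
```

Under this assumption, one new row that takes rank 1 in a full Top100 shifts all 100 positions and produces 200 messages for a single input record, while a row that does not make the top 100 produces none.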