Finally, I have published the POC of minibatch for the TopN function [1]. It
covers all implementations of the TopN function because it buffers
records before emitting them to the collector inside AbstractTopNFunction.
To demonstrate the performance improvement, I used Nexmark q19, which was
enhanced
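The idea above is to buffer records in the shared base class so that every TopN implementation benefits. It can be illustrated with a minimal Java sketch. The sketch does not use Flink's APIs. `AbstractTopNSketch`, `AppendOnlyTopNSketch`, `emit`, `flushMiniBatch` and the size-based flush trigger are all hypothetical, and a plain `Consumer<String>` stands in for the collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TopNMiniBatchSketch {

    // Hypothetical stand-in for the shared base class: every subclass emits
    // through emit(), so the buffer added here applies to all implementations.
    abstract static class AbstractTopNSketch {
        private final List<String> outputBuffer = new ArrayList<>();
        private final int maxBatchSize;
        private final Consumer<String> collector; // stand-in for Flink's Collector

        AbstractTopNSketch(int maxBatchSize, Consumer<String> collector) {
            if (maxBatchSize <= 0) {
                throw new IllegalArgumentException("maxBatchSize must be positive");
            }
            this.maxBatchSize = maxBatchSize;
            this.collector = collector;
        }

        abstract void processElement(String row);

        // Buffer the change instead of sending it downstream immediately.
        protected final void emit(String change) {
            outputBuffer.add(change);
            if (outputBuffer.size() >= maxBatchSize) {
                flushMiniBatch();
            }
        }

        // Sends all buffered changes downstream; could also be called when a minibatch interval ends.
        final void flushMiniBatch() {
            outputBuffer.forEach(collector);
            outputBuffer.clear();
        }

        final int pending() {
            return outputBuffer.size();
        }
    }

    // Hypothetical subclass: it holds only ranking logic and knows nothing about buffering.
    static final class AppendOnlyTopNSketch extends AbstractTopNSketch {
        AppendOnlyTopNSketch(int maxBatchSize, Consumer<String> collector) {
            super(maxBatchSize, collector);
        }

        @Override
        void processElement(String row) {
            emit("+I(" + row + ")"); // real ranking logic omitted in this sketch
        }
    }

    public static void main(String[] args) {
        List<String> downstream = new ArrayList<>();
        AbstractTopNSketch fn = new AppendOnlyTopNSketch(3, downstream::add);
        fn.processElement("a");
        fn.processElement("b");
        System.out.println(downstream); // []
        fn.processElement("c");         // batch is full, so it is flushed
        System.out.println(downstream); // [+I(a), +I(b), +I(c)]
    }
}
```

Flushing here is triggered only by batch size or by an explicit call. This sketch does not attempt to reproduce how the POC itself decides when to flush.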
Hi Xushuai!
Thank you for your reply!
1. Yes, you are absolutely right: we can't fold records inside the output
buffer if the current record being emitted has an accumulate type (+I or
+U). Only retract records (-U or -D, which are produced by the
current TopN function or received by
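The sentence above is cut off, so the following sketch assumes the folding rule it appears to describe. An incoming -U or -D may cancel a still-buffered +I or +U that carries the identical row. An incoming +I or +U is only appended and never folded. `FoldingBuffer`, `Change` and the string row representation are hypothetical, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class FoldingBufferSketch {

    // Mirrors the four change kinds +I, -U, +U, -D and whether each one accumulates.
    enum Kind {
        INSERT("+I", true), UPDATE_BEFORE("-U", false), UPDATE_AFTER("+U", true), DELETE("-D", false);

        final String shortString;
        final boolean accumulate;

        Kind(String shortString, boolean accumulate) {
            this.shortString = shortString;
            this.accumulate = accumulate;
        }
    }

    static final class Change {
        final Kind kind;
        final String row; // hypothetical: the full row content, compared for equality

        Change(Kind kind, String row) {
            this.kind = kind;
            this.row = row;
        }

        @Override
        public String toString() {
            return kind.shortString + "(" + row + ")";
        }
    }

    static final class FoldingBuffer {
        private final List<Change> buffer = new ArrayList<>();

        void add(Change change) {
            if (change.kind.accumulate) {
                // +I / +U: never folded, just buffered.
                buffer.add(change);
                return;
            }
            // -U / -D: cancel the latest buffered accumulate record of the same row, if any.
            for (int i = buffer.size() - 1; i >= 0; i--) {
                Change pending = buffer.get(i);
                if (pending.kind.accumulate && Objects.equals(pending.row, change.row)) {
                    buffer.remove(i); // both cancel out; downstream sees neither
                    return;
                }
            }
            buffer.add(change); // nothing to fold: it retracts a row emitted in an earlier batch
        }

        List<Change> drain() {
            List<Change> out = new ArrayList<>(buffer);
            buffer.clear();
            return out;
        }
    }

    public static void main(String[] args) {
        FoldingBuffer buf = new FoldingBuffer();
        buf.add(new Change(Kind.UPDATE_BEFORE, "a,1"));
        buf.add(new Change(Kind.UPDATE_AFTER, "a,2"));
        buf.add(new Change(Kind.UPDATE_BEFORE, "a,2")); // folds with the buffered +U(a,2)
        buf.add(new Change(Kind.UPDATE_AFTER, "a,3"));
        System.out.println(buf.drain()); // [-U(a,1), +U(a,3)]
    }
}
```

Folding can leave a +U with no preceding +I in the same batch. For example, +I(a), -U(a), +U(b) folds to +U(b). A real implementation may need to adjust row kinds in that case, which this sketch does not do.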
Hi, Roman
Thanks for your proposal. I think this is an interesting idea and it might be
useful when there are operators downstream of the TopN.
After reading your doc, I have some questions about your proposal.
1. From the input-output perspective, only the accumulated data seems to be
sent
Hi Ron,
Thank you so much for your reply!
1. I added a description to the Motivation section of my document [1].
2. I intend to inject this functionality into AbstractTopNFunction, so it
will work for all of its implementations. It doesn't depend on the implementation
(whether it is AppendOnlyTopNFunctio
Hi, Roman
Thanks for your proposal. I intuitively feel that this optimization would
be very useful for reducing message amplification in TopN
operators. After briefly looking at your Google doc, I have the following
questions:
1. Could you describe in detail the principle of so
Hi Flink Community,
I have tried to describe my idea of minibatch for TopNFunction in this doc:
https://docs.google.com/document/d/1YPHwxKfiGSUOUOa6bc68fIJHO_UojTwZEC29VVEa-Uk/edit?usp=sharing
Looking forward to your feedback. Thank you!
On Tue, 19 Mar 2024 at 12:24, Roman Boyko wrote:
Hello Flink Community,
The same record amplification problem described in FLIP-415: Introduce
a new join operator to support minibatch [1] exists in most
implementations of AbstractTopNFunction, especially when the rank is
provided to the output. For example, when calculating Top100 with ra