退订所有邮件列表







----- 原始邮件 -----


发件人:Teoh, Hong<lian...@amazon.co.uk>

发送时间:2023-06-15 05:19:57

收件人:Lu Niu<qqib...@gmail.com>

抄送:d...@flink.apache.org<d...@flink.apache.org>;user<user@flink.apache.org>

主 题:Re: AsyncFunction vs Async Sink

Hi Lu,


Thanks for your question. See below for my understanding.


I would recommend using the Async Sink if you are writing to the external service as the final output of your job graph, and if you don’t have the ordered requirement that updates to the external system must be done before updates to some other external system within the same job graph. (More explained later).


The abstraction of the Async Sink is a sink, meaning it is a terminal operator in the job graph. The abstraction is intended to simplify the writing of a sink - meaning the base implementation will handle batching, state management and rate limiting. You only need to provide the client and request structure to be used to interact with the external service. This makes writing and maintaining the sink easier (if you simply want to write to a destination with at least once processing). 


The AsyncFunction, as I understand it is more used for data enrichment, and is not a terminal operator in the job graph. This means the return value from the external service will continue to be passed on down the Flink Job graph. This is useful for data enrichment using the external service, or if we want to ensure the system being called in the AsyncFunction is updated BEFORE any data is written to the sinks further down the job graph.


For example:


Kinesis Source -> Map -> AsyncFunction (Updates DynamoDB) -> Kinesis Sink


We can be sure that the updates to DynamoDB for a particular record happens before the record is written to the Kinesis Sink.



Hope the above clarifies your question!


Regards,
Hong


 
On 14 Jun 2023, at 19:27, Lu Niu <qqib...@gmail.com> wrote:

 

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.


   

Hi, Flink dev and users

If I want to async write to an external service, which API shall I use, AsyncFunction or Async Sink?

My understanding after checking the code are:

  1. Both APIs guarantee at least once write to external service. As both API internally stores in-flight requests in the checkpoint.

  2. Async Sink provides a batching request feature. This can be implemented with Map + AsyncFunction. Map function groups requests in batches and pass it to AsyncFunction.The batching implementation can refer to AbstractMapBundleOperator if don’t want to use state.

  3. Async Sink supports retry on failed requests. AsyncFunction also supports retry in latest flink version.

  4. Async Sink supports rate limiting, AsyncFunction doesn’t.

  5. AsyncFunction can be used to implement read-update-write. Async Sink cannot.

Best

Lu








Reply via email to