Hi Lu,

> 1. Is there any problem if we use Async Function for such a use case? We can 
> simply drop the output and use Unordered mode.


As far as I can tell, the two are similar for this use case. The main 
differences are the retry strategy available for AsyncFunction and the 
batching provided by Async Sink. Both should work on Flink.
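To make the "drop the output" idea concrete, here is a minimal plain-Java sketch. It does not use Flink classes: `ResultCallback` is a hypothetical local interface loosely modelled on the shape of Flink's `ResultFuture`, and `writeAsync` is a made-up stand-in for an external client call. The point is only that the write completes asynchronously and the function then reports an empty result, so nothing flows downstream.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Plain-Java model of "write asynchronously, emit nothing". No Flink classes are used.
class DropOutputSketch {

    // Hypothetical callback, only loosely modelled on Flink's ResultFuture.
    interface ResultCallback<OUT> {
        void complete(Collection<OUT> results);
        void completeExceptionally(Throwable error);
    }

    // Stand-in for the external service: records every value written to it.
    static final List<String> written = new CopyOnWriteArrayList<>();

    // Hypothetical asynchronous client call.
    static CompletableFuture<Void> writeAsync(String record) {
        return CompletableFuture.runAsync(() -> written.add(record));
    }

    // Fires the write and, once it finishes, reports an empty result (or the error).
    static void asyncInvoke(String input, ResultCallback<Void> callback) {
        writeAsync(input).whenComplete((ignored, error) -> {
            if (error != null) {
                callback.completeExceptionally(error);
            } else {
                callback.complete(Collections.emptyList());
            }
        });
    }
}
```

In a real job the failure branch is where the retry strategy mentioned above would come into play; this sketch simply forwards the error.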


> 2. For AsyncFunction and Async Sink, does it make sense that both could 
> share the same underlying implementation, so that features like batching and 
> rate limiting can benefit both?

Good question - I think there are quite a lot of similarities, which is why the 
interfaces are similar. However, I think the end use case is different. For 
example, AsyncSink might want to implement support for some form of 
two-phase commit on the sink (today it offers an at-least-once guarantee). 
This is slightly more complicated on AsyncFunction.



Regards,
Hong



On 15 Jun 2023, at 00:26, Lu Niu <qqib...@gmail.com> wrote:



Thanks, Hong!

I understand that if the use case is simply to write something to an external 
service, Async Sink is a good option that provides features like batching, 
state management and rate limiting. I have some follow-up questions:

1. Is there any problem if we use Async Function for such a use case? We can 
simply drop the output and use Unordered mode.
2. For AsyncFunction and Async Sink, does it make sense that both could share 
the same underlying implementation, so that features like batching and rate 
limiting can benefit both?

Best
Lu


On Wed, Jun 14, 2023 at 2:20 PM Teoh, Hong <lian...@amazon.co.uk> wrote:
Hi Lu,

Thanks for your question. See below for my understanding.

I would recommend using the Async Sink if you are writing to the external 
service as the final output of your job graph, and if you don’t have an 
ordering requirement that updates to the external system must be done before 
updates to some other external system within the same job graph (more on this 
below).

The abstraction of the Async Sink is a sink, meaning it is a terminal operator 
in the job graph. The abstraction is intended to simplify writing a sink: the 
base implementation handles batching, state management and rate limiting. You 
only need to provide the client and the request structure used to interact 
with the external service. This makes writing and maintaining the sink easier 
(if you simply want to write to a destination with at-least-once processing).
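As a rough illustration of what "rate limiting handled by the base implementation" can mean, the sketch below caps the number of in-flight requests with a semaphore. This is an assumption made for illustration only, not the Async Sink's actual implementation; `InFlightLimiter`, `submit` and `maxObserved` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Caps concurrent requests; the caller blocks (backpressure) when the cap is reached.
class InFlightLimiter {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    InFlightLimiter(int maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.permits = new Semaphore(maxInFlight);
    }

    // Runs the request asynchronously once a permit is available.
    <T> CompletableFuture<T> submit(Supplier<T> request) {
        permits.acquireUninterruptibly();
        int now = inFlight.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max);
        return CompletableFuture.supplyAsync(request)
            .whenComplete((result, error) -> {
                // Decrement before releasing so the counter never exceeds the cap.
                inFlight.decrementAndGet();
                permits.release();
            });
    }

    // Highest number of simultaneous requests seen so far (for inspection only).
    int maxObserved() {
        return maxObserved.get();
    }
}
```

Blocking the caller is the simplest way to propagate backpressure in a sketch like this; a production sink would more likely buffer and adapt its limits over time.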

The AsyncFunction, as I understand it, is used more for data enrichment, and 
is not a terminal operator in the job graph. This means the return value from 
the external service continues to be passed down the Flink job graph. This is 
useful for data enrichment using the external service, or if we want to 
ensure the system being called in the AsyncFunction is updated BEFORE any data 
is written to the sinks further down the job graph.

For example:

Kinesis Source -> Map -> AsyncFunction (Updates DynamoDB) -> Kinesis Sink

We can be sure that the updates to DynamoDB for a particular record happen 
before the record is written to the Kinesis Sink.
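The ordering argument can be modelled in plain Java with `CompletableFuture`. This is only a sketch: `store` and `sinkLog` are hypothetical in-memory stand-ins for DynamoDB and the Kinesis Sink, and no Flink or AWS classes are involved. A record reaches the sink step only after its update future has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Plain-Java model of "update the store, then hand the record to the sink".
class UpdateBeforeSinkSketch {
    // Hypothetical stand-ins for DynamoDB and the Kinesis Sink.
    static final Map<String, String> store = new ConcurrentHashMap<>();
    static final List<String> sinkLog = new CopyOnWriteArrayList<>();

    // Stands in for the async operator: the key is emitted only after the update.
    static CompletableFuture<String> updateThenEmit(String key, String value) {
        return CompletableFuture.supplyAsync(() -> {
            store.put(key, value);
            return key;
        });
    }

    // Stands in for the sink: logs whether the store already held the update.
    static void sinkWrite(String key) {
        sinkLog.add(key + ":" + store.containsKey(key));
    }

    // Pushes every key through update -> sink and returns what the sink saw.
    static List<String> run(List<String> keys) {
        store.clear();
        sinkLog.clear();
        CompletableFuture<?>[] pending = keys.stream()
            .map(k -> updateThenEmit(k, "v-" + k)
                .thenAccept(UpdateBeforeSinkSketch::sinkWrite))
            .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(pending).join();
        return new ArrayList<>(sinkLog);
    }
}
```

Every entry in the returned log ends in `:true`, because each sink write is chained after its own update. The order of entries across different keys is not fixed.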


Hope the above clarifies your question!

Regards,
Hong


On 14 Jun 2023, at 19:27, Lu Niu <qqib...@gmail.com> wrote:




Hi, Flink dev and users

If I want to write to an external service asynchronously, which API should I 
use, AsyncFunction or Async Sink?

My understanding after checking the code is:

  1.  Both APIs guarantee at-least-once writes to the external service, as both 
internally store in-flight requests in the checkpoint.
  2.  Async Sink provides request batching. This can also be implemented with 
Map + AsyncFunction: the Map function groups requests into batches and passes 
them to the AsyncFunction. The batching implementation can follow 
AbstractMapBundleOperator if we don’t want to use state.
  3.  Async Sink supports retrying failed requests. AsyncFunction also supports 
retries in the latest Flink version.
  4.  Async Sink supports rate limiting; AsyncFunction doesn’t.
  5.  AsyncFunction can be used to implement read-update-write; Async Sink 
cannot.
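The batching idea in point 2 can be sketched as a small count-based bundler in plain Java. `CountBundler` is a hypothetical name and this is not the code of AbstractMapBundleOperator; it only shows the grouping step that would sit in front of the AsyncFunction. The buffer lives in memory only, so a real operator would need to flush it before each checkpoint (or keep it in state) to avoid losing buffered elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Groups elements into batches of at most maxBatchSize and hands them downstream.
class CountBundler<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> downstream;
    private List<T> buffer = new ArrayList<>();

    CountBundler(int maxBatchSize, Consumer<List<T>> downstream) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.downstream = downstream;
    }

    // Buffers one element and emits a batch as soon as the buffer is full.
    void add(T element) {
        buffer.add(element);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Emits whatever is buffered; a no-op when the buffer is empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        downstream.accept(batch);
    }
}
```

With a batch size of 2, adding the elements 1 to 5 and then calling `flush()` hands the batches `[1, 2]`, `[3, 4]` and `[5]` to the downstream consumer.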

Best

Lu

