Hi Prasanna,

   1) Semantically, both a) and b) would be OK. If the custom sink can be 
chained with the map operator (I assume the map operator is the "Processing" 
step in the graph), there should also be little physical difference between 
them. If they cannot chain, then writing a custom sink causes another pass of 
network transfer, but the custom sink then runs in a different thread, so 
more computational resources can be exploited. 
   2) To achieve at-least-once, you need to implement the "CheckpointedFunction" 
interface and make sure all buffered data is flushed to the outside system when 
snapshotting state. Once a checkpoint succeeds, the records it covers will not 
be replayed after a failover, so those records must be written out before the 
checkpoint completes. See the sketch after these answers.
   3) From my side, I don't see any significant disadvantages to writing 
custom sink functions. 
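
   To make 2) concrete, here is a minimal sketch of such a sink. The class 
name, the topic ARN, and the use of the AWS SDK v1 SNS client are illustrative 
assumptions on my side; only RichSinkFunction and CheckpointedFunction come 
from Flink itself:

    import com.amazonaws.services.sns.AmazonSNS;
    import com.amazonaws.services.sns.AmazonSNSClientBuilder;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sink: buffers records and flushes them to SNS before
    // every checkpoint completes, which yields at-least-once delivery.
    public class SnsSink extends RichSinkFunction<String>
            implements CheckpointedFunction {

        private final String topicArn;  // assumed to be passed in by the caller
        private transient AmazonSNS snsClient;
        private transient List<String> pending;

        public SnsSink(String topicArn) {
            this.topicArn = topicArn;
        }

        @Override
        public void open(Configuration parameters) {
            snsClient = AmazonSNSClientBuilder.defaultClient();
            pending = new ArrayList<>();
        }

        @Override
        public void invoke(String value, Context context) {
            // Buffer locally; a production sink might publish eagerly and
            // only track in-flight requests here.
            pending.add(value);
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) {
            // Records covered by a successful checkpoint are never replayed
            // after failover, so they must reach SNS before the checkpoint
            // is acknowledged.
            for (String message : pending) {
                snsClient.publish(topicArn, message);
            }
            pending.clear();
        }

        @Override
        public void initializeState(FunctionInitializationContext context) {
            // Nothing to restore: unflushed records are replayed from Kafka.
        }
    }

   Wiring it up also illustrates the chaining point from 1): by default the 
map and the sink chain into the same task, so there is no extra network hop 
(ProcessingMapper and the ARN below are placeholders):

    stream.map(new ProcessingMapper())
          .addSink(new SnsSink("arn:aws:sns:us-east-1:123456789012:example"));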

Best,
 Yun


------------------------------------------------------------------
Sender: Prasanna kumar <prasannakumarram...@gmail.com>
Date: 2020/08/22 02:00:51
Recipient: user <user@flink.apache.org>; <d...@flink.apache.org>
Subject: SDK vs Connectors

Hi Team,

Following is the pipeline:
Kafka => Processing => SNS Topics.

Flink does not provide an SNS connector out of the box. 

a) I implemented the above by using the AWS SDK and publishing the messages in 
the Map operator itself.  
The pipeline is working well. I see messages flowing to the SNS topics.

b) Another approach is that I could write a custom sink function and still 
publish to SNS using the SDK in that stage. 

Questions
1) What would be the primary difference between approaches a) and b)? Is there 
any significant advantage of one over the other?

2) Would the at-least-once guarantee still hold if we follow the above approaches?

3) Would there be any significant disadvantages (or anything we need to be 
careful about) in writing our own custom sink functions?

Thanks,
Prasanna. 
