Jark Wu created FLINK-21191:
-------------------------------

             Summary: Support reducing buffer for upsert-kafka sink
                 Key: FLINK-21191
                 URL: https://issues.apache.org/jira/browse/FLINK-21191
             Project: Flink
          Issue Type: New Feature
          Components: Connectors / Kafka
            Reporter: Jark Wu
             Fix For: 1.13.0


Currently, if there is a job agg -> filter -> upsert-kafka, then upsert-kafka 
will receive -U and +U for every updates instead of only a +U. This will 
produce a lot of tombstone messages in Kafka. It's not just about the 
unnecessary data volume in Kafka, but users may processes that trigger side 
effects when a tombstone records is ingested from a Kafka topic. 

A simple solution would be add a reducing buffer for the upsert-kafka, to 
reduce the -U and +U before emitting to the underlying sink. This should be 
very similar to the implementation of upsert JDBC sink. 

We can even extract the reducing logic out of the JDBC connector and it can be 
reused by other connectors. 
This should be something like `BufferedUpsertSinkFunction` which has a reducing 
buffer and flush to the underlying SinkFunction
once checkpointing or buffer timeout. We can put it in `flink-connector-base` 
which can be shared for builtin connectors and custom connectors. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to