Hi,

I'm concerned about the impact of Kafka's log compaction when sending data
between running Flink jobs.

For example, one job produces retract stream records in the sequence
(false, (user_id: 1, state: "california"))  -- retract
(true, (user_id: 1, state: "ohio"))         -- append
If this stream is written to a Kafka topic keyed by user_id, compaction
could reduce it to just
(true, (user_id: 1, state: "ohio"))         -- append
If some downstream Flink job filters on state == "california" and reads
from that topic, I assume it will miss the retract message altogether and
produce incorrect results.
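To make the concern concrete, here is a minimal sketch (plain Python, not Flink or Kafka APIs) of how I understand compaction's keep-only-the-latest-record-per-key semantics would interact with the retract stream above:

```python
# Hypothetical sketch: Kafka log compaction keeps only the most recent
# record per key. Records here are (is_append, user_id, state) tuples.

def compact(records):
    """Simulate compaction: keep only the last record seen for each key."""
    latest = {}
    for rec in records:
        _is_append, user_id, _state = rec
        latest[user_id] = rec  # later records overwrite earlier ones
    return list(latest.values())

stream = [
    (False, 1, "california"),  # retract
    (True, 1, "ohio"),         # append
]

compacted = compact(stream)
print(compacted)  # [(True, 1, 'ohio')] -- the retract record is gone

# A downstream job filtering on state == "california" never sees the
# retract, so it cannot undo the earlier "california" row it emitted.
visible = [r for r in compacted if r[2] == "california"]
print(visible)  # []
```

If this model is right, the downstream filter job is left holding a stale "california" row with no retraction to cancel it.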

Is this true? How do we prevent this from happening? We need to use
compaction since all our jobs are based on CDC and we can't just drop data
after x number of days.

Thanks

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>
