Hi all,

I'm using the nice topic pattern feature on the KafkaConsumer to read from
multiple topics, automatically discovering new topics added into the system.
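
For context, here is roughly what the source side looks like. This is a
minimal sketch assuming Flink 1.4+ with the 0.11 Kafka connector; the broker
address, group id, and topic pattern are placeholders:

import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class TopicPatternJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("group.id", "my-group");             // placeholder
        // periodically discover new topics/partitions matching the pattern
        props.setProperty("flink.partition-discovery.interval-millis", "60000");

        DataStream<String> stream = env.addSource(
            new FlinkKafkaConsumer011<>(
                Pattern.compile("events-.*"),   // placeholder pattern
                new SimpleStringSchema(),
                props));

        // ... processing + sink, see below ...
        env.execute("topic-pattern-job");
    }
}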

At the end of the processing I'm sinking the result into a Hadoop
Filesystem using a BucketingSink.
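
The sink side, roughly, continuing the sketch above (the base path is a
placeholder, and the real records carry more structure than plain strings):

import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

BucketingSink<String> sink =
    new BucketingSink<>("hdfs://namenode:8020/flink/out");  // placeholder
sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
stream.addSink(sink);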

Everything works great, but now I have a new requirement: sink into a
different Hadoop Filesystem depending on the input topic.

One way to do this would obviously be to get rid of the topic pattern and
start a (similar) job per topic, each with its own sink writing to its own
filesystem, and to start new jobs whenever new topics are added. But that's
far from ideal: it leads to the usual issues with Flink and a dynamic number
of jobs (requiring new task slots, etc.), and it obviously requires some
external machinery to detect that new topics have been added and to create
the new jobs.

What would be the recommended way to build a "dynamic" BucketingSink that can
not only write to several base paths (not too hard, I guess) but also
dynamically pick up new base paths as new topics come into the system?
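
For the "several base paths" half, a custom Bucketer that routes on the topic
seems easy enough. A minimal sketch, assuming the topic name has been attached
to each record upstream (e.g. via a KeyedDeserializationSchema) so records
arrive as Tuple2<topic, value>; it would be installed with
sink.setBucketer(new TopicBucketer()):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.connectors.fs.Clock;
import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;

// Buckets each record into a per-topic sub-directory under the sink's
// base path. f0 holds the topic name, f1 the payload.
public class TopicBucketer implements Bucketer<Tuple2<String, String>> {
    @Override
    public Path getBucketPath(Clock clock, Path basePath,
                              Tuple2<String, String> element) {
        return new Path(basePath, element.f0);
    }
}

But that only buckets within a single filesystem, since the sink's base path
(and hence the FileSystem it writes to) is fixed at construction time. It's
the "new topic, possibly new filesystem" part that I don't see how to do
cleanly.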

Thanks,
-- 
Christophe
