Source Connector Task in a distributed env

Venkata S A Wed, 24 Apr 2019 01:09:12 -0700

Hello Team,

              I am developing a custom Source Connector that watches a
given directory for any new files. My question is in a Distributed
environment, how will the tasks in different nodes handle the file Queue?


              Referring to this sample
<https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory>
,
poll() in SourceTask is polling the directory at specified interval for a
new files and fetching the files in a Queue as below:

Queue<File> queue = ((DirWatcher) task).getFilesQueue();
>

So, When in a 3 node cluster, this is run individually by each task. But
then, How is the synchronization happening between all the tasks in
different nodes to avoid duplication of file reading to kafka ?


Thank you,
Venkata S

Source Connector Task in a distributed env

Reply via email to