Hello Team,

              I am developing a custom Source Connector that watches a
given directory for any new files. My question is in a Distributed
environment, how will the tasks in different nodes handle the file Queue?

              Referring to this sample
<https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory>
,
poll() in SourceTask is polling the directory at specified interval for a
new files and fetching the files in a Queue as below:

Queue<File> queue = ((DirWatcher) task).getFilesQueue();
>

So, When in a 3 node cluster, this is run individually by each task. But
then, How is the synchronization happening between all the tasks in
different nodes to avoid duplication of file reading to kafka ?


Thank you,
Venkata S

Reply via email to