Your connector sounds a lot like this one
https://github.com/jcustenborder/kafka-connect-spooldir

I do not think you can run such a connector in distributed mode though.
Typically something like this runs in standalone mode to avoid conflicts.

-hans


On Wed, Apr 24, 2019 at 1:08 AM Venkata S A <asaid...@gmail.com> wrote:

> Hello Team,
>
>               I am developing a custom Source Connector that watches a
> given directory for any new files. My question is in a Distributed
> environment, how will the tasks in different nodes handle the file Queue?
>
>               Referring to this sample
> <
> https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory
> >
> ,
> poll() in SourceTask is polling the directory at specified interval for a
> new files and fetching the files in a Queue as below:
>
> Queue<File> queue = ((DirWatcher) task).getFilesQueue();
> >
>
> So, When in a 3 node cluster, this is run individually by each task. But
> then, How is the synchronization happening between all the tasks in
> different nodes to avoid duplication of file reading to kafka ?
>
>
> Thank you,
> Venkata S
>

Reply via email to