Thank you, Ryann & Hans. I will look into it.
I explored spooldir as well and found that it, too, is suited to
standalone mode, as you mentioned.

Venkata

On Wed 24 Apr, 2019, 22:34 Hans Jespersen, <h...@confluent.io> wrote:

> Your connector sounds a lot like this one
> https://github.com/jcustenborder/kafka-connect-spooldir
>
> I do not think you can run such a connector in distributed mode though.
> Typically something like this runs in standalone mode to avoid conflicts.
>
> -hans
>
>
> On Wed, Apr 24, 2019 at 1:08 AM Venkata S A <asaid...@gmail.com> wrote:
>
> > Hello Team,
> >
> > I am developing a custom Source Connector that watches a given
> > directory for new files. My question is: in a distributed environment,
> > how will the tasks on different nodes handle the file queue?
> >
> > Referring to this sample:
> > <https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory>
> > poll() in SourceTask polls the directory at a specified interval for
> > new files and fetches them into a Queue, as below:
> >
> > Queue<File> queue = ((DirWatcher) task).getFilesQueue();
> >
> >
> > So, in a 3-node cluster, this is run individually by each task. How,
> > then, is synchronization handled between the tasks on different nodes
> > to avoid reading the same file into Kafka more than once?
> >
> >
> > Thank you,
> > Venkata S
> >
>
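
On the synchronization question in the original message: one possible
way to keep tasks from reading the same file twice, without any
coordination at runtime, is to assign each file to exactly one task by
hashing its name. The sketch below is only an illustration under stated
assumptions. It assumes every task sees the same directory listing (for
example, a shared mount) and knows its own index and the total task
count. FileAssignment, ownedBy and filesFor are hypothetical names; they
are not part of Kafka Connect or of either connector linked in this
thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: deterministic file-to-task assignment by name hash.
// Not part of Kafka Connect, spooldir, or kafka-connect-directory-source.
final class FileAssignment {
    private FileAssignment() {}

    // Returns true when the file with this name belongs to the task at taskIndex.
    static boolean ownedBy(String fileName, int taskIndex, int taskCount) {
        if (taskCount <= 0 || taskIndex < 0 || taskIndex >= taskCount) {
            throw new IllegalArgumentException("invalid task index or task count");
        }
        // String.hashCode() is specified by the JDK, so every JVM computes the
        // same value; floorMod keeps the bucket non-negative for negative hashes.
        return Math.floorMod(fileName.hashCode(), taskCount) == taskIndex;
    }

    // Filters a directory listing down to the files this task should read.
    static List<String> filesFor(List<String> fileNames, int taskIndex, int taskCount) {
        List<String> owned = new ArrayList<>();
        for (String name : fileNames) {
            if (ownedBy(name, taskIndex, taskCount)) {
                owned.add(name);
            }
        }
        return owned;
    }
}
```

In a connector, the index and count would have to reach each task
somehow, for instance as entries in the maps returned from
SourceConnector.taskConfigs(int maxTasks); the key names for those
entries would be your own choice. Note that changing the number of tasks
reshuffles ownership, so files already read would still need to be
tracked (for example, by moving them after processing) to avoid re-reads.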
