Thank you Ryann & Hans. I will look into it. The spooldir, I explored it too and found that it too suits for standalone as you mentioned.
'Venkata On Wed 24 Apr, 2019, 22:34 Hans Jespersen, <h...@confluent.io> wrote: > Your connector sounds a lot like this one > https://github.com/jcustenborder/kafka-connect-spooldir > > I do not think you can run such a connector in distributed mode though. > Typically something like this runs in standalone mode to avoid conflicts. > > -hans > > > On Wed, Apr 24, 2019 at 1:08 AM Venkata S A <asaid...@gmail.com> wrote: > > > Hello Team, > > > > I am developing a custom Source Connector that watches a > > given directory for any new files. My question is in a Distributed > > environment, how will the tasks in different nodes handle the file Queue? > > > > Referring to this sample > > < > > > https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory > > > > > , > > poll() in SourceTask is polling the directory at specified interval for a > > new files and fetching the files in a Queue as below: > > > > Queue<File> queue = ((DirWatcher) task).getFilesQueue(); > > > > > > > So, When in a 3 node cluster, this is run individually by each task. But > > then, How is the synchronization happening between all the tasks in > > different nodes to avoid duplication of file reading to kafka ? > > > > > > Thank you, > > Venkata S > > >