Hi, I have a question regarding Flume suitability for a particular use case.
Task: there is a constant incoming stream of links that point to files. Those files are to be fetched and stored in HDFS.

Desired implementation:

1. Each link to a file is stored in a Kafka queue Q1.
2. Flume A1.source monitors Q1 for new links.
3. Upon retrieving a link from Q1, A1.source fetches the file. The file is eventually stored in HDFS by A1.sink.

My concern here is the seemingly overloaded functionality of A1.source. The A1.source would have to perform two activities: 1) periodically poll queue Q1 for new links to files, and then 2) fetch those files.

What do you think? Is there a cleaner way to achieve this, e.g. by using an interceptor to fetch the files? Would that be appropriate?

Best,
GIntas
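For reference, a minimal sketch of the plumbing described above, using Flume's built-in Kafka source and HDFS sink. All names here (agent a1, topic Q1, broker and namenode hosts, paths) are placeholder assumptions, and note that out of the box this configuration would land the *links* in HDFS, not the files themselves; the fetch step would still require a custom source or interceptor:

```properties
# Assumed agent name "a1"; broker/namenode hosts are made up.
a1.sources = kafka-src
a1.channels = mem-ch
a1.sinks = hdfs-sink

# Kafka source polling topic Q1 for incoming links
a1.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafka-src.kafka.bootstrap.servers = broker1:9092
a1.sources.kafka-src.kafka.topics = Q1
a1.sources.kafka-src.channels = mem-ch

a1.channels.mem-ch.type = memory

# HDFS sink; as configured, it writes the event bodies (the links)
a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/files
a1.sinks.hdfs-sink.channel = mem-ch
```

The open question is where the link-to-file fetch belongs: folded into a custom source (the overload concern above), or done in an interceptor that replaces each event body with the fetched file contents.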
