Hi, I have a question regarding Flume suitability for a particular use case.
Task: there is a constant incoming stream of links that point to files. Those files are to be fetched and stored in HDFS.

Desired implementation:

1. Each link to a file is stored in a Kafka queue Q1.
2. Flume A1.source monitors Q1 for new links.
3. Upon retrieving a link from Q1, A1.source fetches the file. The file is eventually stored in HDFS by A1.sink.

My concern here is the seemingly overloaded functionality of A1.source. The A1.source would have to perform two activities: 1) periodically poll queue Q1 for new links to files, and then 2) fetch those files.

What do you think? Is there a cleaner way to achieve this, e.g. by using an interceptor to fetch the files? Would that be appropriate?

Best,
GIntas
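For reference, a minimal sketch of the plumbing described above, using Flume's built-in Kafka source and HDFS sink. All names here (agent a1, topic Q1, broker and namenode hosts, paths) are placeholder assumptions, and note that out of the box this configuration would land the *links* in HDFS, not the files themselves; the fetch step would still require a custom source or interceptor:

```properties
# Assumed agent name "a1"; broker/namenode hosts are made up.
a1.sources = kafka-src
a1.channels = mem-ch
a1.sinks = hdfs-sink

# Kafka source polling topic Q1 for incoming links
a1.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafka-src.kafka.bootstrap.servers = broker1:9092
a1.sources.kafka-src.kafka.topics = Q1
a1.sources.kafka-src.channels = mem-ch

a1.channels.mem-ch.type = memory

# HDFS sink; as configured, it writes the event bodies (the links)
a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/files
a1.sinks.hdfs-sink.channel = mem-ch
```

The open question is where the link-to-file fetch belongs: folded into a custom source (the overload concern above), or done in an interceptor that replaces each event body with the fetched file contents.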
