Unfortunately, no. The spoolDir source was kept single-threaded so that deserializer implementations can stay simple. The approach with multiple spoolDir sources is the correct one, though they can all write to the same channel(s). So you only need a larger number of sources; they can all share the same channel(s), and you don't need more sinks unless you want to pull data out faster.
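To illustrate, a minimal config sketch along those lines might look like the following. The agent, source, channel, and sink names and the spool paths are placeholders, and a logger sink stands in for your Cassandra sink:

```properties
# Hypothetical sketch: three spoolDir sources feeding one shared channel,
# drained by a single sink. All names and paths are illustrative.
agent1.sources = src1 src2 src3
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /data/spool/dir1
agent1.sources.src1.channels = ch1

agent1.sources.src2.type = spooldir
agent1.sources.src2.spoolDir = /data/spool/dir2
agent1.sources.src2.channels = ch1

agent1.sources.src3.type = spooldir
agent1.sources.src3.spoolDir = /data/spool/dir3
agent1.sources.src3.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# One sink is enough unless you need to drain the channel faster;
# logger is a stand-in for the actual Cassandra sink.
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
```

Each source runs its own file-consuming thread, so splitting the files across the three spool directories lets them be processed concurrently while everything still funnels through one channel.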
On Tue, Sep 16, 2014 at 11:26 AM, Haidang N <[email protected]> wrote:
> Since I'm not allowed to set up Flume on prod servers, I have to download
> the logs, put them in a Flume spoolDir and have a sink to consume from the
> channel and write to Cassandra. Everything is working fine.
>
> However, as I have a lot of log files in the spoolDir, and the current
> setup is only processing 1 file at a time, it's taking a while. I want to
> be able to process many files concurrently. One way I thought of is to use
> the spoolDir but distribute the files into 5-10 different directories, and
> define multiple sources/channels/sinks, but this is a bit clumsy. Is there
> a better way to achieve this?
>
> Thanks
